Tokilake & Tokiame
Control your own GPUs like OpenRouter.
Tokilake is a decentralized Large Language Model (LLM) API scheduling gateway built on the One-API ecosystem. It inverts the traditional API gateway model: instead of the gateway acting as a client that reaches out to servers with public IPs, any GPU worker node (Tokiame) sitting behind NAT or an intranet connects out to the central gateway (Hub) through a reverse WebSocket tunnel.
Tokilake is built on top of MartialBE/one-hub and the broader One-API ecosystem that evolved around it.
📖 Quick Start
You can visit the Tokilake Demo to explore the core features. For full control over data privacy and distribution, we recommend self-hosting by following our End-to-End Deployment Guide. Whether you're a first-time user or ready for a full deployment, the guides below are here to help.
🌟 Core Concept
Traditional API proxies typically act as clients, routing requests to servers with public IP addresses. If your high-performance GPUs (like an RTX 4090) are sitting quietly on a local home network, or scattered across temporary Spot Instances from various cloud providers, unifying them into a stable, accessible API is a major challenge.
Tokiame changes the game. Operating as a lightweight daemon, it actively "dials out" to the cloud-based Tokilake gateway. Once connected, Tokilake seamlessly maps the worker to a standard internal Channel. This means you don't need separate NAT traversal tools (such as FRP or Ngrok), and you get the gateway's enterprise-grade load balancing, high-concurrency traffic shaping, authentication, and billing systems right out of the box.
🚀 Perfect Use Cases
1. Distributed GPU Pooling for Individuals & Studios (NAT Penetration)
Tailor-made for home broadband or campus networks without public IPs. Just run the Tokiame process locally and it establishes a tunnel to the cloud gateway; the LLMs you deploy locally with Ollama or vLLM can then instantly and securely serve a standard OpenAI-compatible API to the outside world.
2. Hybrid Cloud Orchestration
Purchased scattered GPU instances across different compute platforms (e.g., AWS, AliCloud, AutoDL, RunPod)? Skip the complex SD-WAN setups. Simply attach the Tokiame startup script to your new instances, and they automatically register into the load-balancing pool. When instances are destroyed or shut down, the heartbeat mechanism safely takes the node offline, drastically reducing DevOps overhead.
3. Enterprise Data Privacy & "Bring Your Own Model" (BYOM)
SaaS providers handle the business logic frontend, while clients provide the compute backend. Clients only need to deploy Tokiame within their highly secure private server rooms, initiating a one-way outbound connection to the SaaS gateway. The client's server room exposes absolutely zero inbound ports, yet perfectly completes the business scheduling of private models, satisfying the most stringent security audit requirements.
This scenario is built around native Private Group and Invite Code mechanisms. User A hooks up their compute node and generates an invite code; User B redeems the code, gains access to User A's private multi-tenant environment, and invokes the compute power. The gateway handles all centralized billing and authentication, making it effortless to build your very own "OpenRouter."
🛠 Architecture Design
```mermaid
graph TB
    subgraph Users ["🌐 API Consumers"]
        U1["Apps / SDKs"]
        U2["curl / ChatUI"]
    end

    subgraph Gateway ["☁️ Tokilake Gateway (Hub)"]
        GIN["Gin HTTP Server"]
        RELAY["Relay Router"]
        PROV["Tokiame Provider"]
        SM["Session Manager"]
        DB[("DB / Channel Table")]
        GIN --> RELAY --> PROV
        PROV -->|"Lookup Session"| SM
        SM -->|"R/W Virtual Channel"| DB
    end

    subgraph Tunnel ["🔒 Multiplexed Reverse Tunnel"]
        direction LR
        CTRL["Control Stream<br/>register / heartbeat / models_sync"]
        DATA["Data Streams<br/>TunnelRequest ↔ TunnelResponse"]
    end

    subgraph Workers ["🖥️ Tokiame Edge Nodes (Behind NAT)"]
        W1["Tokiame Client A"]
        W2["Tokiame Client B"]
        B1["Ollama / vLLM<br/>Local GPU"]
        B2["SGLang / ComfyUI<br/>Local GPU"]
        W1 --> B1
        W2 --> B2
    end

    U1 & U2 -->|"Standard OpenAI HTTP API"| GIN
    PROV <-->|"Multiplexed Tunnel"| Tunnel
    Tunnel <-->|"Outbound-Only Connection"| W1 & W2
```
Tokilake (Gateway/Hub Level): The unified ingress for traffic. It receives standard HTTP API requests from end-users and multiplexes them to the corresponding edge nodes.
Tokiame (Node/Worker Level): The lightweight client on the edge. It maintains a low-latency reverse WebSocket tunnel, multiplexed with the xtaci/smux protocol.
Simplified Workflow
- The Tokiame client initiates a WebSocket connection to Tokilake using a standard user API token.
- Upon successful verification, the gateway automatically creates/binds a virtual Channel (type=100) in the database and assigns it to a specific Private Group.
- When a user sends an LLM HTTP request through the gateway, the gateway treats it like any normal channel, transparently streaming it to the edge node for processing over the smux tunnel.
- The gateway relies on real-time heartbeat keepalives: if an edge node loses its connection, the gateway automatically disables its virtual Channel, achieving zero-downtime failover.
Acknowledgements
Legacy README
Legacy README (historical introduction and compatibility notes)