Tokilake & Tokiame
Control your own GPUs like OpenRouter.
Tokilake is a decentralized Large Language Model (LLM) API scheduling gateway built on the One-API ecosystem. It inverts the traditional API gateway model: instead of the gateway acting as a client that reaches out to servers with public IPs, any GPU worker node (Tokiame) sitting behind NAT or an intranet connects out to the central gateway (Hub) through a reverse WebSocket tunnel.
Tokilake is built on top of MartialBE/one-hub and the broader One-API ecosystem that evolved around it.
📖 Quick Start
You can visit the Tokilake Demo to explore the core features. For full control over data privacy and distribution, we recommend self-hosting by following our End-to-End Deployment Guide. Whether you're a first-time user or ready for a full deployment, the guides below are here to help.
🌟 Core Concept
Traditional API proxies typically act as clients, routing requests to servers with public IP addresses. If your high-performance GPUs (like an RTX 4090) are sitting quietly on a local home network, or scattered across temporary Spot Instances from various cloud providers, unifying them into a stable, accessible API is a major challenge.
Tokiame changes the game. Operating as a lightweight daemon, it actively "dials out" to the cloud-based Tokilake gateway. Once connected, Tokilake seamlessly maps the worker to a standard internal Channel. This means you don't need separate NAT traversal tools (such as FRP or Ngrok), and you get the gateway's enterprise-grade load balancing, high-concurrency traffic shaping, authentication, and billing systems right out of the box.
🚀 Perfect Use Cases
1. Distributed GPU Pooling for Individuals & Studios (NAT Penetration)
Tailor-made for home broadband or campus networks without public IPs. Just run the Tokiame process locally and it establishes a tunnel to the cloud gateway; the LLMs you deploy locally with Ollama or vLLM can then instantly and securely serve a standard OpenAI-compatible API to the outside world.
2. Hybrid Cloud Orchestration
Purchased scattered GPU instances across different compute platforms (e.g., AWS, AliCloud, AutoDL, RunPod)? Skip the complex SD-WAN setups. Simply attach the Tokiame startup script to your new instances, and they automatically register into the load-balancing pool. When instances are destroyed or shut down, the heartbeat mechanism safely takes the node offline, drastically reducing DevOps overhead.
3. Enterprise Data Privacy & "Bring Your Own Model" (BYOM)
SaaS providers handle the business logic frontend, while clients provide the compute backend. Clients only need to deploy Tokiame within their highly secure private server rooms, initiating a one-way outbound connection to the SaaS gateway. The client's server room exposes absolutely zero inbound ports, yet perfectly completes the business scheduling of private models, satisfying the most stringent security audit requirements.
This scenario is built around native Private Group and Invite Code mechanisms. User A hooks up their compute node and generates an invite code; User B redeems the code, gains access to User A's private multi-tenant environment, and invokes the compute power. The gateway handles all centralized billing and authentication, making it effortless to build your very own "OpenRouter."
🛠 Architecture Design
```mermaid
graph TB
    subgraph Users ["🌐 API Consumers"]
        U1["Apps / SDKs"]
        U2["curl / ChatUI"]
    end

    subgraph Gateway ["☁️ Tokilake Gateway (Hub)"]
        GIN["Gin HTTP Server"]
        RELAY["Relay Router"]
        PROV["Tokiame Provider"]
        SM["Session Manager"]
        DB[("DB / Channel Table")]
        GIN --> RELAY --> PROV
        PROV -->|"Lookup Session"| SM
        SM -->|"R/W Virtual Channel"| DB
    end

    subgraph Tunnel ["🔒 Multiplexed Reverse Tunnel"]
        direction LR
        CTRL["Control Stream<br/>register / heartbeat / models_sync"]
        DATA["Data Streams<br/>TunnelRequest ↔ TunnelResponse"]
    end

    subgraph Workers ["🖥️ Tokiame Edge Nodes (Behind NAT)"]
        W1["Tokiame Client A"]
        W2["Tokiame Client B"]
        B1["Ollama / vLLM<br/>Local GPU"]
        B2["SGLang / ComfyUI<br/>Local GPU"]
        W1 --> B1
        W2 --> B2
    end

    U1 & U2 -->|"Standard OpenAI HTTP API"| GIN
    PROV <-->|"Multiplexed Tunnel"| Tunnel
    Tunnel <-->|"Outbound-Only Connection"| W1 & W2
```
Tokilake (Gateway/Hub Level): The unified ingress for traffic. It receives standard HTTP API requests from end-users and multiplexes them to the corresponding edge nodes.
Tokiame (Node/Worker Level): The lightweight client on the edge. It maintains a low-latency reverse WebSocket tunnel, multiplexed with the xtaci/smux protocol.
Simplified Workflow
- The Tokiame client initiates a WebSocket connection to Tokilake using a standard user API token.
- Upon successful verification, the gateway automatically creates/binds a virtual Channel (type=100) in the database and assigns it to a specific Private Group.
- When a user sends an LLM HTTP request through the gateway, the gateway treats it like any normal channel, transparently streaming it to the edge node for processing over the smux tunnel.
- The gateway relies on real-time heartbeat keepalives: if an edge node loses its connection, the gateway automatically disables its virtual Channel, achieving zero-downtime failover.
Acknowledgements
Legacy README
Legacy README (historical introduction and compatibility notes)