# talker

A fast, OpenAI-compatible Chat Completion API wrapping local LLM inference using hugot.
## Goal of this project

talker provides a lightweight, entirely local backend that mimics the OpenAI Chat Completions API (`POST /v1/chat/completions`) and Embeddings API (`POST /v1/embeddings`). It lets you point your existing OpenAI-compatible applications at a local, privacy-preserving server running ONNX-based language models, without a complex Python setup.
## Installation

To set up talker, clone the repository and fetch its dependencies:

```bash
git clone https://github.com/siherrmann/talker.git
cd talker
go mod tidy
```

The server requires:

- Go 1.25+
- ONNX-formatted language or embedding models (which can be downloaded automatically)
## Getting Started

### Basic Usage

The simplest way to start the API for testing is with the built-in mock engine. If no model parameters are specified, the server defaults to the mock engine, allowing you to test endpoints immediately:

```bash
go run main.go
```

To run with real models and have them downloaded automatically if they are missing:

```bash
MODEL_FOLDER=./models CHAT_MODEL=HuggingFaceTB/SmolLM-135M-Instruct EMBEDDING_MODEL=BAAI/bge-small-en-v1.5 PORT=8080 go run main.go
```
### Environment Variables

The API behavior can be configured via environment variables:

```bash
MODEL_FOLDER=./models                          # Required for auto-download: the base directory in which to store models.
CHAT_MODEL=HuggingFaceTB/SmolLM-135M-Instruct  # Optional: the Hugging Face repo name for the text generation model.
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5         # Optional: the Hugging Face repo name for the embeddings model.
PORT=8080                                      # Optional: sets the port for the Echo server (default is 8080).
```

If neither `CHAT_MODEL` nor `EMBEDDING_MODEL` is provided, the mock engine is used.
## Features

### Local LLM Inference

- hugot Integration: Native Go inference using the high-performance hugot library (which wraps ONNX Runtime).
- Automatic Downloading: Automatically downloads the requested models from Hugging Face directly into your `MODEL_FOLDER` on startup.

### OpenAI Compatibility

- Standard Endpoints: Strict implementation of both `POST /v1/chat/completions` and `POST /v1/embeddings`.
- Request/Response Models: Fully conforms to the standard OpenAI request and response schemas.
- SSE Streaming: Fully supports Server-Sent Events for real-time streaming when `stream: true` is passed.
- Strict JSON Enforcement: Supports `response_format: {"type": "json_object"}` with automatic struct validation via github.com/siherrmann/validator. If the LLM generates invalid JSON, the engine automatically retries up to 3 times, passing the validation errors back to the model as a prompt.
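The retry-on-invalid-JSON loop can be sketched as follows. This is a minimal illustration, not talker's actual engine code: the function and type names here are hypothetical, and validation is reduced to a plain `json.Unmarshal` (talker additionally validates against a struct via github.com/siherrmann/validator).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateFunc stands in for a call to the underlying LLM pipeline.
type generateFunc func(prompt string) string

// generateJSON asks the model for JSON and retries up to maxRetries times,
// feeding each validation error back into the prompt so the model can correct itself.
func generateJSON(gen generateFunc, prompt string, maxRetries int) (map[string]any, error) {
	var lastErr error
	for i := 0; i <= maxRetries; i++ {
		out := gen(prompt)
		var parsed map[string]any
		if err := json.Unmarshal([]byte(out), &parsed); err != nil {
			lastErr = err
			// Pass the error back to the model as part of the next prompt.
			prompt = fmt.Sprintf("%s\nYour previous output was invalid JSON (%v). Respond with valid JSON only.", prompt, err)
			continue
		}
		return parsed, nil
	}
	return nil, fmt.Errorf("no valid JSON after %d retries: %w", maxRetries, lastErr)
}

func main() {
	calls := 0
	// A fake model that fails once, then returns valid JSON.
	gen := generateFunc(func(prompt string) string {
		calls++
		if calls == 1 {
			return "not json"
		}
		return `{"answer": 42}`
	})
	parsed, err := generateJSON(gen, "Answer as JSON.", 3)
	fmt.Println(parsed["answer"], err, calls) // 42 <nil> 2
}
```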
### Robust Architecture

- Echo v5 Framework: Built on top of Echo for rapid and robust HTTP routing.
- Test-Driven: Designed with a highly mockable architecture.
## API Interface

### API Endpoints

- `POST /v1/chat/completions` - Generates chat completions.
- `POST /v1/embeddings` - Generates vector embeddings for a given input.
Example request (non-streaming chat):

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
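Because the endpoint follows the standard OpenAI response schema, the reply is shaped like the following (values here are illustrative):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "local-model",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
}
```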
Example request (embeddings):

```bash
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-embedding-model",
    "input": ["First sentence", "Second sentence"]
  }'
```
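Because the endpoints follow the OpenAI schema, any OpenAI-compatible client library works; the request can also be built with the Go standard library alone. A minimal sketch (the struct field names follow the standard OpenAI chat schema; `chat` is a hypothetical helper, not part of talker):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal request/response types matching the OpenAI chat schema.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

// chat sends a completion request to a talker server at baseURL
// and returns the first choice's message content.
func chat(baseURL string, req chatRequest) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/v1/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("no choices in response")
	}
	return out.Choices[0].Message.Content, nil
}

func main() {
	answer, err := chat("http://localhost:8080", chatRequest{
		Model: "local-model",
		Messages: []chatMessage{
			{Role: "system", Content: "You are a helpful assistant."},
			{Role: "user", Content: "Hello!"},
		},
	})
	fmt.Println(answer, err)
}
```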
## Architecture

talker is built with:

- Echo v5 - fast HTTP framework for Go
- hugot - Go wrapper around ONNX Runtime for local inference pipelines

The application follows a clean architecture with:
- Handlers (`handler/`): Contains `ChatHandler` and `EmbeddingsHandler` for the HTTP lifecycle.
- Core Engine (`core/`): Abstracts the underlying hugot pipeline calls (`HugotEngine`). It seamlessly supports `TextGenerationPipeline` and `FeatureExtractionPipeline` concurrently.
- Models (`model/`): Native Go structs matching the exact schema required by client libraries expecting an OpenAI backend. Includes custom unmarshaling logic for robust handling of dynamic OpenAI fields (e.g., embeddings input as string vs. array).
## Development

### Development Commands

Run the test suite to verify handlers and data parsing logic:

```bash
# Run all tests
go test ./...

# Run the server with the mock engine
go run main.go
```