talker

command module v0.0.0-...-446f4d4
Published: Apr 30, 2026 · License: Apache-2.0 · Imports: 6 · Imported by: 0

README

talker


A fast, OpenAI-compatible Chat Completion API wrapping local LLM inference using hugot.


💡 Goal of this project

talker provides a lightweight, entirely local backend that mimics the OpenAI Chat Completion API (POST /v1/chat/completions) and Embeddings API (POST /v1/embeddings). It enables you to point your existing OpenAI-compatible AI applications directly to a local, privacy-preserving server running ONNX-based language models without needing complex Python setups.


🛠️ Installation

To set up talker, clone the repository and fetch its dependencies:

git clone https://github.com/siherrmann/talker.git
cd talker
go mod tidy

The server requires:

  • Go 1.25+
  • ONNX-formatted language or embedding models (which can be downloaded automatically!)

🚀 Getting Started

Basic Usage

The simplest way to start the API for testing is by using the built-in mock engine. If no model parameters are specified, the server will default to the mock engine, allowing you to test endpoints immediately.

go run main.go

To run with real models and have them download automatically if they are missing:

MODEL_FOLDER=./models CHAT_MODEL=HuggingFaceTB/SmolLM-135M-Instruct EMBEDDING_MODEL=BAAI/bge-small-en-v1.5 PORT=8080 go run main.go

Environment Variables

The API behavior can be configured via environment variables:

MODEL_FOLDER=./models                         # Required for auto-download: the base directory for storing models.
CHAT_MODEL=HuggingFaceTB/SmolLM-135M-Instruct # Optional: the Hugging Face repo name for the text generation model.
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5        # Optional: the Hugging Face repo name for the embeddings model.
PORT=8080                                     # Optional: the port for the Echo server (default: 8080).

If neither CHAT_MODEL nor EMBEDDING_MODEL is provided, the mock engine is used.
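The fallback described above can be sketched as follows; the helper name `selectEngine` is illustrative and not part of talker's public API:

```go
package main

import (
	"fmt"
	"os"
)

// selectEngine mirrors the documented behavior: the mock engine is used
// unless CHAT_MODEL or EMBEDDING_MODEL is set.
func selectEngine(chatModel, embeddingModel string) string {
	if chatModel == "" && embeddingModel == "" {
		return "mock"
	}
	return "hugot"
}

func main() {
	engine := selectEngine(os.Getenv("CHAT_MODEL"), os.Getenv("EMBEDDING_MODEL"))
	fmt.Println("engine:", engine)
}
```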


⭐ Features

Local LLM Inference
  • hugot Integration: Native Go inference using the high-performance hugot library (which wraps ONNX runtime).
  • Automatic Downloading: Automatically downloads the requested models from Hugging Face directly into your MODEL_FOLDER on startup.
OpenAI Compatibility
  • Standard Endpoints: Strict implementation of both POST /v1/chat/completions and POST /v1/embeddings.
  • Request/Response Models: Fully conforms to the standard OpenAI request and response schemas.
  • SSE Streaming: Fully supports Server-Sent Events for real-time streaming when stream: true is passed.
  • Strict JSON Enforcement: Supports response_format: {"type": "json_object"} with automatic struct validation via github.com/siherrmann/validator. If the LLM generates invalid JSON, the engine automatically retries up to 3 times, passing the validation errors back to the model as a prompt.
Robust Architecture
  • Echo v5 Framework: Built on top of Echo for rapid and robust HTTP routing.
  • Test-Driven: Designed with a highly mockable architecture.
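To use the strict JSON enforcement described above, a client sends `response_format: {"type": "json_object"}` in the request body. A minimal Go sketch of building such a request payload (the struct names here are illustrative, not talker's internal model types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal request structs following the OpenAI chat schema.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ResponseFormat struct {
	Type string `json:"type"`
}

type ChatRequest struct {
	Model          string          `json:"model"`
	Messages       []Message       `json:"messages"`
	ResponseFormat *ResponseFormat `json:"response_format,omitempty"`
}

func buildJSONModeRequest() ([]byte, error) {
	req := ChatRequest{
		Model: "local-model",
		Messages: []Message{
			{Role: "user", Content: "List three colors as JSON."},
		},
		// Ask the server to enforce a JSON object response.
		ResponseFormat: &ResponseFormat{Type: "json_object"},
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildJSONModeRequest()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```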

🖥️ API Interface

API Endpoints
  • POST /v1/chat/completions - Generates chat completions.
  • POST /v1/embeddings - Generates vector embeddings for a given input.

Example request (Non-streaming Chat):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
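When `stream: true` is passed, the response arrives as Server-Sent Events. A minimal Go sketch of consuming such a stream, assuming the conventional OpenAI-style `data: {...}` framing with a `data: [DONE]` sentinel (the sample payloads are illustrative):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSSE collects the payload of each "data:" line from an SSE stream,
// stopping at the "[DONE]" sentinel.
func parseSSE(stream string) []string {
	var events []string
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank separator lines
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break
		}
		events = append(events, payload)
	}
	return events
}

func main() {
	sample := "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}\n\n" +
		"data: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}\n\n" +
		"data: [DONE]\n"
	for _, e := range parseSSE(sample) {
		fmt.Println(e)
	}
}
```

In a real client, the same scanner loop would read from the HTTP response body instead of a string.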

Example request (Embeddings):

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-embedding-model",
    "input": ["First sentence", "Second sentence"]
  }'
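The embeddings endpoint returns vectors in the OpenAI response shape (`object`, `data`, `embedding`, `index`). A sketch of parsing such a response in Go; the struct names are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Structs following the OpenAI embeddings response schema.
type Embedding struct {
	Object    string    `json:"object"`
	Embedding []float64 `json:"embedding"`
	Index     int       `json:"index"`
}

type EmbeddingsResponse struct {
	Object string      `json:"object"`
	Data   []Embedding `json:"data"`
	Model  string      `json:"model"`
}

func parseEmbeddings(body []byte) (EmbeddingsResponse, error) {
	var resp EmbeddingsResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	body := []byte(`{"object":"list","model":"local-embedding-model",
		"data":[{"object":"embedding","index":0,"embedding":[0.1,0.2,0.3]}]}`)
	resp, err := parseEmbeddings(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(resp.Data), len(resp.Data[0].Embedding))
}
```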

๐Ÿ—๏ธ Architecture

talker is built with:

  • Echo v5 - Fast HTTP framework for Go
  • hugot - Golang wrapper around ONNX Runtime for local inference pipelines

The application follows a clean architecture with:

  • Handlers (handler/): Contains ChatHandler and EmbeddingsHandler for the HTTP lifecycle.
  • Core Engine (core/): Abstracts underlying hugot pipeline calls (HugotEngine). It seamlessly supports TextGenerationPipeline and FeatureExtractionPipeline concurrently.
  • Models (model/): Native Go structs matching the exact schema required by client libraries expecting an OpenAI backend. Includes custom unmarshaling logic for robust handling of dynamic OpenAI fields (e.g., embeddings input as string vs array).

🔧 Development

Prerequisites
  • Go 1.25+
Development Commands

Run the test suite to verify handlers and data parsing logic:

# Run all tests
go test ./...

# Run server with Mock Engine
go run main.go
