Documentation ¶
Overview ¶
Package local provides a client for local LLM models via a Python sidecar process.
The local provider communicates with a Python sidecar via JSON-RPC 2.0 over stdio. This enables llmkit to work with local model backends like Ollama, llama.cpp, vLLM, and HuggingFace transformers without requiring direct Go bindings.
Architecture ¶
The local client manages a long-running Python sidecar process:
Go Client <--JSON-RPC/stdio--> Python Sidecar <--Backend API--> Local Model
The sidecar process is started lazily on the first request and is kept running for subsequent requests. The sidecar handles all communication with the local model backend and can optionally connect to MCP servers.
Supported Backends ¶
- ollama: Ollama API server (default host: localhost:11434)
- llama.cpp: llama.cpp server (default host: localhost:8000)
- vllm: vLLM server (default host: localhost:8000)
- transformers: HuggingFace transformers (runs in-process in sidecar)
JSON-RPC Protocol ¶
Communication uses JSON-RPC 2.0 over stdio with newline-delimited messages:
Request (client -> sidecar):
{"jsonrpc": "2.0", "method": "complete", "params": {...}, "id": 1}
Response (sidecar -> client):
{"jsonrpc": "2.0", "result": {"content": "...", "usage": {...}}, "id": 1}
Streaming uses notifications (no ID, no response expected):
{"jsonrpc": "2.0", "method": "stream.chunk", "params": {"content": "...", "done": false}}
{"jsonrpc": "2.0", "method": "stream.done", "params": {"usage": {...}}}
Usage ¶
Using the provider registry:
import _ "github.com/randalmurphal/llmkit/local"
client, err := provider.New("local", provider.Config{
Model: "llama3.2:latest",
Options: map[string]any{
"backend": "ollama",
"sidecar_path": "/path/to/sidecar.py",
},
})
if err != nil {
log.Fatal(err)
}
defer client.Close()
Direct instantiation:
client := local.NewClient(
local.WithBackend(local.BackendOllama),
local.WithSidecarPath("/path/to/sidecar.py"),
local.WithModel("llama3.2:latest"),
)
defer client.Close()
resp, err := client.Complete(ctx, provider.Request{
Messages: []provider.Message{
{Role: provider.RoleUser, Content: "Hello!"},
},
})
Capabilities ¶
The local provider has the following capabilities:
- Streaming: true (via JSON-RPC notifications)
- Tools: false (local models don't have native tool support)
- MCP: true (the sidecar can connect to MCP servers)
- Sessions: false (no persistent sessions)
- Images: false (multimodal not currently supported)
- NativeTools: none
Sidecar Implementation ¶
The Python sidecar script must implement the following RPC methods:
- init: Initialize the backend connection
- complete: Perform a completion request
- shutdown: Clean shutdown
For streaming, the sidecar should send stream.chunk and stream.done notifications. See the protocol.go file for detailed message formats.
Index ¶
- Constants
- func IsNotification(data []byte) bool
- type Backend
- type Client
- func (c *Client) Capabilities() provider.Capabilities
- func (c *Client) Close() error
- func (c *Client) Complete(ctx context.Context, req provider.Request) (*provider.Response, error)
- func (c *Client) Provider() string
- func (c *Client) Stream(ctx context.Context, req provider.Request) (<-chan provider.StreamChunk, error)
- type CompleteParams
- type CompleteResult
- type Config
- type InitParams
- type InitResult
- type MCPServerConfig
- type MessageParam
- type Notification
- type Option
- func WithBackend(backend Backend) Option
- func WithEnv(env map[string]string) Option
- func WithHost(host string) Option
- func WithMCPServers(servers map[string]MCPServerConfig) Option
- func WithModel(model string) Option
- func WithPythonPath(path string) Option
- func WithRequestTimeout(d time.Duration) Option
- func WithSidecarPath(path string) Option
- func WithStartupTimeout(d time.Duration) Option
- func WithWorkDir(dir string) Option
- type Protocol
- type RPCError
- type Request
- type Response
- type ShutdownResult
- type Sidecar
- type StreamChunkParams
- type StreamDoneParams
- type UsageResult
Constants ¶
const (
	CodeParseError     = -32700
	CodeInvalidRequest = -32600
	CodeMethodNotFound = -32601
	CodeInvalidParams  = -32602
	CodeInternalError  = -32603
)
Standard JSON-RPC 2.0 error codes.
const (
	CodeBackendError    = -32000 // Backend API error
	CodeModelNotFound   = -32001 // Model not found/loaded
	CodeStreamError     = -32002 // Streaming error
	CodeConnectionError = -32003 // Backend connection failed
)
Application-specific error codes (range -32000 to -32099).
Variables ¶
This section is empty.
Functions ¶
func IsNotification ¶
func IsNotification(data []byte) bool
IsNotification reports whether a message is a notification (has no ID).
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client implements provider.Client for local LLM models via a Python sidecar.
func NewClient ¶
NewClient creates a new local model client. The sidecar process is not started until the first request.
func NewClientWithConfig ¶
NewClientWithConfig creates a new local model client from a Config.
func (*Client) Capabilities ¶
func (c *Client) Capabilities() provider.Capabilities
Capabilities implements provider.Client.
type CompleteParams ¶
type CompleteParams struct {
Messages []MessageParam `json:"messages"`
Model string `json:"model,omitempty"`
SystemPrompt string `json:"system_prompt,omitempty"`
MaxTokens int `json:"max_tokens,omitempty"`
Temperature float64 `json:"temperature,omitempty"`
Stream bool `json:"stream,omitempty"`
Options map[string]any `json:"options,omitempty"`
}
CompleteParams are the parameters for the "complete" RPC method.
type CompleteResult ¶
type CompleteResult struct {
Content string `json:"content"`
Model string `json:"model,omitempty"`
FinishReason string `json:"finish_reason,omitempty"`
Usage UsageResult `json:"usage"`
}
CompleteResult is the result of a "complete" RPC call.
type Config ¶
type Config struct {
// Backend specifies which local model backend to use.
// Required. One of: "ollama", "llama.cpp", "vllm", "transformers"
Backend Backend `json:"backend" yaml:"backend"`
// SidecarPath is the path to the Python sidecar script.
// Required.
SidecarPath string `json:"sidecar_path" yaml:"sidecar_path"`
// Model is the local model name to use.
// Format depends on backend (e.g., "llama3.2:latest" for Ollama).
Model string `json:"model" yaml:"model"`
// Host is the API server address for applicable backends.
// Default: "localhost:11434" for Ollama, "localhost:8000" for vLLM.
Host string `json:"host" yaml:"host"`
// PythonPath is the path to the Python interpreter.
// Default: "python3"
PythonPath string `json:"python_path" yaml:"python_path"`
// StartupTimeout is how long to wait for sidecar to become ready.
// Default: 30 seconds.
StartupTimeout time.Duration `json:"startup_timeout" yaml:"startup_timeout"`
// RequestTimeout is the default timeout for completion requests.
// Default: 5 minutes.
RequestTimeout time.Duration `json:"request_timeout" yaml:"request_timeout"`
// WorkDir is the working directory for the sidecar process.
WorkDir string `json:"work_dir" yaml:"work_dir"`
// Env provides additional environment variables for the sidecar.
Env map[string]string `json:"env" yaml:"env"`
// MCPServers configures MCP servers to pass through to the sidecar.
// The sidecar is responsible for connecting to MCP servers.
MCPServers map[string]MCPServerConfig `json:"mcp_servers" yaml:"mcp_servers"`
}
Config holds local provider configuration.
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns a Config with sensible defaults.
func (Config) WithDefaults ¶
WithDefaults returns a copy of the config with defaults applied for unset fields.
type InitParams ¶
type InitParams struct {
Backend string `json:"backend"`
Model string `json:"model"`
Host string `json:"host,omitempty"`
MCPServers map[string]MCPServerConfig `json:"mcp_servers,omitempty"`
Options map[string]any `json:"options,omitempty"`
}
InitParams are the parameters for the "init" RPC method.
type InitResult ¶
type InitResult struct {
Ready bool `json:"ready"`
Version string `json:"version,omitempty"`
Message string `json:"message,omitempty"`
}
InitResult is the result of an "init" RPC call.
type MCPServerConfig ¶
type MCPServerConfig struct {
Type string `json:"type"` // "stdio", "http", "sse"
Command string `json:"command,omitempty"` // For stdio transport
Args []string `json:"args,omitempty"`
Env map[string]string `json:"env,omitempty"`
URL string `json:"url,omitempty"` // For http/sse transport
Headers []string `json:"headers,omitempty"`
}
MCPServerConfig defines an MCP server to enable.
type MessageParam ¶
type MessageParam struct {
Role string `json:"role"`
Content string `json:"content"`
Name string `json:"name,omitempty"` // For tool results
}
MessageParam is a message in the conversation.
type Notification ¶
type Notification struct {
JSONRPC string `json:"jsonrpc"`
Method string `json:"method"`
Params any `json:"params,omitempty"`
}
Notification is a JSON-RPC 2.0 notification (no ID, no response expected).
func ParseNotification ¶
func ParseNotification(data []byte) (*Notification, error)
ParseNotification attempts to parse a message as a notification. Returns nil, nil if the message is not a notification.
type Option ¶
type Option func(*Client)
Option configures a local Client.
func WithBackend ¶
func WithBackend(backend Backend) Option
WithBackend sets the local model backend.
func WithEnv ¶
func WithEnv(env map[string]string) Option
WithEnv sets additional environment variables for the sidecar process.
func WithHost ¶
func WithHost(host string) Option
WithHost sets the backend API server address.
func WithModel ¶
func WithModel(model string) Option
WithModel sets the model name to use.
func WithMCPServers ¶
func WithMCPServers(servers map[string]MCPServerConfig) Option
WithMCPServers configures MCP servers for the sidecar.
func WithPythonPath ¶
func WithPythonPath(path string) Option
WithPythonPath sets the Python interpreter path.
func WithRequestTimeout ¶
func WithRequestTimeout(d time.Duration) Option
WithRequestTimeout sets the default request timeout.
func WithSidecarPath ¶
func WithSidecarPath(path string) Option
WithSidecarPath sets the path to the sidecar script.
func WithStartupTimeout ¶
func WithStartupTimeout(d time.Duration) Option
WithStartupTimeout sets the sidecar startup timeout.
func WithWorkDir ¶
func WithWorkDir(dir string) Option
WithWorkDir sets the working directory for the sidecar.
type Protocol ¶
type Protocol struct {
// contains filtered or unexported fields
}
Protocol handles JSON-RPC encoding/decoding over stdio.
func NewProtocol ¶
NewProtocol creates a new JSON-RPC protocol handler.
func (*Protocol) Call ¶
Call sends a request and waits for a response. The result is unmarshaled into the provided value. This method is safe for concurrent use, but callers waiting for responses will be serialized.
func (*Protocol) ReadMessage ¶
ReadMessage reads a single message (response or notification). Returns the raw JSON for further processing. This method is safe for concurrent use.
type RPCError ¶
type RPCError struct {
Code int `json:"code"`
Message string `json:"message"`
Data json.RawMessage `json:"data,omitempty"`
}
RPCError is a JSON-RPC 2.0 error object.
type Request ¶
type Request struct {
JSONRPC string `json:"jsonrpc"`
Method string `json:"method"`
Params any `json:"params,omitempty"`
ID int64 `json:"id"`
}
Request is a JSON-RPC 2.0 request.
type Response ¶
type Response struct {
JSONRPC string `json:"jsonrpc"`
Result json.RawMessage `json:"result,omitempty"`
Error *RPCError `json:"error,omitempty"`
ID int64 `json:"id"`
}
Response is a JSON-RPC 2.0 response.
type ShutdownResult ¶
type ShutdownResult struct {
Success bool `json:"success"`
Message string `json:"message,omitempty"`
}
ShutdownResult is the result of a "shutdown" RPC call.
type Sidecar ¶
type Sidecar struct {
// contains filtered or unexported fields
}
Sidecar manages the Python sidecar process lifecycle.
func (*Sidecar) Done ¶
func (s *Sidecar) Done() <-chan struct{}
Done returns a channel that's closed when the sidecar process exits.
func (*Sidecar) Protocol ¶
Protocol returns the JSON-RPC protocol handler. Returns nil if the sidecar is not running.
type StreamChunkParams ¶
StreamChunkParams are the parameters for stream.chunk notifications.
func ParseStreamChunk ¶
func ParseStreamChunk(data json.RawMessage) (*StreamChunkParams, error)
ParseStreamChunk parses a stream.chunk notification payload.
type StreamDoneParams ¶
type StreamDoneParams struct {
Usage UsageResult `json:"usage"`
FinishReason string `json:"finish_reason,omitempty"`
Model string `json:"model,omitempty"`
}
StreamDoneParams are the parameters for stream.done notifications.
func ParseStreamDone ¶
func ParseStreamDone(data json.RawMessage) (*StreamDoneParams, error)
ParseStreamDone parses a stream.done notification payload.
type UsageResult ¶
type UsageResult struct {
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
TotalTokens int `json:"total_tokens"`
}
UsageResult tracks token usage.