package local

v1.4.7
Published: Feb 6, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package local provides a client for local LLM models via a Python sidecar process.

The local provider communicates with a Python sidecar via JSON-RPC 2.0 over stdio. This enables llmkit to work with local model backends like Ollama, llama.cpp, vLLM, and HuggingFace transformers without requiring direct Go bindings.

Architecture

The local client manages a long-running Python sidecar process:

Go Client <--JSON-RPC/stdio--> Python Sidecar <--Backend API--> Local Model

The sidecar process is started lazily on the first request and is kept running for subsequent requests. The sidecar handles all communication with the local model backend and can optionally connect to MCP servers.

Supported Backends

  • ollama: Ollama API server (default host: localhost:11434)
  • llama.cpp: llama.cpp server (default host: localhost:8000)
  • vllm: vLLM server (default host: localhost:8000)
  • transformers: HuggingFace transformers (runs in-process in sidecar)

JSON-RPC Protocol

Communication uses JSON-RPC 2.0 over stdio with newline-delimited messages:

Request (client -> sidecar):

{"jsonrpc": "2.0", "method": "complete", "params": {...}, "id": 1}

Response (sidecar -> client):

{"jsonrpc": "2.0", "result": {"content": "...", "usage": {...}}, "id": 1}

Streaming uses notifications (no ID, no response expected):

{"jsonrpc": "2.0", "method": "stream.chunk", "params": {"content": "...", "done": false}}
{"jsonrpc": "2.0", "method": "stream.done", "params": {"usage": {...}}}
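As a concrete illustration of the framing, the sketch below encodes a request with the newline delimiter. It uses a local mirror of the documented Request wire type rather than the package's own encoder:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest mirrors the documented Request wire type (illustration only).
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
	ID      int64  `json:"id"`
}

// encodeFrame marshals a request and appends the newline delimiter
// used for message framing over stdio.
func encodeFrame(req rpcRequest) ([]byte, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	frame, err := encodeFrame(rpcRequest{JSONRPC: "2.0", Method: "complete", ID: 1})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(frame))
	// prints {"jsonrpc":"2.0","method":"complete","id":1}
}
```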

Usage

Using the provider registry:

import _ "github.com/randalmurphal/llmkit/local"

client, err := provider.New("local", provider.Config{
    Model: "llama3.2:latest",
    Options: map[string]any{
        "backend":      "ollama",
        "sidecar_path": "/path/to/sidecar.py",
    },
})
if err != nil {
    log.Fatal(err)
}
defer client.Close()

Direct instantiation:

client := local.NewClient(
    local.WithBackend(local.BackendOllama),
    local.WithSidecarPath("/path/to/sidecar.py"),
    local.WithModel("llama3.2:latest"),
)
defer client.Close()

resp, err := client.Complete(ctx, provider.Request{
    Messages: []provider.Message{
        {Role: provider.RoleUser, Content: "Hello!"},
    },
})

Capabilities

The local provider has the following capabilities:

  • Streaming: true (via JSON-RPC notifications)
  • Tools: false (most local models lack native tool support)
  • MCP: true (the sidecar can connect to MCP servers)
  • Sessions: false (no persistent sessions)
  • Images: false (multimodal not currently supported)
  • NativeTools: none

Sidecar Implementation

The Python sidecar script must implement the following RPC methods:

  • init: Initialize the backend connection
  • complete: Perform a completion request
  • shutdown: Clean shutdown

For streaming, the sidecar should send stream.chunk and stream.done notifications. See the protocol.go file for detailed message formats.

Index

Constants

const (
	CodeParseError     = -32700
	CodeInvalidRequest = -32600
	CodeMethodNotFound = -32601
	CodeInvalidParams  = -32602
	CodeInternalError  = -32603
)

Standard JSON-RPC 2.0 error codes.

const (
	CodeBackendError    = -32000 // Backend API error
	CodeModelNotFound   = -32001 // Model not found/loaded
	CodeStreamError     = -32002 // Streaming error
	CodeConnectionError = -32003 // Backend connection failed
)

Application-specific error codes (range -32000 to -32099).

Variables

This section is empty.

Functions

func IsNotification

func IsNotification(data []byte) bool

IsNotification checks if a message is a notification (no ID).
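For illustration, the ID-presence check that IsNotification describes can be reproduced with a small local helper (an assumption about its mechanics; the package's actual implementation may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasID reports whether a raw JSON-RPC message carries an "id" member,
// the distinction that separates responses from notifications.
// This helper is a local illustration, not the package function.
func hasID(data []byte) bool {
	var probe map[string]json.RawMessage
	if err := json.Unmarshal(data, &probe); err != nil {
		return false
	}
	_, ok := probe["id"]
	return ok
}

func main() {
	fmt.Println(!hasID([]byte(`{"jsonrpc":"2.0","method":"stream.done","params":{}}`))) // prints true: a notification
	fmt.Println(!hasID([]byte(`{"jsonrpc":"2.0","result":{},"id":1}`)))                 // prints false: a response
}
```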

Types

type Backend

type Backend string

Backend identifies the local model backend.

const (
	BackendOllama       Backend = "ollama"
	BackendLlamaCpp     Backend = "llama.cpp"
	BackendVLLM         Backend = "vllm"
	BackendTransformers Backend = "transformers"
)

Supported backends.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client implements provider.Client for local LLM models via a Python sidecar.

func NewClient

func NewClient(opts ...Option) *Client

NewClient creates a new local model client. The sidecar process is not started until the first request.

func NewClientWithConfig

func NewClientWithConfig(cfg Config) *Client

NewClientWithConfig creates a new local model client from a Config.

func (*Client) Capabilities

func (c *Client) Capabilities() provider.Capabilities

Capabilities implements provider.Client.

func (*Client) Close

func (c *Client) Close() error

Close implements provider.Client. Stops the sidecar process if running.

func (*Client) Complete

func (c *Client) Complete(ctx context.Context, req provider.Request) (*provider.Response, error)

Complete implements provider.Client. Starts the sidecar if not already running.

func (*Client) Provider

func (c *Client) Provider() string

Provider implements provider.Client.

func (*Client) Stream

func (c *Client) Stream(ctx context.Context, req provider.Request) (<-chan provider.StreamChunk, error)

Stream implements provider.Client. Starts the sidecar if not already running.

type CompleteParams

type CompleteParams struct {
	Messages     []MessageParam `json:"messages"`
	Model        string         `json:"model,omitempty"`
	SystemPrompt string         `json:"system_prompt,omitempty"`
	MaxTokens    int            `json:"max_tokens,omitempty"`
	Temperature  float64        `json:"temperature,omitempty"`
	Stream       bool           `json:"stream,omitempty"`
	Options      map[string]any `json:"options,omitempty"`
}

CompleteParams are the parameters for the "complete" RPC method.

type CompleteResult

type CompleteResult struct {
	Content      string      `json:"content"`
	Model        string      `json:"model,omitempty"`
	FinishReason string      `json:"finish_reason,omitempty"`
	Usage        UsageResult `json:"usage"`
}

CompleteResult is the result of a "complete" RPC call.

type Config

type Config struct {
	// Backend specifies which local model backend to use.
	// Required. One of: "ollama", "llama.cpp", "vllm", "transformers"
	Backend Backend `json:"backend" yaml:"backend"`

	// SidecarPath is the path to the Python sidecar script.
	// Required.
	SidecarPath string `json:"sidecar_path" yaml:"sidecar_path"`

	// Model is the local model name to use.
	// Format depends on backend (e.g., "llama3.2:latest" for Ollama).
	Model string `json:"model" yaml:"model"`

	// Host is the API server address for applicable backends.
	// Default: "localhost:11434" for Ollama, "localhost:8000" for vLLM.
	Host string `json:"host" yaml:"host"`

	// PythonPath is the path to the Python interpreter.
	// Default: "python3"
	PythonPath string `json:"python_path" yaml:"python_path"`

	// StartupTimeout is how long to wait for sidecar to become ready.
	// Default: 30 seconds.
	StartupTimeout time.Duration `json:"startup_timeout" yaml:"startup_timeout"`

	// RequestTimeout is the default timeout for completion requests.
	// Default: 5 minutes.
	RequestTimeout time.Duration `json:"request_timeout" yaml:"request_timeout"`

	// WorkDir is the working directory for the sidecar process.
	WorkDir string `json:"work_dir" yaml:"work_dir"`

	// Env provides additional environment variables for the sidecar.
	Env map[string]string `json:"env" yaml:"env"`

	// MCPServers configures MCP servers to pass through to the sidecar.
	// The sidecar is responsible for connecting to MCP servers.
	MCPServers map[string]MCPServerConfig `json:"mcp_servers" yaml:"mcp_servers"`
}

Config holds local provider configuration.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns a Config with sensible defaults.

func (*Config) Validate

func (c *Config) Validate() error

Validate checks if the configuration is valid.

func (Config) WithDefaults

func (c Config) WithDefaults() Config

WithDefaults returns a copy of the config with defaults applied for unset fields.

type InitParams

type InitParams struct {
	Backend    string                     `json:"backend"`
	Model      string                     `json:"model"`
	Host       string                     `json:"host,omitempty"`
	MCPServers map[string]MCPServerConfig `json:"mcp_servers,omitempty"`
	Options    map[string]any             `json:"options,omitempty"`
}

InitParams are the parameters for the "init" RPC method.

type InitResult

type InitResult struct {
	Ready   bool   `json:"ready"`
	Version string `json:"version,omitempty"`
	Message string `json:"message,omitempty"`
}

InitResult is the result of an "init" RPC call.

type MCPServerConfig

type MCPServerConfig struct {
	Type    string            `json:"type"`              // "stdio", "http", "sse"
	Command string            `json:"command,omitempty"` // For stdio transport
	Args    []string          `json:"args,omitempty"`
	Env     map[string]string `json:"env,omitempty"`
	URL     string            `json:"url,omitempty"` // For http/sse transport
	Headers []string          `json:"headers,omitempty"`
}

MCPServerConfig defines an MCP server to enable.

type MessageParam

type MessageParam struct {
	Role    string `json:"role"`
	Content string `json:"content"`
	Name    string `json:"name,omitempty"` // For tool results
}

MessageParam is a message in the conversation.

type Notification

type Notification struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

Notification is a JSON-RPC 2.0 notification (no ID, no response expected).

func ParseNotification

func ParseNotification(data []byte) (*Notification, error)

ParseNotification attempts to parse a message as a notification. Returns nil, nil if the message is not a notification.

type Option

type Option func(*Client)

Option configures a local Client.

func WithBackend

func WithBackend(backend Backend) Option

WithBackend sets the backend type.

func WithEnv

func WithEnv(env map[string]string) Option

WithEnv adds environment variables for the sidecar process.

func WithHost

func WithHost(host string) Option

WithHost sets the backend API server address.

func WithMCPServers

func WithMCPServers(servers map[string]MCPServerConfig) Option

WithMCPServers configures MCP servers for the sidecar.

func WithModel

func WithModel(model string) Option

WithModel sets the model name.

func WithPythonPath

func WithPythonPath(path string) Option

WithPythonPath sets the Python interpreter path.

func WithRequestTimeout

func WithRequestTimeout(d time.Duration) Option

WithRequestTimeout sets the default request timeout.

func WithSidecarPath

func WithSidecarPath(path string) Option

WithSidecarPath sets the path to the sidecar script.

func WithStartupTimeout

func WithStartupTimeout(d time.Duration) Option

WithStartupTimeout sets the sidecar startup timeout.

func WithWorkDir

func WithWorkDir(dir string) Option

WithWorkDir sets the working directory for the sidecar.

type Protocol

type Protocol struct {
	// contains filtered or unexported fields
}

Protocol handles JSON-RPC encoding/decoding over stdio.

func NewProtocol

func NewProtocol(r io.Reader, w io.Writer) *Protocol

NewProtocol creates a new JSON-RPC protocol handler.

func (*Protocol) Call

func (p *Protocol) Call(method string, params, result any) error

Call sends a request and waits for a response. The result is unmarshaled into the provided value. This method is safe for concurrent use, but callers waiting for responses will be serialized.

func (*Protocol) Notify

func (p *Protocol) Notify(method string, params any) error

Notify sends a notification (no response expected).

func (*Protocol) ReadMessage

func (p *Protocol) ReadMessage() ([]byte, error)

ReadMessage reads a single message (response or notification). Returns the raw JSON for further processing. This method is safe for concurrent use.

type RPCError

type RPCError struct {
	Code    int             `json:"code"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data,omitempty"`
}

RPCError is a JSON-RPC 2.0 error object.

func (*RPCError) Error

func (e *RPCError) Error() string

Error implements the error interface.

type Request

type Request struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
	ID      int64  `json:"id"`
}

Request is a JSON-RPC 2.0 request.

type Response

type Response struct {
	JSONRPC string          `json:"jsonrpc"`
	Result  json.RawMessage `json:"result,omitempty"`
	Error   *RPCError       `json:"error,omitempty"`
	ID      int64           `json:"id"`
}

Response is a JSON-RPC 2.0 response.

type ShutdownResult

type ShutdownResult struct {
	Success bool   `json:"success"`
	Message string `json:"message,omitempty"`
}

ShutdownResult is the result of a "shutdown" RPC call.

type Sidecar

type Sidecar struct {
	// contains filtered or unexported fields
}

Sidecar manages the Python sidecar process lifecycle.

func NewSidecar

func NewSidecar(cfg Config) *Sidecar

NewSidecar creates a new sidecar manager.

func (*Sidecar) Done

func (s *Sidecar) Done() <-chan struct{}

Done returns a channel that's closed when the sidecar process exits.

func (*Sidecar) ExitError

func (s *Sidecar) ExitError() error

ExitError returns the error from the sidecar process exit, if any.

func (*Sidecar) IsRunning

func (s *Sidecar) IsRunning() bool

IsRunning returns true if the sidecar is running.

func (*Sidecar) Protocol

func (s *Sidecar) Protocol() *Protocol

Protocol returns the JSON-RPC protocol handler. Returns nil if the sidecar is not running.

func (*Sidecar) Restart

func (s *Sidecar) Restart(ctx context.Context) error

Restart stops and starts the sidecar.

func (*Sidecar) Start

func (s *Sidecar) Start(ctx context.Context) error

Start launches the sidecar process and waits for it to become ready. Returns an error if the process fails to start or doesn't become ready within the configured timeout.

func (*Sidecar) Stop

func (s *Sidecar) Stop() error

Stop gracefully shuts down the sidecar process.

type StreamChunkParams

type StreamChunkParams struct {
	Content string `json:"content"`
	Done    bool   `json:"done"`
}

StreamChunkParams are the parameters for stream.chunk notifications.

func ParseStreamChunk

func ParseStreamChunk(data json.RawMessage) (*StreamChunkParams, error)

ParseStreamChunk parses a stream.chunk notification payload.

type StreamDoneParams

type StreamDoneParams struct {
	Usage        UsageResult `json:"usage"`
	FinishReason string      `json:"finish_reason,omitempty"`
	Model        string      `json:"model,omitempty"`
}

StreamDoneParams are the parameters for stream.done notifications.

func ParseStreamDone

func ParseStreamDone(data json.RawMessage) (*StreamDoneParams, error)

ParseStreamDone parses a stream.done notification payload.

type UsageResult

type UsageResult struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
	TotalTokens  int `json:"total_tokens"`
}

UsageResult tracks token usage.
