omnillm

package module

v0.16.0 (latest) | Published: May 23, 2026 | License: MIT | Imports: 24 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/plexusone/omnillm-core

README

OmniLLM - Unified Go SDK for Large Language Models

OmniLLM is a unified Go SDK that provides a consistent interface for interacting with multiple Large Language Model (LLM) providers including OpenAI, Anthropic (Claude), Google Gemini, X.AI (Grok), GLM (Zhipu AI), Kimi (Moonshot AI), Qwen (Alibaba Cloud), and Ollama. It implements the Chat Completions API pattern and offers both synchronous and streaming capabilities. Additional providers like AWS Bedrock are available as external modules.

✨ Features

🔌 Multi-Provider Support: OpenAI, Anthropic (Claude), Google Gemini, X.AI (Grok), GLM (Zhipu AI), Kimi (Moonshot AI), Qwen (Alibaba Cloud), Ollama, plus external providers (AWS Bedrock, etc.)
🎯 Unified API: Same interface across all providers
📡 Streaming Support: Real-time response streaming for all providers
🔤 Embeddings API: Text-to-vector conversion for semantic search and RAG workflows
🧠 Conversation Memory: Persistent conversation history using Key-Value Stores
🔀 Fallback Providers: Automatic failover to backup providers when primary fails
⚡ Circuit Breaker: Prevent cascading failures by temporarily skipping unhealthy providers
🔢 Token Estimation: Pre-flight token counting to validate requests before sending
💾 Response Caching: Cache identical requests with configurable TTL to reduce costs
📊 Observability Hooks: Extensible hooks for tracing, logging, and metrics without modifying core library
🔄 Retry with Backoff: Automatic retries for transient failures (rate limits, 5xx errors)
🧪 Comprehensive Testing: Unit tests, integration tests, and mock implementations included
🔧 Extensible: Easy to add new LLM providers
📦 Modular: Provider-specific implementations in separate packages
🏗️ Reference Architecture: Internal providers serve as reference implementations for external providers
🔌 3rd Party Friendly: External providers can be injected without modifying core library
⚡ Type Safe: Full Go type safety with comprehensive error handling

🏗️ Architecture

OmniLLM uses a clean, modular architecture that separates concerns and enables easy extensibility:

omnillm/
├── client.go            # Main ChatClient wrapper
├── providers.go         # Factory functions for built-in providers
├── types.go             # Type aliases for backward compatibility
├── memory.go            # Conversation memory management
├── observability.go     # ObservabilityHook interface for tracing/logging/metrics
├── errors.go            # Unified error handling
├── *_test.go            # Comprehensive unit tests
├── provider/            # 🎯 Public interface package for external providers
│   ├── interface.go     # Provider interface that all providers must implement
│   └── types.go         # Unified request/response types
├── providers/           # 📦 Individual provider packages (reference implementations)
│   ├── openai/          # OpenAI implementation
│   │   ├── openai.go    # HTTP client
│   │   ├── types.go     # OpenAI-specific types
│   │   ├── adapter.go   # provider.Provider implementation
│   │   └── *_test.go    # Provider tests
│   ├── anthropic/       # Anthropic implementation
│   │   ├── anthropic.go # HTTP client (SSE streaming)
│   │   ├── types.go     # Anthropic-specific types
│   │   ├── adapter.go   # provider.Provider implementation
│   │   └── *_test.go    # Provider and integration tests
│   ├── gemini/          # Google Gemini implementation
│   ├── xai/             # X.AI Grok implementation
│   ├── glm/             # Zhipu AI GLM implementation
│   ├── kimi/            # Moonshot AI Kimi implementation
│   ├── qwen/            # Alibaba Cloud Qwen implementation
│   └── ollama/          # Ollama implementation
└── testing/             # 🧪 Test utilities
    └── mock_kvs.go      # Mock KVS for memory testing

Key Architecture Benefits

🎯 Public Interface: The provider package exports the Provider interface that external packages can implement
🏗️ Reference Implementation: Internal providers follow the exact same structure that external providers should use
🔌 Direct Injection: External providers are injected via the CustomProvider field of a ProviderConfig entry in ClientConfig.Providers, without modifying core code
📦 Modular Design: Each provider is self-contained with its own HTTP client, types, and adapter
🧪 Testable: Clean interfaces that can be easily mocked and tested
🔧 Extensible: New providers can be added without touching existing code
⚡ Native Implementation: Uses standard net/http for direct API communication (no official SDK dependencies)

🚀 Quick Start

Installation

go get github.com/plexusone/omnillm-core

Basic Usage

package main

import (
    "context"
    "fmt"
    "log"
    
    "github.com/plexusone/omnillm-core"
)

func main() {
    // Create a client for OpenAI
    client, err := omnillm.NewClient(omnillm.ClientConfig{
        Providers: []omnillm.ProviderConfig{
            {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-openai-api-key"},
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Create a chat completion request
    response, err := client.CreateChatCompletion(context.Background(), &omnillm.ChatCompletionRequest{
        Model: omnillm.ModelGPT4o,
        Messages: []omnillm.Message{
            {
                Role:    omnillm.RoleUser,
                Content: "Hello! How can you help me today?",
            },
        },
        MaxTokens:   &[]int{150}[0],
        Temperature: &[]float64{0.7}[0],
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Response: %s\n", response.Choices[0].Message.Content)
    fmt.Printf("Tokens used: %d\n", response.Usage.TotalTokens)
}

🔧 Supported Providers

OpenAI

Models: GPT-5, GPT-4.1, GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo
Features: Chat completions, streaming, function calling

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-openai-api-key"},
    },
})

Anthropic (Claude)

Models: Claude-Opus-4.1, Claude-Opus-4, Claude-Sonnet-4, Claude-3.7-Sonnet, Claude-3.5-Haiku, Claude-3-Opus, Claude-3-Sonnet, Claude-3-Haiku
Features: Chat completions, streaming, system message support

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameAnthropic, APIKey: "your-anthropic-api-key"},
    },
})

Google Gemini

Models: Gemini-2.5-Pro, Gemini-2.5-Flash, Gemini-1.5-Pro, Gemini-1.5-Flash
Features: Chat completions, streaming

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameGemini, APIKey: "your-gemini-api-key"},
    },
})

AWS Bedrock (External Provider)

AWS Bedrock is available as an external module to avoid pulling AWS SDK dependencies for users who don't need it.

go get github.com/plexusone/omnillm-bedrock

import (
    "github.com/plexusone/omnillm-core"
    "github.com/plexusone/omnillm-bedrock"
)

// Create the Bedrock provider
bedrockProvider, err := bedrock.NewProvider("us-east-1")
if err != nil {
    log.Fatal(err)
}

// Use it with omnillm via CustomProvider
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {CustomProvider: bedrockProvider},
    },
})

See External Providers for more details.

X.AI (Grok)

Models: Grok-4.1-Fast (Reasoning/Non-Reasoning), Grok-4 (0709), Grok-4-Fast (Reasoning/Non-Reasoning), Grok-Code-Fast, Grok-3, Grok-3-Mini, Grok-2, Grok-2-Vision
Features: Chat completions, streaming, OpenAI-compatible API, 2M context window (4.1/4-Fast models)

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameXAI, APIKey: "your-xai-api-key"},
    },
})

GLM (Zhipu AI)

Models: GLM-5, GLM-4.7, GLM-4.6, GLM-4.5 series
Features: Chat completions, streaming, OpenAI-compatible API, thinking modes, up to 200K context

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameGLM, APIKey: "your-glm-api-key"},
    },
})

Kimi (Moonshot AI)

Models: Kimi K2.5, Kimi K2 series, Moonshot V1 (8k/32k/128k)
Features: Chat completions, streaming, OpenAI-compatible API, up to 256K context, vision support

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameKimi, APIKey: "your-kimi-api-key"},
    },
})

Qwen (Alibaba Cloud)

Models: Qwen3 Max, QwQ, Qwen3.5, Qwen3, Qwen2.5 series
Features: Chat completions, streaming, OpenAI-compatible API, thinking modes, wide global availability

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameQwen, APIKey: "your-qwen-api-key"},
    },
})

Ollama (Local Models)

Models: Llama 3, Mistral, CodeLlama, Gemma, Qwen2.5, DeepSeek-Coder
Features: Local inference, no API keys required, optimized for Apple Silicon

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOllama, BaseURL: "http://localhost:11434"},
    },
})

🔌 External Providers

Some providers with heavy SDK dependencies are available as separate modules to keep the core library lightweight. These are injected by setting CustomProvider on a ProviderConfig entry in ClientConfig.Providers.

Provider	Module	Why External
AWS Bedrock	github.com/plexusone/omnillm-bedrock	AWS SDK v2 adds 17+ transitive dependencies

Using External Providers

import (
    "github.com/plexusone/omnillm-core"
    "github.com/plexusone/omnillm-bedrock"  // or your custom provider
)

// Create the external provider
bedrockProv, err := bedrock.NewProvider("us-east-1")
if err != nil {
    log.Fatal(err)
}

// Inject via CustomProvider in Providers slice
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {CustomProvider: bedrockProv},
    },
})

Creating Your Own External Provider

External providers implement the provider.Provider interface:

import (
    "context"
    "errors"

    "github.com/plexusone/omnillm-core/provider"
)

type MyProvider struct{}

func (p *MyProvider) Name() string { return "myprovider" }
func (p *MyProvider) Close() error { return nil }

func (p *MyProvider) CreateChatCompletion(ctx context.Context, req *provider.ChatCompletionRequest) (*provider.ChatCompletionResponse, error) {
    // Your implementation: translate req, call your backend, map the result
    return nil, errors.New("myprovider: CreateChatCompletion not implemented")
}

func (p *MyProvider) CreateChatCompletionStream(ctx context.Context, req *provider.ChatCompletionRequest) (provider.ChatCompletionStream, error) {
    // Your streaming implementation
    return nil, errors.New("myprovider: CreateChatCompletionStream not implemented")
}

See the omnillm-bedrock source code as a reference implementation.

📡 Streaming Example

stream, err := client.CreateChatCompletionStream(context.Background(), &omnillm.ChatCompletionRequest{
    Model: omnillm.ModelGPT4o,
    Messages: []omnillm.Message{
        {
            Role:    omnillm.RoleUser,
            Content: "Tell me a short story about AI.",
        },
    },
    MaxTokens:   &[]int{200}[0],
    Temperature: &[]float64{0.8}[0],
})
if err != nil {
    log.Fatal(err)
}
defer stream.Close()

fmt.Print("AI Response: ")
for {
    chunk, err := stream.Recv()
    if errors.Is(err, io.EOF) { // end of stream (requires the "errors" and "io" imports)
        break
    }
    if err != nil {
        log.Fatal(err)
    }

    if len(chunk.Choices) > 0 && chunk.Choices[0].Delta != nil {
        fmt.Print(chunk.Choices[0].Delta.Content)
    }
}
fmt.Println()
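
When the complete text is needed once streaming finishes, the receive loop can accumulate the deltas instead of printing them. The sketch below is self-contained and uses hypothetical stand-in types (`chunkStream`, `sliceStream`) rather than OmniLLM's own stream type; only the "call Recv until io.EOF" pattern is taken from the example above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// chunkStream is a hypothetical stand-in for a streaming response:
// Recv returns the next text delta, or io.EOF when the stream ends.
type chunkStream interface {
	Recv() (string, error)
}

// sliceStream replays a fixed list of deltas; it exists only for this demo.
type sliceStream struct {
	deltas []string
	pos    int
}

func (s *sliceStream) Recv() (string, error) {
	if s.pos >= len(s.deltas) {
		return "", io.EOF
	}
	d := s.deltas[s.pos]
	s.pos++
	return d, nil
}

// collect drains the stream and returns the concatenated text.
// io.EOF marks normal completion; any other error is returned
// together with the partial text received so far.
func collect(s chunkStream) (string, error) {
	var sb strings.Builder
	for {
		delta, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return sb.String(), nil
		}
		if err != nil {
			return sb.String(), err
		}
		sb.WriteString(delta)
	}
}

func main() {
	text, err := collect(&sliceStream{deltas: []string{"Once ", "upon ", "a time."}})
	if err != nil {
		fmt.Println("stream error:", err)
		return
	}
	fmt.Println(text)
}
```

Returning the partial text alongside a non-EOF error lets the caller decide whether a truncated answer is still useful.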

🔤 Embeddings

OmniLLM supports text embeddings for semantic search, similarity matching, and RAG workflows.

import (
    "github.com/plexusone/omnillm-core"
    "github.com/plexusone/omnillm-core/provider"
)

// Get an embedding provider
embeddingProvider, err := omnillm.GetEmbeddingProvider(
    omnillm.ProviderNameOpenAI,
    omnillm.ProviderConfig{APIKey: "your-api-key"},
)
if err != nil {
    log.Fatal(err)
}
defer embeddingProvider.Close()

// Create embeddings
resp, err := embeddingProvider.CreateEmbedding(ctx, &provider.EmbeddingRequest{
    Model: "text-embedding-3-small",
    Input: []string{"Hello world", "How are you?"},
})
if err != nil {
    log.Fatal(err)
}

// Access the vectors
for _, data := range resp.Data {
    fmt.Printf("Index %d: %d dimensions\n", data.Index, len(data.Embedding))
}
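
For semantic search and similarity matching, embedding vectors are typically compared with cosine similarity. The following is a minimal sketch of that comparison; it assumes the vectors are available as `[]float64` (the element type of `data.Embedding` is not shown above, so a conversion may be needed), and the function name is illustrative rather than part of OmniLLM.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ, the vectors are empty,
// or either vector has zero magnitude.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Toy 3-dimensional vectors standing in for real embeddings.
	query := []float64{1, 0, 1}
	doc := []float64{1, 1, 0}
	fmt.Printf("similarity: %.3f\n", cosineSimilarity(query, doc))
}
```

Values close to 1 indicate similar direction (and usually similar meaning); ranking documents by this score against a query embedding is the core of a simple retrieval step.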

Supported Embedding Models

Provider	Models	Dimensions
OpenAI	text-embedding-3-small	512-1536
OpenAI	text-embedding-3-large	256-3072
OpenAI	text-embedding-ada-002	1536

Custom Dimensions

dims := 512
resp, err := embeddingProvider.CreateEmbedding(ctx, &provider.EmbeddingRequest{
    Model:      "text-embedding-3-small",
    Input:      []string{"Hello world"},
    Dimensions: &dims, // Reduce dimensions for storage efficiency
})

🧠 Conversation Memory

OmniLLM supports persistent conversation memory using any Key-Value Store that implements the Sogo KVS interface. This enables multi-turn conversations that persist across application restarts.

Memory Configuration

// Configure memory settings
memoryConfig := omnillm.MemoryConfig{
    MaxMessages: 50,                    // Keep last 50 messages per session
    TTL:         24 * time.Hour,       // Messages expire after 24 hours
    KeyPrefix:   "myapp:conversations", // Custom key prefix
}

// Create client with memory (using Redis, DynamoDB, etc.)
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-api-key"},
    },
    Memory:       kvsClient,          // Your KVS implementation
    MemoryConfig: &memoryConfig,
})
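
To illustrate what a MaxMessages limit means in practice, here is a small sketch of trimming a history to its most recent entries. It is an assumption about the general technique, not OmniLLM's actual implementation; the `message` type and `trimHistory` function are hypothetical, and any special handling of system messages is left out.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a chat message.
type message struct {
	Role    string
	Content string
}

// trimHistory returns at most limit of the most recent messages.
// A non-positive limit is treated as "no limit".
func trimHistory(history []message, limit int) []message {
	if limit <= 0 || len(history) <= limit {
		return history
	}
	return history[len(history)-limit:]
}

func main() {
	history := []message{
		{Role: "user", Content: "first"},
		{Role: "assistant", Content: "second"},
		{Role: "user", Content: "third"},
	}
	// Keep only the two most recent messages.
	for _, m := range trimHistory(history, 2) {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```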

Memory-Aware Completions

// Create a session with system message
err = client.CreateConversationWithSystemMessage(ctx, "user-123", 
    "You are a helpful assistant that remembers our conversation history.")

// Use memory-aware completion - automatically loads conversation history
response, err := client.CreateChatCompletionWithMemory(ctx, "user-123", &omnillm.ChatCompletionRequest{
    Model: omnillm.ModelGPT4o,
    Messages: []omnillm.Message{
        {Role: omnillm.RoleUser, Content: "What did we discuss last time?"},
    },
    MaxTokens: &[]int{200}[0],
})

// The response will include context from previous conversations in this session

Memory Management

// Load conversation history
conversation, err := client.LoadConversation(ctx, "user-123")

// Get just the messages
messages, err := client.GetConversationMessages(ctx, "user-123")

// Manually append messages
err = client.AppendMessage(ctx, "user-123", omnillm.Message{
    Role:    omnillm.RoleUser,
    Content: "Remember this important fact: I prefer JSON responses.",
})

// Delete conversation
err = client.DeleteConversation(ctx, "user-123")

KVS Backend Support

Memory works with any KVS implementation:

Redis: For high-performance, distributed memory
DynamoDB: For AWS-native storage
In-Memory: For testing and development
Custom: Any implementation of the Sogo KVS interface

// Example with Redis (using a hypothetical Redis KVS implementation)
redisKVS := redis.NewKVSClient("localhost:6379")
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-key"},
    },
    Memory: redisKVS,
})

📊 Observability Hooks

OmniLLM supports observability hooks that allow you to add tracing, logging, and metrics to LLM calls without modifying the core library. This is useful for integrating with observability platforms like OpenTelemetry, Datadog, or custom monitoring solutions.

ObservabilityHook Interface

// LLMCallInfo provides metadata about the LLM call
type LLMCallInfo struct {
    CallID       string    // Unique identifier for correlating BeforeRequest/AfterResponse
    ProviderName string    // e.g., "openai", "anthropic"
    StartTime    time.Time // When the call started
}

// ObservabilityHook allows external packages to observe LLM calls
type ObservabilityHook interface {
    // BeforeRequest is called before each LLM call.
    // Returns a new context for trace/span propagation.
    BeforeRequest(ctx context.Context, info LLMCallInfo, req *provider.ChatCompletionRequest) context.Context

    // AfterResponse is called after each LLM call completes (success or failure).
    AfterResponse(ctx context.Context, info LLMCallInfo, req *provider.ChatCompletionRequest, resp *provider.ChatCompletionResponse, err error)

    // WrapStream wraps a stream for observability of streaming responses.
    // Note: AfterResponse is only called if stream creation fails. For streaming
    // completion timing, handle Close() or EOF detection in your wrapper.
    WrapStream(ctx context.Context, info LLMCallInfo, req *provider.ChatCompletionRequest, stream provider.ChatCompletionStream) provider.ChatCompletionStream
}

Basic Usage

// Create a simple logging hook
type LoggingHook struct{}

func (h *LoggingHook) BeforeRequest(ctx context.Context, info omnillm.LLMCallInfo, req *omnillm.ChatCompletionRequest) context.Context {
    log.Printf("[%s] LLM call started: provider=%s model=%s", info.CallID, info.ProviderName, req.Model)
    return ctx
}

func (h *LoggingHook) AfterResponse(ctx context.Context, info omnillm.LLMCallInfo, req *omnillm.ChatCompletionRequest, resp *omnillm.ChatCompletionResponse, err error) {
    duration := time.Since(info.StartTime)
    if err != nil {
        log.Printf("[%s] LLM call failed: provider=%s duration=%v error=%v", info.CallID, info.ProviderName, duration, err)
    } else {
        log.Printf("[%s] LLM call completed: provider=%s duration=%v tokens=%d", info.CallID, info.ProviderName, duration, resp.Usage.TotalTokens)
    }
}

func (h *LoggingHook) WrapStream(ctx context.Context, info omnillm.LLMCallInfo, req *omnillm.ChatCompletionRequest, stream omnillm.ChatCompletionStream) omnillm.ChatCompletionStream {
    return stream // Return unwrapped for simple logging, or wrap for streaming metrics
}

// Use the hook when creating a client
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-api-key"},
    },
    ObservabilityHook: &LoggingHook{},
})

OpenTelemetry Integration Example

type OTelHook struct {
    tracer trace.Tracer
}

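func (h *OTelHook) BeforeRequest(ctx context.Context, info omnillm.LLMCallInfo, req *omnillm.ChatCompletionRequest) context.Context {
    // Start a span and return the context that carries it. AfterResponse
    // retrieves the span from that context and ends it, so the span value
    // itself is not needed here (an unused variable would not compile).
    ctx, _ = h.tracer.Start(ctx, "llm.chat_completion",
        trace.WithAttributes(
            attribute.String("llm.provider", info.ProviderName),
            attribute.String("llm.model", req.Model),
        ),
    )
    return ctx
}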

func (h *OTelHook) AfterResponse(ctx context.Context, info omnillm.LLMCallInfo, req *omnillm.ChatCompletionRequest, resp *omnillm.ChatCompletionResponse, err error) {
    span := trace.SpanFromContext(ctx)
    defer span.End()

    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
    } else if resp != nil {
        span.SetAttributes(
            attribute.Int("llm.tokens.total", resp.Usage.TotalTokens),
            attribute.Int("llm.tokens.prompt", resp.Usage.PromptTokens),
            attribute.Int("llm.tokens.completion", resp.Usage.CompletionTokens),
        )
    }
}

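func (h *OTelHook) WrapStream(ctx context.Context, info omnillm.LLMCallInfo, req *omnillm.ChatCompletionRequest, stream omnillm.ChatCompletionStream) omnillm.ChatCompletionStream {
    // observableStream is a wrapper type you define yourself (not shown here);
    // it must implement omnillm.ChatCompletionStream.
    return &observableStream{stream: stream, ctx: ctx, info: info}
}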
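
The interface comment above suggests handling Close() or EOF detection in a stream wrapper to capture streaming completion timing. The sketch below shows one way such a wrapper could work. It is self-contained and uses hypothetical stand-in types (`stream`, `timedStream`, `fakeStream`) whose Recv signature is simplified to return a string; a real wrapper would implement OmniLLM's stream interface instead. It is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// stream is a hypothetical stand-in for a chat completion stream.
type stream interface {
	Recv() (string, error)
	Close() error
}

// timedStream wraps a stream and records when it finishes: on io.EOF,
// on any other Recv error, or when the caller closes it early.
type timedStream struct {
	inner   stream
	start   time.Time
	onDone  func()        // optional; called exactly once on finish
	done    bool
	elapsed time.Duration // set on finish
	err     error         // terminal error; nil on normal completion
}

func newTimedStream(inner stream, onDone func()) *timedStream {
	return &timedStream{inner: inner, start: time.Now(), onDone: onDone}
}

func (t *timedStream) finish(err error) {
	if t.done {
		return
	}
	t.done = true
	t.elapsed = time.Since(t.start)
	t.err = err
	if t.onDone != nil {
		t.onDone()
	}
}

func (t *timedStream) Recv() (string, error) {
	delta, err := t.inner.Recv()
	if errors.Is(err, io.EOF) {
		t.finish(nil) // normal completion
	} else if err != nil {
		t.finish(err)
	}
	return delta, err
}

func (t *timedStream) Close() error {
	err := t.inner.Close()
	t.finish(err)
	return err
}

// fakeStream yields a fixed number of chunks, then io.EOF.
type fakeStream struct{ remaining int }

func (f *fakeStream) Recv() (string, error) {
	if f.remaining == 0 {
		return "", io.EOF
	}
	f.remaining--
	return "chunk", nil
}

func (f *fakeStream) Close() error { return nil }

func main() {
	ts := newTimedStream(&fakeStream{remaining: 2}, nil)
	ts.onDone = func() {
		fmt.Printf("stream finished after %v (err=%v)\n", ts.elapsed, ts.err)
	}
	defer ts.Close()
	for {
		if _, err := ts.Recv(); err != nil {
			break // io.EOF or a real error; onDone has already fired
		}
	}
}
```

In a tracing hook, the onDone callback is where a span could be ended and token or latency attributes recorded.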

Key Benefits

Non-Invasive: Add observability without modifying core library code
Provider Agnostic: Works with all LLM providers (OpenAI, Anthropic, Gemini, etc.)
Streaming Support: Wrap streams to observe streaming responses
Context Propagation: Pass trace context through the entire call chain
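Flexible: A hook must implement all three interface methods, but any of them can be a no-op (for example, WrapStream can return the stream unchanged); all are called if the hook is set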

🔀 Fallback Providers

OmniLLM supports automatic failover to backup providers when the primary provider fails. Fallback only triggers on retryable errors (rate limits, server errors, network issues) - authentication errors and invalid requests do not trigger fallback.

Basic Usage

// Providers[0] is primary, Providers[1+] are fallbacks
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "openai-key"},       // Primary
        {Provider: omnillm.ProviderNameAnthropic, APIKey: "anthropic-key"}, // Fallback 1
        {Provider: omnillm.ProviderNameGemini, APIKey: "gemini-key"},       // Fallback 2
    },
})

// If OpenAI fails with a retryable error, automatically tries Anthropic, then Gemini
response, err := client.CreateChatCompletion(ctx, request)
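
The failover behavior described above can be pictured as a loop over the configured providers. The following sketch illustrates the idea only; it is not OmniLLM's implementation, and the `backend` type and the `errRetryable` and `errAuth` sentinels are hypothetical names introduced for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch.
var (
	errRetryable = errors.New("retryable failure") // e.g. rate limit or 5xx
	errAuth      = errors.New("authentication failed")
)

// backend is a hypothetical stand-in for a provider.
type backend struct {
	name string
	call func() (string, error)
}

// completeWithFallback tries each backend in order. It moves on to the next
// one only when the failure wraps errRetryable; any other error is returned
// immediately without trying further backends.
func completeWithFallback(backends []backend) (string, error) {
	var lastErr error
	for _, b := range backends {
		out, err := b.call()
		if err == nil {
			return out, nil
		}
		lastErr = fmt.Errorf("%s: %w", b.name, err)
		if !errors.Is(err, errRetryable) {
			return "", lastErr
		}
	}
	if lastErr == nil {
		lastErr = errors.New("no providers configured")
	}
	return "", lastErr
}

func main() {
	backends := []backend{
		{name: "primary", call: func() (string, error) { return "", errRetryable }},
		{name: "fallback", call: func() (string, error) { return "hello from fallback", nil }},
	}
	out, err := completeWithFallback(backends)
	fmt.Println(out, err)
}
```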

With Circuit Breaker

Enable the circuit breaker to temporarily skip providers that are failing repeatedly:

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "openai-key"},
        {Provider: omnillm.ProviderNameAnthropic, APIKey: "anthropic-key"},
    },
    CircuitBreakerConfig: &omnillm.CircuitBreakerConfig{
        FailureThreshold: 5,               // Open after 5 consecutive failures
        SuccessThreshold: 2,               // Close after 2 successes in half-open
        Timeout:          30 * time.Second, // Wait before trying again
    },
})

Error Classification

Fallback classifies each error by type to decide whether the next provider should be tried:

Error Type	Triggers Fallback
Rate limits (429)	✅ Yes
Server errors (5xx)	✅ Yes
Network errors	✅ Yes
Auth errors (401/403)	❌ No
Invalid requests (400)	❌ No
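
For HTTP-level failures, the table above reduces to a small status-code check. The function below is a sketch of that mapping with a hypothetical name; network errors carry no status code and are not covered here.

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldFallback reports whether an HTTP status code falls into one of the
// classes that trigger fallback in the table above.
func shouldFallback(status int) bool {
	switch {
	case status == http.StatusTooManyRequests: // 429: rate limit
		return true
	case status >= 500 && status <= 599: // 5xx: server errors
		return true
	default: // 400, 401, 403 and other responses
		return false
	}
}

func main() {
	for _, code := range []int{429, 503, 401, 403, 400} {
		fmt.Printf("%d -> fallback=%v\n", code, shouldFallback(code))
	}
}
```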

⚡ Circuit Breaker

The circuit breaker pattern prevents cascading failures by temporarily skipping providers that are unhealthy.

States

Closed: Normal operation, requests flow through
Open: Provider is failing, requests skip it immediately
Half-Open: Testing if provider has recovered

Configuration

cbConfig := &omnillm.CircuitBreakerConfig{
    FailureThreshold:     5,               // Failures before opening
    SuccessThreshold:     2,               // Successes to close from half-open
    Timeout:              30 * time.Second, // Wait before half-open
    FailureRateThreshold: 0.5,             // 50% failure rate opens circuit
    MinimumRequests:      10,              // Minimum requests for rate calculation
}
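
The three states and the count-based thresholds can be sketched as a small state machine. This is an illustration of the general pattern under simplifying assumptions, not OmniLLM's implementation: the `breaker` type is hypothetical, it is not thread-safe, and the rate-based settings (FailureRateThreshold, MinimumRequests) are left out.

```go
package main

import (
	"fmt"
	"time"
)

type breakerState int

const (
	closed breakerState = iota
	open
	halfOpen
)

// breaker is a minimal circuit breaker driven by consecutive
// failure and success counts.
type breaker struct {
	failureThreshold int
	successThreshold int
	timeout          time.Duration

	state     breakerState
	failures  int
	successes int
	openedAt  time.Time
}

// allow reports whether a request may be sent at time now.
// An open breaker moves to half-open once the timeout has elapsed.
func (b *breaker) allow(now time.Time) bool {
	if b.state == open {
		if now.Sub(b.openedAt) < b.timeout {
			return false
		}
		b.state = halfOpen
		b.successes = 0
	}
	return true
}

// record updates the breaker with the outcome of a request.
func (b *breaker) record(ok bool, now time.Time) {
	if ok {
		b.failures = 0
		if b.state == halfOpen {
			b.successes++
			if b.successes >= b.successThreshold {
				b.state = closed
			}
		}
		return
	}
	b.failures++
	// Any failure while half-open, or too many in a row, opens the breaker.
	if b.state == halfOpen || b.failures >= b.failureThreshold {
		b.state = open
		b.openedAt = now
		b.failures = 0
	}
}

func main() {
	b := &breaker{failureThreshold: 2, successThreshold: 1, timeout: time.Second}
	now := time.Now()
	b.record(false, now)
	b.record(false, now) // second consecutive failure opens the breaker
	fmt.Println("allowed right away:", b.allow(now))
	fmt.Println("allowed after timeout:", b.allow(now.Add(2*time.Second)))
}
```

Passing the current time in explicitly keeps the sketch deterministic and easy to test without sleeping.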

🔢 Token Estimation

OmniLLM provides pre-flight token estimation to validate requests before sending them to the API. This helps avoid hitting context window limits.

Basic Usage

// Create estimator with default config
estimator := omnillm.NewTokenEstimator(omnillm.DefaultTokenEstimatorConfig())

// Estimate tokens for messages
tokens, err := estimator.EstimateTokens("gpt-4o", messages)

// Get model's context window
window := estimator.GetContextWindow("gpt-4o") // Returns 128000

Automatic Validation

Enable automatic token validation in the client:

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-key"},
    },
    TokenEstimator: omnillm.NewTokenEstimator(omnillm.DefaultTokenEstimatorConfig()),
    ValidateTokens: true, // Rejects requests that exceed context window
})

// Returns TokenLimitError if request exceeds model limits
response, err := client.CreateChatCompletion(ctx, request)
var tlErr *omnillm.TokenLimitError
if errors.As(err, &tlErr) { // errors.As also matches wrapped errors (requires the "errors" import)
    fmt.Printf("Request has %d tokens, but model only supports %d\n",
        tlErr.EstimatedTokens, tlErr.ContextWindow)
}

Built-in Context Windows

The token estimator includes built-in context windows for 40+ models, for example:

Provider	Models	Context Window
OpenAI	GPT-4o, GPT-4o-mini	128,000
OpenAI	o1	200,000
Anthropic	Claude 3/3.5/4	200,000
Google	Gemini 2.5	1,000,000
Google	Gemini 1.5 Pro	2,000,000
X.AI	Grok 3/4	128,000

Custom Configuration

config := omnillm.TokenEstimatorConfig{
    CharactersPerToken: 3.5, // More conservative estimate
    CustomContextWindows: map[string]int{
        "my-custom-model": 500000,
        "gpt-4o":          200000, // Override built-in
    },
}
estimator := omnillm.NewTokenEstimator(config)
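
A CharactersPerToken setting suggests a character-ratio heuristic. The sketch below shows how such an estimate and a context-window check could be computed; it is an assumption about the general approach, not the library's actual algorithm, and `estimateTokens` and `fits` are hypothetical names. Whether the library also reserves room for the completion is not stated above, so that part is purely illustrative.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// estimateTokens approximates the token count of text by dividing its
// character (rune) count by charsPerToken and rounding up.
func estimateTokens(text string, charsPerToken float64) int {
	if charsPerToken <= 0 {
		charsPerToken = 4.0 // fallback ratio chosen for this sketch
	}
	chars := utf8.RuneCountInString(text)
	return int(math.Ceil(float64(chars) / charsPerToken))
}

// fits reports whether the prompt plus the reserved completion budget
// stays within the context window.
func fits(promptTokens, maxCompletionTokens, contextWindow int) bool {
	return promptTokens+maxCompletionTokens <= contextWindow
}

func main() {
	prompt := "Explain the circuit breaker pattern in two sentences."
	n := estimateTokens(prompt, 3.5)
	fmt.Println("estimated tokens:", n)
	fmt.Println("fits in 128000:", fits(n, 150, 128000))
}
```

A smaller characters-per-token ratio yields a higher estimate, which is why 3.5 is the more conservative choice compared with a larger ratio.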

💾 Response Caching

OmniLLM supports response caching to reduce API costs for identical requests. Caching uses the same KVS backend as conversation memory.

Basic Usage

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-key"},
    },
    Cache: kvsClient, // Your KVS implementation (Redis, DynamoDB, etc.)
    CacheConfig: &omnillm.CacheConfig{
        TTL:       1 * time.Hour,        // Cache duration
        KeyPrefix: "myapp:llm-cache",    // Key prefix in KVS
    },
})

// First call hits the API
response1, _ := client.CreateChatCompletion(ctx, request)

// Second identical call returns cached response
response2, _ := client.CreateChatCompletion(ctx, request)

// Check if response was from cache
if response2.ProviderMetadata["cache_hit"] == true {
    fmt.Println("Response was cached!")
}

Cache Configuration

cacheConfig := &omnillm.CacheConfig{
    TTL:                1 * time.Hour,       // Time-to-live
    KeyPrefix:          "omnillm:cache",     // Key prefix
    SkipStreaming:      true,                // Don't cache streaming (default)
    CacheableModels:    []string{"gpt-4o"},  // Only cache specific models (nil = all)
    IncludeTemperature: true,                // Temperature affects cache key
    IncludeSeed:        true,                // Seed affects cache key
}

Cache Key Generation

Cache keys are generated from a SHA-256 hash of:

Model name
Messages (role, content, name, tool_call_id)
MaxTokens, Temperature, TopP, TopK, Seed, Stop sequences

Requests that differ in any of these values produce different cache keys.
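
One way to derive such a key is to serialize the relevant request fields deterministically and hash the result. The sketch below illustrates the idea with hypothetical types (`msg`, `keyInput`) covering only a subset of the fields listed above; the library's actual serialization format is not documented here and may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// msg and keyInput are hypothetical types for this sketch.
type msg struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type keyInput struct {
	Model       string   `json:"model"`
	Messages    []msg    `json:"messages"`
	MaxTokens   *int     `json:"max_tokens,omitempty"`
	Temperature *float64 `json:"temperature,omitempty"`
	Seed        *int     `json:"seed,omitempty"`
}

// cacheKey hashes the JSON encoding of in with SHA-256 and prepends prefix.
// encoding/json writes struct fields in declaration order, so equal inputs
// always produce the same bytes and therefore the same key.
func cacheKey(prefix string, in keyInput) (string, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return prefix + ":" + hex.EncodeToString(sum[:]), nil
}

func main() {
	low, high := 0.2, 0.9
	req := keyInput{
		Model:       "gpt-4o",
		Messages:    []msg{{Role: "user", Content: "Hello"}},
		Temperature: &low,
	}
	k1, _ := cacheKey("myapp:llm-cache", req)

	req.Temperature = &high // only the temperature changes
	k2, _ := cacheKey("myapp:llm-cache", req)

	fmt.Println(k1)
	fmt.Println("keys differ:", k1 != k2)
}
```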

🔄 Provider Switching

The unified interface makes it easy to switch between providers:

// Same request works with any provider
request := &omnillm.ChatCompletionRequest{
    Model: omnillm.ModelGPT4o, // or omnillm.ModelClaude3Sonnet, etc.
    Messages: []omnillm.Message{
        {Role: omnillm.RoleUser, Content: "Hello, world!"},
    },
    MaxTokens: &[]int{100}[0],
}

// OpenAI
openaiClient, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "openai-key"},
    },
})

// Anthropic
anthropicClient, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameAnthropic, APIKey: "anthropic-key"},
    },
})

// Gemini
geminiClient, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameGemini, APIKey: "gemini-key"},
    },
})

// Same API call for all providers
response1, _ := openaiClient.CreateChatCompletion(ctx, request)
response2, _ := anthropicClient.CreateChatCompletion(ctx, request)
response3, _ := geminiClient.CreateChatCompletion(ctx, request)

🧪 Testing

OmniLLM includes a comprehensive test suite with both unit tests and integration tests.

Running Tests

# Run all unit tests (no API keys required)
go test ./... -short

# Run with coverage
go test ./... -short -cover

# Run integration tests (requires API keys)
ANTHROPIC_API_KEY=your-key go test ./providers/anthropic -v
OPENAI_API_KEY=your-key go test ./providers/openai -v
XAI_API_KEY=your-key go test ./providers/xai -v
GLM_API_KEY=your-key go test ./providers/glm -v
KIMI_API_KEY=your-key go test ./providers/kimi -v
QWEN_API_KEY=your-key go test ./providers/qwen -v

# Run all tests including integration
ANTHROPIC_API_KEY=your-key OPENAI_API_KEY=your-key XAI_API_KEY=your-key GLM_API_KEY=your-key KIMI_API_KEY=your-key QWEN_API_KEY=your-key go test ./... -v

Test Coverage

Unit Tests: Mock-based tests that run without external dependencies
Integration Tests: Real API tests that skip gracefully when API keys are not set
Memory Tests: Comprehensive conversation memory management tests
Provider Tests: Adapter logic, message conversion, and streaming tests

Writing Tests

The clean interface design makes testing straightforward:

// Mock the Provider interface for testing
type mockProvider struct{}

func (m *mockProvider) CreateChatCompletion(ctx context.Context, req *omnillm.ChatCompletionRequest) (*omnillm.ChatCompletionResponse, error) {
    return &omnillm.ChatCompletionResponse{
        Choices: []omnillm.ChatCompletionChoice{
            {
                Message: omnillm.Message{
                    Role:    omnillm.RoleAssistant,
                    Content: "Mock response",
                },
            },
        },
    }, nil
}

func (m *mockProvider) CreateChatCompletionStream(ctx context.Context, req *omnillm.ChatCompletionRequest) (omnillm.ChatCompletionStream, error) {
    return nil, nil
}

func (m *mockProvider) Close() error { return nil }
func (m *mockProvider) Name() string { return "mock" }

Conditional Integration Tests

Integration tests automatically skip when API keys are not available:

func TestAnthropicIntegration_Streaming(t *testing.T) {
    apiKey := os.Getenv("ANTHROPIC_API_KEY")
    if apiKey == "" {
        t.Skip("Skipping integration test: ANTHROPIC_API_KEY not set")
    }
    // Test code here...
}

Mock KVS for Memory Testing

OmniLLM provides a mock KVS implementation for testing memory functionality:

import omnillmtest "github.com/plexusone/omnillm-core/testing"

// Create mock KVS for testing
mockKVS := omnillmtest.NewMockKVS()

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "test-key"},
    },
    Memory: mockKVS,
})

📚 Examples

The repository includes comprehensive examples:

Basic Usage: Simple chat completions with each provider
Streaming: Real-time response handling
Conversation: Multi-turn conversations with context
Memory Demo: Persistent conversation memory with KVS backend
Architecture Demo: Overview of the provider architecture
Custom Provider: How to create and use 3rd party providers

Run examples:

go run examples/basic/main.go
go run examples/streaming/main.go
go run examples/anthropic_streaming/main.go
go run examples/conversation/main.go
go run examples/memory_demo/main.go
go run examples/providers_demo/main.go
go run examples/xai/main.go
go run examples/ollama/main.go
go run examples/ollama_streaming/main.go
go run examples/gemini/main.go
go run examples/custom_provider/main.go

🔧 Configuration

Environment Variables

OPENAI_API_KEY: Your OpenAI API key
ANTHROPIC_API_KEY: Your Anthropic API key
GEMINI_API_KEY: Your Google Gemini API key
XAI_API_KEY: Your X.AI API key
GLM_API_KEY: Your Zhipu AI GLM API key
KIMI_API_KEY: Your Moonshot AI Kimi API key
QWEN_API_KEY: Your Alibaba Cloud Qwen API key
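
The list above names the variables but does not show how they are consumed. A minimal sketch, assuming the application reads the variable itself and passes the value to ProviderConfig.APIKey; the helper name `apiKeyFromEnv` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// apiKeyFromEnv returns the value of the named environment variable,
// or an error if it is unset or empty.
func apiKeyFromEnv(name string) (string, error) {
	key := os.Getenv(name)
	if key == "" {
		return "", fmt.Errorf("environment variable %s is not set", name)
	}
	return key, nil
}

func main() {
	key, err := apiKeyFromEnv("OPENAI_API_KEY")
	if err != nil {
		fmt.Println("skipping:", err)
		return
	}
	// The key would be passed as ProviderConfig.APIKey; never log the key itself.
	fmt.Printf("loaded key of length %d\n", len(key))
}
```

Failing early with a clear message avoids sending requests with an empty key and getting an authentication error back from the provider.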

Advanced Configuration

config := omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {
            Provider: omnillm.ProviderNameOpenAI,
            APIKey:   "your-api-key",
            BaseURL:  "https://custom-endpoint.com/v1",
            Extra: map[string]any{
                "timeout": 60, // Custom provider-specific settings
            },
        },
    },
}

Request Parameters

ChatCompletionRequest supports the following parameters with provider-specific availability:

Parameter	Type	Providers	Description
Model	string	All	Model identifier (required)
Messages	[]Message	All	Conversation messages (required)
MaxTokens	*int	All	Maximum tokens to generate
Temperature	*float64	All	Randomness (0.0-2.0)
TopP	*float64	All	Nucleus sampling threshold
TopK	*int	Anthropic, Gemini, Ollama	Top K token selection
Stop	[]string	All	Stop sequences
PresencePenalty	*float64	OpenAI, X.AI	Penalize tokens by presence
FrequencyPenalty	*float64	OpenAI, X.AI	Penalize tokens by frequency
Seed	*int	OpenAI, X.AI, Ollama	Reproducible outputs
N	*int	OpenAI	Number of completions
ResponseFormat	*ResponseFormat	OpenAI, Gemini	JSON mode ({"type": "json_object"})
Logprobs	*bool	OpenAI	Return log probabilities
TopLogprobs	*int	OpenAI	Top logprobs count (0-20)
User	*string	OpenAI	End-user identifier
LogitBias	map[string]int	OpenAI	Token bias adjustments

// Helper for pointer values
func ptr[T any](v T) *T { return &v }

// Example: Reproducible outputs with seed
response, err := client.CreateChatCompletion(ctx, &omnillm.ChatCompletionRequest{
    Model:    omnillm.ModelGPT4o,
    Messages: messages,
    Seed:     ptr(42), // Same seed and parameters: best-effort reproducible output
})

// Example: JSON mode response
response, err := client.CreateChatCompletion(ctx, &omnillm.ChatCompletionRequest{
    Model:    omnillm.ModelGPT4o,
    Messages: messages,
    ResponseFormat: &omnillm.ResponseFormat{Type: "json_object"},
})

// Example: TopK sampling (Anthropic/Gemini/Ollama)
response, err := client.CreateChatCompletion(ctx, &omnillm.ChatCompletionRequest{
    Model:    omnillm.ModelClaude3Sonnet,
    Messages: messages,
    TopK:     ptr(40), // Consider only top 40 tokens
})

Logging Configuration

OmniLLM supports injectable logging via Go's standard log/slog package. If no logger is provided, a null logger is used (no output).

import (
    "log/slog"
    "os"

    "github.com/plexusone/omnillm-core"
)

// Use a custom logger
logger := slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{
    Level: slog.LevelDebug,
}))

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-api-key"},
    },
    Logger: logger, // Optional: defaults to null logger if not provided
})

// Access the logger if needed
client.Logger().Info("client initialized", slog.String("provider", "openai"))

The logger is used internally for non-critical errors (e.g., memory save failures) that shouldn't interrupt the main request flow.

Context-Aware Logging

OmniLLM supports request-scoped logging via context. This allows you to attach trace IDs, user IDs, or other request-specific attributes to all log output within a request:

import (
    "log/slog"

    "github.com/plexusone/omnillm-core"
    "github.com/grokify/mogo/log/slogutil"
)

// Create a request-scoped logger with trace/user context
reqLogger := slog.Default().With(
    slog.String("trace_id", traceID),
    slog.String("user_id", userID),
    slog.String("request_id", requestID),
)

// Attach logger to context
ctx = slogutil.ContextWithLogger(ctx, reqLogger)

// All internal logging will now include trace_id, user_id, and request_id
response, err := client.CreateChatCompletionWithMemory(ctx, sessionID, req)

The context-aware logger is retrieved using slogutil.LoggerFromContext(ctx, fallback), which returns the context logger if present, or falls back to the client's configured logger.

Retry with Backoff

OmniLLM supports automatic retries for transient failures (rate limits, 5xx errors) via a custom HTTP client. This uses the retryhttp package from github.com/grokify/mogo.

import (
    "log"
    "net/http"
    "os"
    "time"

    "github.com/plexusone/omnillm-core"
    "github.com/grokify/mogo/net/http/retryhttp"
)

// Create retry transport with exponential backoff
rt := retryhttp.NewWithOptions(
    retryhttp.WithMaxRetries(5),                           // Max 5 retries
    retryhttp.WithInitialBackoff(500 * time.Millisecond),  // Start with 500ms
    retryhttp.WithMaxBackoff(30 * time.Second),            // Cap at 30s
    retryhttp.WithOnRetry(func(attempt int, req *http.Request, resp *http.Response, err error, backoff time.Duration) {
        log.Printf("Retry attempt %d, waiting %v", attempt, backoff)
    }),
)

// Create client with retry-enabled HTTP client
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {
            Provider: omnillm.ProviderNameOpenAI,
            APIKey:   os.Getenv("OPENAI_API_KEY"),
            HTTPClient: &http.Client{
                Transport: rt,
                Timeout:   2 * time.Minute, // Allow time for retries
            },
        },
    },
})

Retry Transport Features:

Feature	Default	Description
Max Retries	3	Maximum retry attempts
Initial Backoff	1s	Starting backoff duration
Max Backoff	30s	Cap on backoff duration
Backoff Multiplier	2.0	Exponential growth factor
Jitter	10%	Randomness to prevent thundering herd
Retryable Status Codes	429, 500, 502, 503, 504	Rate limits + 5xx errors

Additional Options:

WithRetryableStatusCodes(codes) - Custom status codes to retry
WithShouldRetry(fn) - Custom retry decision function
WithLogger(logger) - Structured logging for retry events

The retry transport also respects Retry-After headers from API responses.

Provider Support: Works with OpenAI, Anthropic, X.AI, and Ollama providers. Gemini and Bedrock use SDK clients with their own retry mechanisms.

🏗️ Adding New Providers

🎯 3rd Party Providers (Recommended)

External packages can create providers without modifying the core library. This is the recommended approach for most use cases:

Step 1: Create Your Provider Package

// In your external package (e.g., github.com/yourname/omnillm-gemini)
package gemini

import (
    "context"
    "errors"

    "github.com/plexusone/omnillm-core/provider"
)

// Step 1: HTTP Client (like providers/openai/openai.go)
type Client struct {
    apiKey string
    // your HTTP client implementation
}

func New(apiKey string) *Client {
    return &Client{apiKey: apiKey}
}

// Close releases resources held by the client (nothing to release in this skeleton).
func (c *Client) Close() error { return nil }

// Step 2: Provider Adapter (like providers/openai/adapter.go)
type Provider struct {
    client *Client
}

func NewProvider(apiKey string) provider.Provider {
    return &Provider{client: New(apiKey)}
}

func (p *Provider) CreateChatCompletion(ctx context.Context, req *provider.ChatCompletionRequest) (*provider.ChatCompletionResponse, error) {
    // Convert provider.ChatCompletionRequest to your API format
    // Make HTTP call via p.client
    // Convert response back to provider.ChatCompletionResponse
    return nil, errors.New("gemini: CreateChatCompletion not implemented")
}

func (p *Provider) CreateChatCompletionStream(ctx context.Context, req *provider.ChatCompletionRequest) (provider.ChatCompletionStream, error) {
    // Your streaming implementation
    return nil, errors.New("gemini: CreateChatCompletionStream not implemented")
}

func (p *Provider) Close() error { return p.client.Close() }
func (p *Provider) Name() string { return "gemini" }

Step 2: Use Your Provider

import (
    "context"
    "fmt"
    "log"

    "github.com/plexusone/omnillm-core"
    "github.com/yourname/omnillm-gemini"
)

func main() {
    ctx := context.Background()

    // Create your custom provider
    customProvider := gemini.NewProvider("your-api-key")

    // Inject it directly into omnillm - no core modifications needed!
    client, err := omnillm.NewClient(omnillm.ClientConfig{
        Providers: []omnillm.ProviderConfig{
            {CustomProvider: customProvider},
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Use the same omnillm API
    response, err := client.CreateChatCompletion(ctx, &omnillm.ChatCompletionRequest{
        Model: "gemini-pro",
        Messages: []omnillm.Message{{Role: omnillm.RoleUser, Content: "Hello!"}},
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%+v\n", response)
}

🔧 Built-in Providers (For Core Contributors)

To add a built-in provider to the core library, follow the same structure as existing providers:

Create Provider Package: providers/newprovider/
- newprovider.go - HTTP client implementation
- types.go - Provider-specific request/response types
- adapter.go - provider.Provider interface implementation
Update Core Files:
- Add factory function in providers.go
- Add provider constant in constants.go
- Add model constants if needed
Reference Implementation: Look at any existing provider (e.g., providers/openai/) as they all follow the exact same pattern that external providers should use

🎯 Why This Architecture?

🔌 No Core Changes: External providers don't require modifying the core library
🏗️ Reference Pattern: Internal providers demonstrate the exact structure external providers should follow
🧪 Easy Testing: Both internal and external providers use the same provider.Provider interface
📦 Self-Contained: Each provider manages its own HTTP client, types, and adapter logic
🔧 Direct Injection: Clean dependency injection via ProviderConfig.CustomProvider

📊 Model Support

Provider	Models	Features
OpenAI	GPT-5, GPT-4.1, GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo	Chat, Streaming, Functions
Anthropic	Claude-Opus-4.1, Claude-Opus-4, Claude-Sonnet-4, Claude-3.7-Sonnet, Claude-3.5-Haiku	Chat, Streaming, System messages
Gemini	Gemini-2.5-Pro, Gemini-2.5-Flash, Gemini-1.5-Pro, Gemini-1.5-Flash	Chat, Streaming
X.AI	Grok-4.1-Fast, Grok-4, Grok-4-Fast, Grok-Code-Fast, Grok-3, Grok-3-Mini, Grok-2	Chat, Streaming, 2M context, Tool calling
GLM	GLM-5, GLM-4.7, GLM-4.6, GLM-4.5 series	Chat, Streaming, Thinking modes
Kimi	Kimi K2.5, K2 series, Moonshot V1	Chat, Streaming, 256K context, Vision
Qwen	Qwen3 Max, QwQ, Qwen3.5, Qwen3, Qwen2.5 series	Chat, Streaming, Thinking modes
Ollama	Llama 3, Mistral, CodeLlama, Gemma, Qwen2.5, DeepSeek-Coder	Chat, Streaming, Local inference
Bedrock*	Claude models, Titan models	Chat, Multiple model families

*Available as external module

🚨 Error Handling

OmniLLM provides comprehensive error handling with provider-specific context:

response, err := client.CreateChatCompletion(ctx, request)
if err != nil {
    // errors.As also matches an APIError wrapped inside another error
    var apiErr *omnillm.APIError
    if errors.As(err, &apiErr) {
        fmt.Printf("Provider: %s, Status: %d, Message: %s\n",
            apiErr.Provider, apiErr.StatusCode, apiErr.Message)
    }
}

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Make your changes

Run tests to ensure everything works:

go test ./... -short        # Run unit tests
go build ./...              # Verify build
go vet ./...                # Run static analysis

Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Adding Tests

When contributing new features:

Add unit tests for core logic
Add integration tests for provider implementations (with API key checks)
Ensure tests pass without API keys using -short flag
Mock external dependencies when possible

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Anthropic Go SDK - Official Anthropic SDK
OpenAI Go SDK - Official OpenAI SDK
AWS SDK for Go - Official AWS SDK

Made with ❤️ for the Go and AI community

Documentation ¶

Index ¶

Constants
Variables
func EstimatePromptTokens(model string, messages []provider.Message) (int, error)
func GetEmbeddingProvider(name ProviderName, config ProviderConfig) (provider.EmbeddingProvider, error)
func GetModelContextWindow(model string) int
func GetProviderPriority(name ProviderName) int
func IsNonRetryableError(err error) bool
func IsRetryableError(err error) bool
func RegisterEmbeddingProvider(name ProviderName, factory EmbeddingProviderFactory, priority int)
func RegisterProvider(name ProviderName, factory ProviderFactory, priority int)
type APIError
- func NewAPIError(provider string, statusCode int, errorType, message string) *APIError
- func NewAPIErrorFull(provider ProviderName, statusCode int, message, errorType, code string) *APIError
- func (e *APIError) Error() string
type CacheConfig
- func DefaultCacheConfig() CacheConfig
type CacheEntry
- func (e *CacheEntry) IsExpired() bool
type CacheHitError
- func (e *CacheHitError) Error() string
type CacheManager
- func NewCacheManager(kvsClient kvs.Client, config CacheConfig) *CacheManager
- func (m *CacheManager) BuildCacheKey(req *provider.ChatCompletionRequest) string
- func (m *CacheManager) Config() CacheConfig
- func (m *CacheManager) Delete(ctx context.Context, req *provider.ChatCompletionRequest) error
- func (m *CacheManager) Get(ctx context.Context, req *provider.ChatCompletionRequest) (*CacheEntry, error)
- func (m *CacheManager) Set(ctx context.Context, req *provider.ChatCompletionRequest, ...) error
- func (m *CacheManager) ShouldCache(req *provider.ChatCompletionRequest) bool
type CacheStats
type Capabilities
type ChatClient
- func NewClient(config ClientConfig) (*ChatClient, error)
- func (c *ChatClient) AppendMessage(ctx context.Context, sessionID string, message provider.Message) error
- func (c *ChatClient) Cache() *CacheManager
- func (c *ChatClient) Close() error
- func (c *ChatClient) CreateChatCompletion(ctx context.Context, req *provider.ChatCompletionRequest) (*provider.ChatCompletionResponse, error)
- func (c *ChatClient) CreateChatCompletionStream(ctx context.Context, req *provider.ChatCompletionRequest) (provider.ChatCompletionStream, error)
- func (c *ChatClient) CreateChatCompletionStreamWithMemory(ctx context.Context, sessionID string, req *provider.ChatCompletionRequest) (provider.ChatCompletionStream, error)
- func (c *ChatClient) CreateChatCompletionWithMemory(ctx context.Context, sessionID string, req *provider.ChatCompletionRequest) (*provider.ChatCompletionResponse, error)
- func (c *ChatClient) CreateConversationWithSystemMessage(ctx context.Context, sessionID, systemMessage string) error
- func (c *ChatClient) DeleteConversation(ctx context.Context, sessionID string) error
- func (c *ChatClient) GetConversationMessages(ctx context.Context, sessionID string) ([]provider.Message, error)
- func (c *ChatClient) HasCache() bool
- func (c *ChatClient) HasMemory() bool
- func (c *ChatClient) LoadConversation(ctx context.Context, sessionID string) (*ConversationMemory, error)
- func (c *ChatClient) Logger() *slog.Logger
- func (c *ChatClient) Memory() *MemoryManager
- func (c *ChatClient) Provider() provider.Provider
- func (c *ChatClient) SaveConversation(ctx context.Context, conversation *ConversationMemory) error
- func (c *ChatClient) TokenEstimator() TokenEstimator
type ChatCompletionChoice
type ChatCompletionChunk
type ChatCompletionRequest
type ChatCompletionResponse
type ChatCompletionStream
type CircuitBreaker
- func NewCircuitBreaker(config CircuitBreakerConfig) *CircuitBreaker
- func (cb *CircuitBreaker) AllowRequest() bool
- func (cb *CircuitBreaker) RecordFailure()
- func (cb *CircuitBreaker) RecordSuccess()
- func (cb *CircuitBreaker) Reset()
- func (cb *CircuitBreaker) State() CircuitState
- func (cb *CircuitBreaker) Stats() CircuitBreakerStats
type CircuitBreakerConfig
- func DefaultCircuitBreakerConfig() CircuitBreakerConfig
type CircuitBreakerStats
type CircuitOpenError
- func (e *CircuitOpenError) Error() string
type CircuitState
- func (s CircuitState) String() string
type ClientConfig
type ConversationMemory
type EmbeddingProviderFactory
- func GetEmbeddingProviderFactory(name ProviderName) EmbeddingProviderFactory
type ErrorCategory
- func ClassifyError(err error) ErrorCategory
- func (c ErrorCategory) String() string
type FallbackAttempt
type FallbackError
- func (e *FallbackError) Error() string
- func (e *FallbackError) Unwrap() error
type FallbackProvider
- func NewFallbackProvider(primary provider.Provider, fallbacks []provider.Provider, ...) *FallbackProvider
- func (fp *FallbackProvider) CircuitBreaker(providerName string) *CircuitBreaker
- func (fp *FallbackProvider) Close() error
- func (fp *FallbackProvider) CreateChatCompletion(ctx context.Context, req *provider.ChatCompletionRequest) (*provider.ChatCompletionResponse, error)
- func (fp *FallbackProvider) CreateChatCompletionStream(ctx context.Context, req *provider.ChatCompletionRequest) (provider.ChatCompletionStream, error)
- func (fp *FallbackProvider) FallbackProviders() []provider.Provider
- func (fp *FallbackProvider) Name() string
- func (fp *FallbackProvider) PrimaryProvider() provider.Provider
type FallbackProviderConfig
type LLMCallInfo
type MemoryConfig
- func DefaultMemoryConfig() MemoryConfig
type MemoryManager
- func NewMemoryManager(kvsClient kvs.Client, config MemoryConfig) *MemoryManager
- func (m *MemoryManager) AppendMessage(ctx context.Context, sessionID string, message Message) error
- func (m *MemoryManager) AppendMessages(ctx context.Context, sessionID string, messages []Message) error
- func (m *MemoryManager) CreateConversationWithSystemMessage(ctx context.Context, sessionID, systemMessage string) error
- func (m *MemoryManager) DeleteConversation(ctx context.Context, sessionID string) error
- func (m *MemoryManager) GetMessages(ctx context.Context, sessionID string) ([]Message, error)
- func (m *MemoryManager) LoadConversation(ctx context.Context, sessionID string) (*ConversationMemory, error)
- func (m *MemoryManager) SaveConversation(ctx context.Context, conversation *ConversationMemory) error
- func (m *MemoryManager) SetMetadata(ctx context.Context, sessionID string, metadata map[string]any) error
type Message
type ModelInfo
- func GetModelInfo(modelID string) *ModelInfo
type ObservabilityHook
type Provider
type ProviderConfig
type ProviderFactory
- func GetProviderFactory(name ProviderName) ProviderFactory
type ProviderName
- func ListEmbeddingProviders() []ProviderName
- func ListRegisteredProviders() []ProviderName
type ResponseFormat
type Role
type TokenEstimator
- func NewTokenEstimator(config TokenEstimatorConfig) TokenEstimator
type TokenEstimatorConfig
- func DefaultTokenEstimatorConfig() TokenEstimatorConfig
type TokenLimitError
- func (e *TokenLimitError) Error() string
type TokenValidation
- func ValidateTokens(estimator TokenEstimator, model string, messages []provider.Message, ...) (*TokenValidation, error)
type Tool
type ToolCall
type ToolFunction
type ToolSpec
type Usage

Constants ¶

const (
	EnvVarAnthropicAPIKey = "ANTHROPIC_API_KEY" // #nosec G101
	EnvVarOpenAIAPIKey    = "OPENAI_API_KEY"    // #nosec G101
	EnvVarGeminiAPIKey    = "GEMINI_API_KEY"    // #nosec G101
	EnvVarXAIAPIKey       = "XAI_API_KEY"       // #nosec G101
	EnvVarKimiAPIKey      = "KIMI_API_KEY"      // #nosec G101
	EnvVarGLMAPIKey       = "GLM_API_KEY"       // #nosec G101
	EnvVarQwenAPIKey      = "QWEN_API_KEY"      // #nosec G101
)

const (
	// Bedrock Models - Re-exported from models package
	ModelBedrockClaude3Opus   = models.BedrockClaude3Opus
	ModelBedrockClaude3Sonnet = models.BedrockClaude3Sonnet
	ModelBedrockClaudeOpus4   = models.BedrockClaudeOpus4
	ModelBedrockTitan         = models.BedrockTitan

	// Claude Models - Re-exported from models package
	ModelClaudeOpus4_1   = models.ClaudeOpus4_1
	ModelClaudeOpus4     = models.ClaudeOpus4
	ModelClaudeSonnet4   = models.ClaudeSonnet4
	ModelClaude3_7Sonnet = models.Claude3_7Sonnet
	ModelClaude3_5Haiku  = models.Claude3_5Haiku
	ModelClaude3Opus     = models.Claude3Opus
	ModelClaude3Sonnet   = models.Claude3Sonnet
	ModelClaude3Haiku    = models.Claude3Haiku

	// Gemini Models - Re-exported from models package
	ModelGemini2_5Pro       = models.Gemini2_5Pro
	ModelGemini2_5Flash     = models.Gemini2_5Flash
	ModelGeminiLive2_5Flash = models.GeminiLive2_5Flash
	ModelGemini1_5Pro       = models.Gemini1_5Pro
	ModelGemini1_5Flash     = models.Gemini1_5Flash
	ModelGeminiPro          = models.GeminiPro

	// Ollama Models - Re-exported from models package
	ModelOllamaLlama3_8B   = models.OllamaLlama3_8B
	ModelOllamaLlama3_70B  = models.OllamaLlama3_70B
	ModelOllamaMistral7B   = models.OllamaMistral7B
	ModelOllamaMixtral8x7B = models.OllamaMixtral8x7B
	ModelOllamaCodeLlama   = models.OllamaCodeLlama
	ModelOllamaGemma2B     = models.OllamaGemma2B
	ModelOllamaGemma7B     = models.OllamaGemma7B
	ModelOllamaQwen2_5     = models.OllamaQwen2_5
	ModelOllamaDeepSeek    = models.OllamaDeepSeek

	// OpenAI Models - Re-exported from models package
	ModelGPT5           = models.GPT5
	ModelGPT5Mini       = models.GPT5Mini
	ModelGPT5Nano       = models.GPT5Nano
	ModelGPT5ChatLatest = models.GPT5ChatLatest
	ModelGPT4_1         = models.GPT4_1
	ModelGPT4_1Mini     = models.GPT4_1Mini
	ModelGPT4_1Nano     = models.GPT4_1Nano
	ModelGPT4o          = models.GPT4o
	ModelGPT4oMini      = models.GPT4oMini
	ModelGPT4Turbo      = models.GPT4Turbo
	ModelGPT35Turbo     = models.GPT35Turbo

	// Vertex AI Models - Re-exported from models package
	ModelVertexClaudeOpus4 = models.VertexClaudeOpus4

	// X.AI Grok Models - Re-exported from models package
	// Grok 4.1 (Latest - November 2025)
	ModelGrok4_1FastReasoning    = models.Grok4_1FastReasoning
	ModelGrok4_1FastNonReasoning = models.Grok4_1FastNonReasoning

	// Grok 4 (July 2025)
	ModelGrok4_0709            = models.Grok4_0709
	ModelGrok4FastReasoning    = models.Grok4FastReasoning
	ModelGrok4FastNonReasoning = models.Grok4FastNonReasoning
	ModelGrokCodeFast1         = models.GrokCodeFast1

	// Grok 3
	ModelGrok3     = models.Grok3
	ModelGrok3Mini = models.Grok3Mini

	// Grok 2
	ModelGrok2_1212   = models.Grok2_1212
	ModelGrok2_Vision = models.Grok2_Vision

	// Deprecated models
	ModelGrokBeta   = models.GrokBeta
	ModelGrokVision = models.GrokVision

	// Kimi / Moonshot AI Models - Re-exported from models package
	ModelKimiK2_5        = models.KimiK2_5
	ModelKimiK2Turbo     = models.KimiK2Turbo
	ModelKimiK2Thinking  = models.KimiK2Thinking
	ModelMoonshotV1_8K   = models.MoonshotV1_8K
	ModelMoonshotV1_32K  = models.MoonshotV1_32K
	ModelMoonshotV1_128K = models.MoonshotV1_128K

	// Zhipu AI GLM Models - Re-exported from models package
	ModelGLM5         = models.GLM5
	ModelGLM4_7       = models.GLM4_7
	ModelGLM4_7FlashX = models.GLM4_7FlashX
	ModelGLM4_7Flash  = models.GLM4_7Flash
	ModelGLM4_5       = models.GLM4_5
	ModelGLM4_5Flash  = models.GLM4_5Flash

	// Alibaba Cloud Qwen Models - Re-exported from models package
	ModelQwen3Max    = models.Qwen3Max
	ModelQwenMax     = models.QwenMax
	ModelQwenPlus    = models.QwenPlus
	ModelQwenFlash   = models.QwenFlash
	ModelQwQ32B      = models.QwQ32B
	ModelQwen3_235B  = models.Qwen3_235B
	ModelQwen3_32B   = models.Qwen3_32B
	ModelQwen2_5_72B = models.Qwen2_5_72B
)

Common model constants for each provider.

NOTE: For new code, prefer importing "github.com/plexusone/omnillm-core/models" directly for better organization and documentation. These constants are maintained for backwards compatibility with existing code.

const (
	// PriorityThin is the priority for thin (stdlib-only) provider implementations.
	PriorityThin = 0

	// PriorityThick is the priority for thick (official SDK) provider implementations.
	PriorityThick = 10
)

Priority constants for provider registration.

const (
	RoleSystem    = provider.RoleSystem
	RoleUser      = provider.RoleUser
	RoleAssistant = provider.RoleAssistant
	RoleTool      = provider.RoleTool
)

Role constants for convenience

Variables ¶

var (
	// Common errors
	ErrUnsupportedProvider  = errors.New("unsupported provider")
	ErrBedrockExternal      = errors.New("bedrock provider moved to github.com/plexusone/omnillm-bedrock; use CustomProvider to inject it")
	ErrInvalidConfiguration = errors.New("invalid configuration")
	ErrNoProviders          = errors.New("at least one provider must be configured")
	ErrEmptyAPIKey          = errors.New("API key cannot be empty")
	ErrEmptyModel           = errors.New("model cannot be empty")
	ErrEmptyMessages        = errors.New("messages cannot be empty")
	ErrStreamClosed         = errors.New("stream is closed")
	ErrInvalidResponse      = errors.New("invalid response format")
	ErrRateLimitExceeded    = errors.New("rate limit exceeded")
	ErrQuotaExceeded        = errors.New("quota exceeded")
	ErrInvalidRequest       = errors.New("invalid request")
	ErrModelNotFound        = errors.New("model not found")
	ErrServerError          = errors.New("server error")
	ErrNetworkError         = errors.New("network error")

	// Aliases for thick provider compatibility
	ErrInvalidAPIKey = ErrEmptyAPIKey
)

Functions ¶

func EstimatePromptTokens ¶

func EstimatePromptTokens(model string, messages []provider.Message) (int, error)

EstimatePromptTokens is a convenience function that creates a default estimator and estimates tokens for a set of messages.

func GetEmbeddingProvider ¶ added in v0.16.0

func GetEmbeddingProvider(name ProviderName, config ProviderConfig) (provider.EmbeddingProvider, error)

GetEmbeddingProvider creates an embedding provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetModelContextWindow ¶

func GetModelContextWindow(model string) int

GetModelContextWindow is a convenience function that returns the context window for a model using the default estimator.

func GetProviderPriority ¶

func GetProviderPriority(name ProviderName) int

GetProviderPriority returns the priority of the registered provider. Returns -1 if the provider is not registered.

func IsNonRetryableError ¶

func IsNonRetryableError(err error) bool

IsNonRetryableError returns true if the error is permanent and retrying won't help.

func IsRetryableError ¶

func IsRetryableError(err error) bool

IsRetryableError returns true if the error is transient and the request can be retried. This is useful for fallback provider logic - only retry on retryable errors.
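
A rough sketch of the fallback pattern this enables: try providers in order and move to the next one only when the failure looks transient. Everything here is a stand-in written for illustration (statusError, isRetryable, withFallback); the status codes treated as retryable are an assumption borrowed from the retry table in the README, not the library's actual classification.

```go
package main

import (
	"errors"
	"fmt"
)

// statusError is a local stand-in for an API error that carries an HTTP status.
type statusError struct{ status int }

func (e *statusError) Error() string { return fmt.Sprintf("status %d", e.status) }

// isRetryable is an assumed classification: rate limits and common 5xx codes are transient.
func isRetryable(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	switch se.status {
	case 429, 500, 502, 503, 504:
		return true
	}
	return false
}

// callFn stands in for one provider's chat completion call.
type callFn func() (string, error)

// withFallback tries each provider in order and moves on only after a retryable error.
func withFallback(providers []callFn) (string, error) {
	lastErr := errors.New("no providers configured")
	for _, call := range providers {
		out, err := call()
		if err == nil {
			return out, nil
		}
		lastErr = err
		if !isRetryable(err) {
			break // non-retryable: stop instead of trying the next provider
		}
	}
	return "", lastErr
}

func main() {
	primary := func() (string, error) { return "", &statusError{status: 503} }
	backup := func() (string, error) { return "hello from backup", nil }
	fmt.Println(withFallback([]callFn{primary, backup}))
}
```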

func RegisterEmbeddingProvider ¶ added in v0.16.0

func RegisterEmbeddingProvider(name ProviderName, factory EmbeddingProviderFactory, priority int)

RegisterEmbeddingProvider registers an embedding provider factory with the given name and priority. Higher priority values override lower priority registrations. Thin (stdlib) providers should use priority 0. Thick (SDK) providers should use priority 10.

func RegisterProvider ¶

func RegisterProvider(name ProviderName, factory ProviderFactory, priority int)

RegisterProvider registers a provider factory with the given name and priority. Higher priority values override lower priority registrations. Thin (stdlib) providers should use priority 0. Thick (SDK) providers should use priority 10.

Example:

// In omnillm-core/providers/openai/init.go (thin, priority 0)
func init() {
    omnillm.RegisterProvider(omnillm.ProviderNameOpenAI, NewProvider, 0)
}

// In omnillm-openai/init.go (thick, priority 10)
func init() {
    omnillm.RegisterProvider(omnillm.ProviderNameOpenAI, NewProvider, 10)
}
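
The override rule can be pictured as a small map keyed by provider name. The sketch below is an illustration only, not the library's registry: the names register and priorityOf are hypothetical, tie handling (the existing registration wins) is an assumption, and a real registry would likely guard the map with a mutex.

```go
package main

import "fmt"

// factory stands in for a provider constructor; it returns a label for illustration.
type factory func() string

type registration struct {
	factory  factory
	priority int
}

// registry maps provider names to their winning registration.
var registry = map[string]registration{}

// register keeps the new factory only if its priority is higher than the
// existing one. Tie handling (existing wins) is an assumption of this sketch.
func register(name string, f factory, priority int) {
	if cur, ok := registry[name]; ok && cur.priority >= priority {
		return
	}
	registry[name] = registration{factory: f, priority: priority}
}

// priorityOf returns the registered priority, or -1 if the name is unknown.
func priorityOf(name string) int {
	if cur, ok := registry[name]; ok {
		return cur.priority
	}
	return -1
}

func main() {
	register("openai", func() string { return "thin" }, 0)
	register("openai", func() string { return "thick" }, 10)
	register("openai", func() string { return "thin again" }, 0) // ignored: lower priority

	fmt.Println(registry["openai"].factory(), priorityOf("openai"), priorityOf("unknown"))
}
```

Under this rule the thick registration wins regardless of the order in which init functions run.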

Types ¶

type APIError ¶

type APIError struct {
	StatusCode int          `json:"status_code"`
	Message    string       `json:"message"`
	Type       string       `json:"type"`
	Code       string       `json:"code"`
	Provider   ProviderName `json:"provider"`
}

APIError represents an error response from the API

func NewAPIError ¶

func NewAPIError(provider string, statusCode int, errorType, message string) *APIError

NewAPIError creates a new API error. This signature is compatible with thick providers that pass (provider, statusCode, errorType, message).

func NewAPIErrorFull ¶

func NewAPIErrorFull(provider ProviderName, statusCode int, message, errorType, code string) *APIError

NewAPIErrorFull creates a new API error with all fields.

func (*APIError) Error ¶

func (e *APIError) Error() string

type CacheConfig ¶

type CacheConfig struct {
	// TTL is the time-to-live for cached responses.
	// Default: 1 hour
	TTL time.Duration

	// KeyPrefix is the prefix for cache keys in the KVS.
	// Default: "omnillm:cache"
	KeyPrefix string

	// SkipStreaming skips caching for streaming requests.
	// Default: true (streaming responses are not cached)
	SkipStreaming bool

	// CacheableModels limits caching to specific models.
	// If nil or empty, all models are cached.
	CacheableModels []string

	// ExcludeParameters lists parameters to exclude from cache key calculation.
	// Common exclusions: "user" (user ID shouldn't affect cache)
	// Default: ["user"]
	ExcludeParameters []string

	// IncludeTemperature includes temperature in cache key.
	// Set to false if you want to cache regardless of temperature setting.
	// Default: true
	IncludeTemperature bool

	// IncludeSeed includes seed in cache key.
	// Default: true
	IncludeSeed bool
}

CacheConfig configures response caching behavior

func DefaultCacheConfig ¶

func DefaultCacheConfig() CacheConfig

DefaultCacheConfig returns a CacheConfig with sensible defaults

type CacheEntry ¶

type CacheEntry struct {
	// Response is the cached chat completion response
	Response *provider.ChatCompletionResponse `json:"response"`

	// CachedAt is when the response was cached
	CachedAt time.Time `json:"cached_at"`

	// ExpiresAt is when the cache entry expires
	ExpiresAt time.Time `json:"expires_at"`

	// Model is the model used for the request
	Model string `json:"model"`

	// RequestHash is the hash of the request (for verification)
	RequestHash string `json:"request_hash"`
}

CacheEntry represents a cached response with metadata

func (*CacheEntry) IsExpired ¶

func (e *CacheEntry) IsExpired() bool

IsExpired returns true if the cache entry has expired

type CacheHitError ¶

type CacheHitError struct {
	Entry *CacheEntry
}

CacheHitError is a marker type to indicate a cache hit (not an actual error)

func (*CacheHitError) Error ¶

func (e *CacheHitError) Error() string

type CacheManager ¶

type CacheManager struct {
	// contains filtered or unexported fields
}

CacheManager handles response caching using a KVS backend

func NewCacheManager ¶

func NewCacheManager(kvsClient kvs.Client, config CacheConfig) *CacheManager

NewCacheManager creates a new cache manager with the given KVS client and configuration. If config has zero values, defaults are used for those fields.

func (*CacheManager) BuildCacheKey ¶

func (m *CacheManager) BuildCacheKey(req *provider.ChatCompletionRequest) string

BuildCacheKey generates a deterministic cache key for a request. The key is a hash of the normalized request parameters.
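
One way to build such a key is to serialize only the fields that should affect the result and hash the bytes. The sketch below assumes SHA-256 over a JSON encoding and a trimmed-down request type; the library's actual normalization and hash function are not shown in this documentation and may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// request is a trimmed-down stand-in for provider.ChatCompletionRequest.
type request struct {
	Model       string
	Messages    []message
	Temperature *float64
	User        string
}

// keyFields holds only the parameters that should influence the cache key.
type keyFields struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	Temperature *float64  `json:"temperature,omitempty"`
}

// cacheKey hashes the normalized request. User is deliberately left out,
// matching the documented default exclusion.
func cacheKey(prefix string, req request, includeTemperature bool) (string, error) {
	fields := keyFields{Model: req.Model, Messages: req.Messages}
	if includeTemperature {
		fields.Temperature = req.Temperature
	}
	data, err := json.Marshal(fields)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return prefix + ":" + hex.EncodeToString(sum[:]), nil
}

func main() {
	temp := 0.7
	req := request{
		Model:       "gpt-4o",
		Messages:    []message{{Role: "user", Content: "Hello!"}},
		Temperature: &temp,
		User:        "alice",
	}
	key, err := cacheKey("omnillm:cache", req, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(key)
}
```

Struct fields are marshaled in declaration order, so identical requests always produce identical bytes and therefore the same key.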

func (*CacheManager) Config ¶

func (m *CacheManager) Config() CacheConfig

Config returns the cache configuration

func (*CacheManager) Delete ¶

func (m *CacheManager) Delete(ctx context.Context, req *provider.ChatCompletionRequest) error

Delete removes a cache entry for the given request.

func (*CacheManager) Get ¶

func (m *CacheManager) Get(ctx context.Context, req *provider.ChatCompletionRequest) (*CacheEntry, error)

Get retrieves a cached response for the given request. Returns nil if no valid cache entry exists.

func (*CacheManager) Set ¶

func (m *CacheManager) Set(ctx context.Context, req *provider.ChatCompletionRequest, resp *provider.ChatCompletionResponse) error

Set stores a response in the cache for the given request.

func (*CacheManager) ShouldCache ¶

func (m *CacheManager) ShouldCache(req *provider.ChatCompletionRequest) bool

ShouldCache determines if a request should be cached. Returns false for streaming requests (if configured), non-cacheable models, etc.
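
The rules documented on CacheConfig (skip streaming when SkipStreaming is set; treat a nil or empty CacheableModels as "all models") can be sketched as follows. The config type is a local mirror of two fields, and the stream flag is an assumed stand-in for however the request marks streaming.

```go
package main

import "fmt"

// config mirrors two CacheConfig fields.
type config struct {
	SkipStreaming   bool
	CacheableModels []string
}

// shouldCache applies the documented rules; the stream flag is an assumed request field.
func shouldCache(cfg config, model string, stream bool) bool {
	if stream && cfg.SkipStreaming {
		return false
	}
	if len(cfg.CacheableModels) == 0 {
		return true // nil or empty allow-list: every model is cacheable
	}
	for _, m := range cfg.CacheableModels {
		if m == model {
			return true
		}
	}
	return false
}

func main() {
	cfg := config{SkipStreaming: true, CacheableModels: []string{"gpt-4o"}}
	fmt.Println(shouldCache(cfg, "gpt-4o", false))
	fmt.Println(shouldCache(cfg, "gpt-4o", true))
	fmt.Println(shouldCache(cfg, "gpt-4o-mini", false))
}
```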

type CacheStats ¶

type CacheStats struct {
	Hits   int64
	Misses int64
}

CacheStats contains statistics about cache usage

type Capabilities ¶

type Capabilities struct {
	// Tools indicates support for tool/function calling.
	Tools bool

	// Streaming indicates support for streaming responses.
	Streaming bool

	// Vision indicates support for image inputs in messages.
	Vision bool

	// JSON indicates support for JSON response format.
	JSON bool

	// SystemRole indicates support for system messages.
	SystemRole bool

	// MaxContextWindow is the maximum context window size in tokens.
	MaxContextWindow int

	// SupportsMaxTokens indicates if the provider supports the max_tokens parameter.
	SupportsMaxTokens bool
}

Capabilities describes the features supported by a provider. Thick providers can implement a Capabilities() method returning this struct. Note: This is not part of the Provider interface but useful for feature detection.
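
Since Capabilities() is optional, callers would typically detect it with a type assertion against a small interface. The sketch below shows that pattern with local stand-in types; the interface name capable and the choice to treat a missing method as "not supported" are assumptions of this example.

```go
package main

import "fmt"

// capabilities mirrors a few fields of the Capabilities struct.
type capabilities struct {
	Tools  bool
	Vision bool
}

// provider is a minimal stand-in for the Provider interface.
type provider interface {
	Name() string
}

// capable is the optional interface a provider may additionally implement.
type capable interface {
	Capabilities() capabilities
}

type thinProvider struct{}

func (thinProvider) Name() string { return "thin" }

type thickProvider struct{}

func (thickProvider) Name() string { return "thick" }

func (thickProvider) Capabilities() capabilities {
	return capabilities{Tools: true, Vision: true}
}

// supportsTools reports tool support; providers without Capabilities() report false.
func supportsTools(p provider) bool {
	if c, ok := p.(capable); ok {
		return c.Capabilities().Tools
	}
	return false
}

func main() {
	for _, p := range []provider{thinProvider{}, thickProvider{}} {
		fmt.Printf("%s supports tools: %v\n", p.Name(), supportsTools(p))
	}
}
```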

type ChatClient ¶

type ChatClient struct {
	// contains filtered or unexported fields
}

ChatClient is the main client interface that wraps a Provider

func NewClient ¶

func NewClient(config ClientConfig) (*ChatClient, error)

NewClient creates a new ChatClient from the providers configured in config.

func (*ChatClient) AppendMessage ¶

func (c *ChatClient) AppendMessage(ctx context.Context, sessionID string, message provider.Message) error

AppendMessage appends a message to a conversation in memory

func (*ChatClient) Cache ¶

func (c *ChatClient) Cache() *CacheManager

Cache returns the cache manager (nil if not configured)

func (*ChatClient) Close ¶

func (c *ChatClient) Close() error

Close closes the client

func (*ChatClient) CreateChatCompletion ¶

func (c *ChatClient) CreateChatCompletion(ctx context.Context, req *provider.ChatCompletionRequest) (*provider.ChatCompletionResponse, error)

CreateChatCompletion creates a chat completion

func (*ChatClient) CreateChatCompletionStream ¶

func (c *ChatClient) CreateChatCompletionStream(ctx context.Context, req *provider.ChatCompletionRequest) (provider.ChatCompletionStream, error)

CreateChatCompletionStream creates a streaming chat completion

func (*ChatClient) CreateChatCompletionStreamWithMemory ¶

func (c *ChatClient) CreateChatCompletionStreamWithMemory(ctx context.Context, sessionID string, req *provider.ChatCompletionRequest) (provider.ChatCompletionStream, error)

CreateChatCompletionStreamWithMemory creates a streaming chat completion using conversation memory

func (*ChatClient) CreateChatCompletionWithMemory ¶

func (c *ChatClient) CreateChatCompletionWithMemory(ctx context.Context, sessionID string, req *provider.ChatCompletionRequest) (*provider.ChatCompletionResponse, error)

CreateChatCompletionWithMemory creates a chat completion using conversation memory

func (*ChatClient) CreateConversationWithSystemMessage ¶

func (c *ChatClient) CreateConversationWithSystemMessage(ctx context.Context, sessionID, systemMessage string) error

CreateConversationWithSystemMessage creates a new conversation with a system message

func (*ChatClient) DeleteConversation ¶

func (c *ChatClient) DeleteConversation(ctx context.Context, sessionID string) error

DeleteConversation removes a conversation from memory

func (*ChatClient) GetConversationMessages ¶

func (c *ChatClient) GetConversationMessages(ctx context.Context, sessionID string) ([]provider.Message, error)

GetConversationMessages retrieves messages from a conversation

func (*ChatClient) HasCache ¶

func (c *ChatClient) HasCache() bool

HasCache returns true if caching is configured

func (*ChatClient) HasMemory ¶

func (c *ChatClient) HasMemory() bool

HasMemory returns true if memory is configured

func (*ChatClient) LoadConversation ¶

func (c *ChatClient) LoadConversation(ctx context.Context, sessionID string) (*ConversationMemory, error)

LoadConversation loads a conversation from memory

func (*ChatClient) Logger ¶

func (c *ChatClient) Logger() *slog.Logger

Logger returns the client's logger

func (*ChatClient) Memory ¶

func (c *ChatClient) Memory() *MemoryManager

Memory returns the memory manager (nil if not configured)

func (*ChatClient) Provider ¶

func (c *ChatClient) Provider() provider.Provider

Provider returns the underlying provider

func (*ChatClient) SaveConversation ¶

func (c *ChatClient) SaveConversation(ctx context.Context, conversation *ConversationMemory) error

SaveConversation saves a conversation to memory

func (*ChatClient) TokenEstimator ¶

func (c *ChatClient) TokenEstimator() TokenEstimator

TokenEstimator returns the token estimator (nil if not configured)

type ChatCompletionChoice ¶

type ChatCompletionChoice = provider.ChatCompletionChoice

ChatCompletionChoice is an alias for provider.ChatCompletionChoice, kept for backward compatibility and convenience. It allows thick providers to import the type from the omnillm-core root package.

type ChatCompletionChunk ¶

type ChatCompletionChunk = provider.ChatCompletionChunk

ChatCompletionChunk is an alias for provider.ChatCompletionChunk, kept for backward compatibility and convenience. It allows thick providers to import the type from the omnillm-core root package.

type ChatCompletionRequest ¶

type ChatCompletionRequest = provider.ChatCompletionRequest

ChatCompletionRequest is an alias for provider.ChatCompletionRequest

type ChatCompletionResponse ¶

type ChatCompletionResponse = provider.ChatCompletionResponse

ChatCompletionResponse is an alias for provider.ChatCompletionResponse, kept for backward compatibility and convenience. It allows thick providers to import the type from the omnillm-core root package.

type ChatCompletionStream ¶

type ChatCompletionStream = provider.ChatCompletionStream

ChatCompletionStream is an alias to the provider.ChatCompletionStream interface for backward compatibility

type CircuitBreaker ¶

type CircuitBreaker struct {
	// contains filtered or unexported fields
}

CircuitBreaker implements the circuit breaker pattern for provider health tracking

func NewCircuitBreaker ¶

func NewCircuitBreaker(config CircuitBreakerConfig) *CircuitBreaker

NewCircuitBreaker creates a new circuit breaker with the given configuration. If config has zero values, defaults are used for those fields.

func (*CircuitBreaker) AllowRequest ¶

func (cb *CircuitBreaker) AllowRequest() bool

AllowRequest returns true if the request should be allowed to proceed. In closed state, always allows. In open state, allows only after timeout. In half-open state, allows a limited number of test requests.

func (*CircuitBreaker) RecordFailure ¶

func (cb *CircuitBreaker) RecordFailure()

RecordFailure records a failed request. May open the circuit if thresholds are exceeded.

func (*CircuitBreaker) RecordSuccess ¶

func (cb *CircuitBreaker) RecordSuccess()

RecordSuccess records a successful request. In half-open state, may close the circuit if enough successes.

func (*CircuitBreaker) Reset ¶

func (cb *CircuitBreaker) Reset()

Reset resets the circuit breaker to closed state with cleared counters

func (*CircuitBreaker) State ¶

func (cb *CircuitBreaker) State() CircuitState

State returns the current state of the circuit breaker

func (*CircuitBreaker) Stats ¶

func (cb *CircuitBreaker) Stats() CircuitBreakerStats

Stats returns current statistics for monitoring

type CircuitBreakerConfig ¶

type CircuitBreakerConfig struct {
	// FailureThreshold is the number of consecutive failures before opening the circuit.
	// Default: 5
	FailureThreshold int

	// SuccessThreshold is the number of consecutive successes in half-open state
	// required to close the circuit.
	// Default: 2
	SuccessThreshold int

	// Timeout is how long to wait in open state before transitioning to half-open.
	// Default: 30 seconds
	Timeout time.Duration

	// FailureRateThreshold triggers circuit open when the failure rate exceeds this value (0-1).
	// Only evaluated after MinimumRequests is reached.
	// Default: 0.5 (50%)
	FailureRateThreshold float64

	// MinimumRequests is the minimum number of requests before failure rate is evaluated.
	// Default: 10
	MinimumRequests int
}

CircuitBreakerConfig configures circuit breaker behavior
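The state transitions described by FailureThreshold, SuccessThreshold, and Timeout can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the package's implementation: the `breaker` type and its methods are hypothetical, the failure-rate check (FailureRateThreshold, MinimumRequests) is omitted, half-open requests are not limited, and a failure in half-open state is assumed to reopen the circuit. The clock is injectable so the timeout can be exercised without sleeping.

```go
package main

import (
	"fmt"
	"time"
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

// breaker is a simplified stand-in, not the library's CircuitBreaker.
type breaker struct {
	failureThreshold int              // consecutive failures before opening
	successThreshold int              // consecutive half-open successes before closing
	timeout          time.Duration    // time spent open before probing again
	now              func() time.Time // injectable clock

	st        state
	failures  int
	successes int
	openedAt  time.Time
}

// allow reports whether a request may proceed, moving open -> half-open after the timeout.
func (b *breaker) allow() bool {
	if b.st == open {
		if b.now().Sub(b.openedAt) < b.timeout {
			return false
		}
		b.st, b.successes = halfOpen, 0
	}
	return true
}

// recordFailure opens the circuit at the threshold, or immediately when half-open.
func (b *breaker) recordFailure() {
	b.failures++
	b.successes = 0
	if b.st == halfOpen || b.failures >= b.failureThreshold {
		b.st, b.openedAt = open, b.now()
	}
}

// recordSuccess clears the failure streak and closes a half-open circuit after enough successes.
func (b *breaker) recordSuccess() {
	b.failures = 0
	if b.st == halfOpen {
		b.successes++
		if b.successes >= b.successThreshold {
			b.st = closed
		}
	}
}

// demo walks one closed -> open -> half-open -> closed cycle using the documented defaults.
func demo() (blockedWhileOpen, allowedAfterTimeout, closedAgain bool) {
	clock := time.Unix(0, 0)
	b := &breaker{failureThreshold: 5, successThreshold: 2, timeout: 30 * time.Second,
		now: func() time.Time { return clock }}
	for i := 0; i < 5; i++ {
		b.recordFailure()
	}
	blockedWhileOpen = !b.allow()
	clock = clock.Add(31 * time.Second)
	allowedAfterTimeout = b.allow()
	b.recordSuccess()
	b.recordSuccess()
	closedAgain = b.st == closed
	return
}

func main() {
	fmt.Println(demo()) // true true true
}
```

The sketch is not safe for concurrent use; a real breaker shared across goroutines would need a mutex around every state change.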

func DefaultCircuitBreakerConfig ¶

func DefaultCircuitBreakerConfig() CircuitBreakerConfig

DefaultCircuitBreakerConfig returns a CircuitBreakerConfig with sensible defaults

type CircuitBreakerStats ¶

type CircuitBreakerStats struct {
	State                CircuitState
	ConsecutiveFailures  int
	ConsecutiveSuccesses int
	TotalRequests        int
	TotalFailures        int
	FailureRate          float64
	LastFailure          time.Time
	LastStateChange      time.Time
}

CircuitBreakerStats contains statistics about the circuit breaker

type CircuitOpenError ¶

type CircuitOpenError struct {
	Provider    string
	State       CircuitState
	LastFailure time.Time
	RetryAfter  time.Duration
}

CircuitOpenError is returned when a request is rejected due to open circuit

func (*CircuitOpenError) Error ¶

func (e *CircuitOpenError) Error() string

type CircuitState ¶

type CircuitState int

CircuitState represents the state of a circuit breaker

const (
	// CircuitClosed indicates normal operation - requests pass through
	CircuitClosed CircuitState = iota
	// CircuitOpen indicates the circuit is open - requests fail fast
	CircuitOpen
	// CircuitHalfOpen indicates the circuit is testing recovery
	CircuitHalfOpen
)

func (CircuitState) String ¶

func (s CircuitState) String() string

String returns the string representation of the circuit state

type ClientConfig ¶

type ClientConfig struct {
	// Providers is an ordered list of providers. Index 0 is the primary provider,
	// and indices 1+ are fallback providers tried in order on retryable errors.
	// This is the preferred way to configure providers.
	//
	// Example:
	//   Providers: []ProviderConfig{
	//       {Provider: ProviderNameOpenAI, APIKey: "openai-key"},      // Primary
	//       {Provider: ProviderNameAnthropic, APIKey: "anthropic-key"}, // Fallback 1
	//       {Provider: ProviderNameGemini, APIKey: "gemini-key"},       // Fallback 2
	//   }
	//
	// For custom providers, use CustomProvider field in ProviderConfig:
	//   Providers: []ProviderConfig{
	//       {CustomProvider: myCustomProvider},
	//   }
	Providers []ProviderConfig

	// CircuitBreakerConfig configures circuit breaker behavior for fallback providers.
	// If nil (default), circuit breaker is disabled.
	// When enabled, providers that fail repeatedly are temporarily skipped.
	CircuitBreakerConfig *CircuitBreakerConfig

	// Memory configuration (optional)
	Memory       kvs.Client
	MemoryConfig *MemoryConfig

	// ObservabilityHook is called before/after LLM calls (optional)
	ObservabilityHook ObservabilityHook

	// Logger for internal logging (optional, defaults to null logger)
	Logger *slog.Logger

	// TokenEstimator enables pre-flight token estimation (optional).
	// Use NewTokenEstimator() to create one with custom configuration.
	TokenEstimator TokenEstimator

	// ValidateTokens enables automatic token validation before requests.
	// When true and TokenEstimator is set, requests that would exceed
	// the model's context window are rejected with TokenLimitError.
	// Default: false
	ValidateTokens bool

	// Cache is the KVS client for response caching (optional).
	// If provided, identical requests will return cached responses.
	// Uses the same kvs.Client interface as Memory.
	Cache kvs.Client

	// CacheConfig configures response caching behavior.
	// If nil, DefaultCacheConfig() is used when Cache is provided.
	CacheConfig *CacheConfig
}

ClientConfig holds configuration for creating a client

type ConversationMemory ¶

type ConversationMemory struct {
	SessionID string         `json:"session_id"`
	Messages  []Message      `json:"messages"`
	CreatedAt time.Time      `json:"created_at"`
	UpdatedAt time.Time      `json:"updated_at"`
	Metadata  map[string]any `json:"metadata,omitempty"`
}

ConversationMemory represents stored conversation data

type EmbeddingProviderFactory ¶ added in v0.16.0

type EmbeddingProviderFactory func(config ProviderConfig) (provider.EmbeddingProvider, error)

EmbeddingProviderFactory is a function that creates an embedding provider from config.

func GetEmbeddingProviderFactory ¶ added in v0.16.0

func GetEmbeddingProviderFactory(name ProviderName) EmbeddingProviderFactory

GetEmbeddingProviderFactory returns the registered factory for the given provider name. Returns nil if no embedding provider is registered with that name.

type ErrorCategory ¶

type ErrorCategory int

ErrorCategory classifies errors for retry/fallback logic

const (
	// ErrorCategoryUnknown indicates the error type could not be determined
	ErrorCategoryUnknown ErrorCategory = iota
	// ErrorCategoryRetryable indicates the error is transient and the request can be retried
	// Examples: rate limits (429), server errors (5xx), network errors
	ErrorCategoryRetryable
	// ErrorCategoryNonRetryable indicates the error is permanent and retrying won't help
	// Examples: auth errors (401/403), invalid requests (400), not found (404)
	ErrorCategoryNonRetryable
)

func ClassifyError ¶

func ClassifyError(err error) ErrorCategory

ClassifyError determines the category of an error for retry/fallback decisions
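The examples listed on the ErrorCategory constants suggest a classification keyed on HTTP status. The sketch below shows one way such a mapping could look. It is an assumption-laden illustration: `statusError` and `classify` are hypothetical, the real ClassifyError may inspect different error types, and network errors (documented as retryable) are not handled here.

```go
package main

import (
	"errors"
	"fmt"
)

type category int

const (
	unknown category = iota
	retryable
	nonRetryable
)

// statusError is a hypothetical error type carrying an HTTP status code.
type statusError struct{ Code int }

func (e *statusError) Error() string { return fmt.Sprintf("http status %d", e.Code) }

// classify maps an error to a category using the status codes named in the docs:
// 429 and 5xx are retryable, other 4xx are non-retryable, everything else is unknown.
func classify(err error) category {
	var se *statusError
	if !errors.As(err, &se) {
		return unknown
	}
	switch {
	case se.Code == 429, se.Code >= 500 && se.Code <= 599:
		return retryable
	case se.Code >= 400 && se.Code <= 499:
		return nonRetryable
	default:
		return unknown
	}
}

func main() {
	wrapped := fmt.Errorf("chat completion: %w", &statusError{Code: 503})
	fmt.Println(classify(wrapped) == retryable)                    // true
	fmt.Println(classify(&statusError{Code: 401}) == nonRetryable) // true
	fmt.Println(classify(errors.New("boom")) == unknown)           // true
}
```

Using errors.As means the status is still found when a caller has wrapped the error with additional context.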

func (ErrorCategory) String ¶

func (c ErrorCategory) String() string

String returns the string representation of the error category

type FallbackAttempt ¶

type FallbackAttempt struct {
	// Provider is the name of the provider that was tried
	Provider string

	// Error is the error returned, or nil on success
	Error error

	// Duration is how long the attempt took
	Duration time.Duration

	// Skipped indicates the provider was skipped (e.g., circuit open)
	Skipped bool
}

FallbackAttempt records information about a single fallback attempt

type FallbackError ¶

type FallbackError struct {
	// Attempts contains information about each provider attempt
	Attempts []FallbackAttempt

	// LastError is the last error encountered
	LastError error
}

FallbackError is returned when all providers fail

func (*FallbackError) Error ¶

func (e *FallbackError) Error() string

func (*FallbackError) Unwrap ¶

func (e *FallbackError) Unwrap() error

type FallbackProvider ¶

type FallbackProvider struct {
	// contains filtered or unexported fields
}

FallbackProvider wraps multiple providers with fallback logic. It implements provider.Provider and tries providers in order until one succeeds.

func NewFallbackProvider ¶

func NewFallbackProvider(
	primary provider.Provider,
	fallbacks []provider.Provider,
	config *FallbackProviderConfig,
) *FallbackProvider

NewFallbackProvider creates a provider that tries fallbacks on failure. The primary provider is tried first, then fallbacks in order.
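The try-in-order behavior, together with the attempt log carried by FallbackError, can be sketched without the library. Everything below is a local stand-in: `namedCall`, `tryInOrder`, and the lower-case error types are hypothetical, and the sketch treats every error as grounds for trying the next provider, whereas the documented behavior falls back only on retryable errors.

```go
package main

import (
	"errors"
	"fmt"
)

// attempt records one provider try (stand-in for FallbackAttempt).
type attempt struct {
	Provider string
	Err      error
}

// fallbackError is returned when every provider fails (stand-in for FallbackError).
type fallbackError struct {
	Attempts  []attempt
	LastError error
}

func (e *fallbackError) Error() string {
	return fmt.Sprintf("all %d providers failed: %v", len(e.Attempts), e.LastError)
}

// Unwrap exposes the last error so errors.Is and errors.As can see through the wrapper.
func (e *fallbackError) Unwrap() error { return e.LastError }

// namedCall is a hypothetical stand-in for a provider and its completion call.
type namedCall struct {
	Name string
	Call func() (string, error)
}

var errUnavailable = errors.New("service unavailable")

func failing() (string, error)    { return "", errUnavailable }
func succeeding() (string, error) { return "hello", nil }

// tryInOrder calls each provider in turn and returns the first success.
func tryInOrder(calls []namedCall) (string, error) {
	var attempts []attempt
	var last error
	for _, c := range calls {
		out, err := c.Call()
		attempts = append(attempts, attempt{Provider: c.Name, Err: err})
		if err == nil {
			return out, nil
		}
		last = err
	}
	return "", &fallbackError{Attempts: attempts, LastError: last}
}

func main() {
	out, err := tryInOrder([]namedCall{{"primary", failing}, {"backup", succeeding}})
	fmt.Println(out, err) // hello <nil>

	_, err = tryInOrder([]namedCall{{"primary", failing}})
	fmt.Println(errors.Is(err, errUnavailable)) // true
}
```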

func (*FallbackProvider) CircuitBreaker ¶

func (fp *FallbackProvider) CircuitBreaker(providerName string) *CircuitBreaker

CircuitBreaker returns the circuit breaker for a provider, or nil if not configured

func (*FallbackProvider) Close ¶

func (fp *FallbackProvider) Close() error

Close closes all providers

func (*FallbackProvider) CreateChatCompletion ¶

func (fp *FallbackProvider) CreateChatCompletion(
	ctx context.Context,
	req *provider.ChatCompletionRequest,
) (*provider.ChatCompletionResponse, error)

CreateChatCompletion tries the primary provider first, then fallbacks on retryable errors.

func (*FallbackProvider) CreateChatCompletionStream ¶

func (fp *FallbackProvider) CreateChatCompletionStream(
	ctx context.Context,
	req *provider.ChatCompletionRequest,
) (provider.ChatCompletionStream, error)

CreateChatCompletionStream tries the primary provider first, then fallbacks on retryable errors.

func (*FallbackProvider) FallbackProviders ¶

func (fp *FallbackProvider) FallbackProviders() []provider.Provider

FallbackProviders returns the fallback providers

func (*FallbackProvider) Name ¶

func (fp *FallbackProvider) Name() string

Name returns a composite name indicating fallback configuration

func (*FallbackProvider) PrimaryProvider ¶

func (fp *FallbackProvider) PrimaryProvider() provider.Provider

PrimaryProvider returns the primary provider

type FallbackProviderConfig ¶

type FallbackProviderConfig struct {
	// CircuitBreakerConfig configures circuit breaker behavior.
	// If nil, circuit breaker is disabled.
	CircuitBreakerConfig *CircuitBreakerConfig

	// Logger for logging fallback events
	Logger *slog.Logger
}

FallbackProviderConfig configures the fallback provider behavior

type LLMCallInfo ¶

type LLMCallInfo struct {
	CallID       string    // Unique identifier for correlating BeforeRequest/AfterResponse
	ProviderName string    // e.g., "openai", "anthropic"
	StartTime    time.Time // When the call started
}

LLMCallInfo provides metadata about the LLM call for observability

type MemoryConfig ¶

type MemoryConfig struct {
	// MaxMessages limits the number of messages to keep in memory per session
	MaxMessages int
	// TTL sets the time-to-live for stored conversations (0 for no expiration)
	TTL time.Duration
	// KeyPrefix allows customizing the key prefix for stored conversations
	KeyPrefix string
}

MemoryConfig holds configuration for conversation memory

func DefaultMemoryConfig ¶

func DefaultMemoryConfig() MemoryConfig

DefaultMemoryConfig returns sensible defaults for memory configuration

type MemoryManager ¶

type MemoryManager struct {
	// contains filtered or unexported fields
}

MemoryManager handles conversation persistence using KVS

func NewMemoryManager ¶

func NewMemoryManager(kvsClient kvs.Client, config MemoryConfig) *MemoryManager

NewMemoryManager creates a new memory manager with the given KVS client and config

func (*MemoryManager) AppendMessage ¶

func (m *MemoryManager) AppendMessage(ctx context.Context, sessionID string, message Message) error

AppendMessage adds a message to the conversation and saves it

func (*MemoryManager) AppendMessages ¶

func (m *MemoryManager) AppendMessages(ctx context.Context, sessionID string, messages []Message) error

AppendMessages adds multiple messages to the conversation and saves it

func (*MemoryManager) CreateConversationWithSystemMessage ¶

func (m *MemoryManager) CreateConversationWithSystemMessage(ctx context.Context, sessionID, systemMessage string) error

CreateConversationWithSystemMessage creates a new conversation with a system message

func (*MemoryManager) DeleteConversation ¶

func (m *MemoryManager) DeleteConversation(ctx context.Context, sessionID string) error

DeleteConversation removes a conversation from memory

func (*MemoryManager) GetMessages ¶

func (m *MemoryManager) GetMessages(ctx context.Context, sessionID string) ([]Message, error)

GetMessages returns just the messages from a conversation

func (*MemoryManager) LoadConversation ¶

func (m *MemoryManager) LoadConversation(ctx context.Context, sessionID string) (*ConversationMemory, error)

LoadConversation retrieves a conversation from memory

func (*MemoryManager) SaveConversation ¶

func (m *MemoryManager) SaveConversation(ctx context.Context, conversation *ConversationMemory) error

SaveConversation stores a conversation in memory

func (*MemoryManager) SetMetadata ¶

func (m *MemoryManager) SetMetadata(ctx context.Context, sessionID string, metadata map[string]any) error

SetMetadata sets metadata for a conversation

type Message ¶

type Message = provider.Message

Message is an alias for provider.Message, kept for backward compatibility and convenience. It allows thick providers to import the type from the omnillm-core root package.

type ModelInfo ¶

type ModelInfo struct {
	ID        string       `json:"id"`
	Provider  ProviderName `json:"provider"`
	Name      string       `json:"name"`
	MaxTokens int          `json:"max_tokens"`
}

ModelInfo represents information about a model

func GetModelInfo ¶

func GetModelInfo(modelID string) *ModelInfo

GetModelInfo returns model information

type ObservabilityHook ¶

type ObservabilityHook interface {
	// BeforeRequest is called before each LLM call.
	// Returns a new context for trace/span propagation.
	// The hook should not modify the request.
	BeforeRequest(ctx context.Context, info LLMCallInfo, req *provider.ChatCompletionRequest) context.Context

	// AfterResponse is called after each LLM call completes.
	// This is called for both successful and failed requests.
	AfterResponse(ctx context.Context, info LLMCallInfo, req *provider.ChatCompletionRequest, resp *provider.ChatCompletionResponse, err error)

	// WrapStream wraps a stream for observability.
	// This allows the hook to observe streaming responses.
	// The returned stream must implement the same interface as the input.
	//
	// Note: For streaming, AfterResponse is only called if stream creation fails.
	// To track streaming completion timing and content, the wrapper returned here
	// should handle Close() or detect EOF in Recv() to finalize metrics/traces.
	WrapStream(ctx context.Context, info LLMCallInfo, req *provider.ChatCompletionRequest, stream provider.ChatCompletionStream) provider.ChatCompletionStream
}

ObservabilityHook allows external packages to observe LLM calls. Implementations can use this to add tracing, logging, or metrics without modifying the core OmniLLM library.
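The BeforeRequest/AfterResponse pairing can be illustrated with a small counting hook. This is a sketch under simplifying assumptions: `callInfo`, `countingHook`, and `observe` are local stand-ins, the method signatures are trimmed (no request or response arguments), WrapStream is omitted, and the hook is not safe for concurrent use.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// callInfo is a trimmed stand-in for LLMCallInfo.
type callInfo struct {
	CallID       string
	ProviderName string
}

// ctxKey is an unexported context key type, which avoids collisions with other packages.
type ctxKey struct{}

// countingHook counts started, succeeded, and failed calls.
type countingHook struct {
	started, succeeded, failed int
}

// BeforeRequest returns a derived context carrying the call ID.
func (h *countingHook) BeforeRequest(ctx context.Context, info callInfo) context.Context {
	h.started++
	return context.WithValue(ctx, ctxKey{}, info.CallID)
}

// AfterResponse runs for both outcomes and ignores contexts it did not create.
func (h *countingHook) AfterResponse(ctx context.Context, info callInfo, err error) {
	if id, _ := ctx.Value(ctxKey{}).(string); id != info.CallID {
		return
	}
	if err != nil {
		h.failed++
		return
	}
	h.succeeded++
}

// observe shows where a client could invoke the hook around a call (hypothetical wiring).
func observe(h *countingHook, info callInfo, call func(context.Context) error) error {
	ctx := h.BeforeRequest(context.Background(), info)
	err := call(ctx)
	h.AfterResponse(ctx, info, err)
	return err
}

// runDemo performs one successful and one failing call.
func runDemo() *countingHook {
	h := &countingHook{}
	info := callInfo{CallID: "call-1", ProviderName: "openai"}
	_ = observe(h, info, func(context.Context) error { return nil })
	_ = observe(h, info, func(context.Context) error { return errors.New("rate limited") })
	return h
}

func main() {
	h := runDemo()
	fmt.Println(h.started, h.succeeded, h.failed) // 2 1 1
}
```

Returning a derived context from BeforeRequest is what lets a tracing implementation attach a span that AfterResponse can later finish.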

type Provider ¶

type Provider = provider.Provider

Provider is an alias to the provider.Provider interface for backward compatibility

type ProviderConfig ¶

type ProviderConfig struct {
	// Provider is the provider type (e.g., ProviderNameOpenAI).
	// Ignored if CustomProvider is set.
	Provider ProviderName

	// APIKey is the API key for the provider
	APIKey string //nolint:gosec // G117: config field for API key, not a hardcoded credential

	// BaseURL is an optional custom base URL
	BaseURL string

	// Region is for providers that require a region (e.g., AWS Bedrock)
	Region string

	// Timeout sets the HTTP client timeout for this provider
	Timeout time.Duration

	// HTTPClient is an optional custom HTTP client
	HTTPClient *http.Client

	// Extra holds provider-specific configuration
	Extra map[string]any

	// CustomProvider allows injecting a custom provider implementation.
	// When set, Provider, APIKey, BaseURL, etc. are ignored.
	CustomProvider provider.Provider
}

ProviderConfig holds configuration for a single provider instance. Used in the Providers slice where index 0 is primary and 1+ are fallbacks.

type ProviderFactory ¶

type ProviderFactory func(config ProviderConfig) (provider.Provider, error)

ProviderFactory is a function that creates a provider from config.

func GetProviderFactory ¶

func GetProviderFactory(name ProviderName) ProviderFactory

GetProviderFactory returns the registered factory for the given provider name. Returns nil if no provider is registered with that name.
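A name-to-factory registry with a nil result for unknown names is a common Go pattern. The sketch below shows one way to build it. It makes no claim about how the package stores its factories: `register`, `lookup`, `list`, and the simplified `factory` signature are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type providerName string

// factory is a simplified stand-in for ProviderFactory: it returns a description
// string instead of a provider so the sketch stays self-contained.
type factory func(apiKey string) (string, error)

var (
	mu       sync.RWMutex
	registry = map[providerName]factory{}
)

// register stores a factory under name, replacing any previous entry.
func register(name providerName, f factory) {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = f
}

// lookup returns the factory for name, or nil if none is registered.
func lookup(name providerName) factory {
	mu.RLock()
	defer mu.RUnlock()
	return registry[name]
}

// list returns all registered names in sorted order.
func list() []providerName {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]providerName, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Slice(names, func(i, j int) bool { return names[i] < names[j] })
	return names
}

func main() {
	register("acme", func(apiKey string) (string, error) {
		return "acme provider", nil
	})
	if f := lookup("acme"); f != nil {
		desc, _ := f("example-key")
		fmt.Println(desc)
	}
	fmt.Println(lookup("missing") == nil) // true
	fmt.Println(list())
}
```

Callers should check the result for nil before invoking it, since calling a nil function value panics.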

type ProviderName ¶

type ProviderName string

ProviderName represents the different LLM provider names

const (
	ProviderNameOpenAI    ProviderName = "openai"
	ProviderNameAnthropic ProviderName = "anthropic"
	ProviderNameBedrock   ProviderName = "bedrock"
	ProviderNameOllama    ProviderName = "ollama"
	ProviderNameGemini    ProviderName = "gemini"
	ProviderNameXAI       ProviderName = "xai"
	ProviderNameKimi      ProviderName = "kimi"
	ProviderNameGLM       ProviderName = "glm"
	ProviderNameQwen      ProviderName = "qwen"
)

func ListEmbeddingProviders ¶ added in v0.16.0

func ListEmbeddingProviders() []ProviderName

ListEmbeddingProviders returns a list of all registered embedding provider names.

func ListRegisteredProviders ¶

func ListRegisteredProviders() []ProviderName

ListRegisteredProviders returns a list of all registered provider names.

type ResponseFormat ¶

type ResponseFormat = provider.ResponseFormat

ResponseFormat is an alias for provider.ResponseFormat

type Role ¶

type Role = provider.Role

Role is an alias for provider.Role

type TokenEstimator ¶

type TokenEstimator interface {
	// EstimateTokens estimates the token count for a set of messages.
	// The estimate may not be exact but should be reasonably close.
	EstimateTokens(model string, messages []provider.Message) (int, error)

	// GetContextWindow returns the maximum context window size for a model.
	// Returns 0 if the model is unknown.
	GetContextWindow(model string) int
}

TokenEstimator estimates token counts for messages before sending to the API. This is useful for validating that requests won't exceed model limits.

func NewTokenEstimator ¶

func NewTokenEstimator(config TokenEstimatorConfig) TokenEstimator

NewTokenEstimator creates a new token estimator with the given configuration. If config has zero values, defaults are used for those fields.

type TokenEstimatorConfig ¶

type TokenEstimatorConfig struct {
	// CharactersPerToken is the average number of characters per token.
	// Default: 4.0 (reasonable for English text)
	// Lower values (e.g., 3.0) give more conservative estimates.
	CharactersPerToken float64

	// CustomContextWindows allows overriding context window sizes for specific models.
	// Keys should be model IDs (e.g., "gpt-4o", "claude-3-opus").
	CustomContextWindows map[string]int

	// TokenOverheadPerMessage is extra tokens added per message for formatting.
	// Default: 4 (accounts for role, separators, etc.)
	TokenOverheadPerMessage int
}

TokenEstimatorConfig configures token estimation behavior
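A character-based estimate built from CharactersPerToken and TokenOverheadPerMessage might look like the sketch below. The exact formula the package uses is not shown on this page, so the rounding (ceiling per message) and the way defaults are applied are assumptions; `message`, `estimatorConfig`, and `estimateTokens` are local stand-ins.

```go
package main

import (
	"fmt"
	"math"
)

// message is a trimmed stand-in for provider.Message.
type message struct {
	Role    string
	Content string
}

// estimatorConfig mirrors two fields of TokenEstimatorConfig.
type estimatorConfig struct {
	CharactersPerToken      float64
	TokenOverheadPerMessage int
}

// estimateTokens sums ceil(chars / CharactersPerToken) plus a fixed overhead per message.
// Zero config values fall back to the documented defaults (4.0 and 4).
func estimateTokens(cfg estimatorConfig, msgs []message) int {
	if cfg.CharactersPerToken <= 0 {
		cfg.CharactersPerToken = 4.0
	}
	if cfg.TokenOverheadPerMessage == 0 {
		cfg.TokenOverheadPerMessage = 4
	}
	total := 0
	for _, m := range msgs {
		chars := len([]rune(m.Content)) // count characters, not bytes
		total += int(math.Ceil(float64(chars)/cfg.CharactersPerToken)) + cfg.TokenOverheadPerMessage
	}
	return total
}

func main() {
	msgs := []message{
		{Role: "system", Content: "abcdefgh"}, // 8 chars -> 2 tokens + 4 overhead
		{Role: "user", Content: "abc"},        // 3 chars -> 1 token + 4 overhead
	}
	fmt.Println(estimateTokens(estimatorConfig{}, msgs))                        // 11
	fmt.Println(estimateTokens(estimatorConfig{CharactersPerToken: 3.0}, msgs)) // 12
}
```

The second call shows why a lower CharactersPerToken is the more conservative setting: the same text yields a higher estimate.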

func DefaultTokenEstimatorConfig ¶

func DefaultTokenEstimatorConfig() TokenEstimatorConfig

DefaultTokenEstimatorConfig returns a TokenEstimatorConfig with sensible defaults

type TokenLimitError ¶

type TokenLimitError struct {
	// EstimatedTokens is the estimated prompt token count
	EstimatedTokens int

	// ContextWindow is the model's maximum context window
	ContextWindow int

	// AvailableTokens is how many tokens are available (may be negative)
	AvailableTokens int

	// Model is the model ID
	Model string
}

TokenLimitError is returned when a request exceeds token limits

func (*TokenLimitError) Error ¶

func (e *TokenLimitError) Error() string

type TokenValidation ¶

type TokenValidation struct {
	// EstimatedTokens is the estimated prompt token count
	EstimatedTokens int

	// ContextWindow is the model's maximum context window
	ContextWindow int

	// MaxCompletionTokens is the requested max completion tokens
	MaxCompletionTokens int

	// AvailableTokens is how many tokens are available for completion
	// (ContextWindow - EstimatedTokens)
	AvailableTokens int

	// ExceedsLimit is true if the prompt exceeds the context window
	ExceedsLimit bool

	// ExceedsWithCompletion is true if prompt + max_tokens exceeds context
	ExceedsWithCompletion bool
}

TokenValidation contains the result of token validation

func ValidateTokens ¶

func ValidateTokens(
	estimator TokenEstimator,
	model string,
	messages []provider.Message,
	maxCompletionTokens int,
) (*TokenValidation, error)

ValidateTokens checks if the request fits within model limits. Returns validation details including whether limits are exceeded.
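The TokenValidation fields follow from simple arithmetic on the estimate and the context window. The sketch below reproduces that arithmetic with a local `validation` struct and a hypothetical `validate` helper. Whether the package treats an exact fit as exceeding the limit is not stated here, so the strict comparisons are an assumption.

```go
package main

import "fmt"

// validation mirrors the TokenValidation fields documented above.
type validation struct {
	EstimatedTokens       int
	ContextWindow         int
	MaxCompletionTokens   int
	AvailableTokens       int
	ExceedsLimit          bool
	ExceedsWithCompletion bool
}

// validate derives the result fields from an existing token estimate.
func validate(estimated, contextWindow, maxCompletion int) validation {
	return validation{
		EstimatedTokens:       estimated,
		ContextWindow:         contextWindow,
		MaxCompletionTokens:   maxCompletion,
		AvailableTokens:       contextWindow - estimated, // may be negative
		ExceedsLimit:          estimated > contextWindow,
		ExceedsWithCompletion: estimated+maxCompletion > contextWindow,
	}
}

func main() {
	v := validate(7000, 8192, 2000)
	fmt.Println(v.AvailableTokens, v.ExceedsLimit, v.ExceedsWithCompletion) // 1192 false true
}
```

In this example the prompt alone fits, but the prompt plus the requested completion budget does not, which is the case ExceedsWithCompletion exists to flag.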

type Tool ¶

type Tool = provider.Tool

Tool is an alias for provider.Tool

type ToolCall ¶

type ToolCall = provider.ToolCall

ToolCall is an alias for provider.ToolCall, kept for backward compatibility and convenience. It allows thick providers to import the type from the omnillm-core root package.

type ToolFunction ¶

type ToolFunction = provider.ToolFunction

ToolFunction is an alias for provider.ToolFunction, kept for backward compatibility and convenience. It allows thick providers to import the type from the omnillm-core root package.

type ToolSpec ¶

type ToolSpec = provider.ToolSpec

ToolSpec is an alias for provider.ToolSpec, kept for backward compatibility and convenience. It allows thick providers to import the type from the omnillm-core root package.

type Usage ¶

type Usage = provider.Usage

Usage is an alias for provider.Usage

Directories ¶

Path	Synopsis
examples
anthropic_streaming	command
architecture_demo	command
basic	command
conversation	command
custom_provider	command
memory_demo	command
ollama	command
ollama_streaming	command
providers_demo	command
streaming	command
xai	command
models	Package models provides a comprehensive catalog of LLM model identifiers and documentation references for all supported providers.
provider	Package provider defines the core interfaces that external LLM providers must implement.
providertest	Package providertest provides conformance tests for LLM provider implementations.
providers
anthropic	Package anthropic provides Anthropic provider adapter for the OmniLLM unified interface
glm	Package glm provides GLM (Zhipu AI) provider adapter for the OmniLLM unified interface
kimi	Package kimi provides Kimi (Moonshot AI) provider adapter for the OmniLLM unified interface
ollama	Package ollama provides Ollama provider adapter for the OmniLLM unified interface
openai	Package openai provides OpenAI provider adapter for the OmniLLM unified interface
qwen	Package qwen provides Qwen (Alibaba Cloud) provider adapter for the OmniLLM unified interface
xai	Package xai provides X.AI Grok provider adapter for the OmniLLM unified interface
testing	Package testing provides mock implementations for testing
