model

package
v0.7.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 28, 2026 License: MIT Imports: 18 Imported by: 0

Documentation

Constants

const DefaultContextLimitFallback = 200000

DefaultContextLimitFallback is the conservative context window assumed when a model's true limit cannot be determined from the registry, built-in tables, or user config. Kept deliberately small so unknown models compact early rather than overflowing the provider's real window. Override per-model via config.ContextLimits or globally via config.DefaultContextLimit.

Variables

var TokenTracker = &TokenUsage{}

TokenTracker is the global, process-wide token usage tracker.

Functions

func FormatAPIError

func FormatAPIError(err error, attempt, maxRetries int) string

FormatAPIError produces a user-friendly error message with retry context.

func GetModelContextLimit

func GetModelContextLimit(modelName string) int

GetModelContextLimit returns the known context limit for a given model name. Returns 0 if the model is not in the known list.

func IsRetryable

func IsRetryable(_ context.Context, err error) bool

IsRetryable returns true if the error should be retried. It is designed to be used as ModelRetryConfig.IsRetryAble in the Eino framework.

Context overflow errors are NOT retryable — they need compaction. Auth errors are NOT retryable — they need user action.

func ListProviderModelsLive added in v0.7.2

func ListProviderModelsLive(ctx context.Context, apiKey, baseURL string, headers map[string]string) []string

ListProviderModelsLive queries a provider's live /models endpoint and returns the advertised model ids. It is best-effort: any failure (network, auth, non-JSON body) yields an empty slice, since the catalog UI degrades to "no catalog, add manually" rather than erroring.

func NewChatModelFromProvider added in v0.7.2

func NewChatModelFromProvider(ctx context.Context, modelName, baseURL string, pc *config.ProviderConfig) (einomodel.ToolCallingChatModel, error)

NewChatModelFromProvider builds a ChatModel from a provider config, applying its advanced settings (custom headers, thinking depth, explicit thinking toggle, and the vision capability — which defaults to enabled). baseURL is the already-resolved endpoint (config override or registry default). This is the single place that maps ProviderConfig → ChatModelConfig so every entrypoint (web, TUI, ACP, subagents) honors the same settings.

func ParseProviderModel

func ParseProviderModel(s string) (provider, model string, err error)

ParseProviderModel splits "provider/model" into its components.
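The page does not say how malformed input is handled, so the following is only a minimal sketch of the split. The helper name splitProviderModel is hypothetical, and two behaviors are assumptions: the split happens at the first "/", and an empty half is reported as an error.

```go
package main

import (
	"fmt"
	"strings"
)

// splitProviderModel is a hypothetical stand-in for ParseProviderModel.
// Assumptions: the split happens at the first "/", so model IDs that contain
// further slashes stay intact, and an empty half is reported as an error.
func splitProviderModel(s string) (provider, model string, err error) {
	p, m, found := strings.Cut(s, "/")
	if !found || p == "" || m == "" {
		return "", "", fmt.Errorf("expected \"provider/model\", got %q", s)
	}
	return p, m, nil
}

func main() {
	for _, in := range []string{"openai/gpt-4o", "openrouter/meta/llama", "gpt-4o"} {
		p, m, err := splitProviderModel(in)
		fmt.Printf("%q -> provider=%q model=%q err=%v\n", in, p, m, err)
	}
}
```

Cutting at the first slash rather than the last is a design guess that keeps gateway-style model IDs with nested slashes usable; the real function may behave differently.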

func ParseRetryAfter

func ParseRetryAfter(err error) time.Duration

ParseRetryAfter extracts a delay from an error message or OpenAI APIError. It looks for Retry-After header patterns in the error text. Returns 0 if no delay information is found.
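The exact patterns ParseRetryAfter recognizes are not listed here. As a rough illustration only, the sketch below reads a delay in whole seconds from error text. The regular expression, the helper name retryAfterFromText, and the seconds-only interpretation are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// retryAfterRe is an assumed pattern; the package's real matching rules are
// not documented beyond "Retry-After header patterns in the error text".
var retryAfterRe = regexp.MustCompile(`(?i)retry-after:?\s*(\d+)`)

// retryAfterFromText is a hypothetical helper that reads a delay in whole
// seconds from an error message and returns 0 when none is present.
func retryAfterFromText(msg string) time.Duration {
	m := retryAfterRe.FindStringSubmatch(msg)
	if m == nil {
		return 0
	}
	secs, err := strconv.Atoi(m[1])
	if err != nil {
		return 0
	}
	return time.Duration(secs) * time.Second
}

func main() {
	fmt.Println(retryAfterFromText("429 Too Many Requests (Retry-After: 7)"))
	fmt.Println(retryAfterFromText("connection reset by peer"))
}
```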

func ResolveContextLimit added in v0.5.1

func ResolveContextLimit(reg *ModelRegistry, cfg *config.Config, providerID, modelID string) int

ResolveContextLimit determines the effective context window (in tokens) for a provider/model pair. This is the single source of truth for window-size management — all middleware thresholds (compaction, summarization, reduction, reminders) derive from it.

Resolution order (first positive hit wins):

  1. explicit user override: cfg.ContextLimits["provider/model"], then cfg.ContextLimits["model"]
  2. models.dev registry metadata (reg.GetModelContextLimit)
  3. built-in knownModels fallback table (GetModelContextLimit)
  4. cfg.DefaultContextLimit, else DefaultContextLimitFallback

reg and cfg may be nil; the resolver degrades gracefully.
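The resolution order above can be sketched with plain maps. The limits struct and the resolve function are hypothetical stand-ins for *ModelRegistry and *config.Config, and the key formats of the registry and built-in tables are assumptions; only the ordering and the 200000 fallback come from this page.

```go
package main

import "fmt"

// fallbackLimit mirrors DefaultContextLimitFallback.
const fallbackLimit = 200000

// limits is a hypothetical, flattened stand-in for the inputs the real
// resolver reads from *ModelRegistry and *config.Config.
type limits struct {
	overrides    map[string]int // plays the role of cfg.ContextLimits
	registry     map[string]int // registry metadata, keyed "provider/model" (assumed)
	builtin      map[string]int // built-in table, keyed by model name (assumed)
	defaultLimit int            // plays the role of cfg.DefaultContextLimit
}

// resolve walks the documented order and returns the first positive hit.
func resolve(l *limits, providerID, modelID string) int {
	if l == nil {
		return fallbackLimit // nothing configured: degrade to the constant
	}
	full := providerID + "/" + modelID
	candidates := []int{
		l.overrides[full],    // 1a. override for "provider/model"
		l.overrides[modelID], // 1b. override for bare "model"
		l.registry[full],     // 2. registry metadata
		l.builtin[modelID],   // 3. built-in fallback table
		l.defaultLimit,       // 4. configured global default
	}
	for _, c := range candidates {
		if c > 0 {
			return c
		}
	}
	return fallbackLimit
}

func main() {
	l := &limits{
		overrides: map[string]int{"acme/big": 64000},
		registry:  map[string]int{"acme/big": 128000, "acme/small": 32000},
	}
	fmt.Println(resolve(l, "acme", "big"))     // override wins over registry
	fmt.Println(resolve(l, "acme", "small"))   // registry hit
	fmt.Println(resolve(l, "acme", "unknown")) // falls through to the constant
}
```

Reading from a nil map is safe in Go, which is what lets the sketch leave unused tables unset, in the same spirit as the nil-tolerant reg and cfg parameters.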

func SmartBackoff

func SmartBackoff(ctx context.Context, attempt int) time.Duration

SmartBackoff returns a delay for the given retry attempt, respecting server-sent Retry-After hints when available. It is designed to be used as ModelRetryConfig.BackoffFunc in the Eino framework.

Strategy (matching Claude-Code & OpenCode patterns):

  1. If the error contains a Retry-After hint, use it (capped at 5 min).
  2. Otherwise fall back to exponential backoff: 500ms × 2^(attempt-1), capped at 32s, plus 0-25% random jitter.
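The two-step strategy above can be sketched as follows. The function names are hypothetical, the Retry-After hint is passed in directly instead of being read from the context, and applying jitter after the 32s cap is an assumption about an ordering the text leaves open.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay     = 500 * time.Millisecond
	maxBackoff    = 32 * time.Second
	maxRetryAfter = 5 * time.Minute
)

// exponentialDelay returns 500ms * 2^(attempt-1), capped at 32s, before jitter.
func exponentialDelay(attempt int) time.Duration {
	if attempt < 1 {
		attempt = 1
	}
	d := baseDelay
	// Double once per extra attempt, stopping early once the cap is reached
	// so large attempt numbers cannot overflow.
	for i := 1; i < attempt && d < maxBackoff; i++ {
		d *= 2
	}
	if d > maxBackoff {
		d = maxBackoff
	}
	return d
}

// backoff is a hypothetical stand-in for SmartBackoff. retryAfter is the
// server hint, or 0 when the server sent none.
func backoff(attempt int, retryAfter time.Duration) time.Duration {
	if retryAfter > 0 {
		if retryAfter > maxRetryAfter {
			return maxRetryAfter
		}
		return retryAfter
	}
	d := exponentialDelay(attempt)
	// rand.Float64 is in [0,1), so the jitter stays below 25% of d.
	jitter := time.Duration(rand.Float64() * 0.25 * float64(d))
	return d + jitter
}

func main() {
	for attempt := 1; attempt <= 8; attempt++ {
		fmt.Printf("attempt %d: base %v\n", attempt, exponentialDelay(attempt))
	}
	fmt.Println("with 10m hint:", backoff(1, 10*time.Minute))
}
```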

func UsageNotifierFromContext added in v0.6.3

func UsageNotifierFromContext(ctx context.Context) func()

UsageNotifierFromContext retrieves the per-call usage notifier, if any.

func ValidateProvider added in v0.3.10

func ValidateProvider(ctx context.Context, apiKey, baseURL string, headers map[string]string) error

ValidateProvider tests connectivity to a provider by making a lightweight GET /models request. Custom headers (if any) are applied last so they can override the default Authorization for gateways. Returns nil on success, or a descriptive error.

This thin wrapper preserves the original signature for any caller that only cares about success/failure. Callers needing latency, model count, or error classification should use ValidateProviderDetailed instead.

func WithRetryError

func WithRetryError(ctx context.Context, err error) context.Context

WithRetryError stores an error in context for BackoffFunc to inspect.

func WithTokenTracker

func WithTokenTracker(ctx context.Context, t *TokenUsage) context.Context

WithTokenTracker attaches a per-agent TokenUsage to the context. chatModel.Generate/Stream will increment this tracker in addition to the global TokenTracker.

func WithUsageNotifier added in v0.6.3

func WithUsageNotifier(ctx context.Context, fn func()) context.Context

WithUsageNotifier attaches a callback that chatModel.Generate/Stream invokes after each API call's usage has been recorded. UIs use it to refresh the token/context display in real time during a run, not just at turn end. The model layer stays provider/UI-agnostic — it only fires the opaque callback.
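The attach-and-retrieve pair described here follows the usual context-value pattern. The sketch below uses hypothetical names (withNotifier, notifierFrom, simulateCalls) and assumes the retrieval side returns nil when no callback was attached, which the page does not state explicitly.

```go
package main

import (
	"context"
	"fmt"
)

// notifierKey is an unexported key type, so no other package can collide
// with the stored value.
type notifierKey struct{}

// withNotifier is a hypothetical stand-in for WithUsageNotifier.
func withNotifier(ctx context.Context, fn func()) context.Context {
	return context.WithValue(ctx, notifierKey{}, fn)
}

// notifierFrom is a hypothetical stand-in for UsageNotifierFromContext.
// It returns nil when no callback was attached (an assumption).
func notifierFrom(ctx context.Context) func() {
	fn, _ := ctx.Value(notifierKey{}).(func())
	return fn
}

// simulateCalls plays the role of the model layer: after each pretend API
// call it fires the callback, if one is present, and knows nothing else
// about the UI. It returns how many times the callback ran.
func simulateCalls(calls int) int {
	refreshes := 0
	ctx := withNotifier(context.Background(), func() { refreshes++ })
	for i := 0; i < calls; i++ {
		if notify := notifierFrom(ctx); notify != nil {
			notify()
		}
	}
	return refreshes
}

func main() {
	fmt.Println("UI refreshes:", simulateCalls(3))
}
```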

Types

type APIErrorCategory

type APIErrorCategory int

APIErrorCategory classifies LLM API errors into actionable categories.

const (
	// ErrCategoryTransient — network blips, timeouts, 5xx; safe to retry.
	ErrCategoryTransient APIErrorCategory = iota
	// ErrCategoryRateLimit — 429 / "overloaded"; retry with back-off.
	ErrCategoryRateLimit
	// ErrCategoryContextOverflow — input too long; needs compaction, NOT retry.
	ErrCategoryContextOverflow
	// ErrCategoryAuth — 401/403; permanent until key is fixed.
	ErrCategoryAuth
	// ErrCategoryFatal — 400 bad request, unknown; do not retry.
	ErrCategoryFatal
)

func ClassifyError

func ClassifyError(err error) APIErrorCategory

ClassifyError determines the category of an API error.
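How a caller might act on each category can be sketched as below. The local category type mirrors the constants above, while the action type, its values, and both helper functions are hypothetical; the retry decisions themselves follow the constant comments and the IsRetryable description.

```go
package main

import "fmt"

// category mirrors the shape of APIErrorCategory; the names are local
// stand-ins for this sketch.
type category int

const (
	transient category = iota
	rateLimit
	contextOverflow
	auth
	fatal
)

// action is a hypothetical summary of what a caller does next.
type action string

const (
	retryNow     action = "retry"
	retryBackoff action = "retry with back-off"
	compact      action = "compact the context"
	fixKey       action = "ask the user to fix credentials"
	giveUp       action = "give up"
)

// nextAction maps each category to the follow-up named in its comment.
func nextAction(c category) action {
	switch c {
	case transient:
		return retryNow
	case rateLimit:
		return retryBackoff
	case contextOverflow:
		return compact
	case auth:
		return fixKey
	default:
		return giveUp
	}
}

// retryable reports whether retrying the same request can help at all.
func retryable(c category) bool {
	return c == transient || c == rateLimit
}

func main() {
	for c := transient; c <= fatal; c++ {
		fmt.Printf("category %d: retryable=%v, next=%s\n", c, retryable(c), nextAction(c))
	}
}
```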

func (APIErrorCategory) String

func (c APIErrorCategory) String() string

type AddParams added in v0.6.3

type AddParams struct {
	Prompt     int
	Completion int
	Total      int
	Cached     int
	Reasoning  int
	CacheWrite int
	// CacheDetailsPresent is true when the provider returned a
	// prompt_tokens_details object at all (even with cached_tokens:0), letting
	// CacheObserved tell "supports caching, 0 hits" apart from "never reports
	// caching". See https://platform.openai.com/docs/guides/prompt-caching.
	CacheDetailsPresent bool
}

AddParams carries one API call's token usage. Using a struct keeps the growing set of token categories from turning Add into a long positional list.

type ChatModelConfig

type ChatModelConfig struct {
	Model   string
	APIKey  string
	BaseURL string
	// Headers are extra HTTP headers injected into every request to the
	// provider endpoint (custom gateways, auth proxies). Empty ⇒ none.
	Headers map[string]string
	// ReasoningEffort sets thinking depth via the "reasoning_effort" parameter:
	// "", "low", "medium", or "high". Empty ⇒ parameter omitted.
	ReasoningEffort string
	// Thinking, when non-nil, sends chat_template_kwargs {"enable_thinking": v}
	// to explicitly toggle extended reasoning on compatible gateways.
	Thinking *bool
	// Vision controls whether image parts are forwarded to the model. When
	// false, multimodal image content is stripped to text before sending.
	Vision bool
}

type ContextOverflowInfo

type ContextOverflowInfo struct {
	ActualTokens int
	LimitTokens  int
	TokenGap     int // ActualTokens - LimitTokens
}

ContextOverflowInfo holds parsed token counts from an overflow error.

func ParseContextOverflow

func ParseContextOverflow(err error) *ContextOverflowInfo

ParseContextOverflow extracts token counts from a context overflow error. Returns nil if the error is not a context overflow or counts cannot be parsed.

type ModelCost

type ModelCost struct {
	Input      float64 `json:"input"`
	Output     float64 `json:"output"`
	CacheRead  float64 `json:"cache_read,omitempty"`
	CacheWrite float64 `json:"cache_write,omitempty"`
}

ModelCost describes token prices in USD per 1M tokens.

type ModelFactory

type ModelFactory struct {
	// contains filtered or unexported fields
}

ModelFactory creates and caches ChatModel instances by "provider/model" identifier.

func NewModelFactory

func NewModelFactory(cfg *config.Config, fallback einomodel.ToolCallingChatModel) *ModelFactory

NewModelFactory creates a model factory from the given config and fallback model. The registry is not a parameter; it is available afterwards through Registry.

func (*ModelFactory) Fallback

Fallback returns the default fallback model.

func (*ModelFactory) GetModel

func (f *ModelFactory) GetModel(ctx context.Context, providerModel string) (einomodel.ToolCallingChatModel, error)

GetModel returns a ChatModel for the given "provider/model" identifier. Empty string returns the fallback model.

func (*ModelFactory) Registry

func (f *ModelFactory) Registry() *ModelRegistry

Registry returns the underlying ModelRegistry for metadata lookups.

type ModelInfo

type ModelInfo struct {
	ID           string
	ContextLimit int // Maximum context window size, 0 if unknown
	Pricing      ModelPricing
}

ModelInfo contains information about a model.

type ModelLimit

type ModelLimit struct {
	Context int `json:"context"`
	Input   int `json:"input,omitempty"`
	Output  int `json:"output,omitempty"`
}

ModelLimit describes context window and output limits.

type ModelModalities

type ModelModalities struct {
	Input  []string `json:"input,omitempty"`
	Output []string `json:"output,omitempty"`
}

ModelModalities describes input/output modalities.

type ModelPricing

type ModelPricing struct {
	InputPer1M     float64 // cost per 1M input tokens
	OutputPer1M    float64 // cost per 1M output tokens
	CacheReadPer1M float64 // cost per 1M cache-read (cached input) tokens; 0 ⇒ no discount data, fall back to InputPer1M
}

ModelPricing contains cost information for a model.
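To show how the CacheReadPer1M fallback plays out, here is a small cost estimate. The estimateCost helper and the prices in main are hypothetical; the sketch assumes cached tokens are a subset of prompt tokens, in line with the TokenUsage description on this page.

```go
package main

import "fmt"

// pricing copies the documented ModelPricing fields.
type pricing struct {
	InputPer1M     float64
	OutputPer1M    float64
	CacheReadPer1M float64
}

// estimateCost is a hypothetical helper. It assumes cached tokens are a
// subset of prompt tokens, so only the uncached remainder pays the full
// input rate.
func estimateCost(p pricing, prompt, cached, completion int64) float64 {
	cacheRate := p.CacheReadPer1M
	if cacheRate == 0 {
		cacheRate = p.InputPer1M // no discount data: bill cached tokens as input
	}
	if cached < 0 {
		cached = 0
	}
	if cached > prompt {
		cached = prompt
	}
	uncached := prompt - cached
	return (float64(uncached)*p.InputPer1M +
		float64(cached)*cacheRate +
		float64(completion)*p.OutputPer1M) / 1e6
}

func main() {
	// Made-up prices, chosen only to keep the arithmetic easy to follow.
	p := pricing{InputPer1M: 2, OutputPer1M: 8, CacheReadPer1M: 0.5}
	fmt.Printf("with cache discount: $%.2f\n", estimateCost(p, 1_000_000, 500_000, 250_000))

	p.CacheReadPer1M = 0
	fmt.Printf("without cache data:  $%.2f\n", estimateCost(p, 1_000_000, 500_000, 250_000))
}
```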

type ModelRegistry

type ModelRegistry struct {
	// contains filtered or unexported fields
}

ModelRegistry provides model metadata from models.dev and custom config. The base data is statically generated at build time via go:generate. Custom models from config are merged in at runtime.

func NewModelRegistry

func NewModelRegistry() *ModelRegistry

NewModelRegistry creates a new ModelRegistry with a deep copy of generated data. Each RegistryProvider and its Models map are copied so that merging custom models at runtime never mutates the shared generatedProviders.

func NewModelRegistryWithConfig added in v0.4.8

func NewModelRegistryWithConfig(cfg *config.Config) *ModelRegistry

NewModelRegistryWithConfig creates a ModelRegistry and merges custom models from config.

func (*ModelRegistry) GetModelCacheCost added in v0.6.4

func (r *ModelRegistry) GetModelCacheCost(providerID, modelID string) (cacheReadPer1M, cacheWritePer1M float64)

GetModelCacheCost returns the cache-read and cache-write prices (USD per 1M tokens) for a model, or 0 when the registry has no cache pricing for it.

func (*ModelRegistry) GetModelContextLimit

func (r *ModelRegistry) GetModelContextLimit(providerID, modelID string) int

GetModelContextLimit returns the context limit for a model looked up via registry.

func (*ModelRegistry) GetModelCost

func (r *ModelRegistry) GetModelCost(providerID, modelID string) (inputPer1M, outputPer1M float64)

GetModelCost returns pricing info for a model.

func (*ModelRegistry) GetProvider

func (r *ModelRegistry) GetProvider(providerID string) *RegistryProvider

GetProvider returns provider info by ID, or nil if not found.

func (*ModelRegistry) GetProviderAPI

func (r *ModelRegistry) GetProviderAPI(providerID string) string

GetProviderAPI returns the API base URL for a provider from the registry.

func (*ModelRegistry) GetProviderEnvVars

func (r *ModelRegistry) GetProviderEnvVars(providerID string) []string

GetProviderEnvVars returns the environment variable names for a provider.

func (*ModelRegistry) HasProvider

func (r *ModelRegistry) HasProvider(providerID string) bool

HasProvider returns whether the given provider ID exists in the registry.

func (*ModelRegistry) ListProviderModels

func (r *ModelRegistry) ListProviderModels(providerID string, toolCallOnly bool) []*RegistryModel

ListProviderModels returns models for a provider from the registry. If toolCallOnly is true, only models with tool_call support are returned. Models are sorted by ID.

func (*ModelRegistry) ListProviders

func (r *ModelRegistry) ListProviders() []*RegistryProvider

ListProviders returns all providers in the curated display order.

func (*ModelRegistry) Load

func (r *ModelRegistry) Load() (map[string]*RegistryProvider, error)

Load returns the provider/model data.

func (*ModelRegistry) LookupModel

func (r *ModelRegistry) LookupModel(providerID, modelID string) (*RegistryProvider, *RegistryModel, bool)

LookupModel finds a model by provider ID and model ID. Returns the provider info, model info, and whether it was found.

func (*ModelRegistry) MergeConfigProviders added in v0.4.8

func (r *ModelRegistry) MergeConfigProviders(providers map[string]*config.ProviderConfig)

MergeConfigProviders merges custom models from config providers into the registry. For providers not in the registry, a new entry is created. For existing providers, custom models are added (existing models are not overridden).
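The merge rule (create missing providers, never override existing models) can be sketched with simplified types. The provider struct and mergeCustom function are hypothetical reductions of RegistryProvider and MergeConfigProviders, with each model collapsed to a display name.

```go
package main

import "fmt"

// provider is a simplified stand-in for RegistryProvider.
type provider struct {
	Models map[string]string // model ID -> display name
	Custom bool              // true when the provider exists only via user config
}

// mergeCustom adds user-configured models to the registry without
// overriding entries that are already present.
func mergeCustom(reg map[string]*provider, custom map[string]map[string]string) {
	for id, models := range custom {
		p, ok := reg[id]
		if !ok {
			// Unknown provider: create it and mark it as user-defined.
			p = &provider{Custom: true}
			reg[id] = p
		}
		if p.Models == nil {
			p.Models = make(map[string]string)
		}
		for modelID, name := range models {
			if _, exists := p.Models[modelID]; !exists {
				p.Models[modelID] = name
			}
		}
	}
}

func main() {
	reg := map[string]*provider{
		"acme": {Models: map[string]string{"big": "Acme Big"}},
	}
	mergeCustom(reg, map[string]map[string]string{
		"acme":    {"big": "My Override", "tiny": "Acme Tiny"},
		"homelab": {"local": "Local Model"},
	})
	fmt.Println(reg["acme"].Models["big"], "|", reg["acme"].Models["tiny"])
	fmt.Println(reg["homelab"].Custom, reg["homelab"].Models["local"])
}
```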

func (*ModelRegistry) PickDefaultModel added in v0.7.2

func (r *ModelRegistry) PickDefaultModel(providerID string) string

PickDefaultModel returns the best default model id for a provider, used when setup completes without an explicit model selection (the wizard no longer forces a model pick). Selection order: first DefaultEnabled model, then the first Recommended model, then simply the first model. Returns "" when the provider is unknown or has no models (e.g. a custom OpenAI-compatible provider) — callers must then require an explicit model id.
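The selection order can be sketched as below. The model struct and pickDefault function are hypothetical, and iterating in ID order is an assumption: the page says "first" without naming the ordering, and sorting by ID matches what ListProviderModels documents.

```go
package main

import (
	"fmt"
	"sort"
)

// model keeps only the RegistryModel fields the selection needs.
type model struct {
	ID             string
	Recommended    bool
	DefaultEnabled bool
}

// pickDefault returns the first DefaultEnabled model, then the first
// Recommended one, then simply the first model, or "" for an empty list.
func pickDefault(models []model) string {
	// Work on a sorted copy so the caller's slice is left untouched.
	sorted := append([]model(nil), models...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	for _, m := range sorted {
		if m.DefaultEnabled {
			return m.ID
		}
	}
	for _, m := range sorted {
		if m.Recommended {
			return m.ID
		}
	}
	if len(sorted) > 0 {
		return sorted[0].ID
	}
	return "" // caller must then require an explicit model id
}

func main() {
	models := []model{
		{ID: "c-fast"},
		{ID: "b-pro", Recommended: true},
		{ID: "a-mini"},
	}
	fmt.Println(pickDefault(models))
	fmt.Printf("%q\n", pickDefault(nil))
}
```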

type ReasoningOption added in v0.7.2

type ReasoningOption struct {
	Type   string   `json:"type"`
	Values []string `json:"values,omitempty"`
	Min    *int     `json:"min,omitempty"`
	Max    *int     `json:"max,omitempty"`
}

ReasoningOption is one reasoning/thinking control a model supports, from models.dev's reasoning_options. Type is one of:

  • "effort" — Values lists the supported effort levels (e.g. low/medium/high/xhigh/max)
  • "toggle" — reasoning can be switched on/off, no extra parameters
  • "budget_tokens" — a thinking token budget bounded by Min/Max (nil ⇒ open-ended)
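As an illustration of the three control types, the sketch below decodes a small JSON array into ReasoningOption values and describes each one. The struct is copied from this page; the sample payload and the parseOptions, bound, and describe helpers are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// ReasoningOption is copied from the type definition on this page.
type ReasoningOption struct {
	Type   string   `json:"type"`
	Values []string `json:"values,omitempty"`
	Min    *int     `json:"min,omitempty"`
	Max    *int     `json:"max,omitempty"`
}

// parseOptions decodes a JSON array of reasoning options.
func parseOptions(data string) ([]ReasoningOption, error) {
	var opts []ReasoningOption
	err := json.Unmarshal([]byte(data), &opts)
	return opts, err
}

// bound renders an optional limit; nil means the side is open-ended.
func bound(p *int) string {
	if p == nil {
		return "open-ended"
	}
	return strconv.Itoa(*p)
}

// describe turns one option into a short human-readable summary.
func describe(o ReasoningOption) string {
	switch o.Type {
	case "effort":
		return "effort levels: " + strings.Join(o.Values, ", ")
	case "toggle":
		return "on/off toggle"
	case "budget_tokens":
		return "token budget " + bound(o.Min) + ".." + bound(o.Max)
	default:
		return "unknown control " + strconv.Quote(o.Type)
	}
}

func main() {
	// Hypothetical payload shaped like the struct tags above.
	const sample = `[
		{"type": "effort", "values": ["low", "medium", "high"]},
		{"type": "toggle"},
		{"type": "budget_tokens", "min": 1024}
	]`
	opts, err := parseOptions(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, o := range opts {
		fmt.Println(describe(o))
	}
}
```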

type RegistryModel

type RegistryModel struct {
	ID               string           `json:"id"`
	Name             string           `json:"name"`
	Family           string           `json:"family,omitempty"`
	Attachment       bool             `json:"attachment,omitempty"`
	Reasoning        bool             `json:"reasoning,omitempty"`
	ToolCall         bool             `json:"tool_call,omitempty"`
	StructuredOutput bool             `json:"structured_output,omitempty"`
	Temperature      bool             `json:"temperature,omitempty"`
	Knowledge        string           `json:"knowledge,omitempty"`
	ReleaseDate      string           `json:"release_date,omitempty"`
	LastUpdated      string           `json:"last_updated,omitempty"`
	Modalities       *ModelModalities `json:"modalities,omitempty"`
	OpenWeights      bool             `json:"open_weights,omitempty"`
	Cost             *ModelCost       `json:"cost,omitempty"`
	Limit            *ModelLimit      `json:"limit,omitempty"`
	Status           string           `json:"status,omitempty"`
	Recommended      bool             `json:"recommended,omitempty"`
	DefaultEnabled   bool             `json:"default_enabled,omitempty"`
	// ReasoningOptions describes how this model exposes its thinking controls,
	// mirroring models.dev's reasoning_options. Empty ⇒ no reasoning controls.
	ReasoningOptions []ReasoningOption `json:"reasoning_options,omitempty"`
}

RegistryModel represents a model from the models.dev API.

type RegistryProvider

type RegistryProvider struct {
	ID     string                    `json:"id"`
	Name   string                    `json:"name"`
	Env    []string                  `json:"env"`
	API    string                    `json:"api"`
	Doc    string                    `json:"doc,omitempty"`
	Models map[string]*RegistryModel `json:"models"`
	// Custom is true for providers that exist only because the user configured
	// them (an OpenAI-compatible endpoint not in models.dev), as opposed to a
	// built-in registry brand. Set during MergeConfigProviders.
	Custom bool `json:"custom,omitempty"`
}

RegistryProvider represents a provider from the models.dev API.

type TokenUsage

type TokenUsage struct {
	PromptTokens     int64
	CompletionTokens int64
	TotalTokens      int64
	CachedTokens     int64
	ReasoningTokens  int64
	CacheWriteTokens int64
	CallCount        int64 // number of API calls recorded (averages denominator)
	LastTotalTokens  int64
	// contains filtered or unexported fields
}

TokenUsage tracks token consumption across all API calls.

CachedTokens is the cache-READ portion of the prompt (tokens served from the provider's KV cache). CacheWriteTokens is the cache-CREATION portion; it is 0 today because the shared go-openai transport does not surface cache_creation_input_tokens, and is kept as a forward-compatible field. ReasoningTokens is the reasoning/thinking subset of the completion.

func TokenTrackerFromContext

func TokenTrackerFromContext(ctx context.Context) *TokenUsage

TokenTrackerFromContext retrieves the per-agent TokenUsage from the context, if any.

func (*TokenUsage) Add

func (t *TokenUsage) Add(p AddParams)

Add records one API call's token usage.

func (*TokenUsage) AddByModel

func (t *TokenUsage) AddByModel(model string, prompt, completion, total int)

AddByModel adds token usage attributed to a specific model name.

func (*TokenUsage) BeginTurn added in v0.6.4

func (t *TokenUsage) BeginTurn()

BeginTurn snapshots the cumulative counters as the baseline for the current agent turn so TurnUsage reports only this turn's delta. Called at the start of every runner turn.

func (*TokenUsage) CacheHitRate added in v0.6.3

func (t *TokenUsage) CacheHitRate() float64

CacheHitRate returns the cumulative KV cache hit rate, defined as cached / prompt — the fraction of prompt tokens served from the provider's cache. Returns 0 when no prompt tokens have been recorded. The result is clamped to [0,1] to stay robust against provider quirks.
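The definition translates almost directly into code. The free function cacheHitRate below is a hypothetical stand-in for the method and takes the two counters as arguments.

```go
package main

import "fmt"

// cacheHitRate returns cached/prompt, clamped to [0,1].
// It returns 0 when no prompt tokens have been recorded.
func cacheHitRate(cached, prompt int64) float64 {
	if prompt <= 0 {
		return 0
	}
	r := float64(cached) / float64(prompt)
	if r < 0 {
		return 0
	}
	if r > 1 {
		// Guards against providers that report more cached than prompt tokens.
		return 1
	}
	return r
}

func main() {
	fmt.Println(cacheHitRate(50, 200))  // a quarter of the prompt came from cache
	fmt.Println(cacheHitRate(0, 0))     // nothing recorded yet
	fmt.Println(cacheHitRate(300, 200)) // inconsistent provider data, clamped
}
```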

func (*TokenUsage) CacheObserved added in v0.6.3

func (t *TokenUsage) CacheObserved() bool

CacheObserved reports whether the provider has reported cache details (a prompt_tokens_details object) — used to distinguish "cache hit rate is 0%" from "this provider never reports caching". It is true on the first turn that carries cache details even when cached_tokens is 0, and stays true for the session (cleared only by Reset). The CachedTokens>0 fallback keeps it correct for older snapshots recorded before the presence flag existed.

func (*TokenUsage) Get

func (t *TokenUsage) Get() (prompt, completion, total int64)

Get returns the current cumulative token usage.

func (*TokenUsage) GetByModel

func (t *TokenUsage) GetByModel() map[string]int64

GetByModel returns a snapshot of per-model token totals.

func (*TokenUsage) GetFull added in v0.6.3

func (t *TokenUsage) GetFull() TokenUsageDetail

GetFull returns a cumulative snapshot of all tracked token usage.

func (*TokenUsage) GetLastDetail added in v0.4.4

func (t *TokenUsage) GetLastDetail() *TokenUsageDetail

GetLastDetail returns the last API call's token usage detail.

func (*TokenUsage) GetLastTotal added in v0.3.2

func (t *TokenUsage) GetLastTotal() int64

GetLastTotal returns the last API call's total tokens (the current context usage).

func (*TokenUsage) Reset

func (t *TokenUsage) Reset()

Reset resets the token tracker.

func (*TokenUsage) ResetContext added in v0.6.4

func (t *TokenUsage) ResetContext()

ResetContext clears only the "current context occupancy" snapshot (the last API call's per-call values), leaving the cumulative consumption ledger, the cache-support flag, the per-model breakdown, and the per-turn baseline intact. Call this after a compaction/summarization shrinks the live context: the context indicator should reflect the smaller window, but the session's accumulated spend must NOT be lost — it feeds budgets, the usage log, and cross-session stats. (Full Reset is for a genuine session boundary.)

func (*TokenUsage) TurnUsage added in v0.6.4

func (t *TokenUsage) TurnUsage() (prompt, completion, cached int64)

TurnUsage returns this turn's consumption (cumulative minus the BeginTurn baseline). Each value is clamped at 0 so a mid-turn Reset (which zeroes the cumulative and the baseline together) can never yield a negative delta.
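The baseline-and-delta bookkeeping behind BeginTurn and TurnUsage can be sketched as follows. The usage type and its methods are hypothetical, track only three counters, and are not safe for concurrent use.

```go
package main

import "fmt"

// usage is a hypothetical, single-goroutine reduction of TokenUsage.
type usage struct {
	prompt, completion, cached             int64 // cumulative counters
	basePrompt, baseCompletion, baseCached int64 // snapshot taken by beginTurn
}

// add records one API call's tokens in the cumulative counters.
func (u *usage) add(prompt, completion, cached int64) {
	u.prompt += prompt
	u.completion += completion
	u.cached += cached
}

// beginTurn snapshots the cumulative counters as the turn baseline.
func (u *usage) beginTurn() {
	u.basePrompt, u.baseCompletion, u.baseCached = u.prompt, u.completion, u.cached
}

// clampNonNegative keeps a delta from going below zero.
func clampNonNegative(v int64) int64 {
	if v < 0 {
		return 0
	}
	return v
}

// turnUsage returns cumulative minus baseline, each value clamped at 0.
func (u *usage) turnUsage() (prompt, completion, cached int64) {
	return clampNonNegative(u.prompt - u.basePrompt),
		clampNonNegative(u.completion - u.baseCompletion),
		clampNonNegative(u.cached - u.baseCached)
}

func main() {
	u := &usage{}
	u.add(100, 20, 10) // earlier turn
	u.beginTurn()
	u.add(50, 5, 40) // first call of this turn
	u.add(60, 8, 45) // second call of this turn
	fmt.Println(u.turnUsage())
}
```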

type TokenUsageDetail added in v0.4.4

type TokenUsageDetail struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
	CachedTokens     int `json:"cached_tokens"`
	ReasoningTokens  int `json:"reasoning_tokens,omitempty"`
	CacheWriteTokens int `json:"cache_write_tokens,omitempty"`
	CallCount        int `json:"call_count,omitempty"`
}

TokenUsageDetail holds a token usage snapshot for tracing/observability and for JSON transport to the UI. Reasoning/cache-write/call-count carry omitempty so per-call telemetry stays compact while cumulative snapshots (GetFull) carry the full breakdown.

func (TokenUsageDetail) Minus added in v0.6.3

Minus returns the per-field difference d-prev, used to derive the token delta of a single agent run from cumulative snapshots.

type ValidateResult added in v0.7.2

type ValidateResult struct {
	OK         bool   `json:"ok"`
	LatencyMS  int    `json:"latency_ms"`
	ModelCount int    `json:"model_count"`
	ErrorType  string `json:"error_type,omitempty"` // "" | "auth" | "network" | "server"
	Error      string `json:"error,omitempty"`
}

ValidateResult is the structured outcome of a connectivity test against a provider's /models endpoint. It carries everything the UI needs to render a status banner: success with latency + available-model count, or a classified failure (auth vs. network vs. server).

func ValidateProviderDetailed added in v0.7.2

func ValidateProviderDetailed(ctx context.Context, apiKey, baseURL string, headers map[string]string) ValidateResult

ValidateProviderDetailed performs the same connectivity test as ValidateProvider but returns the full structured result, including the measured latency, the number of models advertised at /models, and a classified error type on failure.
