model

package
v0.6.4
Published: Jun 23, 2026 License: MIT Imports: 17 Imported by: 0

Documentation

Constants

const DefaultContextLimitFallback = 200000

DefaultContextLimitFallback is the conservative context window assumed when a model's true limit cannot be determined from the registry, built-in tables, or user config. Kept deliberately small so unknown models compact early rather than overflowing the provider's real window. Override per-model via config.ContextLimits or globally via config.DefaultContextLimit.

Variables

var TokenTracker = &TokenUsage{}

TokenTracker is the global token usage tracker.

Functions

func FormatAPIError

func FormatAPIError(err error, attempt, maxRetries int) string

FormatAPIError produces a user-friendly error message with retry context.

func GetModelContextLimit

func GetModelContextLimit(modelName string) int

GetModelContextLimit returns the known context limit for a given model name. Returns 0 if the model is not in the known list.

func GetTokenUsage

func GetTokenUsage() (prompt, completion, total int64)

GetTokenUsage returns the current token usage statistics from the global TokenTracker.

func IsRetryable

func IsRetryable(_ context.Context, err error) bool

IsRetryable returns true if the error should be retried. It is designed to be used as ModelRetryConfig.IsRetryAble in the Eino framework.

Context overflow errors are NOT retryable — they need compaction. Auth errors are NOT retryable — they need user action.

func ParseProviderModel

func ParseProviderModel(s string) (provider, model string, err error)

ParseProviderModel splits "provider/model" into its components.
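The split can be sketched as follows. This is a minimal illustration, not the package's implementation: the lowercase `parseProviderModel` is a hypothetical helper, and it assumes the split happens at the first slash, so a model ID may itself contain slashes.

```go
package main

import (
	"fmt"
	"strings"
)

// parseProviderModel is a hypothetical stand-in for ParseProviderModel.
// Assumption: the identifier is split at the FIRST "/", and both halves
// must be non-empty.
func parseProviderModel(s string) (string, string, error) {
	provider, model, ok := strings.Cut(s, "/")
	if !ok || provider == "" || model == "" {
		return "", "", fmt.Errorf("invalid model identifier %q: want \"provider/model\"", s)
	}
	return provider, model, nil
}

func main() {
	provider, model, err := parseProviderModel("acme/example-model")
	fmt.Println(provider, model, err)

	_, _, err = parseProviderModel("no-slash")
	fmt.Println(err != nil)
}
```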

func ParseRetryAfter

func ParseRetryAfter(err error) time.Duration

ParseRetryAfter extracts a delay from an error message or OpenAI APIError. It looks for Retry-After header patterns in the error text. Returns 0 if no delay information is found.

func ResetTokenUsage

func ResetTokenUsage()

ResetTokenUsage resets the global token usage tracker.

func ResolveContextLimit added in v0.5.1

func ResolveContextLimit(reg *ModelRegistry, cfg *config.Config, providerID, modelID string) int

ResolveContextLimit determines the effective context window (in tokens) for a provider/model pair. This is the single source of truth for window-size management — all middleware thresholds (compaction, summarization, reduction, reminders) derive from it.

Resolution order (first positive hit wins):

  1. explicit user override: cfg.ContextLimits["provider/model"], then cfg.ContextLimits["model"]
  2. models.dev registry metadata (reg.GetModelContextLimit)
  3. built-in knownModels fallback table (GetModelContextLimit)
  4. cfg.DefaultContextLimit, else DefaultContextLimitFallback

reg and cfg may be nil; the resolver degrades gracefully.
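The resolution order above can be sketched as a chain of "first positive hit wins" checks. The `Config` struct, the `registryLookup` function type, and the contents of `knownModels` below are hypothetical stand-ins for config.Config, reg.GetModelContextLimit, and the built-in table; only the ordering is taken from the documentation.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for config.Config; only the two
// fields named in the documentation are modelled.
type Config struct {
	ContextLimits       map[string]int
	DefaultContextLimit int
}

// registryLookup is a hypothetical stand-in for reg.GetModelContextLimit.
type registryLookup func(providerID, modelID string) int

const defaultContextLimitFallback = 200000

// knownModels is illustrative data, not the package's real table.
var knownModels = map[string]int{"example-model": 128000}

// resolveContextLimit walks the documented order; the first positive hit wins.
func resolveContextLimit(reg registryLookup, cfg *Config, providerID, modelID string) int {
	// 1. Explicit user override: "provider/model" first, then bare "model".
	if cfg != nil {
		if n := cfg.ContextLimits[providerID+"/"+modelID]; n > 0 {
			return n
		}
		if n := cfg.ContextLimits[modelID]; n > 0 {
			return n
		}
	}
	// 2. Registry metadata.
	if reg != nil {
		if n := reg(providerID, modelID); n > 0 {
			return n
		}
	}
	// 3. Built-in fallback table.
	if n := knownModels[modelID]; n > 0 {
		return n
	}
	// 4. Configured default, else the package-level fallback.
	if cfg != nil && cfg.DefaultContextLimit > 0 {
		return cfg.DefaultContextLimit
	}
	return defaultContextLimitFallback
}

func main() {
	cfg := &Config{ContextLimits: map[string]int{"acme/example-model": 64000}}
	fmt.Println(resolveContextLimit(nil, cfg, "acme", "example-model")) // 64000
	fmt.Println(resolveContextLimit(nil, nil, "acme", "example-model")) // 128000
	fmt.Println(resolveContextLimit(nil, nil, "acme", "unknown"))       // 200000
}
```

Reading from a nil map is safe in Go, which is what lets the sketch tolerate a config with no ContextLimits set.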

func SmartBackoff

func SmartBackoff(ctx context.Context, attempt int) time.Duration

SmartBackoff returns a delay for the given retry attempt, respecting server-sent Retry-After hints when available. It is designed to be used as ModelRetryConfig.BackoffFunc in the Eino framework.

Strategy (matching the Claude Code and OpenCode patterns):

  1. If the error contains a Retry-After hint, use it (capped at 5 min).
  2. Otherwise fall back to exponential backoff: 500ms × 2^(attempt-1), capped at 32s, plus 0-25% random jitter.

func UsageNotifierFromContext added in v0.6.3

func UsageNotifierFromContext(ctx context.Context) func()

UsageNotifierFromContext retrieves the per-call usage notifier, if any.

func ValidateProvider added in v0.3.10

func ValidateProvider(ctx context.Context, apiKey, baseURL string) error

ValidateProvider tests connectivity to a provider by making a lightweight GET /models request. Returns nil on success, or a descriptive error.

func WithRetryError

func WithRetryError(ctx context.Context, err error) context.Context

WithRetryError stores an error in context for BackoffFunc to inspect.

func WithTokenTracker

func WithTokenTracker(ctx context.Context, t *TokenUsage) context.Context

WithTokenTracker attaches a per-agent TokenUsage to the context. chatModel.Generate/Stream will increment this tracker in addition to the global TokenTracker.

func WithUsageNotifier added in v0.6.3

func WithUsageNotifier(ctx context.Context, fn func()) context.Context

WithUsageNotifier attaches a callback that chatModel.Generate/Stream invokes after each API call's usage has been recorded. UIs use it to refresh the token/context display in real time during a run, not just at turn end. The model layer stays provider/UI-agnostic — it only fires the opaque callback.

Types

type APIErrorCategory

type APIErrorCategory int

APIErrorCategory classifies LLM API errors into actionable categories.

const (
	// ErrCategoryTransient — network blips, timeouts, 5xx; safe to retry.
	ErrCategoryTransient APIErrorCategory = iota
	// ErrCategoryRateLimit — 429 / "overloaded"; retry with back-off.
	ErrCategoryRateLimit
	// ErrCategoryContextOverflow — input too long; needs compaction, NOT retry.
	ErrCategoryContextOverflow
	// ErrCategoryAuth — 401/403; permanent until key is fixed.
	ErrCategoryAuth
	// ErrCategoryFatal — 400 bad request, unknown; do not retry.
	ErrCategoryFatal
)

func ClassifyError

func ClassifyError(err error) APIErrorCategory

ClassifyError determines the category of an API error.

func (APIErrorCategory) String

func (c APIErrorCategory) String() string
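The relationship between categories and retry decisions can be sketched as follows. The real ClassifyError takes an error value; this simplified, hypothetical version takes an HTTP status code and an overflow flag instead, and treats a status of 0 as "no HTTP response" (an assumption). Only the category meanings come from the constant comments above.

```go
package main

import "fmt"

type category int

const (
	transient category = iota
	rateLimit
	contextOverflow
	auth
	fatal
)

// classifyStatus is a hypothetical, simplified classifier.
// status == 0 is assumed to mean a network-level failure with no response.
func classifyStatus(status int, overflow bool) category {
	switch {
	case overflow:
		return contextOverflow // needs compaction, not a retry
	case status == 429:
		return rateLimit
	case status == 401 || status == 403:
		return auth
	case status == 0 || status >= 500:
		return transient
	default:
		return fatal // 400 bad request and anything unknown
	}
}

// retryable mirrors the documented rule: only transient and rate-limit
// errors are retried; overflow, auth, and fatal errors are not.
func retryable(c category) bool {
	return c == transient || c == rateLimit
}

func main() {
	for _, status := range []int{0, 400, 401, 429, 503} {
		fmt.Println(status, retryable(classifyStatus(status, false)))
	}
}
```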

type AddParams added in v0.6.3

type AddParams struct {
	Prompt     int
	Completion int
	Total      int
	Cached     int
	Reasoning  int
	CacheWrite int
	// CacheDetailsPresent is true when the provider returned a
	// prompt_tokens_details object at all (even with cached_tokens:0), letting
	// CacheObserved tell "supports caching, 0 hits" apart from "never reports
	// caching". See https://platform.openai.com/docs/guides/prompt-caching.
	CacheDetailsPresent bool
}

AddParams carries one API call's token usage. Using a struct keeps the growing set of token categories from turning Add into a long positional list.

type ChatModelConfig

type ChatModelConfig struct {
	Model   string
	APIKey  string
	BaseURL string
}

type ContextOverflowInfo

type ContextOverflowInfo struct {
	ActualTokens int
	LimitTokens  int
	TokenGap     int // ActualTokens - LimitTokens
}

ContextOverflowInfo holds parsed token counts from an overflow error.

func ParseContextOverflow

func ParseContextOverflow(err error) *ContextOverflowInfo

ParseContextOverflow extracts token counts from a context overflow error. Returns nil if the error is not a context overflow or counts cannot be parsed.
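A minimal sketch of the parsing step is given below. The message wording matched here ("maximum context length is N tokens" and "resulted in M tokens") is an assumption chosen for illustration; the formats the package actually recognizes are not documented. The sketch also takes the message as a string rather than an error.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// overflowInfo mirrors the documented ContextOverflowInfo fields.
type overflowInfo struct {
	ActualTokens int
	LimitTokens  int
	TokenGap     int // ActualTokens - LimitTokens
}

// Both patterns are assumed wording, not a documented provider contract.
var (
	limitRe  = regexp.MustCompile(`maximum context length is (\d+) tokens`)
	actualRe = regexp.MustCompile(`resulted in (\d+) tokens`)
)

// parseOverflow is a hypothetical helper; it returns nil when either
// count is missing or cannot be parsed.
func parseOverflow(msg string) *overflowInfo {
	lm := limitRe.FindStringSubmatch(msg)
	am := actualRe.FindStringSubmatch(msg)
	if lm == nil || am == nil {
		return nil
	}
	limit, errLimit := strconv.Atoi(lm[1])
	actual, errActual := strconv.Atoi(am[1])
	if errLimit != nil || errActual != nil {
		return nil
	}
	return &overflowInfo{ActualTokens: actual, LimitTokens: limit, TokenGap: actual - limit}
}

func main() {
	msg := "maximum context length is 8192 tokens, but your messages resulted in 9000 tokens"
	if info := parseOverflow(msg); info != nil {
		fmt.Printf("%+v\n", *info)
	}
}
```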

type ModelCost

type ModelCost struct {
	Input      float64 `json:"input"`
	Output     float64 `json:"output"`
	CacheRead  float64 `json:"cache_read,omitempty"`
	CacheWrite float64 `json:"cache_write,omitempty"`
}

ModelCost describes token costs in USD per 1M tokens.

type ModelFactory

type ModelFactory struct {
	// contains filtered or unexported fields
}

ModelFactory creates and caches ChatModel instances by "provider/model" identifier.

func NewModelFactory

func NewModelFactory(cfg *config.Config, fallback einomodel.ToolCallingChatModel) *ModelFactory

NewModelFactory creates a model factory with the given config and fallback model. The signature takes no registry argument; the factory's ModelRegistry is available through Registry.

func (*ModelFactory) Fallback

Fallback returns the default fallback model.

func (*ModelFactory) GetModel

func (f *ModelFactory) GetModel(ctx context.Context, providerModel string) (einomodel.ToolCallingChatModel, error)

GetModel returns a ChatModel for the given "provider/model" identifier. Empty string returns the fallback model.

func (*ModelFactory) Registry

func (f *ModelFactory) Registry() *ModelRegistry

Registry returns the underlying ModelRegistry for metadata lookups.

type ModelInfo

type ModelInfo struct {
	ID           string
	ContextLimit int // Maximum context window size, 0 if unknown
	Pricing      ModelPricing
}

ModelInfo contains the ID, context limit, and pricing of a model.

type ModelLimit

type ModelLimit struct {
	Context int `json:"context"`
	Input   int `json:"input,omitempty"`
	Output  int `json:"output,omitempty"`
}

ModelLimit describes context window and output limits.

type ModelModalities

type ModelModalities struct {
	Input  []string `json:"input,omitempty"`
	Output []string `json:"output,omitempty"`
}

ModelModalities describes input/output modalities.

type ModelPricing

type ModelPricing struct {
	InputPer1M     float64 // cost per 1M input tokens
	OutputPer1M    float64 // cost per 1M output tokens
	CacheReadPer1M float64 // cost per 1M cache-read (cached input) tokens; 0 ⇒ no discount data, fall back to InputPer1M
}

ModelPricing contains cost information for a model.
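As an illustration of how per-1M prices translate into a dollar figure, the sketch below estimates the cost of one call. `estimateCost` is a hypothetical helper, not part of the package. It assumes cached tokens are a subset of the prompt tokens and applies the fallback described in the CacheReadPer1M comment: a zero cache-read price means cached tokens are billed at InputPer1M.

```go
package main

import "fmt"

// ModelPricing mirrors the documented struct.
type ModelPricing struct {
	InputPer1M     float64
	OutputPer1M    float64
	CacheReadPer1M float64
}

// estimateCost is a hypothetical helper returning USD for one call.
// Assumption: cached is the cache-read portion of prompt.
func estimateCost(p ModelPricing, prompt, cached, completion int64) float64 {
	cacheRate := p.CacheReadPer1M
	if cacheRate == 0 {
		cacheRate = p.InputPer1M // no discount data: bill as normal input
	}
	uncached := prompt - cached
	if uncached < 0 {
		uncached = 0 // guard against inconsistent provider counts
	}
	total := float64(uncached)*p.InputPer1M +
		float64(cached)*cacheRate +
		float64(completion)*p.OutputPer1M
	return total / 1e6
}

func main() {
	// Illustrative prices, not real provider data.
	p := ModelPricing{InputPer1M: 3, OutputPer1M: 15, CacheReadPer1M: 0.3}
	fmt.Printf("%.4f\n", estimateCost(p, 200000, 150000, 4000))
}
```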

type ModelRegistry

type ModelRegistry struct {
	// contains filtered or unexported fields
}

ModelRegistry provides model metadata from models.dev and custom config. The base data is statically generated at build time via go:generate. Custom models from config are merged in at runtime.

func NewModelRegistry

func NewModelRegistry() *ModelRegistry

NewModelRegistry creates a new ModelRegistry with a deep copy of generated data. Each RegistryProvider and its Models map are copied so that merging custom models at runtime never mutates the shared generatedProviders.

func NewModelRegistryWithConfig added in v0.4.8

func NewModelRegistryWithConfig(cfg *config.Config) *ModelRegistry

NewModelRegistryWithConfig creates a ModelRegistry and merges custom models from config.

func (*ModelRegistry) GetModelCacheCost added in v0.6.4

func (r *ModelRegistry) GetModelCacheCost(providerID, modelID string) (cacheReadPer1M, cacheWritePer1M float64)

GetModelCacheCost returns the cache-read and cache-write prices (USD per 1M tokens) for a model, or 0 when the registry has no cache pricing for it.

func (*ModelRegistry) GetModelContextLimit

func (r *ModelRegistry) GetModelContextLimit(providerID, modelID string) int

GetModelContextLimit returns the context limit for a model, looked up in the registry by provider ID and model ID.

func (*ModelRegistry) GetModelCost

func (r *ModelRegistry) GetModelCost(providerID, modelID string) (inputPer1M, outputPer1M float64)

GetModelCost returns pricing info for a model.

func (*ModelRegistry) GetProvider

func (r *ModelRegistry) GetProvider(providerID string) *RegistryProvider

GetProvider returns provider info by ID, or nil if not found.

func (*ModelRegistry) GetProviderAPI

func (r *ModelRegistry) GetProviderAPI(providerID string) string

GetProviderAPI returns the API base URL for a provider from the registry.

func (*ModelRegistry) GetProviderEnvVars

func (r *ModelRegistry) GetProviderEnvVars(providerID string) []string

GetProviderEnvVars returns the environment variable names for a provider.

func (*ModelRegistry) HasProvider

func (r *ModelRegistry) HasProvider(providerID string) bool

HasProvider returns whether the given provider ID exists in the registry.

func (*ModelRegistry) ListProviderModels

func (r *ModelRegistry) ListProviderModels(providerID string, toolCallOnly bool) []*RegistryModel

ListProviderModels returns models for a provider from the registry. If toolCallOnly is true, only models with tool_call support are returned. Models are sorted by ID.

func (*ModelRegistry) ListProviders

func (r *ModelRegistry) ListProviders() []*RegistryProvider

ListProviders returns all providers in the curated display order.

func (*ModelRegistry) Load

func (r *ModelRegistry) Load() (map[string]*RegistryProvider, error)

Load returns the provider/model data.

func (*ModelRegistry) LookupModel

func (r *ModelRegistry) LookupModel(providerID, modelID string) (*RegistryProvider, *RegistryModel, bool)

LookupModel finds a model by provider ID and model ID. It returns the provider info, the model info, and whether the model was found.

func (*ModelRegistry) MergeConfigProviders added in v0.4.8

func (r *ModelRegistry) MergeConfigProviders(providers map[string]*config.ProviderConfig)

MergeConfigProviders merges custom models from config providers into the registry. For providers not in the registry, a new entry is created. For existing providers, custom models are added (existing models are not overridden).
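The merge rule (create missing providers, add new models, never override existing ones) can be sketched with simplified types. The `provider` and `model` structs and the shape of the `custom` argument are hypothetical; the real method works on RegistryProvider, RegistryModel, and config.ProviderConfig.

```go
package main

import "fmt"

// model and provider are simplified, hypothetical stand-ins for
// RegistryModel and RegistryProvider.
type model struct {
	ID      string
	Context int
}

type provider struct {
	ID     string
	Models map[string]*model
}

// mergeProviders adds custom models to reg without overriding entries
// that already exist.
func mergeProviders(reg map[string]*provider, custom map[string][]*model) {
	for id, models := range custom {
		p, ok := reg[id]
		if !ok {
			// Provider not in the registry: create a new entry.
			p = &provider{ID: id}
			reg[id] = p
		}
		if p.Models == nil {
			p.Models = map[string]*model{}
		}
		for _, m := range models {
			if _, exists := p.Models[m.ID]; !exists {
				p.Models[m.ID] = m
			}
		}
	}
}

func main() {
	reg := map[string]*provider{
		"acme": {ID: "acme", Models: map[string]*model{"a": {ID: "a", Context: 1000}}},
	}
	mergeProviders(reg, map[string][]*model{
		"acme":  {{ID: "a", Context: 9999}, {ID: "b", Context: 2000}},
		"local": {{ID: "c", Context: 3000}},
	})
	fmt.Println(reg["acme"].Models["a"].Context, len(reg["acme"].Models), len(reg["local"].Models))
	// prints "1000 2 1"
}
```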

type RegistryModel

type RegistryModel struct {
	ID               string           `json:"id"`
	Name             string           `json:"name"`
	Family           string           `json:"family,omitempty"`
	Attachment       bool             `json:"attachment,omitempty"`
	Reasoning        bool             `json:"reasoning,omitempty"`
	ToolCall         bool             `json:"tool_call,omitempty"`
	StructuredOutput bool             `json:"structured_output,omitempty"`
	Temperature      bool             `json:"temperature,omitempty"`
	Knowledge        string           `json:"knowledge,omitempty"`
	ReleaseDate      string           `json:"release_date,omitempty"`
	LastUpdated      string           `json:"last_updated,omitempty"`
	Modalities       *ModelModalities `json:"modalities,omitempty"`
	OpenWeights      bool             `json:"open_weights,omitempty"`
	Cost             *ModelCost       `json:"cost,omitempty"`
	Limit            *ModelLimit      `json:"limit,omitempty"`
	Status           string           `json:"status,omitempty"`
	Recommended      bool             `json:"recommended,omitempty"`
	DefaultEnabled   bool             `json:"default_enabled,omitempty"`
}

RegistryModel represents a model from the models.dev API.

type RegistryProvider

type RegistryProvider struct {
	ID     string                    `json:"id"`
	Name   string                    `json:"name"`
	Env    []string                  `json:"env"`
	API    string                    `json:"api"`
	Doc    string                    `json:"doc,omitempty"`
	Models map[string]*RegistryModel `json:"models"`
}

RegistryProvider represents a provider from the models.dev API.

type TokenUsage

type TokenUsage struct {
	PromptTokens     int64
	CompletionTokens int64
	TotalTokens      int64
	CachedTokens     int64
	ReasoningTokens  int64
	CacheWriteTokens int64
	CallCount        int64 // number of API calls recorded (averages denominator)
	LastTotalTokens  int64
	// contains filtered or unexported fields
}

TokenUsage tracks token consumption across all API calls.

CachedTokens is the cache-READ portion of the prompt (tokens served from the provider's KV cache). CacheWriteTokens is the cache-CREATION portion; it is 0 today because the shared go-openai transport does not surface cache_creation_input_tokens, and is kept as a forward-compatible field. ReasoningTokens is the reasoning/thinking subset of the completion.

func TokenTrackerFromContext

func TokenTrackerFromContext(ctx context.Context) *TokenUsage

TokenTrackerFromContext retrieves the per-agent TokenUsage from the context, if any.

func (*TokenUsage) Add

func (t *TokenUsage) Add(p AddParams)

Add records one API call's token usage.

func (*TokenUsage) AddByModel

func (t *TokenUsage) AddByModel(model string, prompt, completion, total int)

AddByModel adds token usage attributed to a specific model name.

func (*TokenUsage) BeginTurn added in v0.6.4

func (t *TokenUsage) BeginTurn()

BeginTurn snapshots the cumulative counters as the baseline for the current agent turn so TurnUsage reports only this turn's delta. Called at the start of every runner turn.

func (*TokenUsage) CacheHitRate added in v0.6.3

func (t *TokenUsage) CacheHitRate() float64

CacheHitRate returns the cumulative KV cache hit rate, defined as cached / prompt — the fraction of prompt tokens served from the provider's cache. Returns 0 when no prompt tokens have been recorded. The result is clamped to [0,1] to stay robust against provider quirks.
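The definition reduces to a guarded division with clamping. The sketch below is a hypothetical free function over two counters; the real method reads them from TokenUsage.

```go
package main

import "fmt"

// cacheHitRate sketches the documented definition: cached / prompt,
// 0 when no prompt tokens were recorded, clamped to [0, 1].
func cacheHitRate(cached, prompt int64) float64 {
	if prompt <= 0 {
		return 0
	}
	rate := float64(cached) / float64(prompt)
	if rate < 0 {
		return 0
	}
	if rate > 1 {
		return 1 // e.g. a provider reporting more cached than prompt tokens
	}
	return rate
}

func main() {
	fmt.Println(cacheHitRate(25, 100))  // 0.25
	fmt.Println(cacheHitRate(150, 100)) // 1
	fmt.Println(cacheHitRate(0, 0))     // 0
}
```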

func (*TokenUsage) CacheObserved added in v0.6.3

func (t *TokenUsage) CacheObserved() bool

CacheObserved reports whether the provider has reported cache details (a prompt_tokens_details object) — used to distinguish "cache hit rate is 0%" from "this provider never reports caching". It is true on the first turn that carries cache details even when cached_tokens is 0, and stays true for the session (cleared only by Reset). The CachedTokens>0 fallback keeps it correct for older snapshots recorded before the presence flag existed.

func (*TokenUsage) Get

func (t *TokenUsage) Get() (prompt, completion, total int64)

Get returns the current cumulative prompt, completion, and total token counts.

func (*TokenUsage) GetByModel

func (t *TokenUsage) GetByModel() map[string]int64

GetByModel returns a snapshot of per-model token totals.

func (*TokenUsage) GetFull added in v0.6.3

func (t *TokenUsage) GetFull() TokenUsageDetail

GetFull returns a cumulative snapshot of all tracked token usage.

func (*TokenUsage) GetLastDetail added in v0.4.4

func (t *TokenUsage) GetLastDetail() *TokenUsageDetail

GetLastDetail returns the last API call's token usage detail.

func (*TokenUsage) GetLastTotal added in v0.3.2

func (t *TokenUsage) GetLastTotal() int64

GetLastTotal returns the last API call's total tokens (the current context usage).

func (*TokenUsage) Reset

func (t *TokenUsage) Reset()

Reset resets the token tracker.

func (*TokenUsage) ResetContext added in v0.6.4

func (t *TokenUsage) ResetContext()

ResetContext clears only the "current context occupancy" snapshot (the last API call's per-call values), leaving the cumulative consumption ledger, the cache-support flag, the per-model breakdown, and the per-turn baseline intact. Call this after a compaction/summarization shrinks the live context: the context indicator should reflect the smaller window, but the session's accumulated spend must NOT be lost — it feeds budgets, the usage log, and cross-session stats. (Full Reset is for a genuine session boundary.)

func (*TokenUsage) TurnUsage added in v0.6.4

func (t *TokenUsage) TurnUsage() (prompt, completion, cached int64)

TurnUsage returns this turn's consumption (cumulative minus the BeginTurn baseline). Each value is clamped at 0 so a mid-turn Reset (which zeroes the cumulative and the baseline together) can never yield a negative delta.
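The baseline-and-delta idea behind BeginTurn and TurnUsage can be sketched as below. The `usage` type and its field names are hypothetical, and the sketch omits any locking the real TokenUsage may perform in its unexported fields.

```go
package main

import "fmt"

// usage is a hypothetical, unsynchronized stand-in for TokenUsage.
type usage struct {
	prompt, completion, cached             int64 // cumulative counters
	basePrompt, baseCompletion, baseCached int64 // snapshot taken by beginTurn
}

// beginTurn snapshots the cumulative counters as the turn baseline.
func (u *usage) beginTurn() {
	u.basePrompt, u.baseCompletion, u.baseCached = u.prompt, u.completion, u.cached
}

// clampZero keeps a delta from going negative.
func clampZero(v int64) int64 {
	if v < 0 {
		return 0
	}
	return v
}

// turnUsage returns cumulative minus baseline, each clamped at 0.
func (u *usage) turnUsage() (prompt, completion, cached int64) {
	return clampZero(u.prompt - u.basePrompt),
		clampZero(u.completion - u.baseCompletion),
		clampZero(u.cached - u.baseCached)
}

func main() {
	u := &usage{prompt: 1000, completion: 200, cached: 400}
	u.beginTurn()
	u.prompt += 300
	u.completion += 50
	u.cached += 120
	fmt.Println(u.turnUsage()) // 300 50 120
}
```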

type TokenUsageDetail added in v0.4.4

type TokenUsageDetail struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
	CachedTokens     int `json:"cached_tokens"`
	ReasoningTokens  int `json:"reasoning_tokens,omitempty"`
	CacheWriteTokens int `json:"cache_write_tokens,omitempty"`
	CallCount        int `json:"call_count,omitempty"`
}

TokenUsageDetail holds a token usage snapshot for tracing/observability and for JSON transport to the UI. Reasoning/cache-write/call-count carry omitempty so per-call telemetry stays compact while cumulative snapshots (GetFull) carry the full breakdown.

func (TokenUsageDetail) Minus added in v0.6.3

Minus returns the per-field difference d-prev, used to derive the token delta of a single agent run from cumulative snapshots.
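A per-field difference over two cumulative snapshots might look like the sketch below. The struct copies the documented TokenUsageDetail fields without JSON tags; the lowercase `minus` function is a hypothetical stand-in for the method, whose signature is not shown above. Whether the real method clamps negative results is not documented, so this sketch does not clamp.

```go
package main

import "fmt"

// tokenUsageDetail copies the documented TokenUsageDetail fields.
type tokenUsageDetail struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
	CachedTokens     int
	ReasoningTokens  int
	CacheWriteTokens int
	CallCount        int
}

// minus returns the per-field difference d - prev (no clamping assumed).
func minus(d, prev tokenUsageDetail) tokenUsageDetail {
	return tokenUsageDetail{
		PromptTokens:     d.PromptTokens - prev.PromptTokens,
		CompletionTokens: d.CompletionTokens - prev.CompletionTokens,
		TotalTokens:      d.TotalTokens - prev.TotalTokens,
		CachedTokens:     d.CachedTokens - prev.CachedTokens,
		ReasoningTokens:  d.ReasoningTokens - prev.ReasoningTokens,
		CacheWriteTokens: d.CacheWriteTokens - prev.CacheWriteTokens,
		CallCount:        d.CallCount - prev.CallCount,
	}
}

func main() {
	before := tokenUsageDetail{PromptTokens: 1000, CompletionTokens: 200, TotalTokens: 1200, CallCount: 3}
	after := tokenUsageDetail{PromptTokens: 1800, CompletionTokens: 350, TotalTokens: 2150, CallCount: 5}
	delta := minus(after, before)
	fmt.Println(delta.PromptTokens, delta.CompletionTokens, delta.TotalTokens, delta.CallCount)
	// prints "800 150 950 2"
}
```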
