Documentation ¶
Index ¶
- Constants
- func EchoCheckModel(ctx context.Context, m model.Model) error
- func ListModels(ctx context.Context, url string) ([]string, error)
- func NewOpenAICompletionModel(name string, opts ...option.RequestOption) model.Model
- func OllamaReachable(ctx context.Context, url string) bool
- func PullModel(ctx context.Context, url, model string) error
- type ModelConfig
- type ModelMap
- type ModelProvider
- type ProviderConfig
- type ProviderConfigs
- type TaskType
- type ValidateAndFilterOption
Constants ¶
const (
	DefaultGeminiModel      = "gemini-3-flash-preview"
	DefaultAnthropicModel   = "claude-sonnet-4-6"
	DefaultOpenAIModel      = "gpt-4o"
	DefaultOllamaModel      = "llama3"
	DefaultHuggingFaceModel = "bert-base-uncased"
)
const DefaultOllamaModelForSetup = "llama3.2:3b"
DefaultOllamaModelForSetup is the default model offered during setup. It runs well on Mac (Apple Silicon) and is pulled behind the scenes if not already present.
const DefaultOllamaURL = "http://localhost:11434"
DefaultOllamaURL is the default base URL for a local Ollama server.
Variables ¶
This section is empty.
Functions ¶
func EchoCheckModel ¶
func EchoCheckModel(ctx context.Context, m model.Model) error
EchoCheckModel performs a minimal GenerateContent call (echo check) to verify that the given model's credentials are valid. It returns nil if the model responds successfully, and an error if the API returns an auth error (e.g. 401) or a system-level failure. Use this to validate tokens before using a provider.
func ListModels ¶
func ListModels(ctx context.Context, url string) ([]string, error)
ListModels returns the names of models available on the Ollama server at url. If url is empty, DefaultOllamaURL is used. Returns an error if the request fails or the response cannot be parsed.
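A self-contained sketch of the parsing half of this function, assuming the response shape of Ollama's public GET /api/tags endpoint (the field names below come from that API, not from this package's source):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tagsResponse mirrors the shape of Ollama's GET /api/tags payload.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// parseModelNames extracts the model names from a /api/tags body.
func parseModelNames(body []byte) ([]string, error) {
	var tr tagsResponse
	if err := json.Unmarshal(body, &tr); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(tr.Models))
	for _, m := range tr.Models {
		names = append(names, m.Name)
	}
	return names, nil
}

func main() {
	sample := []byte(`{"models":[{"name":"llama3"},{"name":"llama3.2:3b"}]}`)
	names, _ := parseModelNames(sample)
	fmt.Println(names) // [llama3 llama3.2:3b]
}
```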
func NewOpenAICompletionModel ¶ added in v0.1.7
func NewOpenAICompletionModel(name string, opts ...option.RequestOption) model.Model
NewOpenAICompletionModel creates a new model that uses the OpenAI completions endpoint.
func OllamaReachable ¶
func OllamaReachable(ctx context.Context, url string) bool
OllamaReachable reports whether an Ollama server is reachable at the given URL. If url is empty, DefaultOllamaURL is used. The check uses a short timeout so setup does not block if Ollama is not running.
Types ¶
type ModelConfig ¶
type ModelConfig struct {
Providers ProviderConfigs `json:"providers" yaml:"providers,omitempty" toml:"providers,omitempty"`
}
func DefaultModelConfig ¶
func DefaultModelConfig(ctx context.Context, sp security.SecretProvider) ModelConfig
DefaultModelConfig builds the default model configuration by resolving API keys through the given SecretProvider. Each provider is added only if its API key is present. Without a SecretProvider, callers can pass security.NewEnvProvider() to preserve the legacy os.Getenv behavior.
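The "add a provider only if its key resolves" pattern can be sketched with a plain lookup function standing in for security.SecretProvider. The env-var names below are illustrative assumptions; the model names come from this package's constants:

```go
package main

import "fmt"

type providerConfig struct {
	Provider  string
	ModelName string
	Token     string
}

// buildProviders adds each candidate provider only when its API key
// resolves to a non-empty value, mirroring DefaultModelConfig's behavior.
func buildProviders(lookup func(string) string) []providerConfig {
	candidates := []struct {
		provider, model, envKey string
	}{
		{"gemini", "gemini-3-flash-preview", "GEMINI_API_KEY"},
		{"anthropic", "claude-sonnet-4-6", "ANTHROPIC_API_KEY"},
		{"openai", "gpt-4o", "OPENAI_API_KEY"},
	}
	var out []providerConfig
	for _, c := range candidates {
		if tok := lookup(c.envKey); tok != "" {
			out = append(out, providerConfig{c.provider, c.model, tok})
		}
	}
	return out
}

func main() {
	// A fixed lookup stands in for security.NewEnvProvider() / os.Getenv.
	lookup := func(key string) string {
		if key == "OPENAI_API_KEY" {
			return "sk-test"
		}
		return ""
	}
	for _, p := range buildProviders(lookup) {
		fmt.Println(p.Provider, p.ModelName) // openai gpt-4o
	}
}
```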
func (ModelConfig) NewEnvBasedModelProvider ¶
func (c ModelConfig) NewEnvBasedModelProvider() ModelProvider
func (*ModelConfig) ValidateAndFilter ¶
func (c *ModelConfig) ValidateAndFilter(ctx context.Context, sp security.SecretProvider, opts ...ValidateAndFilterOption) error
ValidateAndFilter keeps only providers that pass Validate and (unless skipped) EchoCheckModel, and mutates c.Providers. Providers that fail credential validation or the echo check (token invalid/401) are excluded with a warning. Returns an error if after filtering no providers remain.
type ModelProvider ¶
func NewSingleModelProvider ¶ added in v0.1.7
func NewSingleModelProvider(key string, model model.Model) ModelProvider
type ProviderConfig ¶
type ProviderConfig struct {
Name string `json:"name" yaml:"name,omitempty" toml:"name,omitempty"`
Provider string `json:"provider" yaml:"provider,omitempty" toml:"provider,omitempty"`
ModelName string `json:"model_name" yaml:"model_name,omitempty" toml:"model_name,omitempty"`
Variant string `json:"variant" yaml:"variant,omitempty" toml:"variant,omitempty"`
Token string `json:"token" yaml:"token,omitempty" toml:"token,omitempty"`
Host string `json:"host" yaml:"host,omitempty" toml:"host,omitempty"`
GoodForTask TaskType `json:"good_for_task" yaml:"good_for_task,omitempty" toml:"good_for_task,omitempty"`
// EnableTokenTailoring when true (default) trims conversation history to the model's context window (arXiv:2601.14192).
// Set to false to disable (e.g. debugging or when the provider handles context itself).
EnableTokenTailoring *bool `json:"enable_token_tailoring,omitempty" yaml:"enable_token_tailoring,omitempty" toml:"enable_token_tailoring,omitempty"`
MaxTokens *int `json:"max_tokens,omitempty" yaml:"max_tokens,omitempty" toml:"max_tokens,omitempty"`
}
func (ProviderConfig) String ¶
func (p ProviderConfig) String() string
func (ProviderConfig) Validate ¶
func (p ProviderConfig) Validate(ctx context.Context, sp security.SecretProvider) error
Validate returns an error if this provider is not usable (e.g. missing API key). It uses the given SecretProvider to resolve env-based keys when Token is empty. Call this before using the provider so the server never starts with invalid credentials.
type ProviderConfigs ¶
type ProviderConfigs []ProviderConfig
func (ProviderConfigs) Providers ¶
func (providers ProviderConfigs) Providers() []string
type TaskType ¶
type TaskType string
TaskType represents different categories of tasks that LLMs are benchmarked against. These task types help in selecting the most appropriate model based on the specific requirements of the work being performed.
const (
	// TaskToolCalling represents tasks requiring reliable generation of executable code and API calls.
	// Benchmark: BFCL v4 (Berkeley Function Calling Leaderboard)
	// Top performers: Llama 3.1 405B Instruct (88.50%), Claude Opus 4.5 FC (77.47%)
	// Use this for: Function calling, API integration, structured code generation
	TaskToolCalling TaskType = "tool_calling"

	// TaskPlanning represents tasks involving agentic planning and coding for real-world software engineering.
	// Benchmark: SWE-Bench (Software Engineering Benchmark)
	// Top performers: Claude Sonnet 4.5 Parallel (82.00%), Claude Opus 4.5 (80.90%)
	// Use this for: Complex refactoring, multi-file changes, architectural decisions
	TaskPlanning TaskType = "planning"

	// TaskCoding represents pure code generation, algorithmic problem solving, and script writing.
	// Benchmarks: HumanEval (pioneered by Codex), MBPP, LiveCodeBench
	// Top performers: Claude Sonnet 4.5, GPT-5.2
	// Use this for: Single-function generation, copilot-style autocomplete, algorithmic coding
	TaskCoding TaskType = "coding"

	// TaskTerminalCalling represents tasks requiring precision in command-line interfaces and terminal operations.
	// Benchmark: Terminal Execution Bench 2.0
	// Top performers: Claude Sonnet 4.5 (61.30%), Claude Opus 4.5 (59.30%)
	// Use this for: Shell scripting, CLI tool usage, system administration tasks
	TaskTerminalCalling TaskType = "terminal_calling"

	// TaskScientificReasoning represents tasks requiring PhD-level scientific reasoning and logic.
	// Benchmark: GPQA Diamond (Graduate-Level Google-Proof Q&A)
	// Top performers: Gemini 3 Pro Deep Think (93.80%), GPT-5.2 (92.40%)
	// Use this for: Complex analysis, research tasks, domain-specific expertise
	TaskScientificReasoning TaskType = "scientific_reasoning"

	// TaskNovelReasoning represents tasks testing abstract visual pattern solving for never-before-seen problems.
	// Benchmark: ARC-AGI 2 (Abstraction and Reasoning Corpus)
	// Top performers: GPT-5.2 Pro High (54.20%), Poetiq Gemini 3 Pro Refine (54.00%)
	// Use this for: Novel problem-solving, pattern recognition, creative solutions
	TaskNovelReasoning TaskType = "novel_reasoning"

	// TaskGeneralTask represents broad knowledge tasks and general reasoning capabilities.
	// Benchmark: Humanity's Last Exam
	// Top performers: Gemini 3 Pro Deep Think (41.00%), Gemini 3 Pro Standard (37.50%)
	// Use this for: General knowledge queries, broad reasoning, interdisciplinary tasks
	TaskGeneralTask TaskType = "general_task"

	// TaskMathematical represents high-level competition mathematics and quantitative reasoning.
	// Benchmark: AIME 2025 (American Invitational Mathematics Examination)
	// Top performers: GPT-5.2 (100.00%), Gemini 3 Pro (100.00%), Grok 4.1 Heavy (100.00%)
	// Use this for: Mathematical proofs, quantitative analysis, algorithmic optimization
	TaskMathematical TaskType = "mathematical"

	// TaskLongHorizonAutonomy represents extended autonomous operation capabilities.
	// Benchmark: METR (measured in minutes before 50% failure rate)
	// Top performers: GPT-5 Medium (137.3 min), Claude Sonnet 4.5 (113.3 min)
	// Use this for: Long-running autonomous agents, multi-step workflows, sustained reasoning
	TaskLongHorizonAutonomy TaskType = "long_horizon_autonomy"

	// TaskEfficiency represents operational speed and cost efficiency considerations.
	// Benchmarks: Throughput (tokens/sec) and Cost (USD per 1M input tokens)
	// Throughput leaders: Llama 4 Scout (2600 t/s), Grok 4.1 (455 t/s)
	// Cost leaders: Grok 4.1 ($0.20), Gemini 3 Flash ($0.50)
	// Use this for: High-volume processing, cost-sensitive operations, real-time applications
	TaskEfficiency TaskType = "efficiency"

	// TaskSummarizer represents tasks requiring large-context summarization of
	// verbose tool outputs. Typically mapped to a model with a very large context
	// window (e.g., 1M tokens) so it can ingest and compress raw HTML, API
	// responses, or other bulk data before handing it back to a smaller agent.
	// Use this for: Auto-summarizing oversized tool results, condensing documents
	TaskSummarizer TaskType = "summarizer"

	// TaskComputerOperations represents native computer-use capabilities.
	// Benchmark: OSWorld-Verified, WebArena Verified, APEX-Agents
	// Top performers: GPT-5.4 Pro
	// Use this for: Operating applications via keyboard and mouse commands
	TaskComputerOperations TaskType = "computer_operations"
)
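One way a caller might use GoodForTask is to route work to a matching provider. The routing policy below is purely an illustration (the package does not document a selection algorithm), and the local types are stand-ins:

```go
package main

import "fmt"

type taskType string

type providerConfig struct {
	Name        string
	GoodForTask taskType
}

// pickProvider returns the first provider tagged for the given task, or
// falls back to the first configured provider. The bool reports whether
// the match was exact.
func pickProvider(ps []providerConfig, task taskType) (providerConfig, bool) {
	for _, p := range ps {
		if p.GoodForTask == task {
			return p, true
		}
	}
	if len(ps) > 0 {
		return ps[0], false
	}
	return providerConfig{}, false
}

func main() {
	ps := []providerConfig{
		{Name: "general", GoodForTask: "general_task"},
		{Name: "coder", GoodForTask: "coding"},
	}
	p, exact := pickProvider(ps, "coding")
	fmt.Println(p.Name, exact) // coder true
}
```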
type ValidateAndFilterOption ¶
type ValidateAndFilterOption func(*validateAndFilterOptions)
ValidateAndFilterOption configures ValidateAndFilter behavior.
func SkipEchoCheck ¶
func SkipEchoCheck() ValidateAndFilterOption
SkipEchoCheck disables the per-provider echo check (useful in tests to avoid real API calls).