Documentation ¶
Overview ¶
Package core contains the internal agent loop and provider-facing DTOs.
Index ¶
- Constants
- Variables
- func IsContextOverflowError(err error) bool
- func IsTransientError(err error) bool
- type AttemptMetadata
- type CompactionResult
- type Compactor
- type CostAttribution
- type CostSource
- type Event
- type EventCallback
- type EventType
- type Message
- type ModelPricing
- type Options
- type PricingTable
- type Provider
- type ProviderCapabilityMissingError
- type Reasoning
- type ReasoningStallError
- type Request
- type Response
- type Result
- type Role
- type RoutingReport
- type RoutingReporter
- type SeamOptions
- type Status
- type StreamDelta
- type StreamingProvider
- type TimingBreakdown
- type TokenUsage
- type Tool
- type ToolCall
- type ToolCallLog
- type ToolDef
Constants ¶
const (
	ReasoningAuto    = reasoning.ReasoningAuto
	ReasoningOff     = reasoning.ReasoningOff
	ReasoningLow     = reasoning.ReasoningLow
	ReasoningMedium  = reasoning.ReasoningMedium
	ReasoningHigh    = reasoning.ReasoningHigh
	ReasoningMinimal = reasoning.ReasoningMinimal
	ReasoningXHigh   = reasoning.ReasoningXHigh
	ReasoningMax     = reasoning.ReasoningMax
)
const DefaultReasoningByteLimit = 256 * 1024 // 256KB
DefaultReasoningByteLimit is the default maximum number of bytes of pure reasoning_content allowed before the stream is aborted with ErrReasoningOverflow. Only fires when no content or tool_call delta has been seen yet. Configurable via config.yaml reasoning_byte_limit; 0 = unlimited.
const DefaultReasoningStallTimeout = 16384 * time.Second
DefaultReasoningStallTimeout is the fallback stall deadline used when no reasoning budget is available to drive the adaptive computation. Sized as the worst-case absolute ceiling: max reasoning budget (32 768 tokens) at the slowest plausible local inference rate (2 tok/s) = 16 384 s (~4.5 h). The adaptive mechanism in consumeStream computes a tighter, budget-aware deadline for any run where the reasoning budget is known.
const DefaultReasoningTailBytes = 2000
DefaultReasoningTailBytes is the default size of the reasoning-tail buffer captured for inclusion in ReasoningStallError. Sized to roughly the last 500 reasoning tokens at ~4 chars/token. Configurable via streamThresholds.reasoningTailBytes; 0 falls back to this default.
const ProviderCapabilityMissingErrorCode = "PROVIDER_CAPABILITY_MISSING"
ProviderCapabilityMissingErrorCode is the stable string code carried on ProviderCapabilityMissingError.Code so external callers can pattern-match without depending on Go type identity.
const ReasoningStallCode = "REASONING_STALL"
ReasoningStallCode is the stable, machine-matchable identifier for a reasoning-stall failure. Callers and downstream observers (benchmark harnesses, dashboards) can match on this constant rather than parsing the wrapped error string.
Variables ¶
var DefaultPricing = PricingTable{
"claude-sonnet-4-20250514": {InputPerMTok: 3.00, OutputPerMTok: 15.00},
"claude-haiku-4-20250414": {InputPerMTok: 0.80, OutputPerMTok: 4.00},
"claude-opus-4-20250515": {InputPerMTok: 15.00, OutputPerMTok: 75.00},
"gpt-4o": {InputPerMTok: 2.50, OutputPerMTok: 10.00},
"gpt-4o-mini": {InputPerMTok: 0.15, OutputPerMTok: 0.60},
"gpt-4.1": {InputPerMTok: 2.00, OutputPerMTok: 8.00},
"o3-mini": {InputPerMTok: 1.10, OutputPerMTok: 4.40},
"qwen3.5-7b": {InputPerMTok: 0, OutputPerMTok: 0},
"llama-3.2-8b": {InputPerMTok: 0, OutputPerMTok: 0},
}
DefaultPricing contains built-in pricing for common models.
var ErrCompactionNoFit = errors.New("agent: compaction could not fit within the effective context window")
ErrCompactionNoFit reports that compaction was needed but could not produce a message history that fits within the effective context window.
var ErrCompactionStuck = errors.New("agent: compaction stuck: consecutive attempts failed to produce a compacted history")
ErrCompactionStuck reports that compaction was requested but failed to produce a compacted history for multiple consecutive attempts. This prevents a runaway loop where compaction.end(no_compaction=true) events fire indefinitely without making progress.
var ErrProviderCapabilityMissing = errors.New("agent: provider capability missing")
ErrProviderCapabilityMissing is the sentinel for a provider/server reporting that a feature required to serve the request is not implemented for the active model+config combination (for example, mlx_lm's "NotImplementedError: RotatingKVCache Quantization NYI"). The condition is deterministic for a given provider+model, so the agent loop must not retry.
var ErrReasoningOverflow = errors.New("agent: reasoning overflow: model produced only reasoning tokens past byte limit")
ErrReasoningOverflow is returned by consumeStream when the model has emitted more than reasoningByteLimit bytes of pure reasoning_content without producing any content or tool_call deltas. The model is stuck in a runaway reasoning loop and the stream is aborted early.
var ErrReasoningStall = errors.New("agent: reasoning stall: model produced only reasoning tokens past stall timeout")
ErrReasoningStall is returned by consumeStream when only reasoning_content deltas have arrived for longer than reasoningStallTimeout with no content or tool_call delta. The model appears to be making no forward progress.
Stall failures are also wrapped in a ReasoningStallError carrying structured fields (model, timeout, reasoning tail, prompt id). Callers that need the machine-matchable code or the captured reasoning context should use errors.As with *ReasoningStallError.
var ErrToolCallLoop = errors.New("agent: identical tool calls repeated, aborting loop")
ErrToolCallLoop reports that the agent produced identical tool calls for toolCallLoopLimit consecutive turns, indicating a non-converging loop.
Functions ¶
func IsContextOverflowError ¶
func IsContextOverflowError(err error) bool
IsContextOverflowError reports whether err indicates the request exceeded the model's context window.
func IsTransientError ¶
func IsTransientError(err error) bool
IsTransientError reports whether err is a transient provider error that is safe to retry (network issues, rate limits, server overload). Returns false for fatal errors (auth failures, bad requests, etc.).
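As a usage sketch, a caller-side retry wrapper built on IsTransientError might look like the following; the three-attempt cap and linear backoff are illustrative choices, not package policy:

	func chatWithRetry(ctx context.Context, p core.Provider, msgs []core.Message, tools []core.ToolDef, opts core.Options) (core.Response, error) {
		var lastErr error
		for attempt := 0; attempt < 3; attempt++ {
			resp, err := p.Chat(ctx, msgs, tools, opts)
			if err == nil {
				return resp, nil
			}
			if !core.IsTransientError(err) {
				return core.Response{}, err // fatal (auth, bad request): do not retry
			}
			lastErr = err
			select {
			case <-ctx.Done():
				return core.Response{}, ctx.Err()
			case <-time.After(time.Duration(attempt+1) * time.Second): // linear backoff
			}
		}
		return core.Response{}, lastErr
	}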
Types ¶
type AttemptMetadata ¶
type AttemptMetadata struct {
AttemptIndex int `json:"attempt_index,omitempty"`
ProviderName string `json:"provider_name,omitempty"`
ProviderSystem string `json:"provider_system,omitempty"`
Route string `json:"route,omitempty"`
ServerAddress string `json:"server_address,omitempty"`
ServerPort int `json:"server_port,omitempty"`
RequestedModel string `json:"requested_model,omitempty"`
ResponseModel string `json:"response_model,omitempty"`
ResolvedModel string `json:"resolved_model,omitempty"`
Cost *CostAttribution `json:"cost,omitempty"`
Timing *TimingBreakdown `json:"timing,omitempty"`
}
AttemptMetadata captures the structured identity and attribution data for a single internal provider attempt.
Attempt identity is reported through routing_decision and final ServiceEvent data.
type CompactionResult ¶
type CompactionResult struct {
// Summary is the generated summary text.
Summary string `json:"summary"`
// FileOps tracks files read and modified.
FileOps map[string]any `json:"file_ops,omitempty"`
// TokensBefore is the estimated token count before compaction.
TokensBefore int `json:"tokens_before"`
// TokensAfter is the estimated token count after compaction.
TokensAfter int `json:"tokens_after"`
// Warning is a degradation warning, if any.
Warning string `json:"warning,omitempty"`
}
CompactionResult holds the output of an internal compaction pass.
type Compactor ¶
type Compactor func(ctx context.Context, messages []Message, provider Provider, toolCalls []ToolCallLog) ([]Message, *CompactionResult, error)
Compactor is the internal callback shape used by Request.Compactor.
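For illustration only, a pass-through compactor that satisfies the callback shape without compacting; the compaction package provides the real implementation, and treating a nil *CompactionResult as "no compaction performed" is an assumption of this sketch:

	var passthroughCompactor core.Compactor = func(ctx context.Context, messages []core.Message, provider core.Provider, toolCalls []core.ToolCallLog) ([]core.Message, *core.CompactionResult, error) {
		// Return the history unchanged; nil result signals no compaction here.
		return messages, nil, nil
	}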
type CostAttribution ¶
type CostAttribution struct {
Source CostSource `json:"source,omitempty"`
Currency string `json:"currency,omitempty"`
Amount *float64 `json:"amount,omitempty"`
InputAmount *float64 `json:"input_amount,omitempty"`
OutputAmount *float64 `json:"output_amount,omitempty"`
CacheReadAmount *float64 `json:"cache_read_amount,omitempty"`
CacheWriteAmount *float64 `json:"cache_write_amount,omitempty"`
ReasoningAmount *float64 `json:"reasoning_amount,omitempty"`
PricingRef string `json:"pricing_ref,omitempty"`
Raw json.RawMessage `json:"raw,omitempty"`
}
CostAttribution captures the provenance of the cost associated with one internal provider attempt.
type CostSource ¶
type CostSource string
CostSource identifies where the recorded cost originated in the internal loop.
const (
	CostSourceProviderReported CostSource = "provider_reported"
	CostSourceGatewayReported  CostSource = "gateway_reported"
	CostSourceConfigured       CostSource = "configured"
	CostSourceUnknown          CostSource = "unknown"
)
type Event ¶
type Event struct {
SessionID string `json:"session_id"`
Seq int `json:"seq"`
Type EventType `json:"type"`
Timestamp time.Time `json:"ts"`
Data json.RawMessage `json:"data"`
}
Event is a structured event emitted during an internal agent loop.
The service layer uses ServiceEvent.
type EventCallback ¶
type EventCallback func(Event)
EventCallback receives events during an internal agent loop.
The service layer returns a channel of ServiceEvent values.
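A sketch of a callback that logs a few event types; Event.Data is raw JSON whose payload shape varies per event type:

	func newLoggingCallback(logger *log.Logger) core.EventCallback {
		return func(ev core.Event) {
			switch ev.Type {
			case core.EventToolCall, core.EventReasoningStall:
				logger.Printf("session=%s seq=%d %s: %s", ev.SessionID, ev.Seq, ev.Type, ev.Data)
			case core.EventSessionEnd:
				logger.Printf("session=%s ended (seq=%d)", ev.SessionID, ev.Seq)
			}
		}
	}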
type EventType ¶
type EventType string
EventType identifies the kind of event emitted during an internal agent loop.
The service layer uses ServiceEvent.
const (
	EventSessionStart    EventType = "session.start"
	EventLLMRequest      EventType = "llm.request"
	EventLLMResponse     EventType = "llm.response"
	EventToolCall        EventType = "tool.call"
	EventSessionEnd      EventType = "session.end"
	EventLLMDelta        EventType = "llm.delta"
	EventCompactionStart EventType = "compaction.start"
	EventCompactionEnd   EventType = "compaction.end"

	// EventOverride and EventRejectedOverride mirror the service-stream
	// override / rejected_override events into the session log so windowed
	// reporting (UsageReport, ADR-006 §5) can rebuild routing-quality
	// metrics across restarts and beyond the in-memory ring's retention.
	EventOverride         EventType = "override"
	EventRejectedOverride EventType = "rejected_override"

	// EventReasoningStall fires immediately before consumeStream returns
	// ErrReasoningStall. Its data payload carries model, timeout_ms,
	// reasoning_tail, and prompt_id so harnesses and dashboards can count
	// stall rate and debug what the model was reasoning about at the time.
	EventReasoningStall EventType = "reasoning.stall"

	// EventPlanningTurn fires after the planning LLM call completes when
	// Request.PlanningMode is enabled. Data: "plan" (string), "usage"
	// (TokenUsage), "model" (string).
	EventPlanningTurn EventType = "planning.turn"
)
type Message ¶
type Message struct {
Role Role `json:"role"`
Content string `json:"content,omitempty"`
ToolCalls []ToolCall `json:"tool_calls,omitempty"`
ToolCallID string `json:"tool_call_id,omitempty"`
}
Message is a single message in the internal conversation history.
type ModelPricing ¶
type ModelPricing struct {
InputPerMTok float64 `json:"input_per_mtok"`
OutputPerMTok float64 `json:"output_per_mtok"`
CacheReadPerM float64 `json:"cache_read_per_m,omitempty"`
CacheWritePerM float64 `json:"cache_write_per_m,omitempty"`
}
ModelPricing holds per-million-token costs for a model.
type Options ¶
type Options struct {
SeamOptions
Model string `json:"model,omitempty"`
Temperature *float64 `json:"temperature,omitempty"`
TopP *float64 `json:"top_p,omitempty"`
TopK *int `json:"top_k,omitempty"`
MinP *float64 `json:"min_p,omitempty"`
// RepetitionPenalty is the OpenAI-compat field name; openai-compat
// servers (omlx, lmstudio, vLLM, OpenRouter) accept it as a top-level
// extra. >1.0 discourages exact repeats; 1.05–1.1 is typical for Qwen.
RepetitionPenalty *float64 `json:"repetition_penalty,omitempty"`
Seed int64 `json:"seed,omitempty"`
MaxTokens int `json:"max_tokens,omitempty"`
Stop []string `json:"stop,omitempty"`
// SamplingSource is a comma-separated record of which resolution layers
// supplied non-nil sampler fields, e.g. "catalog" or
// "catalog,provider_config". Used only for the llm.request telemetry
// event; never sent on the provider wire. See ADR-007 §5.
SamplingSource string `json:"sampling_source,omitempty"`
// Reasoning controls model-side reasoning with one scalar value. Empty means
// unset; use ReasoningOff or ReasoningTokens(0) for explicit off.
Reasoning Reasoning `json:"reasoning,omitempty"`
// CachePolicy is the public prompt-caching opt-out plumbed from
// ServiceExecuteRequest. Valid values: "" / "default" / "off". Providers
// that implement caching (currently Anthropic, via cache_control writes
// in a follow-up bead) consult this field; providers without caching
// ignore it.
CachePolicy string `json:"cache_policy,omitempty"`
}
Options configures a single internal provider Chat call.
SeamOptions is embedded to carry test injection seams when the testseam build tag is set; in production builds it is an empty struct with no fields.
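A construction sketch for a single Chat call; the model ID and sampler values here are illustrative, not recommendations:

	temp, rep := 0.2, 1.05
	opts := core.Options{
		Model:             "qwen3.5-7b", // any model ID the provider accepts
		Temperature:       &temp,
		RepetitionPenalty: &rep,              // 1.05–1.1 is typical for Qwen
		MaxTokens:         4096,
		Reasoning:         core.ReasoningLow, // scalar reasoning control
		CachePolicy:       "off",             // opt out of prompt caching
	}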
type PricingTable ¶
type PricingTable map[string]ModelPricing
PricingTable maps model IDs to their pricing.
func LoadCatalogPricing ¶
func LoadCatalogPricing(cat *modelcatalog.Catalog) PricingTable
LoadCatalogPricing builds a PricingTable from catalog pricing data. Entries from the catalog supplement DefaultPricing; catalog values take precedence.
func (PricingTable) EstimateCost ¶
func (pt PricingTable) EstimateCost(model string, inputTokens, outputTokens int) float64
EstimateCost returns the estimated cost in USD for the given token usage. Returns -1 if the model is not in the pricing table (unknown). Returns 0 if the model is free (local).
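A usage sketch distinguishing the three return cases (the token counts are made up):

	cost := core.DefaultPricing.EstimateCost("gpt-4o-mini", 12000, 800)
	switch {
	case cost < 0:
		fmt.Println("model not in pricing table; cost unknown")
	case cost == 0:
		fmt.Println("free (local) model")
	default:
		fmt.Printf("estimated cost: $%.4f\n", cost)
	}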
type Provider ¶
type Provider interface {
Chat(ctx context.Context, messages []Message, tools []ToolDef, opts Options) (Response, error)
}
Provider is the internal interface that LLM backends implement.
type ProviderCapabilityMissingError ¶
type ProviderCapabilityMissingError struct {
// Code is the stable error code; always equals ProviderCapabilityMissingErrorCode.
Code string
// Capability is the missing capability name extracted from the server
// message (e.g. "RotatingKVCache Quantization"). Empty when the upstream
// message did not name a recognizable capability.
Capability string
// ServerMessage is the raw error message returned by the upstream server.
ServerMessage string
// Cause is the underlying transport error, preserved for chained Unwrap
// callers that want the original SDK error.
Cause error
}
ProviderCapabilityMissingError reports that an upstream provider rejected the request because the server cannot implement a capability the model requires under the current configuration. It carries a stable Code, the extracted Capability name (when one can be parsed out of the server message), and the raw ServerMessage so logs and tests can inspect either.
Use errors.As to extract Code/Capability/ServerMessage in routing or telemetry; errors.Is(err, ErrProviderCapabilityMissing) matches without requiring callers to know the concrete struct.
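A handling sketch following that guidance; the reroute comment marks caller policy, not package behavior:

	if errors.Is(err, core.ErrProviderCapabilityMissing) {
		var capErr *core.ProviderCapabilityMissingError
		if errors.As(err, &capErr) {
			log.Printf("code=%s capability=%q server=%q",
				capErr.Code, capErr.Capability, capErr.ServerMessage)
		}
		// Deterministic for this provider+model: route elsewhere rather than retry.
	}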
func (*ProviderCapabilityMissingError) Error ¶
func (e *ProviderCapabilityMissingError) Error() string
func (*ProviderCapabilityMissingError) Is ¶
func (e *ProviderCapabilityMissingError) Is(target error) bool
func (*ProviderCapabilityMissingError) Unwrap ¶
func (e *ProviderCapabilityMissingError) Unwrap() error
Unwrap reports the sentinel so errors.Is(err, ErrProviderCapabilityMissing) works through wrapping. Use UnwrapCause for the underlying transport error.
func (*ProviderCapabilityMissingError) UnwrapCause ¶
func (e *ProviderCapabilityMissingError) UnwrapCause() error
UnwrapCause returns the underlying transport error captured at classification.
type Reasoning ¶
Reasoning controls model-side reasoning with one scalar value. The empty value means unset; use ReasoningOff or ReasoningTokens(0) for an explicit off.
func ReasoningTokens ¶
ReasoningTokens builds a Reasoning value from an explicit token budget; ReasoningTokens(0) expresses an explicit off.
type ReasoningStallError ¶
type ReasoningStallError struct {
// Model is the resolved/concrete model ID that stalled.
Model string
// Timeout is the stall threshold that was exceeded.
Timeout time.Duration
// ReasoningTail is the last N bytes of reasoning_content seen on the
// stream prior to the stall. May be empty if the stall fired before any
// reasoning bytes accumulated.
ReasoningTail string
// PromptID is a caller-supplied identifier correlating this stall to the
// prompt or turn that produced it. Empty when not provided.
PromptID string
}
ReasoningStallError is the structured form of ErrReasoningStall. It carries the fields needed to (a) count stall rate as a metric across runs/models and (b) debug what the model was reasoning about when the stall fired.
func (*ReasoningStallError) Code ¶
func (e *ReasoningStallError) Code() string
Code returns the stable, machine-matchable identifier (ReasoningStallCode).
func (*ReasoningStallError) Error ¶
func (e *ReasoningStallError) Error() string
Error renders a human-readable form compatible with the legacy flat string, preserving model and timeout for log greppability.
func (*ReasoningStallError) Unwrap ¶
func (e *ReasoningStallError) Unwrap() error
Unwrap returns ErrReasoningStall so existing errors.Is callers continue to match.
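A sketch of extracting the structured fields with errors.As, as the ErrReasoningStall documentation suggests:

	var stall *core.ReasoningStallError
	if errors.As(err, &stall) {
		log.Printf("%s model=%s timeout=%s prompt=%s tail=%q",
			stall.Code(), stall.Model, stall.Timeout, stall.PromptID, stall.ReasoningTail)
	}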
type Request ¶
type Request struct {
// Prompt is the user's task description.
Prompt string
// SystemPrompt is prepended to the conversation as a system message.
SystemPrompt string
// History carries prior conversation messages into this run.
// Use Result.Messages from a previous Run call to continue a session.
History []Message
// Provider is the configured LLM backend.
Provider Provider
// Tools are the tools available to the agent.
Tools []Tool
// MaxIterations limits the number of tool-call rounds. Zero means no limit.
MaxIterations int
// ReasoningByteLimit is the maximum bytes of pure reasoning_content
// allowed before the stream is aborted. Zero means unlimited (no limit).
ReasoningByteLimit int
// ReasoningStallTimeout overrides the stall deadline for this request.
// Zero means use DefaultReasoningStallTimeout. Not exposed through config
// or the service layer — use programmatically or in tests only.
ReasoningStallTimeout time.Duration
// WorkDir is the working directory for file operations and bash commands.
WorkDir string
// Callback receives events in real time. May be nil.
Callback EventCallback
// Metadata is correlation data (e.g., bead_id) stored on session events.
Metadata map[string]string
// SelectedProvider is the concrete provider chosen by the CLI/config layer.
SelectedProvider string
// SelectedRoute is the routing key used to choose the provider (for example
// a backend pool name or direct provider name).
SelectedRoute string
// RequestedModel is the route key or canonical target that drove selection.
RequestedModel string
// ResolvedModel is the resolved concrete model selected before the run.
ResolvedModel string
// MaxTokens is the maximum number of tokens the model may generate per turn.
// Zero means no explicit limit (provider default applies).
MaxTokens int
// Temperature is the model sampling temperature for each provider call.
// Nil means no explicit setting (provider default applies).
Temperature *float64
// TopP, TopK, MinP, RepetitionPenalty are model sampling fields that
// most OpenAI-compat servers (omlx, lmstudio, vLLM, OpenRouter) accept
// as top-level extras. Nil means no explicit setting (server default
// applies). Setting RepetitionPenalty > 1.0 discourages exact-token loops.
TopP *float64
TopK *int
MinP *float64
RepetitionPenalty *float64
// Seed is an optional model sampling seed. Zero means unset/provider chooses.
Seed int64
// SamplingSource is the resolved sampling-layer attribution, plumbed
// through to the llm.request telemetry event. See ADR-007 §5.
SamplingSource string
// Reasoning controls model-side reasoning with one scalar value. Empty means
// unset; use ReasoningOff or ReasoningTokens(0) for explicit off.
Reasoning Reasoning
// NoStream disables streaming even if the provider supports it.
NoStream bool
// CachePolicy is the prompt-caching opt-out, threaded into per-call
// Options. Valid values: "" / "default" / "off". See agent.CachePolicy*.
CachePolicy string
// Telemetry carries the runtime telemetry implementation. If nil, the
// agent loop falls back to a no-op runtime.
Telemetry telemetry.Telemetry
// PlanningMode, when true, performs one no-tool LLM call before the main
// tool loop. The plan response is injected as an assistant message wrapped
// in <plan> tags so the subsequent tool loop has it as context. Failure of
// the planning call is non-fatal: the run continues without a plan.
PlanningMode bool
// Compactor is called before each agent loop iteration (and after tool
// results). If non-nil, it may compact the message history to fit within
// the context window. Returns the (possibly compacted) messages and result.
// The compaction package provides a ready-made implementation.
Compactor Compactor
}
Request configures a single internal agent loop.
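A continuation sketch. The entry point that executes a Request is not part of this type listing, so run below is a hypothetical stand-in for whatever function in this package drives the loop:

	func continueSession(provider core.Provider, workDir string) {
		req := core.Request{
			Prompt:        "Summarize the repository layout",
			SystemPrompt:  "You are a concise coding assistant.",
			Provider:      provider,
			WorkDir:       workDir,
			MaxIterations: 8,
		}
		first := run(req)            // hypothetical executor returning core.Result
		req.History = first.Messages // feed Result.Messages back to continue the session
		req.Prompt = "Now list the exported packages"
		second := run(req)
		_ = second
	}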
type Response ¶
type Response struct {
Content string `json:"content"`
ToolCalls []ToolCall `json:"tool_calls,omitempty"`
Usage TokenUsage `json:"usage"`
Model string `json:"model"`
FinishReason string `json:"finish_reason"`
Attempt *AttemptMetadata `json:"attempt,omitempty"`
}
Response is the result of a single internal provider Chat call.
type Result ¶
type Result struct {
// Status indicates whether the run succeeded.
Status Status `json:"status"`
// Output is the final text response from the model.
Output string `json:"output"`
// ToolCalls logs every tool execution during the run.
ToolCalls []ToolCallLog `json:"tool_calls,omitempty"`
// Messages is the conversation history for this run, excluding the
// system prompt. Feed this back into Request.History to continue a session.
Messages []Message `json:"messages,omitempty"`
// Tokens is the accumulated token usage across all iterations.
Tokens TokenUsage `json:"tokens"`
// Duration is the total wall-clock time of the run.
Duration time.Duration `json:"duration_ms"`
// CostUSD is the estimated cost. -1 means unknown (model not in pricing table).
// 0 means free (local model with $0 pricing entry).
CostUSD float64 `json:"cost_usd"`
// Model is the model that was used.
Model string `json:"model"`
// SelectedProvider is the concrete provider chosen before the run.
SelectedProvider string `json:"selected_provider,omitempty"`
// SelectedRoute is the routing key used to choose the provider.
SelectedRoute string `json:"selected_route,omitempty"`
// RequestedModel is the route key or canonical target that drove selection.
RequestedModel string `json:"requested_model,omitempty"`
// ResolvedModel is the resolved concrete model selected before the run.
ResolvedModel string `json:"resolved_model,omitempty"`
// Reasoning is the resolved model-side reasoning control for this run.
Reasoning Reasoning `json:"reasoning,omitempty"`
// AttemptedProviders records providers tried in order by any routing wrapper.
AttemptedProviders []string `json:"attempted_providers,omitempty"`
// FailoverCount records how many times routing advanced to another candidate.
FailoverCount int `json:"failover_count,omitempty"`
// Error is non-nil when Status is StatusError.
Error error `json:"-"`
// SessionID identifies the session log for this run.
SessionID string `json:"session_id"`
}
Result is the outcome of an internal agent loop.
type Role ¶
type Role string
Role identifies the sender of a message in the internal conversation history.
type RoutingReport ¶
type RoutingReport struct {
SelectedProvider string `json:"selected_provider,omitempty"`
SelectedRoute string `json:"selected_route,omitempty"`
AttemptedProviders []string `json:"attempted_providers,omitempty"`
FailoverCount int `json:"failover_count,omitempty"`
}
RoutingReport summarizes dynamic routing behavior from internal wrapper providers.
type RoutingReporter ¶
type RoutingReporter interface {
RoutingReport() RoutingReport
}
RoutingReporter is implemented by internal providers that can expose route-attribution details such as failover attempts.
type SeamOptions ¶
type SeamOptions struct{}
SeamOptions carries test injection seams when the testseam build tag is set; in production builds it is an empty struct with no fields.
type StreamDelta ¶
type StreamDelta struct {
// ArrivedAt records when the provider produced the delta.
// It is omitted from JSON and used only for local timing measurements.
ArrivedAt time.Time `json:"-"`
// Content is a text fragment (may be empty for tool call chunks).
Content string `json:"content,omitempty"`
// ReasoningContent holds thinking/reasoning tokens emitted by models that
// separate their internal reasoning from the final response (e.g. Qwen3,
// DeepSeek-R1). Captured from choices[0].delta.reasoning_content in the
// OpenAI-compatible streaming format.
ReasoningContent string `json:"reasoning_content,omitempty"`
// ToolCallID is set when a new tool call starts.
ToolCallID string `json:"tool_call_id,omitempty"`
// ToolCallName is set on the first delta of a tool call.
ToolCallName string `json:"tool_call_name,omitempty"`
// ToolCallArgs is a fragment of the tool call's JSON arguments.
ToolCallArgs string `json:"tool_call_args,omitempty"`
// Usage may be set on any delta, including before Done.
// Providers can emit incremental usage updates; consumers should merge them.
Usage *TokenUsage `json:"usage,omitempty"`
// FinishReason is set on the final delta.
FinishReason string `json:"finish_reason,omitempty"`
// Model is set on the first or final delta.
Model string `json:"model,omitempty"`
// Attempt carries provider identity and attribution metadata when known.
Attempt *AttemptMetadata `json:"attempt,omitempty"`
// Done signals the end of the stream.
Done bool `json:"done,omitempty"`
// Err is set when the stream terminated with an error.
// consumeStream returns this error and discards any partial content.
Err error `json:"err,omitempty"`
}
StreamDelta is a single chunk from a streaming response.
type StreamingProvider ¶
type StreamingProvider interface {
Provider
ChatStream(ctx context.Context, messages []Message, tools []ToolDef, opts Options) (<-chan StreamDelta, error)
}
StreamingProvider extends Provider with streaming support. Providers that implement this interface will be used in streaming mode by the agent loop when Request.NoStream is false.
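A consumption sketch that accumulates text and merges incremental usage. Merging via TokenUsage.Add assumes providers emit usage as increments rather than snapshots; confirm against your provider:

	func consumeText(ctx context.Context, sp core.StreamingProvider, msgs []core.Message, tools []core.ToolDef, opts core.Options) (string, core.TokenUsage, error) {
		ch, err := sp.ChatStream(ctx, msgs, tools, opts)
		if err != nil {
			return "", core.TokenUsage{}, err
		}
		var text strings.Builder
		var usage core.TokenUsage
		for d := range ch {
			if d.Err != nil {
				return "", usage, d.Err // stream terminated with an error
			}
			text.WriteString(d.Content)
			if d.Usage != nil {
				usage.Add(*d.Usage) // merge incremental usage updates
			}
			if d.Done {
				break
			}
		}
		return text.String(), usage, nil
	}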
type TimingBreakdown ¶
type TimingBreakdown struct {
FirstToken *time.Duration `json:"first_token,omitempty"`
Queue *time.Duration `json:"queue,omitempty"`
Prefill *time.Duration `json:"prefill,omitempty"`
Generation *time.Duration `json:"generation,omitempty"`
CacheRead *time.Duration `json:"cache_read,omitempty"`
CacheWrite *time.Duration `json:"cache_write,omitempty"`
}
TimingBreakdown captures optional provider timing windows for one internal attempt.
type TokenUsage ¶
type TokenUsage struct {
Input int `json:"input"`
Output int `json:"output"`
CacheRead int `json:"cache_read,omitempty"`
CacheWrite int `json:"cache_write,omitempty"`
Total int `json:"total"`
}
TokenUsage tracks input and output token counts for the internal loop API.
func (*TokenUsage) Add ¶
func (u *TokenUsage) Add(other TokenUsage)
Add accumulates token counts from another TokenUsage.
type Tool ¶
type Tool interface {
// Name returns the tool's identifier.
Name() string
// Description returns a human-readable description for the LLM.
Description() string
// Schema returns the JSON Schema for the tool's parameters.
Schema() json.RawMessage
// Execute runs the tool with the given parameters and returns the result.
Execute(ctx context.Context, params json.RawMessage) (string, error)
// Parallel reports whether this tool is safe to execute concurrently with
// other parallel-flagged tools. Read-only tools return true; tools with
// side effects (writes, shell commands, sub-agents) return false.
Parallel() bool
}
Tool is the internal interface that agent tools implement.
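A minimal read-only tool sketch; the tool name and schema are invented for illustration:

	type envTool struct{}

	func (envTool) Name() string        { return "get_env" }
	func (envTool) Description() string { return "Return the value of an environment variable." }
	func (envTool) Schema() json.RawMessage {
		return json.RawMessage(`{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}`)
	}
	func (envTool) Execute(ctx context.Context, params json.RawMessage) (string, error) {
		var p struct {
			Name string `json:"name"`
		}
		if err := json.Unmarshal(params, &p); err != nil {
			return "", err
		}
		return os.Getenv(p.Name), nil
	}
	func (envTool) Parallel() bool { return true } // read-only: safe to run concurrently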
type ToolCall ¶
type ToolCall struct {
ID string `json:"id"`
Name string `json:"name"`
Arguments json.RawMessage `json:"arguments"`
}
ToolCall represents a tool invocation requested by the model in the internal loop API.
type ToolCallLog ¶
type ToolCallLog struct {
Tool string `json:"tool"`
Input json.RawMessage `json:"input"`
Output string `json:"output"`
Duration time.Duration `json:"duration_ms"`
Error string `json:"error,omitempty"`
}
ToolCallLog records one tool execution during an internal agent loop.