core

package
v0.10.13
Warning

This package is not in the latest version of its module.

Published: May 7, 2026 License: MIT Imports: 16 Imported by: 0

Documentation

Overview

Package core contains the internal agent loop and provider-facing DTOs.

Index

Constants

const (
	ReasoningAuto    = reasoning.ReasoningAuto
	ReasoningOff     = reasoning.ReasoningOff
	ReasoningLow     = reasoning.ReasoningLow
	ReasoningMedium  = reasoning.ReasoningMedium
	ReasoningHigh    = reasoning.ReasoningHigh
	ReasoningMinimal = reasoning.ReasoningMinimal
	ReasoningXHigh   = reasoning.ReasoningXHigh
	ReasoningMax     = reasoning.ReasoningMax
)
const DefaultReasoningByteLimit = 256 * 1024 // 256KB

DefaultReasoningByteLimit is the default maximum number of bytes of pure reasoning_content allowed before the stream is aborted with ErrReasoningOverflow. It fires only when no content or tool_call delta has been seen yet. Configurable via the config.yaml reasoning_byte_limit key; 0 = unlimited.

const DefaultReasoningStallTimeout = 16384 * time.Second

DefaultReasoningStallTimeout is the fallback stall deadline used when no reasoning budget is available to drive the adaptive computation. Sized as the worst-case absolute ceiling: max reasoning budget (32 768 tokens) at the slowest plausible local inference rate (2 tok/s) = 16 384 s (~4.5 h). The adaptive mechanism in consumeStream computes a tighter, budget-aware deadline for any run where the reasoning budget is known.

const DefaultReasoningTailBytes = 2000

DefaultReasoningTailBytes is the default size of the reasoning-tail buffer captured for inclusion in ReasoningStallError. Sized to roughly the last 500 reasoning tokens at ~4 chars/token. Configurable via streamThresholds.reasoningTailBytes; 0 falls back to this default.

const ProviderCapabilityMissingErrorCode = "PROVIDER_CAPABILITY_MISSING"

ProviderCapabilityMissingErrorCode is the stable string code carried on ProviderCapabilityMissingError.Code so external callers can pattern-match without depending on Go type identity.

const ReasoningStallCode = "REASONING_STALL"

ReasoningStallCode is the stable, machine-matchable identifier for a reasoning-stall failure. Callers and downstream observers (benchmark harnesses, dashboards) can match on this constant rather than parsing the wrapped error string.

Variables

var DefaultPricing = PricingTable{

	"claude-sonnet-4-20250514": {InputPerMTok: 3.00, OutputPerMTok: 15.00},
	"claude-haiku-4-20250414":  {InputPerMTok: 0.80, OutputPerMTok: 4.00},
	"claude-opus-4-20250515":   {InputPerMTok: 15.00, OutputPerMTok: 75.00},

	"gpt-4o":      {InputPerMTok: 2.50, OutputPerMTok: 10.00},
	"gpt-4o-mini": {InputPerMTok: 0.15, OutputPerMTok: 0.60},
	"gpt-4.1":     {InputPerMTok: 2.00, OutputPerMTok: 8.00},
	"o3-mini":     {InputPerMTok: 1.10, OutputPerMTok: 4.40},

	"qwen3.5-7b":   {InputPerMTok: 0, OutputPerMTok: 0},
	"llama-3.2-8b": {InputPerMTok: 0, OutputPerMTok: 0},
}

DefaultPricing contains built-in pricing for common models.

var ErrCompactionNoFit = errors.New("agent: compaction could not fit within the effective context window")

ErrCompactionNoFit reports that compaction was needed but could not produce a message history that fits within the effective context window.

var ErrCompactionStuck = errors.New("agent: compaction stuck: consecutive attempts failed to produce a compacted history")

ErrCompactionStuck reports that compaction was requested but failed to produce a compacted history for multiple consecutive attempts. This prevents a runaway loop where compaction.end(no_compaction=true) events fire indefinitely without making progress.

var ErrProviderCapabilityMissing = errors.New("agent: provider capability missing")

ErrProviderCapabilityMissing is the sentinel for a provider/server reporting that a feature required to serve the request is not implemented for the active model+config combination (for example, mlx_lm's "NotImplementedError: RotatingKVCache Quantization NYI"). The condition is deterministic for a given provider+model, so the agent loop must not retry.

var ErrReasoningOverflow = errors.New("agent: reasoning overflow: model produced only reasoning tokens past byte limit")

ErrReasoningOverflow is returned by consumeStream when the model has emitted more than reasoningByteLimit bytes of pure reasoning_content without producing any content or tool_call deltas. The model is stuck in a runaway reasoning loop and the stream is aborted early.

var ErrReasoningStall = errors.New("agent: reasoning stall: model produced only reasoning tokens past stall timeout")

ErrReasoningStall is returned by consumeStream when only reasoning_content deltas have arrived for longer than reasoningStallTimeout with no content or tool_call delta. The model appears to be making no forward progress.

Stall failures are also wrapped in a ReasoningStallError carrying structured fields (model, timeout, reasoning tail, prompt id). Callers that need the machine-matchable code or the captured reasoning context should use errors.As with *ReasoningStallError.

var ErrToolCallLoop = errors.New("agent: identical tool calls repeated, aborting loop")

ErrToolCallLoop reports that the agent produced identical tool calls for toolCallLoopLimit consecutive turns, indicating a non-converging loop.

Functions

func IsContextOverflowError

func IsContextOverflowError(err error) bool

IsContextOverflowError reports whether err indicates the request exceeded the model's context window.

func IsTransientError

func IsTransientError(err error) bool

IsTransientError reports whether err is a transient provider error that is safe to retry (network issues, rate limits, server overload). Returns false for fatal errors (auth failures, bad requests, etc.).

Types

type AttemptMetadata

type AttemptMetadata struct {
	AttemptIndex   int              `json:"attempt_index,omitempty"`
	ProviderName   string           `json:"provider_name,omitempty"`
	ProviderSystem string           `json:"provider_system,omitempty"`
	Route          string           `json:"route,omitempty"`
	ServerAddress  string           `json:"server_address,omitempty"`
	ServerPort     int              `json:"server_port,omitempty"`
	RequestedModel string           `json:"requested_model,omitempty"`
	ResponseModel  string           `json:"response_model,omitempty"`
	ResolvedModel  string           `json:"resolved_model,omitempty"`
	Cost           *CostAttribution `json:"cost,omitempty"`
	Timing         *TimingBreakdown `json:"timing,omitempty"`
}

AttemptMetadata captures the structured identity and attribution data for a single internal provider attempt.

Attempt identity is reported through the routing_decision and final ServiceEvent data.

type CompactionResult

type CompactionResult struct {
	// Summary is the generated summary text.
	Summary string `json:"summary"`
	// FileOps tracks files read and modified.
	FileOps map[string]any `json:"file_ops,omitempty"`
	// TokensBefore is the estimated token count before compaction.
	TokensBefore int `json:"tokens_before"`
	// TokensAfter is the estimated token count after compaction.
	TokensAfter int `json:"tokens_after"`
	// Warning is a degradation warning, if any.
	Warning string `json:"warning,omitempty"`
}

CompactionResult holds the output of an internal compaction pass.

type Compactor

type Compactor func(ctx context.Context, messages []Message, provider Provider, toolCalls []ToolCallLog) ([]Message, *CompactionResult, error)

Compactor is the internal callback shape used by Request.Compactor.

type CostAttribution

type CostAttribution struct {
	Source           CostSource      `json:"source,omitempty"`
	Currency         string          `json:"currency,omitempty"`
	Amount           *float64        `json:"amount,omitempty"`
	InputAmount      *float64        `json:"input_amount,omitempty"`
	OutputAmount     *float64        `json:"output_amount,omitempty"`
	CacheReadAmount  *float64        `json:"cache_read_amount,omitempty"`
	CacheWriteAmount *float64        `json:"cache_write_amount,omitempty"`
	ReasoningAmount  *float64        `json:"reasoning_amount,omitempty"`
	PricingRef       string          `json:"pricing_ref,omitempty"`
	Raw              json.RawMessage `json:"raw,omitempty"`
}

CostAttribution captures the provenance of the cost associated with one internal provider attempt.

type CostSource

type CostSource string

CostSource identifies where the recorded cost originated in the internal loop.

const (
	CostSourceProviderReported CostSource = "provider_reported"
	CostSourceGatewayReported  CostSource = "gateway_reported"
	CostSourceConfigured       CostSource = "configured"
	CostSourceUnknown          CostSource = "unknown"
)

type Event

type Event struct {
	SessionID string          `json:"session_id"`
	Seq       int             `json:"seq"`
	Type      EventType       `json:"type"`
	Timestamp time.Time       `json:"ts"`
	Data      json.RawMessage `json:"data"`
}

Event is a structured event emitted during an internal agent loop.

The service layer uses ServiceEvent.

type EventCallback

type EventCallback func(Event)

EventCallback receives events during an internal agent loop.

The service layer returns a channel of ServiceEvent values.

type EventType

type EventType string

EventType identifies the kind of event emitted during an internal agent loop.

The service layer uses ServiceEvent.

const (
	EventSessionStart    EventType = "session.start"
	EventLLMRequest      EventType = "llm.request"
	EventLLMResponse     EventType = "llm.response"
	EventToolCall        EventType = "tool.call"
	EventSessionEnd      EventType = "session.end"
	EventLLMDelta        EventType = "llm.delta"
	EventCompactionStart EventType = "compaction.start"
	EventCompactionEnd   EventType = "compaction.end"
	// EventOverride and EventRejectedOverride mirror the service-stream
	// override / rejected_override events into the session log so windowed
	// reporting (UsageReport, ADR-006 §5) can rebuild routing-quality
	// metrics across restarts and beyond the in-memory ring's retention.
	EventOverride         EventType = "override"
	EventRejectedOverride EventType = "rejected_override"
	// EventReasoningStall fires immediately before consumeStream returns
	// ErrReasoningStall. Its data payload carries model, timeout_ms,
	// reasoning_tail, and prompt_id so harnesses and dashboards can count
	// stall rate and debug what the model was reasoning about at the time.
	EventReasoningStall EventType = "reasoning.stall"
	// EventPlanningTurn fires after the planning LLM call completes when
	// Request.PlanningMode is enabled. Data: "plan" (string), "usage"
	// (TokenUsage), "model" (string).
	EventPlanningTurn EventType = "planning.turn"
)

type Message

type Message struct {
	Role       Role       `json:"role"`
	Content    string     `json:"content,omitempty"`
	ToolCalls  []ToolCall `json:"tool_calls,omitempty"`
	ToolCallID string     `json:"tool_call_id,omitempty"`
}

Message is a single message in the internal conversation history.

type ModelPricing

type ModelPricing struct {
	InputPerMTok   float64 `json:"input_per_mtok"`
	OutputPerMTok  float64 `json:"output_per_mtok"`
	CacheReadPerM  float64 `json:"cache_read_per_m,omitempty"`
	CacheWritePerM float64 `json:"cache_write_per_m,omitempty"`
}

ModelPricing holds per-million-token costs for a model.

type Options

type Options struct {
	SeamOptions

	Model       string   `json:"model,omitempty"`
	Temperature *float64 `json:"temperature,omitempty"`
	TopP        *float64 `json:"top_p,omitempty"`
	TopK        *int     `json:"top_k,omitempty"`
	MinP        *float64 `json:"min_p,omitempty"`
	// RepetitionPenalty is the OpenAI-compat field name; openai-compat
	// servers (omlx, lmstudio, vLLM, OpenRouter) accept it as a top-level
	// extra. >1.0 discourages exact repeats; 1.05–1.1 is typical for Qwen.
	RepetitionPenalty *float64 `json:"repetition_penalty,omitempty"`
	Seed              int64    `json:"seed,omitempty"`
	MaxTokens         int      `json:"max_tokens,omitempty"`
	Stop              []string `json:"stop,omitempty"`
	// SamplingSource is a comma-separated record of which resolution layers
	// supplied non-nil sampler fields, e.g. "catalog" or
	// "catalog,provider_config". Used only for the llm.request telemetry
	// event; never sent on the provider wire. See ADR-007 §5.
	SamplingSource string `json:"sampling_source,omitempty"`
	// Reasoning controls model-side reasoning with one scalar value. Empty means
	// unset; use ReasoningOff or ReasoningTokens(0) for explicit off.
	Reasoning Reasoning `json:"reasoning,omitempty"`

	// CachePolicy is the public prompt-caching opt-out plumbed from
	// ServiceExecuteRequest. Valid values: "" / "default" / "off". Providers
	// that implement caching (currently Anthropic, via cache_control writes
	// in a follow-up bead) consult this field; providers without caching
	// ignore it.
	CachePolicy string `json:"cache_policy,omitempty"`
}

Options configures a single internal provider Chat call.

SeamOptions is embedded to carry test injection seams when the testseam build tag is set; in production builds it is an empty struct with no fields.

type PricingTable

type PricingTable map[string]ModelPricing

PricingTable maps model IDs to their pricing.

func LoadCatalogPricing

func LoadCatalogPricing(cat *modelcatalog.Catalog) PricingTable

LoadCatalogPricing builds a PricingTable from catalog pricing data. Entries from the catalog supplement DefaultPricing; catalog values take precedence.

func (PricingTable) EstimateCost

func (pt PricingTable) EstimateCost(model string, inputTokens, outputTokens int) float64

EstimateCost returns the estimated cost in USD for the given token usage. Returns -1 if the model is not in the pricing table (unknown). Returns 0 if the model is free (local).
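The -1 / 0 / positive contract can be sketched with a local mini-implementation that mirrors the documented semantics (per-million-token pricing, -1 for unknown models, 0 for zero-priced local models). The arithmetic here is an assumption based on the field names; the package's own implementation may differ in detail.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented ModelPricing / PricingTable.
type ModelPricing struct{ InputPerMTok, OutputPerMTok float64 }

type PricingTable map[string]ModelPricing

// EstimateCost mirrors the documented contract: -1 for models not in the
// table, 0 for free (zero-priced) local models, otherwise USD computed
// from per-million-token rates.
func (pt PricingTable) EstimateCost(model string, inputTokens, outputTokens int) float64 {
	p, ok := pt[model]
	if !ok {
		return -1
	}
	return float64(inputTokens)/1e6*p.InputPerMTok + float64(outputTokens)/1e6*p.OutputPerMTok
}

func main() {
	pt := PricingTable{
		"gpt-4o":     {InputPerMTok: 2.50, OutputPerMTok: 10.00},
		"qwen3.5-7b": {}, // free local model
	}
	fmt.Println(pt.EstimateCost("gpt-4o", 1_000_000, 100_000)) // 2.50 + 1.00 = 3.5
	fmt.Println(pt.EstimateCost("qwen3.5-7b", 5000, 5000))     // 0: free
	fmt.Println(pt.EstimateCost("unknown-model", 1000, 0))     // -1: not in table
}
```

Callers therefore need to distinguish a 0 result (known free model) from -1 (unknown model) before summing costs across runs.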

type Provider

type Provider interface {
	Chat(ctx context.Context, messages []Message, tools []ToolDef, opts Options) (Response, error)
}

Provider is the internal interface that LLM backends implement.

type ProviderCapabilityMissingError

type ProviderCapabilityMissingError struct {
	// Code is the stable error code; always equals ProviderCapabilityMissingErrorCode.
	Code string
	// Capability is the missing capability name extracted from the server
	// message (e.g. "RotatingKVCache Quantization"). Empty when the upstream
	// message did not name a recognizable capability.
	Capability string
	// ServerMessage is the raw error message returned by the upstream server.
	ServerMessage string
	// Cause is the underlying transport error, preserved for chained Unwrap
	// callers that want the original SDK error.
	Cause error
}

ProviderCapabilityMissingError reports that an upstream provider rejected the request because the server cannot implement a capability the model requires under the current configuration. It carries a stable Code, the extracted Capability name (when one can be parsed out of the server message), and the raw ServerMessage so logs and tests can inspect either.

Use errors.As to extract Code/Capability/ServerMessage in routing or telemetry; errors.Is(err, ErrProviderCapabilityMissing) matches without requiring callers to know the concrete struct.

func (*ProviderCapabilityMissingError) Error

func (*ProviderCapabilityMissingError) Is

func (*ProviderCapabilityMissingError) Unwrap

Unwrap reports the sentinel so errors.Is(err, ErrProviderCapabilityMissing) works through wrapping. Use UnwrapCause for the underlying transport error.

func (*ProviderCapabilityMissingError) UnwrapCause

func (e *ProviderCapabilityMissingError) UnwrapCause() error

UnwrapCause returns the underlying transport error captured at classification.

type Reasoning

type Reasoning = reasoning.Reasoning

func ReasoningTokens

func ReasoningTokens(n int) Reasoning

type ReasoningStallError

type ReasoningStallError struct {
	// Model is the resolved/concrete model ID that stalled.
	Model string
	// Timeout is the stall threshold that was exceeded.
	Timeout time.Duration
	// ReasoningTail is the last N bytes of reasoning_content seen on the
	// stream prior to the stall. May be empty if the stall fired before any
	// reasoning bytes accumulated.
	ReasoningTail string
	// PromptID is a caller-supplied identifier correlating this stall to the
	// prompt or turn that produced it. Empty when not provided.
	PromptID string
}

ReasoningStallError is the structured form of ErrReasoningStall. It carries the fields needed to (a) count stall rate as a metric across runs/models and (b) debug what the model was reasoning about when the stall fired.

func (*ReasoningStallError) Code

func (e *ReasoningStallError) Code() string

Code returns the stable, machine-matchable identifier (ReasoningStallCode).

func (*ReasoningStallError) Error

func (e *ReasoningStallError) Error() string

Error renders a human-readable form compatible with the legacy flat string, preserving model and timeout for log greppability.

func (*ReasoningStallError) Unwrap

func (e *ReasoningStallError) Unwrap() error

Unwrap returns ErrReasoningStall so existing errors.Is callers continue to match.

type Request

type Request struct {
	// Prompt is the user's task description.
	Prompt string

	// SystemPrompt is prepended to the conversation as a system message.
	SystemPrompt string

	// History carries prior conversation messages into this run.
	// Use Result.Messages from a previous Run call to continue a session.
	History []Message

	// Provider is the configured LLM backend.
	Provider Provider

	// Tools are the tools available to the agent.
	Tools []Tool

	// MaxIterations limits the number of tool-call rounds. Zero means no limit.
	MaxIterations int

	// ReasoningByteLimit is the maximum bytes of pure reasoning_content
	// allowed before the stream is aborted. Zero means unlimited.
	ReasoningByteLimit int

	// ReasoningStallTimeout overrides the stall deadline for this request.
	// Zero means use DefaultReasoningStallTimeout. Not exposed through config
	// or the service layer — use programmatically or in tests only.
	ReasoningStallTimeout time.Duration

	// WorkDir is the working directory for file operations and bash commands.
	WorkDir string

	// Callback receives events in real time. May be nil.
	Callback EventCallback

	// Metadata is correlation data (e.g., bead_id) stored on session events.
	Metadata map[string]string

	// SelectedProvider is the concrete provider chosen by the CLI/config layer.
	SelectedProvider string

	// SelectedRoute is the routing key used to choose the provider (for example
	// a backend pool name or direct provider name).
	SelectedRoute string

	// RequestedModel is the route key or canonical target that drove selection.
	RequestedModel string

	// RequestedModelRef is the caller-supplied model catalog reference.
	RequestedModelRef string

	// ResolvedModelRef is the resolved catalog target reference when model
	// selection came from a model_ref.
	ResolvedModelRef string

	// ResolvedModel is the resolved concrete model selected before the run.
	ResolvedModel string

	// MaxTokens is the maximum number of tokens the model may generate per turn.
	// Zero means no explicit limit (provider default applies).
	MaxTokens int

	// Temperature is the model sampling temperature for each provider call.
	// Nil means no explicit setting (provider default applies).
	Temperature *float64

	// TopP, TopK, MinP, RepetitionPenalty are model sampling fields that
	// most OpenAI-compat servers (omlx, lmstudio, vLLM, OpenRouter) accept
	// as top-level extras. Nil means no explicit setting (server default
	// applies). Setting RepetitionPenalty > 1.0 prevents exact-token loops.
	TopP              *float64
	TopK              *int
	MinP              *float64
	RepetitionPenalty *float64

	// Seed is an optional model sampling seed. Zero means unset/provider chooses.
	Seed int64

	// SamplingSource is the resolved sampling-layer attribution, plumbed
	// through to the llm.request telemetry event. See ADR-007 §5.
	SamplingSource string

	// Reasoning controls model-side reasoning with one scalar value. Empty means
	// unset; use ReasoningOff or ReasoningTokens(0) for explicit off.
	Reasoning Reasoning

	// NoStream disables streaming even if the provider supports it.
	NoStream bool

	// CachePolicy is the prompt-caching opt-out, threaded into per-call
	// Options. Valid values: "" / "default" / "off". See agent.CachePolicy*.
	CachePolicy string

	// Telemetry carries the runtime telemetry implementation. If nil, the
	// agent loop falls back to a no-op runtime.
	Telemetry telemetry.Telemetry

	// PlanningMode, when true, performs one no-tool LLM call before the main
	// tool loop. The plan response is injected as an assistant message wrapped
	// in <plan> tags so the subsequent tool loop has it as context. Failure of
	// the planning call is non-fatal: the run continues without a plan.
	PlanningMode bool

	// Compactor is called before each agent loop iteration (and after tool
	// results). If non-nil, it may compact the message history to fit within
	// the context window. Returns the (possibly compacted) messages and result.
	// The compaction package provides a ready-made implementation.
	Compactor Compactor
}

Request configures a single internal agent loop.

type Response

type Response struct {
	Content      string           `json:"content"`
	ToolCalls    []ToolCall       `json:"tool_calls,omitempty"`
	Usage        TokenUsage       `json:"usage"`
	Model        string           `json:"model"`
	FinishReason string           `json:"finish_reason"`
	Attempt      *AttemptMetadata `json:"attempt,omitempty"`
}

Response is the result of a single internal provider Chat call.

type Result

type Result struct {
	// Status indicates whether the run succeeded.
	Status Status `json:"status"`

	// Output is the final text response from the model.
	Output string `json:"output"`

	// ToolCalls logs every tool execution during the run.
	ToolCalls []ToolCallLog `json:"tool_calls,omitempty"`

	// Messages is the conversation history for this run, excluding the
	// system prompt. Feed this back into Request.History to continue a session.
	Messages []Message `json:"messages,omitempty"`

	// Tokens is the accumulated token usage across all iterations.
	Tokens TokenUsage `json:"tokens"`

	// Duration is the total wall-clock time of the run.
	Duration time.Duration `json:"duration_ms"`

	// CostUSD is the estimated cost. -1 means unknown (model not in pricing table).
	// 0 means free (local model with $0 pricing entry).
	CostUSD float64 `json:"cost_usd"`

	// Model is the model that was used.
	Model string `json:"model"`

	// SelectedProvider is the concrete provider chosen before the run.
	SelectedProvider string `json:"selected_provider,omitempty"`

	// SelectedRoute is the routing key used to choose the provider.
	SelectedRoute string `json:"selected_route,omitempty"`

	// RequestedModel is the route key or canonical target that drove selection.
	RequestedModel string `json:"requested_model,omitempty"`

	// RequestedModelRef is the caller-supplied model catalog reference.
	RequestedModelRef string `json:"requested_model_ref,omitempty"`

	// ResolvedModelRef is the resolved catalog target reference.
	ResolvedModelRef string `json:"resolved_model_ref,omitempty"`

	// ResolvedModel is the resolved concrete model selected before the run.
	ResolvedModel string `json:"resolved_model,omitempty"`

	// Reasoning is the resolved model-side reasoning control for this run.
	Reasoning Reasoning `json:"reasoning,omitempty"`

	// AttemptedProviders records providers tried in order by any routing wrapper.
	AttemptedProviders []string `json:"attempted_providers,omitempty"`

	// FailoverCount records how many times routing advanced to another candidate.
	FailoverCount int `json:"failover_count,omitempty"`

	// Error is non-nil when Status is StatusError.
	Error error `json:"-"`

	// SessionID identifies the session log for this run.
	SessionID string `json:"session_id"`
}

Result is the outcome of an internal agent loop.

func Run

func Run(ctx context.Context, req Request) (Result, error)

Run executes the internal agent loop: send prompt, process tool calls, repeat until the model produces a final text response or limits are reached.

type Role

type Role string

Role identifies the sender of a message in the internal conversation history.

const (
	RoleSystem    Role = "system"
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	RoleTool      Role = "tool"
)

type RoutingReport

type RoutingReport struct {
	SelectedProvider   string   `json:"selected_provider,omitempty"`
	SelectedRoute      string   `json:"selected_route,omitempty"`
	AttemptedProviders []string `json:"attempted_providers,omitempty"`
	FailoverCount      int      `json:"failover_count,omitempty"`
}

RoutingReport summarizes dynamic routing behavior from internal wrapper providers.

type RoutingReporter

type RoutingReporter interface {
	RoutingReport() RoutingReport
}

RoutingReporter is implemented by internal providers that can expose route-attribution details such as failover attempts.
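Since only some providers implement RoutingReporter, callers discover it with a type assertion. The sketch below is self-contained: RoutingReport and RoutingReporter are local stand-ins mirroring the types above, and failoverProvider is a hypothetical wrapper invented for illustration.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented RoutingReport / RoutingReporter.
type RoutingReport struct {
	SelectedProvider   string
	AttemptedProviders []string
	FailoverCount      int
}

type RoutingReporter interface{ RoutingReport() RoutingReport }

// failoverProvider is a hypothetical routing wrapper that remembers the
// providers it tried, in order.
type failoverProvider struct{ tried []string }

func (p *failoverProvider) RoutingReport() RoutingReport {
	return RoutingReport{
		SelectedProvider:   p.tried[len(p.tried)-1],
		AttemptedProviders: p.tried,
		FailoverCount:      len(p.tried) - 1,
	}
}

// describeRouting checks whether a provider exposes route attribution.
func describeRouting(p any) string {
	if r, ok := p.(RoutingReporter); ok {
		rep := r.RoutingReport()
		return fmt.Sprintf("%s after %d failover(s)", rep.SelectedProvider, rep.FailoverCount)
	}
	return "no routing report"
}

func main() {
	fmt.Println(describeRouting(&failoverProvider{tried: []string{"openai", "anthropic"}}))
	fmt.Println(describeRouting(struct{}{})) // plain provider: no report
}
```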

type SeamOptions

type SeamOptions struct{}

type Status

type Status string

Status represents the outcome of an internal agent loop.

const (
	StatusSuccess        Status = "success"
	StatusIterationLimit Status = "iteration_limit"
	StatusCancelled      Status = "cancelled"
	StatusError          Status = "error"
)

type StreamDelta

type StreamDelta struct {
	// ArrivedAt records when the provider produced the delta.
	// It is omitted from JSON and used only for local timing measurements.
	ArrivedAt time.Time `json:"-"`

	// Content is a text fragment (may be empty for tool call chunks).
	Content string `json:"content,omitempty"`

	// ReasoningContent holds thinking/reasoning tokens emitted by models that
	// separate their internal reasoning from the final response (e.g. Qwen3,
	// DeepSeek-R1). Captured from choices[0].delta.reasoning_content in the
	// OpenAI-compatible streaming format.
	ReasoningContent string `json:"reasoning_content,omitempty"`

	// ToolCallID is set when a new tool call starts.
	ToolCallID string `json:"tool_call_id,omitempty"`
	// ToolCallName is set on the first delta of a tool call.
	ToolCallName string `json:"tool_call_name,omitempty"`
	// ToolCallArgs is a fragment of the tool call's JSON arguments.
	ToolCallArgs string `json:"tool_call_args,omitempty"`

	// Usage may be set on any delta, including before Done.
	// Providers can emit incremental usage updates; consumers should merge them.
	Usage *TokenUsage `json:"usage,omitempty"`

	// FinishReason is set on the final delta.
	FinishReason string `json:"finish_reason,omitempty"`

	// Model is set on the first or final delta.
	Model string `json:"model,omitempty"`

	// Attempt carries provider identity and attribution metadata when known.
	Attempt *AttemptMetadata `json:"attempt,omitempty"`

	// Done signals the end of the stream.
	Done bool `json:"done,omitempty"`

	// Err is set when the stream terminated with an error.
	// consumeStream returns this error and discards any partial content.
	Err error `json:"err,omitempty"`
}

StreamDelta is a single chunk from a streaming response.

type StreamingProvider

type StreamingProvider interface {
	Provider
	ChatStream(ctx context.Context, messages []Message, tools []ToolDef, opts Options) (<-chan StreamDelta, error)
}

StreamingProvider extends Provider with streaming support. Providers that implement this interface will be used in streaming mode by the agent loop when Request.NoStream is false.

type TimingBreakdown

type TimingBreakdown struct {
	FirstToken *time.Duration `json:"first_token,omitempty"`
	Queue      *time.Duration `json:"queue,omitempty"`
	Prefill    *time.Duration `json:"prefill,omitempty"`
	Generation *time.Duration `json:"generation,omitempty"`
	CacheRead  *time.Duration `json:"cache_read,omitempty"`
	CacheWrite *time.Duration `json:"cache_write,omitempty"`
}

TimingBreakdown captures optional provider timing windows for one internal attempt.

type TokenUsage

type TokenUsage struct {
	Input      int `json:"input"`
	Output     int `json:"output"`
	CacheRead  int `json:"cache_read,omitempty"`
	CacheWrite int `json:"cache_write,omitempty"`
	Total      int `json:"total"`
}

TokenUsage tracks input and output token counts for the internal loop API.

func (*TokenUsage) Add

func (u *TokenUsage) Add(other TokenUsage)

Add accumulates token counts from another TokenUsage.
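A short self-contained sketch of the accumulation pattern, using a local TokenUsage stand-in; the field-by-field summing here is an assumption consistent with the documented fields, not the package's verified implementation.

```go
package main

import "fmt"

// Local stand-in mirroring the documented TokenUsage.
type TokenUsage struct {
	Input, Output, CacheRead, CacheWrite, Total int
}

// Add sums each counter field into the receiver (assumed behavior).
func (u *TokenUsage) Add(other TokenUsage) {
	u.Input += other.Input
	u.Output += other.Output
	u.CacheRead += other.CacheRead
	u.CacheWrite += other.CacheWrite
	u.Total += other.Total
}

func main() {
	// Accumulate per-iteration usage into a run-level total, as the agent
	// loop does across iterations.
	var run TokenUsage
	for _, turn := range []TokenUsage{
		{Input: 1200, Output: 300, Total: 1500},
		{Input: 1600, Output: 250, CacheRead: 900, Total: 1850},
	} {
		run.Add(turn)
	}
	fmt.Printf("%+v\n", run)
}
```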

type Tool

type Tool interface {
	// Name returns the tool's identifier.
	Name() string
	// Description returns a human-readable description for the LLM.
	Description() string
	// Schema returns the JSON Schema for the tool's parameters.
	Schema() json.RawMessage
	// Execute runs the tool with the given parameters and returns the result.
	Execute(ctx context.Context, params json.RawMessage) (string, error)
	// Parallel reports whether this tool is safe to execute concurrently with
	// other parallel-flagged tools. Read-only tools return true; tools with
	// side effects (writes, shell commands, sub-agents) return false.
	Parallel() bool
}

Tool is the internal interface that agent tools implement.

type ToolCall

type ToolCall struct {
	ID        string          `json:"id"`
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
}

ToolCall represents a tool invocation requested by the model in the internal loop API.

type ToolCallLog

type ToolCallLog struct {
	Tool     string          `json:"tool"`
	Input    json.RawMessage `json:"input"`
	Output   string          `json:"output"`
	Duration time.Duration   `json:"duration_ms"`
	Error    string          `json:"error,omitempty"`
}

ToolCallLog records one tool execution during an internal agent loop.

type ToolDef

type ToolDef struct {
	Name        string          `json:"name"`
	Description string          `json:"description"`
	Parameters  json.RawMessage `json:"parameters"` // JSON Schema
}

ToolDef describes a tool for the internal LLM provider interface.
