agent

package
v0.1.0

Warning: This package is not in the latest version of its module.
Published: May 14, 2026 License: Apache-2.0 Imports: 16 Imported by: 0

Documentation

Overview

Package agent provides pluggable LLM providers and a multi-turn tool-use agent loop.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BackoffDuration

func BackoffDuration(cfg RetryConfig, attempt int, headers http.Header) time.Duration

BackoffDuration calculates the next backoff duration. If the response includes a Retry-After header, that value is used instead.

func IsRetryable

func IsRetryable(statusCode int) bool

IsRetryable returns true if the HTTP status code is retryable.
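A typical policy for such a predicate looks like the sketch below; the exact status set the package uses is an assumption here (rate limiting plus transient server errors is the conventional choice).

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryable: an illustrative retry policy covering rate limiting (429)
// and transient server-side failures (500, 502, 503, 504).
func isRetryable(statusCode int) bool {
	switch statusCode {
	case http.StatusTooManyRequests, // 429
		http.StatusInternalServerError, // 500
		http.StatusBadGateway,          // 502
		http.StatusServiceUnavailable,  // 503
		http.StatusGatewayTimeout:      // 504
		return true
	}
	return false
}

func main() {
	fmt.Println(isRetryable(429), isRetryable(503), isRetryable(400)) // true true false
}
```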

func SleepWithContext

func SleepWithContext(ctx context.Context, d time.Duration) error

SleepWithContext sleeps for the given duration, respecting context cancellation.

Types

type AnthropicProvider

type AnthropicProvider struct {
	BaseURL    string
	APIKey     string
	HTTPClient *http.Client
	Retry      RetryConfig
}

AnthropicProvider talks to the Anthropic Messages API directly.

func NewAnthropicProvider

func NewAnthropicProvider() *AnthropicProvider

NewAnthropicProvider creates an AnthropicProvider from environment variables.

func (*AnthropicProvider) Chat

func (p *AnthropicProvider) Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)

Chat sends a chat completion request to the Anthropic Messages API.

func (*AnthropicProvider) Name

func (p *AnthropicProvider) Name() string

type BifrostProvider

type BifrostProvider struct {
	BaseURL    string
	HTTPClient *http.Client
	Retry      RetryConfig
	// contains filtered or unexported fields
}

BifrostProvider talks to any LLM via an OpenAI-compatible API proxy.

func NewBifrostProvider

func NewBifrostProvider() *BifrostProvider

NewBifrostProvider creates a BifrostProvider from environment variables. Bifrost is an OpenAI-compatible gateway that handles multi-provider routing. Point INFRA_BENCH_BIFROST_URL at your Bifrost instance. Set INFRA_BENCH_BIFROST_RPM to throttle requests (e.g. "10" for 10 req/min).

func (*BifrostProvider) Chat

func (p *BifrostProvider) Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)

Chat sends a chat completion request to the Bifrost gateway with adaptive retry.

func (*BifrostProvider) Name

func (p *BifrostProvider) Name() string

type ChatRequest

type ChatRequest struct {
	Model       string    `json:"model"`
	Messages    []Message `json:"messages"`
	Tools       []ToolDef `json:"tools,omitempty"`
	Temperature float64   `json:"temperature"`
	MaxTokens   int       `json:"max_tokens,omitempty"`
}

ChatRequest is the request payload for a single turn in a conversation.

type ChatResponse

type ChatResponse struct {
	Content          string     `json:"content,omitempty"`
	ReasoningContent string     `json:"reasoning_content,omitempty"` // DeepSeek Reasoner
	ToolCalls        []ToolCall `json:"tool_calls,omitempty"`
	Usage            Usage      `json:"usage"`
}

ChatResponse is the model's response to a ChatRequest.

func (*ChatResponse) Done

func (r *ChatResponse) Done() bool

Done returns true if the model is finished (no more tool calls).

type ClaudeProvider

type ClaudeProvider struct{}

ClaudeProvider uses the Claude CLI (`claude -p`) as an LLM backend.

Claude CLI doesn't support arbitrary tool definitions via flags like the OpenAI API does. Instead, we embed tool schemas in the system prompt and instruct Claude to output structured JSON tool calls. The stream-json output format captures tool_use events when Claude invokes its built-in tools, and we parse structured JSON blocks for our custom tools.

func NewClaudeProvider

func NewClaudeProvider() *ClaudeProvider

NewClaudeProvider creates a new ClaudeProvider.

func (*ClaudeProvider) Chat

func (p *ClaudeProvider) Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)

Chat sends a prompt to Claude CLI and parses the response. Tools from req.Tools are embedded in the system prompt as structured descriptions so Claude knows what's available.

func (*ClaudeProvider) Name

func (p *ClaudeProvider) Name() string

type CostEstimate

type CostEstimate struct {
	InputTokens  int
	OutputTokens int
	InputCost    float64
	OutputCost   float64
	TotalCost    float64
	Model        string
	Currency     string
}

CostEstimate is the estimated cost for a run.

func EstimateCost

func EstimateCost(model string, usage Usage) CostEstimate

EstimateCost calculates the cost for a given token usage and model. Cache tokens are included: creation tokens at full price, read tokens at 10% (standard cache discount).
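The cache arithmetic described above can be sketched like this; the rates in the example are placeholders, not real model prices, and the helper is a local illustration rather than the package's implementation.

```go
package main

import "fmt"

// estimateCost applies the documented cache rules: cache-creation tokens are
// billed at the full input rate, cache-read tokens at 10% of it.
// inputPerM/outputPerM are USD per 1M tokens.
func estimateCost(inputPerM, outputPerM float64, prompt, completion, cacheCreate, cacheRead int) float64 {
	in := float64(prompt+cacheCreate)*inputPerM/1e6 +
		float64(cacheRead)*inputPerM*0.10/1e6
	out := float64(completion) * outputPerM / 1e6
	return in + out
}

func main() {
	// 1M prompt tokens at $3/M plus 1M cache-read tokens at 10% of that rate,
	// plus 100k completion tokens at $15/M: 3.00 + 0.30 + 1.50 = $4.80.
	fmt.Printf("$%.2f\n", estimateCost(3.0, 15.0, 1_000_000, 100_000, 0, 1_000_000))
}
```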

func (CostEstimate) String

func (c CostEstimate) String() string

String formats the cost estimate for display.

type Executor

type Executor interface {
	Execute(ctx context.Context, tc ToolCall) string
}

Executor runs tool calls. Implemented by ToolExecutor.

type LoopConfig

type LoopConfig struct {
	Provider     Provider
	Executor     Executor
	Model        string
	MaxTurns     int
	MemoryWindow int // -1 = full history (default), 0 = stateless, N = keep last N exchanges
	Temperature  float64
	MaxTokens    int
	SystemPrompt string
	TaskPrompt   string

	// Tools overrides the default tool definitions. Used by MCPExecutor to pass
	// MCP server tools.
	Tools []ToolDef

	// InjectChan receives user messages to inject mid-run (e.g., stage agent_goal).
	// Non-blocking: drained after each tool execution round. Nil = disabled.
	InjectChan <-chan Message
	// MemoryResetChan receives memory window changes mid-run (e.g., break.memory).
	// Values: 0 = reset (system+task only), N = compact to last N exchanges.
	// Non-blocking: checked after each tool execution round. Nil = disabled.
	MemoryResetChan <-chan int
}

LoopConfig configures the agent loop.
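The MemoryWindow semantics (-1 = full history, 0 = stateless, N = keep last N exchanges) can be illustrated with a trimming helper. This is a sketch under simplifying assumptions: it treats each message as one "exchange" and assumes the first two messages are the system and task prompts, which always survive trimming.

```go
package main

import "fmt"

type Message struct {
	Role    string
	Content string
}

// trimHistory keeps the system and task prompts (the first two messages),
// then applies the window: negative keeps everything, 0 keeps only the
// preamble, and N keeps the last N messages after the preamble.
func trimHistory(msgs []Message, window int) []Message {
	const preamble = 2 // system prompt + task prompt
	if window < 0 || len(msgs) <= preamble {
		return msgs
	}
	tail := msgs[preamble:]
	if window < len(tail) {
		tail = tail[len(tail)-window:]
	}
	// Full slice expression forces a copy instead of aliasing msgs.
	return append(msgs[:preamble:preamble], tail...)
}

func main() {
	msgs := []Message{
		{Role: "system"}, {Role: "user", Content: "task"},
		{Role: "assistant", Content: "a1"}, {Role: "tool", Content: "t1"},
		{Role: "assistant", Content: "a2"},
	}
	fmt.Println(len(trimHistory(msgs, -1))) // 5: full history
	fmt.Println(len(trimHistory(msgs, 0)))  // 2: system + task only
	fmt.Println(len(trimHistory(msgs, 2)))  // 4: preamble + last 2
}
```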

type LoopResult

type LoopResult struct {
	Turns        int
	Messages     []Message
	TotalUsage   Usage
	FinalOutput  string
	Completed    bool
	Duration     time.Duration
	MemoryWindow int
}

LoopResult is the outcome of the agent loop.

func RunLoop

func RunLoop(ctx context.Context, cfg LoopConfig) (*LoopResult, error)

RunLoop executes the multi-turn tool-use agent loop.

type MCPExecutor

type MCPExecutor struct {
	// contains filtered or unexported fields
}

MCPExecutor routes tool calls through an MCP server process.

func NewMCPExecutor

func NewMCPExecutor(ctx context.Context, command string, extraEnv []string) (*MCPExecutor, error)

NewMCPExecutor starts an MCP server subprocess and connects via stdio. The command is split on spaces (e.g. "MCP server --signing-mode optional"). Extra env vars (e.g., KUBECONFIG) are injected into the subprocess.


func (*MCPExecutor) Close

func (e *MCPExecutor) Close() error

Close terminates the MCP server session.

func (*MCPExecutor) Execute

func (e *MCPExecutor) Execute(ctx context.Context, tc ToolCall) string

Execute calls a tool on the MCP server and returns the result text.

func (*MCPExecutor) Tools

func (e *MCPExecutor) Tools(ctx context.Context) ([]ToolDef, error)

Tools returns the tool definitions from the MCP server, converted to the harness ToolDef format for the LLM.

type Message

type Message struct {
	Role             string     `json:"role"`
	Content          string     `json:"content,omitempty"`
	ReasoningContent string     `json:"reasoning_content,omitempty"` // DeepSeek Reasoner thinking tokens
	ToolCalls        []ToolCall `json:"tool_calls,omitempty"`
	ToolCallID       string     `json:"tool_call_id,omitempty"`
}

Message is a single message in the conversation.

type ModelPricing

type ModelPricing struct {
	InputPerMillion         float64
	OutputPerMillion        float64
	CacheHitInputPerMillion float64
}

ModelPricing holds per-token pricing for a model (USD per 1M tokens).

func LookupPricing

func LookupPricing(model string) ModelPricing

LookupPricing returns pricing for a model. Returns zero pricing for unknown models. Uses longest-prefix-match to avoid ambiguity (e.g. "openai/gpt-4o-mini" matches "openai/gpt-4o-mini" not "openai/gpt-4o").
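The longest-prefix-match behavior can be sketched with a local helper over a placeholder table (the prices below are not real rates, and the actual table layout inside the package is an assumption): among all keys that prefix the model name, the longest wins, so "openai/gpt-4o-mini" never falls through to "openai/gpt-4o".

```go
package main

import (
	"fmt"
	"strings"
)

type ModelPricing struct {
	InputPerMillion  float64
	OutputPerMillion float64
}

// lookupPricing picks the longest table key that prefixes the model name,
// returning the zero value when nothing matches.
func lookupPricing(table map[string]ModelPricing, model string) ModelPricing {
	var best string
	for prefix := range table {
		if strings.HasPrefix(model, prefix) && len(prefix) > len(best) {
			best = prefix
		}
	}
	return table[best] // zero pricing for unknown models
}

func main() {
	table := map[string]ModelPricing{
		"openai/gpt-4o":      {InputPerMillion: 2.50},
		"openai/gpt-4o-mini": {InputPerMillion: 0.15},
	}
	fmt.Println(lookupPricing(table, "openai/gpt-4o-mini-2024").InputPerMillion) // 0.15
	fmt.Println(lookupPricing(table, "openai/gpt-4o-2024").InputPerMillion)      // 2.5
	fmt.Println(lookupPricing(table, "unknown/model").InputPerMillion)           // 0
}
```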

type Provider

type Provider interface {
	Name() string
	Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)
}

Provider sends messages to an LLM and gets responses.

func ResolveProvider

func ResolveProvider(name string) (Provider, error)

ResolveProvider returns a Provider by name.

type RateLimitError

type RateLimitError struct {
	StatusCode int
	Body       string
	RetryAfter time.Duration
}

RateLimitError wraps a rate limit response for logging.

func (*RateLimitError) Error

func (e *RateLimitError) Error() string

type RetryConfig

type RetryConfig struct {
	MaxRetries     int           // max retry attempts (default: 3)
	InitialBackoff time.Duration // starting backoff (default: 2s)
	MaxBackoff     time.Duration // ceiling for backoff (default: 60s)
	Multiplier     float64       // backoff multiplier (default: 2.0)
}

RetryConfig configures adaptive retry behavior.

func DefaultRetryConfig

func DefaultRetryConfig() RetryConfig

DefaultRetryConfig returns sensible defaults.

type ToolCall

type ToolCall struct {
	ID        string `json:"id"`
	Name      string `json:"name"`
	Arguments string `json:"arguments"`
}

ToolCall is a function call requested by the model.

type ToolDef

type ToolDef struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

ToolDef defines a tool the model can call.

func BenchTools

func BenchTools() []ToolDef

BenchTools returns the tool definitions exposed to the LLM.

type ToolExecutor

type ToolExecutor struct {
	KubeconfigPath string
	ExtraEnv       []string // Additional env vars for commands (e.g., AWS_ENDPOINT_URL)
}

ToolExecutor runs tool calls against the real environment.

func (*ToolExecutor) Execute

func (e *ToolExecutor) Execute(ctx context.Context, tc ToolCall) string

Execute runs a single tool call and returns the result string.

type Usage

type Usage struct {
	PromptTokens             int `json:"prompt_tokens"`
	CompletionTokens         int `json:"completion_tokens"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens,omitempty"`
	CacheReadInputTokens     int `json:"cache_read_input_tokens,omitempty"`
	PromptCacheHitTokens     int `json:"prompt_cache_hit_tokens,omitempty"`
	PromptCacheMissTokens    int `json:"prompt_cache_miss_tokens,omitempty"`
}

Usage tracks token consumption including prompt cache tokens.
