agent

package
v0.2.1
Published: May 22, 2026 | License: MIT | Imports: 13 | Imported by: 0

Documentation

Overview

Package agent runs the provider-agnostic tool-use loop that backs `sfw-mcp audit`. It pulls tool specs and handlers from internal/tools so the agent never drifts from the MCP serve surface: anything an external Claude Code / Cursor / Zed agent can call, the in-process agent can call too, and vice versa.

Constants

const (
	VerdictMatch      = "MATCH"
	VerdictSuspicious = "SUSPICIOUS"
	VerdictLie        = "LIE"
	VerdictError      = "ERROR"
)
const DefaultMaxSteps = 8

DefaultMaxSteps caps the tool-use loop so a runaway plan cannot rack up unbounded LLM bills or block forever. Each step is one model call plus all the tool calls that came back from it; eight leaves enough headroom for an audit to investigate a handful of files at varying depth while keeping the budget tight.

const DefaultMaxTokens = 2048

DefaultMaxTokens is the per-turn output cap. It is sized for an audit verdict plus a few hundred bytes of evidence; the agent ends a turn when the model emits its final verdict object, not when it hits max_tokens.

const MaxCommitMsgRunes = 2000

MaxCommitMsgRunes caps the commit message that goes into the user turn. 2000 matches v3's pre-audit truncation; longer messages get "[TRUNCATED]" appended so the agent knows the input was clipped.
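
The truncation rule above could be sketched roughly as follows. This is an illustration only, not the package's code: clipCommitMsg is a hypothetical helper name, and the sketch assumes the limit counts runes (not bytes) and that the "[TRUNCATED]" marker is appended after the first 2000 runes without counting toward the limit.

```go
package main

import "fmt"

// maxCommitMsgRunes mirrors the documented MaxCommitMsgRunes value.
const maxCommitMsgRunes = 2000

// truncatedMarker is the suffix the documentation says is appended to clipped messages.
const truncatedMarker = "[TRUNCATED]"

// clipCommitMsg is a hypothetical helper. It keeps at most limit runes of msg
// and appends the marker only when something was actually cut off.
// Assumption: the marker does not count toward the limit.
func clipCommitMsg(msg string, limit int) string {
	runes := []rune(msg) // convert so multi-byte characters are never split
	if len(runes) <= limit {
		return msg
	}
	return string(runes[:limit]) + truncatedMarker
}

func main() {
	// A short message passes through unchanged.
	fmt.Println(clipCommitMsg("fix: handle nil pointer", maxCommitMsgRunes))

	// An over-long message made of multi-byte runes is clipped on a rune boundary.
	long := make([]rune, maxCommitMsgRunes+50)
	for i := range long {
		long[i] = 'é'
	}
	clipped := clipCommitMsg(string(long), maxCommitMsgRunes)
	fmt.Println(len([]rune(clipped)))
}
```

Counting runes rather than bytes keeps the cut from landing in the middle of a UTF-8 sequence, which is presumably why the constant is named in runes.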

Variables

This section is empty.

Functions

func WithModel

func WithModel(ctx context.Context, model string) context.Context

WithModel returns ctx tagged with the requested model name.
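
One common way to tag a context with a value is context.WithValue keyed by an unexported type. The sketch below assumes that approach; the documentation does not say how WithModel stores the name or how it is read back, so modelKey, withModel, and modelFrom are hypothetical stand-ins.

```go
package main

import (
	"context"
	"fmt"
)

// modelKey is an unexported key type, so values stored by this sketch cannot
// collide with context keys defined in other packages.
type modelKey struct{}

// withModel is a stand-in for WithModel; the real storage mechanism is an assumption.
func withModel(ctx context.Context, model string) context.Context {
	return context.WithValue(ctx, modelKey{}, model)
}

// modelFrom is a hypothetical accessor; the package does not document one.
func modelFrom(ctx context.Context) (string, bool) {
	m, ok := ctx.Value(modelKey{}).(string)
	return m, ok
}

// demoModel tags a background context and reads the tag back.
func demoModel() (string, bool) {
	ctx := withModel(context.Background(), "example-model")
	return modelFrom(ctx)
}

func main() {
	if m, ok := demoModel(); ok {
		fmt.Println("model:", m)
	}
}
```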

Types

type AgentUsage added in v0.2.0

type AgentUsage struct {
	ToolCalls     int
	ModelSteps    int
	ProviderUsage provider.Usage
}

AgentUsage records the resource cost of a Run call. It is populated on every return path -- success, max_tokens abort, step-budget exhaustion, mid-loop provider error -- so the caller never loses sight of what was already spent before a failure. That is the "runaway-bill hides in the failure path" hole the v0.2 release explicitly closes.

func Run

func Run(ctx context.Context, p provider.Provider, system, user string, seed []provider.Message, opts LoopOptions) (string, AgentUsage, error)

Run executes the tool-use loop until the model emits end_turn, the step budget is exhausted, or a tool call returns a fatal error. Returns the accumulated assistant text from the final turn (empty on any non-end_turn exit) and the running AgentUsage regardless of which path produced the result. The third return is the error, if any.

seed lets the caller pre-populate the conversation with prior turns. The audit harness uses this to inject a pre-computed sfw_diff result as turn 1, so the model never has to call the tool itself for the audit's primary file pair, and risk_evidence and the model's view share one source of truth.

Tools come from tools.All() so the agent automatically inherits every tool the MCP server publishes. A separate "agent-only" tool list would invite drift; we explicitly want one source of truth.
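
The "usage on every return path" contract can be illustrated with a much-reduced loop. This is a sketch under stated assumptions, not the package's implementation: usage omits ProviderUsage, turn and modelFunc are hypothetical stand-ins for the provider types, tool execution is elided, and whether a failed model call counts as a step is a guess.

```go
package main

import (
	"errors"
	"fmt"
)

// usage is a trimmed stand-in for AgentUsage (ProviderUsage omitted).
type usage struct {
	ToolCalls  int
	ModelSteps int
}

// turn is a hypothetical model reply: either final text or tool names to call.
type turn struct {
	Text  string
	Tools []string
	Done  bool
}

// modelFunc stands in for one provider call.
type modelFunc func(step int) (turn, error)

var errStepBudget = errors.New("step budget exhausted")

// runLoop returns the running usage on every path: success, provider error,
// and step-budget exhaustion.
func runLoop(model modelFunc, maxSteps int) (string, usage, error) {
	var u usage
	for step := 0; step < maxSteps; step++ {
		t, err := model(step)
		if err != nil {
			return "", u, err // what was spent so far is still reported
		}
		u.ModelSteps++ // assumption: only completed model calls count as steps
		if t.Done {
			return t.Text, u, nil
		}
		u.ToolCalls += len(t.Tools) // tool execution itself is elided here
	}
	return "", u, errStepBudget
}

// scripted builds a fake model that requests one tool per step until finishAt.
func scripted(finishAt int) modelFunc {
	return func(step int) (turn, error) {
		if step >= finishAt {
			return turn{Text: "final verdict", Done: true}, nil
		}
		return turn{Tools: []string{"sfw_diff"}}, nil
	}
}

func main() {
	text, u, err := runLoop(scripted(2), 8)
	fmt.Println(text, u, err)

	_, u, err = runLoop(scripted(100), 8)
	fmt.Println(u, err)
}
```

Returning the usage struct by value alongside the error, rather than only on success, is what lets a caller log the spend even when the run fails.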

type AuditInputs added in v0.2.0

type AuditInputs struct {
	OldFile       string `json:"old_file"`
	NewFile       string `json:"new_file"`
	CommitMessage string `json:"commit_message"`
}

AuditInputs echoes the parameters the audit was invoked with so a CI log can be diffed across runs without consulting the workflow file.

type AuditOutput added in v0.2.0

type AuditOutput struct {
	Inputs         AuditInputs     `json:"inputs"`
	RiskEvidence   risk.Evidence   `json:"risk_evidence"`
	LLMAssessments []LLMAssessment `json:"llm_assessments"`
	Cost           Cost            `json:"cost"`
}

AuditOutput is the v0.2 audit response shape. The structure draws a deliberate line between deterministic, math-only signals (RiskEvidence) and non-deterministic LLM judgment (LLMAssessments). An operator who only trusts the math can pipe the JSON through `jq .risk_evidence` and ignore everything else; the LLM verdict is advisory unless the operator opts into a stricter gate.

LLMAssessments is intentionally an array even though it carries exactly one entry today. The provider-disagreement mode (v0.3) adds a second entry; making this a one-element list today means that change is purely additive instead of another schema break.
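
A consumer that wants to keep the deterministic and LLM halves apart might decode the JSON along these lines. This is a hedged sketch: the shape of risk.Evidence is not shown on this page, so it is kept as raw JSON, and the sample payload (including the provider and model names) is invented purely for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// auditView decodes only the parts of the audit JSON this sketch needs.
// risk_evidence stays raw because its schema is not documented here.
type auditView struct {
	RiskEvidence   json.RawMessage `json:"risk_evidence"`
	LLMAssessments []struct {
		Provider string `json:"provider"`
		Model    string `json:"model"`
		Verdict  string `json:"verdict"`
	} `json:"llm_assessments"`
}

// sampleAudit is an invented payload; the empty risk_evidence object is a placeholder.
const sampleAudit = `{"inputs":{},"risk_evidence":{},"llm_assessments":[{"provider":"example","model":"example-model","verdict":"MATCH","evidence":""}],"cost":{"tool_calls":0,"model_steps":1,"input_tokens":0,"output_tokens":0}}`

// riskAndVerdicts returns the raw deterministic evidence and one line per
// assessment. Ranging over the slice means a second entry needs no code change.
func riskAndVerdicts(raw []byte) (json.RawMessage, []string, error) {
	var v auditView
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, nil, err
	}
	lines := make([]string, 0, len(v.LLMAssessments))
	for _, a := range v.LLMAssessments {
		lines = append(lines, a.Provider+"/"+a.Model+": "+a.Verdict)
	}
	return v.RiskEvidence, lines, nil
}

func main() {
	risk, lines, err := riskAndVerdicts([]byte(sampleAudit))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("risk_evidence:", string(risk))
	for _, l := range lines {
		fmt.Println(l)
	}
}
```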

func RunAudit

func RunAudit(ctx context.Context, p provider.Provider, model, oldPath, newPath, commitMsg string, opts LoopOptions) AuditOutput

RunAudit is the v0.2 entry point. It pre-computes the diff deterministically, injects it into the agent's conversation as a seeded sfw_diff result so risk_evidence and what the model sees share a single source of truth, runs the agent loop against the configured provider, and assembles the AuditOutput. Cost is surfaced on every return path -- success, model parse failure, step-budget exhaustion, provider outage.

func (AuditOutput) ExitCode added in v0.2.0

func (o AuditOutput) ExitCode() int

ExitCode maps the audit result to the tri-state exit code reserved for v0.2:

0 -- MATCH (clean: both deterministic and LLM agree no issue)
1 -- LIE or SUSPICIOUS (the tool has an opinion)
2 -- ERROR (the tool itself broke -- provider 503, parse failure)

The split lets operators distinguish "Anthropic was 503 for ten minutes" from "Claude thinks this is a lie". A CI workflow that wants to treat infra outages as soft-fail can key on `exit == 2` without conflating it with a real verdict.
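
The mapping above can be written as a small switch. The sketch below is an approximation: exitCodeFor is a hypothetical function that looks only at the verdict string, whereas the real ExitCode operates on the whole AuditOutput, and treating unknown verdicts as exit 2 is an assumption.

```go
package main

import "fmt"

// Verdict strings as documented in the package constants.
const (
	verdictMatch      = "MATCH"
	verdictSuspicious = "SUSPICIOUS"
	verdictLie        = "LIE"
	verdictError      = "ERROR"
)

// exitCodeFor maps a verdict to the documented tri-state exit code.
// Assumption: anything that is not a recognised verdict is treated as a tool failure.
func exitCodeFor(verdict string) int {
	switch verdict {
	case verdictMatch:
		return 0 // clean
	case verdictLie, verdictSuspicious:
		return 1 // the tool has an opinion
	default:
		return 2 // ERROR, or an unrecognised verdict
	}
}

func main() {
	for _, v := range []string{verdictMatch, verdictSuspicious, verdictLie, verdictError} {
		fmt.Printf("%-10s -> exit %d\n", v, exitCodeFor(v))
	}
}
```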

func (AuditOutput) PrimaryAssessment added in v0.2.0

func (o AuditOutput) PrimaryAssessment() LLMAssessment

PrimaryAssessment returns the first LLMAssessment, or a synthetic ERROR assessment when none ran. Convenience for callers (notably the CLI exit-code logic) that need a single verdict to act on from the array shape.
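
The fallback described above might look like the following. It is a sketch only: llmAssessment mirrors the documented fields, primary is a hypothetical free function rather than the real method, and the error text in the synthetic assessment is invented.

```go
package main

import "fmt"

// llmAssessment mirrors the documented LLMAssessment fields.
type llmAssessment struct {
	Provider string
	Model    string
	Verdict  string
	Evidence string
	Error    string
}

// primary returns the first assessment, or a synthetic ERROR entry when the
// slice is empty. The error message is a placeholder, not the package's wording.
func primary(as []llmAssessment) llmAssessment {
	if len(as) == 0 {
		return llmAssessment{Verdict: "ERROR", Error: "no assessment ran"}
	}
	return as[0]
}

func main() {
	fmt.Println(primary(nil).Verdict)
	fmt.Println(primary([]llmAssessment{{Provider: "example", Verdict: "MATCH"}}).Verdict)
}
```

Returning a value instead of a pointer means callers never need a nil check before reading Verdict.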

type Cost added in v0.2.0

type Cost struct {
	ToolCalls           int `json:"tool_calls"`
	ModelSteps          int `json:"model_steps"`
	InputTokens         int `json:"input_tokens"`
	OutputTokens        int `json:"output_tokens"`
	CacheReadTokens     int `json:"cache_read_tokens,omitempty"`
	CacheCreationTokens int `json:"cache_creation_tokens,omitempty"`
}

Cost surfaces what the audit spent. Always populated, including on failure paths -- that is where runaway-bill bugs hide.
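
To show how the struct tags above affect the JSON, the sketch below marshals a copy of the documented struct. The type is redeclared locally as cost so the example stands alone, and the numbers are made up; the two cache fields disappear from the output when they are zero because of omitempty, while the other four are always present.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cost is a local copy of the documented Cost struct, tags included.
type cost struct {
	ToolCalls           int `json:"tool_calls"`
	ModelSteps          int `json:"model_steps"`
	InputTokens         int `json:"input_tokens"`
	OutputTokens        int `json:"output_tokens"`
	CacheReadTokens     int `json:"cache_read_tokens,omitempty"`
	CacheCreationTokens int `json:"cache_creation_tokens,omitempty"`
}

// encodeCost marshals c to a JSON string; it returns "" if marshalling fails.
func encodeCost(c cost) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Illustrative numbers only.
	fmt.Println(encodeCost(cost{ToolCalls: 2, ModelSteps: 3, InputTokens: 1200, OutputTokens: 340}))
	// A zero-value cost still emits the four non-omitempty fields.
	fmt.Println(encodeCost(cost{}))
}
```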

type LLMAssessment added in v0.2.0

type LLMAssessment struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
	Verdict  string `json:"verdict"`
	Evidence string `json:"evidence"`
	// Error is populated only when this provider's run could not
	// produce a structured verdict (parse failure, network outage,
	// step-budget exhaustion). Verdict is set to VerdictError in
	// that case so consumers reading the verdict field alone still
	// see a usable signal.
	Error string `json:"error,omitempty"`
}

LLMAssessment is one provider's verdict on the audit. Identified by provider+model so cross-provider mode (v0.3) can attach multiple and the operator can tell which model said what.

type LoopOptions

type LoopOptions struct {
	MaxSteps  int
	MaxTokens int
}

LoopOptions tunes the run. Both fields fall back to the Default* constants when zero so a caller can pass a partially-set struct and still get sensible behaviour.
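
The zero-value fallback could be implemented roughly like this. The sketch redeclares the type and constants locally, and withDefaults is a hypothetical method name; the page does not show how the package applies the defaults internally.

```go
package main

import "fmt"

// Documented default values, redeclared locally for a standalone sketch.
const (
	defaultMaxSteps  = 8
	defaultMaxTokens = 2048
)

// loopOptions mirrors the documented LoopOptions fields.
type loopOptions struct {
	MaxSteps  int
	MaxTokens int
}

// withDefaults returns a copy of o with zero fields replaced by the defaults.
// The value receiver means the caller's struct is never modified.
func (o loopOptions) withDefaults() loopOptions {
	if o.MaxSteps == 0 {
		o.MaxSteps = defaultMaxSteps
	}
	if o.MaxTokens == 0 {
		o.MaxTokens = defaultMaxTokens
	}
	return o
}

func main() {
	partial := loopOptions{MaxSteps: 3}
	fmt.Printf("%+v\n", partial.withDefaults())
	fmt.Printf("%+v\n", loopOptions{}.withDefaults())
}
```

One consequence of using zero as the "unset" sentinel is that a caller cannot request zero steps or zero tokens; with this design such a request silently becomes the default.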
