package agent

Version: v0.2.1 (latest) | Published: May 22, 2026 | License: MIT | Imports: 13 | Imported by: 0

Documentation

Overview

Package agent runs the provider-agnostic tool-use loop that backs `sfw-mcp audit`. It pulls tool specs and handlers from internal/tools so the agent never drifts from the MCP serve surface: anything an external Claude Code / Cursor / Zed agent can call, the in-process agent can call too, and vice versa.

Index

Constants
func WithModel(ctx context.Context, model string) context.Context
type AgentUsage
- func Run(ctx context.Context, p provider.Provider, system, user string, ...) (string, AgentUsage, error)
type AuditInputs
type AuditOutput
- func RunAudit(ctx context.Context, p provider.Provider, ...) AuditOutput
- func (o AuditOutput) ExitCode() int
- func (o AuditOutput) PrimaryAssessment() LLMAssessment
type Cost
type LLMAssessment
type LoopOptions

Constants

const (
	VerdictMatch      = "MATCH"
	VerdictSuspicious = "SUSPICIOUS"
	VerdictLie        = "LIE"
	VerdictError      = "ERROR"
)

const DefaultMaxSteps = 8

DefaultMaxSteps caps the tool-use loop so a runaway plan cannot rack up unbounded LLM bills or block forever. Each step is one model call plus all the tool calls that came back from it; eight leaves enough headroom for an audit to investigate a handful of files at varying depth while keeping the budget tight.

const DefaultMaxTokens = 2048

DefaultMaxTokens is the per-turn output cap. It is sized for an audit verdict plus a few hundred bytes of evidence; the agent ends a turn when the model emits its final verdict object, not when it hits max_tokens.

const MaxCommitMsgRunes = 2000

MaxCommitMsgRunes caps the commit message that goes into the user turn. 2000 matches v3's pre-audit truncation; longer messages get "[TRUNCATED]" appended so the agent knows the input was clipped.
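The clipping rule above can be sketched as follows. This is a minimal illustration, not the package's code: `truncateCommitMsg` is a hypothetical helper name, and the only details taken from the documentation are the 2000-rune limit and the "[TRUNCATED]" marker. The point it shows is that the cap counts runes rather than bytes, so multi-byte characters are never split.

```go
package main

import "fmt"

// maxCommitMsgRunes mirrors the documented MaxCommitMsgRunes constant.
const maxCommitMsgRunes = 2000

// truncateCommitMsg is a hypothetical helper (not part of the package API).
// It clips msg to at most limit runes and appends the documented
// "[TRUNCATED]" marker only when something was actually cut.
func truncateCommitMsg(msg string, limit int) string {
	runes := []rune(msg) // convert once so the cut lands on a rune boundary
	if len(runes) <= limit {
		return msg
	}
	return string(runes[:limit]) + "[TRUNCATED]"
}

func main() {
	// A small limit makes the behaviour visible; real callers would
	// pass maxCommitMsgRunes.
	fmt.Println(truncateCommitMsg("fix: héllo wörld", 8))
	fmt.Println(truncateCommitMsg("short message", maxCommitMsgRunes))
}
```

Slicing the string by byte index instead would risk cutting a UTF-8 sequence in half, which is presumably why the constant is expressed in runes.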

Variables

This section is empty.

Functions

func WithModel

func WithModel(ctx context.Context, model string) context.Context

WithModel returns ctx tagged with the requested model name.

Types

type AgentUsage (added in v0.2.0)

type AgentUsage struct {
	ToolCalls     int
	ModelSteps    int
	ProviderUsage provider.Usage
}

AgentUsage records the resource cost of a Run call. It is populated on every return path -- success, max_tokens abort, step-budget exhaustion, mid-loop provider error -- so the caller never loses sight of what was already spent before a failure. That is the "runaway-bill hides in the failure path" hole the v0.2 release explicitly closes.

func Run

func Run(ctx context.Context, p provider.Provider, system, user string, seed []provider.Message, opts LoopOptions) (string, AgentUsage, error)

Run executes the tool-use loop until the model emits end_turn, the step budget is exhausted, or a tool call returns a fatal error. Returns the accumulated assistant text from the final turn (empty on any non-end_turn exit) and the running AgentUsage regardless of which path produced the result. The third return is the error, if any.

seed lets the caller pre-populate the conversation with prior turns. The audit harness uses this to inject a pre-computed sfw_diff result as turn 1. The model therefore never has to call the tool itself for the audit's primary file pair, and risk_evidence and the model's view share one source of truth.

Tools come from tools.All() so the agent automatically inherits every tool the MCP server publishes. A separate "agent-only" tool list would invite drift; we explicitly want one source of truth.
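The control flow described for Run can be sketched roughly as below. Everything here is a simplified stand-in: `usage`, `reply`, `model`, `runLoop`, and `errStepBudget` are hypothetical names, the real provider.Provider interface and tool dispatch are not shown, and the max_tokens abort path is left out. What the sketch does illustrate is the step budget and the rule that usage is returned on every path, including failures.

```go
package main

import (
	"errors"
	"fmt"
)

// usage is a stand-in for AgentUsage (ProviderUsage omitted).
type usage struct {
	ToolCalls  int
	ModelSteps int
}

// reply is a hypothetical model response.
type reply struct {
	StopReason string   // "end_turn" or "tool_use"
	Text       string   // assistant text for this turn
	ToolCalls  []string // names of the tools the model asked for
}

// model is a hypothetical one-function provider, called once per step.
type model func(step int) (reply, error)

var errStepBudget = errors.New("step budget exhausted")

// runLoop calls the model until it ends its turn, fails, or the step
// budget runs out. The usage value is returned on every path.
func runLoop(m model, maxSteps int) (string, usage, error) {
	var u usage
	for step := 0; step < maxSteps; step++ {
		r, err := m(step)
		u.ModelSteps++ // assumption: a failed call still counts as a step
		if err != nil {
			return "", u, err
		}
		if r.StopReason == "end_turn" {
			return r.Text, u, nil
		}
		// A real loop would execute each tool here and append the
		// results to the conversation before the next model call.
		u.ToolCalls += len(r.ToolCalls)
	}
	return "", u, errStepBudget
}

func main() {
	// Scripted model: two tool-use turns, then a final verdict.
	scripted := func(step int) (reply, error) {
		if step < 2 {
			return reply{StopReason: "tool_use", ToolCalls: []string{"sfw_diff"}}, nil
		}
		return reply{StopReason: "end_turn", Text: "MATCH"}, nil
	}
	text, u, err := runLoop(scripted, 8)
	fmt.Println(text, u.ModelSteps, u.ToolCalls, err) // MATCH 3 2 <nil>
}
```

Accumulating into a single `usage` value that every `return` statement hands back is one simple way to guarantee that no failure path drops the spend so far.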

type AuditInputs (added in v0.2.0)

type AuditInputs struct {
	OldFile       string `json:"old_file"`
	NewFile       string `json:"new_file"`
	CommitMessage string `json:"commit_message"`
}

AuditInputs echoes the parameters the audit was invoked with so a CI log can be diffed across runs without consulting the workflow file.

type AuditOutput (added in v0.2.0)

type AuditOutput struct {
	Inputs         AuditInputs     `json:"inputs"`
	RiskEvidence   risk.Evidence   `json:"risk_evidence"`
	LLMAssessments []LLMAssessment `json:"llm_assessments"`
	Cost           Cost            `json:"cost"`
}

AuditOutput is the v0.2 audit response shape. The structure draws a deliberate line between deterministic, math-only signals (RiskEvidence) and non-deterministic LLM judgment (LLMAssessments). An operator who only trusts the math can pipe the JSON through `jq .risk_evidence` and ignore everything else; the LLM verdict is advisory unless the operator opts into a stricter gate.

LLMAssessments is intentionally an array even though it carries exactly one entry today. The provider-disagreement mode (v0.3) adds a second entry; making this a one-element list today means that change is purely additive instead of another schema break.
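For a Go consumer, the equivalent of `jq .risk_evidence` is to decode only that key and leave it as raw JSON. The sketch below assumes nothing about the fields inside risk.Evidence: `mathOnly` and `riskEvidence` are hypothetical names, and the `example_field` entry in the sample payload is invented purely for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mathOnly picks out the deterministic half of an audit response.
// json.Unmarshal ignores every key that has no matching field.
type mathOnly struct {
	RiskEvidence json.RawMessage `json:"risk_evidence"`
}

// riskEvidence returns the raw risk_evidence value from audit JSON.
func riskEvidence(auditJSON []byte) (json.RawMessage, error) {
	var out mathOnly
	if err := json.Unmarshal(auditJSON, &out); err != nil {
		return nil, err
	}
	return out.RiskEvidence, nil
}

// sampleAudit follows the documented top-level keys; the contents of
// risk_evidence are a placeholder, not the real schema.
const sampleAudit = `{
  "inputs": {"old_file": "a.go", "new_file": "b.go", "commit_message": "fix typo"},
  "risk_evidence": {"example_field":1},
  "llm_assessments": [{"provider": "p", "model": "m", "verdict": "MATCH", "evidence": "none"}],
  "cost": {"tool_calls": 0, "model_steps": 1, "input_tokens": 10, "output_tokens": 5}
}`

func main() {
	ev, err := riskEvidence([]byte(sampleAudit))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(string(ev))
}
```

Because the decoder skips unknown keys, a consumer written this way keeps working when a later release adds a second entry to llm_assessments.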

func RunAudit

func RunAudit(ctx context.Context, p provider.Provider, model, oldPath, newPath, commitMsg string, opts LoopOptions) AuditOutput

RunAudit is the v0.2 entry point. Pre-computes the diff deterministically, injects it into the agent's conversation as a seeded sfw_diff result so risk_evidence and what the model sees share a single source of truth, runs the agent loop against the configured provider, and assembles the AuditOutput. Cost is surfaced on every return path -- success, model parse failure, step-budget exhaustion, provider outage.

func (AuditOutput) ExitCode (added in v0.2.0)

func (o AuditOutput) ExitCode() int

ExitCode maps the audit result to the tri-state exit code reserved for v0.2:

0 -- MATCH (clean: both deterministic and LLM agree no issue)
1 -- LIE or SUSPICIOUS (the tool has an opinion)
2 -- ERROR (the tool itself broke -- provider 503, parse failure)

The split lets operators distinguish "Anthropic was 503 for ten minutes" from "Claude thinks this is a lie". A CI workflow that wants to treat infra outages as soft-fail can key on `exit == 2` without conflating it with a real verdict.
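The mapping above can be sketched as a switch on the verdict string. This is a simplified illustration: `exitCodeFor` is a hypothetical function, it keys on the verdict alone (the real method may also consult RiskEvidence), and sending an unrecognised verdict to 2 is an assumption rather than documented behaviour.

```go
package main

import "fmt"

// Verdict strings copied from the package constants.
const (
	verdictMatch      = "MATCH"
	verdictSuspicious = "SUSPICIOUS"
	verdictLie        = "LIE"
	verdictError      = "ERROR"
)

// exitCodeFor maps a verdict to the documented tri-state exit code.
func exitCodeFor(verdict string) int {
	switch verdict {
	case verdictMatch:
		return 0 // clean
	case verdictLie, verdictSuspicious:
		return 1 // the tool has an opinion
	default:
		// ERROR, plus (by assumption) any verdict we do not recognise.
		return 2
	}
}

func main() {
	for _, v := range []string{verdictMatch, verdictSuspicious, verdictLie, verdictError} {
		fmt.Printf("%-10s -> %d\n", v, exitCodeFor(v))
	}
}
```

A CI wrapper that wants soft-fail on outages can then compare the process exit status with 2 and treat only 1 as a blocking result.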

func (AuditOutput) PrimaryAssessment (added in v0.2.0)

func (o AuditOutput) PrimaryAssessment() LLMAssessment

PrimaryAssessment returns the first LLMAssessment, or a synthetic ERROR assessment when none ran. Convenience for callers (notably the CLI exit-code logic) that need a single verdict to act on from the array shape.
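A minimal sketch of the "first entry or synthetic ERROR" behaviour, using a local stand-in for LLMAssessment. The function name `primary` and the wording of the synthetic error message are assumptions; the documentation specifies only that the fallback carries an ERROR verdict.

```go
package main

import "fmt"

// assessment is a local stand-in for LLMAssessment.
type assessment struct {
	Provider string
	Model    string
	Verdict  string
	Evidence string
	Error    string
}

// primary returns the first assessment, or a synthetic ERROR entry when
// the slice is empty, so callers always have one verdict to act on.
func primary(all []assessment) assessment {
	if len(all) == 0 {
		return assessment{
			Verdict: "ERROR",
			Error:   "no assessment ran", // hypothetical message text
		}
	}
	return all[0]
}

func main() {
	fmt.Println(primary(nil).Verdict)
	fmt.Println(primary([]assessment{{Provider: "p", Model: "m", Verdict: "MATCH"}}).Verdict)
}
```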

type Cost (added in v0.2.0)

type Cost struct {
	ToolCalls           int `json:"tool_calls"`
	ModelSteps          int `json:"model_steps"`
	InputTokens         int `json:"input_tokens"`
	OutputTokens        int `json:"output_tokens"`
	CacheReadTokens     int `json:"cache_read_tokens,omitempty"`
	CacheCreationTokens int `json:"cache_creation_tokens,omitempty"`
}

Cost surfaces what the audit spent. Always populated, including on failure paths -- that is where runaway-bill bugs hide.

type LLMAssessment (added in v0.2.0)

type LLMAssessment struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
	Verdict  string `json:"verdict"`
	Evidence string `json:"evidence"`
	// Error is populated only when this provider's run could not
	// produce a structured verdict (parse failure, network outage,
	// step-budget exhaustion). Verdict is set to VerdictError in
	// that case so consumers reading the verdict field alone still
	// see a usable signal.
	Error string `json:"error,omitempty"`
}

LLMAssessment is one provider's verdict on the audit. Identified by provider+model so cross-provider mode (v0.3) can attach multiple and the operator can tell which model said what.

type LoopOptions

type LoopOptions struct {
	MaxSteps  int
	MaxTokens int
}

LoopOptions tunes the run. Both fields fall back to the Default* constants when zero so a caller can pass a partially-set struct and still get sensible behaviour.
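The zero-value fallback can be sketched as below. `loopOptions` and `withDefaults` are hypothetical local names; the default values are the documented DefaultMaxSteps and DefaultMaxTokens. Only the zero case is handled, since the documentation does not say how negative values are treated.

```go
package main

import "fmt"

// Defaults copied from the documented constants.
const (
	defaultMaxSteps  = 8
	defaultMaxTokens = 2048
)

// loopOptions is a local stand-in for LoopOptions.
type loopOptions struct {
	MaxSteps  int
	MaxTokens int
}

// withDefaults fills each zero field with its default and leaves
// caller-supplied values untouched.
func withDefaults(o loopOptions) loopOptions {
	if o.MaxSteps == 0 {
		o.MaxSteps = defaultMaxSteps
	}
	if o.MaxTokens == 0 {
		o.MaxTokens = defaultMaxTokens
	}
	return o
}

func main() {
	// Partially-set struct: MaxSteps is kept, MaxTokens falls back.
	fmt.Printf("%+v\n", withDefaults(loopOptions{MaxSteps: 4}))
	// {MaxSteps:4 MaxTokens:2048}
}
```

Taking the struct by value means the caller's copy is never mutated, so the same options value can be reused across runs.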
