Documentation ¶
Overview ¶
Package agent runs the provider-agnostic tool-use loop that backs `sfw-mcp audit`. It pulls tool specs and handlers from internal/tools so the agent never drifts from the MCP serve surface: anything an external Claude Code / Cursor / Zed agent can call, the in-process agent can call too, and vice versa.
Index ¶
Constants ¶
const (
	VerdictMatch      = "MATCH"
	VerdictSuspicious = "SUSPICIOUS"
	VerdictLie        = "LIE"
	VerdictError      = "ERROR"
)
const DefaultMaxSteps = 8
DefaultMaxSteps caps the tool-use loop so a runaway plan cannot rack up unbounded LLM bills or block forever. Each step is one model call plus all the tool calls that came back from it; eight leaves enough headroom for an audit to investigate a handful of files at varying depth while still keeping the budget tight.
const DefaultMaxTokens = 2048
DefaultMaxTokens is the per-turn output cap. Sized for an audit verdict plus a few hundred bytes of evidence; the agent ends a turn when the model emits its final verdict object, not when it hits max_tokens.
const MaxCommitMsgRunes = 2000
MaxCommitMsgRunes caps the commit message that goes into the user turn. 2000 matches v3's pre-audit truncation; longer messages get "[TRUNCATED]" appended so the agent knows the input was clipped.
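The truncation rule described above can be sketched as follows. This is a minimal illustration, not the package's own code: `truncateCommitMsg` is a hypothetical helper, and appending the marker directly to the clipped text (with no separator) is an assumption. Counting runes rather than bytes keeps multi-byte characters intact at the cut point.

```go
package main

import "fmt"

// maxCommitMsgRunes mirrors the documented MaxCommitMsgRunes constant.
const maxCommitMsgRunes = 2000

// truncateCommitMsg is a hypothetical helper: it clips msg to at most
// limit runes and appends "[TRUNCATED]" only when something was cut.
func truncateCommitMsg(msg string, limit int) string {
	runes := []rune(msg)
	if len(runes) <= limit {
		return msg // short enough: pass through unchanged
	}
	return string(runes[:limit]) + "[TRUNCATED]"
}

func main() {
	// A small limit is used here only to make the effect visible.
	fmt.Println(truncateCommitMsg("héllo wörld", 5)) // prints "héllo[TRUNCATED]"
	fmt.Println(truncateCommitMsg("fix typo", maxCommitMsgRunes))
}
```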
Variables ¶
This section is empty.
Functions ¶
Types ¶
type AgentUsage ¶ added in v0.2.0
AgentUsage records the resource cost of a Run call. It is populated on every return path -- success, max_tokens abort, step-budget exhaustion, mid-loop provider error -- so the caller never loses sight of what was already spent before a failure. That is the "runaway-bill hides in the failure path" hole the v0.2 release explicitly closes.
func Run ¶
func Run(ctx context.Context, p provider.Provider, system, user string, seed []provider.Message, opts LoopOptions) (string, AgentUsage, error)
Run executes the tool-use loop until the model emits end_turn, the step budget is exhausted, or a tool call returns a fatal error. Returns the accumulated assistant text from the final turn (empty on any non-end_turn exit) and the running AgentUsage regardless of which path produced the result. The third return is the error, if any.
seed lets the caller pre-populate the conversation with prior turns. The audit harness uses this to inject a pre-computed sfw_diff result as turn 1, so the model never has to call the tool itself for the audit's primary file pair, and risk_evidence and the model's view share one source of truth.
Tools come from tools.All() so the agent automatically inherits every tool the MCP server publishes. A separate "agent-only" tool list would invite drift; we explicitly want one source of truth.
type AuditInputs ¶ added in v0.2.0
type AuditInputs struct {
	OldFile       string `json:"old_file"`
	NewFile       string `json:"new_file"`
	CommitMessage string `json:"commit_message"`
}
AuditInputs echoes the parameters the audit was invoked with so a CI log can be diff-ed across runs without consulting the workflow file.
type AuditOutput ¶ added in v0.2.0
type AuditOutput struct {
	Inputs         AuditInputs     `json:"inputs"`
	RiskEvidence   risk.Evidence   `json:"risk_evidence"`
	LLMAssessments []LLMAssessment `json:"llm_assessments"`
	Cost           Cost            `json:"cost"`
}
AuditOutput is the v0.2 audit response shape. The structure draws a deliberate line between deterministic, math-only signals (RiskEvidence) and non-deterministic LLM judgment (LLMAssessments). An operator who only trusts the math can pipe the JSON through `jq .risk_evidence` and ignore everything else; the LLM verdict is advisory unless the operator opts into a stricter gate.
LLMAssessments is intentionally an array even though it carries exactly one entry today. The provider-disagreement mode (v0.3) adds a second entry; making this a one-element list today means that change is purely additive instead of another schema break.
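A consumer that wants to stay compatible with the planned second entry can iterate over the array rather than indexing a single element. The sketch below assumes only the JSON field names shown in the struct tags above; the local types, the `verdicts` helper, and the sample payload are illustrative. Because the fields of risk.Evidence are not shown here, the sketch keeps risk_evidence as raw JSON instead of guessing at its shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// assessment mirrors the JSON tags documented for LLMAssessment.
type assessment struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
	Verdict  string `json:"verdict"`
	Evidence string `json:"evidence"`
	Error    string `json:"error,omitempty"`
}

// auditOutput decodes only the parts this sketch needs.
type auditOutput struct {
	RiskEvidence   json.RawMessage `json:"risk_evidence"` // left opaque on purpose
	LLMAssessments []assessment    `json:"llm_assessments"`
}

// verdicts returns one verdict per assessment, however many there are.
func verdicts(raw []byte) ([]string, error) {
	var out auditOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	vs := make([]string, 0, len(out.LLMAssessments))
	for _, a := range out.LLMAssessments {
		vs = append(vs, a.Verdict)
	}
	return vs, nil
}

func main() {
	// Made-up payload for illustration; provider and model names are placeholders.
	sample := []byte(`{
		"risk_evidence": {},
		"llm_assessments": [
			{"provider": "example-provider", "model": "example-model",
			 "verdict": "MATCH", "evidence": "diff matches the message"}
		]
	}`)
	vs, err := verdicts(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(vs)
}
```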
func RunAudit ¶
func RunAudit(ctx context.Context, p provider.Provider, model, oldPath, newPath, commitMsg string, opts LoopOptions) AuditOutput
RunAudit is the v0.2 entry point. Pre-computes the diff deterministically, injects it into the agent's conversation as a seeded sfw_diff result so risk_evidence and what the model sees share a single source of truth, runs the agent loop against the configured provider, and assembles the AuditOutput. Cost is surfaced on every return path -- success, model parse failure, step-budget exhaustion, provider outage.
func (AuditOutput) ExitCode ¶ added in v0.2.0
func (o AuditOutput) ExitCode() int
ExitCode maps the audit result to the tri-state exit code reserved for v0.2:
0 -- MATCH (clean: both deterministic and LLM agree no issue)
1 -- LIE or SUSPICIOUS (the tool has an opinion)
2 -- ERROR (the tool itself broke -- provider 503, parse failure)
The split lets operators distinguish "Anthropic was 503 for ten minutes" from "Claude thinks this is a lie". A CI workflow that wants to treat infra outages as soft-fail can key on `exit == 2` without conflating it with a real verdict.
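The tri-state mapping can be sketched as a switch over the verdict string. This is a simplified illustration, not the actual ExitCode method: `exitCodeFor` is a hypothetical function, it looks at the verdict alone (the real method may also consult the deterministic evidence), and sending unrecognized verdicts to 2 is an assumption.

```go
package main

import "fmt"

// Verdict strings as documented in the package constants.
const (
	verdictMatch      = "MATCH"
	verdictSuspicious = "SUSPICIOUS"
	verdictLie        = "LIE"
	verdictError      = "ERROR"
)

// exitCodeFor is a hypothetical helper mapping a verdict to the
// documented tri-state exit code.
func exitCodeFor(verdict string) int {
	switch verdict {
	case verdictMatch:
		return 0 // clean
	case verdictLie, verdictSuspicious:
		return 1 // the tool has an opinion
	default:
		return 2 // ERROR, or (by assumption) anything unrecognized
	}
}

func main() {
	for _, v := range []string{verdictMatch, verdictSuspicious, verdictLie, verdictError} {
		fmt.Printf("%s -> %d\n", v, exitCodeFor(v))
	}
}
```

A CI step that wants outages to soft-fail would then treat exit code 2 as a warning and only fail the job on 1.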
func (AuditOutput) PrimaryAssessment ¶ added in v0.2.0
func (o AuditOutput) PrimaryAssessment() LLMAssessment
PrimaryAssessment returns the first LLMAssessment, or a synthetic ERROR assessment when none ran. Convenience for callers (notably the CLI exit-code logic) that need a single verdict to act on from the array shape.
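The fallback behaviour can be illustrated with a short sketch. The `assessment` type and `primary` function are hypothetical stand-ins, and the text placed in the Error field is invented for the example; only the "first entry, else synthetic ERROR" rule comes from the description above.

```go
package main

import "fmt"

// assessment is a trimmed, hypothetical stand-in for LLMAssessment.
type assessment struct {
	Provider string
	Model    string
	Verdict  string
	Error    string
}

// primary returns the first assessment, or a synthetic ERROR
// assessment when the slice is empty.
func primary(as []assessment) assessment {
	if len(as) == 0 {
		// The error text is illustrative, not the package's wording.
		return assessment{Verdict: "ERROR", Error: "no assessment ran"}
	}
	return as[0]
}

func main() {
	fmt.Println(primary(nil).Verdict)                             // prints "ERROR"
	fmt.Println(primary([]assessment{{Verdict: "MATCH"}}).Verdict) // prints "MATCH"
}
```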
type Cost ¶ added in v0.2.0
type Cost struct {
	ToolCalls           int `json:"tool_calls"`
	ModelSteps          int `json:"model_steps"`
	InputTokens         int `json:"input_tokens"`
	OutputTokens        int `json:"output_tokens"`
	CacheReadTokens     int `json:"cache_read_tokens,omitempty"`
	CacheCreationTokens int `json:"cache_creation_tokens,omitempty"`
}
Cost surfaces what the audit spent. Always populated, including on failure paths -- that is where runaway-bill bugs hide.
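One consequence of the struct tags is worth seeing concretely: the four core counters are always serialized, even when zero, while the two cache counters drop out when zero because of `omitempty`. The sketch below copies the documented tags into a local `cost` type; the `encodeCost` helper and the numbers are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cost copies the JSON tags documented for Cost.
type cost struct {
	ToolCalls           int `json:"tool_calls"`
	ModelSteps          int `json:"model_steps"`
	InputTokens         int `json:"input_tokens"`
	OutputTokens        int `json:"output_tokens"`
	CacheReadTokens     int `json:"cache_read_tokens,omitempty"`
	CacheCreationTokens int `json:"cache_creation_tokens,omitempty"`
}

// encodeCost is a small illustrative wrapper around json.Marshal.
func encodeCost(c cost) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Illustrative numbers, not measurements.
	s, err := encodeCost(cost{ToolCalls: 3, ModelSteps: 2, InputTokens: 1200, OutputTokens: 150})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(s)
	// prints {"tool_calls":3,"model_steps":2,"input_tokens":1200,"output_tokens":150}
}
```

A log parser can therefore rely on the four core keys being present on failure paths too, and should treat the cache keys as optional.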
type LLMAssessment ¶ added in v0.2.0
type LLMAssessment struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
	Verdict  string `json:"verdict"`
	Evidence string `json:"evidence"`
	// Error is populated only when this provider's run could not
	// produce a structured verdict (parse failure, network outage,
	// step-budget exhaustion). Verdict is set to VerdictError in
	// that case so consumers reading the verdict field alone still
	// see a usable signal.
	Error string `json:"error,omitempty"`
}
LLMAssessment is one provider's verdict on the audit. Identified by provider+model so cross-provider mode (v0.3) can attach several assessments and the operator can tell which model said what.
type LoopOptions ¶
LoopOptions tunes the run. Both fields fall back to the Default* constants when zero so a caller can pass a partially-set struct and still get sensible behaviour.
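The zero-value fallback can be sketched as below. The struct definition is not shown on this page, so the field names `MaxSteps` and `MaxTokens` are guesses based on the Default* constants, and `withDefaults` is a hypothetical method used only to illustrate the pattern.

```go
package main

import "fmt"

// Values mirror the documented DefaultMaxSteps and DefaultMaxTokens constants.
const (
	defaultMaxSteps  = 8
	defaultMaxTokens = 2048
)

// loopOptions is a hypothetical stand-in for LoopOptions;
// the field names are assumptions.
type loopOptions struct {
	MaxSteps  int
	MaxTokens int
}

// withDefaults returns a copy in which every zero field is replaced
// by its default, leaving explicitly set fields untouched.
func (o loopOptions) withDefaults() loopOptions {
	if o.MaxSteps == 0 {
		o.MaxSteps = defaultMaxSteps
	}
	if o.MaxTokens == 0 {
		o.MaxTokens = defaultMaxTokens
	}
	return o
}

func main() {
	// Only MaxSteps is set; MaxTokens falls back to the default.
	opts := loopOptions{MaxSteps: 4}.withDefaults()
	fmt.Printf("%+v\n", opts) // prints {MaxSteps:4 MaxTokens:2048}
}
```

One trade-off of this pattern is that a caller cannot request a literal zero, which is harmless here since a zero step or token budget would not be useful.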