Documentation ¶
Overview ¶
Package halguard provides hallucination detection and mitigation for sub-agent outputs in the Genie agent framework.
It implements a tiered verification pipeline inspired by Finch-Zk (Goel et al., Aug 2025, arXiv:2508.14314v2) with:
- Pre-delegation grounding checks that score goals on a 0–1 confidence scale using multi-signal analysis (structural, semantic, information density) rather than brittle string matching.
- Post-execution cross-model consistency verification that detects hallucinated content at a fine-grained block level and applies targeted corrections using a different model family.
The Guard interface is injected into createAgentTool as an optional dependency. When nil, sub-agents execute without hallucination checks, preserving full backward compatibility.
Model selection strategy (per Finch-Zk findings):
- Collect efficiency-task models first (fast, cheap for verification).
- If fewer than the configured sample count are available, supplement with distinct models from other task types for architectural diversity.
- Cross-model diversity is critical — the paper shows that disabling cross-model sampling significantly degrades detection accuracy.
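The selection strategy above can be sketched as a small helper. This is an illustrative stand-in, not the package's actual implementation: the pool names and the `selectVerifierModels` function are assumptions, and the real code draws from modelprovider rather than plain string slices.

```go
package main

import "fmt"

// selectVerifierModels sketches the documented strategy: take
// efficiency-task models first, then supplement with distinct models
// from other task types until the configured sample count is reached.
// Duplicates are skipped to preserve architectural diversity.
func selectVerifierModels(efficiency, other []string, want int) []string {
	seen := make(map[string]bool)
	var out []string
	add := func(pool []string) {
		for _, m := range pool {
			if len(out) >= want {
				return
			}
			if !seen[m] {
				seen[m] = true
				out = append(out, m)
			}
		}
	}
	add(efficiency) // fast, cheap models first
	add(other)      // fill remaining slots from other task types
	return out
}

func main() {
	models := selectVerifierModels(
		[]string{"fast-model-a"},
		[]string{"other-model-b", "other-model-c"},
		3,
	)
	fmt.Println(models) // [fast-model-a other-model-b other-model-c]
}
```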
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BlockLabel ¶
type BlockLabel string
BlockLabel classifies a block's consistency with cross-model samples.
const (
	// BlockAccurate means the block is factually consistent with reference samples.
	BlockAccurate BlockLabel = "ACCURATE"

	// BlockContradiction means the block directly contradicts one or more reference samples.
	BlockContradiction BlockLabel = "CONTRADICTION"

	// BlockNeutral means there is insufficient information for a definitive assessment.
	BlockNeutral BlockLabel = "NEUTRAL"
)
type BlockScore ¶
type BlockScore struct {
// Text is the original block text.
Text string
// Label is the consistency verdict: ACCURATE, CONTRADICTION, or NEUTRAL.
Label BlockLabel
// Reason explains the verdict (populated for CONTRADICTION and NEUTRAL).
Reason string
}
BlockScore holds the verification result for a single semantic block of the sub-agent's output.
type BlockScores ¶
type BlockScores []BlockScore
type Config ¶
type Config struct {
// LightThresholdChars is the output length above which the light
// verification tier is applied. Default: 200.
LightThresholdChars int `yaml:"light_threshold_chars,omitempty" toml:"light_threshold_chars,omitempty"`
// FullThresholdChars is the output length above which the full
// cross-model Finch-Zk verification tier is applied. Default: 500.
FullThresholdChars int `yaml:"full_threshold_chars,omitempty" toml:"full_threshold_chars,omitempty"`
// EnablePreCheck controls whether pre-delegation grounding checks run.
// Default: true.
EnablePreCheck bool `yaml:"enable_pre_check,omitempty" toml:"enable_pre_check,omitempty"`
// EnablePostCheck controls whether post-execution verification runs.
// Default: true.
EnablePostCheck bool `yaml:"enable_post_check,omitempty" toml:"enable_post_check,omitempty"`
// CrossModelSamples is the number of cross-model samples to generate
// for full verification. Finch-Zk shows 3 samples with batch judging
// maintains accuracy while keeping cost manageable. Default: 3.
CrossModelSamples int `yaml:"cross_model_samples,omitempty" toml:"cross_model_samples,omitempty"`
// MaxBlocksToJudge caps the number of blocks sent for cross-consistency
// judging to limit cost on very long outputs. Default: 20.
MaxBlocksToJudge int `yaml:"max_blocks_to_judge,omitempty" toml:"max_blocks_to_judge,omitempty"`
// PreCheckThreshold is the confidence score below which a sub-agent
// goal is rejected as likely fabricated. Range: (0.0–1.0]. Default: 0.4.
// Lower = more permissive, higher = more strict. A value of 0 (or an
// omitted field) is treated as "unset" and causes the default to be used.
PreCheckThreshold float64 `yaml:"pre_check_threshold,omitempty" toml:"pre_check_threshold,omitempty"`
}
Config holds the tuning parameters for hallucination guard behaviour. Zero values use sensible defaults. This struct is embedded in config.GenieConfig and deserialized from the halguard section of genie.toml / genie.yaml.
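Assembled from the struct tags above, a halguard section in genie.toml might look like the following (the values shown are simply the documented defaults, spelled out explicitly):

```toml
[halguard]
light_threshold_chars = 200
full_threshold_chars = 500
enable_pre_check = true
enable_post_check = true
cross_model_samples = 3
max_blocks_to_judge = 20
pre_check_threshold = 0.4
```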
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns a Config with sensible defaults.
type GroundingSignals ¶
type GroundingSignals struct {
// RolePlay detects explicit role-play instructions
// (e.g. "you are an SRE", "imagine you're", "pretend to be").
RolePlay float64
// FabricationPattern detects invented operational data
// (e.g. "p99 latency spiked from", "error rate jumped from").
FabricationPattern float64
// SecondPersonRole detects "You are..." framing at the start of goals.
SecondPersonRole float64
// SpecificMetrics detects suspiciously precise numeric claims
// without tool backing (e.g. "342ms", "2.4%", "1500 req/s").
SpecificMetrics float64
// InformationDensity detects an unusually high density of specific
// technical claims per sentence.
InformationDensity float64
// TemporalUrgency detects artificial time pressure language
// (e.g. "production is down", "immediately", "urgent").
TemporalUrgency float64
}
GroundingSignals holds the weighted penalties from each fabrication detection signal. A value of 0 means the signal did not fire. All values are in the range [0, weight_max] where weight_max is the signal's maximum contribution to the total penalty.
func (GroundingSignals) HasAny ¶
func (s GroundingSignals) HasAny() bool
HasAny reports whether any signal fired (has a non-zero value).
func (GroundingSignals) MergeScaled ¶
func (s GroundingSignals) MergeScaled(other GroundingSignals, scale float64) GroundingSignals
MergeScaled adds another GroundingSignals scaled by a factor. Used to combine context-field signals at reduced weight.
func (GroundingSignals) Penalty ¶
func (s GroundingSignals) Penalty() float64
Penalty returns the total fabrication penalty as the sum of all signal contributions, capped at 1.0.
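A minimal sketch of that summation rule, using a local mirror of the documented struct (the field-by-field addition is an assumption about the implementation; only the "sum of all signals, capped at 1.0" behaviour is documented):

```go
package main

import "fmt"

// GroundingSignals mirrors the documented struct for illustration.
type GroundingSignals struct {
	RolePlay, FabricationPattern, SecondPersonRole    float64
	SpecificMetrics, InformationDensity, TemporalUrgency float64
}

// Penalty sums every signal contribution and caps the total at 1.0,
// per the documented contract.
func (s GroundingSignals) Penalty() float64 {
	total := s.RolePlay + s.FabricationPattern + s.SecondPersonRole +
		s.SpecificMetrics + s.InformationDensity + s.TemporalUrgency
	if total > 1.0 {
		total = 1.0
	}
	return total
}

func main() {
	s := GroundingSignals{RolePlay: 0.25, SpecificMetrics: 0.25}
	fmt.Println(s.Penalty()) // 0.5

	all := GroundingSignals{RolePlay: 0.5, FabricationPattern: 0.5, TemporalUrgency: 0.5}
	fmt.Println(all.Penalty()) // 1 (capped)
}
```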
func (GroundingSignals) String ¶
func (s GroundingSignals) String() string
String returns a human-readable summary of non-zero signals.
type Guard ¶
type Guard interface {
// PreCheck scores the sub-agent goal for fabrication risk before execution.
// Returns a PreCheckResult with a Confidence score between 0.0 (certainly
// fabricated) and 1.0 (certainly genuine). The caller decides whether to
// proceed based on the score and a configurable threshold.
//
// Uses multi-signal analysis based on structural indicators, semantic
// patterns, and information density rather than brittle keyword matching.
PreCheck(ctx context.Context, req PreCheckRequest) (PreCheckResult, error)
// PostCheck verifies sub-agent output after execution.
// Returns a VerificationResult with per-block scores and, when contradictions
// are found, a corrected version of the output. The corrected text preserves
// accurate blocks and only rewrites contradicted ones.
PostCheck(ctx context.Context, req PostCheckRequest) (VerificationResult, error)
}
Guard provides pre-delegation and post-execution hallucination checks for sub-agent tool calls. Implementations must be safe for concurrent use.
func New ¶
func New(
	modelProvider modelprovider.ModelProvider,
	textGenerator TextGeneratorFunc,
	opts ...Option,
) Guard
New creates a Guard with the given model provider and options. The model provider is used to collect diverse models for cross-model consistency checking. The textGenerator callback performs one-shot LLM calls with proper tracing; see NewTextGenerator for the recommended implementation. When options are not provided, sensible defaults are used (pre-check enabled, post-check enabled, 3 cross-model samples).
type Option ¶
type Option func(*Config)
Option configures optional behaviour on the Guard.
func WithConfig ¶
WithConfig applies a Config struct, overriding only non-zero numeric fields and always applying the bool flags. This is the preferred option when the config comes from genie.toml deserialization.
Because TOML/YAML omitempty cannot distinguish "field absent" from "field set to zero/false", callers who want to explicitly disable pre-check or post-check must set the field in the config file:
[halguard]
enable_pre_check = false
When no halguard section exists, all fields are zero and the defaults from New() are preserved.
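The merge rule described above can be sketched on a reduced two-field Config (the `mergeConfig` helper is a hypothetical illustration of the documented semantics, not the package's actual code):

```go
package main

import "fmt"

// Config reduced to one numeric field and one bool flag.
type Config struct {
	CrossModelSamples int
	EnablePreCheck    bool
}

// mergeConfig applies WithConfig's documented semantics: numeric
// fields override only when non-zero, while bool flags are always
// applied as-is.
func mergeConfig(dst *Config, src Config) {
	if src.CrossModelSamples != 0 {
		dst.CrossModelSamples = src.CrossModelSamples
	}
	dst.EnablePreCheck = src.EnablePreCheck
}

func main() {
	cfg := Config{CrossModelSamples: 3, EnablePreCheck: true} // defaults

	// A config file with only enable_pre_check = false deserializes to
	// a zero CrossModelSamples, so the numeric default survives while
	// the bool flag takes effect.
	mergeConfig(&cfg, Config{EnablePreCheck: false})
	fmt.Println(cfg.CrossModelSamples, cfg.EnablePreCheck) // 3 false
}
```

This is why the documentation asks callers to set bool fields explicitly: the bool path cannot distinguish an absent field from `false`.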
func WithCrossModelSamples ¶
WithCrossModelSamples sets the number of cross-model samples to generate.
func WithFullThreshold ¶
WithFullThreshold sets the character count above which full Finch-Zk verification is applied.
func WithLightThreshold ¶
WithLightThreshold sets the character count above which light verification is applied.
func WithPostCheck ¶
WithPostCheck enables or disables the post-execution output verification.
func WithPreCheck ¶
WithPreCheck enables or disables the pre-delegation grounding check.
type PostCheckRequest ¶
type PostCheckRequest struct {
// Goal is the original goal given to the sub-agent.
Goal string
// Context is the optional context string provided alongside the goal.
Context string
// Output is the sub-agent's raw text output to verify.
Output string
// ToolCallsMade is the number of tool calls the sub-agent executed.
// A higher count suggests the output is more grounded in real data.
ToolCallsMade int
// GenerationModel identifies the model that generated the output
// (e.g. "claude-sonnet-4-6"). Used to select a different model family
// for cross-model verification per Finch-Zk §2.5.
GenerationModel modelprovider.ModelMap
}
PostCheckRequest is the input for Guard.PostCheck.
type PreCheckRequest ¶
type PreCheckRequest struct {
// Goal is the sub-agent's goal as specified by the parent agent.
Goal string
// Context is the optional context string provided alongside the goal.
Context string
// ToolNames lists the tools assigned to the sub-agent.
ToolNames []string
}
PreCheckRequest is the input for Guard.PreCheck.
type PreCheckResult ¶
type PreCheckResult struct {
// Confidence is the probability that the goal is genuine and grounded
// in reality. Range: 0.0 (certainly fabricated) to 1.0 (certainly genuine).
// The caller compares this against a configurable threshold to decide
// whether to proceed with execution.
Confidence float64
// Signals contains the individual signal contributions that produced
// the confidence score. Each field represents a distinct fabrication
// signal with its weighted penalty.
Signals GroundingSignals
// Summary is a human-readable explanation of the assessment.
Summary string
}
PreCheckResult carries the grounding assessment for a sub-agent goal.
type TextGeneratorFunc ¶
TextGeneratorFunc performs a one-shot LLM text generation given a specific model and prompt. The caller is responsible for creating this function with proper tracing wired in (e.g. using expert.Expert backed by the trpc-agent-go runner pipeline). This callback pattern avoids a direct import from halguard to the expert package, which would create an import cycle through config.
type VerificationResult ¶
type VerificationResult struct {
// IsFactual is true when no contradictions were detected.
IsFactual bool
// CorrectedText holds the corrected output when contradictions were
// found and targeted corrections were applied. When IsFactual is true,
// this equals the original output unchanged.
CorrectedText string
// BlockScores holds per-block verification results for observability.
// Only populated for Light and Full tier verifications.
BlockScores []BlockScore
// Tier indicates which verification level was applied.
Tier VerifyTier
}
VerificationResult carries the outcome of a PostCheck verification.
type VerifyTier ¶
type VerifyTier string
VerifyTier indicates the level of verification applied to an output.
const (
	// TierNone means no verification was applied (short, tool-grounded output).
	TierNone VerifyTier = "none"

	// TierLight means a single-model sanity check was applied.
	TierLight VerifyTier = "light"

	// TierFull means the full cross-model Finch-Zk pipeline was applied.
	TierFull VerifyTier = "full"
)
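The Config thresholds map onto these tiers roughly as follows. This is a sketch of the documented length-based rule only; the `pickTier` name is an assumption, and the real implementation may also weigh factors such as the tool-call count.

```go
package main

import "fmt"

type VerifyTier string

const (
	TierNone  VerifyTier = "none"
	TierLight VerifyTier = "light"
	TierFull  VerifyTier = "full"
)

// pickTier sketches the documented threshold logic: outputs longer
// than FullThresholdChars get the full cross-model tier, outputs
// longer than LightThresholdChars get the light tier, and shorter
// outputs skip verification entirely.
func pickTier(outputLen, lightThreshold, fullThreshold int) VerifyTier {
	switch {
	case outputLen > fullThreshold:
		return TierFull
	case outputLen > lightThreshold:
		return TierLight
	default:
		return TierNone
	}
}

func main() {
	// Using the documented defaults: light at 200 chars, full at 500.
	fmt.Println(pickTier(100, 200, 500)) // none
	fmt.Println(pickTier(300, 200, 500)) // light
	fmt.Println(pickTier(900, 200, 500)) // full
}
```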