halguard

package
v0.1.7
Published: Mar 10, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Overview

Package halguard provides hallucination detection and mitigation for sub-agent outputs in the Genie agent framework.

It implements a tiered verification pipeline inspired by Finch-Zk (Goel et al., Aug 2025, arXiv:2508.14314v2) with:

  • Pre-delegation grounding checks that score goals on a 0–1 confidence scale using multi-signal analysis (structural, semantic, information density) rather than brittle string matching.
  • Post-execution cross-model consistency verification that detects hallucinated content at a fine-grained block level and applies targeted corrections using a different model family.

The Guard interface is injected into createAgentTool as an optional dependency. When nil, sub-agents execute without hallucination checks, preserving full backward compatibility.

Model selection strategy (per Finch-Zk findings):

  1. Collect efficiency-task models first (fast, cheap for verification).
  2. If fewer than the configured sample count are available, supplement with distinct models from other task types for architectural diversity.
  3. Cross-model diversity is critical: the paper shows that disabling cross-model sampling significantly degrades detection accuracy.
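
The selection order above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the candidate type, the "efficiency" task string, and the selectVerifiers helper are hypothetical names, since the real model provider types are not shown in this documentation.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a configured model entry;
// the real provider types are not shown in this documentation.
type candidate struct {
	Name string // model identifier
	Task string // task type the model is registered for, e.g. "efficiency"
}

// selectVerifiers picks up to n distinct models: efficiency-task models
// first, then models from other task types until n is reached.
func selectVerifiers(all []candidate, n int) []string {
	seen := make(map[string]bool)
	var picked []string
	add := func(c candidate) {
		if len(picked) < n && !seen[c.Name] {
			seen[c.Name] = true
			picked = append(picked, c.Name)
		}
	}
	// Step 1: fast, cheap efficiency-task models.
	for _, c := range all {
		if c.Task == "efficiency" {
			add(c)
		}
	}
	// Step 2: supplement with distinct models from other task types.
	for _, c := range all {
		if c.Task != "efficiency" {
			add(c)
		}
	}
	return picked
}

func main() {
	models := []candidate{
		{Name: "model-a", Task: "planning"},
		{Name: "model-b", Task: "efficiency"},
		{Name: "model-b", Task: "coding"},
		{Name: "model-c", Task: "coding"},
	}
	fmt.Println(selectVerifiers(models, 3)) // [model-b model-a model-c]
}
```

Deduplicating by name keeps a model registered under several task types from being counted twice, which would otherwise reduce the diversity the third point asks for.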

Types

type BlockLabel

type BlockLabel string

BlockLabel classifies a block's consistency with cross-model samples.

const (
	// BlockAccurate means the block is factually consistent with reference samples.
	BlockAccurate BlockLabel = "ACCURATE"

	// BlockContradiction means the block directly contradicts one or more reference samples.
	BlockContradiction BlockLabel = "CONTRADICTION"

	// BlockNeutral means there is insufficient information for a definitive assessment.
	BlockNeutral BlockLabel = "NEUTRAL"
)

type BlockScore

type BlockScore struct {
	// Text is the original block text.
	Text string

	// Label is the consistency verdict: ACCURATE, CONTRADICTION, or NEUTRAL.
	Label BlockLabel

	// Reason explains the verdict (populated for CONTRADICTION and NEUTRAL).
	Reason string
}

BlockScore holds the verification result for a single semantic block of the sub-agent's output.

type BlockScores

type BlockScores []BlockScore
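
As a small illustration of how per-block results might be consumed, the sketch below filters a BlockScores slice down to its contradicted blocks. The types are local mirrors of the ones documented above so the example compiles on its own, and contradictions is a hypothetical helper, not part of the package API.

```go
package main

import "fmt"

// Local mirrors of the documented types, so the sketch compiles on its own.
type BlockLabel string

const (
	BlockAccurate      BlockLabel = "ACCURATE"
	BlockContradiction BlockLabel = "CONTRADICTION"
	BlockNeutral       BlockLabel = "NEUTRAL"
)

type BlockScore struct {
	Text   string
	Label  BlockLabel
	Reason string
}

type BlockScores []BlockScore

// contradictions is a hypothetical helper (not part of the package API)
// returning only the blocks labelled CONTRADICTION.
func contradictions(scores BlockScores) BlockScores {
	var out BlockScores
	for _, s := range scores {
		if s.Label == BlockContradiction {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	scores := BlockScores{
		{Text: "The service runs on port 8080.", Label: BlockAccurate},
		{Text: "Latency was 342ms.", Label: BlockContradiction, Reason: "samples disagree"},
		{Text: "A fix may ship soon.", Label: BlockNeutral, Reason: "not enough information"},
	}
	for _, b := range contradictions(scores) {
		fmt.Printf("%s: %s (%s)\n", b.Label, b.Text, b.Reason)
	}
}
```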

type Config

type Config struct {
	// LightThresholdChars is the output length above which the light
	// verification tier is applied. Default: 200.
	LightThresholdChars int `yaml:"light_threshold_chars,omitempty" toml:"light_threshold_chars,omitempty"`

	// FullThresholdChars is the output length above which the full
	// cross-model Finch-Zk verification tier is applied. Default: 500.
	FullThresholdChars int `yaml:"full_threshold_chars,omitempty" toml:"full_threshold_chars,omitempty"`

	// EnablePreCheck controls whether pre-delegation grounding checks run.
	// Default: true.
	EnablePreCheck bool `yaml:"enable_pre_check,omitempty" toml:"enable_pre_check,omitempty"`

	// EnablePostCheck controls whether post-execution verification runs.
	// Default: true.
	EnablePostCheck bool `yaml:"enable_post_check,omitempty" toml:"enable_post_check,omitempty"`

	// CrossModelSamples is the number of cross-model samples to generate
	// for full verification. Finch-Zk shows 3 samples with batch judging
	// maintains accuracy while keeping cost manageable. Default: 3.
	CrossModelSamples int `yaml:"cross_model_samples,omitempty" toml:"cross_model_samples,omitempty"`

	// MaxBlocksToJudge caps the number of blocks sent for cross-consistency
	// judging to limit cost on very long outputs. Default: 20.
	MaxBlocksToJudge int `yaml:"max_blocks_to_judge,omitempty" toml:"max_blocks_to_judge,omitempty"`

	// PreCheckThreshold is the confidence score below which a sub-agent
	// goal is rejected as likely fabricated. Range: (0.0–1.0]. Default: 0.4.
	// Lower = more permissive, higher = more strict. A value of 0 (or an
	// omitted field) is treated as "unset" and causes the default to be used.
	PreCheckThreshold float64 `yaml:"pre_check_threshold,omitempty" toml:"pre_check_threshold,omitempty"`
}

Config holds the tuning parameters for hallucination guard behaviour. Zero values use sensible defaults. This struct is embedded in config.GenieConfig and deserialized from the halguard section of genie.toml / genie.yaml.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns a Config with sensible defaults.
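
For orientation, the defaults stated in the Config field comments can be collected in one place. The sketch below is an assumption-laden mirror rather than the package source: defaultConfig and effectiveThreshold are hypothetical local functions, and the struct tags are omitted.

```go
package main

import "fmt"

// Config mirrors the documented fields (struct tags omitted for brevity).
type Config struct {
	LightThresholdChars int
	FullThresholdChars  int
	EnablePreCheck      bool
	EnablePostCheck     bool
	CrossModelSamples   int
	MaxBlocksToJudge    int
	PreCheckThreshold   float64
}

// defaultConfig returns the defaults listed in the Config field comments.
func defaultConfig() Config {
	return Config{
		LightThresholdChars: 200,
		FullThresholdChars:  500,
		EnablePreCheck:      true,
		EnablePostCheck:     true,
		CrossModelSamples:   3,
		MaxBlocksToJudge:    20,
		PreCheckThreshold:   0.4,
	}
}

// effectiveThreshold treats a zero PreCheckThreshold as "unset" and falls
// back to the default, as the field comment describes.
func effectiveThreshold(cfg Config) float64 {
	if cfg.PreCheckThreshold == 0 {
		return defaultConfig().PreCheckThreshold
	}
	return cfg.PreCheckThreshold
}

func main() {
	fmt.Println(effectiveThreshold(Config{}))                       // 0.4
	fmt.Println(effectiveThreshold(Config{PreCheckThreshold: 0.6})) // 0.6
}
```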

type GroundingSignals

type GroundingSignals struct {
	// RolePlay detects explicit role-play instructions
	// (e.g. "you are an SRE", "imagine you're", "pretend to be").
	RolePlay float64

	// FabricationPattern detects invented operational data
	// (e.g. "p99 latency spiked from", "error rate jumped from").
	FabricationPattern float64

	// SecondPersonRole detects "You are..." framing at the start of goals.
	SecondPersonRole float64

	// SpecificMetrics detects suspiciously precise numeric claims
	// without tool backing (e.g. "342ms", "2.4%", "1500 req/s").
	SpecificMetrics float64

	// InformationDensity detects an unusually high density of specific
	// technical claims per sentence.
	InformationDensity float64

	// TemporalUrgency detects artificial time pressure language
	// (e.g. "production is down", "immediately", "urgent").
	TemporalUrgency float64
}

GroundingSignals holds the weighted penalties from each fabrication detection signal. A value of 0 means the signal did not fire. All values are in the range [0, weight_max] where weight_max is the signal's maximum contribution to the total penalty.

func (GroundingSignals) HasAny

func (s GroundingSignals) HasAny() bool

HasAny reports whether any signal fired (has a non-zero value).

func (GroundingSignals) MergeScaled

func (s GroundingSignals) MergeScaled(other GroundingSignals, scale float64) GroundingSignals

MergeScaled adds another GroundingSignals scaled by a factor. Used to combine context-field signals at reduced weight.

func (GroundingSignals) Penalty

func (s GroundingSignals) Penalty() float64

Penalty returns the total fabrication penalty as the sum of all signal contributions, capped at 1.0.
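
The arithmetic described for MergeScaled and Penalty can be sketched as below. This is an illustrative re-implementation under stated assumptions: the struct is a local mirror, mergeScaled and penalty are hypothetical free functions, and the signal values are made up, since the real per-signal weights are not documented here.

```go
package main

import "fmt"

// GroundingSignals mirrors the documented struct so the sketch compiles alone.
type GroundingSignals struct {
	RolePlay           float64
	FabricationPattern float64
	SecondPersonRole   float64
	SpecificMetrics    float64
	InformationDensity float64
	TemporalUrgency    float64
}

// mergeScaled adds other's signals to s after multiplying each by scale.
func mergeScaled(s, other GroundingSignals, scale float64) GroundingSignals {
	s.RolePlay += other.RolePlay * scale
	s.FabricationPattern += other.FabricationPattern * scale
	s.SecondPersonRole += other.SecondPersonRole * scale
	s.SpecificMetrics += other.SpecificMetrics * scale
	s.InformationDensity += other.InformationDensity * scale
	s.TemporalUrgency += other.TemporalUrgency * scale
	return s
}

// penalty sums all signal contributions and caps the total at 1.0.
func penalty(s GroundingSignals) float64 {
	total := s.RolePlay + s.FabricationPattern + s.SecondPersonRole +
		s.SpecificMetrics + s.InformationDensity + s.TemporalUrgency
	if total > 1.0 {
		return 1.0
	}
	return total
}

func main() {
	// Made-up values: signals from the goal text, plus signals from the
	// context field merged in at half weight.
	goal := GroundingSignals{RolePlay: 0.25, SpecificMetrics: 0.25}
	fromContext := GroundingSignals{FabricationPattern: 0.5}
	merged := mergeScaled(goal, fromContext, 0.5)
	fmt.Println(penalty(merged)) // 0.75

	heavy := GroundingSignals{RolePlay: 0.75, FabricationPattern: 0.75}
	fmt.Println(penalty(heavy)) // 1 (capped)
}
```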

func (GroundingSignals) String

func (s GroundingSignals) String() string

String returns a human-readable summary of non-zero signals.

type Guard

type Guard interface {
	// PreCheck scores the sub-agent goal for fabrication risk before execution.
	// Returns a PreCheckResult with a Confidence score between 0.0 (certainly
	// fabricated) and 1.0 (certainly genuine). The caller decides whether to
	// proceed based on the score and a configurable threshold.
	//
	// Uses multi-signal analysis based on structural indicators, semantic
	// patterns, and information density rather than brittle keyword matching.
	PreCheck(ctx context.Context, req PreCheckRequest) (PreCheckResult, error)

	// PostCheck verifies sub-agent output after execution.
	// Returns a VerificationResult with per-block scores and, when contradictions
	// are found, a corrected version of the output. The corrected text preserves
	// accurate blocks and only rewrites contradicted ones.
	PostCheck(ctx context.Context, req PostCheckRequest) (VerificationResult, error)
}

Guard provides pre-delegation and post-execution hallucination checks for sub-agent tool calls. Implementations must be safe for concurrent use.
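
A caller might wire the two checks around a sub-agent run roughly as follows. This is a sketch under assumptions, not the createAgentTool code: the request and result types are trimmed local mirrors, and runGuarded, stubGuard, demo, and errLowConfidence are hypothetical names introduced for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Trimmed local mirrors of the documented request and result types.
type PreCheckRequest struct{ Goal string }
type PreCheckResult struct{ Confidence float64 }
type PostCheckRequest struct{ Goal, Output string }
type VerificationResult struct {
	IsFactual     bool
	CorrectedText string
}

type Guard interface {
	PreCheck(ctx context.Context, req PreCheckRequest) (PreCheckResult, error)
	PostCheck(ctx context.Context, req PostCheckRequest) (VerificationResult, error)
}

var errLowConfidence = errors.New("goal rejected: confidence below threshold")

// runGuarded is a hypothetical caller. With a nil Guard it runs exec
// unchecked; otherwise it gates on PreCheck and returns PostCheck's text.
func runGuarded(ctx context.Context, g Guard, goal string, threshold float64,
	exec func(context.Context, string) (string, error)) (string, error) {
	if g == nil {
		return exec(ctx, goal)
	}
	pre, err := g.PreCheck(ctx, PreCheckRequest{Goal: goal})
	if err != nil {
		return "", err
	}
	if pre.Confidence < threshold {
		return "", errLowConfidence
	}
	out, err := exec(ctx, goal)
	if err != nil {
		return "", err
	}
	res, err := g.PostCheck(ctx, PostCheckRequest{Goal: goal, Output: out})
	if err != nil {
		return "", err
	}
	// CorrectedText equals the original output when IsFactual is true.
	return res.CorrectedText, nil
}

// stubGuard is a fake Guard with a fixed confidence and a pass-through PostCheck.
type stubGuard struct{ confidence float64 }

func (s stubGuard) PreCheck(context.Context, PreCheckRequest) (PreCheckResult, error) {
	return PreCheckResult{Confidence: s.confidence}, nil
}

func (s stubGuard) PostCheck(_ context.Context, req PostCheckRequest) (VerificationResult, error) {
	return VerificationResult{IsFactual: true, CorrectedText: req.Output}, nil
}

// demo runs runGuarded with an echoing executor and a 0.4 threshold.
func demo(g Guard) (string, error) {
	echo := func(_ context.Context, goal string) (string, error) {
		return "done: " + goal, nil
	}
	return runGuarded(context.Background(), g, "list pods", 0.4, echo)
}

func main() {
	fmt.Println(demo(nil))                        // unchecked run
	fmt.Println(demo(stubGuard{confidence: 0.9})) // passes both checks
	fmt.Println(demo(stubGuard{confidence: 0.1})) // rejected before execution
}
```

Note that the nil comparison only matches a nil interface value; a typed nil pointer stored in a Guard variable would not take the unchecked path.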

func New

func New(
	modelProvider modelprovider.ModelProvider,
	textGenerator TextGeneratorFunc,
	opts ...Option,
) Guard

New creates a Guard with the given model provider and options. The model provider is used to collect diverse models for cross-model consistency checking. The textGenerator callback performs one-shot LLM calls with proper tracing; see NewTextGenerator for the recommended implementation. When options are not provided, sensible defaults are used (pre-check enabled, post-check enabled, 3 cross-model samples).

type Option

type Option func(*Config)

Option configures optional behaviour on the Guard.

func WithConfig

func WithConfig(cfg Config) Option

WithConfig applies a Config struct, overriding only non-zero numeric fields and always applying the bool flags. This is the preferred option when the config comes from genie.toml deserialization.

Because TOML/YAML omitempty cannot distinguish "field absent" from "field set to zero/false", callers who want to explicitly disable pre-check or post-check must set the field in the config file:

[halguard]
enable_pre_check = false

When no halguard section exists, all fields are zero and the defaults from New() are preserved.

func WithCrossModelSamples

func WithCrossModelSamples(n int) Option

WithCrossModelSamples sets the number of cross-model samples to generate.

func WithFullThreshold

func WithFullThreshold(chars int) Option

WithFullThreshold sets the character count above which full Finch-Zk verification is applied.

func WithLightThreshold

func WithLightThreshold(chars int) Option

WithLightThreshold sets the character count above which light verification is applied.

func WithPostCheck

func WithPostCheck(enable bool) Option

WithPostCheck enables or disables the post-execution output verification.

func WithPreCheck

func WithPreCheck(enable bool) Option

WithPreCheck enables or disables the pre-delegation grounding check.
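
The With... functions follow the functional-options pattern: each returns a closure that mutates a Config, and the constructor applies them in order on top of the defaults. The sketch below illustrates the pattern with a trimmed local Config; withCrossModelSamples, withPreCheck, and buildConfig are hypothetical lowercase stand-ins, not the package's own functions.

```go
package main

import "fmt"

// Config is a trimmed local mirror with only the fields this sketch touches.
type Config struct {
	EnablePreCheck    bool
	EnablePostCheck   bool
	CrossModelSamples int
}

// Option mirrors the documented type: a function that mutates a Config.
type Option func(*Config)

// withCrossModelSamples is an illustrative stand-in for WithCrossModelSamples.
func withCrossModelSamples(n int) Option {
	return func(c *Config) { c.CrossModelSamples = n }
}

// withPreCheck is an illustrative stand-in for WithPreCheck.
func withPreCheck(enable bool) Option {
	return func(c *Config) { c.EnablePreCheck = enable }
}

// buildConfig starts from the documented defaults and applies opts in order,
// so later options override earlier ones.
func buildConfig(opts ...Option) Config {
	cfg := Config{EnablePreCheck: true, EnablePostCheck: true, CrossModelSamples: 3}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", buildConfig())
	fmt.Printf("%+v\n", buildConfig(withPreCheck(false), withCrossModelSamples(5)))
}
```

Because a bool option is an explicit function call, it avoids the absent-versus-false ambiguity that WithConfig has to work around for deserialized configuration.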

type PostCheckRequest

type PostCheckRequest struct {
	// Goal is the original goal given to the sub-agent.
	Goal string

	// Context is the optional context string provided alongside the goal.
	Context string

	// Output is the sub-agent's raw text output to verify.
	Output string

	// ToolCallsMade is the number of tool calls the sub-agent executed.
	// A higher count suggests the output is more grounded in real data.
	ToolCallsMade int

	// GenerationModel identifies the model that generated the output
	// (e.g. "claude-sonnet-4-6"). Used to select a different model family
	// for cross-model verification per Finch-Zk §2.5.
	GenerationModel modelprovider.ModelMap
}

PostCheckRequest is the input for Guard.PostCheck.

type PreCheckRequest

type PreCheckRequest struct {
	// Goal is the sub-agent's goal as specified by the parent agent.
	Goal string

	// Context is the optional context string provided alongside the goal.
	Context string

	// ToolNames lists the tools assigned to the sub-agent.
	ToolNames []string
}

PreCheckRequest is the input for Guard.PreCheck.

type PreCheckResult

type PreCheckResult struct {
	// Confidence is the probability that the goal is genuine and grounded
	// in reality. Range: 0.0 (certainly fabricated) to 1.0 (certainly genuine).
	// The caller compares this against a configurable threshold to decide
	// whether to proceed with execution.
	Confidence float64

	// Signals contains the individual signal contributions that produced
	// the confidence score. Each field represents a distinct fabrication
	// signal with its weighted penalty.
	Signals GroundingSignals

	// Summary is a human-readable explanation of the assessment.
	Summary string
}

PreCheckResult carries the grounding assessment for a sub-agent goal.

type TextGeneratorFunc

type TextGeneratorFunc func(ctx context.Context, m model.Model, prompt string) (string, error)

TextGeneratorFunc performs a one-shot LLM text generation given a specific model and prompt. The caller is responsible for creating this function with proper tracing wired in (e.g. using expert.Expert backed by the trpc-agent-go runner pipeline). This callback pattern avoids a direct import from halguard to the expert package, which would create an import cycle through config.
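
To show the callback shape without pulling in the real dependencies, the sketch below uses a placeholder Model type in place of model.Model and builds a canned generator of the kind one might use in tests. Model, cannedGenerator, withCallCount, and generateOnce are all hypothetical; the call counter merely stands in for the tracing a production generator would wire in.

```go
package main

import (
	"context"
	"fmt"
)

// Model is a hypothetical placeholder for model.Model, which is defined
// outside this package and not reproduced here.
type Model struct{ Name string }

// TextGeneratorFunc mirrors the documented signature with the placeholder Model.
type TextGeneratorFunc func(ctx context.Context, m Model, prompt string) (string, error)

// cannedGenerator returns a generator that always replies with reply,
// useful for exercising guard logic without a real LLM.
func cannedGenerator(reply string) TextGeneratorFunc {
	return func(ctx context.Context, _ Model, _ string) (string, error) {
		if err := ctx.Err(); err != nil {
			return "", err // respect cancellation, as a real call would
		}
		return reply, nil
	}
}

// withCallCount wraps next and increments *calls on every invocation.
func withCallCount(next TextGeneratorFunc, calls *int) TextGeneratorFunc {
	return func(ctx context.Context, m Model, prompt string) (string, error) {
		*calls++
		return next(ctx, m, prompt)
	}
}

// generateOnce invokes gen with a background context and a placeholder model.
func generateOnce(gen TextGeneratorFunc, prompt string) (string, error) {
	return gen(context.Background(), Model{Name: "verifier"}, prompt)
}

func main() {
	calls := 0
	gen := withCallCount(cannedGenerator("ACCURATE"), &calls)
	reply, err := generateOnce(gen, "Is this block consistent with the samples?")
	fmt.Println(reply, err, calls)
}
```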

type VerificationResult

type VerificationResult struct {
	// IsFactual is true when no contradictions were detected.
	IsFactual bool

	// CorrectedText holds the corrected output when contradictions were
	// found and targeted corrections were applied. When IsFactual is true,
	// this equals the original output unchanged.
	CorrectedText string

	// BlockScores holds per-block verification results for observability.
	// Only populated for Light and Full tier verifications.
	BlockScores []BlockScore

	// Tier indicates which verification level was applied.
	Tier VerifyTier
}

VerificationResult carries the outcome of a PostCheck verification.

type VerifyTier

type VerifyTier string

VerifyTier indicates the level of verification applied to an output.

const (
	// TierNone means no verification was applied (short, tool-grounded output).
	TierNone VerifyTier = "none"

	// TierLight means a single-model sanity check was applied.
	TierLight VerifyTier = "light"

	// TierFull means the full cross-model Finch-Zk pipeline was applied.
	TierFull VerifyTier = "full"
)
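
A length-only reading of the tier thresholds can be sketched as follows. This is a simplification under assumptions: tierForLen is a hypothetical helper, "chars" is taken to mean Unicode code points, and the real selection may also weigh other inputs such as ToolCallsMade, which this sketch ignores.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// VerifyTier and its constants mirror the documented declarations.
type VerifyTier string

const (
	TierNone  VerifyTier = "none"
	TierLight VerifyTier = "light"
	TierFull  VerifyTier = "full"
)

// tierForLen picks a tier from the output length alone: strictly above the
// full threshold means full, strictly above the light threshold means light.
func tierForLen(n, lightThreshold, fullThreshold int) VerifyTier {
	switch {
	case n > fullThreshold:
		return TierFull
	case n > lightThreshold:
		return TierLight
	default:
		return TierNone
	}
}

func main() {
	output := "Deployment api-server has 3 ready replicas."
	// Assumption: length is counted in runes rather than bytes.
	n := utf8.RuneCountInString(output)
	fmt.Println(n, tierForLen(n, 200, 500))
}
```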
