halguard

v0.1.7 | Published: Mar 10, 2026 | License: Apache-2.0 | Imports: 12 | Imported by: 0

Repository

github.com/stackgenhq/genie

Documentation ¶

Overview ¶

Package halguard provides hallucination detection and mitigation for sub-agent outputs in the Genie agent framework.

It implements a tiered verification pipeline inspired by Finch-Zk (Goel et al., Aug 2025, arXiv:2508.14314v2) with:

- Pre-delegation grounding checks that score goals on a 0–1 confidence scale using multi-signal analysis (structural, semantic, information density) rather than brittle string matching.
- Post-execution cross-model consistency verification that detects hallucinated content at a fine-grained block level and applies targeted corrections using a different model family.

The Guard interface is injected into createAgentTool as an optional dependency. When the Guard is nil, sub-agents execute without hallucination checks, preserving full backward compatibility.

Model selection strategy (per Finch-Zk findings):

1. Collect efficiency-task models first (fast and cheap for verification).
2. If fewer than the configured sample count are available, supplement them with distinct models from other task types for architectural diversity.

Cross-model diversity is critical: the paper shows that disabling cross-model sampling significantly degrades detection accuracy.
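The model selection strategy above can be sketched in Go as follows. This is a minimal illustration, not the package's code: `candidate`, `selectVerifiers`, the `"efficiency"` task string, and the model names are all hypothetical, and skipping the generator's own family is an assumption drawn from the `PostCheckRequest.GenerationModel` comment.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a model entry exposed by the
// model provider; the real modelprovider types are not shown on this page.
type candidate struct {
	Name   string // model identifier (illustrative values below)
	Family string // model family, e.g. "gpt" or "claude"
	Task   string // task type the model is registered for
}

// selectVerifiers picks up to n distinct models for cross-model sampling.
// Efficiency-task models are taken first; if fewer than n are found, distinct
// models from other task types are added. Skipping the generator's own family
// is an assumption based on the PostCheckRequest.GenerationModel comment.
func selectVerifiers(cands []candidate, generatorFamily string, n int) []candidate {
	seen := make(map[string]bool)
	var out []candidate
	add := func(c candidate) {
		if len(out) >= n || seen[c.Name] || c.Family == generatorFamily {
			return
		}
		seen[c.Name] = true
		out = append(out, c)
	}
	// Pass 1: efficiency-task models (fast and cheap for verification).
	for _, c := range cands {
		if c.Task == "efficiency" {
			add(c)
		}
	}
	// Pass 2: top up with distinct models from other task types.
	for _, c := range cands {
		if c.Task != "efficiency" {
			add(c)
		}
	}
	return out
}

func main() {
	pool := []candidate{
		{"claude-small", "claude", "efficiency"},
		{"gpt-small", "gpt", "efficiency"},
		{"gemini-large", "gemini", "planning"},
		{"gpt-small", "gpt", "planning"}, // same model under another task: skipped
		{"llama-large", "llama", "coding"},
	}
	for _, c := range selectVerifiers(pool, "claude", 3) {
		fmt.Println(c.Name)
	}
}
```

With a generator from the `claude` family and n = 3, the sketch takes `gpt-small` from the efficiency pass and tops up with `gemini-large` and `llama-large`.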

Index ¶

type BlockLabel
type BlockScore
type BlockScores
type Config
- func DefaultConfig() Config
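type GroundingSignals
- func (s GroundingSignals) HasAny() bool
- func (s GroundingSignals) MergeScaled(other GroundingSignals, scale float64) GroundingSignals
- func (s GroundingSignals) Penalty() float64
- func (s GroundingSignals) String() string
type Guard
- func New(modelProvider modelprovider.ModelProvider, textGenerator TextGeneratorFunc, ...) Guard
type Option
- func WithConfig(cfg Config) Option
- func WithCrossModelSamples(n int) Option
- func WithFullThreshold(chars int) Option
- func WithLightThreshold(chars int) Option
- func WithPostCheck(enable bool) Option
- func WithPreCheck(enable bool) Option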
type PostCheckRequest
type PreCheckRequest
type PreCheckResult
type TextGeneratorFunc
type VerificationResult
type VerifyTier

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type BlockLabel ¶

type BlockLabel string

BlockLabel classifies a block's consistency with cross-model samples.

const (
	// BlockAccurate means the block is factually consistent with reference samples.
	BlockAccurate BlockLabel = "ACCURATE"

	// BlockContradiction means the block directly contradicts one or more reference samples.
	BlockContradiction BlockLabel = "CONTRADICTION"

	// BlockNeutral means there is insufficient information for a definitive assessment.
	BlockNeutral BlockLabel = "NEUTRAL"
)

type BlockScore ¶

type BlockScore struct {
	// Text is the original block text.
	Text string

	// Label is the consistency verdict: ACCURATE, CONTRADICTION, or NEUTRAL.
	Label BlockLabel

	// Reason explains the verdict (populated for CONTRADICTION and NEUTRAL).
	Reason string
}

BlockScore holds the verification result for a single semantic block of the sub-agent's output.

type BlockScores ¶

type BlockScores []BlockScore
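Since block scores are kept mainly for observability, a caller might summarize them before logging. The helpers below, `countLabels` and `contradicted`, are hypothetical and operate on local copies of the documented types.

```go
package main

import "fmt"

// Local copies of the documented types so the sketch compiles on its own.
type BlockLabel string

const (
	BlockAccurate      BlockLabel = "ACCURATE"
	BlockContradiction BlockLabel = "CONTRADICTION"
	BlockNeutral       BlockLabel = "NEUTRAL"
)

type BlockScore struct {
	Text   string
	Label  BlockLabel
	Reason string
}

type BlockScores []BlockScore

// countLabels tallies verdicts per label, e.g. for logs or metrics.
func countLabels(scores BlockScores) map[BlockLabel]int {
	counts := make(map[BlockLabel]int)
	for _, s := range scores {
		counts[s.Label]++
	}
	return counts
}

// contradicted returns only the CONTRADICTION blocks, i.e. the ones a
// targeted correction step would need to rewrite.
func contradicted(scores BlockScores) BlockScores {
	var out BlockScores
	for _, s := range scores {
		if s.Label == BlockContradiction {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	scores := BlockScores{
		{Text: "The service listens on port 8080.", Label: BlockAccurate},
		{Text: "It was released in 1995.", Label: BlockContradiction, Reason: "samples disagree on the year"},
		{Text: "It is widely used.", Label: BlockNeutral, Reason: "not checkable against samples"},
	}
	fmt.Println(countLabels(scores))
	fmt.Println(len(contradicted(scores)))
}
```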

type Config ¶

type Config struct {
	// LightThresholdChars is the output length above which the light
	// verification tier is applied. Default: 200.
	LightThresholdChars int `yaml:"light_threshold_chars,omitempty" toml:"light_threshold_chars,omitempty"`

	// FullThresholdChars is the output length above which the full
	// cross-model Finch-Zk verification tier is applied. Default: 500.
	FullThresholdChars int `yaml:"full_threshold_chars,omitempty" toml:"full_threshold_chars,omitempty"`

	// EnablePreCheck controls whether pre-delegation grounding checks run.
	// Default: true.
	EnablePreCheck bool `yaml:"enable_pre_check,omitempty" toml:"enable_pre_check,omitempty"`

	// EnablePostCheck controls whether post-execution verification runs.
	// Default: true.
	EnablePostCheck bool `yaml:"enable_post_check,omitempty" toml:"enable_post_check,omitempty"`

	// CrossModelSamples is the number of cross-model samples to generate
	// for full verification. Finch-Zk shows 3 samples with batch judging
	// maintains accuracy while keeping cost manageable. Default: 3.
	CrossModelSamples int `yaml:"cross_model_samples,omitempty" toml:"cross_model_samples,omitempty"`

	// MaxBlocksToJudge caps the number of blocks sent for cross-consistency
	// judging to limit cost on very long outputs. Default: 20.
	MaxBlocksToJudge int `yaml:"max_blocks_to_judge,omitempty" toml:"max_blocks_to_judge,omitempty"`

	// PreCheckThreshold is the confidence score below which a sub-agent
	// goal is rejected as likely fabricated. Range: (0.0–1.0]. Default: 0.4.
	// Lower = more permissive, higher = more strict. A value of 0 (or an
	// omitted field) is treated as "unset" and causes the default to be used.
	PreCheckThreshold float64 `yaml:"pre_check_threshold,omitempty" toml:"pre_check_threshold,omitempty"`
}

Config holds the tuning parameters for hallucination guard behaviour. Zero values use sensible defaults. This struct is embedded in config.GenieConfig and deserialized from the halguard section of genie.toml / genie.yaml.

func DefaultConfig ¶

func DefaultConfig() Config

DefaultConfig returns a Config with sensible defaults.
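The defaults stated in the field comments, together with the "zero means unset" rule for numeric fields, can be collected into a small sketch. `defaultConfig` and `fillNumericDefaults` are hypothetical; the values come from the comments on `Config`, but how `DefaultConfig` and `New` apply them internally is not shown on this page.

```go
package main

import "fmt"

// Config mirrors the documented fields (struct tags omitted).
type Config struct {
	LightThresholdChars int
	FullThresholdChars  int
	EnablePreCheck      bool
	EnablePostCheck     bool
	CrossModelSamples   int
	MaxBlocksToJudge    int
	PreCheckThreshold   float64
}

// defaultConfig returns the defaults stated in the Config field comments.
func defaultConfig() Config {
	return Config{
		LightThresholdChars: 200,
		FullThresholdChars:  500,
		EnablePreCheck:      true,
		EnablePostCheck:     true,
		CrossModelSamples:   3,
		MaxBlocksToJudge:    20,
		PreCheckThreshold:   0.4,
	}
}

// fillNumericDefaults is a hypothetical helper: any numeric field left at
// zero ("unset") takes its default. The bools are left alone because a
// decoded false looks the same as an absent key.
func fillNumericDefaults(c Config) Config {
	d := defaultConfig()
	if c.LightThresholdChars == 0 {
		c.LightThresholdChars = d.LightThresholdChars
	}
	if c.FullThresholdChars == 0 {
		c.FullThresholdChars = d.FullThresholdChars
	}
	if c.CrossModelSamples == 0 {
		c.CrossModelSamples = d.CrossModelSamples
	}
	if c.MaxBlocksToJudge == 0 {
		c.MaxBlocksToJudge = d.MaxBlocksToJudge
	}
	if c.PreCheckThreshold == 0 { // 0 means "unset", per the field comment
		c.PreCheckThreshold = d.PreCheckThreshold
	}
	return c
}

func main() {
	// e.g. a [halguard] section that only sets full_threshold_chars = 800
	cfg := fillNumericDefaults(Config{FullThresholdChars: 800})
	fmt.Printf("%+v\n", cfg)
}
```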

type GroundingSignals ¶

type GroundingSignals struct {
	// RolePlay detects explicit role-play instructions
	// (e.g. "you are an SRE", "imagine you're", "pretend to be").
	RolePlay float64

	// FabricationPattern detects invented operational data
	// (e.g. "p99 latency spiked from", "error rate jumped from").
	FabricationPattern float64

	// SecondPersonRole detects "You are..." framing at the start of goals.
	SecondPersonRole float64

	// SpecificMetrics detects suspiciously precise numeric claims
	// without tool backing (e.g. "342ms", "2.4%", "1500 req/s").
	SpecificMetrics float64

	// InformationDensity detects an unusually high density of specific
	// technical claims per sentence.
	InformationDensity float64

	// TemporalUrgency detects artificial time pressure language
	// (e.g. "production is down", "immediately", "urgent").
	TemporalUrgency float64
}

GroundingSignals holds the weighted penalties from each fabrication detection signal. A value of 0 means the signal did not fire. All values are in the range [0, weight_max] where weight_max is the signal's maximum contribution to the total penalty.

func (GroundingSignals) HasAny ¶

func (s GroundingSignals) HasAny() bool

HasAny reports whether any signal fired (has a non-zero value).

func (GroundingSignals) MergeScaled ¶

func (s GroundingSignals) MergeScaled(other GroundingSignals, scale float64) GroundingSignals

MergeScaled returns the sum of s and other, with each of other's signals multiplied by scale. It is used to combine context-field signals at reduced weight.

func (GroundingSignals) Penalty ¶

func (s GroundingSignals) Penalty() float64

Penalty returns the total fabrication penalty as the sum of all signal contributions, capped at 1.0.

func (GroundingSignals) String ¶

func (s GroundingSignals) String() string

String returns a human-readable summary of non-zero signals.

type Guard ¶

type Guard interface {
	// PreCheck scores the sub-agent goal for fabrication risk before execution.
	// Returns a PreCheckResult with a Confidence score between 0.0 (certainly
	// fabricated) and 1.0 (certainly genuine). The caller decides whether to
	// proceed based on the score and a configurable threshold.
	//
	// Uses multi-signal analysis based on structural indicators, semantic
	// patterns, and information density rather than brittle keyword matching.
	PreCheck(ctx context.Context, req PreCheckRequest) (PreCheckResult, error)

	// PostCheck verifies sub-agent output after execution.
	// Returns a VerificationResult with per-block scores and, when contradictions
	// are found, a corrected version of the output. The corrected text preserves
	// accurate blocks and only rewrites contradicted ones.
	PostCheck(ctx context.Context, req PostCheckRequest) (VerificationResult, error)
}

Guard provides pre-delegation and post-execution hallucination checks for sub-agent tool calls. Implementations must be safe for concurrent use.

func New ¶

func New(
	modelProvider modelprovider.ModelProvider,
	textGenerator TextGeneratorFunc,
	opts ...Option,
) Guard

New creates a Guard with the given model provider and options. The model provider is used to collect diverse models for cross-model consistency checking. The textGenerator callback performs one-shot LLM calls with proper tracing; see NewTextGenerator for the recommended implementation. When options are not provided, sensible defaults are used (pre-check enabled, post-check enabled, 3 cross-model samples).

type Option ¶

type Option func(*Config)

Option configures optional behaviour on the Guard.

func WithConfig ¶

func WithConfig(cfg Config) Option

WithConfig applies a Config struct, overriding only non-zero numeric fields and always applying the bool flags. This is the preferred option when the config comes from genie.toml deserialization.

Because TOML/YAML omitempty cannot distinguish "field absent" from "field set to zero/false", callers who want to explicitly disable pre-check or post-check must set the field in the config file:

[halguard]
enable_pre_check = false

When no halguard section exists, all fields are zero and the defaults from New() are preserved.

func WithCrossModelSamples ¶

func WithCrossModelSamples(n int) Option

WithCrossModelSamples sets the number of cross-model samples to generate.

func WithFullThreshold ¶

func WithFullThreshold(chars int) Option

WithFullThreshold sets the character count above which full Finch-Zk verification is applied.

func WithLightThreshold ¶

func WithLightThreshold(chars int) Option

WithLightThreshold sets the character count above which light verification is applied.

func WithPostCheck ¶

func WithPostCheck(enable bool) Option

WithPostCheck enables or disables the post-execution output verification.

func WithPreCheck ¶

func WithPreCheck(enable bool) Option

WithPreCheck enables or disables the pre-delegation grounding check.
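The Option functions follow Go's functional-options pattern. The sketch below mirrors the documented signatures on a trimmed local Config; the function bodies and the `buildConfig` helper are assumptions about how New folds options over its defaults, not the package's code. WithConfig is omitted.

```go
package main

import "fmt"

// Config is a trimmed local copy of the documented Config.
type Config struct {
	LightThresholdChars int
	FullThresholdChars  int
	EnablePreCheck      bool
	EnablePostCheck     bool
	CrossModelSamples   int
}

// Option matches the documented type: a function that mutates a *Config.
type Option func(*Config)

// The bodies below are assumed; only the signatures come from the docs.
func WithLightThreshold(chars int) Option { return func(c *Config) { c.LightThresholdChars = chars } }
func WithFullThreshold(chars int) Option { return func(c *Config) { c.FullThresholdChars = chars } }
func WithCrossModelSamples(n int) Option { return func(c *Config) { c.CrossModelSamples = n } }
func WithPreCheck(enable bool) Option { return func(c *Config) { c.EnablePreCheck = enable } }
func WithPostCheck(enable bool) Option { return func(c *Config) { c.EnablePostCheck = enable } }

// buildConfig is hypothetical: it starts from the documented defaults and
// applies options in order, so later options win.
func buildConfig(opts ...Option) Config {
	cfg := Config{
		LightThresholdChars: 200,
		FullThresholdChars:  500,
		EnablePreCheck:      true,
		EnablePostCheck:     true,
		CrossModelSamples:   3,
	}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := buildConfig(WithPreCheck(false), WithCrossModelSamples(5))
	fmt.Printf("%+v\n", cfg)
}
```

Against the real package, the equivalent call would be `halguard.New(provider, textGen, halguard.WithPreCheck(false), halguard.WithCrossModelSamples(5))`, where `provider` and `textGen` stand for whatever ModelProvider and TextGeneratorFunc the caller already has.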

type PostCheckRequest ¶

type PostCheckRequest struct {
	// Goal is the original goal given to the sub-agent.
	Goal string

	// Context is the optional context string provided alongside the goal.
	Context string

	// Output is the sub-agent's raw text output to verify.
	Output string

	// ToolCallsMade is the number of tool calls the sub-agent executed.
	// A higher count suggests the output is more grounded in real data.
	ToolCallsMade int

	// GenerationModel identifies the model that generated the output
	// (e.g. "claude-sonnet-4-6"). Used to select a different model family
	// for cross-model verification per Finch-Zk §2.5.
	GenerationModel modelprovider.ModelMap
}

PostCheckRequest is the input for Guard.PostCheck.

type PreCheckRequest ¶

type PreCheckRequest struct {
	// Goal is the sub-agent's goal as specified by the parent agent.
	Goal string

	// Context is the optional context string provided alongside the goal.
	Context string

	// ToolNames lists the tools assigned to the sub-agent.
	ToolNames []string
}

PreCheckRequest is the input for Guard.PreCheck.

type PreCheckResult ¶

type PreCheckResult struct {
	// Confidence is the probability that the goal is genuine and grounded
	// in reality. Range: 0.0 (certainly fabricated) to 1.0 (certainly genuine).
	// The caller compares this against a configurable threshold to decide
	// whether to proceed with execution.
	Confidence float64

	// Signals contains the individual signal contributions that produced
	// the confidence score. Each field represents a distinct fabrication
	// signal with its weighted penalty.
	Signals GroundingSignals

	// Summary is a human-readable explanation of the assessment.
	Summary string
}

PreCheckResult carries the grounding assessment for a sub-agent goal.

type TextGeneratorFunc ¶

type TextGeneratorFunc func(ctx context.Context, m model.Model, prompt string) (string, error)

TextGeneratorFunc performs a one-shot LLM text generation given a specific model and prompt. The caller is responsible for creating this function with proper tracing wired in (e.g. using expert.Expert backed by the trpc-agent-go runner pipeline). This callback pattern avoids a direct import from halguard to the expert package, which would create an import cycle through config.

type VerificationResult ¶

type VerificationResult struct {
	// IsFactual is true when no contradictions were detected.
	IsFactual bool

	// CorrectedText holds the corrected output when contradictions were
	// found and targeted corrections were applied. When IsFactual is true,
	// this equals the original output unchanged.
	CorrectedText string

	// BlockScores holds per-block verification results for observability.
	// Only populated for Light and Full tier verifications.
	BlockScores []BlockScore

	// Tier indicates which verification level was applied.
	Tier VerifyTier
}

VerificationResult carries the outcome of a PostCheck verification.
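Targeted correction, as described for PostCheck and CorrectedText, can be sketched as follows. `applyCorrections` is hypothetical; it assumes the scores cover the whole output in order and that blocks are separated by blank lines, neither of which this page specifies. Because MaxBlocksToJudge can cap the judged blocks, a real implementation would also need to carry unjudged blocks through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

type BlockLabel string

const (
	BlockAccurate      BlockLabel = "ACCURATE"
	BlockContradiction BlockLabel = "CONTRADICTION"
	BlockNeutral       BlockLabel = "NEUTRAL"
)

type BlockScore struct {
	Text   string
	Label  BlockLabel
	Reason string
}

// VerificationResult is a trimmed local copy (Tier omitted).
type VerificationResult struct {
	IsFactual     bool
	CorrectedText string
	BlockScores   []BlockScore
}

// applyCorrections: IsFactual is true only when no block is a CONTRADICTION,
// in which case the original text is returned unchanged. Otherwise ACCURATE
// and NEUTRAL blocks are kept verbatim and only contradicted blocks go
// through rewrite.
func applyCorrections(original string, scores []BlockScore, rewrite func(BlockScore) string) VerificationResult {
	res := VerificationResult{IsFactual: true, CorrectedText: original, BlockScores: scores}
	for _, s := range scores {
		if s.Label == BlockContradiction {
			res.IsFactual = false
			break
		}
	}
	if res.IsFactual {
		return res
	}
	parts := make([]string, len(scores))
	for i, s := range scores {
		if s.Label == BlockContradiction {
			parts[i] = rewrite(s)
		} else {
			parts[i] = s.Text
		}
	}
	res.CorrectedText = strings.Join(parts, "\n\n")
	return res
}

func main() {
	scores := []BlockScore{
		{Text: "Pods restart every 5 minutes.", Label: BlockAccurate},
		{Text: "The cluster runs Kubernetes 1.9.", Label: BlockContradiction, Reason: "samples report a different version"},
	}
	// halguard rewrites with a model from a different family; here we just mark the block.
	fix := func(b BlockScore) string { return "[corrected] " + b.Text }
	res := applyCorrections("Pods restart every 5 minutes.\n\nThe cluster runs Kubernetes 1.9.", scores, fix)
	fmt.Println(res.IsFactual)
	fmt.Println(res.CorrectedText)
}
```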

type VerifyTier ¶

type VerifyTier string

VerifyTier indicates the level of verification applied to an output.

const (
	// TierNone means no verification was applied (short, tool-grounded output).
	TierNone VerifyTier = "none"

	// TierLight means a single-model sanity check was applied.
	TierLight VerifyTier = "light"

	// TierFull means the full cross-model Finch-Zk pipeline was applied.
	TierFull VerifyTier = "full"
)

Directories ¶

Path	Synopsis
halguardfakes	Code generated by counterfeiter.
