assertions

package
v1.3.9
Warning

This package is not in the latest version of its module.

Published: Mar 4, 2026 License: Apache-2.0 Imports: 9 Imported by: 1

Documentation

Index

Constants

const PackEvalTypePrefix = "pack_eval:"

PackEvalTypePrefix is prepended to eval types when converting to assertion results. Report renderers use this prefix to distinguish pack eval results from scenario assertions.
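The sketch below shows how the prefix can be attached and later detected. It uses local stand-ins only: `result`, `tagPackEval`, and `isPackEvalType` are hypothetical and just mirror the documented behaviour of `PackEvalTypePrefix`, `ConvertEvalResults`, and `IsPackEval`. The real functions may inspect more than the `Type` field, and the eval type `tone_check` is an invented example.

```go
package main

import (
	"fmt"
	"strings"
)

// packEvalTypePrefix mirrors the documented PackEvalTypePrefix constant.
const packEvalTypePrefix = "pack_eval:"

// result is a hypothetical, trimmed-down stand-in for ConversationValidationResult.
type result struct {
	Type   string
	Passed bool
}

// tagPackEval prepends the prefix to an eval type, as ConvertEvalResults is
// documented to do when turning eval results into assertion results.
func tagPackEval(evalType string) string {
	return packEvalTypePrefix + evalType
}

// isPackEvalType reports whether a result's Type carries the pack-eval prefix.
func isPackEvalType(r result) bool {
	return strings.HasPrefix(r.Type, packEvalTypePrefix)
}

func main() {
	results := []result{
		{Type: tagPackEval("tone_check"), Passed: true}, // hypothetical eval type
		{Type: "tools_not_called_with_args", Passed: false},
	}
	// A renderer can split pack evals from scenario assertions by prefix.
	for _, r := range results {
		if isPackEvalType(r) {
			fmt.Println("pack eval:", strings.TrimPrefix(r.Type, packEvalTypePrefix))
		} else {
			fmt.Println("scenario assertion:", r.Type)
		}
	}
}
```

A plain string prefix keeps both kinds of result in one `[]ConversationValidationResult` slice while still letting renderers group them.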

Variables

This section is empty.

Functions

func ExtractDurationMs added in v1.3.2

func ExtractDurationMs(details map[string]interface{}) (int64, bool)

ExtractDurationMs extracts eval duration in milliseconds from a Details map. Returns (duration, true) if present and numeric, (0, false) otherwise.
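"Numeric" matters here because a Details map decoded by `encoding/json` into `map[string]interface{}` holds numbers as `float64`, while a map built in Go may hold native integers. The following is a minimal sketch of a tolerant extractor under that assumption. The key name `duration_ms` and the helper `extractInt64` are hypothetical, and the exact set of types the real `ExtractDurationMs` accepts is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// durationKey is a hypothetical key name; the real package may use another.
const durationKey = "duration_ms"

// extractInt64 returns details[key] as an int64 if it holds a numeric value.
// JSON-decoded maps yield float64, so that case sits alongside integer types.
func extractInt64(details map[string]interface{}, key string) (int64, bool) {
	switch v := details[key].(type) {
	case int:
		return int64(v), true
	case int64:
		return v, true
	case float64:
		return int64(v), true
	case json.Number: // produced when a json.Decoder has UseNumber enabled
		n, err := v.Int64()
		if err != nil {
			return 0, false
		}
		return n, true
	default: // missing key, nil map, or non-numeric value
		return 0, false
	}
}

func main() {
	var details map[string]interface{}
	if err := json.Unmarshal([]byte(`{"duration_ms": 1250}`), &details); err != nil {
		panic(err)
	}
	ms, ok := extractInt64(details, durationKey)
	fmt.Println(ms, ok) // 1250 true
	_, ok = extractInt64(details, "missing")
	fmt.Println(ok) // false
}
```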

func ExtractEvalID added in v1.3.2

func ExtractEvalID(details map[string]interface{}) (string, bool)

ExtractEvalID extracts the eval ID string from a Details map. Returns (id, true) if present and a string, ("", false) otherwise.

func ExtractScore added in v1.3.2

func ExtractScore(details map[string]interface{}) (float64, bool)

ExtractScore extracts a float64 score from a Details map. Returns (score, true) if present and numeric, (0, false) otherwise.

func IsPackEval added in v1.3.2

func IsPackEval(cvr ConversationValidationResult) bool

IsPackEval returns true if the ConversationValidationResult originated from a pack eval.

func IsSkipped added in v1.3.2

func IsSkipped(details map[string]interface{}) (string, bool)

IsSkipped checks whether a Details map indicates the eval was skipped. Returns (reason, true) if skipped, ("", false) otherwise.
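The Details keys that mark a skip are not documented on this page. Assuming a boolean `skipped` flag and a string `skip_reason` (both hypothetical names), a check with the same `(reason, true)` / `("", false)` contract could look like this sketch:

```go
package main

import "fmt"

// Hypothetical key names; the real Details layout is not documented here.
const (
	skippedKey    = "skipped"
	skipReasonKey = "skip_reason"
)

// isSkipped reports whether details marks an eval as skipped and, if so,
// returns the recorded reason ("" if no string reason was stored).
func isSkipped(details map[string]interface{}) (string, bool) {
	skipped, _ := details[skippedKey].(bool)
	if !skipped {
		return "", false
	}
	reason, _ := details[skipReasonKey].(string)
	return reason, true
}

func main() {
	details := map[string]interface{}{
		skippedKey:    true,
		skipReasonKey: "tool search_docs was not called",
	}
	if reason, ok := isSkipped(details); ok {
		fmt.Println("skipped:", reason)
	}
}
```

Checked type assertions with the `, ok` form (discarded here) keep a missing or mistyped entry from panicking. Such an entry is simply treated as "not skipped".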

Types

type AssertionConfig added in v1.1.0

type AssertionConfig struct {
	Type    string                 `json:"type" yaml:"type"`
	Params  map[string]interface{} `json:"params" yaml:"params"`
	Message string                 `json:"message,omitempty" yaml:"message,omitempty"`
	When    *AssertionWhen         `json:"when,omitempty" yaml:"when,omitempty"`
}

AssertionConfig extends ValidatorConfig with arena-specific fields.
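To show what a serialized assertion looks like, the sketch below decodes one from JSON into local structs. These structs copy the JSON tags of `AssertionConfig` and `AssertionWhen` shown above, with the YAML tags dropped to stay within the standard library. The assertion type `content_includes`, its `patterns` parameter, and the tool name `lookup_order` are hypothetical examples, not validators confirmed by this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// assertionWhen and assertionConfig copy the JSON tags of the documented
// AssertionWhen and AssertionConfig types (YAML tags omitted).
type assertionWhen struct {
	ToolCalled        string `json:"tool_called,omitempty"`
	ToolCalledPattern string `json:"tool_called_pattern,omitempty"`
	AnyToolCalled     bool   `json:"any_tool_called,omitempty"`
	MinToolCalls      int    `json:"min_tool_calls,omitempty"`
}

type assertionConfig struct {
	Type    string                 `json:"type"`
	Params  map[string]interface{} `json:"params"`
	Message string                 `json:"message,omitempty"`
	When    *assertionWhen         `json:"when,omitempty"`
}

// parseAssertionConfig decodes a single assertion from JSON. When stays nil
// if the "when" block is absent, meaning the assertion always runs.
func parseAssertionConfig(raw []byte) (assertionConfig, error) {
	var cfg assertionConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
		"type": "content_includes",
		"params": {"patterns": ["refund"]},
		"message": "Mentions the refund policy",
		"when": {"tool_called": "lookup_order", "min_tool_calls": 1}
	}`)
	cfg, err := parseAssertionConfig(raw)
	if err != nil {
		panic(err)
	}
	if cfg.When != nil {
		fmt.Println(cfg.Type, cfg.When.ToolCalled, cfg.When.MinToolCalls)
	}
}
```

Making `When` a pointer lets "no preconditions" (nil) be distinguished from a present but empty `when` block.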

func (AssertionConfig) ToConversationEvalDef added in v1.3.2

func (a AssertionConfig) ToConversationEvalDef(index int) evals.EvalDef

ToConversationEvalDef converts an AssertionConfig to an evals.EvalDef with TriggerOnConversationComplete. Used for conversation-level assertions.

func (AssertionConfig) ToEvalDef added in v1.3.2

func (a AssertionConfig) ToEvalDef(index int) evals.EvalDef

ToEvalDef converts an AssertionConfig to an evals.EvalDef. This is the bridge for unifying arena assertions with runtime evals.

type AssertionResult added in v1.1.0

type AssertionResult struct {
	Passed  bool        `json:"passed"`
	Details interface{} `json:"details,omitempty"`
	Message string      `json:"message,omitempty"` // Human-readable description from config
}

AssertionResult holds the result of an assertion evaluation.

type AssertionWhen added in v1.3.2

type AssertionWhen struct {
	// ToolCalled requires an exact tool name to have been called.
	ToolCalled string `json:"tool_called,omitempty" yaml:"tool_called,omitempty"`
	// ToolCalledPattern is a regex that must match at least one tool name.
	ToolCalledPattern string `json:"tool_called_pattern,omitempty" yaml:"tool_called_pattern,omitempty"`
	// AnyToolCalled requires at least one tool to have been called.
	AnyToolCalled bool `json:"any_tool_called,omitempty" yaml:"any_tool_called,omitempty"`
	// MinToolCalls is the minimum number of tool calls required.
	MinToolCalls int `json:"min_tool_calls,omitempty" yaml:"min_tool_calls,omitempty"`
}

AssertionWhen specifies preconditions that must be met for an assertion to run. If any condition is not met, the assertion is skipped (not failed).

func (*AssertionWhen) ShouldRun added in v1.3.2

func (w *AssertionWhen) ShouldRun(
	params map[string]interface{},
) (shouldRun bool, reason string)

ShouldRun evaluates the when-conditions against the current turn's tool trace. It returns whether the assertion should run and, if it should not, a reason string explaining the skip. When no tool trace is available (e.g., on the duplex path), it returns true so that the validator itself can decide how to handle the missing data.
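The precondition logic described for `AssertionWhen` can be sketched as a series of checks, where the first unmet condition produces a skip reason. This is an approximation, not the package's implementation. The real method reads the tool trace from a `params` map whose layout is not documented here, so the hypothetical `shouldRun` below takes the called tool names directly, with `haveTrace=false` standing in for "no trace available". Treating an invalid regex as a skip is also an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// when mirrors the fields of the documented AssertionWhen type.
type when struct {
	ToolCalled        string
	ToolCalledPattern string
	AnyToolCalled     bool
	MinToolCalls      int
}

// shouldRun is a hypothetical stand-in for AssertionWhen.ShouldRun.
// toolNames lists every tool call in the turn (repeats included).
func shouldRun(w *when, toolNames []string, haveTrace bool) (bool, string) {
	if w == nil || !haveTrace {
		return true, "" // no conditions, or let the validator handle missing data
	}
	if w.ToolCalled != "" && !contains(toolNames, w.ToolCalled) {
		return false, fmt.Sprintf("tool %q was not called", w.ToolCalled)
	}
	if w.ToolCalledPattern != "" {
		re, err := regexp.Compile(w.ToolCalledPattern)
		if err != nil { // assumption: an invalid pattern skips the assertion
			return false, "invalid tool_called_pattern: " + err.Error()
		}
		matched := false
		for _, name := range toolNames {
			if re.MatchString(name) {
				matched = true
				break
			}
		}
		if !matched {
			return false, fmt.Sprintf("no tool matched %q", w.ToolCalledPattern)
		}
	}
	if w.AnyToolCalled && len(toolNames) == 0 {
		return false, "no tools were called"
	}
	if len(toolNames) < w.MinToolCalls {
		return false, fmt.Sprintf("%d tool calls, need at least %d", len(toolNames), w.MinToolCalls)
	}
	return true, ""
}

// contains reports whether target appears in names.
func contains(names []string, target string) bool {
	for _, n := range names {
		if n == target {
			return true
		}
	}
	return false
}

func main() {
	w := &when{ToolCalledPattern: "^search_", MinToolCalls: 2}
	fmt.Println(shouldRun(w, []string{"search_docs", "get_user"}, true))
	fmt.Println(shouldRun(w, []string{"get_user"}, true))
}
```

Returning `(false, reason)` instead of an error matches the documented semantics: an unmet precondition means "skipped", not "failed".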

type ConversationAssertion added in v1.1.3

type ConversationAssertion struct {
	Type    string                 `json:"type" yaml:"type"`
	Params  map[string]interface{} `json:"params" yaml:"params"`
	Message string                 `json:"message" yaml:"message"`
	When    *AssertionWhen         `json:"when,omitempty" yaml:"when,omitempty"`
}

ConversationAssertion defines an assertion to evaluate across an entire conversation. Unlike turn-level assertions that check individual responses, conversation assertions evaluate patterns, behaviors, or constraints across all turns in a self-play scenario.

type ConversationContext added in v1.1.3

type ConversationContext struct {
	// AllTurns contains the complete conversation history in chronological order.
	// Includes all messages from all roles (system, user, assistant, tool).
	AllTurns []types.Message

	// ToolCalls contains all tool invocations with their results.
	// Ordered chronologically to allow sequential analysis.
	ToolCalls []ToolCallRecord

	// Metadata provides scenario/execution context for the conversation.
	Metadata ConversationMetadata
}

ConversationContext provides all data needed to evaluate conversation-level assertions. This aggregates the complete conversation history, tool usage, and metadata for comprehensive validation across multiple turns.

func BuildConversationContextFromMessages added in v1.3.1

func BuildConversationContextFromMessages(
	messages []types.Message,
	metadata *ConversationMetadata,
) *ConversationContext

BuildConversationContextFromMessages constructs a ConversationContext from a sequence of messages and caller-supplied metadata. It extracts tool-call records from assistant messages, matches them with tool-role result messages, and aggregates cost/token information.

Callers are responsible for populating Metadata.Extras with any engine-specific data (e.g. judge targets, prompt registries) before or after calling this function.
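The pairing step (tool calls issued by assistant messages, matched to later tool-role results) can be sketched with an ID lookup table. The message shapes below are hypothetical simplifications: the real `types.Message` and `types.ToolCallRecord` have different fields. The cost and token aggregation that the real function also performs is left out.

```go
package main

import "fmt"

// Hypothetical, simplified message shapes; the real types.Message differs.
type toolCall struct {
	ID   string
	Name string
	Args string // raw JSON arguments
}

type toolResult struct {
	ID      string // ID of the call this result answers
	Content string
}

type message struct {
	Role       string // "user", "assistant", "tool", ...
	ToolCalls  []toolCall
	ToolResult *toolResult
}

type record struct {
	TurnIndex int // index of the assistant message that issued the call
	Name      string
	Args      string
	Result    string
	HasResult bool
}

// pairToolCalls collects tool calls from assistant messages in order and
// attaches each tool-role result whose ID matches a recorded call ID.
func pairToolCalls(msgs []message) []record {
	var records []record
	byID := map[string]int{} // call ID -> index in records
	for i, m := range msgs {
		switch m.Role {
		case "assistant":
			for _, c := range m.ToolCalls {
				byID[c.ID] = len(records)
				records = append(records, record{TurnIndex: i, Name: c.Name, Args: c.Args})
			}
		case "tool":
			if m.ToolResult == nil {
				continue
			}
			if idx, ok := byID[m.ToolResult.ID]; ok {
				records[idx].Result = m.ToolResult.Content
				records[idx].HasResult = true
			}
		}
	}
	return records
}

func main() {
	msgs := []message{
		{Role: "user"},
		{Role: "assistant", ToolCalls: []toolCall{{ID: "c1", Name: "lookup_order", Args: `{"id":42}`}}},
		{Role: "tool", ToolResult: &toolResult{ID: "c1", Content: "shipped"}},
	}
	for _, r := range pairToolCalls(msgs) {
		fmt.Printf("turn %d: %s(%s) -> %s\n", r.TurnIndex, r.Name, r.Args, r.Result)
	}
}
```

Keeping records in a slice preserves the chronological order that `ConversationContext.ToolCalls` promises. The map gives constant-time lookup when a result arrives.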

type ConversationMetadata added in v1.1.3

type ConversationMetadata struct {
	ScenarioID     string                 `json:"scenario_id"`      // The scenario being tested
	PersonaID      string                 `json:"persona_id"`       // Persona used for self-play (if any)
	Variables      map[string]interface{} `json:"variables"`        // Variables passed to prompts
	PromptConfigID string                 `json:"prompt_config_id"` // Which prompt configuration was used
	ProviderID     string                 `json:"provider_id"`      // Which LLM provider was used
	TotalCost      float64                `json:"total_cost"`       // Total cost in USD across all turns
	TotalTokens    int                    `json:"total_tokens"`     // Total tokens used (input + output)
	Extras         map[string]interface{} `json:"extras,omitempty"` // Additional metadata (e.g., judge targets/defaults)
}

ConversationMetadata provides context about the conversation execution. Useful for conditional validation based on scenario characteristics.

type ConversationValidationResult added in v1.1.3

type ConversationValidationResult struct {
	Type    string                 `json:"type,omitempty"`    // Validator type (e.g., tools_not_called_with_args)
	Passed  bool                   `json:"passed"`            // Whether the assertion passed
	Message string                 `json:"message"`           // Human-readable result explanation
	Details map[string]interface{} `json:"details,omitempty"` // Structured details for debugging

	// For aggregated assertions (e.g., checking all turns), evidence of individual violations.
	// Helps users understand exactly which turns or actions failed the assertion.
	Violations []ConversationViolation `json:"violations,omitempty"`
}

ConversationValidationResult contains the outcome of a conversation-level assertion. Provides structured details for debugging and reporting when assertions fail.

func ConvertEvalResults added in v1.3.2

func ConvertEvalResults(results []evals.EvalResult) []ConversationValidationResult

ConvertEvalResults transforms a slice of EvalResult into ConversationValidationResult entries. Each result is tagged with the PackEvalTypePrefix so renderers can group them separately. This function is used by both the PackEvalAdapter (engine) and the statestore when building AssertionsSummary from eval results.

type ConversationValidator added in v1.1.3

type ConversationValidator interface {
	// Type returns the validator name (e.g., "tools_not_called_with_args").
	// Must match the type specified in ConversationAssertion configs.
	Type() string

	// ValidateConversation evaluates the assertion against the full conversation.
	// Returns a result indicating success/failure with detailed evidence.
	ValidateConversation(
		ctx context.Context,
		convCtx *ConversationContext,
		params map[string]interface{},
	) ConversationValidationResult
}

ConversationValidator evaluates assertions across entire conversations. Implementations check patterns, constraints, or behaviors that span multiple turns, such as "no forbidden tool arguments used" or "consistent behavior maintained".

type ConversationViolation added in v1.1.3

type ConversationViolation struct {
	TurnIndex   int                    `json:"turn_index"`          // Which turn (index in AllTurns) had the violation
	Description string                 `json:"description"`         // What was violated (human-readable)
	Evidence    map[string]interface{} `json:"evidence,omitempty"`  // Data supporting the violation (e.g., actual values)
	Timestamp   time.Time              `json:"timestamp,omitempty"` // When the violation occurred (if available)
}

ConversationViolation represents a single assertion violation within the conversation. Captures exactly where and how the assertion was violated for precise debugging.

type ToolCallRecord added in v1.1.3

type ToolCallRecord = types.ToolCallRecord

ToolCallRecord is an alias for types.ToolCallRecord so existing code referencing assertions.ToolCallRecord continues to compile unchanged.

type TurnToolCall added in v1.3.2

type TurnToolCall struct {
	CallID     string                 // from MessageToolCall.ID
	Name       string                 // tool name
	Args       map[string]interface{} // parsed arguments
	RawArgs    json.RawMessage        // original JSON arguments
	Result     string                 // from MessageToolResult.Content
	Error      string                 // from MessageToolResult.Error
	LatencyMs  int64                  // from MessageToolResult.LatencyMs
	RoundIndex int                    // which tool-use round within the turn (0-based)
	// contains filtered or unexported fields
}

TurnToolCall represents a single tool call within a turn, paired with its result. This provides the ordered, result-paired trace needed for turn-level tool assertions.
