Documentation
Index
- Constants
- func ExtractDurationMs(details map[string]interface{}) (int64, bool)
- func ExtractEvalID(details map[string]interface{}) (string, bool)
- func ExtractScore(details map[string]interface{}) (float64, bool)
- func IsPackEval(cvr ConversationValidationResult) bool
- func IsSkipped(details map[string]interface{}) (string, bool)
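- type AssertionConfig
  - func (a AssertionConfig) ToConversationEvalDef(index int) evals.EvalDef
- type AssertionResult
- type AssertionWhen
  - func (w *AssertionWhen) ShouldRun(params map[string]interface{}) (shouldRun bool, reason string)
- type ConversationAssertion
- type ConversationContext
  - func BuildConversationContextFromMessages(messages []types.Message, metadata *ConversationMetadata) *ConversationContext
- type ConversationMetadata
- type ConversationValidationResult
  - func ConvertEvalResults(results []evals.EvalResult) []ConversationValidationResult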
- type ConversationValidator
- type ConversationViolation
- type ToolCallRecord
- type TurnToolCall
Constants
const PackEvalTypePrefix = "pack_eval:"
PackEvalTypePrefix is prepended to eval types when converting to assertion results. Report renderers use this prefix to distinguish pack eval results from scenario assertions.
Variables
This section is empty.
Functions
func ExtractDurationMs added in v1.3.2
func ExtractDurationMs(details map[string]interface{}) (int64, bool)
ExtractDurationMs extracts eval duration in milliseconds from a Details map. Returns (duration, true) if present and numeric, (0, false) otherwise.
func ExtractEvalID added in v1.3.2
func ExtractEvalID(details map[string]interface{}) (string, bool)
ExtractEvalID extracts the eval ID string from a Details map. Returns (id, true) if present and a string, ("", false) otherwise.
func ExtractScore added in v1.3.2
func ExtractScore(details map[string]interface{}) (float64, bool)
ExtractScore extracts a float64 score from a Details map. Returns (score, true) if present and numeric, (0, false) otherwise.
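The three extractors above follow Go's comma-ok convention. The sketch below mirrors their documented contracts using hypothetical lowercase copies and hypothetical Details keys ("duration_ms", "eval_id", "score"); the real key names are not documented here. It accepts several numeric types because a Details map that has been decoded by encoding/json holds every number as float64.

```go
package main

import "fmt"

// Hypothetical key names: the real Details keys are not documented here.
const (
	keyDurationMs = "duration_ms"
	keyEvalID     = "eval_id"
	keyScore      = "score"
)

// extractDurationMs follows the documented ExtractDurationMs contract.
func extractDurationMs(details map[string]interface{}) (int64, bool) {
	switch n := details[keyDurationMs].(type) {
	case int64:
		return n, true
	case int:
		return int64(n), true
	case float64: // encoding/json decodes numbers into interface{} as float64
		return int64(n), true
	}
	return 0, false
}

// extractScore follows the documented ExtractScore contract.
func extractScore(details map[string]interface{}) (float64, bool) {
	switch n := details[keyScore].(type) {
	case float64:
		return n, true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	}
	return 0, false
}

// extractEvalID follows the documented ExtractEvalID contract: a failed
// type assertion yields ("", false), including for a nil map.
func extractEvalID(details map[string]interface{}) (string, bool) {
	id, ok := details[keyEvalID].(string)
	return id, ok
}

func main() {
	details := map[string]interface{}{"duration_ms": 1250.0, "eval_id": "tone-check", "score": 1}
	d, ok := extractDurationMs(details)
	fmt.Println(d, ok) // 1250 true
	s, ok := extractScore(details)
	fmt.Println(s, ok) // 1 true
	id, ok := extractEvalID(details)
	fmt.Println(id, ok) // tone-check true
}
```

Indexing a nil map is legal in Go and returns nil, so each helper falls through to its (zero, false) result when Details is absent.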
func IsPackEval added in v1.3.2
func IsPackEval(cvr ConversationValidationResult) bool
IsPackEval returns true if the ConversationValidationResult originated from a pack eval.
Types
type AssertionConfig added in v1.1.0
type AssertionConfig struct {
	Type    string                 `json:"type" yaml:"type"`
	Params  map[string]interface{} `json:"params" yaml:"params"`
	Message string                 `json:"message,omitempty" yaml:"message,omitempty"`
	When    *AssertionWhen         `json:"when,omitempty" yaml:"when,omitempty"`
}
AssertionConfig extends ValidatorConfig with arena-specific fields.
func (AssertionConfig) ToConversationEvalDef added in v1.3.2
func (a AssertionConfig) ToConversationEvalDef(index int) evals.EvalDef
ToConversationEvalDef converts an AssertionConfig to an evals.EvalDef with TriggerOnConversationComplete. Used for conversation-level assertions.
type AssertionResult added in v1.1.0
type AssertionResult struct {
	Passed  bool        `json:"passed"`
	Details interface{} `json:"details,omitempty"`
	Message string      `json:"message,omitempty"` // Human-readable description from config
}
AssertionResult holds the result of an assertion evaluation.
type AssertionWhen added in v1.3.2
type AssertionWhen struct {
	// ToolCalled requires an exact tool name to have been called.
	ToolCalled string `json:"tool_called,omitempty" yaml:"tool_called,omitempty"`
	// ToolCalledPattern is a regex that must match at least one tool name.
	ToolCalledPattern string `json:"tool_called_pattern,omitempty" yaml:"tool_called_pattern,omitempty"`
	// AnyToolCalled requires at least one tool to have been called.
	AnyToolCalled bool `json:"any_tool_called,omitempty" yaml:"any_tool_called,omitempty"`
	// MinToolCalls is the minimum number of tool calls required.
	MinToolCalls int `json:"min_tool_calls,omitempty" yaml:"min_tool_calls,omitempty"`
}
AssertionWhen specifies preconditions that must be met for an assertion to run. If any condition is not met, the assertion is skipped (not failed).
func (*AssertionWhen) ShouldRun added in v1.3.2
func (w *AssertionWhen) ShouldRun(
	params map[string]interface{},
) (shouldRun bool, reason string)
ShouldRun evaluates when-conditions against the current turn's tool trace. Returns whether the assertion should run and a reason string if skipped. When no tool trace is available (e.g. duplex path), returns true to let the validator itself decide how to handle the missing data.
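The sketch below illustrates the precondition semantics described above: every set condition must hold, otherwise the assertion is skipped with a reason, and a missing trace means "run". It is not the package's implementation. How ShouldRun reads the tool trace out of params is not documented here, so the hypothetical shouldRun takes the turn's tool names directly plus a haveTrace flag; the check order, the reason strings, and the choice to skip on an invalid regex are also assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Local copy of AssertionWhen (struct tags omitted).
type AssertionWhen struct {
	ToolCalled        string
	ToolCalledPattern string
	AnyToolCalled     bool
	MinToolCalls      int
}

// shouldRun is a hypothetical stand-in for (*AssertionWhen).ShouldRun. It takes
// the turn's tool names directly instead of a params map; haveTrace=false models
// a path (such as duplex) where no tool trace was recorded.
func shouldRun(w *AssertionWhen, toolNames []string, haveTrace bool) (bool, string) {
	if w == nil || !haveTrace {
		return true, "" // no preconditions, or defer to the validator
	}
	if w.AnyToolCalled && len(toolNames) == 0 {
		return false, "no tool was called"
	}
	if w.MinToolCalls > 0 && len(toolNames) < w.MinToolCalls {
		return false, fmt.Sprintf("%d tool call(s), need at least %d", len(toolNames), w.MinToolCalls)
	}
	if w.ToolCalled != "" && !anyMatch(toolNames, func(n string) bool { return n == w.ToolCalled }) {
		return false, fmt.Sprintf("tool %q was not called", w.ToolCalled)
	}
	if w.ToolCalledPattern != "" {
		re, err := regexp.Compile(w.ToolCalledPattern)
		if err != nil {
			// Design choice for this sketch: an invalid pattern skips the assertion.
			return false, fmt.Sprintf("invalid tool_called_pattern: %v", err)
		}
		if !anyMatch(toolNames, re.MatchString) {
			return false, fmt.Sprintf("no tool matched pattern %q", w.ToolCalledPattern)
		}
	}
	return true, ""
}

// anyMatch reports whether pred holds for at least one name.
func anyMatch(names []string, pred func(string) bool) bool {
	for _, n := range names {
		if pred(n) {
			return true
		}
	}
	return false
}

func main() {
	w := &AssertionWhen{ToolCalledPattern: "^search_"}
	ok, _ := shouldRun(w, []string{"search_docs"}, true)
	fmt.Println(ok) // true
	ok, reason := shouldRun(w, []string{"get_weather"}, true)
	fmt.Println(ok, reason) // false no tool matched pattern "^search_"
}
```

Returning a reason alongside false lets a report show the assertion as skipped rather than failed, matching the AssertionWhen description.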
type ConversationAssertion added in v1.1.3
type ConversationAssertion struct {
	Type    string                 `json:"type" yaml:"type"`
	Params  map[string]interface{} `json:"params" yaml:"params"`
	Message string                 `json:"message" yaml:"message"`
	When    *AssertionWhen         `json:"when,omitempty" yaml:"when,omitempty"`
}
ConversationAssertion defines an assertion to evaluate across an entire conversation. Unlike turn-level assertions that check individual responses, conversation assertions evaluate patterns, behaviors, or constraints across all turns in a self-play scenario.
type ConversationContext added in v1.1.3
type ConversationContext struct {
	// AllTurns contains the complete conversation history in chronological order.
	// Includes all messages from all roles (system, user, assistant, tool).
	AllTurns []types.Message
	// ToolCalls contains all tool invocations with their results.
	// Ordered chronologically to allow sequential analysis.
	ToolCalls []ToolCallRecord
	// Metadata provides scenario/execution context for the conversation.
	Metadata ConversationMetadata
}
ConversationContext provides all data needed to evaluate conversation-level assertions. This aggregates the complete conversation history, tool usage, and metadata for comprehensive validation across multiple turns.
func BuildConversationContextFromMessages added in v1.3.1
func BuildConversationContextFromMessages(
	messages []types.Message,
	metadata *ConversationMetadata,
) *ConversationContext
BuildConversationContextFromMessages constructs a ConversationContext from a sequence of messages and caller-supplied metadata. It extracts tool-call records from assistant messages, matches them with tool-role result messages, and aggregates cost/token information.
Callers are responsible for populating Metadata.Extras with any engine-specific data (e.g. judge targets, prompt registries) before or after calling this function.
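The core of this construction is pairing each tool call on an assistant message with the tool-role message that answers it, while summing cost and tokens. The sketch below shows that technique with hypothetical, simplified Message, ToolCall, and ToolCallRecord shapes; the real types.Message and types.ToolCallRecord fields are not documented here, and the field names below (ToolCallID, CostUSD, Tokens, TurnIndex) are assumptions.

```go
package main

import "fmt"

// Hypothetical, simplified shapes; the real types.Message differs.
type ToolCall struct {
	ID, Name, Args string
}

type Message struct {
	Role       string
	Content    string
	ToolCalls  []ToolCall // set on assistant messages
	ToolCallID string     // set on tool-role messages: which call this answers
	CostUSD    float64
	Tokens     int
}

type ToolCallRecord struct {
	TurnIndex int // index of the assistant message in the input slice
	Name      string
	Args      string
	Result    string
}

// buildToolRecords pairs calls with results by call ID and aggregates usage.
func buildToolRecords(messages []Message) (records []ToolCallRecord, totalCost float64, totalTokens int) {
	// First pass: index tool-role results by the call ID they answer.
	results := make(map[string]string)
	for _, m := range messages {
		if m.Role == "tool" && m.ToolCallID != "" {
			results[m.ToolCallID] = m.Content
		}
	}
	// Second pass: emit records in chronological order of the calls.
	for i, m := range messages {
		totalCost += m.CostUSD
		totalTokens += m.Tokens
		if m.Role != "assistant" {
			continue
		}
		for _, call := range m.ToolCalls {
			records = append(records, ToolCallRecord{
				TurnIndex: i,
				Name:      call.Name,
				Args:      call.Args,
				Result:    results[call.ID], // empty if no result message was found
			})
		}
	}
	return records, totalCost, totalTokens
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "Where is order 42?", Tokens: 8},
		{Role: "assistant", ToolCalls: []ToolCall{{ID: "c1", Name: "lookup_order", Args: `{"id":42}`}}, CostUSD: 0.002, Tokens: 30},
		{Role: "tool", ToolCallID: "c1", Content: "shipped"},
		{Role: "assistant", Content: "Your order has shipped.", CostUSD: 0.001, Tokens: 12},
	}
	recs, cost, tokens := buildToolRecords(msgs)
	fmt.Println(len(recs), recs[0].Name, recs[0].TurnIndex, recs[0].Result) // 1 lookup_order 1 shipped
	fmt.Printf("%.3f %d\n", cost, tokens)                                   // 0.003 50
}
```

Indexing results first makes pairing independent of the order in which tool results arrive.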
type ConversationMetadata added in v1.1.3
type ConversationMetadata struct {
	ScenarioID     string                 `json:"scenario_id"`      // The scenario being tested
	PersonaID      string                 `json:"persona_id"`       // Persona used for self-play (if any)
	Variables      map[string]interface{} `json:"variables"`        // Variables passed to prompts
	PromptConfigID string                 `json:"prompt_config_id"` // Which prompt configuration was used
	ProviderID     string                 `json:"provider_id"`      // Which LLM provider was used
	TotalCost      float64                `json:"total_cost"`       // Total cost in USD across all turns
	TotalTokens    int                    `json:"total_tokens"`     // Total tokens used (input + output)
	Extras         map[string]interface{} `json:"extras,omitempty"` // Additional metadata (e.g., judge targets/defaults)
}
ConversationMetadata provides context about the conversation execution. Useful for conditional validation based on scenario characteristics.
type ConversationValidationResult added in v1.1.3
type ConversationValidationResult struct {
	Type    string                 `json:"type,omitempty"`    // Validator type (e.g., tools_not_called_with_args)
	Passed  bool                   `json:"passed"`            // Whether the assertion passed
	Message string                 `json:"message"`           // Human-readable result explanation
	Details map[string]interface{} `json:"details,omitempty"` // Structured details for debugging
	// For aggregated assertions (e.g., checking all turns), evidence of individual violations.
	// Helps users understand exactly which turns or actions failed the assertion.
	Violations []ConversationViolation `json:"violations,omitempty"`
}
ConversationValidationResult contains the outcome of a conversation-level assertion. Provides structured details for debugging and reporting when assertions fail.
func ConvertEvalResults added in v1.3.2
func ConvertEvalResults(results []evals.EvalResult) []ConversationValidationResult
ConvertEvalResults transforms a slice of EvalResult into ConversationValidationResult entries. Each result is tagged with the PackEvalTypePrefix so renderers can group them separately. This function is used by both the PackEvalAdapter (engine) and the statestore when building AssertionsSummary from eval results.
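The tagging described above, together with PackEvalTypePrefix and IsPackEval, lets a renderer split pack eval results from scenario assertions. The sketch below uses a hypothetical, trimmed-down EvalResult (its real fields are not documented here) and assumes IsPackEval is a simple prefix test on Type; the eval type "llm_judge" is a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

const PackEvalTypePrefix = "pack_eval:"

// Hypothetical, trimmed-down shapes; the real evals.EvalResult differs.
type EvalResult struct {
	Type        string
	Passed      bool
	Explanation string
}

type ConversationValidationResult struct {
	Type    string
	Passed  bool
	Message string
}

// convertEvalResults tags each result's type with PackEvalTypePrefix.
func convertEvalResults(results []EvalResult) []ConversationValidationResult {
	out := make([]ConversationValidationResult, 0, len(results))
	for _, r := range results {
		out = append(out, ConversationValidationResult{
			Type:    PackEvalTypePrefix + r.Type,
			Passed:  r.Passed,
			Message: r.Explanation,
		})
	}
	return out
}

// isPackEval assumes the check is a prefix test on Type.
func isPackEval(cvr ConversationValidationResult) bool {
	return strings.HasPrefix(cvr.Type, PackEvalTypePrefix)
}

func main() {
	all := append(
		[]ConversationValidationResult{{Type: "tools_not_called_with_args", Passed: true}},
		convertEvalResults([]EvalResult{{Type: "llm_judge", Explanation: "tone too casual"}})...,
	)
	// A renderer can partition results into two groups by the prefix.
	var scenario, pack []ConversationValidationResult
	for _, r := range all {
		if isPackEval(r) {
			pack = append(pack, r)
		} else {
			scenario = append(scenario, r)
		}
	}
	fmt.Println(len(scenario), len(pack), pack[0].Type) // 1 1 pack_eval:llm_judge
}
```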
type ConversationValidator added in v1.1.3
type ConversationValidator interface {
	// Type returns the validator name (e.g., "tools_not_called_with_args").
	// Must match the type specified in ConversationAssertion configs.
	Type() string
	// ValidateConversation evaluates the assertion against the full conversation.
	// Returns a result indicating success/failure with detailed evidence.
	ValidateConversation(
		ctx context.Context,
		convCtx *ConversationContext,
		params map[string]interface{},
	) ConversationValidationResult
}
ConversationValidator evaluates assertions across entire conversations. Implementations check patterns, constraints, or behaviors that span multiple turns, such as "no forbidden tool arguments used" or "consistent behavior maintained".
type ConversationViolation added in v1.1.3
type ConversationViolation struct {
	TurnIndex   int                    `json:"turn_index"`          // Which turn (index in AllTurns) had the violation
	Description string                 `json:"description"`         // What was violated (human-readable)
	Evidence    map[string]interface{} `json:"evidence,omitempty"`  // Data supporting the violation (e.g., actual values)
	Timestamp   time.Time              `json:"timestamp,omitempty"` // When the violation occurred (if available)
}
ConversationViolation represents a single assertion violation within the conversation. Captures exactly where and how the assertion was violated for precise debugging.
type ToolCallRecord added in v1.1.3
type ToolCallRecord = types.ToolCallRecord
ToolCallRecord is an alias for types.ToolCallRecord so existing code referencing assertions.ToolCallRecord continues to compile unchanged.
type TurnToolCall added in v1.3.2
type TurnToolCall struct {
	CallID     string                 // from MessageToolCall.ID
	Name       string                 // tool name
	Args       map[string]interface{} // parsed arguments
	RawArgs    json.RawMessage        // original JSON arguments
	Result     string                 // from MessageToolResult.Content
	Error      string                 // from MessageToolResult.Error
	LatencyMs  int64                  // from MessageToolResult.LatencyMs
	RoundIndex int                    // which tool-use round within the turn (0-based)
	// contains filtered or unexported fields
}
TurnToolCall represents a single tool call within a turn, paired with its result. This provides the ordered, result-paired trace needed for turn-level tool assertions.
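The sketch below illustrates how the Args/RawArgs pair and RoundIndex might be populated and used: RawArgs keeps the original JSON, Args holds its decoded form, and RoundIndex groups calls made in the same tool-use round. It uses a local copy of the exported fields; newTurnToolCall and callsPerRound are hypothetical helpers, and leaving Args nil when the arguments are not a JSON object is an assumption, not documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copy of the exported TurnToolCall fields.
type TurnToolCall struct {
	CallID     string
	Name       string
	Args       map[string]interface{}
	RawArgs    json.RawMessage
	Result     string
	Error      string
	LatencyMs  int64
	RoundIndex int
}

// newTurnToolCall (hypothetical) keeps the original JSON in RawArgs and
// decodes it into Args, leaving Args nil if the JSON is not an object.
func newTurnToolCall(id, name string, raw []byte, round int) TurnToolCall {
	tc := TurnToolCall{CallID: id, Name: name, RawArgs: json.RawMessage(raw), RoundIndex: round}
	var args map[string]interface{}
	if err := json.Unmarshal(raw, &args); err == nil {
		tc.Args = args
	}
	return tc
}

// callsPerRound (hypothetical) counts tool calls in each tool-use round.
func callsPerRound(calls []TurnToolCall) map[int]int {
	counts := make(map[int]int)
	for _, c := range calls {
		counts[c.RoundIndex]++
	}
	return counts
}

func main() {
	calls := []TurnToolCall{
		newTurnToolCall("c1", "search", []byte(`{"q":"refund policy","limit":3}`), 0),
		newTurnToolCall("c2", "search", []byte(`{"q":"shipping"}`), 0),
		newTurnToolCall("c3", "summarize", []byte(`not json`), 1),
	}
	fmt.Println(calls[0].Args["limit"])                       // 3
	fmt.Println(calls[2].Args == nil, string(calls[2].RawArgs)) // true not json
	fmt.Println(callsPerRound(calls))                           // map[0:2 1:1]
}
```

Keeping RawArgs alongside Args means an assertion can still inspect the exact JSON the model produced even when it fails to parse.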