assertions

package

Version: v1.4.6 (latest) | Published: Apr 20, 2026 | License: Apache-2.0 | Imports: 10 | Imported by: 1

Repository

github.com/AltairaLabs/PromptKit

Documentation ¶

Index ¶

Constants
func ExtractDurationMs(details map[string]interface{}) (int64, bool)
func ExtractEvalID(details map[string]interface{}) (string, bool)
func ExtractScore(details map[string]interface{}) (float64, bool)
func IsPackEval(cvr ConversationValidationResult) bool
func IsSkipped(details map[string]interface{}) (string, bool)
func ToConversationEvalDef(a AssertionConfig, index int) evals.EvalDef
func ToEvalDef(a AssertionConfig, index int) evals.EvalDef
func ValidateAssertionTypes(scenarios map[string]*config.Scenario, registry *evals.EvalTypeRegistry) []string
type AssertionConfig
type AssertionResult
type AssertionWhen
type ConversationAssertion
type ConversationContext
- func BuildConversationContextFromMessages(messages []types.Message, metadata *ConversationMetadata) *ConversationContext
type ConversationMetadata
type ConversationValidationResult
- func ConvertEvalResults(results []evals.EvalResult) []ConversationValidationResult
type ConversationValidator
type ConversationViolation
type ToolCallRecord
type TurnToolCall
type WhenEvaluator
- func NewWhenEvaluator(w *config.AssertionWhen) *WhenEvaluator
- func (e *WhenEvaluator) ShouldRun(params map[string]interface{}) (shouldRun bool, reason string)

Constants ¶

const PackEvalTypePrefix = "pack_eval:"

PackEvalTypePrefix is prepended to the eval type when eval results are converted to assertion results. Report renderers use this prefix to distinguish pack eval results from scenario assertions.

Variables ¶

This section is empty.

Functions ¶

func ExtractDurationMs ¶ added in v1.3.2

func ExtractDurationMs(details map[string]interface{}) (int64, bool)

ExtractDurationMs extracts eval duration in milliseconds from a Details map. Returns (duration, true) if present and numeric, (0, false) otherwise.

func ExtractEvalID ¶ added in v1.3.2

func ExtractEvalID(details map[string]interface{}) (string, bool)

ExtractEvalID extracts the eval ID string from a Details map. Returns (id, true) if present and a string, ("", false) otherwise.

func ExtractScore ¶ added in v1.3.2

func ExtractScore(details map[string]interface{}) (float64, bool)

ExtractScore extracts a float64 score from a Details map. Returns (score, true) if present and numeric, (0, false) otherwise.
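The Extract helpers above all follow the same (value, ok) contract over a loosely typed Details map. The following is a minimal sketch of that pattern, not the package's implementation: the function name `extractNumber` and the map key "score" are hypothetical, since this page does not show which keys the real helpers read.

```go
package main

import "fmt"

// extractNumber is a hypothetical stand-in that mirrors the documented
// (value, ok) contract. The real helpers may read different keys and
// accept a different set of numeric types.
func extractNumber(details map[string]interface{}, key string) (float64, bool) {
	v, present := details[key] // reading from a nil map is safe in Go
	if !present {
		return 0, false
	}
	switch n := v.(type) {
	case float64:
		return n, true
	case float32:
		return float64(n), true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	default:
		return 0, false // present but not numeric
	}
}

func main() {
	// "score" and "note" are illustrative keys only.
	details := map[string]interface{}{"score": 0.85, "note": "ok"}
	if s, ok := extractNumber(details, "score"); ok {
		fmt.Printf("score=%.2f\n", s)
	}
	_, ok := extractNumber(details, "note")
	fmt.Println("note is numeric:", ok)
}
```

Accepting several numeric types matters because values decoded from JSON arrive as float64, while values set directly in Go code are often int or int64.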

func IsPackEval ¶ added in v1.3.2

func IsPackEval(cvr ConversationValidationResult) bool

IsPackEval reports whether the ConversationValidationResult originated from a pack eval.

func IsSkipped ¶ added in v1.3.2

func IsSkipped(details map[string]interface{}) (string, bool)

IsSkipped checks whether a Details map indicates the eval was skipped. Returns (reason, true) if skipped, ("", false) otherwise.

func ToConversationEvalDef ¶ added in v1.3.10

func ToConversationEvalDef(a AssertionConfig, index int) evals.EvalDef

ToConversationEvalDef converts an AssertionConfig to an evals.EvalDef with TriggerOnConversationComplete. Used for conversation-level assertions.

func ToEvalDef ¶ added in v1.3.10

func ToEvalDef(a AssertionConfig, index int) evals.EvalDef

ToEvalDef converts an AssertionConfig to an evals.EvalDef with type "assertion". The original eval type and params are nested under eval_type/eval_params, while assertion-specific properties (min_score, max_score) stay at the top level.
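The nesting described above can be pictured with a small sketch. Everything here is an assumption made for illustration: the local `assertionConfig` and `evalDef` types stand in for config.AssertionConfig and evals.EvalDef (whose fields are not shown on this page), the ID format is invented, and the sketch assumes min_score and max_score arrive inside the assertion's params.

```go
package main

import "fmt"

// Hypothetical stand-ins for config.AssertionConfig and evals.EvalDef.
type assertionConfig struct {
	Type   string
	Params map[string]interface{}
}

type evalDef struct {
	ID     string
	Type   string
	Params map[string]interface{}
}

// toEvalDef wraps an assertion as an eval of type "assertion": the original
// type and params move under eval_type/eval_params, while min_score and
// max_score stay at the top level.
func toEvalDef(a assertionConfig, index int) evalDef {
	top := map[string]interface{}{"eval_type": a.Type}
	nested := map[string]interface{}{}
	for k, v := range a.Params {
		if k == "min_score" || k == "max_score" {
			top[k] = v
			continue
		}
		nested[k] = v
	}
	top["eval_params"] = nested
	return evalDef{
		ID:     fmt.Sprintf("assertion_%d", index), // invented ID scheme
		Type:   "assertion",
		Params: top,
	}
}

func main() {
	// "content_includes" and "patterns" are illustrative names.
	def := toEvalDef(assertionConfig{
		Type:   "content_includes",
		Params: map[string]interface{}{"patterns": "hello", "min_score": 0.7},
	}, 0)
	fmt.Println(def.Type, def.Params["eval_type"], def.Params["min_score"])
}
```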

func ValidateAssertionTypes ¶ added in v1.4.3

func ValidateAssertionTypes(scenarios map[string]*config.Scenario, registry *evals.EvalTypeRegistry) []string

ValidateAssertionTypes checks all assertion types in loaded scenarios against the eval handler registry. Returns a list of human-readable error strings for unknown types, each with a "did you mean?" suggestion when a close match exists.
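One common way to produce a "did you mean?" suggestion is edit distance. The sketch below uses Levenshtein distance with a fixed cutoff; this page does not say which matching technique or message format the package uses, so the algorithm, the cutoff of 3, the error wording, and the plain string slices standing in for scenarios and the registry are all assumptions.

```go
package main

import "fmt"

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the byte-wise edit distance between a and b.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(b)]
}

// suggest returns the closest known name within maxDist edits, if any.
func suggest(unknown string, known []string, maxDist int) (string, bool) {
	best, bestDist := "", maxDist+1
	for _, k := range known {
		if d := levenshtein(unknown, k); d < bestDist {
			best, bestDist = k, d
		}
	}
	return best, bestDist <= maxDist
}

// validate reports one error string per unknown type, in input order.
func validate(used, known []string) []string {
	registered := make(map[string]bool, len(known))
	for _, k := range known {
		registered[k] = true
	}
	var errs []string
	for _, t := range used {
		if registered[t] {
			continue
		}
		msg := fmt.Sprintf("unknown assertion type %q", t)
		if s, ok := suggest(t, known, 3); ok {
			msg += fmt.Sprintf(" (did you mean %q?)", s)
		}
		errs = append(errs, msg)
	}
	return errs
}

func main() {
	known := []string{"tools_not_called_with_args"}
	for _, e := range validate([]string{"tools_not_called_with_arg", "zzz"}, known) {
		fmt.Println(e)
	}
}
```

The cutoff keeps unrelated names from being suggested: a type that shares nothing with any registered name gets a plain "unknown" message.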

Types ¶

type AssertionConfig ¶ added in v1.1.0

type AssertionConfig = config.AssertionConfig

AssertionConfig is an alias for config.AssertionConfig. The canonical type lives in pkg/config to keep the dependency direction correct (shared library must not import application tools). The alias preserves backward compatibility so existing arena code continues to compile unchanged.

type AssertionResult ¶ added in v1.1.0

type AssertionResult struct {
	Passed  bool        `json:"passed"`
	Details interface{} `json:"details,omitempty"`
	Message string      `json:"message,omitempty"` // Human-readable description from config
}

AssertionResult holds the result of an assertion evaluation.
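The struct tags determine what a serialized result looks like. The sketch below copies the declaration above into a standalone program to show the effect of omitempty; the `encode` helper is illustrative and not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AssertionResult is copied from the declaration above so the example
// compiles on its own.
type AssertionResult struct {
	Passed  bool        `json:"passed"`
	Details interface{} `json:"details,omitempty"`
	Message string      `json:"message,omitempty"`
}

// encode marshals a result to its JSON string form.
func encode(r AssertionResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(AssertionResult{Passed: true}))
	// prints {"passed":true}
	fmt.Println(encode(AssertionResult{Message: "must greet the user"}))
	// prints {"passed":false,"message":"must greet the user"}
}
```

Because Passed has no omitempty, a failing result always carries an explicit "passed":false, while nil Details and an empty Message are dropped.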

type AssertionWhen ¶ added in v1.3.2

type AssertionWhen = config.AssertionWhen

AssertionWhen is an alias for config.AssertionWhen.

type ConversationAssertion ¶ added in v1.1.3

type ConversationAssertion struct {
	Type          string                 `json:"type" yaml:"type"`
	Params        map[string]interface{} `json:"params" yaml:"params"`
	Message       string                 `json:"message" yaml:"message"`
	When          *AssertionWhen         `json:"when,omitempty" yaml:"when,omitempty"`
	PassThreshold *float64               `json:"pass_threshold,omitempty" yaml:"pass_threshold,omitempty"`
}

ConversationAssertion defines an assertion to evaluate across an entire conversation. Unlike turn-level assertions that check individual responses, conversation assertions evaluate patterns, behaviors, or constraints across all turns in a self-play scenario.
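PassThreshold is a pointer, which lets a decoder distinguish "not set" from an explicit zero. The sketch below decodes JSON into a local mirror of the struct; the When field is left out because the layout of config.AssertionWhen is not shown on this page, and the params keys are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// conversationAssertion mirrors the struct above, minus the When field.
type conversationAssertion struct {
	Type          string                 `json:"type"`
	Params        map[string]interface{} `json:"params"`
	Message       string                 `json:"message"`
	PassThreshold *float64               `json:"pass_threshold,omitempty"`
}

// parse decodes a single assertion from JSON.
func parse(raw string) (conversationAssertion, error) {
	var a conversationAssertion
	err := json.Unmarshal([]byte(raw), &a)
	return a, err
}

func main() {
	// The "tool" param key is hypothetical.
	withZero, err := parse(`{
		"type": "tools_not_called_with_args",
		"params": {"tool": "delete_user"},
		"message": "never delete users",
		"pass_threshold": 0
	}`)
	if err != nil {
		panic(err)
	}
	unset, err := parse(`{"type": "tools_not_called_with_args"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("explicit zero is set:", withZero.PassThreshold != nil)
	fmt.Println("omitted is nil:", unset.PassThreshold == nil)
}
```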

type ConversationContext ¶ added in v1.1.3

type ConversationContext struct {
	// AllTurns contains the complete conversation history in chronological order.
	// Includes all messages from all roles (system, user, assistant, tool).
	AllTurns []types.Message

	// ToolCalls contains all tool invocations with their results.
	// Ordered chronologically to allow sequential analysis.
	ToolCalls []ToolCallRecord

	// Metadata provides scenario/execution context for the conversation.
	Metadata ConversationMetadata
}

ConversationContext provides all data needed to evaluate conversation-level assertions. This aggregates the complete conversation history, tool usage, and metadata for comprehensive validation across multiple turns.

func BuildConversationContextFromMessages ¶ added in v1.3.1

func BuildConversationContextFromMessages(
	messages []types.Message,
	metadata *ConversationMetadata,
) *ConversationContext

BuildConversationContextFromMessages constructs a ConversationContext from a sequence of messages and caller-supplied metadata. It extracts tool-call records from assistant messages, matches them with tool-role result messages, and aggregates cost/token information.

Callers are responsible for populating Metadata.Extras with any engine-specific data (e.g. judge targets, prompt registries) before or after calling this function.
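The matching step, pairing tool calls on assistant messages with later tool-role results, can be sketched as follows. This is a simplified illustration, not the package's code: the `message`, `toolCall`, and `record` types and all their field names are hypothetical stand-ins for types.Message and types.ToolCallRecord, and cost/token aggregation is omitted.

```go
package main

import "fmt"

// Hypothetical, simplified message shapes.
type toolCall struct {
	ID   string
	Name string
}

type message struct {
	Role       string
	ToolCalls  []toolCall // set on assistant messages
	ToolCallID string     // set on tool-role messages
	Content    string
}

type record struct {
	TurnIndex int
	Name      string
	Result    string
}

// pairToolCalls collects tool calls in chronological order and attaches
// each tool-role result to the call with the matching ID.
func pairToolCalls(msgs []message) []record {
	var recs []record
	byID := map[string]int{} // call ID -> index into recs
	for i, m := range msgs {
		switch m.Role {
		case "assistant":
			for _, c := range m.ToolCalls {
				byID[c.ID] = len(recs)
				recs = append(recs, record{TurnIndex: i, Name: c.Name})
			}
		case "tool":
			if idx, ok := byID[m.ToolCallID]; ok {
				recs[idx].Result = m.Content
			}
		}
	}
	return recs
}

func main() {
	recs := pairToolCalls([]message{
		{Role: "user", Content: "what is the order status?"},
		{Role: "assistant", ToolCalls: []toolCall{{ID: "c1", Name: "lookup_order"}}},
		{Role: "tool", ToolCallID: "c1", Content: "shipped"},
	})
	for _, r := range recs {
		fmt.Printf("turn %d: %s -> %s\n", r.TurnIndex, r.Name, r.Result)
	}
}
```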

type ConversationMetadata ¶ added in v1.1.3

type ConversationMetadata struct {
	ScenarioID     string                 `json:"scenario_id"`      // The scenario being tested
	PersonaID      string                 `json:"persona_id"`       // Persona used for self-play (if any)
	Variables      map[string]interface{} `json:"variables"`        // Variables passed to prompts
	PromptConfigID string                 `json:"prompt_config_id"` // Which prompt configuration was used
	ProviderID     string                 `json:"provider_id"`      // Which LLM provider was used
	TotalCost      float64                `json:"total_cost"`       // Total cost in USD across all turns
	TotalTokens    int                    `json:"total_tokens"`     // Total tokens used (input + output)
	Extras         map[string]interface{} `json:"extras,omitempty"` // Additional metadata (e.g., judge targets/defaults)
}

ConversationMetadata provides context about the conversation execution. Useful for conditional validation based on scenario characteristics.

type ConversationValidationResult ¶ added in v1.1.3

type ConversationValidationResult struct {
	Type    string                 `json:"type,omitempty"`    // Validator type (e.g., tools_not_called_with_args)
	Passed  bool                   `json:"passed"`            // Whether the assertion passed
	Message string                 `json:"message"`           // Human-readable result explanation
	Details map[string]interface{} `json:"details,omitempty"` // Structured details for debugging

	// For aggregated assertions (e.g., checking all turns), evidence of individual violations.
	// Helps users understand exactly which turns or actions failed the assertion.
	Violations []ConversationViolation `json:"violations,omitempty"`
}

ConversationValidationResult contains the outcome of a conversation-level assertion. Provides structured details for debugging and reporting when assertions fail.

func ConvertEvalResults ¶ added in v1.3.2

func ConvertEvalResults(results []evals.EvalResult) []ConversationValidationResult

ConvertEvalResults transforms a slice of EvalResult into ConversationValidationResult entries. Each result is tagged with the PackEvalTypePrefix so renderers can group them separately. This function is used by both the PackEvalAdapter (engine) and the statestore when building AssertionsSummary from eval results.
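The tagging scheme can be illustrated with local stand-in types. In this sketch the `evalResult` fields and the "llm_judge" type name are hypothetical, and the prefix check in `isPackEval` is an assumption about how a pack eval might be recognized; only the prefix value itself comes from the Constants section.

```go
package main

import (
	"fmt"
	"strings"
)

const packEvalTypePrefix = "pack_eval:" // value from the Constants section

// Hypothetical, trimmed-down stand-ins for evals.EvalResult and
// ConversationValidationResult.
type evalResult struct {
	Type        string
	Passed      bool
	Explanation string
}

type validationResult struct {
	Type    string
	Passed  bool
	Message string
}

// convert tags each eval result's type with the pack eval prefix.
func convert(results []evalResult) []validationResult {
	out := make([]validationResult, 0, len(results))
	for _, r := range results {
		out = append(out, validationResult{
			Type:    packEvalTypePrefix + r.Type,
			Passed:  r.Passed,
			Message: r.Explanation,
		})
	}
	return out
}

// isPackEval assumes origin is encoded solely in the type prefix.
func isPackEval(v validationResult) bool {
	return strings.HasPrefix(v.Type, packEvalTypePrefix)
}

func main() {
	converted := convert([]evalResult{{Type: "llm_judge", Passed: true, Explanation: "on topic"}})
	scenario := validationResult{Type: "tools_not_called_with_args", Passed: true}
	fmt.Println(converted[0].Type, isPackEval(converted[0]), isPackEval(scenario))
}
```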

type ConversationValidator ¶ added in v1.1.3

type ConversationValidator interface {
	// Type returns the validator name (e.g., "tools_not_called_with_args").
	// Must match the type specified in ConversationAssertion configs.
	Type() string

	// ValidateConversation evaluates the assertion against the full conversation.
	// Returns a result indicating success/failure with detailed evidence.
	ValidateConversation(
		ctx context.Context,
		convCtx *ConversationContext,
		params map[string]interface{},
	) ConversationValidationResult
}

ConversationValidator evaluates assertions across entire conversations. Implementations check patterns, constraints, or behaviors that span multiple turns, such as "no forbidden tool arguments used" or "consistent behavior maintained".

type ConversationViolation ¶ added in v1.1.3

type ConversationViolation struct {
	TurnIndex   int                    `json:"turn_index"`          // Which turn (index in AllTurns) had the violation
	Description string                 `json:"description"`         // What was violated (human-readable)
	Evidence    map[string]interface{} `json:"evidence,omitempty"`  // Data supporting the violation (e.g., actual values)
	Timestamp   time.Time              `json:"timestamp,omitempty"` // When the violation occurred (if available)
}

ConversationViolation represents a single assertion violation within the conversation. Captures exactly where and how the assertion was violated for precise debugging.

type ToolCallRecord ¶ added in v1.1.3

type ToolCallRecord = types.ToolCallRecord

ToolCallRecord is an alias for types.ToolCallRecord so existing code referencing assertions.ToolCallRecord continues to compile unchanged.

type TurnToolCall ¶ added in v1.3.2

type TurnToolCall struct {
	CallID     string                 // from MessageToolCall.ID
	Name       string                 // tool name
	Args       map[string]interface{} // parsed arguments
	RawArgs    json.RawMessage        // original JSON arguments
	Result     string                 // from MessageToolResult.Content
	Error      string                 // from MessageToolResult.Error
	LatencyMs  int64                  // from MessageToolResult.LatencyMs
	RoundIndex int                    // which tool-use round within the turn (0-based)
	// contains filtered or unexported fields
}

TurnToolCall represents a single tool call within a turn, paired with its result. This provides the ordered, result-paired trace needed for turn-level tool assertions.

type WhenEvaluator ¶ added in v1.3.10

type WhenEvaluator struct {
	// contains filtered or unexported fields
}

WhenEvaluator wraps an AssertionWhen with compiled-regex caching for efficient repeated evaluation of tool-call pattern conditions.

func NewWhenEvaluator ¶ added in v1.3.10

func NewWhenEvaluator(w *config.AssertionWhen) *WhenEvaluator

NewWhenEvaluator creates a WhenEvaluator for the given condition.

func (*WhenEvaluator) ShouldRun ¶ added in v1.3.10

func (e *WhenEvaluator) ShouldRun(
	params map[string]interface{},
) (shouldRun bool, reason string)

ShouldRun evaluates when-conditions against the current turn's tool trace. Returns whether the assertion should run and a reason string if skipped. When no tool trace is available (e.g. duplex path), returns true to let the validator itself decide how to handle the missing data.
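The combination of cached regex compilation and the "no trace means run" fallback might look like the sketch below. It is an illustration under stated assumptions, not the package's implementation: the single-pattern condition, the "tool_names" params key, and the reason strings are all hypothetical, because the fields of config.AssertionWhen and the layout of the tool trace are not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// whenEvaluator is a hypothetical stand-in holding one tool-name pattern.
type whenEvaluator struct {
	pattern string
	once    sync.Once
	re      *regexp.Regexp // compiled on first use, then reused
	err     error
}

func newWhenEvaluator(pattern string) *whenEvaluator {
	return &whenEvaluator{pattern: pattern}
}

// shouldRun reports whether any traced tool name matches the pattern.
// When params carries no trace it returns true so the validator can decide.
func (e *whenEvaluator) shouldRun(params map[string]interface{}) (bool, string) {
	names, ok := params["tool_names"].([]string) // hypothetical key
	if !ok {
		return true, ""
	}
	e.once.Do(func() { e.re, e.err = regexp.Compile(e.pattern) })
	if e.err != nil {
		return false, "invalid pattern: " + e.err.Error()
	}
	for _, n := range names {
		if e.re.MatchString(n) {
			return true, ""
		}
	}
	return false, fmt.Sprintf("no tool call matched %q", e.pattern)
}

func main() {
	e := newWhenEvaluator(`^search_`)
	run, reason := e.shouldRun(map[string]interface{}{"tool_names": []string{"send_email"}})
	fmt.Println(run, reason)
	run, _ = e.shouldRun(map[string]interface{}{}) // no trace available
	fmt.Println(run)
}
```

Using sync.Once keeps the compile step safe if the same evaluator is shared across goroutines, and means a turn-by-turn loop pays the compilation cost only once.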
