evals

package
v0.2.0 (Latest)
Warning

This package is not in the latest version of its module.
Published: May 16, 2026 | License: Apache-2.0 | Imports: 13 | Imported by: 0

Documentation

Overview

Package evals provides a Go-native eval runner for voice pipeline scenarios.

Constants

const DefaultEvalTimeout = 60 * time.Second

DefaultEvalTimeout is used when scenario TimeoutSecs is zero.

Functions

func WriteReport

func WriteReport(results []EvalResult, dir string) error

WriteReport writes results to dir as results.json and prints a human-readable summary to stdout.

Types

type EvalConfig

type EvalConfig struct {
	Scenarios []EvalScenario `json:"scenarios"`
}

EvalConfig is the top-level structure of the eval config file (e.g. scenarios.json).

func LoadEvalConfig

func LoadEvalConfig(path string) (*EvalConfig, error)

LoadEvalConfig reads and parses a JSON eval config from path.
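As an illustration of the file format implied by the struct tags, the sketch below parses a sample config. The scenario values are invented, the local types mirror the documented ones, and parseEvalConfig is a hypothetical helper that works on bytes rather than a path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evalScenario and evalConfig mirror the documented EvalScenario and
// EvalConfig structs, including their JSON tags.
type evalScenario struct {
	Name             string  `json:"name"`
	Prompt           string  `json:"prompt"`
	ExpectedPattern  string  `json:"expected_pattern"`
	ExpectedContains string  `json:"expected_contains"`
	TimeoutSecs      float64 `json:"timeout_secs,omitempty"`
	SystemPrompt     string  `json:"system_prompt,omitempty"`
}

type evalConfig struct {
	Scenarios []evalScenario `json:"scenarios"`
}

// sampleConfig is an illustrative scenarios.json; all values are made up.
const sampleConfig = `{
  "scenarios": [
    {
      "name": "greeting",
      "prompt": "Say hello.",
      "expected_contains": "hello",
      "timeout_secs": 10
    },
    {
      "name": "digits",
      "prompt": "What is 2 + 2? Answer with a number.",
      "expected_pattern": "\\b4\\b",
      "system_prompt": "Answer tersely."
    }
  ]
}`

// parseEvalConfig is a hypothetical helper that decodes config bytes.
func parseEvalConfig(data []byte) (*evalConfig, error) {
	var cfg evalConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse eval config: %w", err)
	}
	return &cfg, nil
}

func main() {
	cfg, err := parseEvalConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range cfg.Scenarios {
		fmt.Printf("%s: timeout_secs=%v\n", s.Name, s.TimeoutSecs)
	}
}
```

The second scenario omits timeout_secs, so it decodes to zero, the case in which DefaultEvalTimeout applies.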

type EvalResult

type EvalResult struct {
	Name     string  `json:"name"`
	Pass     bool    `json:"pass"`
	Duration float64 `json:"duration_secs"`
	Output   string  `json:"output,omitempty"`
	Error    string  `json:"error,omitempty"`
}

EvalResult is the outcome of running one scenario.
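The JSON tags determine how a result is serialized. The sketch below shows the effect of omitempty on Output and Error, using a local type that mirrors EvalResult; the sample values and the encodeResult helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evalResult mirrors the documented EvalResult struct.
type evalResult struct {
	Name     string  `json:"name"`
	Pass     bool    `json:"pass"`
	Duration float64 `json:"duration_secs"`
	Output   string  `json:"output,omitempty"`
	Error    string  `json:"error,omitempty"`
}

// encodeResult is a hypothetical helper returning the compact JSON form.
func encodeResult(r evalResult) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	passed, _ := encodeResult(evalResult{Name: "greeting", Pass: true, Duration: 1.25, Output: "Hello!"})
	failed, _ := encodeResult(evalResult{Name: "digits", Duration: 60, Error: "timeout"})
	fmt.Println(passed) // {"name":"greeting","pass":true,"duration_secs":1.25,"output":"Hello!"}
	fmt.Println(failed) // {"name":"digits","pass":false,"duration_secs":60,"error":"timeout"}
}
```

Because pass and duration_secs have no omitempty option, they are always present, even when false or zero.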

func RunScenario

func RunScenario(ctx context.Context, voxrayConfigPath string, scenario EvalScenario) EvalResult

RunScenario runs a single eval scenario. It builds an LLM-only pipeline, injects the prompt as a TranscriptionFrame, collects the LLM text output, and checks that output against ExpectedPattern (regex) or ExpectedContains (substring). If ExpectedContains is set, it is used instead of the regex.
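The assertion step can be sketched in isolation. This is an assumption-laden illustration, not the package's code: checkOutput and the assertion type are hypothetical, and both case-sensitive matching and the error returned when neither field is set are guesses.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// assertion holds the two matching fields of the documented EvalScenario.
type assertion struct {
	ExpectedPattern  string
	ExpectedContains string
}

// checkOutput is a hypothetical helper. ExpectedContains wins when set,
// as the struct comments describe; otherwise ExpectedPattern is compiled
// as a regular expression. Returning an error when neither field is set
// is an assumption.
func checkOutput(a assertion, output string) (bool, error) {
	if a.ExpectedContains != "" {
		return strings.Contains(output, a.ExpectedContains), nil
	}
	if a.ExpectedPattern == "" {
		return false, errors.New("scenario has no expected_pattern or expected_contains")
	}
	re, err := regexp.Compile(a.ExpectedPattern)
	if err != nil {
		return false, fmt.Errorf("compile expected_pattern: %w", err)
	}
	return re.MatchString(output), nil
}

func main() {
	ok, _ := checkOutput(assertion{ExpectedPattern: `\b4\b`}, "The answer is 4.")
	fmt.Println(ok) // true

	ok, _ = checkOutput(assertion{ExpectedContains: "hello"}, "Hello there")
	fmt.Println(ok) // false: substring matching here is case-sensitive

	_, err := checkOutput(assertion{ExpectedPattern: "("}, "anything")
	fmt.Println(err != nil) // true: invalid regex is reported as an error
}
```

In a runner, a false result or a non-nil error would presumably map to Pass being false, with the error text recorded in the Error field.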

type EvalScenario

type EvalScenario struct {
	Name             string  `json:"name"`
	Prompt           string  `json:"prompt"`
	ExpectedPattern  string  `json:"expected_pattern"`  // regex or substring; match in response => pass
	ExpectedContains string  `json:"expected_contains"` // alternative: simple substring (if set, used instead of regex)
	TimeoutSecs      float64 `json:"timeout_secs,omitempty"`
	SystemPrompt     string  `json:"system_prompt,omitempty"` // optional override for LLM system message
}

EvalScenario defines a single eval: prompt to send and how to assert the response.
