Documentation ¶
Index ¶
- func BuildDigest(testOutcomes []models.TestOutcome, durationMs int64, runsPerTest int) models.OutcomeDigest
- func ComputeTestStats(runs []models.RunResult) *models.TestStats
- func FilterTestCases(testCases []*models.TestCase, taskPatterns []string, tagPatterns []string) ([]*models.TestCase, error)
- func RegradeOutcome(original *models.EvaluationOutcome, gradedOutcomes []models.TestOutcome, ...) *models.EvaluationOutcome
- type EvalRunner
- type EventType
- type ProgressEvent
- type ProgressListener
- type RunnerOption
- type TestRunner (deprecated)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BuildDigest ¶ added in v0.22.0
func BuildDigest(testOutcomes []models.TestOutcome, durationMs int64, runsPerTest int) models.OutcomeDigest
BuildDigest computes an OutcomeDigest from test outcomes. durationMs is the total wall-clock duration to store in the digest. runsPerTest controls whether digest-level bootstrap CI is computed (requires > 1).
func ComputeTestStats ¶ added in v0.22.0
func ComputeTestStats(runs []models.RunResult) *models.TestStats
ComputeTestStats computes aggregate statistics for a set of run results.
func FilterTestCases ¶
func FilterTestCases(testCases []*models.TestCase, taskPatterns []string, tagPatterns []string) ([]*models.TestCase, error)
FilterTestCases returns the subset of testCases that match the given patterns.
- taskPatterns: glob patterns matched against either the task display name or the task ID.
- tagPatterns: patterns matched against tags.
If both taskPatterns and tagPatterns are specified, the result is the intersection of the two sets of matches. If both are empty, all test cases are returned.
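The intersection rule described above can be illustrated with a small standalone sketch. It does not call the package: testCase is a hypothetical stand-in for models.TestCase with assumed field names, and the use of path.Match for both task and tag patterns is an assumption, since the documentation does not name the matcher or state that tag patterns are globs.

```go
package main

import (
	"fmt"
	"path"
)

// testCase is a hypothetical stand-in for models.TestCase; the field
// names are assumptions made for this sketch.
type testCase struct {
	ID          string
	DisplayName string
	Tags        []string
}

// matchAny reports whether any pattern matches any candidate string.
func matchAny(patterns, candidates []string) (bool, error) {
	for _, p := range patterns {
		for _, c := range candidates {
			ok, err := path.Match(p, c)
			if err != nil {
				return false, fmt.Errorf("bad pattern %q: %w", p, err)
			}
			if ok {
				return true, nil
			}
		}
	}
	return false, nil
}

// filterSketch keeps a case only if it passes every non-empty pattern set,
// so two non-empty sets yield the intersection and two empty sets keep all.
func filterSketch(cases []*testCase, taskPatterns, tagPatterns []string) ([]*testCase, error) {
	var out []*testCase
	for _, tc := range cases {
		if len(taskPatterns) > 0 {
			ok, err := matchAny(taskPatterns, []string{tc.DisplayName, tc.ID})
			if err != nil {
				return nil, err
			}
			if !ok {
				continue
			}
		}
		if len(tagPatterns) > 0 {
			ok, err := matchAny(tagPatterns, tc.Tags)
			if err != nil {
				return nil, err
			}
			if !ok {
				continue
			}
		}
		out = append(out, tc)
	}
	return out, nil
}

func main() {
	cases := []*testCase{
		{ID: "auth-001", DisplayName: "Login flow", Tags: []string{"smoke", "auth"}},
		{ID: "auth-002", DisplayName: "Token refresh", Tags: []string{"auth"}},
		{ID: "ui-001", DisplayName: "Render home", Tags: []string{"smoke"}},
	}
	got, err := filterSketch(cases, []string{"auth-*"}, []string{"smoke"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, tc := range got {
		fmt.Println(tc.ID) // prints: auth-001
	}
}
```

Only auth-001 survives because it is the one case that matches both the task pattern and the tag pattern.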
func RegradeOutcome ¶ added in v0.22.0
func RegradeOutcome(original *models.EvaluationOutcome, gradedOutcomes []models.TestOutcome, judgeModel string) *models.EvaluationOutcome
RegradeOutcome produces a new EvaluationOutcome by replacing test outcomes in the original with the graded ones and recomputing stats and digest.
Types ¶
type EvalRunner ¶ added in v0.31.0
type EvalRunner struct {
// contains filtered or unexported fields
}
EvalRunner orchestrates the execution of tests.
Deprecated alias: TestRunner is provided for backward compatibility.
func NewEvalRunner ¶ added in v0.31.0
func NewEvalRunner(cfg *config.EvalConfig, engine execution.AgentEngine, opts ...RunnerOption) *EvalRunner
NewEvalRunner creates a new EvalRunner. The caller owns the engine and is responsible for initializing and shutting it down as needed.
func NewTestRunner (deprecated)
func NewTestRunner(cfg *config.EvalConfig, engine execution.AgentEngine, opts ...RunnerOption) *EvalRunner
Deprecated: Use NewEvalRunner instead.
func (*EvalRunner) OnProgress ¶ added in v0.31.0
func (r *EvalRunner) OnProgress(listener ProgressListener)
OnProgress registers a progress listener.
func (*EvalRunner) RunBenchmark ¶ added in v0.31.0
func (r *EvalRunner) RunBenchmark(ctx context.Context) (*models.EvaluationOutcome, error)
RunBenchmark executes the entire benchmark. If Baseline is enabled, it runs twice: once with skills enabled and once with skills disabled.
type EventType ¶
type EventType string
EventType represents the type of a progress event.
const (
	EventBenchmarkStart    EventType = "benchmark_start"
	EventBenchmarkComplete EventType = "benchmark_complete"
	EventBenchmarkStopped  EventType = "benchmark_stopped"
	EventTestStart         EventType = "test_start"
	EventTestComplete      EventType = "test_complete"
	EventTestCached        EventType = "test_cached"
	EventRunStart          EventType = "run_start"
	EventRunComplete       EventType = "run_complete"
	EventAgentPrompt       EventType = "agent_prompt"
	EventAgentResponse     EventType = "agent_response"
	EventGraderResult      EventType = "grader_result"
)
EventType constants
type ProgressEvent ¶
type ProgressEvent struct {
EventType EventType
TestName string
TestNum int
TotalTests int
RunNum int
TotalRuns int
Status models.Status
DurationMs int64
Details map[string]any
}
ProgressEvent represents a progress update.
type ProgressListener ¶
type ProgressListener func(event ProgressEvent)
ProgressListener receives progress updates.
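A listener is an ordinary function value, so a simple console reporter might look like the sketch below. To keep it compilable on its own, the types are redeclared locally as trimmed copies of the declarations in this package (the Status, RunNum, TotalRuns, and Details fields are omitted). formatEvent and its output format are hypothetical. In real use the listener would be passed to (*EvalRunner).OnProgress rather than invoked by hand.

```go
package main

import "fmt"

// Trimmed local copies of the package declarations, included only so the
// sketch compiles without importing the package.
type EventType string

const (
	EventTestStart    EventType = "test_start"
	EventTestComplete EventType = "test_complete"
)

type ProgressEvent struct {
	EventType  EventType
	TestName   string
	TestNum    int
	TotalTests int
	DurationMs int64
}

type ProgressListener func(event ProgressEvent)

// formatEvent is a hypothetical helper that renders one line per event.
// Event types it does not handle produce an empty string.
func formatEvent(e ProgressEvent) string {
	switch e.EventType {
	case EventTestStart:
		return fmt.Sprintf("[%d/%d] %s started", e.TestNum, e.TotalTests, e.TestName)
	case EventTestComplete:
		return fmt.Sprintf("[%d/%d] %s finished in %d ms",
			e.TestNum, e.TotalTests, e.TestName, e.DurationMs)
	default:
		return ""
	}
}

func main() {
	// The listener prints only the events that formatEvent knows about.
	var listener ProgressListener = func(e ProgressEvent) {
		if line := formatEvent(e); line != "" {
			fmt.Println(line)
		}
	}

	// Simulated events; a real runner would emit these itself.
	listener(ProgressEvent{EventType: EventTestStart, TestName: "login", TestNum: 1, TotalTests: 3})
	listener(ProgressEvent{EventType: EventTestComplete, TestName: "login", TestNum: 1, TotalTests: 3, DurationMs: 420})
}
```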
type RunnerOption ¶
type RunnerOption func(*EvalRunner)
RunnerOption configures an EvalRunner.
func WithSkipGraders ¶ added in v0.22.0
func WithSkipGraders() RunnerOption
WithSkipGraders disables grading so only execution occurs.
func WithTagFilters ¶
func WithTagFilters(patterns ...string) RunnerOption
func WithTaskFilters ¶
func WithTaskFilters(patterns ...string) RunnerOption
WithTaskFilters sets glob patterns used to filter test cases by DisplayName or TestID.
func WithUpdateSnapshots ¶
func WithUpdateSnapshots(enabled bool) RunnerOption
WithUpdateSnapshots enables snapshot file updates in diff graders.
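RunnerOption has the shape of the functional options pattern: each With... function returns a closure that mutates the runner, and the constructor applies the closures in order. The sketch below shows that pattern in isolation. The runner struct, its fields, and the lowercase function names are hypothetical, because EvalRunner's fields are unexported and not documented here.

```go
package main

import "fmt"

// runner is a hypothetical stand-in for EvalRunner; its fields are invented
// for this sketch.
type runner struct {
	skipGraders     bool
	taskFilters     []string
	updateSnapshots bool
}

// option mirrors the shape of RunnerOption: a function that mutates a runner.
type option func(*runner)

func withSkipGraders() option {
	return func(r *runner) { r.skipGraders = true }
}

func withTaskFilters(patterns ...string) option {
	return func(r *runner) { r.taskFilters = patterns }
}

func withUpdateSnapshots(enabled bool) option {
	return func(r *runner) { r.updateSnapshots = enabled }
}

// newRunner starts from zero-value defaults and applies options in order,
// so a later option overrides an earlier one that touches the same field.
func newRunner(opts ...option) *runner {
	r := &runner{}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

func main() {
	r := newRunner(withSkipGraders(), withTaskFilters("auth-*", "ui-*"))
	fmt.Println(r.skipGraders, r.taskFilters, r.updateSnapshots)
}
```

The pattern keeps the constructor signature stable as options are added, which is consistent with WithSkipGraders having been introduced in a later version than the other options.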
type TestRunner (deprecated)
type TestRunner = EvalRunner
Deprecated: Use EvalRunner instead.