Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
func FilterTestCases
func FilterTestCases(testCases []*models.TestCase, taskPatterns []string, tagPatterns []string) ([]*models.TestCase, error)
FilterTestCases returns the subset of testCases that match the given glob patterns:
- taskPatterns: matches either the task display name or the task ID.
- tagPatterns: matches tags.
If both taskPatterns and tagPatterns are specified, the result is the intersection of their matches. If both are empty, all test cases are returned.
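The filtering rules above can be sketched as follows. This is a minimal illustration, not the package's implementation: `TestCase` is a hypothetical stand-in for `models.TestCase` with assumed `DisplayName`, `TestID`, and `Tags` fields, and glob matching is assumed to follow the standard library's `path.Match`, which may differ from the matcher the package actually uses.

```go
package main

import (
	"fmt"
	"path"
)

// TestCase is a hypothetical stand-in for models.TestCase; only the fields
// needed for filtering are modeled, and Tags is an assumed field name.
type TestCase struct {
	DisplayName string
	TestID      string
	Tags        []string
}

// anyMatch reports whether any glob pattern matches any candidate string.
// Malformed patterns are returned as errors.
func anyMatch(patterns []string, candidates ...string) (bool, error) {
	for _, p := range patterns {
		for _, c := range candidates {
			ok, err := path.Match(p, c)
			if err != nil {
				return false, fmt.Errorf("invalid pattern %q: %w", p, err)
			}
			if ok {
				return true, nil
			}
		}
	}
	return false, nil
}

// filterTestCases sketches the documented semantics: every non-empty pattern
// list must match, so specifying both lists yields the intersection, and
// specifying neither returns every test case.
func filterTestCases(tcs []*TestCase, taskPatterns, tagPatterns []string) ([]*TestCase, error) {
	var out []*TestCase
	for _, tc := range tcs {
		if len(taskPatterns) > 0 {
			// Task patterns may match either the display name or the ID.
			ok, err := anyMatch(taskPatterns, tc.DisplayName, tc.TestID)
			if err != nil {
				return nil, err
			}
			if !ok {
				continue
			}
		}
		if len(tagPatterns) > 0 {
			ok, err := anyMatch(tagPatterns, tc.Tags...)
			if err != nil {
				return nil, err
			}
			if !ok {
				continue
			}
		}
		out = append(out, tc)
	}
	return out, nil
}

func main() {
	tcs := []*TestCase{
		{DisplayName: "Smoke deploy", TestID: "deploy-001", Tags: []string{"smoke"}},
		{DisplayName: "Full deploy", TestID: "deploy-002", Tags: []string{"slow"}},
		{DisplayName: "Lint", TestID: "lint-001", Tags: []string{"smoke"}},
	}
	got, err := filterTestCases(tcs, []string{"deploy-*"}, []string{"smoke"})
	if err != nil {
		panic(err)
	}
	for _, tc := range got {
		fmt.Println(tc.TestID)
	}
}
```

Treating an empty pattern list as "no constraint" lets both documented cases fall out of one loop: with both lists set, a test case must pass both checks (the intersection), and with neither set, every test case passes.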
Types
type EventType
type EventType string
EventType represents the type of a progress event.
const (
	EventBenchmarkStart    EventType = "benchmark_start"
	EventBenchmarkComplete EventType = "benchmark_complete"
	EventBenchmarkStopped  EventType = "benchmark_stopped"
	EventTestStart         EventType = "test_start"
	EventTestComplete      EventType = "test_complete"
	EventTestCached        EventType = "test_cached"
	EventRunStart          EventType = "run_start"
	EventRunComplete       EventType = "run_complete"
	EventAgentPrompt       EventType = "agent_prompt"
	EventAgentResponse     EventType = "agent_response"
	EventGraderResult      EventType = "grader_result"
)
EventType constants
type ProgressEvent
type ProgressEvent struct {
	EventType  EventType
	TestName   string
	TestNum    int
	TotalTests int
	RunNum     int
	TotalRuns  int
	Status     models.Status
	DurationMs int64
	Details    map[string]any
}
ProgressEvent represents a progress update.
type ProgressListener
type ProgressListener func(event ProgressEvent)
ProgressListener receives progress updates.
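A listener is an ordinary function value. The sketch below shows one that turns test-level events into one-line summaries. It uses local mirrors of `EventType`, `ProgressEvent`, and `ProgressListener` trimmed to the fields it needs; `Status` is simplified to a string because `models.Status` is not documented here, and values such as `"passed"` are illustrative assumptions.

```go
package main

import "fmt"

// Local, trimmed mirrors of the documented types (assumption: Status is
// modeled as a plain string instead of models.Status).
type EventType string

const (
	EventTestStart    EventType = "test_start"
	EventTestComplete EventType = "test_complete"
	EventTestCached   EventType = "test_cached"
)

type ProgressEvent struct {
	EventType  EventType
	TestName   string
	TestNum    int
	TotalTests int
	Status     string
	DurationMs int64
}

type ProgressListener func(event ProgressEvent)

// formatEvent renders test-level events as one-line summaries and returns ""
// for event types this sketch does not handle.
func formatEvent(e ProgressEvent) string {
	switch e.EventType {
	case EventTestStart:
		return fmt.Sprintf("[%d/%d] %s: started", e.TestNum, e.TotalTests, e.TestName)
	case EventTestComplete:
		return fmt.Sprintf("[%d/%d] %s: %s (%dms)", e.TestNum, e.TotalTests, e.TestName, e.Status, e.DurationMs)
	case EventTestCached:
		return fmt.Sprintf("[%d/%d] %s: cached", e.TestNum, e.TotalTests, e.TestName)
	default:
		return ""
	}
}

func main() {
	var lines []string
	// The listener collects formatted lines, skipping unhandled event types.
	var listener ProgressListener = func(e ProgressEvent) {
		if s := formatEvent(e); s != "" {
			lines = append(lines, s)
		}
	}
	listener(ProgressEvent{EventType: EventTestStart, TestName: "deploy", TestNum: 1, TotalTests: 3})
	listener(ProgressEvent{EventType: EventTestComplete, TestName: "deploy", TestNum: 1, TotalTests: 3, Status: "passed", DurationMs: 1200})
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

With the real package, the same function would be registered through `(*TestRunner).OnProgress`.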
type RunnerOption
type RunnerOption func(*TestRunner)
RunnerOption configures a TestRunner.
func WithTagFilters
func WithTagFilters(patterns ...string) RunnerOption
WithTagFilters sets glob patterns used to filter test cases by tag.
func WithTaskFilters
func WithTaskFilters(patterns ...string) RunnerOption
WithTaskFilters sets glob patterns used to filter test cases by DisplayName or TestID.
func WithUpdateSnapshots
func WithUpdateSnapshots(enabled bool) RunnerOption
WithUpdateSnapshots enables snapshot file updates in diff graders.
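`RunnerOption` follows Go's functional-options pattern. Below is a minimal sketch, assuming each option simply stores its argument on an unexported field; the `runner` type and its field names are hypothetical because `TestRunner`'s fields are not exported, and the `cfg` and `engine` arguments of `NewTestRunner` are omitted.

```go
package main

import "fmt"

// runner is a hypothetical stand-in for TestRunner; field names are illustrative.
type runner struct {
	taskPatterns    []string
	tagPatterns     []string
	updateSnapshots bool
}

// RunnerOption mirrors the documented shape: a function that configures the runner.
type RunnerOption func(*runner)

// WithTaskFilters sets (replaces) the task glob patterns. The slice is copied
// so later changes to the caller's slice do not affect the runner.
func WithTaskFilters(patterns ...string) RunnerOption {
	return func(r *runner) { r.taskPatterns = append([]string(nil), patterns...) }
}

// WithTagFilters sets (replaces) the tag glob patterns, copying the slice.
func WithTagFilters(patterns ...string) RunnerOption {
	return func(r *runner) { r.tagPatterns = append([]string(nil), patterns...) }
}

// WithUpdateSnapshots toggles snapshot updates.
func WithUpdateSnapshots(enabled bool) RunnerOption {
	return func(r *runner) { r.updateSnapshots = enabled }
}

// newRunner applies options in order, so a later option overrides an earlier one.
func newRunner(opts ...RunnerOption) *runner {
	r := &runner{}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

func main() {
	r := newRunner(WithTaskFilters("deploy-*"), WithUpdateSnapshots(true))
	fmt.Println(r.taskPatterns, r.tagPatterns, r.updateSnapshots)
}
```

Because options are applied in order, passing the same option twice leaves the last value in place, and callers that pass no options get the zero-value defaults.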
type TestRunner
type TestRunner struct {
	// contains filtered or unexported fields
}
TestRunner orchestrates the execution of tests.
func NewTestRunner
func NewTestRunner(cfg *config.BenchmarkConfig, engine execution.AgentEngine, opts ...RunnerOption) *TestRunner
NewTestRunner creates a new test runner. The caller owns the engine and is responsible for initializing and shutting it down as needed.
func (*TestRunner) OnProgress
func (r *TestRunner) OnProgress(listener ProgressListener)
OnProgress registers a progress listener.
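How listeners are stored and invoked is not documented. One plausible sketch: a hypothetical `progressHub` keeps listeners in registration order and calls each one synchronously for every event. Whether the real runner behaves this way, or is safe for concurrent registration, is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// Trimmed local mirrors of the documented types.
type EventType string

type ProgressEvent struct {
	EventType EventType
	TestName  string
}

type ProgressListener func(event ProgressEvent)

// progressHub is a hypothetical sketch of listener storage inside a runner.
type progressHub struct {
	mu        sync.RWMutex
	listeners []ProgressListener
}

// OnProgress registers a listener; registration order is preserved.
func (h *progressHub) OnProgress(l ProgressListener) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.listeners = append(h.listeners, l)
}

// emit calls every registered listener synchronously, in registration order.
// It snapshots the slice first so listeners run without holding the lock.
func (h *progressHub) emit(e ProgressEvent) {
	h.mu.RLock()
	ls := append([]ProgressListener(nil), h.listeners...)
	h.mu.RUnlock()
	for _, l := range ls {
		l(e)
	}
}

func main() {
	var h progressHub
	var got []string
	h.OnProgress(func(e ProgressEvent) { got = append(got, "a:"+string(e.EventType)) })
	h.OnProgress(func(e ProgressEvent) { got = append(got, "b:"+string(e.EventType)) })
	h.emit(ProgressEvent{EventType: "run_start"})
	fmt.Println(got)
}
```

Snapshotting the slice before calling listeners means a listener can register another listener without deadlocking on the mutex.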
func (*TestRunner) RunBenchmark
func (r *TestRunner) RunBenchmark(ctx context.Context) (*models.EvaluationOutcome, error)
RunBenchmark executes the entire benchmark. If Baseline is enabled, it runs the benchmark twice: once with skills enabled and once with skills disabled.