package orchestration

Version: v0.31.0 (latest) | Published: Apr 28, 2026 | License: MIT | Imports: 23 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/microsoft/waza


Documentation

Index

func BuildDigest(testOutcomes []models.TestOutcome, durationMs int64, runsPerTest int) models.OutcomeDigest
func ComputeTestStats(runs []models.RunResult) *models.TestStats
func FilterTestCases(testCases []*models.TestCase, taskPatterns []string, tagPatterns []string) ([]*models.TestCase, error)
func RegradeOutcome(original *models.EvaluationOutcome, gradedOutcomes []models.TestOutcome, ...) *models.EvaluationOutcome
type EvalRunner
- func NewEvalRunner(cfg *config.EvalConfig, engine execution.AgentEngine, opts ...RunnerOption) *EvalRunner
- func NewTestRunner(cfg *config.EvalConfig, engine execution.AgentEngine, opts ...RunnerOption) *EvalRunner (deprecated)
- func (r *EvalRunner) OnProgress(listener ProgressListener)
- func (r *EvalRunner) RunBenchmark(ctx context.Context) (*models.EvaluationOutcome, error)
type EventType
type ProgressEvent
type ProgressListener
type RunnerOption
type TestRunner (deprecated)

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildDigest (added in v0.22.0)

func BuildDigest(testOutcomes []models.TestOutcome, durationMs int64, runsPerTest int) models.OutcomeDigest

BuildDigest computes an OutcomeDigest from test outcomes. durationMs is the total wall-clock duration, in milliseconds, to store in the digest. runsPerTest controls whether a digest-level bootstrap confidence interval (CI) is computed; the CI is computed only when runsPerTest > 1.
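The runsPerTest gate can be illustrated with a minimal sketch. The Digest type and buildDigest function below are hypothetical stand-ins for models.OutcomeDigest and BuildDigest, and the percentile bootstrap over individual run results (1,000 resamples, 95% interval, fixed seed) is an assumed technique chosen for illustration; the package's actual resampling scheme is not documented here.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Digest is a hypothetical stand-in for models.OutcomeDigest.
type Digest struct {
	PassRate   float64
	DurationMs int64
	CILow      *float64 // nil when no CI was computed
	CIHigh     *float64
}

// passRate returns the fraction of true values in passes (len must be > 0).
func passRate(passes []bool) float64 {
	n := 0
	for _, p := range passes {
		if p {
			n++
		}
	}
	return float64(n) / float64(len(passes))
}

// buildDigest is a hypothetical sketch, not the package's BuildDigest.
// passes holds one pass/fail flag per run. A 95% percentile-bootstrap CI of
// the pass rate is computed only when runsPerTest > 1.
func buildDigest(passes []bool, durationMs int64, runsPerTest int) Digest {
	d := Digest{DurationMs: durationMs}
	if len(passes) == 0 {
		return d
	}
	d.PassRate = passRate(passes)
	if runsPerTest <= 1 {
		return d // a single run per test: no CI
	}
	const resamples = 1000
	rng := rand.New(rand.NewSource(1)) // fixed seed keeps the sketch reproducible
	means := make([]float64, resamples)
	sample := make([]bool, len(passes))
	for i := range means {
		// Resample runs with replacement and record the resampled pass rate.
		for j := range sample {
			sample[j] = passes[rng.Intn(len(passes))]
		}
		means[i] = passRate(sample)
	}
	sort.Float64s(means)
	low := means[resamples/40]              // 2.5th percentile
	high := means[resamples-resamples/40-1] // 97.5th percentile
	d.CILow, d.CIHigh = &low, &high
	return d
}

func main() {
	passes := []bool{true, true, false, true, false, true}
	d := buildDigest(passes, 4200, 3)
	fmt.Printf("pass rate %.3f, 95%% CI [%.3f, %.3f]\n", d.PassRate, *d.CILow, *d.CIHigh)
}
```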

func ComputeTestStats (added in v0.22.0)

func ComputeTestStats(runs []models.RunResult) *models.TestStats

ComputeTestStats computes aggregate statistics for a set of run results.

func FilterTestCases

func FilterTestCases(testCases []*models.TestCase, taskPatterns []string, tagPatterns []string) ([]*models.TestCase, error)

FilterTestCases returns the subset of testCases that match the given glob patterns:

  - taskPatterns matches either the task display name or the task ID.
  - tagPatterns matches tags.

If both taskPatterns and tagPatterns are specified, the result is the intersection of their matches. If both are empty, all test cases are returned.
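The matching rules can be sketched with the standard library's path.Match glob syntax. TestCase and filterTestCases are hypothetical stand-ins, and the behavior when only one pattern list is non-empty (apply just that list) is an assumption; the documentation only states the both-specified and both-empty cases.

```go
package main

import (
	"fmt"
	"path"
)

// TestCase is a hypothetical stand-in for models.TestCase.
type TestCase struct {
	TestID      string
	DisplayName string
	Tags        []string
}

// anyMatch reports whether any name matches any pattern (path.Match syntax).
// A malformed pattern is reported as an error.
func anyMatch(patterns []string, names ...string) (bool, error) {
	for _, p := range patterns {
		for _, n := range names {
			ok, err := path.Match(p, n)
			if err != nil {
				return false, fmt.Errorf("invalid pattern %q: %w", p, err)
			}
			if ok {
				return true, nil
			}
		}
	}
	return false, nil
}

// filterTestCases treats an empty pattern list as "no constraint", so both
// lists empty returns everything and both non-empty yields the intersection.
func filterTestCases(cases []*TestCase, taskPatterns, tagPatterns []string) ([]*TestCase, error) {
	if len(taskPatterns) == 0 && len(tagPatterns) == 0 {
		return cases, nil
	}
	var out []*TestCase
	for _, tc := range cases {
		if len(taskPatterns) > 0 {
			ok, err := anyMatch(taskPatterns, tc.DisplayName, tc.TestID)
			if err != nil {
				return nil, err
			}
			if !ok {
				continue
			}
		}
		if len(tagPatterns) > 0 {
			ok, err := anyMatch(tagPatterns, tc.Tags...)
			if err != nil {
				return nil, err
			}
			if !ok {
				continue
			}
		}
		out = append(out, tc)
	}
	return out, nil
}

func main() {
	cases := []*TestCase{
		{TestID: "t1", DisplayName: "login-flow", Tags: []string{"smoke", "auth"}},
		{TestID: "t2", DisplayName: "search", Tags: []string{"smoke"}},
	}
	got, err := filterTestCases(cases, []string{"login-*"}, []string{"smoke"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, tc := range got {
		fmt.Println(tc.TestID)
	}
}
```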

func RegradeOutcome (added in v0.22.0)

func RegradeOutcome(original *models.EvaluationOutcome, gradedOutcomes []models.TestOutcome, judgeModel string) *models.EvaluationOutcome

RegradeOutcome produces a new EvaluationOutcome by replacing test outcomes in the original with the graded ones and recomputing stats and digest.
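A minimal sketch of the regrading idea, under stated assumptions: outcomes are matched by a hypothetical Name field, the recomputed statistics are reduced to a single pass count, and judgeModel is simply recorded on the new outcome. None of these types are the package's real models types; the point is that a fresh value is built and the original is left untouched.

```go
package main

import "fmt"

// TestOutcome is a hypothetical stand-in for models.TestOutcome.
type TestOutcome struct {
	Name   string
	Passed bool
}

// EvaluationOutcome is a hypothetical stand-in for models.EvaluationOutcome.
type EvaluationOutcome struct {
	JudgeModel string
	Outcomes   []TestOutcome
	Passed     int // recomputed summary statistic
}

// regradeOutcome returns a new outcome in which graded results replace
// originals with the same Name; the summary is recomputed from scratch.
func regradeOutcome(original *EvaluationOutcome, graded []TestOutcome, judgeModel string) *EvaluationOutcome {
	byName := make(map[string]TestOutcome, len(graded))
	for _, g := range graded {
		byName[g.Name] = g
	}
	out := &EvaluationOutcome{JudgeModel: judgeModel}
	out.Outcomes = make([]TestOutcome, len(original.Outcomes))
	for i, o := range original.Outcomes {
		if g, ok := byName[o.Name]; ok {
			o = g // use the regraded result
		}
		out.Outcomes[i] = o
		if o.Passed {
			out.Passed++
		}
	}
	return out
}

func main() {
	orig := &EvaluationOutcome{
		Outcomes: []TestOutcome{{Name: "login", Passed: false}, {Name: "search", Passed: true}},
		Passed:   1,
	}
	regraded := regradeOutcome(orig, []TestOutcome{{Name: "login", Passed: true}}, "judge-model")
	fmt.Println("original passed:", orig.Passed, "regraded passed:", regraded.Passed)
}
```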

Types

type EvalRunner (added in v0.31.0)

type EvalRunner struct {
	// contains filtered or unexported fields
}

EvalRunner orchestrates the execution of tests.

Deprecated alias: TestRunner is provided for backward compatibility.

func NewEvalRunner (added in v0.31.0)

func NewEvalRunner(cfg *config.EvalConfig, engine execution.AgentEngine, opts ...RunnerOption) *EvalRunner

NewEvalRunner creates a new EvalRunner. The caller owns the engine and is responsible for initializing and shutting it down as needed.

func NewTestRunner (deprecated)

func NewTestRunner(cfg *config.EvalConfig, engine execution.AgentEngine, opts ...RunnerOption) *EvalRunner

Deprecated: Use NewEvalRunner instead.

func (*EvalRunner) OnProgress (added in v0.31.0)

func (r *EvalRunner) OnProgress(listener ProgressListener)

OnProgress registers a listener that receives progress events.
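One common way to back a registration method like OnProgress is a mutex-protected slice of listeners that an internal emit step fans out to. The runner type and emit method below are hypothetical, and support for more than one listener is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// Local mirrors of the documented types, so the sketch compiles on its own.
type EventType string

const EventTestComplete EventType = "test_complete"

type ProgressEvent struct {
	EventType EventType
	TestName  string
}

type ProgressListener func(event ProgressEvent)

// runner is a hypothetical stand-in showing one way OnProgress could work.
type runner struct {
	mu        sync.Mutex
	listeners []ProgressListener
}

// OnProgress appends a listener under the lock.
func (r *runner) OnProgress(l ProgressListener) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.listeners = append(r.listeners, l)
}

// emit delivers e to every listener in registration order. The slice is
// copied under the lock so listeners run without holding it.
func (r *runner) emit(e ProgressEvent) {
	r.mu.Lock()
	ls := append([]ProgressListener(nil), r.listeners...)
	r.mu.Unlock()
	for _, l := range ls {
		l(e)
	}
}

func main() {
	r := &runner{}
	r.OnProgress(func(e ProgressEvent) { fmt.Println("listener 1:", e.EventType, e.TestName) })
	r.OnProgress(func(e ProgressEvent) { fmt.Println("listener 2:", e.EventType, e.TestName) })
	r.emit(ProgressEvent{EventType: EventTestComplete, TestName: "login-flow"})
}
```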

func (*EvalRunner) RunBenchmark (added in v0.31.0)

func (r *EvalRunner) RunBenchmark(ctx context.Context) (*models.EvaluationOutcome, error)

RunBenchmark executes the entire benchmark. If Baseline is enabled, it runs the benchmark twice: once with skills enabled and once with skills disabled.
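The baseline mode can be pictured as two passes over the same benchmark whose results are then compared. In this sketch, benchmarkFunc, baselineComparison, and runWithBaseline are hypothetical, and reducing each pass to a single pass rate is a simplification; the real RunBenchmark returns a full *models.EvaluationOutcome.

```go
package main

import "fmt"

// benchmarkFunc is a hypothetical hook that runs every test once under the
// given skills setting and returns the fraction of tests that passed.
type benchmarkFunc func(skillsEnabled bool) (passRate float64, err error)

type baselineComparison struct {
	WithSkills    float64
	WithoutSkills float64
	Delta         float64 // WithSkills - WithoutSkills
}

// runWithBaseline runs the skills-enabled pass and, if baseline is set,
// a second skills-disabled pass, then records the difference.
func runWithBaseline(run benchmarkFunc, baseline bool) (baselineComparison, error) {
	var c baselineComparison
	var err error
	if c.WithSkills, err = run(true); err != nil {
		return c, fmt.Errorf("skills-enabled run: %w", err)
	}
	if !baseline {
		return c, nil
	}
	if c.WithoutSkills, err = run(false); err != nil {
		return c, fmt.Errorf("skills-disabled run: %w", err)
	}
	c.Delta = c.WithSkills - c.WithoutSkills
	return c, nil
}

func main() {
	fake := func(skills bool) (float64, error) {
		if skills {
			return 0.75, nil
		}
		return 0.5, nil
	}
	c, err := runWithBaseline(fake, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("with skills %.2f, without %.2f, delta %+.2f\n", c.WithSkills, c.WithoutSkills, c.Delta)
}
```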

type EventType

type EventType string

EventType represents the type of a progress event.

const (
	EventBenchmarkStart    EventType = "benchmark_start"
	EventBenchmarkComplete EventType = "benchmark_complete"
	EventBenchmarkStopped  EventType = "benchmark_stopped"
	EventTestStart         EventType = "test_start"
	EventTestComplete      EventType = "test_complete"
	EventTestCached        EventType = "test_cached"
	EventRunStart          EventType = "run_start"
	EventRunComplete       EventType = "run_complete"
	EventAgentPrompt       EventType = "agent_prompt"
	EventAgentResponse     EventType = "agent_response"
	EventGraderResult      EventType = "grader_result"
)

EventType constants

type ProgressEvent

type ProgressEvent struct {
	EventType  EventType
	TestName   string
	TestNum    int
	TotalTests int
	RunNum     int
	TotalRuns  int
	Status     models.Status
	DurationMs int64
	Details    map[string]any
}

ProgressEvent represents a progress update.

type ProgressListener

type ProgressListener func(event ProgressEvent)

ProgressListener is a callback that receives progress updates.

type RunnerOption

type RunnerOption func(*EvalRunner)

RunnerOption configures an EvalRunner.
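RunnerOption follows Go's functional-options pattern: each option is a closure that mutates the runner before it is used. The lowercase types and constructors below are hypothetical stand-ins that only mimic WithSkipGraders and WithTaskFilters, since EvalRunner's fields are unexported. With the real package, the equivalent call would be NewEvalRunner(cfg, engine, WithSkipGraders(), WithTaskFilters("login-*")), with cfg and engine supplied by the caller.

```go
package main

import "fmt"

// evalRunner is a hypothetical stand-in; the real EvalRunner's fields are unexported.
type evalRunner struct {
	skipGraders  bool
	taskPatterns []string
}

// runnerOption mirrors the shape of RunnerOption: func(*EvalRunner).
type runnerOption func(*evalRunner)

// withSkipGraders mimics WithSkipGraders: execution only, no grading.
func withSkipGraders() runnerOption {
	return func(r *evalRunner) { r.skipGraders = true }
}

// withTaskFilters mimics WithTaskFilters: sets the task glob patterns.
func withTaskFilters(patterns ...string) runnerOption {
	return func(r *evalRunner) { r.taskPatterns = patterns }
}

// newEvalRunner starts from zero-value defaults and applies options in order.
func newEvalRunner(opts ...runnerOption) *evalRunner {
	r := &evalRunner{}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

func main() {
	r := newEvalRunner(withSkipGraders(), withTaskFilters("login-*", "t42"))
	fmt.Printf("%+v\n", *r)
}
```

The pattern keeps the constructor signature stable: new options can be added without breaking existing callers.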

func WithCache

func WithCache(c *cache.Cache) RunnerOption

WithCache enables result caching using the given cache.

func WithSkipGraders (added in v0.22.0)

func WithSkipGraders() RunnerOption

WithSkipGraders disables grading so only execution occurs.

func WithTagFilters

func WithTagFilters(patterns ...string) RunnerOption

WithTagFilters sets patterns used to filter test cases by tag.

func WithTaskFilters

func WithTaskFilters(patterns ...string) RunnerOption

WithTaskFilters sets glob patterns used to filter test cases by DisplayName or TestID.

func WithUpdateSnapshots

func WithUpdateSnapshots(enabled bool) RunnerOption

WithUpdateSnapshots enables snapshot file updates in diff graders.

type TestRunner (deprecated)

type TestRunner = EvalRunner

Deprecated: Use EvalRunner instead.
