baseline

package
v0.21.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 12, 2026 License: MIT Imports: 2 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BaselineResult

type BaselineResult struct {
	TaskName    string               `json:"task_name"`
	Baseline    *models.RunResult    `json:"baseline"`
	WithSkill   *models.RunResult    `json:"with_skill"`
	Improvement float64              `json:"improvement"`
	Breakdown   ImprovementBreakdown `json:"improvement_breakdown"`
}

BaselineResult pairs a task's baseline (no skill) and skill-enabled results with computed improvement metrics.
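The struct tags above determine the JSON layout of a result. The following is a minimal sketch of that layout, not the package's own code: the types are local copies of the documented structs, and `RunResult` is a hypothetical empty stand-in for `models.RunResult`, whose fields are not shown on this page. The helper `encode` is also an illustrative name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RunResult is a hypothetical stand-in for models.RunResult; the real
// type's fields are not documented here.
type RunResult struct{}

// ImprovementBreakdown mirrors the documented struct and its JSON tags.
type ImprovementBreakdown struct {
	QualityDelta   float64 `json:"quality_delta"`
	TokenReduction float64 `json:"token_reduction"`
	TurnReduction  float64 `json:"turn_reduction"`
	TimeReduction  float64 `json:"time_reduction"`
	TaskCompletion float64 `json:"task_completion"`
}

// BaselineResult mirrors the documented struct and its JSON tags.
type BaselineResult struct {
	TaskName    string               `json:"task_name"`
	Baseline    *RunResult           `json:"baseline"`
	WithSkill   *RunResult           `json:"with_skill"`
	Improvement float64              `json:"improvement"`
	Breakdown   ImprovementBreakdown `json:"improvement_breakdown"`
}

// encode marshals a result to its JSON text form.
func encode(r BaselineResult) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(BaselineResult{
		TaskName:    "demo",
		Improvement: 0.25,
		Breakdown:   ImprovementBreakdown{QualityDelta: 0.5},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because `Baseline` and `WithSkill` are pointers without `omitempty`, a nil run is encoded as `null` rather than dropped from the output.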

func ComputeFromOutcomes

func ComputeFromOutcomes(withSkill, baseline *models.TestOutcome) *BaselineResult

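ComputeFromOutcomes computes a BaselineResult from a paired skill-enabled TestOutcome and baseline TestOutcome. Note that the skill-enabled outcome is the first argument here, the reverse of the argument order of ComputeImprovement.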

type ImprovementBreakdown

type ImprovementBreakdown struct {
	QualityDelta   float64 `json:"quality_delta"`
	TokenReduction float64 `json:"token_reduction"`
	TurnReduction  float64 `json:"turn_reduction"`
	TimeReduction  float64 `json:"time_reduction"`
	TaskCompletion float64 `json:"task_completion"`
}

ImprovementBreakdown captures per-dimension deltas between baseline and skill runs. Positive values mean the skill run was better; negative values mean it was worse.
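To illustrate the sign convention, here is a sketch of one way a "reduction" field could be derived. This page does not document the package's actual formula, so treat this as an assumption: `relativeReduction` is a hypothetical helper that expresses the saving as a fraction of the baseline value.

```go
package main

import "fmt"

// relativeReduction is a hypothetical helper, not part of the package.
// It returns the fraction of the baseline cost saved by the skill run:
// positive when the skill run used less, negative when it used more.
// A zero baseline yields 0 to avoid dividing by zero.
func relativeReduction(baseline, withSkill float64) float64 {
	if baseline == 0 {
		return 0
	}
	return (baseline - withSkill) / baseline
}

func main() {
	// Skill run used fewer tokens than the baseline: positive value.
	fmt.Println(relativeReduction(1000, 800))
	// Skill run needed more turns than the baseline: negative value.
	fmt.Println(relativeReduction(10, 15))
}
```

Under this assumption, a lower token, turn, or time count in the skill run always produces a positive field, which matches the documented "positive means better" rule.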

func ComputeImprovement

func ComputeImprovement(baseline, withSkill *models.RunResult) (float64, ImprovementBreakdown)

ComputeImprovement calculates the overall improvement score and the per-dimension breakdown between a baseline run (no skill) and a skill-enabled run. The overall score lies in [-1, 1]; a positive value means the skill helped.
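The page does not say how the per-dimension values are combined into the overall score. As an illustration only, the sketch below averages the five dimensions with equal weights and clamps the result to [-1, 1]. The equal weighting, the local `breakdown` type, and the names `clamp` and `overallScore` are all assumptions, not the package's implementation.

```go
package main

import "fmt"

// breakdown is a local copy of the documented ImprovementBreakdown fields.
type breakdown struct {
	QualityDelta   float64
	TokenReduction float64
	TurnReduction  float64
	TimeReduction  float64
	TaskCompletion float64
}

// clamp limits v to the documented score range [-1, 1].
func clamp(v float64) float64 {
	if v > 1 {
		return 1
	}
	if v < -1 {
		return -1
	}
	return v
}

// overallScore is a hypothetical combiner: an equal-weight mean of the
// five dimensions, clamped to [-1, 1]. The real weighting is unknown.
func overallScore(b breakdown) float64 {
	sum := b.QualityDelta + b.TokenReduction + b.TurnReduction +
		b.TimeReduction + b.TaskCompletion
	return clamp(sum / 5)
}

func main() {
	b := breakdown{
		QualityDelta:   0.5,
		TokenReduction: 0.5,
		TurnReduction:  0.5,
		TimeReduction:  0.5,
		TaskCompletion: 0.5,
	}
	fmt.Println(overallScore(b))
}
```

Clamping keeps a single extreme dimension, such as a very large time regression, from pushing the score outside the documented range.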
