Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type BaselineResult
type BaselineResult struct {
	TaskName    string               `json:"task_name"`
	Baseline    *models.RunResult    `json:"baseline"`
	WithSkill   *models.RunResult    `json:"with_skill"`
	Improvement float64              `json:"improvement"`
	Breakdown   ImprovementBreakdown `json:"improvement_breakdown"`
}
BaselineResult pairs a task's baseline (no skill) and skill-enabled results with computed improvement metrics.
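func ComputeFromOutcomes
func ComputeFromOutcomes(withSkill, baseline *models.TestOutcome) *BaselineResult
ComputeFromOutcomes builds a BaselineResult from a pair of TestOutcomes: one with the skill enabled and one baseline (no skill). Note that withSkill is the first argument here, the reverse of the order used by ComputeImprovement.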
type ImprovementBreakdown
type ImprovementBreakdown struct {
	QualityDelta   float64 `json:"quality_delta"`
	TokenReduction float64 `json:"token_reduction"`
	TurnReduction  float64 `json:"turn_reduction"`
	TimeReduction  float64 `json:"time_reduction"`
	TaskCompletion float64 `json:"task_completion"`
}
ImprovementBreakdown captures per-dimension deltas between baseline and skill runs. Positive values mean the skill run was better; negative means worse.
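To illustrate the sign convention, here is one way a "reduction" field such as TokenReduction could be derived. This is a sketch under assumptions: the helper `relativeReduction` is hypothetical, and the page does not state how the package actually computes each field.

```go
package main

import "fmt"

// relativeReduction is a HYPOTHETICAL helper, not part of the documented API.
// It returns the fraction by which withSkill is lower than baseline, so a
// cheaper skill run gives a positive value and a costlier one a negative value.
// A non-positive baseline yields 0 to avoid dividing by zero.
func relativeReduction(baseline, withSkill float64) float64 {
	if baseline <= 0 {
		return 0
	}
	return (baseline - withSkill) / baseline
}

func main() {
	fmt.Println(relativeReduction(1000, 750)) // fewer tokens with the skill: 0.25
	fmt.Println(relativeReduction(4, 6))      // more turns with the skill: -0.5
}
```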
func ComputeImprovement
func ComputeImprovement(baseline, withSkill *models.RunResult) (float64, ImprovementBreakdown)
ComputeImprovement calculates the overall improvement score and the per-dimension breakdown between a baseline run (no skill) and a skill-enabled run. The score lies in [-1, 1]; a positive score means the skill helped.
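The page does not say how the per-dimension values are combined into the overall score. As a rough sketch only, the example below assumes an equal-weight mean clamped to [-1, 1]; the `breakdown` type, `overallScore`, and the weighting are all illustrative assumptions, not the package's actual formula.

```go
package main

import "fmt"

// breakdown mirrors the documented ImprovementBreakdown fields.
type breakdown struct {
	QualityDelta   float64
	TokenReduction float64
	TurnReduction  float64
	TimeReduction  float64
	TaskCompletion float64
}

// clamp restricts v to the closed interval [lo, hi].
func clamp(v, lo, hi float64) float64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// overallScore is a HYPOTHETICAL aggregation: an equal-weight mean of the
// five dimensions, clamped to [-1, 1]. The real weighting is not documented.
func overallScore(b breakdown) float64 {
	sum := b.QualityDelta + b.TokenReduction + b.TurnReduction +
		b.TimeReduction + b.TaskCompletion
	return clamp(sum/5, -1, 1)
}

func main() {
	b := breakdown{
		QualityDelta:   0.5,
		TokenReduction: 0.5,
		TurnReduction:  0.5,
		TimeReduction:  0.5,
		TaskCompletion: 0.5,
	}
	fmt.Println(overallScore(b)) // 0.5
}
```

Clamping keeps the result inside the documented range even if an individual dimension falls outside it, for example when a skill run uses more than twice the baseline's tokens.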