Documentation ¶
Overview ¶
Package eval implements the CLASP Evaluation Framework (SDD-005).
It provides structured capability scoring for SOC agents across 6 dimensions, each with 5 maturity levels. It supports automated scoring via LLM-as-judge and trend analysis over stored results.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func SaveResult ¶
func SaveResult(dir string, result *EvalResult) error
SaveResult saves an eval result to the results directory.
Types ¶
type AgentProfile ¶
type AgentProfile struct {
    AgentID    string                `json:"agent_id"`
    Results    []EvalResult          `json:"results"`
    Averages   map[Dimension]float64 `json:"averages"`
    OverallL   int                   `json:"overall_l"`
    EvalCount  int                   `json:"eval_count"`
    LastEvalAt time.Time             `json:"last_eval_at"`
}
AgentProfile aggregates multiple EvalResults into a capability profile.
func (*AgentProfile) ComputeAverages ¶
func (p *AgentProfile) ComputeAverages()
ComputeAverages calculates per-dimension average scores across all results.
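A per-dimension average can be computed by summing and counting scores per dimension across all results. The sketch below shows one way to do that on trimmed stand-in types. It assumes that Score carries an integer Level field and that a dimension missing from a result is skipped rather than counted as zero; neither is confirmed by this section.

```go
package main

import "fmt"

// Dimension and Score are placeholders (assumptions); the real
// definitions are not shown in this section.
type Dimension string

type Score struct {
	Level int
}

// EvalResult and AgentProfile are trimmed to the fields this sketch needs.
type EvalResult struct {
	Scores map[Dimension]Score
}

type AgentProfile struct {
	Results  []EvalResult
	Averages map[Dimension]float64
}

// ComputeAverages fills p.Averages with the mean level per dimension.
// A dimension absent from a result does not contribute to that
// dimension's count (an assumption of this sketch).
func (p *AgentProfile) ComputeAverages() {
	sums := make(map[Dimension]float64)
	counts := make(map[Dimension]int)
	for _, r := range p.Results {
		for dim, s := range r.Scores {
			sums[dim] += float64(s.Level)
			counts[dim]++
		}
	}
	p.Averages = make(map[Dimension]float64, len(sums))
	for dim, sum := range sums {
		p.Averages[dim] = sum / float64(counts[dim])
	}
}

func main() {
	p := &AgentProfile{Results: []EvalResult{
		{Scores: map[Dimension]Score{"triage": {Level: 2}, "detection": {Level: 5}}},
		{Scores: map[Dimension]Score{"triage": {Level: 4}}},
	}}
	p.ComputeAverages()
	fmt.Println("triage:", p.Averages["triage"])
	fmt.Println("detection:", p.Averages["detection"])
}
```

The dimension names "triage" and "detection" are illustrative only.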
type EvalResult ¶
type EvalResult struct {
    AgentID    string              `json:"agent_id"`
    Timestamp  time.Time           `json:"timestamp"`
    ScenarioID string              `json:"scenario_id"`
    Scores     map[Dimension]Score `json:"scores"`
    OverallL   int                 `json:"overall_l"` // 1-5 aggregate
    JudgeModel string              `json:"judge_model,omitempty"`
}
EvalResult represents the outcome of evaluating an agent on a scenario.
func (*EvalResult) ComputeOverall ¶
func (r *EvalResult) ComputeOverall() int
ComputeOverall calculates the aggregate maturity level (average, rounded down).
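"Average, rounded down" maps naturally onto integer division. The following sketch illustrates that on stand-in types. It assumes Score has an integer Level field, that the result is also stored in OverallL, and that an empty score map yields 0; the package may handle these details differently.

```go
package main

import "fmt"

// Dimension and Score are placeholders (assumptions); the real
// definitions are not shown in this section.
type Dimension string

type Score struct {
	Level int
}

// EvalResult is trimmed to the fields this sketch needs.
type EvalResult struct {
	Scores   map[Dimension]Score
	OverallL int
}

// ComputeOverall averages the per-dimension levels and rounds down.
// Integer division truncates toward zero, which equals rounding down
// for the non-negative levels used here.
func (r *EvalResult) ComputeOverall() int {
	if len(r.Scores) == 0 {
		r.OverallL = 0 // assumption: no scores means no level
		return 0
	}
	sum := 0
	for _, s := range r.Scores {
		sum += s.Level
	}
	r.OverallL = sum / len(r.Scores)
	return r.OverallL
}

func main() {
	r := &EvalResult{Scores: map[Dimension]Score{
		"detection": {Level: 3},
		"triage":    {Level: 4},
		"reporting": {Level: 4},
	}}
	// Levels sum to 11 over 3 dimensions; 11/3 truncates to 3.
	fmt.Println(r.ComputeOverall())
}
```

Rounding down is the conservative choice: an agent is only reported at a maturity level once its average fully reaches it.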
type EvalScenario ¶
type EvalScenario struct {
    ID          string      `json:"id"`
    Name        string      `json:"name"`
    Stage       Stage       `json:"stage"`
    Description string      `json:"description"`
    Inputs      []string    `json:"inputs"`
    Expected    string      `json:"expected"`
    Dimensions  []Dimension `json:"dimensions"` // Which dimensions this tests
}
EvalScenario defines a test scenario for agent evaluation.
func LoadScenarios ¶
func LoadScenarios(path string) ([]EvalScenario, error)
LoadScenarios loads eval scenarios from a JSON file.
type Regression ¶
type Regression struct {
    Dimension Dimension `json:"dimension"`
    Previous  float64   `json:"previous"`
    Current   float64   `json:"current"`
    Delta     float64   `json:"delta"`
}
Regression describes a dimension whose score dropped between two profiles.
func DetectRegressions ¶
func DetectRegressions(previous, current *AgentProfile) []Regression
DetectRegressions compares the current profile to a previous one and returns the dimensions where the score dropped.
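One plausible way to compare two profiles is to walk the previous per-dimension averages and report every dimension whose current average is lower. The sketch below does that on trimmed stand-in types. Several details are assumptions: that the comparison uses the Averages map, that Delta is current minus previous (so it is negative for a regression), that dimensions missing from the current profile are skipped, and that the output is sorted by dimension name.

```go
package main

import (
	"fmt"
	"sort"
)

// Dimension is assumed to be string-based; its real definition is not
// shown in this section.
type Dimension string

// AgentProfile is trimmed to the field this sketch needs.
type AgentProfile struct {
	Averages map[Dimension]float64
}

// Regression mirrors the documented struct.
type Regression struct {
	Dimension Dimension
	Previous  float64
	Current   float64
	Delta     float64
}

// detectRegressions is a hypothetical stand-in for DetectRegressions.
// It reports dimensions whose average dropped; Delta = Current - Previous.
func detectRegressions(previous, current *AgentProfile) []Regression {
	if previous == nil || current == nil {
		return nil
	}
	var out []Regression
	for dim, prev := range previous.Averages {
		cur, ok := current.Averages[dim]
		if !ok || cur >= prev {
			continue // not measured now, or no drop
		}
		out = append(out, Regression{
			Dimension: dim,
			Previous:  prev,
			Current:   cur,
			Delta:     cur - prev,
		})
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Dimension < out[j].Dimension })
	return out
}

func main() {
	prev := &AgentProfile{Averages: map[Dimension]float64{"triage": 4.0, "detection": 3.0}}
	cur := &AgentProfile{Averages: map[Dimension]float64{"triage": 3.5, "detection": 3.0}}
	for _, r := range detectRegressions(prev, cur) {
		fmt.Printf("%s: %.2f -> %.2f (delta %.2f)\n", r.Dimension, r.Previous, r.Current, r.Delta)
	}
}
```

A caller might run such a check after each new evaluation and flag the agent for review when the returned slice is non-empty.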