Documentation
Overview
Package metrics computes the information-retrieval (IR) and dynamics scores that the eval harness reports for each run. The scores fall into two groups: standard ranking quality (recall@k, MRR, nDCG) and memmy-specific dynamics (the per-query weight delta on the top-K hits, a proxy for "did Recall reinforce the right nodes").
Constants
This section is empty.
Variables
This section is empty.
Functions
Types
type Category
type Category struct {
	N                 int     `json:"n"`
	RecallAt1         float64 `json:"recall_at_1"`
	RecallAt3         float64 `json:"recall_at_3"`
	RecallAt5         float64 `json:"recall_at_5"`
	RecallAt8         float64 `json:"recall_at_8"`
	MRR               float64 `json:"mrr"`
	NDCG              float64 `json:"ndcg"`
	ReinforcementMean float64 `json:"reinforcement_mean"`
}
Category bundles the aggregate metrics for one category. It is distinct from queries.Category, which is a string label; this type is the aggregate row.
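One plausible way to build such per-category rows is to group per-query values by category label and take the arithmetic mean. The sketch below assumes exactly that; the documentation does not state how the package aggregates, and the trimmed-down row and categoryAgg types are hypothetical stand-ins for QueryRow and Category.

```go
package main

import "fmt"

// row is a hypothetical, trimmed-down stand-in for a per-query record.
type row struct {
	Category  string
	RecallAt1 float64
	MRR       float64
}

// categoryAgg is a hypothetical, trimmed-down stand-in for Category.
type categoryAgg struct {
	N         int
	RecallAt1 float64
	MRR       float64
}

// aggregate groups rows by category and averages each metric.
// Assumption: the aggregate is a plain arithmetic mean over queries.
func aggregate(rows []row) map[string]categoryAgg {
	out := make(map[string]categoryAgg)
	// First pass: count rows and accumulate sums per category.
	for _, r := range rows {
		agg := out[r.Category]
		agg.N++
		agg.RecallAt1 += r.RecallAt1
		agg.MRR += r.MRR
		out[r.Category] = agg
	}
	// Second pass: turn sums into means.
	for name, agg := range out {
		agg.RecallAt1 /= float64(agg.N)
		agg.MRR /= float64(agg.N)
		out[name] = agg
	}
	return out
}

func main() {
	// Made-up rows for illustration only.
	rows := []row{
		{Category: "temporal", RecallAt1: 1, MRR: 1},
		{Category: "temporal", RecallAt1: 0, MRR: 0.5},
		{Category: "entity", RecallAt1: 1, MRR: 1},
	}
	for name, agg := range aggregate(rows) {
		fmt.Printf("%s: n=%d recall@1=%.2f mrr=%.2f\n", name, agg.N, agg.RecallAt1, agg.MRR)
	}
}
```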
type QueryRow
type QueryRow struct {
	QueryID          string    `json:"query_id"`
	Category         string    `json:"category"`
	Text             string    `json:"text"`
	GoldTurnUUIDs    []string  `json:"gold_turn_uuids"`
	HitNodeIDs       []string  `json:"hit_node_ids"`
	HitGoldFlags     []bool    `json:"hit_gold_flags"`
	HitScores        []float64 `json:"hit_scores"`
	RecallAt1        float64   `json:"recall_at_1"`
	RecallAt3        float64   `json:"recall_at_3"`
	RecallAt5        float64   `json:"recall_at_5"`
	RecallAt8        float64   `json:"recall_at_8"`
	MRR              float64   `json:"mrr"`
	NDCG             float64   `json:"ndcg"`
	ReinforcementSum float64   `json:"reinforcement_sum"`
	ReinforcementMax float64   `json:"reinforcement_max"`
	StartedAtUnixMS  int64     `json:"started_at_unix_ms"`
	FinishedAtUnixMS int64     `json:"finished_at_unix_ms"`
	Error            string    `json:"error,omitempty"`
}
QueryRow is one row in queries.jsonl, kept flat for easy plotting.
type Summary
type Summary struct {
	RunID                    string              `json:"run_id"`
	DatasetName              string              `json:"dataset_name"`
	QueriesExecuted          int                 `json:"queries_executed"`
	OverallRecallAt1         float64             `json:"overall_recall_at_1"`
	OverallRecallAt3         float64             `json:"overall_recall_at_3"`
	OverallRecallAt5         float64             `json:"overall_recall_at_5"`
	OverallRecallAt8         float64             `json:"overall_recall_at_8"`
	OverallMRR               float64             `json:"overall_mrr"`
	OverallNDCG              float64             `json:"overall_ndcg"`
	OverallReinforcementMean float64             `json:"overall_reinforcement_mean"`
	PerCategory              map[string]Category `json:"per_category"`
	GeneratedAt              time.Time           `json:"generated_at"`
}
Summary is the per-run aggregate written to summary.json.
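A downstream tool could load summary.json and print the per-category breakdown along these lines. This is a sketch under stated assumptions: summary and category are hypothetical local mirrors of a few Summary and Category fields, parseSummary and sortedCategories are hypothetical helpers, and the sample document is made up. The generated_at field is assumed to be an RFC 3339 string, which is what encoding/json produces for time.Time by default.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// category mirrors a subset of the Category fields.
type category struct {
	N   int     `json:"n"`
	MRR float64 `json:"mrr"`
}

// summary mirrors a subset of the Summary fields, using the same JSON tags.
type summary struct {
	RunID           string              `json:"run_id"`
	QueriesExecuted int                 `json:"queries_executed"`
	OverallMRR      float64             `json:"overall_mrr"`
	PerCategory     map[string]category `json:"per_category"`
	GeneratedAt     time.Time           `json:"generated_at"`
}

// parseSummary decodes one summary document.
func parseSummary(data []byte) (summary, error) {
	var s summary
	err := json.Unmarshal(data, &s)
	return s, err
}

// sortedCategories returns category names in a stable order, since Go map
// iteration order is randomized.
func sortedCategories(s summary) []string {
	names := make([]string, 0, len(s.PerCategory))
	for name := range s.PerCategory {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Made-up sample document for illustration only.
	data := []byte(`{
		"run_id": "run-001",
		"queries_executed": 3,
		"overall_mrr": 0.83,
		"per_category": {
			"temporal": {"n": 2, "mrr": 0.75},
			"entity": {"n": 1, "mrr": 1}
		},
		"generated_at": "2024-01-02T03:04:05Z"
	}`)
	s, err := parseSummary(data)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("%s: %d queries, generated %s\n", s.RunID, s.QueriesExecuted, s.GeneratedAt.Format(time.RFC3339))
	for _, name := range sortedCategories(s) {
		c := s.PerCategory[name]
		fmt.Printf("  %-10s n=%d mrr=%.2f\n", name, c.N, c.MRR)
	}
}
```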