metrics

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 30, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package metrics computes the information-retrieval (IR) and dynamics scores that the eval harness reports for each run. The scores fall into two groups: standard ranking quality (recall@k, MRR, nDCG) and memmy-specific dynamics (the per-query weight delta on the top-K hits, a proxy for "did Recall reinforce the right nodes").
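The ranking-quality scores named above can be sketched from a list of per-hit gold flags. This is a minimal illustration using textbook definitions, not the package's own code. The helper names (recallAtK, reciprocalRank, ndcg) are hypothetical, and two choices are assumptions: recall@k is treated as binary "any gold hit in the top k", and nDCG is normalized against the gold hits present in the list.

```go
package main

import (
	"fmt"
	"math"
)

// recallAtK returns 1 if any of the first k hits is gold, else 0.
// Assumption: binary hit@k; the package may use the fraction of gold items instead.
func recallAtK(gold []bool, k int) float64 {
	if k < 0 {
		k = 0
	}
	if k > len(gold) {
		k = len(gold)
	}
	for _, g := range gold[:k] {
		if g {
			return 1
		}
	}
	return 0
}

// reciprocalRank returns 1/rank of the first gold hit, or 0 if there is none.
func reciprocalRank(gold []bool) float64 {
	for i, g := range gold {
		if g {
			return 1 / float64(i+1)
		}
	}
	return 0
}

// ndcg computes binary-relevance nDCG: the DCG of the hit list divided by
// the DCG of an ideal ordering that puts the same gold hits first.
func ndcg(gold []bool) float64 {
	var dcg float64
	numGold := 0
	for i, g := range gold {
		if g {
			dcg += 1 / math.Log2(float64(i+2))
			numGold++
		}
	}
	if numGold == 0 {
		return 0
	}
	var ideal float64
	for i := 0; i < numGold; i++ {
		ideal += 1 / math.Log2(float64(i+2))
	}
	return dcg / ideal
}

func main() {
	flags := []bool{false, true, false, true} // gold flags for hits at ranks 1..4
	fmt.Printf("recall@1=%.0f recall@3=%.0f mrr=%.2f ndcg=%.3f\n",
		recallAtK(flags, 1), recallAtK(flags, 3), reciprocalRank(flags), ndcg(flags))
}
```

The flags slice plays the role of QueryRow.HitGoldFlags; MRR for a run would then be the mean of reciprocalRank over all queries.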

Functions

func WriteRun

func WriteRun(outDir string, rows []QueryRow, summary Summary) error

WriteRun writes queries.jsonl and summary.json to outDir.

Types

type Category

type Category struct {
	N                 int     `json:"n"`
	RecallAt1         float64 `json:"recall_at_1"`
	RecallAt3         float64 `json:"recall_at_3"`
	RecallAt5         float64 `json:"recall_at_5"`
	RecallAt8         float64 `json:"recall_at_8"`
	MRR               float64 `json:"mrr"`
	NDCG              float64 `json:"ndcg"`
	ReinforcementMean float64 `json:"reinforcement_mean"`
}

Category bundles the aggregates for a single category. It is distinct from queries.Category, which is a string label; this type is the aggregate row.

type QueryRow

type QueryRow struct {
	QueryID          string    `json:"query_id"`
	Category         string    `json:"category"`
	Text             string    `json:"text"`
	GoldTurnUUIDs    []string  `json:"gold_turn_uuids"`
	HitNodeIDs       []string  `json:"hit_node_ids"`
	HitGoldFlags     []bool    `json:"hit_gold_flags"`
	HitScores        []float64 `json:"hit_scores"`
	RecallAt1        float64   `json:"recall_at_1"`
	RecallAt3        float64   `json:"recall_at_3"`
	RecallAt5        float64   `json:"recall_at_5"`
	RecallAt8        float64   `json:"recall_at_8"`
	MRR              float64   `json:"mrr"`
	NDCG             float64   `json:"ndcg"`
	ReinforcementSum float64   `json:"reinforcement_sum"`
	ReinforcementMax float64   `json:"reinforcement_max"`
	StartedAtUnixMS  int64     `json:"started_at_unix_ms"`
	FinishedAtUnixMS int64     `json:"finished_at_unix_ms"`
	Error            string    `json:"error,omitempty"`
}

QueryRow is one row in queries.jsonl, kept flat for easy plotting.

func Compute

func Compute(qr harness.QueryResult, turnUUIDForNode func(string) string) QueryRow

Compute scores one query's hits against its gold labels and the pre/post node-state pair captured by the harness. Returns the flat row for queries.jsonl.
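To make the two inputs concrete, here is a hedged sketch of the kind of work such a scoring step involves: mapping each hit node ID to a turn UUID through the callback and checking it against the gold set, then measuring weight change between pre and post snapshots. The fields of harness.QueryResult are not documented here, so the function shapes below (goldFlags, reinforcement, and plain maps of node weights) are hypothetical, as is the definition of reinforcement as post minus pre weight.

```go
package main

import "fmt"

// goldFlags marks a hit as gold when the turn UUID resolved for its node ID
// appears in the gold set. turnUUIDForNode mirrors the callback in Compute.
func goldFlags(hitNodeIDs, goldTurnUUIDs []string, turnUUIDForNode func(string) string) []bool {
	gold := make(map[string]struct{}, len(goldTurnUUIDs))
	for _, u := range goldTurnUUIDs {
		gold[u] = struct{}{}
	}
	flags := make([]bool, len(hitNodeIDs))
	for i, id := range hitNodeIDs {
		_, ok := gold[turnUUIDForNode(id)]
		flags[i] = ok
	}
	return flags
}

// reinforcement returns the summed and the largest per-hit weight delta.
// Assumption: delta is post weight minus pre weight for each hit node.
func reinforcement(hitNodeIDs []string, pre, post map[string]float64) (total, peak float64) {
	for i, id := range hitNodeIDs {
		delta := post[id] - pre[id]
		total += delta
		if i == 0 || delta > peak {
			peak = delta
		}
	}
	return total, peak
}

func main() {
	nodeToTurn := map[string]string{"n1": "t1", "n2": "t2", "n3": "t3"}
	lookup := func(id string) string { return nodeToTurn[id] }

	hits := []string{"n1", "n2", "n3"}
	flags := goldFlags(hits, []string{"t1", "t3"}, lookup)

	pre := map[string]float64{"n1": 1.0, "n2": 2.0, "n3": 0.5}
	post := map[string]float64{"n1": 1.5, "n2": 2.0, "n3": 0.75}
	total, peak := reinforcement(hits, pre, post)

	fmt.Println(flags, total, peak)
}
```

In this sketch the flags would feed HitGoldFlags, and total and peak would correspond to ReinforcementSum and ReinforcementMax.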

type Summary

type Summary struct {
	RunID                    string              `json:"run_id"`
	DatasetName              string              `json:"dataset_name"`
	QueriesExecuted          int                 `json:"queries_executed"`
	OverallRecallAt1         float64             `json:"overall_recall_at_1"`
	OverallRecallAt3         float64             `json:"overall_recall_at_3"`
	OverallRecallAt5         float64             `json:"overall_recall_at_5"`
	OverallRecallAt8         float64             `json:"overall_recall_at_8"`
	OverallMRR               float64             `json:"overall_mrr"`
	OverallNDCG              float64             `json:"overall_ndcg"`
	OverallReinforcementMean float64             `json:"overall_reinforcement_mean"`
	PerCategory              map[string]Category `json:"per_category"`
	GeneratedAt              time.Time           `json:"generated_at"`
}

Summary is the per-run aggregate written to summary.json.

func Aggregate

func Aggregate(runID, datasetName string, rows []QueryRow) Summary

Aggregate summarises a slice of QueryRows into a Summary.
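A minimal sketch of this kind of aggregation, restricted to MRR for brevity: rows are averaged overall and grouped by their category label. The row and categoryAgg types and the aggregate helper are hypothetical, and two points are assumptions because the documentation does not state them: the mean is a plain arithmetic mean, and rows that carry an error are averaged like any other row.

```go
package main

import "fmt"

// row is a trimmed, hypothetical subset of QueryRow.
type row struct {
	Category string
	MRR      float64
}

// categoryAgg is a trimmed, hypothetical subset of Category.
type categoryAgg struct {
	N   int
	MRR float64
}

// aggregate returns the overall mean MRR and per-category counts and means.
// Assumption: plain arithmetic mean over every row, errored or not.
func aggregate(rows []row) (overallMRR float64, perCategory map[string]categoryAgg) {
	perCategory = make(map[string]categoryAgg)
	if len(rows) == 0 {
		return 0, perCategory
	}
	var sum float64
	for _, r := range rows {
		sum += r.MRR
		c := perCategory[r.Category]
		c.N++
		c.MRR += r.MRR // running sum, converted to a mean below
		perCategory[r.Category] = c
	}
	for name, c := range perCategory {
		c.MRR /= float64(c.N)
		perCategory[name] = c
	}
	return sum / float64(len(rows)), perCategory
}

func main() {
	rows := []row{
		{Category: "recency", MRR: 1.0},
		{Category: "recency", MRR: 0.5},
		{Category: "paraphrase", MRR: 0.0},
	}
	overall, per := aggregate(rows)
	fmt.Println("overall:", overall)
	for name, c := range per {
		fmt.Println(name, c.N, c.MRR)
	}
}
```

The per-category map here plays the role of Summary.PerCategory, with categoryAgg.N matching Category.N.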
