bench

package
v0.5.0 Latest
Published: May 11, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package bench implements a LongMemEval-style evaluation harness for Yaad. It measures retrieval accuracy (R@K), MRR, and token efficiency.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type QA

type QA struct {
	Question        string
	ExpectedNodeID  string // ID of the node that should be retrieved
	ExpectedContent string // or match by content substring
}

QA is a single question-answer pair for evaluation.

func CodingBenchQAs

func CodingBenchQAs() []QA

CodingBenchQAs returns an extended set of 50 coding-specific QA pairs for more rigorous evaluation. Seed your DB with realistic coding memories first.

func DefaultQAs

func DefaultQAs() []QA

DefaultQAs returns a built-in set of memory QA pairs. Covers the same categories as LongMemEval: single-hop, multi-hop, temporal, preference.

type Result

type Result struct {
	Total     int
	HitAtK    map[int]int // hits at K=1,3,5,10
	MRR       float64
	AvgTokens float64
	Duration  time.Duration
}

Result holds evaluation metrics.

func Run

func Run(ctx context.Context, eng *engine.Engine, qas []QA, depth, limit int) *Result

Run evaluates retrieval accuracy on a set of QA pairs.

func (*Result) String

func (r *Result) String() string

String formats the result as a readable report.
