bench

package
v0.5.0 Latest
Published: May 11, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package bench implements a LongMemEval-style evaluation harness for Yaad. It measures retrieval accuracy (R@K), MRR, and token efficiency.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type QA

type QA struct {
	Question        string
	ExpectedNodeID  string // ID of the node that should be retrieved
	ExpectedContent string // or match by content substring
}

QA is a single question-answer pair for evaluation.

func CodingBenchQAs

func CodingBenchQAs() []QA

CodingBenchQAs returns an extended set of 50 coding-specific QA pairs for more rigorous evaluation. Seed your DB with realistic coding memories first.

func DefaultQAs

func DefaultQAs() []QA

DefaultQAs returns a built-in set of memory QA pairs. Covers the same categories as LongMemEval: single-hop, multi-hop, temporal, preference.

type Result

type Result struct {
	Total     int
	HitAtK    map[int]int // hits at K=1,3,5,10
	MRR       float64
	AvgTokens float64
	Duration  time.Duration
}

Result holds evaluation metrics.

func Run

func Run(ctx context.Context, eng *engine.Engine, qas []QA, depth, limit int) *Result

Run evaluates retrieval accuracy on a set of QA pairs.

func (*Result) String

func (r *Result) String() string

String formats the result as a readable report.
