package longmemeval

Version: v0.10.0 (latest) | Published: Mar 22, 2026 | License: MIT | Imports: 4 | Imported by: 0

Repository

github.com/ajitpratap0/openclaw-cortex

Documentation

Overview

Package longmemeval provides a synthetic LongMemEval-style benchmark dataset with 10 QA pairs testing temporal reasoning, multi-hop retrieval, and knowledge-update (superseded memory) scenarios.

LongMemEval specifically probes an agent's ability to recall information from long-horizon conversation histories, including cases where facts have changed over time.

Package longmemeval implements the LongMemEval benchmark harness.

Index

func Run(ctx context.Context, client runner.Client, k int) (*runner.BenchmarkSummary, error)
type MemoryFact
type QAPair
    func Dataset() []QAPair

Functions

func Run

func Run(ctx context.Context, client runner.Client, k int) (*runner.BenchmarkSummary, error)

Run ingests all synthetic LongMemEval facts and evaluates all QA pairs. It returns a BenchmarkSummary with individual and aggregate results.

For each QA pair:

1. Ingest all facts via client.Store.
2. Run Recall(question, k) to retrieve relevant memories.
3. Score the result with ExactMatch, TokenF1, and RecallAtK.
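
The scoring step can be illustrated with a minimal sketch. The real metrics live in the runner package and are not shown on this page, so the function names, the whitespace-and-lowercase normalization, and the substring-based RecallAtK below are all assumptions chosen for illustration, following common definitions of these metrics.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize lowercases the text and splits it on whitespace.
// Assumption: the real harness may normalize differently (e.g. strip punctuation).
func normalize(s string) []string {
	return strings.Fields(strings.ToLower(s))
}

// exactMatch returns 1 if the normalized answer equals the normalized truth.
func exactMatch(answer, truth string) float64 {
	if strings.Join(normalize(answer), " ") == strings.Join(normalize(truth), " ") {
		return 1
	}
	return 0
}

// tokenF1 is the harmonic mean of token-level precision and recall.
func tokenF1(answer, truth string) float64 {
	a, t := normalize(answer), normalize(truth)
	if len(a) == 0 || len(t) == 0 {
		return 0
	}
	counts := make(map[string]int, len(t))
	for _, tok := range t {
		counts[tok]++
	}
	overlap := 0
	for _, tok := range a {
		if counts[tok] > 0 {
			counts[tok]--
			overlap++
		}
	}
	if overlap == 0 {
		return 0
	}
	p := float64(overlap) / float64(len(a))
	r := float64(overlap) / float64(len(t))
	return 2 * p * r / (p + r)
}

// recallAtK returns 1 if any of the first k retrieved memories contains the
// truth as a case-insensitive substring (a hypothetical hit criterion).
func recallAtK(retrieved []string, truth string, k int) float64 {
	if k < 0 {
		k = 0
	}
	if k > len(retrieved) {
		k = len(retrieved)
	}
	needle := strings.ToLower(truth)
	for _, m := range retrieved[:k] {
		if strings.Contains(strings.ToLower(m), needle) {
			return 1
		}
	}
	return 0
}

func main() {
	retrieved := []string{"Alice moved to Berlin in 2024", "Alice likes tea"}
	fmt.Println(exactMatch("Berlin", "berlin"))
	fmt.Println(tokenF1("Alice lives in Berlin", "Berlin"))
	fmt.Println(recallAtK(retrieved, "Berlin", 1))
}
```

In this sketch a verbose but correct answer scores 0 on exact match and a partial score on token F1, which is one reason a harness might report both.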

Types

type MemoryFact

type MemoryFact struct {
	Content string
	// DatasetValidFrom and DatasetValidTo are dataset metadata for human
	// readability only. The "Dataset" prefix is intentional: these fields are
	// NOT forwarded to the openclaw-cortex binary — the harness calls
	// client.Store(ctx, fact.Content) and ignores them entirely.
	// Temporal-versioning paths (valid_from/valid_to in the store, --supersedes,
	// SearchFilters.AsOf) are therefore out of scope for this harness; it measures
	// semantic retrieval only. See longmemeval/harness.go for the full rationale.
	DatasetValidFrom string // e.g. "2024-01" — dataset documentation only; NOT passed to binary
	DatasetValidTo   string // non-empty = superseded fact; NOT passed as --supersedes
}

MemoryFact is a pre-formed statement that is ingested directly via Store, rather than as a full conversation turn, to simulate a long conversation history.
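
The ingestion behavior described in the struct comment (only Content is forwarded, and the Dataset* fields are ignored) might look roughly like the sketch below. The storer interface and its error return are hypothetical stand-ins for the Store method of runner.Client, whose exact signature is not shown on this page. The recorder type and the sample facts are likewise invented for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// MemoryFact mirrors the documented struct so the sketch compiles standalone.
type MemoryFact struct {
	Content          string
	DatasetValidFrom string
	DatasetValidTo   string
}

// storer is a hypothetical stand-in for the Store method of runner.Client;
// the real signature may differ.
type storer interface {
	Store(ctx context.Context, content string) error
}

// recorder is a fake client that remembers every string it was asked to store.
type recorder struct{ stored []string }

func (r *recorder) Store(_ context.Context, content string) error {
	r.stored = append(r.stored, content)
	return nil
}

// ingest stores facts in order, forwarding only Content.
// DatasetValidFrom and DatasetValidTo are deliberately not passed on.
func ingest(ctx context.Context, c storer, facts []MemoryFact) error {
	for _, f := range facts {
		if err := c.Store(ctx, f.Content); err != nil {
			return fmt.Errorf("store %q: %w", f.Content, err)
		}
	}
	return nil
}

// storedContents ingests facts into a recorder and returns what it received.
func storedContents(facts []MemoryFact) ([]string, error) {
	rec := &recorder{}
	err := ingest(context.Background(), rec, facts)
	return rec.stored, err
}

func main() {
	facts := []MemoryFact{
		{Content: "Alice works at Acme.", DatasetValidFrom: "2024-01", DatasetValidTo: "2024-06"},
		{Content: "Alice works at Globex.", DatasetValidFrom: "2024-06"},
	}
	got, err := storedContents(facts)
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Because both the superseded and the current fact reach the store as plain text, a later query has to separate them by semantic retrieval alone, which matches the scope stated in the struct comment.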

type QAPair

type QAPair struct {
	ID          string
	Facts       []MemoryFact // facts to ingest (in order) before querying
	Question    string
	GroundTruth string
	Category    string // "temporal" | "multi-hop" | "knowledge-update"
}

QAPair is a LongMemEval-style evaluation unit: the facts to ingest, a question, and its ground-truth answer, labeled with a category.
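
As a rough illustration of how a pair is put together, the sketch below builds one knowledge-update pair and inspects it. The ID, facts, question, and answer are invented for this example and are not taken from Dataset(). The helpers examplePair, supersededFacts, and countByCategory are hypothetical and not part of the package.

```go
package main

import "fmt"

// MemoryFact and QAPair mirror the documented structs so the sketch compiles standalone.
type MemoryFact struct {
	Content          string
	DatasetValidFrom string
	DatasetValidTo   string
}

type QAPair struct {
	ID          string
	Facts       []MemoryFact
	Question    string
	GroundTruth string
	Category    string
}

// examplePair builds a hypothetical knowledge-update pair: an older fact that
// has been superseded, followed by the fact that replaces it.
func examplePair() QAPair {
	return QAPair{
		ID: "example-ku-01",
		Facts: []MemoryFact{
			{Content: "Alice works at Acme.", DatasetValidFrom: "2024-01", DatasetValidTo: "2024-06"},
			{Content: "Alice works at Globex.", DatasetValidFrom: "2024-06"},
		},
		Question:    "Where does Alice work now?",
		GroundTruth: "Globex",
		Category:    "knowledge-update",
	}
}

// supersededFacts returns the facts whose DatasetValidTo is non-empty.
func supersededFacts(p QAPair) []MemoryFact {
	var out []MemoryFact
	for _, f := range p.Facts {
		if f.DatasetValidTo != "" {
			out = append(out, f)
		}
	}
	return out
}

// countByCategory tallies pairs per Category value.
func countByCategory(pairs []QAPair) map[string]int {
	counts := make(map[string]int)
	for _, p := range pairs {
		counts[p.Category]++
	}
	return counts
}

func main() {
	p := examplePair()
	fmt.Println(len(supersededFacts(p)), countByCategory([]QAPair{p}))
}
```

The same countByCategory helper could be applied to the slice returned by Dataset() to see how the pairs are spread across the three categories.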

func Dataset

func Dataset() []QAPair

Dataset returns the synthetic LongMemEval QA pairs.
