Documentation ¶
Overview ¶
Package longmemeval implements a LongMemEval-style benchmark harness over a synthetic dataset of 10 QA pairs covering temporal reasoning, multi-hop retrieval, and knowledge-update (superseded memory) scenarios.
LongMemEval specifically probes an agent's ability to recall information from long-horizon conversation histories, including cases where facts have changed over time.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Run ¶
Run ingests all synthetic LongMemEval facts via client.Store, then evaluates every QA pair. It returns a BenchmarkSummary with per-question and aggregate results.
For each QA pair:
- Run Recall(question, k) to retrieve relevant memories.
- Score the answer with ExactMatch, TokenF1, and RecallAtK.
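The scoring metrics admit short reference implementations. Below is a minimal sketch of TokenF1 (bag-of-tokens overlap F1 between predicted and gold answers) and RecallAtK (whether the gold evidence appears in the top-k retrieved memories). The function names, whitespace tokenization, and exact-string matching are illustrative assumptions, not the harness's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// tokenF1 is a bag-of-tokens F1 between a predicted and a gold answer.
// (Illustrative sketch; the harness's real tokenization may differ.)
func tokenF1(pred, gold string) float64 {
	p := strings.Fields(strings.ToLower(pred))
	g := strings.Fields(strings.ToLower(gold))
	if len(p) == 0 || len(g) == 0 {
		return 0
	}
	remaining := map[string]int{}
	for _, t := range g {
		remaining[t]++
	}
	overlap := 0
	for _, t := range p {
		if remaining[t] > 0 {
			remaining[t]--
			overlap++
		}
	}
	if overlap == 0 {
		return 0
	}
	prec := float64(overlap) / float64(len(p))
	rec := float64(overlap) / float64(len(g))
	return 2 * prec * rec / (prec + rec)
}

// recallAtK reports 1 if the gold evidence appears among the top-k
// retrieved memories, else 0 (exact match here, for illustration).
func recallAtK(retrieved []string, gold string, k int) float64 {
	if k > len(retrieved) {
		k = len(retrieved)
	}
	for _, m := range retrieved[:k] {
		if m == gold {
			return 1
		}
	}
	return 0
}

func main() {
	fmt.Printf("%.2f\n", tokenF1("paris france", "paris")) // partial overlap
	fmt.Println(recallAtK([]string{"a", "b", "c"}, "b", 2)) // hit within top-2
}
```

ExactMatch is omitted as it reduces to a normalized string comparison.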
Types ¶
type MemoryFact ¶
type MemoryFact struct {
	Content string

	// DatasetValidFrom and DatasetValidTo are dataset metadata for human
	// readability only. The "Dataset" prefix is intentional: these fields are
	// NOT forwarded to the openclaw-cortex binary — the harness calls
	// client.Store(ctx, fact.Content) and ignores them entirely.
	// Temporal-versioning paths (valid_from/valid_to in the store, --supersedes,
	// SearchFilters.AsOf) are therefore out of scope for this harness; it
	// measures semantic retrieval only. See longmemeval/harness.go for the
	// full rationale.
	DatasetValidFrom string // e.g. "2024-01" — dataset documentation only; NOT passed to binary
	DatasetValidTo   string // non-empty = superseded fact; NOT passed as --supersedes
}
MemoryFact is a pre-formed statement ingested directly via Store (rather than as a full conversation turn) to simulate a long conversation history.
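A knowledge-update scenario is expressed as a pair of facts where the older one carries a non-empty DatasetValidTo. A minimal sketch (the struct is reproduced locally so the example compiles on its own; the fact contents and dates are invented):

```go
package main

import "fmt"

// MemoryFact mirrors the documented struct, reproduced here so this
// example is self-contained.
type MemoryFact struct {
	Content          string
	DatasetValidFrom string // dataset documentation only; NOT passed to binary
	DatasetValidTo   string // non-empty = superseded fact
}

func main() {
	facts := []MemoryFact{
		// Older fact, superseded in 2024-06 by the one below.
		{Content: "User lives in Berlin.", DatasetValidFrom: "2024-01", DatasetValidTo: "2024-06"},
		// Current fact: DatasetValidTo is empty.
		{Content: "User lives in Lisbon.", DatasetValidFrom: "2024-06"},
	}
	for _, f := range facts {
		superseded := f.DatasetValidTo != ""
		fmt.Printf("%q superseded=%v\n", f.Content, superseded)
	}
}
```

Since the harness only forwards Content to the store, a correct answer to "Where does the user live?" depends entirely on semantic retrieval surfacing the newer fact, not on any temporal filtering.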