Documentation
Overview ¶
Package dataset owns the on-disk layout for memmy-eval datasets.
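One dataset = one directory under the dataset root, which is set by MEMMY_EVAL_HOME and defaults to ~/.local/share/memmy-eval. A dataset named <name> therefore lives at ~/.local/share/memmy-eval/<name>/ by default and contains: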
    manifest.json     DatasetManifest written by ingest
    corpus.sqlite     chunks + embeddings (embedcache schema)
    queries.sqlite    labeled queries (queries package schema)
    runs/<run_id>/    per-run config + metrics + result rows
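The layout above can be sketched as plain path arithmetic. This is only an illustration: the layout struct and the layoutFor helper are hypothetical names, not part of the package, and only the file and directory names are taken from the overview.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// layout lists the paths that make up one dataset directory.
// The struct and its field names are hypothetical; only the file and
// directory names come from the package overview.
type layout struct {
	Manifest string
	Corpus   string
	Queries  string
	Runs     string
}

// layoutFor joins root and name into the dataset directory and derives
// the four documented entries beneath it.
func layoutFor(root, name string) layout {
	dir := filepath.Join(root, name)
	return layout{
		Manifest: filepath.Join(dir, "manifest.json"),
		Corpus:   filepath.Join(dir, "corpus.sqlite"),
		Queries:  filepath.Join(dir, "queries.sqlite"),
		Runs:     filepath.Join(dir, "runs"),
	}
}

func main() {
	l := layoutFor("/tmp/memmy-eval", "demo")
	fmt.Println(l.Manifest)
	fmt.Println(l.Corpus)
	fmt.Println(l.Queries)
	fmt.Println(l.Runs)
}
```

In the real package the same information is exposed through the Dataset path methods (ManifestPath, CorpusDBPath, QueriesDBPath), so a helper like this would rarely be needed outside of tests or tooling.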
Constants ¶
const EnvHome = "MEMMY_EVAL_HOME"
EnvHome names the environment variable that overrides the dataset root. When it is empty, the root is ~/.local/share/memmy-eval.
Functions ¶
func DefaultRoot ¶
DefaultRoot returns the directory where datasets live by default. Honors EnvHome before falling back to the XDG-conventional location.
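The lookup order described here (environment override first, home-relative fallback second) might look roughly like the sketch below. It is not the package's implementation: resolveRoot is a hypothetical pure helper, and the use of os.UserHomeDir to expand "~" is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// envHome mirrors the documented EnvHome constant.
const envHome = "MEMMY_EVAL_HOME"

// resolveRoot is a hypothetical helper: a non-empty override wins,
// otherwise the root is <home>/.local/share/memmy-eval.
func resolveRoot(override, home string) string {
	if override != "" {
		return override
	}
	return filepath.Join(home, ".local", "share", "memmy-eval")
}

func main() {
	// Assumption: "~" is resolved with os.UserHomeDir.
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	fmt.Println(resolveRoot(os.Getenv(envHome), home))
}
```

Keeping the decision in a function that takes the override and the home directory as arguments makes the fallback easy to test without touching the process environment.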
Types ¶
type Dataset ¶
Dataset is a handle to one dataset directory. It is a value type — no resources are held open, so there is no Close.
func Open ¶
Open creates or returns a handle to the named dataset under root. If root is empty, DefaultRoot() is used. The directory tree is created idempotently; an existing manifest is preserved.
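Idempotent creation of this kind is commonly built on os.MkdirAll, which succeeds when the directories already exist. The sketch below illustrates that idea only; ensureDataset is a hypothetical stand-in for Open, and creating runs/ up front is an assumption about what "the directory tree" includes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureDataset is a hypothetical stand-in for the directory setup that
// Open is described as doing. It creates <root>/<name>/runs if missing
// and never writes manifest.json, so an existing manifest is left alone.
func ensureDataset(root, name string) (string, error) {
	dir := filepath.Join(root, name)
	// MkdirAll returns nil when the directories already exist, which
	// makes repeated calls safe.
	if err := os.MkdirAll(filepath.Join(dir, "runs"), 0o755); err != nil {
		return "", err
	}
	return dir, nil
}

func main() {
	root, err := os.MkdirTemp("", "memmy-eval-sketch")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(root)

	// Calling twice shows that the second call is a no-op, not an error.
	for i := 0; i < 2; i++ {
		dir, err := ensureDataset(root, "demo")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Println(dir)
	}
}
```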
func (*Dataset) CorpusDBPath ¶
CorpusDBPath returns the absolute path to corpus.sqlite.
func (*Dataset) ManifestPath ¶
ManifestPath returns the absolute path to manifest.json.
func (*Dataset) QueriesDBPath ¶
QueriesDBPath returns the absolute path to queries.sqlite.
type Stats ¶
type Stats struct {
	Name        string
	Root        string
	HasCorpus   bool
	HasQueries  bool
	ChunkCount  int
	QueryCount  int
	RunCount    int
	LastRunTime time.Time // zero when no runs
	UpdatedAt   time.Time // from manifest
}
Stats summarizes a dataset for the `ls` subcommand. Counts come from the on-disk JSON manifest plus best-effort directory walks.
func ListDatasets ¶
ListDatasets walks root and returns per-dataset stats. A dataset is "any direct child directory containing manifest.json." Counts that need a SQLite query are read from the manifest, not the db file, so this stays cheap on cold startup.
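The detection rule (a direct child directory that contains manifest.json) can be sketched with os.ReadDir and os.Stat. The code below is a minimal illustration under that rule, not the package's code: datasetNames and demo are hypothetical helpers, and it returns only names rather than full Stats values.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// datasetNames is a hypothetical helper that returns the names of the
// direct child directories of root that contain a manifest.json file.
// No SQLite file is opened, which keeps the walk cheap.
func datasetNames(root string) ([]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		manifest := filepath.Join(root, e.Name(), "manifest.json")
		if info, err := os.Stat(manifest); err == nil && !info.IsDir() {
			names = append(names, e.Name())
		}
	}
	return names, nil
}

// demo builds a throwaway root with one dataset and one stray
// directory, then lists it.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "memmy-eval-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	if err := os.MkdirAll(filepath.Join(root, "alpha"), 0o755); err != nil {
		return nil, err
	}
	manifest := filepath.Join(root, "alpha", "manifest.json")
	if err := os.WriteFile(manifest, []byte("{}"), 0o644); err != nil {
		return nil, err
	}
	// A directory without manifest.json must not be reported.
	if err := os.MkdirAll(filepath.Join(root, "not-a-dataset"), 0o755); err != nil {
		return nil, err
	}
	return datasetNames(root)
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(names)
}
```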