dataset

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 30, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package dataset owns the on-disk layout for memmy-eval datasets.

Each dataset is one directory, <root>/<name>/, where the root is taken from MEMMY_EVAL_HOME (default ~/.local/share/memmy-eval). The directory contains:

manifest.json    — DatasetManifest written by ingest
corpus.sqlite    — chunks + embeddings (embedcache schema)
queries.sqlite   — labeled queries (queries package schema)
runs/<run_id>/   — per-run config + metrics + result rows
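
The layout above can be sketched as a handful of path joins. The following is an illustrative sketch only: the layout struct and the layoutFor function are hypothetical names, not part of this package, and the root used in main is an arbitrary example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// layout lists the files and directories the overview describes for one
// dataset. The type and its field names are illustrative only.
type layout struct {
	Dir      string // <root>/<name>
	Manifest string // manifest.json
	Corpus   string // corpus.sqlite
	Queries  string // queries.sqlite
	Runs     string // runs/
}

// layoutFor joins root and name into the documented on-disk layout.
// It only builds paths; nothing is created on disk.
func layoutFor(root, name string) layout {
	dir := filepath.Join(root, name)
	return layout{
		Dir:      dir,
		Manifest: filepath.Join(dir, "manifest.json"),
		Corpus:   filepath.Join(dir, "corpus.sqlite"),
		Queries:  filepath.Join(dir, "queries.sqlite"),
		Runs:     filepath.Join(dir, "runs"),
	}
}

func main() {
	l := layoutFor("/tmp/memmy-eval", "demo")
	fmt.Println(l.Manifest)
	fmt.Println(l.Corpus)
	fmt.Println(l.Queries)
	fmt.Println(l.Runs)
}
```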

Constants

const EnvHome = "MEMMY_EVAL_HOME"

EnvHome is the name of the environment variable that overrides the dataset root. When it is unset or empty, the root defaults to ~/.local/share/memmy-eval.

Variables

This section is empty.

Functions

func DefaultRoot

func DefaultRoot() (string, error)

DefaultRoot returns the directory where datasets live by default. It honors EnvHome first and falls back to the XDG-conventional location, ~/.local/share/memmy-eval, when EnvHome is empty.
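
The lookup order can be sketched as follows. This is a minimal stand-in, not the package's implementation: resolveRoot and its injected getenv and homeDir parameters are hypothetical, and the sketch assumes the fallback is built from the user's home directory without consulting XDG_DATA_HOME, which the real function may or may not do.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// envHome matches the documented EnvHome constant.
const envHome = "MEMMY_EVAL_HOME"

// resolveRoot returns the override from the environment when it is
// non-empty, and otherwise <home>/.local/share/memmy-eval.
// The lookups are passed in so the order can be exercised without
// touching the real environment.
func resolveRoot(getenv func(string) string, homeDir func() (string, error)) (string, error) {
	if v := getenv(envHome); v != "" {
		return v, nil
	}
	home, err := homeDir()
	if err != nil {
		return "", fmt.Errorf("resolve home directory: %w", err)
	}
	return filepath.Join(home, ".local", "share", "memmy-eval"), nil
}

func main() {
	root, err := resolveRoot(os.Getenv, os.UserHomeDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(root)
}
```

Passing os.Getenv and os.UserHomeDir as arguments is a choice made for this sketch so that the override and the fallback can each be checked in isolation.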

Types

type Dataset

type Dataset struct {
	Name string
	Root string // the dataset directory (root/<name>)
}

Dataset is a handle to one dataset directory. It is a value type — no resources are held open, so there is no Close.

func Open

func Open(root, name string) (*Dataset, error)

Open returns a handle to the named dataset under root, creating the dataset directory tree if it does not exist. If root is empty, DefaultRoot() is used. Creation is idempotent: calling Open again on an existing dataset leaves its contents, including the manifest, untouched.
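
One way to get this idempotent behavior is os.MkdirAll, which succeeds when the directory already exists. The sketch below is an assumption about how such a function could work, not the package's code: ensureDataset and demo are hypothetical names, and creating runs/ up front is a guess about what "the directory tree" includes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureDataset creates root/name and its runs/ subdirectory if they are
// missing. Existing files, including manifest.json, are never touched.
func ensureDataset(root, name string) (string, error) {
	dir := filepath.Join(root, name)
	// MkdirAll returns nil when the path already exists as a directory,
	// so repeated calls are safe.
	if err := os.MkdirAll(filepath.Join(dir, "runs"), 0o755); err != nil {
		return "", fmt.Errorf("create dataset tree: %w", err)
	}
	return dir, nil
}

// demo creates the same dataset twice in a temporary root and checks
// that a manifest written in between survives the second call.
func demo() error {
	root, err := os.MkdirTemp("", "memmy-eval-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	dir, err := ensureDataset(root, "demo")
	if err != nil {
		return err
	}
	manifest := filepath.Join(dir, "manifest.json")
	want := `{"name":"demo"}`
	if err := os.WriteFile(manifest, []byte(want), 0o644); err != nil {
		return err
	}

	if _, err := ensureDataset(root, "demo"); err != nil {
		return err
	}
	got, err := os.ReadFile(manifest)
	if err != nil {
		return err
	}
	if string(got) != want {
		return fmt.Errorf("manifest changed: %q", got)
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("manifest preserved")
}
```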

func (*Dataset) CorpusDBPath

func (d *Dataset) CorpusDBPath() string

CorpusDBPath returns the absolute path to corpus.sqlite.

func (*Dataset) ManifestPath

func (d *Dataset) ManifestPath() string

ManifestPath returns the absolute path to manifest.json.

func (*Dataset) QueriesDBPath

func (d *Dataset) QueriesDBPath() string

QueriesDBPath returns the absolute path to queries.sqlite.

func (*Dataset) RunDir

func (d *Dataset) RunDir(runID string) (string, error)

RunDir returns the path to runs/<runID> inside the dataset directory, creating the directory if it is missing.
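
A caller can treat this as "get or create". The following sketch shows the idea with a hypothetical runDir helper that takes the dataset directory explicitly; the run ID "run-001" is an arbitrary example, and the real method may validate or normalize run IDs in ways this sketch does not.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runDir returns <datasetDir>/runs/<runID>, creating it when missing.
func runDir(datasetDir, runID string) (string, error) {
	dir := filepath.Join(datasetDir, "runs", runID)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", fmt.Errorf("create run dir: %w", err)
	}
	return dir, nil
}

// demo asks for the same run directory twice and checks that both calls
// return the same existing directory.
func demo() error {
	datasetDir, err := os.MkdirTemp("", "memmy-eval-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(datasetDir)

	first, err := runDir(datasetDir, "run-001")
	if err != nil {
		return err
	}
	second, err := runDir(datasetDir, "run-001")
	if err != nil {
		return err
	}
	if first != second {
		return fmt.Errorf("paths differ: %q vs %q", first, second)
	}
	info, err := os.Stat(first)
	if err != nil {
		return err
	}
	if !info.IsDir() {
		return fmt.Errorf("%s is not a directory", first)
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("run directory ready")
}
```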

func (*Dataset) RunsDir

func (d *Dataset) RunsDir() string

RunsDir returns the absolute path to the per-run output directory.

type Stats

type Stats struct {
	Name        string
	Root        string
	HasCorpus   bool
	HasQueries  bool
	ChunkCount  int
	QueryCount  int
	RunCount    int
	LastRunTime time.Time // zero when no runs
	UpdatedAt   time.Time // from manifest
}

Stats summarizes a dataset for the `ls` subcommand. Counts come from the on-disk JSON manifest plus best-effort directory walks.
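
A best-effort directory walk for the run fields might look like the sketch below. It is an assumption, not the package's code: runStats is a hypothetical helper, and deriving LastRunTime from the newest run directory's modification time is a guess about how that field is filled.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// runStats counts the subdirectories of runsDir and reports the newest
// modification time among them. Errors are swallowed on purpose: a
// missing or unreadable runs/ directory is reported as "no runs".
func runStats(runsDir string) (count int, last time.Time) {
	entries, err := os.ReadDir(runsDir)
	if err != nil {
		return 0, time.Time{}
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue // stray files are not runs
		}
		count++
		info, err := e.Info()
		if err != nil {
			continue // still counted, but it cannot contribute a time
		}
		if info.ModTime().After(last) {
			last = info.ModTime()
		}
	}
	return count, last
}

// demo builds two run directories and one stray file, then checks the
// counts for that directory and for a missing one.
func demo() error {
	root, err := os.MkdirTemp("", "memmy-eval-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	runs := filepath.Join(root, "runs")
	for _, id := range []string{"run-001", "run-002"} {
		if err := os.MkdirAll(filepath.Join(runs, id), 0o755); err != nil {
			return err
		}
	}
	if err := os.WriteFile(filepath.Join(runs, "notes.txt"), nil, 0o644); err != nil {
		return err
	}

	count, last := runStats(runs)
	if count != 2 || last.IsZero() {
		return fmt.Errorf("got count=%d last=%v", count, last)
	}
	count, last = runStats(filepath.Join(root, "missing"))
	if count != 0 || !last.IsZero() {
		return fmt.Errorf("missing dir: got count=%d last=%v", count, last)
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("run stats ok")
}
```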

func ListDatasets

func ListDatasets(root string) ([]Stats, error)

ListDatasets walks root and returns per-dataset stats. A dataset is any direct child directory of root that contains manifest.json. Counts that would otherwise need a SQLite query are read from the manifest, not the db file, so this stays cheap on cold startup.
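
The detection rule can be sketched with os.ReadDir and os.Stat. This is a simplified illustration under stated assumptions: listDatasetNames and demo are hypothetical names, and the sketch returns only dataset names rather than full Stats values, since the manifest schema is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// listDatasetNames returns the direct child directories of root that
// contain a manifest.json. No SQLite file is opened.
func listDatasetNames(root string) ([]string, error) {
	entries, err := os.ReadDir(root) // entries come back sorted by name
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		_, err := os.Stat(filepath.Join(root, e.Name(), "manifest.json"))
		if errors.Is(err, fs.ErrNotExist) {
			continue // a directory without a manifest is not a dataset
		}
		if err != nil {
			return nil, err
		}
		names = append(names, e.Name())
	}
	return names, nil
}

// demo builds a root with one real dataset, one empty directory, and one
// stray file, then checks that only the real dataset is listed.
func demo() error {
	root, err := os.MkdirTemp("", "memmy-eval-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	for _, d := range []string{"alpha", "beta"} {
		if err := os.MkdirAll(filepath.Join(root, d), 0o755); err != nil {
			return err
		}
	}
	if err := os.WriteFile(filepath.Join(root, "alpha", "manifest.json"), []byte("{}"), 0o644); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(root, "stray.txt"), nil, 0o644); err != nil {
		return err
	}

	names, err := listDatasetNames(root)
	if err != nil {
		return err
	}
	if len(names) != 1 || names[0] != "alpha" {
		return fmt.Errorf("unexpected datasets: %v", names)
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("only directories with manifest.json are listed")
}
```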
