Documentation ¶
Overview ¶
Package queries owns the labeled-query side of the eval framework.
A LabeledQuery couples a query string with the gold node IDs (or turn UUIDs) that should rank highly when the query runs against the replayed memmy store. Labels come either from a Generator (a cheap rule-based one for tests, an LLM-backed one for production) or from a Judge that scores returned candidates after the fact.
The package ships pluggable Generator / Judge interfaces and Fake implementations that work without network access. The real Gemini-backed implementations are wired in cmd/memmy-eval, which keeps this package's import graph clean for unit tests.
Index ¶
- func QueryID(text string, cat Category) string
- type Candidate
- type Category
- type FakeGenerator
- type FakeJudge
- type GenerateRequest
- type Generator
- type Judge
- type LabeledQuery
- type Store
- func (s *Store) All(ctx context.Context) ([]LabeledQuery, error)
- func (s *Store) ByCategory(ctx context.Context, c Category) ([]LabeledQuery, error)
- func (s *Store) Close() error
- func (s *Store) Count(ctx context.Context) (int, error)
- func (s *Store) CountForGeneration(ctx context.Context, generatorVersion, corpusSnapshotHash string, ...) (int, error)
- func (s *Store) Embedding(ctx context.Context, queryID string, dim int) ([]float32, bool, error)
- func (s *Store) Put(ctx context.Context, q LabeledQuery, ...) error
- func (s *Store) PutEmbedding(ctx context.Context, queryID string, vec []float32) error
- type Verdict
Functions ¶
Types ¶
type Category ¶
type Category string
Category labels the kind of query for per-bucket metric reporting. Generators produce queries grouped by category, and downstream metric aggregation slices results by category.
const (
	// CategoryParaphrase: lightly reworded version of a known turn;
	// the gold turn should rank top-1.
	CategoryParaphrase Category = "paraphrase"
	// CategoryNegation: query that should NOT match the named turn.
	CategoryNegation Category = "negation"
	// CategoryTopicJump: query targets a topic only one specific turn covered.
	CategoryTopicJump Category = "topic-jump"
	// CategoryDistractor: query similar in surface form but topically distinct.
	CategoryDistractor Category = "distractor"
	// CategoryStaleRelevant: query targets an old turn; tests decay vs relevance.
	CategoryStaleRelevant Category = "stale-relevant"
	// CategoryTemporal: query references a time window ("yesterday").
	CategoryTemporal Category = "temporal"
)
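To make the per-bucket reporting described above concrete, here is a minimal sketch of slicing results by category. The `result` type and the `hitRateByCategory` helper are hypothetical and not part of this package; `Category` and two of its constants are redeclared locally only so that the snippet compiles on its own.

```go
package main

import "fmt"

// Category is a local stand-in for the package's string-typed Category.
type Category string

const (
	CategoryParaphrase Category = "paraphrase"
	CategoryTopicJump  Category = "topic-jump"
)

// result is a hypothetical per-query outcome: the query's category and
// whether a gold turn ranked top-1.
type result struct {
	Category Category
	Top1Hit  bool
}

// hitRateByCategory returns, for each category present in results,
// the fraction of queries whose gold turn ranked top-1.
func hitRateByCategory(results []result) map[Category]float64 {
	hits := map[Category]int{}
	total := map[Category]int{}
	for _, r := range results {
		total[r.Category]++
		if r.Top1Hit {
			hits[r.Category]++
		}
	}
	rates := make(map[Category]float64, len(total))
	for c, n := range total {
		rates[c] = float64(hits[c]) / float64(n)
	}
	return rates
}

func main() {
	rates := hitRateByCategory([]result{
		{CategoryParaphrase, true},
		{CategoryParaphrase, false},
		{CategoryTopicJump, true},
	})
	fmt.Println(rates[CategoryParaphrase], rates[CategoryTopicJump])
}
```

Keying the aggregation on Category means a new category shows up as its own bucket without changes to the aggregation code.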
func AllCategories ¶
func AllCategories() []Category
AllCategories returns every supported category in declaration order.
type FakeGenerator ¶
type FakeGenerator struct{}
FakeGenerator emits deterministic per-turn queries for tests. It does NOT use any external service. Output:
- paraphrase: the first sentence of the turn, prefixed with "about:"
- distractor: a phrase guaranteed not to appear in the corpus
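The paraphrase rule above can be sketched as follows. This is an illustration, not the package's code: the sentence-splitting rule (cut at the first '.', '!' or '?') and the single space after "about:" are assumptions, and `firstSentence` and `paraphraseQuery` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// firstSentence returns text up to and including the first '.', '!' or '?'.
// If no terminator is found, the whole trimmed text is returned.
// Assumption: the real generator may split sentences differently.
func firstSentence(text string) string {
	text = strings.TrimSpace(text)
	if i := strings.IndexAny(text, ".!?"); i >= 0 {
		return text[:i+1]
	}
	return text
}

// paraphraseQuery builds an "about:"-prefixed query from a turn's text.
// Assumption: one space separates the prefix from the sentence.
func paraphraseQuery(turnText string) string {
	return "about: " + firstSentence(turnText)
}

func main() {
	fmt.Println(paraphraseQuery("The cache is flushed nightly. Logs rotate weekly."))
	// prints "about: The cache is flushed nightly."
}
```

Because the output depends only on the turn text, repeated runs over the same corpus yield the same queries.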
func NewFakeGenerator ¶
func NewFakeGenerator() *FakeGenerator
NewFakeGenerator returns the deterministic test Generator.
func (FakeGenerator) Generate ¶
func (FakeGenerator) Generate(_ context.Context, turns []corpus.StoredTurn, req GenerateRequest) ([]LabeledQuery, error)
Generate satisfies Generator.
type FakeJudge ¶
type FakeJudge struct{}
FakeJudge marks any candidate whose text shares a non-trivial token with the query as "relevant" (score 1.0); every other candidate scores 0.0.
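A token-overlap rule of this kind might look like the sketch below. The documentation does not define "non-trivial", so the minimum token length of 4 is an assumption, as are the helper names `tokens` and `overlapScore`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// minTokenLen is an assumed cutoff for what counts as a "non-trivial" token.
const minTokenLen = 4

// tokens lowercases text, splits it on anything that is not a letter or
// digit, and returns the set of tokens at least minTokenLen bytes long.
func tokens(text string) map[string]struct{} {
	set := map[string]struct{}{}
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, f := range fields {
		if len(f) >= minTokenLen {
			set[f] = struct{}{}
		}
	}
	return set
}

// overlapScore returns 1.0 if query and candidate share at least one
// non-trivial token, and 0.0 otherwise.
func overlapScore(query, candidate string) float64 {
	q := tokens(query)
	for t := range tokens(candidate) {
		if _, ok := q[t]; ok {
			return 1.0
		}
	}
	return 0.0
}

func main() {
	fmt.Println(overlapScore("how is the cache flushed", "The cache flushes nightly"))
	fmt.Println(overlapScore("how is the cache flushed", "Logs rotate weekly"))
}
```

A binary score keeps test expectations easy to state, at the cost of ignoring how strongly the two texts overlap.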
func NewFakeJudge ¶
func NewFakeJudge() *FakeJudge
NewFakeJudge returns the deterministic test Judge.
type GenerateRequest ¶
GenerateRequest configures one Generator.Generate call.
type Generator ¶
type Generator interface {
	// Version is the generator-version field used as part of the
	// dedup key. Bump when the prompting strategy changes.
	Version() string
	// Generate returns up to req.TargetN queries per category in
	// req.Categories, drawn from `turns`.
	Generate(ctx context.Context, turns []corpus.StoredTurn, req GenerateRequest) ([]LabeledQuery, error)
}
Generator turns a corpus into a labeled query set. Implementations must be deterministic given the same (corpus, request) pair — the dedup key in the queries store assumes identical re-runs produce identical outputs.
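One way an implementation could meet the determinism requirement is to pick turns in a canonical order rather than in whatever order the caller supplies them. The sketch below assumes a hypothetical `turn` type standing in for corpus.StoredTurn (whose fields are not shown here) and a hypothetical `pickTurns` helper.

```go
package main

import (
	"fmt"
	"sort"
)

// turn is a hypothetical stand-in for corpus.StoredTurn.
type turn struct {
	UUID string
	Text string
}

// pickTurns returns up to n turns sorted by UUID, so the selection does
// not depend on the order of the input slice. The input is not modified.
func pickTurns(turns []turn, n int) []turn {
	sorted := make([]turn, len(turns))
	copy(sorted, turns)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].UUID < sorted[j].UUID })
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	a := pickTurns([]turn{{"b", "x"}, {"a", "y"}, {"c", "z"}}, 2)
	b := pickTurns([]turn{{"c", "z"}, {"a", "y"}, {"b", "x"}}, 2)
	fmt.Println(a[0].UUID == b[0].UUID, a[1].UUID == b[1].UUID)
}
```

An LLM-backed generator would need more than this (for example fixed prompts and sampling settings), which is outside what this sketch covers.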
type Judge ¶
type Judge interface {
	Version() string
	Judge(ctx context.Context, q LabeledQuery, candidates []Candidate) ([]Verdict, error)
}
Judge scores a (query, candidate) pair as relevant or not. Used to expand gold labels after a run by asking an LLM "did the candidate actually answer this query?" Returned scores are in [0, 1].
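Expanding gold labels from judge output could be sketched as below. The fields of Verdict are not shown in this documentation, so the local `verdict` type, the `expandGold` helper, and the 0.5 threshold are all assumptions made for illustration.

```go
package main

import "fmt"

// verdict is a hypothetical stand-in for the package's Verdict type.
type verdict struct {
	TurnUUID string
	Score    float64 // expected to lie in [0, 1]
}

// expandGold returns the gold UUIDs followed by every judged turn UUID
// whose score is at least threshold, in first-seen order, without duplicates.
func expandGold(gold []string, verdicts []verdict, threshold float64) []string {
	seen := make(map[string]bool, len(gold))
	out := make([]string, 0, len(gold)+len(verdicts))
	for _, id := range gold {
		if !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	for _, v := range verdicts {
		if v.Score >= threshold && !seen[v.TurnUUID] {
			seen[v.TurnUUID] = true
			out = append(out, v.TurnUUID)
		}
	}
	return out
}

func main() {
	got := expandGold(
		[]string{"t1"},
		[]verdict{{"t2", 0.9}, {"t3", 0.2}, {"t1", 1.0}},
		0.5,
	)
	fmt.Println(got) // prints "[t1 t2]"
}
```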
type LabeledQuery ¶
type LabeledQuery struct {
	ID            string
	Category      Category
	Text          string
	GoldTurnUUIDs []string
	Notes         string
	GeneratedAt   time.Time
}
LabeledQuery is the unit a Generator produces and a query battery consumes. GoldTurnUUIDs are the corpus turn UUIDs whose chunks should be considered "correct hits"; metrics map them to memmy node IDs at scoring time via the source turn UUID stored in node text.
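The mapping from gold turn UUIDs to node IDs might be sketched like this. How the source turn UUID is embedded in node text is not specified here, so a plain substring match is assumed; the `node` type and `goldNodeIDs` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a memmy node: an ID plus text that
// is assumed to contain the source turn UUID somewhere.
type node struct {
	ID   string
	Text string
}

// goldNodeIDs returns the IDs of nodes whose text contains any of the
// gold turn UUIDs, in input order. Empty UUIDs are ignored.
func goldNodeIDs(nodes []node, goldTurnUUIDs []string) []string {
	var ids []string
	for _, n := range nodes {
		for _, uuid := range goldTurnUUIDs {
			if uuid != "" && strings.Contains(n.Text, uuid) {
				ids = append(ids, n.ID)
				break // count each node at most once
			}
		}
	}
	return ids
}

func main() {
	nodes := []node{
		{ID: "n1", Text: "turn=aaaa-1111 the cache is flushed nightly"},
		{ID: "n2", Text: "turn=bbbb-2222 logs rotate weekly"},
		{ID: "n3", Text: "turn=aaaa-1111 second chunk of the same turn"},
	}
	fmt.Println(goldNodeIDs(nodes, []string{"aaaa-1111"})) // prints "[n1 n3]"
}
```

One turn can map to several nodes when it is split into chunks, which is why the result is a slice rather than a single ID.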
type Store ¶
type Store struct {
	// contains filtered or unexported fields
}
Store is the per-dataset queries.sqlite handle.
func (*Store) All ¶
func (s *Store) All(ctx context.Context) ([]LabeledQuery, error)
All returns every labeled query in storage order (by category then ID).
func (*Store) ByCategory ¶
func (s *Store) ByCategory(ctx context.Context, c Category) ([]LabeledQuery, error)
ByCategory returns labeled queries filtered to one category.
func (*Store) CountForGeneration ¶
func (s *Store) CountForGeneration(ctx context.Context, generatorVersion, corpusSnapshotHash string, category Category) (int, error)
CountForGeneration returns the number of queries already stored that match the (generator_version, corpus_snapshot_hash, category) tuple. Used by the queries subcommand to decide whether to re-run the generator at all.
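A caller could use the count to decide whether generation is needed, roughly as sketched below. The `generationCounter` interface only mirrors the method signature above; `needsGeneration`, the `fixedCounter` test double, and the "fewer than targetN stored" policy are assumptions, not the subcommand's actual logic.

```go
package main

import (
	"context"
	"fmt"
)

// Category is a local stand-in for the package's Category type.
type Category string

// generationCounter is the one Store method this sketch depends on.
type generationCounter interface {
	CountForGeneration(ctx context.Context, generatorVersion, corpusSnapshotHash string, category Category) (int, error)
}

// fixedCounter is a test double that always reports the same count.
type fixedCounter int

func (f fixedCounter) CountForGeneration(context.Context, string, string, Category) (int, error) {
	return int(f), nil
}

// needsGeneration reports whether fewer than targetN queries are stored
// for the (version, hash, category) tuple. Assumed policy, for illustration.
func needsGeneration(ctx context.Context, c generationCounter, version, hash string, cat Category, targetN int) (bool, error) {
	n, err := c.CountForGeneration(ctx, version, hash, cat)
	if err != nil {
		return false, fmt.Errorf("count for generation: %w", err)
	}
	return n < targetN, nil
}

func main() {
	ctx := context.Background()
	again, err := needsGeneration(ctx, fixedCounter(3), "v1", "abc123", "paraphrase", 10)
	fmt.Println(again, err) // prints "true <nil>"
}
```

Depending on a one-method interface rather than on *Store keeps the decision logic testable without a SQLite file.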
func (*Store) Put ¶
func (s *Store) Put(ctx context.Context, q LabeledQuery, generatorVersion, corpusSnapshotHash string) error
Put inserts a labeled query. Idempotent: same ID is a no-op (preserves the original generated_at and embedding so re-running the generator after data appears does not blow away the cached vector).
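The idempotent behavior described above can be illustrated with an in-memory stand-in. This is not how the SQLite-backed Store is implemented; `row`, `memStore`, and `put` are hypothetical names used only to show the "same ID is a no-op" semantics.

```go
package main

import (
	"fmt"
	"time"
)

// row is a hypothetical stored record: a query plus its cached embedding.
type row struct {
	ID          string
	Text        string
	GeneratedAt time.Time
	Embedding   []float32
}

// memStore is an in-memory stand-in for the queries store.
type memStore struct {
	rows map[string]row
}

func newMemStore() *memStore {
	return &memStore{rows: map[string]row{}}
}

// put inserts r unless a row with the same ID already exists, in which
// case the existing row is left untouched. It reports whether it inserted.
func (m *memStore) put(r row) bool {
	if _, exists := m.rows[r.ID]; exists {
		return false
	}
	m.rows[r.ID] = r
	return true
}

func main() {
	s := newMemStore()
	first := row{ID: "q1", Text: "about: caching", GeneratedAt: time.Now(), Embedding: []float32{0.1, 0.2}}
	s.put(first)

	// A second put with the same ID keeps the original timestamp and vector.
	inserted := s.put(row{ID: "q1", Text: "about: caching", GeneratedAt: time.Now().Add(time.Hour)})
	kept := s.rows["q1"]
	fmt.Println(inserted, kept.GeneratedAt.Equal(first.GeneratedAt), len(kept.Embedding))
	// prints "false true 2"
}
```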