queries

package
v0.2.0 Latest
Published: Apr 30, 2026 License: MIT Imports: 16 Imported by: 0

Documentation

Overview

Package queries owns the labeled-query side of the eval framework.

A LabeledQuery couples a query string with the gold node IDs (or turn UUIDs) that should rank highly when the query runs against the replayed memmy store. Labels come either from a Generator (cheap rule-based for tests, LLM for production) or from a Judge that scores returned candidates after the fact.

The package ships pluggable Generator / Judge interfaces and Fake implementations that work without network access. Real Gemini-backed implementations are wired in cmd/memmy-eval to keep this package import-graph clean for unit tests.

Constants

This section is empty.

Variables

This section is empty.

Functions

func QueryID

func QueryID(text string, cat Category) string

QueryID returns the stable hash of (text, cat) that is used as the primary key for a query.
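The documentation does not say which hash QueryID uses. As a minimal sketch of the idea, assuming a SHA-256 digest over the category and the text, a stable ID could be derived as follows. The name sketchQueryID, the separator byte, and the choice of SHA-256 are assumptions made for illustration, not the package's implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Category mirrors the package's string-backed Category type.
type Category string

// sketchQueryID is a hypothetical stand-in for QueryID. It assumes the ID is
// a hex-encoded SHA-256 digest over the category and text; the real hash
// function and encoding may differ.
func sketchQueryID(text string, cat Category) string {
	h := sha256.New()
	h.Write([]byte(cat))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(text))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := sketchQueryID("what did we decide about retries?", "paraphrase")
	b := sketchQueryID("what did we decide about retries?", "paraphrase")
	c := sketchQueryID("what did we decide about retries?", "negation")
	fmt.Println(a == b) // same inputs give the same ID
	fmt.Println(a == c) // a different category gives a different ID
}
```

Because the ID depends only on its inputs, regenerating the same query text in the same category yields the same key, which is what makes an idempotent insert possible.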

Types

type Candidate

type Candidate struct {
	NodeID   string
	TurnUUID string
	Text     string
}

Candidate is a single returned chunk handed to the Judge.

type Category

type Category string

Category labels the kind of query for per-bucket metric reporting. Generators produce queries grouped by category and downstream metric aggregation slices results by category.

const (
	// CategoryParaphrase: lightly reworded version of a known turn —
	// the gold turn should rank top-1.
	CategoryParaphrase Category = "paraphrase"
	// CategoryNegation: query that should NOT match the named turn.
	CategoryNegation Category = "negation"
	// CategoryTopicJump: query targets a topic only one specific turn covered.
	CategoryTopicJump Category = "topic-jump"
	// CategoryDistractor: query similar in surface form but topically distinct.
	CategoryDistractor Category = "distractor"
	// CategoryStaleRelevant: query targets an old turn — tests decay vs relevance.
	CategoryStaleRelevant Category = "stale-relevant"
	// CategoryTemporal: query references a time window ("yesterday").
	CategoryTemporal Category = "temporal"
)

func AllCategories

func AllCategories() []Category

AllCategories returns every supported category in declaration order.
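To illustrate the per-bucket reporting that categories exist for, the sketch below groups hypothetical per-query outcomes by category and computes a hit rate for each. The result type, hitRateByCategory, and the local order slice (standing in for AllCategories) are assumptions for this example and are not part of the package.

```go
package main

import "fmt"

// Category and the constants below mirror a subset of the documented values.
type Category string

const (
	CategoryParaphrase Category = "paraphrase"
	CategoryNegation   Category = "negation"
	CategoryTemporal   Category = "temporal"
)

// result is a hypothetical per-query outcome: did a gold turn rank as expected?
type result struct {
	Category Category
	Hit      bool
}

// hitRateByCategory returns hits/total for every category that has at least
// one result. Categories with no results are absent from the map.
func hitRateByCategory(results []result) map[Category]float64 {
	hits := map[Category]int{}
	totals := map[Category]int{}
	for _, r := range results {
		totals[r.Category]++
		if r.Hit {
			hits[r.Category]++
		}
	}
	rates := make(map[Category]float64, len(totals))
	for c, n := range totals {
		rates[c] = float64(hits[c]) / float64(n)
	}
	return rates
}

func main() {
	rates := hitRateByCategory([]result{
		{Category: CategoryParaphrase, Hit: true},
		{Category: CategoryParaphrase, Hit: false},
		{Category: CategoryNegation, Hit: true},
	})
	// A fixed slice stands in for AllCategories() so output order is stable.
	order := []Category{CategoryParaphrase, CategoryNegation, CategoryTemporal}
	for _, c := range order {
		if rate, ok := rates[c]; ok {
			fmt.Printf("%s: %.2f\n", c, rate)
		}
	}
}
```

Iterating in declaration order rather than over the map keeps reports comparable between runs, since Go map iteration order is not defined.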

type FakeGenerator

type FakeGenerator struct{}

FakeGenerator emits deterministic per-turn queries for tests. It does NOT use any external service. Output:

  • paraphrase: the first sentence of the turn, prefixed with "about:"
  • distractor: a phrase guaranteed not to appear in the corpus
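The paraphrase rule above can be sketched roughly as follows. The sentence-splitting rule (cut at the first '.', '!' or '?') and the space after "about:" are assumptions; the documentation only states that the first sentence is prefixed with "about:". The names firstSentence and fakeParaphrase are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// firstSentence returns the text up to and including the first '.', '!' or
// '?'. When no terminator is present it returns the whole trimmed text.
// This splitting rule is an assumption made for the sketch.
func firstSentence(text string) string {
	text = strings.TrimSpace(text)
	if i := strings.IndexAny(text, ".!?"); i >= 0 {
		return text[:i+1]
	}
	return text
}

// fakeParaphrase builds a deterministic query from a turn's text by
// prefixing its first sentence with "about:".
func fakeParaphrase(turnText string) string {
	return "about: " + firstSentence(turnText)
}

func main() {
	fmt.Println(fakeParaphrase("We moved the cache to Redis. Latency dropped."))
	// prints "about: We moved the cache to Redis."
}
```

Because the output is a pure function of the turn text, repeated runs over the same corpus produce the same queries, which is the property the tests rely on.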

func NewFakeGenerator

func NewFakeGenerator() *FakeGenerator

NewFakeGenerator returns the deterministic test Generator.

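func (FakeGenerator) Generate

func (FakeGenerator) Generate(ctx context.Context, turns []corpus.StoredTurn, req GenerateRequest) ([]LabeledQuery, error)

Generate satisfies Generator.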

func (FakeGenerator) Version

func (FakeGenerator) Version() string

Version satisfies Generator.

type FakeJudge

type FakeJudge struct{}

FakeJudge marks a candidate as relevant (score 1.0) when its text shares a non-trivial token with the query; every other candidate scores 0.0.
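A token-overlap judge of this kind might look like the sketch below. What counts as a "non-trivial" token is not documented; this example assumes lowercased alphanumeric tokens longer than three bytes. The helper names tokens and overlapJudge are hypothetical, and the Candidate and Verdict types are local copies of the documented structs.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Candidate and Verdict mirror the documented struct shapes.
type Candidate struct{ NodeID, TurnUUID, Text string }

type Verdict struct {
	NodeID string
	Score  float64
	Reason string
}

// tokens lowercases s, splits on non-alphanumeric runes, and keeps tokens
// longer than three bytes. The length cutoff is an assumption.
func tokens(s string) map[string]bool {
	out := map[string]bool{}
	isSep := func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) }
	for _, f := range strings.FieldsFunc(strings.ToLower(s), isSep) {
		if len(f) > 3 {
			out[f] = true
		}
	}
	return out
}

// overlapJudge scores each candidate 1.0 when it shares at least one token
// with the query and 0.0 otherwise, in candidate order.
func overlapJudge(query string, cands []Candidate) []Verdict {
	q := tokens(query)
	verdicts := make([]Verdict, 0, len(cands))
	for _, c := range cands {
		v := Verdict{NodeID: c.NodeID, Reason: "no shared token"}
		for t := range tokens(c.Text) {
			if q[t] {
				v.Score = 1.0
				v.Reason = "shared token"
				break
			}
		}
		verdicts = append(verdicts, v)
	}
	return verdicts
}

func main() {
	vs := overlapJudge("redis cache eviction", []Candidate{
		{NodeID: "n1", Text: "We tuned Redis eviction."},
		{NodeID: "n2", Text: "Lunch is at noon."},
	})
	for _, v := range vs {
		fmt.Println(v.NodeID, v.Score, v.Reason)
	}
}
```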

func NewFakeJudge

func NewFakeJudge() *FakeJudge

NewFakeJudge returns the deterministic test Judge.

func (FakeJudge) Judge

func (FakeJudge) Judge(_ context.Context, q LabeledQuery, cands []Candidate) ([]Verdict, error)

Judge satisfies Judge.

func (FakeJudge) Version

func (FakeJudge) Version() string

Version satisfies Judge.

type GenerateRequest

type GenerateRequest struct {
	Categories []Category
	TargetN    int // per category
}

GenerateRequest configures one Generator.Generate call.

type Generator

type Generator interface {
	// Version is the generator-version field used as part of the
	// dedup key. Bump when the prompting strategy changes.
	Version() string
	// Generate returns up to req.TargetN queries per category in
	// req.Categories, drawn from `turns`.
	Generate(ctx context.Context, turns []corpus.StoredTurn, req GenerateRequest) ([]LabeledQuery, error)
}

Generator turns a corpus into a labeled query set. Implementations must be deterministic given the same (corpus, request) pair — the dedup key in the queries store assumes identical re-runs produce identical outputs.

type Judge

type Judge interface {
	Version() string
	Judge(ctx context.Context, q LabeledQuery, candidates []Candidate) ([]Verdict, error)
}

Judge scores each (query, candidate) pair for relevance and returns one Verdict per candidate. It is used to expand gold labels after a run by asking an LLM "did the candidate actually answer this query?" Returned scores are in [0, 1].
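One way gold labels could be expanded from a Judge's verdicts is sketched below: every candidate whose score meets a threshold contributes its turn UUID to the gold set. The function expandGold and the threshold parameter are assumptions for illustration; the package documentation does not describe how verdicts are folded back into labels.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate and Verdict mirror the documented struct shapes.
type Candidate struct{ NodeID, TurnUUID, Text string }

type Verdict struct {
	NodeID string
	Score  float64
	Reason string
}

// expandGold returns the sorted union of gold and the turn UUIDs of all
// candidates whose verdict score is at least threshold. Verdicts that refer
// to unknown nodes, or to candidates without a turn UUID, are ignored.
func expandGold(gold []string, cands []Candidate, verdicts []Verdict, threshold float64) []string {
	turnByNode := make(map[string]string, len(cands))
	for _, c := range cands {
		turnByNode[c.NodeID] = c.TurnUUID
	}
	seen := make(map[string]bool, len(gold))
	out := make([]string, 0, len(gold))
	for _, g := range gold {
		if !seen[g] {
			seen[g] = true
			out = append(out, g)
		}
	}
	for _, v := range verdicts {
		turn, ok := turnByNode[v.NodeID]
		if !ok || turn == "" || v.Score < threshold || seen[turn] {
			continue
		}
		seen[turn] = true
		out = append(out, turn)
	}
	sort.Strings(out) // sorted so repeated runs give identical output
	return out
}

func main() {
	cands := []Candidate{
		{NodeID: "n1", TurnUUID: "t1"},
		{NodeID: "n2", TurnUUID: "t2"},
		{NodeID: "n3", TurnUUID: "t3"},
	}
	verdicts := []Verdict{
		{NodeID: "n1", Score: 1.0},
		{NodeID: "n2", Score: 0.9},
		{NodeID: "n3", Score: 0.2},
	}
	fmt.Println(expandGold([]string{"t1"}, cands, verdicts, 0.5)) // prints [t1 t2]
}
```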

type LabeledQuery

type LabeledQuery struct {
	ID            string
	Category      Category
	Text          string
	GoldTurnUUIDs []string
	Notes         string
	GeneratedAt   time.Time
}

LabeledQuery is the unit a Generator produces and a query battery consumes. GoldTurnUUIDs are the corpus turn UUIDs whose chunks should be considered "correct hits"; metrics map them to memmy node IDs at scoring time via the source turn UUID stored in node text.
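As a sketch of how gold turn UUIDs might feed a metric, the example below computes recall at k over a ranked candidate list. It assumes each returned candidate already carries its source turn UUID (as the Candidate struct does); extracting that UUID from node text is not shown. The function recallAtK is hypothetical and is not part of this package.

```go
package main

import "fmt"

// Candidate mirrors the documented struct shape.
type Candidate struct{ NodeID, TurnUUID, Text string }

// recallAtK returns the fraction of distinct gold turn UUIDs that appear
// among the first k ranked candidates. It returns 0 when gold is empty.
func recallAtK(gold []string, ranked []Candidate, k int) float64 {
	if len(gold) == 0 {
		return 0
	}
	goldSet := make(map[string]bool, len(gold))
	for _, g := range gold {
		goldSet[g] = true
	}
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	found := map[string]bool{}
	for _, c := range ranked[:k] {
		if goldSet[c.TurnUUID] {
			found[c.TurnUUID] = true // several chunks of one turn count once
		}
	}
	return float64(len(found)) / float64(len(goldSet))
}

func main() {
	ranked := []Candidate{
		{NodeID: "n3", TurnUUID: "t3"},
		{NodeID: "n1", TurnUUID: "t1"},
		{NodeID: "n2", TurnUUID: "t2"},
	}
	gold := []string{"t1", "t2"}
	fmt.Println(recallAtK(gold, ranked, 2)) // prints 0.5
	fmt.Println(recallAtK(gold, ranked, 3)) // prints 1
}
```

Counting by turn UUID rather than node ID means a turn that was split into several chunks is credited once, which matches the way gold labels are expressed.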

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store is the per-dataset queries.sqlite handle.

func OpenStore

func OpenStore(path string) (*Store, error)

OpenStore opens or creates the queries database at path.

func (*Store) All

func (s *Store) All(ctx context.Context) ([]LabeledQuery, error)

All returns every labeled query in storage order (by category then ID).

func (*Store) ByCategory

func (s *Store) ByCategory(ctx context.Context, c Category) ([]LabeledQuery, error)

ByCategory returns labeled queries filtered to one category.

func (*Store) Close

func (s *Store) Close() error

Close releases the handle.

func (*Store) Count

func (s *Store) Count(ctx context.Context) (int, error)

Count returns the total number of queries.

func (*Store) CountForGeneration

func (s *Store) CountForGeneration(ctx context.Context, generatorVersion, corpusSnapshotHash string, category Category) (int, error)

CountForGeneration returns the number of queries already stored that match the (generator_version, corpus_snapshot_hash, category) tuple. Used by the queries subcommand to decide whether to re-run the generator at all.

func (*Store) Embedding

func (s *Store) Embedding(ctx context.Context, queryID string, dim int) ([]float32, bool, error)

Embedding returns the stored embedding for queryID, or a nil slice and false when none is stored.

func (*Store) Put

func (s *Store) Put(ctx context.Context, q LabeledQuery, generatorVersion, corpusSnapshotHash string) error

Put inserts a labeled query. It is idempotent: putting an ID that already exists is a no-op. The original generated_at and embedding are preserved, so re-running the generator after new data appears does not discard the cached vector.

func (*Store) PutEmbedding

func (s *Store) PutEmbedding(ctx context.Context, queryID string, vec []float32) error

PutEmbedding stores the query embedding. Idempotent.
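The storage format for embeddings is not documented. As a minimal sketch, assuming vectors are stored as little-endian float32 bytes, the dim argument of Embedding could serve as a length check on decode. The functions encodeVec and decodeVec are hypothetical and are not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVec packs vec into 4 bytes per element, little-endian.
// The byte layout is an assumption made for this sketch.
func encodeVec(vec []float32) []byte {
	buf := make([]byte, 4*len(vec))
	for i, f := range vec {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVec unpacks blob into a vector of exactly dim elements and returns
// an error when the blob length does not match the expected dimension.
func decodeVec(blob []byte, dim int) ([]float32, error) {
	if dim < 0 || len(blob) != 4*dim {
		return nil, fmt.Errorf("embedding has %d bytes, want %d for dim %d", len(blob), 4*dim, dim)
	}
	vec := make([]float32, dim)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(blob[4*i:]))
	}
	return vec, nil
}

func main() {
	blob := encodeVec([]float32{0.5, -1.25, 3})
	vec, err := decodeVec(blob, 3)
	fmt.Println(vec, err) // prints [0.5 -1.25 3] <nil>

	_, err = decodeVec(blob, 4)
	fmt.Println(err != nil) // prints true: dimension mismatch is rejected
}
```

Checking the dimension on read catches the case where the embedding model changed between runs and a cached vector no longer matches the current index.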

type Verdict

type Verdict struct {
	NodeID string
	Score  float64 // 0..1, higher = more relevant
	Reason string
}

Verdict is the Judge's assessment of one candidate.
