Documentation ¶
Overview ¶
Package embed provides a local semantic-search battery for GoFastr.
The package is built around a single per-app Index that stores [Chunk]s with vector embeddings and serves [Query]s via brute-force cosine similarity, with optional hybrid keyword fusion, metadata filtering, MMR diversity, and a pluggable rerank hook.
Components are intentionally separated so users can swap parts:
- Embedder turns text into vectors. The default is an ONNX-backed all-MiniLM-L6-v2 (added in M1.5); a deterministic stub (NewStubEmbedder) ships for tests and offline development.
- Chunker splits a Document into [Chunk]s. The default FixedWindow is language-agnostic and tokenizer-free.
- Store holds vectors and metadata. The default FlatStore keeps everything in memory; disk persistence is added in M2.
See battery/embed/README.md for the architecture, retrieval pipeline, and milestone plan.
Index ¶
- func Handler(idx Index) http.Handler
- type Chunk
- type Chunker
- type Document
- type Embedder
- type Filter
- type FixedWindow
- type FlatStore
- func (s *FlatStore) Add(_ context.Context, chunks []Chunk) error
- func (s *FlatStore) AllChunks() []Chunk
- func (s *FlatStore) Candidates(_ context.Context, qv []float32, filter Filter, top int) ([]Hit, error)
- func (s *FlatStore) ChunkByID(id string) (Chunk, bool)
- func (s *FlatStore) ChunkIDsForDoc(docID string) []string
- func (s *FlatStore) LoadSnapshot(path string) error
- func (s *FlatStore) RemoveDoc(_ context.Context, docID string) error
- func (s *FlatStore) Snapshot(path string) error
- func (s *FlatStore) Stats() Stats
- type Hit
- type Index
- type KeywordBackend
- type KeywordHit
- type LangAware
- type MemoryKeyword
- type ModelMismatchError
- type OllamaConfig
- type OllamaEmbedder
- type Options
- type Plugin
- type Query
- type Reranker
- type Stats
- type Store
- type StubEmbedder
- type WatchOptions
- type Watcher
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Handler ¶
Handler returns an http.Handler that exposes the index over HTTP. The handler is framework-agnostic; it can be mounted under any prefix on any router or http.ServeMux. Plugin mounts it under "/embed" on a GoFastr framework.App.
Security contract:
- Every route requires a non-empty Authorization header. The handler does not validate the credential itself — that is the caller's job when the handler is mounted behind an auth middleware. Rejecting anonymous traffic at the handler is a defense-in-depth measure so an unprotected mount cannot accidentally expose the index.
- POST routes require Content-Type: application/json (415 otherwise).
- Request bodies are capped at 1 MiB (413 otherwise).
- Upstream / driver errors are NEVER echoed back to the client; the handler returns a generic "internal error" string instead.
Routes:
- POST /index body: {"documents": [...]} → {"added": N}
- POST /query body: Query → {"hits": [...]}
- GET /stats → Stats
- DELETE /doc/{id} (or query param ?id=) → 204
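The security contract above can be sketched as a small wrapper around any http.Handler. This is an illustration only, not the package's Handler: guard, statusFor and oversizedBody are hypothetical names, and the 401 status for a missing Authorization header is an assumption, since the contract names only the 415 and 413 cases.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"mime"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxBody = 1 << 20 // 1 MiB cap from the contract

// guard is a hypothetical wrapper that applies the documented checks
// before handing the request to next.
func guard(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			// Assumption: 401. The contract does not name the status code.
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		if r.Method == http.MethodPost {
			mt, _, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
			if err != nil || mt != "application/json" {
				http.Error(w, "unsupported media type", http.StatusUnsupportedMediaType)
				return
			}
			body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, maxBody))
			var tooBig *http.MaxBytesError
			if errors.As(err, &tooBig) {
				http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
				return
			}
			if err != nil {
				// Never echo the underlying error to the client.
				http.Error(w, "internal error", http.StatusInternalServerError)
				return
			}
			r.Body = io.NopCloser(bytes.NewReader(body))
		}
		next.ServeHTTP(w, r)
	})
}

// oversizedBody returns a body one byte over the cap.
func oversizedBody() string { return strings.Repeat("x", maxBody+1) }

// statusFor sends one request through guard and reports the status code.
func statusFor(method, auth, contentType, body string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(method, "/query", strings.NewReader(body))
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	if contentType != "" {
		req.Header.Set("Content-Type", contentType)
	}
	rec := httptest.NewRecorder()
	guard(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("POST", "", "application/json", "{}"))
	fmt.Println(statusFor("POST", "Bearer t", "text/plain", "{}"))
	fmt.Println(statusFor("POST", "Bearer t", "application/json", oversizedBody()))
	fmt.Println(statusFor("POST", "Bearer t", "application/json", "{}"))
}
```

Reading the capped body up front keeps the 413 decision in one place; a real handler could equally decode straight from the http.MaxBytesReader and check the decode error.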
Types ¶
type Chunk ¶
type Chunk struct {
	ID       string         `json:"id"`
	DocID    string         `json:"doc_id"`
	Source   string         `json:"source,omitempty"`
	Text     string         `json:"text"`
	Offset   [2]int         `json:"offset"`
	Metadata map[string]any `json:"metadata,omitempty"`
	Vec      []float32      `json:"-"`
}
Chunk is a stored, embedded slice of a Document. Vec is owned by the store and is L2-normalized so cosine similarity reduces to a dot product.
type Chunker ¶
Chunker splits a Document into [Chunk]s. Implementations MUST produce chunks with stable IDs derived from the document and chunk offset so re-indexing the same content does not duplicate entries.
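One way to satisfy the stable-ID requirement is to hash the document ID together with the chunk offsets. The sketch below is an assumption about how such IDs could be derived, not the scheme the built-in chunkers use; chunkID is a hypothetical helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkID derives a deterministic ID from the document ID and the
// chunk's [start, end) offset. Hypothetical helper for illustration.
func chunkID(docID string, start, end int) string {
	// The NUL separators keep ("ab", 1, ...) distinct from ("a", 11, ...)
	// style collisions in the hashed string.
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s\x00%d\x00%d", docID, start, end)))
	return hex.EncodeToString(sum[:8]) // 16 hex characters
}

func main() {
	fmt.Println(chunkID("readme", 0, 512))
	fmt.Println(chunkID("readme", 0, 512) == chunkID("readme", 0, 512)) // same input, same ID
	fmt.Println(chunkID("readme", 448, 960))
}
```

Because the ID depends only on the document ID and offsets, re-indexing unchanged content produces the same IDs, which a store can use to overwrite entries instead of appending duplicates.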
type Document ¶
type Document struct {
	ID       string         `json:"id"`
	Source   string         `json:"source,omitempty"`
	Text     string         `json:"text"`
	Metadata map[string]any `json:"metadata,omitempty"`
}
Document is a unit of input to the index. A single Document is split by a Chunker into one or more [Chunk]s that are individually embedded and stored.
type Embedder ¶
type Embedder interface {
	Embed(ctx context.Context, texts []string) ([][]float32, error)
	Dim() int
	Name() string
}
Embedder turns text into fixed-dimension vectors. Implementations MUST produce L2-normalized output so the store can rely on dot product as cosine similarity.
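The normalization requirement is what lets a dot product stand in for cosine similarity. A minimal sketch of both operations follows; l2Normalize, dot and approx are hypothetical helper names, not part of the package.

```go
package main

import (
	"fmt"
	"math"
)

// l2Normalize scales v in place to unit length. A zero vector is left as is.
func l2Normalize(v []float32) {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	if sum == 0 {
		return
	}
	inv := float32(1 / math.Sqrt(sum))
	for i := range v {
		v[i] *= inv
	}
}

// dot returns the dot product of a and b, which are assumed to have
// the same length. For unit vectors this equals their cosine similarity.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// approx reports whether got is within a small tolerance of want.
func approx(got, want float32) bool {
	return math.Abs(float64(got-want)) < 1e-6
}

func main() {
	a := []float32{3, 4}
	b := []float32{4, 3}
	l2Normalize(a)
	l2Normalize(b)
	// cos((3,4),(4,3)) = 24/25
	fmt.Println(a, b, dot(a, b))
}
```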
type Filter ¶
type Filter struct {
	Source    string         `json:"source,omitempty"`
	Kind      string         `json:"kind,omitempty"`
	MetaMatch map[string]any `json:"meta_match,omitempty"`
}
Filter restricts the candidate set before scoring. Empty fields are ignored. MetaMatch entries are checked for exact equality against the chunk's Metadata.
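The matching rule can be sketched as a predicate over a chunk. The local chunk and filter types below are trimmed stand-ins for the real ones, and comparing Kind against Metadata["kind"] is an assumption; the text above does not say which chunk field Kind is checked against.

```go
package main

import (
	"fmt"
	"reflect"
)

// chunk and filter are trimmed, hypothetical stand-ins for Chunk and Filter.
type chunk struct {
	Source   string
	Metadata map[string]any
}

type filter struct {
	Source    string
	Kind      string
	MetaMatch map[string]any
}

// matches reports whether c passes f. Empty filter fields are ignored.
func matches(f filter, c chunk) bool {
	if f.Source != "" && c.Source != f.Source {
		return false
	}
	if f.Kind != "" {
		// Assumption: Kind is compared with Metadata["kind"].
		if k, _ := c.Metadata["kind"].(string); k != f.Kind {
			return false
		}
	}
	for key, want := range f.MetaMatch {
		got, ok := c.Metadata[key]
		if !ok || !reflect.DeepEqual(got, want) {
			return false
		}
	}
	return true
}

func main() {
	c := chunk{Source: "docs/a.md", Metadata: map[string]any{"kind": "doc", "lang": "en"}}
	fmt.Println(matches(filter{}, c))
	fmt.Println(matches(filter{Kind: "code"}, c))
	fmt.Println(matches(filter{MetaMatch: map[string]any{"lang": "en"}}, c))
}
```

Exact equality is type-sensitive: a filter decoded from JSON carries numbers as float64, so in a scheme like this one it would not match metadata stored as int.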
type FixedWindow ¶
type FixedWindow struct {
	Size    int // window size, in runes
	Overlap int // overlap between consecutive windows, in runes
}
FixedWindow is a language-agnostic chunker that splits text into fixed-size rune windows with overlap. It does not look at token boundaries — that's the embedder's concern — so chunks are always reproducible from byte offsets alone.
func NewFixedWindow ¶
func NewFixedWindow(size, overlap int) *FixedWindow
NewFixedWindow validates size/overlap and returns a chunker. Overlap is clamped to [0, size-1].
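The windowing scheme can be sketched as a loop that advances by size minus overlap runes. This is a minimal illustration, not FixedWindow's implementation: window and fixedWindows are hypothetical names, offsets are expressed in runes here, and returning nil for a non-positive size is an assumption.

```go
package main

import "fmt"

// window is one chunk of text with its [Start, End) offset in runes.
type window struct {
	Start, End int
	Text       string
}

// fixedWindows splits text into rune windows of the given size, with
// consecutive windows sharing overlap runes. Overlap is clamped to
// [0, size-1] so the loop always advances.
func fixedWindows(text string, size, overlap int) []window {
	if size <= 0 {
		return nil // assumption: invalid size yields no chunks
	}
	if overlap < 0 {
		overlap = 0
	}
	if overlap > size-1 {
		overlap = size - 1
	}
	runes := []rune(text)
	step := size - overlap
	var out []window
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		out = append(out, window{Start: start, End: end, Text: string(runes[start:end])})
		if end == len(runes) {
			break // the last window already reaches the end of the text
		}
	}
	return out
}

func main() {
	for _, w := range fixedWindows("abcdefghij", 4, 1) {
		fmt.Printf("%d-%d %q\n", w.Start, w.End, w.Text)
	}
}
```

Working on runes rather than bytes means a multi-byte character is never split across two chunks.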
type FlatStore ¶
type FlatStore struct {
	// contains filtered or unexported fields
}
FlatStore is the default in-memory Store: a slice of chunks scored by brute-force cosine similarity. It targets up to ~100k chunks; for 384-dim float32 vectors that's roughly 150 MB of vector data (100,000 × 384 × 4 bytes ≈ 154 MB) plus chunk text, with query latency under ~30ms on a modern CPU.
func NewFlatStore ¶
NewFlatStore returns an empty store sized for vectors of dimension dim produced by the named model. dim and model are recorded so future persistence (M2) can refuse to load a snapshot embedded with a different model.
func (*FlatStore) Add ¶
Add appends chunks. Vectors are L2-normalized in place so query-time scoring is a plain dot product. When the store was constructed with dim=0 (embedder dim unknown at boot time), the first non-empty vec fixes the dimension for the lifetime of the store.
func (*FlatStore) AllChunks ¶
AllChunks returns a copy of every stored chunk. Part of chunkLister; used to rebuild the keyword index after a snapshot load (snapshots don't persist keyword state).
func (*FlatStore) Candidates ¶
func (s *FlatStore) Candidates(_ context.Context, qv []float32, filter Filter, top int) ([]Hit, error)
Candidates returns up to top chunks by cosine similarity to qv, applying filter to the candidate set first.
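Brute-force candidate generation amounts to filtering, scoring every remaining vector with a dot product, and keeping the best few. The sketch below shows that shape under the assumption that all vectors are already L2-normalized; hit, topK and the keep predicate are hypothetical and simpler than the real Hit and Filter types.

```go
package main

import (
	"fmt"
	"sort"
)

// hit pairs a chunk ID with its similarity score.
type hit struct {
	ID    string
	Score float32
}

// topK scores every vector that passes keep (nil keeps all) against
// query and returns the k best, highest score first. Vectors whose
// length differs from the query are skipped.
func topK(query []float32, ids []string, vecs [][]float32, keep func(i int) bool, k int) []hit {
	hits := make([]hit, 0, len(vecs))
	for i, v := range vecs {
		if len(v) != len(query) || (keep != nil && !keep(i)) {
			continue
		}
		var s float32
		for j := range v {
			s += v[j] * query[j]
		}
		hits = append(hits, hit{ID: ids[i], Score: s})
	}
	// Stable sort keeps insertion order among equal scores.
	sort.SliceStable(hits, func(a, b int) bool { return hits[a].Score > hits[b].Score })
	if k < 0 {
		k = 0
	}
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	ids := []string{"a", "b", "c"}
	vecs := [][]float32{{1, 0}, {0, 1}, {0.6, 0.8}}
	fmt.Println(topK([]float32{1, 0}, ids, vecs, nil, 2))
}
```

Applying the filter before scoring means excluded chunks cost no arithmetic, which matters when a filter is selective.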
func (*FlatStore) ChunkByID ¶
ChunkByID returns the chunk with the given ID. Part of chunkLister; used to hydrate keyword-only hits (which carry just a chunk ID) into full Hits.
func (*FlatStore) ChunkIDsForDoc ¶
ChunkIDsForDoc returns the IDs of every chunk belonging to docID. Part of the chunkLister capability the hybrid/keyword index uses to purge a doc's keyword entries when the doc is removed.
func (*FlatStore) LoadSnapshot ¶
LoadSnapshot replaces the store's contents with the snapshot at path. If the snapshot was produced by a different embedder model or dimension, a [ModelMismatchError] is returned and the store is left untouched.
func (*FlatStore) RemoveDoc ¶
RemoveDoc deletes every chunk belonging to docID. It performs a stable, in-place compaction so retained chunks keep their relative order — useful when persistence writes the underlying slice in insertion order.
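Stable in-place compaction is a common Go slice idiom: reuse the backing array and append only the elements to keep. A minimal sketch, with a trimmed hypothetical chunk type:

```go
package main

import "fmt"

// chunk is a trimmed, hypothetical stand-in for Chunk.
type chunk struct {
	ID, DocID string
}

// removeDoc drops every chunk belonging to docID, preserving the
// relative order of the rest and reusing the backing array.
func removeDoc(chunks []chunk, docID string) []chunk {
	kept := chunks[:0]
	for _, c := range chunks {
		if c.DocID != docID {
			kept = append(kept, c)
		}
	}
	// Zero the tail so removed chunks do not stay reachable.
	for i := len(kept); i < len(chunks); i++ {
		chunks[i] = chunk{}
	}
	return kept
}

func main() {
	chunks := []chunk{{"1", "a"}, {"2", "b"}, {"3", "a"}, {"4", "c"}}
	fmt.Println(removeDoc(chunks, "a"))
}
```

On Go 1.21 and later, slices.DeleteFunc from the standard library performs the same stable, in-place removal.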
type Hit ¶
type Hit struct {
	Chunk  Chunk   `json:"chunk"`
	Score  float64 `json:"score"`
	Reason string  `json:"reason,omitempty"`
}
Hit is a single retrieval result. Reason describes which stage of the pipeline produced the score ("vec", "kw", "hybrid", "mmr", "rerank") so callers can debug ranking behaviour.
type Index ¶
type Index interface {
	Add(ctx context.Context, docs ...Document) error
	Remove(ctx context.Context, docIDs ...string) error
	Query(ctx context.Context, q Query) ([]Hit, error)
	// Snapshot persists the current state to the path configured via
	// [Options.Path]. It is a no-op if no path was configured.
	Snapshot() error
	Stats() Stats
	Close() error
}
Index is the public handle returned by Open. It composes a Chunker, Embedder and Store into the read/write API GoFastr apps consume.
type KeywordBackend ¶
type KeywordBackend interface {
	Index(ctx context.Context, id, text string) error
	Delete(ctx context.Context, id string) error
	// Search returns up to top results as (chunkID, score) pairs.
	Search(ctx context.Context, text string, top int) ([]KeywordHit, error)
}
KeywordBackend is the subset of the battery/search backend the embed package needs for hybrid retrieval. It is defined here so the embed package does not import battery/search in its core path; the real wiring happens in hybrid_search.go via an adapter.
func WrapSearchBackend ¶
func WrapSearchBackend(b search.Backend) KeywordBackend
WrapSearchBackend adapts a battery/search.Backend to the KeywordBackend interface this package expects. It is intentionally thin: every Index/Delete call writes through, and Search maps Documents back to chunk IDs.
type KeywordHit ¶
KeywordHit is one result from a KeywordBackend.
type LangAware ¶
type LangAware struct {
	// MaxRunes caps a single produced chunk in runes. Chunks above this
	// are re-chunked with FixedWindow. Defaults to 1024.
	MaxRunes int
	// Fallback is used for unknown kinds and for over-large structural
	// chunks. Defaults to NewFixedWindow(512, 64).
	Fallback Chunker
}
LangAware is a structure-aware chunker that splits Go source on top-level declarations and Markdown on headings, falling back to a FixedWindow for anything it can't parse or for chunks that exceed MaxRunes.
Routing key is Document.Metadata["kind"] when present; otherwise the Source extension drives the choice (.go → go, .md/.markdown → md).
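The routing rule can be sketched as a small function: metadata wins, then the file extension decides. kindFor is a hypothetical helper, and lower-casing the extension is an assumption; the text above does not say whether matching is case-sensitive.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// kindFor returns the routing key for a document: Metadata["kind"] when
// it is a non-empty string, otherwise a kind derived from the Source
// extension. An empty result means "unknown, use the fallback chunker".
func kindFor(source string, meta map[string]any) string {
	if k, ok := meta["kind"].(string); ok && k != "" {
		return k
	}
	// Assumption: extensions are matched case-insensitively.
	switch strings.ToLower(filepath.Ext(source)) {
	case ".go":
		return "go"
	case ".md", ".markdown":
		return "md"
	}
	return ""
}

func main() {
	fmt.Println(kindFor("cmd/app/main.go", nil))
	fmt.Println(kindFor("README.markdown", nil))
	fmt.Println(kindFor("notes.txt", map[string]any{"kind": "md"}))
	fmt.Println(kindFor("notes.txt", nil) == "")
}
```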
func NewLangAware ¶
func NewLangAware() *LangAware
NewLangAware returns a LangAware chunker with sensible defaults.
type MemoryKeyword ¶
type MemoryKeyword struct {
	// contains filtered or unexported fields
}
MemoryKeyword is an allocation-cheap, dependency-free KeywordBackend used as the default for [Options.Keyword] when the caller wants hybrid retrieval without wiring battery/search.
It scores documents using a simplified BM25-flavoured formula: term frequency in the document, normalised by document length and scaled by inverse document frequency. This is not BM25 (there is no term-frequency saturation parameter), but it is the right shape for fusing with vector scores via RRF, where only the rank order matters.
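Reciprocal rank fusion (RRF) combines ranked lists using positions only, which is why the keyword scores need no calibration against vector scores. The sketch below is a generic RRF, not this package's fusion code; rrf and fused are hypothetical names, and k = 60 is the constant commonly used in the literature rather than a value taken from this package.

```go
package main

import (
	"fmt"
	"sort"
)

// fused is one chunk ID with its combined RRF score.
type fused struct {
	ID    string
	Score float64
}

// rrf merges ranked ID lists. Each list contributes 1/(k+rank) per ID,
// with rank starting at 1; IDs missing from a list contribute nothing.
func rrf(k float64, rankings ...[]string) []fused {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	out := make([]fused, 0, len(scores))
	for id, s := range scores {
		out = append(out, fused{ID: id, Score: s})
	}
	// Highest score first; ties broken by ID so output is deterministic.
	sort.Slice(out, func(a, b int) bool {
		if out[a].Score != out[b].Score {
			return out[a].Score > out[b].Score
		}
		return out[a].ID < out[b].ID
	})
	return out
}

func main() {
	vec := []string{"a", "b", "c"} // vector ranking
	kw := []string{"b", "d"}       // keyword ranking
	for _, f := range rrf(60, vec, kw) {
		fmt.Printf("%s %.5f\n", f.ID, f.Score)
	}
}
```

In this example "b" comes first because it appears in both lists, even though it is not the top vector hit.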
func NewMemoryKeyword ¶
func NewMemoryKeyword() *MemoryKeyword
NewMemoryKeyword returns an empty in-memory keyword backend.
func (*MemoryKeyword) Delete ¶
func (m *MemoryKeyword) Delete(_ context.Context, id string) error
Delete implements KeywordBackend.
func (*MemoryKeyword) Index ¶
func (m *MemoryKeyword) Index(_ context.Context, id, text string) error
Index implements KeywordBackend. Re-indexing the same id replaces the prior entry so totals stay consistent.
func (*MemoryKeyword) Search ¶
func (m *MemoryKeyword) Search(_ context.Context, text string, top int) ([]KeywordHit, error)
Search implements KeywordBackend. Documents with zero matching terms are not returned.
type ModelMismatchError ¶
ModelMismatchError is returned when a persisted snapshot was produced by a different embedder than the one configured on the store. Mixing vectors from different models is silently catastrophic for retrieval quality, so the load is refused loudly instead.
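Callers can separate a model mismatch from other load failures with errors.As. The sketch below uses a local stand-in type because the real struct's fields are not shown here; modelMismatchError, its field names, checkSnapshot and isMismatch are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// modelMismatchError is a hypothetical stand-in for ModelMismatchError.
type modelMismatchError struct {
	StoreModel, SnapshotModel string
	StoreDim, SnapshotDim     int
}

func (e *modelMismatchError) Error() string {
	return fmt.Sprintf("snapshot built with %s (dim %d), store configured for %s (dim %d)",
		e.SnapshotModel, e.SnapshotDim, e.StoreModel, e.StoreDim)
}

// checkSnapshot refuses a snapshot whose model name or dimension
// differs from the store's.
func checkSnapshot(storeModel string, storeDim int, snapModel string, snapDim int) error {
	if storeModel != snapModel || storeDim != snapDim {
		return &modelMismatchError{
			StoreModel: storeModel, SnapshotModel: snapModel,
			StoreDim: storeDim, SnapshotDim: snapDim,
		}
	}
	return nil
}

// isMismatch reports whether err is, or wraps, a model mismatch.
func isMismatch(err error) bool {
	var mm *modelMismatchError
	return errors.As(err, &mm)
}

func main() {
	err := checkSnapshot("stub", 128, "ollama:nomic-embed-text", 768)
	wrapped := fmt.Errorf("load snapshot: %w", err)
	fmt.Println(isMismatch(wrapped), wrapped)
}
```

On a mismatch the usual remedy is to re-embed the corpus with the configured model rather than to force the load.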
func (*ModelMismatchError) Error ¶
func (e *ModelMismatchError) Error() string
type OllamaConfig ¶
type OllamaConfig struct {
	// BaseURL is the Ollama server root. Defaults to
	// "http://localhost:11434". Trailing slashes are trimmed.
	BaseURL string
	// Model is the embedding model name (e.g. "nomic-embed-text",
	// "mxbai-embed-large", "all-minilm"). Defaults to
	// "nomic-embed-text". Whatever you pick must be a model the server
	// has pulled.
	Model string
	// Dim is the embedding dimension. If 0, it is detected on the
	// first call by inspecting the response. Setting it explicitly
	// skips that detection and lets [FlatStore] pre-size.
	Dim int
	// Timeout is the per-request HTTP timeout. Defaults to 30s.
	Timeout time.Duration
	// Client is an optional pre-configured http.Client (for tests or
	// custom transport). When nil, a fresh client with the timeout is
	// constructed.
	Client *http.Client
}
OllamaConfig configures an OllamaEmbedder.
type OllamaEmbedder ¶
type OllamaEmbedder struct {
	// contains filtered or unexported fields
}
OllamaEmbedder is an Embedder that calls a locally running Ollama (or any compatible) server's /api/embed endpoint. Ollama itself, LM Studio, and llama.cpp's server all expose this contract.
This is the recommended swap from StubEmbedder for production use:
	idx, _ := embed.Open(embed.Options{
		Embedder: embed.NewOllamaEmbedder(embed.OllamaConfig{
			Model: "nomic-embed-text",
		}),
	})
Pre-flight: run `ollama serve` (it auto-starts on most installs) and `ollama pull nomic-embed-text`. The default model is 768-dim and is the standard recommendation for local semantic search.
Failure modes:
- Server unreachable → [Embed] returns the underlying network error. Callers should not blindly retry — a missing daemon is a configuration problem, not a transient one.
- Model not pulled → Ollama responds 404 with a clear message; we surface it verbatim so users see "pull <model> first".
- Dim mismatch with the configured FlatStore is detected later at [Store.Add] time via [errVecDim].
func NewOllamaEmbedder ¶
func NewOllamaEmbedder(cfg OllamaConfig) *OllamaEmbedder
NewOllamaEmbedder constructs an Ollama-backed embedder. It does not make any network calls until [Embed] is invoked — failures during boot are deferred to first use so misconfigurations can be retried without restarting the app.
func (*OllamaEmbedder) Dim ¶
func (e *OllamaEmbedder) Dim() int
Dim returns the embedding dimension. If the dimension was not set in the config and no embedding has been produced yet, Dim returns 0; set Dim in OllamaConfig when the store is initialised separately from the embedder.
func (*OllamaEmbedder) Embed ¶
Embed calls /api/embed with the entire batch. Ollama performs the batching server-side, so we hand the slice over whole instead of looping client-side.
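A whole-batch call to /api/embed can be sketched with the standard library alone. The request and response shapes below follow Ollama's embed API ("model" and "input" in, "embeddings" out); embedBatch and demo are hypothetical names, not OllamaEmbedder's internals, and demo talks to an in-process fake server so the sketch runs without Ollama installed.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

type embedRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// embedBatch sends the whole batch in one POST to /api/embed.
func embedBatch(ctx context.Context, c *http.Client, baseURL, model string, texts []string) ([][]float32, error) {
	payload, err := json.Marshal(embedRequest{Model: model, Input: texts})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/api/embed"
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return nil, err // e.g. the daemon is not running
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		// Surface the server's message, such as a "model not found" 404.
		msg, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
		return nil, fmt.Errorf("ollama: %s: %s", resp.Status, strings.TrimSpace(string(msg)))
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) != len(texts) {
		return nil, fmt.Errorf("ollama: got %d embeddings for %d inputs", len(out.Embeddings), len(texts))
	}
	return out.Embeddings, nil
}

// demo runs embedBatch against a fake server that returns one 3-dim
// vector per input. It reports the batch size and vector dimension.
func demo() (int, int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var in embedRequest
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		out := embedResponse{Embeddings: make([][]float32, len(in.Input))}
		for i := range out.Embeddings {
			out.Embeddings[i] = []float32{1, 0, 0}
		}
		json.NewEncoder(w).Encode(out)
	}))
	defer srv.Close()
	vecs, err := embedBatch(context.Background(), srv.Client(), srv.URL, "nomic-embed-text", []string{"a", "b"})
	if err != nil {
		return 0, 0, err
	}
	return len(vecs), len(vecs[0]), nil
}

func main() {
	n, dim, err := demo()
	fmt.Println(n, dim, err)
}
```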
func (*OllamaEmbedder) Name ¶
func (e *OllamaEmbedder) Name() string
Name returns "ollama:<model>" — the snapshot fingerprint baked into persisted indexes uses this string, so changing models triggers a loud ModelMismatchError on reload.
type Options ¶
type Options struct {
	// Embedder produces vectors. Required.
	Embedder Embedder
	// Chunker splits documents. Defaults to a [FixedWindow] with 512-rune
	// windows and 64-rune overlap if nil.
	Chunker Chunker
	// Store holds vectors. Defaults to an in-memory [FlatStore] if nil.
	//
	// A custom Store must additionally implement Snapshot(path)/LoadSnapshot(path)
	// to be used with Path (persistence), and ChunkIDsForDoc(docID)/ChunkByID(id)/
	// AllChunks() to be used with Keyword (hybrid search). Open returns an error
	// rather than silently degrading if a capability is missing. FlatStore
	// implements all of them.
	Store Store
	// Keyword enables hybrid retrieval. When set, chunk text is
	// mirrored into this backend at Add time and consulted during
	// Query when [Query.Hybrid] is true. When nil, hybrid queries
	// silently fall back to pure vector retrieval. Use
	// [NewMemoryKeyword] for a zero-dep in-process backend or wrap any
	// [battery/search].Backend with [WrapSearchBackend].
	Keyword KeywordBackend
	// Reranker is the optional second-stage scorer. When nil,
	// [Query.Rerank] requests error out so silent quality loss is
	// impossible.
	Reranker Reranker
	// Path is the directory where the snapshot and WAL live. When
	// empty, the index runs purely in memory and [Index.Snapshot] is a
	// no-op. When set, [Open] loads any existing snapshot, replays the
	// WAL on top, and persists subsequent writes through the WAL.
	Path string
	// SnapshotEvery is the number of mutating ops after which the index
	// automatically flushes a full snapshot and truncates the WAL. 0
	// disables auto-snapshot; -1 snapshots after every op (mostly for
	// tests). The default is 1000.
	SnapshotEvery int
}
Options configures Open.
type Plugin ¶
type Plugin struct {
	// contains filtered or unexported fields
}
Plugin is the framework.Plugin adapter for the embed battery. It owns no state of its own — callers construct an Index and hand it to NewPlugin, which then registers /embed/* routes on the app's router during framework.App.Init.
Mount via:
	idx, _ := embed.Open(embed.Options{Embedder: e, Path: "~/.gofastr/embed/myapp"})
	app.Plugins.Register(embed.NewPlugin(idx))
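Go's standard file APIs do not expand a leading "~", and nothing here states whether Open does so itself. If it does not, the path can be resolved before it is passed in. A minimal sketch, where expandHome is a hypothetical helper:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~" or "~/" with the current user's
// home directory. Other paths are returned unchanged.
func expandHome(p string) (string, error) {
	if p != "~" && !strings.HasPrefix(p, "~/") {
		return p, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, strings.TrimPrefix(p, "~")), nil
}

func main() {
	p, err := expandHome("~/.gofastr/embed/myapp")
	fmt.Println(p, err)
}
```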
func NewPlugin ¶
NewPlugin returns a Plugin that mounts routes under "/embed". Use Plugin.WithPrefix to change the mount point.
func (*Plugin) Index ¶
Index returns the underlying Index so other plugins or the app can perform direct calls without going through HTTP.
func (*Plugin) Init ¶
Init implements framework.Plugin. Mounts the stdlib Handler under the configured prefix on the app's router; routing semantics match Go 1.22 ServeMux.
func (*Plugin) WithPrefix ¶
WithPrefix overrides the URL prefix. A leading slash is required.
type Query ¶
type Query struct {
	Text      string  `json:"text"`
	K         int     `json:"k,omitempty"`
	Filter    Filter  `json:"filter,omitempty"`
	Hybrid    bool    `json:"hybrid,omitempty"`
	MMRLambda float64 `json:"mmr_lambda,omitempty"`
	Rerank    bool    `json:"rerank,omitempty"`
}
Query is the input to [Index.Query].
Zero-value semantics:
- K=0 → 10
- Hybrid=false and MMRLambda=0 → pure vector retrieval, no diversity reranking.
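For a non-zero MMRLambda, maximal marginal relevance (MMR) trades relevance against redundancy. The sketch below is the textbook greedy form, not necessarily this package's implementation: mmr and dot are hypothetical names, vectors are assumed to be L2-normalized, and lambda is taken in (0, 1] because the package treats 0 as "off".

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// mmr greedily picks up to k candidate indices. Each step selects the
// candidate maximizing lambda*relevance - (1-lambda)*maxSim, where
// maxSim is its highest similarity to anything already selected.
func mmr(rel []float64, vecs [][]float32, lambda float64, k int) []int {
	if k <= 0 {
		return nil
	}
	selected := make([]int, 0, k)
	used := make([]bool, len(rel))
	for len(selected) < k && len(selected) < len(rel) {
		best, bestScore := -1, math.Inf(-1)
		for i := range rel {
			if used[i] {
				continue
			}
			maxSim := 0.0 // no penalty while nothing is selected
			for n, j := range selected {
				if s := dot(vecs[i], vecs[j]); n == 0 || s > maxSim {
					maxSim = s
				}
			}
			if score := lambda*rel[i] - (1-lambda)*maxSim; score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		selected = append(selected, best)
	}
	return selected
}

func main() {
	rel := []float64{0.9, 0.85, 0.5}
	// Candidate 1 is a near-duplicate of candidate 0.
	vecs := [][]float32{{1, 0}, {0.99, 0.141}, {0, 1}}
	fmt.Println(mmr(rel, vecs, 0.5, 2))
	fmt.Println(mmr(rel, vecs, 1.0, 2))
}
```

With lambda = 0.5 the near-duplicate is skipped in favour of the less relevant but distinct candidate; with lambda = 1 the selection is plain relevance order.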
type Reranker ¶
Reranker is an optional second-stage scorer applied to candidate hits before they are returned. It receives the original query text (not its vector) so cross-encoder rerankers can score query/chunk pairs jointly. The returned slice may be reordered or truncated; score values should reflect the reranker's confidence.
type Stats ¶
type Stats struct {
	Docs   int    `json:"docs"`
	Chunks int    `json:"chunks"`
	Dim    int    `json:"dim"`
	Model  string `json:"model"`
}
Stats describes the current state of an Index.
type Store ¶
type Store interface {
	Add(ctx context.Context, chunks []Chunk) error
	RemoveDoc(ctx context.Context, docID string) error
	// Candidates returns up to top chunks by cosine similarity to qv,
	// after applying filter. The returned slice MUST NOT be retained;
	// callers should copy if needed.
	Candidates(ctx context.Context, qv []float32, filter Filter, top int) ([]Hit, error)
	Stats() Stats
}
Store is the per-app vector store. It owns the embedding lifecycle for added documents and exposes a candidate-generation API the retrieval pipeline composes on top of.
type StubEmbedder ¶
type StubEmbedder struct {
	// contains filtered or unexported fields
}
StubEmbedder is a deterministic, dependency-free Embedder used by tests and offline development. It produces a hashed bag-of-words representation: each whitespace-separated token is hashed into one of Dim buckets with a sign derived from a second hash, and the result is L2-normalized.
It is NOT a real embedding model — there is no semantic similarity across paraphrases or synonyms — but it is fast, allocation-light, and produces high cosine similarity for documents that share tokens, which is enough to test the retrieval pipeline end-to-end.
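The hashed bag-of-words idea can be sketched in a few lines. This is an illustration of the technique, not StubEmbedder's code: hashEmbed and equalVec are hypothetical names, and using FNV-1a for the bucket and FNV-1 for the sign is an assumption about which two hashes to pair.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// hashEmbed maps each whitespace-separated token to one of dim buckets
// with a +1 or -1 sign, then L2-normalizes the result.
func hashEmbed(text string, dim int) []float32 {
	if dim <= 0 {
		dim = 128 // same fallback NewStubEmbedder documents
	}
	v := make([]float32, dim)
	for _, tok := range strings.Fields(text) {
		bucketHash := fnv.New32a()
		bucketHash.Write([]byte(tok))
		signHash := fnv.New32() // second, different hash for the sign
		signHash.Write([]byte(tok))
		sign := float32(1)
		if signHash.Sum32()&1 == 1 {
			sign = -1
		}
		v[bucketHash.Sum32()%uint32(dim)] += sign
	}
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	if sum > 0 {
		inv := float32(1 / math.Sqrt(sum))
		for i := range v {
			v[i] *= inv
		}
	}
	return v
}

// equalVec reports whether two vectors are element-wise identical.
func equalVec(a, b []float32) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	a := hashEmbed("the quick brown fox", 64)
	b := hashEmbed("fox brown quick the", 64)
	fmt.Println(equalVec(a, b)) // word order does not matter
}
```

Since the vector depends only on which tokens occur, texts that share tokens land near each other while paraphrases with different words do not, which matches the limitation described above.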
func NewStubEmbedder ¶
func NewStubEmbedder(dim int) *StubEmbedder
NewStubEmbedder returns a StubEmbedder of the given dimension. Dimensions <= 0 fall back to 128.
type WatchOptions ¶
type WatchOptions struct {
	// IncludeExts is the set of file extensions to index, including the
	// leading dot (".go", ".md"). Empty means all files.
	IncludeExts []string
	// ExcludeDirs is a set of directory names to skip entirely, matched
	// on the path's base name. Defaults to a sensible set if nil:
	// {".git", "node_modules", "dist", ".gofastr", "vendor"}.
	ExcludeDirs []string
	// PollInterval is how often the watcher re-scans roots after the
	// initial walk. Defaults to 2 seconds. Set to <= 0 to disable
	// polling (initial walk only).
	PollInterval time.Duration
	// MaxFileSize skips files larger than this. 0 means 1 MiB. Negative
	// means no limit.
	MaxFileSize int64
	// MetadataFunc derives chunk metadata from the absolute path of a
	// file being indexed. When nil, a default sets {"kind": "code"} for
	// .go files and {"kind": "doc"} for .md/.txt; everything else is
	// untagged.
	MetadataFunc func(absPath string) map[string]any
}
WatchOptions configures a Watcher.
type Watcher ¶
type Watcher struct {
	// contains filtered or unexported fields
}
Watcher walks one or more filesystem roots, feeds matching files into an Index as [Document]s, and (optionally) re-scans on a timer to keep the index in sync with changes.
The watcher uses polling rather than OS-level file events so it has no third-party dependency. Cost on small trees (~thousands of files) is negligible; on very large trees, swap for an fsnotify-backed implementation by replacing this file.
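Polling reduces to two steps: walk the tree to record each file's modification time, then compare against the previous walk. The sketch below shows that shape with the standard library only; scan and diff are hypothetical names, and detecting changes by modification time alone is an assumption (the real watcher may also compare size or content).

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"sort"
)

// scan walks root and records each file's modification time in
// nanoseconds, skipping directories whose base name is in skipDirs.
func scan(root string, skipDirs map[string]bool) (map[string]int64, error) {
	seen := make(map[string]int64)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if skipDirs[d.Name()] {
				return filepath.SkipDir
			}
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		seen[path] = info.ModTime().UnixNano()
		return nil
	})
	return seen, err
}

// diff compares two scans and returns sorted lists of added, changed
// and removed paths.
func diff(prev, cur map[string]int64) (added, changed, removed []string) {
	for p, m := range cur {
		old, ok := prev[p]
		switch {
		case !ok:
			added = append(added, p)
		case old != m:
			changed = append(changed, p)
		}
	}
	for p := range prev {
		if _, ok := cur[p]; !ok {
			removed = append(removed, p)
		}
	}
	sort.Strings(added)
	sort.Strings(changed)
	sort.Strings(removed)
	return added, changed, removed
}

func main() {
	prev := map[string]int64{"a.go": 1, "b.md": 1}
	cur := map[string]int64{"b.md": 2, "c.go": 1}
	fmt.Println(diff(prev, cur))

	files, err := scan(".", map[string]bool{".git": true, "node_modules": true})
	fmt.Println(len(files), err)
}
```

In a watcher built this way, added and changed paths would be fed to the index as documents and removed paths would be dropped from it, once per poll interval.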
func NewWatcher ¶
func NewWatcher(idx Index, opts WatchOptions) *Watcher
NewWatcher constructs a Watcher. The Index is fed via [Index.Add] and [Index.Remove] as files appear, change, and disappear.