embedding

package
v0.14.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 3, 2026 License: MIT Imports: 14 Imported by: 0

Documentation

Overview

Package embedding provides semantic vector embeddings for symbols. It uses MiniLM-L6-v2 via hugot (pure-Go ONNX runtime) to generate embeddings offline. The model is downloaded automatically from Hugging Face on first use (~30MB). Vectors are stored in an HNSW index (coder/hnsw) for nearest-neighbor search.


Constants

This section is empty.

Variables

var (
	Dims = 768
)

Model configuration. Override the model with the KNOWING_EMBED_MODEL environment variable. The default is "nomic-code" (P@10 0.247, faster than jina-code, validated in session 17). Options: "nomic-code" (default), "jina-code", "bge-small".
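As a rough illustration of the override described above, the following sketch resolves a model name from KNOWING_EMBED_MODEL and falls back to the documented default. The helper resolveModel is hypothetical, and falling back on an unrecognized value is an assumption; how the package itself treats unknown values is not documented here.

```go
package main

import (
	"fmt"
	"os"
)

// knownModels lists the options named in the package documentation.
var knownModels = map[string]bool{
	"nomic-code": true,
	"jina-code":  true,
	"bge-small":  true,
}

// resolveModel is a hypothetical helper. It returns the requested model when
// it is one of the documented options and the documented default otherwise
// (the fallback for unknown values is an assumption).
func resolveModel(requested string) string {
	if knownModels[requested] {
		return requested
	}
	return "nomic-code"
}

func main() {
	fmt.Println(resolveModel(os.Getenv("KNOWING_EMBED_MODEL")))
}
```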

Functions

This section is empty.

Types

type Embedder

type Embedder struct {
	// contains filtered or unexported fields
}

Embedder generates embedding vectors and provides nearest-neighbor search.

func New

func New() (*Embedder, error)

New creates an Embedder, downloading the model if needed. The model is cached at ~/.cache/knowing/models/.

func (*Embedder) AddVector

func (e *Embedder) AddVector(id string, vec []float32)

AddVector indexes a symbol ID with its embedding vector for nearest-neighbor search.

func (*Embedder) Close

func (e *Embedder) Close() error

Close releases resources.

func (*Embedder) Count

func (e *Embedder) Count() int

Count returns the number of indexed vectors.

func (*Embedder) Embed

func (e *Embedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed returns the embedding vector for a single text string.

func (*Embedder) EmbedBatch

func (e *Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch returns embedding vectors for multiple texts.

func (*Embedder) Search

func (e *Embedder) Search(query []float32, k int) []string

Search returns the k nearest neighbor symbol IDs to the query vector.

type EmbeddingStore added in v0.11.0

type EmbeddingStore interface {
	BatchPutEmbeddings(ctx context.Context, model string, hashes []types.Hash, vectors [][]byte) error
	GetEmbeddings(ctx context.Context, model string, hashes []types.Hash) (map[types.Hash][]byte, error)
	GetAllEmbeddings(ctx context.Context, model string) (map[types.Hash][]byte, error)
}

EmbeddingStore persists embedding vectors keyed by node hash and model name. Implemented by store.SQLiteStore.

type Searcher

type Searcher struct {
	// contains filtered or unexported fields
}

Searcher wraps an Embedder and provides the VectorSearcher interface expected by the context engine. It resolves HNSW string keys (hex-encoded node hashes) back to types.Hash values.
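The key resolution mentioned above can be pictured as a hex round trip. This is a minimal sketch, not the package's code: Hash is a local stand-in for types.Hash with an assumed 32-byte size, and keyFromHash and hashFromKey are hypothetical helper names.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Hash stands in for types.Hash; the 32-byte size is an assumption.
type Hash [32]byte

// keyFromHash renders a hash as the hex string used as an index key.
func keyFromHash(h Hash) string {
	return hex.EncodeToString(h[:])
}

// hashFromKey parses a hex key back into a Hash, rejecting keys that are
// not valid hex or that decode to the wrong length.
func hashFromKey(key string) (Hash, error) {
	var h Hash
	raw, err := hex.DecodeString(key)
	if err != nil {
		return h, fmt.Errorf("decode key: %w", err)
	}
	if len(raw) != len(h) {
		return h, fmt.Errorf("key decodes to %d bytes, want %d", len(raw), len(h))
	}
	copy(h[:], raw)
	return h, nil
}

func main() {
	h := Hash{0xab, 0xcd}
	key := keyFromHash(h)
	back, err := hashFromKey(key)
	fmt.Println(key, back == h, err)
}
```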

func NewSearcher

func NewSearcher(e *Embedder) *Searcher

NewSearcher creates a Searcher from an initialized Embedder.

func (*Searcher) Close

func (s *Searcher) Close() error

Close releases the underlying embedder resources.

func (*Searcher) Count

func (s *Searcher) Count() int

Count returns the number of indexed vectors.

func (*Searcher) EmbedAndSearch

func (s *Searcher) EmbedAndSearch(ctx context.Context, query string, k int) ([]types.Hash, error)

EmbedAndSearch embeds the query text and returns the k nearest symbol hashes.

func (*Searcher) IndexBatch

func (s *Searcher) IndexBatch(ctx context.Context, nodes []types.Node, filePaths []string) error

IndexBatch embeds multiple nodes and adds them to the HNSW index. More efficient than individual IndexNode calls due to batched embedding. When a store is attached, vectors are also persisted to SQLite.

func (*Searcher) IndexNode

func (s *Searcher) IndexNode(ctx context.Context, node types.Node, filePath string) error

IndexNode embeds a node's text representation and adds it to the HNSW index. The text format is "kind name signature filepath".
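To make the text format concrete, here is a small sketch that builds such a string. The Node struct and its field names are hypothetical stand-ins for types.Node, and joining with single spaces (including for empty fields) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for types.Node; the field names are
// assumptions made for this illustration.
type Node struct {
	Kind      string
	Name      string
	Signature string
}

// nodeText joins the fields in the documented order:
// kind, name, signature, file path.
func nodeText(n Node, filePath string) string {
	return strings.Join([]string{n.Kind, n.Name, n.Signature, filePath}, " ")
}

func main() {
	n := Node{Kind: "func", Name: "New", Signature: "func New() (*Embedder, error)"}
	fmt.Println(nodeText(n, "embedding/embedder.go"))
}
```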

func (*Searcher) LoadAndSearchFromStore added in v0.12.0

func (s *Searcher) LoadAndSearchFromStore(ctx context.Context, query string, k int) ([]types.Hash, error)

LoadAndSearchFromStore loads all cached vectors from the embedding store and performs a brute-force cosine similarity search against the query. It bypasses the HNSW index entirely: there is no index build and no in-memory graph; vectors are read from SQLite and compared to the query by cosine similarity. Each query is O(n), but this avoids the multi-minute HNSW rebuild that dominates startup time on large repos.

Returns the k nearest node hashes sorted by descending similarity. If the store has no vectors, returns nil (not an error).
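The brute-force approach can be sketched in a few lines. This is an illustration under stated assumptions rather than the package's code: topK and cosine are hypothetical helpers, vectors are keyed by plain strings instead of types.Hash, and ties are broken by key only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type scored struct {
	id    string
	score float64
}

// topK scans every vector once (O(n)) and returns up to k ids sorted by
// descending similarity. It returns nil when there is nothing to search.
func topK(query []float32, vectors map[string][]float32, k int) []string {
	if len(vectors) == 0 || k <= 0 {
		return nil
	}
	hits := make([]scored, 0, len(vectors))
	for id, v := range vectors {
		hits = append(hits, scored{id: id, score: cosine(query, v)})
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].score != hits[j].score {
			return hits[i].score > hits[j].score
		}
		return hits[i].id < hits[j].id // deterministic tie-break
	})
	if k > len(hits) {
		k = len(hits)
	}
	out := make([]string, k)
	for i := range out {
		out[i] = hits[i].id
	}
	return out
}

func main() {
	vectors := map[string][]float32{
		"a": {1, 0},
		"b": {0, 1},
		"c": {1, 1},
	}
	fmt.Println(topK([]float32{1, 0.1}, vectors, 2))
}
```

A full sort is the simplest choice here; for small k over very large stores, a bounded heap would avoid sorting all n scores.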

func (*Searcher) Model added in v0.12.0

func (s *Searcher) Model() string

Model returns the model identifier used for cache keys.

func (*Searcher) PreloadVectors added in v0.13.0

func (s *Searcher) PreloadVectors(ctx context.Context) int

PreloadVectors eagerly loads all cached vectors from the embedding store into memory. Call this at engine setup time to eliminate per-task SQLite reads during gap-fill. Safe to call multiple times (no-op after first load). Returns the number of vectors loaded.
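One common way to get "no-op after first load" behavior in Go is sync.Once. The sketch below shows that pattern only; whether the package uses it is not documented here. The vectorCache type and its load callback are hypothetical, and returning the cached count on repeat calls is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// vectorCache is a hypothetical cache that loads its vectors at most once.
type vectorCache struct {
	once sync.Once
	load func() map[string][]float32 // hypothetical loader, e.g. a store read
	vecs map[string][]float32
}

// preload runs the loader on the first call only and returns the number of
// cached vectors on every call.
func (c *vectorCache) preload() int {
	c.once.Do(func() { c.vecs = c.load() })
	return len(c.vecs)
}

func main() {
	loads := 0
	c := &vectorCache{load: func() map[string][]float32 {
		loads++
		return map[string][]float32{"a": {1, 0}, "b": {0, 1}}
	}}
	fmt.Println(c.preload(), c.preload(), loads)
}
```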

func (*Searcher) ReRank added in v0.10.0

func (s *Searcher) ReRank(ctx context.Context, query string, candidates []string) ([]int, error)

ReRank embeds the query and each candidate text, then returns candidate indices sorted by descending cosine similarity to the query. It is used as a post-RWR re-ranking step.
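The relationship between per-candidate scores and the returned index order can be sketched as follows. rankIndices is a hypothetical helper, and preserving the original order for tied scores is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// rankIndices returns the positions of scores ordered from highest to
// lowest score. Equal scores keep their original relative order.
func rankIndices(scores []float64) []int {
	idx := make([]int, len(scores))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return scores[idx[a]] > scores[idx[b]]
	})
	return idx
}

func main() {
	// Candidate 1 scores highest, then candidate 2, then candidate 0.
	fmt.Println(rankIndices([]float64{0.2, 0.9, 0.5})) // prints [1 2 0]
}
```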

func (*Searcher) ReRankByHashes added in v0.11.0

func (s *Searcher) ReRankByHashes(ctx context.Context, query string, hashes []types.Hash, fallbackTexts []string) ([]float64, error)

ReRankByHashes re-ranks candidates using cached vectors from the store. Only embeds the query text (1 inference call). Candidates with no cached vector fall back to on-the-fly embedding. Returns cosine similarity scores at original index positions, same contract as ReRankScores.

func (*Searcher) ReRankScores added in v0.10.0

func (s *Searcher) ReRankScores(ctx context.Context, query string, candidates []string) ([]float64, error)

ReRankScores embeds the query and each candidate, then returns the cosine similarity score for each candidate at its original index position.

func (*Searcher) SetStore added in v0.11.0

func (s *Searcher) SetStore(store EmbeddingStore)

SetStore attaches a persistent embedding store for vector caching. When set, IndexBatch writes vectors to the store, and ReRankByHashes reads cached vectors instead of re-embedding candidates.
