embedding

package

v0.14.0 (latest) · Published: Jun 3, 2026 · License: MIT · Imports: 14 · Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/blackwell-systems/knowing

Documentation

Overview

Package embedding provides semantic vector embeddings for symbols. It uses MiniLM-L6-v2 through hugot, a pure-Go ONNX runtime, so embeddings are generated offline. The model is downloaded automatically from Hugging Face on first use (~30 MB). Vectors are stored in an HNSW index (coder/hnsw) for nearest-neighbor search.

Index

Variables
type Embedder
- func New() (*Embedder, error)
type EmbeddingStore
type Searcher
- func NewSearcher(e *Embedder) *Searcher

Constants

This section is empty.

Variables

var (
	Dims = 768
)

Model configuration. Set the KNOWING_EMBED_MODEL environment variable to override the model. Options: "nomic-code" (default), "jina-code", and "bge-small". The default, nomic-code, reached a P@10 of 0.247 (validated in session 17) and is faster than jina-code.

Functions

This section is empty.

Types

type Embedder

type Embedder struct {
	// contains filtered or unexported fields
}

Embedder generates embedding vectors and provides nearest-neighbor search.

func New

func New() (*Embedder, error)

New creates an Embedder, downloading the model if needed. The model is cached at ~/.cache/knowing/models/.

func (*Embedder) AddVector

func (e *Embedder) AddVector(id string, vec []float32)

AddVector indexes a symbol ID with its embedding vector for nearest-neighbor search.

func (*Embedder) Close

func (e *Embedder) Close() error

Close releases the resources held by the Embedder.

func (*Embedder) Count

func (e *Embedder) Count() int

Count returns the number of indexed vectors.

func (*Embedder) Embed

func (e *Embedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed returns the embedding vector for a single text string.

func (*Embedder) EmbedBatch

func (e *Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch returns embedding vectors for multiple texts.

func (*Embedder) Search

func (e *Embedder) Search(query []float32, k int) []string

Search returns the k nearest neighbor symbol IDs to the query vector.
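To illustrate the AddVector/Search/Count contract without the coder/hnsw dependency, the sketch below swaps in a brute-force index: results are exact, but each query is O(n) instead of HNSW's approximate, sublinear search. flatIndex is hypothetical, and cosine similarity as the distance measure is an assumption.

```go
package main

import (
	"math"
	"sort"
)

// flatIndex is a brute-force stand-in for the HNSW index: it stores every
// vector and scans all of them on each Search.
type flatIndex struct {
	ids  []string
	vecs [][]float32
}

// AddVector records id with its embedding vector.
func (f *flatIndex) AddVector(id string, vec []float32) {
	f.ids = append(f.ids, id)
	f.vecs = append(f.vecs, vec)
}

// Count returns the number of indexed vectors.
func (f *flatIndex) Count() int { return len(f.ids) }

// Search returns up to k IDs ordered by descending cosine similarity to query.
func (f *flatIndex) Search(query []float32, k int) []string {
	order := make([]int, len(f.ids))
	scores := make([]float64, len(f.ids))
	for i, v := range f.vecs {
		order[i] = i
		scores[i] = cosine(query, v)
	}
	// Stable sort so equal scores keep insertion order.
	sort.SliceStable(order, func(a, b int) bool { return scores[order[a]] > scores[order[b]] })
	if k < 0 {
		k = 0
	}
	if k > len(order) {
		k = len(order)
	}
	out := make([]string, k)
	for i, j := range order[:k] {
		out[i] = f.ids[j]
	}
	return out
}

// cosine returns the cosine similarity of a and b, or 0 if the lengths
// differ or either vector is all zeros.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}
```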

type EmbeddingStore (added in v0.11.0)

type EmbeddingStore interface {
	BatchPutEmbeddings(ctx context.Context, model string, hashes []types.Hash, vectors [][]byte) error
	GetEmbeddings(ctx context.Context, model string, hashes []types.Hash) (map[types.Hash][]byte, error)
	GetAllEmbeddings(ctx context.Context, model string) (map[types.Hash][]byte, error)
}

EmbeddingStore persists embedding vectors keyed by node hash and model name. Implemented by store.SQLiteStore.

type Searcher

type Searcher struct {
	// contains filtered or unexported fields
}

Searcher wraps an Embedder and implements the VectorSearcher interface expected by the context engine. It resolves HNSW string keys (hex-encoded node hashes) back to types.Hash values.
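A sketch of the key round trip, assuming types.Hash is a 32-byte array and keys are lowercase hex as produced by encoding/hex. hashKey and parseHashKey are hypothetical names.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Hash stands in for types.Hash; a 32-byte array is an assumption.
type Hash [32]byte

// hashKey encodes a hash as the hex string used as an HNSW key.
func hashKey(h Hash) string { return hex.EncodeToString(h[:]) }

// parseHashKey reverses hashKey, rejecting non-hex input and keys that do not
// decode to exactly len(Hash) bytes.
func parseHashKey(key string) (Hash, error) {
	var h Hash
	b, err := hex.DecodeString(key)
	if err != nil {
		return h, err
	}
	if len(b) != len(h) {
		return h, fmt.Errorf("key %q decodes to %d bytes, want %d", key, len(b), len(h))
	}
	copy(h[:], b)
	return h, nil
}
```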

func NewSearcher

func NewSearcher(e *Embedder) *Searcher

NewSearcher creates a Searcher from an initialized Embedder.

func (*Searcher) Close

func (s *Searcher) Close() error

Close releases the underlying embedder resources.

func (*Searcher) Count

func (s *Searcher) Count() int

Count returns the number of indexed vectors.

func (*Searcher) EmbedAndSearch

func (s *Searcher) EmbedAndSearch(ctx context.Context, query string, k int) ([]types.Hash, error)

EmbedAndSearch embeds the query text and returns the k nearest symbol hashes.

func (*Searcher) IndexBatch

func (s *Searcher) IndexBatch(ctx context.Context, nodes []types.Node, filePaths []string) error

IndexBatch embeds multiple nodes and adds them to the HNSW index. It is more efficient than calling IndexNode once per node because the embeddings are computed in a single batch. When a store is attached, the vectors are also persisted to SQLite.

func (*Searcher) IndexNode

func (s *Searcher) IndexNode(ctx context.Context, node types.Node, filePath string) error

IndexNode embeds a node's text representation and adds it to the HNSW index. The text format is: "kind name signature filepath"
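A sketch of building that text, assuming a node exposes kind, name, and signature as strings. The node type and its field names are hypothetical (types.Node is not shown here), and skipping empty fields so that no double spaces appear is an assumption.

```go
package main

import "strings"

// node is a hypothetical subset of types.Node.
type node struct {
	Kind, Name, Signature string
}

// nodeText builds the documented "kind name signature filepath" string,
// skipping any empty component.
func nodeText(n node, filePath string) string {
	parts := make([]string, 0, 4)
	for _, p := range []string{n.Kind, n.Name, n.Signature, filePath} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, " ")
}
```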

func (*Searcher) LoadAndSearchFromStore (added in v0.12.0)

func (s *Searcher) LoadAndSearchFromStore(ctx context.Context, query string, k int) ([]types.Hash, error)

LoadAndSearchFromStore loads all cached vectors from the embedding store and performs a brute-force cosine-similarity search against the query. This bypasses the HNSW index entirely: no index is built and no in-memory graph is kept; vectors are read from SQLite and scored by cosine similarity. Each query costs O(n) in the number of stored vectors, but this avoids the multi-minute HNSW rebuild that dominates startup time on large repositories.

Returns the k nearest node hashes sorted by descending similarity. If the store has no vectors, returns nil (not an error).

func (*Searcher) Model (added in v0.12.0)

func (s *Searcher) Model() string

Model returns the model identifier used for cache keys.

func (*Searcher) PreloadVectors (added in v0.13.0)

func (s *Searcher) PreloadVectors(ctx context.Context) int

PreloadVectors eagerly loads all cached vectors from the embedding store into memory. Call this at engine setup time to eliminate per-task SQLite reads during gap-fill. Safe to call multiple times (no-op after first load). Returns the number of vectors loaded.

func (*Searcher) ReRank (added in v0.10.0)

func (s *Searcher) ReRank(ctx context.Context, query string, candidates []string) ([]int, error)

ReRank embeds the query and each candidate text and returns the candidate indices sorted by descending cosine similarity to the query. It is used as a re-ranking step after RWR.
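ReRank's result can be viewed as an argsort of per-candidate similarity scores. A minimal sketch; argsortDesc is hypothetical, and the stable tie-breaking it uses is an assumption.

```go
package main

import "sort"

// argsortDesc returns candidate indices ordered by descending score.
// SliceStable keeps equal-scored candidates in their original order.
func argsortDesc(scores []float64) []int {
	idx := make([]int, len(scores))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return scores[idx[a]] > scores[idx[b]] })
	return idx
}
```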

func (*Searcher) ReRankByHashes (added in v0.11.0)

func (s *Searcher) ReRankByHashes(ctx context.Context, query string, hashes []types.Hash, fallbackTexts []string) ([]float64, error)

ReRankByHashes re-ranks candidates using vectors cached in the store. When every candidate has a cached vector, only the query text is embedded (one inference call); candidates with no cached vector fall back to on-the-fly embedding from fallbackTexts. It returns cosine-similarity scores at the candidates' original index positions, the same contract as ReRankScores.

func (*Searcher) ReRankScores (added in v0.10.0)

func (s *Searcher) ReRankScores(ctx context.Context, query string, candidates []string) ([]float64, error)

ReRankScores embeds the query and each candidate and returns the cosine-similarity score of each candidate at its original index position.

func (*Searcher) SetStore (added in v0.11.0)

func (s *Searcher) SetStore(store EmbeddingStore)

SetStore attaches a persistent embedding store for vector caching. When set, IndexBatch writes vectors to the store, and ReRankByHashes reads cached vectors instead of re-embedding candidates.
