Documentation ¶
Overview ¶
Package embedding provides semantic vector embeddings for symbols. It uses MiniLM-L6-v2 via hugot (pure-Go ONNX runtime) to generate embeddings offline. The model is downloaded automatically from Hugging Face on first use (~30MB). Vectors are stored in an HNSW index (coder/hnsw) for nearest-neighbor search.
Index ¶
- Variables
- type Embedder
- func (e *Embedder) AddVector(id string, vec []float32)
- func (e *Embedder) Close() error
- func (e *Embedder) Count() int
- func (e *Embedder) Embed(ctx context.Context, text string) ([]float32, error)
- func (e *Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
- func (e *Embedder) Search(query []float32, k int) []string
- type EmbeddingStore
- type Searcher
- func (s *Searcher) Close() error
- func (s *Searcher) Count() int
- func (s *Searcher) EmbedAndSearch(ctx context.Context, query string, k int) ([]types.Hash, error)
- func (s *Searcher) IndexBatch(ctx context.Context, nodes []types.Node, filePaths []string) error
- func (s *Searcher) IndexNode(ctx context.Context, node types.Node, filePath string) error
- func (s *Searcher) ReRank(ctx context.Context, query string, candidates []string) ([]int, error)
- func (s *Searcher) ReRankByHashes(ctx context.Context, query string, hashes []types.Hash, fallbackTexts []string) ([]float64, error)
- func (s *Searcher) ReRankScores(ctx context.Context, query string, candidates []string) ([]float64, error)
- func (s *Searcher) SetStore(store EmbeddingStore)
Constants ¶
This section is empty.
Variables ¶
var (
	Dims = 768
)
Model configuration. Override the model with the KNOWING_EMBED_MODEL environment variable. The default is "jina-code" (best for code retrieval; validated at +17% P@10). Options: "jina-code" (default), "bge-small", "nomic-code".
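The override described above could be resolved along these lines. This is a minimal sketch, not the package's code: resolveModel and knownModels are hypothetical names, and rejecting unrecognized values is an assumption (the package may instead fall back to the default).

```go
package main

import (
	"fmt"
	"os"
)

// knownModels lists the option names given in the documentation.
var knownModels = map[string]bool{
	"jina-code":  true,
	"bge-small":  true,
	"nomic-code": true,
}

// resolveModel is a hypothetical helper. It returns the model named by
// KNOWING_EMBED_MODEL, or "jina-code" when the variable is unset or empty.
// Returning an error for unknown names is an assumption, not documented behavior.
func resolveModel(getenv func(string) string) (string, error) {
	name := getenv("KNOWING_EMBED_MODEL")
	if name == "" {
		return "jina-code", nil
	}
	if !knownModels[name] {
		return "", fmt.Errorf("unknown embedding model %q", name)
	}
	return name, nil
}

func main() {
	model, err := resolveModel(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using model:", model)
}
```

Passing the lookup function as a parameter keeps the helper testable without mutating the process environment.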
Functions ¶
This section is empty.
Types ¶
type Embedder ¶
type Embedder struct {
	// contains filtered or unexported fields
}
Embedder generates embedding vectors and provides nearest-neighbor search.
func New ¶
New creates an Embedder, downloading the model if needed. The model is cached at ~/.cache/knowing/models/.
func (*Embedder) AddVector ¶
func (e *Embedder) AddVector(id string, vec []float32)
AddVector indexes a symbol ID with its embedding vector for nearest-neighbor search.
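To illustrate the AddVector/Search contract (string IDs in, the k nearest IDs out), here is a brute-force stand-in. It is not the HNSW index the package uses: flatIndex is a hypothetical type, and squared Euclidean distance is chosen only for brevity, so the real index may use a different metric.

```go
package main

import (
	"fmt"
	"sort"
)

// flatIndex is a hypothetical brute-force stand-in for the HNSW index.
type flatIndex struct {
	ids  []string
	vecs [][]float32
}

// AddVector stores a symbol ID alongside its embedding vector.
func (x *flatIndex) AddVector(id string, vec []float32) {
	x.ids = append(x.ids, id)
	x.vecs = append(x.vecs, vec)
}

// Count reports how many vectors are stored.
func (x *flatIndex) Count() int { return len(x.ids) }

// sqDist returns the squared Euclidean distance over the shared prefix of a and b.
func sqDist(a, b []float32) float64 {
	var sum float64
	for i := 0; i < len(a) && i < len(b); i++ {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return sum
}

// Search returns up to k IDs ordered from nearest to farthest.
func (x *flatIndex) Search(query []float32, k int) []string {
	order := make([]int, len(x.ids))
	dists := make([]float64, len(x.ids))
	for i := range x.ids {
		order[i] = i
		dists[i] = sqDist(query, x.vecs[i])
	}
	sort.SliceStable(order, func(a, b int) bool {
		return dists[order[a]] < dists[order[b]]
	})
	if k < 0 {
		k = 0
	}
	if k > len(order) {
		k = len(order)
	}
	out := make([]string, 0, k)
	for _, i := range order[:k] {
		out = append(out, x.ids[i])
	}
	return out
}

func main() {
	idx := &flatIndex{}
	idx.AddVector("a", []float32{1, 0})
	idx.AddVector("b", []float32{0, 1})
	idx.AddVector("c", []float32{0.9, 0.1})
	fmt.Println(idx.Search([]float32{1, 0}, 2)) // prints [a c]
}
```

A linear scan like this costs O(n) per query; an HNSW index trades exactness for much faster approximate lookups on large collections.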
func (*Embedder) EmbedBatch ¶
func (e *Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
EmbedBatch returns embedding vectors for multiple texts.
type EmbeddingStore ¶ added in v0.11.0
type EmbeddingStore interface {
	BatchPutEmbeddings(ctx context.Context, model string, hashes []types.Hash, vectors [][]byte) error
	GetEmbeddings(ctx context.Context, model string, hashes []types.Hash) (map[types.Hash][]byte, error)
}
EmbeddingStore persists embedding vectors keyed by node hash and model name. Implemented by store.SQLiteStore.
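For tests or experiments, an in-memory type with the same two methods might look like the sketch below. Hash is a hypothetical stand-in for types.Hash (assumed here to be a comparable 32-byte array), memStore and newMemStore are illustrative names, and none of this reflects how store.SQLiteStore is implemented.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Hash is a hypothetical stand-in for types.Hash; the real type may differ.
type Hash [32]byte

// memStore keeps encoded vectors in memory, keyed by model name and node hash.
type memStore struct {
	mu   sync.RWMutex
	data map[string]map[Hash][]byte
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string]map[Hash][]byte)}
}

// BatchPutEmbeddings stores vectors[i] under hashes[i] for the given model.
func (m *memStore) BatchPutEmbeddings(ctx context.Context, model string, hashes []Hash, vectors [][]byte) error {
	if len(hashes) != len(vectors) {
		return errors.New("hashes and vectors differ in length")
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	byHash := m.data[model]
	if byHash == nil {
		byHash = make(map[Hash][]byte)
		m.data[model] = byHash
	}
	for i, h := range hashes {
		// Copy so later changes to the caller's slice do not affect the store.
		byHash[h] = append([]byte(nil), vectors[i]...)
	}
	return nil
}

// GetEmbeddings returns the vectors found for the requested hashes.
// Hashes with no stored vector are simply absent from the result.
func (m *memStore) GetEmbeddings(ctx context.Context, model string, hashes []Hash) (map[Hash][]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make(map[Hash][]byte, len(hashes))
	for _, h := range hashes {
		if v, ok := m.data[model][h]; ok {
			out[h] = v
		}
	}
	return out, nil
}

func main() {
	ctx := context.Background()
	s := newMemStore()
	h1, h2 := Hash{1}, Hash{2}
	if err := s.BatchPutEmbeddings(ctx, "jina-code", []Hash{h1}, [][]byte{{1, 2, 3, 4}}); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	found, _ := s.GetEmbeddings(ctx, "jina-code", []Hash{h1, h2})
	fmt.Println("found", len(found), "of 2 requested hashes")
}
```

Keying by model name as well as hash keeps vectors from different models, which may have different dimensions, from being mixed.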
type Searcher ¶
type Searcher struct {
	// contains filtered or unexported fields
}
Searcher wraps an Embedder and provides the VectorSearcher interface expected by the context engine. It resolves HNSW string keys (hex-encoded node hashes) back to types.Hash values.
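The mapping between hash values and hex string keys can be sketched as follows. Hash is a hypothetical stand-in for types.Hash, its 32-byte length is an assumption, and keyFor and hashFromKey are illustrative helper names rather than package functions.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Hash is a hypothetical stand-in for types.Hash; the 32-byte size is an assumption.
type Hash [32]byte

// keyFor returns the hex-encoded string used as the index key for h.
func keyFor(h Hash) string {
	return hex.EncodeToString(h[:])
}

// hashFromKey decodes a hex index key back into a Hash.
func hashFromKey(key string) (Hash, error) {
	var h Hash
	raw, err := hex.DecodeString(key)
	if err != nil {
		return h, fmt.Errorf("decode key: %w", err)
	}
	if len(raw) != len(h) {
		return h, fmt.Errorf("key decodes to %d bytes, want %d", len(raw), len(h))
	}
	copy(h[:], raw)
	return h, nil
}

func main() {
	h := Hash{0xde, 0xad, 0xbe, 0xef}
	key := keyFor(h)
	back, err := hashFromKey(key)
	fmt.Println(len(key), back == h, err) // prints 64 true <nil>
}
```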
func NewSearcher ¶
NewSearcher creates a Searcher from an initialized Embedder.
func (*Searcher) EmbedAndSearch ¶
func (s *Searcher) EmbedAndSearch(ctx context.Context, query string, k int) ([]types.Hash, error)
EmbedAndSearch embeds the query text and returns the k nearest symbol hashes.
func (*Searcher) IndexBatch ¶
func (s *Searcher) IndexBatch(ctx context.Context, nodes []types.Node, filePaths []string) error
IndexBatch embeds multiple nodes and adds them to the HNSW index. More efficient than individual IndexNode calls due to batched embedding. When a store is attached, vectors are also persisted to SQLite.
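Because EmbeddingStore takes vectors as [][]byte, each []float32 has to be serialized before it is persisted. The sketch below shows one plausible encoding, 4 bytes per component in little-endian order. The byte layout the package actually uses is not documented here, so encodeVector and decodeVector are assumptions for illustration only.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// encodeVector packs each float32 into 4 little-endian bytes.
// The layout is an assumption; the package's real encoding may differ.
func encodeVector(vec []float32) []byte {
	buf := make([]byte, 4*len(vec))
	for i, f := range vec {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVector reverses encodeVector.
func decodeVector(buf []byte) ([]float32, error) {
	if len(buf)%4 != 0 {
		return nil, errors.New("byte length is not a multiple of 4")
	}
	vec := make([]float32, len(buf)/4)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(buf[4*i:]))
	}
	return vec, nil
}

func main() {
	raw := encodeVector([]float32{0.5, -1.25, 3})
	vec, err := decodeVector(raw)
	fmt.Println(len(raw), vec, err) // prints 12 [0.5 -1.25 3] <nil>
}
```

With Dims = 768, a vector encoded this way would occupy 3072 bytes.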
func (*Searcher) IndexNode ¶
func (s *Searcher) IndexNode(ctx context.Context, node types.Node, filePath string) error
IndexNode embeds a node's text representation and adds it to the HNSW index. The text format is "kind name signature filepath".
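The "kind name signature filepath" format could be produced as in this sketch. The node struct and its field names are hypothetical stand-ins for types.Node, and skipping empty fields is an assumption about how missing values are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for types.Node; the field names are assumptions.
type node struct {
	Kind      string
	Name      string
	Signature string
}

// nodeText joins the fields in "kind name signature filepath" order,
// separated by single spaces. Empty fields are skipped (an assumption).
func nodeText(n node, filePath string) string {
	parts := make([]string, 0, 4)
	for _, p := range []string{n.Kind, n.Name, n.Signature, filePath} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	n := node{Kind: "func", Name: "Embed", Signature: "(text string) []float32"}
	fmt.Println(nodeText(n, "embed.go"))
	// prints: func Embed (text string) []float32 embed.go
}
```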
func (*Searcher) ReRank ¶ added in v0.10.0
func (s *Searcher) ReRank(ctx context.Context, query string, candidates []string) ([]int, error)
ReRank embeds the query and each candidate text, then returns the candidate indices sorted by descending cosine similarity to the query. It is used as a post-RWR re-ranking step.
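The ordering step can be sketched independently of the model. Assuming the similarity scores are already computed, the hypothetical helper rankDescending below returns candidate indices from best to worst. Keeping ties in their original order is an assumption about tie handling.

```go
package main

import (
	"fmt"
	"sort"
)

// rankDescending returns candidate indices ordered from highest to lowest score.
// Ties keep their original relative order because the sort is stable.
func rankDescending(scores []float64) []int {
	order := make([]int, len(scores))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return scores[order[a]] > scores[order[b]]
	})
	return order
}

func main() {
	scores := []float64{0.12, 0.87, 0.45}
	fmt.Println(rankDescending(scores)) // prints [1 2 0]
}
```

Returning indices rather than reordered texts lets the caller apply the same permutation to any parallel slice, such as hashes or file paths.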
func (*Searcher) ReRankByHashes ¶ added in v0.11.0
func (s *Searcher) ReRankByHashes(ctx context.Context, query string, hashes []types.Hash, fallbackTexts []string) ([]float64, error)
ReRankByHashes re-ranks candidates using cached vectors from the store. Only embeds the query text (1 inference call). Candidates with no cached vector fall back to on-the-fly embedding. Returns cosine similarity scores at original index positions, same contract as ReRankScores.
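The cache-then-fallback lookup described above might be structured like this sketch. Hash stands in for types.Hash, the cache is modeled as a plain map of already-decoded vectors, and resolveVectors and the embed callback are hypothetical. The real method reads from the attached store and runs model inference instead.

```go
package main

import "fmt"

// Hash is a hypothetical stand-in for types.Hash.
type Hash [32]byte

// resolveVectors returns one vector per hash, in the original order.
// It prefers the cached vector and calls embed on the matching fallback
// text only when the cache has no entry for that hash.
func resolveVectors(hashes []Hash, fallbackTexts []string, cached map[Hash][]float32, embed func(string) []float32) [][]float32 {
	vecs := make([][]float32, len(hashes))
	for i, h := range hashes {
		if v, ok := cached[h]; ok {
			vecs[i] = v
			continue
		}
		if i < len(fallbackTexts) {
			vecs[i] = embed(fallbackTexts[i])
		}
	}
	return vecs
}

func main() {
	h1, h2 := Hash{1}, Hash{2}
	cached := map[Hash][]float32{h1: {1, 0}}

	calls := 0
	embed := func(text string) []float32 {
		calls++ // count how often the expensive path is taken
		return []float32{0, 1}
	}

	vecs := resolveVectors([]Hash{h1, h2}, []string{"func A", "func B"}, cached, embed)
	fmt.Println(len(vecs), "vectors,", calls, "fallback embedding call")
}
```

Only the cache miss triggers the embedding callback, which is the saving the method is designed for when most candidates were indexed earlier.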
func (*Searcher) ReRankScores ¶ added in v0.10.0
func (s *Searcher) ReRankScores(ctx context.Context, query string, candidates []string) ([]float64, error)
ReRankScores embeds the query and each candidate, then returns the cosine similarity score for each candidate at its original index position.
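As a sketch of the scoring contract, the helpers below compute cosine similarity over vectors that are assumed to be already embedded and keep each score at its candidate's index. The names cosine and scoreCandidates are illustrative, and returning 0 for mismatched or zero-length vectors is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b.
// It returns 0 when the lengths differ or either vector has zero norm (an assumption).
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// scoreCandidates returns one score per candidate, at the same index.
func scoreCandidates(query []float32, candidates [][]float32) []float64 {
	scores := make([]float64, len(candidates))
	for i, c := range candidates {
		scores[i] = cosine(query, c)
	}
	return scores
}

func main() {
	query := []float32{1, 0}
	candidates := [][]float32{{1, 0}, {0, 1}, {-1, 0}}
	fmt.Printf("%.2f\n", scoreCandidates(query, candidates)) // prints [1.00 0.00 -1.00]
}
```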
func (*Searcher) SetStore ¶ added in v0.11.0
func (s *Searcher) SetStore(store EmbeddingStore)
SetStore attaches a persistent embedding store for vector caching. When set, IndexBatch writes vectors to the store, and ReRankByHashes reads cached vectors instead of re-embedding candidates.