Documentation ¶
Overview ¶
Package vectorindex provides an in-memory HNSW vector index for fast approximate nearest-neighbor search over chunk/entity embeddings.
It replaces the previous O(n) brute-force scan (store.CosineSimilarity over every chunk) with a roughly logarithmic HNSW lookup, while keeping the SQLite store as the single source of truth: the index is rebuilt from the stored BLOBs on boot via BuildFromStore. Concurrent reads and writes are safe; access is guarded by a sync.RWMutex.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var ErrDimMismatch = errors.New("vectorindex: vector dimension mismatch")
ErrDimMismatch is returned when a vector's dimension doesn't match the index's established dimension.
var ErrEmptyQuery = errors.New("vectorindex: empty query vector")
ErrEmptyQuery is returned when Search is called with an empty query vector.
Functions ¶
This section is empty.
Types ¶
type HNSW ¶
type HNSW struct {
// contains filtered or unexported fields
}
HNSW is a small, pure-Go Hierarchical Navigable Small World index.
It replaces the initial coder/hnsw wrapper, which shipped with broken top-k eviction (Max() on a min-heap returned data[last], evicting a near-arbitrary candidate instead of the worst). See: https://github.com/coder/hnsw/blob/v0.6.1/heap/heap.go#L85
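To see why reading data[last] from a min-heap is not a reliable maximum, the following sketch builds a min-heap with the standard container/heap package and compares its final slot against a full scan. It illustrates the class of bug described above and is not the coder/hnsw code; lastElement, trueMax, and build are names invented for this example.

```go
package main

import (
	"container/heap"
	"fmt"
)

// minHeap is a plain min-heap of float64 values.
type minHeap []float64

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// lastElement mimics the flawed approach: it returns whatever sits in the
// final slot of the backing slice.
func lastElement(h minHeap) float64 { return h[len(h)-1] }

// trueMax scans every element. The heap property only orders parents
// against children, so the maximum can be any leaf, not just the last one.
func trueMax(h minHeap) float64 {
	m := h[0]
	for _, v := range h[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

// build pushes the values one by one, as an index would during a search.
func build(values ...float64) minHeap {
	h := &minHeap{}
	for _, v := range values {
		heap.Push(h, v)
	}
	return *h
}

func main() {
	h := build(5, 9, 1, 7, 3)
	fmt.Println("last slot:", lastElement(h), "true max:", trueMax(h))
}
```

For this insertion order the two values differ, which is exactly the situation in which top-k eviction would drop the wrong candidate.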
This implementation follows Malkov & Yashunin (arXiv:1603.09320) with the conventional simplifications used by many OSS ports (FAISS, hnswlib):
- Layer 0 is the full graph; each higher layer holds a randomly sampled subset of the nodes, with membership decaying exponentially by layer (level multiplier Ml = 1/ln(M)).
- Candidate/dynamic list uses a two-heap strategy: a max-heap of the top-k found so far, and a min-heap of candidates to expand next.
- Cosine similarity is the default; vectors are not normalized in place (caller should normalize if they care about strict cosine).
Safe for concurrent use: all public methods take the RWMutex.
func NewDefaultHNSW ¶
func NewDefaultHNSW() *HNSW
NewDefaultHNSW returns an index configured with the conventional M=16, efSearch=50, efConstruction=200 parameters.
func NewHNSW ¶
NewHNSW constructs an empty index with explicit parameters. m is the max neighbors per node (>=4, default 16 when <=0). efConstr is the ef parameter used during Add (default 200). efSearch is the ef parameter used at query time (default 50).
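The defaulting rules can be pictured with the sketch below. It is not the package's code: resolveParams and params are hypothetical names, and two behaviors are assumptions on top of the text above, namely that a positive m below 4 is raised to 4 and that the ef defaults apply when the argument is <=0.

```go
package main

import "fmt"

// params is a hypothetical holder for the resolved configuration.
type params struct {
	M, EfConstruction, EfSearch int
}

// resolveParams applies the documented defaults. Raising 0 < m < 4 to 4 and
// defaulting the ef values when <= 0 are assumptions, not confirmed behavior.
func resolveParams(m, efConstr, efSearch int) params {
	switch {
	case m <= 0:
		m = 16 // documented default
	case m < 4:
		m = 4 // assumption: enforce the documented minimum
	}
	if efConstr <= 0 {
		efConstr = 200
	}
	if efSearch <= 0 {
		efSearch = 50
	}
	return params{M: m, EfConstruction: efConstr, EfSearch: efSearch}
}

func main() {
	fmt.Printf("%+v\n", resolveParams(0, 0, 0))
	fmt.Printf("%+v\n", resolveParams(32, 400, 100))
}
```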
type Index ¶
type Index interface {
// Add inserts (or replaces) a vector under id. vec must match the
// dimensionality of previously-added vectors; if not, an error is
// returned (first Add establishes the dimension).
Add(id string, vec []float32) error
// Remove drops id from the index. Missing ids are a no-op.
Remove(id string) error
// Search returns the top-k most similar vectors to query, ordered by
// descending cosine similarity.
Search(query []float32, k int) ([]Hit, error)
// Size returns the number of vectors currently in the index.
Size() int
// Dims returns the vector dimension (0 before first Add).
Dims() int
}
Index is the vector-index interface. Implementations must be safe for concurrent use by multiple goroutines.
func BuildFromStore ¶
BuildFromStore scans every (chunk, embedding) row in st and loads it into a fresh HNSW index. The embedding model is auto-detected from the first row; if multiple models coexist, only the dominant one is indexed (warning logged).
Returns an empty index (Size()==0) when there are no embeddings yet; this is the common case on a fresh install and is not an error.
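One way to picture the "dominant model" rule is a simple frequency count over the stored rows. The sketch below is an assumption about what "dominant" means (the most frequent model name) and does not reflect the package's actual logic or the store's schema; row and dominantModel are hypothetical names, and the lexicographic tie-break exists only to keep the example deterministic.

```go
package main

import "fmt"

// row is a hypothetical shape for one stored (chunk, embedding) pair.
type row struct {
	ChunkID string
	Model   string
	Vec     []float32
}

// dominantModel returns the most frequent model name and its row count.
// Ties go to the lexicographically smaller name. Empty input yields ("", 0).
func dominantModel(rows []row) (string, int) {
	counts := map[string]int{}
	for _, r := range rows {
		counts[r.Model]++
	}
	best, bestN := "", 0
	for m, n := range counts {
		if n > bestN || (n == bestN && m < best) {
			best, bestN = m, n
		}
	}
	return best, bestN
}

func main() {
	rows := []row{
		{ChunkID: "c1", Model: "model-a", Vec: []float32{1, 0}},
		{ChunkID: "c2", Model: "model-b", Vec: []float32{0, 1, 0}},
		{ChunkID: "c3", Model: "model-a", Vec: []float32{0, 1}},
	}
	model, n := dominantModel(rows)
	fmt.Printf("indexing %d rows from %s, skipping %d\n", n, model, len(rows)-n)
}
```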