vectorindex

package
v0.1.5 Latest
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package vectorindex provides an in-memory HNSW vector index for fast approximate nearest-neighbor search over chunk/entity embeddings.

It replaces the previous O(n) brute-force scan (store.CosineSimilarity over every chunk) with an HNSW lookup of roughly logarithmic cost, while keeping the SQLite store as the single source of truth: the index lives only in memory and is rebuilt from the stored embedding BLOBs at startup via BuildFromStore. Concurrent reads and writes are safe (guarded by a sync.RWMutex).
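Rebuilding from BLOBs means each stored embedding must be decoded back into a []float32. The documentation does not specify the BLOB layout, so the sketch below assumes a hypothetical packed little-endian float32 encoding (4 bytes per dimension); encodeEmbedding and decodeEmbedding are illustrative helpers, not part of this package's API.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeEmbedding packs vec as little-endian float32 values, 4 bytes each.
// ASSUMPTION: hypothetical layout; the store's real BLOB format is not documented.
func encodeEmbedding(vec []float32) []byte {
	buf := make([]byte, 4*len(vec))
	for i, f := range vec {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeEmbedding reverses encodeEmbedding and rejects BLOBs whose length
// is not a whole number of float32 values.
func decodeEmbedding(blob []byte) ([]float32, error) {
	if len(blob)%4 != 0 {
		return nil, fmt.Errorf("embedding blob: length %d is not a multiple of 4", len(blob))
	}
	vec := make([]float32, len(blob)/4)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(blob[4*i:]))
	}
	return vec, nil
}
```

Whatever the real encoding is, a length check like the one above is a cheap guard against truncated rows before they reach Add.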

Index

Constants

This section is empty.

Variables

var ErrDimMismatch = errors.New("vectorindex: vector dimension mismatch")

ErrDimMismatch is returned when a vector's dimension doesn't match the index's established dimension.

var ErrEmptyQuery = errors.New("vectorindex: empty query vector")

ErrEmptyQuery is returned when Search is called with an empty query vector.

Functions

This section is empty.

Types

type HNSW

type HNSW struct {
	// contains filtered or unexported fields
}

HNSW is a small, pure-Go Hierarchical Navigable Small World index.

It replaces the initial coder/hnsw wrapper, which shipped with broken top-k eviction: Max() on a min-heap returned data[last], so eviction removed a near-arbitrary candidate instead of the worst one. See: https://github.com/coder/hnsw/blob/v0.6.1/heap/heap.go#L85

This implementation follows Malkov & Yashunin (arXiv:1603.09320) with the conventional simplifications used by many open-source implementations (FAISS, hnswlib):

  • Layer 0 contains every node; each higher layer contains a random subset of the layer below, with an expected size that decays exponentially with height (level multiplier mL = 1/ln(M)).
  • The dynamic candidate list uses a two-heap strategy: a max-heap of the top-k results found so far, and a min-heap of candidates to expand next.
  • Cosine similarity is the default metric. Vectors are not normalized in place, so callers that need strict cosine semantics should normalize them first.

Safe for concurrent use: all public methods take the RWMutex.

func NewDefaultHNSW

func NewDefaultHNSW() *HNSW

NewDefaultHNSW returns an index with the conventional configuration: M=16, efConstruction=200, efSearch=50.

func NewHNSW

func NewHNSW(m, efConstr, efSearch int) *HNSW

NewHNSW constructs an empty index with explicit parameters: m is the maximum number of neighbors per node (>= 4; 16 is used when m <= 0), efConstr is the candidate-list size (ef) used during Add (default 200), and efSearch is the candidate-list size used at query time (default 50).
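A sketch of how these defaults might be resolved, under stated assumptions: resolveParams, effectiveEf and hnswParams are hypothetical, the documentation only states the m <= 0 trigger explicitly (the same trigger is assumed for both ef values), and raising an m of 1 to 3 up to the minimum of 4 is a guess. effectiveEf follows hnswlib's convention of searching with at least k candidates, since a beam narrower than k cannot return k hits; whether this package does the same is not documented.

```go
package main

// hnswParams is a hypothetical container for the three tuning knobs.
type hnswParams struct {
	M, EfConstruction, EfSearch int
}

// resolveParams applies the documented defaults to non-positive inputs.
// ASSUMPTIONS: the <= 0 trigger also applies to the ef values, and an m
// between 1 and 3 is raised to the documented minimum of 4.
func resolveParams(m, efConstr, efSearch int) hnswParams {
	if m <= 0 {
		m = 16
	} else if m < 4 {
		m = 4
	}
	if efConstr <= 0 {
		efConstr = 200
	}
	if efSearch <= 0 {
		efSearch = 50
	}
	return hnswParams{M: m, EfConstruction: efConstr, EfSearch: efSearch}
}

// effectiveEf widens the query-time beam to at least k
// (hnswlib convention; ASSUMPTION for this package).
func effectiveEf(efSearch, k int) int {
	if k > efSearch {
		return k
	}
	return efSearch
}
```

Under these assumptions, NewHNSW(0, 0, 0) would behave like NewDefaultHNSW().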

func (*HNSW) Add

func (h *HNSW) Add(id string, vec []float32) error

Add inserts (or replaces) a vector under id.

func (*HNSW) Dims

func (h *HNSW) Dims() int

Dims returns the dimensionality of indexed vectors (0 before first Add).

func (*HNSW) Remove

func (h *HNSW) Remove(id string) error

Remove drops id from the index. Missing ids are a no-op.

func (*HNSW) Search

func (h *HNSW) Search(query []float32, k int) ([]Hit, error)

Search returns the top-k nearest neighbors to query (by cosine similarity).

func (*HNSW) Size

func (h *HNSW) Size() int

Size returns the number of indexed vectors.

type Hit

type Hit struct {
	ID    string
	Score float32 // cosine similarity in [-1, 1]; 1.0 == identical
}

Hit is a single search result.

type Index

type Index interface {
	// Add inserts (or replaces) a vector under id. vec must match the
	// dimensionality of previously-added vectors; if not, an error is
	// returned (first Add establishes the dimension).
	Add(id string, vec []float32) error
	// Remove drops id from the index. Missing ids are a no-op.
	Remove(id string) error
	// Search returns the top-k most similar vectors to query, ordered by
	// descending cosine similarity.
	Search(query []float32, k int) ([]Hit, error)
	// Size returns the number of vectors currently in the index.
	Size() int
	// Dims returns the vector dimension (0 before first Add).
	Dims() int
}

Index is the vector-index interface. Implementations must be safe for concurrent use by multiple goroutines.

func BuildFromStore

func BuildFromStore(ctx context.Context, st *store.Store) (Index, error)

BuildFromStore scans every (chunk, embedding) row in st and loads it into a fresh HNSW index. The embedding model is auto-detected from the first row; if rows from several models coexist, only the dominant model's embeddings are indexed and a warning is logged.

Returns an empty index (Size()==0) when there are no embeddings yet. This is the common case on a fresh install and is not an error.
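The model-selection rule can be sketched as below. embeddingRow and selectRows are hypothetical (the store schema is not shown here), and "dominant" is assumed to mean "has the most rows", with ties going to the model seen first so that a single-model store resolves to the first row's model. An empty input yields an empty selection, matching the empty-index case described above.

```go
package main

// embeddingRow is a hypothetical stand-in for one (chunk, embedding) row.
type embeddingRow struct {
	ChunkID string
	Model   string
	Vec     []float32
}

// selectRows picks the dominant model and returns its rows plus the number
// of rows skipped because they belong to other models.
// ASSUMPTION: "dominant" means "most rows"; ties go to the first-seen model.
func selectRows(rows []embeddingRow) (model string, keep []embeddingRow, skipped int) {
	counts := map[string]int{}
	var order []string // models in first-seen order
	for _, r := range rows {
		if counts[r.Model] == 0 {
			order = append(order, r.Model)
		}
		counts[r.Model]++
	}
	best := 0
	for _, m := range order {
		if counts[m] > best { // strict >: earlier models win ties
			model, best = m, counts[m]
		}
	}
	for _, r := range rows {
		if r.Model == model {
			keep = append(keep, r)
		} else {
			skipped++ // a caller would log a warning when skipped > 0
		}
	}
	return model, keep, skipped
}
```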
