hnsw

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 8, 2024 License: CC0-1.0 Imports: 7 Imported by: 5

README

hnsw

GoDoc

Package hnsw implements Hierarchical Navigable Small World graphs in Go. You can read up about how they work here. In essence, they allow for fast approximate nearest neighbor searches with high-dimensional vector data.

This package can be thought of as an in-memory alternative to your favorite vector database (e.g. Pinecone, Weaviate). Granted, it implements just the essential operations:

Operation Complexity Description
Insert $O(log(n))$ Insert a vector into the graph
Delete $O(M^2 \cdot log(n))$ Delete a vector from the graph
Search $O(log(n))$ Search for the nearest neighbors of a vector
Lookup $O(1)$ Retrieve an object by ID

Note: Complexities are approximate where $n$ is the number of vectors in the graph and $M$ is the maximum number of neighbors each node can have. This paper is a good resource for understanding the effect of the various construction parameters.

Usage:

g := hnsw.NewGraph[hnsw.Vector]()
g.Add(
    hnsw.MakeVector("1", []float32{1, 1, 1}),
    hnsw.MakeVector("2", []float32{1, -1, 0.999}),
    hnsw.MakeVector("3", []float32{1, 0, -0.5}),
)

neighbors := g.Search(
    []float32{0.5, 0.5, 0.5},
    1,
)
fmt.Printf("best friend: %v\n", neighbors[0].Embedding())
// Output: best friend: [1 1 1]

Performance

By and large the greatest effect you can have on the performance of the graph is reducing the dimensionality of your data. At 1536 dimensions (OpenAI default), 70% of the query process under default parameters is spent in the distance function.

Roadmap

  • Implement durable, file-system backend

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CosineDistance

func CosineDistance(a, b []float32) float32

CosineDistance computes the cosine distance between two vectors.

func EuclideanDistance

func EuclideanDistance(a, b []float32) float32

EuclideanDistance computes the Euclidean distance between two vectors.

Types

type Analyzer

type Analyzer[T Embeddable] struct {
	Graph *Graph[T]
}

Analyzer is a struct that holds a graph and provides methods for analyzing it.

func (*Analyzer[T]) Connectivity

func (a *Analyzer[T]) Connectivity() []float64

Connectivity returns the average number of edges in the graph for each non-empty layer.

func (*Analyzer[T]) Height

func (a *Analyzer[T]) Height() int

func (*Analyzer[T]) Topography

func (a *Analyzer[T]) Topography() []int

Topography returns the number of nodes in each layer of the graph.

type DistanceFunc

type DistanceFunc func(a, b []float32) float32

DistanceFunc is a function that computes the distance between two vectors.

type Embeddable

type Embeddable interface {
	// ID returns a unique identifier for the object.
	ID() string
	// Embedding returns the embedding of the object.
	// float32 is used for compatibility with OpenAI embeddings.
	Embedding() Embedding
}

Embeddable describes a type that can be embedded in a HNSW graph.

type Embedding

type Embedding = []float32

type Graph

type Graph[T Embeddable] struct {
	// M is the maximum number of neighbors to keep for each node.
	M int
	// Distance is the distance function used to compare embeddings.
	Distance DistanceFunc

	// Ml is the level generation factor. E.g. 1 / log(Ml) is the probability
	// of adding a node to a level.
	Ml float64

	// EfSearch is the number of nodes to consider in the search phase.
	EfSearch int

	// Rng is used for level generation. It may be set to a deterministic value
	// for reproducibility. Note that deterministic number generation can lead to
	// degenerate graphs when exposed to adversarial inputs.
	Rng *rand.Rand
	// contains filtered or unexported fields
}

Graph is a Hierarchical Navigable Small World graph. All public parameters must be set before adding nodes to the graph.

func NewGraph

func NewGraph[T Embeddable]() *Graph[T]

NewGraph returns a new graph with default parameters, roughly designed for storing OpenAI embeddings.

func (*Graph[T]) Add

func (g *Graph[T]) Add(nodes ...T)

Add inserts nodes into the graph. If another node with the same ID exists, it is replaced.

func (*Graph[T]) Delete

func (h *Graph[T]) Delete(id string)

Delete removes a node from the graph by ID. It tries to preserve the clustering properties of the graph by replenishing the affected neighborhoods.

func (*Graph[T]) Len

func (h *Graph[T]) Len() int

Len returns the number of nodes in the graph.

func (*Graph[T]) Lookup

func (h *Graph[T]) Lookup(id string) (T, bool)

Lookup returns the node with the given ID.

func (*Graph[T]) Search

func (h *Graph[T]) Search(near Embedding, k int) []T

Search finds the k nearest neighbors to the target node.

type Vector

type Vector struct {
	// contains filtered or unexported fields
}

Vector is a struct that holds an ID and an embedding and implements the Embeddable interface.

func MakeVector

func MakeVector(id string, embedding []float32) Vector

MakeVector creates a new Vector with the given ID and embedding.

func (Vector) Embedding

func (v Vector) Embedding() []float32

func (Vector) ID

func (v Vector) ID() string

Directories

Path Synopsis
example
readme command

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL