lsh

package
v1.0.0-rc.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 13, 2026 License: Apache-2.0, Apache-2.0 Imports: 5 Imported by: 0

Documentation

Overview

Package lsh provides a Locality-Sensitive Hashing index for fast approximate nearest-neighbor retrieval of MinHash signatures.

LSH groups similar MinHash signatures into the same buckets by hashing bands of consecutive hash values. This enables O(N) indexing and sublinear query time, replacing O(N^2) pairwise comparison.

The index is parameterized by numBands and numRows where numBands * numRows = numHashes. Higher numBands lowers the similarity threshold for candidate retrieval.

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrInvalidParams is returned when numBands or numRows is not positive.
	ErrInvalidParams = errors.New("lsh: numBands and numRows must be positive")

	// ErrNilSignature is returned when a nil signature is provided.
	ErrNilSignature = errors.New("lsh: signature must not be nil")

	// ErrSizeMismatch is returned when signature size does not match numBands * numRows.
	ErrSizeMismatch = errors.New("lsh: signature size must equal numBands * numRows")
)

Functions

This section is empty.

Types

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index is a thread-safe LSH index for approximate nearest-neighbor retrieval.

func New

func New(numBands, numRows int) (*Index, error)

New creates a new LSH index with the given number of bands and rows per band. The total number of hash functions expected from signatures is numBands * numRows.

func (*Index) Insert

func (idx *Index) Insert(id string, sig *minhash.Signature) error

Insert adds a signature to the index with the given identifier. Returns an error if sig is nil or its size does not match numBands * numRows.

func (*Index) Query

func (idx *Index) Query(sig *minhash.Signature) ([]string, error)

Query returns deduplicated candidate IDs whose signatures share at least one band hash with the query signature.

func (*Index) QueryThreshold

func (idx *Index) QueryThreshold(sig *minhash.Signature, threshold float64) ([]string, error)

QueryThreshold returns candidate IDs whose exact MinHash similarity with the query signature is at or above the given threshold.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL