Documentation
Overview
Package embeddings provides a client for the local embedding service. It also provides utilities for loading pre-computed embedding datasets, such as SIFT1M and GloVe, for benchmarking the private vector search system.
Index
- func LoadBvecs(path string) ([][]float64, error)
- func LoadFvecs(path string) ([][]float64, error)
- func LoadIvecs(path string) ([][]int, error)
- func ReadBvecs(r io.Reader) ([][]float64, error)
- func ReadFvecs(r io.Reader) ([][]float64, error)
- func ReadIvecs(r io.Reader) ([][]int, error)
- func SaveFvecs(path string, vectors [][]float64) error
- func WriteFvecs(w io.Writer, vectors [][]float64) error
- type Client
- type Config
- type Dataset
- type DatasetStats
- type EmbedRequest
- type EmbedResponse
Constants
This section is empty.
Variables
This section is empty.
Functions
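func LoadBvecs
func LoadBvecs(path string) ([][]float64, error)
LoadBvecs loads byte vectors from a .bvecs file (used by some datasets, such as the SIFT1B/BIGANN base set). Each uint8 component is returned as a float64.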
BVECS format: For each vector:
- 4 bytes: dimension (int32, little-endian)
- dimension bytes: uint8 values
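func LoadFvecs
func LoadFvecs(path string) ([][]float64, error)
LoadFvecs loads float vectors from a .fvecs file (used by SIFT1M, among others). Each float32 component is returned as a float64.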
FVECS format: For each vector:
- 4 bytes: dimension (int32, little-endian)
- dimension * 4 bytes: float32 values (little-endian)
All vectors must have the same dimension.
func LoadIvecs
func LoadIvecs(path string) ([][]int, error)
LoadIvecs loads integer vectors from a .ivecs file (used for ground-truth neighbor lists).
IVECS format (same structure as FVECS but with int32 values): For each vector:
- 4 bytes: dimension (int32, little-endian)
- dimension * 4 bytes: int32 values (little-endian)
Types
type Client
type Client struct {
	// contains filtered or unexported fields
}
Client is a client for the embedding service.
func (*Client) Embed
func (c *Client) Embed(texts []string) (*EmbedResponse, error)
Embed generates embeddings for the given texts.
func (*Client) EmbedSingle
EmbedSingle generates an embedding for a single text.
func (*Client) EmbedWithOptions
func (c *Client) EmbedWithOptions(texts []string, normalize bool) (*EmbedResponse, error)
EmbedWithOptions is like Embed but also accepts options; the only option is normalize, which selects whether the returned embeddings are normalized.
type Dataset
type Dataset struct {
	// Name of the dataset (e.g., "sift1m", "glove")
	Name string
	// Dimension of each vector
	Dimension int
	// Vectors is the main dataset (base vectors)
	Vectors [][]float64
	// IDs for each vector (optional, auto-generated if not provided)
	IDs []string
	// Queries is the query set (for benchmarking)
	Queries [][]float64
	// GroundTruth contains the true nearest neighbors for each query:
	// GroundTruth[i] holds the indices into Vectors of the nearest neighbors of Queries[i]
	GroundTruth [][]int
}
Dataset represents a loaded embedding dataset with optional ground truth.
func Generate
Generate creates a synthetic dataset with random vectors. It is useful for testing when real datasets are not available.
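Generate's parameters and distribution are not documented here. A minimal sketch of the idea follows, assuming components drawn uniformly from [-1, 1) with a fixed seed for reproducibility, plus exact brute-force neighbors under squared Euclidean distance to produce GroundTruth-style rows. randomVectorsSketch and exactNeighborsSketch are hypothetical names; the package may generate vectors differently and may not compute ground truth at all.

```go
package main

import (
	"math/rand"
	"sort"
)

// randomVectorsSketch (hypothetical) returns n vectors of the given dimension
// with components uniform in [-1, 1). The same seed yields the same vectors.
func randomVectorsSketch(n, dim int, seed int64) [][]float64 {
	rng := rand.New(rand.NewSource(seed))
	vectors := make([][]float64, n)
	for i := range vectors {
		vec := make([]float64, dim)
		for j := range vec {
			vec[j] = rng.Float64()*2 - 1
		}
		vectors[i] = vec
	}
	return vectors
}

// exactNeighborsSketch (hypothetical) returns, for each query, the indices of
// its k nearest base vectors by squared Euclidean distance, closest first;
// ties keep the lower index first.
func exactNeighborsSketch(base, queries [][]float64, k int) [][]int {
	if k < 0 {
		k = 0
	}
	if k > len(base) {
		k = len(base)
	}
	result := make([][]int, len(queries))
	for qi, q := range queries {
		idx := make([]int, len(base))
		dist := make([]float64, len(base))
		for i, v := range base {
			idx[i] = i
			for j := range q {
				d := q[j] - v[j]
				dist[i] += d * d
			}
		}
		sort.SliceStable(idx, func(a, b int) bool { return dist[idx[a]] < dist[idx[b]] })
		result[qi] = idx[:k]
	}
	return result
}
```

Brute force costs O(queries × vectors × dimension), which is fine for small synthetic sets but not for SIFT1M-scale data, one reason such datasets ship precomputed ground-truth files.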
func GloVe
GloVe loads GloVe word vectors from a text file. Each line holds a word followed by its N space-separated components: word dim1 dim2 dim3 ... dimN
Download from: https://nlp.stanford.edu/projects/glove/
func SIFT1M
SIFT1M loads the SIFT1M dataset from the given directory. Expected files:
- sift_base.fvecs: 1M base vectors (128-dim)
- sift_query.fvecs: 10K query vectors (128-dim)
- sift_groundtruth.ivecs: ground-truth nearest neighbors (100 per query)
Download from: http://corpus-texmex.irisa.fr/
func SIFT10K
SIFT10K loads a smaller subset with 10K base vectors for quick testing. It uses the same file format as SIFT1M.
func (*Dataset) Stats
func (d *Dataset) Stats() DatasetStats
Stats returns statistics about the dataset.
type DatasetStats
type DatasetStats struct {
	Name             string
	NumVectors       int
	NumQueries       int
	Dimension        int
	HasGroundTruth   bool
	GroundTruthDepth int // Number of ground truth neighbors per query
}
DatasetStats contains summary statistics about a dataset.
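One plausible derivation of these fields from a Dataset is sketched below. The Dataset and DatasetStats declarations are local copies of the documented fields so the sketch compiles on its own; statsSketch is hypothetical, and taking GroundTruthDepth from the first ground-truth row is an assumption, not the package's documented rule.

```go
package main

// Local copies of the documented fields (for a self-contained sketch).
type Dataset struct {
	Name        string
	Dimension   int
	Vectors     [][]float64
	IDs         []string
	Queries     [][]float64
	GroundTruth [][]int
}

type DatasetStats struct {
	Name             string
	NumVectors       int
	NumQueries       int
	Dimension        int
	HasGroundTruth   bool
	GroundTruthDepth int
}

// statsSketch (hypothetical) fills DatasetStats from slice lengths; the depth
// is read from the first ground-truth row and is 0 when there is none.
func statsSketch(d *Dataset) DatasetStats {
	s := DatasetStats{
		Name:           d.Name,
		NumVectors:     len(d.Vectors),
		NumQueries:     len(d.Queries),
		Dimension:      d.Dimension,
		HasGroundTruth: len(d.GroundTruth) > 0,
	}
	if s.HasGroundTruth {
		s.GroundTruthDepth = len(d.GroundTruth[0])
	}
	return s
}
```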
type EmbedRequest
EmbedRequest is the request to the embedding service.