embeddings

package
v0.1.0
Published: Mar 16, 2026 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package embeddings provides a client for the local embedding service, together with utilities for loading pre-computed embedding datasets (SIFT1M, GloVe, etc.) for benchmarking the private vector search system.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func LoadBvecs

func LoadBvecs(path string) ([][]float64, error)

LoadBvecs loads byte vectors from a .bvecs file (used by some datasets).

BVECS format: For each vector:

  • 4 bytes: dimension (int32, little-endian)
  • dimension bytes: uint8 values

func LoadFvecs

func LoadFvecs(path string) ([][]float64, error)

LoadFvecs loads vectors from a .fvecs file (used by SIFT1M, etc.). Values are stored as float32 and returned as float64.

FVECS format: For each vector:

  • 4 bytes: dimension (int32, little-endian)
  • dimension * 4 bytes: float32 values (little-endian)

All vectors must have the same dimension.

func LoadIvecs

func LoadIvecs(path string) ([][]int, error)

LoadIvecs loads integer vectors from a .ivecs file (used for ground truth).

IVECS format (same structure as FVECS but with int32 values): For each vector:

  • 4 bytes: dimension (int32, little-endian)
  • dimension * 4 bytes: int32 values (little-endian)
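Because every IVECS record occupies 4 + 4 * dimension bytes, a buffer that is already in memory can be decoded by offset arithmetic. The sketch below illustrates that layout; decodeIvecsSketch is a hypothetical helper, not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeIvecsSketch is a hypothetical helper that walks a byte slice
// record by record: a little-endian int32 dimension, then that many
// little-endian int32 values.
func decodeIvecsSketch(data []byte) ([][]int, error) {
	var out [][]int
	for off := 0; off < len(data); {
		if len(data)-off < 4 {
			return nil, fmt.Errorf("truncated header at offset %d", off)
		}
		dim := int(int32(binary.LittleEndian.Uint32(data[off:])))
		off += 4
		// Compare against the remaining length divided by 4 to avoid overflow.
		if dim <= 0 || dim > (len(data)-off)/4 {
			return nil, fmt.Errorf("invalid or truncated record at offset %d", off-4)
		}
		vec := make([]int, dim)
		for i := range vec {
			vec[i] = int(int32(binary.LittleEndian.Uint32(data[off+4*i:])))
		}
		off += 4 * dim
		out = append(out, vec)
	}
	return out, nil
}

func main() {
	// One record of dimension 2 holding the values 5 and -1.
	data := []byte{2, 0, 0, 0, 5, 0, 0, 0, 0xff, 0xff, 0xff, 0xff}
	vecs, err := decodeIvecsSketch(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(vecs) // [[5 -1]]
}
```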

func ReadBvecs

func ReadBvecs(r io.Reader) ([][]float64, error)

ReadBvecs reads byte vectors from an io.Reader in BVECS format.

func ReadFvecs

func ReadFvecs(r io.Reader) ([][]float64, error)

ReadFvecs reads vectors from an io.Reader in FVECS format.

func ReadIvecs

func ReadIvecs(r io.Reader) ([][]int, error)

ReadIvecs reads integer vectors from an io.Reader in IVECS format.

func SaveFvecs

func SaveFvecs(path string, vectors [][]float64) error

SaveFvecs saves vectors to a .fvecs file.

func WriteFvecs

func WriteFvecs(w io.Writer, vectors [][]float64) error

WriteFvecs writes vectors to an io.Writer in FVECS format.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is a client for the embedding service.

func NewClient

func NewClient(cfg Config) *Client

NewClient creates a new embedding client.

func (*Client) Embed

func (c *Client) Embed(texts []string) (*EmbedResponse, error)

Embed generates embeddings for the given texts.

func (*Client) EmbedSingle

func (c *Client) EmbedSingle(text string) ([]float64, error)

EmbedSingle generates an embedding for a single text.

func (*Client) EmbedWithOptions

func (c *Client) EmbedWithOptions(texts []string, normalize bool) (*EmbedResponse, error)

EmbedWithOptions generates embeddings with custom options.

func (*Client) Health

func (c *Client) Health() error

Health checks if the embedding service is healthy.

type Config

type Config struct {
	BaseURL string
	Timeout time.Duration
}

Config holds the client configuration.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns the default configuration.

type Dataset

type Dataset struct {
	// Name of the dataset (e.g., "sift1m", "glove")
	Name string

	// Dimension of each vector
	Dimension int

	// Vectors is the main dataset (base vectors)
	Vectors [][]float64

	// IDs for each vector (optional, auto-generated if not provided)
	IDs []string

	// Queries is the query set (for benchmarking)
	Queries [][]float64

	// GroundTruth contains the true nearest neighbors for each query
	// GroundTruth[i] = indices of nearest neighbors for Queries[i]
	GroundTruth [][]int
}

Dataset represents a loaded embedding dataset with optional ground truth.
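One common use of GroundTruth in benchmarking is computing recall@k for the results returned by a search system. The package does not document such a helper; recallAtK below is a hypothetical sketch that takes per-query result indices and the matching GroundTruth rows.

```go
package main

import "fmt"

// recallAtK is a hypothetical helper, not part of this package.
// results[i] and groundTruth[i] hold base-vector indices for query i.
// It returns the fraction of the true top-k neighbors that appear
// among the first k returned results, averaged over all queries.
func recallAtK(results, groundTruth [][]int, k int) float64 {
	if k <= 0 || len(results) == 0 || len(results) != len(groundTruth) {
		return 0
	}
	hits, total := 0, 0
	for q := range results {
		truth := groundTruth[q]
		if len(truth) > k {
			truth = truth[:k]
		}
		want := make(map[int]struct{}, len(truth))
		for _, id := range truth {
			want[id] = struct{}{}
		}
		got := results[q]
		if len(got) > k {
			got = got[:k]
		}
		for _, id := range got {
			if _, ok := want[id]; ok {
				hits++
				delete(want, id) // count each true neighbor at most once
			}
		}
		total += len(truth)
	}
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

func main() {
	results := [][]int{{1, 2, 3}, {4, 5, 6}}
	truth := [][]int{{1, 3, 9}, {7, 8, 4}}
	fmt.Println(recallAtK(results, truth, 3)) // 0.5
}
```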

func FromFvecs

func FromFvecs(path string, name string) (*Dataset, error)

FromFvecs loads a dataset from a single .fvecs file. The resulting dataset has no queries or ground truth.

func Generate

func Generate(n, dim int, seed int64) *Dataset

Generate creates a synthetic dataset with random vectors. Useful for testing when real datasets aren't available.
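The value of the seed parameter is reproducibility: the same seed should yield the same vectors on every run. A minimal sketch of that idea follows. generateSketch is a hypothetical name, and drawing components from a standard normal distribution is an assumption; the distribution used by Generate is not documented here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateSketch is a hypothetical stand-in for Generate.
// It fills n vectors of length dim from a seeded source, so the
// output is deterministic for a given seed.
// Assumption: components are standard normal samples.
func generateSketch(n, dim int, seed int64) [][]float64 {
	rng := rand.New(rand.NewSource(seed))
	vecs := make([][]float64, n)
	for i := range vecs {
		vecs[i] = make([]float64, dim)
		for j := range vecs[i] {
			vecs[i][j] = rng.NormFloat64()
		}
	}
	return vecs
}

func main() {
	a := generateSketch(3, 4, 42)
	b := generateSketch(3, 4, 42)
	fmt.Println(len(a), len(a[0])) // 3 4
	fmt.Println(a[0][0] == b[0][0]) // true: same seed, same data
}
```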

func GloVe

func GloVe(path string) (*Dataset, error)

GloVe loads GloVe word vectors from a text file. Each line holds a word followed by its space-separated vector components: word dim1 dim2 dim3 ... dimN

Download from: https://nlp.stanford.edu/projects/glove/

func SIFT1M

func SIFT1M(dir string) (*Dataset, error)

SIFT1M loads the SIFT1M dataset from the given directory. Expected files:

  • sift_base.fvecs: 1M base vectors (128-dim)
  • sift_query.fvecs: 10K query vectors
  • sift_groundtruth.ivecs: Ground truth nearest neighbors

Download from: http://corpus-texmex.irisa.fr/

func SIFT10K

func SIFT10K(dir string) (*Dataset, error)

SIFT10K loads a smaller subset for quick testing. It uses the same file format as SIFT1M but contains 10K vectors.

func (*Dataset) Stats

func (d *Dataset) Stats() DatasetStats

Stats returns statistics about the dataset.

func (*Dataset) Subset

func (d *Dataset) Subset(n int) *Dataset

Subset returns a subset of the dataset with the first n vectors. Useful for testing with smaller datasets.

type DatasetStats

type DatasetStats struct {
	Name             string
	NumVectors       int
	NumQueries       int
	Dimension        int
	HasGroundTruth   bool
	GroundTruthDepth int // Number of ground truth neighbors per query
}

DatasetStats contains summary statistics about a dataset.

type EmbedRequest

type EmbedRequest struct {
	Texts     []string `json:"texts"`
	Normalize bool     `json:"normalize"`
}

EmbedRequest is the request to the embedding service.

type EmbedResponse

type EmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
	Dimension  int         `json:"dimension"`
	Model      string      `json:"model"`
	LatencyMs  float64     `json:"latency_ms"`
}

EmbedResponse is the response from the embedding service.
