Documentation ¶
Overview ¶
Package opaque provides privacy-preserving vector search using homomorphic encryption.
Opaque encrypts vectors with AES-256-GCM, scores queries against cluster centroids using CKKS homomorphic encryption (so the server never sees the query), and hides access patterns with decoy bucket fetches.
Quick Start ¶
db, err := opaque.NewDB(opaque.Config{
	Dimension:   128,
	NumClusters: 64,
})
if err != nil {
	log.Fatal(err)
}
defer db.Close()

// Add vectors
db.Add(ctx, "doc-1", vector1)
db.Add(ctx, "doc-2", vector2)

// Build the index (runs k-means clustering + initializes HE engines)
if err := db.Build(ctx); err != nil {
	log.Fatal(err)
}

// Search
results, err := db.Search(ctx, queryVector, 10)
Lifecycle ¶
The DB follows a three-phase lifecycle:
- Add vectors with DB.Add or DB.AddBatch
- Build the index with DB.Build (expensive: k-means clustering + HE engine initialization)
- Search with DB.Search (safe for concurrent use)
K-means clustering requires all vectors upfront, so DB.Build must be called after all vectors are added. To add vectors after building, use DB.Add followed by DB.Rebuild.
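The add-after-build flow described above might look like this (a sketch; `ctx` and the vectors are assumed to already exist):

```go
// Initial build from the buffered vectors.
db.Add(ctx, "doc-1", v1)
if err := db.Build(ctx); err != nil {
	log.Fatal(err)
}

// Adding to a built DB buffers the vector for the next rebuild...
db.Add(ctx, "doc-2", v2)

// ...and Rebuild re-indexes everything, including the new vector.
if err := db.Rebuild(ctx); err != nil {
	log.Fatal(err)
}
```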
Index ¶
- Variables
- type ClusterStats
- type Config
- type DB
- func (db *DB) Add(ctx context.Context, id string, vector []float64) error
- func (db *DB) AddBatch(ctx context.Context, ids []string, vectors [][]float64) error
- func (db *DB) AddBatchWithMetadata(ctx context.Context, ids []string, vectors [][]float64, metadatas []Metadata) error
- func (db *DB) AddWithMetadata(ctx context.Context, id string, vector []float64, meta Metadata) error
- func (db *DB) Build(ctx context.Context) error
- func (db *DB) Close() error
- func (db *DB) ClusterStats() ClusterStats
- func (db *DB) Count(ctx context.Context) int
- func (db *DB) Delete(ctx context.Context, id string) error
- func (db *DB) Get(ctx context.Context, id string) ([]float64, error)
- func (db *DB) GetConfig() Config
- func (db *DB) GetMetadata(ctx context.Context, id string) (Metadata, error)
- func (db *DB) Has(ctx context.Context, id string) bool
- func (db *DB) IsReady() bool
- func (db *DB) List(ctx context.Context, offset, limit int) ([]string, error)
- func (db *DB) Rebuild(ctx context.Context) error
- func (db *DB) Save(path string) error
- func (db *DB) Search(ctx context.Context, query []float64, topK int) ([]Result, error)
- func (db *DB) SearchWithFilter(ctx context.Context, query []float64, topK int, filter Filter) ([]Result, error)
- func (db *DB) Size() int
- func (db *DB) Stats(ctx context.Context) DBStats
- func (db *DB) Update(ctx context.Context, id string, vector []float64) error
- type DBStats
- type Filter
- type Metadata
- type Result
- type StorageBackend
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrNotBuilt is returned when Search is called before Build.
	ErrNotBuilt = errors.New("opaque: index not built")

	// ErrAlreadyBuilt is returned when Add/AddBatch is called after Build.
	ErrAlreadyBuilt = errors.New("opaque: index already built; use Rebuild to add vectors")

	// ErrDimensionMismatch is returned when a vector has the wrong dimension.
	ErrDimensionMismatch = errors.New("opaque: dimension mismatch")

	// ErrNotFound is returned when a vector ID is not found.
	ErrNotFound = errors.New("opaque: vector not found")

	// ErrEmptyID is returned when an empty vector ID is provided.
	ErrEmptyID = errors.New("opaque: empty vector ID")

	// ErrNoVectors is returned when Build is called with no buffered vectors.
	ErrNoVectors = errors.New("opaque: no vectors added")

	// ErrNotReady is returned when an operation requires a built index but
	// the DB is not in the ready state (e.g., Save before Build).
	ErrNotReady = errors.New("opaque: database not ready")

	// ErrClosed is returned when an operation is attempted on a closed DB.
	ErrClosed = errors.New("opaque: database is closed")
)
Sentinel errors for programmatic error handling. Use errors.Is to check:
if errors.Is(err, opaque.ErrNotBuilt) { ... }
Functions ¶
This section is empty.
Types ¶
type ClusterStats ¶
type ClusterStats struct {
NumClusters int // Number of clusters
MinSize int // Smallest cluster size
MaxSize int // Largest cluster size
AvgSize float64 // Average cluster size
EmptyClusters int // Number of empty clusters (should be 0)
Iterations int // K-means iterations until convergence
}
ClusterStats contains statistics about k-means clustering quality.
type Config ¶
type Config struct {
// Dimension is the length of each vector. Required.
// All vectors added to the DB must have exactly this many elements.
Dimension int
// NumClusters is the number of k-means clusters used to partition vectors.
// More clusters means faster search (fewer vectors per cluster) but weaker
// privacy (smaller anonymity sets per cluster). Must be >= 2.
// Default: 64.
NumClusters int
// TopClusters is the number of clusters probed during each search.
// Higher values improve recall at the cost of more computation, bandwidth,
// and weaker access pattern privacy (more clusters probed = easier to infer intent).
// Must be <= NumClusters.
// Default: max(NumClusters / 16, 4). For 64 clusters this is 4 (~6% of data).
TopClusters int
// NumDecoys is the number of extra clusters fetched per search to hide
// which clusters are actually relevant. Higher values provide better
// access pattern privacy at the cost of additional bandwidth.
// Default: 8.
NumDecoys int
// WorkerPoolSize is the number of parallel CKKS homomorphic encryption engines.
// Each engine consumes ~50MB of memory but enables parallel centroid scoring.
// Set to 0 for automatic sizing (min(NumCPU, 8)).
// Default: 0 (automatic).
WorkerPoolSize int
// Storage selects the backend for encrypted blob storage.
// Default: Memory.
Storage StorageBackend
// StoragePath is the directory for file-backed storage.
// Required when Storage is [File], ignored otherwise.
StoragePath string
// ProbeThreshold controls multi-probe cluster selection during search.
// Clusters scoring within this fraction of the top cluster score are also probed,
// beyond the TopClusters limit. For example, 0.95 means clusters within 5% of
// the best score are included.
// Set to 1.0 to disable multi-probe (strict top-K only).
// Default: 0.95.
ProbeThreshold float64
// RedundantAssignments assigns each vector to multiple clusters during indexing.
// Improves recall for vectors near cluster boundaries at the cost of increased storage.
// A value of 2 means each vector is stored in its 2 nearest clusters.
// Default: 1 (no redundancy).
RedundantAssignments int
// PCADimension enables optional PCA dimensionality reduction.
// When set to a positive value, vectors are projected to this dimension
// before clustering and encryption, reducing latency and bandwidth.
// The PCA transform is applied client-side, so it has no privacy impact.
// Must be less than Dimension. Set to 0 to disable (default).
// Default: 0 (disabled).
PCADimension int
// NumKMeansInit is the number of k-means clustering initializations to run.
// Multiple runs with different seeds are executed in parallel, and the result
// with the lowest inertia (best cluster quality) is kept.
// Higher values improve cluster quality at the cost of more CPU during Build.
// Default: 1 (single initialization).
NumKMeansInit int
// NormalizedStorage stores vectors pre-normalized during Build, skipping
// per-vector normalization during Search. This reduces local scoring latency
// by 10-15%. Stored vectors lose original magnitudes (direction is preserved).
// Default: true for new databases.
NormalizedStorage *bool
// ProbeStrategy selects the cluster probing method during search.
// "threshold" (default) uses ProbeThreshold ratio to include nearby clusters.
// "gap" uses adaptive score-gap detection to find natural breaks in the score distribution.
// Default: "" (uses "threshold").
ProbeStrategy string
// GapMultiplier controls gap-based probing sensitivity when ProbeStrategy is "gap".
// Expansion stops when the gap between consecutive cluster scores exceeds
// GapMultiplier times the median gap. Lower values probe fewer clusters.
// Default: 2.0.
GapMultiplier float64
// OnBuildProgress is called during [DB.Build] and [DB.Rebuild] to report progress.
// The phase parameter identifies the current step, and pct is a value between 0 and 1
// indicating completion within that phase.
//
// Phases reported (in order):
// - "pca": PCA fitting and dimensionality reduction (only if PCADimension > 0)
// - "clustering": k-means clustering of vectors
// - "encrypting": AES-256-GCM encryption of vectors
// - "indexing": blob storage and HE engine initialization
//
// The callback is invoked synchronously from the Build goroutine. Keep it fast.
// A nil callback disables progress reporting (default).
OnBuildProgress func(phase string, pct float64) `json:"-"`
}
Config controls the behavior of a DB instance.
Only [Config.Dimension] is required. All other fields have sensible defaults.
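As a sketch of how these knobs combine, a configuration tuned for a larger dataset might look like the following (the values are illustrative, not recommendations; only Dimension is required):

```go
cfg := opaque.Config{
	Dimension:    768, // e.g. a typical sentence-embedding size
	NumClusters:  256, // more clusters: faster search, smaller anonymity sets
	TopClusters:  8,   // probe more clusters for better recall
	NumDecoys:    16,  // extra fetches for stronger access-pattern privacy
	PCADimension: 128, // client-side projection to cut latency and bandwidth
	Storage:      opaque.File,
	StoragePath:  "/var/lib/opaque", // hypothetical path
}
```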
type DB ¶
type DB struct {
// contains filtered or unexported fields
}
DB is a privacy-preserving vector search database.
It encrypts stored vectors with AES-256-GCM, scores queries against cluster centroids using CKKS homomorphic encryption, and fetches decoy clusters to hide access patterns.
A DB must be built before searching. After DB.Build completes, DB.Search is safe for concurrent use from multiple goroutines.
func Load ¶
Load restores a DB from a directory previously created by DB.Save.
The returned DB is immediately ready for DB.Search — no Build is needed. The blob store is opened in file mode from the saved directory.
To add new vectors after loading, use DB.Add followed by DB.Rebuild.
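A save/restore round trip might look like this (a sketch, assuming Load takes the directory path and returns (*DB, error)):

```go
// In the first process: build, then persist.
if err := db.Save("/tmp/opaque-index"); err != nil {
	log.Fatal(err)
}

// In a later process: restore; no Build is needed before searching.
db2, err := opaque.Load("/tmp/opaque-index")
if err != nil {
	log.Fatal(err)
}
defer db2.Close()

results, err := db2.Search(ctx, queryVector, 10)
```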
func NewDB ¶
NewDB creates a new vector search database with the given configuration.
Only [Config.Dimension] is required; all other fields use sensible defaults if zero. No expensive initialization happens here — the heavy work is deferred to DB.Build.
func (*DB) Add ¶
Add buffers a single vector for indexing. The id must be unique within the DB.
Before DB.Build, vectors are buffered for the initial index build. After Build, vectors are buffered for the next DB.Rebuild.
The vector is copied internally, so the caller may modify the slice after Add returns.
func (*DB) AddBatch ¶
AddBatch buffers multiple vectors for indexing. The ids and vectors slices must have the same length. Each vector must have exactly [Config.Dimension] elements.
This is equivalent to calling DB.Add for each vector, but acquires the lock once.
func (*DB) AddBatchWithMetadata ¶
func (db *DB) AddBatchWithMetadata(ctx context.Context, ids []string, vectors [][]float64, metadatas []Metadata) error
AddBatchWithMetadata buffers multiple vectors with associated metadata. The metadatas slice must have the same length as ids and vectors. Use nil for vectors without metadata.
func (*DB) AddWithMetadata ¶
func (db *DB) AddWithMetadata(ctx context.Context, id string, vector []float64, meta Metadata) error
AddWithMetadata buffers a vector with associated metadata for indexing.
Metadata is encrypted alongside the vector and can be used for filtered search with DB.SearchWithFilter. The id must be unique within the DB.
Both the vector and metadata are copied internally.
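A minimal metadata example, using the value types the Metadata type supports (a sketch; the id and vector are placeholders):

```go
meta := opaque.Metadata{
	"category": "tutorial", // string
	"year":     2024,       // int
	"public":   true,       // bool
}
if err := db.AddWithMetadata(ctx, "doc-1", vector, meta); err != nil {
	log.Fatal(err)
}
```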
func (*DB) Build ¶
Build creates the search index from all buffered vectors.
This is the most expensive operation in the lifecycle:
- Runs k-means clustering to partition vectors into clusters
- Encrypts each vector with AES-256-GCM
- Initializes the CKKS homomorphic encryption engine pool
- Pre-encodes cluster centroids as HE plaintexts
After Build returns successfully, DB.Search is ready for use. Build must only be called once; use DB.Rebuild to re-index after adding new vectors.
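Because Build is the expensive step, it can be worth wiring up the progress callback from Config (a sketch; the callback runs synchronously from the Build goroutine, so keep it cheap):

```go
db, err := opaque.NewDB(opaque.Config{
	Dimension: 128,
	OnBuildProgress: func(phase string, pct float64) {
		// phase is one of "pca", "clustering", "encrypting", "indexing";
		// pct is completion within that phase, in [0, 1].
		log.Printf("build: %-10s %3.0f%%", phase, pct*100)
	},
})
```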
func (*DB) Close ¶
Close releases all resources held by the DB, including the blob store and HE engine pool. The DB must not be used after Close is called.
func (*DB) ClusterStats ¶
func (db *DB) ClusterStats() ClusterStats
ClusterStats returns statistics about the k-means clustering from the most recent Build. Returns a zero value if the index has not been built yet.
func (*DB) Count ¶
Count returns the number of indexed vectors (in the built index only). Returns 0 if the index has not been built.
func (*DB) Delete ¶
Delete soft-deletes a vector by ID. The vector is excluded from future DB.Search results immediately. The underlying storage is reclaimed on the next DB.Rebuild.
Returns ErrEmptyID if the ID is empty, or ErrNotFound if the ID does not exist in either the pending vectors or the built index.
Delete is safe for concurrent use with DB.Search.
func (*DB) Get ¶
Get retrieves a vector by ID, decrypting it from the blob store.
Returns ErrNotReady if the index has not been built, or ErrNotFound if no vector with the given ID exists. Get is safe for concurrent use.
func (*DB) GetMetadata ¶
GetMetadata retrieves the metadata for a vector by ID.
Returns nil if the vector has no metadata, or ErrNotFound if the ID does not exist.
func (*DB) Has ¶
Has reports whether a vector with the given ID exists in the DB.
It checks both the built index (blob store) and pending vectors. Has is safe for concurrent use.
func (*DB) IsReady ¶
IsReady reports whether the index has been built and the DB is ready for search.
func (*DB) List ¶
List returns a paginated slice of vector IDs from the built index.
IDs are returned in sorted order. offset and limit control pagination. Returns ErrNotReady if the index has not been built.
func (*DB) Rebuild ¶
Rebuild re-indexes all vectors including any added since the last Build.
This performs a full rebuild: the old index is discarded and a new one is created from all accumulated vectors. Use this after adding vectors to a built DB:
db.Build(ctx)   // initial build
// ... later ...
db.Rebuild(ctx) // add pending vectors, rebuild from scratch
Rebuild is not safe for concurrent use with Search.
func (*DB) Save ¶
Save persists a built DB to the given directory path.
The directory must not already contain a saved DB (no metadata.json). After Save, the DB can be restored with Load in a new process.
Save is safe for concurrent use with DB.Search — it acquires a read lock.
func (*DB) Search ¶
Search returns the topK most similar vectors to the query.
Results are sorted by descending cosine similarity score. The query vector must have exactly [Config.Dimension] elements.
Search uses SIMD-optimized batch HE operations internally for best performance. It is safe for concurrent use from multiple goroutines after DB.Build completes.
func (*DB) SearchWithFilter ¶
func (db *DB) SearchWithFilter(ctx context.Context, query []float64, topK int, filter Filter) ([]Result, error)
SearchWithFilter returns the topK most similar vectors matching the filter.
This runs a normal search and then post-filters results by metadata. Filtered-out results are not replaced, so fewer than topK results may be returned. For better recall with filters, increase topK.
All conditions in [Filter.Where] must match (AND logic). Matching uses exact equality for string, int, float64, and bool values.
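The matching rule can be illustrated with a small self-contained sketch (this is not the library's code, just the AND/exact-equality semantics described above):

```go
package main

import "fmt"

// matches reports whether meta satisfies every condition in where,
// using exact equality on interface values (AND logic).
func matches(where, meta map[string]any) bool {
	for k, want := range where {
		got, ok := meta[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	meta := map[string]any{"category": "tutorial", "year": 2024}
	fmt.Println(matches(map[string]any{"category": "tutorial"}, meta))               // true
	fmt.Println(matches(map[string]any{"category": "tutorial", "year": 2023}, meta)) // false: year differs
}
```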
func (*DB) Update ¶
Update replaces a vector's data. This is equivalent to DB.Delete followed by DB.Add — the old vector is soft-deleted and the new one is buffered for the next DB.Rebuild.
The updated vector takes effect in search results after Rebuild. Until then, the old vector is excluded from search (soft-deleted) and the new one is pending.
Returns ErrEmptyID if the ID is empty, ErrNotFound if the ID does not exist, or ErrDimensionMismatch if the vector has the wrong length.
type DBStats ¶
type DBStats struct {
// TotalVectors is the total number of vectors (pending + indexed).
TotalVectors int
// IndexedVectors is the number of vectors in the built index.
// Zero if the index has not been built.
IndexedVectors int
// PendingVectors is the number of vectors buffered but not yet indexed.
PendingVectors int
// ClusterStats contains k-means clustering statistics (zero if not built).
ClusterStats ClusterStats
// StorageBackend is the storage backend in use.
StorageBackend StorageBackend
// HasPCA is true if PCA dimensionality reduction is enabled.
HasPCA bool
// IsReady is true if the index is built and ready for search.
IsReady bool
}
DBStats contains aggregate statistics about the database.
type Filter ¶
type Filter struct {
// Where contains exact-match conditions. A result must match ALL conditions.
// Supported value types: string, int, float64, bool.
Where map[string]any
}
Filter specifies criteria for filtered search.
type Metadata ¶
Metadata is a map of key-value pairs attached to a vector. Keys are strings; values can be string, int, float64, or bool.
Metadata is stored encrypted alongside vectors and can be used for filtered search via DB.SearchWithFilter.
type Result ¶
type Result struct {
// ID is the identifier passed to [DB.Add] when the vector was indexed.
ID string
// Score is the cosine similarity between the query and this vector.
// Higher is more similar. Range: [-1, 1] for normalized vectors.
Score float64
}
Result is a single search result containing the vector ID and its similarity score.
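For intuition about the Score field, cosine similarity can be computed locally like this (a self-contained sketch, not the library's internal scoring path):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b: the dot product
// divided by the product of the vectors' magnitudes. Identical
// directions score 1, opposite directions score -1.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float64{1, 0}, []float64{1, 0}))  // 1
	fmt.Println(cosine([]float64{1, 0}, []float64{-1, 0})) // -1
}
```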
type StorageBackend ¶
type StorageBackend int
StorageBackend selects where encrypted vector blobs are stored.
const (
	// Memory stores all data in RAM. Fast but not persistent across restarts.
	Memory StorageBackend = iota

	// File stores encrypted blobs on disk at the path specified by [Config.StoragePath].
	// Persistent across restarts, slower than memory for large datasets.
	File
)
Directories ¶

| Path | Synopsis |
|---|---|
| api | |
| cmd | |
| cli (command) | Command cli provides a command-line interface for testing Opaque locally. |
| devserver (command) | Development server for local testing of the privacy-preserving vector search. |
| search-service (command) | Command search-service runs the Opaque search gRPC server. |
| examples | |
| basic (command) | Example sdk-basic demonstrates the simplest Opaque workflow: create a DB, add vectors, build the index, and search. |
| file-storage (command) | Example sdk-file-storage demonstrates using file-backed storage instead of in-memory storage. |
| http-server (command) | Example http-server wraps Opaque in a lightweight HTTP API, demonstrating a realistic self-hosted deployment pattern. |
| large-scale (command) | Example sdk-large-scale demonstrates tuning Opaque for larger datasets. |
| metadata (command) | Example sdk-metadata demonstrates adding metadata to vectors and using filtered search to narrow results by metadata fields. |
| persistence (command) | Example sdk-persistence demonstrates saving a built index to disk and loading it back in a new process. |
| internal | |
| service | Package service implements the Opaque search service. |
| session | Package session provides session management for client keys. |
| store | Package store provides vector storage backends. |
| pkg | |
| auth | Package auth provides token-based authentication and key distribution for Tier 2.5 hierarchical private search (Option B). |
| blob | Package blob provides encrypted blob storage for Tier 2 data-private search. |
| cache | Package cache provides caching for expensive HE operations. |
| client | Package client provides the Opaque SDK for privacy-preserving search. |
| cluster | Package cluster provides clustering algorithms for vector indexing. |
| crypto | Package crypto provides homomorphic encryption operations using the Lattigo CKKS scheme. |
| embeddings | Package embeddings provides a client for the local embedding service. |
| encrypt | Package encrypt provides symmetric encryption for Tier 2 data-private storage. |
| enterprise | Package enterprise provides per-enterprise configuration and secret management for Tier 2.5 hierarchical private search. |
| grpcserver | Package grpcserver implements the gRPC service for privacy-preserving vector search. |
| hierarchical | Package hierarchical implements a three-level privacy-preserving vector search. |
| lsh | Package lsh provides locality-sensitive hashing for approximate nearest neighbor search. |
| pca | Package pca provides Principal Component Analysis for dimensionality reduction. |
| server | Package server provides the REST API server for privacy-preserving vector search. |