vectorstore

package

v1.0.0 (latest) · Published: Mar 31, 2026 · License: Apache-2.0 · Imports: 10 · Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/syntrex-lab/gomcp

Documentation ¶

Overview ¶

Package vectorstore implements persistent storage for intent vectors (DIP H2.1).

Intent vectors are the output of the Intent Distiller (H0.2). Storing them enables neuroplastic routing — matching new intents against known patterns to determine optimal processing paths.

Features:

In-memory store with capacity management (LRU eviction)
Cosine similarity search for nearest-neighbor matching
Route labels for categorized intent patterns
Pluggable Embedder interface (ONNX, FTS5 fallback)
Thread-safe for concurrent access
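The capacity management named in the feature list can be pictured with a small LRU index. This is a minimal sketch under assumptions, not the package's implementation: the `lru` type and its `touch` method are hypothetical, the documentation does not say which operations count as a "use", and the sketch omits the locking a thread-safe store would need.

```go
package main

import (
	"container/list"
	"fmt"
)

// lru is a hypothetical recency index keyed by record ID.
type lru struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // id -> list element holding the id
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// touch marks id as most recently used, inserting it if needed, and
// returns the id evicted to stay within capacity ("" if none).
func (c *lru) touch(id string) string {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		return ""
	}
	c.items[id] = c.order.PushFront(id)
	if c.order.Len() <= c.capacity {
		return ""
	}
	oldest := c.order.Back()
	c.order.Remove(oldest)
	evicted := oldest.Value.(string)
	delete(c.items, evicted)
	return evicted
}

func main() {
	c := newLRU(2)
	for _, id := range []string{"a", "b", "a", "c"} {
		if evicted := c.touch(id); evicted != "" {
			fmt.Println("evicted", evicted)
		}
	}
}
```

Pairing a doubly linked list with a map keeps both the recency update and the eviction at constant time per operation.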

Package vectorstore — PolarQuant multi-bit vector compression.

Based on Google's TurboQuant research (ICLR 2026, §3.2). Exploits the key insight: after random orthogonal rotation, vector coordinates become approximately uniformly distributed regardless of the original data distribution. This makes uniform scalar quantization near-optimal without any calibration data.

Pipeline:

Random orthogonal rotation R (data-oblivious, seeded)
y = R · x (rotate)
Uniform quantization: each y_i ∈ [-1, 1] → [0, 2^b - 1]
Compact byte packing (2 values per byte at 4-bit)
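The last two pipeline steps can be sketched as follows. This is an illustration under assumptions: the bucket boundaries, the clamping of out-of-range inputs, and the low-nibble-first packing order are choices made for the example, and `quantize` and `pack4` are hypothetical names rather than package functions.

```go
package main

import "fmt"

// quantize maps y in [-1, 1] to an integer level in [0, 2^bits - 1].
// Inputs outside the range are clamped (an assumption of this sketch).
func quantize(y float64, bits int) uint8 {
	levels := 1 << bits
	if y < -1 {
		y = -1
	} else if y > 1 {
		y = 1
	}
	q := int((y + 1) / 2 * float64(levels))
	if q >= levels {
		q = levels - 1 // y == 1 falls into the top bucket
	}
	return uint8(q)
}

// pack4 stores two 4-bit levels per byte, low nibble first.
func pack4(levels []uint8) []byte {
	out := make([]byte, (len(levels)+1)/2)
	for i, q := range levels {
		if i%2 == 0 {
			out[i/2] |= q & 0x0F
		} else {
			out[i/2] |= (q & 0x0F) << 4
		}
	}
	return out
}

func main() {
	y := []float64{-1, -0.5, 0, 0.5, 1, 0.25}
	levels := make([]uint8, len(y))
	for i, v := range y {
		levels[i] = quantize(v, 4)
	}
	fmt.Println(levels, pack4(levels))
}
```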

Combined with QJL (1-bit approximate search):

QJL signatures: fast approximate filtering (Phase 1)
PolarQuant codes: compressed exact reranking (Phase 2)
Together: TurboQuant = PolarQuant(main bits) + QJL(1-bit residual)

Memory at 4-bit, 128-dim: 64 bytes of codes + 4 bytes radius = 68 bytes, versus 1024 bytes for the float64 original → about 15x compression
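The memory figure can be reproduced with a few lines of arithmetic. The helper names below are hypothetical, and the 4-byte radius follows the float32 Radius field of CompressedVector.

```go
package main

import "fmt"

// compressedBytes returns the packed payload size for dim coordinates at
// bitsPerDim bits each, rounded up to whole bytes.
func compressedBytes(dim, bitsPerDim int) int {
	return (dim*bitsPerDim + 7) / 8
}

// ratio compares float64 storage (8 bytes per coordinate) against the
// packed payload plus a 4-byte float32 radius.
func ratio(dim, bitsPerDim int) float64 {
	original := dim * 8
	return float64(original) / float64(compressedBytes(dim, bitsPerDim)+4)
}

func main() {
	fmt.Printf("%d payload bytes, ratio %.2f\n", compressedBytes(128, 4), ratio(128, 4))
}
```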

Package vectorstore — QJL (Quantized Johnson-Lindenstrauss) 1-bit quantization.

Based on Google's TurboQuant research (ICLR 2026, AAAI 2025). Projects high-dimensional float64 vectors to compact bit signatures via random projection + sign quantization. Comparing two m-bit signatures takes O(m/64) word operations using POPCNT-accelerated Hamming distance.

Properties:

Data-oblivious: no training, no codebook, no dataset-specific tuning
Deterministic: seeded PRNG → reproducible projections
Ranking order is preserved with high probability for well-separated vectors; near-ties may be reordered
32x memory reduction (256-bit signature vs 128-dim float64 vector)

In addition to the features listed in the first overview block, the store offers QJL 1-bit quantized approximate search (TurboQuant §20).

Index ¶

func CosineSimilarity(a, b []float64) float64
func EstimatedCosineSimilarity(hammingSim float64) float64
func HammingSimilarity(a, b QJLSignature, numBits int) float64
type CompressedVector
type Config
- func DefaultConfig() Config
type Embedder
type FTS5Embedder
- func NewFTS5Embedder() *FTS5Embedder
- func (e *FTS5Embedder) Dimension() int
- func (e *FTS5Embedder) Embed(_ context.Context, text string) ([]float64, error)
- func (e *FTS5Embedder) Mode() OracleMode
- func (e *FTS5Embedder) Name() string
type IntentRecord
type OracleMode
- func (m OracleMode) String() string
type PolarQuantCodec
- func NewPolarQuantCodec(dim, bitsPerDim int, seed int64) *PolarQuantCodec
- func (c *PolarQuantCodec) BitsPerDim() int
- func (c *PolarQuantCodec) CompressedBytes() int
- func (c *PolarQuantCodec) CompressedSimilarity(a, b CompressedVector) float64
- func (c *PolarQuantCodec) CompressionRatio() float64
- func (c *PolarQuantCodec) Decode(cv CompressedVector) []float64
- func (c *PolarQuantCodec) Dim() int
- func (c *PolarQuantCodec) Encode(vector []float64) CompressedVector
type QJLProjection
- func NewQJLProjection(numProjections, vectorDim int, seed int64) *QJLProjection
- func (p *QJLProjection) NumProjections() int
- func (p *QJLProjection) Quantize(vector []float64) QJLSignature
- func (p *QJLProjection) VectorDim() int
type QJLSignature
type SearchResult
type Stats
type Store
- func New(cfg *Config) *Store
- func (s *Store) Add(rec *IntentRecord) string
- func (s *Store) Count() int
- func (s *Store) Get(id string) *IntentRecord
- func (s *Store) GetStats() Stats
- func (s *Store) PQEnabled() bool
- func (s *Store) QJLEnabled() bool
- func (s *Store) Search(vector []float64, k int) []SearchResult
- func (s *Store) SearchByRoute(route string) []*IntentRecord
- func (s *Store) SearchQJL(vector []float64, k int) []SearchResult

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CosineSimilarity ¶

func CosineSimilarity(a, b []float64) float64

CosineSimilarity computes the cosine similarity between two vectors. Returns a value in [-1, 1], where 1 = identical direction.
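For reference, a minimal sketch of the computation. The handling of mismatched lengths and zero vectors (returning 0) is an assumption of this example; the documentation does not state what the package does in those cases.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns dot(a, b) / (|a| * |b|). It returns 0 for empty,
// mismatched, or zero-norm inputs (an assumption of this sketch).
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float64{1, 0}, []float64{2, 0}))  // same direction
	fmt.Println(cosine([]float64{1, 0}, []float64{0, 3}))  // orthogonal
	fmt.Println(cosine([]float64{1, 0}, []float64{-1, 0})) // opposite
}
```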

func EstimatedCosineSimilarity ¶

func EstimatedCosineSimilarity(hammingSim float64) float64

EstimatedCosineSimilarity converts Hamming similarity to an estimated cosine similarity using the relationship from the JL sign-random-projection theorem: cos(θ) ≈ cos(π * (1 - hamming_similarity)).

This gives a more accurate similarity estimate than raw Hamming for ranking.
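The reasoning behind the formula: for a random hyperplane, the probability that two vectors at angle θ land on opposite sides is θ/π, so the expected Hamming similarity is 1 - θ/π and θ ≈ π · (1 - hamming_similarity). A sketch of the conversion, using a hypothetical function name:

```go
package main

import (
	"fmt"
	"math"
)

// estimatedCosine inverts the sign-projection relation
// hammingSim ≈ 1 - θ/π to recover cos(θ).
func estimatedCosine(hammingSim float64) float64 {
	return math.Cos(math.Pi * (1 - hammingSim))
}

func main() {
	for _, h := range []float64{1, 0.75, 0.5, 0} {
		fmt.Printf("hamming %.2f -> cosine %.4f\n", h, estimatedCosine(h))
	}
}
```

The mapping is monotonic, so it does not change the ranking order; it only places the scores on the cosine scale.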

func HammingSimilarity ¶

func HammingSimilarity(a, b QJLSignature, numBits int) float64

HammingSimilarity computes normalized Hamming similarity between two QJL signatures. Returns a value in [0, 1] where 1 = all bits match (identical direction), 0.5 = uncorrelated (orthogonal), 0 = all bits differ (opposite direction).

Uses math/bits.OnesCount64 which maps to hardware POPCNT on x86.
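A sketch of the XOR-and-popcount approach, assuming that unused high bits are zero in both signatures and that a length mismatch yields 0. The function name is illustrative.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingSim returns the fraction of matching bits between two packed
// signatures of numBits significant bits.
func hammingSim(a, b []uint64, numBits int) float64 {
	if numBits <= 0 || len(a) != len(b) {
		return 0
	}
	diff := 0
	for i := range a {
		// XOR leaves a 1 wherever the signs disagree.
		diff += bits.OnesCount64(a[i] ^ b[i])
	}
	return 1 - float64(diff)/float64(numBits)
}

func main() {
	a := []uint64{0xFF}
	b := []uint64{0x0F}
	fmt.Println(hammingSim(a, b, 8))
}
```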

Types ¶

type CompressedVector ¶

type CompressedVector struct {
	Data   []byte  // Packed quantized values (2 per byte at 4-bit)
	Radius float32 // Original L2 norm for denormalization
}

CompressedVector holds a PolarQuant-compressed representation of a vector.

type Config ¶

type Config struct {
	Capacity       int   // Max records before LRU eviction. Default: 1000.
	QJLProjections int   // Number of QJL random projections (bits). -1 = disabled. Default: 256.
	QJLSeed        int64 // PRNG seed for reproducible QJL projections. Default: 42.
	QJLVectorDim   int   // Expected vector dimensionality for QJL. Default: 128.
	PQBitsPerDim   int   // PolarQuant bits per dimension (0 = disabled, 4 or 8). Default: 0.
	PQSeed         int64 // PRNG seed for PolarQuant rotation matrix. Default: 7.
	PQDropFloat64  bool  // If true, discard the original float64 vectors to save memory. Default: false.
}

Config configures the vector store.

func DefaultConfig ¶

func DefaultConfig() Config

DefaultConfig returns sensible defaults.

type Embedder ¶

type Embedder interface {
	// Embed computes a vector embedding for the given text.
	// Returns a float64 slice of length Dimension().
	Embed(ctx context.Context, text string) ([]float64, error)

	// Dimension returns the embedding vector dimensionality.
	// MiniLM-L12-v2: 384. FTS5 fallback: 128 (hashed n-gram buckets).
	Dimension() int

	// Name returns the embedder identifier (e.g. "onnx:MiniLM-L12-v2", "fts5:fallback").
	Name() string

	// Mode returns the current oracle mode.
	// FULL = neural embeddings, DEGRADED = text-based fallback.
	Mode() OracleMode
}

Embedder generates vector embeddings from text. Implementations: ONNXEmbedder (full), FTS5Embedder (fallback, pure Go).

type FTS5Embedder ¶

type FTS5Embedder struct {
	// contains filtered or unexported fields
}

FTS5Embedder is a pure-Go fallback embedder that uses character n-gram frequency vectors instead of neural embeddings. No external deps required.

Quality is lower than MiniLM but sufficient for basic intent matching. Used when ONNX runtime is not available → [ORACLE: DEGRADED].

func NewFTS5Embedder ¶

func NewFTS5Embedder() *FTS5Embedder

NewFTS5Embedder creates a fallback embedder with character n-grams. Uses tri-grams (n=3) projected to a fixed dimension via hashing.

func (*FTS5Embedder) Dimension ¶

func (e *FTS5Embedder) Dimension() int

Dimension returns the fixed output dimension (128).

func (*FTS5Embedder) Embed ¶

func (e *FTS5Embedder) Embed(_ context.Context, text string) ([]float64, error)

Embed generates a character n-gram frequency vector. Text is lowercased, split into n-grams, each hashed to a bucket.
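A sketch of the hashing trick described above. The FNV-1a hash, the raw-count weighting, and the absence of normalization are assumptions of this example; the documentation only states that tri-grams are hashed into 128 buckets.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

const dim = 128 // fixed output dimension, as documented for FTS5Embedder

// embedTrigrams lowercases text, slides a 3-rune window over it, and
// counts each tri-gram in the bucket selected by its hash.
func embedTrigrams(text string) []float64 {
	vec := make([]float64, dim)
	runes := []rune(strings.ToLower(text))
	for i := 0; i+3 <= len(runes); i++ {
		h := fnv.New32a()
		h.Write([]byte(string(runes[i : i+3])))
		vec[h.Sum32()%dim]++
	}
	return vec
}

func main() {
	v := embedTrigrams("Hello")
	nonZero := 0
	for _, c := range v {
		if c != 0 {
			nonZero++
		}
	}
	fmt.Println(len(v), nonZero)
}
```

Hashing into a fixed number of buckets keeps the dimension constant regardless of vocabulary size, at the cost of occasional collisions between unrelated tri-grams.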

func (*FTS5Embedder) Mode ¶

func (e *FTS5Embedder) Mode() OracleMode

Mode returns DEGRADED — this is a fallback embedder.

func (*FTS5Embedder) Name ¶

func (e *FTS5Embedder) Name() string

Name returns the embedder identifier.

type IntentRecord ¶

type IntentRecord struct {
	ID             string    `json:"id"`
	Text           string    `json:"text"`            // Original text
	CompressedText string    `json:"compressed_text"` // Distilled form
	Vector         []float64 `json:"vector"`          // Intent embedding vector
	Route          string    `json:"route"`           // Assigned route label
	Verdict        string    `json:"verdict"`         // Oracle verdict (ALLOW/DENY/REVIEW)
	SincerityScore float64   `json:"sincerity_score"`
	Entropy        float64   `json:"entropy"`
	CreatedAt      time.Time `json:"created_at"`
}

IntentRecord stores a distilled intent with metadata.

type OracleMode ¶

type OracleMode int

OracleMode indicates the operational mode of the embedding engine.

const (
	// OracleModeFull indicates neural ONNX embeddings are active.
	OracleModeFull OracleMode = iota
	// OracleModeDegraded indicates fallback text-based search (FTS5/Levenshtein).
	OracleModeDegraded
)

func (OracleMode) String ¶

func (m OracleMode) String() string

String returns the human-readable oracle mode.

type PolarQuantCodec ¶

type PolarQuantCodec struct {
	// contains filtered or unexported fields
}

PolarQuantCodec encodes/decodes vectors using rotation + uniform quantization. Thread-safe after construction (read-only rotation matrix).

func NewPolarQuantCodec ¶

func NewPolarQuantCodec(dim, bitsPerDim int, seed int64) *PolarQuantCodec

NewPolarQuantCodec creates a PolarQuant codec with random orthogonal rotation.

Parameters:

dim: expected vector dimensionality (must match embedder output)
bitsPerDim: quantization bits per dimension (1-8). Default: 4. Relative to float64 and before the 4-byte radius is added, 4-bit gives 16x compression and 8-bit gives 8x.
seed: PRNG seed for reproducible rotation. Same seed → same codec.

func (*PolarQuantCodec) BitsPerDim ¶

func (c *PolarQuantCodec) BitsPerDim() int

BitsPerDim returns the quantization precision.

func (*PolarQuantCodec) CompressedBytes ¶

func (c *PolarQuantCodec) CompressedBytes() int

CompressedBytes returns bytes per compressed vector (excluding radius).

func (*PolarQuantCodec) CompressedSimilarity ¶

func (c *PolarQuantCodec) CompressedSimilarity(a, b CompressedVector) float64

CompressedSimilarity computes approximate cosine similarity between two compressed vectors without reconstructing them in the original space. It dequantizes both codes in the rotated domain and takes the dot product there, skipping the inverse rotation; this is valid because an orthogonal rotation preserves inner products.

func (*PolarQuantCodec) CompressionRatio ¶

func (c *PolarQuantCodec) CompressionRatio() float64

CompressionRatio returns the ratio of original to compressed size.

func (*PolarQuantCodec) Decode ¶

func (c *PolarQuantCodec) Decode(cv CompressedVector) []float64

Decode reconstructs a float64 vector from its compressed representation.

Steps:

Unpack quantized values
Dequantize to [-1, 1] midpoints
Inverse rotation: x = R^T · y
Denormalize by radius
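The first two decode steps can be sketched as follows; the inverse rotation and the radius scaling are left out. The low-nibble-first layout and the helper names are assumptions of this example.

```go
package main

import "fmt"

// unpack4 reads n 4-bit levels from data, low nibble first.
func unpack4(data []byte, n int) []uint8 {
	out := make([]uint8, n)
	for i := range out {
		b := data[i/2]
		if i%2 == 0 {
			out[i] = b & 0x0F
		} else {
			out[i] = b >> 4
		}
	}
	return out
}

// dequantize returns the midpoint of bucket q when [-1, 1] is split
// into 2^bits equal buckets.
func dequantize(q uint8, bits int) float64 {
	levels := 1 << bits
	return (float64(q)+0.5)/float64(levels)*2 - 1
}

func main() {
	levels := unpack4([]byte{0x40, 0xC8, 0xAF}, 6)
	for _, q := range levels {
		fmt.Printf("%d -> %.4f\n", q, dequantize(q, 4))
	}
}
```

Using the bucket midpoint bounds the per-coordinate reconstruction error by half a bucket width, which is 1/16 at 4 bits.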

func (*PolarQuantCodec) Dim ¶

func (c *PolarQuantCodec) Dim() int

Dim returns the expected vector dimensionality.

func (*PolarQuantCodec) Encode ¶

func (c *PolarQuantCodec) Encode(vector []float64) CompressedVector

Encode compresses a float64 vector to a compact PolarQuant representation.

Steps:

Compute and store L2 norm (radius)
L2-normalize the vector
Rotate through random orthogonal matrix
Uniform quantization of each coordinate
Pack into bytes
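The first three encode steps can be sketched as below. How the package builds its rotation matrix is not documented; this example assumes Gram-Schmidt orthogonalization of seeded Gaussian rows, which is one common way to obtain a random orthogonal matrix. All names are hypothetical, and the input length is assumed to equal the matrix dimension.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// randomOrthogonal builds a dim x dim orthogonal matrix by running
// Gram-Schmidt over seeded Gaussian rows.
func randomOrthogonal(dim int, seed int64) [][]float64 {
	rng := rand.New(rand.NewSource(seed))
	r := make([][]float64, dim)
	for i := range r {
		row := make([]float64, dim)
		for j := range row {
			row[j] = rng.NormFloat64()
		}
		// Remove the components along the rows already fixed.
		for k := 0; k < i; k++ {
			var dot float64
			for j := range row {
				dot += row[j] * r[k][j]
			}
			for j := range row {
				row[j] -= dot * r[k][j]
			}
		}
		var norm float64
		for _, v := range row {
			norm += v * v
		}
		norm = math.Sqrt(norm)
		for j := range row {
			row[j] /= norm
		}
		r[i] = row
	}
	return r
}

// normalizeAndRotate returns the L2 norm (radius) of x and
// y = R * (x / radius). A zero vector yields a zero result.
func normalizeAndRotate(r [][]float64, x []float64) (float64, []float64) {
	var radius float64
	for _, v := range x {
		radius += v * v
	}
	radius = math.Sqrt(radius)
	y := make([]float64, len(r))
	if radius == 0 {
		return 0, y
	}
	for i, row := range r {
		for j, v := range x {
			y[i] += row[j] * v / radius
		}
	}
	return radius, y
}

func main() {
	r := randomOrthogonal(4, 7)
	radius, y := normalizeAndRotate(r, []float64{3, 0, 4, 0})
	fmt.Println(radius, y)
}
```

Because the rotation is orthogonal, the rotated vector keeps unit length, so every coordinate stays inside [-1, 1] and fits the quantizer's input range.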

type QJLProjection ¶

type QJLProjection struct {
	// contains filtered or unexported fields
}

QJLProjection holds the random projection matrix for QJL quantization. Thread-safe after construction (read-only).

func NewQJLProjection ¶

func NewQJLProjection(numProjections, vectorDim int, seed int64) *QJLProjection

NewQJLProjection creates a random projection matrix for QJL quantization.

Parameters:

numProjections: number of random projections (bits in output signature). Higher = more accurate but more memory. Recommended: 256.
vectorDim: dimensionality of input vectors (must match embedder output).
seed: PRNG seed for reproducibility. Same seed → same projections.

func (*QJLProjection) NumProjections ¶

func (p *QJLProjection) NumProjections() int

NumProjections returns the total number of projection bits.

func (*QJLProjection) Quantize ¶

func (p *QJLProjection) Quantize(vector []float64) QJLSignature

Quantize projects a float64 vector through the random matrix and returns a compact bit-packed QJLSignature. Each bit is the sign of one projection.

Memory: numProjections/64 uint64s (e.g., 256 bits = 4 uint64s = 32 bytes). Compare: 128-dim float64 vector = 1024 bytes → 32x reduction.
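A sketch of sign quantization under assumptions: the projection matrix is filled with seeded Gaussian entries, a non-negative projection sets the bit, and bits are packed starting from the least significant bit of each word. None of these details are confirmed by the documentation, and the names are illustrative.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newProjection returns a numProjections x dim matrix of seeded
// Gaussian entries.
func newProjection(numProjections, dim int, seed int64) [][]float64 {
	rng := rand.New(rand.NewSource(seed))
	m := make([][]float64, numProjections)
	for i := range m {
		m[i] = make([]float64, dim)
		for j := range m[i] {
			m[i][j] = rng.NormFloat64()
		}
	}
	return m
}

// signature sets bit i when the i-th projection of x is non-negative.
func signature(m [][]float64, x []float64) []uint64 {
	sig := make([]uint64, (len(m)+63)/64)
	for i, row := range m {
		var dot float64
		for j, v := range x {
			dot += row[j] * v
		}
		if dot >= 0 {
			sig[i/64] |= uint64(1) << uint(i%64)
		}
	}
	return sig
}

func main() {
	m := newProjection(256, 128, 42)
	x := make([]float64, 128)
	for i := range x {
		x[i] = float64(i%7) - 3
	}
	sig := signature(m, x)
	fmt.Printf("%d words, first word %016x\n", len(sig), sig[0])
}
```

Only the sign of each projection is kept, so the signature depends on the direction of the input and not on its length.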

func (*QJLProjection) VectorDim ¶

func (p *QJLProjection) VectorDim() int

VectorDim returns the expected input dimensionality.

type QJLSignature ¶

type QJLSignature []uint64

QJLSignature is a bit-packed sign vector produced by QJL quantization. Each uint64 holds 64 sign bits from random projections.

type SearchResult ¶

type SearchResult struct {
	Record     *IntentRecord `json:"record"`
	Similarity float64       `json:"similarity"` // Cosine similarity [0, 1]
}

SearchResult holds a similarity search result.

type Stats ¶

type Stats struct {
	TotalRecords      int            `json:"total_records"`
	Capacity          int            `json:"capacity"`
	RouteCount        map[string]int `json:"route_counts"`
	VerdictCount      map[string]int `json:"verdict_counts"`
	AvgEntropy        float64        `json:"avg_entropy"`
	QJLEnabled        bool           `json:"qjl_enabled"`
	QJLProjections    int            `json:"qjl_projections"`
	QJLBitsPerVec     int            `json:"qjl_bits_per_vector"`
	QJLBytesPerVec    int            `json:"qjl_bytes_per_vector"`
	PQEnabled         bool           `json:"pq_enabled"`
	PQBitsPerDim      int            `json:"pq_bits_per_dim"`
	PQBytesPerVec     int            `json:"pq_bytes_per_vector"`
	PQCompressionRate float64        `json:"pq_compression_ratio"`
	PQDropFloat64     bool           `json:"pq_drop_float64"`
}

Stats holds store statistics.

type Store ¶

type Store struct {
	// contains filtered or unexported fields
}

Store is an in-memory intent vector store with similarity search.

func New ¶

func New(cfg *Config) *Store

New creates a new vector store.

func (*Store) Add ¶

func (s *Store) Add(rec *IntentRecord) string

Add stores an intent record. Returns the assigned ID.

func (*Store) Count ¶

func (s *Store) Count() int

Count returns the total number of records.

func (*Store) Get ¶

func (s *Store) Get(id string) *IntentRecord

Get retrieves a record by ID.

func (*Store) GetStats ¶

func (s *Store) GetStats() Stats

GetStats returns store statistics.

func (*Store) PQEnabled ¶

func (s *Store) PQEnabled() bool

PQEnabled returns whether PolarQuant compressed storage is active.

func (*Store) QJLEnabled ¶

func (s *Store) QJLEnabled() bool

QJLEnabled returns whether QJL quantization is active.

func (*Store) Search ¶

func (s *Store) Search(vector []float64, k int) []SearchResult

Search finds the k most similar records to the given vector.

func (*Store) SearchByRoute ¶

func (s *Store) SearchByRoute(route string) []*IntentRecord

SearchByRoute finds records matching a specific route.

func (*Store) SearchQJL ¶

func (s *Store) SearchQJL(vector []float64, k int) []SearchResult

SearchQJL performs two-phase approximate nearest-neighbor search using QJL.

Phase 1: Score all records via POPCNT Hamming similarity on QJL signatures (O(bits/64) per record).
Phase 2: Take top-2k candidates and rerank with exact CosineSimilarity.

Returns top-k results with exact cosine similarity scores.

Falls back to brute-force searchLocked() if QJL is not enabled.
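The two-phase flow can be sketched over plain slices. This is an illustration, not the Store's code: the `item` and `scored` types and the helper functions are hypothetical, signatures are supplied by hand, and locking and the fallback path are omitted.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
	"sort"
)

// item pairs a full-precision vector with its 1-bit signature.
type item struct {
	vec []float64
	sig []uint64
}

// scored is a candidate index with its current score.
type scored struct {
	idx   int
	score float64
}

func hammingDistance(a, b []uint64) int {
	d := 0
	for i := range a {
		d += bits.OnesCount64(a[i] ^ b[i])
	}
	return d
}

func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

// topK sorts by descending score and keeps at most k entries.
func topK(s []scored, k int) []scored {
	sort.Slice(s, func(i, j int) bool { return s[i].score > s[j].score })
	if len(s) > k {
		s = s[:k]
	}
	return s
}

func searchTwoPhase(items []item, qVec []float64, qSig []uint64, k int) []scored {
	// Phase 1: cheap Hamming-based score for every record.
	coarse := make([]scored, len(items))
	for i, it := range items {
		coarse[i] = scored{idx: i, score: -float64(hammingDistance(it.sig, qSig))}
	}
	coarse = topK(coarse, 2*k)
	// Phase 2: exact cosine on the surviving candidates only.
	fine := make([]scored, len(coarse))
	for i, c := range coarse {
		fine[i] = scored{idx: c.idx, score: cosine(items[c.idx].vec, qVec)}
	}
	return topK(fine, k)
}

func main() {
	items := []item{
		{vec: []float64{1, 0}, sig: []uint64{0xF}},
		{vec: []float64{0.9, 0.1}, sig: []uint64{0xE}},
		{vec: []float64{0, 1}, sig: []uint64{0x3}},
		{vec: []float64{-1, 0}, sig: []uint64{0x0}},
	}
	for _, r := range searchTwoPhase(items, []float64{1, 0}, []uint64{0xF}, 2) {
		fmt.Printf("record %d cosine %.4f\n", r.idx, r.score)
	}
}
```

Keeping 2k candidates after the first phase gives the exact reranking some slack to recover neighbors that the 1-bit scores ranked slightly too low.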
