Documentation ¶
Overview ¶
Package embedding provides text embedding infrastructure for VaultMind.
Index ¶
- Constants
- func Acceleration() string
- func BackendName() string
- func ColBERTHead(hiddenStates [][]float32, weights [][]float32, bias []float32) [][]float32
- func DefaultCacheDir() string
- func DefaultModel() string
- func DenseHead(hiddenStates [][]float32) []float32
- func DownloadBGEM3(cacheDir string) (string, error)
- func L2Normalize(vec []float32) []float32
- func LoadLinearWeights(path string) (weight [][]float32, bias []float32, err error)
- func MaxSimScore(queryTokens, docTokens [][]float32) float64
- func SparseDotProduct(a, b map[int32]float32) float64
- func SparseHead(hiddenStates [][]float32, tokenIDs, specialMask []uint32, weights []float32, ...) map[int32]float32
- func TruncateForEmbedding(text string, maxTokens int) string
- type BGEM3Embedder
- func (e *BGEM3Embedder) Close() error
- func (e *BGEM3Embedder) Dims() int
- func (e *BGEM3Embedder) Embed(ctx context.Context, text string) ([]float32, error)
- func (e *BGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
- func (e *BGEM3Embedder) EmbedColBERT(ctx context.Context, text string) ([][]float32, error)
- func (e *BGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)
- func (e *BGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)
- func (e *BGEM3Embedder) EmbedSparse(ctx context.Context, text string) (map[int32]float32, error)
- type BGEM3Output
- type Embedder
- type FullEmbedder
- type HugotConfig
- type HugotEmbedder
- type SidecarBGEM3Config
- type SidecarBGEM3Embedder
- func (e *SidecarBGEM3Embedder) Close() error
- func (e *SidecarBGEM3Embedder) Device() string
- func (e *SidecarBGEM3Embedder) Dims() int
- func (e *SidecarBGEM3Embedder) Embed(ctx context.Context, text string) ([]float32, error)
- func (e *SidecarBGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
- func (e *SidecarBGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)
- func (e *SidecarBGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)
Constants ¶
const (
DefaultModelName = "sentence-transformers/all-MiniLM-L6-v2"
DefaultDims = 384
DefaultMaxTokens = 510 // MiniLM max is 512 minus 2 for CLS/SEP tokens
DefaultOnnxFilePath = "onnx/model.onnx"
)
Default model configuration for the all-MiniLM-L6-v2 embedder.
const (
BGEM3ModelName = "BAAI/bge-m3"
BGEM3Dims = 1024
BGEM3MaxTokens = 8190 // 8192 minus 2 for CLS/SEP
BGEM3OnnxFilePath = "onnx/model.onnx"
)
BGE-M3 model configuration.
Variables ¶
This section is empty.
Functions ¶
func Acceleration ¶
func Acceleration() string
Acceleration mirrors the ORT-build's Acceleration() so callers don't need to special-case build tags. Pure-Go has no GPU path; "go-cpu" names the slow path explicitly.
func BackendName ¶
func BackendName() string
BackendName identifies which hugot backend the binary was built against. Consumers (e.g. the index command) use this to warn when BGE-M3 indexing is about to run on the slow pure-Go path so operators don't mistake "hours-long indexing" for a hang or OOM. The value is determined by the build tag.
func ColBERTHead ¶
func ColBERTHead(hiddenStates [][]float32, weights [][]float32, bias []float32) [][]float32
ColBERTHead projects each non-CLS token through a linear layer and L2-normalizes the result. Input: hiddenStates[seq_len][dims], weights[out_dims][in_dims], bias[out_dims]. Output: [seq_len-1][out_dims] (CLS at index 0 is skipped).
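The projection described above can be sketched as follows. This is a minimal illustration written from the description, not the package's source; `colbertHead` is a hypothetical local name, and the code assumes every hidden-state row has at least as many entries as each weight row.

```go
package main

import (
	"fmt"
	"math"
)

// colbertHead is a hypothetical re-implementation of the documented behavior:
// project every token except CLS (index 0) through weights/bias, then
// L2-normalize each projected row.
func colbertHead(hiddenStates, weights [][]float32, bias []float32) [][]float32 {
	if len(hiddenStates) <= 1 {
		return nil
	}
	out := make([][]float32, 0, len(hiddenStates)-1)
	for _, h := range hiddenStates[1:] { // skip CLS at index 0
		row := make([]float32, len(weights))
		var sumSq float64
		for o, w := range weights {
			acc := float64(bias[o])
			for i := range w { // assumes len(h) >= len(w)
				acc += float64(w[i]) * float64(h[i])
			}
			row[o] = float32(acc)
			sumSq += acc * acc
		}
		if sumSq > 0 { // leave all-zero rows as zero vectors
			norm := math.Sqrt(sumSq)
			for o := range row {
				row[o] = float32(float64(row[o]) / norm)
			}
		}
		out = append(out, row)
	}
	return out
}

func main() {
	hidden := [][]float32{{9, 9}, {3, 0}, {0, 2}} // row 0 plays the CLS token
	weights := [][]float32{{1, 0}, {0, 1}}        // identity projection
	bias := []float32{0, 0}
	fmt.Println(colbertHead(hidden, weights, bias)) // [[1 0] [0 1]]
}
```

Three input rows yield two output rows because the CLS row is dropped.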
func DefaultCacheDir ¶
func DefaultCacheDir() string
DefaultCacheDir returns the default model cache directory (~/.vaultmind/models).
func DefaultModel ¶
func DefaultModel() string
DefaultModel returns the embedding model to use when the operator hasn't picked one explicitly. Adapts to the backend the binary was built against:
- ORT-tagged binaries → "bge-m3" (4-way hybrid retrieval — fast on this build path; what the README's retrieval description is built around).
- Pure-Go binaries → "minilm" (BGE-M3 indexing on pure-Go takes hours per medium vault; minilm is the always-fast baseline).
The default is conservative: it never picks a model the binary can't run reasonably. Users who want minilm on an ORT binary (e.g. for fast re-indexing during development) can pass --model minilm explicitly. Users who want bge-m3 on a pure-Go binary can opt in via --model bge-m3 + --allow-slow-backend.
The 2026-05-05 dogfood surfaced this gap: the prior hardcoded "minilm" default contradicted the system's own framing. A user running `vaultmind index --embed` on an ORT-capable build silently got MiniLM-only embeddings, learning about it only from doctor's post-hoc warning. The runtime-aware default closes that gap by matching the model to what the binary can actually run well.
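The selection rule above can be sketched like this. It is an illustration only: `defaultModel` and `resolveModel` are hypothetical helpers, the `ortBuild` boolean stands in for whatever the build tag reports, and rejecting bge-m3 on a pure-Go build without --allow-slow-backend is an assumption about how the opt-in is enforced.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultModel mirrors the documented rule: ORT builds default to bge-m3,
// pure-Go builds default to minilm.
func defaultModel(ortBuild bool) string {
	if ortBuild {
		return "bge-m3"
	}
	return "minilm"
}

// resolveModel applies an explicit --model value when one is given.
// Assumption: bge-m3 on a pure-Go build is refused unless the operator
// also passed --allow-slow-backend.
func resolveModel(flagModel string, ortBuild, allowSlow bool) (string, error) {
	if flagModel == "" {
		return defaultModel(ortBuild), nil
	}
	if flagModel == "bge-m3" && !ortBuild && !allowSlow {
		return "", errors.New("bge-m3 on a pure-Go build requires --allow-slow-backend")
	}
	return flagModel, nil
}

func main() {
	m, _ := resolveModel("", true, false)
	fmt.Println(m) // bge-m3
	m, _ = resolveModel("", false, false)
	fmt.Println(m) // minilm
	_, err := resolveModel("bge-m3", false, false)
	fmt.Println(err != nil) // true
}
```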
func DenseHead ¶
func DenseHead(hiddenStates [][]float32) []float32
DenseHead extracts the CLS token embedding (index 0) and L2-normalizes it. Input: hiddenStates[seq_len][dims]. Output: [dims] unit vector.
func DownloadBGEM3 ¶
func DownloadBGEM3(cacheDir string) (string, error)
DownloadBGEM3 downloads the BGE-M3 model files from HuggingFace if they are not already cached. Returns the path to the model directory.
func L2Normalize ¶
func L2Normalize(vec []float32) []float32
L2Normalize returns a unit vector. Returns a zero vector if the magnitude is zero.
func LoadLinearWeights ¶
func LoadLinearWeights(path string) (weight [][]float32, bias []float32, err error)
LoadLinearWeights loads a PyTorch nn.Linear layer's weight and bias from a .pt file. Returns weight as [out_features][in_features] and bias as [out_features]. The .pt file must be a state_dict saved via torch.save(state_dict, path).
func MaxSimScore ¶
func MaxSimScore(queryTokens, docTokens [][]float32) float64
MaxSimScore computes the ColBERT MaxSim score between query and document token matrices. For each query token, it finds the maximum similarity across all doc tokens, then sums those maxima. Assumes both query and doc tokens are L2-normalized (from ColBERTHead), so the dot product equals cosine similarity.
func SparseDotProduct ¶
func SparseDotProduct(a, b map[int32]float32) float64
SparseDotProduct computes the dot product between two sparse vectors. Only keys present in both vectors contribute.
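A short sketch of the overlap-only dot product. `sparseDotProduct` is a hypothetical local name; iterating over the smaller map is an optimization chosen for this illustration and is not something the description states.

```go
package main

import "fmt"

// sparseDotProduct multiplies weights for keys present in both maps and sums
// the products; keys found in only one map contribute nothing.
func sparseDotProduct(a, b map[int32]float32) float64 {
	if len(a) > len(b) {
		a, b = b, a // iterate the smaller map
	}
	var sum float64
	for k, va := range a {
		if vb, ok := b[k]; ok {
			sum += float64(va) * float64(vb)
		}
	}
	return sum
}

func main() {
	query := map[int32]float32{1: 0.5, 2: 2}
	doc := map[int32]float32{2: 3, 7: 1}
	fmt.Println(sparseDotProduct(query, doc)) // 6
}
```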
func SparseHead ¶
func SparseHead(hiddenStates [][]float32, tokenIDs, specialMask []uint32, weights []float32, bias float32) map[int32]float32
SparseHead computes learned lexical weights per token. For each non-special token: weight = ReLU(dot(hidden, w) + bias). Weights are scattered to vocabulary positions via tokenIDs. Duplicate token IDs keep the maximum weight.
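The per-token rule above might look like the following. This is a sketch from the description, not the package's code: `sparseHead` is a hypothetical local name, a non-zero `specialMask` entry is assumed to mark a special token, and zero weights are dropped so the map holds non-zero entries only.

```go
package main

import "fmt"

// sparseHead computes ReLU(dot(hidden, w) + bias) for each non-special token
// and keeps the maximum weight per token ID.
func sparseHead(hiddenStates [][]float32, tokenIDs, specialMask []uint32, weights []float32, bias float32) map[int32]float32 {
	out := make(map[int32]float32)
	for t, h := range hiddenStates {
		if t >= len(tokenIDs) || t >= len(specialMask) {
			break
		}
		if specialMask[t] != 0 { // assumption: non-zero marks a special token
			continue
		}
		acc := float64(bias)
		for i, w := range weights { // assumes len(h) >= len(weights)
			acc += float64(w) * float64(h[i])
		}
		if acc <= 0 { // ReLU output of zero is not stored
			continue
		}
		id := int32(tokenIDs[t])
		if w := float32(acc); w > out[id] {
			out[id] = w // duplicate IDs keep the maximum
		}
	}
	return out
}

func main() {
	hidden := [][]float32{{1, 1}, {2, 0}, {0, 1}, {4, 0}, {1, 1}}
	ids := []uint32{0, 10, 11, 10, 2}
	special := []uint32{1, 0, 0, 0, 1} // first and last are special tokens
	weights := []float32{1, -1}
	fmt.Println(sparseHead(hidden, ids, special, weights, 0)) // map[10:4]
}
```

Token 11 scores -1 before the ReLU and is dropped; token 10 appears twice (2 and 4) and keeps 4.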
func TruncateForEmbedding ¶
func TruncateForEmbedding(text string, maxTokens int) string
TruncateForEmbedding truncates text to fit within the model's token limit. Uses a character-based approximation (2 chars/token, empirically derived). Breaks at word boundaries when possible.
Tail loss: content beyond maxTokens × 2 chars is dropped before tokenization. The head is embedded correctly, but the tail is invisible to semantic retrieval (lexical FTS still sees the full body). For long-form notes where the tail carries information not in the head, this under-covers retrieval. Tracked as a quality improvement in vaultmind#30 (chunk-and-pool); it is a coverage limit, not a silent failure or robustness bug. Build the chunking fix when retrieval visibly misses tail content, not before.
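A sketch of the character-budget truncation described above. `truncateForEmbedding` is a hypothetical local re-implementation; counting "chars" as runes rather than bytes, and treating a non-positive maxTokens as "no truncation", are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// charsPerToken is the documented approximation of 2 characters per token.
const charsPerToken = 2

// truncateForEmbedding keeps at most maxTokens*charsPerToken runes and backs
// off to the previous whitespace when the cut would split a word.
func truncateForEmbedding(text string, maxTokens int) string {
	if maxTokens <= 0 {
		return text // assumption: no limit configured
	}
	limit := maxTokens * charsPerToken
	runes := []rune(text)
	if len(runes) <= limit {
		return text
	}
	cut := string(runes[:limit])
	// Only back off when the cut lands in the middle of a word.
	if !unicode.IsSpace(runes[limit]) {
		if i := strings.LastIndexFunc(cut, unicode.IsSpace); i > 0 {
			cut = cut[:i]
		}
	}
	return strings.TrimRightFunc(cut, unicode.IsSpace)
}

func main() {
	text := "alpha beta gamma"
	fmt.Println(truncateForEmbedding(text, 4)) // alpha
	fmt.Println(truncateForEmbedding(text, 5)) // alpha beta
}
```

With a budget of 8 characters the cut would fall inside "beta", so the sketch retreats to "alpha"; with 10 characters the cut lands exactly on a word boundary and nothing more is lost.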
Types ¶
type BGEM3Embedder ¶
type BGEM3Embedder struct {
// contains filtered or unexported fields
}
BGEM3Embedder produces dense, sparse, and ColBERT embeddings using BGE-M3.
func NewBGEM3Embedder ¶
func NewBGEM3Embedder(cfg HugotConfig) (*BGEM3Embedder, error)
NewBGEM3Embedder creates a BGE-M3 embedder with all three heads.
func (*BGEM3Embedder) Close ¶
func (e *BGEM3Embedder) Close() error
Close releases the hugot session.
func (*BGEM3Embedder) Dims ¶
func (e *BGEM3Embedder) Dims() int
Dims returns the embedding dimensionality (1024).
func (*BGEM3Embedder) EmbedBatch ¶
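func (e *BGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
EmbedBatch returns dense embeddings only, for compatibility with the Embedder interface.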
func (*BGEM3Embedder) EmbedColBERT ¶
func (e *BGEM3Embedder) EmbedColBERT(ctx context.Context, text string) ([][]float32, error)
EmbedColBERT produces only the ColBERT per-token embeddings (used by ColBERTRetriever).
func (*BGEM3Embedder) EmbedFull ¶
func (e *BGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)
EmbedFull produces all three embedding types for a single text.
func (*BGEM3Embedder) EmbedFullBatch ¶
func (e *BGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)
EmbedFullBatch produces all three embedding types for multiple texts. Bypasses hugot's Postprocess to access raw per-token hidden states.
func (*BGEM3Embedder) EmbedSparse ¶
func (e *BGEM3Embedder) EmbedSparse(ctx context.Context, text string) (map[int32]float32, error)
EmbedSparse produces only the sparse embedding (used by SparseRetriever).
type BGEM3Output ¶
type BGEM3Output struct {
Dense []float32 // [1024] CLS-pooled, L2-normalized
Sparse map[int32]float32 // vocab_id -> weight (non-zero only)
ColBERT [][]float32 // [seq_len-1][1024] per-token, L2-normalized
}
BGEM3Output contains all three embedding types from a BGE-M3 forward pass.
type Embedder ¶
type Embedder interface {
// Embed produces a single embedding vector for the given text.
Embed(ctx context.Context, text string) ([]float32, error)
// EmbedBatch produces embedding vectors for multiple texts.
EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
// Dims returns the dimensionality of the embedding vectors.
Dims() int
// Close releases resources (model session, etc.).
Close() error
}
Embedder converts text into dense vector representations.
type FullEmbedder ¶
type FullEmbedder interface {
Embedder
EmbedFullBatch(ctx context.Context, texts []string) ([]*BGEM3Output, error)
}
FullEmbedder extends Embedder with multi-output capability (BGE-M3).
type HugotConfig ¶
type HugotConfig struct {
// ModelPath is the local path to the ONNX model directory.
// If empty, the model will be downloaded from HuggingFace.
ModelPath string
// ModelName is the HuggingFace model ID (e.g., "sentence-transformers/all-MiniLM-L6-v2").
// Used for downloading if ModelPath is not set.
ModelName string
// CacheDir is where downloaded models are stored.
CacheDir string
// Dims is the embedding dimensionality (e.g., 384 for MiniLM, 1024 for BGE-M3).
Dims int
// OnnxFilePath specifies which ONNX file to use when a model has multiple variants.
// E.g., "onnx/model.onnx" for the default, "onnx/model_O2.onnx" for optimized.
OnnxFilePath string
// MaxTokens is the model's context window size. Texts longer than this (in approximate
// tokens) are truncated before embedding. 0 means no truncation.
MaxTokens int
}
HugotConfig configures the HugotEmbedder.
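As an illustration, a BGE-M3 configuration could be assembled from the documented constants roughly like this. The struct and constants are redeclared locally so the sketch compiles on its own; `bgeM3Config` and the `/tmp/models` cache directory are hypothetical, and leaving ModelPath empty relies on the documented download-from-HuggingFace fallback.

```go
package main

import "fmt"

// Local mirrors of the documented constants and struct fields.
const (
	BGEM3ModelName    = "BAAI/bge-m3"
	BGEM3Dims         = 1024
	BGEM3MaxTokens    = 8190
	BGEM3OnnxFilePath = "onnx/model.onnx"
)

type HugotConfig struct {
	ModelPath    string
	ModelName    string
	CacheDir     string
	Dims         int
	OnnxFilePath string
	MaxTokens    int
}

// bgeM3Config is a hypothetical helper that fills a HugotConfig for BGE-M3.
// ModelPath stays empty, so the model would be fetched by ModelName.
func bgeM3Config(cacheDir string) HugotConfig {
	return HugotConfig{
		ModelName:    BGEM3ModelName,
		CacheDir:     cacheDir,
		Dims:         BGEM3Dims,
		OnnxFilePath: BGEM3OnnxFilePath,
		MaxTokens:    BGEM3MaxTokens,
	}
}

func main() {
	cfg := bgeM3Config("/tmp/models") // hypothetical cache location
	fmt.Printf("%+v\n", cfg)
}
```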
func DefaultHugotConfig ¶
func DefaultHugotConfig() HugotConfig
DefaultHugotConfig returns the standard HugotConfig for all-MiniLM-L6-v2.
type HugotEmbedder ¶
type HugotEmbedder struct {
// contains filtered or unexported fields
}
HugotEmbedder wraps the hugot library to produce embeddings using ONNX models.
func NewHugotEmbedder ¶
func NewHugotEmbedder(cfg HugotConfig) (*HugotEmbedder, error)
NewHugotEmbedder creates an embedder using hugot with the Go backend. For ORT backend (faster, supports larger models), build with -tags ORT.
func (*HugotEmbedder) Close ¶
func (e *HugotEmbedder) Close() error
Close releases the hugot session.
func (*HugotEmbedder) Dims ¶
func (e *HugotEmbedder) Dims() int
Dims returns the dimensionality of the embedding vectors.
func (*HugotEmbedder) EmbedBatch ¶
EmbedBatch produces embedding vectors for multiple texts. Texts exceeding the model's token limit are truncated automatically.
type SidecarBGEM3Config ¶
type SidecarBGEM3Config struct {
// Python is the interpreter path. Must have torch + transformers
// installed and be on a platform where torch.backends.mps.is_available()
// returns true (Apple Silicon). When empty, falls back to "python3" on
// PATH.
Python string
// ScriptPath is the absolute path to embed_server.py. When empty, the
// embedder looks for the script alongside this Go file (resolved via
// the project's $CLAUDE_PROJECT_DIR or the executable's directory).
ScriptPath string
}
SidecarBGEM3Config controls how the sidecar process is launched.
type SidecarBGEM3Embedder ¶
type SidecarBGEM3Embedder struct {
// contains filtered or unexported fields
}
SidecarBGEM3Embedder runs BGE-M3 inference in an external Python process that uses PyTorch + MPS (Apple Silicon GPU). The Go side does no tokenization: the Python sidecar tokenizes with the HF tokenizer loaded from cache. The heads also run inside the sidecar, so the per-modality tensors flow through MPS without round-tripping to the CPU mid-batch.
Why a sidecar instead of in-process: in-process ORT (via hugot) saturates the CPU during indexing on Apple Silicon because there is no GPU acceleration path (vaultmind#34). The sidecar pattern moves heavy inference behind a JSON contract, isolating vaultmind core from the choice of inference engine. Today the engine is PyTorch+MPS; tomorrow it could be CoreML or MLX without touching the Go side.
Lifecycle: the embedder spawns the Python subprocess in NewSidecarBGEM3, and Close() tears it down. Per-batch round-trips happen via Send (write one JSON line to stdin, read one JSON line from stdout). A mutex serializes access because the protocol is synchronous request/response on a single FD pair.
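The request/response pattern described above can be sketched with a mutex-guarded line client. Everything here is illustrative: the `request` and `response` shapes are hypothetical (the sidecar's actual JSON schema is not documented on this page), and `fakeSidecar` replaces the Python process with an in-memory goroutine so the example runs without one.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// Hypothetical message shapes, not the real sidecar schema.
type request struct {
	Texts []string `json:"texts"`
}
type response struct {
	Count int    `json:"count"`
	Error string `json:"error,omitempty"`
}

// lineClient writes one JSON line and reads one JSON line per call.
type lineClient struct {
	mu  sync.Mutex    // one request in flight at a time
	in  io.Writer     // would be the subprocess's stdin
	out *bufio.Reader // would be the subprocess's stdout
}

func (c *lineClient) send(req request) (response, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	var resp response
	payload, err := json.Marshal(req)
	if err != nil {
		return resp, err
	}
	if _, err := c.in.Write(append(payload, '\n')); err != nil {
		return resp, err
	}
	line, err := c.out.ReadBytes('\n')
	if err != nil {
		return resp, err
	}
	if err := json.Unmarshal(line, &resp); err != nil {
		return resp, err
	}
	if resp.Error != "" {
		return resp, fmt.Errorf("sidecar: %s", resp.Error)
	}
	return resp, nil
}

// fakeSidecar stands in for the Python process: it answers each request
// line with the number of texts it received.
func fakeSidecar(stdin io.Reader, stdout io.Writer) {
	sc := bufio.NewScanner(stdin)
	for sc.Scan() {
		var req request
		resp := response{}
		if err := json.Unmarshal(sc.Bytes(), &req); err != nil {
			resp.Error = err.Error()
		} else {
			resp.Count = len(req.Texts)
		}
		out, _ := json.Marshal(resp)
		stdout.Write(append(out, '\n'))
	}
}

// newFakeClient wires the client to the fake sidecar over in-memory pipes.
func newFakeClient() *lineClient {
	reqR, reqW := io.Pipe()
	respR, respW := io.Pipe()
	go fakeSidecar(reqR, respW)
	return &lineClient{in: reqW, out: bufio.NewReader(respR)}
}

func main() {
	c := newFakeClient()
	resp, err := c.send(request{Texts: []string{"a", "b"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Count) // 2
}
```

Holding the lock across both the write and the read is what keeps concurrent callers from interleaving lines on the shared pipe pair.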
func NewSidecarBGEM3 ¶
func NewSidecarBGEM3(cfg SidecarBGEM3Config) (*SidecarBGEM3Embedder, error)
NewSidecarBGEM3 spawns the Python sidecar and waits for its ready signal. Returns an error if the subprocess fails to start, the Python imports fail, or the model can't be loaded. The caller MUST defer Close() to reap the subprocess.
func (*SidecarBGEM3Embedder) Close ¶
func (e *SidecarBGEM3Embedder) Close() error
Close terminates the sidecar process. Safe to call multiple times.
func (*SidecarBGEM3Embedder) Device ¶
func (e *SidecarBGEM3Embedder) Device() string
Device reports the device the sidecar selected ("mps" or "cpu"). Useful for the doctor / index summary so the operator sees acceleration.
func (*SidecarBGEM3Embedder) Dims ¶
func (e *SidecarBGEM3Embedder) Dims() int
Dims reports the dense embedding dimensionality.
func (*SidecarBGEM3Embedder) EmbedBatch ¶
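func (e *SidecarBGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
EmbedBatch produces dense embeddings for a batch via the sidecar.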
func (*SidecarBGEM3Embedder) EmbedFull ¶
func (e *SidecarBGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)
EmbedFull is the singleton form of EmbedFullBatch.
func (*SidecarBGEM3Embedder) EmbedFullBatch ¶
func (e *SidecarBGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)
EmbedFullBatch sends a batch of texts to the sidecar and parses the response. Sparse keys (token IDs) arrive as strings in the JSON, since JSON object keys must be strings; they are parsed to int32 here so the sidecar protocol stays portable across languages.
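The key conversion mentioned above could be done along these lines. `parseSparse` is a hypothetical helper and the `"sparse"` field name in the sample payload is an assumption; only the string-to-int32 conversion reflects the description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// parseSparse converts string token-ID keys into int32 keys, rejecting keys
// that are not valid 32-bit integers.
func parseSparse(raw map[string]float32) (map[int32]float32, error) {
	out := make(map[int32]float32, len(raw))
	for k, v := range raw {
		id, err := strconv.ParseInt(k, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("sparse key %q: %w", k, err)
		}
		out[int32(id)] = v
	}
	return out, nil
}

func main() {
	// Hypothetical payload shape; the real field names may differ.
	payload := []byte(`{"sparse": {"101": 0.25, "2045": 1.5}}`)
	var msg struct {
		Sparse map[string]float32 `json:"sparse"`
	}
	if err := json.Unmarshal(payload, &msg); err != nil {
		panic(err)
	}
	weights, err := parseSparse(msg.Sparse)
	if err != nil {
		panic(err)
	}
	fmt.Println(weights[2045]) // 1.5
}
```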