vecindex

v0.0.0-...-a8943ed
Published: May 5, 2026 | License: MIT | Imports: 21 | Imported by: 0

Documentation

Overview

Package vecindex implements IVF (Inverted File Index) vector similarity search using local segment files, overlay journals, and shared metric/kmeans primitives.

Constants

const (
	MemberEncodingRawPreparedF32 int64 = iota
	MemberEncodingResidualInt8
	MemberEncodingResidualPQ8
)
const (
	// MetricL2 is squared Euclidean distance.
	MetricL2 = metric.MetricL2
	// MetricDot is negative inner product (higher dot = closer).
	MetricDot = metric.MetricDot
	// MetricCosine is cosine distance (1 - cosine similarity).
	MetricCosine = metric.MetricCosine
)
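
The three metric definitions in the comments above can be sketched as plain functions. This is only an illustration of the formulas, not the package's metric implementation; the function names are hypothetical, and the zero-norm handling in cosineDistance is an assumption.

```go
package main

import "math"

// l2Squared returns the squared Euclidean distance between a and b.
// Both slices are assumed to have the same length.
func l2Squared(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// negDot returns the negative inner product, so a larger dot product
// yields a smaller (closer) distance.
func negDot(a, b []float32) float32 {
	var dot float32
	for i := range a {
		dot += a[i] * b[i]
	}
	return -dot
}

// cosineDistance returns 1 - cosine similarity.
// Assumption: a zero-norm input is treated as distance 1.
func cosineDistance(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return float32(1 - dot/math.Sqrt(na*nb))
}
```

All three return values where smaller means closer, which is what lets a single nearest-first ordering serve every metric.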
const MaxIndexNameLen = 48

MaxIndexNameLen is the maximum length of a user-supplied index name. It keeps the derived object names (table, index, trigger) well under SQLite's practical identifier limits.

const MaxNlist = 16384

MaxNlist is the maximum number of IVF clusters allowed.

const MemberResidualBlockSize = quantize.DefaultResidualBlockSize
const (
	SegmentStoreVersion = 4
)
const StablePQMinInternalDim = 512

Variables

var ErrNoCentroidsLoaded = errors.New("vecindex: no centroids loaded")

Functions

func DecodeCentroidBlob

func DecodeCentroidBlob(data []byte) (*kmeans.CentroidSet, error)

DecodeCentroidBlob deserialises a CentroidSet from a zstd-compressed msgpack blob.

func DefaultSegmentBlockRows

func DefaultSegmentBlockRows(encoding int64) int

func EncodeCentroidBlob

func EncodeCentroidBlob(cs *kmeans.CentroidSet) ([]byte, error)

EncodeCentroidBlob serialises a CentroidSet to a zstd-compressed msgpack blob. The active centroid blob is embedded in the local segment manifest.

func EncodeSegmentCurrent

func EncodeSegmentCurrent(current *SegmentCurrent) ([]byte, error)

func EncodeSegmentManifest

func EncodeSegmentManifest(manifest *SegmentManifest) ([]byte, error)

func EncodeStableMemberCodecBlob

func EncodeStableMemberCodecBlob(codec *StableMemberCodec) ([]byte, error)

func Float32ToBytes

func Float32ToBytes(v []float32) []byte

Float32ToBytes encodes a []float32 as little-endian bytes. Used during sampling for k-means to round-trip through SQL BLOB columns.
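
A little-endian float32 encoding of this kind can be sketched with the standard library. The helper names below are hypothetical stand-ins, not the package's functions, and the decoder is included only to show the round trip through a BLOB-shaped byte slice.

```go
package main

import (
	"encoding/binary"
	"math"
)

// float32ToBytesLE packs each float32 as 4 little-endian bytes.
// Hypothetical stand-in for Float32ToBytes.
func float32ToBytesLE(v []float32) []byte {
	out := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(out[4*i:], math.Float32bits(f))
	}
	return out
}

// bytesToFloat32LE is the inverse. It reports false when the input
// length is not a multiple of 4.
func bytesToFloat32LE(b []byte) ([]float32, bool) {
	if len(b)%4 != 0 {
		return nil, false
	}
	out := make([]float32, len(b)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return out, true
}
```

A vector of n floats always occupies exactly 4*n bytes, which is why the assignment functions in this package can check dimensionality from the BLOB length alone.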

func IsVecLocalTable

func IsVecLocalTable(tableName string) bool

IsVecLocalTable reports whether tableName is a legacy local-only vector object. Used by the CDC applier to tolerate DROP races against pre-cutover SQLite artifacts without treating them as replicated user tables.

func OverlayJournalPath

func OverlayJournalPath(dir string) string

func ScoreEncodedMember

func ScoreEncodedMember(spec IVFSpec, cs *kmeans.CentroidSet, query []float32, queryNorm2 float32, clusterID int64, enc int64, vec []byte) (float32, error)

func SegmentBlockMetaPath

func SegmentBlockMetaPath(dir string, generation uint64) string

func SegmentBlockPath

func SegmentBlockPath(dir string, generation uint64) string

func SegmentCurrentPath

func SegmentCurrentPath(dir string) string

func SegmentDataPath

func SegmentDataPath(dir string, generation uint64) string

func SegmentManifestPath

func SegmentManifestPath(dir string, generation uint64) string

func SegmentRowMapPath

func SegmentRowMapPath(dir string, generation uint64) string

func SegmentStoreDir

func SegmentStoreDir(dbPath, indexName string) string

func SegmentStoreV1Compat

func SegmentStoreV1Compat() uint32

func SegmentStoreV2Compat

func SegmentStoreV2Compat() uint32

func SegmentStoreV3Compat

func SegmentStoreV3Compat() uint32

func StableMemberEncodingSpec

func StableMemberEncodingSpec(spec IVFSpec) (int64, int)

func ValidateIndexName

func ValidateIndexName(name string) error

ValidateIndexName rejects names that would produce unsafe or ambiguous SQLite identifiers. Rules:

  • non-empty, length <= MaxIndexNameLen
  • first character is an ASCII letter
  • remaining characters are ASCII letters, digits, or underscore
  • must not begin with the `marmot` prefix, which is reserved for generated names such as `_marmot_vec_*` and `__marmot_vec_*`.
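
The rules above can be sketched as a single pass over the name. This is a minimal illustration rather than the package's implementation: the function name, the error messages, and the case-insensitive treatment of the `marmot` prefix are all assumptions.

```go
package main

import (
	"errors"
	"strings"
)

// maxIndexNameLen mirrors the documented MaxIndexNameLen constant.
const maxIndexNameLen = 48

// validateIndexNameSketch applies the four documented rules in order.
// Hypothetical stand-in for ValidateIndexName.
func validateIndexNameSketch(name string) error {
	if name == "" {
		return errors.New("index name is empty")
	}
	if len(name) > maxIndexNameLen {
		return errors.New("index name is too long")
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		isLetter := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
		isDigit := c >= '0' && c <= '9'
		if i == 0 && !isLetter {
			return errors.New("index name must start with an ASCII letter")
		}
		if !isLetter && !isDigit && c != '_' {
			return errors.New("index name contains an invalid character")
		}
	}
	// Assumption: the reserved-prefix check ignores case.
	if strings.HasPrefix(strings.ToLower(name), "marmot") {
		return errors.New("index name uses the reserved marmot prefix")
	}
	return nil
}
```

Checking bytes rather than runes is deliberate here: any byte of a multi-byte UTF-8 sequence falls outside the ASCII ranges and is rejected.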

Types

type BulkEntry

type BulkEntry struct {
	// ExternalID is the caller-supplied identifier for the vector.
	ExternalID []byte
	// Vector holds the raw float32 values.
	Vector []float32
}

BulkEntry is a single vector supplied during index creation.

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine manages in-memory state for all active vector indexes. It implements VectorUDFProvider so it can be installed as the global provider via db.SetVectorUDFProvider(engine).

All methods are safe for concurrent use.

func NewEngine

func NewEngine() *Engine

NewEngine creates a new Engine with no registered indexes.

func (*Engine) AssignNearest

func (e *Engine) AssignNearest(indexName string, vec []byte) (int64, error)

AssignNearest implements VectorUDFProvider. Returns the 1-based cluster ID for the nearest centroid (0 is reserved for delta rows). Returns MARMOT-VEC-013 if the index is unknown.

func (*Engine) Detach

func (e *Engine) Detach(indexName string) (*IndexState, bool)

Detach removes an index from the lookup map without retiring its resources. Callers that do not restore the state must eventually call Retire.

func (*Engine) Lookup

func (e *Engine) Lookup(indexName string) (*IndexState, bool)

Lookup returns the IndexState for indexName and whether it was found.

func (*Engine) LookupRef

func (e *Engine) LookupRef(indexName string) (*IndexState, func(), bool)

LookupRef returns a serving reference to the current state. The caller must invoke the returned release function when it is done reading from file-backed segment or overlay resources.

func (*Engine) Register

func (e *Engine) Register(indexName string, state *IndexState)

Register stores the IndexState for indexName in the engine, replacing any existing state for the same name.

func (*Engine) RegisterWithCentroidSet

func (e *Engine) RegisterWithCentroidSet(indexName string, spec IVFSpec, cs *kmeans.CentroidSet) *IndexState

RegisterWithCentroidSet is a convenience wrapper that creates a new IndexState from spec+cs and registers it. Returns the created state.

func (*Engine) TopNprobeClusters

func (e *Engine) TopNprobeClusters(indexName string, vec []byte, n int) ([]int64, error)

TopNprobeClusters implements VectorUDFProvider. Thin wrapper over TopNprobeClustersWithEpoch that discards the epoch.

func (*Engine) TopNprobeClustersWithEpoch

func (e *Engine) TopNprobeClustersWithEpoch(indexName string, vec []byte, n int) ([]int64, uint64, error)

TopNprobeClustersWithEpoch returns the same cluster IDs as TopNprobeClusters, together with the probe-state epoch they were computed against. The coordinator's cache path uses the epoch to detect cluster IDs that were produced under an older probe and would otherwise be used to index a post-reindex cache built at a different epoch.

func (*Engine) Unregister

func (e *Engine) Unregister(indexName string)

Unregister removes the IndexState for indexName from the engine. It is a no-op if indexName is not registered.

type ForcePlan

type ForcePlan int

ForcePlan controls the query plan selection strategy (§6.3).

const (
	// ForcePlanAuto lets the cost-based planner decide.
	ForcePlanAuto ForcePlan = iota
	// ForcePlanPre forces pre-filter (exact brute-force over predicate results).
	ForcePlanPre
	// ForcePlanPost forces post-filter (IVF probe then predicate).
	ForcePlanPost
)

func (ForcePlan) String

func (p ForcePlan) String() string

String returns the SQL-level string for a ForcePlan.

type IVFSpec

type IVFSpec struct {
	// ID is the unique index identifier.
	ID string
	// Dim is the vector dimensionality of caller-supplied vectors.
	Dim int
	// Metric is the distance function to use.
	Metric Metric
	// Nlist is the number of IVF centroids (clusters).
	Nlist int
	// Nprobe is the number of clusters searched at query time.
	Nprobe int
	// Seed is the RNG seed used for k-means initialisation.
	Seed uint64
	// Epoch tracks the centroid generation; incremented on retrain.
	Epoch uint64
	// MaxNorm is the fixed upper bound on vector L2 norms used for MIPS→L2
	// augmentation when Metric == MetricDot.
	MaxNorm float32
}

IVFSpec describes the configuration for a single IVF vector index.

func (IVFSpec) InternalDim

func (s IVFSpec) InternalDim() int

InternalDim returns the dimensionality of vectors as stored in the index. For MetricDot this is Dim+1 (one augmented dimension for MIPS→L2 reduction). For all other metrics it equals Dim.

func (IVFSpec) InternalMetric

func (s IVFSpec) InternalMetric() Metric

InternalMetric returns the distance metric used for centroid assignment. For MetricDot the MIPS→L2 reduction stores augmented vectors, so centroid assignment uses MetricL2. For all other metrics this equals Metric.
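
One common form of the MIPS→L2 reduction appends a single coordinate to each stored vector so that L2 ordering matches inner-product ordering. The sketch below assumes that form: stored vectors gain sqrt(MaxNorm² − ‖x‖²) and queries gain 0. The source does not spell out the exact formula, and the function names and the clamping of over-norm vectors are hypothetical.

```go
package main

import "math"

// augmentData appends sqrt(maxNorm^2 - ||x||^2) to x, producing a vector
// of length len(x)+1 (the "internal" dimensionality for a dot metric).
// Assumption: vectors whose norm exceeds maxNorm are clamped to 0.
func augmentData(x []float32, maxNorm float32) []float32 {
	var norm2 float64
	for _, v := range x {
		norm2 += float64(v) * float64(v)
	}
	rem := float64(maxNorm)*float64(maxNorm) - norm2
	if rem < 0 {
		rem = 0
	}
	out := make([]float32, len(x)+1)
	copy(out, x)
	out[len(x)] = float32(math.Sqrt(rem))
	return out
}

// augmentQuery appends a zero coordinate so the query lives in the same
// augmented space as the stored vectors.
func augmentQuery(q []float32) []float32 {
	out := make([]float32, len(q)+1)
	copy(out, q)
	return out
}
```

Under this construction the squared L2 distance between an augmented query and an augmented stored vector is ‖q‖² + MaxNorm² − 2·(q·x), so the nearest vector in L2 is the one with the largest inner product.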

type IndexState

type IndexState struct {
	// contains filtered or unexported fields
}

IndexState holds the in-memory state for a single vector index.

probeState is the active routing centroid set used by query transpilation and assignment. All pointers are stored as atomic.Pointer values, so readers are lock-free.

func NewIndexState

func NewIndexState(spec IVFSpec, cs *kmeans.CentroidSet) *IndexState

NewIndexState creates an IndexState rooted at cs.

func (*IndexState) Acquire

func (s *IndexState) Acquire() bool

Acquire pins the state for a serving reader. It returns false when the state has already been retired and its file-backed resources may be closing.

func (*IndexState) AddRetireCallback

func (s *IndexState) AddRetireCallback(fn func())

AddRetireCallback registers fn to run after all readers have released and file-backed serving resources have been closed.

func (*IndexState) AssignNearest

func (s *IndexState) AssignNearest(vecBytes []byte) (int64, error)

AssignNearest returns the 1-based cluster ID for the nearest centroid.

vecBytes must be a little-endian float32 BLOB of exactly spec.Dim*4 bytes for every metric. For MetricDot the caller passes the unaugmented vector; the MIPS→L2 augmentation is applied here. Returns an error on dimension mismatch or missing centroids.

func (*IndexState) AssignNearestPrepared

func (s *IndexState) AssignNearestPrepared(vecBytes []byte) (int64, error)

AssignNearestPrepared returns the 1-based cluster ID for the nearest centroid for a vector that is already in the internal search space.

vecBytes must be a little-endian float32 BLOB of exactly spec.InternalDim()*4 bytes. Unlike AssignNearest, dot-metric vectors are not augmented here.

func (*IndexState) ClearOverlay

func (s *IndexState) ClearOverlay()

func (*IndexState) ClearSegmentStore

func (s *IndexState) ClearSegmentStore()

ClearSegmentStore drops the active stable on-disk snapshot.

func (*IndexState) HotClusterScores

func (s *IndexState) HotClusterScores(limit int) map[int64]uint64

func (*IndexState) HotClusters

func (s *IndexState) HotClusters(limit int) []int64

func (*IndexState) LoadMaintenanceState

func (s *IndexState) LoadMaintenanceState() *MaintenanceState

func (*IndexState) LoadOverlay

func (s *IndexState) LoadOverlay() *JournaledOverlay

func (*IndexState) LoadSegmentStore

func (s *IndexState) LoadSegmentStore() *SegmentGeneration

LoadSegmentStore returns the active stable on-disk generation snapshot.

func (*IndexState) ProbeState

func (s *IndexState) ProbeState() *kmeans.CentroidSet

ProbeState returns the current probe centroid set. Nil when no centroids are loaded (empty-table bootstrap). The returned CentroidSet is immutable.

func (*IndexState) ProbeVersion

func (s *IndexState) ProbeVersion() uint64

ProbeVersion returns the epoch of the active probe centroid set. Returns 0 when no centroid set is loaded.

func (*IndexState) RecordClusterHits

func (s *IndexState) RecordClusterHits(clusterIDs []int64)

func (*IndexState) RecordClusterMutation

func (s *IndexState) RecordClusterMutation(oldCluster int64, oldVec []float32, newCluster int64, newVec []float32)

func (*IndexState) RecordRowsModified

func (s *IndexState) RecordRowsModified(delta uint64)

func (*IndexState) Release

func (s *IndexState) Release()

Release drops a serving reader pin acquired by Acquire.

func (*IndexState) Retire

func (s *IndexState) Retire()

Retire prevents new serving readers from pinning this state and closes file-backed resources as soon as all existing readers have left.
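
The Acquire, Release, and Retire contract can be modelled with a reader count and a retired flag. The sketch below is a simplified, mutex-based model under stated assumptions, not the package's implementation; every name in it is hypothetical.

```go
package main

import "sync"

// pinnedState models reader pinning with deferred close.
type pinnedState struct {
	mu        sync.Mutex
	readers   int
	retired   bool
	closed    bool
	callbacks []func()
}

// acquire pins the state for a reader. It returns false once retired.
func (p *pinnedState) acquire() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.retired {
		return false
	}
	p.readers++
	return true
}

// release drops one pin. Callers must pair it with a successful acquire.
func (p *pinnedState) release() {
	p.mu.Lock()
	p.readers--
	cbs := p.closeLocked()
	p.mu.Unlock()
	for _, fn := range cbs {
		fn()
	}
}

// retire blocks new pins and closes once the last reader has left.
func (p *pinnedState) retire() {
	p.mu.Lock()
	p.retired = true
	cbs := p.closeLocked()
	p.mu.Unlock()
	for _, fn := range cbs {
		fn()
	}
}

// addRetireCallback registers fn to run after close. Simplification:
// callbacks added after close has happened are never run.
func (p *pinnedState) addRetireCallback(fn func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.callbacks = append(p.callbacks, fn)
}

// closeLocked marks the state closed when it is retired and unpinned, and
// returns the callbacks to run outside the lock. The caller holds p.mu.
func (p *pinnedState) closeLocked() []func() {
	if !p.retired || p.readers > 0 || p.closed {
		return nil
	}
	p.closed = true
	cbs := p.callbacks
	p.callbacks = nil
	return cbs
}
```

A reader would typically write `if s.acquire() { defer s.release(); ... }` and treat a false result as "look the index up again", since the state it held is being torn down.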

func (*IndexState) Spec

func (s *IndexState) Spec() IVFSpec

Spec returns the immutable IVF configuration for this index.

func (*IndexState) StoreMaintenanceState

func (s *IndexState) StoreMaintenanceState(ms *MaintenanceState)

func (*IndexState) StoreOverlay

func (s *IndexState) StoreOverlay(overlay *JournaledOverlay)

func (*IndexState) StoreSegmentStore

func (s *IndexState) StoreSegmentStore(generation *SegmentGeneration)

StoreSegmentStore installs generation as the active stable on-disk snapshot.

func (*IndexState) SwapProbeState

func (s *IndexState) SwapProbeState(cs *kmeans.CentroidSet) *kmeans.CentroidSet

SwapProbeState atomically replaces the probe centroid set and returns the previous one. Called at REINDEX commit (design §8.3 step 7 in-txn swap).

func (*IndexState) TopNprobeClusters

func (s *IndexState) TopNprobeClusters(vecBytes []byte, n int) ([]int64, error)

TopNprobeClusters returns the top-n 1-based cluster IDs ordered by ascending distance to the query vector (closest first). Thin wrapper over TopNprobeClustersWithEpoch that discards the epoch.
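
Selecting the top-n clusters amounts to scoring every centroid against the query and keeping the n closest. The sketch below assumes squared L2 scoring and a full sort; the real implementation may use a different metric path or a partial selection, and the function name is hypothetical.

```go
package main

import "sort"

// topNprobe returns up to n 1-based cluster IDs, closest centroid first.
// Each centroid is assumed to have the same length as query.
func topNprobe(centroids [][]float32, query []float32, n int) []int64 {
	type scored struct {
		id   int64
		dist float32
	}
	all := make([]scored, len(centroids))
	for i, c := range centroids {
		var d float32
		for j := range c {
			diff := c[j] - query[j]
			d += diff * diff
		}
		// Cluster IDs are 1-based; 0 is reserved for delta rows.
		all[i] = scored{id: int64(i) + 1, dist: d}
	}
	sort.SliceStable(all, func(a, b int) bool { return all[a].dist < all[b].dist })
	if n > len(all) {
		n = len(all)
	}
	if n < 0 {
		n = 0
	}
	ids := make([]int64, n)
	for i := range ids {
		ids[i] = all[i].id
	}
	return ids
}
```

With n equal to 1 this reduces to nearest-centroid assignment, which is the relationship between TopNprobeClusters and AssignNearest.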

func (*IndexState) TopNprobeClustersWithEpoch

func (s *IndexState) TopNprobeClustersWithEpoch(vecBytes []byte, n int) ([]int64, uint64, error)

TopNprobeClustersWithEpoch returns the same cluster IDs as TopNprobeClusters, together with the probe-state epoch they were computed against. The epoch is read from the same probeState pointer used for the top-n computation, so callers can later gate cache reads on epoch equality and avoid using IDs computed under the old probe to index a freshly rebuilt cache (task #16 coherence).
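
Gating cache reads on epoch equality can be sketched as follows. The cache type and its layout are hypothetical and exist only to show the comparison; the source does not describe the coordinator's cache structure.

```go
package main

// clusterCache is a hypothetical per-index cache built for one centroid epoch.
type clusterCache struct {
	epoch   uint64
	entries map[int64][]int64 // cluster ID -> cached row IDs
}

// lookup returns cached rows only when the cluster IDs were computed against
// the same epoch the cache was built for. A mismatch is reported as a miss so
// the caller falls back to the uncached path.
func (c *clusterCache) lookup(probeEpoch uint64, clusterID int64) ([]int64, bool) {
	if c == nil || c.epoch != probeEpoch {
		return nil, false
	}
	rows, ok := c.entries[clusterID]
	return rows, ok
}
```

Because cluster IDs are only meaningful relative to one centroid set, ID 3 at epoch 7 and ID 3 at epoch 8 may describe unrelated regions of the vector space; treating the mismatch as a miss avoids mixing them.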

type JournaledOverlay

type JournaledOverlay struct {
	// contains filtered or unexported fields
}

JournaledOverlay couples a crash-safe local journal with an in-memory overlay snapshot. Callers append committed batches here and readers consume immutable snapshots.

func OpenJournaledOverlay

func OpenJournaledOverlay(path string) (*JournaledOverlay, error)

OpenJournaledOverlay opens path, replays the journal, and returns the live in-memory overlay view.

func OpenJournaledOverlayForRewrite

func OpenJournaledOverlayForRewrite(path string) (*JournaledOverlay, error)

OpenJournaledOverlayForRewrite opens the journal file without replaying the current overlay state. It is intended for callers that already have the mutations they want to persist and will immediately call Rewrite.

func (*JournaledOverlay) ApplyCommittedBatch

func (o *JournaledOverlay) ApplyCommittedBatch(mutations []OverlayMutation) error

ApplyCommittedBatch validates, journals, fsyncs, and publishes a committed overlay batch.

func (*JournaledOverlay) Close

func (o *JournaledOverlay) Close() error

Close closes the underlying journal handle.

func (*JournaledOverlay) CompactAfter

func (o *JournaledOverlay) CompactAfter(minSequence uint64) error

func (*JournaledOverlay) Reset

func (o *JournaledOverlay) Reset(epoch uint64) error

Reset clears both the journal file and the in-memory overlay view.

func (*JournaledOverlay) Rewrite

func (o *JournaledOverlay) Rewrite(epoch uint64, minSequence uint64, mutations []OverlayMutation) error

func (*JournaledOverlay) Snapshot

func (o *JournaledOverlay) Snapshot() *OverlaySnapshot

Snapshot returns the current immutable in-memory overlay view.

type MaintenanceState

type MaintenanceState struct {
	ClusterRowCounts         []uint64
	ClusterVectorSums        [][]float32
	PendingClusterRowDelta   []int64
	PendingClusterVectorSums [][]float32
	RowsModifiedSinceRebuild uint64
	LastRebuildRowCount      uint64
	ConsecutiveSkewCycles    uint32
	// contains filtered or unexported fields
}

func (*MaintenanceState) Clone

func (m *MaintenanceState) Clone() *MaintenanceState

func (*MaintenanceState) LiveCentroids

func (m *MaintenanceState) LiveCentroids(base [][]float32) [][]float32

func (*MaintenanceState) LiveClusterRowCounts

func (m *MaintenanceState) LiveClusterRowCounts() []uint64

func (*MaintenanceState) LiveClusterVectorSums

func (m *MaintenanceState) LiveClusterVectorSums() [][]float32

func (*MaintenanceState) RecordClusterMutation

func (m *MaintenanceState) RecordClusterMutation(oldCluster int64, oldVec []float32, newCluster int64, newVec []float32)

func (*MaintenanceState) RecordRowsModified

func (m *MaintenanceState) RecordRowsModified(delta uint64)

func (*MaintenanceState) ResetPending

func (m *MaintenanceState) ResetPending()

func (*MaintenanceState) RowCount

func (m *MaintenanceState) RowCount() uint64

func (*MaintenanceState) SetConsecutiveSkewCycles

func (m *MaintenanceState) SetConsecutiveSkewCycles(v uint32)

func (*MaintenanceState) Stats

func (m *MaintenanceState) Stats() (rowsModified uint64, lastRebuild uint64, skewCycles uint32)

type Metric

type Metric = metric.Metric

Metric identifies the distance function used for vector comparisons. It is an alias for metric.Metric so callers can use either import path.

type OverlayBuffer

type OverlayBuffer struct {
	// contains filtered or unexported fields
}

OverlayBuffer stores immutable overlay snapshots behind an atomic pointer so readers can take stable snapshots without locking while writers publish whole-snapshot updates.

func NewOverlayBuffer

func NewOverlayBuffer() *OverlayBuffer

NewOverlayBuffer returns an empty overlay buffer.

func (*OverlayBuffer) ApplyBatch

func (b *OverlayBuffer) ApplyBatch(mutations []OverlayMutation) error

ApplyBatch applies mutations copy-on-write and publishes the resulting snapshot atomically.
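
Copy-on-write publication behind an atomic pointer can be sketched with sync/atomic's generic Pointer type (Go 1.19 or later). The types and method names below are hypothetical, and the writer mutex is an assumption about how concurrent writers would be serialised.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// snapshot is an immutable view: it is never modified after publication.
type snapshot struct {
	epoch uint64
	rows  map[int64][]byte
}

// cowBuffer publishes whole snapshots; readers never take a lock.
type cowBuffer struct {
	mu  sync.Mutex // serialises writers only
	cur atomic.Pointer[snapshot]
}

// newCowBuffer returns a buffer holding an empty snapshot.
func newCowBuffer() *cowBuffer {
	b := &cowBuffer{}
	b.cur.Store(&snapshot{rows: map[int64][]byte{}})
	return b
}

// load returns the current snapshot without locking.
func (b *cowBuffer) load() *snapshot {
	return b.cur.Load()
}

// put copies the current snapshot, applies one change to the copy, and
// publishes the copy atomically. Existing readers keep their old view.
func (b *cowBuffer) put(rowID int64, vec []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	old := b.cur.Load()
	next := &snapshot{epoch: old.epoch, rows: make(map[int64][]byte, len(old.rows)+1)}
	for k, v := range old.rows {
		next.rows[k] = v
	}
	next.rows[rowID] = append([]byte(nil), vec...)
	b.cur.Store(next)
}
```

The cost of this design is a full map copy per write, which is why batching mutations into one publication matters: the copy is paid once per batch rather than once per row.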

func (*OverlayBuffer) Reset

func (b *OverlayBuffer) Reset(epoch uint64)

Reset clears the overlay and publishes an empty snapshot tied to epoch.

func (*OverlayBuffer) Snapshot

func (b *OverlayBuffer) Snapshot() *OverlaySnapshot

Snapshot returns the current immutable overlay view.

func (*OverlayBuffer) StoreSnapshot

func (b *OverlayBuffer) StoreSnapshot(snapshot *OverlaySnapshot)

StoreSnapshot replaces the current overlay view.

type OverlayJournal

type OverlayJournal struct {
	// contains filtered or unexported fields
}

func OpenOverlayJournal

func OpenOverlayJournal(path string) (*OverlayJournal, error)

OpenOverlayJournal opens or creates a crash-safe append-only overlay journal. A truncated tail left by a torn write is discarded automatically.

func OpenOverlayJournalForRewrite

func OpenOverlayJournalForRewrite(path string) (*OverlayJournal, error)

OpenOverlayJournalForRewrite opens or creates the journal file without replaying its contents. Use this only for callers that will immediately replace the on-disk contents via Rewrite/Reset.

func (*OverlayJournal) AppendBatch

func (j *OverlayJournal) AppendBatch(mutations []OverlayMutation) ([]overlayVecRef, error)

AppendBatch appends committed overlay mutations atomically and fsyncs once per batch, so replay after a crash sees either the full batch or none of it; a torn tail is never replayed.
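
Replay that tolerates a torn tail is usually built on self-describing records. The sketch below assumes a hypothetical record format of a 4-byte little-endian length, a 4-byte CRC-32 of the payload, and then the payload; the package's actual on-disk format is not documented here. It works on an in-memory byte slice to keep the example self-contained.

```go
package main

import (
	"encoding/binary"
	"hash/crc32"
)

// appendRecord appends one framed record: [len uint32][crc32 uint32][payload].
func appendRecord(journal, payload []byte) []byte {
	var hdr [8]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
	journal = append(journal, hdr[:]...)
	return append(journal, payload...)
}

// replay returns every complete, checksum-valid record and the byte length of
// the valid prefix. Anything after validLen is a torn tail that a caller
// could truncate away before appending again.
func replay(journal []byte) (records [][]byte, validLen int) {
	off := 0
	for len(journal)-off >= 8 {
		n := binary.LittleEndian.Uint32(journal[off : off+4])
		sum := binary.LittleEndian.Uint32(journal[off+4 : off+8])
		if uint64(n) > uint64(len(journal)-off-8) {
			break // payload was cut short by a crash
		}
		end := off + 8 + int(n)
		payload := journal[off+8 : end]
		if crc32.ChecksumIEEE(payload) != sum {
			break // payload bytes are present but corrupt
		}
		records = append(records, payload)
		off = end
	}
	return records, off
}
```

Writing a whole batch as one record and calling fsync once after it is what gives the all-or-nothing behaviour: a crash mid-write leaves a record that fails the length or checksum test and is dropped as a unit.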

func (*OverlayJournal) Close

func (j *OverlayJournal) Close() error

Close closes the underlying file handle.

func (*OverlayJournal) Replay

func (j *OverlayJournal) Replay() (*OverlaySnapshot, error)

Replay rebuilds the overlay snapshot from the journal contents.

func (*OverlayJournal) Reset

func (j *OverlayJournal) Reset(epoch uint64) error

Reset compacts the journal to an empty state for epoch.

func (*OverlayJournal) Rewrite

func (j *OverlayJournal) Rewrite(epoch uint64, lastSequence uint64, mutations []OverlayMutation) ([]overlayVecRef, error)

type OverlayMutation

type OverlayMutation struct {
	Kind              OverlayMutationKind
	Epoch             uint64
	Sequence          uint64
	ClusterID         int64
	RowID             int64
	AppliedAtUnixNano int64
	CommitTxnID       uint64
	CommitSeqNum      uint64
	VecEncoding       OverlayVecEncoding
	Vec               []byte
}

OverlayMutation is one committed local overlay change.

Sequence is a caller-provided monotonically increasing local watermark. Epoch tracks the centroid generation the assignment was computed against.

type OverlayMutationKind

type OverlayMutationKind uint8
const (
	OverlayMutationUpsert  OverlayMutationKind = 1
	OverlayMutationReplace OverlayMutationKind = 2
	OverlayMutationDelete  OverlayMutationKind = 3
)

type OverlaySnapshot

type OverlaySnapshot struct {
	// contains filtered or unexported fields
}

OverlaySnapshot is an immutable in-memory view of the local overlay state. It mirrors the serving overlay semantics used during the vector-index cutover: replacements and deletes tombstone the stable view, while upserts only contribute overlay rows.
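
One reading of these semantics is sketched below: a replace both tombstones the stable copy and contributes an overlay row, a delete only tombstones, and an upsert only contributes. That interpretation, along with every type and function name here, is an assumption made for illustration.

```go
package main

import "sort"

type mutationKind uint8

const (
	kindUpsert  mutationKind = 1
	kindReplace mutationKind = 2
	kindDelete  mutationKind = 3
)

type mutation struct {
	kind  mutationKind
	rowID int64
}

// overlayView holds overlay rows and the stable rows they mask.
type overlayView struct {
	rows       map[int64]bool
	tombstones map[int64]bool
}

// buildOverlay folds mutations, in order, into an overlay view.
func buildOverlay(muts []mutation) overlayView {
	v := overlayView{rows: map[int64]bool{}, tombstones: map[int64]bool{}}
	for _, m := range muts {
		switch m.kind {
		case kindUpsert:
			v.rows[m.rowID] = true
		case kindReplace:
			v.tombstones[m.rowID] = true
			v.rows[m.rowID] = true
		case kindDelete:
			v.tombstones[m.rowID] = true
			delete(v.rows, m.rowID)
		}
	}
	return v
}

// visibleRows returns stable rows that are not tombstoned plus all overlay
// rows, sorted by row ID. It assumes upserts target rows that are absent
// from the stable set, so no row is reported twice.
func visibleRows(stable []int64, v overlayView) []int64 {
	out := make([]int64, 0, len(stable)+len(v.rows))
	for _, id := range stable {
		if !v.tombstones[id] {
			out = append(out, id)
		}
	}
	for id := range v.rows {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}
```

The tombstone set is what lets the stable segment file stay read-only: a changed row is masked at read time instead of being rewritten on disk.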

func (*OverlaySnapshot) BacklogStats

func (s *OverlaySnapshot) BacklogStats(minSequence uint64) (rows int, bytes int64, oldestUnixNano int64)

func (*OverlaySnapshot) Epoch

func (s *OverlaySnapshot) Epoch() uint64

Epoch returns the centroid generation the snapshot is tied to.

func (*OverlaySnapshot) HasTombstone

func (s *OverlaySnapshot) HasTombstone(rowID int64) bool

HasTombstone reports whether rowID masks a stable-row copy.

func (*OverlaySnapshot) HasTombstoneAfter

func (s *OverlaySnapshot) HasTombstoneAfter(rowID int64, minSequence uint64) bool

func (*OverlaySnapshot) LastSequence

func (s *OverlaySnapshot) LastSequence() uint64

LastSequence returns the highest applied local journal sequence.

func (*OverlaySnapshot) Len

func (s *OverlaySnapshot) Len() int

Len returns the number of overlay rows currently present.

func (*OverlaySnapshot) MutationsAfter

func (s *OverlaySnapshot) MutationsAfter(minSequence uint64) []OverlayMutation

func (*OverlaySnapshot) NewestUnixNanoAfter

func (s *OverlaySnapshot) NewestUnixNanoAfter(minSequence uint64) int64

func (*OverlaySnapshot) ReadVec

func (s *OverlaySnapshot) ReadVec(rowID int64) ([]byte, error)

ReadVec returns the current overlay vector bytes for rowID.

func (*OverlaySnapshot) RowCluster

func (s *OverlaySnapshot) RowCluster(rowID int64) (int64, bool)

RowCluster returns the overlay cluster for rowID, if present.

func (*OverlaySnapshot) RowClusterAfter

func (s *OverlaySnapshot) RowClusterAfter(rowID int64, minSequence uint64) (int64, bool)

func (*OverlaySnapshot) VisitAllAfter

func (s *OverlaySnapshot) VisitAllAfter(minSequence uint64, visit func(clusterID, rowID int64, vec []byte) bool)

func (*OverlaySnapshot) VisitAllEncodedAfter

func (s *OverlaySnapshot) VisitAllEncodedAfter(minSequence uint64, visit func(clusterID, rowID int64, encoding OverlayVecEncoding, vec []byte) bool)

func (*OverlaySnapshot) VisitCluster

func (s *OverlaySnapshot) VisitCluster(clusterID int64, visit func(rowID int64, vec []byte) bool)

VisitCluster visits every overlay row in clusterID.

func (*OverlaySnapshot) VisitClusterAfter

func (s *OverlaySnapshot) VisitClusterAfter(clusterID int64, minSequence uint64, visit func(rowID int64, vec []byte) bool)

func (*OverlaySnapshot) VisitClusterEncodedAfter

func (s *OverlaySnapshot) VisitClusterEncodedAfter(clusterID int64, minSequence uint64, visit func(rowID int64, encoding OverlayVecEncoding, vec []byte) bool)

func (*OverlaySnapshot) VisitClusterRange

func (s *OverlaySnapshot) VisitClusterRange(clusterID int64, minSequence uint64, maxSequence uint64, visit func(rowID int64, vec []byte) bool)

func (*OverlaySnapshot) VisitMutationHeadersAfter

func (s *OverlaySnapshot) VisitMutationHeadersAfter(minSequence uint64, visit func(OverlayMutation) bool)

func (*OverlaySnapshot) VisitMutationHeadersAfterUnordered

func (s *OverlaySnapshot) VisitMutationHeadersAfterUnordered(minSequence uint64, visit func(OverlayMutation) bool)

VisitMutationHeadersAfterUnordered visits mutation headers without building a sorted intermediate slice. Use it for maintenance accounting where sequence order is not required.

func (*OverlaySnapshot) VisitMutationsAfter

func (s *OverlaySnapshot) VisitMutationsAfter(minSequence uint64, visit func(OverlayMutation) bool)

func (*OverlaySnapshot) VisitTombstones

func (s *OverlaySnapshot) VisitTombstones(visit func(rowID int64) bool)

VisitTombstones visits every tombstoned rowid.

func (*OverlaySnapshot) VisitTombstonesAfter

func (s *OverlaySnapshot) VisitTombstonesAfter(minSequence uint64, visit func(rowID int64) bool)

type OverlayVecEncoding

type OverlayVecEncoding uint8
const (
	OverlayPreparedF32  OverlayVecEncoding = 0
	OverlayResidualInt8 OverlayVecEncoding = 1
)

type SegmentBlockMetaStore

type SegmentBlockMetaStore struct {
	// contains filtered or unexported fields
}

func OpenSegmentBlockMetaStore

func OpenSegmentBlockMetaStore(path string) (*SegmentBlockMetaStore, error)

func (*SegmentBlockMetaStore) BlockRows

func (s *SegmentBlockMetaStore) BlockRows() int

func (*SegmentBlockMetaStore) Close

func (s *SegmentBlockMetaStore) Close() error

func (*SegmentBlockMetaStore) Dim

func (s *SegmentBlockMetaStore) Dim() int

func (*SegmentBlockMetaStore) Encoding

func (s *SegmentBlockMetaStore) Encoding() int64

func (*SegmentBlockMetaStore) Epoch

func (s *SegmentBlockMetaStore) Epoch() uint64

func (*SegmentBlockMetaStore) Generation

func (s *SegmentBlockMetaStore) Generation() uint64

func (*SegmentBlockMetaStore) InternalDim

func (s *SegmentBlockMetaStore) InternalDim() int

func (*SegmentBlockMetaStore) MaxCluster

func (s *SegmentBlockMetaStore) MaxCluster() int

func (*SegmentBlockMetaStore) Metric

func (s *SegmentBlockMetaStore) Metric() Metric

func (*SegmentBlockMetaStore) Path

func (s *SegmentBlockMetaStore) Path() string

func (*SegmentBlockMetaStore) ReadClusterBlocks

func (s *SegmentBlockMetaStore) ReadClusterBlocks(clusterIDs []int64) ([]SegmentBlockRecord, error)

func (*SegmentBlockMetaStore) RecordCount

func (s *SegmentBlockMetaStore) RecordCount() uint64

func (*SegmentBlockMetaStore) RecordQueryStats

func (s *SegmentBlockMetaStore) RecordQueryStats(considered, wouldSkip, skipped, scored, rowsScored uint64)

func (*SegmentBlockMetaStore) ResetScanStats

func (s *SegmentBlockMetaStore) ResetScanStats()

func (*SegmentBlockMetaStore) SnapshotScanStats

func (s *SegmentBlockMetaStore) SnapshotScanStats() SegmentBlockScanStats

func (*SegmentBlockMetaStore) ValidateCoverage

func (s *SegmentBlockMetaStore) ValidateCoverage(data *SegmentDataStore) error

type SegmentBlockMetaWriter

type SegmentBlockMetaWriter struct {
	// contains filtered or unexported fields
}

func CreateSegmentBlockMetaWriter

func CreateSegmentBlockMetaWriter(path string, spec IVFSpec, codec *StableMemberCodec, blockRows, maxCluster int, epoch, generation uint64) (*SegmentBlockMetaWriter, error)

func (*SegmentBlockMetaWriter) Abort

func (w *SegmentBlockMetaWriter) Abort()

func (*SegmentBlockMetaWriter) Append

func (w *SegmentBlockMetaWriter) Append(clusterID, rowID int64, dataOffset uint64, dataBytes int, encoded []byte) error

func (*SegmentBlockMetaWriter) AppendRawBlocks

func (w *SegmentBlockMetaWriter) AppendRawBlocks(clusterID int64, blocks []SegmentBlockRecord, offsetDelta int64) error

func (*SegmentBlockMetaWriter) Close

type SegmentBlockRecord

type SegmentBlockRecord struct {
	ClusterID       int64
	FirstRowOrdinal uint64
	RowCount        uint64
	DataOffset      int64
	DataBytes       int64
	MinRowID        int64
	MaxRowID        int64
	MinNorm2        float32
	MaxNorm2        float32
	Stats           []byte
}

type SegmentBlockScanStats

type SegmentBlockScanStats struct {
	MetaReadBytes uint64
	MetaReads     uint64
	Considered    uint64
	WouldSkip     uint64
	Skipped       uint64
	Scored        uint64
	RowsScored    uint64
}

type SegmentCurrent

type SegmentCurrent struct {
	Version      uint32 `msgpack:"version"`
	Generation   uint64 `msgpack:"generation"`
	ManifestFile string `msgpack:"manifest_file"`
}

func DecodeSegmentCurrent

func DecodeSegmentCurrent(data []byte) (*SegmentCurrent, error)

type SegmentDataStore

type SegmentDataStore struct {
	// contains filtered or unexported fields
}

SegmentDataStore is a read-only stable-vector generation file opened with explicit ReadAt-based scans. Stable rows are laid out as [rowid uint64][vec bytes] in (cluster_id, rowid) order.

The vecBytes slice passed to a ScanCluster or ScanClusters callback is valid only for the duration of that callback; callers that retain it must copy it.
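
Walking a span of fixed-size [rowid uint64][vec bytes] entries can be sketched as below. The little-endian row ID is an assumption, since the byte order is not stated in this documentation, and the function name is hypothetical.

```go
package main

import "encoding/binary"

// scanEntries walks fixed-size entries of the form [rowid uint64][vec bytes]
// and calls yield for each one until yield returns false. It returns false
// when rows is not a whole number of entries.
// Assumption: the row ID is stored little-endian.
func scanEntries(rows []byte, vecBytes int, yield func(rowID int64, vec []byte) bool) bool {
	if vecBytes < 0 {
		return false
	}
	entrySize := 8 + vecBytes
	if len(rows)%entrySize != 0 {
		return false
	}
	for off := 0; off < len(rows); off += entrySize {
		rowID := int64(binary.LittleEndian.Uint64(rows[off : off+8]))
		// vec aliases rows: it must be copied if kept past the callback.
		if !yield(rowID, rows[off+8:off+entrySize]) {
			break
		}
	}
	return true
}
```

A callback that needs to keep a vector would copy it, for example with `kept = append([]byte(nil), vec...)`, because a scanner that reuses its read buffer overwrites the bytes on the next entry.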

func OpenSegmentDataStore

func OpenSegmentDataStore(path string) (*SegmentDataStore, error)

func (*SegmentDataStore) Close

func (s *SegmentDataStore) Close() error

func (*SegmentDataStore) ClusterCount

func (s *SegmentDataStore) ClusterCount(clusterID int64) uint64

func (*SegmentDataStore) ClusterRowCounts

func (s *SegmentDataStore) ClusterRowCounts() []uint64

func (*SegmentDataStore) ClusterSpan

func (s *SegmentDataStore) ClusterSpan(clusterID int64) (offset int64, bytes int64, count uint64, ok bool)

func (*SegmentDataStore) CopyClusterTo

func (s *SegmentDataStore) CopyClusterTo(clusterID int64, w io.Writer) (uint64, error)

func (*SegmentDataStore) Dim

func (s *SegmentDataStore) Dim() int

func (*SegmentDataStore) Encoding

func (s *SegmentDataStore) Encoding() int64

func (*SegmentDataStore) Epoch

func (s *SegmentDataStore) Epoch() uint64

func (*SegmentDataStore) FileOrderedClusters

func (s *SegmentDataStore) FileOrderedClusters() []int64

func (*SegmentDataStore) FileSize

func (s *SegmentDataStore) FileSize() int64

func (*SegmentDataStore) Generation

func (s *SegmentDataStore) Generation() uint64

func (*SegmentDataStore) InternalDim

func (s *SegmentDataStore) InternalDim() int

func (*SegmentDataStore) MaxCluster

func (s *SegmentDataStore) MaxCluster() int

func (*SegmentDataStore) Metric

func (s *SegmentDataStore) Metric() Metric

func (*SegmentDataStore) Path

func (s *SegmentDataStore) Path() string

func (*SegmentDataStore) ReadEntryAt

func (s *SegmentDataStore) ReadEntryAt(offset uint64) (int64, []byte, error)

func (*SegmentDataStore) ResetScanStats

func (s *SegmentDataStore) ResetScanStats()

func (*SegmentDataStore) RowCount

func (s *SegmentDataStore) RowCount() uint64

func (*SegmentDataStore) ScanBlockRecordsFileOrder

func (s *SegmentDataStore) ScanBlockRecordsFileOrder(blocks []SegmentBlockRecord, yield func(clusterID int64, rows []byte, count uint64, entrySize int) bool) error

func (*SegmentDataStore) ScanCluster

func (s *SegmentDataStore) ScanCluster(clusterID int64, yield func(rowid int64, vecBytes []byte) bool) error

func (*SegmentDataStore) ScanClusters

func (s *SegmentDataStore) ScanClusters(clusterIDs []int64, yield func(clusterID, rowid int64, vecBytes []byte) bool) error

func (*SegmentDataStore) ScanClustersFileOrder

func (s *SegmentDataStore) ScanClustersFileOrder(clusterIDs []int64, yield func(clusterID, rowid int64, vecBytes []byte) bool) error

func (*SegmentDataStore) ScanClustersFileOrderSpans

func (s *SegmentDataStore) ScanClustersFileOrderSpans(clusterIDs []int64, yield func(clusterID int64, rows []byte, count uint64, entrySize int) bool) error

func (*SegmentDataStore) SnapshotScanStats

func (s *SegmentDataStore) SnapshotScanStats() SegmentScanStats

func (*SegmentDataStore) VecBytes

func (s *SegmentDataStore) VecBytes() int

func (*SegmentDataStore) WarmClusters

func (s *SegmentDataStore) WarmClusters(clusterIDs []int64, maxBytes int64) error

type SegmentDataWriter

type SegmentDataWriter struct {
	// contains filtered or unexported fields
}

func CreateSegmentDataWriter

func CreateSegmentDataWriter(path string, metric Metric, encoding int64, dim, internalDim, vecBytes, maxCluster int, epoch, generation uint64) (*SegmentDataWriter, error)

func (*SegmentDataWriter) Abort

func (w *SegmentDataWriter) Abort()

func (*SegmentDataWriter) Append

func (w *SegmentDataWriter) Append(clusterID, rowid int64, vec []byte) error

func (*SegmentDataWriter) AppendRawCluster

func (w *SegmentDataWriter) AppendRawCluster(clusterID int64, count uint64, rows io.Reader) error

func (*SegmentDataWriter) Close

func (w *SegmentDataWriter) Close() (*SegmentDataStore, error)

func (*SegmentDataWriter) EntrySize

func (w *SegmentDataWriter) EntrySize() int

func (*SegmentDataWriter) NextOffset

func (w *SegmentDataWriter) NextOffset() uint64

type SegmentGeneration

type SegmentGeneration struct {
	Data                     *SegmentDataStore
	RowMap                   *SegmentRowMap
	Blocks                   *SegmentBlockMetaStore
	ProbeCentroids           *kmeans.CentroidSet
	StableCentroids          *kmeans.CentroidSet
	StableCodec              *StableMemberCodec
	AppliedOverlaySeq        uint64
	ClusterRowCounts         []uint64
	ClusterVectorSums        [][]float32
	RowsModifiedSinceRebuild uint64
	LastRebuildRowCount      uint64
	ConsecutiveSkewCycles    uint32
	LayoutHotClusters        []int64
}

SegmentGeneration is the active stable on-disk serving snapshot for one vector index epoch.

func (*SegmentGeneration) Close

func (g *SegmentGeneration) Close() error

type SegmentManifest

type SegmentManifest struct {
	Version                  uint32      `msgpack:"version"`
	Database                 string      `msgpack:"database"`
	IndexName                string      `msgpack:"index_name"`
	IndexCreatedAt           int64       `msgpack:"index_created_at"`
	Metric                   string      `msgpack:"metric"`
	Dim                      uint32      `msgpack:"dim"`
	InternalDim              uint32      `msgpack:"internal_dim"`
	ProbeCentroidEpoch       uint64      `msgpack:"probe_centroid_epoch,omitempty"`
	ProbeCentroidBlob        []byte      `msgpack:"probe_centroid_blob,omitempty"`
	StableCentroidEpoch      uint64      `msgpack:"stable_centroid_epoch,omitempty"`
	StableCentroidBlob       []byte      `msgpack:"stable_centroid_blob,omitempty"`
	StableMemberCodecBlob    []byte      `msgpack:"stable_member_codec_blob,omitempty"`
	CentroidEpoch            uint64      `msgpack:"centroid_epoch,omitempty"`
	CentroidBlob             []byte      `msgpack:"centroid_blob,omitempty"`
	AppliedOverlaySeq        uint64      `msgpack:"applied_overlay_seq"`
	Generation               uint64      `msgpack:"generation"`
	DataFile                 string      `msgpack:"data_file"`
	DataFileSize             uint64      `msgpack:"data_file_size"`
	DataFileSHA256           string      `msgpack:"data_file_sha256"`
	RowMapFile               string      `msgpack:"rowmap_file"`
	RowMapFileSize           uint64      `msgpack:"rowmap_file_size"`
	RowMapFileSHA256         string      `msgpack:"rowmap_file_sha256"`
	BlockMetaFile            string      `msgpack:"block_meta_file,omitempty"`
	BlockMetaFileSize        uint64      `msgpack:"block_meta_file_size,omitempty"`
	BlockMetaFileSHA256      string      `msgpack:"block_meta_file_sha256,omitempty"`
	BlockRows                uint32      `msgpack:"block_rows,omitempty"`
	MaxCluster               uint32      `msgpack:"max_cluster"`
	RowCount                 uint64      `msgpack:"row_count"`
	ClusterRowCounts         []uint64    `msgpack:"cluster_row_counts,omitempty"`
	ClusterVectorSums        [][]float32 `msgpack:"cluster_vector_sums,omitempty"`
	RowsModifiedSinceRebuild uint64      `msgpack:"rows_modified_since_rebuild,omitempty"`
	LastRebuildRowCount      uint64      `msgpack:"last_rebuild_row_count,omitempty"`
	ConsecutiveSkewCycles    uint32      `msgpack:"consecutive_skew_cycles,omitempty"`
	LayoutHotClusters        []uint32    `msgpack:"layout_hot_clusters,omitempty"`
	CreatedAtUnixNano        int64       `msgpack:"created_at_unix_nano"`
}

func DecodeSegmentManifest

func DecodeSegmentManifest(data []byte) (*SegmentManifest, error)

func (*SegmentManifest) NormalizeCentroidFields

func (m *SegmentManifest) NormalizeCentroidFields()

func (*SegmentManifest) ProbeBlobValue

func (m *SegmentManifest) ProbeBlobValue() []byte

func (*SegmentManifest) ProbeEpochValue

func (m *SegmentManifest) ProbeEpochValue() uint64

func (*SegmentManifest) StableBlobValue

func (m *SegmentManifest) StableBlobValue() []byte

func (*SegmentManifest) StableEpochValue

func (m *SegmentManifest) StableEpochValue() uint64

type SegmentRowLocation

type SegmentRowLocation struct {
	RowID     int64
	ClusterID int64
	Offset    uint64
}

type SegmentRowMap

type SegmentRowMap struct {
	// contains filtered or unexported fields
}

func OpenSegmentRowMap

func OpenSegmentRowMap(path string) (*SegmentRowMap, error)

func (*SegmentRowMap) Close

func (m *SegmentRowMap) Close() error

func (*SegmentRowMap) EntryCount

func (m *SegmentRowMap) EntryCount() uint64

func (*SegmentRowMap) Epoch

func (m *SegmentRowMap) Epoch() uint64

func (*SegmentRowMap) FileSize

func (m *SegmentRowMap) FileSize() int64

func (*SegmentRowMap) Generation

func (m *SegmentRowMap) Generation() uint64

func (*SegmentRowMap) Lookup

func (m *SegmentRowMap) Lookup(rowID int64) (SegmentRowLocation, bool, error)

func (*SegmentRowMap) Path

func (m *SegmentRowMap) Path() string

func (*SegmentRowMap) Scan

func (m *SegmentRowMap) Scan(visit func(SegmentRowLocation) bool) error

type SegmentRowMapWriter

type SegmentRowMapWriter struct {
	// contains filtered or unexported fields
}

func CreateSegmentRowMapWriter

func CreateSegmentRowMapWriter(path string, epoch, generation uint64) (*SegmentRowMapWriter, error)

func (*SegmentRowMapWriter) Abort

func (w *SegmentRowMapWriter) Abort()

func (*SegmentRowMapWriter) Append

func (w *SegmentRowMapWriter) Append(rowID, clusterID int64, offset uint64) error

func (*SegmentRowMapWriter) Close

func (w *SegmentRowMapWriter) Close() (*SegmentRowMap, error)

type SegmentScanStats

type SegmentScanStats struct {
	ReadBytes          uint64
	LogicalBytes       uint64
	ReadBatches        uint64
	BlockMetaReadBytes uint64
	BlockMetaReads     uint64
	BlocksConsidered   uint64
	BlocksWouldSkip    uint64
	BlocksSkipped      uint64
	BlocksScored       uint64
	BlockRowsScored    uint64
}

type StableCodecTrainingVector

type StableCodecTrainingVector struct {
	ClusterID int64
	Vec       []float32
}

type StableMemberCodec

type StableMemberCodec struct {
	// contains filtered or unexported fields
}

func BuildStableMemberCodec

func BuildStableMemberCodec(spec IVFSpec, cs *kmeans.CentroidSet, training []StableCodecTrainingVector, seed uint64) (*StableMemberCodec, error)

func DecodeStableMemberCodecBlob

func DecodeStableMemberCodecBlob(spec IVFSpec, cs *kmeans.CentroidSet, enc int64, blob []byte) (*StableMemberCodec, error)

func NewStableMemberCodec

func NewStableMemberCodec(spec IVFSpec, cs *kmeans.CentroidSet, enc int64, pq *quantize.PQ8Codec) (*StableMemberCodec, error)

func (*StableMemberCodec) DecodePrepared

func (c *StableMemberCodec) DecodePrepared(clusterID int64, vecBytes []byte) ([]float32, error)

func (*StableMemberCodec) Encode

func (c *StableMemberCodec) Encode(clusterID int64, prepared []byte) (int64, []byte, error)

func (*StableMemberCodec) EncodedSize

func (c *StableMemberCodec) EncodedSize() int

func (*StableMemberCodec) Encoding

func (c *StableMemberCodec) Encoding() int64

func (*StableMemberCodec) Validate

func (c *StableMemberCodec) Validate() error

func (*StableMemberCodec) WithCentroids

func (c *StableMemberCodec) WithCentroids(cs *kmeans.CentroidSet) (*StableMemberCodec, error)

type StableMemberQueryScorer

type StableMemberQueryScorer struct {
	// contains filtered or unexported fields
}

func NewStableMemberQueryScorerWithCodec

func NewStableMemberQueryScorerWithCodec(codec *StableMemberCodec, query []float32, queryNorm2 float32) (*StableMemberQueryScorer, error)

func (*StableMemberQueryScorer) BlockLowerBound

func (q *StableMemberQueryScorer) BlockLowerBound(clusterID int64, block SegmentBlockRecord) (float32, bool)

func (*StableMemberQueryScorer) ClusterScorer

func (q *StableMemberQueryScorer) ClusterScorer(clusterID int64) (*StableMemberScorer, error)

type StableMemberScorer

type StableMemberScorer struct {
	// contains filtered or unexported fields
}

func NewStableMemberScorer

func NewStableMemberScorer(spec IVFSpec, cs *kmeans.CentroidSet, query []float32, queryNorm2 float32, clusterID int64, enc int64) (*StableMemberScorer, error)

func NewStableMemberScorerWithCodec

func NewStableMemberScorerWithCodec(codec *StableMemberCodec, query []float32, queryNorm2 float32, clusterID int64) (*StableMemberScorer, error)

func (*StableMemberScorer) Score

func (s *StableMemberScorer) Score(vec []byte) (float32, error)

func (*StableMemberScorer) ScoreSpan

func (s *StableMemberScorer) ScoreSpan(rows []byte, entrySize int, out []float32) error

type VecSessionVars

type VecSessionVars struct {
	// Nprobe is the number of IVF clusters probed per query (0 = index default).
	Nprobe int
	// ForcePlan overrides the cost-based plan selection.
	ForcePlan ForcePlan
	// PrefilterCap is the maximum pre-filter candidate set size.
	PrefilterCap int
	// Fallback enables short-result fallback (§7.5).
	Fallback bool
	// UseGoRank enables the Go-side ranking path that bypasses per-row UDF
	// dispatch on post-filter plans (§7.6).
	UseGoRank bool

	// Auto-retrain (§8.7)
	RetrainEnabled       bool
	RetrainCheckInterval int     // seconds between retrain checks
	RetrainGrowthRatio   float64 // per-cluster growth ratio threshold (>= 1.0)
	RetrainDeltaRatio    float64 // delta/total ratio threshold (0 < x <= 1.0)
	ReindexChunkRows     int     // rows per chunked-populate txn (§8.3)
}

VecSessionVars holds per-connection @@marmot_vec_* session variables (§6.3, §6.4). Zero value is not valid; use DefaultVecSessionVars().

All fields carry the current session value. Zero Nprobe means "use index default".

func DefaultVecSessionVars

func DefaultVecSessionVars() VecSessionVars

DefaultVecSessionVars returns session vars initialised to their documented defaults.

func (*VecSessionVars) Apply

func (v *VecSessionVars) Apply(name, value string) error

Apply sets a single @@marmot_vec_* variable, identified by its name without the "@@" prefix (e.g., "marmot_vec_nprobe"), from a string value extracted from the Vitess AST literal.

Returns MARMOT-VEC-012 for an invalid enum value, a typed parse error for out-of-range numerics, and a generic error for unknown variable names.
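
As a rough illustration of the contract described above, the sketch below models a name/value setter for a single variable together with the "zero Nprobe means index default" rule. It is not the package's implementation: sessionVars, apply, effectiveNprobe, and errUnknownVar are hypothetical names, only "marmot_vec_nprobe" is taken from the documentation, and rejecting negative values is an assumption about what "out of range" means here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// sessionVars is a hypothetical stand-in that carries only the Nprobe field
// of VecSessionVars.
type sessionVars struct {
	Nprobe int // 0 = use the index default
}

// errUnknownVar is a hypothetical sentinel for unknown variable names.
var errUnknownVar = errors.New("unknown session variable")

// apply sets one variable from its name and its string value.
func (v *sessionVars) apply(name, value string) error {
	switch name {
	case "marmot_vec_nprobe":
		n, err := strconv.Atoi(value)
		if err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
		if n < 0 { // assumption: negative probe counts are out of range
			return fmt.Errorf("%s: %d is out of range", name, n)
		}
		v.Nprobe = n
		return nil
	default:
		return fmt.Errorf("%w: %q", errUnknownVar, name)
	}
}

// effectiveNprobe resolves a zero session value to the index default.
func effectiveNprobe(v sessionVars, indexDefault int) int {
	if v.Nprobe == 0 {
		return indexDefault
	}
	return v.Nprobe
}

func main() {
	var v sessionVars
	fmt.Println(effectiveNprobe(v, 8)) // session value unset: index default applies

	if err := v.apply("marmot_vec_nprobe", "32"); err != nil {
		fmt.Println("apply failed:", err)
		return
	}
	fmt.Println(effectiveNprobe(v, 8)) // session value overrides the default

	fmt.Println(v.apply("marmot_vec_bogus", "1")) // unknown names are rejected
}
```

Keeping the default resolution in a separate helper lets the stored value stay exactly what the session set, which matches the statement that all fields carry the current session value.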

type VectorUDFProvider

type VectorUDFProvider interface {
	// AssignNearest returns the cluster id the given vector should be
	// assigned to for the named index. Implementations should error with
	// MARMOT-VEC-013 if the index is unknown.
	AssignNearest(indexName string, vec []byte) (int64, error)

	// TopNprobeClusters returns the n nearest 1-based cluster IDs for the query
	// vector against the named index's probeState. Implementations should error
	// with MARMOT-VEC-013 if the index is unknown.
	TopNprobeClusters(indexName string, vec []byte, n int) ([]int64, error)
}

VectorUDFProvider is the minimal surface a vector index engine must expose so that SQLite UDFs registered on a per-connection basis can reach back into the engine. Implementations land in P1-C.

Methods must be safe for concurrent invocation from arbitrary SQLite connection goroutines.
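
To make the interface contract concrete, here is a minimal in-memory sketch with the same two method signatures. It is an illustration under stated assumptions, not the engine's implementation: memProvider, errUnknownIndex, encodeF32, and decodeF32 are hypothetical names; the vector bytes are assumed to be packed little-endian float32 values; distance is plain squared L2; and AssignNearest is assumed to use the same 1-based numbering as TopNprobeClusters. A read-write mutex is one simple way to meet the concurrency requirement.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
	"sort"
	"sync"
)

// errUnknownIndex is a hypothetical stand-in for the MARMOT-VEC-013 error.
var errUnknownIndex = errors.New("MARMOT-VEC-013: unknown vector index")

// memProvider is a hypothetical in-memory provider keyed by index name.
// The RWMutex keeps both methods safe to call from many goroutines.
type memProvider struct {
	mu        sync.RWMutex
	centroids map[string][][]float32
}

// encodeF32 packs floats as little-endian float32 (an assumed wire format).
func encodeF32(v []float32) []byte {
	out := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(out[4*i:], math.Float32bits(f))
	}
	return out
}

// decodeF32 is the inverse of encodeF32.
func decodeF32(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("vector length %d is not a multiple of 4", len(b))
	}
	out := make([]float32, len(b)/4)
	for i := range out {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return out, nil
}

// TopNprobeClusters ranks centroids by squared L2 distance to the query and
// returns up to n cluster IDs, numbered from 1.
func (p *memProvider) TopNprobeClusters(indexName string, vec []byte, n int) ([]int64, error) {
	q, err := decodeF32(vec)
	if err != nil {
		return nil, err
	}
	p.mu.RLock()
	defer p.mu.RUnlock()
	cs, ok := p.centroids[indexName]
	if !ok {
		return nil, errUnknownIndex
	}
	ids := make([]int64, len(cs))
	dist := make([]float32, len(cs))
	for i, c := range cs {
		if len(c) != len(q) {
			return nil, fmt.Errorf("dimension mismatch: query %d, centroid %d", len(q), len(c))
		}
		ids[i] = int64(i) + 1
		for j := range c {
			d := q[j] - c[j]
			dist[i] += d * d
		}
	}
	// dist is indexed by cluster ID minus one, so swaps in ids stay consistent.
	sort.SliceStable(ids, func(a, b int) bool { return dist[ids[a]-1] < dist[ids[b]-1] })
	if n < 0 {
		n = 0
	}
	if n > len(ids) {
		n = len(ids)
	}
	return ids[:n], nil
}

// AssignNearest returns the single nearest cluster ID for the vector.
func (p *memProvider) AssignNearest(indexName string, vec []byte) (int64, error) {
	ids, err := p.TopNprobeClusters(indexName, vec, 1)
	if err != nil {
		return 0, err
	}
	if len(ids) == 0 {
		return 0, errors.New("index has no centroids")
	}
	return ids[0], nil
}

func main() {
	p := &memProvider{centroids: map[string][][]float32{"docs": {{0, 0}, {10, 10}, {1, 1}}}}
	q := encodeF32([]float32{0.9, 0.9})
	fmt.Println(p.TopNprobeClusters("docs", q, 2))
	fmt.Println(p.AssignNearest("missing", q))
}
```

Deriving AssignNearest from TopNprobeClusters keeps the two methods consistent with each other, at the cost of a full sort for a single result; a real engine would likely use a cheaper nearest-centroid search.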

Directories

Path Synopsis
pkg
kmeans
Package kmeans provides k-means++ clustering and centroid management for IVF indexes.
metric
Package metric provides SIMD-accelerated vector distance functions.
