Documentation ¶
Overview ¶
Package index orchestrates atomic Stroma index rebuilds and searches.
Index ¶
- Constants
- Variables
- type ArmEvidence
- type BuildOptions
- type BuildResult
- type ChunkContextualizer
- type ContextOptions
- type FusionStrategy
- type HitProvenance
- type RRFFusion
- type RecordQuery
- type RecordSource
- type RecordSourceFunc
- type Reranker
- type RetrievalArm
- type ReuseStatus
- type SearchHit
- type SearchParams
- type SearchQuery
- type Section
- type SectionQuery
- type Snapshot
- func (s *Snapshot) Close() error
- func (s *Snapshot) ExpandContext(ctx context.Context, chunkID int64, opts ContextOptions) ([]Section, error)
- func (s *Snapshot) Path() string
- func (s *Snapshot) Records(ctx context.Context, query RecordQuery) ([]corpus.Record, error)
- func (s *Snapshot) Search(ctx context.Context, query SnapshotSearchQuery) ([]SearchHit, error)
- func (s *Snapshot) SearchVector(ctx context.Context, query VectorSearchQuery) ([]SearchHit, error)
- func (s *Snapshot) Sections(ctx context.Context, query SectionQuery) ([]Section, error)
- func (s *Snapshot) Stats(ctx context.Context) (*Stats, error)
- type SnapshotSearchQuery
- type Stats
- type UpdateOptions
- type UpdateResult
- type VectorSearchQuery
Constants ¶
const (
	ArmVector = "vector"
	ArmFTS    = "fts"
)
Arm name constants used by the default Snapshot.Search pipeline. Custom FusionStrategy implementations may introduce additional arm names.
const DefaultMaxChunkSections = 10_000
DefaultMaxChunkSections caps the number of heading-aware sections a single record can contribute to the index when the caller hasn't overridden it. 10,000 is generous for legitimate technical documents (few real specs exceed a few hundred headings) while still preventing a pathological or hostile body from expanding into millions of embedder calls + rows.
const DefaultSearchLimit = 10
DefaultSearchLimit is the hit cap applied to Snapshot.Search and Snapshot.SearchVector when SearchParams.Limit / VectorSearchQuery.Limit is zero or negative. The choice is conservative; pick an explicit Limit if throughput matters or if the caller needs a stable shortlist size across snapshots.
const MaxSearchLimit = 250
MaxSearchLimit is the largest accepted SearchParams.Limit or VectorSearchQuery.Limit. Search uses bounded in-memory shortlists for vector/FTS fusion and reranking; callers needing more than this should page or shard at a higher layer rather than relying on an unbounded single-query scan.
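The limit rules above can be sketched as a small helper. The `resolveLimit` function below is illustrative, not part of the package; it only restates the documented behavior (zero or negative selects DefaultSearchLimit, values above MaxSearchLimit are rejected rather than capped).

```go
package main

import (
	"errors"
	"fmt"
)

const (
	DefaultSearchLimit = 10
	MaxSearchLimit     = 250
)

// resolveLimit applies the documented rules: zero or negative selects
// DefaultSearchLimit; values above MaxSearchLimit are rejected, not
// silently capped.
func resolveLimit(limit int) (int, error) {
	if limit <= 0 {
		return DefaultSearchLimit, nil
	}
	if limit > MaxSearchLimit {
		return 0, errors.New("limit exceeds MaxSearchLimit")
	}
	return limit, nil
}

func main() {
	n, _ := resolveLimit(0)
	fmt.Println(n) // 10
	_, err := resolveLimit(1000)
	fmt.Println(err != nil) // true
}
```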
Variables ¶
var ErrStaleUpdatePlan = errors.New("index changed while planning update")
ErrStaleUpdatePlan signals that Update planned added records against one committed snapshot, but the snapshot content changed before the write transaction applied those plans. Callers can retry the Update so chunk reuse and embeddings are recomputed against the new base snapshot.
var ErrUnsupportedSchemaVersion = errors.New("unsupported snapshot schema version")
ErrUnsupportedSchemaVersion is returned when an operation encounters a snapshot whose schema_version is neither the current schema nor one the library knows how to migrate from. It is surfaced by OpenSnapshot and wrapped via fmt.Errorf with %w so callers can use errors.Is to detect it.
var ErrUpdateCommittedIntegrityCheckFailed = errors.New("update committed but post-commit integrity check failed")
ErrUpdateCommittedIntegrityCheckFailed signals that Update's transaction committed successfully — the record, chunk, and metadata changes are durable on disk — but the post-commit PRAGMA integrity_check / foreign_key_check reported corruption. The enclosing error wraps this sentinel via fmt.Errorf with %w so callers can use errors.Is to detect it. This case is non-retriable: re-running Update will not unroll the already-durable changes, and the underlying file likely needs operator inspection (see index/ARCHITECTURE.md). Contrast with plain errors returned by Update, which come from pre-commit failures and leave the file byte-identical to its pre-call state.
var ErrUpdatePlanTooLarge = errors.New("update plan exceeds MaxPlannedRecords")
ErrUpdatePlanTooLarge signals that UpdateOptions.MaxPlannedRecords rejected the added-record set before Update opened the write transaction. Callers can split added records into smaller Update calls and retry.
Functions ¶
This section is empty.
Types ¶
type ArmEvidence ¶
type ArmEvidence struct {
// Rank is the zero-based position of the hit within the arm.
Rank int
// Score is the arm-native score at the time the arm returned the hit
// (cosine derivative for vector, negative bm25 for FTS).
Score float64
}
ArmEvidence is one arm's contribution to a fused hit.
type BuildOptions ¶
type BuildOptions struct {
// Path is the OS-native filesystem path where the built snapshot
// is written. On Windows both forward and back slashes are
// accepted — the store package normalizes drive prefixes on open.
Path string
// ReuseFromPath points at an existing Stroma snapshot whose embeddings
// should be reused at the section level: a new section reuses its
// stored embedding whenever its title, heading, and body match a
// section already present in the prior snapshot. Records that are
// fully unchanged are the maximal case, but sections carried over
// from an edited record still reuse their embeddings. The snapshot is
// opened read-only and queried per-record during the rebuild, so
// resident memory scales with a single record's chunks rather than
// with the whole corpus. Leave empty to disable reuse.
ReuseFromPath string
Embedder embed.Embedder
// Contextualizer optionally produces a per-chunk prefix string that
// gets prepended before the embedding text and the FTS5 content. When
// set, the prefix persists on the chunk and participates in reuse
// keying so a changed contextualizer invalidates stale reuse without
// corrupting the stored representation. Nil disables contextualization
// and leaves the build identical to the non-contextual path.
Contextualizer ChunkContextualizer
// MaxChunkTokens sets the approximate maximum number of tokens (words)
// per chunk. Sections that exceed this limit are split into smaller
// sub-sections. Zero disables token-budget splitting.
MaxChunkTokens int
// ChunkOverlapTokens sets the approximate number of overlapping tokens
// between adjacent sub-sections when a section is split. Zero disables
// overlap.
ChunkOverlapTokens int
// MaxChunkSections caps how many sections any single record is allowed
// to produce. A pathological Markdown body (e.g., 10^6 heading lines)
// would otherwise translate into 10^6 embedder calls and 10^6
// chunk/vector rows — a DoS vector for shared embedders. Zero means
// DefaultMaxChunkSections; a negative value disables the cap for
// callers who have their own upstream validation. When the cap is
// exceeded, Rebuild returns an error wrapping chunk.ErrTooManySections
// instead of silently admitting the record.
MaxChunkSections int
// Quantization controls the vector storage format. See the
// store.Quantization* constants for the accept-listed values:
// store.QuantizationFloat32 (default), store.QuantizationInt8 (4x
// smaller, minor precision loss), and store.QuantizationBinary
// (32x smaller via 1-bit sign packing, full-precision rescore on a
// companion table preserves ranking).
Quantization string
// ChunkPolicy selects the chunking strategy. Nil defaults to
// chunk.MarkdownPolicy{Options: chunk.Options{
// MaxTokens: MaxChunkTokens,
// OverlapTokens: ChunkOverlapTokens,
// MaxSections: <resolved>,
// }}, which reproduces the pre-1.0 chunking pipeline exactly.
// Setting a non-nil policy overrides the per-Build chunking shape:
// MaxChunkTokens/ChunkOverlapTokens/MaxChunkSections are read by
// the default MarkdownPolicy but ignored when ChunkPolicy is set
// (the policy carries its own configuration). Hierarchical
// policies like chunk.LateChunkPolicy emit parent + leaf chunks
// linked via parent_chunk_id; ExpandContext can surface the
// parent on demand. See docs/superpowers/specs for the design.
ChunkPolicy chunk.Policy
}
BuildOptions controls how a Stroma index is rebuilt.
type BuildResult ¶
type BuildResult struct {
Path string
RecordCount int
ChunkCount int
ReusedRecordCount int
ReusedChunkCount int
EmbeddedChunkCount int
ReuseStatus ReuseStatus
ReuseDisabledReason string
EmbedderDimension int
EmbedderFingerprint string
ContentFingerprint string
}
BuildResult summarizes a completed rebuild.
func Rebuild ¶
func Rebuild(ctx context.Context, records []corpus.Record, options BuildOptions) (*BuildResult, error)
Rebuild atomically recreates the index at the requested path.
func RebuildFromSource ¶ added in v2.3.0
func RebuildFromSource(ctx context.Context, source RecordSource, options BuildOptions) (*BuildResult, error)
RebuildFromSource atomically recreates the index at the requested path from a streaming record source.
Unlike Rebuild, this API does not require callers to materialize a []corpus.Record with every BodyText resident at once. Records are consumed one at a time in source order, normalized, chunked, embedded, and flushed in bounded internal batches. Duplicate refs are rejected by the staging snapshot's primary key. Source order determines snapshot-local chunk IDs; callers that need repeatable chunk IDs across streaming rebuilds should emit records in a stable order.
type ChunkContextualizer ¶
type ChunkContextualizer interface {
ContextualizeChunks(ctx context.Context, record corpus.Record, sections []chunk.Section) ([]string, error)
}
ChunkContextualizer produces a short explanatory prefix for each section of a record. The returned slice must be the same length as sections and aligned with it index-for-index. An empty prefix is allowed and disables contextual retrieval for that section. The returned prefix is prepended to the embedding text and to the FTS5 content column; it is persisted so reuse keying can detect when a changed contextualizer needs to invalidate the stored embedding.
type ContextOptions ¶
type ContextOptions struct {
// IncludeParent walks the requested chunk's parent_chunk_id one level
// up and includes the parent row in the returned slice when the chunk
// has a parent. Multi-level ancestry walks are explicit recursion by
// the caller.
//
// Against snapshots built before schema v5 (#16), there is no
// parent_chunk_id column to walk; IncludeParent is a no-op.
IncludeParent bool
// NeighborWindow includes up to N sibling chunks on each side of the
// requested chunk, ordered by chunk_index. Two chunks are siblings
// when they share the same parent_chunk_id (NULL counts as a single
// sibling group), so for a leaf the neighborhood stays inside the
// same parent span and for a flat or parent chunk the neighborhood
// is other top-level chunks under the same record. Zero means no
// neighbors are included; the requested chunk is still returned by
// itself.
//
// Against snapshots built before schema v5 (#16), the parent grouping
// is unavailable, so neighbors degrade to "other chunks in the same
// record_ref with chunk_index in the requested window."
NeighborWindow int
}
ContextOptions controls how Snapshot.ExpandContext widens a single chunk hit into a local-context payload.
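The sibling rule above can be made concrete with a small standalone model. The `chunkRow` type and `neighbors` helper below are illustrative, not the package's implementation; they only restate the documented grouping (siblings share parent_chunk_id, window measured by chunk_index).

```go
package main

import "fmt"

// chunkRow stands in for a stored chunk row; Parent of -1 would model
// a NULL parent_chunk_id, which counts as a single sibling group.
type chunkRow struct {
	ID, Parent, Index int64
}

// neighbors returns the IDs of siblings within window positions of
// target on either side: same Parent, chunk_index within the window,
// target itself excluded. rows are assumed ordered by chunk_index.
func neighbors(rows []chunkRow, target chunkRow, window int64) []int64 {
	var out []int64
	for _, r := range rows {
		if r.ID == target.ID || r.Parent != target.Parent {
			continue
		}
		d := r.Index - target.Index
		if d >= -window && d <= window {
			out = append(out, r.ID)
		}
	}
	return out
}

func main() {
	rows := []chunkRow{
		{ID: 1, Parent: 10, Index: 0},
		{ID: 2, Parent: 10, Index: 1}, // the requested chunk
		{ID: 3, Parent: 10, Index: 2},
		{ID: 4, Parent: 11, Index: 3}, // different parent: not a sibling
	}
	fmt.Println(neighbors(rows, rows[1], 1)) // [1 3]
}
```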
type FusionStrategy ¶
type FusionStrategy interface {
Fuse(arms []RetrievalArm, limit int) ([]SearchHit, error)
}
FusionStrategy combines one or more RetrievalArms into a single ranked list, truncated to limit. Implementations must be deterministic and must attach HitProvenance to every returned hit covering each arm that contributed.
Fuse returns an error when inputs are malformed (for example Available=true with a non-nil Err, or an arm with an empty Name) or when the strategy fails closed on an upstream arm error. Callers treat errors the same way as any other retrieval failure. Strategies that want to tolerate partial-arm failures do so internally and return a nil error.
Aliasing contract: implementations must treat each input arm's Hits slice and every SearchHit it contains as read-only. They must not mutate Hit fields, must not mutate a Hit's Metadata map (which may alias storage shared across arms when the same ChunkID matched on more than one retrieval path), and must return a freshly allocated []SearchHit rather than repurposing an input arm's slice.
func DefaultFusion ¶
func DefaultFusion() FusionStrategy
DefaultFusion returns the FusionStrategy used when SearchQuery.Fusion is nil. Ordering is identical to pre-#17 Snapshot.Search on every path, and SearchHit.Score is identical on every path except one: when the vector arm returns zero hits and the FTS arm is non-empty, DefaultFusion preserves the bm25-derived arm-native Score instead of the pre-#17 RRF-rewritten score. Callers who read Score on that specific path can recover both the arm-native and pre-#17-style scores via the HitProvenance attached to each hit.
type HitProvenance ¶
type HitProvenance struct {
Arms map[string]ArmEvidence
}
HitProvenance records which arms found a fused hit. The map is keyed by arm name; arms that did not return the hit are absent from the map.
type RRFFusion ¶
RRFFusion is the default FusionStrategy. K controls the RRF constant; K<=0 is treated as K=60 for backward compatibility with the pre-#17 mergeRRF helper.
PreserveSingleArmScore controls the single-arm degenerate case. When true (the default used by DefaultFusion) and exactly one arm is available-and-non-empty, Fuse returns that arm's hits in arm order with arm-native Score preserved. When false, Fuse rewrites Score to the RRF-derived 1/(K+rank+1) on every path. Callers that want numerically uniform fused scores across single-arm and multi-arm paths opt in by setting this to false.
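The per-arm term is the documented 1/(K+rank+1) with K falling back to 60. The sketch below computes that term standalone; summing terms across arms is how standard Reciprocal Rank Fusion combines multi-arm hits, which this sketch assumes rather than confirms from the package.

```go
package main

import "fmt"

// rrfScore is the documented RRF term for a hit at zero-based rank
// within one arm: 1/(K+rank+1). K<=0 falls back to 60 for backward
// compatibility.
func rrfScore(k, rank int) float64 {
	if k <= 0 {
		k = 60
	}
	return 1.0 / float64(k+rank+1)
}

func main() {
	// A chunk ranked 0 by the vector arm and 2 by the FTS arm fuses,
	// under standard RRF, to the sum of its per-arm terms.
	fused := rrfScore(0, 0) + rrfScore(0, 2)
	fmt.Printf("%.6f\n", fused)
}
```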
type RecordQuery ¶
RecordQuery filters records from an opened snapshot.
type RecordSource ¶ added in v2.3.0
RecordSource streams records into RebuildFromSource.
Next returns the next record and true while input remains. Returning false ends the stream and ignores the returned record. Implementations should return any loading or decoding failure directly. RebuildFromSource calls Next serially with a non-nil context, propagates source errors, and leaves the destination snapshot unchanged.
type RecordSourceFunc ¶ added in v2.3.0
RecordSourceFunc adapts a function to RecordSource.
type Reranker ¶
type Reranker interface {
Rerank(ctx context.Context, query string, candidates []SearchHit) ([]SearchHit, error)
}
Reranker optionally refines one search candidate shortlist before the final limit truncation.
Aliasing contract: implementations must treat the input candidates slice and every SearchHit it contains as read-only. They must not mutate Hit fields, must not mutate a Hit's Metadata map (which may alias storage shared with other hits), and must not return the input slice — return a freshly allocated []SearchHit instead. Snapshot.Search defensively shallow-copies the candidates slice before handing it to the reranker, but that copy is shallow so maps and sub-slices inside each SearchHit remain shared. Reorderings and truncations are fine; mutations are not.
type RetrievalArm ¶
RetrievalArm is one candidate list from one retrieval path, ordered by the arm's own ranking. Hits[i].Score is the arm-native score (cosine distance derivative for vector, negative bm25-equivalent for FTS).
Available and Err distinguish three otherwise identical-looking states:
- Available=true, Err=nil, len(Hits)==0: arm ran, zero matches.
- Available=false, Err=nil: arm unavailable on this snapshot (for example a legacy snapshot without fts_chunks). Hits must be empty.
- Available=false, Err!=nil: arm failed. Hits must be empty.
Available=true with a non-nil Err is invalid; FusionStrategy implementations should return an error when they observe it.
type ReuseStatus ¶ added in v2.3.0
type ReuseStatus string
ReuseStatus reports whether BuildOptions.ReuseFromPath was usable during Rebuild. Reuse setup remains non-fatal by default; callers can inspect BuildResult.ReuseStatus and BuildResult.ReuseDisabledReason to distinguish "nothing reusable" from "reuse could not start".
const (
	// ReuseStatusDisabled means BuildOptions.ReuseFromPath was empty.
	ReuseStatusDisabled ReuseStatus = "disabled"
	// ReuseStatusActive means the prior snapshot opened and passed
	// compatibility checks, so section-level reuse was attempted.
	ReuseStatusActive ReuseStatus = "active"
	// ReuseStatusUnavailable means ReuseFromPath did not point at a
	// readable snapshot file, for example because it was missing or a
	// directory.
	ReuseStatusUnavailable ReuseStatus = "unavailable"
	// ReuseStatusIncompatible means the snapshot exists but cannot seed
	// this build because schema, embedder, dimension, or quantization
	// metadata does not match.
	ReuseStatusIncompatible ReuseStatus = "incompatible"
	// ReuseStatusError means setup hit an operational error while
	// checking the configured snapshot.
	ReuseStatusError ReuseStatus = "error"
)
type SearchHit ¶
type SearchHit struct {
ChunkID int64
Ref string
Kind string
Title string
SourceRef string
Heading string
Content string
Metadata map[string]string
Score float64
// Provenance records which retrieval arms contributed to this hit.
// It is populated by FusionStrategy implementations; non-fusion paths
// (SearchVector, direct searchFTS callers) leave it nil.
Provenance *HitProvenance
}
SearchHit is one retrieved section.
type SearchParams ¶
type SearchParams struct {
// Text is the free-form query text. Empty rejects with a
// "search text is required" error — this field has no default.
Text string
// Limit caps the number of SearchHits returned. Zero or negative
// selects DefaultSearchLimit (10). Values above MaxSearchLimit
// reject with an error instead of being silently capped.
Limit int
// Kinds filters candidate records to the supplied kind list. Nil
// or empty means "no filter, all kinds".
Kinds []string
// Embedder produces the query vector(s) used by the dense arm.
// Nil rejects with a "search embedder is required" error — this
// field has no default.
Embedder embed.Embedder
// Fusion optionally overrides the hybrid fusion strategy. Nil
// uses DefaultFusion().
Fusion FusionStrategy
// Reranker optionally refines the candidate shortlist after
// fusion. Nil skips reranking.
Reranker Reranker
// SearchDimension optionally runs a truncated-prefix vector prefilter
// at this dimension, then rescores the shortlist with full-dim cosine.
// Zero (default) uses the full stored dimension throughout. Positive
// values must be <= the stored embedder dimension. Only valid when the
// stored quantization is float32; returns an error against int8 indexes.
// This is the shape Matryoshka Representation Learning (MRL) embeddings
// rely on — callers who use non-MRL embeddings should leave it zero.
//
// The truncated path is a brute-force scan over chunks_vec, not a
// vec0 kNN MATCH, so it is not asymptotically cheaper than the default
// path: its win is constant-factor (fewer floats per cosine) and only
// pays off when the truncated prefix preserves ranking. Treat this as
// a tuning knob for MRL snapshots rather than a blanket speedup.
SearchDimension int
}
SearchParams are the retrieval parameters shared by SearchQuery (the top-level one-shot API against an index path) and SnapshotSearchQuery (the long-lived API against an open Snapshot). Extracting the shared shape lets downstream adapters thread one value through both surfaces and lets the top-level Search forward its params verbatim instead of hand-copying six fields.
type SearchQuery ¶
type SearchQuery struct {
// Path is the OS-native filesystem path to the snapshot. On
// Windows both forward and back slashes are accepted — the store
// package normalizes drive prefixes on open.
Path string
SearchParams
}
SearchQuery defines one semantic search against an index path. Retrieval parameters live on the embedded SearchParams so the same shape flows through Search, Snapshot.Search, and any downstream adapter wrapper.
type Section ¶
type Section struct {
ChunkID int64
Ref string
Kind string
Title string
SourceRef string
Heading string
Content string
ContextPrefix string
Metadata map[string]string
Embedding []float64
}
Section is one stored section from a Stroma snapshot.
type SectionQuery ¶
type SectionQuery struct {
Refs []string
Kinds []string
// IncludeEmbeddings asks Sections() to populate Section.Embedding
// from the stored vector column. Snapshots produced by hierarchical
// policies (e.g., chunk.LateChunkPolicy) hold parent rows that are
// storage-only context with no vector — those rows are filtered
// out of an IncludeEmbeddings = true query because the underlying
// chunks → chunks_vec join is inner. Set IncludeEmbeddings = false
// to receive every chunk row (parents + leaves) without embeddings.
IncludeEmbeddings bool
}
SectionQuery filters sections from an opened snapshot.
type Snapshot ¶
type Snapshot struct {
// contains filtered or unexported fields
}
Snapshot is one opened Stroma index snapshot.
Safe for concurrent use by multiple goroutines once returned from OpenSnapshot: *sql.DB is goroutine-safe per the database/sql contract, and all Snapshot read methods (Stats, Records, Sections, Search, SearchVector, ExpandContext) invoke it through that contract. Cached metadata fields (quantization, storedDimension, hasFTS, …) are populated at open time and read-only thereafter, so no additional synchronization is required around Snapshot itself.
func OpenSnapshot ¶
OpenSnapshot opens a read-only Stroma snapshot at path. The path is OS-native; on Windows both forward and back slashes are accepted (the store package normalizes drive prefixes on open). The snapshot's schema_version metadata must be one of the accept-listed versions — schemaVersion (current), prevSchemaVersion, legacySchemaVersionV3, or legacySchemaVersionV2 — all of which read paths can decode directly without forcing an Update. Anything else returns ErrUnsupportedSchemaVersion wrapped with the observed version, so callers can surface a clear upgrade/downgrade message instead of silently misdecoding data against a future schema.
The returned *Snapshot is safe for concurrent use by multiple goroutines once constructed: *sql.DB is goroutine-safe per the database/sql contract, and Snapshot's cached metadata fields are populated at open time and read-only thereafter.
func (*Snapshot) ExpandContext ¶
func (s *Snapshot) ExpandContext(ctx context.Context, chunkID int64, opts ContextOptions) ([]Section, error)
ExpandContext returns the chunk identified by chunkID together with the caller-requested local context, in document order:
[parent (if IncludeParent and the chunk has one), neighbors before, the chunk itself, neighbors after]
The chunk itself is always included, so callers do not have to reconcile the original SearchHit with the expansion. Embeddings are never populated by ExpandContext — the API is for context retrieval, not for re-ranking against fresh vectors. Callers that need embeddings should use Sections() with IncludeEmbeddings = true.
Returns an empty slice + nil error when chunkID does not exist; the substrate treats "no such chunk" as an empty result rather than an error, matching the section-read APIs.
Against snapshots built before schema v5 (#16), the v5 lineage column is absent: IncludeParent becomes a no-op and NeighborWindow scopes by record_ref alone (no parent grouping). ExpandContext stays useful on legacy files; it just cannot surface lineage that was never recorded.
Internally ExpandContext issues a small bounded number of parameterized reads: at most one to locate the requested chunk, one to fetch the parent (when IncludeParent + parent_chunk_id present), and one range scan over the sibling window. There is no per-result parameter expansion (no `WHERE id IN (?, ?, ?, ...)`), so the query never approaches SQLite's parameter cap regardless of NeighborWindow.
func (*Snapshot) Search ¶
Search runs a hybrid text search (vector + FTS5) against the opened snapshot.
func (*Snapshot) SearchVector ¶
SearchVector runs a vector search against the opened snapshot.
type SnapshotSearchQuery ¶
type SnapshotSearchQuery struct {
SearchParams
}
SnapshotSearchQuery defines one text search against an opened snapshot. Retrieval parameters live on the embedded SearchParams so the same value can be forwarded verbatim from SearchQuery.SearchParams without hand-copying fields.
type Stats ¶
type Stats struct {
Path string
RecordCount int
ChunkCount int
KindCounts map[string]int
SchemaVersion string
EmbedderDimension int
EmbedderFingerprint string
ContentFingerprint string
CreatedAt string
}
Stats describes a built Stroma index.
type UpdateOptions ¶
type UpdateOptions struct {
// Path is the OS-native filesystem path to the existing snapshot
// to update in place. On Windows both forward and back slashes are
// accepted — the store package normalizes drive prefixes on open.
Path string
Embedder embed.Embedder
// Contextualizer optionally produces a per-chunk prefix string. See
// BuildOptions.Contextualizer for the contract. Leaving it nil
// preserves the non-contextual path and produces chunks with an
// empty persisted prefix.
Contextualizer ChunkContextualizer
// MaxChunkTokens sets the approximate maximum number of tokens (words)
// per chunk. It should match the chunking policy used to build the current
// index if callers want incremental updates to remain section-compatible.
MaxChunkTokens int
// ChunkOverlapTokens sets the approximate number of overlapping tokens
// between adjacent sub-sections when a section is split. It should match
// the chunking policy used to build the current index.
ChunkOverlapTokens int
// MaxChunkSections mirrors BuildOptions.MaxChunkSections for the
// incremental-update path. Zero → DefaultMaxChunkSections; negative
// → no cap.
MaxChunkSections int
// MaxPlannedRecords caps how many added/replaced records Update will
// chunk, reuse-plan, and embed before opening its write transaction.
// This bounds resident pre-transaction plan memory for callers that
// split large ingests into repeated Update calls. Zero keeps the
// historical unbounded behavior; negative values reject. The cap
// applies only to added/replaced records, not removals.
MaxPlannedRecords int
// Quantization, when provided, must match the existing index — see
// the store.Quantization* constants (float32, int8, binary) for the
// accept-listed values. Leaving it empty reuses the stored
// quantization metadata.
Quantization string
// ChunkPolicy mirrors BuildOptions.ChunkPolicy for the incremental
// update path. Nil defaults to chunk.MarkdownPolicy with the
// MaxChunkTokens / ChunkOverlapTokens / MaxChunkSections knobs
// resolved here. The substrate does not enforce that the policy
// matches the one used to build the snapshot — callers who switch
// policies between Build and Update should expect reuse cache
// misses on the affected sections (the leaves still re-embed
// correctly; the snapshot just won't share embeddings across
// rebuilds).
ChunkPolicy chunk.Policy
}
UpdateOptions controls how an existing Stroma index is updated in place.
type UpdateResult ¶
type UpdateResult struct {
Path string
UpsertedCount int
RemovedCount int
RecordCount int
ChunkCount int
ReusedRecordCount int
ReusedChunkCount int
EmbeddedChunkCount int
EmbedderDimension int
EmbedderFingerprint string
ContentFingerprint string
}
UpdateResult summarizes one incremental update.
func Update ¶
func Update(ctx context.Context, added []corpus.Record, removed []string, options UpdateOptions) (*UpdateResult, error)
Update applies add, replace, and remove operations to an existing Stroma index without rebuilding it from scratch.
type VectorSearchQuery ¶
type VectorSearchQuery struct {
// Embedding is the precomputed query vector. Empty rejects with
// a "search embedding is required" error — this field has no
// default.
Embedding []float64
// Limit caps the number of SearchHits returned. Zero or negative
// selects DefaultSearchLimit (10). Values above MaxSearchLimit
// reject with an error instead of being silently capped.
Limit int
// Kinds filters candidate records to the supplied kind list. Nil
// or empty means "no filter, all kinds".
Kinds []string
}
VectorSearchQuery defines one vector search against an opened snapshot.