stroma

module
v2.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 23, 2026 License: MIT

README

Stroma

Stroma is a neutral corpus and indexing substrate.

It owns the lowest-level operations needed to ingest text artifacts, chunk them, embed them, persist them in SQLite plus sqlite-vec, retrieve semantically close sections, and call OpenAI-compatible embedding and chat completion endpoints over a shared HTTP substrate. Callers consume Stroma through its APIs and treat the SQLite snapshot as an opaque local artifact. It does not own governance, specifications, compliance, drift analysis, prompt templates, output-schema enforcement, MCP, or CLI workflows.

Scope

Stroma is for products that need a reusable text corpus layer with:

  • canonical records with deterministic content fingerprints
  • pluggable chunking strategies (chunk.PolicyMarkdownPolicy default, KindRouterPolicy for per-record-kind dispatch, LateChunkPolicy for parent/leaf hierarchy)
  • pluggable embedders (Embedder / ContextualEmbedder) with a deterministic fixture and an OpenAI-compatible HTTP embedder
  • OpenAI-compatible chat completion client (chat.OpenAI) sharing the same substrate as embed.OpenAI: retry with Retry-After (capped), classified failures (auth / rate_limit / timeout / server / transport / schema_mismatch / dependency_unavailable), and APIToken redaction
  • hybrid retrieval: dense vector + FTS5, fused via a pluggable FusionStrategy (RRFFusion by default) with per-arm provenance surfaced to downstream rerankers
  • quantization knobs: float32 (default), int8 (4× smaller), binary (1-bit sign-packed vec0 prefilter that is 32× smaller for the prefilter representation; full-precision vectors are retained in a companion table for cosine rescoring, so total snapshot size is not 32× smaller)
  • optional Matryoshka prefilter at a truncated dimension with full-dim cosine rescore (SearchParams.SearchDimension)
  • atomic rebuilds and incremental Update with embedding reuse at the section level, chaining schema migrations v2 → v3 → v4 → v5 in one transaction

Stroma is not for:

  • spec governance
  • source discovery or repository scanning
  • code compliance or doc drift analysis
  • prompt templates, system prompts, or output-schema enforcement on chat responses
  • product-specific adapters and transports

Packages

  • corpus — canonical record model, NewRecord helper, Normalize, deterministic Fingerprint
  • chunkPolicy interface with MarkdownPolicy, KindRouterPolicy, LateChunkPolicy; MarkdownWithOptions returns ErrTooManySections when a body exceeds the DoS cap
  • embedEmbedder and ContextualEmbedder interfaces; deterministic Fixture; OpenAI-compatible HTTP embedder with MaxBatchSize batching, deadline scaling across batches, and APIToken redaction in String/GoString/LogValue
  • chat — OpenAI-compatible chat completion client (chat.OpenAI, chat.Message, ChatCompletionText); tolerates string and multi-part array content; APIToken redaction parity with embed.OpenAIConfig
  • provider — shared HTTP substrate used by embed and chat: retry with capped Retry-After, response-size bounding, and a stable FailureClass taxonomy surfaced via *provider.ProviderError. Callers branch on FailureClass to retry / degrade / propagate
  • store — SQLite readiness probes, sqlite-vec readiness, quantization blob helpers (QuantizationFloat32 / QuantizationInt8 / QuantizationBinary)
  • index — atomic Rebuild with embedding reuse, incremental Update, long-lived Snapshot readers, Stats, hybrid Search with provenance, ExpandContext for parent/neighbor walks

Example

ctx := context.Background()

records := []corpus.Record{
    corpus.NewRecord(
        "widget-overview",
        "Widget Overview",
        "# Overview\n\nWidgets are synchronized in batches.",
    ),
}

fixture, err := embed.NewFixture("fixture-demo", 16)
if err != nil {
    log.Fatal(err)
}

if _, err := index.Rebuild(ctx, records, index.BuildOptions{
    Path:     "stroma.db",
    Embedder: fixture,
}); err != nil {
    log.Fatal(err)
}

hits, err := index.Search(ctx, index.SearchQuery{
    Path: "stroma.db",
    SearchParams: index.SearchParams{
        Text:     "synchronized batches",
        Limit:    5,
        Embedder: fixture,
        // Fusion / Reranker / SearchDimension are optional; zero values
        // give hybrid RRF over vector+FTS with the full stored dimension.
    },
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(hits[0].Ref)

See the v2.0.0 release notes for the full API surface.

Status

v2.0.0 (current) ships the stable substrate: hybrid retrieval, pluggable fusion, quantization, matryoshka, contextual retrieval, adaptive chunking, and incremental update. Higher-order products should consume the library rather than re-embedding their own indexing substrate.

Directories

Path Synopsis
Package chat is stroma's OpenAI-compatible chat completion primitive, the sibling of embed.OpenAI.
Package chat is stroma's OpenAI-compatible chat completion primitive, the sibling of embed.OpenAI.
Package chunk splits text bodies into heading-aware sections for indexing.
Package chunk splits text bodies into heading-aware sections for indexing.
Package corpus defines the neutral record unit that Stroma indexes.
Package corpus defines the neutral record unit that Stroma indexes.
Package embed defines embedder interfaces and a deterministic test fixture.
Package embed defines embedder interfaces and a deterministic test fixture.
Package index orchestrates atomic Stroma index rebuilds and searches.
Package index orchestrates atomic Stroma index rebuilds and searches.
Package provider is stroma's shared OpenAI-compatible HTTP substrate.
Package provider is stroma's shared OpenAI-compatible HTTP substrate.
Package store provides SQLite-backed storage primitives for Stroma indexes.
Package store provides SQLite-backed storage primitives for Stroma indexes.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL