stepinx

package
v1.1.3 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 22, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package indexing provides document indexing pipeline steps for RAG data preparation.

This package contains reusable steps for building indexing pipelines:

  • Discover: File discovery and validation
  • MultiFactory: Multi-parser streaming document parsing (replaces the deprecated Multi)
  • Chunk: Semantic chunking of documents
  • Batch: Batch embedding generation
  • MultimodalEmbed: Multimodal vector generation
  • MultiStore: Storage of chunks, vectors, and entities across multiple backends
  • Entities: Entity extraction for graph indexing
  • ExtractTriples: Triple extraction for knowledge graph construction

Example usage:

p := pipeline.New[*core.IndexingContext]()
p.AddSteps(
    indexing.Discover(),
    indexing.MultiFactory(registry),
    indexing.Chunk(chunker),
    indexing.Batch(embedder, metrics),
    indexing.MultiStore(vectorStore, docStore, graphStore, logger, metrics),
)

// FilePath is assumed to be a field of core.IndexingContext.
err := p.Execute(ctx, &core.IndexingContext{FilePath: "document.pdf"})

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Batch

func Batch(embedder embedding.Provider, metrics core.Metrics) pipeline.Step[*core.IndexingContext]

Batch creates a new batch embedding step with metrics collection.

Parameters:

  • embedder: embedding provider implementation
  • metrics: metrics collector (optional, can be nil)

Example:

p.AddStep(indexing.Batch(embedder, metrics))

func Chunk

Chunk creates a semantic chunking step.

Parameters:

  • chunker: semantic chunker implementation

Example:

p.AddStep(indexing.Chunk(chunker))

func Discover

func Discover() pipeline.Step[*core.IndexingContext]

Discover creates a new file discovery step.

Example:

p.AddStep(indexing.Discover())

func Entities

func Entities(extractor core.EntityExtractor, logger logging.Logger) pipeline.Step[*core.IndexingContext]

Entities creates a new entity extraction step.

Parameters:

  • extractor: entity extractor implementation
  • logger: structured logger (auto-defaults to NoopLogger if nil)

Example:

p.AddStep(indexing.Entities(extractor, logger))
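
The nil-logger default can be sketched as follows. This is an assumption about how such a default is typically wired, not this package's code: Logger, noopLogger, memoryLogger, entityStep, and newEntityStep are hypothetical names.

```go
package main

import "fmt"

// Logger is a hypothetical stand-in for logging.Logger.
type Logger interface {
	Info(msg string, kv ...any)
}

// noopLogger discards every message.
type noopLogger struct{}

func (noopLogger) Info(string, ...any) {}

// memoryLogger keeps messages in memory so the demo can inspect them.
type memoryLogger struct{ lines []string }

func (m *memoryLogger) Info(msg string, _ ...any) { m.lines = append(m.lines, msg) }

// entityStep is a hypothetical step that logs while it works.
type entityStep struct {
	logger Logger
}

// newEntityStep substitutes a no-op logger when the caller passes nil,
// so the step never has to nil-check before logging.
func newEntityStep(logger Logger) *entityStep {
	if logger == nil {
		logger = noopLogger{}
	}
	return &entityStep{logger: logger}
}

// run pretends to process chunks and reports how many it saw.
func (s *entityStep) run(chunks []string) int {
	s.logger.Info("extracting entities", "chunks", len(chunks))
	return len(chunks)
}

func main() {
	quiet := newEntityStep(nil) // safe: falls back to noopLogger
	fmt.Println(quiet.run([]string{"chunk one", "chunk two"}))

	mem := &memoryLogger{}
	newEntityStep(mem).run([]string{"chunk one"})
	fmt.Println(mem.lines)
}
```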

func ExtractTriples

func ExtractTriples(extractor *base.TriplesExtractor, graphStore store.GraphStore, logger logging.Logger) pipeline.Step[*core.IndexingContext]

ExtractTriples creates a new step for automated knowledge graph construction. It extracts triples (Subject-Predicate-Object) from chunks and upserts them into the GraphStore.
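
A Subject-Predicate-Object triple and an idempotent upsert can be modelled as below. This is a sketch under assumptions: Triple and memoryGraph are hypothetical, the real store.GraphStore interface is not shown on this page, and the extraction of triples from chunk text is out of scope here.

```go
package main

import "fmt"

// Triple is a hypothetical Subject-Predicate-Object record.
type Triple struct {
	Subject, Predicate, Object string
}

// memoryGraph is a hypothetical in-memory stand-in for a graph store.
// Triples are comparable structs, so they can be used directly as map keys.
type memoryGraph struct {
	edges map[Triple]struct{}
}

func newMemoryGraph() *memoryGraph {
	return &memoryGraph{edges: make(map[Triple]struct{})}
}

// Upsert inserts triples and reports how many were new.
// Re-inserting an existing triple is a no-op, which makes re-indexing safe.
func (g *memoryGraph) Upsert(triples []Triple) int {
	added := 0
	for _, t := range triples {
		if _, ok := g.edges[t]; ok {
			continue
		}
		g.edges[t] = struct{}{}
		added++
	}
	return added
}

func main() {
	g := newMemoryGraph()
	triples := []Triple{
		{"Go", "created_by", "Google"},
		{"Go", "is_a", "language"},
		{"Go", "created_by", "Google"}, // duplicate within the same batch
	}
	fmt.Println(g.Upsert(triples)) // new triples on the first pass
	fmt.Println(g.Upsert(triples)) // nothing new on the second pass
}
```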

func Multi

func Multi(parsers ...core.Parser) pipeline.Step[*core.IndexingContext]

Multi creates a new multi-parser step from the given parsers.

Deprecated: Use MultiFactory to prevent concurrency and state-sharing bugs.

func MultiFactory added in v1.1.3

func MultiFactory(registry *types.ParserRegistry) pipeline.Step[*core.IndexingContext]

MultiFactory creates a new multi-parser step that dynamically spawns parsers.
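
The following sketch shows why constructing a parser per document avoids state sharing. It is illustrative only: Parser, Registry, NewRegistry, Register, and New are hypothetical and do not describe the actual types.ParserRegistry API.

```go
package main

import (
	"fmt"
	"strings"
)

// Parser is a hypothetical stateful parser; buf is per-instance state
// that would leak between documents if one instance were reused.
type Parser struct {
	buf strings.Builder
}

// Parse appends the upper-cased input to the buffer and returns the buffer.
func (p *Parser) Parse(input string) string {
	p.buf.WriteString(strings.ToUpper(input))
	return p.buf.String()
}

// Registry maps a file extension to a constructor rather than to a shared instance.
type Registry struct {
	factories map[string]func() *Parser
}

func NewRegistry() *Registry {
	return &Registry{factories: make(map[string]func() *Parser)}
}

func (r *Registry) Register(ext string, factory func() *Parser) {
	r.factories[ext] = factory
}

// New builds a fresh parser for ext, so callers never share parser state.
func (r *Registry) New(ext string) (*Parser, error) {
	factory, ok := r.factories[ext]
	if !ok {
		return nil, fmt.Errorf("no parser registered for %q", ext)
	}
	return factory(), nil
}

func main() {
	reg := NewRegistry()
	reg.Register(".txt", func() *Parser { return &Parser{} })

	first, _ := reg.New(".txt")
	second, _ := reg.New(".txt")
	fmt.Println(first.Parse("doc one"))
	fmt.Println(second.Parse("doc two")) // unaffected by the first parser's buffer

	shared := &Parser{}
	shared.Parse("doc one")
	fmt.Println(shared.Parse("doc two")) // a reused instance carries over earlier input
}
```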

func MultiStore

func MultiStore(
	vectorStore core.VectorStore,
	docStore store.DocStore,
	graphStore store.GraphStore,
	logger logging.Logger,
	metrics core.Metrics,
) pipeline.Step[*core.IndexingContext]

MultiStore creates a step to store chunks, vectors, and entities to multiple backends.

func MultimodalEmbed

func MultimodalEmbed(provider embedding.MultimodalProvider, metrics core.Metrics) pipeline.Step[*core.IndexingContext]

MultimodalEmbed creates a step for multimodal vector generation.

Types

This section is empty.
