stepinx

package
v1.1.8 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 6, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package indexing provides document indexing pipeline steps for RAG data preparation.

Package indexing provides document indexing pipeline steps for RAG data preparation.

This package contains reusable steps for building indexing pipelines:

  • Discover: File discovery and validation
  • Multi: Multi-parser streaming document parsing
  • Semantic: Semantic chunking of documents
  • Batch: Batch embedding generation
  • Upsert: Vector storage upsert operations
  • Entities: Entity extraction for graph indexing

Example usage:

p := pipeline.New[*core.State]()
p.AddSteps(
    indexing.Discover(),
    indexing.Multi(parsers...),
    indexing.Semantic(chunker),
    indexing.Batch(embedder, metrics),
    indexing.Upsert(vectorStore, metrics),
)

err := p.Execute(ctx, &indexing.State{FilePath: "document.pdf"})

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Batch

func Batch(embedder embedding.Provider, metrics core.Metrics) pipeline.Step[*core.IndexingContext]

Batch creates a new batch embedding step with metrics collection.

Parameters:

  • embedder: embedding provider implementation
  • metrics: metrics collector (optional, can be nil)

Example:

p.AddStep(indexing.Batch(embedder, metrics))

func Chunk

Chunk creates a semantic chunking step.

Parameters:

  • chunker: semantic chunker implementation

Example:

p.AddStep(indexing.Chunk(chunker))

func DetectCommunities added in v1.1.6

func DetectCommunities(detector core.CommunityDetector, graphStore core.GraphStore, logger logging.Logger) pipeline.Step[*core.IndexingContext]

DetectCommunities creates a step that detects communities in the knowledge graph. Communities are hierarchical groups of related nodes that enable: - Global search (searching community summaries) - Hierarchical summarization (multi-level understanding)

func Discover

func Discover() pipeline.Step[*core.IndexingContext]

Discover creates a new file discovery step.

Example:

p.AddStep(indexing.Discover())

func Entities

func Entities(extractor core.EntityExtractor, logger logging.Logger) pipeline.Step[*core.IndexingContext]

Entities creates a new entity extraction step.

Parameters:

  • extractor: entity extractor implementation
  • logger: structured logger (auto-defaults to NoopLogger if nil)

Example:

p.AddStep(indexing.Entities(extractor, logger))

func ExtractTriples

func ExtractTriples(extractor core.TriplesExtractor, graphStore core.GraphStore, logger logging.Logger) pipeline.Step[*core.IndexingContext]

ExtractTriples creates a new step for automated knowledge graph construction. It extracts triples (Subject-Predicate-Object) from chunks and upserts them into the GraphStore. Following Microsoft GraphRAG design: nodes and edges are bound to source chunks.

func GenerateSummaries added in v1.1.6

func GenerateSummaries(llm chat.Client, graphStore core.GraphStore, logger logging.Logger) pipeline.Step[*core.IndexingContext]

GenerateSummaries creates a step that generates summaries for detected communities. Following Microsoft GraphRAG design, this enables global search by providing high-level descriptions of community content.

func Multi

func Multi(parsers ...core.Parser) pipeline.Step[*core.IndexingContext]

Multi creates a new multi-parser step supporting multiple parsers. Deprecated: Use MultiFactory to prevent concurrency and state-sharing bugs.

func MultiFactory added in v1.1.3

func MultiFactory(registry *types.ParserRegistry) pipeline.Step[*core.IndexingContext]

MultiFactory creates a new multi-parser step that dynamically spawns parsers.

func MultiStore

func MultiStore(
	vectorStore core.VectorStore,
	docStore core.DocStore,
	graphStore core.GraphStore,
	logger logging.Logger,
	metrics core.Metrics,
) pipeline.Step[*core.IndexingContext]

MultiStore creates a step to store chunks, vectors, and entities to multiple backends.

func MultimodalEmbed

func MultimodalEmbed(provider embedding.MultimodalProvider, metrics core.Metrics) pipeline.Step[*core.IndexingContext]

MultimodalEmbed creates a step for multimodal vector generation.

Types

This section is empty.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL