pipeline

package
v0.2.0
Published: Jan 21, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ChunkFunc

type ChunkFunc func(text string, basePath string) ([]ChunkWithPath, error)

ChunkFunc is a function that splits text into chunks and assigns each chunk a hierarchical path. The path should follow the ltree format (e.g., "doc.chapter1.section2.chunk3").
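Because ChunkFunc is a plain function type, a custom chunker can be supplied wherever the built-in ones are used. The following is a minimal sketch, not code from this package: the types are local copies of the documented declarations so the example compiles on its own, and fixedWordChunker and the "chunkN" label scheme are hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ChunkWithPath is a local copy of the documented struct, so the sketch is self-contained.
type ChunkWithPath struct {
	Content    string
	Path       string // ltree path
	StartPos   *int
	EndPos     *int
	ChunkIndex *int
	Metadata   map[string]interface{}
}

// ChunkFunc is a local copy of the documented function type.
type ChunkFunc func(text string, basePath string) ([]ChunkWithPath, error)

// fixedWordChunker is a hypothetical chunker that emits chunks of at most
// wordsPerChunk whitespace-separated words. It is not part of the package.
func fixedWordChunker(wordsPerChunk int) ChunkFunc {
	return func(text string, basePath string) ([]ChunkWithPath, error) {
		if wordsPerChunk <= 0 {
			return nil, fmt.Errorf("wordsPerChunk must be positive, got %d", wordsPerChunk)
		}
		words := strings.Fields(text)
		var chunks []ChunkWithPath
		for start := 0; start < len(words); start += wordsPerChunk {
			end := start + wordsPerChunk
			if end > len(words) {
				end = len(words)
			}
			idx := len(chunks) // fresh variable per iteration, so &idx is unique
			chunks = append(chunks, ChunkWithPath{
				Content: strings.Join(words[start:end], " "),
				// Append one ltree label per chunk under the caller's base path.
				Path:       fmt.Sprintf("%s.chunk%d", basePath, idx),
				ChunkIndex: &idx,
			})
		}
		return chunks, nil
	}
}

func main() {
	chunks, err := fixedWordChunker(3)("one two three four five", "doc.chapter1")
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%s: %q\n", c.Path, c.Content)
	}
}
```

The position fields are left nil here because the sketch discards the original whitespace. A chunker that tracks offsets could fill in StartPos and EndPos as well.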

func DefaultChunker

func DefaultChunker(maxChunkSize int, similarityThreshold float32) ChunkFunc

DefaultChunker creates a semantic chunker that uses embeddings to identify natural boundaries. It analyzes the semantic similarity between sentences and starts a new chunk at points where similarity drops.
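The general idea of splitting where similarity drops can be illustrated with cosine similarity between the embeddings of adjacent sentences. This is only a sketch of that technique under stated assumptions: it assumes the comparison is between neighboring sentences and that a boundary is placed when the score falls below the threshold. How DefaultChunker actually applies similarityThreshold and maxChunkSize is not documented here, and cosine and splitPoints are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two vectors.
// It returns 0 for mismatched lengths or zero-length vectors.
func cosine(a, b []float32) float32 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// splitPoints returns every index i such that sentence i should start a new
// chunk, i.e. its similarity to sentence i-1 is below the threshold.
func splitPoints(sentenceEmbeddings [][]float32, threshold float32) []int {
	var points []int
	for i := 1; i < len(sentenceEmbeddings); i++ {
		if cosine(sentenceEmbeddings[i-1], sentenceEmbeddings[i]) < threshold {
			points = append(points, i)
		}
	}
	return points
}

func main() {
	// Toy 2-dimensional embeddings: the first two sentences point one way,
	// the last two point another way.
	embs := [][]float32{{1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9}}
	fmt.Println(splitPoints(embs, 0.5))
}
```

With these toy vectors the only low-similarity pair is between the second and third sentences, so the sketch reports a single boundary at index 2.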

func ParagraphChunker

func ParagraphChunker() ChunkFunc

ParagraphChunker creates a chunker that splits text by paragraphs.
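As a rough illustration of paragraph-level splitting, the sketch below treats a blank line as the paragraph separator. That separator rule is an assumption; the documentation does not say how ParagraphChunker detects paragraphs, and splitParagraphs is a hypothetical helper rather than part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// splitParagraphs splits text on blank lines and drops empty paragraphs.
func splitParagraphs(text string) []string {
	// Normalize Windows line endings so "\r\n\r\n" behaves like "\n\n".
	text = strings.ReplaceAll(text, "\r\n", "\n")
	var paragraphs []string
	for _, p := range strings.Split(text, "\n\n") {
		if p = strings.TrimSpace(p); p != "" {
			paragraphs = append(paragraphs, p)
		}
	}
	return paragraphs
}

func main() {
	text := "First paragraph.\n\nSecond paragraph.\n\n\n\nThird."
	for i, p := range splitParagraphs(text) {
		// The "pN" label is an illustrative ltree label, not the package's scheme.
		fmt.Printf("doc.p%d: %s\n", i, p)
	}
}
```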

func SentenceChunker

func SentenceChunker(maxSentencesPerChunk int) ChunkFunc

SentenceChunker creates a chunker that splits text by sentences, with at most maxSentencesPerChunk sentences per chunk.
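Sentence-based chunking can be pictured as two steps: split the text into sentences, then group consecutive sentences up to a limit. The sketch below assumes a naive splitter that ends a sentence at '.', '!' or '?'. The package's actual sentence detection is not documented here, and splitSentences and groupSentences are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSentences is a naive splitter: a sentence ends at '.', '!' or '?'.
// It will mis-split abbreviations such as "e.g." and decimal numbers.
func splitSentences(text string) []string {
	var sentences []string
	var b strings.Builder
	flush := func() {
		if s := strings.TrimSpace(b.String()); s != "" {
			sentences = append(sentences, s)
		}
		b.Reset()
	}
	for _, r := range text {
		b.WriteRune(r)
		if r == '.' || r == '!' || r == '?' {
			flush()
		}
	}
	flush() // keep any trailing text that lacks a terminator
	return sentences
}

// groupSentences joins consecutive sentences into chunks of at most maxPerChunk.
func groupSentences(sentences []string, maxPerChunk int) []string {
	if maxPerChunk <= 0 {
		maxPerChunk = 1
	}
	var chunks []string
	for start := 0; start < len(sentences); start += maxPerChunk {
		end := start + maxPerChunk
		if end > len(sentences) {
			end = len(sentences)
		}
		chunks = append(chunks, strings.Join(sentences[start:end], " "))
	}
	return chunks
}

func main() {
	sentences := splitSentences("A b. C d! E f? G.")
	for i, c := range groupSentences(sentences, 2) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```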

type ChunkWithPath

type ChunkWithPath struct {
	Content    string
	Path       string // ltree path
	StartPos   *int
	EndPos     *int
	ChunkIndex *int
	Metadata   map[string]interface{}
}

ChunkWithPath represents a chunk with its hierarchical path

type EmbedFunc

type EmbedFunc func(text string) ([]float32, error)

EmbedFunc is a function that generates embeddings for text
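Any function with this signature can serve as an embedder, which is convenient in tests where loading a real model is undesirable. The sketch below is a toy feature-hashing embedder with no semantic quality; hashingEmbedder is a hypothetical name and the EmbedFunc type is a local copy of the documented declaration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// EmbedFunc is a local copy of the documented function type.
type EmbedFunc func(text string) ([]float32, error)

// hashingEmbedder returns a deterministic toy embedder: each lowercased word
// is hashed into one of dim buckets, and the count vector is L2-normalized.
func hashingEmbedder(dim int) EmbedFunc {
	return func(text string) ([]float32, error) {
		if dim <= 0 {
			return nil, fmt.Errorf("dim must be positive, got %d", dim)
		}
		vec := make([]float32, dim)
		for _, w := range strings.Fields(strings.ToLower(text)) {
			h := fnv.New32a()
			h.Write([]byte(w)) // Write on an FNV hash never returns an error
			vec[h.Sum32()%uint32(dim)]++
		}
		var sumSquares float64
		for _, v := range vec {
			sumSquares += float64(v) * float64(v)
		}
		if sumSquares > 0 {
			norm := float32(math.Sqrt(sumSquares))
			for i := range vec {
				vec[i] /= norm
			}
		}
		return vec, nil
	}
}

func main() {
	embed := hashingEmbedder(8)
	vec, err := embed("chunking splits text into pieces")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vec), vec)
}
```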

func DefaultEmbedder

func DefaultEmbedder() (EmbedFunc, error)

DefaultEmbedder creates an embedder backed by a sentence-transformer model. It uses the all-MiniLM-L6-v2 model, which produces 384-dimensional embeddings.

type EntityExtractFunc

type EntityExtractFunc func(text string) ([]*model.Entity, error)

EntityExtractFunc extracts entities from text. It returns a list of entities with their types and metadata.

func DefaultEntityExtractor

func DefaultEntityExtractor() (EntityExtractFunc, error)

DefaultEntityExtractor creates an entity extractor using a NER model. It uses distilbert-NER for named entity recognition and detects PERSON, ORGANIZATION, LOCATION, and MISC entities.

type Pipeline

type Pipeline struct {
	Chunker           ChunkFunc
	Embedder          EmbedFunc
	EntityExtractor   EntityExtractFunc   // Optional
	RelationExtractor RelationExtractFunc // Optional
}

Pipeline combines chunking and embedding functions, with optional entity and relation extractors.

func NewPipeline

func NewPipeline(chunker ChunkFunc, embedder EmbedFunc) *Pipeline

NewPipeline creates a new processing pipeline

func (*Pipeline) Process

func (p *Pipeline) Process(text string, basePath string) ([]*model.Chunk, error)

Process runs text through the pipeline and returns chunks with embeddings.
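Conceptually, this step chains the two function types: the chunker produces chunks, and the embedder is called on each chunk's content. The sketch below illustrates that composition only. It is not the package's implementation, embeddedChunk is a hypothetical stand-in for model.Chunk (whose fields are not shown in this documentation), and lineChunker and lengthEmbedder are toy functions invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Simplified local copies of the documented types.
type ChunkWithPath struct {
	Content string
	Path    string
}
type ChunkFunc func(text string, basePath string) ([]ChunkWithPath, error)
type EmbedFunc func(text string) ([]float32, error)

// embeddedChunk is a hypothetical stand-in for model.Chunk.
type embeddedChunk struct {
	Path      string
	Content   string
	Embedding []float32
}

// process chunks the text, then embeds each chunk, stopping at the first error.
func process(chunker ChunkFunc, embedder EmbedFunc, text, basePath string) ([]embeddedChunk, error) {
	if chunker == nil || embedder == nil {
		return nil, errors.New("chunker and embedder are required")
	}
	chunks, err := chunker(text, basePath)
	if err != nil {
		return nil, fmt.Errorf("chunking: %w", err)
	}
	out := make([]embeddedChunk, 0, len(chunks))
	for _, c := range chunks {
		vec, err := embedder(c.Content)
		if err != nil {
			return nil, fmt.Errorf("embedding %s: %w", c.Path, err)
		}
		out = append(out, embeddedChunk{Path: c.Path, Content: c.Content, Embedding: vec})
	}
	return out, nil
}

// lineChunker is a toy chunker: one chunk per non-empty line.
func lineChunker(text, basePath string) ([]ChunkWithPath, error) {
	var chunks []ChunkWithPath
	for _, line := range strings.Split(text, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			path := fmt.Sprintf("%s.line%d", basePath, len(chunks))
			chunks = append(chunks, ChunkWithPath{Content: line, Path: path})
		}
	}
	return chunks, nil
}

// lengthEmbedder is a toy embedder: a 1-dimensional vector holding the byte length.
func lengthEmbedder(text string) ([]float32, error) {
	return []float32{float32(len(text))}, nil
}

func main() {
	out, err := process(lineChunker, lengthEmbedder, "ab\ncde", "doc")
	if err != nil {
		panic(err)
	}
	for _, c := range out {
		fmt.Println(c.Path, c.Content, c.Embedding)
	}
}
```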

func (*Pipeline) ProcessWithExtraction

func (p *Pipeline) ProcessWithExtraction(text string, basePath string) (*ProcessingResult, error)

ProcessWithExtraction processes text and optionally extracts entities and relations

func (*Pipeline) SetEntityExtractor

func (p *Pipeline) SetEntityExtractor(extractor EntityExtractFunc)

SetEntityExtractor sets the entity extraction function

func (*Pipeline) SetRelationExtractor

func (p *Pipeline) SetRelationExtractor(extractor RelationExtractFunc)

SetRelationExtractor sets the relation extraction function

type ProcessingResult

type ProcessingResult struct {
	Chunks    []*model.Chunk
	Entities  []*model.Entity
	Relations []*model.Edge
}

ProcessingResult contains chunks and optionally extracted entities and relations

type RelationExtractFunc

type RelationExtractFunc func(text string, chunkID string, entities []*model.Entity) ([]*model.Edge, error)

RelationExtractFunc extracts relationships between entities or chunks. It returns a list of edges representing those relationships.

func DefaultRelationExtractor

func DefaultRelationExtractor() (RelationExtractFunc, error)

DefaultRelationExtractor creates a relation extractor using NER models. It uses token classification to detect citation-related entities and references. Detects: citations, references, and relationships between entities.
