Documentation
Index
- type ChunkFunc
- type ChunkWithPath
- type EmbedFunc
- type EntityExtractFunc
- type Pipeline
- func (p *Pipeline) Process(text string, basePath string) ([]*model.Chunk, error)
- func (p *Pipeline) ProcessWithExtraction(text string, basePath string) (*ProcessingResult, error)
- func (p *Pipeline) SetEntityExtractor(extractor EntityExtractFunc)
- func (p *Pipeline) SetRelationExtractor(extractor RelationExtractFunc)
- type ProcessingResult
- type RelationExtractFunc
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type ChunkFunc
type ChunkFunc func(text string, basePath string) ([]ChunkWithPath, error)
ChunkFunc is a function that splits text into chunks, each with a hierarchical path. The path should follow the PostgreSQL ltree format (e.g., "doc.chapter1.section2.chunk3").
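To illustrate the contract, here is a minimal sketch of a custom ChunkFunc that groups a fixed number of words per chunk. `WordWindowChunker` and the `chunkN` label naming are hypothetical, modeled on the path example above, and `ChunkWithPath` is redeclared locally so the snippet compiles on its own. StartPos and EndPos are left nil because `strings.Fields` discards byte offsets.

```go
package main

import (
	"fmt"
	"strings"
)

// ChunkWithPath mirrors the documented struct; redeclared so this sketch is self-contained.
type ChunkWithPath struct {
	Content    string
	Path       string // ltree path
	StartPos   *int
	EndPos     *int
	ChunkIndex *int
	Metadata   map[string]interface{}
}

// ChunkFunc matches the documented signature.
type ChunkFunc func(text string, basePath string) ([]ChunkWithPath, error)

// WordWindowChunker is a HYPOTHETICAL ChunkFunc: every `size` words become one
// chunk whose path is basePath.chunkN (N counts from 0).
func WordWindowChunker(size int) ChunkFunc {
	return func(text, basePath string) ([]ChunkWithPath, error) {
		if size <= 0 {
			return nil, fmt.Errorf("chunk size must be positive, got %d", size)
		}
		words := strings.Fields(text)
		var chunks []ChunkWithPath
		for i := 0; i < len(words); i += size {
			end := i + size
			if end > len(words) {
				end = len(words)
			}
			idx := len(chunks) // fresh variable each iteration, so &idx is unique
			chunks = append(chunks, ChunkWithPath{
				Content:    strings.Join(words[i:end], " "),
				Path:       fmt.Sprintf("%s.chunk%d", basePath, idx),
				ChunkIndex: &idx,
				// StartPos/EndPos stay nil: strings.Fields does not keep offsets.
			})
		}
		return chunks, nil
	}
}

func main() {
	chunks, err := WordWindowChunker(3)("the quick brown fox jumps over the lazy dog again", "doc.chapter1")
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Println(c.Path, "=>", c.Content)
	}
}
```

Because the pointer fields are optional, a chunker that cannot compute offsets can simply leave them unset.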
func DefaultChunker
DefaultChunker creates a semantic chunker that uses embeddings to identify natural boundaries. It measures the semantic similarity between sentences and starts a new chunk where that similarity drops.
func ParagraphChunker
func ParagraphChunker() ChunkFunc
ParagraphChunker creates a chunker that splits text by paragraphs.
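A paragraph split might look like the sketch below, assuming paragraphs are separated by blank lines. The documentation does not specify ParagraphChunker's rules, so `splitParagraphs` is hypothetical, and assigning ltree paths to the pieces is left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blankLine matches a line break, an optional whitespace-only line, another
// break, and any further whitespace (i.e. one or more blank lines).
var blankLine = regexp.MustCompile(`\n[ \t]*\n\s*`)

// splitParagraphs is HYPOTHETICAL: it assumes blank lines separate paragraphs,
// normalizes CRLF to LF, and drops empty pieces.
func splitParagraphs(text string) []string {
	var out []string
	for _, p := range blankLine.Split(strings.ReplaceAll(text, "\r\n", "\n"), -1) {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	text := "First paragraph,\nstill first.\n\nSecond paragraph.\n\n\nThird."
	for i, p := range splitParagraphs(text) {
		fmt.Printf("%d: %q\n", i, p)
	}
}
```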
func SentenceChunker
SentenceChunker creates a chunker that splits text by sentences.
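For comparison, a naive sentence splitter is sketched below. `splitSentences` is hypothetical and assumes a sentence ends with '.', '!' or '?' followed by whitespace or the end of the text; SentenceChunker's actual rules are not documented here, and this version deliberately ignores abbreviations such as "e.g.".

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitSentences is HYPOTHETICAL and naive: it ends a sentence at '.', '!' or
// '?' when the next rune is whitespace or the end of the text. Abbreviations
// like "Dr." will be split incorrectly; decimals like "1.22" are kept intact.
func splitSentences(text string) []string {
	var out []string
	runes := []rune(text)
	start := 0
	for i, r := range runes {
		if r != '.' && r != '!' && r != '?' {
			continue
		}
		if i+1 == len(runes) || unicode.IsSpace(runes[i+1]) {
			if s := strings.TrimSpace(string(runes[start : i+1])); s != "" {
				out = append(out, s)
			}
			start = i + 1
		}
	}
	// Keep any trailing text that lacks terminal punctuation.
	if s := strings.TrimSpace(string(runes[start:])); s != "" {
		out = append(out, s)
	}
	return out
}

func main() {
	for _, s := range splitSentences("Go is fun. Is it fast? Yes! Version 1.22 shipped") {
		fmt.Println(s)
	}
}
```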
type ChunkWithPath
type ChunkWithPath struct {
	Content    string
	Path       string // ltree path
	StartPos   *int
	EndPos     *int
	ChunkIndex *int
	Metadata   map[string]interface{}
}
ChunkWithPath represents a chunk together with its hierarchical path.
type EmbedFunc
EmbedFunc is a function that generates embeddings for text.
func DefaultEmbedder
DefaultEmbedder creates an embedder backed by a real sentence-transformer model. It uses the all-MiniLM-L6-v2 model, which produces 384-dimensional embeddings.
type EntityExtractFunc
EntityExtractFunc extracts entities from text. It returns a list of entities with their types and metadata.
func DefaultEntityExtractor
func DefaultEntityExtractor() (EntityExtractFunc, error)
DefaultEntityExtractor creates an entity extractor backed by a named-entity recognition (NER) model. It uses distilbert-NER. Detects: PERSON, ORGANIZATION, LOCATION, and MISC entities.
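Token-classification NER models typically label each token with a BIO tag, and those tags must be merged into entity spans. The sketch below assumes CoNLL-style word-level tags (B-PER, I-PER, B-ORG, ...) and maps them to the type names listed above; the mapping, `Span`, and `decodeBIO` are hypothetical, and a real pipeline must also merge subword tokens, which this ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a HYPOTHETICAL entity record; model.Entity's fields are not shown here.
type Span struct {
	Text string
	Type string
}

// typeNames is an ASSUMED mapping from CoNLL tag suffixes to the documented type names.
var typeNames = map[string]string{
	"PER": "PERSON", "ORG": "ORGANIZATION", "LOC": "LOCATION", "MISC": "MISC",
}

// decodeBIO merges word-level BIO tags into spans. words and tags must have equal length.
func decodeBIO(words, tags []string) []Span {
	var spans []Span
	var cur []string
	curType := ""
	flush := func() { // emit the span being built, if any
		if len(cur) > 0 {
			name, ok := typeNames[curType]
			if !ok {
				name = curType // unknown suffix: keep it as-is
			}
			spans = append(spans, Span{Text: strings.Join(cur, " "), Type: name})
		}
		cur, curType = nil, ""
	}
	for i, tag := range tags {
		switch {
		case strings.HasPrefix(tag, "B-"):
			flush()
			cur, curType = []string{words[i]}, tag[2:]
		case strings.HasPrefix(tag, "I-") && tag[2:] == curType:
			cur = append(cur, words[i])
		case strings.HasPrefix(tag, "I-"):
			// I- without a matching B-: leniently start a new span.
			flush()
			cur, curType = []string{words[i]}, tag[2:]
		default: // "O"
			flush()
		}
	}
	flush()
	return spans
}

func main() {
	words := []string{"Ada", "Lovelace", "worked", "in", "London", "."}
	tags := []string{"B-PER", "I-PER", "O", "O", "B-LOC", "O"}
	for _, s := range decodeBIO(words, tags) {
		fmt.Printf("%s (%s)\n", s.Text, s.Type)
	}
}
```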
type Pipeline
type Pipeline struct {
	Chunker           ChunkFunc
	Embedder          EmbedFunc
	EntityExtractor   EntityExtractFunc   // Optional
	RelationExtractor RelationExtractFunc // Optional
}
Pipeline combines chunking and embedding functions, with optional entity and relation extractors.
func NewPipeline
NewPipeline creates a new processing pipeline.
func (*Pipeline) Process
func (p *Pipeline) Process(text string, basePath string) ([]*model.Chunk, error)
Process runs text through the pipeline and returns the resulting chunks with their embeddings.
func (*Pipeline) ProcessWithExtraction
func (p *Pipeline) ProcessWithExtraction(text string, basePath string) (*ProcessingResult, error)
ProcessWithExtraction processes text and, when the corresponding extractors are set, also extracts entities and relations.
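The optional extraction stages can be sketched as a pass over the chunks that skips any stage whose extractor is nil. The local `Chunk`, `Entity`, and `Edge` types, the EntityExtractFunc signature, and the `extract` helper are all assumptions; only RelationExtractFunc's shape comes from this documentation. Calling the relation extractor with a nil entity slice when no entity extractor is set is likewise an assumed behavior.

```go
package main

import "fmt"

// HYPOTHETICAL stand-ins for model.Chunk, model.Entity and model.Edge.
type Chunk struct{ ID, Content string }
type Entity struct{ Name, Type string }
type Edge struct{ From, To, Label string }

// EntityExtractFunc's signature is ASSUMED; RelationExtractFunc follows the docs.
type EntityExtractFunc func(text string) ([]*Entity, error)
type RelationExtractFunc func(text string, chunkID string, entities []*Entity) ([]*Edge, error)

type ProcessingResult struct {
	Chunks    []*Chunk
	Entities  []*Entity
	Relations []*Edge
}

// extract runs the optional stages over already-produced chunks; a nil
// extractor skips its stage entirely.
func extract(chunks []*Chunk, ents EntityExtractFunc, rels RelationExtractFunc) (*ProcessingResult, error) {
	res := &ProcessingResult{Chunks: chunks}
	for _, c := range chunks {
		var found []*Entity
		if ents != nil {
			var err error
			if found, err = ents(c.Content); err != nil {
				return nil, fmt.Errorf("entities for %s: %w", c.ID, err)
			}
			res.Entities = append(res.Entities, found...)
		}
		if rels != nil {
			edges, err := rels(c.Content, c.ID, found) // found is nil if ents is nil
			if err != nil {
				return nil, fmt.Errorf("relations for %s: %w", c.ID, err)
			}
			res.Relations = append(res.Relations, edges...)
		}
	}
	return res, nil
}

func main() {
	chunks := []*Chunk{{ID: "doc.chunk0", Content: "Ada met Charles."}}
	ents := func(text string) ([]*Entity, error) {
		return []*Entity{{Name: "Ada", Type: "PERSON"}, {Name: "Charles", Type: "PERSON"}}, nil
	}
	rels := func(text, chunkID string, es []*Entity) ([]*Edge, error) {
		var out []*Edge
		for _, e := range es {
			out = append(out, &Edge{From: chunkID, To: e.Name, Label: "mentions"})
		}
		return out, nil
	}
	res, err := extract(chunks, ents, rels)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(res.Chunks), len(res.Entities), len(res.Relations))
	onlyChunks, _ := extract(chunks, nil, nil)
	fmt.Println(len(onlyChunks.Entities), len(onlyChunks.Relations))
}
```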
func (*Pipeline) SetEntityExtractor
func (p *Pipeline) SetEntityExtractor(extractor EntityExtractFunc)
SetEntityExtractor sets the entity extraction function.
func (*Pipeline) SetRelationExtractor
func (p *Pipeline) SetRelationExtractor(extractor RelationExtractFunc)
SetRelationExtractor sets the relation extraction function.
type ProcessingResult
type ProcessingResult struct {
	Chunks    []*model.Chunk
	Entities  []*model.Entity
	Relations []*model.Edge
}
ProcessingResult contains the chunks and, when extraction is enabled, the extracted entities and relations.
type RelationExtractFunc
type RelationExtractFunc func(text string, chunkID string, entities []*model.Entity) ([]*model.Edge, error)
RelationExtractFunc extracts relationships between entities or chunks. It returns a list of edges representing those relationships.
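As one example of a function with this shape, the sketch below emits a co-occurrence edge for every pair of distinct entities found in the same chunk. This is a swapped-in heuristic, not what DefaultRelationExtractor does, and the `Entity`/`Edge` fields, `CoOccurrenceExtractor`, and the "co_occurs_with" type label are hypothetical.

```go
package main

import "fmt"

// HYPOTHETICAL stand-ins; the real model.Entity and model.Edge fields are not shown in these docs.
type Entity struct {
	ID   string
	Name string
}

type Edge struct {
	SourceID string
	TargetID string
	Type     string
	ChunkID  string
}

// RelationExtractFunc mirrors the documented signature, using the local types.
type RelationExtractFunc func(text string, chunkID string, entities []*Entity) ([]*Edge, error)

// CoOccurrenceExtractor links every pair of distinct entities in the same chunk.
// The text argument is ignored; only the entity list matters here.
func CoOccurrenceExtractor() RelationExtractFunc {
	return func(text, chunkID string, entities []*Entity) ([]*Edge, error) {
		var edges []*Edge
		for i := 0; i < len(entities); i++ {
			for j := i + 1; j < len(entities); j++ {
				if entities[i].ID == entities[j].ID {
					continue // the same entity mentioned twice
				}
				edges = append(edges, &Edge{
					SourceID: entities[i].ID,
					TargetID: entities[j].ID,
					Type:     "co_occurs_with",
					ChunkID:  chunkID,
				})
			}
		}
		return edges, nil
	}
}

func main() {
	ents := []*Entity{{ID: "e1", Name: "Ada"}, {ID: "e2", Name: "Charles"}, {ID: "e3", Name: "London"}}
	edges, err := CoOccurrenceExtractor()("Ada and Charles met in London.", "doc.chunk0", ents)
	if err != nil {
		panic(err)
	}
	for _, e := range edges {
		fmt.Println(e.SourceID, "->", e.TargetID)
	}
}
```

Pairwise edges grow quadratically with the number of entities per chunk, which is one reason to keep chunks small when using a heuristic like this.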
func DefaultRelationExtractor
func DefaultRelationExtractor() (RelationExtractFunc, error)
DefaultRelationExtractor creates a relation extractor backed by NER models. It uses token classification to detect citation-related entities and references. Detects: citations, references, and relationships between entities.
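DefaultRelationExtractor relies on token classification; for comparison, a much cruder regex heuristic for spotting citations is sketched below. `findCitations` and both patterns are hypothetical and only catch bracketed numeric citations like [3] and parenthetical author-year citations like (Lee et al., 2019).

```go
package main

import (
	"fmt"
	"regexp"
)

// numeric matches bracketed numeric citations such as [3] or [2, 5].
var numeric = regexp.MustCompile(`\[\d+(?:\s*,\s*\d+)*\]`)

// authorYear matches parenthetical citations such as (Smith, 2020) or (Lee et al., 2019).
var authorYear = regexp.MustCompile(`\(([A-Z][A-Za-z-]+(?: et al\.)?),\s*(\d{4})\)`)

// findCitations is HYPOTHETICAL: it returns citation-looking substrings,
// numeric ones first, then author-year ones, each group in text order.
func findCitations(text string) []string {
	out := numeric.FindAllString(text, -1)
	return append(out, authorYear.FindAllString(text, -1)...)
}

func main() {
	text := "Prior work [1, 4] builds on (Vaswani et al., 2017) and (Devlin, 2019)."
	for _, c := range findCitations(text) {
		fmt.Println(c)
	}
}
```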