Documentation
Index
- type CharacterChunker
- type SemanticChunker
- func (s *SemanticChunker) Chunk(ctx context.Context, doc *core.Document) ([]*core.Chunk, error)
- func (s *SemanticChunker) ContextualChunk(ctx context.Context, doc *core.Document, docSummary string) ([]*core.Chunk, error)
- func (s *SemanticChunker) HierarchicalChunk(ctx context.Context, doc *core.Document) ([]*core.Chunk, []*core.Chunk, error)
- type TokenChunker
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type CharacterChunker added in v1.1.3
CharacterChunker chunks text recursively using a list of separators.
func DefaultCharacterChunker added in v1.1.3
func DefaultCharacterChunker() *CharacterChunker
DefaultCharacterChunker returns a CharacterChunker with sensible default parameters (Size: 1000, Overlap: 150). Ideal for quick starts and simple text processing.
func NewCharacterChunker added in v1.1.3
func NewCharacterChunker(size, overlap int) *CharacterChunker
NewCharacterChunker creates a chunker that splits by runes and logical breaks.
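A minimal usage sketch. The import paths and the core.Document Content field below are assumptions (the index does not show them), and CharacterChunker is assumed to satisfy core.Chunker's Chunk method, as implied by its use as a SemanticChunker base:

```go
package main

import (
	"context"
	"fmt"

	"example.com/rag/chunker" // hypothetical import path
	"example.com/rag/core"    // hypothetical import path
)

func main() {
	// Size 1000, Overlap 150 — the documented defaults.
	c := chunker.DefaultCharacterChunker()

	// core.Document is assumed to carry its raw text in a Content field.
	doc := &core.Document{Content: "First paragraph.\n\nSecond paragraph."}

	chunks, err := c.Chunk(context.Background(), doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("produced %d chunks\n", len(chunks))
}
```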
type SemanticChunker added in v1.1.3
type SemanticChunker struct {
BaseChunker core.Chunker
ParentChunkSize int
ChildChunkSize int
Overlap int
}
SemanticChunker implements core.SemanticChunker for advanced RAG techniques. It wraps a base chunker (like TokenChunker or CharacterChunker) and adds hierarchical and contextual capabilities.
func DefaultSemanticChunker added in v1.1.3
func DefaultSemanticChunker() (*SemanticChunker, error)
DefaultSemanticChunker returns a SemanticChunker with a default TokenChunker as its base. It uses standard hierarchical sizes: Parent(1000), Child(250), Overlap(50).
func NewSemanticChunker added in v1.1.3
func NewSemanticChunker(base core.Chunker, parentSize, childSize, overlap int) *SemanticChunker
NewSemanticChunker creates a new SemanticChunker wrapping a base text chunker.
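Construction composes with either bundled base chunker. A sketch, assuming the hypothetical import path used above:

```go
// newSemantic wraps the rune-based chunker; a TokenChunker works equally
// well as the base, since both are assumed to satisfy core.Chunker.
func newSemantic() *chunker.SemanticChunker {
	base := chunker.DefaultCharacterChunker()
	// parentSize 1000, childSize 250, overlap 50 — mirrors the
	// defaults used by DefaultSemanticChunker.
	return chunker.NewSemanticChunker(base, 1000, 250, 50)
}
```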
func (*SemanticChunker) ContextualChunk added in v1.1.3
func (s *SemanticChunker) ContextualChunk(ctx context.Context, doc *core.Document, docSummary string) ([]*core.Chunk, error)
ContextualChunk injects a document-level summary into each chunk.
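A sketch of the call, under the same import-path assumptions; how the summary is stored on each chunk is not specified by the index, and the summary itself would typically come from an upstream LLM:

```go
// contextualize splits doc and attaches a document-level summary to
// every resulting chunk, which helps retrieval for queries that need
// whole-document context. The summary string here is a stub.
func contextualize(ctx context.Context, sc *chunker.SemanticChunker, doc *core.Document) ([]*core.Chunk, error) {
	summary := "Annual report: revenue grew 12%; headcount flat."
	return sc.ContextualChunk(ctx, doc, summary)
}
```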
func (*SemanticChunker) HierarchicalChunk added in v1.1.3
func (s *SemanticChunker) HierarchicalChunk(ctx context.Context, doc *core.Document) ([]*core.Chunk, []*core.Chunk, error)
HierarchicalChunk creates a two-level hierarchy of chunks: parent chunks are larger (e.g., paragraphs); child chunks are smaller sub-chunks of them (e.g., sentences).
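The two return slices can be sketched as follows (import paths assumed as above):

```go
// splitHierarchically shows the parent/child split. A common pattern:
// embed and retrieve over the small child chunks for precision, then
// hand the matching parent chunk to the LLM for fuller context.
func splitHierarchically(ctx context.Context, sc *chunker.SemanticChunker, doc *core.Document) error {
	parents, children, err := sc.HierarchicalChunk(ctx, doc)
	if err != nil {
		return err
	}
	fmt.Printf("%d parents, %d children\n", len(parents), len(children))
	return nil
}
```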
type TokenChunker added in v1.1.3
TokenChunker chunks text based on token count instead of character count. This ensures that generated chunks strictly adhere to LLM/embedding context limits.
func DefaultTokenChunker added in v1.1.3
func DefaultTokenChunker() (*TokenChunker, error)
DefaultTokenChunker returns a TokenChunker with sensible default parameters (Size: 500, Overlap: 50, Model: cl100k_base). Ideal for OpenAI models.
func NewTokenChunker added in v1.1.3
func NewTokenChunker(size, overlap int, model string) (*TokenChunker, error)
NewTokenChunker creates a new TokenChunker using the specified encoding model. Common encoding: "cl100k_base" (used by text-embedding-3, gpt-4, and gpt-3.5-turbo).
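A construction sketch, again assuming the hypothetical import path; the error return presumably covers unknown encoding names:

```go
// newTokenChunker builds a token-counting chunker: 500 tokens per chunk
// with a 50-token overlap, counted with the cl100k_base encoding.
func newTokenChunker() (*chunker.TokenChunker, error) {
	return chunker.NewTokenChunker(500, 50, "cl100k_base")
}
```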