Documentation
Overview
Package chunking provides fixed-size text chunking with overlap for RAG indexing.
Types
type Chunk
type Chunk struct {
// contains filtered or unexported fields
}
Chunk represents a single text chunk with its byte offset and line range in the original content.
func (Chunk) EndLine
EndLine returns the 1-based line number where this chunk ends in the original content.
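As an illustration of what a 1-based line number means relative to a byte offset, the following sketch counts newlines before a position. The helper lineAt is hypothetical and is not part of this package; how EndLine is actually computed is not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// lineAt is a hypothetical helper (not part of the chunking package).
// It returns the 1-based number of the line that contains byte offset off,
// clamping off to the bounds of content.
func lineAt(content string, off int) int {
	if off < 0 {
		off = 0
	}
	if off > len(content) {
		off = len(content)
	}
	// Every newline before off pushes the position one line further down.
	return 1 + strings.Count(content[:off], "\n")
}

func main() {
	content := "a\nb\nc"
	// Assume a chunk covering "b\n": it starts at byte 2 and is 2 bytes long,
	// so its last byte sits at index 3.
	start, length := 2, 2
	fmt.Println(lineAt(content, start))          // line where the chunk starts
	fmt.Println(lineAt(content, start+length-1)) // line where the chunk ends
}
```

Under this convention, a chunk whose last byte is a newline ends on the line that the newline terminates, not on the following line.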
type ChunkParams
ChunkParams configures the chunking algorithm.
func DefaultChunkParams
func DefaultChunkParams() ChunkParams
DefaultChunkParams returns sensible defaults for code chunking.
type TextChunks
type TextChunks struct {
// contains filtered or unexported fields
}
TextChunks holds the result of splitting content into fixed-size chunks.
func NewTextChunks
func NewTextChunks(content string, params ChunkParams) (TextChunks, error)
NewTextChunks splits content into fixed-size chunks with the given parameters. Size, Overlap, and MinSize are measured in runes (Unicode code points), while the returned Chunk.Offset is a byte offset into the original string.
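The distinction between rune counts and byte offsets matters as soon as the content contains non-ASCII text. The sketch below uses only the standard library; byteOffsetOfRune is a hypothetical helper written for this example and is not part of the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteOffsetOfRune is a hypothetical helper (not part of the chunking package).
// It returns the byte offset at which the n-th rune (0-based) of s starts,
// or len(s) if s contains n or fewer runes.
func byteOffsetOfRune(s string, n int) int {
	count := 0
	// Ranging over a string yields the starting byte index of each rune.
	for i := range s {
		if count == n {
			return i
		}
		count++
	}
	return len(s)
}

func main() {
	s := "héllo wörld"
	fmt.Println(utf8.RuneCountInString(s)) // 11 runes
	fmt.Println(len(s))                    // 13 bytes: é and ö take two bytes each
	fmt.Println(byteOffsetOfRune(s, 6))    // 7: rune index 6 ('w') starts at byte 7
}
```

A size limit of 6 runes therefore does not correspond to a 6-byte slice, which is why a rune-based limit and a byte-based offset have to be tracked separately.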
The algorithm uses three tiers:
- Tier 1: accumulate whole lines until the next line would exceed Size
- Tier 2: for lines exceeding Size, split on whitespace boundaries
- Tier 3: for tokens exceeding Size, split on rune boundaries
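The three tiers can be sketched roughly as follows. This is a simplified illustration, not the package's implementation: it returns plain strings, ignores Overlap and MinSize, does not track offsets or line ranges, and collapses whitespace when it splits a long line. The names sketchChunks, splitLine, and splitRunes are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// splitRunes cuts s into pieces of at most size runes (tier 3).
func splitRunes(s string, size int) []string {
	var out []string
	r := []rune(s)
	for len(r) > size {
		out = append(out, string(r[:size]))
		r = r[size:]
	}
	if len(r) > 0 {
		out = append(out, string(r))
	}
	return out
}

// splitLine breaks an over-long line at whitespace (tier 2) and falls back
// to splitRunes for any single token longer than size (tier 3).
func splitLine(line string, size int) []string {
	var out []string
	cur := ""
	flush := func() {
		if cur != "" {
			out = append(out, cur)
			cur = ""
		}
	}
	for _, tok := range strings.Fields(line) {
		n := utf8.RuneCountInString(tok)
		switch {
		case n > size:
			flush()
			out = append(out, splitRunes(tok, size)...)
		case cur == "":
			cur = tok
		case utf8.RuneCountInString(cur)+1+n > size:
			flush()
			cur = tok
		default:
			cur += " " + tok
		}
	}
	flush()
	return out
}

// sketchChunks accumulates whole lines until the next line would exceed
// size (tier 1), delegating over-long lines to splitLine.
func sketchChunks(content string, size int) []string {
	if size <= 0 {
		return nil // assumption: a non-positive size yields no chunks
	}
	var out []string
	cur := ""
	flush := func() {
		if cur != "" {
			out = append(out, cur)
			cur = ""
		}
	}
	for _, line := range strings.SplitAfter(content, "\n") {
		if line == "" {
			continue
		}
		n := utf8.RuneCountInString(line)
		if n > size {
			flush()
			out = append(out, splitLine(line, size)...)
			continue
		}
		if utf8.RuneCountInString(cur)+n > size {
			flush()
		}
		cur += line
	}
	flush()
	return out
}

func main() {
	content := "ab\ncd\nefghij klm\nsupercalifragilistic\n"
	for _, c := range sketchChunks(content, 6) {
		fmt.Printf("%q\n", c)
	}
}
```

With a size of 6, the two short lines are merged into one chunk, the third line is split at its space, and the 20-rune token is cut into pieces of at most 6 runes.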