chunking

package
v1.1.2 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 4, 2026 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package chunking provides fixed-size text chunking with overlap for RAG indexing.
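
As an illustration of what fixed-size chunking with overlap means, here is a minimal sketch. It is not this package's implementation: the windows helper and its parameters are hypothetical, it measures size and overlap in runes, and it ignores line and word boundaries entirely.

```go
package main

import "fmt"

// windows is a hypothetical helper, not part of package chunking.
// It splits text into windows of at most size runes, where each window
// after the first begins with the last overlap runes of the previous one.
func windows(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // invalid parameters: the window would never advance
	}
	runes := []rune(text)
	step := size - overlap
	var out []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		out = append(out, string(runes[start:end]))
		if end == len(runes) {
			break // the final window reached the end of the text
		}
	}
	return out
}

func main() {
	// With size 4 and overlap 1, each window repeats one rune of its predecessor.
	for i, w := range windows("abcdefghij", 4, 1) {
		fmt.Printf("%d: %q\n", i, w)
	}
}
```

With size 4 and overlap 1 the input "abcdefghij" yields "abcd", "defg", and "ghij". Overlap of this kind is commonly used in retrieval indexing so that text near a chunk boundary appears in both neighbouring chunks.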

Types

type Chunk

type Chunk struct {
	// contains filtered or unexported fields
}

Chunk represents a single text chunk with its byte offset and line range in the original content.

func (Chunk) Content

func (c Chunk) Content() string

Content returns the chunk text.

func (Chunk) EndLine

func (c Chunk) EndLine() int

EndLine returns the 1-based line number where this chunk ends in the original content.

func (Chunk) Offset

func (c Chunk) Offset() int

Offset returns the byte offset of this chunk in the original content.

func (Chunk) StartLine

func (c Chunk) StartLine() int

StartLine returns the 1-based line number where this chunk begins in the original content.
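
The accessors above pair a byte offset with a 1-based line range. The following sketch shows one way such a line range could be derived from a byte range. The lineRange helper is hypothetical and is not how the package is known to compute its values; it assumes that lines are separated by "\n" and that a trailing newline belongs to the line it terminates.

```go
package main

import (
	"fmt"
	"strings"
)

// lineRange is a hypothetical helper, not part of package chunking.
// It returns the 1-based first and last line numbers covered by the byte
// range [offset, offset+length) of content. The range must lie within content.
func lineRange(content string, offset, length int) (start, end int) {
	// The start line is one more than the number of newlines before the offset.
	start = 1 + strings.Count(content[:offset], "\n")
	chunk := content[offset : offset+length]
	// A trailing newline terminates the last line; it does not begin a new one.
	end = start + strings.Count(strings.TrimSuffix(chunk, "\n"), "\n")
	return start, end
}

func main() {
	content := "alpha\nbeta\ngamma\ndelta\n"
	// The text "beta\ngamma\n" starts at byte 6 and is 11 bytes long.
	start, end := lineRange(content, 6, 11)
	fmt.Println(start, end) // 2 3
}
```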

type ChunkParams

type ChunkParams struct {
	Size    int
	Overlap int
	MinSize int
}

ChunkParams configures the chunking algorithm. Size, Overlap, and MinSize are measured in runes (Unicode code points), not bytes.

func DefaultChunkParams

func DefaultChunkParams() ChunkParams

DefaultChunkParams returns sensible defaults for code chunking.

type TextChunks

type TextChunks struct {
	// contains filtered or unexported fields
}

TextChunks holds the result of splitting content into fixed-size chunks.

func NewTextChunks

func NewTextChunks(content string, params ChunkParams) (TextChunks, error)

NewTextChunks splits content into fixed-size chunks with the given parameters. Size, Overlap, and MinSize are measured in runes (Unicode code points), while the returned Chunk.Offset is a byte offset into the original string.
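
Because sizes are counted in runes while offsets are counted in bytes, the two diverge as soon as the content contains multi-byte characters. The sketch below uses only the standard library to show the difference; byteOffsetOfRune is a hypothetical helper written for this illustration, not part of the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteOffsetOfRune is a hypothetical helper, not part of package chunking.
// It returns the byte offset at which the n-th rune (0-based) of s begins,
// or len(s) if s has n or fewer runes.
func byteOffsetOfRune(s string, n int) int {
	i := 0
	for off := range s { // ranging over a string yields the byte offset of each rune
		if i == n {
			return off
		}
		i++
	}
	return len(s)
}

func main() {
	content := "héllo wörld" // 'é' and 'ö' each occupy two bytes in UTF-8

	fmt.Println(utf8.RuneCountInString(content)) // 11 runes
	fmt.Println(len(content))                    // 13 bytes

	// The rune at index 6 ('w') starts at byte 7, because 'é' takes two bytes.
	fmt.Println(byteOffsetOfRune(content, 6)) // 7
}
```

In practice this means a value from Offset can be used to slice the original string directly, but it should not be compared against Size or Overlap without converting between units.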

The algorithm uses three tiers:

  • Tier 1: accumulate whole lines until adding the next line would exceed Size
  • Tier 2: for a line longer than Size, split on whitespace boundaries
  • Tier 3: for a token longer than Size, split on rune boundaries
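
The three tiers above can be sketched as nested greedy packing. This is an illustration under stated assumptions, not the package's actual code: splitTiers, pack, tokens, and splitRunes are hypothetical names, each token is assumed to keep its trailing whitespace, and Overlap and MinSize are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// All functions here are hypothetical sketches, not part of package chunking.

// pack greedily concatenates pieces into chunks of at most size runes.
// A piece that is itself longer than size is handed to splitBig.
func pack(pieces []string, size int, splitBig func(string) []string) []string {
	var out []string
	var buf strings.Builder
	n := 0 // runes currently buffered
	flush := func() {
		if n > 0 {
			out = append(out, buf.String())
			buf.Reset()
			n = 0
		}
	}
	for _, p := range pieces {
		pn := utf8.RuneCountInString(p)
		if pn > size {
			flush()
			out = append(out, splitBig(p)...)
			continue
		}
		if n+pn > size {
			flush()
		}
		buf.WriteString(p)
		n += pn
	}
	flush()
	return out
}

// tokens splits a line into words that keep their trailing whitespace,
// so concatenating the tokens reproduces the line.
func tokens(line string) []string {
	var out []string
	start, prevSpace := 0, false
	for i, r := range line {
		space := unicode.IsSpace(r)
		if prevSpace && !space {
			out = append(out, line[start:i])
			start = i
		}
		prevSpace = space
	}
	if start < len(line) {
		out = append(out, line[start:])
	}
	return out
}

// splitRunes cuts s into pieces of at most size runes (tier 3).
func splitRunes(s string, size int) []string {
	r := []rune(s)
	var out []string
	for len(r) > size {
		out = append(out, string(r[:size]))
		r = r[size:]
	}
	if len(r) > 0 {
		out = append(out, string(r))
	}
	return out
}

// splitTiers applies tier 1 to lines, tier 2 to oversized lines,
// and tier 3 to oversized tokens.
func splitTiers(content string, size int) []string {
	if size <= 0 {
		return nil
	}
	tier3 := func(tok string) []string { return splitRunes(tok, size) }
	tier2 := func(line string) []string { return pack(tokens(line), size, tier3) }
	return pack(strings.SplitAfter(content, "\n"), size, tier2)
}

func main() {
	for _, c := range splitTiers("ab\ncd\nthe quick brown\nabcdefghijk\n", 6) {
		fmt.Printf("%q\n", c)
	}
}
```

With a size of 6 the two short lines are packed into one chunk, the third line is split at word boundaries, and the final 11-letter token is cut at rune boundaries. Because every piece keeps its own whitespace, concatenating the chunks reproduces the input, which is what makes byte offsets into the original content meaningful.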

func (TextChunks) All

func (t TextChunks) All() []Chunk

All returns all chunks.
