chunker

package
v0.0.0-beta
Published: Mar 30, 2026 | License: MIT | Imports: 15 | Imported by: 0

Documentation

Overview

Package chunker splits file content into overlapping token-limited chunks for embedding-based search.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Chunk

type Chunk struct {
	Index     int    // 0-based chunk index within the file.
	Content   string // Chunk text.
	StartLine int    // 1-based start line (inclusive).
	EndLine   int    // 1-based end line (inclusive).
}

Chunk represents a piece of a file with positional metadata.

type Chunker

type Chunker struct {
	// contains filtered or unexported fields
}

Chunker splits file content into overlapping chunks.

func New

func New(config Config) *Chunker

New creates a Chunker with the given config.

func (*Chunker) Chunk

func (c *Chunker) Chunk(ctx context.Context, path, content string) Chunks

Chunk splits content into token-limited chunks with overlap. Small files that fit within MaxTokens are returned as a single chunk.
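
The splitting behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the package's implementation: the chunk type, splitLines, the word-count estimate function, and the overlap expressed in lines are all assumptions made for the example (the real Chunker takes its overlap in tokens through Config.OverlapTokens).

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a local stand-in that mirrors the documented Chunk fields.
type chunk struct {
	Index     int
	Content   string
	StartLine int // 1-based, inclusive
	EndLine   int // 1-based, inclusive
}

// estimate is an assumed estimator: one token per whitespace-separated word.
func estimate(s string) int { return len(strings.Fields(s)) }

// splitLines greedily packs lines into chunks of at most maxTokens and starts
// each following chunk overlapLines lines before the previous chunk's end.
func splitLines(content string, maxTokens, overlapLines int) []chunk {
	lines := strings.Split(content, "\n")
	var out []chunk
	start := 0 // 0-based index of the first line of the current chunk
	for start < len(lines) {
		end, tokens := start, 0
		for end < len(lines) {
			t := estimate(lines[end])
			// Always accept the first line so an oversized line cannot stall the loop.
			if end > start && tokens+t > maxTokens {
				break
			}
			tokens += t
			end++
		}
		out = append(out, chunk{
			Index:     len(out),
			Content:   strings.Join(lines[start:end], "\n"),
			StartLine: start + 1,
			EndLine:   end,
		})
		if end == len(lines) {
			break
		}
		next := end - overlapLines
		if next <= start {
			next = start + 1 // guarantee forward progress
		}
		start = next
	}
	return out
}

func main() {
	for _, c := range splitLines("a b\nc d\ne f\ng h", 4, 1) {
		fmt.Printf("chunk %d: lines %d-%d %q\n", c.Index, c.StartLine, c.EndLine, c.Content)
	}
}
```

In this sketch, content that fits within maxTokens comes back as a single chunk, matching the documented small-file behavior.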

func (*Chunker) Estimate

func (c *Chunker) Estimate(text string) int

Estimate returns the token count for text using the configured estimator.
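
Because Config.Estimate has the type func(string) int, any function of that shape can act as the estimator. The sketch below shows one possible heuristic; charEstimate and its four-characters-per-token ratio are assumptions for illustration and are not documented as the package default.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charEstimate approximates a token count as one token per four characters,
// rounded up. The 4:1 ratio is a rule of thumb assumed for this example.
func charEstimate(text string) int {
	return (utf8.RuneCountInString(text) + 3) / 4
}

func main() {
	fmt.Println(charEstimate("func main() {}"))
}
```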

func (*Chunker) OverlapStart

func (c *Chunker) OverlapStart(lines []string, endLine int) int

OverlapStart walks backward from endLine to find the line where the overlap should begin. It returns a 1-based line number.
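
A backward walk of this kind might look like the sketch below. It is an assumption about the general technique, not the package's code: overlapStart, wordCount, the explicit overlapTokens parameter, and the choice to always include at least the final line are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCount is an assumed estimator: one token per whitespace-separated word.
func wordCount(s string) int { return len(strings.Fields(s)) }

// overlapStart walks backward from endLine (1-based, inclusive) and returns
// the earliest 1-based line whose suffix up to endLine stays within
// overlapTokens. If even the last line exceeds the budget, it returns endLine.
func overlapStart(lines []string, endLine, overlapTokens int, estimate func(string) int) int {
	if endLine > len(lines) {
		endLine = len(lines)
	}
	if endLine < 1 {
		return 1
	}
	start := endLine
	tokens := 0
	for i := endLine; i >= 1; i-- {
		tokens += estimate(lines[i-1])
		if tokens > overlapTokens {
			break
		}
		start = i
	}
	return start
}

func main() {
	lines := []string{"a b c", "d e", "f", "g h"}
	fmt.Println(overlapStart(lines, 4, 3, wordCount))
}
```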

type Chunks

type Chunks []Chunk

Chunks is a collection of chunks from a single file.

func (Chunks) Contents

func (cs Chunks) Contents() []string

Contents returns the text content of all chunks.
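
A helper like this is typically used to hand chunk texts to an embedding call as one batch. The sketch below uses local stand-in types (chunk, chunks) and assumes the returned slice preserves chunk order, which the documentation does not state explicitly.

```go
package main

import "fmt"

// chunk and chunks are local stand-ins for the documented Chunk and Chunks types.
type chunk struct {
	Index   int
	Content string
}

type chunks []chunk

// contents collects the text of every chunk, keeping the original order so
// that position i in the result corresponds to chunk i.
func (cs chunks) contents() []string {
	out := make([]string, len(cs))
	for i, c := range cs {
		out[i] = c.Content
	}
	return out
}

func main() {
	cs := chunks{{Index: 0, Content: "first"}, {Index: 1, Content: "second"}}
	fmt.Println(cs.contents())
}
```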

type Config

type Config struct {
	Strategy      Strategy         // Chunking strategy (default: auto).
	MaxTokens     int              // Target max tokens per chunk (default: 500).
	OverlapTokens int              // Overlap tokens between adjacent chunks (default: 75).
	Estimate      func(string) int // Token estimation function.
}

Config holds chunking parameters.

func (*Config) ApplyDefaults

func (c *Config) ApplyDefaults()

ApplyDefaults sets defaults for zero-valued fields.
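
The defaulting pattern can be sketched as below, using the default values stated in the Config field comments (auto, 500, 75). The config type, applyDefaults, and the word-count fallback estimator are local assumptions; the package's actual default estimator is not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

type strategy string

// config is a local stand-in that mirrors the documented Config fields.
type config struct {
	Strategy      strategy
	MaxTokens     int
	OverlapTokens int
	Estimate      func(string) int
}

// applyDefaults fills only zero-valued fields and leaves caller-set values alone.
func (c *config) applyDefaults() {
	if c.Strategy == "" {
		c.Strategy = "auto"
	}
	if c.MaxTokens == 0 {
		c.MaxTokens = 500
	}
	if c.OverlapTokens == 0 {
		c.OverlapTokens = 75
	}
	if c.Estimate == nil {
		// Assumed fallback: one token per whitespace-separated word.
		c.Estimate = func(s string) int { return len(strings.Fields(s)) }
	}
}

func main() {
	c := config{MaxTokens: 200}
	c.applyDefaults()
	fmt.Println(c.Strategy, c.MaxTokens, c.OverlapTokens)
}
```

One consequence of zero-value defaulting, at least in this sketch, is that an explicit OverlapTokens of 0 cannot be told apart from an unset field and is replaced by the default.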

type Strategy

type Strategy string

Strategy determines which chunking algorithm to use.

const (
	StrategyAuto Strategy = "auto" // Choose AST if supported, otherwise line-based.
	StrategyAST  Strategy = "ast"  // Parse AST and chunk by semantic nodes.
	StrategyLine Strategy = "line" // Line-based chunking only.
)
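
How an "auto" strategy might resolve to a concrete one can be sketched as follows. Everything here is hypothetical: resolve, the astExtensions set, and the idea that AST support is decided by file extension are assumptions for illustration, and the behavior of an explicit "ast" request on an unsupported file is left unchanged because the documentation does not specify it.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

type strategy string

const (
	strategyAuto strategy = "auto"
	strategyAST  strategy = "ast"
	strategyLine strategy = "line"
)

// astExtensions is a hypothetical set of extensions with AST support.
var astExtensions = map[string]bool{".go": true, ".py": true}

// resolve maps "auto" to "ast" when the file extension is in astExtensions
// and to "line" otherwise. Explicit strategies are returned as given.
func resolve(s strategy, path string) strategy {
	if s != strategyAuto {
		return s
	}
	ext := strings.ToLower(filepath.Ext(path))
	if astExtensions[ext] {
		return strategyAST
	}
	return strategyLine
}

func main() {
	fmt.Println(resolve(strategyAuto, "main.go"), resolve(strategyAuto, "notes.txt"))
}
```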
