chunk

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 14, 2026 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package chunk splits text bodies into heading-aware sections for indexing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Options added in v0.2.0

type Options struct {
	// MaxTokens is the approximate maximum number of tokens (words) per
	// section. Sections that exceed this limit are split into sub-sections,
	// unless OverlapTokens is greater than or equal to MaxTokens, in which
	// case oversized sections are left unsplit. Zero disables token-budget
	// splitting.
	MaxTokens int

	// OverlapTokens is the approximate number of tokens to overlap between
	// adjacent sub-sections when a section is split. Zero disables overlap.
	// Values greater than or equal to MaxTokens are treated as invalid and
	// leave oversized sections unsplit.
	OverlapTokens int
}

Options controls how Markdown sections are split.
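
The field comments above imply a small decision rule for when a section is actually split. The following is a minimal sketch of that rule, not the package's code: `wouldSplit` is a hypothetical helper, and `Options` is redeclared locally only so the example compiles on its own. How negative `OverlapTokens` values are treated is not documented, so the sketch does not special-case them.

```go
package main

import "fmt"

// Options mirrors the documented struct so this sketch is self-contained.
type Options struct {
	MaxTokens     int
	OverlapTokens int
}

// wouldSplit is a hypothetical helper (not part of the package). It reports
// whether a section of wordCount words would be split under the rules stated
// in the Options field comments.
func wouldSplit(opts Options, wordCount int) bool {
	if opts.MaxTokens <= 0 {
		return false // zero disables token-budget splitting
	}
	if opts.OverlapTokens >= opts.MaxTokens {
		return false // invalid overlap leaves oversized sections unsplit
	}
	return wordCount > opts.MaxTokens
}

func main() {
	fmt.Println(wouldSplit(Options{}, 500))
	fmt.Println(wouldSplit(Options{MaxTokens: 200, OverlapTokens: 20}, 500))
	fmt.Println(wouldSplit(Options{MaxTokens: 200, OverlapTokens: 200}, 500))
}
```

Only the second call reports true: the first uses zero-value options, and the third sets an overlap equal to the budget.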

type Section

type Section struct {
	Heading string
	Body    string
}

Section is one heading-aware Markdown chunk.

func Markdown

func Markdown(title, body string) []Section

Markdown splits Markdown into heading-aware sections.
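
To make "heading-aware" concrete, here is a rough sketch of one way such a splitter could work. It is not the package's implementation, and several choices are assumptions: `splitByHeading` and `headingText` are hypothetical names, only ATX headings (`#` through `######` followed by a space) are recognized, text before the first heading is filed under `title`, sections with an empty body are dropped, and fenced code blocks are not treated specially.

```go
package main

import (
	"fmt"
	"strings"
)

// Section mirrors the documented type so this sketch is self-contained.
type Section struct {
	Heading string
	Body    string
}

// headingText returns the text of an ATX heading line such as "## Usage"
// and true, or "" and false when the line is not a heading.
func headingText(line string) (string, bool) {
	trimmed := strings.TrimLeft(line, "#")
	level := len(line) - len(trimmed)
	if level == 0 || level > 6 || !strings.HasPrefix(trimmed, " ") {
		return "", false
	}
	return strings.TrimSpace(trimmed), true
}

// splitByHeading is a hypothetical stand-in for Markdown. It starts a new
// section at every heading line and uses title for any leading text.
func splitByHeading(title, body string) []Section {
	var sections []Section
	heading := title
	var lines []string
	flush := func() {
		text := strings.TrimSpace(strings.Join(lines, "\n"))
		if text != "" { // assumption: empty sections are dropped
			sections = append(sections, Section{Heading: heading, Body: text})
		}
		lines = nil
	}
	for _, line := range strings.Split(body, "\n") {
		if h, ok := headingText(line); ok {
			flush()
			heading = h
			continue
		}
		lines = append(lines, line)
	}
	flush()
	return sections
}

func main() {
	body := "intro text\n\n# Install\nrun make\n\n## Usage\ncall it\n"
	for _, s := range splitByHeading("Guide", body) {
		fmt.Printf("%q: %q\n", s.Heading, s.Body)
	}
}
```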

func MarkdownWithOptions added in v0.2.0

func MarkdownWithOptions(title, body string, opts Options) []Section

MarkdownWithOptions splits Markdown into heading-aware sections, then applies token-budget splitting to any section that exceeds opts.MaxTokens. If opts.OverlapTokens is greater than or equal to opts.MaxTokens, oversized sections are left unsplit. Zero-value options produce the same output as Markdown.

func SplitSection added in v0.2.0

func SplitSection(s Section, maxTokens, overlapTokens int) []Section

SplitSection splits a single section into sub-sections that each contain at most maxTokens words, with overlapTokens words of overlap between them. The original text is preserved by slicing at word boundaries rather than reconstructing from tokenized words.

If maxTokens <= 0 or overlapTokens >= maxTokens, the section is returned unchanged.
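
The windowing described above can be sketched as follows. This illustrates the technique (overlapping word windows cut from the original string by byte offset) and is not the package's source. `splitSection` and `wordSpans` are hypothetical names, and three behaviors are assumptions: words are delimited by Unicode whitespace, sub-sections keep the parent heading, and a negative overlap is treated as zero.

```go
package main

import (
	"fmt"
	"unicode"
)

// Section mirrors the documented type so this sketch is self-contained.
type Section struct {
	Heading string
	Body    string
}

// wordSpans returns the [start, end) byte offsets of every
// whitespace-delimited word in text.
func wordSpans(text string) [][2]int {
	var spans [][2]int
	start := -1
	for i, r := range text {
		if unicode.IsSpace(r) {
			if start >= 0 {
				spans = append(spans, [2]int{start, i})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		spans = append(spans, [2]int{start, len(text)})
	}
	return spans
}

// splitSection is a hypothetical stand-in for SplitSection.
func splitSection(s Section, maxTokens, overlapTokens int) []Section {
	if maxTokens <= 0 || overlapTokens >= maxTokens {
		return []Section{s} // documented: returned unchanged
	}
	if overlapTokens < 0 {
		overlapTokens = 0 // assumption: negative overlap means no overlap
	}
	spans := wordSpans(s.Body)
	if len(spans) <= maxTokens {
		return []Section{s}
	}
	step := maxTokens - overlapTokens // always >= 1 here
	var out []Section
	for first := 0; ; first += step {
		last := first + maxTokens
		if last > len(spans) {
			last = len(spans)
		}
		// Slice the original text so interior whitespace is kept as written.
		body := s.Body[spans[first][0]:spans[last-1][1]]
		out = append(out, Section{Heading: s.Heading, Body: body})
		if last == len(spans) {
			break
		}
	}
	return out
}

func main() {
	s := Section{Heading: "Demo", Body: "one two  three\nfour five six seven"}
	for _, sub := range splitSection(s, 3, 1) {
		fmt.Printf("%q\n", sub.Body)
	}
}
```

With a budget of three words and one word of overlap, each window starts two words after the previous one, and the double space and newline in the input survive because the bodies are slices of the original string rather than rejoined words.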
