Documentation
Overview
Package chunk splits text bodies into heading-aware sections for indexing.
Types
type Options (added in v0.2.0)
type Options struct {
	// MaxTokens is the approximate maximum number of tokens (words) per
	// section. Sections exceeding this limit are split into sub-sections
	// unless OverlapTokens is an invalid value that disables splitting.
	// Zero disables token-budget splitting.
	MaxTokens int
	// OverlapTokens is the approximate number of tokens to overlap between
	// adjacent sub-sections when a section is split. Zero disables overlap.
	// Values greater than or equal to MaxTokens are treated as invalid and
	// leave oversized sections unsplit.
	OverlapTokens int
}
Options controls how Markdown sections are split.
type Section
Section is one heading-aware Markdown chunk.
func MarkdownWithOptions (added in v0.2.0)
MarkdownWithOptions splits Markdown into heading-aware sections, then applies token-budget splitting to any section that exceeds opts.MaxTokens. If opts.OverlapTokens is invalid for splitting (greater than or equal to opts.MaxTokens), oversized sections are left unsplit. Zero-value options produce the same output as Markdown.
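The rule for when token-budget splitting applies can be sketched as a small predicate. This is an illustration of the documented behavior only: the Options type below is a local copy so the example compiles on its own, and splitsOversized is a hypothetical helper name, not part of the package.

```go
package main

import "fmt"

// Options is a local copy of the documented struct so this sketch is
// self-contained; it is not imported from the package.
type Options struct {
	MaxTokens     int
	OverlapTokens int
}

// splitsOversized is a hypothetical helper. It reports whether oversized
// sections would be split under the documented rules: MaxTokens must be
// positive, and OverlapTokens must be smaller than MaxTokens.
func splitsOversized(opts Options) bool {
	return opts.MaxTokens > 0 && opts.OverlapTokens < opts.MaxTokens
}

func main() {
	cases := []Options{
		{},                                   // zero value: splitting disabled
		{MaxTokens: 200, OverlapTokens: 20},  // valid: splitting enabled
		{MaxTokens: 200, OverlapTokens: 200}, // invalid overlap: left unsplit
	}
	for _, o := range cases {
		fmt.Printf("%+v -> %v\n", o, splitsOversized(o))
	}
}
```

The documentation does not say how a negative OverlapTokens is handled, so the sketch makes no special case for it.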
func SplitSection (added in v0.2.0)
SplitSection splits a single section into sub-sections that each contain at most maxTokens words, with overlapTokens words of overlap between adjacent sub-sections. The original text is preserved by slicing at word boundaries rather than reconstructing from tokenized words.
If maxTokens is <= 0 or overlapTokens >= maxTokens, the section is returned unchanged.