chunking

package
v1.10.0
Warning

This package is not in the latest version of its module.

Published: Mar 3, 2026 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func SplitPlaintextOnSentences

func SplitPlaintextOnSentences(text string, chunksize int) []string

SplitPlaintextOnSentences splits a string into chunks of the given size. It prefers to split on sentence boundaries, assuming that ., !, and ? are sentence endings. It guarantees that no chunk is larger than the given size, so it may split in the middle of a sentence when no boundary fits. It also limits how much smaller than the given size a chunk can be: a chunk is cut back to a sentence boundary only if it stays at least 3/4 of the given size. Therefore, the number of chunks may be 25% greater than expected in the worst case.

Types

type Chunk

type Chunk struct {
	Content string
	ChunkInfo
}

Chunk represents a piece of text together with its chunk metadata.

func ChunkText

func ChunkText(content string, opts Options) []Chunk

ChunkText splits text into chunks based on the provided options.

type ChunkInfo

type ChunkInfo struct {
	IsChunk     bool
	ChunkIndex  int
	TotalChunks int
}

ChunkInfo contains metadata about a chunk's position within a document.

type Options

type Options struct {
	ChunkSize        int     `json:"chunkSize"`        // Maximum size of each chunk in characters
	ChunkOverlap     int     `json:"chunkOverlap"`     // Number of characters to overlap between chunks
	MinChunkSize     float64 `json:"minChunkSize"`     // Minimum chunk size as a fraction of max size (0.0-1.0)
	ChunkingStrategy string  `json:"chunkingStrategy"` // Strategy: sentences, paragraphs, or fixed
}

Options defines options for chunking documents.

func DefaultOptions

func DefaultOptions() Options

DefaultOptions returns the default chunking options.
