chunker

package
v1.1.1
Published: Mar 20, 2026 License: MIT Imports: 4 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CharacterSplitter

type CharacterSplitter struct {
	ChunkSize    int
	ChunkOverlap int
	Separators   []string
}

CharacterSplitter splits text recursively using a list of separators, similar to RecursiveCharacterTextSplitter in LangChain.

func NewCharacterSplitter

func NewCharacterSplitter(size, overlap int) *CharacterSplitter

NewCharacterSplitter creates a chunker that splits text by runes and at logical breaks.

func (*CharacterSplitter) Chunk

func (c *CharacterSplitter) Chunk(ctx context.Context, doc *core.Document) ([]*core.Chunk, error)

Chunk implements the core.Chunker pipeline interface.

func (*CharacterSplitter) SplitDocument

func (c *CharacterSplitter) SplitDocument(ctx context.Context, doc *core.Document) ([]*core.Chunk, error)

SplitDocument converts a document into interconnected core.Chunk units (similar to LlamaIndex Nodes).

func (*CharacterSplitter) SplitText

func (c *CharacterSplitter) SplitText(text string) ([]string, error)

SplitText provides the raw splitting logic: it turns a string into a slice of chunk strings.

type TextSplitter

type TextSplitter interface {
	// SplitText turns a raw string into meaningful chunk strings
	SplitText(text string) ([]string, error)

	// SplitDocument extends raw string logic with ID mapping and metadata (Chunk = Node in LlamaIndex)
	SplitDocument(ctx context.Context, doc *core.Document) ([]*core.Chunk, error)
}

TextSplitter is the generalized counterpart of the LlamaIndex "NodeParser" and the LangChain "TextSplitter".
