treesitter

package
v1.16.0
Warning

This package is not in the latest version of its module.

Published: Dec 30, 2025 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DocumentProcessor

type DocumentProcessor struct {
	// contains filtered or unexported fields
}

DocumentProcessor uses tree-sitter to build syntax trees for source files and produce semantically aligned chunks (e.g., whole functions) while still respecting a maximum chunk size where possible.
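The idea of cutting on declaration boundaries while honoring a size limit where possible can be sketched as follows. This is only an illustration: it uses the standard library's go/parser in place of tree-sitter, and chunkByDecl is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// chunkByDecl is a hypothetical helper (an assumption, not this package's
// code). It splits Go source into chunks that begin and end on top-level
// declaration boundaries, packing adjacent declarations together while the
// chunk stays within maxSize bytes. A single declaration larger than maxSize
// is emitted whole rather than being cut in the middle.
func chunkByDecl(src []byte, maxSize int) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}

	var chunks []string
	start := -1 // byte offset where the open chunk begins, -1 if none is open
	end := 0    // byte offset where the open chunk currently ends
	for _, decl := range file.Decls {
		declStart := fset.Position(decl.Pos()).Offset
		declEnd := fset.Position(decl.End()).Offset
		if start >= 0 && declEnd-start > maxSize {
			// Adding this declaration would exceed the limit: flush first.
			chunks = append(chunks, string(src[start:end]))
			start = -1
		}
		if start < 0 {
			start = declStart
		}
		end = declEnd
	}
	if start >= 0 {
		chunks = append(chunks, string(src[start:end]))
	}
	return chunks, nil
}

func main() {
	src := []byte("package p\n\nfunc A() {}\n\nfunc B() {}\n\nfunc C() { println(\"a long body\") }\n")
	chunks, err := chunkByDecl(src, 30)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d (%d bytes):\n%s\n\n", i, len(c), c)
	}
}
```

In this sketch the two short functions share a chunk, while the third exceeds the limit on its own and is kept intact, which is one reading of "where possible".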

NOTE: To keep the initial implementation minimal, the processor currently supports only Go source files, via the golang grammar. The design is intentionally generic so that more languages can be added incrementally.

The processor is safe for concurrent use: each Process() call creates its own parser, because the underlying tree-sitter C library is not thread-safe and a parser must not be shared between goroutines.
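A minimal sketch of the parser-per-call pattern described above. The parser and processor types here are stand-ins invented for illustration; they are not the tree-sitter bindings or this package's unexported fields.

```go
package main

import (
	"fmt"
	"sync"
)

// parser is a stand-in for a tree-sitter parser (an assumption for this
// sketch). It keeps mutable state between calls, so a single instance must
// not be used from several goroutines at once.
type parser struct {
	buf []byte
}

// parse copies the input into the parser's scratch buffer and returns its size.
func (ps *parser) parse(content []byte) int {
	ps.buf = append(ps.buf[:0], content...)
	return len(ps.buf)
}

// processor holds only read-only configuration, so sharing it is safe.
type processor struct {
	chunkSize int
}

// process creates a fresh parser on every call instead of storing one on the
// processor, which is what makes concurrent calls safe.
func (p *processor) process(content []byte) int {
	ps := &parser{}
	return ps.parse(content)
}

// processAll runs process concurrently; each goroutine writes to its own
// slot of results, so no extra locking is needed.
func processAll(p *processor, inputs [][]byte) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in []byte) {
			defer wg.Done()
			results[i] = p.process(in)
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	p := &processor{chunkSize: 512}
	fmt.Println(processAll(p, [][]byte{[]byte("a"), []byte("bcd")}))
}
```

The trade-off is a small allocation per call in exchange for not needing a mutex or a parser pool.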

func NewDocumentProcessor

func NewDocumentProcessor(chunkSize, chunkOverlap int, respectWordBoundaries bool) *DocumentProcessor

NewDocumentProcessor creates a new document processor with a language mapping that can be expanded over time. The returned processor falls back to plain text chunking for unsupported file types.
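One way such a mapping with a text fallback might look is sketched below. Everything here is an assumption made for illustration: the extension-keyed map, the strategyFor and textChunks names, and the byte-based overlap logic are not taken from this package, and the respectWordBoundaries option is left out for brevity.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// languages maps a file extension to a grammar name. Only Go is listed,
// mirroring the documented scope; new entries would extend support.
var languages = map[string]string{".go": "go"}

// strategyFor reports which chunking strategy a path would get in this
// sketch: syntax-aware for mapped extensions, plain text otherwise.
func strategyFor(path string) string {
	if lang, ok := languages[strings.ToLower(filepath.Ext(path))]; ok {
		return "syntax:" + lang
	}
	return "text"
}

// textChunks is a hypothetical fallback chunker. It cuts s into windows of
// at most size bytes, each starting overlap bytes before the previous window
// ended. It works on bytes, so it may split a multi-byte character.
func textChunks(s string, size, overlap int) []string {
	if size <= 0 {
		return nil
	}
	if overlap < 0 || overlap >= size {
		overlap = 0 // an unusable overlap would stall the loop, so ignore it
	}
	var out []string
	for start := 0; start < len(s); start += size - overlap {
		end := start + size
		if end >= len(s) {
			out = append(out, s[start:])
			break
		}
		out = append(out, s[start:end])
	}
	return out
}

func main() {
	for _, path := range []string{"cmd/main.go", "README.md"} {
		fmt.Println(path, "->", strategyFor(path))
	}
	fmt.Println(textChunks("abcdefghij", 4, 1))
}
```

Keeping the mapping as data rather than a switch statement means adding a language is a one-line change, which fits the "expanded over time" intent.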

func (*DocumentProcessor) Process

func (p *DocumentProcessor) Process(path string, content []byte) ([]chunk.Chunk, error)

Process implements chunk.DocumentProcessor.
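The relationship to chunk.DocumentProcessor can be illustrated with local stand-ins. The definitions of chunk.Chunk and chunk.DocumentProcessor are not shown on this page, so the Chunk fields and the single-method interface below are assumptions inferred from the Process signature, and wholeFileProcessor is a hypothetical implementation.

```go
package main

import "fmt"

// Chunk is a local stand-in for chunk.Chunk; its real fields are unknown.
type Chunk struct {
	Path string
	Text string
}

// DocumentProcessor is a local stand-in for chunk.DocumentProcessor, assumed
// to consist of a single method matching the documented Process signature.
type DocumentProcessor interface {
	Process(path string, content []byte) ([]Chunk, error)
}

// wholeFileProcessor is a hypothetical implementation that returns the whole
// file as a single chunk, or no chunks for empty input.
type wholeFileProcessor struct{}

func (wholeFileProcessor) Process(path string, content []byte) ([]Chunk, error) {
	if len(content) == 0 {
		return nil, nil
	}
	return []Chunk{{Path: path, Text: string(content)}}, nil
}

// Compile-time check that wholeFileProcessor satisfies the interface.
var _ DocumentProcessor = wholeFileProcessor{}

func main() {
	var p DocumentProcessor = wholeFileProcessor{}
	chunks, err := p.Process("main.go", []byte("package main\n"))
	if err != nil {
		fmt.Println("process error:", err)
		return
	}
	fmt.Println(len(chunks), chunks[0].Path)
}
```

Callers that depend only on the interface can swap a syntax-aware processor for a simpler one without changing their own code.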
