Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type DocumentProcessor
type DocumentProcessor struct {
// contains filtered or unexported fields
}
DocumentProcessor uses tree-sitter to build syntax trees for source files and produce semantically aligned chunks (e.g., whole functions) while still respecting a maximum chunk size where possible.
NOTE: To keep the initial implementation minimal, this currently supports Go source files via the golang grammar. The design is intentionally generic so we can add more languages incrementally.
The processor is safe for concurrent use: each Process() call creates its own parser, because the underlying tree-sitter C library is not thread-safe.
func NewDocumentProcessor
func NewDocumentProcessor(chunkSize, chunkOverlap int, respectWordBoundaries bool) *DocumentProcessor
NewDocumentProcessor creates a new document processor with a language mapping that can be expanded over time. For unsupported file types, the processor falls back to plain-text chunking.