ingest

package
v0.2.0
Published: May 16, 2026 | License: MIT | Imports: 19 | Imported by: 0

Documentation

Overview

Package ingest provides a semantic code chunker. It splits code files into meaningful chunks for memory and retrieval, using AST-based parsing for Go, indentation-based parsing for Python, regex-based parsing for TypeScript, and blank-line splitting as a fallback.
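To make the AST-based approach for Go concrete, here is a minimal sketch built on the standard go/parser and go/ast packages. It is not the package's own implementation: the span type and the goSpans helper are hypothetical names used only for illustration, and the sketch covers top-level functions and methods only.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// span is a hypothetical, simplified stand-in for the package's Chunk type.
type span struct {
	Name      string
	Type      string // "function" or "method"
	StartLine int
	EndLine   int
}

// demoSrc is a small Go file used as input for the sketch.
const demoSrc = "package demo\n\nfunc Add(a, b int) int {\n\treturn a + b\n}\n\ntype T struct{}\n\nfunc (T) Name() string { return \"t\" }\n"

// goSpans parses Go source and returns one span per top-level
// function or method declaration, with 1-based line numbers.
func goSpans(src string) ([]span, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []span
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, constants, and variables
		}
		kind := "function"
		if fn.Recv != nil {
			kind = "method" // a receiver marks a method
		}
		out = append(out, span{
			Name:      fn.Name.Name,
			Type:      kind,
			StartLine: fset.Position(fn.Pos()).Line,
			EndLine:   fset.Position(fn.End()).Line,
		})
	}
	return out, nil
}

func main() {
	spans, err := goSpans(demoSrc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range spans {
		fmt.Printf("%s %s lines %d-%d\n", s.Type, s.Name, s.StartLine, s.EndLine)
	}
}
```

Working from declarations rather than raw lines is what lets a chunk boundary follow the end of a function body instead of an arbitrary line count.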

Package ingest also implements dual-stream memory ingestion, based on MAGMA (arXiv:2601.03236) and GAM (arXiv:2604.12285).

Yaad is a memory layer: it does not call LLM APIs directly. It stores, retrieves, and organizes memories.

Fast path (sync): stores the node and its temporal edge, then returns immediately without waiting for slow-path work.

Slow path (async): a background goroutine performs heuristic causal inference and entity linking.

Constants

This section is empty.

Variables

This section is empty.

Functions

func EstimateTokens

func EstimateTokens(content string) int

EstimateTokens provides a rough token count estimate.

Types

type Chunk

type Chunk struct {
	ID            string
	FilePath      string
	StartLine     int
	EndLine       int
	Content       string
	Type          string // "function", "class", "method", "import", "type", "const", "module"
	Name          string // symbol name
	Language      string
	TokenEstimate int
	ParentChunk   string // ID of parent (e.g., class for a method)
}

Chunk represents a semantically meaningful unit of code.
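As an illustration of how ParentChunk can tie a method to its enclosing class, the sketch below redeclares the Chunk struct locally so it runs on its own. The IDs, the file path, and the childrenOf helper are hypothetical; the package's ID scheme is not documented here.

```go
package main

import "fmt"

// Chunk mirrors the documented struct so the example is self-contained.
type Chunk struct {
	ID            string
	FilePath      string
	StartLine     int
	EndLine       int
	Content       string
	Type          string
	Name          string
	Language      string
	TokenEstimate int
	ParentChunk   string
}

// sampleChunks returns a class chunk and two method chunks that point
// back to it. All IDs and paths are made up for this example.
func sampleChunks() []Chunk {
	return []Chunk{
		{ID: "c1", FilePath: "store.py", StartLine: 1, EndLine: 20, Type: "class", Name: "Store", Language: "python"},
		{ID: "c2", FilePath: "store.py", StartLine: 2, EndLine: 8, Type: "method", Name: "get", Language: "python", ParentChunk: "c1"},
		{ID: "c3", FilePath: "store.py", StartLine: 10, EndLine: 20, Type: "method", Name: "put", Language: "python", ParentChunk: "c1"},
	}
}

// childrenOf is a hypothetical helper that returns the chunks whose
// ParentChunk matches parentID, in their original order.
func childrenOf(chunks []Chunk, parentID string) []Chunk {
	var out []Chunk
	for _, c := range chunks {
		if c.ParentChunk == parentID {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	for _, c := range childrenOf(sampleChunks(), "c1") {
		fmt.Println(c.Type, c.Name)
	}
}
```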

type Chunker

type Chunker struct {
	MaxChunkTokens int
	MinChunkTokens int
	OverlapLines   int
}

Chunker splits source files into semantic chunks.

func NewChunker

func NewChunker() *Chunker

NewChunker returns a Chunker with default settings.

func (*Chunker) AddOverlap

func (c *Chunker) AddOverlap(chunks []Chunk, overlapLines int) []Chunk

AddOverlap prepends overlapLines lines from the previous chunk to the start of each chunk.
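One plausible reading of this behavior is sketched below, assuming the overlap is taken from the last lines of the previous chunk's original content. The chunk type and the addOverlap and lastLines helpers are hypothetical, and the sketch leaves line numbers untouched, which the real method may or may not do.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a hypothetical, content-only stand-in for Chunk.
type chunk struct {
	Content string
}

// lastLines returns up to n trailing lines of s.
func lastLines(s string, n int) []string {
	lines := strings.Split(strings.TrimRight(s, "\n"), "\n")
	if n > len(lines) {
		n = len(lines)
	}
	return lines[len(lines)-n:]
}

// addOverlap returns a copy of chunks in which every chunk except the
// first starts with the last n lines of its predecessor's original text.
func addOverlap(chunks []chunk, n int) []chunk {
	out := make([]chunk, len(chunks))
	copy(out, chunks)
	if n <= 0 {
		return out
	}
	for i := 1; i < len(chunks); i++ {
		// Read from the input slice so overlaps do not compound.
		tail := lastLines(chunks[i-1].Content, n)
		out[i].Content = strings.Join(tail, "\n") + "\n" + chunks[i].Content
	}
	return out
}

func main() {
	in := []chunk{{Content: "a\nb\nc"}, {Content: "d\ne"}}
	for _, c := range addOverlap(in, 2) {
		fmt.Printf("%q\n", c.Content)
	}
}
```

Overlap of this kind gives a retrieved chunk a little of the context that preceded it, at the cost of some duplicated tokens.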

func (*Chunker) ChunkFile

func (c *Chunker) ChunkFile(path string, content string) ([]Chunk, error)

ChunkFile detects the language from the path extension and dispatches to the appropriate language-specific chunker. It falls back to generic line-based chunking when the extension is not recognized.
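The dispatch step and the fallback can be sketched as follows. The extension table and both helper names (detectLanguage, splitOnBlankLines) are assumptions made for illustration; the set of extensions the package actually recognizes is not listed in this documentation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectLanguage maps a file extension to a language label.
// The mapping is an assumed example, not the package's real table.
func detectLanguage(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".go":
		return "go"
	case ".py":
		return "python"
	case ".ts", ".tsx":
		return "typescript"
	default:
		return "generic"
	}
}

// splitOnBlankLines is a fallback splitter: each run of consecutive
// non-blank lines becomes one block.
func splitOnBlankLines(content string) []string {
	var blocks, cur []string
	flush := func() {
		if len(cur) > 0 {
			blocks = append(blocks, strings.Join(cur, "\n"))
			cur = nil
		}
	}
	for _, line := range strings.Split(content, "\n") {
		if strings.TrimSpace(line) == "" {
			flush()
			continue
		}
		cur = append(cur, line)
	}
	flush()
	return blocks
}

func main() {
	fmt.Println(detectLanguage("cmd/main.go"), detectLanguage("notes.txt"))
	fmt.Printf("%q\n", splitOnBlankLines("a\nb\n\n\nc\n"))
}
```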

func (*Chunker) MergeSmallChunks

func (c *Chunker) MergeSmallChunks(chunks []Chunk) []Chunk

MergeSmallChunks combines adjacent small chunks (fewer than MinChunkTokens tokens) into larger ones while respecting type boundaries (for example, a function is never merged into an import).
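A minimal sketch of such a merge pass follows, assuming a chunk is folded into its predecessor when the predecessor is still below the minimum and both have the same type. The chunk type and mergeSmall helper are hypothetical, and the sketch ignores any upper bound such as MaxChunkTokens.

```go
package main

import "fmt"

// chunk is a hypothetical, reduced stand-in for Chunk.
type chunk struct {
	Type    string
	Content string
	Tokens  int
	EndLine int
}

// mergeSmall folds each chunk into the previous output chunk when that
// chunk is under minTokens and has the same Type; otherwise it starts
// a new output chunk.
func mergeSmall(chunks []chunk, minTokens int) []chunk {
	var out []chunk
	for _, c := range chunks {
		if n := len(out); n > 0 {
			last := &out[n-1]
			if last.Tokens < minTokens && last.Type == c.Type {
				last.Content += "\n" + c.Content
				last.Tokens += c.Tokens
				last.EndLine = c.EndLine
				continue
			}
		}
		out = append(out, c)
	}
	return out
}

func main() {
	in := []chunk{
		{Type: "import", Content: "import a", Tokens: 5, EndLine: 1},
		{Type: "import", Content: "import b", Tokens: 5, EndLine: 2},
		{Type: "function", Content: "func f", Tokens: 5, EndLine: 5},
		{Type: "function", Content: "func g", Tokens: 50, EndLine: 30},
		{Type: "function", Content: "func h", Tokens: 5, EndLine: 33},
	}
	for _, c := range mergeSmall(in, 20) {
		fmt.Println(c.Type, c.Tokens, c.EndLine)
	}
}
```

In this single forward pass a small trailing chunk stays on its own when its predecessor is already large enough; a fuller implementation might merge it backward instead.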

type DualStream

type DualStream struct {
	// contains filtered or unexported fields
}

DualStream manages fast + slow path ingestion.

func New

func New(eng *engine.Engine) *DualStream

New creates a DualStream ingestion manager.

func (*DualStream) Remember

func (ds *DualStream) Remember(ctx context.Context, in engine.RememberInput) (*storage.Node, error)

Remember is the fast path: it stores the node and its temporal edge synchronously, then enqueues slow-path work (heuristic causal inference) to run asynchronously.

func (*DualStream) Stop

func (ds *DualStream) Stop()

Stop gracefully shuts down the slow-path worker.
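The Remember and Stop descriptions above suggest a producer and worker arrangement. The sketch below shows one way to build that with a buffered channel and a single goroutine. It is an illustration under assumptions, not the DualStream implementation: every type and method name is hypothetical, storage is an in-memory map, and the choice to skip slow-path work when the queue is full is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical unit of slow-path work.
type job struct {
	NodeID  string
	Content string
}

// dualStream is a hypothetical sketch of fast and slow path ingestion.
type dualStream struct {
	mu     sync.Mutex
	nodes  map[string]string // fast-path storage (in-memory stand-in)
	linked []string          // IDs already handled by the slow path
	jobs   chan job
	done   chan struct{} // closed when the worker exits
	once   sync.Once
}

func newDualStream(buffer int) *dualStream {
	ds := &dualStream{
		nodes: make(map[string]string),
		jobs:  make(chan job, buffer),
		done:  make(chan struct{}),
	}
	go ds.worker()
	return ds
}

// worker drains the queue until it is closed.
func (ds *dualStream) worker() {
	defer close(ds.done)
	for j := range ds.jobs {
		// Placeholder for causal inference and entity linking.
		ds.mu.Lock()
		ds.linked = append(ds.linked, j.NodeID)
		ds.mu.Unlock()
	}
}

// remember stores the node synchronously, then tries to enqueue
// slow-path work without blocking. It reports whether the job was
// queued. It must not be called after stop.
func (ds *dualStream) remember(id, content string) bool {
	ds.mu.Lock()
	ds.nodes[id] = content
	ds.mu.Unlock()
	select {
	case ds.jobs <- job{NodeID: id, Content: content}:
		return true
	default:
		return false // queue full: skip slow-path work for this node
	}
}

// stop closes the queue once and waits for the worker to finish the
// jobs that were already queued.
func (ds *dualStream) stop() {
	ds.once.Do(func() { close(ds.jobs) })
	<-ds.done
}

func main() {
	ds := newDualStream(8)
	ds.remember("n1", "switched cache to LRU")
	ds.remember("n2", "latency dropped after cache change")
	ds.stop()
	fmt.Println(len(ds.nodes), len(ds.linked))
}
```

Closing the channel and waiting on done lets queued jobs finish before shutdown, which is one reasonable meaning of "gracefully" here.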

type SlowPathJob

type SlowPathJob struct {
	NodeID  string
	Content string
	Project string
	Eng     *engine.Engine
	Graph   graph.Graph
}

SlowPathJob is a unit of work for the slow path.
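The documentation names heuristic causal inference as slow-path work but does not describe the heuristic. Purely as an illustration of what an LLM-free heuristic could look like, the sketch below checks a job's content for causal cue phrases. The cue list and the causalCue helper are invented for this example and should not be read as the package's method.

```go
package main

import (
	"fmt"
	"strings"
)

// causalCues is an invented list of phrases that often signal a
// cause-and-effect statement.
var causalCues = []string{"because", "due to", "caused by", "as a result of"}

// causalCue reports the first cue phrase found in content, ignoring case.
func causalCue(content string) (string, bool) {
	lower := strings.ToLower(content)
	for _, cue := range causalCues {
		if strings.Contains(lower, cue) {
			return cue, true
		}
	}
	return "", false
}

func main() {
	cue, ok := causalCue("Build failed because the cache was stale")
	fmt.Println(cue, ok)
}
```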
