Documentation
Overview ¶
Package ingest provides a semantic code chunker. It splits code files into meaningful chunks for memory and retrieval, using AST-based parsing for Go, indentation-based parsing for Python, regex-based parsing for TypeScript, and blank-line splitting as a fallback.
Package ingest implements dual-stream memory ingestion. Based on MAGMA (arxiv:2601.03236) and GAM (arxiv:2604.12285).
Yaad is a memory layer: it does NOT call LLM APIs directly. It stores, retrieves, and organizes memories.
Fast path (sync): non-blocking. Stores the node and a temporal edge, then returns immediately.
Slow path (async): a background goroutine performs heuristic causal inference and entity linking.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EstimateTokens ¶
EstimateTokens provides a rough token count estimate.
Types ¶
type Chunk ¶
type Chunk struct {
	ID            string
	FilePath      string
	StartLine     int
	EndLine       int
	Content       string
	Type          string // "function", "class", "method", "import", "type", "const", "module"
	Name          string // symbol name
	Language      string
	TokenEstimate int
	ParentChunk   string // ID of parent (e.g., class for a method)
}
Chunk represents a semantically meaningful unit of code.
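To illustrate how Chunk values might be produced, here is a sketch of a blank-line splitter, the fallback strategy mentioned in the overview. It uses a reduced copy of the Chunk struct. The function name `splitOnBlankLines`, the 1-based inclusive line numbering, and the choice of "module" as the Type are assumptions, not confirmed behavior of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk mirrors a subset of the documented fields.
type Chunk struct {
	StartLine int // assumed 1-based, inclusive
	EndLine   int // assumed 1-based, inclusive
	Content   string
	Type      string
}

// splitOnBlankLines groups consecutive non-blank lines into chunks.
// It is a hypothetical helper, not part of the documented API.
func splitOnBlankLines(src string) []Chunk {
	var out []Chunk
	var buf []string
	start := 0
	// flush emits the buffered lines as one chunk ending at line end.
	flush := func(end int) {
		if len(buf) == 0 {
			return
		}
		out = append(out, Chunk{
			StartLine: start,
			EndLine:   end,
			Content:   strings.Join(buf, "\n"),
			Type:      "module", // assumed type for untyped blocks
		})
		buf = nil
	}
	for i, line := range strings.Split(src, "\n") {
		n := i + 1 // convert to 1-based line number
		if strings.TrimSpace(line) == "" {
			flush(n - 1)
			continue
		}
		if len(buf) == 0 {
			start = n
		}
		buf = append(buf, line)
	}
	flush(start + len(buf) - 1) // emit a trailing block with no final blank line
	return out
}

func main() {
	for _, c := range splitOnBlankLines("a\nb\n\n\nc\n") {
		fmt.Printf("%d-%d %q\n", c.StartLine, c.EndLine, c.Content)
	}
}
```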
type Chunker ¶
Chunker splits source files into semantic chunks.
func (*Chunker) AddOverlap ¶
AddOverlap adds lines of overlap from the previous chunk to the start of each chunk.
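The signature of AddOverlap is not shown here, so the following is only a sketch of the idea: each chunk after the first is prefixed with the last n lines of the chunk before it. The `chunk` type, the `addOverlap` and `lastLines` names, and the parameter `n` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a reduced, hypothetical stand-in for Chunk.
type chunk struct {
	Content string
}

// lastLines returns the final n lines of s, or all of s if it has fewer.
func lastLines(s string, n int) string {
	lines := strings.Split(s, "\n")
	if n > len(lines) {
		n = len(lines)
	}
	return strings.Join(lines[len(lines)-n:], "\n")
}

// addOverlap returns a copy of chunks in which every chunk after the
// first starts with the last n lines of the original previous chunk.
// Reading from the original slice keeps overlap from compounding.
func addOverlap(chunks []chunk, n int) []chunk {
	out := make([]chunk, len(chunks))
	copy(out, chunks)
	if n <= 0 {
		return out
	}
	for i := 1; i < len(chunks); i++ {
		out[i].Content = lastLines(chunks[i-1].Content, n) + "\n" + chunks[i].Content
	}
	return out
}

func main() {
	in := []chunk{{Content: "a\nb\nc"}, {Content: "d\ne"}}
	for _, c := range addOverlap(in, 2) {
		fmt.Printf("%q\n", c.Content)
	}
}
```

Overlap of this kind gives a retriever some leading context for each chunk, at the cost of duplicated tokens.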
func (*Chunker) ChunkFile ¶
ChunkFile detects the language from the path's file extension and dispatches to the appropriate language-specific chunker. It falls back to generic line-based chunking when no language-specific chunker applies.
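Extension-based dispatch of this kind can be sketched with `filepath.Ext`. The function `strategyFor`, the strategy labels, and the exact set of extensions (including `.tsx`) are assumptions for illustration. The real ChunkFile presumably calls the chunkers directly rather than returning a label.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// strategyFor maps a file path to a chunking strategy label.
// The labels and extension list are hypothetical.
func strategyFor(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".go":
		return "go-ast"
	case ".py":
		return "python-indent"
	case ".ts", ".tsx":
		return "typescript-regex"
	default:
		// Unknown or missing extension: use the generic fallback.
		return "generic"
	}
}

func main() {
	for _, p := range []string{"cmd/main.go", "tool.py", "app.ts", "README.md", "Makefile"} {
		fmt.Println(p, "->", strategyFor(p))
	}
}
```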
func (*Chunker) MergeSmallChunks ¶
MergeSmallChunks combines adjacent small chunks (fewer than MinChunkTokens tokens) into larger ones while respecting type boundaries: for example, a function is never merged into an import.
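One way to read this rule is sketched below: neighbours are merged only when both are under the token threshold and share the same type. The `chunk` type, the `mergeSmall` name, and the exact merge condition are assumptions. The package may use a different policy, for instance merging whenever only one side is small.

```go
package main

import "fmt"

// chunk is a reduced, hypothetical stand-in for Chunk.
type chunk struct {
	Type    string
	Content string
	Tokens  int
}

// mergeSmall folds a chunk into its predecessor when both are below
// minTokens and have the same Type; otherwise it starts a new chunk.
func mergeSmall(chunks []chunk, minTokens int) []chunk {
	var out []chunk
	for _, c := range chunks {
		if len(out) > 0 {
			last := &out[len(out)-1]
			if last.Type == c.Type && last.Tokens < minTokens && c.Tokens < minTokens {
				last.Content += "\n" + c.Content
				last.Tokens += c.Tokens
				continue
			}
		}
		out = append(out, c)
	}
	return out
}

func main() {
	in := []chunk{
		{Type: "import", Content: "import a", Tokens: 3},
		{Type: "import", Content: "import b", Tokens: 2},
		{Type: "function", Content: "func f() {}", Tokens: 4},
		{Type: "function", Content: "func g() { /* long */ }", Tokens: 50},
	}
	for _, c := range mergeSmall(in, 10) {
		fmt.Printf("%s (%d tokens)\n", c.Type, c.Tokens)
	}
}
```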
type DualStream ¶
type DualStream struct {
// contains filtered or unexported fields
}
DualStream manages fast + slow path ingestion.
func (*DualStream) Remember ¶
func (ds *DualStream) Remember(ctx context.Context, in engine.RememberInput) (*storage.Node, error)
Remember is the fast path: stores node + temporal edge synchronously, then enqueues slow-path work (heuristic causal inference) asynchronously.
func (*DualStream) Stop ¶
func (ds *DualStream) Stop()
Stop gracefully shuts down the slow-path worker.
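The fast-path and slow-path split, together with a graceful Stop, can be sketched with a buffered channel and a `sync.WaitGroup`. Everything below is a simplified illustration under stated assumptions: the `dualStream` type, the in-memory `nodes` slice, the `linked` map, and the policy of skipping slow-path work when the queue is full are hypothetical and do not reflect the real `engine` or `storage` APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// dualStream is a hypothetical, in-memory stand-in for DualStream.
type dualStream struct {
	mu     sync.Mutex
	nodes  []string        // fast-path store
	linked map[string]bool // results of slow-path work
	queue  chan string     // pending slow-path jobs
	wg     sync.WaitGroup
	closed bool
}

func newDualStream(buf int) *dualStream {
	ds := &dualStream{linked: map[string]bool{}, queue: make(chan string, buf)}
	ds.wg.Add(1)
	go ds.worker()
	return ds
}

// worker drains the queue until it is closed.
func (ds *dualStream) worker() {
	defer ds.wg.Done()
	for id := range ds.queue {
		ds.mu.Lock()
		ds.linked[id] = true // placeholder for causal inference and entity linking
		ds.mu.Unlock()
	}
}

// remember stores the node synchronously, then tries to enqueue
// slow-path work without blocking. It reports whether work was queued.
func (ds *dualStream) remember(id string) bool {
	ds.mu.Lock()
	defer ds.mu.Unlock()
	ds.nodes = append(ds.nodes, id)
	if ds.closed {
		return false
	}
	select {
	case ds.queue <- id:
		return true
	default: // queue full: skip slow-path work (assumed policy)
		return false
	}
}

// stop closes the queue once and waits for the worker to finish.
func (ds *dualStream) stop() {
	ds.mu.Lock()
	if !ds.closed {
		ds.closed = true
		close(ds.queue)
	}
	ds.mu.Unlock()
	ds.wg.Wait()
}

func main() {
	ds := newDualStream(8)
	ds.remember("a")
	ds.remember("b")
	ds.stop()
	fmt.Println(len(ds.nodes), len(ds.linked))
}
```

Closing the channel under the same mutex that guards sends avoids a send on a closed channel, and waiting on the WaitGroup lets already-queued work finish before stop returns.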