ingest

package
v0.6.0
Published: Mar 16, 2026 License: Apache-2.0 Imports: 36 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var MaxIngestFileSize int64 = 100 << 20 // 100 MB

MaxIngestFileSize is the largest file that will be read into memory during ingestion or schema inference. Files above this size are silently skipped. Set to 0 to disable the size limit. Configurable via --max-file-size.

The same source comment also documents the unexported isBinaryFile helper: it returns true if the file appears to contain binary content. It uses a heuristic similar to git's: if the first 512 bytes contain a null byte, the file is treated as binary. SQLite files (.db) are handled before it is called.
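The two checks described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `looksBinary`, `overLimit`, and `sniffLen` are hypothetical names, and the sketch works on an in-memory byte slice rather than a file on disk.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffLen mirrors the 512-byte window mentioned in the documentation.
const sniffLen = 512

// looksBinary is a hypothetical stand-in for the unexported helper:
// it reports whether a null byte occurs within the first sniffLen bytes.
func looksBinary(content []byte) bool {
	head := content
	if len(head) > sniffLen {
		head = head[:sniffLen]
	}
	return bytes.IndexByte(head, 0) >= 0
}

// overLimit applies a size cap where a limit of 0 means "no limit".
func overLimit(size, limit int64) bool {
	return limit > 0 && size > limit
}

func main() {
	const limit int64 = 100 << 20 // same value as the documented default

	fmt.Println(looksBinary([]byte("package main\n")))          // false
	fmt.Println(looksBinary([]byte{0x7f, 'E', 'L', 'F', 0x00})) // true
	fmt.Println(overLimit(200<<20, limit))                      // true
	fmt.Println(overLimit(200<<20, 0))                          // false: limit disabled
}
```

Note that a null byte located after the first 512 bytes does not trip this check, which is the trade-off of sniffing only a short prefix.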

Functions

func DetectLanguageFromExt added in v0.2.0

func DetectLanguageFromExt(ext string) (langName string, lang *sitter.Language, ok bool)

DetectLanguageFromExt returns the language name and tree-sitter Language for a given file extension. Returns ok=false for unsupported extensions.

func FlattenAST added in v0.1.1

func FlattenAST(root *sitter.Node) []any

FlattenAST walks the tree and returns a list of records for FCA analysis.

func FlattenASTWithLanguage added in v0.2.0

func FlattenASTWithLanguage(root *sitter.Node, langName string) []any

FlattenASTWithLanguage walks the tree and returns records for FCA analysis, using language-specific enrichment if available.

func GetGitHints added in v0.2.0

func GetGitHints() map[string]string

GetGitHints returns the default inference hints for Git repositories.

func GetLanguage added in v0.2.0

func GetLanguage(langName string) *sitter.Language

GetLanguage returns the tree-sitter language for a language name string. Returns nil for unsupported languages.

func LoadGitCommits added in v0.2.0

func LoadGitCommits(repoPath string) ([]any, error)

LoadGitCommits loads all commits from a repository using git log.

func LoadSQLite

func LoadSQLite(dbPath string) ([]any, error)

LoadSQLite opens a SQLite database, reads all records from the results table, parses each JSON record, and returns them as a slice. Kept for backward compatibility with tests; prefer StreamSQLite for large datasets.

func ParseSize added in v0.5.6

func ParseSize(s string) (int64, error)

ParseSize parses a human-readable size string (e.g. "100MB", "1GB", "0"). Returns bytes. Supported suffixes: KB, MB, GB (case-insensitive).
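To make the accepted input shapes concrete, here is a minimal sketch of such a parser. `parseSizeSketch` is a hypothetical name and not the package's code. It assumes binary multiples (1 KB = 1024 bytes), which matches the documented `100 << 20` default being described as 100 MB but is otherwise not confirmed by the documentation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSizeSketch converts strings such as "100MB", "1gb", or "0" to bytes.
// Assumption: suffixes are binary multiples (KB = 1<<10, MB = 1<<20, GB = 1<<30).
func parseSizeSketch(s string) (int64, error) {
	u := strings.ToUpper(strings.TrimSpace(s))
	mult := int64(1)
	switch {
	case strings.HasSuffix(u, "KB"):
		mult, u = 1<<10, strings.TrimSuffix(u, "KB")
	case strings.HasSuffix(u, "MB"):
		mult, u = 1<<20, strings.TrimSuffix(u, "MB")
	case strings.HasSuffix(u, "GB"):
		mult, u = 1<<30, strings.TrimSuffix(u, "GB")
	}
	n, err := strconv.ParseInt(strings.TrimSpace(u), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("negative size %q", s)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"100MB", "1gb", "0", "oops"} {
		n, err := parseSizeSketch(in)
		fmt.Println(in, n, err)
	}
}
```

The sketch does not guard against overflow for very large inputs; a production parser would likely want to.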

func RegisterContextQuery added in v0.2.0

func RegisterContextQuery(langName, query string)

RegisterContextQuery registers a context extraction query for a specific language. This should be called during initialization.

func RegisterQualifiedCallQuery added in v0.2.0

func RegisterQualifiedCallQuery(langName, query string)

RegisterQualifiedCallQuery registers a call extraction query that captures both @call (function name) and @pkg (package qualifier) for a language.

func RegisterRefQuery added in v0.2.0

func RegisterRefQuery(langName, query string)

RegisterRefQuery registers a reference extraction query for a specific language. This should be called during initialization.

func RenderTemplate

func RenderTemplate(tmpl string, values map[string]any) (string, error)

RenderTemplate renders a Go text/template with the standard mache template functions. Parsed templates are cached, so repeated calls with the same template string skip parsing.
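The caching behaviour can be illustrated with a small sketch built on the standard library. `renderCached`, `cache`, and `cacheMu` are hypothetical names, the cache is keyed by the raw template string as an assumption, and the sketch registers no custom template functions, unlike the real function.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"text/template"
)

var (
	cacheMu sync.Mutex
	cache   = map[string]*template.Template{} // keyed by template source text
)

// renderCached parses tmpl at most once and reuses the parsed template
// on later calls with the same string.
func renderCached(tmpl string, values map[string]any) (string, error) {
	cacheMu.Lock()
	t, ok := cache[tmpl]
	if !ok {
		var err error
		t, err = template.New("path").Parse(tmpl)
		if err != nil {
			cacheMu.Unlock()
			return "", err
		}
		cache[tmpl] = t
	}
	cacheMu.Unlock()

	// A parsed template may be executed concurrently, so no lock is held here.
	var buf bytes.Buffer
	if err := t.Execute(&buf, values); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderCached("{{.kind}}/{{.name}}", map[string]any{"kind": "users", "name": "alice"})
	fmt.Println(out, err) // users/alice <nil>
}
```

Keying on the template text keeps the call site simple: callers never manage template handles, at the cost of holding every distinct template string in memory.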

func SchemaUsesTreeSitter added in v0.2.0

func SchemaUsesTreeSitter(schema *api.Topology) bool

SchemaUsesTreeSitter returns true if the schema's selectors are tree-sitter S-expressions rather than JSONPath. S-expressions always start with '('.
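The distinction rests on the first character of a selector, which a short sketch can show. `looksLikeSExpression` is a hypothetical helper operating on a single selector string; how the real function traverses `*api.Topology`, and whether it trims whitespace first, are not stated in the documentation and are assumptions here.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeSExpression reports whether a selector has the shape of a
// tree-sitter query (starts with '(') rather than a JSONPath expression.
// Trimming leading whitespace is an assumption of this sketch.
func looksLikeSExpression(selector string) bool {
	return strings.HasPrefix(strings.TrimSpace(selector), "(")
}

func main() {
	fmt.Println(looksLikeSExpression("(function_declaration name: (identifier) @name) @scope")) // true
	fmt.Println(looksLikeSExpression("$.items[*]"))                                             // false
}
```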

func ShouldSkipDir added in v0.6.0

func ShouldSkipDir(base string) bool

ShouldSkipDir returns true for hidden dirs and common build artifact directories.
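A directory filter of this kind might look like the sketch below. `skipDirSketch` is a hypothetical name, and the list of build artifact directories is invented for illustration; the documentation does not say which names the package actually checks.

```go
package main

import (
	"fmt"
	"strings"
)

// artifactDirs is an illustrative list only, not the package's real blocklist.
var artifactDirs = map[string]bool{
	"node_modules": true,
	"vendor":       true,
	"target":       true,
	"dist":         true,
	"build":        true,
}

// skipDirSketch returns true for hidden directories (dot-prefixed names)
// and for names in artifactDirs. base is the final path element.
func skipDirSketch(base string) bool {
	if base == "." || base == ".." {
		return false // the current and parent directory are not "hidden"
	}
	if strings.HasPrefix(base, ".") {
		return true
	}
	return artifactDirs[base]
}

func main() {
	for _, d := range []string{".git", "node_modules", "src", "."} {
		fmt.Println(d, skipDirSketch(d))
	}
}
```

A check like this is typically called from a directory walk, returning `filepath.SkipDir` when it reports true.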

func ShouldSkipFile added in v0.5.6

func ShouldSkipFile(path string, size int64) bool

ShouldSkipFile returns true if the file should not be ingested. Checks extension blocklist, size limit, and binary content.

func StreamSQLite

func StreamSQLite(dbPath string, fn func(recordID string, record any) error) error

StreamSQLite iterates over all records in a SQLite database, calling fn for each one. Only one parsed record is alive at a time, keeping memory usage constant.

func StreamSQLiteRaw

func StreamSQLiteRaw(dbPath string, fn func(id, raw string) error) error

StreamSQLiteRaw iterates over all records yielding raw (id, json) strings without parsing. Used by the parallel ingestion pipeline where workers handle JSON parsing on their own goroutines.
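The division of labour between a raw producer and parsing workers can be sketched without a database. Everything here is hypothetical: `streamRaw` stands in for a callback-style producer of `(id, json)` pairs, and `parseParallel` shows one way workers could parse on their own goroutines. It is not the package's pipeline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

type rawRow struct{ id, raw string }

// streamRaw mimics a callback-based producer: it hands each row to fn
// and stops at the first error fn returns.
func streamRaw(rows []rawRow, fn func(id, raw string) error) error {
	for _, r := range rows {
		if err := fn(r.id, r.raw); err != nil {
			return err
		}
	}
	return nil
}

// parseParallel feeds raw rows to a pool of workers that unmarshal JSON.
func parseParallel(rows []rawRow, workers int) (map[string]any, error) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan rawRow)
	out := make(map[string]any, len(rows))
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				var rec any
				err := json.Unmarshal([]byte(j.raw), &rec) // parsing happens off the producer goroutine
				mu.Lock()
				if err != nil && firstErr == nil {
					firstErr = fmt.Errorf("record %s: %w", j.id, err)
				} else if err == nil {
					out[j.id] = rec
				}
				mu.Unlock()
			}
		}()
	}
	err := streamRaw(rows, func(id, raw string) error {
		jobs <- rawRow{id, raw}
		return nil
	})
	close(jobs)
	wg.Wait()
	if err != nil {
		return nil, err
	}
	return out, firstErr
}

func main() {
	rows := []rawRow{{"a", `{"n":1}`}, {"b", `{"n":2}`}}
	got, err := parseParallel(rows, 2)
	fmt.Println(len(got), err) // 2 <nil>
}
```

Keeping the producer free of JSON parsing means the single reader only moves strings, while CPU-bound decoding scales with the number of workers.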

Types

type Engine

type Engine struct {
	Schema           *api.Topology
	Store            IngestionTarget
	RootPath         string // absolute path to the root of the ingestion
	RespectGitignore bool   // when true, skip files matching .gitignore patterns (default: true)
	// contains filtered or unexported fields
}

Engine drives the ingestion process.

func NewEngine

func NewEngine(schema *api.Topology, store IngestionTarget) *Engine

func (*Engine) Ingest

func (e *Engine) Ingest(path string) error

Ingest processes a file or directory. Safe to call multiple times: internal dedup state is reset on each call.

func (*Engine) IngestRecords added in v0.2.0

func (e *Engine) IngestRecords(records []any) error

IngestRecords processes in-memory records (e.g. from Git).

func (*Engine) PrintRoutingSummary added in v0.2.0

func (e *Engine) PrintRoutingSummary()

PrintRoutingSummary outputs a summary of files routed to _project_files/.

func (*Engine) ReIngestFile added in v0.6.0

func (e *Engine) ReIngestFile(path string) error

ReIngestFile re-ingests a single file, preserving the existing RootPath. Used by the live graph refresher to update stale nodes without a full walk. After re-ingestion, the store's file mtime is updated.

type IngestionTarget

type IngestionTarget interface {
	graph.Graph
	AddNode(n *graph.Node)
	AddRoot(n *graph.Node)
	AddRef(token, nodeID string) error
	AddDef(token, dirID string) error
	DeleteFileNodes(filePath string)
}

IngestionTarget combines Graph reading with writing capabilities.

type JsonWalker

type JsonWalker struct{}

JsonWalker implements Walker for JSON-like data.

func NewJsonWalker

func NewJsonWalker() *JsonWalker

func (*JsonWalker) Query

func (w *JsonWalker) Query(root any, selector string) ([]Match, error)

Query implements Walker.

type LanguageProfile added in v0.2.0

type LanguageProfile struct {
	// EnrichNode adds synthetic attributes to records for languages without field names
	EnrichNode func(n *sitter.Node, rec map[string]any)
}

LanguageProfile defines how to extract semantic structure from AST nodes for languages that don't use field names (like HCL).

func GetLanguageProfile added in v0.2.0

func GetLanguageProfile(langName string) *LanguageProfile

GetLanguageProfile returns a profile for the given language name.

type Match

type Match interface {
	// Values returns the captured values.
	// For Tree-sitter, these are the named captures from the query (e.g., "res.type" -> "aws_s3_bucket").
	// For JSONPath, if the match is an object, its fields are returned as values.
	// If the match is a primitive, it might be returned under a default key (e.g., "value").
	Values() map[string]any

	// Context returns the underlying object/node to be used as the root for child queries.
	// For JSONPath, this is the matched object.
	// For Tree-sitter, this is the node captured as @scope (or similar convention).
	Context() any
}

Match represents a single result from a query. It provides a map of values that can be used to render path templates.

type OriginProvider

type OriginProvider interface {
	CaptureOrigin(name string) (startByte, endByte uint32, ok bool)
}

OriginProvider is an optional interface that Match implementations can satisfy to expose source byte ranges for write-back. Type-asserted in engine, not required by JSON walker.
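The optional-interface pattern described here can be sketched as follows. The two interfaces are copied from this page so the example is self-contained; `jsonMatch`, `codeMatch`, and `originOf` are hypothetical types and helpers, not part of the package.

```go
package main

import "fmt"

// Match and OriginProvider are reproduced from the documentation above.
type Match interface {
	Values() map[string]any
	Context() any
}

type OriginProvider interface {
	CaptureOrigin(name string) (startByte, endByte uint32, ok bool)
}

// jsonMatch has no source positions, so it does not implement OriginProvider.
type jsonMatch struct{ vals map[string]any }

func (m jsonMatch) Values() map[string]any { return m.vals }
func (m jsonMatch) Context() any           { return m.vals }

// codeMatch additionally records a byte range for each capture name.
type codeMatch struct {
	jsonMatch
	spans map[string][2]uint32
}

func (m codeMatch) CaptureOrigin(name string) (uint32, uint32, bool) {
	s, ok := m.spans[name]
	return s[0], s[1], ok
}

// originOf type-asserts at the call site, so Match itself stays minimal.
func originOf(m Match, name string) (start, end uint32, ok bool) {
	op, isProvider := m.(OriginProvider)
	if !isProvider {
		return 0, 0, false
	}
	return op.CaptureOrigin(name)
}

func main() {
	code := codeMatch{spans: map[string][2]uint32{"name": {10, 14}}}
	fmt.Println(originOf(code, "name"))        // 10 14 true
	fmt.Println(originOf(jsonMatch{}, "name")) // 0 0 false
}
```

This keeps data-oriented matches free of any obligation to report byte ranges, while code-oriented matches can opt in.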

type SQLiteResolver

type SQLiteResolver struct {
	// contains filtered or unexported fields
}

SQLiteResolver resolves ContentRef entries by fetching records from SQLite and re-rendering their content templates.

func NewSQLiteResolver

func NewSQLiteResolver() *SQLiteResolver

func (*SQLiteResolver) Close

func (r *SQLiteResolver) Close()

Close closes all open database connections.

func (*SQLiteResolver) Resolve

func (r *SQLiteResolver) Resolve(ref *graph.ContentRef) ([]byte, error)

Resolve fetches a record from SQLite and renders its content template.

type SQLiteWriter added in v0.2.0

type SQLiteWriter struct {
	// contains filtered or unexported fields
}

SQLiteWriter implements IngestionTarget for the new high-performance schema.

func NewSQLiteWriter added in v0.2.0

func NewSQLiteWriter(dbPath string) (*SQLiteWriter, error)

NewSQLiteWriter creates a new writer and initializes the schema.

func (*SQLiteWriter) Act added in v0.5.0

func (w *SQLiteWriter) Act(id, action, payload string) (*graph.ActionResult, error)

func (*SQLiteWriter) AddDef added in v0.2.0

func (w *SQLiteWriter) AddDef(token, dirID string) error

func (*SQLiteWriter) AddNode added in v0.2.0

func (w *SQLiteWriter) AddNode(n *graph.Node)

AddNode writes a node to the database.

func (*SQLiteWriter) AddRef added in v0.2.0

func (w *SQLiteWriter) AddRef(token, nodeID string) error

func (*SQLiteWriter) AddRoot added in v0.2.0

func (w *SQLiteWriter) AddRoot(n *graph.Node)

func (*SQLiteWriter) Close added in v0.2.0

func (w *SQLiteWriter) Close() error

func (*SQLiteWriter) DeleteFileNodes added in v0.2.0

func (w *SQLiteWriter) DeleteFileNodes(filePath string)

func (*SQLiteWriter) GetCallees added in v0.2.0

func (w *SQLiteWriter) GetCallees(id string) ([]*graph.Node, error)

func (*SQLiteWriter) GetCallers added in v0.2.0

func (w *SQLiteWriter) GetCallers(token string) ([]*graph.Node, error)

func (*SQLiteWriter) GetNode added in v0.2.0

func (w *SQLiteWriter) GetNode(id string) (*graph.Node, error)

func (*SQLiteWriter) Invalidate added in v0.2.0

func (w *SQLiteWriter) Invalidate(id string)

func (*SQLiteWriter) ListChildren added in v0.2.0

func (w *SQLiteWriter) ListChildren(id string) ([]string, error)

func (*SQLiteWriter) ReadContent added in v0.2.0

func (w *SQLiteWriter) ReadContent(id string, buf []byte, offset int64) (int, error)

type SitterRoot

type SitterRoot struct {
	Node     *sitter.Node
	FileRoot *sitter.Node // The top-level file node (for global context)
	Source   []byte
	Lang     *sitter.Language
	LangName string // "go", "python", "hcl", etc.
}

SitterRoot encapsulates the necessary context for querying a Tree-sitter tree. It includes the root node, the source code (for extracting content), and the language (for compiling the query).

type SitterWalker

type SitterWalker struct {
	// contains filtered or unexported fields
}

SitterWalker implements Walker for Tree-sitter parsed code.

func NewSitterWalker

func NewSitterWalker() *SitterWalker

func (*SitterWalker) ExtractCalls

func (w *SitterWalker) ExtractCalls(root *sitter.Node, source []byte, lang *sitter.Language, langName string) ([]string, error)

ExtractCalls finds all function calls in the given node using a predefined query. The compiled query is cached per language to avoid recompilation on every call.

func (*SitterWalker) ExtractContext added in v0.2.0

func (w *SitterWalker) ExtractContext(root *sitter.Node, source []byte, lang *sitter.Language, langName string) ([]byte, error)

ExtractContext finds package-level context nodes.

func (*SitterWalker) ExtractQualifiedCalls added in v0.2.0

func (w *SitterWalker) ExtractQualifiedCalls(root *sitter.Node, source []byte, lang *sitter.Language, langName string) ([]graph.QualifiedCall, error)

ExtractQualifiedCalls finds all function calls with optional package qualifiers. For languages with a registered qualified call query, returns QualifiedCall with both Token and Qualifier. For others, falls back to ExtractCalls (bare tokens).

func (*SitterWalker) Query

func (w *SitterWalker) Query(root any, selector string) ([]Match, error)

Query implements Walker.

type Walker

type Walker interface {
	// Query executes a selector (query) against the given root node and returns a list of matches.
	// The root node can be a *sitter.Node (for code) or a generic Go object (for data).
	Query(root any, selector string) ([]Match, error)
}

Walker abstracts over JSONPath (Data) and Tree-sitter (Code). It provides a unified way to query a tree-like structure and extract values for path templating.
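To show how the two interfaces fit together, here is a toy implementation. `Match` and `Walker` are copied from this page; `keyWalker` and `mapMatch` are hypothetical and use a deliberately trivial selector language (a single top-level key), which is far simpler than JSONPath or tree-sitter queries.

```go
package main

import "fmt"

// Match and Walker are reproduced from the documentation above.
type Match interface {
	Values() map[string]any
	Context() any
}

type Walker interface {
	Query(root any, selector string) ([]Match, error)
}

// mapMatch exposes an object's fields as values and the object itself
// as the context for child queries.
type mapMatch struct{ obj map[string]any }

func (m mapMatch) Values() map[string]any { return m.obj }
func (m mapMatch) Context() any           { return m.obj }

// keyWalker treats the selector as a top-level key whose value is a list
// of objects, and returns one match per object.
type keyWalker struct{}

func (keyWalker) Query(root any, selector string) ([]Match, error) {
	obj, ok := root.(map[string]any)
	if !ok {
		return nil, fmt.Errorf("keyWalker: root is %T, want map[string]any", root)
	}
	items, ok := obj[selector].([]any)
	if !ok {
		return nil, nil // no list under that key: no matches
	}
	var out []Match
	for _, it := range items {
		if m, ok := it.(map[string]any); ok {
			out = append(out, mapMatch{obj: m})
		}
	}
	return out, nil
}

func main() {
	var w Walker = keyWalker{}
	root := map[string]any{"users": []any{
		map[string]any{"name": "alice"},
		map[string]any{"name": "bob"},
	}}
	matches, err := w.Query(root, "users")
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Println("users/" + m.Values()["name"].(string))
	}
}
```

Because callers depend only on `Walker` and `Match`, the same path-templating code can consume matches from either data or source code.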
