indexer

package
v0.3.2
Published: Apr 4, 2026 License: MIT Imports: 29 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CountDeadCandidates

func CountDeadCandidates(db *sql.DB, q DeadCodeQuery) (int, error)

CountDeadCandidates returns the total number of symbols that would be evaluated by FindDeadSymbols for the same query (i.e. the candidate pool before the "no callers" filter is applied). Used to generate the summary line shown when no dead symbols are found.

func DropIndex

func DropIndex(root string) error

DropIndex removes the SQLite index file for root. It is a no-op if the index does not exist yet (os.ErrNotExist is ignored). The caller is responsible for ensuring no open DB handle points to the file.

func ExtensionLanguage added in v0.3.0

func ExtensionLanguage(ext string) string

ExtensionLanguage maps a file extension to a human-readable language name. Exported so callers outside the package (e.g. CLI commands) can resolve a file path to the language string expected by ExtractImports / ExtractCalls.

func GetFileHash

func GetFileHash(db *sql.DB, rel string) (string, error)

GetFileHash returns the stored sha256 for rel, or "" if the file is not in the index. Safe to call from multiple goroutines concurrently.

func GetLastIndexedAt added in v0.2.0

func GetLastIndexedAt(db *sql.DB) (time.Time, error)

GetLastIndexedAt returns the UTC time at which the index was last written by Run(). If the key is absent (index has never been run, or was built by an older binary), it returns a zero time.Time and nil error — callers should treat a zero value as "always stale".

func IndexedPaths

func IndexedPaths(db *sql.DB) (map[string]bool, error)

IndexedPaths returns all file paths currently stored in the index. Used by Run to detect deleted files.

func IsSchemaMismatch

func IsSchemaMismatch(err error) bool

IsSchemaMismatch reports whether err (or any error in its chain) is a *SchemaMismatchError.

func OpenIndex

func OpenIndex(root string) (*sql.DB, error)

OpenIndex opens (or creates) the SQLite index for root, applies the schema, and writes repo metadata. The caller is responsible for closing the DB.

func PruneFiles

func PruneFiles(db *sql.DB, deleted []string) error

PruneFiles removes index entries for files that no longer exist on disk. deleted is the set of relative paths to remove.

func RepoID

func RepoID(root string) string

RepoID derives a stable, human-readable identifier for a repository root. Format: <basename>-<8-hex-chars> where the suffix is the first 8 characters of the SHA-256 of the absolute root path.
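
The documented format can be reproduced roughly as follows. This sketch assumes the absolute path is hashed as its raw bytes and the digest is rendered as lowercase hex; repoIDSketch and the fallback on a filepath.Abs error are assumptions, so the output may not match the real RepoID byte for byte.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// repoIDSketch builds "<basename>-<8-hex-chars>" from the absolute root path.
// Hypothetical re-implementation for illustration only.
func repoIDSketch(root string) string {
	abs, err := filepath.Abs(root)
	if err != nil {
		abs = root // assumption: fall back to the input as given
	}
	sum := sha256.Sum256([]byte(abs))
	return filepath.Base(abs) + "-" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	// The same root always yields the same identifier.
	fmt.Println(repoIDSketch("/tmp/myrepo"))
}
```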

func SchemaVersion

func SchemaVersion() int

SchemaVersion returns the current index schema version. Exposed so CLI commands can print it alongside the binary version.

func ShouldRefresh added in v0.2.0

func ShouldRefresh(db *sql.DB, threshold time.Duration) (bool, error)

ShouldRefresh reports whether the index is stale and needs a re-walk. It returns true when:

  • last_indexed_at is absent (fresh or legacy index — treat as stale), or
  • time elapsed since the last Run() exceeds threshold.

The check is a single SQLite point lookup with no filesystem I/O, making it safe to call before every query command.

func WriteFile

func WriteFile(db *sql.DB, rel string, entry FileEntry) error

WriteFile upserts a file entry and its symbols inside a single transaction. Existing symbols for the file are removed via FK cascade before re-inserting.

Types

type CallSite

type CallSite struct {
	CalleeName string `json:"callee_name"`
	Line       int    `json:"line"`
}

CallSite records a single outgoing call extracted from a source file. CalleeName is the bare identifier of the called function or method. Line is 1-based, matching tree-sitter row + 1.

func ExtractCalls

func ExtractCalls(lang string, code []byte) ([]CallSite, error)

ExtractCalls runs the call-site query for the given language name against code and returns every outgoing call site found.

For languages that also have a refQuery (identifier references used as values, e.g. RunE: runIndex), those are appended as additional CallSite entries so that functions passed as values are recorded as "used" in the refs table and do not appear as dead code.

Languages that do not have a registered call query (e.g. Python, JS) return an empty slice without error — callers need not special-case them.

type CallerNode

type CallerNode struct {
	CallerFile string       `json:"caller_file"`
	CallerName string       `json:"caller_name"` // empty string means file scope
	CalleeName string       `json:"callee_name"`
	Line       int          `json:"line"`
	Cycle      bool         `json:"cycle,omitempty"` // true when this node was already seen — not expanded
	Callers    []CallerNode `json:"callers,omitempty"`
}

CallerNode is one node in a recursive call graph tree returned by FindCallersRecursive. Each node represents a single call site; its Callers field holds the next level of callers (who calls this caller), and so on.

func FindCallersRecursive

func FindCallersRecursive(db *sql.DB, calleeName string, maxDepth int) ([]CallerNode, error)

FindCallersRecursive traverses the call graph upward from calleeName up to maxDepth levels. maxDepth == 0 means unlimited depth. Cycles are detected via the visited set and marked with Cycle: true without further expansion.
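
The traversal can be illustrated on an in-memory graph instead of the refs table. The sketch below is an assumption about the general shape of the algorithm, not the package's code: node, walk, and expand are hypothetical names, and the visited set is shared across the whole traversal.

```go
package main

import "fmt"

// node is a trimmed-down stand-in for CallerNode.
type node struct {
	Caller  string
	Cycle   bool
	Callers []node
}

// walk climbs from callee through callersOf (callee name -> caller names).
// maxDepth == 0 means unlimited depth.
func walk(callersOf map[string][]string, callee string, maxDepth int) []node {
	visited := map[string]bool{callee: true}
	return expand(callersOf, callee, 1, maxDepth, visited)
}

func expand(callersOf map[string][]string, callee string, depth, maxDepth int, visited map[string]bool) []node {
	if maxDepth > 0 && depth > maxDepth {
		return nil
	}
	var out []node
	for _, caller := range callersOf[callee] {
		n := node{Caller: caller}
		if visited[caller] {
			n.Cycle = true // already seen: record the edge but do not expand it
		} else {
			visited[caller] = true
			n.Callers = expand(callersOf, caller, depth+1, maxDepth, visited)
		}
		out = append(out, n)
	}
	return out
}

func main() {
	// c is called by b, b by a, and a by c, which closes a cycle.
	g := map[string][]string{"c": {"b"}, "b": {"a"}, "a": {"c"}}
	fmt.Printf("%+v\n", walk(g, "c", 0))
}
```

With a shared visited set each symbol is expanded at most once, which bounds the work even when maxDepth is 0.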

type CallerRow

type CallerRow struct {
	CallerFile string `json:"caller_file"` // file that contains the call
	CallerName string `json:"caller_name"` // enclosing symbol (empty = file scope)
	CalleeName string `json:"callee_name"` // the symbol being called (== queried name)
	Line       int    `json:"line"`        // 1-based line of the call site
}

CallerRow is a single result from FindCallers: one call site that invokes the requested symbol.

func FindCallers

func FindCallers(db *sql.DB, calleeName string) ([]CallerRow, error)

FindCallers returns every recorded call site that invokes calleeName. Results are ordered by caller_file, then line for deterministic output.

type DeadCodeQuery

type DeadCodeQuery struct {
	// Type restricts results to a specific symbol type (e.g. "function", "method").
	// Empty means all callable types (function + method).
	Type string

	// FilePath filters results to symbols in files whose path contains this substring.
	FilePath string

	// UnexportedOnly restricts results to symbols considered unexported by
	// their language conventions: lowercase first letter for Go, and no
	// filtering for Rust (pub visibility cannot be inferred from the name).
	// Reduces false positives for public APIs.
	UnexportedOnly bool
}

DeadCodeQuery holds optional filters for FindDeadSymbols.

type DeadSymbol

type DeadSymbol struct {
	Name     string `json:"name"`
	Type     string `json:"type"`
	FilePath string `json:"file_path"`
	Line     int    `json:"line"`
}

DeadSymbol is a single result row from FindDeadSymbols.

func FindDeadSymbols

func FindDeadSymbols(db *sql.DB, q DeadCodeQuery) ([]DeadSymbol, error)

FindDeadSymbols returns symbols that have no recorded callers in the refs table. Only function and method symbols are considered by default; set q.Type to override.

The following names are always excluded from results because they are invoked by the Go runtime or testing framework, not by ordinary call sites:

  • main, init (runtime entry points)
  • Test*, Benchmark*, Example*, Fuzz* (testing framework)
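
The exclusion list above amounts to a name filter. A minimal sketch, assuming the wildcard patterns are plain prefix matches (the real query may be stricter); isImplicitlyInvoked is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// isImplicitlyInvoked reports whether name is an entry point called by the
// Go runtime or the testing framework rather than by an ordinary call site.
func isImplicitlyInvoked(name string) bool {
	if name == "main" || name == "init" {
		return true
	}
	for _, prefix := range []string{"Test", "Benchmark", "Example", "Fuzz"} {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isImplicitlyInvoked("main"))        // true
	fmt.Println(isImplicitlyInvoked("TestRun"))     // true
	fmt.Println(isImplicitlyInvoked("parseConfig")) // false
}
```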

type DirNode

type DirNode struct {
	Path        string          `json:"path"`                // relative directory path, e.g. "pkg/indexer"
	FileCount   int             `json:"file_count"`          // files directly in this dir
	SymbolCount int             `json:"symbol_count"`        // symbols across all files in this dir
	Languages   map[string]int  `json:"languages,omitempty"` // language → file count for this dir
	Children    []*DirNode      `json:"children,omitempty"`  // sorted subdirectories
	Files       []FileTreeEntry `json:"files,omitempty"`     // immediate files in this dir
}

DirNode represents a single directory in the file tree. Children are nested DirNodes; Files lists the immediate files in this dir.

func FlattenTree

func FlattenTree(root *DirNode) []*DirNode

FlattenTree returns all DirNodes in the tree as a flat sorted slice, useful for printing a compact directory listing.
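
Flattening is a depth-first walk followed by a sort. The sketch below assumes the slice is ordered by Path, which the description does not state explicitly; dirNode is a reduced stand-in for DirNode and flatten is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// dirNode keeps only the fields needed to show the traversal.
type dirNode struct {
	Path     string
	Children []*dirNode
}

// flatten collects every node in the tree and sorts the result by Path.
func flatten(root *dirNode) []*dirNode {
	if root == nil {
		return nil
	}
	var out []*dirNode
	var visit func(n *dirNode)
	visit = func(n *dirNode) {
		out = append(out, n)
		for _, c := range n.Children {
			visit(c)
		}
	}
	visit(root)
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	root := &dirNode{Path: ".", Children: []*dirNode{
		{Path: "pkg", Children: []*dirNode{{Path: "pkg/indexer"}}},
		{Path: "cmd"},
	}}
	for _, n := range flatten(root) {
		fmt.Println(n.Path)
	}
}
```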

func GetFileTree

func GetFileTree(db *sql.DB) (*DirNode, error)

GetFileTree returns the root DirNode of a tree built from the index. The root node's Path is ".".

type FileEntry

type FileEntry struct {
	Language  string       `json:"language"`
	SHA256    string       `json:"sha256"`
	Mtime     string       `json:"mtime"` // RFC3339 UTC mtime recorded at index time
	Size      int64        `json:"size"`  // file size in bytes recorded at index time
	IndexedAt time.Time    `json:"indexed_at"`
	Symbols   []SymbolInfo `json:"symbols"`
	Calls     []CallSite   `json:"calls,omitempty"`
	Imports   []ImportSite `json:"imports,omitempty"`
}

FileEntry holds the per-file metadata and extracted symbols. Path is stored as the key in the DB files table, not repeated here.

type FileError

type FileError struct {
	Path string
	Err  error
}

FileError records a per-file failure during indexing. Fatal walk errors are returned as Run's error value; per-file read/parse failures are collected in IndexStats.FileErrors so the rest of the index is still saved.

func (FileError) Error

func (e FileError) Error() string

type FileMeta

type FileMeta struct {
	Hash  string // sha256 hex
	Mtime string // RFC3339 UTC modification time at index time
	Size  int64  // file size in bytes at index time
}

FileMeta holds the stored stat+hash metadata for a single indexed file. Used by processJob to determine whether a file needs re-parsing.

func GetFileMeta

func GetFileMeta(db *sql.DB, rel string) (FileMeta, error)

GetFileMeta returns the stored hash, mtime, and size for rel, or a zero-value FileMeta if the file is not in the index. Safe to call from multiple goroutines concurrently.

type FileTreeEntry

type FileTreeEntry struct {
	Path        string `json:"path"`
	Language    string `json:"language"`
	SymbolCount int    `json:"symbol_count"`
}

FileTreeEntry is a single file row inside a DirNode.

type HotspotEntry

type HotspotEntry struct {
	CalleeName string `json:"callee_name"`
	CallCount  int    `json:"call_count"`
	FilePath   string `json:"file_path"` // empty when callee is not in the symbol index (e.g. stdlib)
}

HotspotEntry is a single row returned by HotspotSymbols.

func HotspotSymbols

func HotspotSymbols(db *sql.DB, limit int) ([]HotspotEntry, error)

HotspotSymbols returns the top-limit most-called symbols ranked by inbound call count. The file path is resolved via a LEFT JOIN on the symbols table; it will be empty for callees not present in the index (external/stdlib calls).

type ImportQuery added in v0.3.0

type ImportQuery struct {
	FilePath   string // filter by the file that contains the import
	ImportPath string // filter by the imported module/namespace path
}

ImportQuery holds optional filter fields for SearchImports. Any zero-value field is ignored (no filter applied for that column).

type ImportRow added in v0.3.0

type ImportRow struct {
	FilePath   string `json:"file_path"`
	ImportPath string `json:"import_path"`
	Alias      string `json:"alias,omitempty"`
	Line       int    `json:"line"`
}

ImportRow is a single row returned by SearchImports.

func SearchImports added in v0.3.0

func SearchImports(db *sql.DB, q ImportQuery) ([]ImportRow, error)

SearchImports queries the imports table using the filters in q. All non-empty fields are AND-ed together. An empty ImportQuery returns all rows.

type ImportSite added in v0.3.0

type ImportSite struct {
	ImportPath string `json:"import_path"`
	Alias      string `json:"alias,omitempty"`
	Line       int    `json:"line"`
}

ImportSite records a single import/using statement extracted from a source file. ImportPath is the module or namespace being imported. Alias is the local name given to the import (empty = no alias). Line is 1-based, matching tree-sitter row + 1.

func ExtractImports added in v0.3.0

func ExtractImports(lang string, code []byte) ([]ImportSite, error)

ExtractImports runs the import query for the given language name against code and returns every import statement found.

Languages that do not have a registered import query return nil, nil — callers need not special-case them. The function is goroutine-safe: compiled queries are shared read-only; each call creates its own parser and cursor.

type IndexStats

type IndexStats struct {
	Unchanged  int
	Updated    int
	Added      int
	Removed    int
	Errors     int
	FileErrors []FileError
}

IndexStats summarises the outcome of a Run call.

func AutoRefresh added in v0.2.0

func AutoRefresh(root string, db *sql.DB, threshold time.Duration) (IndexStats, error)

AutoRefresh transparently re-indexes root when the index is stale. It calls ShouldRefresh first — if the index is younger than threshold it returns immediately with zero-value IndexStats and nil error, so callers pay only one SQLite point lookup. If the index is stale it delegates to Run() and returns its stats unchanged.

This is the single entry point all query commands should use instead of calling Run() directly.

func Run

func Run(root string, db *sql.DB) (IndexStats, error)

Run walks root, compares each supported file against the index stored in db, re-parses only changed or new files using a concurrent worker pool, prunes deleted entries, and persists all changes.

Per-file read/parse failures are collected in IndexStats.FileErrors and never abort the run. Fatal errors (unresolvable root, walk failure) are returned as the error return value.

type LanguageStat

type LanguageStat struct {
	Language    string `json:"language"`
	FileCount   int    `json:"file_count"`
	SymbolCount int    `json:"symbol_count"`
}

LanguageStat holds per-language file and symbol counts for RepoReport.

type MuncherFacade

type MuncherFacade struct {
	// contains filtered or unexported fields
}

MuncherFacade is the public entry point for symbol extraction. Use NewMuncher to create an instance.

func NewMuncher

func NewMuncher() *MuncherFacade

NewMuncher creates a MuncherFacade with all registered languages loaded.

func (*MuncherFacade) GetSymbol

func (m *MuncherFacade) GetSymbol(path string, code []byte, name string) (*SymbolInfo, error)

GetSymbol returns the first symbol matching name from the parsed file.

func (*MuncherFacade) GetSymbolContent

func (m *MuncherFacade) GetSymbolContent(path string, startLine, endLine int) (string, error)

GetSymbolContent reads the source lines of a symbol from disk using its StartLine and EndLine. Lines are 1-indexed.
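
The 1-indexed line arithmetic is easy to get wrong by one. The sketch below shows it on an in-memory string rather than a file on disk, and assumes endLine is inclusive; sliceLines is a hypothetical helper, not the method itself.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceLines returns lines startLine..endLine of src (1-indexed, inclusive).
func sliceLines(src string, startLine, endLine int) (string, error) {
	lines := strings.Split(src, "\n")
	if startLine < 1 || endLine < startLine || endLine > len(lines) {
		return "", fmt.Errorf("invalid line range %d-%d for %d lines", startLine, endLine, len(lines))
	}
	// Line N lives at index N-1; the upper slice bound is exclusive, so
	// endLine is used as-is to include the last requested line.
	return strings.Join(lines[startLine-1:endLine], "\n"), nil
}

func main() {
	src := "package a\n\nfunc f() {\n}\n"
	body, err := sliceLines(src, 3, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```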

func (*MuncherFacade) GetSymbols

func (m *MuncherFacade) GetSymbols(path string, code []byte) ([]SymbolInfo, error)

GetSymbols parses the given source code using tree-sitter and returns all extracted symbols for the file identified by path.

type RefQuery

type RefQuery struct {
	CallerFile string // filter by the file that contains the call
	CallerName string // filter by the enclosing symbol name
	CalleeName string // filter by the name of the called function/method
}

RefQuery holds optional filter fields for SearchRefs. Any zero-value field is ignored (no filter applied for that column).

type RefRow

type RefRow struct {
	ID         int    `json:"id"`
	CallerFile string `json:"caller_file"`
	CallerName string `json:"caller_name"`
	CalleeName string `json:"callee_name"`
	Line       int    `json:"line"`
}

RefRow is a single row returned by SearchRefs.

func SearchRefs

func SearchRefs(db *sql.DB, q RefQuery) ([]RefRow, error)

SearchRefs queries the refs table using the filters in q. All non-empty fields are AND-ed together. An empty RefQuery returns all rows.

type RepoReport

type RepoReport struct {
	RepoID      string           `json:"repo_id"`
	Root        string           `json:"root"`
	GitHead     string           `json:"git_head,omitempty"`
	IndexedAt   string           `json:"indexed_at"`
	FileCount   int              `json:"file_count"`
	SymbolCount int              `json:"symbol_count"`
	Languages   []LanguageStat   `json:"languages"`
	SymbolTypes []SymbolTypeStat `json:"symbol_types"`
}

RepoReport is the result of ReportIndex: a high-level summary of what is stored in the index for a given repository root.

func ReportIndex

func ReportIndex(db *sql.DB) (RepoReport, error)

ReportIndex returns a high-level summary of what is stored in the index. All data is derived from the existing tables — no schema changes required.

type SchemaMismatchError

type SchemaMismatchError struct {
	Stored  int // version recorded in the existing index
	Current int // version expected by this binary
}

SchemaMismatchError is returned by OpenIndex when the on-disk index was built with a different schema version than the current binary expects. The caller should instruct the user to run `mimir index --rebuild <path>`.

func (*SchemaMismatchError) Error

func (e *SchemaMismatchError) Error() string

type SearchQuery

type SearchQuery struct {
	// Name matches the symbol name exactly (case-sensitive).
	Name string

	// NameLike matches symbol names using SQL LIKE with a trailing wildcard
	// (e.g. "Foo" becomes "Foo%"). Ignored when Name is set.
	NameLike string

	// FuzzyName performs an FTS5 MATCH query over symbol names.
	// Supports prefix queries (e.g. "Exec*"), multi-token ("execute async"),
	// and phrase search ("\"execute async\"").
	// Ignored when Name is set. Takes precedence over NameLike.
	FuzzyName string

	// Parent matches the enclosing class/struct/interface name exactly.
	// Use "*" to match any non-empty parent (i.e. any method of any class).
	Parent string

	// Type filters by symbol type (e.g. Function, Method, Class).
	// Zero value means no type filter.
	Type SymbolType

	// FilePath filters results to symbols in files whose path contains this substring.
	FilePath string

	// Limit caps the number of rows returned. Zero means no limit.
	Limit int
}

SearchQuery defines optional filters for SearchSymbols. All non-zero fields are combined with AND.

Dot notation in Name/NameLike is parsed automatically:

  • "Class.method" → exact parent + exact name
  • "Class.*" → exact parent, any name
  • "*.method" → any parent (non-empty), exact name
  • "Class.meth" → exact parent + name prefix (when using NameLike)

func ParseDotNotation

func ParseDotNotation(q SearchQuery) SearchQuery

ParseDotNotation detects "Parent.Name" syntax in q.Name / q.NameLike and splits it into q.Parent + q.Name (or q.NameLike). The original field is cleared after splitting. Called automatically by SearchSymbols.

Wildcard rules:

  • "Class.*" → Parent="Class", Name="" (no name filter)
  • "*.method" → Parent="*", Name="method"
  • "Class.m" → if from NameLike: Parent="Class", NameLike="m"
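
The wildcard rules can be sketched as a small pure function. Assumptions are labeled: query is a reduced stand-in for SearchQuery, the split happens at the last dot, and any existing Parent value is overwritten. None of these details are confirmed by the documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// query keeps only the fields involved in dot notation.
type query struct {
	Name, NameLike, Parent string
}

// splitLastDot splits "Parent.Name" at the last dot (an assumption).
func splitLastDot(s string) (parent, name string, ok bool) {
	i := strings.LastIndex(s, ".")
	if i <= 0 || i == len(s)-1 {
		return "", "", false
	}
	return s[:i], s[i+1:], true
}

// parseDot applies the documented rules; NameLike is only inspected when
// Name is empty, matching "Ignored when Name is set".
func parseDot(q query) query {
	field := &q.Name
	if q.Name == "" {
		field = &q.NameLike
	}
	parent, name, ok := splitLastDot(*field)
	if !ok {
		return q
	}
	if name == "*" {
		name = "" // "Class.*" means no name filter
	}
	q.Parent = parent
	*field = name
	return q
}

func main() {
	fmt.Printf("%+v\n", parseDot(query{Name: "Class.method"}))
	fmt.Printf("%+v\n", parseDot(query{Name: "Class.*"}))
	fmt.Printf("%+v\n", parseDot(query{Name: "*.method"}))
	fmt.Printf("%+v\n", parseDot(query{NameLike: "Class.m"}))
}
```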

type SymbolInfo

type SymbolInfo struct {
	Name      string     `json:"name"`
	Type      SymbolType `json:"type"`
	StartLine int        `json:"start_line"`
	EndLine   int        `json:"end_line"`
	// Parent holds the name of the enclosing class/struct/interface for methods,
	// constructors, and properties. Empty for top-level symbols.
	Parent string `json:"parent,omitempty"`
	// BodySnippet holds a deduplicated bag of semantic tokens (identifiers,
	// string literals, comments) extracted from the symbol's full AST subtree.
	// Populated at index time; not persisted in query results (omitempty keeps
	// JSON output clean).
	BodySnippet string `json:"body_snippet,omitempty"`
}

type SymbolRow

type SymbolRow struct {
	SymbolInfo
	FilePath string `json:"file_path"`
}

SymbolRow is a query result: a SymbolInfo plus the file it lives in.

func SearchSymbols

func SearchSymbols(db *sql.DB, q SearchQuery) ([]SymbolRow, error)

SearchSymbols queries the index for symbols matching q. All non-zero fields in q are applied as additive AND conditions. Dot notation in Name/NameLike is parsed automatically. Returns an empty (non-nil) slice when no rows match.

type SymbolType

type SymbolType string

const (
	Function  SymbolType = "function"
	Class     SymbolType = "class"
	Method    SymbolType = "method"
	Interface SymbolType = "interface"
	TypeAlias SymbolType = "type_alias"
	Enum      SymbolType = "enum"
	Namespace SymbolType = "namespace"
	Variable  SymbolType = "variable"
)

type SymbolTypeStat

type SymbolTypeStat struct {
	Type  string `json:"type"`
	Count int    `json:"count"`
}

SymbolTypeStat holds per-type symbol counts for RepoReport.

Directories

Path Synopsis
languages
