Documentation ¶
Overview ¶
Package repomap generates a lightweight code structure map of a repository by scanning files and extracting top-level symbols using regex-based parsers. The result is a token-budgeted summary suitable for injection into LLM prompts.
Index ¶
- func CacheClear()
- func CacheSize() int
- func ComputeFileHash(path string) (string, error)
- func IncrementalReindex(dir string, ignore []string, indexer CodeIndexer) (added, skipped, removed int, err error)
- type CallGraph
- type ChangeSetContext
- type CoChangeAnalysis
- type CodeChunk
- type CodeIndexer
- type CodeSearchResult
- type EnhancedGoParser
- type EnhancedSymbol
- type FileCache
- type FileMap
- type FileSummary
- type FileWatcher
- type GitignoreRules
- type HierarchicalSummary
- type ImportGraph
- func (g *ImportGraph) DependenciesOf(filePath string, maxDepth int) []string
- func (g *ImportGraph) DependentsOf(filePath string, maxDepth int) []string
- func (g *ImportGraph) Edges() map[string][]string
- func (g *ImportGraph) ImpactSet(files []string, maxDepth int) []string
- func (g *ImportGraph) Reverse() map[string][]string
- type IncrementalMap
- type IndexPatterns
- type InterfaceExtraction
- type Options
- type PackageSummary
- type PredictedFile
- type RecentEdit
- type RecentEditTracker
- type RelevancePrediction
- type RepoMap
- type RerankResult
- type SemanticIndex
- type ShapleyRanker
- type ShapleyScore
- type Symbol
- type SymbolGraph
- type SymbolNode
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ComputeFileHash ¶
ComputeFileHash returns the SHA-256 hex digest of a file's contents.
func IncrementalReindex ¶
func IncrementalReindex(dir string, ignore []string, indexer CodeIndexer) (added, skipped, removed int, err error)
IncrementalReindex walks dir, chunks supported source files, and stores them via the indexer. Files whose hash matches the stored hash are skipped. Files that have been removed from disk are cleared from the index. File processing is parallelized across available CPUs. It uses IndexPatterns for include/exclude glob filtering and GitignoreRules for nested .gitignore support.
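The skip/add/remove decision described above can be sketched as a comparison of two hash maps. This is an illustrative sketch only: reindexPlan and planReindex are hypothetical names, and the real function also chunks files, applies ignore rules, and runs in parallel.

```go
package main

import (
	"fmt"
	"sort"
)

// reindexPlan is a hypothetical summary of the work a reindex pass would do.
type reindexPlan struct {
	Add    []string // new or changed files that must be chunked and stored
	Skip   []string // files whose current hash equals the stored hash
	Remove []string // indexed paths that no longer exist on disk
}

// planReindex compares stored hashes (path -> hash in the index) with
// hashes computed from the files currently on disk.
func planReindex(stored, onDisk map[string]string) reindexPlan {
	var p reindexPlan
	for path, hash := range onDisk {
		if old, ok := stored[path]; ok && old == hash {
			p.Skip = append(p.Skip, path)
		} else {
			p.Add = append(p.Add, path)
		}
	}
	for path := range stored {
		if _, ok := onDisk[path]; !ok {
			p.Remove = append(p.Remove, path)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(p.Add)
	sort.Strings(p.Skip)
	sort.Strings(p.Remove)
	return p
}

func main() {
	stored := map[string]string{"a.go": "h1", "b.go": "h2", "c.go": "h3"}
	onDisk := map[string]string{"a.go": "h1", "b.go": "h9", "d.go": "h4"}
	fmt.Printf("%+v\n", planReindex(stored, onDisk))
}
```

In terms of the CodeIndexer interface, the stored map would presumably be filled from ListIndexedPaths and GetFileHash, and each Remove entry would lead to a ClearFileChunks call.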
Types ¶
type CallGraph ¶
type CallGraph struct {
// contains filtered or unexported fields
}
CallGraph maps functions to their callers and callees using Go AST analysis. No CGO required — uses go/parser for static analysis.
func BuildCallGraph ¶
BuildCallGraph parses Go files in root and extracts call relationships.
func (*CallGraph) CalleesOf ¶
CalleesOf returns functions called by the given function (depth levels down).
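To show the kind of static analysis involved, here is a minimal sketch that extracts direct call edges from one Go source string with go/parser and go/ast. It is not the package's implementation: directCallees is a hypothetical helper, it only records calls to plain identifiers (selector calls such as pkg.F or x.M are ignored), and it does not follow calls to deeper levels.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// directCallees maps each top-level function in src to the plain
// identifiers it calls, in first-seen order and without duplicates.
// Built-ins such as len would be recorded like any other identifier.
func directCallees(src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	calls := make(map[string][]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		seen := map[string]bool{}
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			call, ok := n.(*ast.CallExpr)
			if !ok {
				return true
			}
			if id, ok := call.Fun.(*ast.Ident); ok && !seen[id.Name] {
				seen[id.Name] = true
				calls[fn.Name.Name] = append(calls[fn.Name.Name], id.Name)
			}
			return true
		})
	}
	return calls, nil
}

func main() {
	src := `package demo

func a() { b(); c(); b() }
func b() { c() }
func c() {}
`
	calls, err := directCallees(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("a calls", calls["a"])
	fmt.Println("b calls", calls["b"])
}
```

A caller map is the same data with the edges inverted, and a depth-limited query is a breadth-first walk over either map.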
type ChangeSetContext ¶
type ChangeSetContext struct {
ChangedFiles []string
ImpactedFiles []string // files affected by the changes (dependents)
DependencyFiles []string // files needed to understand the changes (imports)
TotalFiles int
}
ChangeSetContext loads only the code context relevant to current changes. Instead of loading the entire repo map, it:
1. Parses `git diff --name-only` to get changed files
2. For each changed file, finds its imports and dependents
3. Returns a focused context set that's 70-90% smaller than the full repo
Research: Change-set-aware loading typically reduces context by 70-90% compared to loading the entire repo-map.
func FromGitDiff ¶
func FromGitDiff(root string, graph *ImportGraph) (*ChangeSetContext, error)
FromGitDiff builds a ChangeSetContext from the current git working tree changes. This includes both staged and unstaged modifications.
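The first step, collecting changed file names, might look like the sketch below. The helper names are hypothetical, and the exact git invocation is an assumption: `git diff --name-only HEAD` is one way to list both staged and unstaged changes to tracked files, but the package may call git differently.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// parseNameOnly turns the output of `git diff --name-only` (one path per
// line) into a de-duplicated slice, dropping blank lines.
func parseNameOnly(out string) []string {
	var files []string
	seen := map[string]bool{}
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || seen[line] {
			continue
		}
		seen[line] = true
		files = append(files, line)
	}
	return files
}

// changedFiles runs git in root and returns the paths that differ from HEAD.
// Assumption: comparing against HEAD covers staged and unstaged edits to
// tracked files; untracked files are not reported by git diff.
func changedFiles(root string) ([]string, error) {
	out, err := exec.Command("git", "-C", root, "diff", "--name-only", "HEAD").Output()
	if err != nil {
		return nil, err
	}
	return parseNameOnly(string(out)), nil
}

func main() {
	// Canned output keeps the example runnable outside a git repository.
	sample := "cmd/main.go\ninternal/db/conn.go\n\ncmd/main.go\n"
	fmt.Println(parseNameOnly(sample))
}
```

The resulting list corresponds to ChangedFiles; the import graph is then consulted for each entry to fill ImpactedFiles and DependencyFiles.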
func FromGitDiffRange ¶
func FromGitDiffRange(root string, baseRef string, graph *ImportGraph) (*ChangeSetContext, error)
FromGitDiffRange builds a ChangeSetContext from a specific git range (e.g., "main..HEAD").
func (*ChangeSetContext) FormatContext ¶
func (c *ChangeSetContext) FormatContext(maxTokens int) string
FormatContext produces a token-efficient representation of the change set suitable for injection into the system prompt.
type CoChangeAnalysis ¶
type CoChangeAnalysis struct {
// contains filtered or unexported fields
}
CoChangeAnalysis tracks which files frequently change together in git history.
func BuildCoChangeAnalysis ¶
func BuildCoChangeAnalysis(root string, commitLimit int) (*CoChangeAnalysis, error)
BuildCoChangeAnalysis parses the last N commits to find co-change patterns.
func (*CoChangeAnalysis) RelatedFiles ¶
func (ca *CoChangeAnalysis) RelatedFiles(filePath string, topK int) []string
RelatedFiles returns files that frequently co-change with the given file, sorted by co-change frequency.
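Co-change ranking can be sketched as a simple count over commits. This is an assumption about the approach, not the package's code: relatedFiles is a hypothetical helper that takes each commit as a list of touched paths and counts how often other files appear alongside the target.

```go
package main

import (
	"fmt"
	"sort"
)

// relatedFiles counts, for every commit that touches target, the other
// files in that commit, then returns up to topK files ordered by count
// (ties broken alphabetically for stable output).
func relatedFiles(commits [][]string, target string, topK int) []string {
	counts := map[string]int{}
	for _, files := range commits {
		touched := false
		for _, f := range files {
			if f == target {
				touched = true
				break
			}
		}
		if !touched {
			continue
		}
		for _, f := range files {
			if f != target {
				counts[f]++
			}
		}
	}
	related := make([]string, 0, len(counts))
	for f := range counts {
		related = append(related, f)
	}
	sort.Slice(related, func(i, j int) bool {
		ci, cj := counts[related[i]], counts[related[j]]
		if ci != cj {
			return ci > cj
		}
		return related[i] < related[j]
	})
	if topK >= 0 && len(related) > topK {
		related = related[:topK]
	}
	return related
}

func main() {
	commits := [][]string{
		{"a.go", "b.go", "c.go"},
		{"a.go", "b.go"},
		{"a.go", "d.go"},
		{"c.go", "d.go"},
	}
	fmt.Println(relatedFiles(commits, "a.go", 2)) // prints [b.go c.go]
}
```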
type CodeChunk ¶
type CodeChunk struct {
Path string
StartLine int
EndLine int
Content string
Vector []float32 // embedding (if computed)
}
CodeChunk represents a chunk of source code for semantic search.
type CodeIndexer ¶
type CodeIndexer interface {
IndexCodeChunk(path, content, symbol, lang string, start, end, tokens int, hash string) error
SearchCode(query string, limit int) ([]CodeSearchResult, error)
GetFileHash(path string) (string, error)
ClearFileChunks(path string) error
ListIndexedPaths() ([]string, error)
}
CodeIndexer is the interface used by IncrementalReindex to store and query code chunks. The memory package's YaadBridge implements this interface.
type CodeSearchResult ¶
type CodeSearchResult struct {
Path string
StartLine int
EndLine int
Content string
Symbol string
Score float64
}
CodeSearchResult represents a code chunk returned by a search.
type EnhancedGoParser ¶
type EnhancedGoParser struct{}
EnhancedGoParser uses go/ast for accurate Go symbol extraction. Replaces regex-based parsing for Go files with zero-CGO AST parsing. This covers what tree-sitter would give us: method receivers, nested types, interface methods, embedded fields — without requiring CGO.
func (*EnhancedGoParser) ParseGoFile ¶
func (p *EnhancedGoParser) ParseGoFile(content, filePath string) []EnhancedSymbol
ParseGoFile extracts all symbols from a Go file using the standard library parser.
type EnhancedSymbol ¶
type EnhancedSymbol struct {
Name string
Kind string // function, method, type, interface, struct, field, variable, interface_method
Line int
File string
Exported bool
References []string // symbols this one references
}
EnhancedSymbol represents a richly-extracted code symbol with references.
func ParsePythonFile ¶
func ParsePythonFile(content, filePath string) []EnhancedSymbol
ParsePythonFile extracts symbols from Python using enhanced regex patterns. Handles: classes, methods (with self), decorators, nested classes.
type FileCache ¶
type FileCache struct {
Hash string `json:"hash"`
Mtime int64 `json:"mtime"`
Symbols []string `json:"symbols"`
}
FileCache holds the cached metadata for a single file.
type FileSummary ¶
type FileSummary struct {
Path string
Functions []string // exported function signatures
Types []string // exported type names
LineCount int
}
FileSummary is a level-3 summary of a single file.
type FileWatcher ¶
type FileWatcher struct {
// contains filtered or unexported fields
}
FileWatcher monitors the project directory for changes and triggers re-indexing.
func NewFileWatcher ¶
func NewFileWatcher(root string, onChange func(path string)) (*FileWatcher, error)
NewFileWatcher creates a watcher for Go/Python/TS files in the given root.
func (*FileWatcher) Start ¶
func (fw *FileWatcher) Start()
Start begins watching for file changes in a goroutine.
type GitignoreRules ¶
type GitignoreRules struct {
// contains filtered or unexported fields
}
GitignoreRules holds composed gitignore patterns from multiple levels.
func LoadGitignoreRules ¶
func LoadGitignoreRules(dir string) *GitignoreRules
LoadGitignoreRules walks from dir up to the filesystem root, loading all .gitignore files. Rules from deeper directories take precedence (are appended last). Also loads the global gitignore (~/.config/git/ignore) if it exists.
func (*GitignoreRules) ShouldIgnore ¶
func (gr *GitignoreRules) ShouldIgnore(path string) bool
ShouldIgnore checks if a path should be ignored according to gitignore rules. The path should be relative to the repository root.
type HierarchicalSummary ¶
type HierarchicalSummary struct {
Root string
Packages []PackageSummary
}
HierarchicalSummary provides 3-level code summarization:
- Level 1: Project (all packages as one-liners)
- Level 2: Package (file list with exported symbols)
- Level 3: File (function signatures, no bodies)
func BuildHierarchy ¶
func BuildHierarchy(root string) (*HierarchicalSummary, error)
BuildHierarchy scans a Go project and builds a 3-level summary.
func (*HierarchicalSummary) FormatLevel1 ¶
func (h *HierarchicalSummary) FormatLevel1(maxTokens int) string
FormatLevel1 returns the project-level summary (one line per package).
func (*HierarchicalSummary) FormatLevel2 ¶
func (h *HierarchicalSummary) FormatLevel2(pkgPath string, maxTokens int) string
FormatLevel2 returns the package-level summary (files + exported symbols).
func (*HierarchicalSummary) FormatLevel3 ¶
func (h *HierarchicalSummary) FormatLevel3(filePath string) string
FormatLevel3 returns a file-level summary (full function signatures).
type ImportGraph ¶
type ImportGraph struct {
// contains filtered or unexported fields
}
ImportGraph builds and queries file-level import/dependency relationships. When file A is identified as relevant, this finds:
- Files that A imports (its dependencies)
- Files that import A (its dependents)
This is the cheapest cross-file signal. Research shows import graph traversal reduces "undefined symbol" errors by 30-40%.
func BuildImportGraph ¶
func BuildImportGraph(root string) (*ImportGraph, error)
BuildImportGraph scans source files in the given root directory and builds the import graph. For Go, this parses import statements and maps import paths to local files. Also supports basic Python and TypeScript/JavaScript.
func (*ImportGraph) DependenciesOf ¶
func (g *ImportGraph) DependenciesOf(filePath string, maxDepth int) []string
DependenciesOf returns files that the given file imports (up to maxDepth).
func (*ImportGraph) DependentsOf ¶
func (g *ImportGraph) DependentsOf(filePath string, maxDepth int) []string
DependentsOf returns files that import the given file (up to maxDepth).
func (*ImportGraph) Edges ¶
func (g *ImportGraph) Edges() map[string][]string
Edges returns the forward edge map (file -> imports). Useful for inspection.
func (*ImportGraph) ImpactSet ¶
func (g *ImportGraph) ImpactSet(files []string, maxDepth int) []string
ImpactSet returns the union of dependencies and dependents for a set of files. This is used for change-set-aware context: "what other files matter given these changes?"
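The depth-limited queries above can be sketched as a breadth-first walk over an edge map shaped like the one Edges returns. The helpers below (reachable, reverseEdges, impactSet) are hypothetical names written for illustration; whether the real methods include or exclude the input files, and how they order results, is not stated in the documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// reachable returns every file reachable from start within maxDepth hops,
// excluding start itself. Results are sorted for stable output.
func reachable(edges map[string][]string, start string, maxDepth int) []string {
	visited := map[string]bool{start: true}
	frontier := []string{start}
	var out []string
	for depth := 0; depth < maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, f := range frontier {
			for _, n := range edges[f] {
				if !visited[n] {
					visited[n] = true
					out = append(out, n)
					next = append(next, n)
				}
			}
		}
		frontier = next
	}
	sort.Strings(out)
	return out
}

// reverseEdges inverts file -> imports into file -> dependents.
func reverseEdges(edges map[string][]string) map[string][]string {
	rev := map[string][]string{}
	for from, tos := range edges {
		for _, to := range tos {
			rev[to] = append(rev[to], from)
		}
	}
	return rev
}

// impactSet is the union of dependencies and dependents of the given files.
func impactSet(edges map[string][]string, files []string, maxDepth int) []string {
	rev := reverseEdges(edges)
	set := map[string]bool{}
	for _, f := range files {
		for _, n := range reachable(edges, f, maxDepth) {
			set[n] = true
		}
		for _, n := range reachable(rev, f, maxDepth) {
			set[n] = true
		}
	}
	out := make([]string, 0, len(set))
	for n := range set {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	edges := map[string][]string{
		"main.go":   {"server.go"},
		"server.go": {"db.go", "util.go"},
		"db.go":     {"util.go"},
	}
	fmt.Println(reachable(edges, "main.go", 2))
	fmt.Println(impactSet(edges, []string{"server.go"}, 1))
}
```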
func (*ImportGraph) Reverse ¶
func (g *ImportGraph) Reverse() map[string][]string
Reverse returns the reverse edge map (file -> dependents). Useful for inspection.
type IncrementalMap ¶
type IncrementalMap struct {
// contains filtered or unexported fields
}
IncrementalMap maintains a cached symbol index that only reprocesses changed files. It stores file hashes (SHA-256 of content) in a cache file (.hawk/repomap-cache.json). On regeneration, only files whose hash changed are re-parsed. Symbols from changed files are merged into the existing map, and symbols from deleted files are removed.
func NewIncrementalMap ¶
func NewIncrementalMap(cacheDir string) (*IncrementalMap, error)
NewIncrementalMap loads or creates a repomap cache. cacheDir is the directory where the cache file will be stored (typically ".hawk").
func (*IncrementalMap) AllSymbols ¶
func (im *IncrementalMap) AllSymbols() map[string][]string
AllSymbols returns every symbol across all cached files.
func (*IncrementalMap) Save ¶
func (im *IncrementalMap) Save() error
Save persists the cache to disk.
func (*IncrementalMap) Symbols ¶
func (im *IncrementalMap) Symbols(path string) []string
Symbols returns all cached symbols for a file.
type IndexPatterns ¶
type IndexPatterns struct {
Include []string `json:"include"` // if non-empty, ONLY files matching these are indexed
Exclude []string `json:"exclude"` // files matching these are NEVER indexed (overrides include)
}
IndexPatterns controls which files are indexed.
func DefaultIndexPatterns ¶
func DefaultIndexPatterns() IndexPatterns
DefaultIndexPatterns returns sensible defaults.
func LoadIndexPatterns ¶
func LoadIndexPatterns() IndexPatterns
LoadIndexPatterns reads from .hawk/index.json or uses defaults.
func (IndexPatterns) ShouldIndex ¶
func (p IndexPatterns) ShouldIndex(path string) bool
ShouldIndex checks if a path should be indexed based on include/exclude patterns.
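The include/exclude precedence stated in the field comments can be sketched as follows. The glob dialect is an assumption: this sketch uses path.Match from the standard library and tries each pattern against both the full path and the base name, which may differ from how the package matches. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// patterns mirrors the two IndexPatterns fields for illustration.
type patterns struct {
	Include []string
	Exclude []string
}

// matchAny reports whether any glob matches p or its base name.
// Malformed patterns are treated as non-matching.
func matchAny(globs []string, p string) bool {
	for _, g := range globs {
		if ok, _ := path.Match(g, p); ok {
			return true
		}
		if ok, _ := path.Match(g, path.Base(p)); ok {
			return true
		}
	}
	return false
}

// shouldIndex applies exclude first (it overrides include), then requires
// an include match only when the include list is non-empty.
func (ps patterns) shouldIndex(p string) bool {
	if matchAny(ps.Exclude, p) {
		return false
	}
	if len(ps.Include) == 0 {
		return true
	}
	return matchAny(ps.Include, p)
}

func main() {
	ps := patterns{Include: []string{"*.go"}, Exclude: []string{"*_test.go"}}
	for _, p := range []string{"pkg/a.go", "pkg/a_test.go", "README.md"} {
		fmt.Println(p, ps.shouldIndex(p))
	}
}
```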
type InterfaceExtraction ¶
type InterfaceExtraction struct {
Functions []string // "func Name(args) returns"
Types []string // "type Name struct/interface"
Constants []string // "const Name = ..."
Package string
}
InterfaceExtraction shows only exported signatures (no bodies). Uses ~100 tokens per file vs ~500+ for full content.
func ExtractInterface ¶
func ExtractInterface(filePath string) (*InterfaceExtraction, error)
ExtractInterface parses a Go file and returns only its exported API surface.
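A reduced version of this idea, collecting only the names of exported top-level functions and types, can be sketched with go/parser. The helper exportedAPI is hypothetical; unlike the documented function it works on a source string, skips methods and constants, and returns names rather than full signatures.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// exportedAPI returns the names of exported top-level functions and types
// declared in src. Methods (declarations with a receiver) are skipped.
func exportedAPI(src string) (funcs, types []string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, nil, err
	}
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil && d.Name.IsExported() {
				funcs = append(funcs, d.Name.Name)
			}
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue
			}
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok && ts.Name.IsExported() {
					types = append(types, ts.Name.Name)
				}
			}
		}
	}
	return funcs, types, nil
}

func main() {
	src := `package demo

type Server struct{}
type config struct{}

func New() *Server { return nil }
func helper()      {}
func (s *Server) Run() {}
`
	funcs, types, err := exportedAPI(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("funcs:", funcs)
	fmt.Println("types:", types)
}
```

Because function bodies are never printed, output built this way stays small, which is the point of the token comparison above.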
func (*InterfaceExtraction) Format ¶
func (ie *InterfaceExtraction) Format() string
Format returns the interface as a compact string.
func (*InterfaceExtraction) TokenEstimate ¶
func (ie *InterfaceExtraction) TokenEstimate() int
TokenEstimate returns approximate token count for this interface.
type PackageSummary ¶
type PackageSummary struct {
Path string
Name string
Files []FileSummary
Symbols int
}
PackageSummary is a level-2 summary of a Go package.
type PredictedFile ¶
type PredictedFile struct {
Path string
Score float64
Reason string // why it was predicted relevant
}
PredictedFile is a file predicted to be relevant to the current task.
type RecentEdit ¶
RecentEdit tracks a file that was recently modified.
type RecentEditTracker ¶
type RecentEditTracker struct {
// contains filtered or unexported fields
}
RecentEditTracker maintains a list of recently edited files.
func NewRecentEditTracker ¶
func NewRecentEditTracker(max int) *RecentEditTracker
NewRecentEditTracker creates a tracker with a max capacity.
func (*RecentEditTracker) Recent ¶
func (t *RecentEditTracker) Recent(within time.Duration) []RecentEdit
Recent returns edits within the given duration.
func (*RecentEditTracker) Record ¶
func (t *RecentEditTracker) Record(path string)
Record adds a file edit event.
type RelevancePrediction ¶
type RelevancePrediction struct {
Files []PredictedFile
}
RelevancePrediction holds predicted files with their relevance scores.
func PredictRelevantFiles ¶
func PredictRelevantFiles(prompt string, recentEdits []RecentEdit, graph *ImportGraph, symbols map[string]string) *RelevancePrediction
PredictRelevantFiles predicts which files are likely relevant given:
- The user's prompt (keyword matching against repo map symbols)
- Recently edited files (locality heuristic)
- Import graph relationships
- Co-change history
type RepoMap ¶
RepoMap is the full repository map result.
type RerankResult ¶
type RerankResult struct {
Chunk CodeSearchResult
Score float64
}
RerankResult pairs a search result with a re-ranking score.
func Rerank ¶
func Rerank(query string, candidates []CodeSearchResult, topK int) []RerankResult
Rerank re-scores candidates using BM25 (k1=1.2, b=0.75) against the query and returns the top-K results sorted by descending score.
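For readers unfamiliar with BM25, here is a compact sketch using the same k1 and b values. It is not the package's code: tokenization (lowercased whitespace splitting), the IDF variant (ln(1 + (N - n + 0.5)/(n + 0.5))), and all names are assumptions chosen for illustration, and plain strings stand in for CodeSearchResult.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

const (
	bm25K1 = 1.2  // term-frequency saturation
	bm25B  = 0.75 // document-length normalization strength
)

type scored struct {
	Doc   string
	Score float64
}

func tokenize(s string) []string { return strings.Fields(strings.ToLower(s)) }

// bm25Rank scores every document against the query and returns the topK
// highest-scoring ones in descending order.
func bm25Rank(query string, docs []string, topK int) []scored {
	if len(docs) == 0 {
		return nil
	}
	n := float64(len(docs))
	tokens := make([][]string, len(docs))
	df := map[string]int{} // number of documents containing each term
	total := 0
	for i, d := range docs {
		tokens[i] = tokenize(d)
		total += len(tokens[i])
		seen := map[string]bool{}
		for _, t := range tokens[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	avgLen := float64(total) / n

	results := make([]scored, len(docs))
	for i, toks := range tokens {
		tf := map[string]int{}
		for _, t := range toks {
			tf[t]++
		}
		var score float64
		for _, q := range tokenize(query) {
			f := float64(tf[q])
			if f == 0 {
				continue
			}
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			norm := 1 - bm25B + bm25B*float64(len(toks))/avgLen
			score += idf * f * (bm25K1 + 1) / (f + bm25K1*norm)
		}
		results[i] = scored{Doc: docs[i], Score: score}
	}
	sort.SliceStable(results, func(i, j int) bool { return results[i].Score > results[j].Score })
	if topK >= 0 && topK < len(results) {
		results = results[:topK]
	}
	return results
}

func main() {
	docs := []string{
		"func parse config file",
		"render html template",
		"parse parse json config",
	}
	for _, r := range bm25Rank("parse json", docs, 2) {
		fmt.Printf("%.3f  %s\n", r.Score, r.Doc)
	}
}
```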
type SemanticIndex ¶
type SemanticIndex struct {
// contains filtered or unexported fields
}
SemanticIndex holds chunked source files and supports TF-IDF search.
func BuildSemanticIndex ¶
func BuildSemanticIndex(dir string, ignore []string, maxFiles int) (*SemanticIndex, error)
BuildSemanticIndex scans dir, chunks files into ~40-line blocks, and builds an index.
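Fixed-size line chunking can be sketched as below. The function name and the chunk type are hypothetical, and the real chunker may pick boundaries differently (the documentation only says blocks are roughly 40 lines).

```go
package main

import (
	"fmt"
	"strings"
)

// chunk records a 1-based inclusive line range and its text, mirroring the
// StartLine, EndLine, and Content fields of CodeChunk.
type chunk struct {
	StartLine int
	EndLine   int
	Content   string
}

// chunkLines splits content into consecutive blocks of at most size lines.
// A single trailing newline is ignored so it does not create an empty line.
func chunkLines(content string, size int) []chunk {
	content = strings.TrimSuffix(content, "\n")
	if size <= 0 || content == "" {
		return nil
	}
	lines := strings.Split(content, "\n")
	var chunks []chunk
	for start := 0; start < len(lines); start += size {
		end := start + size
		if end > len(lines) {
			end = len(lines)
		}
		chunks = append(chunks, chunk{
			StartLine: start + 1,
			EndLine:   end,
			Content:   strings.Join(lines[start:end], "\n"),
		})
	}
	return chunks
}

func main() {
	src := strings.Repeat("x := 1\n", 100)
	for _, c := range chunkLines(src, 40) {
		fmt.Printf("lines %d-%d\n", c.StartLine, c.EndLine)
	}
}
```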
func LoadSemanticIndex ¶
func LoadSemanticIndex(path string) (*SemanticIndex, error)
LoadSemanticIndex decodes a previously saved index from a gob file.
func (*SemanticIndex) Save ¶
func (idx *SemanticIndex) Save(path string) error
Save encodes the index to a file using gob.
func (*SemanticIndex) Search ¶
func (idx *SemanticIndex) Search(query string, topK int) []CodeChunk
Search performs TF-IDF based search over the index, returning the top-K chunks.
func (*SemanticIndex) Size ¶
func (idx *SemanticIndex) Size() int
Size returns the number of chunks in the index.
type ShapleyRanker ¶
type ShapleyRanker struct {
// contains filtered or unexported fields
}
ShapleyRanker scores code chunks by their marginal contribution to generation quality using an approximate Shapley value approach.
func NewShapleyRanker ¶
func NewShapleyRanker(chunks []CodeChunk) *ShapleyRanker
NewShapleyRanker creates a ranker from the given code chunks.
func (*ShapleyRanker) ComputeScores ¶
func (sr *ShapleyRanker) ComputeScores(relevantPaths []string, query string) []ShapleyScore
ComputeScores calculates approximate Shapley values for each chunk.
Score = relevance_to_query * centrality_in_graph * recency_bonus - redundancy_penalty
- relevance: keyword overlap with query
- centrality: how many other chunks reference symbols in this chunk
- recency: files in relevantPaths get a boost
- redundancy: chunks similar to already-scored chunks get penalized
func (*ShapleyRanker) Format ¶
func (sr *ShapleyRanker) Format(chunks []CodeChunk) string
Format renders selected chunks as a text block for prompt injection.
func (*ShapleyRanker) SelectOptimalContext ¶
func (sr *ShapleyRanker) SelectOptimalContext(query string, tokenBudget int) []CodeChunk
SelectOptimalContext greedily selects chunks that maximize information within the token budget, recomputing redundancy penalties after each addition.
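The greedy loop with a recomputed redundancy penalty can be sketched as follows. Everything here is an assumption made for illustration: the candidate type, the word-overlap measure of redundancy, the penalty weight, and the precomputed token counts are hypothetical and are not taken from the package.

```go
package main

import "fmt"

// candidate is a hypothetical chunk summary: a base score, the words it
// contains, and its estimated token cost.
type candidate struct {
	Name   string
	Base   float64
	Words  []string
	Tokens int
}

// overlap is the fraction of words already covered by selected chunks.
func overlap(words []string, seen map[string]bool) float64 {
	if len(words) == 0 {
		return 0
	}
	hits := 0
	for _, w := range words {
		if seen[w] {
			hits++
		}
	}
	return float64(hits) / float64(len(words))
}

// selectGreedy repeatedly picks the affordable candidate with the best
// penalized score, re-evaluating redundancy after every pick.
func selectGreedy(cands []candidate, budget int, penalty float64) []string {
	seen := map[string]bool{}
	used := make([]bool, len(cands))
	var picked []string
	for {
		best, bestScore := -1, 0.0
		for i, c := range cands {
			if used[i] || c.Tokens > budget {
				continue
			}
			score := c.Base - penalty*overlap(c.Words, seen)
			if best == -1 || score > bestScore {
				best, bestScore = i, score
			}
		}
		if best == -1 || bestScore <= 0 {
			break // nothing affordable, or nothing worth adding
		}
		used[best] = true
		budget -= cands[best].Tokens
		picked = append(picked, cands[best].Name)
		for _, w := range cands[best].Words {
			seen[w] = true
		}
	}
	return picked
}

func main() {
	cands := []candidate{
		{Name: "A", Base: 1.0, Words: []string{"parse", "config"}, Tokens: 40},
		{Name: "B", Base: 0.9, Words: []string{"parse", "config"}, Tokens: 40},
		{Name: "C", Base: 0.6, Words: []string{"render", "html"}, Tokens: 40},
	}
	fmt.Println(selectGreedy(cands, 80, 0.5)) // prints [A C]
}
```

In the run above, B starts with a higher base score than C but duplicates A, so after A is picked its penalized score drops to 0.4 and C is chosen for the remaining budget.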
type ShapleyScore ¶
type ShapleyScore struct {
Path string
StartLine int
EndLine int
Score float64 // higher = more helpful in context
Content string
}
ShapleyScore represents the computed marginal contribution of a code chunk.
type SymbolGraph ¶
type SymbolGraph struct {
// contains filtered or unexported fields
}
SymbolGraph is a directed graph of symbol references used for PageRank computation over a codebase.
func BuildSymbolGraph ¶
func BuildSymbolGraph(dir string, opts Options) (*SymbolGraph, error)
BuildSymbolGraph scans the directory, extracts symbols using the existing repomap parsers, then builds a directed graph by grepping for references.
func (*SymbolGraph) ComputePageRank ¶
func (sg *SymbolGraph) ComputePageRank(iterations int, damping float64)
ComputePageRank runs the standard PageRank algorithm on the symbol graph.
rank[i] = (1-d) + d * sum(rank[j]/outlinks[j]) for all j->i
Default: iterations=20, damping=0.85.
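The update rule above can be written out directly. This is a minimal sketch under stated assumptions, not the package's code: the graph is a plain map from a symbol to the symbols it references, every node starts at rank 1, and nodes without outlinks simply contribute nothing. It follows the non-normalized form given in the formula, with (1-d) rather than (1-d)/N.

```go
package main

import "fmt"

// pageRank applies rank[i] = (1-d) + d * sum(rank[j]/outlinks[j]) over all
// edges j->i for the given number of iterations.
func pageRank(edges map[string][]string, iterations int, d float64) map[string]float64 {
	// Collect every node, including ones that only appear as targets.
	nodes := map[string]bool{}
	for from, tos := range edges {
		nodes[from] = true
		for _, to := range tos {
			nodes[to] = true
		}
	}
	rank := make(map[string]float64, len(nodes))
	for n := range nodes {
		rank[n] = 1.0
	}
	for it := 0; it < iterations; it++ {
		next := make(map[string]float64, len(nodes))
		for n := range nodes {
			next[n] = 1 - d
		}
		for from, tos := range edges {
			if len(tos) == 0 {
				continue
			}
			share := d * rank[from] / float64(len(tos))
			for _, to := range tos {
				next[to] += share
			}
		}
		rank = next
	}
	return rank
}

func main() {
	// Two symbols both reference a third, so the third should rank highest.
	edges := map[string][]string{
		"handler": {"parse"},
		"cli":     {"parse"},
		"parse":   nil,
	}
	for name, r := range pageRank(edges, 20, 0.85) {
		fmt.Printf("%-8s %.3f\n", name, r)
	}
}
```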
func (*SymbolGraph) FormatMap ¶
func (sg *SymbolGraph) FormatMap(maxTokens int) string
FormatMap renders the ranked symbols as a repo map string, highest-rank first, stopping when the estimated token budget is reached.
func (*SymbolGraph) TopSymbols ¶
func (sg *SymbolGraph) TopSymbols(n int) []SymbolNode
TopSymbols returns the top-N symbols ordered by rank (descending).