repomap

package
v0.1.0
Published: May 12, 2026 License: MIT Imports: 22 Imported by: 0

Documentation

Overview

Package repomap generates a lightweight code structure map of a repository by scanning files and extracting top-level symbols using regex-based parsers. The result is a token-budgeted summary suitable for injection into LLM prompts.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CacheClear

func CacheClear()

CacheClear removes all entries from the symbol cache.

func CacheSize

func CacheSize() int

CacheSize returns the number of entries in the cache.

func ComputeFileHash

func ComputeFileHash(path string) (string, error)

ComputeFileHash returns the SHA-256 hex digest of a file's contents.

func IncrementalReindex

func IncrementalReindex(dir string, ignore []string, indexer CodeIndexer) (added, skipped, removed int, err error)

IncrementalReindex walks dir, chunks supported source files, and stores them via the indexer. Files whose hash matches the stored hash are skipped. Files that have been removed from disk are cleared from the index. File processing is parallelized across available CPUs. Uses IndexPatterns for include/exclude glob filtering and GitignoreRules for nested .gitignore support.

Types

type CallGraph

type CallGraph struct {
	// contains filtered or unexported fields
}

CallGraph maps functions to their callers and callees using Go AST analysis. No CGO is required; call relationships are extracted statically with go/parser.

func BuildCallGraph

func BuildCallGraph(root string) (*CallGraph, error)

BuildCallGraph parses Go files in root and extracts call relationships.
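The package's traversal internals are unexported, but the go/parser approach it describes can be sketched: walk each function body and record plain identifier calls (method calls and qualified calls are skipped in this reduced version).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// callsIn parses Go source and returns, per function, the names of
// functions it calls (identifier calls only in this sketch).
func callsIn(src string) map[string][]string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return nil
	}
	out := map[string][]string{}
	for _, d := range file.Decls {
		fn, ok := d.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			if call, ok := n.(*ast.CallExpr); ok {
				if id, ok := call.Fun.(*ast.Ident); ok {
					out[fn.Name.Name] = append(out[fn.Name.Name], id.Name)
				}
			}
			return true
		})
	}
	return out
}

func main() {
	src := "package p\nfunc a() { b(); c() }\nfunc b() {}\nfunc c() {}"
	fmt.Println(callsIn(src)["a"])
}
```

Inverting this forward map yields the caller relation that CallersOf-style queries need.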

func (*CallGraph) CalleesOf

func (cg *CallGraph) CalleesOf(funcName string, maxDepth int) []string

CalleesOf returns functions called by the given function (depth levels down).

func (*CallGraph) CallersOf

func (cg *CallGraph) CallersOf(funcName string, maxDepth int) []string

CallersOf returns functions that call the given function (depth levels up).

func (*CallGraph) Neighborhood

func (cg *CallGraph) Neighborhood(funcName string, depth int) []string

Neighborhood returns callers + callees within depth.

type ChangeSetContext

type ChangeSetContext struct {
	ChangedFiles    []string
	ImpactedFiles   []string // files affected by the changes (dependents)
	DependencyFiles []string // files needed to understand the changes (imports)
	TotalFiles      int
}

ChangeSetContext loads only the code context relevant to the current changes. Instead of loading the entire repo map, it:

 1. Parses `git diff --name-only` to get the changed files
 2. For each changed file, finds its imports and dependents
 3. Returns a focused context set that is 70-90% smaller than the full repo

Research: Change-set-aware loading typically reduces context by 70-90% compared to loading the entire repo-map.
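Step 1 of that pipeline is plain text processing; a minimal sketch (`parseNameOnly` is a hypothetical helper name, not part of the package API):

```go
package main

import (
	"fmt"
	"strings"
)

// parseNameOnly splits `git diff --name-only` output into changed paths,
// dropping blank lines. The impact set is then derived from the import
// graph in a later step.
func parseNameOnly(out string) []string {
	var files []string
	for _, line := range strings.Split(out, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			files = append(files, line)
		}
	}
	return files
}

func main() {
	out := "repomap/cache.go\nrepomap/graph.go\n\n"
	fmt.Println(parseNameOnly(out))
}
```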

func FromGitDiff

func FromGitDiff(root string, graph *ImportGraph) (*ChangeSetContext, error)

FromGitDiff builds a ChangeSetContext from the current git working tree changes. This includes both staged and unstaged modifications.

func FromGitDiffRange

func FromGitDiffRange(root string, baseRef string, graph *ImportGraph) (*ChangeSetContext, error)

FromGitDiffRange builds a ChangeSetContext from a specific git range (e.g., "main..HEAD").

func (*ChangeSetContext) FormatContext

func (c *ChangeSetContext) FormatContext(maxTokens int) string

FormatContext produces a token-efficient representation of the change set suitable for injection into the system prompt.

type CoChangeAnalysis

type CoChangeAnalysis struct {
	// contains filtered or unexported fields
}

CoChangeAnalysis tracks which files frequently change together in git history.

func BuildCoChangeAnalysis

func BuildCoChangeAnalysis(root string, commitLimit int) (*CoChangeAnalysis, error)

BuildCoChangeAnalysis parses the last N commits to find co-change patterns.

func (*CoChangeAnalysis) RelatedFiles

func (ca *CoChangeAnalysis) RelatedFiles(filePath string, topK int) []string

RelatedFiles returns files that frequently co-change with the given file, sorted by co-change frequency.

type CodeChunk

type CodeChunk struct {
	Path      string
	StartLine int
	EndLine   int
	Content   string
	Vector    []float32 // embedding (if computed)
}

CodeChunk represents a chunk of source code for semantic search.

type CodeIndexer

type CodeIndexer interface {
	IndexCodeChunk(path, content, symbol, lang string, start, end, tokens int, hash string) error
	SearchCode(query string, limit int) ([]CodeSearchResult, error)
	GetFileHash(path string) (string, error)
	ClearFileChunks(path string) error
	ListIndexedPaths() ([]string, error)
}

CodeIndexer is the interface used by IncrementalReindex to store and query code chunks. The memory package's YaadBridge implements this interface.

type CodeSearchResult

type CodeSearchResult struct {
	Path      string
	StartLine int
	EndLine   int
	Content   string
	Symbol    string
	Score     float64
}

CodeSearchResult represents a code chunk returned by a search.

type EnhancedGoParser

type EnhancedGoParser struct{}

EnhancedGoParser uses go/ast for accurate Go symbol extraction, replacing regex-based parsing of Go files with zero-CGO AST parsing. It covers what tree-sitter would provide (method receivers, nested types, interface methods, embedded fields) without requiring CGO.

func (*EnhancedGoParser) ParseGoFile

func (p *EnhancedGoParser) ParseGoFile(content, filePath string) []EnhancedSymbol

ParseGoFile extracts all symbols from a Go file using the standard library parser.
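A reduced sketch of the AST-backed approach, covering only functions, methods, and type declarations (the real parser also reports fields, interface methods, and references; `symbolsIn` and `sym` are illustrative names):

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

type sym struct {
	Name, Kind string
	Exported   bool
}

// symbolsIn extracts top-level funcs, methods, and type declarations;
// a method is distinguished from a function by its receiver.
func symbolsIn(src string) []sym {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return nil
	}
	var out []sym
	for _, d := range f.Decls {
		switch d := d.(type) {
		case *ast.FuncDecl:
			kind := "function"
			if d.Recv != nil {
				kind = "method"
			}
			out = append(out, sym{d.Name.Name, kind, d.Name.IsExported()})
		case *ast.GenDecl:
			for _, s := range d.Specs {
				if ts, ok := s.(*ast.TypeSpec); ok {
					out = append(out, sym{ts.Name.Name, "type", ts.Name.IsExported()})
				}
			}
		}
	}
	return out
}

func main() {
	src := "package p\ntype T struct{}\nfunc (t T) M() {}\nfunc helper() {}"
	for _, s := range symbolsIn(src) {
		fmt.Println(s.Name, s.Kind, s.Exported)
	}
}
```

ast.Ident.IsExported gives the Exported flag for free, something regex parsing has to approximate with case checks.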

type EnhancedSymbol

type EnhancedSymbol struct {
	Name       string
	Kind       string // function, method, type, interface, struct, field, variable, interface_method
	Line       int
	File       string
	Exported   bool
	References []string // symbols this one references
}

EnhancedSymbol represents a richly-extracted code symbol with references.

func ParsePythonFile

func ParsePythonFile(content, filePath string) []EnhancedSymbol

ParsePythonFile extracts symbols from Python using enhanced regex patterns. Handles: classes, methods (with self), decorators, nested classes.
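The core of such a parser is a multiline pattern over `class`/`def` lines; a reduced sketch (the real patterns also handle decorators and `self` methods):

```go
package main

import (
	"fmt"
	"regexp"
)

// pyDefs captures the indent (a nesting cue) and the name of each
// class or def line.
var pyDefs = regexp.MustCompile(`(?m)^(\s*)(?:class|def)\s+(\w+)`)

func main() {
	src := "class Foo:\n    def bar(self):\n        pass\ndef baz():\n    pass\n"
	for _, m := range pyDefs.FindAllStringSubmatch(src, -1) {
		fmt.Printf("%s indent=%d\n", m[2], len(m[1]))
	}
}
```

The captured indent is what lets a regex parser attribute `bar` to `Foo` without a real Python AST.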

type FileCache

type FileCache struct {
	Hash    string   `json:"hash"`
	Mtime   int64    `json:"mtime"`
	Symbols []string `json:"symbols"`
}

FileCache holds the cached metadata for a single file.

type FileMap

type FileMap struct {
	Path    string
	Symbols []Symbol
}

FileMap holds the extracted symbols for a single file.

type FileSummary

type FileSummary struct {
	Path      string
	Functions []string // exported function signatures
	Types     []string // exported type names
	LineCount int
}

FileSummary is a level-3 summary of a single file.

type FileWatcher

type FileWatcher struct {
	// contains filtered or unexported fields
}

FileWatcher monitors the project directory for changes and triggers re-indexing.

func NewFileWatcher

func NewFileWatcher(root string, onChange func(path string)) (*FileWatcher, error)

NewFileWatcher creates a watcher for Go/Python/TS files in the given root.

func (*FileWatcher) Start

func (fw *FileWatcher) Start()

Start begins watching for file changes in a goroutine.

func (*FileWatcher) Stop

func (fw *FileWatcher) Stop()

Stop terminates the watcher.

type GitignoreRules

type GitignoreRules struct {
	// contains filtered or unexported fields
}

GitignoreRules holds composed gitignore patterns from multiple levels.

func LoadGitignoreRules

func LoadGitignoreRules(dir string) *GitignoreRules

LoadGitignoreRules walks from dir up to the filesystem root, loading all .gitignore files. Rules from deeper directories take precedence (are appended last). Also loads the global gitignore (~/.config/git/ignore) if it exists.
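The precedence rule ("appended last wins") can be sketched with a last-match-wins loop; this is an assumption-laden simplification (`shouldIgnore` is a hypothetical helper, and real gitignore syntax, with directory suffixes and `**`, is richer than filepath.Match):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// shouldIgnore applies patterns in order with the LAST match winning;
// a leading "!" re-includes a previously ignored path. This models
// deeper .gitignore rules being appended after shallower ones.
func shouldIgnore(patterns []string, path string) bool {
	ignored := false
	for _, p := range patterns {
		negate := strings.HasPrefix(p, "!")
		p = strings.TrimPrefix(p, "!")
		if ok, _ := filepath.Match(p, filepath.Base(path)); ok {
			ignored = !negate
		}
	}
	return ignored
}

func main() {
	rules := []string{"*.log", "!keep.log"} // root rules, then deeper rules
	fmt.Println(shouldIgnore(rules, "build/x.log"), shouldIgnore(rules, "build/keep.log"))
}
```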

func (*GitignoreRules) ShouldIgnore

func (gr *GitignoreRules) ShouldIgnore(path string) bool

ShouldIgnore checks if a path should be ignored according to gitignore rules. The path should be relative to the repository root.

type HierarchicalSummary

type HierarchicalSummary struct {
	Root     string
	Packages []PackageSummary
}

HierarchicalSummary provides 3-level code summarization:

Level 1: Project (all packages as one-liners)
Level 2: Package (file list with exported symbols)
Level 3: File (function signatures, no bodies)

func BuildHierarchy

func BuildHierarchy(root string) (*HierarchicalSummary, error)

BuildHierarchy scans a Go project and builds a 3-level summary.

func (*HierarchicalSummary) FormatLevel1

func (h *HierarchicalSummary) FormatLevel1(maxTokens int) string

FormatLevel1 returns the project-level summary (one line per package).

func (*HierarchicalSummary) FormatLevel2

func (h *HierarchicalSummary) FormatLevel2(pkgPath string, maxTokens int) string

FormatLevel2 returns the package-level summary (files + exported symbols).

func (*HierarchicalSummary) FormatLevel3

func (h *HierarchicalSummary) FormatLevel3(filePath string) string

FormatLevel3 returns a file-level summary (full function signatures).

type ImportGraph

type ImportGraph struct {
	// contains filtered or unexported fields
}

ImportGraph builds and queries file-level import/dependency relationships. When file A is identified as relevant, this finds:

  • Files that A imports (its dependencies)
  • Files that import A (its dependents)

This is the cheapest cross-file signal. Research shows that import-graph traversal reduces "undefined symbol" errors by 30-40%.
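Both query directions reduce to the same bounded breadth-first walk over an edge map; a minimal sketch of that traversal (the graph's internal representation is unexported):

```go
package main

import "fmt"

// reach walks an edge map breadth-first up to maxDepth. Running it on
// the forward map gives DependenciesOf-style results; on the reverse
// map, DependentsOf-style results.
func reach(edges map[string][]string, start string, maxDepth int) []string {
	seen := map[string]bool{start: true}
	frontier := []string{start}
	var out []string
	for d := 0; d < maxDepth && len(frontier) > 0; d++ {
		var next []string
		for _, f := range frontier {
			for _, n := range edges[f] {
				if !seen[n] {
					seen[n] = true
					out = append(out, n)
					next = append(next, n)
				}
			}
		}
		frontier = next
	}
	return out
}

func main() {
	imports := map[string][]string{
		"cmd/main.go":    {"repomap/map.go"},
		"repomap/map.go": {"repomap/parse.go"},
	}
	fmt.Println(reach(imports, "cmd/main.go", 2))
}
```

The `seen` set makes the walk safe on cyclic import graphs.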

func BuildImportGraph

func BuildImportGraph(root string) (*ImportGraph, error)

BuildImportGraph scans source files in the given root directory and builds the import graph. For Go, this parses import statements and maps import paths to local files. Also supports basic Python and TypeScript/JavaScript.

func (*ImportGraph) DependenciesOf

func (g *ImportGraph) DependenciesOf(filePath string, maxDepth int) []string

DependenciesOf returns files that the given file imports (up to maxDepth).

func (*ImportGraph) DependentsOf

func (g *ImportGraph) DependentsOf(filePath string, maxDepth int) []string

DependentsOf returns files that import the given file (up to maxDepth).

func (*ImportGraph) Edges

func (g *ImportGraph) Edges() map[string][]string

Edges returns the forward edge map (file -> imports). Useful for inspection.

func (*ImportGraph) ImpactSet

func (g *ImportGraph) ImpactSet(files []string, maxDepth int) []string

ImpactSet returns the union of dependencies and dependents for a set of files. This is used for change-set-aware context: "what other files matter given these changes?"

func (*ImportGraph) Reverse

func (g *ImportGraph) Reverse() map[string][]string

Reverse returns the reverse edge map (file -> dependents). Useful for inspection.

type IncrementalMap

type IncrementalMap struct {
	// contains filtered or unexported fields
}

IncrementalMap maintains a cached symbol index that only reprocesses changed files. It stores file hashes (SHA-256 of content) in a cache file (.hawk/repomap-cache.json). On regeneration, only files whose hash changed are re-parsed. Symbols from changed files are merged into the existing map, and symbols from deleted files are removed.

func NewIncrementalMap

func NewIncrementalMap(cacheDir string) (*IncrementalMap, error)

NewIncrementalMap loads or creates a repomap cache. cacheDir is the directory where the cache file will be stored (typically ".hawk").

func (*IncrementalMap) AllSymbols

func (im *IncrementalMap) AllSymbols() map[string][]string

AllSymbols returns every symbol across all cached files.

func (*IncrementalMap) Save

func (im *IncrementalMap) Save() error

Save persists the cache to disk.

func (*IncrementalMap) Symbols

func (im *IncrementalMap) Symbols(path string) []string

Symbols returns all cached symbols for a file.

func (*IncrementalMap) Update

func (im *IncrementalMap) Update(rootDir string) (changed []string, err error)

Update scans the directory and only reprocesses changed files. Returns the list of files that were re-indexed. Uses fast change detection: checks mtime first, only hashes if mtime differs.

type IndexPatterns

type IndexPatterns struct {
	Include []string `json:"include"` // if non-empty, ONLY files matching these are indexed
	Exclude []string `json:"exclude"` // files matching these are NEVER indexed (overrides include)
}

IndexPatterns controls which files are indexed.

func DefaultIndexPatterns

func DefaultIndexPatterns() IndexPatterns

DefaultIndexPatterns returns sensible defaults.

func LoadIndexPatterns

func LoadIndexPatterns() IndexPatterns

LoadIndexPatterns reads from .hawk/index.json or uses defaults.

func (IndexPatterns) ShouldIndex

func (p IndexPatterns) ShouldIndex(path string) bool

ShouldIndex checks if a path should be indexed based on include/exclude patterns.
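The documented precedence (exclude always wins; a non-empty include list restricts) can be sketched as below. Matching on the base name via filepath.Match is an assumption about the real glob semantics:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// shouldIndex checks exclude patterns first (they override include),
// then requires an include match only when the include list is set.
func shouldIndex(include, exclude []string, path string) bool {
	base := filepath.Base(path)
	for _, p := range exclude {
		if ok, _ := filepath.Match(p, base); ok {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, p := range include {
		if ok, _ := filepath.Match(p, base); ok {
			return true
		}
	}
	return false
}

func main() {
	inc, exc := []string{"*.go"}, []string{"*_test.go"}
	fmt.Println(shouldIndex(inc, exc, "a/map.go"), shouldIndex(inc, exc, "a/map_test.go"))
}
```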

type InterfaceExtraction

type InterfaceExtraction struct {
	Functions []string // "func Name(args) returns"
	Types     []string // "type Name struct/interface"
	Constants []string // "const Name = ..."
	Package   string
}

InterfaceExtraction shows only exported signatures (no bodies). Uses ~100 tokens per file vs ~500+ for full content.

func ExtractInterface

func ExtractInterface(filePath string) (*InterfaceExtraction, error)

ExtractInterface parses a Go file and returns only its exported API surface.

func (*InterfaceExtraction) Format

func (ie *InterfaceExtraction) Format() string

Format returns the interface as a compact string.

func (*InterfaceExtraction) TokenEstimate

func (ie *InterfaceExtraction) TokenEstimate() int

TokenEstimate returns approximate token count for this interface.

type Options

type Options struct {
	MaxFiles       int
	MaxTokens      int
	IgnorePatterns []string
}

Options configures repo map generation.

type PackageSummary

type PackageSummary struct {
	Path    string
	Name    string
	Files   []FileSummary
	Symbols int
}

PackageSummary is a level-2 summary of a Go package.

type PredictedFile

type PredictedFile struct {
	Path   string
	Score  float64
	Reason string // why it was predicted relevant
}

PredictedFile is a file predicted to be relevant to the current task.

type RecentEdit

type RecentEdit struct {
	Path string
	At   time.Time
}

RecentEdit tracks a file that was recently modified.

type RecentEditTracker

type RecentEditTracker struct {
	// contains filtered or unexported fields
}

RecentEditTracker maintains a list of recently edited files.

func NewRecentEditTracker

func NewRecentEditTracker(max int) *RecentEditTracker

NewRecentEditTracker creates a tracker with a max capacity.

func (*RecentEditTracker) Recent

func (t *RecentEditTracker) Recent(within time.Duration) []RecentEdit

Recent returns edits within the given duration.

func (*RecentEditTracker) Record

func (t *RecentEditTracker) Record(path string)

Record adds a file edit event.

type RelevancePrediction

type RelevancePrediction struct {
	Files []PredictedFile
}

RelevancePrediction holds predicted files with their relevance scores.

func PredictRelevantFiles

func PredictRelevantFiles(prompt string, recentEdits []RecentEdit, graph *ImportGraph, symbols map[string]string) *RelevancePrediction

PredictRelevantFiles predicts which files are likely relevant given:

  • The user's prompt (keyword matching against repo map symbols)
  • Recently edited files (locality heuristic)
  • Import graph relationships
  • Co-change history

type RepoMap

type RepoMap struct {
	Files    []FileMap
	TokenEst int
}

RepoMap is the full repository map result.

func Generate

func Generate(dir string, opts Options) (*RepoMap, error)

Generate scans dir and produces a RepoMap with symbols from supported files.

func (*RepoMap) Format

func (rm *RepoMap) Format(maxTokens int) string

Format renders the repo map as a text block, truncated to fit maxTokens.

type RerankResult

type RerankResult struct {
	Chunk CodeSearchResult
	Score float64
}

RerankResult pairs a search result with a re-ranking score.

func Rerank

func Rerank(query string, candidates []CodeSearchResult, topK int) []RerankResult

Rerank re-scores candidates using BM25 (k1=1.2, b=0.75) against the query and returns the top-K results sorted by descending score.
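The named parameters (k1=1.2, b=0.75) plug into the standard BM25 term formula; a minimal sketch over whitespace-tokenized documents (the package's tokenization may differ):

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// bm25 scores docs[i] against the query with k1=1.2, b=0.75, using the
// standard idf form log((N-n+0.5)/(n+0.5)+1).
func bm25(query string, docs []string, i int) float64 {
	const k1, b = 1.2, 0.75
	var totalLen float64
	for _, d := range docs {
		totalLen += float64(len(strings.Fields(d)))
	}
	avgLen := totalLen / float64(len(docs))
	doc := strings.Fields(docs[i])
	dl := float64(len(doc))
	score := 0.0
	for _, term := range strings.Fields(query) {
		tf := 0.0
		for _, w := range doc {
			if w == term {
				tf++
			}
		}
		n := 0.0 // document frequency of the term
		for _, d := range docs {
			if strings.Contains(" "+d+" ", " "+term+" ") {
				n++
			}
		}
		idf := math.Log((float64(len(docs))-n+0.5)/(n+0.5) + 1)
		score += idf * tf * (k1 + 1) / (tf + k1*(1-b+b*dl/avgLen))
	}
	return score
}

func main() {
	docs := []string{"parse go files", "render repo map", "parse python files"}
	fmt.Printf("%.3f\n", bm25("parse files", docs, 0))
}
```

The length normalization term (b) keeps long chunks from dominating purely by containing more words.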

type SemanticIndex

type SemanticIndex struct {
	// contains filtered or unexported fields
}

SemanticIndex holds chunked source files and supports TF-IDF search.

func BuildSemanticIndex

func BuildSemanticIndex(dir string, ignore []string, maxFiles int) (*SemanticIndex, error)

BuildSemanticIndex scans dir, chunks files into ~40-line blocks, and builds an index.

func LoadSemanticIndex

func LoadSemanticIndex(path string) (*SemanticIndex, error)

LoadSemanticIndex decodes a previously saved index from a gob file.

func (*SemanticIndex) Save

func (idx *SemanticIndex) Save(path string) error

Save encodes the index to a file using gob.

func (*SemanticIndex) Search

func (idx *SemanticIndex) Search(query string, topK int) []CodeChunk

Search performs TF-IDF based search over the index, returning the top-K chunks.

func (*SemanticIndex) Size

func (idx *SemanticIndex) Size() int

Size returns the number of chunks in the index.

type ShapleyRanker

type ShapleyRanker struct {
	// contains filtered or unexported fields
}

ShapleyRanker scores code chunks by their marginal contribution to generation quality using an approximate Shapley value approach.

func NewShapleyRanker

func NewShapleyRanker(chunks []CodeChunk) *ShapleyRanker

NewShapleyRanker creates a ranker from the given code chunks.

func (*ShapleyRanker) ComputeScores

func (sr *ShapleyRanker) ComputeScores(relevantPaths []string, query string) []ShapleyScore

ComputeScores calculates approximate Shapley values for each chunk.

Score = relevance_to_query * centrality_in_graph * recency_bonus - redundancy_penalty

  • relevance: keyword overlap with query
  • centrality: how many other chunks reference symbols in this chunk
  • recency: files in relevantPaths get a boost
  • redundancy: chunks similar to already-scored chunks get penalized

func (*ShapleyRanker) Format

func (sr *ShapleyRanker) Format(chunks []CodeChunk) string

Format renders selected chunks as a text block for prompt injection.

func (*ShapleyRanker) SelectOptimalContext

func (sr *ShapleyRanker) SelectOptimalContext(query string, tokenBudget int) []CodeChunk

SelectOptimalContext greedily selects chunks that maximize information within the token budget, recomputing redundancy penalties after each addition.
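The budgeted greedy loop can be sketched as below; this version omits the per-pick redundancy recomputation and treats scores and token counts as given (`greedySelect` and `scored` are illustrative names):

```go
package main

import "fmt"

type scored struct {
	Score  float64
	Tokens int
}

// greedySelect repeatedly takes the highest-scoring chunk that still
// fits the remaining token budget, returning the picked indices.
func greedySelect(chunks []scored, budget int) []int {
	used := make([]bool, len(chunks))
	var picked []int
	for {
		best := -1
		for i, c := range chunks {
			if used[i] || c.Tokens > budget {
				continue
			}
			if best == -1 || c.Score > chunks[best].Score {
				best = i
			}
		}
		if best == -1 {
			break
		}
		used[best] = true
		budget -= chunks[best].Tokens
		picked = append(picked, best)
	}
	return picked
}

func main() {
	chunks := []scored{{0.9, 800}, {0.8, 300}, {0.5, 300}}
	fmt.Println(greedySelect(chunks, 700))
}
```

Note how a high-scoring chunk that exceeds the remaining budget is skipped rather than truncated, so two cheaper chunks can win.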

type ShapleyScore

type ShapleyScore struct {
	Path      string
	StartLine int
	EndLine   int
	Score     float64 // higher = more helpful in context
	Content   string
}

ShapleyScore represents the computed marginal contribution of a code chunk.

type Symbol

type Symbol struct {
	Name string
	Kind string
	Line int
}

Symbol represents a top-level code symbol (function, type, class, etc.).

type SymbolGraph

type SymbolGraph struct {
	// contains filtered or unexported fields
}

SymbolGraph is a directed graph of symbol references used for PageRank computation over a codebase.

func BuildSymbolGraph

func BuildSymbolGraph(dir string, opts Options) (*SymbolGraph, error)

BuildSymbolGraph scans the directory, extracts symbols using the existing repomap parsers, then builds a directed graph by grepping for references.

func (*SymbolGraph) ComputePageRank

func (sg *SymbolGraph) ComputePageRank(iterations int, damping float64)

ComputePageRank runs the standard PageRank algorithm on the symbol graph.

rank[i] = (1-d) + d * sum(rank[j]/outlinks[j]) for all j->i

Default: iterations=20, damping=0.85.
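The update rule above translates directly into code; a minimal, unoptimized sketch (the package's graph representation is unexported):

```go
package main

import "fmt"

// pageRank iterates rank[i] = (1-d) + d * sum(rank[j]/outdeg(j))
// over edges j->i, matching the formula in the docs.
func pageRank(edges map[int][]int, n, iters int, d float64) []float64 {
	rank := make([]float64, n)
	for i := range rank {
		rank[i] = 1
	}
	for it := 0; it < iters; it++ {
		next := make([]float64, n)
		for i := range next {
			next[i] = 1 - d
		}
		for j, outs := range edges {
			for _, i := range outs {
				next[i] += d * rank[j] / float64(len(outs))
			}
		}
		rank = next
	}
	return rank
}

func main() {
	// symbols 0 and 1 both reference symbol 2
	edges := map[int][]int{0: {2}, 1: {2}}
	r := pageRank(edges, 3, 20, 0.85)
	fmt.Printf("%.3f %.3f %.3f\n", r[0], r[1], r[2])
}
```

A heavily referenced symbol accumulates rank from its referrers, which is why it floats to the top of FormatMap's output.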

func (*SymbolGraph) FormatMap

func (sg *SymbolGraph) FormatMap(maxTokens int) string

FormatMap renders the ranked symbols as a repo map string, highest-rank first, stopping when the estimated token budget is reached.

func (*SymbolGraph) TopSymbols

func (sg *SymbolGraph) TopSymbols(n int) []SymbolNode

TopSymbols returns the top-N symbols ordered by rank (descending).

type SymbolNode

type SymbolNode struct {
	File   string
	Symbol string
	Kind   string
	Rank   float64
}

SymbolNode is a single node in the symbol graph.
