Documentation
Overview
Package repomap provides functionality for creating a structured map of a code repository.
RepoMap creates a comprehensive representation of a code repository, including:
- Files in the repository
- Functions, methods, and types in each file
- Content snippets for each component
- Importance ranking of components using a PageRank-like algorithm
Usage
Basic usage:

	import (
		"fmt"

		"codeberg.org/MadsRC/aigent/internal/repomap"
	)

	// Create a new repository map with default options
	rm, err := repomap.NewRepoMap("/path/to/repo", nil)
	if err != nil {
		// Handle error
	}

	// Get a text representation of the repository map
	output := rm.String()
	fmt.Println(output)

With custom options:

	// Custom options: start from the defaults and override selected fields
	options := repomap.DefaultTraversalOptions()
	options.MaxDepth = 5
	options.IgnoreDirs = append(options.IgnoreDirs, "build", "dist")
	rm, err := repomap.NewRepoMap("/path/to/repo", options)

Traversal Options
The TraversalOptions struct allows customization of repository traversal:
- MaxDepth: Maximum directory depth to traverse
- IgnoreDirs: Directories to ignore (e.g., .git, node_modules)
- IgnoreFiles: Files to ignore (e.g., package-lock.json)
- IncludeFiles: File extensions or names to include
- FocusFiles: Files that have the user's focus (higher importance)
- MentionedIds: Identifiers that were explicitly mentioned (higher importance)
- MaxTokens: Maximum number of tokens for the repo map output
- Tokenizer: Function to count tokens in text
Importance Calculation
The package uses a weighted PageRank algorithm to determine component importance:
- Builds a weighted graph of component references
- Applies weights based on identifier characteristics, frequency, and user focus
- Runs PageRank iterations to calculate importance scores
- Sorts components and files by importance
Output Formats
RepoMap provides multiple output formats:
- String(): Default output with token limiting
- StringLegacy(): Backward-compatible output
- StringGrepAST(): Output compatible with the Python grep_ast.py format
Token Limiting
The package supports token limiting to ensure outputs fit within LLM context windows:
- Uses binary search to find the largest amount of content that fits within the token budget
- Respects MaxTokens setting in TraversalOptions
- Prioritizes more important files and components
Tree-sitter Integration
RepoMap uses tree-sitter for accurate code parsing:
- Supports Go and JavaScript
- Uses embedded SCM queries for component extraction
- Properly manages tree-sitter resources
Important Notes
When using the go-tree-sitter package, always call Close() on objects that allocate memory from C:
- Parser
- Tree
- TreeCursor
- Query
- QueryCursor
- LookaheadIterator
Constants
This section is empty.
Variables
This section is empty.
Functions
func DefaultTokenizer
DefaultTokenizer is a simple character-based tokenizer that assumes 1 token ≈ 4 characters
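The signature of DefaultTokenizer is not shown above. Assuming it has the shape func(string) int (the same shape is assumed for TokenizerFunc), a character-based estimate and a drop-in alternative might look like the sketch below. The tokenizerFunc, charTokenizer and wordTokenizer names are hypothetical, and counting runes instead of bytes and rounding up are choices made for this sketch only.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// tokenizerFunc is a hypothetical stand-in for TokenizerFunc; its exact
// signature is assumed here to be func(string) int.
type tokenizerFunc func(text string) int

// charTokenizer applies the "1 token ≈ 4 characters" rule, counting runes
// rather than bytes and rounding up so non-empty text costs at least one token.
func charTokenizer(text string) int {
	return (utf8.RuneCountInString(text) + 3) / 4
}

// wordTokenizer is an alternative heuristic: one token per whitespace-separated word.
func wordTokenizer(text string) int {
	return len(strings.Fields(text))
}

func main() {
	text := "func NewRepoMap(path string, options *TraversalOptions) (*RepoMap, error)"
	tokenizers := []struct {
		name string
		fn   tokenizerFunc
	}{
		{"char", charTokenizer},
		{"word", wordTokenizer},
	}
	for _, t := range tokenizers {
		fmt.Printf("%s: %d tokens\n", t.name, t.fn(text))
	}
}
```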
Types
type ComponentInfo

	type ComponentInfo struct {
		Name       string
		Type       string // function, method, class, etc.
		LineStart  uint32
		LineEnd    uint32
		Content    string
		Importance float64 // PageRank value
	}

ComponentInfo represents a function, method, or object in a file
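Components are ordered by their Importance (PageRank) value before output. The sketch copies the documented ComponentInfo struct so it compiles on its own; the byImportance helper and its tie-break on Name are hypothetical, added so the order is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// ComponentInfo copies the documented struct so this example is self-contained.
type ComponentInfo struct {
	Name       string
	Type       string // function, method, class, etc.
	LineStart  uint32
	LineEnd    uint32
	Content    string
	Importance float64 // PageRank value
}

// byImportance returns a copy of comps ordered from most to least important.
// Equal scores fall back to Name so the output order is stable across runs.
func byImportance(comps []ComponentInfo) []ComponentInfo {
	sorted := append([]ComponentInfo(nil), comps...) // leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Importance != sorted[j].Importance {
			return sorted[i].Importance > sorted[j].Importance
		}
		return sorted[i].Name < sorted[j].Name
	})
	return sorted
}

func main() {
	comps := []ComponentInfo{
		{Name: "helper", Type: "function", LineStart: 40, LineEnd: 52, Importance: 0.05},
		{Name: "RepoMap", Type: "type", LineStart: 3, LineEnd: 8, Importance: 0.41},
		{Name: "NewRepoMap", Type: "function", LineStart: 10, LineEnd: 38, Importance: 0.41},
	}
	for _, c := range byImportance(comps) {
		fmt.Printf("%-10s %-8s lines %d-%d  %.2f\n", c.Name, c.Type, c.LineStart, c.LineEnd, c.Importance)
	}
}
```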
type FileInfo

	type FileInfo struct {
		Path       string
		Components []ComponentInfo
		Content    map[string]string
	}

FileInfo represents metadata about a file
type RepoMap

	type RepoMap struct {
		Files []*FileInfo
		// contains filtered or unexported fields
	}

RepoMap represents a map of a repository
func NewRepoMap
func NewRepoMap(path string, options *TraversalOptions) (*RepoMap, error)
NewRepoMap creates a new repository map for the given path
func (*RepoMap) GetTokenCount
GetTokenCount returns the estimated token count of the current output.
func (*RepoMap) RegenerateOutput
RegenerateOutput forces the repository map to regenerate its string output. This is useful when files or components have been updated after the initial map creation.
func (*RepoMap) String
String returns a textual representation of the repository map with the token limit applied.
func (*RepoMap) StringGrepAST
StringGrepAST returns a textual representation using the grepast package. This provides output that is fully compatible with the Python grep_ast.py format.
func (*RepoMap) StringLegacy
StringLegacy returns a textual representation using the grepast adapter with token limits. This is kept for backward compatibility.
type TokenizerFunc
TokenizerFunc is a function type that counts tokens in a text
type TraversalOptions

	type TraversalOptions struct {
		MaxDepth     int
		IgnoreDirs   []string
		IgnoreFiles  []string
		IncludeFiles []string
		FocusFiles   []string      // Files that have user's focus (equivalent to "chat files" in Python version)
		MentionedIds []string      // Identifiers that were explicitly mentioned
		MaxTokens    int           // Maximum number of tokens for the repo map output
		Tokenizer    TokenizerFunc // Function to count tokens in text, defaults to character-based approximation
	}

TraversalOptions provides configuration for repository traversal
func DefaultTraversalOptions
func DefaultTraversalOptions() *TraversalOptions
DefaultTraversalOptions provides sensible defaults for traversal