repomap

package
v0.1.1 Latest
Warning: This package is not in the latest version of its module.

Published: Jun 18, 2025 License: AGPL-3.0 Imports: 14 Imported by: 0

Documentation

Overview

Package repomap provides functionality for creating a structured map of a code repository.

Overview

RepoMap builds a structured representation of a code repository, including:

  1. Files in the repository
  2. Functions, methods, and types in each file
  3. Content snippets for each component
  4. Importance ranking of components using a PageRank-like algorithm

Usage

Basic usage:

package main

import (
	"fmt"
	"log"

	// internal package: importable only from within the aigent module
	"codeberg.org/MadsRC/aigent/internal/repomap"
)

func main() {
	// Create a new repository map with default options (nil).
	rm, err := repomap.NewRepoMap("/path/to/repo", nil)
	if err != nil {
		log.Fatal(err)
	}

	// Print a text representation of the repository map.
	fmt.Println(rm.String())
}

With custom options:

// Start from the defaults and override selected fields.
options := repomap.DefaultTraversalOptions()
options.MaxDepth = 5
options.IgnoreDirs = append(options.IgnoreDirs, "build", "dist")
rm, err := repomap.NewRepoMap("/path/to/repo", options)
if err != nil {
	log.Fatal(err)
}
fmt.Println(rm.String())

Traversal Options

The TraversalOptions struct allows customization of repository traversal:

  • MaxDepth: Maximum directory depth to traverse
  • IgnoreDirs: Directories to ignore (e.g., .git, node_modules)
  • IgnoreFiles: Files to ignore (e.g., package-lock.json)
  • IncludeFiles: File extensions or names to include
  • FocusFiles: Files that have the user's focus (higher importance)
  • MentionedIds: Identifiers that were explicitly mentioned (higher importance)
  • MaxTokens: Maximum number of tokens for the repo map output
  • Tokenizer: Function to count tokens in text

Importance Calculation

The package uses a weighted PageRank algorithm to determine component importance:

  1. Builds a weighted graph of component references
  2. Applies weights based on identifier characteristics, frequency, and user focus
  3. Runs PageRank iterations to calculate importance scores
  4. Sorts components and files by importance

Output Formats

RepoMap provides multiple output formats:

  • String(): Default output with token limiting
  • StringLegacy(): Backward-compatible output
  • StringGrepAST(): Output compatible with the Python grep_ast.py format

Token Limiting

The package supports token limiting to ensure outputs fit within LLM context windows:

  • Uses binary search to find how much content fits within the token limit
  • Respects MaxTokens setting in TraversalOptions
  • Prioritizes more important files and components
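A minimal sketch of the binary-search idea, assuming the content has already been sorted by importance and the goal is the longest prefix whose rendered text fits within MaxTokens. The render format, the fitPrefix helper, and the 4-characters-per-token counter are hypothetical stand-ins, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// TokenizerFunc mirrors repomap.TokenizerFunc.
type TokenizerFunc func(text string) int

// render joins the first n snippets, one per line (a hypothetical format).
func render(snippets []string, n int) string {
	return strings.Join(snippets[:n], "\n")
}

// fitPrefix returns the largest n such that render(snippets, n) fits within
// maxTokens. It assumes adding snippets never lowers the token count, which
// is what makes binary search valid.
func fitPrefix(snippets []string, maxTokens int, count TokenizerFunc) int {
	lo, hi := 0, len(snippets) // invariant: a prefix of length lo fits
	for lo < hi {
		mid := (lo + hi + 1) / 2 // round up so lo always advances
		if count(render(snippets, mid)) <= maxTokens {
			lo = mid
		} else {
			hi = mid - 1
		}
	}
	return lo
}

func main() {
	// Snippets assumed to be pre-sorted by importance, most important first.
	snippets := []string{
		"func main()",
		"func parse(src string) error",
		"type Config struct",
		"func helper()",
	}
	approx := func(s string) int { return (len(s) + 3) / 4 } // ~4 chars per token
	n := fitPrefix(snippets, 10, approx)
	fmt.Println(n) // prints 2
	fmt.Println(render(snippets, n))
}
```

Searching over the prefix length needs only about log2(n) tokenizer calls, which matters when each call re-renders a large map.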

Tree-sitter Integration

RepoMap uses tree-sitter for accurate code parsing:

  • Supports Go and JavaScript languages
  • Uses embedded SCM queries for component extraction
  • Properly manages tree-sitter resources

Important Notes

When using the go-tree-sitter package, always call Close() on objects that allocate memory from C:

  • Parser
  • Tree
  • TreeCursor
  • Query
  • QueryCursor
  • LookaheadIterator

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DefaultTokenizer

func DefaultTokenizer(text string) int

DefaultTokenizer is a simple character-based tokenizer that assumes 1 token ≈ 4 characters.
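A sketch of a 1 token ≈ 4 characters estimate. The documentation does not say whether DefaultTokenizer counts bytes or runes, or how it rounds, so the hypothetical approxTokens below counts runes and rounds up; both choices are assumptions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// approxTokens is a hypothetical stand-in for DefaultTokenizer: it estimates
// one token per four characters (runes), rounding up so any non-empty text
// counts as at least one token.
func approxTokens(text string) int {
	runes := utf8.RuneCountInString(text)
	return (runes + 3) / 4
}

func main() {
	fmt.Println(approxTokens("func main() {}")) // 14 runes, prints 4
}
```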

Types

type ComponentInfo

type ComponentInfo struct {
	Name       string
	Type       string // function, method, class, etc.
	LineStart  uint32
	LineEnd    uint32
	Content    string
	Importance float64 // PageRank value
}

ComponentInfo represents a function, method, class, or other component in a file.

type FileInfo

type FileInfo struct {
	Path       string
	Components []ComponentInfo
	Content    map[string]string
}

FileInfo represents metadata about a file, including its path and extracted components.

type Language

type Language struct {
	TSLanguage  *sitter.Language
	QueryString string
}

Language pairs a tree-sitter language with the query string used to extract components from source files in that language.

type Reference

type Reference struct {
	Destination string
	Weight      float64
}

Reference represents a weighted edge in the component reference graph.

type RepoMap

type RepoMap struct {
	Files []*FileInfo
	// contains filtered or unexported fields
}

RepoMap represents a map of a repository.

func NewRepoMap

func NewRepoMap(path string, options *TraversalOptions) (*RepoMap, error)

NewRepoMap creates a new repository map for the given path. Passing nil for options uses the default options.

func (*RepoMap) GetTokenCount

func (rm *RepoMap) GetTokenCount() int

GetTokenCount returns the estimated token count of the current output.

func (*RepoMap) RegenerateOutput

func (rm *RepoMap) RegenerateOutput() string

RegenerateOutput forces the repository map to regenerate its string output. This is useful when files or components have been updated after the initial map creation.

func (*RepoMap) String

func (rm *RepoMap) String() string

String returns a textual representation of the repository map, with the token limit applied.

func (*RepoMap) StringGrepAST

func (rm *RepoMap) StringGrepAST() string

StringGrepAST returns a textual representation using the grepast package. This provides output that is fully compatible with the Python grep_ast.py format.

func (*RepoMap) StringLegacy

func (rm *RepoMap) StringLegacy() string

StringLegacy returns a textual representation using the grepast adapter, with token limits applied. It is kept for backwards compatibility.

type TokenizerFunc

type TokenizerFunc func(text string) int

TokenizerFunc is a function type that counts the tokens in a text.
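Any function with the signature func(text string) int satisfies TokenizerFunc, which is how a different counting strategy could be supplied through TraversalOptions.Tokenizer. The whitespace-based wordTokenizer below is a hypothetical example for illustration; it is not part of the package and does not match any particular LLM tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// TokenizerFunc mirrors repomap.TokenizerFunc.
type TokenizerFunc func(text string) int

// wordTokenizer is a hypothetical tokenizer that counts whitespace-separated
// words instead of estimating from character length.
func wordTokenizer(text string) int {
	return len(strings.Fields(text))
}

func main() {
	var tok TokenizerFunc = wordTokenizer
	fmt.Println(tok("func Parse(src string) error")) // prints 4
}
```

With the real package, it would presumably be wired in as options.Tokenizer = wordTokenizer on the value returned by repomap.DefaultTraversalOptions().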

type TraversalOptions

type TraversalOptions struct {
	MaxDepth     int
	IgnoreDirs   []string
	IgnoreFiles  []string
	IncludeFiles []string
	FocusFiles   []string      // Files that have the user's focus (equivalent to "chat files" in the Python version)
	MentionedIds []string      // Identifiers that were explicitly mentioned
	MaxTokens    int           // Maximum number of tokens for the repo map output
	Tokenizer    TokenizerFunc // Function to count tokens in text, defaults to character-based approximation
}

TraversalOptions provides configuration for repository traversal.

func DefaultTraversalOptions

func DefaultTraversalOptions() *TraversalOptions

DefaultTraversalOptions provides sensible defaults for traversal.
