repoguide
Tree-sitter repository map in TOON format for LLM consumption.
What it does
repoguide parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.
The goal is to give an LLM agent a high-level map of a codebase so it can explore more effectively: which files matter most, what symbols they define, and how they depend on each other.
Written in Go for fast parallel parsing across CPU cores.
Installation
Requires Go 1.24+ and a C compiler (for tree-sitter CGo bindings).
go install github.com/phobologic/repoguide@latest
Or build from source:
git clone https://github.com/phobologic/repoguide.git
cd repoguide
go build -o repoguide .
Set the version at build time:
go build -ldflags "-X main.version=1.0.0" -o repoguide .
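For the -ldflags command above to have any effect, package main has to declare a package-level string variable named version. The linker's -X flag can only overwrite string variables, not constants. Below is a minimal sketch of such a declaration; the "dev" default is an assumption, not taken from repoguide's source.

```go
package main

import "fmt"

// version is a package-level string so the linker can overwrite it with
//   go build -ldflags "-X main.version=1.0.0"
// The "dev" default (used when no -X flag is given) is a hypothetical placeholder.
var version = "dev"

func main() {
	fmt.Println("repoguide", version)
}
```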
Usage
repoguide [ROOT] [OPTIONS]
| Option | Description |
|---|---|
| ROOT | Repository root directory (default: .) |
| --max-files, -n | Limit output to the top N files by PageRank (min: 1) |
| --langs, -l | Comma-separated languages to include (e.g., python,go) |
| --cache | Cache file path; reused if newer than all source files |
| --max-file-size | Skip files larger than this many bytes (default: 1MB) |
| --raw | Output raw TOON without the agent context header |
| --version, -V | Show version and exit |
Example
By default, output includes a preamble header that explains the format for AI agent consumption. Use --raw to strip the header for bare TOON output.
$ repoguide /path/to/myproject -n 3
# Repository Map
This is a repository map generated by repoguide. It shows the structure,
key symbols, and dependencies of the codebase in TOON format.
...
---
repo: myproject
root: myproject
files[3]{path,language,rank}:
  myproject/models.py,python,0.2755
  myproject/languages.py,python,0.1183
  myproject/discovery.py,python,0.0608
symbols[17]{file,name,kind,line,signature}:
  myproject/models.py,TagKind,class,10,TagKind(enum.Enum)
  myproject/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
  myproject/models.py,Tag,class,27,Tag
  myproject/models.py,FileInfo,class,39,FileInfo
  ...
dependencies[1]{source,target,symbols}:
  myproject/discovery.py,myproject/languages.py,language_for_extension
Claude Code integration
The primary use case is running repoguide as a Claude Code hook so every subagent automatically gets a repo map injected into its context.
Add this to .claude/settings.json:
{
  "hooks": {
    "SubagentStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "repoguide \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/repoguide.toon\""
          }
        ]
      }
    ]
  }
}
The SubagentStart hook fires when any subagent launches. repoguide's stdout is injected into the subagent's context, giving it an instant overview of the codebase. The default output includes a preamble header that explains the format, so the agent understands what it's looking at without any additional configuration.
--cache avoids re-parsing on every agent launch: the cache file is reused as long as it is newer than every source file. Add .cache/ to your .gitignore.
The output uses TOON (Token-Oriented Object Notation), a compact format designed for LLM consumption:
- Scalar fields: written as key: value
- Tabular arrays: name[count]{col1,col2,...}: followed by indented CSV rows
- Quoting: values containing special characters are double-quoted; numbers and plain strings are bare
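A minimal Go sketch of encoding one tabular array in the shape described above. The exact quoting rule is an assumption (quote empty values, values with surrounding whitespace, and values containing a comma, double quote, colon, backslash, or newline, escaping with backslashes), as is the two-space row indent; encodeTable, needsQuote, and quote are hypothetical names, not repoguide's encoder.

```go
package main

import (
	"fmt"
	"strings"
)

// needsQuote applies an assumed quoting rule; repoguide's may differ.
func needsQuote(s string) bool {
	if s == "" || strings.TrimSpace(s) != s {
		return true
	}
	return strings.ContainsAny(s, ",\":\\\n")
}

// quote leaves plain values bare and wraps the rest in double quotes,
// backslash-escaping backslashes, quotes, and newlines in a single pass.
func quote(s string) string {
	if !needsQuote(s) {
		return s
	}
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)
	return `"` + r.Replace(s) + `"`
}

// encodeTable writes a header line name[count]{col1,col2,...}: followed by
// one indented CSV row per record.
func encodeTable(name string, cols []string, rows [][]string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s[%d]{%s}:\n", name, len(rows), strings.Join(cols, ","))
	for _, row := range rows {
		cells := make([]string, len(row))
		for i, v := range row {
			cells[i] = quote(v)
		}
		b.WriteString("  " + strings.Join(cells, ",") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(encodeTable("files", []string{"path", "language", "rank"}, [][]string{
		{"myproject/models.py", "python", "0.2755"},
		{"myproject/languages.py", "python", "0.1183"},
	}))
}
```

Because the array length and column names appear once in the header, each row carries only values, which is where most of the size savings over JSON come from.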
How it works
- Discover files: uses git ls-files when available and falls back to .gitignore-based filtering
- Parse with tree-sitter: extracts classes, functions, methods, and imports from each file
- Build dependency graph: creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
- Rank with PageRank: scores files by importance in the dependency graph
- Select top N: when --max-files is set, keeps only the highest-ranked files
- Encode to TOON: serializes the repo map into the compact output format
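The ranking and selection steps can be sketched as plain power-iteration PageRank followed by a sort. This is a generic PageRank, not repoguide's code: the damping factor 0.85, the 50 iterations, unweighted edges, and the names pageRank and topN are all assumptions. An edge points from the importing file to the file that defines the symbol, so files that many others depend on collect the most rank.

```go
package main

import (
	"fmt"
	"sort"
)

// pageRank runs power iteration. edges[src] lists the files src imports
// symbols from. Files with no outgoing edges ("dangling") spread their rank
// evenly over all files, which keeps the total rank equal to 1.
func pageRank(nodes []string, edges map[string][]string, damping float64, iters int) map[string]float64 {
	n := float64(len(nodes))
	rank := make(map[string]float64, len(nodes))
	for _, v := range nodes {
		rank[v] = 1 / n
	}
	for i := 0; i < iters; i++ {
		dangling := 0.0
		for _, v := range nodes {
			if len(edges[v]) == 0 {
				dangling += rank[v]
			}
		}
		next := make(map[string]float64, len(nodes))
		for _, v := range nodes {
			next[v] = (1-damping)/n + damping*dangling/n
		}
		for _, src := range nodes {
			out := edges[src]
			for _, dst := range out {
				next[dst] += damping * rank[src] / float64(len(out))
			}
		}
		rank = next
	}
	return rank
}

// topN returns the n highest-ranked files, breaking ties by path.
func topN(rank map[string]float64, n int) []string {
	files := make([]string, 0, len(rank))
	for f := range rank {
		files = append(files, f)
	}
	sort.Slice(files, func(i, j int) bool {
		if rank[files[i]] != rank[files[j]] {
			return rank[files[i]] > rank[files[j]]
		}
		return files[i] < files[j]
	})
	if n < len(files) {
		files = files[:n]
	}
	return files
}

func main() {
	nodes := []string{"discovery.py", "languages.py", "models.py", "cli.py"}
	edges := map[string][]string{
		"discovery.py": {"languages.py", "models.py"},
		"languages.py": {"models.py"},
		"cli.py":       {"discovery.py", "models.py"},
	}
	rank := pageRank(nodes, edges, 0.85, 50)
	for _, f := range topN(rank, 2) {
		fmt.Printf("%s,%.4f\n", f, rank[f])
	}
}
```

In this toy graph models.py, imported by all three other files, ranks first. A real implementation might instead weight edges by the number of shared symbols; that choice is not specified here.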
Parsing runs concurrently across all available CPU cores.
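One common way to spread parsing across cores in Go is a fixed pool of runtime.NumCPU() worker goroutines reading paths from a channel. The sketch below assumes that design; parseFile and fileResult are hypothetical stand-ins, and the line-prefix check only imitates what a tree-sitter query would extract. With real tree-sitter bindings, each worker would normally create its own parser, since a parser instance should not be shared across goroutines.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"strings"
	"sync"
)

// fileResult is a hypothetical stand-in for per-file parse output.
type fileResult struct {
	Path    string
	Symbols int
}

// parseFile is a hypothetical stand-in for tree-sitter parsing: it counts
// lines that start with "def " or "class ".
func parseFile(path, src string) fileResult {
	n := 0
	for _, line := range strings.Split(src, "\n") {
		t := strings.TrimSpace(line)
		if strings.HasPrefix(t, "def ") || strings.HasPrefix(t, "class ") {
			n++
		}
	}
	return fileResult{Path: path, Symbols: n}
}

// parseAll fans files out to one worker per CPU and collects the results.
// The files map is only read concurrently, never written, so it is safe.
func parseAll(files map[string]string) []fileResult {
	paths := make(chan string)
	results := make(chan fileResult)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				results <- parseFile(p, files[p])
			}
		}()
	}
	go func() { // feed work, then signal no more paths
		for p := range files {
			paths <- p
		}
		close(paths)
	}()
	go func() { // close results once every worker has finished
		wg.Wait()
		close(results)
	}()
	var out []fileResult
	for r := range results {
		out = append(out, r)
	}
	// Workers finish in any order; sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	files := map[string]string{
		"models.py":    "class Tag:\n    def kind(self): ...\n",
		"discovery.py": "def discover(root): ...\n",
	}
	for _, r := range parseAll(files) {
		fmt.Printf("%s: %d symbols\n", r.Path, r.Symbols)
	}
}
```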
Supported languages
Python, Go, Ruby. To add a language, add its tree-sitter grammar and put a .scm query file for it in internal/lang/queries/.
Development
make build # build binary
make test # run tests
make lint # run golangci-lint
make fmt # format with goimports
make cover # generate coverage report
License
MIT