repoguide
Tree-sitter repository map in TOON format for LLM consumption.
What it does
repoguide parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.
The goal: give an LLM agent a high-level map of a codebase so it can explore more effectively — knowing which files matter most, what symbols they define, and how they depend on each other.
Written in Go for fast parallel parsing across CPU cores.
Installation
Requires Go 1.24+ and a C compiler (the tree-sitter bindings are built with cgo).
go install github.com/phobologic/repoguide@latest
Or build from source:
git clone https://github.com/phobologic/repoguide.git
cd repoguide
go build -o repoguide .
Set the version at build time:
go build -ldflags "-X main.version=1.0.0" -o repoguide .
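The -X linker flag can only overwrite a package-level string variable, so package main needs a variable named version for the command above to take effect. A minimal sketch of that pattern follows; the default value "dev" and the printing logic are assumptions for illustration, not repoguide's actual source.

```go
package main

import "fmt"

// version is overwritten at link time by:
//   go build -ldflags "-X main.version=1.0.0"
// The "dev" fallback is a hypothetical default for untagged builds.
var version = "dev"

func main() {
	fmt.Println("repoguide", version)
}
```

Built without the flag, this prints the fallback value; built with it, the linker substitutes the supplied string.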
Usage
repoguide [ROOT] [OPTIONS]
| Option | Description |
|---|---|
| ROOT | Repository root directory (default: .) |
| --max-files, -n | Limit output to top N files by PageRank (min: 1) |
| --langs, -l | Comma-separated languages to include (e.g., python,go) |
| --cache | Cache output to file; reuses if newer than all source files (add to .gitignore) |
| --max-file-size | Skip files larger than this many bytes (default: 1MB) |
| --symbol | Filter output to symbols matching this substring (case-insensitive) |
| --file | Filter output to files matching this substring (case-insensitive) |
| --with-tests | Include test files in output (excluded by default) |
| --raw | Output raw TOON without agent context header |
| --version, -V | Show version and exit |
Example
By default, output includes a preamble header that explains the format for AI agent consumption. Use --raw to strip the header for bare TOON output.
$ repoguide /path/to/myproject -n 3
# Repository Map
This is a repository map generated by repoguide. It shows the structure,
key symbols, and dependencies of the codebase in TOON format.
...
---
repo: myproject
root: myproject
files[3]{path,language,rank}:
myproject/models.py,python,0.2755
myproject/languages.py,python,0.1183
myproject/discovery.py,python,0.0608
symbols[17]{file,name,kind,line,signature}:
myproject/models.py,TagKind,class,10,TagKind(enum.Enum)
myproject/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
myproject/models.py,Tag,class,27,Tag
myproject/models.py,FileInfo,class,39,FileInfo
...
dependencies[1]{source,target,symbols}:
myproject/discovery.py,myproject/languages.py,language_for_extension
Focused queries
Use --symbol and --file to get a targeted view instead of the full map.
These are useful when asking Claude about a specific function or subsystem.
repoguide --symbol BuildGraph # show BuildGraph: definition, callers, callees, import sites
repoguide --file internal/auth # show all symbols and deps for auth package
repoguide --symbol Handle --file srv # combine: Handle symbol scoped to srv files
Both flags do case-insensitive substring matching and can be combined (AND semantics).
When either flag is set and --cache is in use, the cached output is not read, but the
full unfiltered map is still written to the cache on the same run.
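The matching rule described above (case-insensitive substring, both filters required when both are given) can be sketched in a few lines of Go. The entry type and function names are hypothetical; this illustrates the semantics rather than repoguide's internal code.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical stand-in for one row of the symbols table.
type entry struct {
	File   string
	Symbol string
}

// containsFold reports whether s contains substr, ignoring case.
// An empty substr always matches, which models an unset flag.
func containsFold(s, substr string) bool {
	return strings.Contains(strings.ToLower(s), strings.ToLower(substr))
}

// filterEntries keeps entries that satisfy both filters (AND semantics).
func filterEntries(entries []entry, symbolFilter, fileFilter string) []entry {
	var kept []entry
	for _, e := range entries {
		if containsFold(e.Symbol, symbolFilter) && containsFold(e.File, fileFilter) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	entries := []entry{
		{File: "internal/srv/http.go", Symbol: "HandleRequest"},
		{File: "internal/auth/token.go", Symbol: "handleRefresh"},
		{File: "internal/srv/router.go", Symbol: "Route"},
	}
	// Equivalent in spirit to: repoguide --symbol Handle --file srv
	for _, e := range filterEntries(entries, "Handle", "srv") {
		fmt.Println(e.File, e.Symbol)
	}
}
```

Only the first entry survives here: the second matches the symbol filter but not the file filter, and the third matches the file filter but not the symbol filter.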
The --symbol output includes a callsites table with every call occurrence and
every file-level import site, each with exact file and line number. Use those line
numbers with Read(offset=N) for precise navigation without scanning.
Subcommands
repoguide init
repoguide init [--dry-run] [path-to-CLAUDE.md]
Writes a repoguide usage section to a CLAUDE.md file, creating it if it doesn't
exist. The section instructs Claude Code to call repoguide at the start of
tasks and explains how to read the output.
repoguide init # write to ./CLAUDE.md
repoguide init path/to/CLAUDE.md # explicit path
repoguide init --dry-run # print the generated section, no file written
repoguide init --dry-run CLAUDE.md # print what the full file would look like
The command reports what it did: created, updated, or already up to date.
Safe to run repeatedly — skips the write when nothing has changed.
The block is wrapped in HTML sentinel comments so subsequent runs replace only
that section, leaving surrounding content untouched:
<!-- repoguide:start -->
...generated content...
<!-- repoguide:end -->
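One way such a sentinel-based update could work is sketched below: find the marker pair, swap the text between them, and report whether anything changed so the caller can skip the write. The function name upsertSection and the append-when-missing behavior are assumptions; the actual init command may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	startMarker = "<!-- repoguide:start -->"
	endMarker   = "<!-- repoguide:end -->"
)

// upsertSection returns doc with the marked block replaced by section.
// If no complete marker pair is present, the block is appended instead.
// The bool result reports whether the returned text differs from doc.
func upsertSection(doc, section string) (string, bool) {
	block := startMarker + "\n" + section + "\n" + endMarker

	start := strings.Index(doc, startMarker)
	if start >= 0 {
		if rel := strings.Index(doc[start:], endMarker); rel >= 0 {
			end := start + rel + len(endMarker)
			updated := doc[:start] + block + doc[end:]
			return updated, updated != doc
		}
	}

	// No existing block: append it, separated from prior content by a blank line.
	if doc != "" && !strings.HasSuffix(doc, "\n") {
		doc += "\n"
	}
	if doc != "" {
		doc += "\n"
	}
	return doc + block + "\n", true
}

func main() {
	doc := "# Project notes\n\n" + startMarker + "\nold text\n" + endMarker + "\n\nFooter\n"
	updated, changed := upsertSection(doc, "new text")
	fmt.Print(updated)
	fmt.Println("changed:", changed)
}
```

Running the function a second time with the same section returns the input unchanged and reports false, which is what makes repeated runs safe.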
Claude Code integration
The primary use case is running repoguide as a Claude Code hook so every subagent automatically gets a repo map injected into its context.
Add this to .claude/settings.json:
{
"hooks": {
"SubagentStart": [
{
"hooks": [
{
"type": "command",
"command": "repoguide \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/repoguide.toon\""
}
]
}
]
}
}
The SubagentStart hook fires when any subagent launches. repoguide's stdout is injected into the subagent's context, giving it an instant overview of the codebase. The default output includes a preamble header that explains the format, so the agent understands what it's looking at without any additional configuration.
--cache avoids re-parsing on every agent launch — the cache file is reused as long as no source files have changed. Add .cache/ to your .gitignore.
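The freshness rule ("reuse the cache if it is newer than every source file") amounts to a modification-time comparison. The sketch below shows one way to express it; cacheIsFresh is a hypothetical name, and treating a missing source file as an error is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// cacheIsFresh reports whether cachePath exists and is newer than every
// file in sources. A missing cache is reported as not fresh, not as an error.
func cacheIsFresh(cachePath string, sources []string) (bool, error) {
	cacheInfo, err := os.Stat(cachePath)
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	for _, src := range sources {
		srcInfo, err := os.Stat(src)
		if err != nil {
			return false, err
		}
		// A source modified at or after the cache time invalidates the cache.
		if !srcInfo.ModTime().Before(cacheInfo.ModTime()) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Both paths are placeholders; substitute real files to try it out.
	fresh, err := cacheIsFresh(".cache/repoguide.toon", []string{"main.go"})
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println("cache fresh:", fresh)
}
```

Because the check only calls stat on each file, it stays cheap compared with re-parsing, even on large repositories.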
The output uses TOON (Token-Oriented Object Notation), a compact format designed for LLM consumption:
- Scalar fields are written as key: value
- Tabular arrays are written as name[count]{col1,col2,...}: followed by indented CSV rows
- Quoting: values containing special characters are double-quoted; numbers and plain strings are bare
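To make the tabular-array shape concrete, here is a small encoder sketch. The set of characters that triggers quoting and the two-space row indent are assumptions chosen for illustration; they do not claim to cover the full TOON rules or match repoguide's encoder.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quoteIfNeeded double-quotes values that would be ambiguous inside a row.
// The trigger set (comma, colon, quote, newline, outer whitespace, empty)
// is an assumption, not the complete TOON quoting rule.
func quoteIfNeeded(v string) string {
	if v == "" || strings.ContainsAny(v, ",:\"\n") || strings.TrimSpace(v) != v {
		return strconv.Quote(v)
	}
	return v
}

// encodeTable renders name[count]{cols}: followed by one indented row per record.
func encodeTable(name string, cols []string, rows [][]string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s[%d]{%s}:\n", name, len(rows), strings.Join(cols, ","))
	for _, row := range rows {
		cells := make([]string, len(row))
		for i, v := range row {
			cells[i] = quoteIfNeeded(v)
		}
		b.WriteString("  " + strings.Join(cells, ",") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print("repo: myproject\n")
	fmt.Print(encodeTable("files", []string{"path", "language", "rank"}, [][]string{
		{"myproject/models.py", "python", "0.2755"},
		{"myproject/languages.py", "python", "0.1183"},
	}))
}
```

Declaring the column names once in the header is what keeps the format compact: each row carries only values, with no repeated keys.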
How it works
- Discover files: uses git ls-files when available, falls back to .gitignore-based filtering
- Parse with tree-sitter: extracts classes, functions, methods, and imports from each file
- Build dependency graph: creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
- Rank with PageRank: scores files by importance in the dependency graph
- Select top N: when --max-files is set, keeps only the highest-ranked files
- Encode to TOON: serializes the repo map into the compact output format
Parsing runs concurrently across all available CPU cores.
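A common way to spread per-file work across cores in Go is a fixed worker pool sized by runtime.NumCPU. The sketch below shows that pattern with a placeholder parse function standing in for tree-sitter extraction; parseAll and the placeholder are hypothetical names, and the real scheduling in repoguide is not shown in this document.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// parseAll runs parse over every path using one worker per CPU core and
// returns results in the same order as paths.
func parseAll(paths []string, parse func(path string) int) []int {
	results := make([]int, len(paths))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				results[i] = parse(paths[i])
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// countSegments is a hypothetical placeholder for symbol extraction.
	countSegments := func(path string) int { return strings.Count(path, "/") + 1 }
	paths := []string{"cmd/main.go", "internal/lang/python.go", "README.md"}
	fmt.Println(parseAll(paths, countSegments))
}
```

Writing results by index keeps the output order deterministic regardless of which worker finishes first, which matters when the results feed a ranked, reproducible map.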
Supported languages
Python, Go, Ruby. Extensible by adding a tree-sitter grammar and a .scm query file to internal/lang/queries/.
Development
make build # build binary
make test # run tests
make lint # run golangci-lint
make fmt # format with goimports
make cover # generate coverage report
License
MIT