repoguide

Version: v1.2.0
Published: Feb 26, 2026 License: MIT Imports: 17 Imported by: 0

Tree-sitter repository map in TOON format for LLM consumption.

What it does

repoguide parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.

The goal: give an LLM agent a high-level map of a codebase so it can explore more effectively — knowing which files matter most, what symbols they define, and how they depend on each other.

Written in Go for fast parallel parsing across CPU cores.

Installation

Requires Go 1.24+ and a C compiler (the tree-sitter bindings use cgo).

go install github.com/phobologic/repoguide@latest

Or build from source:

git clone https://github.com/phobologic/repoguide.git
cd repoguide
go build -o repoguide .

Set the version at build time:

go build -ldflags "-X main.version=1.0.0" -o repoguide .
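
The -X linker flag overwrites a package-level string variable at link time. A minimal sketch of the pattern, assuming the variable is named version in package main (as the flag above implies); the "dev" default and the printed format are illustrative assumptions, not taken from the repoguide source:

```go
package main

import "fmt"

// version is replaced at link time by:
//   go build -ldflags "-X main.version=1.0.0"
// The "dev" fallback is an assumption for illustration.
var version = "dev"

// versionString formats the value a --version style flag might print.
// The "repoguide <version>" layout is hypothetical.
func versionString() string {
	return "repoguide " + version
}

func main() {
	fmt.Println(versionString())
}
```

Built without -ldflags, the variable keeps its default, which makes unversioned local builds easy to spot.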

Usage

repoguide [ROOT] [OPTIONS]

Option            Description
ROOT              Repository root directory (default: .)
--max-files, -n   Limit output to top N files by PageRank (min: 1)
--langs, -l       Comma-separated languages to include (e.g., python,go)
--cache           Cache output to file; reuses if newer than all source files (add to .gitignore)
--max-file-size   Skip files larger than this many bytes (default: 1MB)
--symbol          Filter output to symbols matching this substring (case-insensitive)
--file            Filter output to files matching this substring (case-insensitive)
--with-tests      Include test files in output (excluded by default)
--raw             Output raw TOON without agent context header
--version, -V     Show version and exit

Example

By default, output includes a preamble header that explains the format for AI agent consumption. Use --raw to strip the header for bare TOON output.

$ repoguide /path/to/myproject -n 3
# Repository Map

This is a repository map generated by repoguide. It shows the structure,
key symbols, and dependencies of the codebase in TOON format.
...

---
repo: myproject
root: myproject
files[3]{path,language,rank}:
  myproject/models.py,python,0.2755
  myproject/languages.py,python,0.1183
  myproject/discovery.py,python,0.0608
symbols[17]{file,name,kind,line,signature}:
  myproject/models.py,TagKind,class,10,TagKind(enum.Enum)
  myproject/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
  myproject/models.py,Tag,class,27,Tag
  myproject/models.py,FileInfo,class,39,FileInfo
  ...
dependencies[1]{source,target,symbols}:
  myproject/discovery.py,myproject/languages.py,language_for_extension

Focused queries

Use --symbol and --file to get a targeted view instead of the full map. These are useful when asking Claude about a specific function or subsystem.

repoguide --symbol BuildGraph        # show BuildGraph: definition, callers, callees, import sites
repoguide --file internal/auth       # show all symbols and deps for auth package
repoguide --symbol Handle --file srv # combine: Handle symbol scoped to srv files

Both flags do case-insensitive substring matching and can be combined (AND semantics). When active, the cache is bypassed for reading but the full unfiltered output is still written to cache on the same run.
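
The matching rule described above (case-insensitive substring, both filters ANDed) can be sketched as follows. The symbol type and the function names are hypothetical and only illustrate the semantics; they are not repoguide's internal API:

```go
package main

import (
	"fmt"
	"strings"
)

// symbol is a hypothetical, simplified record used only for this sketch.
type symbol struct {
	File string
	Name string
}

// containsFold reports whether s contains substr, ignoring case.
// An empty substr matches everything, so an unset filter is a no-op.
func containsFold(s, substr string) bool {
	return strings.Contains(strings.ToLower(s), strings.ToLower(substr))
}

// filterSymbols keeps symbols whose name matches symFilter AND whose
// file path matches fileFilter.
func filterSymbols(syms []symbol, symFilter, fileFilter string) []symbol {
	var out []symbol
	for _, s := range syms {
		if containsFold(s.Name, symFilter) && containsFold(s.File, fileFilter) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	syms := []symbol{
		{File: "srv/http.go", Name: "HandleRequest"},
		{File: "srv/http.go", Name: "listen"},
		{File: "internal/auth/token.go", Name: "handleRefresh"},
	}
	// Equivalent in spirit to: repoguide --symbol handle --file srv
	for _, s := range filterSymbols(syms, "handle", "srv") {
		fmt.Println(s.File, s.Name)
	}
}
```

Only HandleRequest in srv/http.go passes both filters here: handleRefresh matches the symbol filter but not the file filter.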

The --symbol output includes a callsites table with every call occurrence and every file-level import site, each with exact file and line number. Use those line numbers with Read(offset=N) for precise navigation without scanning.

Subcommands

repoguide init
repoguide init [--dry-run] [path-to-CLAUDE.md]

Writes a repoguide usage section to a CLAUDE.md file, creating it if it doesn't exist. The section instructs Claude Code to call repoguide at the start of tasks and explains how to read the output.

repoguide init                     # write to ./CLAUDE.md
repoguide init path/to/CLAUDE.md   # explicit path
repoguide init --dry-run           # print the generated section, no file written
repoguide init --dry-run CLAUDE.md # print what the full file would look like

The command reports what it did: created, updated, or already up to date. Safe to run repeatedly — skips the write when nothing has changed.

The block is wrapped in HTML sentinel comments so subsequent runs replace only that section, leaving surrounding content untouched:

<!-- repoguide:start -->
...generated content...
<!-- repoguide:end -->
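
A minimal sketch of sentinel-based replacement, to show why repeated runs leave the rest of the file alone. The function name upsertSection and the handling of a missing end marker are assumptions for illustration; only the two marker strings come from the text above:

```go
package main

import (
	"fmt"
	"strings"
)

const (
	startMarker = "<!-- repoguide:start -->"
	endMarker   = "<!-- repoguide:end -->"
)

// upsertSection replaces the text between the sentinel comments with
// section, or appends a new sentinel-wrapped block when none exists.
// Content outside the markers is returned unchanged.
func upsertSection(doc, section string) (string, error) {
	block := startMarker + "\n" + section + "\n" + endMarker
	start := strings.Index(doc, startMarker)
	if start == -1 {
		if doc != "" && !strings.HasSuffix(doc, "\n") {
			doc += "\n"
		}
		return doc + block + "\n", nil
	}
	rel := strings.Index(doc[start:], endMarker)
	if rel == -1 {
		// Assumption: refuse to guess where an unterminated block ends.
		return doc, fmt.Errorf("found %q without matching %q", startMarker, endMarker)
	}
	end := start + rel + len(endMarker)
	return doc[:start] + block + doc[end:], nil
}

func main() {
	doc := "# Notes\n" + startMarker + "\nold text\n" + endMarker + "\nfooter\n"
	updated, err := upsertSection(doc, "new text")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(updated)
	// A second run with the same section produces identical output, which
	// is how an "already up to date" result can be detected.
	again, _ := upsertSection(updated, "new text")
	fmt.Println("unchanged on rerun:", again == updated)
}
```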

Claude Code integration

The primary use case is running repoguide as a Claude Code hook so every subagent automatically gets a repo map injected into its context.

Add this to .claude/settings.json:

{
  "hooks": {
    "SubagentStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "repoguide \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/repoguide.toon\""
          }
        ]
      }
    ]
  }
}

The SubagentStart hook fires when any subagent launches. repoguide's stdout is injected into the subagent's context, giving it an instant overview of the codebase. The default output includes a preamble header that explains the format, so the agent understands what it's looking at without any additional configuration.

--cache avoids re-parsing on every agent launch — the cache file is reused as long as no source files have changed. Add .cache/ to your .gitignore.
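
The reuse rule ("newer than all source files") amounts to a modification-time comparison. The sketch below is one way to express it; cacheIsFresh and runDemo are hypothetical names, and whether repoguide compares timestamps exactly like this is an assumption:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// cacheIsFresh reports whether cachePath exists and was modified strictly
// after every file in sources. A missing cache is simply "not fresh".
func cacheIsFresh(cachePath string, sources []string) (bool, error) {
	ci, err := os.Stat(cachePath)
	if err != nil {
		if os.IsNotExist(err) {
			return false, nil
		}
		return false, err
	}
	for _, src := range sources {
		si, err := os.Stat(src)
		if err != nil {
			return false, err
		}
		if !ci.ModTime().After(si.ModTime()) {
			return false, nil
		}
	}
	return true, nil
}

// runDemo checks freshness before and after a source file is touched,
// using a throwaway directory and explicit timestamps.
func runDemo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "repoguide-cache-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "main.go")
	cache := filepath.Join(dir, "repoguide.toon")
	for _, p := range []string{src, cache} {
		if err := os.WriteFile(p, []byte("x"), 0o644); err != nil {
			return false, false, err
		}
	}
	now := time.Now()
	// Source last edited two hours ago, cache written one hour ago.
	if err := os.Chtimes(src, now, now.Add(-2*time.Hour)); err != nil {
		return false, false, err
	}
	if err := os.Chtimes(cache, now, now.Add(-1*time.Hour)); err != nil {
		return false, false, err
	}
	before, err := cacheIsFresh(cache, []string{src})
	if err != nil {
		return false, false, err
	}
	// Simulate an edit: the source is now newer than the cache.
	if err := os.Chtimes(src, now, now); err != nil {
		return false, false, err
	}
	after, err := cacheIsFresh(cache, []string{src})
	return before, after, err
}

func main() {
	before, after, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fresh before edit:", before)
	fmt.Println("fresh after edit:", after)
}
```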

TOON format

The output uses TOON (Token-Oriented Object Notation), a compact format designed for LLM consumption:

  • Scalar fields: written as key: value
  • Tabular arrays: written as name[count]{col1,col2,...}: followed by indented CSV rows
  • Quoting: values containing special characters are double-quoted; numbers and plain strings are bare
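
To make the tabular layout concrete, here is a small encoder sketch that emits the same shape as the files table in the example output. It is not the internal toon package: encodeTable and quote are hypothetical names, and the set of characters that triggers quoting (comma, colon, double quote, newline, surrounding whitespace) is an assumption, since the text above only says "special characters":

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quote double-quotes values that would be ambiguous in a CSV-style row.
// The trigger set below is an assumption, not the TOON specification.
func quote(v string) string {
	if v == "" || strings.ContainsAny(v, ",:\"\n") || strings.TrimSpace(v) != v {
		return strconv.Quote(v)
	}
	return v
}

// encodeTable writes a header of the form name[count]{col1,col2,...}:
// followed by one two-space-indented CSV row per entry.
func encodeTable(name string, cols []string, rows [][]string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s[%d]{%s}:\n", name, len(rows), strings.Join(cols, ","))
	for _, row := range rows {
		cells := make([]string, len(row))
		for i, c := range row {
			cells[i] = quote(c)
		}
		b.WriteString("  " + strings.Join(cells, ",") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print("repo: myproject\n") // scalar field
	fmt.Print(encodeTable("files",
		[]string{"path", "language", "rank"},
		[][]string{
			{"myproject/models.py", "python", "0.2755"},
			{"myproject/languages.py", "python", "0.1183"},
		}))
}
```

Declaring the column names once in the header is what keeps the format compact: each row carries only values, not repeated keys.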

How it works

  1. Discover files — uses git ls-files when available, falls back to .gitignore-based filtering
  2. Parse with tree-sitter — extracts classes, functions, methods, and imports from each file
  3. Build dependency graph — creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
  4. Rank with PageRank — scores files by importance in the dependency graph
  5. Select top N — when --max-files is set, keeps only the highest-ranked files
  6. Encode to TOON — serializes the repo map into the compact output format
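
Steps 3 to 5 can be sketched as a textbook PageRank over file-to-file edges followed by a top-N cut. The damping factor (0.85), the fixed iteration count, the uniform redistribution of rank from files with no outgoing edges, and the function names are all assumptions; the internal graph and ranking packages may differ:

```go
package main

import (
	"fmt"
	"sort"
)

// pageRank runs a fixed number of power iterations. edges[src] lists the
// files that src imports from, so rank flows from importer to imported file.
// Every edge target is assumed to appear in nodes.
func pageRank(nodes []string, edges map[string][]string, damping float64, iters int) map[string]float64 {
	n := float64(len(nodes))
	rank := make(map[string]float64, len(nodes))
	for _, f := range nodes {
		rank[f] = 1 / n
	}
	for i := 0; i < iters; i++ {
		next := make(map[string]float64, len(nodes))
		dangling := 0.0
		for _, f := range nodes {
			out := edges[f]
			if len(out) == 0 {
				dangling += rank[f] // no imports: spread this rank evenly
				continue
			}
			share := rank[f] / float64(len(out))
			for _, t := range out {
				next[t] += share
			}
		}
		for _, f := range nodes {
			next[f] = (1-damping)/n + damping*(next[f]+dangling/n)
		}
		rank = next
	}
	return rank
}

// topN returns the n highest-ranked files, breaking ties by path.
func topN(nodes []string, rank map[string]float64, n int) []string {
	sorted := append([]string(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool {
		if rank[sorted[i]] != rank[sorted[j]] {
			return rank[sorted[i]] > rank[sorted[j]]
		}
		return sorted[i] < sorted[j]
	})
	if n < len(sorted) {
		sorted = sorted[:n]
	}
	return sorted
}

func main() {
	nodes := []string{"cli.py", "discovery.py", "languages.py", "models.py"}
	edges := map[string][]string{
		"cli.py":       {"discovery.py", "models.py"},
		"discovery.py": {"languages.py", "models.py"},
		"languages.py": {"models.py"},
	}
	rank := pageRank(nodes, edges, 0.85, 50)
	for _, f := range topN(nodes, rank, 3) {
		fmt.Printf("%s %.4f\n", f, rank[f])
	}
}
```

In this toy graph models.py is imported by every other file, so it ranks first, which mirrors how widely used definition files rise to the top of the map.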

Parsing runs concurrently across all available CPU cores.
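
One common way to get that kind of parallelism in Go is a worker pool sized to runtime.NumCPU(). The sketch below is an assumption about the general shape, not the actual parse package; parseFile is a stand-in that does no real tree-sitter work:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parseFile is a placeholder for the real tree-sitter parse. It returns
// the length of the path so the example stays self-contained.
func parseFile(path string) int {
	return len(path)
}

// parseAll fans paths out to one worker per CPU and returns results in
// the same order as the input.
func parseAll(paths []string) []int {
	results := make([]int, len(paths))
	jobs := make(chan int)
	var wg sync.WaitGroup

	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so the
				// writes to results never overlap.
				results[i] = parseFile(paths[i])
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"main.go", "internal/graph/graph.go", "internal/toon/toon.go"}
	for i, r := range parseAll(paths) {
		fmt.Println(paths[i], r)
	}
}
```

Writing results by index rather than appending keeps the output order deterministic regardless of which worker finishes first.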

Supported languages

Python, Go, Ruby. Extensible by adding a tree-sitter grammar and a .scm query file to internal/lang/queries/.
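
As a rough picture of what a language registry does, the sketch below maps file extensions to language names. The map contents follow the supported languages listed above, but the names registry and languageFor are hypothetical, and the real internal/lang package also ties each entry to a tree-sitter grammar and an embedded .scm query file, which this sketch omits:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// registry maps lowercase file extensions to language names.
// Illustrative only; the real registry carries grammars and queries too.
var registry = map[string]string{
	".py": "python",
	".go": "go",
	".rb": "ruby",
}

// languageFor returns the language for path, or false when the file
// extension is not registered and the file should be skipped.
func languageFor(path string) (string, bool) {
	lang, ok := registry[strings.ToLower(filepath.Ext(path))]
	return lang, ok
}

func main() {
	for _, p := range []string{"myproject/models.py", "cmd/main.go", "README.md"} {
		if lang, ok := languageFor(p); ok {
			fmt.Println(p, "->", lang)
		} else {
			fmt.Println(p, "-> skipped")
		}
	}
}
```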

Development

make build    # build binary
make test     # run tests
make lint     # run golangci-lint
make fmt      # format with goimports
make cover    # generate coverage report

License

MIT

Documentation

Overview

repoguide generates a tree-sitter repository map in TOON format.

Directories

Path                Synopsis
internal/discover   Package discover finds parseable source files in a repository.
internal/graph      Package graph builds a dependency graph and computes PageRank.
internal/lang       Package lang provides a language registry mapping file extensions to tree-sitter languages and their embedded query files.
internal/model      Package model defines core data structures for repoguide.
internal/parse      Package parse extracts tags from source files using tree-sitter.
internal/ranking    Package ranking implements token-budget-aware file selection.
internal/toon       Package toon implements TOON (Token-Oriented Object Notation) encoding.
