stringer

module

v0.3.0 (latest) | Published: Feb 8, 2026 | License: MIT

Repository

github.com/davetashner/stringer

README

Stringer

Status: v0.3.0. Five collectors, three output formats, parallel pipeline with signal deduplication. See Current Limitations for what's not here yet.

Codebase archaeology for Beads. Mine your repo for actionable work items, output them as Beads-formatted issues, and give your AI agents instant situational awareness.

# Install via Homebrew
brew install davetashner/tap/stringer

# Or install via Go
go install github.com/davetashner/stringer/cmd/stringer@latest

# Scan a repo and seed beads
cd your-project
stringer scan . | bd import -i -

# That's it. Your agents now have context.
bd ready --json

The Problem

You adopt Beads to give your coding agents persistent memory. On a new project, agents file issues as they go and the dependency graph grows organically.

But most real work happens on existing codebases. When an agent boots up on a 50k-line repo with an empty .beads/ directory, it has zero context. It doesn't know about the 47 TODOs scattered across the codebase or the half-finished refactor that's been sitting there for six months.

Stringer solves the cold-start problem. It mines signals already present in your repo and produces structured Beads issues that agents can immediately orient around.

What It Does Today

Collectors

TODO collector (todos) — Scans source files for TODO, FIXME, HACK, XXX, BUG, and OPTIMIZE comments. Enriched with git blame author and timestamp. Confidence scoring with age-based boosts.
Git log collector (gitlog) — Detects reverts, high-churn files, and stale branches from git history.
Patterns collector (patterns) — Flags large files and modules with low test coverage ratios.
Lottery risk analyzer (lotteryrisk) — Flags directories with high lottery risk (ownership concentrated in a single author) using git blame and commit history with recency weighting.
GitHub collector (github) — Imports open issues, pull requests, and actionable review comments from GitHub. Requires GITHUB_TOKEN env var.
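The todos collector's core job can be pictured as a line-by-line keyword scan. A minimal sketch follows, assuming line comments that start with `//` or `#` and an optional colon after the keyword. The regular expression, the `Match` type and `scanTODOs` are hypothetical, and the git blame enrichment described above is omitted.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Match is a hypothetical record for one keyword comment found in a file.
type Match struct {
	Line    int    // 1-based line number
	Keyword string // TODO, FIXME, HACK, XXX, BUG or OPTIMIZE
	Text    string // comment text after the keyword
}

// keywordRe assumes "//" or "#" line comments; real comment syntaxes vary
// by language. \b stops "TODOS" or "XXXL" from matching.
var keywordRe = regexp.MustCompile(`(?://|#)\s*(TODO|FIXME|HACK|XXX|BUG|OPTIMIZE)\b:?\s*(.*)$`)

// scanTODOs reports every keyword comment in src, in line order.
func scanTODOs(src string) ([]Match, error) {
	var out []Match
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		if m := keywordRe.FindStringSubmatch(sc.Text()); m != nil {
			out = append(out, Match{Line: n, Keyword: m[1], Text: strings.TrimSpace(m[2])})
		}
	}
	return out, sc.Err()
}

func main() {
	src := "package main\n\n// TODO: Add proper CLI argument parsing\nfunc main() {} // FIXME: This will panic on nil input\n"
	matches, err := scanTODOs(src)
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Printf("%d %s %q\n", m.Line, m.Keyword, m.Text)
	}
}
```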

Output Formats

Beads JSONL (beads) — Produces JSONL ready for bd import, with deterministic content-based IDs
JSON (json) — Raw signals with metadata envelope, TTY-aware pretty/compact output
Markdown (markdown) — Human-readable summary grouped by collector with priority distribution

Pipeline

Parallel execution — Collectors run concurrently via errgroup
Per-collector error modes — skip, warn (default), or fail
Signal deduplication — Content-based SHA-256 hashing merges duplicate signals
Dry-run mode — Preview signal counts without producing output

┌─────────────────────────────────┐
│       Target Repository         │
└────────────────┬────────────────┘
                 │
    ┌────────────┼─────────────┐
    ▼            ▼             ▼
┌────────┐  ┌─────────┐  ┌──────────┐
│ TODOs  │  │ Git Log │  │ Patterns │  (parallel)
└───┬────┘  └────┬────┘  └─────┬────┘
    └────────────┼─────────────┘
                 ▼
          ┌──────────────┐
          │    Dedup +   │
          │  Validation  │
          └──────┬───────┘
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
┌────────┐ ┌─────────┐ ┌──────────┐
│ Beads  │ │  JSON   │ │ Markdown │
│ JSONL  │ │         │ │          │
└────────┘ └─────────┘ └──────────┘

What to Expect

Output volume depends on codebase size and coding style:

Codebase	Approximate signals
Small (<5k LOC)	5-30
Medium (10k-50k LOC)	20-200
Large (100k+ LOC)	100-1,000+

Recommendation: Use --dry-run first to see signal counts, then use --max-issues to cap output on your first scan.

# Preview how many signals exist
stringer scan . --dry-run

# Start with a manageable batch
stringer scan . --max-issues 50 | bd import -i -

Getting Started

Start small. You can always scan again.

# 1. Preview signal count
stringer scan . --dry-run

# 2. Import a capped first batch (highest-confidence signals first)
stringer scan . --max-issues 20 | bd import -i -

# 3. See what your agents can now work on
bd ready --json

# 4. When ready, import everything
stringer scan . | bd import -i -

Save to file for review

stringer scan . -o signals.jsonl
cat signals.jsonl          # review
bd import -i signals.jsonl

Machine-readable dry run

stringer scan . --dry-run --json

{
  "total_signals": 70,
  "collectors": [
    {
      "name": "todos",
      "signals": 70,
      "duration": "303.6685ms"
    }
  ],
  "duration": "303.724958ms",
  "exit_code": 0
}

Usage Reference

stringer scan [path] [flags]

Flag	Short	Default	Description
`--collectors`	`-c`	(all)	Comma-separated list of collectors to run
`--format`	`-f`	`beads`	Output format
`--output`	`-o`	stdout	Output file path
`--dry-run`			Show signal count without producing output
`--json`			Machine-readable output for `--dry-run`
`--max-issues`		`0`	Cap output count (0 = unlimited)
`--no-llm`			Skip LLM clustering pass (no-op; reserved for future use)
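Step 2 of Getting Started notes that a capped batch takes the highest-confidence signals first. A minimal sketch of that capping rule, assuming signals are ordered by descending confidence before truncation and that `0` disables the cap as the table states; `Scored` and `capSignals` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Scored is a hypothetical minimal signal carrying only what capping needs.
type Scored struct {
	ID         string
	Confidence float64
}

// capSignals keeps the n highest-confidence signals; n == 0 means no cap.
// A stable sort preserves the original order among equal confidences.
func capSignals(in []Scored, n int) []Scored {
	out := append([]Scored(nil), in...) // copy so the caller's slice is untouched
	sort.SliceStable(out, func(i, j int) bool { return out[i].Confidence > out[j].Confidence })
	if n > 0 && n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	sigs := []Scored{{"str-a", 0.5}, {"str-b", 0.9}, {"str-c", 0.6}}
	fmt.Println(capSignals(sigs, 2))      // [{str-b 0.9} {str-c 0.6}]
	fmt.Println(len(capSignals(sigs, 0))) // 3
}
```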

Global flags: --quiet (-q), --verbose (-v), --no-color, --help (-h)

Available collectors: todos, gitlog, patterns, lotteryrisk, github

Available formats: beads, json, markdown

Configuration File

Place a .stringer.yaml in your repository root to set persistent scan options. CLI flags override config file values.

# .stringer.yaml
output_format: json
max_issues: 50
no_llm: true

collectors:
  todos:
    enabled: true
    error_mode: warn
    min_confidence: 0.5
    include_patterns:
      - "*.go"
      - "*.ts"
    exclude_patterns:
      - vendor/**
      - node_modules/**
  gitlog:
    enabled: false

Precedence: CLI flags > .stringer.yaml > defaults

If no config file exists, stringer uses its built-in defaults (all collectors enabled, beads format, no issue cap).
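The precedence rule can be expressed as layered overrides. In this sketch the `Options` struct, `resolve` and `ptr` are hypothetical, and only two settings are modeled. Pointer fields distinguish "not set" from an explicit zero such as `max_issues: 0`.

```go
package main

import "fmt"

// Options holds settings from one layer; nil means "not set at this layer".
type Options struct {
	Format    *string
	MaxIssues *int
}

// resolve applies the documented precedence: CLI flags > .stringer.yaml > defaults.
func resolve(cli, file Options) (format string, maxIssues int) {
	format, maxIssues = "beads", 0 // built-in defaults: beads format, no cap
	if file.Format != nil {
		format = *file.Format
	}
	if file.MaxIssues != nil {
		maxIssues = *file.MaxIssues
	}
	if cli.Format != nil {
		format = *cli.Format
	}
	if cli.MaxIssues != nil {
		maxIssues = *cli.MaxIssues
	}
	return format, maxIssues
}

// ptr returns a pointer to a copy of v.
func ptr[T any](v T) *T { return &v }

func main() {
	// Config file from the example above: json output, cap of 50.
	file := Options{Format: ptr("json"), MaxIssues: ptr(50)}
	// User runs: stringer scan . --max-issues 20
	cli := Options{MaxIssues: ptr(20)}
	fmt.Println(resolve(cli, file)) // json 20
}
```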

How Output Works

Confidence Scoring

Each TODO-collector signal gets a confidence score (0.0-1.0) based on keyword severity and on age from git blame:

Base scores by keyword:

Keyword	Base Score
`BUG`	0.7
`FIXME`	0.6
`HACK`	0.55
`TODO`	0.5
`XXX`	0.5
`OPTIMIZE`	0.4

Age boost from git blame:

Older than 1 year: +0.2
Older than 6 months: +0.1
No blame data or recent: +0.0

Score is capped at 1.0. See DR-004 for the full design rationale.

Priority Mapping

Confidence maps to bead priority:

Confidence	Priority
>= 0.8	P1
>= 0.6	P2
>= 0.4	P3
< 0.4	P4
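Applying the thresholds is a simple cascade. With the base scores above, a fresh FIXME (0.6) lands at P2 and a fresh TODO (0.5) or HACK (0.55) at P3, matching the priority values in the sample output below. `priority` is a hypothetical helper that returns the number used in the JSONL `priority` field.

```go
package main

import "fmt"

// priority maps a confidence score to a bead priority (1 = P1 ... 4 = P4).
func priority(confidence float64) int {
	switch {
	case confidence >= 0.8:
		return 1
	case confidence >= 0.6:
		return 2
	case confidence >= 0.4:
		return 3
	default:
		return 4
	}
}

func main() {
	for _, c := range []float64{0.9, 0.6, 0.55, 0.3} {
		fmt.Printf("%.2f -> P%d\n", c, priority(c))
	}
}
```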

Content-Based Hashing

Each signal gets a deterministic ID: SHA-256(source + kind + filepath + line + title), truncated to 8 hex characters with a str- prefix (e.g., str-0e4098f9). Re-scanning the same repo produces the same IDs, preventing duplicate beads on reimport.

Labels

Every signal is tagged with:

The keyword kind (e.g., todo, fixme, hack)
stringer-generated — distinguishes stringer output from manually filed issues
The collector name (todos)

Sample Output

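Given a main.go containing these comments (an excerpt; the surrounding code is omitted, so the line numbers in the output refer to the full file):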

// TODO: Add proper CLI argument parsing
// FIXME: This will panic on nil input
// HACK: Temporary workaround until upstream fixes the API

Stringer produces:

{"id":"str-0e4098f9","title":"TODO: Add proper CLI argument parsing","description":"Location: main.go:6","type":"task","priority":3,"status":"open","created_at":"","created_by":"stringer","labels":["todo","stringer-generated","todos"]}
{"id":"str-11e6af70","title":"FIXME: This will panic on nil input","description":"Location: main.go:9","type":"bug","priority":2,"status":"open","created_at":"","created_by":"stringer","labels":["fixme","stringer-generated","todos"]}
{"id":"str-3afa7732","title":"HACK: Temporary workaround until upstream fixes the API","description":"Location: main.go:15","type":"chore","priority":3,"status":"open","created_at":"","created_by":"stringer","labels":["hack","stringer-generated","todos"]}
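Because each output line is a standalone JSON object, the beads output is easy to post-process before `bd import`, for example to drop low-priority items. A sketch with a `Bead` struct inferred from the sample lines above (the field set and types are assumptions; `readBeads` is a hypothetical helper).

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Bead mirrors the fields visible in the sample JSONL.
type Bead struct {
	ID          string   `json:"id"`
	Title       string   `json:"title"`
	Description string   `json:"description"`
	Type        string   `json:"type"`
	Priority    int      `json:"priority"`
	Status      string   `json:"status"`
	CreatedAt   string   `json:"created_at"`
	CreatedBy   string   `json:"created_by"`
	Labels      []string `json:"labels"`
}

// readBeads decodes one JSON object per non-empty line.
func readBeads(jsonl string) ([]Bead, error) {
	var beads []Bead
	sc := bufio.NewScanner(strings.NewReader(jsonl))
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow lines up to 1 MiB
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var b Bead
		if err := json.Unmarshal([]byte(line), &b); err != nil {
			return nil, err
		}
		beads = append(beads, b)
	}
	return beads, sc.Err()
}

func main() {
	line := `{"id":"str-11e6af70","title":"FIXME: This will panic on nil input","type":"bug","priority":2,"status":"open","labels":["fixme","stringer-generated","todos"]}`
	beads, err := readBeads(line + "\n")
	if err != nil {
		panic(err)
	}
	for _, b := range beads {
		fmt.Printf("%s P%d %s\n", b.ID, b.Priority, b.Title)
	}
}
```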

The type field is derived from the keyword: bug/fixme -> bug, todo -> task, hack/xxx/optimize -> chore.
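The keyword-to-type rule and the label set described under Labels can be sketched together. `beadType` and `labels` are illustrative names; the fallback for unrecognized keywords and the removal of duplicate labels are choices made in this sketch, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// beadType maps a keyword to a bead type: bug/fixme -> bug, todo -> task,
// hack/xxx/optimize -> chore.
func beadType(keyword string) string {
	switch strings.ToLower(keyword) {
	case "bug", "fixme":
		return "bug"
	case "todo":
		return "task"
	case "hack", "xxx", "optimize":
		return "chore"
	default:
		return "task" // fallback for unknown keywords is an assumption
	}
}

// labels returns keyword kind, "stringer-generated" and the collector name,
// in that order, skipping any duplicates.
func labels(keyword, collector string) []string {
	var out []string
	seen := map[string]bool{}
	for _, l := range []string{strings.ToLower(keyword), "stringer-generated", collector} {
		if !seen[l] {
			seen[l] = true
			out = append(out, l)
		}
	}
	return out
}

func main() {
	fmt.Println(beadType("FIXME"), labels("FIXME", "todos")) // bug [fixme stringer-generated todos]
}
```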

Current Limitations

No delta scanning. Every run scans the full repo. No way to find only new signals since the last scan.
No LLM clustering. The --no-llm flag exists but is a no-op. There is no LLM pass to cluster related signals or infer dependencies.
No global config. Per-repo .stringer.yaml is supported, but there is no global ~/.stringer.yaml.
Line-sensitive hashing. Moving a TODO to a different line changes its ID, which means bd import sees it as a new issue.
No --min-confidence flag. Use --max-issues to cap output volume. Confidence-based filtering is planned.
Manual cleanup needed. If you delete a TODO from source and re-scan, the old bead remains in .beads/. You need to close it manually with bd close.

Roadmap

Planned for future releases:

Delta scanning — Only find signals added since last scan
LLM clustering pass — Group related signals, infer dependencies, prioritize
Monorepo support — Per-workspace scanning and scoped output
--min-confidence flag — Filter by confidence threshold with named presets
stringer docs — Auto-generate AGENTS.md scaffolds from repo structure

Design Principles

Read-only. Stringer never modifies the target repository. It reads files and git history, writes output to stdout or a file. You decide when to bd import.

Composable collectors. Each collector is independent, testable, and implements one Go interface. Adding a new signal source means implementing Collector with Name() and Collect() methods.

LLM-optional. Core scanning works without API keys. The LLM pass (when implemented) will add clustering and dependency inference but won't be required.

Idempotent. Running stringer twice on the same repo produces the same output. Content-based hashing ensures deterministic IDs.

Beads-native output. JSONL output is validated against bd import expectations. If bd import can't consume it, that's a stringer bug.

Requirements

Go 1.24+
Git (for blame enrichment)
bd CLI (for importing output into beads)

Contributing

See AGENTS.md for architecture details, the collector interface, and development workflow. This project uses Beads for task tracking — run bd ready --json to find open work.

License

MIT

Directories

Path	Synopsis
cmd/stringer	command
internal/collector	Package collector defines the Collector interface and a registry for managing available collectors.
internal/collectors	Package collectors provides signal extraction modules for stringer.
internal/config	Package config handles .stringer.yaml configuration files.
internal/log	Package log configures structured logging for stringer using log/slog.
internal/output	Package output defines the OutputFormatter interface for writing scan results in various formats.
internal/pipeline	Package pipeline provides the scan orchestration engine for stringer.
internal/redact	Package redact provides utilities to strip sensitive values from strings before they appear in output, logs, or error messages.
internal/signal	Package signal defines the core domain types for stringer.
