gh-find
A find(1)-like utility for GitHub repositories
gh-find is a fast, user-friendly way to search for files across GitHub repositories. It supports intuitive glob patterns, multiple filtering options, and sensible defaults.
Features
- Intuitive glob patterns: Use **/*.go or *.test.js instead of complex regex
- Concurrent search: Search multiple repositories in parallel with configurable concurrency
- Smart caching: Automatic response caching reduces API calls and respects rate limits
- Repository filtering: Search across sources, forks, archives, and mirrors
- File type filtering: Filter by file, directory, symlink, executable, or submodule with -t/--type
- Extension filtering: Quick filtering by file extension with -e/--extension
- Case-insensitive matching: Optional case-insensitive pattern matching with -i
- Full-path matching: Match against full file paths with -p, not just basenames
Installation
As a GitHub CLI extension
gh extension install jparise/gh-find
From source
git clone https://github.com/jparise/gh-find
cd gh-find
go build
gh extension install .
Quick Start
# Find all Go files in a specific repository
gh find "*.go" cli/cli
# Search across multiple repositories
gh find "*.go" cli/cli cli/go-gh golang/go
# Find all markdown files for a user/org (searches all their repos)
gh find "*.md" torvalds
# Case-insensitive search
gh find -i "readme*" cli
# Match against full paths (great for finding tests)
gh find -p "**/*_test.go" golang/go
# Filter by file extension
gh find -e go -e md cli
# Filter by file type (files only, no directories)
gh find -t f "README*" cli
# Exclude test files
gh find "*.js" -E "*.test.js" -E "*.spec.js" facebook/react
# Find large files (over 50KB)
gh find --min-size 50k "*.go" golang/go
# Include forks and archives in search
gh find --repo-types sources,forks,archives "*.md" torvalds
# Increase concurrency for faster searches
gh find -j 20 "*.rs" rust-lang/rust
# Bypass cache to get fresh results
gh find --no-cache "*.py" python/cpython
Usage
gh-find [<pattern>] <repository>... [flags]
Arguments
- pattern: Glob pattern to match files against (optional)
  - If only one argument is given, it is the repository and the pattern defaults to * (all files)
  - If two or more arguments are given, the first one is the pattern
- repository: One or more repositories to search. Each can be:
  - owner: Search all repositories for a user or organization
  - owner/repo: Search a specific repository
Examples:
gh find cli/cli # Single repo: pattern defaults to "*"
gh find "*.go" cli/cli # Single repo: explicit pattern
gh find "*.go" cli cli/cli golang/go # Multiple repos: pattern required
gh find "*" cli/cli cli/go-gh # Multiple repos: all files
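The examples above follow a simple rule, sketched below in Go. splitArgs and target are hypothetical names, not gh-find's code; the sketch assumes that a lone argument is always a repository (owner or owner/repo) and that with two or more arguments the first one is the pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// target is one repository argument: a bare owner ("cli"), which is later
// expanded to all of that owner's repositories, or one repository ("cli/cli").
type target struct {
	Owner string
	Repo  string // empty when the argument names only an owner
}

// splitArgs (hypothetical) separates the optional glob pattern from the
// repository arguments.
func splitArgs(args []string) (string, []target, error) {
	if len(args) == 0 {
		return "", nil, fmt.Errorf("at least one repository is required")
	}
	pattern := "*" // default when only a repository is given
	repos := args
	if len(args) > 1 {
		pattern, repos = args[0], args[1:]
	}
	targets := make([]target, 0, len(repos))
	for _, r := range repos {
		owner, repo, _ := strings.Cut(r, "/")
		if owner == "" || strings.Contains(repo, "/") {
			return "", nil, fmt.Errorf("invalid repository %q", r)
		}
		targets = append(targets, target{Owner: owner, Repo: repo})
	}
	return pattern, targets, nil
}

func main() {
	pattern, targets, err := splitArgs([]string{"*.go", "cli", "cli/cli"})
	if err != nil {
		panic(err)
	}
	fmt.Println(pattern, targets)
}
```

In this sketch a bare owner such as cli and an owner/repo such as cli/cli are both repository arguments; only the owner form would later be expanded to a list of repositories.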
Pattern Matching
By default, patterns match against the basename (filename) only:
gh find "*.go" cli/cli # Matches any .go file
gh find "main.go" cli/cli # Matches main.go in any directory
Use -p/--full-path to match against the full path:
gh find -p "cmd/**/*.go" cli/cli # Only .go files in cmd/ directory
gh find -p "**/test/**/*.js" facebook/react # Only .js files in test/ directories
Glob patterns support:
- *: Matches any characters except /
- **: Matches any characters including / (directory traversal)
- ?: Matches any single character
- [abc]: Matches one character from the set
- {a,b}: Matches either pattern
Options
Pattern Matching
- -i, --ignore-case: Case-insensitive pattern matching
- -p, --full-path: Match pattern against full path instead of basename
- -t, --type TYPE: Filter by file type (can be specified multiple times for OR matching)
  - Valid types: f/file, d/dir/directory, l/symlink, x/executable, s/submodule
  - Examples: -t f (files only), -t f -t d (files or directories)
- -e, --extension EXT: Filter by file extension (can be specified multiple times)
- -E, --exclude PATTERN: Exclude files matching pattern (can be specified multiple times)
- --min-size SIZE: Minimum file size (e.g., 1M, 500k, 1GB)
- --max-size SIZE: Maximum file size (e.g., 5M, 1GB)
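Git tree entries carry a mode that identifies their kind, so a -t filter can in principle be answered from a Git Trees API response alone. A minimal sketch, assuming the standard Git modes (040000 directory, 100644 file, 100755 executable, 120000 symlink, 160000 submodule); entryType, parseType, and typeMatches are hypothetical, and treating an executable as also matching -t f is an assumption rather than documented gh-find behavior.

```go
package main

import "fmt"

// entryType classifies a Git tree entry by its mode string, as found in the
// "mode" field of GitHub's Git Trees API response.
func entryType(mode string) string {
	switch mode {
	case "040000":
		return "d"
	case "120000":
		return "l"
	case "160000":
		return "s"
	case "100755":
		return "x"
	default: // 100644 and other regular blob modes
		return "f"
	}
}

// parseType normalizes a --type value to its one-letter form.
func parseType(s string) (string, error) {
	switch s {
	case "f", "file":
		return "f", nil
	case "d", "dir", "directory":
		return "d", nil
	case "l", "symlink":
		return "l", nil
	case "x", "executable":
		return "x", nil
	case "s", "submodule":
		return "s", nil
	}
	return "", fmt.Errorf("invalid type %q", s)
}

// typeMatches reports whether an entry with the given mode satisfies any of
// the requested types (repeated -t flags are ORed). No types means no filter.
// Assumption: an executable also counts as a regular file for -t f.
func typeMatches(mode string, wanted []string) bool {
	if len(wanted) == 0 {
		return true
	}
	got := entryType(mode)
	for _, w := range wanted {
		if w == got || (w == "f" && got == "x") {
			return true
		}
	}
	return false
}

func main() {
	var wanted []string
	for _, flag := range []string{"file", "dir"} { // like: -t file -t dir
		t, err := parseType(flag)
		if err != nil {
			panic(err)
		}
		wanted = append(wanted, t)
	}
	fmt.Println(typeMatches("040000", wanted), typeMatches("120000", wanted))
}
```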
Repository Filtering
- --repo-types TYPE[,TYPE...]: Repository types to include when expanding owners (default: sources)
  - Valid types: sources, forks, archives, mirrors, all
  - Only filters owner expansion (e.g., cli). Does NOT filter explicitly specified repos (e.g., cli/archived-fork)
  - See Repository Filtering for details
Performance
- -j, --jobs N: Maximum concurrent API requests (default: 10)
  - Increase for faster searches: -j 20
  - Decrease if hitting rate limits: -j 5
Caching
- --no-cache: Bypass cache, always fetch fresh data
- --cache-dir PATH: Override cache directory (default: ~/.cache/gh/)
- --cache-ttl DURATION: Cache time-to-live (default: 24h, e.g., 1h, 30m)
Other
- -c, --color MODE: Colorize output: auto, always, never (default: auto)
Repository Filtering
By default, gh-find only searches source repositories (excludes forks, archives, and mirrors).
# Include forks and archives
gh find --repo-types sources,forks,archives "*.go" cli
# Everything
gh find --repo-types all "*.js" facebook
Important: --repo-types only filters owner expansion (e.g., cli → all repos in cli org). Explicitly specified repos (e.g., cli/archived-fork) are always included:
# Includes cli/archived-fork even with --repo-types sources
gh find --repo-types sources cli cli/archived-fork
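GitHub's repository objects expose fork, archived, and mirror_url fields, which could be used to sort an owner's repositories into these categories. The sketch below is hypothetical (repo, parseRepoTypes, and include are not gh-find's API) and assumes that a repository with several traits, such as an archived fork, is kept only if all of them are allowed. Explicitly named owner/repo arguments would skip include entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// repo holds the fields of GitHub's repository object that matter here
// (JSON: "fork", "archived", "mirror_url").
type repo struct {
	FullName  string
	Fork      bool
	Archived  bool
	MirrorURL string
}

// parseRepoTypes turns a --repo-types value such as "sources,forks" into a set.
func parseRepoTypes(value string) (map[string]bool, error) {
	set := map[string]bool{}
	for _, t := range strings.Split(value, ",") {
		switch t = strings.TrimSpace(t); t {
		case "all":
			for _, k := range []string{"sources", "forks", "archives", "mirrors"} {
				set[k] = true
			}
		case "sources", "forks", "archives", "mirrors":
			set[t] = true
		default:
			return nil, fmt.Errorf("invalid repo type %q", t)
		}
	}
	return set, nil
}

// include decides whether a repository found by expanding an owner is kept.
// Assumption: every trait of the repository must be allowed.
func include(r repo, allowed map[string]bool) bool {
	var traits []string
	if r.Fork {
		traits = append(traits, "forks")
	}
	if r.Archived {
		traits = append(traits, "archives")
	}
	if r.MirrorURL != "" {
		traits = append(traits, "mirrors")
	}
	if len(traits) == 0 {
		return allowed["sources"] // no special traits: a source repository
	}
	for _, t := range traits {
		if !allowed[t] {
			return false
		}
	}
	return true
}

func main() {
	allowed, err := parseRepoTypes("sources,forks")
	if err != nil {
		panic(err)
	}
	for _, r := range []repo{ // hypothetical repositories
		{FullName: "example/app"},
		{FullName: "example/app-fork", Fork: true},
		{FullName: "example/old-fork", Fork: true, Archived: true},
	} {
		fmt.Println(r.FullName, include(r, allowed))
	}
}
```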
Caching
gh-find automatically caches GitHub API responses to improve performance and preserve your rate limit.
How It Works
- What's cached: All GET requests to the GitHub API (repository lists, file trees)
- Where: ~/.cache/gh/ by default (shared with the gh CLI and other extensions), or configurable with --cache-dir
- Duration: 24 hours by default, configurable with --cache-ttl
- Cache keys: Based on request method, URL, and authentication
Performance Impact
Without cache (searching 100 repositories):
- API calls: ~101 requests
- Time: ~60 seconds
- Rate limit cost: 101/5000
With cache (100% hit rate on repeated search):
- API calls: 0 requests
- Time: <1 second
- Rate limit cost: 0/5000
Cache Control
# Force fresh data (bypass cache)
gh find --no-cache "*.go" cli/cli
# Custom cache location
gh find --cache-dir /tmp/gh-cache "*.go" cli
# Shorter cache duration (1 hour)
gh find --cache-ttl 1h "*.go" cli
You might want to disable the cache when searching recently updated repositories or after pushing new files.
Performance Tuning
Concurrency
The -j/--jobs flag controls how many repositories are searched concurrently:
# Default: 10 concurrent requests
gh find "*.go" myorg
# Faster: 20 concurrent requests (uses rate limit faster)
gh find -j 20 "*.go" myorg
# Slower: 5 concurrent requests (conserves rate limit)
gh find -j 5 "*.go" myorg
Recommendations:
- Small searches (1-20 repos): Use the default (-j 10)
- Large searches (50+ repos): Increase to -j 20 if you have rate limit headroom
- Rate limit concerns: Decrease to -j 5 to spread requests over time
Rate Limits
The GitHub API enforces rate limits:
- Authenticated: 5,000 requests/hour
- Unauthenticated: 60 requests/hour
Authentication is handled via gh auth login.
Each repository search typically uses:
- 1 request to list repositories (cached)
- 1 request per repository for file tree (cached)
Tip: Use caching to minimize rate limit impact. Cached requests don't count against your limit.
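The request counts in Performance Impact can be reproduced with simple arithmetic. This sketch assumes the owner's repository list is fetched in pages of 100 (the maximum per_page the GitHub REST API allows) and that each repository needs exactly one tree request; estimateRequests is a hypothetical helper.

```go
package main

import "fmt"

// estimateRequests returns the uncached API calls for expanding one owner
// with repoCount repositories: list pages plus one tree request per repo.
// Assumes the repository list is paged at perPage entries per request.
func estimateRequests(repoCount, perPage int) int {
	pages := (repoCount + perPage - 1) / perPage // ceiling division
	if pages == 0 {
		pages = 1 // an owner with no repositories still costs one list call
	}
	return pages + repoCount
}

func main() {
	n := estimateRequests(100, 100)
	fmt.Printf("%d requests, %.1f%% of a 5,000/hour limit\n", n, 100*float64(n)/5000)
}
```

For 100 repositories this gives 1 + 100 = 101 requests, about 2% of the authenticated hourly limit.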
Known Limitations
Truncated Repositories
GitHub's Git Trees API truncates recursive responses for repositories with more than 100,000 tree entries or more than 7 MB of tree data. When this happens, you'll see:
Warning: username/repo has >100k files, results incomplete
Results will be partial but still useful. This limitation comes from the GitHub API, not gh-find.
Rate Limits
If you exceed the rate limit, you'll see:
Rate limit exceeded (0/5000). Resets at 14:23 (in 45m).
Examples
Find all test files across multiple repositories
gh find -p "**/*_test.go" golang/go kubernetes/kubernetes
Find only files (exclude directories) with README in the name
gh find -t f -i "readme*" myorg
Find executable scripts
gh find -t x "*.sh" cli/cli
Find symlinks in a repository
gh find -t l "*" torvalds/linux
Find directories matching a pattern
gh find -t d "test*" golang/go
Find README files (case-insensitive) in all repos for an org
gh find -i "readme*" myorg
Find TypeScript files, excluding node_modules
gh find -p "**/*.ts" -E "**/node_modules/**" facebook/react
Find larger Go files (over 50KB)
gh find --min-size 50k -e go golang/go
Find small configuration files (under 10KB)
gh find --max-size 10k "*.json" -E "**/node_modules/**" cli
Find files in a size range (between 10KB and 100KB)
gh find --min-size 10k --max-size 100k "*.go" cli/cli
Exclude test and spec files
gh find "*.js" -E "*.test.js" -E "*.spec.js" cli
Find all Python files in archived repositories
gh find --repo-types archives -e py myorg
Fast search with increased concurrency
gh find -j 25 "Dockerfile" kubernetes
Search with fresh data (bypass 24h cache)
gh find --no-cache "*.go" cli/cli
Troubleshooting
"No repositories match the filter"
Check your --repo-types filter. By default, only source repositories are searched (excludes forks, archives, mirrors).
Try: gh find --repo-types all "*.go" username
"Pattern matching not working as expected"
By default, patterns match the basename (filename) only. Use -p/--full-path to match against the full path:
# Won't work (matches basename only)
gh find "cmd/*.go" cli/cli
# Works (matches full path)
gh find -p "cmd/*.go" cli/cli
"Rate limit exceeded"
You've hit GitHub's API rate limit (5,000 requests/hour). Options:
- Wait for the reset time shown in the error
- Use cached results (cache is enabled by default)
- Reduce concurrency: gh find -j 5 ...
"Failed to get owner type"
The username or organization doesn't exist, or you don't have access. Verify:
gh api users/username
License
This software is released under the terms of the MIT License.