This tool is used to reliably enumerate projects on GitHub.
The output of this tool can be used as input for the criticality_score
tool or for the collect_signals worker.
Example
$ export GITHUB_TOKEN=ghp_x # Personal Access Token Goes Here
$ enumerate_github \
-start=2008-01-01 \
-min-stars=10 \
-workers=1 \
-out=github_projects.txt
Install
$ go install github.com/ossf/criticality_score/v2/cmd/enumerate_github@latest
Usage
$ enumerate_github [FLAGS]...
The URL for each repository is written to the output. By default stdout is used
for output.
FLAGS are optional. See below for documentation.
Authentication
An environment variable containing one or more comma-delimited GitHub Personal
Access Tokens must be set.
Supported environment variables are GITHUB_AUTH_TOKEN, GITHUB_TOKEN,
GH_TOKEN, or GH_AUTH_TOKEN.
Example:
$ export GITHUB_TOKEN=ghp_abc,ghp_123
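The following Go sketch shows one way a comma-delimited token variable could be read. It is not the tool's actual implementation: the helper names (tokenEnvVars, splitTokens, tokensFromEnv), the lookup order, and the handling of whitespace and empty entries are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// tokenEnvVars lists the environment variables named in this document.
// The lookup order used here is an assumption.
var tokenEnvVars = []string{"GITHUB_AUTH_TOKEN", "GITHUB_TOKEN", "GH_TOKEN", "GH_AUTH_TOKEN"}

// splitTokens splits a comma-delimited value into individual tokens,
// trimming surrounding whitespace and dropping empty entries.
func splitTokens(value string) []string {
	var tokens []string
	for _, t := range strings.Split(value, ",") {
		if t = strings.TrimSpace(t); t != "" {
			tokens = append(tokens, t)
		}
	}
	return tokens
}

// tokensFromEnv returns the tokens from the first variable that yields at
// least one token. getenv is injected so the function is easy to test.
func tokensFromEnv(getenv func(string) string) []string {
	for _, name := range tokenEnvVars {
		if tokens := splitTokens(getenv(name)); len(tokens) > 0 {
			return tokens
		}
	}
	return nil
}

func main() {
	tokens := tokensFromEnv(os.Getenv)
	fmt.Printf("found %d token(s)\n", len(tokens))
}
```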
Flags
Output flags
-out FILE specify the FILE to use for output. By default stdout is used.
-append appends output to FILE if it already exists.
-force overwrites FILE if it already exists and -append is not set.
-format {text|scorecard} indicates the format to use for output. text is
used by default and consists of one URL per line. scorecard outputs a CSV
file compatible with the scorecard
project.
If FILE exists and neither -append nor -force is set, the command will fail.
Date flags
-start date
the start date to enumerate back to. Must be at or after 2008-01-01. Defaults to 2008-01-01.
-end date
the end date to enumerate from. Defaults to today's date.
Query/Star flags
-min-stars int only enumerates repositories with this many stars or more.
Defaults to 10.
-query string sets the base query to use for enumeration. Defaults to
is:public. See GitHub's search help
for more detail.
-require-min-stars aborts execution if -min-stars can't be reached during
enumeration. If not set, some repositories created on a certain date may not
be included.
-star-overlap int the number of stars to overlap between queries. Defaults
to 5. An overlap is used to avoid missing repositories whose star count
changes during enumeration.
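To illustrate the idea of a star overlap, the sketch below assumes that enumeration issues successive queries in descending star order and that each new query raises its upper bound by the overlap above the lowest star count already seen. That ordering, and the helper names starQuery and nextMaxStars, are assumptions for illustration; the stars:MIN..MAX range qualifier is GitHub's search syntax. Because overlapping queries can return the same repository twice, a caller would also need to de-duplicate results.

```go
package main

import "fmt"

// starQuery builds a search query restricted to a star range, using
// GitHub's "stars:MIN..MAX" range qualifier.
func starQuery(base string, minStars, maxStars int) string {
	return fmt.Sprintf("%s stars:%d..%d", base, minStars, maxStars)
}

// nextMaxStars returns the upper star bound for the following query.
// Re-covering the band just above lowestSeen (an assumed approach) gives
// repositories that gained stars mid-run a chance to be seen again.
func nextMaxStars(lowestSeen, overlap int) int {
	return lowestSeen + overlap
}

func main() {
	const (
		base     = "is:public" // default -query
		minStars = 10          // default -min-stars
		overlap  = 5           // default -star-overlap
	)
	lowestSeen := 120 // hypothetical lowest star count in the previous results
	fmt.Println(starQuery(base, minStars, nextMaxStars(lowestSeen, overlap)))
}
```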
Misc flags
-log level set the level of logging. Can be debug, info (default), warn or error.
-workers int the total number of concurrent workers to use. Default is 1.
-help displays help text.
How It Works
Refer to Milestone 1 for details on the
algorithm.
Q&A
Q: What is the lowest practical setting for -min-stars?
10 has been successfully tested, although lower may be possible.
TODO -- more detail
Q: How long does it take?
A single GitHub Personal Access Token took about 4 hours to return all
projects with >= 20 stars.
Faster performance can be achieved with more Personal Access Tokens and
additional workers.
Q: How many workers should I use?
Generally, use 1 worker for each Personal Access Token.
More workers than tokens may result in secondary rate limits.
It is possible that more restricted searches will succeed with more workers per
token.
Development
Rather than installing the binary, use go run to run the command.
For example:
$ go run ./cmd/enumerate_github [FLAGS]...
Limiting the data allows for runs to be completed quickly. For example:
$ go run ./cmd/enumerate_github \
-log=debug \
-start=2022-06-14 \
-end=2022-06-21 \
-min-stars=20