dctl
dctl is an open source project created to provide quick insight into the activity of a single repo or an entire GitHub organization across two main dimensions:
- Volume of developer events over time (PRs, issues, and their comments)
- Contributions by developer entity affiliation (company or organization) during a time period

While GitHub provides some charts with this information at the repo level, and Grafana has a GitHub plugin that some foundations use to track their projects (e.g. CNCF Grafana Dashboard), there isn't really anything that's both free and simple enough to provide quick answers to these questions.
Using the GitHub API, dctl downloads metadata about the projects you select into a local database, and augments it with developer affiliation data from the CNCF and the Apache Foundation to provide organization-, repository-, or entity-scoped event drill-downs. You can also filter by time period, contribution type, and developer name to identify specific trends, and navigate to the original detail in GitHub for additional context.
And, since all this data is cached locally, you can also use the CLI to run more customized queries using SQL without needing to re-download the data. See below for more details on how to do that.
Hope you find this tool helpful. Let me know if you have any questions.
Usage
dctl is a dual-purpose app that can be used either as a CLI to import and query data, or as a server that launches a local app you can access in your browser. Start by launching the CLI:
dctl
You should see the CLI version and a short summary along with the usage options:
- import - List data import operations
- update - Update all previously imported org, repos, and affiliations
- query - List data query options (requires previous import)
- server - Start local HTTP server
Import Data
The dctl CLI comes with an embedded SQLite database. The following import operations are currently supported:
- events - Imports GitHub repo event data (PRs, comments, issues, etc)
- affiliations - Updates imported developer entity/identity with CNCF and GitHub data
- names - Updates imported developer names with Apache Foundation data
Import GitHub Events
dctl will need access to your GitHub access token. Either create an environment variable GITHUB_ACCESS_TOKEN to hold that token, or provide it each time using the --token flag.
dctl import events --org <organization> --repo <repository>
By default, dctl will download data for the last 6 months. Provide the --months flag to download less or more data.
When completed, dctl will return a summary of the import:
{
"org": "tektoncd",
"repo": "dashboard",
"duration": "4.4883105s",
"imported": {
"issue_comment":61,
"issue_request":42,
"pr_comment":55,
"pr_request":100
}
}
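If you script your imports, the summary can be read like any other JSON document. The sketch below assumes the summary is printed as a single JSON object shaped like the example above; the total_imported helper is hypothetical and not part of dctl.

```python
import json


def total_imported(summary_json: str) -> int:
    """Return the total number of events reported in an import summary."""
    summary = json.loads(summary_json)
    # "imported" maps each event type to the number of imported records.
    return sum(summary["imported"].values())


# The summary shown above, copied as-is.
sample = """
{
  "org": "tektoncd",
  "repo": "dashboard",
  "duration": "4.4883105s",
  "imported": {
    "issue_comment": 61,
    "issue_request": 42,
    "pr_comment": 55,
    "pr_request": 100
  }
}
"""

print(total_imported(sample))  # 258
```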
To get more immediate feedback during import, use the --debug flag:
dctl --debug import events --org <organization> --repo <repository>
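To import several repos from a script, you could assemble the command line programmatically. This is only a sketch: the helper names are made up, and it assumes --months takes a whole number and goes after the subcommand like --org and --repo (this README does not show its exact position). The --debug placement follows the example above.

```python
import os


def import_events_argv(org, repo, months=None, debug=False):
    """Build the argument list for `dctl import events`."""
    argv = ["dctl"]
    if debug:
        # The README shows --debug before the subcommand.
        argv.append("--debug")
    argv += ["import", "events", "--org", org, "--repo", repo]
    if months is not None:
        # Assumption: --months is an integer flag placed after the subcommand.
        argv += ["--months", str(months)]
    return argv


def has_token(env=None):
    """Report whether GITHUB_ACCESS_TOKEN is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("GITHUB_ACCESS_TOKEN"))


print(import_events_argv("tektoncd", "dashboard", months=3, debug=True))
```

The resulting list can be passed to subprocess.run(argv, check=True) once dctl is on your PATH and the token is available.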
Update Developer Name and Entity Affiliation
Developers on GitHub often don't include their company or organization affiliation, and when they do, they use all kinds of creative ways of spelling it (you'd be surprised how many different IBMs and Googles are out there). To clean this data up, dctl provides two different operations:
- affiliations - Updates imported developer entity/identity with CNCF and GitHub data
- names - Updates imported developer names with Apache Foundation data
To update affiliations using CNCF developer affiliation files:
dctl import affiliation
Alternatively, you can provide the --url parameter to import a specific developers_affiliations.txt file.
When completed, dctl will output the results (in this example, out of the 3756 unique developers that were already imported into the local database, 459 were updated based on the 38984 CNCF affiliations):
{
"duration": "1m4.576478333s",
"db_devs": 3756,
"cncf_devs": 38984,
"mapped_devs": 459
}
Just like before, to get more immediate feedback during import, use the --debug flag.
Update Developer Full Name
Similarly, you can use the Apache Foundation developer data to update developers' full names (AF's data is used only when the local data has no full name for a developer):
dctl import names
As with affiliations, when done, dctl will return the results (in this example, out of the 3201 unique developers that were already imported into the local database, 3 were updated based on the 8337 AF names):
{
"duration": "740.209µs",
"db_devs": 3201,
"af_devs": 8337,
"mapped_devs": 3
}
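The duration values in these summaries (4.4883105s, 1m4.576478333s, 740.209µs) look like Go duration strings. Assuming that is the case, a script that tracks import times could convert them to seconds as sketched below. The parse_duration and mapped_share helpers are hypothetical, and the parser only covers the unit suffixes Go prints (h, m, s, ms, µs/us, ns).

```python
import re

# Seconds per unit. Both micro characters (U+00B5 and U+03BC) are accepted.
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3,
    "us": 1e-6, "\u00b5s": 1e-6, "\u03bcs": 1e-6, "ns": 1e-9,
}
# Two-letter units come first so "ms" is not read as "m" plus a stray "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|\u00b5s|\u03bcs|ms|h|m|s)")


def parse_duration(text: str) -> float:
    """Convert a string such as '1m4.576478333s' to seconds."""
    pos, total = 0, 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # unexpected characters between tokens
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # nothing parsed, or trailing junk
        raise ValueError(f"invalid duration: {text!r}")
    return total


def mapped_share(summary: dict) -> float:
    """Fraction of locally known developers that the import updated."""
    if summary["db_devs"] == 0:
        return 0.0
    return summary["mapped_devs"] / summary["db_devs"]


affiliations = {"duration": "1m4.576478333s", "db_devs": 3756,
                "cncf_devs": 38984, "mapped_devs": 459}
print(round(parse_duration(affiliations["duration"]), 3))  # 64.576
print(f"{mapped_share(affiliations):.1%}")                  # 12.2%
```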
View Data
Once the data has been imported, the easiest way to view it is to start a local server:
dctl server
The output should include the address the server is listening on:
INFO server started on 127.0.0.1:8080
At this point you can use your browser to navigate to 127.0.0.1:8080 to view the data.
You can change the port on which the server starts by providing the --port flag.
Query Data
The imported data is also available as JSON via dctl query:
dctl query
Commands:
- developers - List developers
- developer - Get specific CNCF developer details, identities and associated entities
- entities - List entities (companies or organizations with which users are affiliated)
- entity - Get specific CNCF entity and its associated developers
- repositories - List GitHub org/user repositories
- events - List GitHub repository events
Query Developer
Query for developer usernames and their last update info:
dctl query developers --like marn
[
{
"username": "mchmarny",
"update_date": "2022-05-13"
},
...
]
You can use the --limit flag to indicate the maximum number of results that should be returned (default: 100).
You can also query for details of a single developer:
dctl query developer --name mchmarny
{
"username": "mchmarny",
"update_date": "2022-05-13",
"id": 175854,
"avatar_url": "https://avatars.githubusercontent.com/u/175854?v=4",
"profile_url": "https://github.com/mchmarny",
"organizations": [
{
"url": "https://api.github.com/orgs/knative",
"name":"knative"
},
...
]
}
Query Entities
Query for entity names and the number of repositories that have events:
dctl query entities --like goog
[
{
"name": "GOOGLE",
"count": 23
}
]
You can use the --limit flag to indicate the maximum number of results that should be returned (default: 100).
You can also get all the details for a specific entity:
dctl query entity --name GOOGLE
{
"entity": "GOOGLE",
"developer_count": 23,
"developers": [
{
"username": "mchmarny",
"entity": "GOOGLE",
"update_date": "2022-05-14"
},
...
]
}
Query Repositories
Query for organization repositories:
dctl query repositories --org knative
[
{
"name": "serving",
"full_name": "knative/serving",
"description": "Kubernetes-based, scale-to-zero, request-driven compute",
"url": "https://github.com/knative/serving"
},
...
]
You can use the --limit flag to indicate the maximum number of results that should be returned (default: 100).
Query Events
The events query provides a number of filter options:
- org - Name of the GitHub organization or user
- repo - Name of the GitHub repository
- since - Event since date (YYYY-MM-DD)
- author - Event author (GitHub username)
- type - Event type (pr_request, issue_request, pr_comment, issue_comment)
- limit - Limits the number of results returned (default: 500)
Given the possible amount of data, the --org and --repo flags are required:
dctl query events --org knative --repo serving
[
{
"id": 1235445267,
"org": "knative",
"repo": "serving",
"username": "phunghaduong99",
"type": "issue_request",
"date": "2022-05-13"
},
{
"id": 935056755,
"org": "knative",
"repo": "serving",
"username": "dprotaso",
"type": "pr_request",
"date": "2022-05-12"
},
...
]
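Because the query output is JSON, it is easy to post-process. The sketch below counts events per type and per author; it assumes the command prints a single JSON array like the one above (the ... there is only an elision in this README). The count_by helper is hypothetical.

```python
import json
from collections import Counter


def count_by(events_json: str, field: str) -> Counter:
    """Count events grouped by one of their fields (e.g. 'type')."""
    events = json.loads(events_json)
    return Counter(event[field] for event in events)


# The two events shown above, without the elision.
sample = """
[
  {"id": 1235445267, "org": "knative", "repo": "serving",
   "username": "phunghaduong99", "type": "issue_request", "date": "2022-05-13"},
  {"id": 935056755, "org": "knative", "repo": "serving",
   "username": "dprotaso", "type": "pr_request", "date": "2022-05-12"}
]
"""

print(count_by(sample, "type"))
print(count_by(sample, "username"))
```

To use it on real data, you could read the output of dctl query events from standard input instead of the sample string.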
Query DB Directly (SQL)
For more specialized queries you can also query the local database. The imported data is stored in your home directory, inside of the .dctl directory.
sqlite3 ~/.dctl/data.db
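If you prefer a script over the sqlite3 shell, the same file can be opened with Python's built-in sqlite3 module. This is a minimal sketch, assuming the database lives at the path given above; list_tables is a hypothetical helper. The file is opened read-only so an ad hoc query cannot modify the cached data.

```python
import sqlite3
from pathlib import Path

# Location given in this README.
DEFAULT_DB = Path.home() / ".dctl" / "data.db"


def list_tables(db_path=DEFAULT_DB):
    """Return the names of all tables in the dctl database."""
    # mode=ro asks SQLite to open the file read-only.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Only runs when data has already been imported on this machine.
if DEFAULT_DB.exists():
    print(list_tables())
```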
DB Schema
The script to create DB schema is located in pkg/data/sql/ddl.sql
Table: developer (PK: username)
| Column | Type | Nullable |
|---|---|---|
| username | TEXT | false |
| update_date | TEXT | false |
| id | INTEGER | false |
| full_name | TEXT | true |
| | TEXT | true |
| avatar_url | TEXT | true |
| profile_url | TEXT | true |
| entity | TEXT | true |
| location | TEXT | true |
Table: event (PK: id, org, repo, username, event_type, event_date)
| Column | Type | Nullable |
|---|---|---|
| id | INTEGER | false |
| org | TEXT | false |
| repo | TEXT | false |
| username | TEXT | false |
| event_type | TEXT | false |
| event_date | TEXT | false |
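The two dimensions described at the top of this README (event volume over time, and contributions by entity) could be queried roughly as follows. This sketch builds a throwaway in-memory database from the named columns of the two tables above rather than from pkg/data/sql/ddl.sql, so the real schema may differ. The sample rows, the example/demo names, and the helper functions are made up, and it assumes event_date is stored as YYYY-MM-DD text.

```python
import sqlite3

# Stand-in schema based on the tables above (not the real ddl.sql).
SCHEMA = """
CREATE TABLE developer (
    username TEXT NOT NULL PRIMARY KEY, update_date TEXT NOT NULL,
    id INTEGER NOT NULL, full_name TEXT, avatar_url TEXT,
    profile_url TEXT, entity TEXT, location TEXT
);
CREATE TABLE event (
    id INTEGER NOT NULL, org TEXT NOT NULL, repo TEXT NOT NULL,
    username TEXT NOT NULL, event_type TEXT NOT NULL, event_date TEXT NOT NULL,
    PRIMARY KEY (id, org, repo, username, event_type, event_date)
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO developer (username, update_date, id, entity) VALUES (?, ?, ?, ?)",
        [("alice", "2022-05-14", 1, "GOOGLE"),
         ("bob", "2022-05-14", 2, "IBM"),
         ("carol", "2022-05-14", 3, None)],  # no known affiliation
    )
    conn.executemany(
        "INSERT INTO event VALUES (?, 'example', 'demo', ?, ?, ?)",
        [(1, "alice", "pr_request", "2022-04-10"),
         (2, "alice", "pr_comment", "2022-05-01"),
         (3, "bob", "issue_request", "2022-05-02"),
         (4, "carol", "pr_request", "2022-05-03"),
         (5, "alice", "pr_request", "2022-05-04")],
    )
    return conn


def monthly_volume(conn, org, repo):
    """Event counts per month and event type for one repository."""
    return conn.execute(
        """SELECT substr(event_date, 1, 7) AS month, event_type, COUNT(*)
           FROM event WHERE org = ? AND repo = ?
           GROUP BY month, event_type ORDER BY month, event_type""",
        (org, repo),
    ).fetchall()


def events_by_entity(conn, org, repo):
    """Event counts per affiliation; developers without one are UNKNOWN."""
    return conn.execute(
        """SELECT COALESCE(d.entity, 'UNKNOWN') AS affiliation,
                  COUNT(*) AS event_count
           FROM event e LEFT JOIN developer d ON d.username = e.username
           WHERE e.org = ? AND e.repo = ?
           GROUP BY affiliation ORDER BY event_count DESC, affiliation""",
        (org, repo),
    ).fetchall()


conn = build_demo_db()
print(monthly_volume(conn, "example", "demo"))
print(events_by_entity(conn, "example", "demo"))
# [('GOOGLE', 3), ('IBM', 1), ('UNKNOWN', 1)]
```

The same SELECT statements can be pasted into the sqlite3 shell against ~/.dctl/data.db, replacing the ? placeholders with quoted values.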
Disclaimer
This is my personal project and it does not represent my employer. I take no responsibility for issues caused by this code. I do my best to ensure that everything works, but if something goes wrong, my apologies are all you will get.