Data Insights Platform is a unified abstraction for ingesting, processing, persisting, and querying analytical data streams generated across GitLab, enabling us to compute business insights across the product.

Resources
Development
Repository reference
This repository holds an implementation of the Data Insights Platform and its components. The per-directory overview below briefly describes where things are:
- .gitlab: scripts/tooling to help run our CI pipelines on GitLab
- benchmark: scripts/tooling for running our benchmarking test suite, currently using k6
- containers/ci: Dockerfile definitions for our CI build setup
- cmd: Standalone executables, run as their own containers. These do not implement their own authn/authz, which is handled at ingress.
  - platform: Single binary that implements all components of the Data Insights Platform.
  - benchmark: Utility to emit sample events into the pipeline at a given request rate OR for a given period of time.
  - inspector: Spins up a local Snowplow collector to inspect events emitted via Snowplow SDKs.
- pkg: Library packages that may be used/shared by the above cmd executables, or by Go code elsewhere in the repo.
- test: Directory containing tests and/or test setups.
  - platform: Directory containing the E2E test suite for the platform implementation.
Initial setup
- Clone the Data Insights Platform repo
git clone git@gitlab.com:gitlab-org/analytics-section/platform-insights/core.git
- Run mise install to install and manage all required dependencies, such as Go
Build binaries
- Run make build in the root directory to build the binaries.
To run the platform as a single binary:
- Navigate to the platform directory
cd cmd/platform
- Create a .env file in cmd/platform/ (it can be empty for local development, or contain AWS credentials if needed)
- Run make build to build the platform locally
- Run make start to start the single binary via Docker Compose
- Run make stop to stop the setup
To run all components separately (distributed mode):
- Navigate to the platform directory
cd cmd/platform
- Create a .env file in cmd/platform/ (it can be empty for local development, or contain AWS credentials if needed)
- Run make build to build the platform locally
- Run make start-distributed to start all components via Docker Compose
- Run make stop-distributed to stop the setup
Testing
Seed sample events to the Snowplow ingester
We can use the benchmark utility, available at cmd/benchmark, to emit events into the Snowplow ingester. If you've already built the binaries from the root directory:
- Run ./bin/benchmark to seed random Snowplow events. By default, this generates GitLab billable-usage events.
See ./bin/benchmark --help for more options.
If all the components are running, i.e. the enricher and clickhouse-exporter, these events should eventually land in ClickHouse.
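To give a feel for what emitting "at a given request rate OR for a given period of time" can look like, here is a minimal Go sketch of a rate-limited emitter. It is not the code in cmd/benchmark: the function name emitFor, its parameters, and the print-only send callback are all hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// emitFor calls send roughly perSecond times per second until millis
// milliseconds have elapsed, and returns the number of calls made.
// Hypothetical helper for illustration; not part of cmd/benchmark.
func emitFor(millis, perSecond int, send func(seq int)) int {
	if millis <= 0 || perSecond <= 0 {
		return 0
	}
	interval := time.Second / time.Duration(perSecond)
	if interval <= 0 {
		interval = time.Nanosecond // rates above 1e9 per second are clamped
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	deadline := time.After(time.Duration(millis) * time.Millisecond)

	sent := 0
	for {
		select {
		case <-deadline:
			return sent
		case <-ticker.C:
			sent++
			send(sent) // a real utility would send an event to the ingester here
		}
	}
}

func main() {
	// Emit about 5 placeholder events per second for one second.
	n := emitFor(1000, 5, func(seq int) {
		fmt.Printf("event %d\n", seq)
	})
	fmt.Printf("sent %d events\n", n)
}
```

The ticker bounds the rate and the deadline bounds the duration, so either limit can be tuned independently. For the real options, rely on ./bin/benchmark --help.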
Inspect persisted events in ClickHouse
- Using a browser, open the ClickHouse query interface
http://localhost:8123/play
- Run a query such as show databases; or select * from fulfillment.billing_usage_1d_mv; to view the available seeded data
You can also use the ClickHouse CLI to interact with the database locally.
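If you would rather check the data from a script, the same port also serves ClickHouse's HTTP interface, which accepts a SQL statement in the body of a POST request. The following is a minimal Go sketch under a few assumptions: ClickHouse is reachable at http://localhost:8123/, the default user needs no password in the local setup, and the helper names runQuery and splitRows are made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// runQuery POSTs sql to the ClickHouse HTTP interface at baseURL and returns
// the raw response body (tab-separated text by default).
func runQuery(baseURL, sql string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(baseURL, "text/plain", strings.NewReader(sql))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("clickhouse returned %s: %s",
			resp.Status, strings.TrimSpace(string(body)))
	}
	return string(body), nil
}

// splitRows splits a response body into its non-empty lines.
func splitRows(body string) []string {
	var rows []string
	for _, line := range strings.Split(body, "\n") {
		if line = strings.TrimRight(line, "\r"); line != "" {
			rows = append(rows, line)
		}
	}
	return rows
}

func main() {
	// Assumed local endpoint; adjust if you remapped the ClickHouse port.
	out, err := runQuery("http://localhost:8123/", "SHOW DATABASES")
	if err != nil {
		fmt.Fprintln(os.Stderr, "query failed:", err)
		return
	}
	for _, db := range splitRows(out) {
		fmt.Println(db)
	}
}
```

Swapping the statement for select * from fulfillment.billing_usage_1d_mv should print the seeded rows once events have been exported.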
Troubleshooting
Port Conflicts
When running the Data Insights Platform alongside the GitLab Development Kit, you may encounter an error stating that your Redis or ClickHouse port is already in use:
Error response from daemon: ports are not available: exposing port TCP 0.0.0.0:6379 -> 127.0.0.1:0: bind: address already in use
If you encounter this, change the conflicting port(s) in the Docker Compose file cmd/platform/docker-compose/docker-compose-deps.yaml. Only the host port (the left-hand value of the mapping) needs to change; the container port on the right stays the same. For example, for Redis:
# Change from:
redis:
  ports:
    - "6379:6379"
# To:
redis:
  ports:
    - "6380:6379"
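Before editing the compose file, it can help to confirm which host ports are actually taken. The sketch below is one way to do that in Go: it tries to bind a TCP listener on each port and, if that fails, looks for a nearby free one. The helper names portFree and firstFreePort are hypothetical, and checking 127.0.0.1 is an assumption; Docker may bind on a different interface, so treat the result as a hint rather than a guarantee.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portFree reports whether a TCP listener can be bound to host:port right now.
func portFree(host string, port int) bool {
	ln, err := net.Listen("tcp", net.JoinHostPort(host, strconv.Itoa(port)))
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

// firstFreePort returns the first bindable port in [start, start+tries),
// or 0 if none of them can be bound.
func firstFreePort(host string, start, tries int) int {
	if start < 1 {
		start = 1
	}
	for p := start; p < start+tries && p <= 65535; p++ {
		if portFree(host, p) {
			return p
		}
	}
	return 0
}

func main() {
	// 6379 (Redis) and 8123 (ClickHouse HTTP) are the host ports used in this README.
	for _, port := range []int{6379, 8123} {
		if portFree("127.0.0.1", port) {
			fmt.Printf("port %d is free\n", port)
			continue
		}
		if alt := firstFreePort("127.0.0.1", port+1, 20); alt != 0 {
			fmt.Printf("port %d is in use; %d looks free and could be the new host port\n", port, alt)
		} else {
			fmt.Printf("port %d is in use; no free port found nearby\n", port)
		}
	}
}
```

Whatever port you pick goes on the left-hand side of the mapping in docker-compose-deps.yaml.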
Contributing
We believe in a world where everyone can contribute.
Please join us and help with your contributions!
IMPORTANT NOTE: We welcome contributions from developers of all backgrounds. We encode that in our Community Code of Conduct.
By participating in this project, you agree to abide by its terms.
Please join us to learn more, get support, or contribute to the project.
Security Reports
Please report suspected security vulnerabilities by following the disclosure process on the GitLab.com website.