core

module
v0.0.0-...-c07c10e
Warning

This package is not in the latest version of its module.

Published: Mar 9, 2026 License: Apache-2.0, MIT

README

Data Insights Platform

Data Insights Platform is a unified abstraction for ingesting, processing, persisting and querying the analytical data streams generated across GitLab. It lets us compute business insights across the product.


Resources

Development

Repository reference

This repository holds an implementation of the Data Insights Platform and its components. The per-directory overview below briefly describes where things live:

  • .gitlab: scripts/tooling to help run our CI pipelines on GitLab
  • benchmark: scripts/tooling for running our benchmarking test suite, currently using k6
  • containers/ci: Dockerfile definitions for our CI build setup
  • cmd: Standalone executables, each run as its own container. They do not implement authn/authz themselves; that is handled at ingress.
    • platform: Single-binary that implements all components of the Data Insights Platform.
    • benchmark: Utility to emit sample events into the pipeline at a given request rate OR for a given period of time.
    • inspector: Spins up a local Snowplow collector to inspect events emitted via Snowplow SDKs.
  • pkg: Library packages that may be used/shared by the above cmd executables, or by Go code elsewhere in the repo.
  • test: Directory containing tests and/or test-setups.
    • platform: Directory containing E2E test-suite for the platform implementation.
Initial setup
  1. Clone the Data Insights Platform repo: git clone git@gitlab.com:gitlab-org/analytics-section/platform-insights/core.git
  2. Run mise install to install and manage all required dependencies, such as Go
Build binaries
  1. Run make build in the root directory to build binaries.
Run the platform as a single binary locally
  1. From the repository root, navigate to the platform directory: cd cmd/platform
  2. Create a .env file in cmd/platform/ (it can be empty for local development, or contain AWS credentials if needed)
  3. Run make build to build the platform locally
  4. Run make start to start the single binary via Docker Compose
  5. Run make stop to stop the setup
Run the platform distributed with each component in its own container
  1. From the repository root, navigate to the platform directory: cd cmd/platform
  2. Create a .env file in cmd/platform/ (it can be empty for local development, or contain AWS credentials if needed)
  3. Run make build to build the platform locally
  4. Run make start-distributed to start all components via Docker Compose
  5. Run make stop-distributed to stop the setup
Testing
  • Unit tests:

    # If you are using podman instead of docker:
    # export USE_PODMAN=1
    # export TESTCONTAINERS_RYUK_CONTAINER_PRIVILEGED=true
    make unit-tests
    
  • E2E tests:

    • Run the platform locally, either as a single binary or as distributed components

    • Then:

      # If you are using podman instead of docker:
      # export USE_PODMAN=1
      # export TESTCONTAINERS_RYUK_CONTAINER_PRIVILEGED=true
      go test ./test/platform/tests/...
      
Seed sample events to the Snowplow ingester

You can use the benchmark utility, located at cmd/benchmark, to emit events into the Snowplow ingester. If you have already run make build from the root directory:

  1. Run ./bin/benchmark to seed random Snowplow events. By default this generates GitLab billable-usage events.

See ./bin/benchmark --help for more options.

If all the required components are running, i.e. the enricher and clickhouse-exporter, these events should eventually land in ClickHouse.

Inspect persisted events in ClickHouse
  1. In a browser, open the ClickHouse query interface at http://localhost:8123/play
  2. Run a query such as show databases; or select * from fulfillment.billing_usage_1d_mv; to view the available seeded data

You can also use the ClickHouse CLI to interact with the database locally.

Troubleshooting
Port Conflicts

When running the Data Insights Platform alongside the GitLab Development Kit (GDK), you may get an error stating that your Redis or ClickHouse port is already in use: Error response from daemon: ports are not available: exposing port TCP 0.0.0.0:6379 -> 127.0.0.1:0: bind: address already in use. If you see this, change the conflicting port(s) in the dependencies Docker Compose file, cmd/platform/docker-compose/docker-compose-deps.yaml. Only the host-side (left) number of each mapping needs to change, as in this Redis example:

# Change from:
redis:
  ports:
    - "6379:6379"

# To:
redis:
  ports:
    - "6380:6379"

Contributing

We believe in a world where everyone can contribute.

Please join us and help with your contributions!

IMPORTANT NOTE: We welcome contributions from developers of all backgrounds. We encode that in our Community Code of Conduct. By participating in this project, you agree to abide by its terms.

Community

Please join us to learn more, get support, or contribute to the project.

Security Reports

Please report suspected security vulnerabilities by following the disclosure process on the GitLab.com website.

Directories

  • cmd
    • benchmark (command)
    • inspector (command)
    • platform (command)
    • prober (command)
  • pkg
