artifact

package

v0.7.5 Latest Published: Apr 20, 2026 License: Apache-2.0 Imports: 19 Imported by: 0

Repository

github.com/abcxyz/github-metrics-aggregator

README ¶

Artifact Package

This package implements a pipeline job that ingests GitHub Actions workflow logs and stores them in Google Cloud Storage (GCS).

Purpose

The job downloads and persists workflow logs for further analysis, because GitHub retains them only for a limited time.

Files

config.go: Defines the configuration for the artifact ingestion job.
ingest_logs.go: Contains the LogIngester struct and methods for downloading logs from GitHub and uploading them to GCS.
ingest_logs_test.go: Unit tests for the log ingestion logic.
job.go: The main entry point (ExecuteJob) that orchestrates the pipeline: querying BigQuery for events to process, fanning out the work to a worker pool, and writing completion records back to BigQuery.
query.go: Generates the SQL query used to find events that need log ingestion.
storage.go: Provides utilities for writing data to Google Cloud Storage.

Design Patterns

Worker Pool: ExecuteJob uses github.com/abcxyz/pkg/workerpool to process multiple log ingestions concurrently.
Data Pipeline: Follows a typical ETL (Extract, Transform, Load) pattern: extract event IDs from BigQuery, extract logs from GitHub, load logs to GCS, and load status to BigQuery.

Documentation ¶

Overview ¶

Package artifact contains a data pipeline that will read workflow event records from BigQuery and ingest any available logs into cloud storage. A mapping from the original GitHub event to the cloud storage location is persisted in BigQuery along with an indicator for the status of the copy. The pipeline acts as a GitHub App for authentication purposes.

Index ¶

func ExecuteJob(ctx context.Context, cfg *Config) error
func NewLogIngester(ctx context.Context, cfg *Config) (*logIngester, error)
type ArtifactRecord
type Config
- func (cfg *Config) ToFlags(set *cli.FlagSet) *cli.FlagSet
- func (cfg *Config) Validate(ctx context.Context) error
type EventRecord
type ObjectStore
- func NewObjectStore(ctx context.Context) (*ObjectStore, error)
- func (s *ObjectStore) Write(ctx context.Context, content io.Reader, objectDescriptor string) error
type ObjectWriter

Functions ¶

func ExecuteJob ¶

func ExecuteJob(ctx context.Context, cfg *Config) error

ExecuteJob runs the ingestion pipeline job, which reads GitHub Actions workflow logs from GitHub and stores them in GCS.

func NewLogIngester ¶

func NewLogIngester(ctx context.Context, cfg *Config) (*logIngester, error)

NewLogIngester creates a logIngester and initializes the object store and the GitHub client.

Types ¶

type ArtifactRecord ¶

type ArtifactRecord struct {
	DeliveryID       string    `bigquery:"delivery_id" json:"delivery_id"`
	ProcessedAt      time.Time `bigquery:"processed_at" json:"processed_at"`
	Status           string    `bigquery:"status" json:"status"`
	WorkflowURI      string    `bigquery:"workflow_uri" json:"workflow_uri"`
	LogsURI          string    `bigquery:"logs_uri" json:"logs_uri"`
	GitHubActor      string    `bigquery:"github_actor" json:"github_actor"`
	OrganizationName string    `bigquery:"organization_name" json:"organization_name"`
	RepositoryName   string    `bigquery:"repository_name" json:"repository_name"`
	RepositorySlug   string    `bigquery:"repository_slug" json:"repository_slug"`
	JobName          string    `bigquery:"job_name" json:"job_name"`
}

ArtifactRecord is the output data structure that maps to the leech pipeline's output table schema.

type Config ¶

type Config struct {
	GitHub githubclient.Config

	// BatchSize is the number of items to process in this pipeline run.
	BatchSize int

	// ProjectID is the project id where the tables live.
	ProjectID string

	// DatasetID is the dataset id where the tables live.
	DatasetID string

	// EventsTableID is the table_name of the optimized events table.
	EventsTableID string

	// ArtifactsTableID is the table_name of the artifact_status table.
	ArtifactsTableID string

	// BucketName is the name of the GCS bucket that stores artifact logs.
	BucketName string
}

Config defines the set of environment variables required for running the artifact job.

func (*Config) ToFlags ¶

func (cfg *Config) ToFlags(set *cli.FlagSet) *cli.FlagSet

ToFlags binds the config to the cli.FlagSet and returns it.

func (*Config) Validate ¶

func (cfg *Config) Validate(ctx context.Context) error

Validate validates the artifacts config after load.
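
The page does not show what Validate checks. As a rough sketch of the kind of validation a config like this might need, the example below mirrors the non-GitHub fields of Config in a local jobConfig type and reports every missing field at once. Both jobConfig and its validate method are hypothetical and are not the package's Validate implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// jobConfig mirrors the BatchSize and string fields of Config shown above.
// The GitHub client config is omitted to keep the sketch standalone.
type jobConfig struct {
	BatchSize        int
	ProjectID        string
	DatasetID        string
	EventsTableID    string
	ArtifactsTableID string
	BucketName       string
}

// validate is a hypothetical check. It collects all problems and joins them,
// so a caller sees every missing setting in a single error.
func (c *jobConfig) validate() error {
	var errs []error
	if c.BatchSize <= 0 {
		errs = append(errs, errors.New("BatchSize must be positive"))
	}
	required := []struct{ name, value string }{
		{"ProjectID", c.ProjectID},
		{"DatasetID", c.DatasetID},
		{"EventsTableID", c.EventsTableID},
		{"ArtifactsTableID", c.ArtifactsTableID},
		{"BucketName", c.BucketName},
	}
	for _, f := range required {
		if f.value == "" {
			errs = append(errs, fmt.Errorf("%s is required", f.name))
		}
	}
	// errors.Join returns nil when there are no errors to join.
	return errors.Join(errs...)
}

func main() {
	cfg := &jobConfig{BatchSize: 100, ProjectID: "example-project"}
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:")
		fmt.Println(err)
	}
}
```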

type EventRecord ¶

type EventRecord struct {
	DeliveryID         string   `bigquery:"delivery_id" json:"delivery_id"`
	RepositorySlug     string   `bigquery:"repo_slug" json:"repo_slug"`
	RepositoryName     string   `bigquery:"repo_name" json:"repo_name"`
	OrganizationName   string   `bigquery:"org_name" json:"org_name"`
	LogsURL            string   `bigquery:"logs_url" json:"logs_url"`
	GitHubActor        string   `bigquery:"github_actor" json:"github_actor"`
	WorkflowURL        string   `bigquery:"workflow_url" json:"workflow_url"`
	WorkflowRunID      string   `bigquery:"workflow_run_id" json:"workflow_run_id"`
	WorkflowRunAttempt string   `bigquery:"workflow_run_attempt" json:"workflow_run_attempt"`
	PullRequestNumbers []string `bigquery:"pull_request_numbers" json:"pull_request_numbers"`
}

EventRecord maps the columns from the driving BigQuery query to a usable structure.

type ObjectStore ¶

type ObjectStore struct {
	// contains filtered or unexported fields
}

ObjectStore is an implementation of the ObjectWriter interface that writes to Cloud Storage.

func NewObjectStore ¶

func NewObjectStore(ctx context.Context) (*ObjectStore, error)

NewObjectStore creates an ObjectWriter implementation that uses Cloud Storage to store its objects.

func (*ObjectStore) Write ¶

func (s *ObjectStore) Write(ctx context.Context, content io.Reader, objectDescriptor string) error

Write writes an object to Google Cloud Storage.

type ObjectWriter ¶

type ObjectWriter interface {
	Write(ctx context.Context, content io.Reader, descriptor string) error
}

ObjectWriter is an interface for writing an object/blob to a storage medium.
