artifact

package
v0.7.5 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 20, 2026 License: Apache-2.0 Imports: 19 Imported by: 0

README

Artifact Package

This package implements a pipeline job that ingests GitHub Actions workflow logs and stores them in Google Cloud Storage (GCS).

Purpose

The job downloads and persists workflow logs for later analysis, because GitHub retains them only for a limited time.

Files

  • config.go: Defines the configuration for the artifact ingestion job.
  • ingest_logs.go: Contains the LogIngester struct and methods for downloading logs from GitHub and uploading them to GCS.
  • ingest_logs_test.go: Unit tests for the log ingestion logic.
  • job.go: The main entry point (ExecuteJob) that orchestrates the pipeline: querying BigQuery for events to process, fanning out the work to a worker pool, and writing completion records back to BigQuery.
  • query.go: Generates the SQL query used to find events that need log ingestion.
  • storage.go: Provides utilities for writing data to Google Cloud Storage.
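As an illustration of the kind of query query.go might generate, here is a minimal sketch of an anti-join that selects events with no matching artifact record. The function name buildPendingEventsQuery, the join on delivery_id, and the overall query shape are assumptions made for this example; the package's real query is not shown in this documentation and may differ.

```go
package main

import "fmt"

// buildPendingEventsQuery is a hypothetical helper, not part of the package.
// It returns a BigQuery SQL statement that selects up to batchSize events
// that have no row in the artifacts table yet.
//
// Table identifiers cannot be passed as query parameters, so they are
// interpolated here. The inputs are assumed to come from trusted config.
func buildPendingEventsQuery(projectID, datasetID, eventsTable, artifactsTable string, batchSize int) string {
	events := fmt.Sprintf("`%s.%s.%s`", projectID, datasetID, eventsTable)
	artifacts := fmt.Sprintf("`%s.%s.%s`", projectID, datasetID, artifactsTable)

	return fmt.Sprintf(
		"SELECT e.delivery_id\n"+
			"FROM %s AS e\n"+
			"LEFT JOIN %s AS a\n"+
			"  ON e.delivery_id = a.delivery_id\n"+
			"WHERE a.delivery_id IS NULL\n"+
			"LIMIT %d",
		events, artifacts, batchSize)
}

func main() {
	fmt.Println(buildPendingEventsQuery("my-project", "my_dataset", "events", "artifact_status", 100))
}
```

The LEFT JOIN with an IS NULL filter is one common way to express "not yet processed"; a NOT EXISTS subquery would serve the same purpose.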

Design Patterns

  • Worker Pool: Uses github.com/abcxyz/pkg/workerpool to process multiple log ingestions concurrently, handled by ExecuteJob.
  • Data Pipeline: Follows a typical ETL (Extract, Transform, Load) pattern: extract event IDs from BigQuery, extract logs from GitHub, load logs to GCS, and load status to BigQuery.

Documentation

Overview

Package artifact contains a data pipeline that reads workflow event records from BigQuery and ingests any available logs into Cloud Storage. A mapping from the original GitHub event to the Cloud Storage location is persisted in BigQuery, along with an indicator of the copy's status. The pipeline acts as a GitHub App for authentication purposes.

Functions

func ExecuteJob

func ExecuteJob(ctx context.Context, cfg *Config) error

ExecuteJob runs the ingestion pipeline job, which reads GitHub Actions workflow logs from GitHub and stores them in GCS.

func NewLogIngester

func NewLogIngester(ctx context.Context, cfg *Config) (*logIngester, error)

NewLogIngester creates a logIngester and initializes the object store and the GitHub client.

Types

type ArtifactRecord

type ArtifactRecord struct {
	DeliveryID       string    `bigquery:"delivery_id" json:"delivery_id"`
	ProcessedAt      time.Time `bigquery:"processed_at" json:"processed_at"`
	Status           string    `bigquery:"status" json:"status"`
	WorkflowURI      string    `bigquery:"workflow_uri" json:"workflow_uri"`
	LogsURI          string    `bigquery:"logs_uri" json:"logs_uri"`
	GitHubActor      string    `bigquery:"github_actor" json:"github_actor"`
	OrganizationName string    `bigquery:"organization_name" json:"organization_name"`
	RepositoryName   string    `bigquery:"repository_name" json:"repository_name"`
	RepositorySlug   string    `bigquery:"repository_slug" json:"repository_slug"`
	JobName          string    `bigquery:"job_name" json:"job_name"`
}

ArtifactRecord is the output data structure that maps to the leech pipeline's output table schema.

type Config

type Config struct {
	GitHub githubclient.Config

	// BatchSize is the number of items to process in this pipeline run.
	BatchSize int

	// ProjectID is the project id where the tables live.
	ProjectID string

	// DatasetID is the dataset id where the tables live.
	DatasetID string

	// EventsTableID is the table_name of the optimized events table.
	EventsTableID string

	// ArtifactsTableID is the table_name of the artifact_status table.
	ArtifactsTableID string

	// BucketName is the name of the GCS bucket to store artifact logs
	BucketName string
}

Config defines the set of environment variables required for running the artifact job.

func (*Config) ToFlags

func (cfg *Config) ToFlags(set *cli.FlagSet) *cli.FlagSet

ToFlags binds the config to the cli.FlagSet and returns it.

func (*Config) Validate

func (cfg *Config) Validate(ctx context.Context) error

Validate validates the artifacts config after load.
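The rules Validate applies are not listed in this documentation. As a rough sketch of what a check over these fields could look like, the example below uses a local stand-in type, jobConfig, without the GitHub field. The specific rules (positive BatchSize, non-empty identifiers) are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// jobConfig is a local stand-in for the documented Config type. It omits the
// GitHub field so that the example is self-contained.
type jobConfig struct {
	BatchSize        int
	ProjectID        string
	DatasetID        string
	EventsTableID    string
	ArtifactsTableID string
	BucketName       string
}

// validate reports every problem it finds instead of stopping at the first.
// The rules are hypothetical; the package's Validate may check other things.
func (c *jobConfig) validate() error {
	var errs []error

	if c.BatchSize <= 0 {
		errs = append(errs, errors.New("BatchSize must be positive"))
	}

	required := []struct{ name, value string }{
		{"ProjectID", c.ProjectID},
		{"DatasetID", c.DatasetID},
		{"EventsTableID", c.EventsTableID},
		{"ArtifactsTableID", c.ArtifactsTableID},
		{"BucketName", c.BucketName},
	}
	for _, f := range required {
		if f.value == "" {
			errs = append(errs, fmt.Errorf("%s is required", f.name))
		}
	}

	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	cfg := &jobConfig{BatchSize: 10, ProjectID: "my-project"}
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:")
		fmt.Println(err)
	}
}
```

Joining the errors lets an operator fix all missing settings in one pass. errors.Join requires Go 1.20 or later.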

type EventRecord

type EventRecord struct {
	DeliveryID         string   `bigquery:"delivery_id" json:"delivery_id"`
	RepositorySlug     string   `bigquery:"repo_slug" json:"repo_slug"`
	RepositoryName     string   `bigquery:"repo_name" json:"repo_name"`
	OrganizationName   string   `bigquery:"org_name" json:"org_name"`
	LogsURL            string   `bigquery:"logs_url" json:"logs_url"`
	GitHubActor        string   `bigquery:"github_actor" json:"github_actor"`
	WorkflowURL        string   `bigquery:"workflow_url" json:"workflow_url"`
	WorkflowRunID      string   `bigquery:"workflow_run_id" json:"workflow_run_id"`
	WorkflowRunAttempt string   `bigquery:"workflow_run_attempt" json:"workflow_run_attempt"`
	PullRequestNumbers []string `bigquery:"pull_request_numbers" json:"pull_request_numbers"`
}

EventRecord maps the columns from the driving BigQuery query to a usable structure.

type ObjectStore

type ObjectStore struct {
	// contains filtered or unexported fields
}

ObjectStore is an implementation of the ObjectWriter interface that writes to Cloud Storage.

func NewObjectStore

func NewObjectStore(ctx context.Context) (*ObjectStore, error)

NewObjectStore creates an ObjectWriter implementation that uses Cloud Storage to store its objects.

func (*ObjectStore) Write

func (s *ObjectStore) Write(ctx context.Context, content io.Reader, objectDescriptor string) error

Write writes an object to Google Cloud Storage.

type ObjectWriter

type ObjectWriter interface {
	Write(ctx context.Context, content io.Reader, descriptor string) error
}

ObjectWriter is an interface for writing an object/blob to a storage medium.