promotion

package
v1.0.0 Latest
Published: Feb 17, 2026 License: GPL-3.0 Imports: 16 Imported by: 0

README

Promotion Controller

The promotion controller orchestrates automatic PR promotion across environments in the cd-operator. It watches PullRequestTracker CRDs, loads promotion policies, triggers external tests, monitors results, and promotes to the next environment when all gates pass.

Architecture

The promotion package follows hexagonal architecture with clear domain boundaries:

internal/promotion/
├── controller.go      # Main reconciliation controller
├── policy.go          # Policy loading and caching
├── trigger.go         # External test triggering
├── monitor.go         # Test status monitoring
├── executor.go        # Promotion execution
├── types.go           # Domain types
└── controller_test.go # Comprehensive tests

Key Components

Reconciler (controller.go)

The main controller that orchestrates the promotion workflow:

  1. Fetch PullRequestTracker - Get the CR being reconciled
  2. Check readiness - Verify PR is in "synced" state
  3. Load policy - Find PromotionPolicy for current environment
  4. Trigger tests - Start external tests (if not already triggered)
  5. Monitor tests - Poll test status until completion
  6. Execute promotion - Update ArgoCD and GitHub on success
  7. Update status - Record promotion in CR status

Requeue Strategy:

  • Tests running: 30s (frequent polling)
  • Tests passed: 10s (quick promotion)
  • Promoted: 5m (check for next environment)
  • Error: 1m (retry with backoff)
  • Policy missing: 10m (infrequent check)

PolicyCache (policy.go)

Thread-safe cache of PromotionPolicy CRDs indexed by source environment.

Features:

  • In-memory caching for performance
  • Automatic refresh on cache misses
  • Policy validation before use
  • Support for multiple concurrent readers

Validation:

  • Source/target environments are different
  • Test names are unique
  • Provider and job names are non-empty
  • Timeouts are non-negative

TriggerManager (trigger.go)

Handles triggering of external tests in parallel.

Features:

  • Parallel test execution using goroutines
  • Variable substitution in parameters
  • Test metadata tracking for status polling
  • Graceful error handling (partial failures reported)

Variable Substitution: Supports placeholders in test parameters:

  • ${pr.number} - PR number
  • ${pr.head.sha} - HEAD commit SHA
  • ${pr.repository} - Repository (owner/repo)
  • ${cd.environment} - Target environment
  • ${cd.version} - Extracted version
  • ${custom.*} - Custom metadata

TestMonitor (monitor.go)

Monitors external test execution status and aggregates results.

Features:

  • Parallel status polling
  • Timeout detection
  • Required vs optional test handling
  • Aggregate result calculation

Result States:

  • AllPassed - All required tests succeeded
  • AnyFailed - At least one required test failed
  • AnyRunning - Tests still executing
  • AnyTimeout - Tests exceeded timeout

PromotionExecutor (executor.go)

Executes the actual promotion to the target environment.

Steps:

  1. Update ArgoCD application to target environment
  2. Add promotion label to GitHub PR (cd/promoted:<env>, for example cd/promoted:staging)
  3. Record promotion in CR status (audit trail)
  4. Emit Kubernetes event for observability

Audit Trail: Every promotion creates a PromotionRecord with:

  • Source and target environments
  • Timestamp
  • Test results
  • Policy name
  • Promoted by (cd-operator)

Domain Types (types.go)

TestExecution

Tracks a triggered external test run:

type TestExecution struct {
    Name      string                // Test name
    Provider  external.Provider     // CI system (jenkins/github-actions/gitlab)
    RunID     string                // Unique identifier for polling
    StartedAt time.Time             // Trigger timestamp
    Required  bool                  // Must pass for promotion?
    JobName   string                // Job/workflow name
    URL       string                // Web link to test run
}

TestAggregateResult

Summarizes outcome of all tests:

type TestAggregateResult struct {
    AllPassed            bool
    AnyFailed            bool
    AnyRunning           bool
    AnyTimeout           bool
    Results              map[string]*external.TestStatus
    FailureReasons       []string
    OptionalTestsFailed  bool
}

PromotionResult

Result of a promotion attempt:

type PromotionResult struct {
    Promoted    bool
    TargetEnv   string
    TestResults TestAggregateResult
    Error       error
    PromotedAt  time.Time
    PromotedBy  string
    PolicyName  string
}

Usage Example

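Creating a Reconciler

import (
    "log"

    "github.com/grhili/cd-operator/internal/promotion"
    "github.com/grhili/cd-operator/pkg/external"
    // Imports for the argocd, github, and zap packages used below are
    // omitted in this example.
)

// Set up dependencies (constructor arguments are elided)
testRunner := external.NewCompositeRunner(...)
argoCDClient := argocd.NewClient(...)
githubClient := github.NewClient(...)
// promotion.EventRecorder declares Event(object interface{}, ...), so the
// recorder returned by the manager may need a thin adapter to satisfy it.
eventRecorder := mgr.GetEventRecorderFor("promotion-controller")
// logger must implement promotion.Logger (Info, Error, V)
logger := zap.NewLogger()

// Create the reconciler
reconciler := promotion.NewReconciler(
    mgr.GetClient(),
    mgr.GetScheme(),
    testRunner,
    argoCDClient,
    githubClient,
    eventRecorder,
    logger,
)

// Register with controller-runtime
if err := reconciler.SetupWithManager(mgr); err != nil {
    log.Fatal(err)
}

Creating a PromotionPolicy

apiVersion: cd.grhili.io/v1alpha1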
kind: PromotionPolicy
metadata:
  name: dev-to-staging
  namespace: default
spec:
  sourceEnvironment: dev
  targetEnvironment: staging
  autoPromote: true
  timeout: 45m
  externalTests:
    - name: integration-tests
      provider: github-actions
      jobName: integration.yml
      parameters:
        PR_NUMBER: "${pr.number}"
        COMMIT_SHA: "${pr.head.sha}"
        ENVIRONMENT: "${cd.environment}"
      required: true
    - name: smoke-tests
      provider: jenkins
      jobName: smoke-tests/job
      parameters:
        VERSION: "${cd.version}"
      required: false

PullRequestTracker Flow

State: synced (in dev environment)
  ↓
Load PromotionPolicy for "dev"
  ↓
Trigger external tests (integration-tests, smoke-tests)
  ↓
Monitor test status (poll every 30s)
  ↓
All required tests passed?
  ├─ Yes → Execute promotion to staging
  │        ├─ Update ArgoCD app
  │        ├─ Add GitHub label: cd/promoted:staging
  │        └─ Record in promotion history
  │
  └─ No → Block promotion (requeue with long interval)

Testing

The package includes comprehensive tests using the external testing pattern:

  • ✅ PR not found (deleted)
  • ✅ PR not in synced state
  • ✅ No promotion policy found
  • ✅ Trigger external tests
  • ✅ Monitor tests while running
  • ✅ Tests passed and promoted
  • ✅ Tests failed (promotion blocked)
  • ✅ Auto-promote disabled

Coverage: 74.3%

Run tests:

go test ./internal/promotion/... -v -cover

Conditions

The controller sets these conditions on PullRequestTracker:

Condition           Status  Reason          Meaning
TestsTriggered      True    TestsTriggered  External tests started
TestsPassed         True    TestsPassed     All required tests succeeded
TestsFailed         True    TestsFailed     One or more tests failed
PromotionCompleted  True    Promoted        PR promoted to next env

Error Handling

  • Policy not found: Requeue with 10m interval (may be created later)
  • Test trigger failure: Requeue with 1m backoff
  • Test status poll failure: Continue monitoring (transient failure)
  • Promotion execution failure: Requeue with 1m backoff
  • ArgoCD API failure: Return error for retry
  • GitHub API failure: Log warning (labels are informational)

Metrics

The controller exposes these metrics (via controller-runtime):

  • controller_runtime_reconcile_total - Total reconciliations
  • controller_runtime_reconcile_errors_total - Reconciliation errors
  • controller_runtime_reconcile_time_seconds - Reconciliation latency

Observability

Every promotion generates:

  1. Kubernetes Event - For audit trail

    Type: Normal
    Reason: Promoted
    Message: Promoted from dev to staging using policy dev-to-staging. All tests passed.
    
  2. Status Update - In PullRequestTracker CR

    status:
      currentEnvironment: staging
      promotionHistory:
        - fromEnvironment: dev
          toEnvironment: staging
          promotedAt: "2026-02-16T23:00:00Z"
          promotedBy: cd-operator
          policyName: dev-to-staging
          testResults: [...]
    
  3. GitHub Label - On the PR

    cd/promoted:staging
    

Future Enhancements

  • Support for manual promotion approval
  • Promotion policies with multiple test stages
  • Rollback on deployment failure
  • Promotion rate limiting (gradual rollout)
  • Integration with notification systems (Slack, email)
  • Promotion dry-run mode
  • Custom promotion strategies (blue/green, canary)

Documentation

Overview

Package promotion provides the controller and domain logic for automatic PR promotion across environments. It watches PullRequestTracker CRDs and orchestrates the complete promotion workflow: discover synced PRs → load policies → trigger tests → monitor results → promote on success.

The promotion controller operates on PullRequestTracker CRDs that have reached the "synced" state (successfully deployed in an environment). It loads the appropriate PromotionPolicy based on the current environment, triggers external tests as defined in the policy, monitors their completion, and promotes to the next environment if all required tests pass.

Architecture follows hexagonal principles:

  • Controller orchestrates workflow and manages CRD lifecycle
  • Domain types are defined here (no SDK leakage)
  • External dependencies injected via interfaces (TestRunner, ArgoCDClient, GitHubClient)
  • Policy evaluation is deterministic and stateless
  • Test execution is idempotent and restartable

Index

Constants

This section is empty.

Variables

var ErrPolicyNotFound = fmt.Errorf("promotion policy not found")

ErrPolicyNotFound is returned when no promotion policy matches the query.

Functions

func BuildPromotionRecord

func BuildPromotionRecord(
	result PromotionResult,
	sourceEnv string,
	testRuns []v1alpha1.TestRun,
) v1alpha1.PromotionRecord

BuildPromotionRecord creates a PromotionRecord from a promotion result for CRD status. This provides the audit trail of promotions in the PullRequestTracker status.

Parameters:

  • result: PromotionResult from Execute()
  • sourceEnv: Source environment (from policy)
  • testRuns: Test runs that gated the promotion

Returns v1alpha1.PromotionRecord for CRD status.

func ConvertToV1Alpha1TestRuns

func ConvertToV1Alpha1TestRuns(
	executions []TestExecution,
	results map[string]*external.TestStatus,
) []v1alpha1.TestRun

ConvertToV1Alpha1TestRuns converts TestExecution records to v1alpha1.TestRun format for storing in CRD status. This bridges the domain layer with the API layer.

Parameters:

  • executions: Domain test execution records
  • results: Current test status results (optional, may be nil)

Returns v1alpha1.TestRun slice suitable for CRD status.

func ValidatePolicy

func ValidatePolicy(policy *v1alpha1.PromotionPolicy) error

ValidatePolicy checks if a promotion policy has valid configuration. This should be called before using the policy to trigger tests or promotion.

Validation checks:

  • Source and target environments are non-empty and different
  • External tests have unique names
  • External tests have valid provider and job names
  • Timeout is positive

Parameters:

  • policy: Policy to validate

Returns error describing the validation failure, or nil if valid.

Types

type ArgoCDClient

type ArgoCDClient interface {
	// GetApplicationStatus retrieves deployment status from ArgoCD.
	GetApplicationStatus(ctx context.Context, cluster, application string) (*deployment.DeploymentStatus, error)

	// UpdateApplication updates an ArgoCD application configuration.
	// Used to promote a PR by updating the target environment.
	UpdateApplication(ctx context.Context, cluster, application, targetEnv string) error
}

ArgoCDClient defines the interface for ArgoCD operations. Abstracts ArgoCD API calls to allow testing with mocks.

type EventRecorder

type EventRecorder interface {
	// Event records an event for an object.
	Event(object interface{}, eventType, reason, message string)
}

EventRecorder defines the interface for recording Kubernetes events. Used for audit trail of promotions.

type GitHubClient

type GitHubClient interface {
	// AddLabel adds a label to a PR.
	AddLabel(ctx context.Context, owner, repo string, number int, label string) error

	// RemoveLabel removes a label from a PR.
	RemoveLabel(ctx context.Context, owner, repo string, number int, label string) error
}

GitHubClient defines the interface for GitHub operations. Abstracts GitHub API calls to allow testing with mocks.

type Logger

type Logger interface {
	Info(msg string, keysAndValues ...interface{})
	Error(err error, msg string, keysAndValues ...interface{})
	V(level int) Logger
}

Logger defines the logging interface used by the components in this package (the reconciler, trigger manager, test monitor, and promotion executor).

type PolicyCache

type PolicyCache struct {
	// contains filtered or unexported fields
}

PolicyCache provides thread-safe caching of PromotionPolicy CRDs. Reduces Kubernetes API calls by maintaining an in-memory cache that can be refreshed periodically or on-demand. Policies are indexed by source environment for efficient lookup during promotion reconciliation.

func NewPolicyCache

func NewPolicyCache(client client.Client) *PolicyCache

NewPolicyCache creates a new policy cache with the given Kubernetes client. The cache starts empty and must be populated via Refresh() before use.

Parameters:

  • client: Kubernetes client for fetching PromotionPolicy CRDs

Returns an initialized PolicyCache.

func (*PolicyCache) GetPolicy

func (c *PolicyCache) GetPolicy(ctx context.Context, sourceEnv string) (*v1alpha1.PromotionPolicy, error)

GetPolicy retrieves a promotion policy for the given source environment. If the policy is not in cache, it queries Kubernetes to find it. Returns ErrPolicyNotFound if no matching policy exists.

Parameters:

  • ctx: Context for cancellation and timeout
  • sourceEnv: Source environment to find policy for (e.g., "dev", "staging")

Returns:

  • Policy matching the source environment
  • ErrPolicyNotFound if no policy matches
  • Wrapped error for Kubernetes API failures

func (*PolicyCache) Refresh

func (c *PolicyCache) Refresh(ctx context.Context) error

Refresh reloads all PromotionPolicy CRDs from Kubernetes into the cache. This is an expensive operation and should be called judiciously (e.g., on cache misses or periodically in the background).

Parameters:

  • ctx: Context for cancellation and timeout

Returns error if Kubernetes API call fails.

type PromotionExecutor

type PromotionExecutor struct {
	// contains filtered or unexported fields
}

PromotionExecutor handles the execution of promotion operations. It coordinates with ArgoCD to update applications and GitHub to update labels, providing a complete audit trail of the promotion.

func NewPromotionExecutor

func NewPromotionExecutor(
	argoCDClient ArgoCDClient,
	githubClient GitHubClient,
	eventRecorder EventRecorder,
	logger Logger,
) *PromotionExecutor

NewPromotionExecutor creates a new promotion executor with the given dependencies.

Parameters:

  • argoCDClient: ArgoCD client for updating applications
  • githubClient: GitHub client for updating PR labels
  • eventRecorder: Kubernetes event recorder for audit trail
  • logger: Logger for diagnostic output

Returns an initialized PromotionExecutor.

func (*PromotionExecutor) Execute

Execute performs the promotion by updating ArgoCD application and GitHub labels. This is the final step in the promotion workflow after all tests have passed.

Steps:

  1. Update ArgoCD application to target environment
  2. Add promotion label to GitHub PR
  3. Record promotion in CRD status
  4. Emit Kubernetes event for audit

Parameters:

  • ctx: Context for cancellation and timeout
  • prt: PullRequestTracker CRD to promote
  • policy: PromotionPolicy that governed this promotion
  • testResults: Final test results for audit trail

Returns:

  • PromotionResult with success/failure details
  • Error if promotion failed

type PromotionResult

type PromotionResult struct {
	// Promoted indicates if the promotion was successfully executed.
	Promoted bool

	// TargetEnv is the environment the PR was promoted to.
	TargetEnv string

	// TestResults contains the final test outcomes that gated promotion.
	TestResults TestAggregateResult

	// Error contains any error encountered during promotion execution.
	// Nil if Promoted is true.
	Error error

	// PromotedAt is the timestamp when promotion completed.
	PromotedAt time.Time

	// PromotedBy indicates who/what triggered the promotion.
	PromotedBy string

	// PolicyName is the name of the PromotionPolicy that governed this promotion.
	PolicyName string
}

PromotionResult represents the outcome of a promotion attempt. Returned by PromotionExecutor.Execute() to indicate success or failure.

type Reconciler

type Reconciler struct {
	client.Client
	Scheme        *runtime.Scheme
	TestRunner    TestRunner
	ArgoCDClient  ArgoCDClient
	GitHubClient  GitHubClient
	EventRecorder EventRecorder
	PolicyCache   *PolicyCache
	Logger        Logger
}

Reconciler reconciles PullRequestTracker resources for promotion. It implements the complete promotion workflow: discover synced PRs → load policies → trigger tests → monitor results → promote on success.

func NewReconciler

func NewReconciler(
	client client.Client,
	scheme *runtime.Scheme,
	testRunner TestRunner,
	argoCDClient ArgoCDClient,
	githubClient GitHubClient,
	eventRecorder EventRecorder,
	logger Logger,
) *Reconciler

NewReconciler creates a new promotion reconciler with the given dependencies.

Parameters:

  • client: Kubernetes client for CRD operations
  • scheme: Runtime scheme for type registration
  • testRunner: TestRunner for triggering and monitoring external tests
  • argoCDClient: ArgoCD client for application updates
  • githubClient: GitHub client for PR label management
  • eventRecorder: Event recorder for Kubernetes audit trail
  • logger: Logger for diagnostic output

Returns a configured Reconciler ready for controller-runtime.

func (*Reconciler) Reconcile

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)

Reconcile implements the main reconciliation logic for promotion. It is called by controller-runtime whenever a PullRequestTracker is created, updated, or periodically re-queued.

Reconciliation workflow:

  1. Fetch PullRequestTracker CRD
  2. Check if PR is ready for promotion (synced state)
  3. Load promotion policy for current environment
  4. If tests not triggered: trigger external tests
  5. If tests triggered: monitor test status
  6. If all tests passed: execute promotion
  7. Update CRD status and conditions
  8. Requeue based on current state

Parameters:

  • ctx: Context for cancellation and timeout
  • req: Reconcile request with namespaced name

Returns:

  • Result: Controls requeue behavior (interval or no requeue)
  • Error: Causes exponential backoff retry if non-nil

func (*Reconciler) SetupWithManager

func (r *Reconciler) SetupWithManager(mgr ctrl.Manager) error

SetupWithManager sets up the controller with the controller manager. Configures watches and reconciliation options.

Parameters:

  • mgr: Controller manager to register with

Returns error if setup fails.

type TestAggregateResult

type TestAggregateResult struct {
	// AllPassed indicates all required tests passed successfully.
	// If true, promotion can proceed (if auto-promote is enabled).
	AllPassed bool

	// AnyFailed indicates at least one required test failed.
	// If true, promotion is blocked until tests pass.
	AnyFailed bool

	// AnyRunning indicates at least one test is still executing.
	// If true, controller should continue polling.
	AnyRunning bool

	// AnyTimeout indicates at least one test exceeded its timeout.
	// Timeouts are treated as failures for required tests.
	AnyTimeout bool

	// Results maps test name to its current status.
	// Includes all tests (required and optional).
	Results map[string]*external.TestStatus

	// FailureReasons contains human-readable explanations for failures.
	// Empty if AllPassed is true.
	FailureReasons []string

	// OptionalTestsFailed indicates non-required tests failed.
	// Logged for visibility but doesn't block promotion.
	OptionalTestsFailed bool
}

TestAggregateResult summarizes the outcome of all external tests for a promotion. Used to determine if promotion should proceed (all required tests passed).

type TestExecution

type TestExecution struct {
	// Name is the test name from the PromotionPolicy spec.
	Name string

	// Provider identifies the CI system (jenkins, github-actions, gitlab).
	Provider external.Provider

	// RunID is the unique identifier in the CI system for status polling.
	RunID string

	// StartedAt is when the test was triggered by the controller.
	StartedAt time.Time

	// Required indicates if this test must pass for promotion to proceed.
	// If false, test failure is logged but doesn't block promotion.
	Required bool

	// JobName is the CI job/workflow that was triggered (for reference).
	JobName string

	// URL is the web link to view the test run in the CI system.
	URL string
}

TestExecution represents a triggered external test run with tracking metadata. Stored in CRD status to maintain state across reconciliation loops.

type TestMonitor

type TestMonitor struct {
	// contains filtered or unexported fields
}

TestMonitor handles monitoring of external test execution status. It polls test runners for status updates, handles timeouts, and aggregates results to determine if promotion should proceed.

func NewTestMonitor

func NewTestMonitor(runner TestRunner, logger Logger) *TestMonitor

NewTestMonitor creates a new test monitor with the given test runner.

Parameters:

  • runner: TestRunner implementation for polling test status
  • logger: Logger for diagnostic output

Returns an initialized TestMonitor.

func (*TestMonitor) MonitorTests

func (m *TestMonitor) MonitorTests(
	ctx context.Context,
	executions []TestExecution,
	timeout time.Duration,
) (TestAggregateResult, error)

MonitorTests checks the status of all test executions and returns aggregated results. It queries the TestRunner for each test's current status, checks for timeouts, and determines if all required tests have passed.

Parameters:

  • ctx: Context for cancellation and timeout
  • executions: Test executions to monitor (from TriggerTests)
  • timeout: Maximum time to wait for tests to complete

Returns:

  • Aggregated test results indicating overall status
  • Error if status polling fails
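
The aggregation rules can be sketched as below, following the field comments on TestAggregateResult. The status values and the testState type are hypothetical simplifications of external.TestStatus. Two choices here are assumptions of this sketch: an optional test that is still running does not hold back AllPassed, and an empty test list counts as all passed.

```go
package main

import "fmt"

// status is a hypothetical, simplified test state.
type status string

const (
	passed   status = "passed"
	failed   status = "failed"
	running  status = "running"
	timedOut status = "timeout"
)

// testState pairs a test with its required flag and current status.
type testState struct {
	Name     string
	Required bool
	Status   status
}

// aggregate mirrors the boolean fields of TestAggregateResult.
type aggregate struct {
	AllPassed, AnyFailed, AnyRunning, AnyTimeout bool
	OptionalTestsFailed                          bool
	FailureReasons                               []string
}

func aggregateResults(tests []testState) aggregate {
	var agg aggregate
	requiredPassed := true
	for _, t := range tests {
		switch t.Status {
		case passed:
			// Nothing to record.
		case running:
			agg.AnyRunning = true
			if t.Required {
				requiredPassed = false
			}
		case failed, timedOut:
			if t.Status == timedOut {
				agg.AnyTimeout = true
			}
			if !t.Required {
				// Optional failures are visible but do not block promotion.
				agg.OptionalTestsFailed = true
				continue
			}
			// A timeout of a required test is treated as a failure.
			agg.AnyFailed = true
			requiredPassed = false
			agg.FailureReasons = append(agg.FailureReasons,
				fmt.Sprintf("%s: %s", t.Name, t.Status))
		}
	}
	agg.AllPassed = requiredPassed
	return agg
}

func main() {
	agg := aggregateResults([]testState{
		{Name: "integration-tests", Required: true, Status: passed},
		{Name: "smoke-tests", Required: false, Status: failed},
	})
	fmt.Printf("%+v\n", agg)
}
```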

type TestRunner

type TestRunner = external.TestRunner

TestRunner defines the interface for triggering and monitoring external tests. Abstracts the external test system (Jenkins, GitHub Actions, GitLab CI). This is an alias for pkg/external.TestRunner to avoid SDK leakage while providing a clean domain boundary.

type TriggerManager

type TriggerManager struct {
	// contains filtered or unexported fields
}

TriggerManager handles triggering of external tests for promotion gates. It builds test configurations from policy specs, performs variable substitution, and triggers tests in parallel across multiple CI systems.

func NewTriggerManager

func NewTriggerManager(runner TestRunner, logger Logger) *TriggerManager

NewTriggerManager creates a new trigger manager with the given test runner.

Parameters:

  • runner: TestRunner implementation for triggering tests
  • logger: Logger for diagnostic output

Returns an initialized TriggerManager.

func (*TriggerManager) TriggerTests

TriggerTests triggers all external tests defined in the promotion policy. Tests are triggered in parallel to minimize latency. Variable substitution is performed on test parameters using context from the PR and environment.

Parameters:

  • ctx: Context for cancellation and timeout
  • prt: PullRequestTracker CRD containing PR metadata
  • policy: PromotionPolicy defining tests to trigger

Returns:

  • Slice of TestExecution records (one per test)
  • Error if any test failed to trigger (partial failure)
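
Variable substitution on test parameters can be sketched with a regular expression over ${...} placeholders. This is an illustration and not the package's implementation: substitute is a hypothetical helper, and leaving unknown placeholders untouched is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches ${name} where name is any run of characters
// other than a closing brace, for example ${pr.number}.
var placeholder = regexp.MustCompile(`\$\{([^}]+)\}`)

// substitute replaces each ${key} in s with vars[key].
// Placeholders without a matching key are left as they are.
func substitute(s string, vars map[string]string) string {
	return placeholder.ReplaceAllStringFunc(s, func(match string) string {
		key := match[2 : len(match)-1] // strip the leading "${" and trailing "}"
		if value, ok := vars[key]; ok {
			return value
		}
		return match
	})
}

func main() {
	vars := map[string]string{
		"pr.number":      "42",
		"pr.head.sha":    "abc123",
		"cd.environment": "staging",
	}
	params := map[string]string{
		"PR_NUMBER":   "${pr.number}",
		"COMMIT_SHA":  "${pr.head.sha}",
		"ENVIRONMENT": "${cd.environment}",
	}
	for name, raw := range params {
		fmt.Printf("%s=%s\n", name, substitute(raw, vars))
	}
}
```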
