workflowexecution

package
v1.3.2
Published: Apr 26, 2026 License: Apache-2.0 Imports: 37 Imported by: 0

Documentation

Overview

Package workflowexecution provides failure analysis for Tekton PipelineRun failures.

This file implements BR-WE-012 (Exponential Backoff) by detecting and categorizing failures to determine appropriate retry strategies.

Failure Analysis:

  • Pre-Execution Failures: Configuration errors, permission issues, image pull failures → Apply exponential backoff (DD-WE-004)
  • Execution Failures: Task-level failures during PipelineRun execution → Report to user, no automatic retry

Failure Categories:

  • OOMKilled: Container out of memory
  • DeadlineExceeded: Timeout reached
  • Forbidden: Permission denied
  • ImagePullBackOff: Container image not available
  • ConfigurationError: Invalid workflow configuration
  • TaskFailed: Workflow task failed during execution

See: docs/architecture/decisions/DD-WE-004-exponential-backoff.md

Package workflowexecution provides the WorkflowExecution CRD controller.

Business Purpose (BR-WE-003): WorkflowExecution orchestrates Tekton PipelineRuns for workflow execution, providing resource locking, exponential backoff, and comprehensive failure reporting.

Key Responsibilities:

  • BR-WE-003: Monitor execution status and sync with PipelineRun
  • BR-WE-005: Generate audit trail for execution lifecycle
  • BR-WE-006: Expose Kubernetes Conditions for status tracking
  • BR-WE-008: Emit Prometheus metrics for execution outcomes
  • BR-WE-012: Apply exponential backoff for failed executions

Architecture:

  • Pure Executor: Only executes workflows (routing handled by RemediationOrchestrator)
  • Status Sync: Continuously syncs WFE status with PipelineRun status
  • Failure Analysis: Detects Tekton task failures and reports detailed reasons

Design Decisions:

  • DD-WE-001: Resource locking safety (prevents concurrent execution on same target)
  • DD-WE-002: Dedicated execution namespace (isolates PipelineRuns)
  • DD-WE-003: Deterministic lock names (enables resource lock persistence)
  • DD-WE-004: Exponential backoff for pre-execution failures
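To illustrate why a deterministic name (DD-WE-003) can act as a lock: if the name of the created resource is a pure function of the target resource, a second creation attempt for the same target collides with the first. The sketch below assumes a SHA-256 based scheme; the function lockName, the "wfe-" prefix and the 16-character truncation are hypothetical and are not taken from this package.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// lockName derives a stable name from a target resource string.
// The hashing scheme, prefix and length are assumptions of this sketch.
func lockName(targetResource string) string {
	sum := sha256.Sum256([]byte(targetResource))
	return "wfe-" + hex.EncodeToString(sum[:])[:16]
}

func main() {
	// The same target always yields the same name, so two executions against
	// it would try to create a resource with an identical name.
	fmt.Println(lockName("production/deployment/payment-api"))
	fmt.Println(lockName("production/deployment/payment-api"))
	fmt.Println(lockName("production/deployment/checkout-api"))
}
```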

See: docs/services/crd-controllers/03-workflowexecution/ for detailed documentation

Constants

const (
	// FinalizerName is the finalizer for WorkflowExecution cleanup
	// Per finalizers-lifecycle.md: domain/resource-cleanup pattern
	FinalizerName = "workflowexecution.kubernaut.ai/workflowexecution-cleanup"

	// DefaultCooldownPeriod is the default time between workflow executions on same target
	DefaultCooldownPeriod = 5 * time.Minute
)

Variables

This section is empty.

Functions

This section is empty.

Types

type WorkflowExecutionReconciler

type WorkflowExecutionReconciler struct {
	client.Client
	Scheme   *runtime.Scheme
	Recorder record.EventRecorder

	// DD-STATUS-001: APIReader bypasses informer cache for direct API server reads.
	// Used in reconcilePending to prevent race conditions from stale cache data:
	// - Prevents duplicate audit events (cache lag between concurrent reconciles)
	// - Ensures ExecutionRef is fresh for external deletion detection
	APIReader client.Reader

	// Metrics for observability (DD-005, DD-METRICS-001)
	// Per DD-METRICS-001: Metrics MUST be dependency-injected, not global variables
	// Initialized in main.go and injected via SetupWithManager()
	Metrics *metrics.Metrics

	// ========================================
	// STATUS MANAGER (DD-PERF-001)
	// 📋 Design Decision: DD-PERF-001 | ✅ Atomic Status Updates Pattern
	// See: docs/architecture/decisions/DD-PERF-001-atomic-status-updates-mandate.md
	// ========================================
	//
	// StatusManager manages atomic status updates to reduce K8s API calls
	// Consolidates multiple status field updates into single atomic operations
	//
	// BENEFITS:
	// - 50%+ API call reduction (2 updates → 1 atomic update)
	// - Eliminates race conditions from sequential updates
	// - Reduces etcd write load and watch events
	//
	// WIRED IN: cmd/workflowexecution/main.go
	// USAGE: r.StatusManager.AtomicStatusUpdate(ctx, wfe, func() { ... })
	StatusManager *status.Manager

	// ExecutionNamespace is where PipelineRuns are created (DD-WE-002)
	// Default: "kubernaut-workflows"
	ExecutionNamespace string

	// CooldownPeriod prevents redundant sequential workflows (DD-WE-001)
	// Default: 5 minutes
	CooldownPeriod time.Duration

	// AuditStore for writing audit events (BR-WE-005, ADR-032)
	// Uses pkg/audit buffered store via Data Storage Service
	// Optional: nil disables audit (graceful degradation)
	AuditStore audit.AuditStore

	// PhaseManager manages phase state machine logic (P0: Phase State Machine)
	// Per CONTROLLER_REFACTORING_PATTERN_LIBRARY.md §1
	// Provides validated phase transitions and terminal state checking
	PhaseManager *wephase.Manager

	// AuditManager manages audit event emission (P3: Audit Manager)
	// Per CONTROLLER_REFACTORING_PATTERN_LIBRARY.md §7
	// Provides typed audit methods for better testability
	AuditManager *weaudit.Manager

	// ExecutorRegistry dispatches to the correct execution backend (BR-WE-014)
	// Maps execution engine names ("tekton", "job") to Executor implementations.
	// When nil, falls back to inline Tekton-only code path.
	ExecutorRegistry *weexecutor.Registry

	// DD-WE-006: WorkflowQuerier fetches workflow dependencies from DS on demand.
	// Optional: nil disables dependency injection (workflows run without mounted deps).
	WorkflowQuerier weclient.WorkflowQuerier

	// DD-WE-006: DependencyValidator validates that declared dependencies exist
	// with non-empty data in the execution namespace (defense in depth).
	// Optional: nil disables execution-time validation.
	DependencyValidator dsvalidation.DependencyValidator
}

WorkflowExecutionReconciler reconciles a WorkflowExecution object

func (*WorkflowExecutionReconciler) BuildPipelineRunStatusSummary

BuildPipelineRunStatusSummary creates a lightweight status summary from a PipelineRun. It provides visibility into task progress during execution (v3.2).

func (*WorkflowExecutionReconciler) CheckCooldownActive

func (r *WorkflowExecutionReconciler) CheckCooldownActive(ctx context.Context, targetResource, currentWFEKey string) (time.Duration, bool)

CheckCooldownActive checks whether a cooldown is active for a target resource (BR-WE-009: Cooldown Period is Configurable). It returns (remaining duration, is active). currentWFEKey has the format "namespace/name" and uniquely identifies the current WFE.

func (*WorkflowExecutionReconciler) ExtractFailureDetails

ExtractFailureDetails extracts structured failure information from a PipelineRun and maps Tekton failure reasons to our FailureReason enum. Day 7: now includes TaskRun-specific fields (FailedTaskName, FailedTaskIndex, ExitCode). Day 6 Extension (BR-WE-012): includes WasExecutionFailure for backoff decisions.

func (*WorkflowExecutionReconciler) FindFailedTaskRun

FindFailedTaskRun finds the first failed TaskRun in a PipelineRun's ChildReferences. It returns the TaskRun, its index in ChildReferences, and any error. It returns (nil, -1, nil) if no failed TaskRun is found.
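The (nil, -1, nil) contract can be shown with simplified stand-in types. In this sketch childRef, taskRun and the get callback are hypothetical replacements for the Tekton types and the API lookup; only the return convention is taken from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// childRef and taskRun are simplified stand-ins for the Tekton types.
type childRef struct{ Name string }

type taskRun struct {
	Name   string
	Failed bool
}

// findFailedTaskRun walks refs in order and returns the first failed run and
// its index. It returns (nil, -1, nil) when nothing failed, and (nil, -1, err)
// when a lookup fails. The get callback is a hypothetical lookup function.
func findFailedTaskRun(refs []childRef, get func(name string) (*taskRun, error)) (*taskRun, int, error) {
	for i, ref := range refs {
		tr, err := get(ref.Name)
		if err != nil {
			return nil, -1, err
		}
		if tr != nil && tr.Failed {
			return tr, i, nil
		}
	}
	return nil, -1, nil
}

func main() {
	runs := map[string]*taskRun{
		"fetch": {Name: "fetch"},
		"apply": {Name: "apply", Failed: true},
	}
	get := func(name string) (*taskRun, error) {
		tr, ok := runs[name]
		if !ok {
			return nil, errors.New("task run not found: " + name)
		}
		return tr, nil
	}
	tr, idx, err := findFailedTaskRun([]childRef{{Name: "fetch"}, {Name: "apply"}}, get)
	fmt.Println(tr.Name, idx, err)
}
```

Returning -1 alongside a nil TaskRun lets callers distinguish "nothing failed" from "the first entry failed" without a separate boolean.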

func (*WorkflowExecutionReconciler) FindWFEForOwnedResource added in v1.1.0

func (r *WorkflowExecutionReconciler) FindWFEForOwnedResource(ctx context.Context, obj client.Object) []reconcile.Request

FindWFEForOwnedResource maps owned resource events (PipelineRun, Job) to WorkflowExecution reconcile requests. Both executors label their resources with kubernaut.ai/workflow-execution and kubernaut.ai/source-namespace.
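A label-based mapping of this kind might look like the sketch below. The two label keys come from the description above; the request type, the helper requestsForLabels, and the assumption that the label values hold the WorkflowExecution name and namespace respectively are hypothetical.

```go
package main

import "fmt"

const (
	// Label keys named in the FindWFEForOwnedResource description.
	labelWorkflowExecution = "kubernaut.ai/workflow-execution"
	labelSourceNamespace   = "kubernaut.ai/source-namespace"
)

// request is a simplified stand-in for reconcile.Request.
type request struct {
	Namespace string
	Name      string
}

// requestsForLabels builds a reconcile request from an owned resource's
// labels. It returns nil when either label is missing, so unrelated
// resources do not trigger a reconcile.
func requestsForLabels(labels map[string]string) []request {
	name := labels[labelWorkflowExecution]
	namespace := labels[labelSourceNamespace]
	if name == "" || namespace == "" {
		return nil
	}
	return []request{{Namespace: namespace, Name: name}}
}

func main() {
	fmt.Println(requestsForLabels(map[string]string{
		labelWorkflowExecution: "restart-payment-api",
		labelSourceNamespace:   "production",
	}))
	fmt.Println(requestsForLabels(map[string]string{"app": "unrelated"}))
}
```

Labels are needed here because the owned resources live in a dedicated execution namespace (DD-WE-002), so the owning WorkflowExecution cannot be inferred from the resource's own namespace.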

func (*WorkflowExecutionReconciler) GenerateNaturalLanguageSummary

GenerateNaturalLanguageSummary creates a human/LLM-readable failure description for failure reporting and user notifications. Day 9 (v3.5): handles nil FailureDetails gracefully per Q4 decision.
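Graceful handling of nil details can be sketched as below. The failureDetails struct is a reduced, hypothetical stand-in, and the wording of the generated sentences is invented for illustration; the real method's output format is not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// failureDetails is a reduced stand-in for the package's FailureDetails.
type failureDetails struct {
	Reason         string
	Message        string
	FailedTaskName string
}

// summarize builds a one-sentence description and falls back to a generic
// sentence when no details are available instead of dereferencing nil.
func summarize(workflow string, d *failureDetails) string {
	if d == nil {
		return fmt.Sprintf("Workflow %q failed; no failure details are available.", workflow)
	}
	var b strings.Builder
	fmt.Fprintf(&b, "Workflow %q failed with reason %s", workflow, d.Reason)
	if d.FailedTaskName != "" {
		fmt.Fprintf(&b, " in task %q", d.FailedTaskName)
	}
	if d.Message != "" {
		fmt.Fprintf(&b, ": %s", d.Message)
	} else {
		b.WriteString(".")
	}
	return b.String()
}

func main() {
	fmt.Println(summarize("restart-pod", nil))
	fmt.Println(summarize("restart-pod", &failureDetails{
		Reason:         "OOMKilled",
		FailedTaskName: "apply",
		Message:        "container exceeded its memory limit",
	}))
}
```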

func (*WorkflowExecutionReconciler) HandleAlreadyExists

func (r *WorkflowExecutionReconciler) HandleAlreadyExists(ctx context.Context, wfe *workflowexecutionv1alpha1.WorkflowExecution, resourceName string, err error) (ctrl.Result, error)

HandleAlreadyExists handles the race condition where the PipelineRun already exists. DD-WE-003: Layer 2, execution-time collision handling (not routing). V1.0: fails the WFE if a race condition is detected (RO should have prevented this).

func (*WorkflowExecutionReconciler) MarkCompleted

MarkCompleted transitions the WFE to the Completed phase. It calculates Duration from StartTime to CompletionTime (v3.2). Day 6 Extension (BR-WE-012): resets the ConsecutiveFailures counter. Records metrics per BR-WE-008 (Day 7).

func (*WorkflowExecutionReconciler) MarkFailed

MarkFailed transitions the WFE to the Failed phase with FailureDetails. It extracts failure information from the PipelineRun (v3.2). Day 6 Extension (BR-WE-012): handles exponential backoff for pre-execution failures. Records metrics per BR-WE-008 (Day 7).

func (*WorkflowExecutionReconciler) MarkFailedWithReason

func (r *WorkflowExecutionReconciler) MarkFailedWithReason(ctx context.Context, wfe *workflowexecutionv1alpha1.WorkflowExecution, reason, message string) error

MarkFailedWithReason handles pre-execution failures. It is used for validation errors and configuration errors that occur before PipelineRun creation.

func (*WorkflowExecutionReconciler) Reconcile

Reconcile handles WorkflowExecution reconciliation. It uses phase-based reconciliation per the implementation plan.
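Phase-based reconciliation relies on validated transitions and terminal-state checks, which the PhaseManager field is described as providing. The sketch below is a hypothetical state table: the Completed and Failed phases and a pending step are mentioned in this documentation, while the Running phase and the exact set of allowed transitions are assumptions.

```go
package main

import "fmt"

// Phase is a simplified stand-in for the WorkflowExecution phase type.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseRunning   Phase = "Running" // assumed intermediate phase
	PhaseCompleted Phase = "Completed"
	PhaseFailed    Phase = "Failed"
)

// transitions lists the allowed next phases for each non-terminal phase.
// The table itself is an assumption of this sketch.
var transitions = map[Phase][]Phase{
	PhasePending: {PhaseRunning, PhaseFailed},
	PhaseRunning: {PhaseCompleted, PhaseFailed},
}

// isTerminal reports whether no further transitions are expected.
func isTerminal(p Phase) bool {
	return p == PhaseCompleted || p == PhaseFailed
}

// canTransition reports whether moving from one phase to another is allowed.
func canTransition(from, to Phase) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(PhasePending, PhaseRunning))
	fmt.Println(canTransition(PhaseCompleted, PhaseRunning))
	fmt.Println(isTerminal(PhaseFailed))
}
```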

func (*WorkflowExecutionReconciler) ReconcileDelete

ReconcileDelete handles deletion with the finalizer. DD-WE-003: uses the deterministic PipelineRun name. finalizers-lifecycle.md: event emission.

func (*WorkflowExecutionReconciler) ReconcileTerminal

ReconcileTerminal handles the Completed and Failed phases. Day 6: cooldown enforcement and cleanup. DD-WE-003: lock persistence (deterministic name).

func (*WorkflowExecutionReconciler) SetupWithManager

func (r *WorkflowExecutionReconciler) SetupWithManager(mgr ctrl.Manager) error

SetupWithManager sets up the controller with the Manager.

func (*WorkflowExecutionReconciler) ValidateSpec

ValidateSpec validates the WorkflowExecution spec. It returns an error if validation fails (ConfigurationError reason).
