deploy

package
v0.1.5 Latest
Warning: This package is not in the latest version of its module.
Published: Feb 22, 2026 License: MIT Imports: 7 Imported by: 0

README

Workflow Engine — Deployment Guide

This directory contains deployment configurations for the Workflow engine.

Directory Structure

deploy/
├── tofu/                    # OpenTofu (Terraform-compatible) IaC
│   ├── modules/
│   │   ├── alb/             # Application Load Balancer
│   │   ├── ecr/             # Elastic Container Registry
│   │   ├── ecs/             # ECS Fargate cluster + service
│   │   ├── elasticache/     # Redis via ElastiCache
│   │   ├── monitoring/      # CloudWatch dashboards + alarms
│   │   ├── rds/             # PostgreSQL via RDS
│   │   └── vpc/             # VPC, subnets, NAT gateway
│   └── environments/
│       ├── dev/             # Development — small instances, single-AZ
│       ├── staging/         # Staging — multi-AZ, medium instances
│       └── production/      # Production — multi-AZ, large instances, autoscaling
├── docker-compose/          # Local development stack
│   ├── docker-compose.yml
│   ├── prometheus.yml
│   └── grafana/
│       └── provisioning/
├── helm/                    # Kubernetes Helm chart
│   └── workflow/
├── grafana/                 # Grafana dashboard JSON files
└── prometheus/              # Prometheus alert rules

Architecture Overview

AWS (OpenTofu)
Internet → ALB (public subnets) → ECS Fargate (private subnets)
                                       ↓
                               RDS PostgreSQL (private subnets)
                               ElastiCache Redis (private subnets)

All compute and data services run in private subnets. Only the ALB is public-facing. Traffic flows:

  1. HTTPS on port 443 terminates at the ALB with an ACM certificate
  2. HTTP on port 80 is redirected to HTTPS
  3. ALB forwards to ECS tasks on port 8080
  4. ECS tasks connect to RDS (port 5432) and Redis (port 6379) via security groups
Kubernetes (Helm)

The Helm chart deploys the workflow server as a Deployment with:

  • HorizontalPodAutoscaler for CPU/memory-based scaling
  • PodDisruptionBudget for safe maintenance
  • ServiceMonitor for Prometheus Operator metrics scraping
  • Ingress for external access

Prerequisites

OpenTofu
  • OpenTofu >= 1.6.0
  • AWS CLI configured with appropriate permissions
  • An ACM certificate for HTTPS (must be in the same region)
Helm
  • Helm >= 3.0
  • A running Kubernetes cluster
  • kubectl configured
Docker Compose
  • Docker >= 24.0 with Compose v2

Deploying with OpenTofu

First-time setup
cd deploy/tofu/environments/dev

# Copy and edit the example vars file
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values

# Initialize
tofu init

# Preview changes
tofu plan

# Apply
tofu apply
Deploying a new image version
cd deploy/tofu/environments/<env>
tofu apply -var="image_tag=v0.5.1"
Environment differences
Setting               dev              staging          production
ECS CPU (CPU units)   256              512              2048
ECS Memory            512 MB           1024 MB          4096 MB
ECS Desired Count     1                2                3 (autoscales)
RDS Instance          db.t3.micro      db.t3.small      db.r7g.large
RDS Multi-AZ          No               Yes              Yes
Redis Nodes           1                1                2 (with failover)
Redis Instance        cache.t3.micro   cache.t3.small   cache.r7g.large
Log Retention         30 days          30 days          90 days
Deletion Protection   No               Yes              Yes

The production environment uses an S3 backend. Create the backend resources once:

# Create the S3 bucket and DynamoDB table for state
aws s3api create-bucket --bucket workflow-tofu-state --region us-east-1
aws s3api put-bucket-versioning --bucket workflow-tofu-state \
  --versioning-configuration Status=Enabled
aws dynamodb create-table \
  --table-name workflow-tofu-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

Deploying with Helm

# Add the chart (if published to a registry)
# helm repo add workflow https://charts.example.com/workflow

# Or install from local path:
helm install workflow ./deploy/helm/workflow \
  --namespace workflow \
  --create-namespace \
  --set image.tag=v0.5.0 \
  --set ingress.enabled=true \
  --set 'ingress.hosts[0].host=workflow.example.com' \
  --set autoscaling.enabled=true

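# Upgrade (target the namespace used at install time; --reuse-values keeps earlier --set overrides)
helm upgrade workflow ./deploy/helm/workflow \
  --namespace workflow \
  --reuse-values \
  --set image.tag=v0.5.1

# With a values file (recommended)
helm upgrade --install workflow ./deploy/helm/workflow \
  --namespace workflow \
  --create-namespace \
  -f my-values.yaml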
Key Helm values
Value                               Default            Description
image.tag                           Chart appVersion   Docker image tag
replicaCount                        1                  Number of replicas (when autoscaling disabled)
autoscaling.enabled                 false              Enable HPA
autoscaling.minReplicas             1                  Min pods
autoscaling.maxReplicas             10                 Max pods
podDisruptionBudget.enabled         false              Enable PDB
podDisruptionBudget.minAvailable    1                  Min available pods
ingress.enabled                     false              Enable Ingress
ingress.className                   ""                 Ingress class (e.g., nginx, alb)
monitoring.serviceMonitor.enabled   false              Enable Prometheus ServiceMonitor
envFromSecret                       ""                 Name of K8s Secret for env vars
Production Helm values example
replicaCount: 3

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70

podDisruptionBudget:
  enabled: true
  minAvailable: 1

ingress:
  enabled: true
  className: alb
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
  hosts:
    - host: workflow.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: workflow-tls
      hosts:
        - workflow.example.com

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi

monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    labels:
      prometheus: kube-prometheus

envFromSecret: workflow-secrets

Local Development with Docker Compose

cd deploy/docker-compose

# Start all services (workflow-server + postgres + redis + adminer + prometheus + grafana)
docker compose up -d

# Follow logs
docker compose logs -f workflow-server

# Access:
#   Workflow API:  http://localhost:8080
#   Admin UI:      http://localhost:8081
#   Adminer:       http://localhost:8888  (server: postgres, user/pass: workflow)
#   Prometheus:    http://localhost:9090
#   Grafana:       http://localhost:3000  (admin/admin)

# Stop
docker compose down

# Stop and remove data volumes
docker compose down -v
Running just the server dependencies (external server)
# Start only postgres and redis
docker compose up -d postgres redis

# Then run the server locally
cd ../..
go run ./cmd/server -config example/order-processing-pipeline.yaml

Configuration Options

The workflow server is configured via a YAML file passed with -config. See example/ for sample configs.

Key environment variables:

Variable               Description
WORKFLOW_ADDR          HTTP listen address (default :8080)
WORKFLOW_DB_HOST       PostgreSQL host:port
WORKFLOW_DB_NAME       Database name
WORKFLOW_DB_USER       Database user
WORKFLOW_DB_PASSWORD   Database password
WORKFLOW_REDIS_ADDR    Redis address (host:port)
JWT_SECRET             Secret for JWT token signing

Monitoring

CloudWatch alarms are configured for:

  • ECS CPU > threshold (default 80%)
  • ECS Memory > threshold (default 85%)
  • ALB 5xx error count > threshold (default 10/min)
  • ALB unhealthy host count > 0

Alerts are sent to an SNS topic with an email subscription.

Grafana dashboards in deploy/grafana/ cover:

  • Workflow overview (request rates, latency, errors)
  • Dynamic components (hot-reload activity)
  • Chat platform metrics

Documentation

Types

type BlueGreenState

type BlueGreenState struct {
	ActiveEnv  Environment `json:"active_env"`
	StandbyEnv Environment `json:"standby_env"`
	ActiveVer  int         `json:"active_version"`
	StandbyVer int         `json:"standby_version"`
}

BlueGreenState tracks the current blue-green deployment state.
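
To show how the JSON tags shape the serialized state, here is a small sketch that redeclares the type locally (so it runs without importing this package) and marshals one value. The encodeState helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types so the example is self-contained.
type Environment string

const (
	EnvBlue  Environment = "blue"
	EnvGreen Environment = "green"
)

type BlueGreenState struct {
	ActiveEnv  Environment `json:"active_env"`
	StandbyEnv Environment `json:"standby_env"`
	ActiveVer  int         `json:"active_version"`
	StandbyVer int         `json:"standby_version"`
}

// encodeState is a hypothetical helper that serializes a state value.
func encodeState(s BlueGreenState) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := encodeState(BlueGreenState{
		ActiveEnv:  EnvBlue,
		StandbyEnv: EnvGreen,
		ActiveVer:  3,
		StandbyVer: 2,
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
	// {"active_env":"blue","standby_env":"green","active_version":3,"standby_version":2}
}
```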

type BlueGreenStrategy

type BlueGreenStrategy struct {
	// contains filtered or unexported fields
}

BlueGreenStrategy implements zero-downtime deployments by swapping between two environments.

func NewBlueGreenStrategy

func NewBlueGreenStrategy(logger *slog.Logger) *BlueGreenStrategy

NewBlueGreenStrategy creates a new BlueGreenStrategy.

func (*BlueGreenStrategy) Execute

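func (s *BlueGreenStrategy) Execute(ctx context.Context, plan *DeploymentPlan) (*DeploymentResult, error)

Execute performs a blue-green deployment: deploy to standby, health check, then switch traffic.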

func (*BlueGreenStrategy) GetState

func (s *BlueGreenStrategy) GetState(workflowID string) (*BlueGreenState, bool)

GetState returns the current blue-green state for a workflow.

func (*BlueGreenStrategy) Name

func (s *BlueGreenStrategy) Name() string

Name returns the strategy identifier.

func (*BlueGreenStrategy) Rollback

func (s *BlueGreenStrategy) Rollback(ctx context.Context, workflowID string) (*DeploymentResult, error)

Rollback switches traffic back to the previous environment.
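
The deploy, health-check, switch, and rollback sequence described for this strategy can be pictured with plain state transitions. This is a sketch under assumptions, not the package's implementation: promote, swap, and the healthy callback are hypothetical names, and the real strategy may track more than these four fields.

```go
package main

import "fmt"

// Local copies of the documented types so the example is self-contained.
type Environment string

const (
	EnvBlue  Environment = "blue"
	EnvGreen Environment = "green"
)

type BlueGreenState struct {
	ActiveEnv  Environment
	StandbyEnv Environment
	ActiveVer  int
	StandbyVer int
}

// swap exchanges the active and standby slots. Applying it a second time
// undoes the switch, which is how this sketch models a rollback.
func swap(s BlueGreenState) BlueGreenState {
	return BlueGreenState{
		ActiveEnv:  s.StandbyEnv,
		StandbyEnv: s.ActiveEnv,
		ActiveVer:  s.StandbyVer,
		StandbyVer: s.ActiveVer,
	}
}

// promote places newVersion on the standby slot and switches traffic only
// if the (hypothetical) healthy callback approves the standby slot.
func promote(s BlueGreenState, newVersion int, healthy func(Environment, int) bool) (BlueGreenState, bool) {
	s.StandbyVer = newVersion
	if !healthy(s.StandbyEnv, newVersion) {
		return s, false // traffic stays on the active slot
	}
	return swap(s), true
}

func main() {
	s := BlueGreenState{ActiveEnv: EnvBlue, StandbyEnv: EnvGreen, ActiveVer: 1}
	alwaysHealthy := func(Environment, int) bool { return true }

	s, ok := promote(s, 2, alwaysHealthy)
	fmt.Println(s.ActiveEnv, s.ActiveVer, ok) // green 2 true

	s = swap(s) // rollback: return traffic to the previous slot
	fmt.Println(s.ActiveEnv, s.ActiveVer) // blue 1
}
```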

func (*BlueGreenStrategy) Validate

func (s *BlueGreenStrategy) Validate(config map[string]any) error

Validate checks the blue-green configuration.

type CanaryConfig

type CanaryConfig struct {
	InitialPercent float64       `json:"initial_percent"` // default 10
	Increment      float64       `json:"increment"`       // default 20
	Interval       time.Duration `json:"interval"`        // default 30s
	ErrorThreshold float64       `json:"error_threshold"` // default 5 (percent)
}

CanaryConfig holds the tunable parameters for canary deployments.

func DefaultCanaryConfig

func DefaultCanaryConfig() CanaryConfig

DefaultCanaryConfig returns the default canary configuration.
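
For intuition about what the documented defaults imply, the sketch below lists the traffic percentages a canary would pass through. It assumes the canary share starts at InitialPercent, grows by Increment once per Interval, and is capped at 100; that progression rule, and the assumedDefaults and canarySteps names, are illustrative rather than confirmed by the package.

```go
package main

import (
	"fmt"
	"time"
)

// Local copy of the documented type so the example is self-contained.
type CanaryConfig struct {
	InitialPercent float64
	Increment      float64
	Interval       time.Duration
	ErrorThreshold float64
}

// assumedDefaults copies the defaults stated in the field comments.
func assumedDefaults() CanaryConfig {
	return CanaryConfig{
		InitialPercent: 10,
		Increment:      20,
		Interval:       30 * time.Second,
		ErrorThreshold: 5,
	}
}

// canarySteps returns the canary traffic percentages in order, assuming a
// fixed increment per interval and a cap at 100 percent.
func canarySteps(cfg CanaryConfig) []float64 {
	if cfg.InitialPercent <= 0 || cfg.Increment <= 0 {
		return nil // such a config would never reach 100 percent
	}
	var steps []float64
	for p := cfg.InitialPercent; ; p += cfg.Increment {
		if p >= 100 {
			return append(steps, 100)
		}
		steps = append(steps, p)
	}
}

func main() {
	cfg := assumedDefaults()
	steps := canarySteps(cfg)
	fmt.Println(steps) // [10 30 50 70 90 100]

	// One interval elapses between consecutive steps.
	fmt.Println(time.Duration(len(steps)-1) * cfg.Interval) // 2m30s
}
```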

type CanaryStrategy

type CanaryStrategy struct {
	// contains filtered or unexported fields
}

CanaryStrategy implements gradual traffic-shifting deployments.

func NewCanaryStrategy

func NewCanaryStrategy(logger *slog.Logger) *CanaryStrategy

NewCanaryStrategy creates a new CanaryStrategy.

func (*CanaryStrategy) Execute

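func (s *CanaryStrategy) Execute(ctx context.Context, plan *DeploymentPlan) (*DeploymentResult, error)

Execute performs a canary deployment with gradual traffic shifting.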

func (*CanaryStrategy) GetSplit

func (s *CanaryStrategy) GetSplit(workflowID string) (*TrafficSplit, bool)

GetSplit returns the current traffic split for a workflow.

func (*CanaryStrategy) Name

func (s *CanaryStrategy) Name() string

Name returns the strategy identifier.

func (*CanaryStrategy) Rollback

func (s *CanaryStrategy) Rollback(ctx context.Context, workflowID string) (*DeploymentResult, error)

Rollback immediately shifts all traffic back to the stable version.

func (*CanaryStrategy) SetHealthCheck

func (s *CanaryStrategy) SetHealthCheck(fn func(ctx context.Context, workflowID string, version int) (float64, error))

SetHealthCheck sets a custom health check function for canary evaluation.

func (*CanaryStrategy) Validate

func (s *CanaryStrategy) Validate(config map[string]any) error

Validate checks the canary-specific configuration.

type DeploymentPlan

type DeploymentPlan struct {
	WorkflowID  string         `json:"workflow_id"`
	FromVersion int            `json:"from_version"`
	ToVersion   int            `json:"to_version"`
	Strategy    string         `json:"strategy"` // "rolling", "blue-green", "canary"
	Config      map[string]any `json:"config"`
}

DeploymentPlan describes a deployment to execute.
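
Because every field carries a JSON tag, a plan can be decoded directly from a request body or a file. The sketch below redeclares the type locally; parsePlan is a hypothetical helper, and the "initial_percent" config key is borrowed from the CanaryConfig tags on the assumption that strategy config uses the same names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copy of the documented type so the example is self-contained.
type DeploymentPlan struct {
	WorkflowID  string         `json:"workflow_id"`
	FromVersion int            `json:"from_version"`
	ToVersion   int            `json:"to_version"`
	Strategy    string         `json:"strategy"` // "rolling", "blue-green", "canary"
	Config      map[string]any `json:"config"`
}

// parsePlan is a hypothetical helper that decodes a plan from JSON.
func parsePlan(data []byte) (*DeploymentPlan, error) {
	var p DeploymentPlan
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, fmt.Errorf("decode plan: %w", err)
	}
	return &p, nil
}

func main() {
	raw := []byte(`{
		"workflow_id": "order-pipeline",
		"from_version": 1,
		"to_version": 2,
		"strategy": "canary",
		"config": {"initial_percent": 10}
	}`)

	plan, err := parsePlan(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	// JSON numbers inside map[string]any decode as float64.
	fmt.Println(plan.Strategy, plan.FromVersion, plan.ToVersion, plan.Config["initial_percent"])
	// canary 1 2 10
}
```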

type DeploymentResult

type DeploymentResult struct {
	Status      string    `json:"status"` // "success", "failed", "rolled_back"
	StartedAt   time.Time `json:"started_at"`
	CompletedAt time.Time `json:"completed_at"`
	Message     string    `json:"message"`
	RolledBack  bool      `json:"rolled_back"`
}

DeploymentResult captures the outcome of a deployment.

type DeploymentStrategy

type DeploymentStrategy interface {
	// Name returns the strategy identifier (e.g., "rolling", "blue-green", "canary").
	Name() string

	// Validate checks the strategy-specific configuration.
	Validate(config map[string]any) error

	// Execute runs the deployment according to the plan.
	Execute(ctx context.Context, plan *DeploymentPlan) (*DeploymentResult, error)
}

DeploymentStrategy defines the interface for workflow deployment strategies.

type Environment

type Environment string

Environment identifies a blue-green deployment slot.

const (
	EnvBlue  Environment = "blue"
	EnvGreen Environment = "green"
)

type Executor

type Executor struct {
	// contains filtered or unexported fields
}

Executor bridges deployment strategies with cloud providers. It looks up the appropriate strategy and provider, validates the plan, and delegates execution to the cloud provider.

func NewExecutor

func NewExecutor(strategies *StrategyRegistry) *Executor

NewExecutor creates an Executor backed by the given strategy registry.

func (*Executor) Deploy

func (e *Executor) Deploy(ctx context.Context, providerName string, req provider.DeployRequest) (*provider.DeployResult, error)

Deploy executes a deployment through the named provider, using the strategy identified in the request. It validates the strategy config, deploys via the provider, and handles rollback on failure when configured.

func (*Executor) GetProvider

func (e *Executor) GetProvider(name string) (provider.CloudProvider, bool)

GetProvider returns the cloud provider with the given name, or false if not found.

func (*Executor) RegisterProvider

func (e *Executor) RegisterProvider(name string, p provider.CloudProvider)

RegisterProvider adds a cloud provider under the given name.

type RollingConfig

type RollingConfig struct {
	BatchSize int           `json:"batch_size"` // default 1
	Delay     time.Duration `json:"delay"`      // default 5s
}

RollingConfig holds the tunable parameters for rolling deployments.

func DefaultRollingConfig

func DefaultRollingConfig() RollingConfig

DefaultRollingConfig returns the default rolling configuration.
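
To make BatchSize and Delay concrete, the sketch below splits a list of instances into batches and estimates the total pause time. It assumes the delay is applied between consecutive batches, which the documentation implies but does not spell out; the batches helper and the instance IDs are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Local copy of the documented type so the example is self-contained.
type RollingConfig struct {
	BatchSize int           // default 1
	Delay     time.Duration // default 5s
}

// batches splits instance IDs into groups of at most cfg.BatchSize,
// falling back to the documented default of 1 for non-positive sizes.
func batches(instances []string, cfg RollingConfig) [][]string {
	size := cfg.BatchSize
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(instances); start += size {
		end := start + size
		if end > len(instances) {
			end = len(instances)
		}
		out = append(out, instances[start:end])
	}
	return out
}

func main() {
	cfg := RollingConfig{BatchSize: 2, Delay: 5 * time.Second}
	plan := batches([]string{"i-1", "i-2", "i-3", "i-4", "i-5"}, cfg)
	fmt.Println(plan) // [[i-1 i-2] [i-3 i-4] [i-5]]

	// Assumed: one delay between each pair of consecutive batches.
	fmt.Println(time.Duration(len(plan)-1) * cfg.Delay) // 10s
}
```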

type RollingStrategy

type RollingStrategy struct {
	// contains filtered or unexported fields
}

RollingStrategy implements simple rolling update deployments, updating instances one batch at a time.

func NewRollingStrategy

func NewRollingStrategy(logger *slog.Logger) *RollingStrategy

NewRollingStrategy creates a new RollingStrategy.

func (*RollingStrategy) Execute

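func (s *RollingStrategy) Execute(ctx context.Context, plan *DeploymentPlan) (*DeploymentResult, error)

Execute performs a rolling deployment, processing instances in batches.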

func (*RollingStrategy) Name

func (s *RollingStrategy) Name() string

Name returns the strategy identifier.

func (*RollingStrategy) Validate

func (s *RollingStrategy) Validate(config map[string]any) error

Validate checks the rolling-specific configuration.

type StrategyRegistry

type StrategyRegistry struct {
	// contains filtered or unexported fields
}

StrategyRegistry holds the available deployment strategies.

func NewStrategyRegistry

func NewStrategyRegistry(logger *slog.Logger) *StrategyRegistry

NewStrategyRegistry creates a registry pre-loaded with built-in strategies.

func (*StrategyRegistry) Execute

func (r *StrategyRegistry) Execute(plan *DeploymentPlan) (*DeploymentResult, error)

Execute is a convenience method that looks up a strategy and executes a plan.

func (*StrategyRegistry) Get

func (r *StrategyRegistry) Get(name string) (DeploymentStrategy, bool)

Get returns the strategy with the given name, or false if not found.

func (*StrategyRegistry) List

func (r *StrategyRegistry) List() []string

List returns the sorted names of all registered strategies.

func (*StrategyRegistry) Register

func (r *StrategyRegistry) Register(s DeploymentStrategy)

Register adds a strategy to the registry, replacing any existing one with the same name.
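
The documented registry semantics (replace on duplicate name, sorted listing, lookup with a boolean) can be reproduced with a map keyed by name. The sketch below is one possible shape, not the package's implementation: the registry, namer, and named types are hypothetical, and the mutex is an assumption about concurrent use.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// namer is the subset of DeploymentStrategy this sketch needs.
type namer interface{ Name() string }

// registry is a hypothetical stand-in for StrategyRegistry.
type registry struct {
	mu    sync.RWMutex
	items map[string]namer
}

func newRegistry() *registry {
	return &registry{items: make(map[string]namer)}
}

// Register stores s under its name, replacing any earlier entry.
func (r *registry) Register(s namer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[s.Name()] = s
}

// Get returns the entry with the given name, or false if not found.
func (r *registry) Get(name string) (namer, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.items[name]
	return s, ok
}

// List returns the registered names in sorted order.
func (r *registry) List() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.items))
	for name := range r.items {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// named is a trivial namer used to populate the registry.
type named string

func (n named) Name() string { return string(n) }

func main() {
	r := newRegistry()
	for _, n := range []named{"rolling", "canary", "blue-green", "canary"} {
		r.Register(n) // the second "canary" replaces the first
	}
	fmt.Println(r.List()) // [blue-green canary rolling]

	_, ok := r.Get("recreate")
	fmt.Println(ok) // false
}
```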

type TrafficSplit

type TrafficSplit struct {
	CanaryPercent float64 `json:"canary_percent"`
	StablePercent float64 `json:"stable_percent"`
	CanaryVersion int     `json:"canary_version"`
	StableVersion int     `json:"stable_version"`
}

TrafficSplit tracks how traffic is distributed between versions.
