README
Scaling Example - From 1 to N Workers
This example demonstrates how to safely scale a projection from 1 worker to N workers without data loss or downtime.
What It Demonstrates
- Adding workers incrementally (1 → 2 → 3 → 4)
- Each worker independently catches up from its checkpoint
- No coordination needed between workers
- Safe to add/remove workers at any time
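Each worker is configured with nothing more than its own ID and the total number of partitions, typically from the environment. A minimal sketch of that wiring, assuming env-var configuration (the `workerConfig` and `configFromEnv` names are illustrative, not taken from the example's source, which also accepts flags):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
)

// workerConfig is all a worker needs to join the pool: its own ID and the
// total number of partitions. The names here are illustrative only.
type workerConfig struct {
	WorkerID     int
	TotalWorkers int
}

func configFromEnv() (workerConfig, error) {
	id, err := strconv.Atoi(os.Getenv("WORKER_ID"))
	if err != nil {
		return workerConfig{}, fmt.Errorf("WORKER_ID: %w", err)
	}
	total, err := strconv.Atoi(os.Getenv("TOTAL_WORKERS"))
	if err != nil {
		return workerConfig{}, fmt.Errorf("TOTAL_WORKERS: %w", err)
	}
	if total < 1 || id < 0 || id >= total {
		return workerConfig{}, fmt.Errorf("worker %d is not in range [0, %d)", id, total)
	}
	return workerConfig{WorkerID: id, TotalWorkers: total}, nil
}

func main() {
	cfg, err := configFromEnv()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("worker %d of %d starting\n", cfg.WorkerID, cfg.TotalWorkers)
	// From here the worker runs its catch-up loop; see "Independent Catchup" below.
}
```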
Running the Example
Step 1: Prepare Database and Events
```bash
# Start PostgreSQL
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=pupsourcing_example postgres:16

# Apply migrations (from ../basic)
cd ../basic && go generate

# Append sample events (one-time)
cd ../scaling
go run main.go --worker-id=0 --append
```
Step 2: Start with 1 Worker
Terminal 1:
```bash
WORKER_ID=0 go run main.go
```
Watch it process events. It handles 100% of events since it's the only worker.
Step 3: Add Second Worker
Terminal 2:
```bash
WORKER_ID=1 go run main.go
```
Now you have 2 workers:
- Worker 0: Processes ~50% of events (its partition)
- Worker 1: Catches up and processes ~50% of events (its partition)
Step 4: Add Third Worker
Terminal 3:
```bash
WORKER_ID=2 go run main.go
```
Now you have 3 workers, each handling ~33% of events.
Step 5: Add Fourth Worker
Terminal 4:
```bash
WORKER_ID=3 go run main.go
```
Now you have 4 workers, each handling ~25% of events.
Key Observations
1. Independent Catchup
Each new worker:
- Starts from its own checkpoint (position 0 initially)
- Reads events from the beginning
- Skips events not in its partition
- Catches up to real-time independently
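Conceptually, each worker runs a loop like the sketch below: load its own checkpoint, read events in global order, apply only the ones in its partition, and advance the checkpoint. The `Store` interface, `Event` type, and function names are illustrative stand-ins, not the library's actual API.

```go
package projectionsketch

import (
	"context"
	"time"
)

// Event and Store are illustrative stand-ins for the example's real types.
type Event struct {
	GlobalPosition int64
	AggregateID    string
	Payload        []byte
}

type Store interface {
	LoadCheckpoint(ctx context.Context, workerID int) (int64, error)
	SaveCheckpoint(ctx context.Context, workerID int, pos int64) error
	ReadEventsAfter(ctx context.Context, pos int64, limit int) ([]Event, error)
	Apply(ctx context.Context, e Event) error // update the read model
}

// CatchUp runs one worker's loop: resume from its checkpoint, apply only
// events in its partition, and advance the checkpoint after each batch.
func CatchUp(ctx context.Context, s Store, workerID, totalWorkers int, partitionFor func(string, int) int) error {
	pos, err := s.LoadCheckpoint(ctx, workerID) // 0 the first time this worker runs
	if err != nil {
		return err
	}
	for {
		events, err := s.ReadEventsAfter(ctx, pos, 100)
		if err != nil {
			return err
		}
		if len(events) == 0 {
			// Caught up to the head of the stream; poll again shortly.
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(time.Second):
			}
			continue
		}
		for _, e := range events {
			// Only apply events owned by this worker's partition; everything
			// else is skipped, but the position still advances past it.
			if partitionFor(e.AggregateID, totalWorkers) == workerID {
				if err := s.Apply(ctx, e); err != nil {
					return err
				}
			}
			pos = e.GlobalPosition
		}
		if err := s.SaveCheckpoint(ctx, workerID, pos); err != nil {
			return err
		}
	}
}
```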
2. No Coordination Required
- No need to pause existing workers
- No need to reconfigure existing workers
- No distributed locks or leader election
- Just start the new worker
3. Load Distribution
Watch the logs to see:
- Which aggregate IDs go to which worker (deterministic)
- Each worker processes approximately equal events
- Same aggregate always goes to the same worker
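The deterministic routing is what the logs let you verify: which worker owns an aggregate is a pure function of the aggregate ID and the total worker count. A common way to implement it, shown here as a sketch (the example's actual hash may differ), is a stable hash modulo the partition count:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps an aggregate ID to a worker deterministically: the same
// ID always lands on the same worker for a given total, so no aggregate is
// ever split across workers.
func partitionFor(aggregateID string, totalWorkers int) int {
	h := fnv.New32a()
	h.Write([]byte(aggregateID))
	return int(h.Sum32() % uint32(totalWorkers))
}

func main() {
	for _, id := range []string{"order-1001", "order-1002", "order-1003", "order-1004"} {
		fmt.Printf("%s -> worker %d of 4\n", id, partitionFor(id, 4))
	}
}
```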
Scaling Patterns
Scaling Up (1 → N)
```bash
# Start with 1 worker
WORKER_ID=0 TOTAL_WORKERS=4 go run main.go

# Add more workers as needed
WORKER_ID=1 TOTAL_WORKERS=4 go run main.go
WORKER_ID=2 TOTAL_WORKERS=4 go run main.go
WORKER_ID=3 TOTAL_WORKERS=4 go run main.go
```
Scaling Down (N → M)
Simply stop workers. Their checkpoints remain in the database, so processing of the stopped partitions pauses until a worker with the same ID is started again.
```bash
# Stop worker 3 (Ctrl+C)
# Stop worker 2 (Ctrl+C)
# Workers 0 and 1 continue processing their partitions
```
Changing Partition Count
⚠️ Important: Changing the total number of partitions requires reconfiguring ALL workers:
```bash
# From 4 partitions to 8 partitions
# 1. Stop all workers
# 2. Start 8 new workers with --total-workers=8
# 3. Each will reprocess events (idempotent projections required)
```
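Reprocessing is only safe because the projection handlers are idempotent: applying the same event a second time must leave the read model unchanged. A sketch of what that looks like with a Postgres upsert (the `order_totals` table and the event fields are invented for illustration):

```go
package projection

import (
	"context"
	"database/sql"
)

// applyOrderPlaced is an idempotent handler sketch: it sets values keyed by
// the aggregate instead of incrementing them, and ignores events older than
// what the row already reflects, so replaying after a re-partition is safe.
// (An increment such as total_cents = total_cents + N would NOT be idempotent.)
func applyOrderPlaced(ctx context.Context, db *sql.DB, orderID string, totalCents, globalPos int64) error {
	_, err := db.ExecContext(ctx, `
		INSERT INTO order_totals (order_id, total_cents, last_position)
		VALUES ($1, $2, $3)
		ON CONFLICT (order_id) DO UPDATE
		SET total_cents   = EXCLUDED.total_cents,
		    last_position = EXCLUDED.last_position
		WHERE order_totals.last_position < EXCLUDED.last_position`,
		orderID, totalCents, globalPos)
	return err
}
```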
Production Deployment
Kubernetes Example
```yaml
apiVersion: apps/v1
kind: StatefulSet            # StatefulSet, not Deployment: pods get stable ordinals
metadata:
  name: projection-worker
spec:
  serviceName: projection-worker
  replicas: 4
  selector:
    matchLabels:
      app: projection-worker
  template:
    metadata:
      labels:
        app: projection-worker
    spec:
      containers:
        - name: worker
          image: myapp:latest
          command: ["./myapp", "projection-worker"]
          env:
            # The label holds the full pod name (projection-worker-0 ... -3);
            # the worker derives its numeric WORKER_ID from the trailing ordinal.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['statefulset.kubernetes.io/pod-name']
            - name: TOTAL_WORKERS
              value: "4"
```
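Since the Downward API hands the container the full pod name (e.g. `projection-worker-2`), the worker has to derive its numeric ID from the trailing ordinal. A small sketch of that (the helper name is made up):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

// workerIDFromPodName extracts the StatefulSet ordinal from a pod name,
// e.g. "projection-worker-2" -> 2.
func workerIDFromPodName(podName string) (int, error) {
	i := strings.LastIndex(podName, "-")
	if i < 0 || i == len(podName)-1 {
		return 0, fmt.Errorf("pod name %q has no ordinal suffix", podName)
	}
	return strconv.Atoi(podName[i+1:])
}

func main() {
	id, err := workerIDFromPodName(os.Getenv("POD_NAME"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("worker id:", id)
}
```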
Docker Compose Example
```yaml
version: '3.8'

services:
  worker-0:
    image: myapp:latest
    environment:
      WORKER_ID: "0"
      TOTAL_WORKERS: "4"
    command: ["./myapp", "projection-worker"]

  worker-1:
    image: myapp:latest
    environment:
      WORKER_ID: "1"
      TOTAL_WORKERS: "4"
    command: ["./myapp", "projection-worker"]

  # ... worker-2, worker-3
```
Systemd Example
```ini
# /etc/systemd/system/projection-worker@.service
[Unit]
Description=Projection Worker %i
After=network.target postgresql.service

[Service]
Type=simple
User=myapp
Environment="WORKER_ID=%i"
Environment="TOTAL_WORKERS=4"
ExecStart=/usr/local/bin/myapp projection-worker
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

```bash
# Enable and start workers
sudo systemctl enable projection-worker@0
sudo systemctl enable projection-worker@1
sudo systemctl enable projection-worker@2
sudo systemctl enable projection-worker@3
sudo systemctl start projection-worker@0
sudo systemctl start projection-worker@1
sudo systemctl start projection-worker@2
sudo systemctl start projection-worker@3
```
Monitoring
What to Monitor
- Checkpoint Lag: How far behind real-time each worker is
- Events Processed: Rate of event processing per worker
- Worker Health: Is each worker running and processing events?
Example Query - Check Lag
```sql
SELECT
  projection_name,
  last_global_position,
  (SELECT MAX(global_position) FROM events) - last_global_position AS lag
FROM projection_checkpoints
WHERE projection_name = 'scalable_projection';
```
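The same query is easy to wrap in a small health check that alerts once lag crosses a threshold. A sketch using database/sql (the connection string matches the `docker run` command above; the driver, threshold, and function names are illustrative):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // any Postgres database/sql driver works
)

const lagQuery = `
SELECT (SELECT COALESCE(MAX(global_position), 0) FROM events) - last_global_position
FROM projection_checkpoints
WHERE projection_name = $1`

// checkLag logs an alert when the projection is more than maxLag events behind.
func checkLag(ctx context.Context, db *sql.DB, projection string, maxLag int64) error {
	var lag int64
	if err := db.QueryRowContext(ctx, lagQuery, projection).Scan(&lag); err != nil {
		return err
	}
	if lag > maxLag {
		log.Printf("ALERT: projection %s is %d events behind (threshold %d)", projection, lag, maxLag)
	}
	return nil
}

func main() {
	db, err := sql.Open("postgres", "postgres://postgres:postgres@localhost:5432/pupsourcing_example?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := checkLag(context.Background(), db, "scalable_projection", 1000); err != nil {
		log.Fatal(err)
	}
}
```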
Common Scenarios
Adding Capacity
Your projection is falling behind:
- Check current worker count
- Add more workers, up to the configured partition count (to go further, increase the partition count; see "Changing Partition Count" above)
- Watch lag decrease as new workers catch up
Removing Capacity
Your projection is over-provisioned:
- Stop unnecessary workers
- Remaining workers continue processing their partitions
- Checkpoints for stopped workers remain (safe to restart later)
Worker Failure
A worker crashes:
- Other workers continue unaffected
- Restart the crashed worker
- It resumes from its checkpoint automatically
See Also
- ../partitioned - The pattern this example demonstrates
- ../worker-pool - Similar, but in a single process
- ../stop-resume - Demonstrates checkpoint reliability
Documentation
Overview
Package main demonstrates how to safely scale projections from 1 → N workers. This example shows that you can add workers dynamically and they will catch up independently.
Run this example in stages:
- Start with worker 0: WORKER_ID=0 go run main.go
- Add worker 1: WORKER_ID=1 go run main.go (in a new terminal)
- Add worker 2: WORKER_ID=2 go run main.go (in a new terminal)
- Add worker 3: WORKER_ID=3 go run main.go (in a new terminal)