README
Scaling Example - From 1 to N Workers
This example demonstrates how to safely scale a projection from 1 worker to N workers without data loss or downtime.
What It Demonstrates
- Adding workers incrementally (1 → 2 → 3 → 4)
- Each worker independently catches up from its checkpoint
- No coordination needed between workers
- Safe to add/remove workers at any time
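Each worker is configured with nothing more than its own ID and the total number of partitions, typically from the environment. A minimal sketch of that wiring, assuming env-var configuration (the `workerConfig` and `configFromEnv` names are illustrative, not taken from the example's source, which also accepts flags):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
)

// workerConfig is all a worker needs to join the pool: its own ID and the
// total number of partitions. The names here are illustrative only.
type workerConfig struct {
	WorkerID     int
	TotalWorkers int
}

func configFromEnv() (workerConfig, error) {
	id, err := strconv.Atoi(os.Getenv("WORKER_ID"))
	if err != nil {
		return workerConfig{}, fmt.Errorf("WORKER_ID: %w", err)
	}
	total, err := strconv.Atoi(os.Getenv("TOTAL_WORKERS"))
	if err != nil {
		return workerConfig{}, fmt.Errorf("TOTAL_WORKERS: %w", err)
	}
	if total < 1 || id < 0 || id >= total {
		return workerConfig{}, fmt.Errorf("worker %d is not in range [0, %d)", id, total)
	}
	return workerConfig{WorkerID: id, TotalWorkers: total}, nil
}

func main() {
	cfg, err := configFromEnv()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("worker %d of %d starting\n", cfg.WorkerID, cfg.TotalWorkers)
	// From here the worker runs its catch-up loop; see "Independent Catchup" below.
}
```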
Running the Example
Step 1: Prepare Database and Events
```bash
# Start PostgreSQL
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=pupsourcing_example postgres:16

# Apply migrations (from ../basic)
cd ../basic && go generate

# Append sample events (one-time)
cd ../scaling
go run main.go --worker-id=0 --append
```
Step 2: Start with 1 Worker
Terminal 1:
```bash
WORKER_ID=0 go run main.go
```
Watch it process events. It handles 100% of events since it's the only worker.
Step 3: Add Second Worker
Terminal 2:
```bash
WORKER_ID=1 go run main.go
```
Now you have 2 workers:
- Worker 0: Processes ~50% of events (its partition)
- Worker 1: Catches up and processes ~50% of events (its partition)
Step 4: Add Third Worker
Terminal 3:
```bash
WORKER_ID=2 go run main.go
```
Now you have 3 workers, each handling ~33% of events.
Step 5: Add Fourth Worker
Terminal 4:
```bash
WORKER_ID=3 go run main.go
```
Now you have 4 workers, each handling ~25% of events.
Key Observations
1. Independent Catchup
Each new worker:
- Starts from its own checkpoint (position 0 initially)
- Reads events from the beginning
- Skips events not in its partition
- Catches up to real-time independently
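Conceptually, each worker runs a loop like the sketch below: load its own checkpoint, read events in global order, apply only the ones in its partition, and advance the checkpoint. The `Store` interface, `Event` type, and function names are illustrative stand-ins, not the library's actual API.

```go
package projectionsketch

import (
	"context"
	"time"
)

// Event and Store are illustrative stand-ins for the example's real types.
type Event struct {
	GlobalPosition int64
	AggregateID    string
	Payload        []byte
}

type Store interface {
	LoadCheckpoint(ctx context.Context, workerID int) (int64, error)
	SaveCheckpoint(ctx context.Context, workerID int, pos int64) error
	ReadEventsAfter(ctx context.Context, pos int64, limit int) ([]Event, error)
	Apply(ctx context.Context, e Event) error // update the read model
}

// CatchUp runs one worker's loop: resume from its checkpoint, apply only
// events in its partition, and advance the checkpoint after each batch.
func CatchUp(ctx context.Context, s Store, workerID, totalWorkers int, partitionFor func(string, int) int) error {
	pos, err := s.LoadCheckpoint(ctx, workerID) // 0 the first time this worker runs
	if err != nil {
		return err
	}
	for {
		events, err := s.ReadEventsAfter(ctx, pos, 100)
		if err != nil {
			return err
		}
		if len(events) == 0 {
			// Caught up to the head of the stream; poll again shortly.
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(time.Second):
			}
			continue
		}
		for _, e := range events {
			// Only apply events owned by this worker's partition; everything
			// else is skipped, but the position still advances past it.
			if partitionFor(e.AggregateID, totalWorkers) == workerID {
				if err := s.Apply(ctx, e); err != nil {
					return err
				}
			}
			pos = e.GlobalPosition
		}
		if err := s.SaveCheckpoint(ctx, workerID, pos); err != nil {
			return err
		}
	}
}
```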
2. No Coordination Required
- No need to pause existing workers
- No need to reconfigure existing workers
- No distributed locks or leader election
- Just start the new worker
3. Load Distribution
Watch the logs to see:
- Which aggregate IDs go to which worker (deterministic)
- Each worker processes approximately equal events
- Same aggregate always goes to the same worker
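The deterministic routing is what the logs let you verify: which worker owns an aggregate is a pure function of the aggregate ID and the total worker count. A common way to implement it, shown here as a sketch (the example's actual hash may differ), is a stable hash modulo the partition count:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps an aggregate ID to a worker deterministically: the same
// ID always lands on the same worker for a given total, so no aggregate is
// ever split across workers.
func partitionFor(aggregateID string, totalWorkers int) int {
	h := fnv.New32a()
	h.Write([]byte(aggregateID))
	return int(h.Sum32() % uint32(totalWorkers))
}

func main() {
	for _, id := range []string{"order-1001", "order-1002", "order-1003", "order-1004"} {
		fmt.Printf("%s -> worker %d of 4\n", id, partitionFor(id, 4))
	}
}
```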
Scaling Patterns
Scaling Up (1 → N)
```bash
# Start with 1 worker
WORKER_ID=0 TOTAL_WORKERS=4 go run main.go

# Add more workers as needed
WORKER_ID=1 TOTAL_WORKERS=4 go run main.go
WORKER_ID=2 TOTAL_WORKERS=4 go run main.go
WORKER_ID=3 TOTAL_WORKERS=4 go run main.go
```
Scaling Down (N → M)
Simply stop workers. Their checkpoints remain in the database, so processing of the stopped partitions pauses until a worker with the same ID is started again.
```bash
# Stop worker 3 (Ctrl+C)
# Stop worker 2 (Ctrl+C)
# Workers 0 and 1 continue processing their partitions
```
Changing Partition Count
⚠️ Important: Changing the total number of partitions requires reconfiguring ALL workers:
```bash
# From 4 partitions to 8 partitions
# 1. Stop all workers
# 2. Start 8 new workers with --total-workers=8
# 3. Each will reprocess events (idempotent projections required)
```
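Reprocessing is only safe because the projection handlers are idempotent: applying the same event a second time must leave the read model unchanged. A sketch of what that looks like with a Postgres upsert (the `order_totals` table and the event fields are invented for illustration):

```go
package projection

import (
	"context"
	"database/sql"
)

// applyOrderPlaced is an idempotent handler sketch: it sets values keyed by
// the aggregate instead of incrementing them, and ignores events older than
// what the row already reflects, so replaying after a re-partition is safe.
// (An increment such as total_cents = total_cents + N would NOT be idempotent.)
func applyOrderPlaced(ctx context.Context, db *sql.DB, orderID string, totalCents, globalPos int64) error {
	_, err := db.ExecContext(ctx, `
		INSERT INTO order_totals (order_id, total_cents, last_position)
		VALUES ($1, $2, $3)
		ON CONFLICT (order_id) DO UPDATE
		SET total_cents   = EXCLUDED.total_cents,
		    last_position = EXCLUDED.last_position
		WHERE order_totals.last_position < EXCLUDED.last_position`,
		orderID, totalCents, globalPos)
	return err
}
```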
Production Deployment
Kubernetes Example
```yaml
apiVersion: apps/v1
kind: StatefulSet            # StatefulSet, not Deployment: pods get stable ordinals
metadata:
  name: projection-worker
spec:
  serviceName: projection-worker
  replicas: 4
  selector:
    matchLabels:
      app: projection-worker
  template:
    metadata:
      labels:
        app: projection-worker
    spec:
      containers:
        - name: worker
          image: myapp:latest
          command: ["./myapp", "projection-worker"]
          env:
            # The label holds the full pod name (projection-worker-0 ... -3);
            # the worker derives its numeric WORKER_ID from the trailing ordinal.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['statefulset.kubernetes.io/pod-name']
            - name: TOTAL_WORKERS
              value: "4"
```
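Since the Downward API hands the container the full pod name (e.g. `projection-worker-2`), the worker has to derive its numeric ID from the trailing ordinal. A small sketch of that (the helper name is made up):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

// workerIDFromPodName extracts the StatefulSet ordinal from a pod name,
// e.g. "projection-worker-2" -> 2.
func workerIDFromPodName(podName string) (int, error) {
	i := strings.LastIndex(podName, "-")
	if i < 0 || i == len(podName)-1 {
		return 0, fmt.Errorf("pod name %q has no ordinal suffix", podName)
	}
	return strconv.Atoi(podName[i+1:])
}

func main() {
	id, err := workerIDFromPodName(os.Getenv("POD_NAME"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("worker id:", id)
}
```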
Docker Compose Example
```yaml
version: '3.8'

services:
  worker-0:
    image: myapp:latest
    environment:
      WORKER_ID: "0"
      TOTAL_WORKERS: "4"
    command: ["./myapp", "projection-worker"]

  worker-1:
    image: myapp:latest
    environment:
      WORKER_ID: "1"
      TOTAL_WORKERS: "4"
    command: ["./myapp", "projection-worker"]

  # ... worker-2, worker-3
```
Systemd Example
```ini
# /etc/systemd/system/projection-worker@.service
[Unit]
Description=Projection Worker %i
After=network.target postgresql.service

[Service]
Type=simple
User=myapp
Environment="WORKER_ID=%i"
Environment="TOTAL_WORKERS=4"
ExecStart=/usr/local/bin/myapp projection-worker
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

```bash
# Enable and start workers
sudo systemctl enable projection-worker@0
sudo systemctl enable projection-worker@1
sudo systemctl enable projection-worker@2
sudo systemctl enable projection-worker@3
sudo systemctl start projection-worker@0
sudo systemctl start projection-worker@1
sudo systemctl start projection-worker@2
sudo systemctl start projection-worker@3
```
Monitoring
What to Monitor
- Checkpoint Lag: How far behind real-time each worker is
- Events Processed: Rate of event processing per worker
- Worker Health: Is each worker running and processing events?
Example Query - Check Lag
```sql
SELECT
  projection_name,
  last_global_position,
  (SELECT MAX(global_position) FROM events) - last_global_position AS lag
FROM projection_checkpoints
WHERE projection_name = 'scalable_projection';
```
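The same query is easy to wrap in a small health check that alerts once lag crosses a threshold. A sketch using database/sql (the connection string matches the `docker run` command above; the driver, threshold, and function names are illustrative):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // any Postgres database/sql driver works
)

const lagQuery = `
SELECT (SELECT COALESCE(MAX(global_position), 0) FROM events) - last_global_position
FROM projection_checkpoints
WHERE projection_name = $1`

// checkLag logs an alert when the projection is more than maxLag events behind.
func checkLag(ctx context.Context, db *sql.DB, projection string, maxLag int64) error {
	var lag int64
	if err := db.QueryRowContext(ctx, lagQuery, projection).Scan(&lag); err != nil {
		return err
	}
	if lag > maxLag {
		log.Printf("ALERT: projection %s is %d events behind (threshold %d)", projection, lag, maxLag)
	}
	return nil
}

func main() {
	db, err := sql.Open("postgres", "postgres://postgres:postgres@localhost:5432/pupsourcing_example?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := checkLag(context.Background(), db, "scalable_projection", 1000); err != nil {
		log.Fatal(err)
	}
}
```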
Common Scenarios
Adding Capacity
Your projection is falling behind:
- Check current worker count
- Add more workers, up to the configured partition count (to go further, increase the partition count; see "Changing Partition Count" above)
- Watch lag decrease as new workers catch up
Removing Capacity
Your projection is over-provisioned:
- Stop unnecessary workers
- Remaining workers continue processing their partitions
- Checkpoints for stopped workers remain (safe to restart later)
Worker Failure
A worker crashes:
- Other workers continue unaffected
- Restart the crashed worker
- It resumes from its checkpoint automatically
See Also
- ../partitioned - The pattern this example demonstrates
- ../worker-pool - Similar, but in a single process
- ../stop-resume - Demonstrates checkpoint reliability
Documentation
Overview
Package main demonstrates how to safely scale projections from 1 → N workers. This example shows that you can add workers dynamically and they will catch up independently.
Run this example in stages:
- Start with worker 0: WORKER_ID=0 go run main.go
- Add worker 1: WORKER_ID=1 go run main.go (in a new terminal)
- Add worker 2: WORKER_ID=2 go run main.go (in a new terminal)
- Add worker 3: WORKER_ID=3 go run main.go (in a new terminal)