Documentation ¶
Overview ¶
Package scheduler provides container scheduling and orchestration for Warren clusters.
The scheduler is responsible for assigning pending containers to healthy worker nodes based on resource availability, volume affinity, and load balancing requirements. It runs as a continuous background process, ensuring that service replica counts match their desired state and that containers are evenly distributed across the cluster.
Architecture ¶
The scheduler operates on a fixed 5-second interval, processing all services and their associated containers in each cycle:
┌────────────────────────────────────────────────────────────┐
│                       Scheduler Loop                        │
│                     (Every 5 seconds)                      │
└──────────────────────────────┬─────────────────────────────┘
                               │
                               ▼
┌────────────────────────────────────────────────────────────┐
│ 1. List all services and worker nodes                      │
│ 2. Filter nodes: Ready + Worker role only                  │
│ 3. For each service:                                       │
│    • List existing containers                              │
│    • Compare actual vs desired state                       │
│    • Create missing containers OR remove excess            │
└──────────────────────────────┬─────────────────────────────┘
                               │
                 ┌─────────────┴─────────────┐
                 │                           │
                 ▼                           ▼
          ┌─────────────┐             ┌─────────────┐
          │ Replicated  │             │   Global    │
          │  Services   │             │  Services   │
          └──────┬──────┘             └──────┬──────┘
                 │                           │
                 ▼                           ▼
            Round-robin                One per node
            with volume                 assignment
              affinity
Core Components ¶
Scheduler: The main scheduling engine that orchestrates container placement.
scheduler := NewScheduler(manager)
scheduler.Start() // Begins 5-second scheduling loop
defer scheduler.Stop()
The scheduler maintains no internal state beyond the manager reference - all cluster state is read from the manager on each cycle, making it stateless and resilient to restarts.
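The concrete fields are unexported (see the Types section below), but conceptually the struct needs little more than the manager handle and a way to stop the loop. A rough sketch under that assumption, not the actual definition:

// Sketch only: the real fields are unexported, and the stop channel here
// is an assumption based on the Start/Stop API shown above.
type Scheduler struct {
    manager *manager.Manager // single source of cluster state
    stopCh  chan struct{}    // assumed: closed by Stop() to end the loop
}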
Scheduling Algorithms ¶
## Replicated Service Scheduling
Replicated services specify a desired replica count. The scheduler ensures exactly that many containers are running:
Service: nginx (replicas=3)
Current containers: 2 running
Action: Create 1 new container
Node selection uses a simple round-robin algorithm with container counting, as sketched after the list below:
- Count containers per node (only running containers)
- Select node with fewest containers
- Create container on selected node
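A minimal sketch of that selection step. The helper name is illustrative rather than the package's internal API, and the Container field and state-constant names are assumptions:

// selectLeastLoadedNode returns the ready worker running the fewest
// containers. Illustrative sketch; field and constant names are assumptions.
func selectLeastLoadedNode(nodes []*types.Node, containers []*types.Container) *types.Node {
    counts := make(map[string]int)
    for _, c := range containers {
        if c.State == types.ContainerStateRunning { // only running containers count
            counts[c.NodeID]++
        }
    }

    var best *types.Node
    for _, n := range nodes {
        if best == nil || counts[n.ID] < counts[best.ID] {
            best = n
        }
    }
    return best
}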
## Global Service Scheduling
Global services run exactly one container per worker node:
Service: monitoring-agent (mode=global)
Worker nodes: 5
Action: Ensure 1 container per node
The scheduler automatically creates containers when new nodes join and removes containers when nodes are decommissioned.
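The per-node check reduces to a set difference between the ready workers and the nodes that already run a container for the service. A sketch, with the same field-name assumptions as above:

// missingGlobalNodes returns the ready workers that do not yet run a
// container for a global service. Illustrative sketch only.
func missingGlobalNodes(nodes []*types.Node, containers []*types.Container) []*types.Node {
    covered := make(map[string]bool)
    for _, c := range containers {
        covered[c.NodeID] = true
    }

    var missing []*types.Node
    for _, n := range nodes {
        if !covered[n.ID] {
            missing = append(missing, n)
        }
    }
    return missing
}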
## Volume Affinity
When a service uses volumes, containers must be scheduled on the node where the volume resides:
Service: database (volume=db-data)
Volume: db-data (nodeID=worker-1)
Constraint: Container MUST run on worker-1
This ensures data locality for stateful workloads. If a volume doesn't exist yet, the scheduler selects a node using standard load balancing, and the volume is created on that node.
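Placement for a volume-backed service is therefore a lookup before the usual selection. The sketch below reuses the selection helper above; the GetVolumeByName signature and Volume field names are assumptions based on the integration list later in this document:

// placementForVolume pins the container to the volume's node when the
// volume already exists, otherwise falls back to load balancing.
// Illustrative sketch; error handling is simplified.
func placementForVolume(mgr *manager.Manager, volumeName string, nodes []*types.Node, containers []*types.Container) *types.Node {
    if vol, err := mgr.GetVolumeByName(volumeName); err == nil && vol != nil && vol.NodeID != "" {
        for _, n := range nodes {
            if n.ID == vol.NodeID {
                return n // volume affinity: run where the data lives
            }
        }
    }
    // Volume not created yet: pick a node normally; the volume will then be
    // created on that node.
    return selectLeastLoadedNode(nodes, containers)
}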
Usage Examples ¶
## Basic Scheduler Setup
import (
    "github.com/cuemby/warren/pkg/manager"
    "github.com/cuemby/warren/pkg/scheduler"
)

// Create manager (provides cluster state access)
mgr := manager.NewManager(store, raftNode)

// Create and start scheduler
sched := scheduler.NewScheduler(mgr)
sched.Start()

// Scheduler runs automatically every 5 seconds
// ...

// Gracefully stop scheduler
sched.Stop()
## Testing Scheduler Behavior
// Create test fixtures
service := &types.Service{
    ID:       "svc-1",
    Name:     "nginx",
    Mode:     types.ServiceModeReplicated,
    Replicas: 3,
    Image:    "nginx:latest",
}

nodes := []*types.Node{
    {ID: "node-1", Role: types.NodeRoleWorker, Status: types.NodeStatusReady},
    {ID: "node-2", Role: types.NodeRoleWorker, Status: types.NodeStatusReady},
}
// Scheduler will create 3 containers distributed across 2 nodes
// Expected: 2 containers on one node, 1 container on the other
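To assert that 2/1 split in a test, a small tally by node is enough. A sketch; the NodeID field name is an assumption:

// countPerNode tallies containers per node so the distribution can be
// asserted, e.g. the counts here should be {2, 1} in some order.
func countPerNode(containers []*types.Container) map[string]int {
    counts := make(map[string]int)
    for _, c := range containers {
        counts[c.NodeID]++
    }
    return counts
}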
## Volume-Aware Scheduling
// Service with volume requirement
service := &types.Service{
    ID:       "db-1",
    Name:     "postgres",
    Image:    "postgres:15",
    Replicas: 1,
    Volumes: []*types.VolumeMount{
        {
            Source: "postgres-data",
            Target: "/var/lib/postgresql/data",
            Driver: "local",
        },
    },
}

// Volume exists on specific node
volume := &types.Volume{
    ID:     "vol-1",
    Name:   "postgres-data",
    NodeID: "worker-2", // Must run on worker-2
    Driver: "local",
}
// Scheduler will place container on worker-2 due to volume affinity
Integration Points ¶
## Manager Integration
The scheduler depends on the manager package for all cluster state operations (see the interface sketch after this list):
- ListServices() - Get all services
- ListNodes() - Get all worker nodes
- ListContainersByService(serviceID) - Get containers for a service
- CreateContainer(container) - Create new container
- UpdateContainer(container) - Update container state
- GetVolumeByName(name) - Check volume affinity
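Because only this narrow slice of the manager is used, the dependency can be summarized as a small interface. The shape below is hypothetical, built from the call names above; the real scheduler takes the concrete manager and exact signatures may differ, but it is a convenient seam for testing:

// ClusterState is a hypothetical interface covering the manager calls the
// scheduler relies on; signatures are assumptions for illustration.
type ClusterState interface {
    ListServices() ([]*types.Service, error)
    ListNodes() ([]*types.Node, error)
    ListContainersByService(serviceID string) ([]*types.Container, error)
    CreateContainer(c *types.Container) error
    UpdateContainer(c *types.Container) error
    GetVolumeByName(name string) (*types.Volume, error)
}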
## Reconciler Coordination
The scheduler works in tandem with the reconciler:
- Scheduler: Creates containers to meet desired state
- Reconciler: Detects failures and marks containers for replacement
- Scheduler: Sees failed containers and creates replacements
This separation of concerns ensures clean boundaries:
Scheduler:  "Make it happen" (proactive)
Reconciler: "Fix what's broken" (reactive)
## Worker Integration
Workers pull containers assigned to them via the manager (a sketch of this hand-off follows the list):
- Scheduler assigns container to node-1
- Worker on node-1 polls for containers (via manager)
- Worker starts runtime container
- Worker reports container state back to manager
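A rough sketch of the worker side of that hand-off, purely for orientation; the method names and state constants here are assumptions rather than the worker package's actual API:

// pollAssigned sketches the worker-side hand-off: pick up containers
// assigned to this node, start them, and report state back.
// All method and constant names are hypothetical.
func pollAssigned(mgr *manager.Manager, nodeID string) {
    assigned, err := mgr.ListContainersByNode(nodeID) // hypothetical call
    if err != nil {
        return
    }
    for _, c := range assigned {
        if c.State == types.ContainerStatePending { // assumed state constant
            // start the runtime container via containerd, then report back
            c.State = types.ContainerStateRunning
            _ = mgr.UpdateContainer(c)
        }
    }
}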
Design Patterns ¶
## Stateless Design
The scheduler maintains no persistent state. All decisions are made based on current cluster state read from the manager. This makes the scheduler:
- Resilient to crashes (no state to lose)
- Easy to reason about (no hidden state)
- Simple to test (just mock the manager, as sketched below)
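For example, a test can drive the scheduling logic with a fake manager that returns canned state and records created containers. This is a hypothetical test double; it assumes the scheduler's manager dependency can be substituted or wrapped behind an interface such as the ClusterState sketch above:

// fakeManager serves canned cluster state and records what the scheduler
// would create. Hypothetical test double for illustration.
type fakeManager struct {
    services   []*types.Service
    nodes      []*types.Node
    containers map[string][]*types.Container // keyed by service ID
    created    []*types.Container            // scheduling decisions
}

func (f *fakeManager) ListServices() ([]*types.Service, error) { return f.services, nil }
func (f *fakeManager) ListNodes() ([]*types.Node, error)       { return f.nodes, nil }
func (f *fakeManager) ListContainersByService(id string) ([]*types.Container, error) {
    return f.containers[id], nil
}
func (f *fakeManager) CreateContainer(c *types.Container) error {
    f.created = append(f.created, c)
    return nil
}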
## Reconciliation Loop Pattern
The scheduler implements the reconciliation loop pattern common in orchestrators:
for {
    actual := readClusterState()  // current containers, from the manager
    desired := readServiceSpecs() // replica counts and modes
    apply(diff(desired, actual))  // create missing or remove excess containers
    time.Sleep(5 * time.Second)
}
## Separation of Concerns
The scheduler only creates and removes containers. It does NOT:
- Start/stop runtime containers (worker's job)
- Monitor container health (reconciler's job)
- Update container runtime state (worker's job)
This clear separation prevents coupling and makes each component testable.
Performance Characteristics ¶
## Time Complexity
Per scheduling cycle (N services, M nodes, C containers):
- List services: O(N)
- List nodes: O(M)
- List containers per service: O(C)
- Node selection: O(M * C) worst case (counting containers per node)
- Overall: O(N * (C + M))
For a typical cluster (100 services, 10 nodes, 500 containers):
- ~0.5-1 second per scheduling cycle
- Well within 5-second interval
## Memory Usage
Minimal memory footprint:
- No caching (reads from manager each cycle)
- Temporary allocations for node/container lists
- ~10-20 MB for typical cluster sizes
## Scheduling Latency
Time from service creation to container running:
- Best case: 5 seconds (next scheduler cycle)
- Worst case: 10 seconds (just missed previous cycle)
- Average: 7.5 seconds
For faster scheduling, reduce the ticker interval in the run() method, but be aware of increased CPU usage and API load on the manager.
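Conceptually the change is a single duration in the run loop. The snippet below is a simplified sketch of what such a loop looks like, not the package's actual code; the stop channel and the per-pass method name are assumptions:

// run is a simplified sketch of the scheduler's ticker-driven loop.
func (s *Scheduler) run() {
    ticker := time.NewTicker(5 * time.Second) // lower this interval for faster scheduling
    defer ticker.Stop()
    for {
        select {
        case <-s.stopCh: // hypothetical stop channel closed by Stop()
            return
        case <-ticker.C:
            s.scheduleServices() // hypothetical: one pass over services and nodes
        }
    }
}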
Troubleshooting ¶
## Containers Not Being Created
Check these common issues:
1. No worker nodes available:
- Run: warren node ls
- Ensure nodes are in "Ready" state
- Check node heartbeats (should be < 30s ago)
2. Scheduler not running:
- Check manager logs for "scheduler" component
- Verify scheduler.Start() was called
3. Service configuration issues:
- Ensure replica count > 0
- Check volume constraints (volume must exist or be creatable)
## Containers Stuck in "Pending"
The scheduler creates container records, but workers are responsible for starting them. To debug:
1. Check worker logs:
- Worker should see assigned containers
- Look for containerd/image pull errors
2. Check container details:
- Run: warren service ps <service-name>
- Look at container error messages
## Uneven Container Distribution
If containers are not evenly distributed:
1. Check container state filtering:
- Only running containers count toward load balancing
- Failed/completed containers don't affect placement
2. Verify node readiness:
- Scheduler only uses "Ready" worker nodes
- Down nodes are excluded from scheduling
## Volume Affinity Not Working
If containers aren't being pinned to volume nodes:
1. Verify volume exists:
- Run: warren volume ls
- Check volume.NodeID field
2. Check service volume configuration:
- Ensure volume name matches exactly
- Verify volume driver is "local"
Monitoring Metrics ¶
The scheduler doesn't currently export Prometheus metrics, but you can monitor:
## Log-based Metrics
- "Created container" - New container created
- "Scheduler error" - Scheduling failure
- "Selecting node X for service Y (volume affinity)" - Volume pinning
## Manager Metrics
- Containers created per service (via manager API)
- Container state distribution (pending/running/failed)
- Node utilization (containers per node)
Best Practices ¶
1. Scheduler Interval Tuning
- Default 5s balances responsiveness vs. overhead
- For large clusters (>100 nodes), consider 10s
- For dev/test, can reduce to 1s for faster feedback
2. Service Replica Planning
- Set replicas <= number of worker nodes (for even distribution)
- Use global mode for node-level services (monitoring, logging)
- Consider resource limits when scaling replicas
3. Volume-Backed Services
- Use replicas=1 for stateful services with local volumes
- Pin volume to specific node by creating it first
- Consider node labels for volume placement control (future feature)
4. Resource Constraints
- Scheduler respects container resource limits (CPU/memory)
- Ensure nodes have sufficient capacity for all replicas
- Monitor node resource usage to prevent over-subscription
See Also ¶
- pkg/reconciler - Failure detection and auto-healing
- pkg/manager - Cluster state management
- pkg/worker - Container execution on worker nodes
- pkg/volume - Volume lifecycle management
- docs/concepts/services.md - Service modes and scaling
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Scheduler ¶
type Scheduler struct {
// contains filtered or unexported fields
}
Scheduler assigns containers to nodes based on resource availability
func NewScheduler ¶
NewScheduler creates a new scheduler