adaptive

package
v1.5.5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 24, 2026 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Overview

Package adaptive implements a dual-engine controller that dynamically switches between io_uring and epoll based on runtime telemetry.

The adaptive Engine starts both sub-engines on the same port (via SO_REUSEPORT) and periodically evaluates their performance scores. If the standby engine's historical score exceeds the active engine's score by more than a threshold, the controller triggers a switch. Oscillation detection locks switching for five minutes after three rapid switches.

Users select the adaptive engine via celeris.Config{Engine: celeris.Adaptive}. It is the default engine on Linux.

Documentation

Full guides and examples: https://goceleris.dev/docs/engines

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine is an adaptive meta-engine that switches between io_uring and epoll.

The two sub-engine slots map to a fixed protocol direction the controller keys off: primary is ALWAYS the epoll engine (the controller's activeIsPrimary==true means epoll is active) and secondary is ALWAYS the io_uring engine. On the public New() path only the START engine is built eagerly; the other slot stays nil until the first switch actually needs it (see buildStandby + performSwitch). Under the default policy the start engine is epoll, so the io_uring standby is built lazily — and only if a sustained high-concurrency ramp promotes new conns to it; an engine that never switches never constructs its standby, so that heap never exists. newFromEngines (tests) populates BOTH slots eagerly, exercising the standby-already-exists switch path.

Lifecycle: the engine starts on epoll under the default policy and promotes NEW connections to io_uring once a sustained high-concurrency ramp develops (established connections are pinned and never migrate). Live switching is the most complex path in this package and has historically been the source of rare, hard-to-reproduce issues; the SwitchRejectedCount / EngineMetrics.AdaptiveSwitches counters exist so a throughput anomaly can be correlated with switching activity. Operators who need fully deterministic behaviour can pin the start engine via CELERIS_ADAPTIVE_START=epoll|iouring (see chooseStartEngine), which disables the runtime switch. For benchmarking, run the adaptive columns multiple times: a rare switch transient can skew a single pass.

func New

func New(cfg resource.Config, handler stream.Handler, cpuMon engine.CPUMonitor) (*Engine, error)

New creates a new adaptive engine. Only the START engine is built and Listen'd eagerly; the other engine (the standby) is constructed lazily on the first switch that actually needs it. The start engine is chosen by chooseStartEngine from the probed io_uring capabilities (feature-gated, with a CELERIS_ADAPTIVE_START env override).

Both sub-engines bind the SAME SO_REUSEPORT port so the adaptive switch is transparent: resolvePort pins a concrete port up front, and the lazily-built standby reuses it. Building only the start engine eliminates the parked standby's GC-rooted heap — on a modern kernel that starts on io_uring and never reverts, the epoll standby is never constructed (≈0 standby tax).

cpuMon is an engine.CPUMonitor (the public interface); when non-nil it supplies the live sampler with CPU utilization data so the io_uring bias can fire in the empirical sweet spot. External callers can pass their own implementation or the built-in /proc/stat monitor. Pass nil for tests or when CPU monitoring is not available; the sampler degrades gracefully with CPUUtilization=0 in the snapshot.

Example

ExampleNew shows that adaptive.New accepts any engine.CPUMonitor, including one defined entirely outside the celeris module. No Output directive: this example documents the public signature and is compiled, not executed (it would need a live io_uring/epoll-capable kernel to run).

//go:build linux

package main

import (
	"context"
	"time"

	"github.com/goceleris/celeris/adaptive"
	"github.com/goceleris/celeris/engine"
	"github.com/goceleris/celeris/protocol/h2/stream"
	"github.com/goceleris/celeris/resource"
)

// fakeCPUMonitor is an external implementation of the PUBLIC engine.CPUMonitor
// interface — proof that callers outside the module can supply their own
// monitor without depending on the internal cpumon package.
type fakeCPUMonitor struct{ util float64 }

func (f fakeCPUMonitor) Sample() (engine.CPUSample, error) {
	return engine.CPUSample{Utilization: f.util, Timestamp: time.Now()}, nil
}

// ExampleNew shows that adaptive.New accepts any engine.CPUMonitor, including
// one defined entirely outside the celeris module. No Output directive: this
// example documents the public signature and is compiled, not executed (it
// would need a live io_uring/epoll-capable kernel to run).
func main() {
	var mon engine.CPUMonitor = fakeCPUMonitor{util: 0.9}

	cfg := resource.Config{Addr: ":0", Protocol: engine.HTTP1}
	handler := stream.HandlerFunc(func(_ context.Context, _ *stream.Stream) error { return nil })

	eng, err := adaptive.New(cfg, handler, mon)
	if err != nil {
		return
	}
	_ = eng
}

func (*Engine) ActiveEngine

func (e *Engine) ActiveEngine() engine.Engine

ActiveEngine returns the currently active engine.

func (*Engine) Addr

func (e *Engine) Addr() net.Addr

Addr returns the bound listener address.

func (*Engine) DriverFDCount added in v1.4.0

func (e *Engine) DriverFDCount() int

DriverFDCount reports the number of driver FDs currently registered on either sub-engine. Exposed for tests and observability.

func (*Engine) ForceSwitch added in v0.3.5

func (e *Engine) ForceSwitch()

ForceSwitch triggers an immediate engine switch (for testing).

func (*Engine) FreezeSwitching

func (e *Engine) FreezeSwitching()

FreezeSwitching prevents the controller from switching engines.

FreezeSwitching is reference-counted: every call must be matched by a corresponding UnfreezeSwitching. The engine remains frozen until every external freeze has been released AND every driver-registered FD has been unregistered. This makes it safe for benchmarks and drivers to hold independent freezes without clobbering each other.

func (*Engine) Listen

func (e *Engine) Listen(ctx context.Context) error

Listen starts ONLY the active sub-engine and the evaluation loop. The standby is built and Listen'd lazily by performSwitch on the first switch (joined under the same ctx + wait group captured here).

func (*Engine) Metrics

func (e *Engine) Metrics() engine.EngineMetrics

Metrics aggregates metrics from whichever sub-engines exist. On the lazy New() path a never-built standby is nil and contributes nothing.

func (*Engine) NumWorkers added in v1.4.0

func (e *Engine) NumWorkers() int

NumWorkers delegates to the currently active sub-engine's EventLoopProvider. Returns 0 if the active engine does not implement EventLoopProvider.

func (*Engine) SetFreezeCooldown added in v1.1.0

func (e *Engine) SetFreezeCooldown(d time.Duration)

SetFreezeCooldown sets the duration to suppress further switches after a switch. Zero disables the cooldown (default). This prevents oscillation under unstable load.

func (*Engine) Shutdown

func (e *Engine) Shutdown(ctx context.Context) error

Shutdown gracefully shuts down both sub-engines.

It first cancels and JOINS the goroutine started by Listen (the evaluation loop), so that no controller tick can still be sampling telemetry — including the CPU monitor the server closes immediately after Shutdown returns — by the time this function completes. Only then are the sub-engines shut down. This is purely a join/sequencing concern; it does not touch the ACTIVE→DRAINING→SUSPENDED worker lifecycle.

func (*Engine) SwitchRejectedCount added in v1.4.0

func (e *Engine) SwitchRejectedCount() uint64

SwitchRejectedCount reports how many engine-switch attempts were blocked by outstanding driver FDs since the engine started. Monotonic; useful for tests asserting that a switch actually happened (or did not).

func (*Engine) Type

func (e *Engine) Type() engine.EngineType

Type returns the engine type.

func (*Engine) UnfreezeSwitching

func (e *Engine) UnfreezeSwitching()

UnfreezeSwitching releases one external freeze. The engine only becomes thawed when the external freeze count and the driver-FD count both reach zero. Calling UnfreezeSwitching more times than FreezeSwitching is a no-op (and does NOT unfreeze the engine if drivers still hold FDs).

func (*Engine) WorkerLoop added in v1.4.0

func (e *Engine) WorkerLoop(n int) engine.WorkerLoop

WorkerLoop returns a driver-safe wrapper around the currently active sub-engine's worker N. The wrapper:

  1. Re-fetches the active sub-engine on every RegisterConn, so the FD lands on whichever sub-engine is active at that instant — this matters only during the tiny window between one register/unregister cycle and the next (the refcount pins frozen while any FD lives).
  2. Pins the specific inner engine.WorkerLoop each FD was registered on so Write/Unregister always go to the right sub-engine.
  3. Increments the engine's driver-FD refcount on RegisterConn and decrements on UnregisterConn, which keeps [Engine.performSwitch] from swapping sub-engines while any FD is live.

Callers never need to call Engine.FreezeSwitching manually.

type TelemetrySampler

type TelemetrySampler interface {
	Sample(e engine.Engine) TelemetrySnapshot
}

TelemetrySampler produces telemetry snapshots from an engine.

type TelemetrySnapshot

type TelemetrySnapshot struct {
	// Timestamp is when this snapshot was taken.
	Timestamp time.Time
	// ThroughputRPS is the recent requests-per-second rate.
	ThroughputRPS float64
	// ErrorRate is the fraction of requests that resulted in errors (0.0-1.0).
	ErrorRate float64
	// ActiveConnections is the current number of open connections.
	ActiveConnections int64
	// CPUUtilization is the estimated CPU usage fraction (0.0-1.0). Read from
	// the live sampler's CPUMonitor; zero when no monitor is wired.
	CPUUtilization float64
	// ConnsPerWorker is ActiveConnections divided by the engine's worker
	// count. This is the PRIMARY load signal driving engine selection:
	// epoll and io_uring tie at low conns/worker, but io_uring pulls ahead
	// and keeps scaling above ~20/worker while epoll plateaus.
	ConnsPerWorker float64
	// AcceptRate is the new-connection arrival rate (accepts/sec) over the
	// last sampling interval, derived like ThroughputRPS. A secondary load
	// signal: a high accept rate indicates connection churn.
	AcceptRate float64
	// BytesPerReq is the average payload size (read+written bytes per
	// request) over the last interval. When this exceeds the controller's
	// large-payload threshold the workload is link-bound — the engines tie
	// — so the controller suppresses an io_uring switch to avoid churn.
	BytesPerReq float64
}

TelemetrySnapshot captures a point-in-time view of engine performance, used by the controller to decide whether to switch engines.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL