herd

package module
v0.5.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 30, 2026 License: Apache-2.0 Imports: 16 Imported by: 0

README

Herd: The Go Edge Cloud Core

Herd is the core engine for an Application Delivery Plane. Unlike raw microVM orchestrators that just boot virtual motherboard instances, Herd provides a complete delivery stack: holding L7 connections, cold-booting microVMs in sub-500ms, and securely tunneling traffic from standard OCI/Docker images.

🧱 The Engine Architecture

Herd implements four critical layers that bridge the gap between "hardware" and "application":

  • OCI Translation: Automatically pulls Docker images via containerd, extracts CMD/ENV, and flattens them into Copy-on-Write rootfs snapshots.
  • L7 "Wake-on-Request" Proxy: A high-speed reverse proxy that intercepts HTTP requests, cold-boots the corresponding VM if needed, and tunnels the traffic inside.
  • Automated IPAM: Zero-config networking pool that manages subnets, TAP interfaces, and NAT routing.
  • Guest Agent Execution: Injects herd-guest-agent as PID 1 to handle internal OS setup and workload execution.

🛰️ The "Brutal Difference"

Feature Pure Orchestrator (e.g., Raw Firecracker) Herd (Fly.io Open Source Core)
Input Custom Kernel + Raw ext4 Disk Image Standard Docker/OCI Image
Ingress None. You must install Traefik/Nginx Built-in L7 Reverse Proxy
Lifecycle Turn On / Turn Off Wake-on-HTTP-Request (Scale to Zero)
User Experience Systems Engineer (Hard) Application Developer (Easy)

🛠️ Installation & Running

Herd requires Linux with KVM to run the Firecracker microVMs. It also requires containerd for storage.

1. Build
go build -o herd ./cmd/herd
2. Bootstrap Host Storage
# Prepare the host (loop devices, devmapper, containerd config)
sudo ./herd bootstrap --config herd.yaml
3. Start the Daemon
sudo ./herd start --config herd.yaml

Note: Herd requires sudo for managing KVM, TAP devices, and devmapper snapshots.

🔌 API Overview

Control Plane (REST)
Method Endpoint Description
POST /v1/sessions Acquire a new session (wakes up the VM).
PUT /v1/sessions/{id}/heartbeat Keep the session alive.
GET /v1/sessions/{id}/logs Stream real-time logs from the VM.
Data Plane (HTTP Proxy)
  • Port: 8080 (Data Plane)
  • Routing Header: X-Session-ID
  • Behavior: Routes directly to the pinned VM. Supports cold-boot "Wake-on-Request" for known sessions.

⚙️ Configuration Reference (herd.yaml)

  • network.control_bind: "127.0.0.1:8001"
  • network.data_bind: "127.0.0.1:8080"
  • storage.state_dir: "/var/lib/herd"
  • storage.snapshotter_name: "devmapper"

For more details, see CLI & Configuration Reference.

Documentation

Overview

options.go — functional options for Pool construction.

pool.go — Session-affine process pool with singleflight Acquire.

Core invariant

1 sessionID → 1 Worker, for the lifetime of the session.

Concurrency model (read this before touching Acquire)

There are three maps protected by a single mutex (p.mu):

p.sessions  map[string]Worker[C]       — live sessionID → worker bindings
p.inflight  map[string]chan struct{}    — in-progress Acquire for a session

And one lock-free channel:

p.tickets chan struct{}                — bounded concurrency tokens

The singleflight guarantee for Acquire(ctx, sessionID, config):

  1. Lock → check sessions → if found: unlock, return (FAST PATH).
  2. Lock → check inflight → if pending: grab chan, unlock, wait on it, then restart from step 1 when chan closes.
  3. Lock → create inflight[sessionID] = make(chan struct{}) → unlock.
  4. Block on <-p.tickets (or ctx cancel).
  5. Call p.factory.Spawn(ctx, sessionID, config). If error: return ticket, close inflight, return error.
  6. Lock → sessions[sessionID]=w, delete inflight[sid], close(ch) → unlock. Closing ch broadcasts to all goroutines waiting in step 2.
  7. Return &Session[C]{…}

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewLocalRegistry

func NewLocalRegistry[C any]() *localRegistry[C]

Types

type FirecrackerFactory added in v0.5.0

type FirecrackerFactory struct {
	FirecrackerPath string
	KernelImagePath string
	// GuestAgentPath is the host path to the static herd-guest-agent binary; it is
	// copied into each VM rootfs before boot (no initrd).
	GuestAgentPath string
	Storage        *storage.Manager

	SocketPathDir string
	IPAM          *network.IPAM
}

FirecrackerFactory is a minimal implementation of a WorkerFactory that spawns Firecracker VMs.

func (*FirecrackerFactory) Spawn added in v0.5.0

func (f *FirecrackerFactory) Spawn(ctx context.Context, sessionID string, config TenantConfig) (Worker[*http.Client], error)

Spawn starts a new Firecracker VM.

func (*FirecrackerFactory) WarmImage added in v0.5.0

func (f *FirecrackerFactory) WarmImage(ctx context.Context, imageRef string) error

WarmImage ensures an image is cached without starting a VM.

type FirecrackerWorker added in v0.5.0

type FirecrackerWorker struct {
	// contains filtered or unexported fields
}

FirecrackerWorker represents a single running Firecracker VM.

func (*FirecrackerWorker) Address added in v0.5.0

func (f *FirecrackerWorker) Address() string

Address returns the HTTP base URL for the workload on the guest LAN (data-plane reverse proxy). Exec and other vsock paths use VsockUDSPath.

func (*FirecrackerWorker) Client added in v0.5.0

func (f *FirecrackerWorker) Client() *http.Client

Client returns the HTTP client.

func (*FirecrackerWorker) Close added in v0.5.0

func (f *FirecrackerWorker) Close() error

Close kills the VM and cleans up resources. It blocks until the Firecracker process has fully exited before tearing down storage, preventing "device busy" errors on the devmapper thin volume.

func (*FirecrackerWorker) GuestIP added in v0.5.0

func (f *FirecrackerWorker) GuestIP() string

GuestIP returns the internal IP allocated to the worker.

func (*FirecrackerWorker) Healthy added in v0.5.0

func (f *FirecrackerWorker) Healthy(ctx context.Context) error

Healthy checks if the VM is up. For now we just check if process is alive.

func (*FirecrackerWorker) ID added in v0.5.0

func (f *FirecrackerWorker) ID() string

ID returns the worker ID.

func (*FirecrackerWorker) OnCrash added in v0.5.0

func (f *FirecrackerWorker) OnCrash(fn func(sessionID string))

OnCrash sets a crash handler.

func (*FirecrackerWorker) VsockUDSPath added in v0.5.0

func (f *FirecrackerWorker) VsockUDSPath() string

VsockUDSPath is the host Unix socket Firecracker exposes for vsock connect.

type Option

type Option func(*config)

Option is a functional option for New[C].

func WithCrashHandler

func WithCrashHandler(fn func(sessionID string)) Option

WithCrashHandler registers a callback invoked when a worker's subprocess exits unexpectedly while it holds an active session.

fn receives the sessionID that was lost. Use it to:

  • Delete session-specific state in your database
  • Return an error to the end user ("your session was interrupted")
  • Trigger a re-run of the failed job

fn is called from a background monitor goroutine. It must not block for extended periods; spawn a goroutine if you need to do heavy work.

If WithCrashHandler is not set, crashes are only logged.

func WithMaxWorkers added in v0.5.0

func WithMaxWorkers(max int) Option

WithMaxWorkers sets the max capacity.

  • max: hard cap on concurrent workers; Acquire blocks once this limit is reached until a worker becomes available.

Panics if max < 1.

type Pool

type Pool[C any] struct {
	// contains filtered or unexported fields
}

func New

func New[C any](factory WorkerFactory[C], opts ...Option) (*Pool[C], error)

func (*Pool[C]) Acquire

func (p *Pool[C]) Acquire(ctx context.Context, sessionID string, config TenantConfig) (*Session[C], error)

func (*Pool[C]) Factory added in v0.5.0

func (p *Pool[C]) Factory() WorkerFactory[C]

func (*Pool[C]) GetSession

func (p *Pool[C]) GetSession(ctx context.Context, sessionID string) (*Session[C], error)

func (*Pool[C]) KillWorker added in v0.5.0

func (p *Pool[C]) KillWorker(sessionID string, reason string) error

func (*Pool[C]) Shutdown

func (p *Pool[C]) Shutdown(ctx context.Context) error

func (*Pool[C]) Stats

func (p *Pool[C]) Stats() PoolStats

type PoolStats

type PoolStats struct {
	TotalWorkers     int
	ActiveSessions   int
	InflightAcquires int
	Node             observer.NodeStats
}

type Session

type Session[C any] struct {
	ID     string
	Worker Worker[C]
	// contains filtered or unexported fields
}

func (*Session[C]) Release

func (s *Session[C]) Release()

type SessionRegistry

type SessionRegistry[C any] interface {
	// Get returns the worker pinned to sessionID.
	// Returns (nil, nil) if no session exists for this ID.
	Get(ctx context.Context, sessionID string) (Worker[C], error)

	// Put pins a worker to a sessionID.
	Put(ctx context.Context, sessionID string, w Worker[C]) error

	// Delete removes the pinning for sessionID.
	Delete(ctx context.Context, sessionID string) error

	// List returns a snapshot of all currently active sessions.
	// Primarily used for background health checks and cleanup.
	List(ctx context.Context) (map[string]Worker[C], error)

	// Len returns the number of active sessions.
	Len() int
}

SessionRegistry tracks which workers are pinned to which session IDs. In a distributed setup (Enterprise), this registry is shared across multiple nodes.

type TenantConfig added in v0.5.0

type TenantConfig struct {
	Image              string
	Command            []string
	Env                map[string]string
	IdleTimeoutSeconds int
	TTLSeconds         int
	HealthInterval     string
}

TenantConfig describes the on-demand container workload and its per-session limits.

type Worker

type Worker[C any] interface {
	// ID returns a stable, unique identifier for this worker (e.g. "worker-3").
	// Never reused — not even after a crash and restart.
	ID() string

	// Address returns the internal network URI the worker
	// is listening on (e.g., '127.0.0.1:54321').
	Address() string

	// Client returns the typed connection to the worker process.
	// For most users this is *http.Client; gRPC users return their stub here.
	Client() C

	// Healthy performs a liveness check against the subprocess.
	// Returns nil if the worker is accepting requests; non-nil otherwise.
	// Pool.Acquire calls this before handing a worker to a new session,
	// so a stale or crashed worker is never returned to a caller.
	Healthy(ctx context.Context) error

	// OnCrash sets a callback triggered when the worker process exits unexpectedly.
	OnCrash(func(sessionID string))

	// Close performs graceful shutdown of the worker process.
	// Called by the pool during scale-down or Pool.Shutdown.
	io.Closer
}

Worker represents one running subprocess managed by the pool.

C is the typed client the caller uses to talk to the subprocess — for example *http.Client, a gRPC connection, or a custom struct. The type parameter is constrained to "any" so the pool is fully generic.

type WorkerFactory

type WorkerFactory[C any] interface {
	// Spawn starts one new worker and blocks until it is healthy.
	// If ctx is cancelled before the worker becomes healthy, Spawn must
	// kill the process and return a non-nil error.
	Spawn(ctx context.Context, sessionID string, config TenantConfig) (Worker[C], error)

	// WarmImage ensures the requested image is present in the local cache.
	WarmImage(ctx context.Context, imageRef string) error
}

WorkerFactory knows how to spawn one worker process and return a typed Worker[C] that is ready to accept requests (i.e. Healthy returns nil).

Implement WorkerFactory to define custom spawn logic (e.g. Firecracker microVM, Docker container, remote SSH process).

Directories

Path Synopsis
cmd
herd command
test_boot command
internal
Package observer provides lightweight, OS-level resource sampling.
Package observer provides lightweight, OS-level resource sampling.
proto
Package proxy provides NewReverseProxy — the one-liner that turns a session-affine process pool into an HTTP gateway.
Package proxy provides NewReverseProxy — the one-liner that turns a session-affine process pool into an HTTP gateway.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL