herd

package module
v0.5.5
Published: Apr 3, 2026 License: Apache-2.0 Imports: 19 Imported by: 0

README

Herd: microvm hypervisor

The initial goal was to create a lightweight, secure, and fast execution environment for running Docker images. Docker has a structural security weakness when containers share a host: because containers share the host kernel, any zero-day kernel exploit reachable from inside a container can lead to a full host compromise.

Now, I am able to successfully run and deploy pre-pulled Docker images as microVMs, and they are lightning fast. I can start a microVM in roughly 500 ms, and that includes all networking, ingress, and on-demand file system setup.

How Herd differs from Firecracker

Firecracker is an amazing piece of technology, but it is only a hypervisor. It provides no host-side setup for running OCI images, no networking, and no ingress; you have to build all of that yourself. Herd provides all of it out of the box.

The table below highlights the difference between firecracker and herd.

| Feature | Raw Firecracker | Herd |
|---|---|---|
| Input | Custom kernel + raw ext4 disk image | Standard Docker/OCI image |
| Storage | Manually create disk images using dd | OCI translation: automated image-to-snapshot |
| Network | Creates a TAP device; you route the rest | Automated IPAM: host-side NAT + routing |
| Ingress | No ingress | Wake-on-request proxy: host port binding |
| Isolation | Manually configure the jailer for each microVM | Automated isolation: Herd auto-configures the jailer for each microVM |
| Lifecycle | Turn on / turn off | Scale to zero: cold-boots on first request [WIP] |
| User experience / complexity | Systems engineer (hard) | Application developer (easy) |

🛠️ Installation & Running

1. Prerequisites
  • Host OS: Linux (A recent kernel with KVM support).
  • Virtualization: Hardware virtualization (VT-x or AMD-V) must be enabled in the BIOS/UEFI.
  • KVM Access: The /dev/kvm device must exist and be accessible.
    ls -l /dev/kvm
    
  • Root Access: Most herd commands require sudo.

Before installing Herd, ensure your system has containerd and iptables installed:

sudo apt update && sudo apt install -y containerd iptables
2. Quick Install
curl -sSL https://raw.githubusercontent.com/herd-core/herd/main/scripts/install.sh | bash
3. Initialize Host
# Prepare the host (loop devices, devmapper, containerd config)
sudo herd init

# Or in non-interactive mode:
sudo herd init --yes
4. Start the Daemon
sudo herd start
5. Deploy a MicroVM
herd deploy --image postgres:latest -p 5432:5432 -e POSTGRES_PASSWORD=postgres

Note: Herd requires sudo for managing KVM, TAP devices, and devmapper snapshots.

For more details, see CLI & Configuration Reference.

Documentation

Overview

options.go — functional options for Pool construction.

pool.go — Session-affine process pool with singleflight Acquire.

Core invariant

1 sessionID → 1 Worker, for the lifetime of the session.

Concurrency model (read this before touching Acquire)

There are two maps protected by a single mutex (p.mu):

p.sessions  map[string]Worker[C]      — live sessionID → worker bindings
p.inflight  map[string]chan struct{}  — in-progress Acquire for a session

And one lock-free channel:

p.tickets chan struct{}                — bounded concurrency tokens

The singleflight guarantee for Acquire(ctx, sessionID, config):

  1. Lock → check sessions → if found: unlock, return (FAST PATH).
  2. Lock → check inflight → if pending: grab chan, unlock, wait on it, then restart from step 1 when chan closes.
  3. Lock → create inflight[sessionID] = make(chan struct{}) → unlock.
  4. Block on <-p.tickets (or ctx cancel).
  5. Call p.factory.Spawn(ctx, sessionID, config). If error: return ticket, close inflight, return error.
  6. Lock → sessions[sessionID] = w, delete inflight[sessionID], close(ch) → unlock. Closing ch broadcasts to all goroutines waiting in step 2.
  7. Return &Session[C]{…}

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewLocalRegistry

func NewLocalRegistry[C any]() *localRegistry[C]

Types

type FirecrackerFactory added in v0.5.0

type FirecrackerFactory struct {
	// FirecrackerPath is the absolute host path to the firecracker binary.
	// The jailer exec's it inside the chroot; it never runs as root.
	FirecrackerPath string
	// JailerPath is the absolute host path to the jailer binary.
	JailerPath string
	// KernelImagePath is the shared host path to the guest kernel image.
	// It is hard-linked into each VM's chroot at /run/vmlinux before boot.
	KernelImagePath string
	// GuestAgentPath is the host path to the static herd-guest-agent binary.
	GuestAgentPath string

	Storage     *storage.Manager
	IPAM        *network.IPAM
	PortManager *network.PortManager

	// UIDPool is the per-VM UID/GID allocator. Each Spawn leases a unique UID
	// from the pool; Close returns it. This guarantees every concurrent microVM
	// runs in a distinct DAC security domain, preventing lateral movement
	// between tenants on the same host.
	UIDPool *uid.Pool
	// JailerChrootBaseDir is the root under which the jailer creates per-VM
	// chroot directories: <JailerChrootBaseDir>/firecracker/<vmID>/root/
	JailerChrootBaseDir string
}

FirecrackerFactory is a WorkerFactory that spawns Firecracker VMs via the jailer binary for secure, unprivileged isolation.

func (*FirecrackerFactory) Spawn added in v0.5.0

func (f *FirecrackerFactory) Spawn(ctx context.Context, sessionID string, config TenantConfig) (worker Worker[*http.Client], err error)

Spawn starts a new Firecracker VM under the jailer with a unique leased UID.

func (*FirecrackerFactory) WarmImage added in v0.5.0

func (f *FirecrackerFactory) WarmImage(ctx context.Context, imageRef string) error

WarmImage ensures an image is cached without starting a VM.

type FirecrackerWorker added in v0.5.0

type FirecrackerWorker struct {
	// contains filtered or unexported fields
}

FirecrackerWorker represents a single running Firecracker VM.

func (*FirecrackerWorker) Address added in v0.5.0

func (f *FirecrackerWorker) Address() string

Address returns the HTTP base URL for the workload on the guest LAN.

func (*FirecrackerWorker) Client added in v0.5.0

func (f *FirecrackerWorker) Client() *http.Client

Client returns the HTTP client.

func (*FirecrackerWorker) Close added in v0.5.0

func (f *FirecrackerWorker) Close() error

Close kills the VM and cleans up all resources. It blocks until the Firecracker process has fully exited before tearing down storage and the chroot, preventing "device busy" errors. The leased UID is returned to the pool after process exit so it cannot be reused while the old process is still alive.

func (*FirecrackerWorker) GuestIP added in v0.5.0

func (f *FirecrackerWorker) GuestIP() string

GuestIP returns the internal IP allocated to the worker.

func (*FirecrackerWorker) Healthy added in v0.5.0

func (f *FirecrackerWorker) Healthy(ctx context.Context) error

Healthy checks if the VM is up by verifying the process hasn't exited.

func (*FirecrackerWorker) ID added in v0.5.0

func (f *FirecrackerWorker) ID() string

ID returns the worker ID.

func (*FirecrackerWorker) OnCrash added in v0.5.0

func (f *FirecrackerWorker) OnCrash(fn func(sessionID string))

OnCrash sets a crash handler.

func (*FirecrackerWorker) PortMappings added in v0.5.5

func (f *FirecrackerWorker) PortMappings() []PortMapping

PortMappings returns the active port forwards for this worker.

func (*FirecrackerWorker) VsockUDSPath added in v0.5.0

func (f *FirecrackerWorker) VsockUDSPath() string

VsockUDSPath is the host-visible Unix socket Firecracker exposes for vsock.

type Option

type Option func(*config)

Option is a functional option for New[C].

func WithCrashHandler

func WithCrashHandler(fn func(sessionID string)) Option

WithCrashHandler registers a callback invoked when a worker's subprocess exits unexpectedly while it holds an active session.

fn receives the sessionID that was lost. Use it to:

  • Delete session-specific state in your database
  • Return an error to the end user ("your session was interrupted")
  • Trigger a re-run of the failed job

fn is called from a background monitor goroutine. It must not block for extended periods; spawn a goroutine if you need to do heavy work.

If WithCrashHandler is not set, crashes are only logged.

func WithMaxWorkers added in v0.5.0

func WithMaxWorkers(max int) Option

WithMaxWorkers sets the max capacity.

  • max: hard cap on concurrent workers; Acquire blocks once this limit is reached until a worker becomes available.

Panics if max < 1.
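Since Option is declared as func(*config), both options follow the standard functional-options pattern. A minimal self-contained sketch, assuming illustrative unexported fields (maxWorkers, onCrash) and an illustrative default of 4 workers, none of which are confirmed by the package:

```go
package main

import "fmt"

// config mirrors the shape implied by Option func(*config); field names
// here are assumptions for illustration only.
type config struct {
	maxWorkers int
	onCrash    func(sessionID string)
}

type Option func(*config)

func WithMaxWorkers(max int) Option {
	if max < 1 {
		panic("WithMaxWorkers: max must be >= 1")
	}
	return func(c *config) { c.maxWorkers = max }
}

func WithCrashHandler(fn func(sessionID string)) Option {
	return func(c *config) { c.onCrash = fn }
}

func newConfig(opts ...Option) *config {
	c := &config{maxWorkers: 4} // illustrative default
	for _, o := range opts {
		o(c)
	}
	return c
}

func main() {
	c := newConfig(
		WithMaxWorkers(8),
		WithCrashHandler(func(sid string) {
			// e.g. delete session state in the DB, notify the end user
			fmt.Println("session lost:", sid)
		}),
	)
	fmt.Println(c.maxWorkers) // 8
	c.onCrash("sess-42")
}
```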

type Pool

type Pool[C any] struct {
	// contains filtered or unexported fields
}

func New

func New[C any](factory WorkerFactory[C], opts ...Option) (*Pool[C], error)

func (*Pool[C]) Acquire

func (p *Pool[C]) Acquire(ctx context.Context, sessionID string, config TenantConfig) (*Session[C], error)

func (*Pool[C]) Factory added in v0.5.0

func (p *Pool[C]) Factory() WorkerFactory[C]

func (*Pool[C]) GetSession

func (p *Pool[C]) GetSession(ctx context.Context, sessionID string) (*Session[C], error)

func (*Pool[C]) KillWorker added in v0.5.0

func (p *Pool[C]) KillWorker(sessionID string, reason string) error

func (*Pool[C]) Shutdown

func (p *Pool[C]) Shutdown(ctx context.Context) error

func (*Pool[C]) Stats

func (p *Pool[C]) Stats() PoolStats

type PoolStats

type PoolStats struct {
	TotalWorkers     int
	ActiveSessions   int
	InflightAcquires int
	Node             observer.NodeStats
}

type PortMapping added in v0.5.5

type PortMapping struct {
	HostInterface string `json:"host_interface"` // e.g. "eth0", "192.168.0.1", or "0.0.0.0"
	HostPort      int    `json:"host_port"`      // 0 means assign a random available port
	GuestPort     int    `json:"guest_port"`
	Protocol      string `json:"protocol"` // "tcp" or "udp"
}

PortMapping describes a host-to-guest port forward.

type Session

type Session[C any] struct {
	ID     string
	Worker Worker[C]
	// contains filtered or unexported fields
}

func (*Session[C]) Release

func (s *Session[C]) Release()

type SessionRegistry

type SessionRegistry[C any] interface {
	// Get returns the worker pinned to sessionID.
	// Returns (nil, nil) if no session exists for this ID.
	Get(ctx context.Context, sessionID string) (Worker[C], error)

	// Put pins a worker to a sessionID.
	Put(ctx context.Context, sessionID string, w Worker[C]) error

	// Delete removes the pinning for sessionID.
	Delete(ctx context.Context, sessionID string) error

	// List returns a snapshot of all currently active sessions.
	// Primarily used for background health checks and cleanup.
	List(ctx context.Context) (map[string]Worker[C], error)

	// Len returns the number of active sessions.
	Len() int
}

SessionRegistry tracks which workers are pinned to which session IDs. In a distributed setup (Enterprise), this registry is shared across multiple nodes.

type TenantConfig added in v0.5.0

type TenantConfig struct {
	Image              string
	Command            []string
	Env                map[string]string
	IdleTimeoutSeconds int
	TTLSeconds         int
	HealthInterval     string
	PortMappings       []PortMapping
}

TenantConfig describes the on-demand container workload and its per-session limits.
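A typical config might look like the sketch below. The struct definitions are mirrored from the docs above so the snippet compiles on its own; the field values (timeouts, ports) are illustrative, not recommended defaults:

```go
package main

import "fmt"

// Mirrored from the documentation above for a self-contained example.
type PortMapping struct {
	HostInterface string
	HostPort      int // 0 means assign a random available port
	GuestPort     int
	Protocol      string // "tcp" or "udp"
}

type TenantConfig struct {
	Image              string
	Command            []string
	Env                map[string]string
	IdleTimeoutSeconds int
	TTLSeconds         int
	HealthInterval     string
	PortMappings       []PortMapping
}

func main() {
	cfg := TenantConfig{
		Image:              "postgres:latest",
		Env:                map[string]string{"POSTGRES_PASSWORD": "postgres"},
		IdleTimeoutSeconds: 300,  // illustrative: idle out after 5 minutes
		TTLSeconds:         3600, // illustrative: hard cap of 1 hour
		HealthInterval:     "10s",
		PortMappings: []PortMapping{
			{HostInterface: "0.0.0.0", HostPort: 5432, GuestPort: 5432, Protocol: "tcp"},
		},
	}
	fmt.Println(cfg.Image, cfg.PortMappings[0].HostPort)
}
```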

type Worker

type Worker[C any] interface {
	// ID returns a stable, unique identifier for this worker (e.g. "worker-3").
	// Never reused — not even after a crash and restart.
	ID() string

	// Address returns the internal network URI the worker
	// is listening on (e.g., '127.0.0.1:54321').
	Address() string

	// Client returns the typed connection to the worker process.
	// For most users this is *http.Client; gRPC users return their stub here.
	Client() C

	// Healthy performs a liveness check against the subprocess.
	// Returns nil if the worker is accepting requests; non-nil otherwise.
	// Pool.Acquire calls this before handing a worker to a new session,
	// so a stale or crashed worker is never returned to a caller.
	Healthy(ctx context.Context) error

	// OnCrash sets a callback triggered when the worker process exits unexpectedly.
	OnCrash(func(sessionID string))

	// Close performs graceful shutdown of the worker process.
	// Called by the pool during scale-down or Pool.Shutdown.
	io.Closer
}

Worker represents one running subprocess managed by the pool.

C is the typed client the caller uses to talk to the subprocess — for example *http.Client, a gRPC connection, or a custom struct. The type parameter is constrained to "any" so the pool is fully generic.

type WorkerFactory

type WorkerFactory[C any] interface {
	// Spawn starts one new worker and blocks until it is healthy.
	// If ctx is cancelled before the worker becomes healthy, Spawn must
	// kill the process and return a non-nil error.
	Spawn(ctx context.Context, sessionID string, config TenantConfig) (Worker[C], error)

	// WarmImage ensures the requested image is present in the local cache.
	WarmImage(ctx context.Context, imageRef string) error
}

WorkerFactory knows how to spawn one worker process and return a typed Worker[C] that is ready to accept requests (i.e. Healthy returns nil).

Implement WorkerFactory to define custom spawn logic (e.g. Firecracker microVM, Docker container, remote SSH process).

Directories

Path Synopsis
cmd
herd command
test_boot command
internal
uid
Package uid provides a thread-safe, lock-free pool of unprivileged UIDs for per-MicroVM isolation.
Package observer provides lightweight, OS-level resource sampling.
Package proxy provides NewReverseProxy — the one-liner that turns a session-affine process pool into an HTTP gateway.
