package runtime

Version: v0.1.5-alpha
Warning: this package is not in the latest version of its module.
Published: May 24, 2026 License: AGPL-3.0 Imports: 34 Imported by: 0

Documentation

Constants

const (
	IsolationHyperV  = "hyperv"
	IsolationProcess = "process"
)

IsolationHyperV runs each container as a separate Hyper-V utility VM (the production posture; matches the §4.1 boundary claim). IsolationProcess runs the container in a Windows Server silo on the host kernel — the boundary is namespace-level, not VM-level — and exists for environments that cannot nest virtualization (GitHub Actions hosted runners, dev laptops without Hyper-V, etc.). The runhcs shim accepts both; hpcc picks one at container-create time from HcsshimOptions.Isolation.

const HandlerFirecracker = "firecracker"

HandlerFirecracker is the config.toml runtime.handler value that selects the raw Firecracker driver. Linux-only. See docs/plan/phase-4-distributed.md §4.1 for why hpcc drives Firecracker directly instead of going through firecracker-containerd.

const HandlerHcsshim = "runhcs-wcow-hypervisor"

HandlerHcsshim is the config.toml runtime.handler value that selects the containerd + hcsshim driver. It wires the worker to a containerd daemon on a Windows host configured with the runhcs-wcow-hypervisor runtime; each per-tenant container is a Hyper-V utility VM (§4.1.1) unless HcsshimOptions.Isolation selects process isolation. The handler string matches the OCI runtime name the daemon dispatches to.

const HandlerReallyReallyDangerous = "really_really_dangerous"

HandlerReallyReallyDangerous is the config.toml runtime.handler value that selects DangerouslyExecOnHost. The string is intentionally awful so nobody sets it without knowing what they're doing — every compile runs as a child process of the worker, on the worker host, with the worker's full file system, network, and credentials. There is no kernel boundary, no NIC removal, no audit envelope. Development only.

Variables

This section is empty.

Functions

This section is empty.

Types

type AgentExecRequest

type AgentExecRequest struct {
	ExecID string
	Argv   []string
	Env    []string
	Cwd    string

	// SrcHostPath, if non-empty, is the host directory whose tree
	// the helper streams across as InputFile frames. Files are
	// walked in filesystem order and shipped relative to this root.
	// Empty means "no inputs"; the header is still sent and the
	// stream is half-closed immediately after.
	SrcHostPath string

	// OutHostPath, if non-empty, is where OutputFile chunks the
	// agent streams back get written. Paths inside the OutputFile
	// frames are agent-relative (forward-slash); the helper joins
	// them under this root with native separators.
	OutHostPath string

	Stdout io.Writer
	Stderr io.Writer
}

AgentExecRequest is the host-side view of one compile dispatch against an in-container hpcc-agent. It has the same surface as ExecRequest but is agent-shaped: file paths carried on the wire are agent-relative (the agent demuxes them into its own staging dir), and the transport is the AgentService bidi stream rather than Task.Exec.

Stdin is intentionally absent: AgentService.Exec doesn't carry a stdin stream today. None of the §4.5 dispatch paths need it (CAS-mode source ships as InputFile chunks; preprocessed-mode source rides in one InputFile too). If a future feature needs stdin we add a new oneof variant to ExecClientFrame rather than bolting it onto the host-side API.

type Container

type Container interface {
	ID() string
	TenantID() string
	ImageDigest() string
	State() gen.VMState

	// Exec runs one compiler invocation inside the container and blocks
	// until the process exits, ctx is cancelled, or the container dies.
	// On ctx cancellation the in-container process is killed via the
	// shim's process API and ctx.Err() is returned.
	Exec(ctx context.Context, req ExecRequest) (ExecResult, error)

	// Stop tears the container down (SIGTERM → grace → SIGKILL) and
	// reaps the VM. Idempotent; safe to call after the container has
	// already exited on its own.
	Stop(ctx context.Context) error
}

Container is one running per-tenant sandbox. One container == one VM (Firecracker microVM on Linux, Hyper-V utility VM on Windows), so this is also what the worker reports in WorkerHeartbeat.active_vms.

Methods are safe for concurrent use; Exec calls fan out as separate Task.Execs against the same underlying task.

type ContainerSpec

type ContainerSpec struct {
	ID          string // worker-unique container id, also the VM id reported in heartbeats
	TenantID    string
	ImageDigest string // prepared-image digest (output of image.Store)

	VCPUs       int32
	MemoryBytes int64
}

ContainerSpec describes one per-tenant container. The runtime translates this into a backend-native shape: a Firecracker VMM configuration on Linux, an OCI runtime spec for the containerd + hcsshim path on Windows.

Per-RPC source/output mount paths are deliberately NOT here — they live on ExecRequest instead. Containers are reusable across compiles (see PooledRuntime), so the spec captures only stable, tenant-level shape (identity, sizing); the mounts that change per compile bind at Exec time.

type DangerouslyExecOnHost

type DangerouslyExecOnHost struct{}

DangerouslyExecOnHost is a Runtime implementation that does NOT isolate compiles at all — every Exec is forked from the worker process via os/exec, sharing the worker's PID namespace, file system, network namespace, and uid. It exists solely to exercise the worker→runtime call path during development; selecting it in production puts the entire worker host inside the trust boundary.

The /src and /out "mounts" are simulated by string-substituting the in-container paths in argv with the host-side paths from the ExecRequest (SrcHostPath / OutHostPath) at exec time. From the rest of the system's perspective (worker handler, runtimeExecutor, compiler package), argv looks the same as it would for a real Firecracker run.
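A sketch of the argv rewrite described above, under the assumption that substitution applies to whole arguments, to `/src/...` and `/out/...` prefixes, and to values after an `=`; the real matching rules are not documented here, and `substitute` / `rewriteArgv` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// substitute rewrites one argv element, replacing the in-container mount
// point guest (e.g. "/src") with host when the element is exactly guest,
// starts with guest+"/", or carries such a path after an "=" (as in
// "--sysroot=/src/sys"). An empty host means "no such mount".
// Limitation of this sketch: fused forms like "-I/src/include" are not rewritten.
func substitute(arg, guest, host string) string {
	if host == "" {
		return arg
	}
	if arg == guest {
		return host
	}
	if strings.HasPrefix(arg, guest+"/") {
		return host + arg[len(guest):]
	}
	if k, v, ok := strings.Cut(arg, "="); ok {
		return k + "=" + substitute(v, guest, host)
	}
	return arg
}

// rewriteArgv maps /src and /out onto the per-Exec host paths. Each element
// is rewritten at most once, so a host path that happens to start with
// "/out" is never substituted a second time.
func rewriteArgv(argv []string, srcHost, outHost string) []string {
	out := make([]string, len(argv))
	for i, a := range argv {
		if s := substitute(a, "/src", srcHost); s != a {
			out[i] = s
			continue
		}
		out[i] = substitute(a, "/out", outHost)
	}
	return out
}

func main() {
	argv := []string{"cc", "-c", "/src/a.c", "-o", "/out/a.o", "--sysroot=/src/sys"}
	fmt.Println(rewriteArgv(argv, "/tmp/rpc1/src", "/tmp/rpc1/out"))
	// [cc -c /tmp/rpc1/src/a.c -o /tmp/rpc1/out/a.o --sysroot=/tmp/rpc1/src/sys]
}
```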

func (DangerouslyExecOnHost) Close

func (DangerouslyExecOnHost) Close() error

func (DangerouslyExecOnHost) Start

func (DangerouslyExecOnHost) Start(ctx context.Context, spec ContainerSpec) (Container, error)

type ExecRequest

type ExecRequest struct {
	ExecID string // unique within the container; surfaces in backend events (vsock RPC id on Linux, containerd task id on Windows)
	Argv   []string
	Env    []string
	Cwd    string

	SrcHostPath string
	OutHostPath string

	Stdin          io.Reader // optional
	Stdout, Stderr io.Writer // optional; nil discards
}

ExecRequest is one Task.Exec — a fully-resolved toolchain invocation. No shell; argv goes straight to execve in the guest.

SrcHostPath / OutHostPath are the per-Exec bind-mount targets. The runtime is responsible for making them visible at /src and /out inside the sandbox for the lifetime of this Exec; an empty string means "no such mount." Real backends (raw Firecracker on Linux, hcsshim on Windows) attach these at Exec time so a pooled container can serve compiles for different RPC tmpdirs without restart.

type ExecResult

type ExecResult struct {
	ExitCode int
}

ExecResult is what Exec returns when the process exited cleanly under the runtime's control. A non-zero ExitCode is *not* an error — the caller decides whether a compiler failure is fatal.

type Firecracker

type Firecracker struct {
	// contains filtered or unexported fields
}

Firecracker is the raw Firecracker Runtime. It owns the per-tenant VMM lifecycle (jailer + firecracker), wiring the prepared squashfs rootfs and hpcc-supplied vmlinux into a freshly-jailed chroot and driving the Firecracker API to the InstanceStart action.

The host-side vsock agent and Exec dispatch are not wired here yet — Exec returns errFirecrackerExecNotImplemented. Boot is independently testable now so the rest of the worker plumbing (image pull, pool, runtime.Select) can be exercised end-to-end against a real microVM.

func NewFirecracker

func NewFirecracker(opts FirecrackerOptions) (*Firecracker, error)

NewFirecracker validates required paths/credentials up front so a misconfigured worker fails at startup rather than on the first Compile RPC. All path fields and UID/GID must be set; BootArgs may be empty (defaultBootArgs is used).

func (*Firecracker) Close

func (f *Firecracker) Close() error

func (*Firecracker) Start

func (f *Firecracker) Start(ctx context.Context, spec ContainerSpec) (Container, error)

type FirecrackerOptions

type FirecrackerOptions struct {
	FirecrackerBin string
	JailerBin      string
	KernelImage    string
	RootfsDir      string
	RunDir         string
	UID, GID       int
	BootArgs       string
}

FirecrackerOptions is the runtime's host-side configuration. All path fields are required; UID/GID are the non-root credentials jailer drops to before exec'ing firecracker. BootArgs is optional and falls back to defaultBootArgs.
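The startup validation NewFirecracker describes might look roughly like this. The `validate` helper is hypothetical, treating UID/GID 0 as unset is an assumption based on the non-root requirement, and `defaultBootArgs` here is a placeholder because the package's actual default is not shown.

```go
package main

import (
	"errors"
	"fmt"
)

type FirecrackerOptions struct {
	FirecrackerBin string
	JailerBin      string
	KernelImage    string
	RootfsDir      string
	RunDir         string
	UID, GID       int
	BootArgs       string
}

// defaultBootArgs is a placeholder value for this sketch only.
const defaultBootArgs = "console=ttyS0 reboot=k panic=1"

// validate reports every missing field at once and fills in BootArgs.
func validate(opts FirecrackerOptions) (FirecrackerOptions, error) {
	required := []struct{ name, val string }{
		{"FirecrackerBin", opts.FirecrackerBin},
		{"JailerBin", opts.JailerBin},
		{"KernelImage", opts.KernelImage},
		{"RootfsDir", opts.RootfsDir},
		{"RunDir", opts.RunDir},
	}
	var errs []error
	for _, r := range required {
		if r.val == "" {
			errs = append(errs, fmt.Errorf("%s is required", r.name))
		}
	}
	// jailer drops to these credentials; 0 (root) is treated as "unset" here.
	if opts.UID <= 0 || opts.GID <= 0 {
		errs = append(errs, errors.New("UID/GID must be set to non-root values"))
	}
	if opts.BootArgs == "" {
		opts.BootArgs = defaultBootArgs
	}
	return opts, errors.Join(errs...) // nil when errs is empty
}

func main() {
	_, err := validate(FirecrackerOptions{RunDir: "/run/hpcc"})
	fmt.Println(err)
}
```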

type Hcsshim

type Hcsshim struct {
	// contains filtered or unexported fields
}

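Hcsshim is the containerd + hcsshim Runtime. It owns the containerd client and the per-container host scratch dir layout; the per-tenant container lifecycle (a Hyper-V utility VM, or a Windows Server silo under process isolation) is delegated to the runhcs shim via containerd. How compiles dispatch depends on the isolation mode: under Hyper-V isolation they stream through the in-container hpcc-agent, and under process isolation each invocation is one `Task.Exec` against the long-running pause binary running as PID 1 (§4.2).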

func NewHcsshim

func NewHcsshim(opts HcsshimOptions) (*Hcsshim, error)

NewHcsshim validates the configuration and dials containerd. Empty optional fields take the documented defaults so a minimal worker.toml (handler + address + run_dir) boots without ceremony, but every required path is checked up front so a misconfigured worker fails at startup instead of on the first Compile.

func (*Hcsshim) Close

func (h *Hcsshim) Close() error

func (*Hcsshim) Start

func (h *Hcsshim) Start(ctx context.Context, spec ContainerSpec) (Container, error)

Start creates a per-tenant container under containerd, branched by isolation mode:

  • Hyper-V isolation: the container is a fresh utility VM with hpcc-agent.exe as PID 1. The host bind-mounts only the agent dir (no C:\src / C:\out) and dials the agent over HvSocket after Task.Start; subsequent Exec calls stream inputs and outputs through the agent's bidirectional-streaming gRPC call (AgentService.Exec), with no host-disk staging.
  • Process isolation: the container runs in a Windows Server silo on the host kernel with pause.exe as PID 1; each Exec dispatches as a Task.Exec under that pause process and uses copyTree to stage src and capture out. There is no Hyper-V partition boundary in this mode, so there is no point paying the gRPC streaming cost either. This is the CI / dev path on hosts that can't nest virtualization.

The branch is decided here; everything downstream (hcsshimContainer.Exec, .Stop) checks the agent connection's presence to pick the right path.
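The isolation branch in Start, and the downstream check on the agent connection, can be modelled as a small lookup. `launchPlan`, `planFor`, `agentConn`, and `execPath` are hypothetical names; the field values come from the two isolation bullets.

```go
package main

import "fmt"

const (
	IsolationHyperV  = "hyperv"
	IsolationProcess = "process"
)

// launchPlan summarises what Start would set up for one isolation mode.
type launchPlan struct {
	Entrypoint  string // PID 1 inside the container
	DialAgent   bool   // dial hpcc-agent over HvSocket after Task.Start
	StageByCopy bool   // per-Exec Task.Exec plus copyTree for src/out
}

func planFor(isolation string) (launchPlan, error) {
	switch isolation {
	case IsolationHyperV:
		return launchPlan{Entrypoint: "hpcc-agent.exe", DialAgent: true}, nil
	case IsolationProcess:
		return launchPlan{Entrypoint: "pause.exe", StageByCopy: true}, nil
	default:
		return launchPlan{}, fmt.Errorf("unknown isolation %q", isolation)
	}
}

// agentConn stands in for the dialed HvSocket/gRPC connection.
type agentConn struct{}

// execPath mirrors the downstream rule: a non-nil agent connection means
// the container was started under Hyper-V isolation.
func execPath(agent *agentConn) string {
	if agent != nil {
		return "AgentService.Exec stream"
	}
	return "Task.Exec + copyTree"
}

func main() {
	p, _ := planFor(IsolationProcess)
	fmt.Println(p.Entrypoint, execPath(nil)) // pause.exe Task.Exec + copyTree
}
```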

type HcsshimOptions

type HcsshimOptions struct {
	Address     string
	Namespace   string
	RunDir      string
	Runtime     string
	Snapshotter string
	// Isolation is IsolationHyperV (default) or IsolationProcess. Process
	// isolation is for CI / dev only — it shares the host kernel with
	// every other container and breaks the §4.1 security claim. Anything
	// else fails at NewHcsshim so a typo doesn't silently flip a worker
	// out of Hyper-V mode.
	Isolation string
	// PauseHostPath is the absolute path of hpcc-pause.exe on the
	// host. The runtime copies the file into the per-runtime mount
	// dir at NewHcsshim time. Every process-isolated container gets
	// that dir bind-mounted read-only at C:\.hpcc with pause.exe as
	// its entrypoint — the §4.1.1 process-isolation path keeps
	// Task.Exec + copyTree for staging because there's no partition
	// boundary to stream across.
	PauseHostPath string
	// AgentHostPath is the absolute path of hpcc-agent.exe on the
	// host. Same staging treatment as PauseHostPath, but bind-mounted
	// into Hyper-V isolated containers as their entrypoint. The agent
	// listens on HvSocket and the runtime dials it post-Start to
	// drive compiles via the bidi-streaming AgentService.Exec instead
	// of Task.Exec + per-Exec file copy — see plan §4.1.1 "Why not
	// VSMB" for the rationale. Required when Isolation == hyperv.
	AgentHostPath string
}

HcsshimOptions is the host-side configuration for the containerd + hcsshim runtime. Address is the containerd dial target (Windows default `\\.\pipe\containerd-containerd`); Namespace scopes hpcc's images and containers inside containerd; RunDir is where per-container host-side staging lives. Runtime/Snapshotter override the OCI runtime + snapshotter containerd selects; empty values fall back to the runhcs.v1 / windows defaults appropriate for Hyper-V isolation.
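A sketch of defaulting and validating HcsshimOptions. The pipe address is the documented default; `io.containerd.runhcs.v1` and `windows` are assumptions about what the "runhcs.v1 / windows" defaults resolve to, Namespace is left alone because its default isn't given, and `withDefaults` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

type HcsshimOptions struct {
	Address, Namespace, RunDir, Runtime, Snapshotter string
	Isolation, PauseHostPath, AgentHostPath          string
}

const (
	defaultAddress     = `\\.\pipe\containerd-containerd`
	defaultRuntime     = "io.containerd.runhcs.v1" // assumption
	defaultSnapshotter = "windows"                 // assumption
)

// withDefaults fills empty optional fields and rejects anything that would
// silently flip a worker out of Hyper-V mode.
func withDefaults(o HcsshimOptions) (HcsshimOptions, error) {
	if o.Address == "" {
		o.Address = defaultAddress
	}
	if o.Runtime == "" {
		o.Runtime = defaultRuntime
	}
	if o.Snapshotter == "" {
		o.Snapshotter = defaultSnapshotter
	}
	if o.Isolation == "" {
		o.Isolation = "hyperv"
	}
	if o.RunDir == "" {
		return o, errors.New("RunDir is required")
	}
	switch o.Isolation {
	case "hyperv":
		if o.AgentHostPath == "" {
			return o, errors.New("AgentHostPath is required when Isolation is hyperv")
		}
	case "process":
		// no agent connection in this mode
	default:
		return o, fmt.Errorf("unknown Isolation %q", o.Isolation)
	}
	return o, nil
}

func main() {
	o, err := withDefaults(HcsshimOptions{RunDir: `C:\hpcc\run`, Isolation: "process"})
	fmt.Println(o.Address, o.Runtime, o.Snapshotter, err)
}
```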

type Options

type Options struct {
	Firecracker FirecrackerOptions
	Hcsshim     HcsshimOptions
}

Options bundles backend-specific runtime knobs. Select reads only the options for the chosen handler and ignores the rest, so callers can populate every backend up front and let Select pick.

type PooledRuntime

type PooledRuntime struct {
	// contains filtered or unexported fields
}

PooledRuntime wraps a Runtime with a per-tenant container pool. The first Start for a (TenantID, ImageDigest) pair forwards to the inner runtime; subsequent Starts return a wrapper around an already-running container — refcounted, so up to VM.VCPUs concurrent Execs can share one VM. Releasing (wrapper.Stop) decrements the refcount; the entry stays in the pool while idle and the reaper evicts it on two clocks:

  • idleTTL — entries whose refcount has been zero for longer than this are torn down. (Entries with refs > 0 are never idle-aged.)
  • maxLifetime — entries whose original (cold-start) age exceeds this are torn down once their refcount drops to zero. This is §4.2's "hard session timeout": long-lived per-tenant state accumulates and someone will eventually ask what's in it. The ceiling is also enforced at Start time — over-aged entries are never handed back out.

A maxEntries cap evicts the oldest idle container when a new one would push the pool over the limit. Close drains everything, waiting for in-flight Execs to finish first.

The key, (TenantID, ImageDigest), is what defines a reusable session in §4.4 of the design. VCPUs/MemoryBytes are worker-global (driven by cfg.VM) so they don't enter the key. Per-RPC source and output mount paths live on ExecRequest, not ContainerSpec, so they don't enter the key either; that's what makes pooling tractable.

func NewPooledRuntime

func NewPooledRuntime(inner Runtime, idleTTL, maxLifetime time.Duration, maxEntries int) *PooledRuntime

NewPooledRuntime wraps inner and starts the reaper if either timeout is enabled.

idleTTL is the maximum time an entry whose refcount has been zero may sit in the pool before it gets evicted; pass 0 to disable idle reaping. Entries with refs > 0 are never idle-aged.

maxLifetime is the hard ceiling on a container's wall-clock age, measured from its original cold start. Past this, the container is refused as a Start hit and torn down once its refcount drops to zero (or evicted by the reaper on its next tick). Pass 0 to disable.

maxEntries is the upper bound on total entries (running containers) across all keys; when a fresh Start would exceed it, the oldest idle container is evicted first. Pass 0 for unlimited.

func (*PooledRuntime) Close

func (p *PooledRuntime) Close() error

Close drains the pool — every entry's container gets a real Stop — and forwards to the inner runtime's Close. Blocks until all outstanding wrappers release (refs == 0 everywhere). After Close the pool rejects further Starts; outstanding pooledContainers stop being reference-counted and fall through to a no-op on their next Stop.

func (*PooledRuntime) EntriesByTenant

func (p *PooledRuntime) EntriesByTenant() map[string]int

EntriesByTenant returns the current count of live container entries keyed by tenant_id. Snapshot; safe to call concurrently with Start / Close. Exposed for the worker's active-containers observable gauge.

"Live entries" includes both currently-borrowed (refs > 0) and idle pooled-but-warm containers — the gauge's audience is "how many VMs is this worker actually holding open for this tenant," not "how many Exec calls are running."
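EntriesByTenant's snapshot semantics, sketched with a hypothetical `pool` whose keys map to the refcounts of their live containers: every live container counts, borrowed or idle, and the returned map is freshly allocated so callers never share pool state.

```go
package main

import (
	"fmt"
	"sync"
)

type poolKey struct{ TenantID, ImageDigest string }

// pool is a minimal stand-in; each key maps to one refcount per live container.
type pool struct {
	mu      sync.Mutex
	entries map[poolKey][]int
}

// EntriesByTenant counts live containers per tenant under the lock and
// returns a new map, so it is safe to call concurrently with Start / Close.
func (p *pool) EntriesByTenant() map[string]int {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := make(map[string]int)
	for k, refs := range p.entries {
		out[k.TenantID] += len(refs) // refs > 0 and refs == 0 both count
	}
	return out
}

func main() {
	p := &pool{entries: map[poolKey][]int{
		{"acme", "sha256:aa"}:   {2, 0},
		{"acme", "sha256:bb"}:   {0},
		{"globex", "sha256:aa"}: {1},
	}}
	fmt.Println(p.EntriesByTenant()) // map[acme:3 globex:1]
}
```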

func (*PooledRuntime) Start

func (p *PooledRuntime) Start(ctx context.Context, spec ContainerSpec) (Container, error)

Start returns a wrapper around either an existing entry for spec's (TenantID, ImageDigest) — refcount incremented — or, if no entry has capacity, a freshly-booted container. The caller's wrapper.Stop releases the refcount.

maxRefs on a fresh entry is taken from spec.VCPUs (clamped to >=1). All Starts for the same key carry the same VCPUs in practice (the worker derives spec.VCPUs from cfg.VM.VCPUs), so the cap is stable.

type Runtime

type Runtime interface {
	// Start launches a new per-tenant container from a prepared image
	// (pause binary already injected as PID 1) and waits until it's
	// ready to accept Execs. The returned Container is owned by the
	// caller and must be Stopped to release the underlying VM.
	Start(ctx context.Context, spec ContainerSpec) (Container, error)

	// Close releases client-level resources (for example, the containerd connection).
	// Outstanding Containers must be Stopped first.
	Close() error
}

Runtime owns per-tenant compile sandboxes. Implementations are backend-specific — a raw Firecracker driver on Linux (hpcc owns image pull, squashfs rootfs build, VMM lifecycle, and the host-side vsock channel to the in-VM agent), containerd + hcsshim (Hyper-V isolation) on Windows — but the surface is the same: start a container from a prepared image, dispatch Execs into it, stop it. Image preparation (pulling, pause/agent injection) is the image package's job and is out of scope here.

func Select

func Select(handler string, opts Options) (Runtime, error)

Select returns the Runtime implementation that matches the configured runtime.handler value. Unimplemented backends return a clear error at worker startup rather than panicking later on the first Compile.

Recognized values:

"really_really_dangerous"  — DangerouslyExecOnHost; dev only.
"firecracker"              — raw Firecracker driver (Linux).
"runhcs-wcow-hypervisor"   — containerd + hcsshim Hyper-V isolation
                             (Windows).
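A sketch of the handler dispatch Select performs. Backends are stubbed out, the platform check (Firecracker on Linux, hcsshim on Windows) is an assumption drawn from the handler descriptions, and `selectFor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"runtime"
)

// Runtime is trimmed to what this sketch needs.
type Runtime interface{ Name() string }

type stub string

func (s stub) Name() string { return string(s) }

// selectFor maps a runtime.handler value to a backend. The goos parameter
// (normally runtime.GOOS) keeps the platform check testable.
func selectFor(handler, goos string) (Runtime, error) {
	switch handler {
	case "really_really_dangerous":
		return stub("DangerouslyExecOnHost"), nil
	case "firecracker":
		if goos != "linux" {
			return nil, fmt.Errorf("runtime.handler %q requires linux, running on %s", handler, goos)
		}
		return stub("Firecracker"), nil
	case "runhcs-wcow-hypervisor":
		if goos != "windows" {
			return nil, fmt.Errorf("runtime.handler %q requires windows, running on %s", handler, goos)
		}
		return stub("Hcsshim"), nil
	default:
		// Fail at worker startup instead of on the first Compile.
		return nil, fmt.Errorf("unknown runtime.handler %q", handler)
	}
}

func main() {
	r, err := selectFor("firecracker", runtime.GOOS)
	if err != nil {
		fmt.Println("startup error:", err)
		return
	}
	fmt.Println("selected", r.Name())
}
```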
