worker

package
v0.1.6-alpha Latest
Warning

This package is not in the latest version of its module.

Published: May 24, 2026 License: AGPL-3.0 Imports: 49 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func DefaultConfigPath

func DefaultConfigPath() (string, error)

DefaultConfigPath returns ~/.config/hpcc/worker.toml on Unix and the platform equivalent elsewhere.

Types

type Config

type Config struct {
	Listen        string        `toml:"listen"`         // gRPC listen addr for incoming Compile RPCs
	MetricsListen string        `toml:"metrics_listen"` // optional: HTTP /metrics scrape addr (e.g. ":9192"). Empty disables.
	WorkerID      string        `toml:"worker_id"`      // empty → auto-generate at startup
	PublicAddr    string        `toml:"public_addr"`    // address advertised to the scheduler; clients dial this
	Paranoid      bool          `toml:"paranoid"`       // mirror of scheduler-side paranoid mode (§4.13)
	TLS           TLSConfig     `toml:"tls"`
	Scheduler     SchedulerLink `toml:"scheduler"`
	Runtime       RuntimeConfig `toml:"runtime"`
	VM            VMConfig      `toml:"vm"`
	Pool          PoolConfig    `toml:"pool"`
	Image         ImageConfig   `toml:"image"`

	// Caches is the worker-side cache backends. In paranoid mode (§4.13)
	// the worker is the only process that reads/writes the cache; in the
	// default mode this can be empty and clients carry their own caches.
	// Schema is shared with the client config (config.CacheConfig).
	Caches []config.CacheConfig `toml:"cache"`
}

func DefaultConfig

func DefaultConfig() Config

func LoadConfig

func LoadConfig(path string) (Config, error)

func (*Config) ResolveSecrets

func (c *Config) ResolveSecrets(ctx context.Context) error

ResolveSecrets dereferences any URI-prefixed values in the config against the default secret.Resolver. Run after LoadConfig and before Validate so length checks see the resolved bytes, not the URI.

func (Config) Validate

func (c Config) Validate() error

type FirecrackerConfig

type FirecrackerConfig struct {
	FirecrackerBin string `toml:"firecracker_bin"`
	JailerBin      string `toml:"jailer_bin"`
	KernelImage    string `toml:"kernel_image"`
	RootfsDir      string `toml:"rootfs_dir"`
	RunDir         string `toml:"run_dir"`
	UID            int    `toml:"uid"`
	GID            int    `toml:"gid"`
	BootArgs       string `toml:"boot_args"`
}

FirecrackerConfig is the raw-Firecracker runtime's host-side knobs. Unused unless runtime.handler == "firecracker".

FirecrackerBin and JailerBin are the host paths to the upstream firecracker and jailer executables. KernelImage is the vmlinux that every microVM boots — hpcc owns the kernel, not the user image (plan §4.3). RootfsDir mirrors the rootfs.Store CacheDir so the runtime can resolve a prepared "<algo>-<hex>.ext4" by digest. RunDir is jailer's --chroot-base-dir; one chroot per VM lands underneath as <RunDir>/firecracker/<vm-id>/root/. UID/GID are the non-root credentials jailer drops to before exec'ing firecracker. BootArgs overrides the default kernel cmdline (sane defaults boot the rootfs read-only with the in-VM hpcc-agent as PID 1).

type HcsshimConfig

type HcsshimConfig struct {
	Address     string `toml:"address"`
	Namespace   string `toml:"namespace"`
	RunDir      string `toml:"run_dir"`
	Runtime     string `toml:"runtime"`
	Snapshotter string `toml:"snapshotter"`
	Isolation   string `toml:"isolation"`
}

HcsshimConfig is the containerd + hcsshim runtime's host-side knobs. Unused unless runtime.handler == "runhcs-wcow-hypervisor".

Address is the containerd UDS / named-pipe address (Windows default: "\\\\.\\pipe\\containerd-containerd"). Namespace is the containerd namespace hpcc keeps its prepared images and containers under; isolation between hpcc state and anything else on the host is containerd-namespace-level. RunDir is hpcc's per-container scratch root — each Start allocates <RunDir>/<container-id>/{src,out} as the host backing for the per-Exec mounts the runtime injects at C:\src and C:\out (§4.1.1, "stage source onto a local volume the container mounts"). Runtime overrides the OCI runtime name containerd selects; the default ("io.containerd.runhcs.v1") matches the Hyper-V-isolated Windows containers shim. Snapshotter selects the containerd snapshotter that materializes the prepared image's rootfs; the default ("windows") is the standard WCOW snapshotter.

Isolation selects the container isolation mode the runhcs shim applies. The production value is "hyperv" — each container is a separate Hyper-V utility VM, which is the kernel boundary the regulated-enterprise story (§4.1) depends on. "process" runs the container in a Windows Server silo on the host kernel; it loses the security boundary and is only valid for environments that cannot nest virtualization (GitHub Actions hosted runners, dev laptops without Hyper-V, etc.). Empty falls back to "hyperv".

type ImageConfig

type ImageConfig struct {
	PauseLinuxAmd64   string `toml:"pause_linux_amd64"`
	PauseLinuxArm64   string `toml:"pause_linux_arm64"`
	PauseWindowsAmd64 string `toml:"pause_windows_amd64"`
	// AgentWindowsAmd64 is the host path to hpcc-agent.exe used when
	// runtime.handler = runhcs-wcow-hypervisor AND
	// runtime.hcsshim.isolation = "hyperv". The agent is bind-mounted
	// into the container alongside pause.exe (under C:\.hpcc) and
	// becomes the OCI entrypoint, so the host can stream compile
	// inputs/outputs over an HvSocket gRPC channel instead of
	// copyTree-ing every Exec. Required only for the Hyper-V isolated
	// path; process-isolation containers stick with pause.exe + the
	// Task.Exec copy-in/copy-out flow (see HcsshimOptions.Isolation
	// for the why).
	AgentWindowsAmd64 string `toml:"agent_windows_amd64"`
	// AgentLinuxAmd64 / AgentLinuxArm64 are the host paths to the
	// per-arch hpcc-agent binaries the rootfs.Store injects at
	// /.hpcc/agent inside each prepared squashfs image. Required when
	// runtime.handler = firecracker — one path per arch the worker
	// will pull images for. Empty for an arch the worker never pulls
	// for is fine; PullImage fails fast at request time rather than
	// at startup.
	AgentLinuxAmd64   string   `toml:"agent_linux_amd64"`
	AgentLinuxArm64   string   `toml:"agent_linux_arm64"`
	AdvertisedDigests []string `toml:"advertised_digests"`
	IdleTimeout       string   `toml:"idle_timeout"` // e.g. "24h"; empty disables eviction
}

ImageConfig points at pre-built pause binaries on disk. Empty paths fall back to the binaries embedded in the worker (built from /pause).

AdvertisedDigests is a static list of image digests the worker reports in RegisterWorker / Heartbeat in addition to whatever the real ImageStore knows about. Useful when the runtime is the dev-mode "really_really_dangerous" handler (no containerd, no image store) or when an operator wants to manually pin which toolchains this worker accepts. AdvertisedDigests are exempt from idle-image eviction.

IdleTimeout is the maximum time an image entry may sit in the local catalogue without being used by a Compile before the eviction loop drops it (untagging the prepared image in the backing image store — containerd on Windows, hpcc's rootfs cache on Linux — and freeing its blobs at the next GC). Empty disables eviction.

func (ImageConfig) IdleTimeoutDur

func (i ImageConfig) IdleTimeoutDur() (time.Duration, error)

IdleTimeoutDur parses Image.IdleTimeout. An empty string returns 0 (eviction disabled).

type PoolConfig

type PoolConfig struct {
	MaxActive int `toml:"max_active"` // upper bound on concurrent per-tenant VMs
}

type RuntimeConfig

type RuntimeConfig struct {
	// Handler selects the worker runtime backend.
	//   "firecracker"             — raw Firecracker driver (Linux).
	//   "runhcs-wcow-hypervisor"  — containerd + hcsshim Hyper-V (Windows).
	//   "really_really_dangerous" — host-exec; dev only.
	Handler string `toml:"handler"`

	Firecracker FirecrackerConfig `toml:"firecracker"`
	Hcsshim     HcsshimConfig     `toml:"hcsshim"`
}

type SchedulerLink

type SchedulerLink struct {
	URL         string `toml:"url"`          // e.g. "scheduler.internal:9091"
	WorkerToken string `toml:"worker_token"` // shared static token; matches scheduler.auth.worker_token
	CAFile      string `toml:"ca_file"`      // optional: pin the scheduler's CA cert
}

type TLSConfig

type TLSConfig struct {
	CertFile string `toml:"cert_file"`
	KeyFile  string `toml:"key_file"`
	CertRef  string `toml:"cert_ref"`
	KeyRef   string `toml:"key_ref"`
}

TLSConfig points the gRPC server at a serving certificate. The cert and key may live on disk (CertFile/KeyFile) or in a secret store referenced by URI (CertRef/KeyRef). Exactly one form per material — see internal/secret for supported schemes.
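
The "exactly one form per material" rule could be checked roughly as below. This is a sketch with a local copy of the struct; the helper names and error wording are hypothetical, and the "vault://" URI in main is a made-up example rather than a confirmed scheme.

```go
package main

import "fmt"

// tlsConfig is a local copy of the TLSConfig fields, for illustration.
type tlsConfig struct {
	CertFile, KeyFile string
	CertRef, KeyRef   string
}

// exactlyOne reports an error unless precisely one of file or ref is set.
func exactlyOne(name, file, ref string) error {
	switch {
	case file != "" && ref != "":
		return fmt.Errorf("tls: set %s_file or %s_ref, not both", name, name)
	case file == "" && ref == "":
		return fmt.Errorf("tls: one of %s_file or %s_ref is required", name, name)
	}
	return nil
}

// validate checks the certificate and the key independently, so a cert on
// disk can be paired with a key held in a secret store.
func (t tlsConfig) validate() error {
	if err := exactlyOne("cert", t.CertFile, t.CertRef); err != nil {
		return err
	}
	return exactlyOne("key", t.KeyFile, t.KeyRef)
}

func main() {
	mixed := tlsConfig{CertFile: "worker.crt", KeyRef: "vault://hpcc/worker-key"}
	fmt.Println(mixed.validate())

	both := tlsConfig{CertFile: "worker.crt", CertRef: "vault://hpcc/worker-crt", KeyFile: "worker.key"}
	fmt.Println(both.validate())
}
```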

type VMConfig

type VMConfig struct {
	Memory         string `toml:"memory"` // e.g. "2GB"
	VCPUs          int32  `toml:"vcpus"`
	IdleTimeout    string `toml:"idle_timeout"`    // e.g. "10m"
	SessionTimeout string `toml:"session_timeout"` // e.g. "8h"
}

func (VMConfig) IdleTimeoutDur

func (v VMConfig) IdleTimeoutDur() (time.Duration, error)

func (VMConfig) MemoryBytes

func (v VMConfig) MemoryBytes() int64

func (VMConfig) SessionTimeoutDur

func (v VMConfig) SessionTimeoutDur() (time.Duration, error)

type Worker

type Worker struct {
	Config Config
	// Images is the local image catalog: digest → *imageEntry. The
	// presence of an entry means "the prepared image for this digest
	// is locally available." Pulls add entries on success; advertised
	// and previously-prepared digests pre-populate at bootstrap.
	Images sync.Map
	// ImageStore prepares per-tenant images for the runtime. nil means
	// "no image store" — the dev-mode dangerous runtime takes that
	// path: ensureImage records every digest as locally-present
	// without I/O, and the eviction loop only drops catalogue
	// entries. Real deployments wire either a cdimage.Store
	// (containerd, Windows under hcsshim) or a rootfs.Store
	// (Linux under raw Firecracker).
	ImageStore image.Store
	Containers sync.Map // containerID -> runtime.Container

	// CertPEM is the serving certificate's PEM bytes. cmd/worker sets
	// this after secret.LoadTLSCertificate resolves cert_file or
	// cert_ref. bootstrap falls back to reading cfg.TLS.CertFile when
	// nil, which keeps tests that construct Worker directly with a
	// cert path on disk working.
	CertPEM []byte

	gen.UnimplementedWorkerServiceServer
	// contains filtered or unexported fields
}

func NewDefaultWorker

func NewDefaultWorker() *Worker

func NewWorker

func NewWorker(cfg Config) (*Worker, error)

func (*Worker) BeginCompile

func (w *Worker) BeginCompile()

BeginCompile / EndCompile are the hooks the Compile RPC handler uses to keep `inflight` in sync; defer EndCompile right after BeginCompile to avoid drift on early returns.
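
The defer pattern can be sketched with a stand-in type. The atomic counter is an assumption about how `inflight` is stored; the point of the example is only that the deferred EndCompile runs on every return path, including early error returns.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// worker is a stand-in holding only the in-flight counter.
type worker struct {
	inflight atomic.Int32
}

func (w *worker) BeginCompile()   { w.inflight.Add(1) }
func (w *worker) EndCompile()     { w.inflight.Add(-1) }
func (w *worker) Inflight() int32 { return w.inflight.Load() }

// compile is a hypothetical handler body. EndCompile is deferred right
// after BeginCompile, so the early return below cannot leak a count.
func (w *worker) compile(src string) error {
	w.BeginCompile()
	defer w.EndCompile()

	if src == "" {
		return errors.New("empty source") // early return, counter still restored
	}
	return nil
}

func main() {
	w := &worker{}
	fmt.Println(w.compile(""), w.Inflight())
	fmt.Println(w.compile("int main(){}"), w.Inflight())
}
```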

func (*Worker) Compile

func (w *Worker) Compile(ctx context.Context, req *gen.CompileRequest) (*gen.CompileResponse, error)

func (*Worker) EndCompile

func (w *Worker) EndCompile()

func (*Worker) FindMissingBlobs

func (w *Worker) FindMissingBlobs(stream gen.WorkerService_FindMissingBlobsServer) error

FindMissingBlobs is the worker's CAS-mode "what do I need?" probe. The client streams the digests it intends to upload; the worker streams back the subset that is not already present locally. The stream is 1:1 bidirectional so the client can pipeline this with UploadBlobs without waiting for the full probe set.

The lookup is a single layer (local-disk Has). There is no in-memory confirmed-set and no S3 probing: source blobs are worker-local-ephemeral by design (docs/plan/cas.md Step 3/4), so there is nothing further to consult. If profiling later shows os.Stat overhead dominating, an LRU cache in front of Has is the place to add it.

func (*Worker) Inflight

func (w *Worker) Inflight() int32

Inflight returns the current count of in-flight Compile RPCs. Exposed for the metrics observable gauge; capacitySnapshot wraps the same counter but also derives the available-vCPU number, which isn't what a gauge wants.

func (*Worker) PooledRuntime

func (w *Worker) PooledRuntime() *runtime.PooledRuntime

PooledRuntime returns the per-tenant container pool wrapping the underlying runtime, or nil if no pool is in use (which happens for the dev-only "really_really_dangerous" handler and any future runtime that doesn't get wrapped in NewPooledRuntime). Exposed for the metrics observable that breaks active container count down by tenant.

func (*Worker) ProbeCompileCache

func (w *Worker) ProbeCompileCache(ctx context.Context, req *gen.CompileProbe) (*gen.ProbeResponse, error)

ProbeCompileCache is the cache-short-circuit for CAS-mode dispatch. Client sends just the manifest digest + args + image; the worker derives the compile cache key (same formula the worker uses when it stores a fresh compile) and probes the compile cache. Hit returns the cached artifact; miss tells the client to proceed with the upload dance. Mirrors Bazel's ActionCache.GetActionResult.

Paranoid-mode invariant. Client sends `args`, not a pre-baked flag-key. The worker re-parses and computes flag-key itself, so the client cannot influence what cache_key the worker probes. A fake manifest_digest produces a wasted probe (miss → client falls through to the upload dance, worker materializes for real and computes the *real* cache_key on the way out); the probe is not a write path and cannot poison cache content.
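
To make the invariant concrete, here is a sketch of a worker-side key derivation. The actual formula is not given on this page, so every detail is an assumption made for illustration: SHA-256 as the hash, length-prefixed fields, and dropping "-o <path>" when computing the flag-key. What the sketch does show is that the key is a function of inputs the worker parses itself, with no client-supplied flag-key.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash"
)

// writeField appends a length-prefixed string so that field boundaries
// cannot be shifted by crafted input.
func writeField(h hash.Hash, s string) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(s)))
	h.Write(n[:])
	h.Write([]byte(s))
}

// flagKey is computed by the worker from the raw args. Skipping the
// output path ("-o <path>") is a hypothetical normalization.
func flagKey(args []string) string {
	h := sha256.New()
	for i := 0; i < len(args); i++ {
		if args[i] == "-o" && i+1 < len(args) {
			i++ // skip the path operand as well
			continue
		}
		writeField(h, args[i])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// cacheKey combines manifest digest, image, and the worker-derived flag-key.
func cacheKey(manifestDigest, image string, args []string) string {
	h := sha256.New()
	writeField(h, manifestDigest)
	writeField(h, image)
	writeField(h, flagKey(args))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := cacheKey("m1", "img", []string{"-O2", "-o", "a.o"})
	b := cacheKey("m1", "img", []string{"-O2", "-o", "b.o"})
	c := cacheKey("m2", "img", []string{"-O2", "-o", "a.o"})
	fmt.Println(a == b, a == c)
}
```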

func (*Worker) Run

func (w *Worker) Run(ctx context.Context) error

Run drives the worker's scheduler-side loop: dial the scheduler, authenticate, register, then heartbeat on a ticker until ctx is cancelled. On RPC failures it re-authenticates rather than retrying the same dead session.

func (*Worker) SchedulerSigningKey

func (w *Worker) SchedulerSigningKey() []byte

SchedulerSigningKey returns the scheduler's task-JWT signing pubkey learned during authentication. Compile RPCs use it to verify the JWT the client presents in gRPC metadata. Returns nil before first auth.

func (*Worker) UploadBlobs

func (w *Worker) UploadBlobs(stream gen.WorkerService_UploadBlobsServer) error

UploadBlobs accepts a client-stream of BlobChunk messages and commits verified blobs to the worker-local source store. Each blob is framed as a header (carrying the client-claimed digest) followed by zero or more data chunks. The worker re-hashes the bytes with BLAKE3 as they arrive; on a hash match the bytes land in sourceStore under the *recomputed* digest (so the client cannot influence the key under which an entry is stored — paranoid-mode invariant). Mismatches, malformed headers, and oversize blobs are reported in UploadResult.rejected_digests without aborting the stream — the client may have other blobs to upload that should still land.

Bytes are buffered in memory per-blob (capped at maxSourceBlobBytes) rather than streamed to disk. Source files in real builds sit well under that cap; if profiling later shows RSS pressure under concurrent large-blob uploads, swap in a streaming write via a future Store.OpenWriter primitive.

No S3 write-through: source blobs are worker-local-ephemeral (docs/plan/cas.md Step 4). The client is the source of truth and re-uploads on cache miss.
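
The verify-then-commit rule can be sketched on in-memory blobs. Two substitutions are made plainly: SHA-256 from the standard library stands in for BLAKE3, which needs a third-party package, and a map stands in for sourceStore. The 1 MiB cap and the helper names are assumptions, and the real RPC consumes framed chunks from a stream.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maxSourceBlobBytes is an assumed cap; the real value is not documented here.
const maxSourceBlobBytes = 1 << 20

// blob pairs the client-claimed digest with the received bytes.
type blob struct {
	claimed string
	data    []byte
}

// digestOf hashes the bytes. SHA-256 is a stand-in for BLAKE3.
func digestOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// upload stores each verified blob under the recomputed digest and collects
// rejected digests instead of aborting, so later blobs still land.
func upload(store map[string][]byte, blobs []blob) (rejected []string) {
	for _, b := range blobs {
		if len(b.data) > maxSourceBlobBytes {
			rejected = append(rejected, b.claimed)
			continue
		}
		got := digestOf(b.data)
		if got != b.claimed {
			rejected = append(rejected, b.claimed)
			continue
		}
		store[got] = b.data // keyed by the worker's own hash, never the claim
	}
	return rejected
}

func main() {
	good := []byte("int main() { return 0; }")
	store := map[string][]byte{}
	rejected := upload(store, []blob{
		{claimed: "not-the-real-digest", data: []byte("tampered")},
		{claimed: digestOf(good), data: good},
	})
	fmt.Println(len(store), rejected)
}
```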

func (*Worker) ValidateToken

func (w *Worker) ValidateToken(req *gen.CompileRequest) error

Directories

Path Synopsis
Package image is the worker's prepared-image abstraction.
cdimage
Package cdimage is the containerd-backed implementation of image.Store.
rootfs
Package rootfs is the Linux raw-Firecracker implementation of image.Store.