Documentation ¶
Index ¶
- Constants
- Variables
- func AlternateMachine(current string) string
- func AppliedConfigChecksumPath(machineName string) string
- func AppliedConfigPath(machineName string) string
- func ComputeChecksum(data []byte) string
- func DiscoverHostDevicePaths() []string
- func ResolveOCIImage(log *slog.Logger, configImage string, nvidiaGPUAvailable bool) string
- func VerifyChecksum(data []byte, checksumPath string) error
- type Containerd
- type DownloadOverrides
- type DownloadSource
- type Kubelet
- type MachineGoalState
- type NodeStart
- type NvidiaHost
- type NvidiaLibDirMount
- type NvidiaLibMapping
- type NvidiaRuntime
- type RootFS
Constants ¶
const (
    // ConfigDir is the host-level configuration directory for unbounded-kube.
    ConfigDir = "/etc/unbounded/kube"

    // AgentConfigDir is the host-level configuration directory for the
    // unbounded-agent. Applied config files are persisted here.
    AgentConfigDir = "/etc/unbounded/agent"

    SystemdNSpawnDir = "/etc/systemd/nspawn"
    SystemdSystemDir = "/etc/systemd/system"

    // DaemonUnit is the systemd unit name for the unbounded-agent daemon.
    DaemonUnit = "unbounded-agent-daemon.service"
)
const (
    NSpawnMachineKube1 = "kube1"
    NSpawnMachineKube2 = "kube2"
)
NSpawn machine names used for alternating in-place upgrades.
- NSpawnMachineKube1 is the initial (default) machine name.
- NSpawnMachineKube2 will be used for the next upgrade cycle.
The pattern alternates between the two so an in-place upgrade can bring up the new machine before tearing down the old one:
kube1 ← initial
kube2 ← after operation 1
kube1 ← after operation 2
…
const (
    ContainerdVersion        = "2.0.4"
    RunCVersion              = "1.1.12"
    CNIPluginVersion         = "1.5.1"
    ContainerdMetricsAddress = "0.0.0.0:10257"
    SandboxImage             = "mcr.microsoft.com/oss/kubernetes/pause:3.9"

    CNIBinDir    = "/opt/cni/bin"
    CNIConfigDir = "/etc/cni/net.d"

    // BinDir is the standard binary directory relative to the machine root.
    // Use filepath.Join(machineDir, BinDir) for host-side rootfs paths, or
    // "/"+BinDir for absolute paths inside the running machine container.
    BinDir = "usr/local/bin"

    // NvidiaHostLibDir is the base directory inside the nspawn container where
    // host NVIDIA library directories are bind-mounted read-only. Each unique
    // host directory gets a numbered subdirectory (e.g. /run/host-nvidia/0/).
    NvidiaHostLibDir           = "/run/host-nvidia"
    NvidiaContainerRuntimePath = "/usr/bin/nvidia-container-runtime"
    NvidiaRuntimeClassName     = "nvidia"
    NvidiaCTKPath              = "/usr/bin/nvidia-ctk"

    // CDISpecDir is the directory where CDI specifications are stored inside the
    // nspawn machine. containerd reads specs from this directory when CDI is enabled.
    CDISpecDir = "/etc/cdi"

    // CDISpecFile is the path to the NVIDIA CDI specification file inside the
    // nspawn machine. This is generated by nvidia-ctk cdi generate.
    CDISpecFile = "/etc/cdi/nvidia.yaml"

    // Default OCI images for the nspawn rootfs when no image is explicitly
    // configured or set via AGENT_OCI_IMAGE.
    DefaultOCIImage      = "ghcr.io/azure/agent-ubuntu2404:v20260427"
    DefaultNvidiaOCImage = "ghcr.io/azure/agent-ubuntu2404-nvidia:v20260427"

    SystemdUnitContainerd   = "containerd.service"
    ContainerdConfigPath    = "/etc/containerd/config.toml"
    ContainerdConfDropInDir = "/etc/containerd/conf.d"

    SystemdUnitKubelet             = "kubelet.service"
    KubeletKubeconfigPath          = "/var/lib/kubelet/kubeconfig"
    KubeletBootstrapKubeconfigPath = "/var/lib/kubelet/bootstrap-kubeconfig"
    KubeletPKIDir                  = "/etc/kubernetes/pki"
    KubeletAPIServerCACertPath     = "/etc/kubernetes/pki/apiserver-client-ca.crt"
    KubeletServiceDropInDir        = "/etc/systemd/system/kubelet.service.d"
    KubeletStaticPodManifestsDir   = "/etc/kubernetes/manifests"
)
Variables ¶
var ErrChecksumMismatch = errors.New("applied config checksum mismatch")
ErrChecksumMismatch is returned when the sidecar checksum does not match the config file content, indicating possible on-disk corruption.
Functions ¶
func AlternateMachine ¶
func AlternateMachine(current string) string

AlternateMachine returns the other machine name in the pair: kube1 -> kube2, kube2 -> kube1.
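For example, starting from the initial machine name and applying the function twice returns to the original (a small sketch, assuming this package is imported):

    current := NSpawnMachineKube1     // "kube1"
    next := AlternateMachine(current) // "kube2"
    again := AlternateMachine(next)   // "kube1": the pair simply alternates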
func AppliedConfigChecksumPath ¶
func AppliedConfigChecksumPath(machineName string) string

AppliedConfigChecksumPath returns the path to the SHA-256 sidecar file for the given nspawn machine's applied config, e.g. /etc/unbounded/agent/kube1-applied-config.json.sha256.
func AppliedConfigPath ¶
func AppliedConfigPath(machineName string) string

AppliedConfigPath returns the path to the applied config file for the given nspawn machine name, e.g. /etc/unbounded/agent/kube1-applied-config.json.
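A path sketch for the default machine, following the examples in the doc comments above:

    cfgPath := AppliedConfigPath(NSpawnMachineKube1)
    // cfgPath == "/etc/unbounded/agent/kube1-applied-config.json"
    sumPath := AppliedConfigChecksumPath(NSpawnMachineKube1)
    // sumPath == "/etc/unbounded/agent/kube1-applied-config.json.sha256"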
func ComputeChecksum ¶
func ComputeChecksum(data []byte) string

ComputeChecksum returns the lowercase hex-encoded SHA-256 digest of data.
func DiscoverHostDevicePaths ¶
func DiscoverHostDevicePaths() []string
DiscoverHostDevicePaths probes the host for device nodes that should be bind-mounted into the nspawn container and returns their paths in a stable order so that repeated calls produce the same config output.
func ResolveOCIImage ¶
func ResolveOCIImage(log *slog.Logger, configImage string, nvidiaGPUAvailable bool) string

ResolveOCIImage determines the OCI image to use for the nspawn rootfs.
Priority (highest to lowest):
- configImage from the agent config
- AGENT_DISABLE_OCI_IMAGE env var (truthy value disables OCI, returns "")
- AGENT_OCI_IMAGE env var
- Built-in default selected by GPU presence
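A minimal call sketch (the empty configImage and the false GPU flag are illustrative; the environment variables listed above are read by the function itself, and slog is the standard log/slog package):

    image := ResolveOCIImage(slog.Default(), "", false)
    // With no config image and no env overrides this falls through to
    // DefaultOCIImage; a truthy AGENT_DISABLE_OCI_IMAGE would make it return "".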
func VerifyChecksum ¶
func VerifyChecksum(data []byte, checksumPath string) error

VerifyChecksum compares the SHA-256 digest of data against the content of the sidecar checksum file at checksumPath.
Integrity assumptions:
- Each file (config and sidecar) is written atomically via renameio, so an individual file is never half-written.
- A missing sidecar is not an error: the config may have been written by an older agent version that did not produce checksums, or the sidecar write may not have completed before a crash. In this case VerifyChecksum returns nil so the caller can proceed.
- A present sidecar whose digest does not match the config content indicates on-disk corruption (e.g. bitflip). This returns ErrChecksumMismatch.
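A read-path sketch tying these rules together (error handling abbreviated; os and errors are standard-library imports):

    data, err := os.ReadFile(AppliedConfigPath(NSpawnMachineKube1))
    if err != nil {
        // applied config missing or unreadable
    }
    if err := VerifyChecksum(data, AppliedConfigChecksumPath(NSpawnMachineKube1)); err != nil {
        if errors.Is(err, ErrChecksumMismatch) {
            // sidecar present but digest differs: treat the applied config as corrupt
        }
    }
    // A nil error covers both a matching digest and a missing sidecar
    // (older agent version or interrupted write), per the assumptions above.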
Types ¶
type Containerd ¶
type Containerd struct {
SandboxImage string
ContainerdBinPath string
RuncBinaryPath string
CNIBinDir string
CNIConfDir string
MetricsAddress string
NvidiaRuntime NvidiaRuntime
}
Containerd describes the containerd configuration goal state.
func ResolveContainerd ¶
func ResolveContainerd() Containerd
ResolveContainerd returns the containerd configuration goal state.
type DownloadOverrides ¶ added in v0.1.3
type DownloadOverrides struct {
// Kubernetes overrides the source for kubelet/kubectl/kube-proxy.
Kubernetes *DownloadSource
// Containerd overrides the source for the containerd release tarball.
Containerd *DownloadSource
// Runc overrides the source for the runc binary.
Runc *DownloadSource
// CNI overrides the source for the CNI plugins release tarball.
CNI *DownloadSource
// Crictl overrides the source for the crictl release tarball.
Crictl *DownloadSource
}
DownloadOverrides optionally overrides the upstream download sources for binaries the agent installs into the nspawn rootfs. Each field is optional; nil or zero-value fields fall back to the compiled-in defaults.
type DownloadSource ¶ added in v0.1.3
DownloadSource configures the override for a single binary download. BaseURL replaces the upstream host + path prefix; URL replaces the entire URL template. When both are unset the default template is used. Version overrides the version that would otherwise be derived from the cluster Kubernetes version or compiled-in defaults.
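A hedged override sketch, assuming the BaseURL and Version fields described above are plain strings (the struct body is not reproduced on this page) and using a purely hypothetical mirror URL:

    overrides := &DownloadOverrides{
        Containerd: &DownloadSource{
            BaseURL: "https://mirror.example.internal/containerd", // hypothetical mirror
            Version: "2.0.4",
        },
    }
    // Fields left nil (Kubernetes, Runc, CNI, Crictl) keep the compiled-in defaults.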
type Kubelet ¶
type Kubelet struct {
// KubeletBinPath is the absolute path to the kubelet binary inside the
// machine container (e.g. /usr/local/bin/kubelet).
KubeletBinPath string
// KubeletAuthInfo holds the authentication configuration for the
// kubelet. Exactly one of BootstrapToken or ExecCredential must be set.
config.KubeletAuthInfo
// APIServer is the HTTPS endpoint of the Kubernetes API server
// (e.g. "https://my-cluster.hcp.eastus.azmk8s.io:443").
APIServer string
// CACertData is the PEM-encoded CA certificate of the API server.
CACertData []byte
// ClusterDNS is the ClusterIP of the kube-dns service.
ClusterDNS string
// NodeLabels are key=value labels applied to the node at registration.
NodeLabels map[string]string
// RegisterWithTaints are taints applied to the node at registration.
// Each entry uses the kubelet format: "key=value:effect"
// (e.g. "dedicated=gpu:NoSchedule").
RegisterWithTaints []string
}
Kubelet defines the goal state for the kubelet configuration.
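An illustrative literal with placeholder values; the embedded config.KubeletAuthInfo is omitted here because its fields are documented in the config package, and the node label is hypothetical:

    kubelet := Kubelet{
        KubeletBinPath:     "/usr/local/bin/kubelet",
        APIServer:          "https://my-cluster.hcp.eastus.azmk8s.io:443",
        CACertData:         caPEM, // PEM-encoded CA bytes loaded elsewhere
        ClusterDNS:         "10.0.0.10", // placeholder kube-dns ClusterIP
        NodeLabels:         map[string]string{"example.com/role": "gpu"},
        RegisterWithTaints: []string{"dedicated=gpu:NoSchedule"},
    }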
type MachineGoalState ¶
MachineGoalState holds the fully resolved goal state for provisioning and starting an nspawn machine. Callers use RootFS for the rootfs provisioning phases and NodeStart for the service configuration and boot phases.
func ResolveMachine ¶
func ResolveMachine(log *slog.Logger, cfg *config.AgentConfig, machineName string, downloads *DownloadOverrides) (*MachineGoalState, error)
ResolveMachine probes the host (kernel version, hostname, GPU hardware) and resolves the complete goal state for the named nspawn machine from an agent config.
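A call sketch, assuming cfg is an already-loaded *config.AgentConfig and that MachineGoalState exposes the RootFS and NodeStart fields its doc comment refers to:

    gs, err := ResolveMachine(slog.Default(), cfg, NSpawnMachineKube1, nil)
    if err != nil {
        // host probing or goal-state resolution failed
    }
    // gs.RootFS drives the rootfs provisioning phases; gs.NodeStart drives
    // service configuration and boot. Passing nil for downloads keeps the
    // compiled-in download defaults.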
type NodeStart ¶
type NodeStart struct {
// MachineName is the local systemd-nspawn machine name (e.g. "kube1").
// Used by machinectl commands and nspawn service management.
MachineName string
// KubeMachineName is the Kubernetes Machine CR name (e.g. "agent-e2e").
// This is the name that appears in the cluster and may differ from
// the local nspawn machine name.
KubeMachineName string
MachineDir string // e.g. /var/lib/machines/node
Containerd Containerd
Kubelet Kubelet
// Nvidia holds NVIDIA GPU state discovered on the host. After the nspawn
// boots, the setup-nvidia-libraries task uses LibMappings to create
// symlinks inside the container's library path pointing into the
// bind-mounted /run/host-nvidia/ directories.
Nvidia NvidiaHost
}
type NvidiaHost ¶
type NvidiaHost struct {
// GPUDevicePaths lists NVIDIA GPU device paths discovered on the host
// (e.g. /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-caps/*, /dev/dri/*).
// When non-empty the nspawn configuration will bind-mount these devices
// and grant the container cgroup access to them.
GPUDevicePaths []string
// ContainerLibDir is the architecture-specific multiarch library
// directory inside the nspawn container (e.g. /usr/lib/x86_64-linux-gnu
// on amd64, /usr/lib/aarch64-linux-gnu on arm64). Symlinks to
// bind-mounted host NVIDIA libraries are created here.
ContainerLibDir string
// LibMappings contains NVIDIA userspace libraries discovered on the
// host via ldconfig -p. These are used to create symlinks inside the
// nspawn container so that the host's NVIDIA driver libraries are
// accessible.
LibMappings []NvidiaLibMapping
// LibDirMounts lists unique host directories containing NVIDIA libraries
// to be bind-mounted read-only into the nspawn container at
// /run/host-nvidia/<index>/. After boot, symlinks from the container's
// standard library path are created by the setup-nvidia-libraries task.
LibDirMounts []NvidiaLibDirMount
}
NvidiaHost aggregates all NVIDIA-related host state discovered at agent startup: GPU device paths, driver library mappings, and the derived bind-mount specifications for the nspawn container.
func ResolveNvidiaHost ¶
func ResolveNvidiaHost(arch string) (NvidiaHost, error)
ResolveNvidiaHost probes the host for NVIDIA GPU devices and driver libraries, returning a fully populated NvidiaHost. The arch parameter is a GOARCH value (e.g. "amd64", "arm64") used to select the correct multiarch library path and ldconfig filter. Returns an error for unsupported architectures. On a non-GPU host the returned struct has all nil/empty fields (except ContainerLibDir).
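A probe sketch using the running architecture (runtime.GOARCH from the standard library):

    nv, err := ResolveNvidiaHost(runtime.GOARCH)
    if err != nil {
        // unsupported architecture
    }
    if len(nv.GPUDevicePaths) == 0 {
        // non-GPU host: only ContainerLibDir is populated
    }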
type NvidiaLibDirMount ¶
type NvidiaLibDirMount struct {
Index int // mount index, used by symlink creation to map libs to their container path
HostDir string // e.g. "/usr/lib/x86_64-linux-gnu"
ContainerDir string // e.g. "/run/host-nvidia/0"
}
NvidiaLibDirMount represents a read-only bind mount of a host directory containing NVIDIA libraries into the nspawn container.
type NvidiaLibMapping ¶
type NvidiaLibMapping struct {
HostPath string // e.g. "/usr/lib/x86_64-linux-gnu/libcuda.so.580.126.09"
ContainerPath string // e.g. "/run/host-nvidia/0/libcuda.so.580.126.09" — bind-mount source
LinkPath string // e.g. "/usr/lib/x86_64-linux-gnu/libcuda.so.580.126.09" — symlink in container
}
NvidiaLibMapping maps a host NVIDIA library to its corresponding paths inside the nspawn container.
type NvidiaRuntime ¶
type NvidiaRuntime struct {
Enabled bool
RuntimeClassName string
RuntimePath string
DisableSetAsDefaultRuntime bool
}
NvidiaRuntime describes the NVIDIA container runtime configuration for containerd. When Enabled is true the runtime is registered as a handler so that GPU workloads can be scheduled.
type RootFS ¶
type RootFS struct {
MachineDir string
NSpawnConfigFile string // e.g. /etc/systemd/nspawn/node.nspawn
ServiceOverrideFile string // e.g. /etc/systemd/system/systemd-nspawn@node.service.d/override.conf
HostArch string
HostKernel string // running kernel version from uname -r, e.g. "6.8.0-45-generic"
Hostname string // host hostname, written into the rootfs so the nspawn container inherits it
ContainerdVersion string
RunCVersion string
CNIPluginVersion string
KubernetesVersion string
// Downloads optionally overrides the download sources for binaries
// the agent installs into the nspawn rootfs (kubelet, containerd,
// runc, CNI plugins, crictl). Nil means upstream defaults apply.
Downloads *DownloadOverrides
// OCIImage is the fully-qualified OCI image reference (e.g.
// "ghcr.io/org/repo:tag") used to bootstrap the machine rootfs.
// The image must use OCI media types and include a platform manifest
// matching the host architecture.
OCIImage string
// Nvidia holds NVIDIA GPU state discovered on the host: device paths,
// driver library mappings, and bind-mount specifications for the nspawn
// container. Empty on non-GPU hosts.
Nvidia NvidiaHost
// HostDevicePaths lists host device node paths to be bind-mounted into
// the nspawn container (e.g. ["/dev/kvm"]). Device nodes are discovered
// at agent startup. Empty on hosts without any supported devices.
HostDevicePaths []string
}
RootFS defines the goal state of the machine root filesystem. This goal state produces a rootfs, rooted at the directory MachineDir points to, that is ready to run a Kubernetes node via systemd-nspawn.