ctr

package
v0.4.0

This package is not in the latest version of its module.
Published: May 11, 2026 License: Apache-2.0 Imports: 37 Imported by: 0

Documentation

Constants

const (
	// AttachableBinaryPath is where the static sbsh binary is bind-mounted
	// read-only inside the container.
	AttachableBinaryPath = "/.kukeon/bin/sbsh"

	// AttachableTTYDir is where the per-container tty directory is
	// bind-mounted inside the container. sbsh creates and owns the socket,
	// capture, and log files inside this directory; because it is a
	// directory bind mount (not a file mount), sbsh's unlink-and-recreate
	// of the socket inode stays host-visible.
	AttachableTTYDir = "/run/kukeon/tty"

	// AttachableSocketPath is the in-container path of the sbsh terminal
	// socket. sbsh listens here; the host peer is the bind-mount source
	// directory's `socket` entry, which `kuke attach` connects to.
	AttachableSocketPath = AttachableTTYDir + "/socket"

	// AttachableCapturePath is the in-container path of the sbsh capture
	// file. Surfacing it to `kuke logs` is a follow-up.
	AttachableCapturePath = AttachableTTYDir + "/capture"

	// AttachableLogfilePath is the in-container path of the sbsh terminal
	// log file. Surfacing it to `kuke logs` is a follow-up.
	AttachableLogfilePath = AttachableTTYDir + "/log"

	// AttachableSubcommand is the sbsh entrypoint subcommand used to wrap
	// the workload's process. Hard-coded for the foundation slice; the
	// resolver in #67 will not change it.
	AttachableSubcommand = "terminal"

	// AttachableProfileFile is the basename of the sbsh TerminalProfile YAML
	// the daemon writes into the per-container tty directory when a
	// container declares a tty block. The host pre-writes the file in the
	// bind-mount source so it appears under AttachableTTYDir inside the
	// container, where sbsh resolves it via --profiles-dir + --profile.
	AttachableProfileFile = "profile.yaml"

	// AttachableProfileName is the profile.metadata.name written into the
	// generated TerminalProfile and passed to `sbsh terminal --profile`.
	AttachableProfileName = "kukeon"

	// AttachableSocketMode is the octal mode passed to `sbsh terminal
	// --socket-mode` when SocketGID is configured. 0660 = rw for owner
	// (the container's runtime uid) + rw for group (the kukeon group), no
	// world. Combined with `--socket-gid <kukeonGID>` this lets a non-root
	// member of the kukeon group on the host `connect()` to the per-
	// container sbsh control socket. Linux requires write permission on a
	// socket inode to connect — group-readable alone is not enough.
	//
	// Available since sbsh v0.10.0; older sbsh binaries reject the flag and
	// the wrapper omits it when the kukeon group GID is unset.
	AttachableSocketMode = "0660"

	// AttachableCaptureMode and AttachableLogFileMode are the octal modes
	// passed to `sbsh terminal --capture-mode` / `--log-file-mode` when the
	// corresponding GID is configured. 0640 = rw for owner (the container
	// uid) + r for group (the kukeon group), no world. 0640 — not 0660 —
	// because the host-side group only needs to tail the transcript via
	// `kuke log`; the in-container sbsh process does the writes.
	//
	// Available since sbsh v0.10.2; older sbsh binaries reject the flags
	// and the wrapper omits them when the corresponding GID is unset.
	AttachableCaptureMode = "0640"
	AttachableLogFileMode = "0640"
)

Reserved in-container paths that the Attachable wrapper claims. Documented as such in pkg/api/model/v1beta1/container.go. The binary path is configurable in spirit (see #67) but fixed for this slice.

const (
	// DefaultRootContainerImage is the image used when none is provided.
	DefaultRootContainerImage = "docker.io/library/busybox:latest"
)

const DefaultSecretsStagingDir = "/run/kukeon/secrets"

DefaultSecretsStagingDir is the host directory the daemon uses to stage file-mounted secrets before bind-mounting them into containers. The directory lives under /run so contents are ephemeral across reboots on typical deployments.

const SbshBinaryName = "sbsh"

SbshBinaryName is the basename of the static sbsh binary inside each per-arch cache directory.

const SbshCacheSubdir = "cache/sbsh"

SbshCacheSubdir is the host-side directory under the run path that holds per-arch sbsh binaries. The full layout is:

<runPath>/cache/sbsh/<arch>/sbsh

The foundation slice (#57) ships a stub: the daemon expects a single host-arch binary placed manually. The multi-arch resolver lands in #67.

Variables

var KukeonKnownSnapshotters = []string{
	"overlayfs",
	"native",
	"btrfs",
	"zfs",
	"devmapper",
	"blockfile",
}

KukeonKnownSnapshotters is the list of containerd snapshotters that CleanupNamespaceResources walks when no snapshotter is specified. It stays in sync with the set of snapshotters supported by the kukeond image (overlayfs in production; native is always present as containerd's fallback). The snapshotters are listed in the order in which they are drained: overlayfs comes first because it is the only one populated on a real install, but the others are tried as well, so that a host that experimented with btrfs, zfs, etc. still gets a clean uninstall instead of a "namespace not empty" error surfacing the day after.

Listed snapshotters that are not registered in the daemon return errors from snapshotService.Walk; cleanupSnapshotsFor handles those at WARN and keeps walking the rest.
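The warn-and-continue behaviour described above can be sketched as follows. This is only an illustration: drainAll, errNotRegistered, and the walk callback are hypothetical names, and the real cleanupSnapshotsFor and snapshotService.Walk signatures are not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// knownSnapshotters mirrors KukeonKnownSnapshotters from the listing above.
var knownSnapshotters = []string{"overlayfs", "native", "btrfs", "zfs", "devmapper", "blockfile"}

// errNotRegistered is a hypothetical error standing in for whatever the
// daemon returns when a snapshotter plugin is not loaded.
var errNotRegistered = errors.New("snapshotter not registered")

// drainAll (hypothetical) calls walk for every snapshotter in order.
// A failure is recorded as a warning and the loop moves on to the next one.
func drainAll(snapshotters []string, walk func(name string) error) (drained, warnings []string) {
	for _, name := range snapshotters {
		if err := walk(name); err != nil {
			warnings = append(warnings, fmt.Sprintf("WARN %s: %v", name, err))
			continue
		}
		drained = append(drained, name)
	}
	return drained, warnings
}

func main() {
	// Assumption for the demo: only overlayfs and native are registered.
	registered := map[string]bool{"overlayfs": true, "native": true}
	walk := func(name string) error {
		if !registered[name] {
			return errNotRegistered
		}
		return nil
	}
	drained, warnings := drainAll(knownSnapshotters, walk)
	fmt.Println("drained:", drained)
	fmt.Println("warnings:", len(warnings))
}
```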

Functions

func ConvertContainerdStatusToContainerState

func ConvertContainerdStatusToContainerState(status containerd.Status) intmodel.ContainerState

ConvertContainerdStatusToContainerState converts a containerd task status to internal ContainerState.

func DefaultRootContainerSpec

func DefaultRootContainerSpec(
	containerdID,
	cellID,
	realmID,
	spaceID,
	stackID,
	cniConfigPath string,
) intmodel.ContainerSpec

DefaultRootContainerSpec returns a minimal ContainerSpec suitable for keeping the root container alive while other workload containers are managed. containerdID is the hierarchical ID used for containerd operations. The ID field will be set to "root" (base name).

func NormalizeImageReference

func NormalizeImageReference(image string) string

NormalizeImageReference normalizes an image reference to a fully qualified format. Examples:

  • "debian:latest" -> "docker.io/library/debian:latest"
  • "alpine" -> "docker.io/library/alpine:latest"
  • "user/image:tag" -> "docker.io/user/image:tag"
  • "docker.io/library/debian:latest" -> "docker.io/library/debian:latest" (unchanged)
  • "registry.example.com/image:tag" -> "registry.example.com/image:tag" (unchanged)
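A minimal sketch that reproduces the five examples above. It is not the package's implementation: normalizeImageReference is a hypothetical re-creation, and the rule used to detect a registry host (the first path component contains a dot or a colon, or is "localhost") is an assumption borrowed from common Docker reference handling.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeImageReference (hypothetical) expands short image names to a
// fully qualified reference, matching the documented examples.
func normalizeImageReference(image string) string {
	name := image
	first, _, hasSlash := strings.Cut(image, "/")
	switch {
	case !hasSlash:
		// Bare name such as "alpine": official Docker Hub library image.
		name = "docker.io/library/" + image
	case !strings.ContainsAny(first, ".:") && first != "localhost":
		// "user/image": no registry host, so assume Docker Hub.
		name = "docker.io/" + image
	}
	// Append the default tag when the last component has no tag or digest.
	last := name[strings.LastIndex(name, "/")+1:]
	if !strings.ContainsAny(last, ":@") {
		name += ":latest"
	}
	return name
}

func main() {
	for _, ref := range []string{"debian:latest", "alpine", "user/image:tag"} {
		fmt.Println(ref, "->", normalizeImageReference(ref))
	}
}
```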

func SbshCachePath added in v0.2.0

func SbshCachePath(baseRunPath, arch string) string

SbshCachePath returns the host path of the sbsh binary for the given architecture under the configured run path. Architecture is the GOARCH-style string ("amd64", "arm64") that comes from the image's ocispec.Image.Architecture, not the host's runtime.GOARCH: the cache must match the *image* it will be injected into, since the image and the binary share the in-container ELF interpreter.
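The documented layout, <runPath>/cache/sbsh/<arch>/sbsh, can be sketched with a plain path join. The lower-case names are local stand-ins for SbshCacheSubdir, SbshBinaryName, and SbshCachePath, and "/opt/kukeon" is only an assumed run path for the demo.

```go
package main

import (
	"fmt"
	"path/filepath"
)

const (
	sbshCacheSubdir = "cache/sbsh" // mirrors SbshCacheSubdir
	sbshBinaryName  = "sbsh"       // mirrors SbshBinaryName
)

// sbshCachePath (hypothetical) joins the run path, the cache subdirectory,
// the image architecture, and the binary name.
func sbshCachePath(baseRunPath, arch string) string {
	return filepath.Join(baseRunPath, sbshCacheSubdir, arch, sbshBinaryName)
}

func main() {
	// arch should come from the image config, not from runtime.GOARCH.
	fmt.Println(sbshCachePath("/opt/kukeon", "arm64"))
	// On Linux this prints /opt/kukeon/cache/sbsh/arm64/sbsh
}
```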

Types

type AttachableInjection added in v0.2.0

type AttachableInjection struct {
	// SbshBinaryPath is the host path of the static sbsh binary that will be
	// bind-mounted RO at AttachableBinaryPath inside the container.
	SbshBinaryPath string

	// HostTTYDir is the host path of the per-container tty directory that
	// will be bind-mounted at AttachableTTYDir inside the container. The
	// host-visible socket that `kuke attach` (#66) connects to is the
	// `socket` entry inside this directory.
	HostTTYDir string

	// UseProfile, when true, instructs the wrapper to append
	// `--profiles-dir AttachableTTYDir --profile AttachableProfileName` to
	// the sbsh terminal invocation. The runner is responsible for writing
	// the profile YAML to <HostTTYDir>/AttachableProfileFile before the
	// container starts; the wrapper itself never touches the filesystem.
	UseProfile bool

	// SocketGID, when non-zero, is the numeric GID of the kukeon system
	// group on the host. The wrapper emits `sbsh terminal --socket-mode
	// AttachableSocketMode --socket-gid <SocketGID>` so the per-container
	// control socket is created with mode 0660 owned by the kukeon group,
	// matching the group-traversal layout on the parent tty/ directory and
	// `/opt/kukeon`. Zero (the default) preserves sbsh's hard-coded 0o600
	// owner-only behavior — the legacy contract for callers that have no
	// kukeon group configured. Requires sbsh v0.10.0 or later inside the
	// container; the staged binary at /.kukeon/bin/sbsh must support the
	// `--socket-mode` and `--socket-gid` flags.
	SocketGID int

	// CaptureGID and LogFileGID, when non-zero, are the numeric GIDs of
	// the kukeon system group on the host. The wrapper emits `sbsh terminal
	// --capture-mode AttachableCaptureMode --capture-gid <CaptureGID>` and
	// `--log-file-mode AttachableLogFileMode --log-file-gid <LogFileGID>`
	// so the per-container capture transcript and log file are created
	// with mode 0640 owned by the kukeon group, matching the socket and
	// parent tty/ layout — without these flags sbsh's default 0o600
	// owner-only files are unreadable to non-root kukeon-group operators
	// invoking `kuke log` from the host.
	//
	// In practice both will be the same GID (the kukeon group GID) for
	// now, but separate fields keep the API parallel to SocketGID and
	// leave room for divergent group choices later. Zero (the default)
	// preserves sbsh's hard-coded 0o600 owner-only behavior. Requires
	// sbsh v0.10.2 or later inside the container; the staged binary at
	// /.kukeon/bin/sbsh must support the `--capture-mode`, `--capture-gid`,
	// `--log-file-mode`, and `--log-file-gid` flags.
	CaptureGID int
	LogFileGID int
}

AttachableInjection carries the host-side paths needed to wrap a container's OCI spec so it runs under sbsh. The caller (the daemon) computes both paths (SbshBinaryPath and HostTTYDir) from the cell/container identity and the configured run path. Both path fields are required when Attachable=true; an empty struct disables injection.
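As a rough illustration of how the optional fields translate into sbsh flags, the sketch below derives a flag list from a trimmed copy of the struct. The flag names and mode values are the ones quoted in the field comments; the injection type, the sbshFlags helper, the flag order, and the GID 1500 are assumptions, and the rest of the wrapped command line is not shown on this page.

```go
package main

import (
	"fmt"
	"strconv"
)

// injection is a trimmed, hypothetical copy of AttachableInjection.
type injection struct {
	UseProfile bool
	SocketGID  int
	CaptureGID int
	LogFileGID int
}

// sbshFlags (hypothetical) returns the optional `sbsh terminal` flags.
// A zero GID emits nothing, which keeps sbsh's owner-only defaults.
func sbshFlags(inj injection) []string {
	var flags []string
	if inj.UseProfile {
		flags = append(flags, "--profiles-dir", "/run/kukeon/tty", "--profile", "kukeon")
	}
	if inj.SocketGID != 0 {
		flags = append(flags, "--socket-mode", "0660", "--socket-gid", strconv.Itoa(inj.SocketGID))
	}
	if inj.CaptureGID != 0 {
		flags = append(flags, "--capture-mode", "0640", "--capture-gid", strconv.Itoa(inj.CaptureGID))
	}
	if inj.LogFileGID != 0 {
		flags = append(flags, "--log-file-mode", "0640", "--log-file-gid", strconv.Itoa(inj.LogFileGID))
	}
	return flags
}

func main() {
	// 1500 is an arbitrary example GID for the kukeon group.
	fmt.Println(sbshFlags(injection{UseProfile: true, SocketGID: 1500}))
}
```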

type BuildOption added in v0.2.0

type BuildOption func(*buildOpts)

BuildOption customizes BuildContainerSpec without changing its return type. Used for caller-provided values that don't live on the model spec — today just the host-side paths required when ContainerSpec.Attachable is true.

func WithAttachableInjection added in v0.2.0

func WithAttachableInjection(inj AttachableInjection) BuildOption

WithAttachableInjection configures the host-side paths used when wrapping an Attachable container. Has no effect on a spec where Attachable is false; in that case the option is silently ignored so callers can pass it unconditionally.
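BuildOption follows Go's functional-options pattern. The sketch below shows the shape of that pattern and the "silently ignored when Attachable is false" behaviour; all types and the build function are simplified, hypothetical stand-ins, and the host paths are invented for the demo.

```go
package main

import "fmt"

// Hypothetical, trimmed stand-ins for the package's types.
type injection struct{ SbshBinaryPath, HostTTYDir string }
type buildOpts struct{ inj injection }
type buildOption func(*buildOpts)

// withInjection records the host-side paths on the options struct.
func withInjection(inj injection) buildOption {
	return func(o *buildOpts) { o.inj = inj }
}

// build applies every option, then only uses the injection when the
// container is attachable, so callers can pass the option unconditionally.
func build(attachable bool, options ...buildOption) []string {
	var o buildOpts
	for _, opt := range options {
		opt(&o)
	}
	if !attachable {
		return nil
	}
	return []string{
		o.inj.SbshBinaryPath + " -> /.kukeon/bin/sbsh (ro)",
		o.inj.HostTTYDir + " -> /run/kukeon/tty",
	}
}

func main() {
	inj := injection{SbshBinaryPath: "/host/cache/sbsh", HostTTYDir: "/host/tty"}
	fmt.Println(build(false, withInjection(inj))) // no mounts added
	fmt.Println(build(true, withInjection(inj)))
}
```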

type CPUResources

type CPUResources struct {
	Weight *uint64
	Quota  *int64
	Period *uint64
	Cpus   string
	Mems   string
}

CPUResources maps to cpu*, cpuset* controllers.
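For orientation, Quota and Period correspond to the two values in the cgroup v2 cpu.max file ("<quota> <period>", with "max" meaning unlimited). The sketch below renders that string; cpuMax is a hypothetical helper, and the package itself most likely delegates the actual write to the cgroup2 manager rather than formatting it by hand.

```go
package main

import (
	"fmt"
	"strconv"
)

// cpuMax (hypothetical) renders the cgroup v2 cpu.max value.
// A nil quota means "no limit"; a nil period falls back to the kernel
// default of 100000 microseconds.
func cpuMax(quota *int64, period *uint64) string {
	q := "max"
	if quota != nil && *quota > 0 {
		q = strconv.FormatInt(*quota, 10)
	}
	p := uint64(100000)
	if period != nil && *period > 0 {
		p = *period
	}
	return q + " " + strconv.FormatUint(p, 10)
}

func main() {
	quota, period := int64(50000), uint64(100000)
	fmt.Println(cpuMax(&quota, &period)) // half a CPU: "50000 100000"
	fmt.Println(cpuMax(nil, nil))        // unlimited: "max 100000"
}
```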

type CgroupResources

type CgroupResources struct {
	CPU    *CPUResources
	Memory *MemoryResources
	IO     *IOResources
}

CgroupResources represents the subset of controllers we expose.

type CgroupSpec

type CgroupSpec struct {
	// Group is the target cgroup path, e.g. /kukeon/workloads/runner.
	Group string
	// Mountpoint overrides the default cgroup mount (/sys/fs/cgroup) when non-empty.
	Mountpoint string
	// Resources defines the controller knobs that should be configured for the cgroup.
	Resources CgroupResources
}

CgroupSpec describes how to create a new cgroup.

func DefaultCellSpec

func DefaultCellSpec(cell intmodel.Cell) CgroupSpec

func DefaultRealmSpec

func DefaultRealmSpec(realm intmodel.Realm) CgroupSpec

func DefaultSpaceSpec

func DefaultSpaceSpec(space intmodel.Space) CgroupSpec

func DefaultStackSpec

func DefaultStackSpec(stack intmodel.Stack) CgroupSpec

type Client

type Client interface {
	Connect() error
	Close() error

	CreateNamespace(namespace string) error
	DeleteNamespace(namespace string) error
	ListNamespaces() ([]string, error)
	GetNamespace(namespace string) (string, error)
	ExistsNamespace(namespace string) (bool, error)
	CleanupNamespaceResources(namespace, snapshotter string) error

	GetCgroupMountpoint() string
	GetCurrentCgroupPath() (string, error)
	CgroupPath(group, mountpoint string) (string, error)
	NewCgroup(spec CgroupSpec) (*cgroup2.Manager, error)
	LoadCgroup(group string, mountpoint string) (*cgroup2.Manager, error)
	DeleteCgroup(group, mountpoint string) error
	// EnsureSubtreeControllers writes "+<ctrl>" to the named group's own
	// cgroup.subtree_control AND to every ancestor up to the unified cgroup
	// mount, so the group's children inherit the controllers. The level-
	// agnostic primitive used by every realm/space/stack ensure path (issue
	// #327) and by the cell wrappers below (issues #312, #314). Filters the
	// requested set against what the host root advertises and returns the
	// effective set actually written. Idempotent — re-running on an already-
	// delegated subtree is a no-op.
	EnsureSubtreeControllers(group, mountpoint string, controllers []string) ([]string, error)
	// EnableCellSubtreeControllers enables the named cgroup-v2 controllers in
	// the cell cgroup's own subtree_control AND in every ancestor's
	// subtree_control up to the unified cgroup mount, so child cgroups (the
	// per-container task cgroups runc creates under Linux.CgroupsPath)
	// inherit the controllers and cell-level resource accounting / limits
	// become effective. Returns the effective controller set actually
	// written (the requested set filtered against the host root's
	// cgroup.controllers) so the runner can persist it on
	// CellStatus.SubtreeControllers (issue #328). Thin wrapper around
	// EnsureSubtreeControllers kept for the cell call sites' readability.
	// Issue #312.
	EnableCellSubtreeControllers(group, mountpoint string, controllers []string) ([]string, error)
	// EnableCellAllSubtreeControllers is the cell/profile=NestedCgroupRuntime
	// counterpart: it delegates the full host-available cgroup-v2 controller
	// set on the cell's subtree_control (and every ancestor's), rather than
	// the kukeon resource subset. Returns the effective controller set
	// actually written so the runner can persist it on
	// CellStatus.SubtreeControllers (issue #328). Used by cells that host
	// an inner cgroup runtime (an embedded containerd or systemd) which
	// needs to in turn delegate arbitrary controllers to its own children.
	// Issue #314.
	EnableCellAllSubtreeControllers(group, mountpoint string) ([]string, error)
	// RelocateProcessesToLeaf drains every PID currently in <group>/cgroup.procs
	// into a freshly-mkdir'd leaf cgroup at <group>/<leaf>. Used to satisfy
	// cgroup-v2's no-internal-process rule (issue #336): subtree_control
	// widening for non-thread-aware controllers (memory, io, ...) is rejected
	// by the kernel when the target cgroup hosts processes directly. The
	// leaf inherits the parent's controllers via the parent's subtree_control,
	// so resource accounting at <group> still applies — the PIDs just live
	// one level deeper. Idempotent: re-running on an already-drained group
	// is a no-op.
	RelocateProcessesToLeaf(group, mountpoint, leaf string) error
	CreateContainerFromSpec(namespace string, spec intmodel.ContainerSpec, creds []RegistryCredentials, opts ...BuildOption) (containerd.Container, error)

	CreateContainer(namespace string, spec ContainerSpec, creds []RegistryCredentials) (containerd.Container, error)
	GetContainer(namespace, id string) (containerd.Container, error)
	ListContainers(namespace string, filters ...string) ([]containerd.Container, error)
	ExistsContainer(namespace, id string) (bool, error)
	DeleteContainer(namespace, id string, opts ContainerDeleteOptions) error
	StartContainer(namespace string, spec ContainerSpec, taskSpec TaskSpec) (containerd.Task, error)
	StopContainer(namespace, id string, opts StopContainerOptions) (*containerd.ExitStatus, error)

	TaskStatus(namespace, id string) (containerd.Status, error)
	TaskMetrics(namespace, id string) (*apitypes.Metric, error)

	ResolveSbshCachePath(namespace, imageRef, baseRunPath string, creds []RegistryCredentials) (string, error)

	// ContainerProcessUID returns the resolved process.User.UID from the
	// given container's OCI runtime spec. Used after CreateContainerFromSpec
	// to chown the host-side per-container Attachable tty directory to the
	// runtime uid the container will execute as — which can be non-root
	// when the image carries a USER directive (or the cell profile sets
	// container.user). Without this, sbsh inside the container fails to
	// create its socket/log/capture files in the bind-mounted dir.
	ContainerProcessUID(namespace string, container containerd.Container) (uint32, error)

	// LoadImage imports an OCI/docker image tarball into the specified
	// containerd namespace and returns the names of the imported images.
	LoadImage(namespace string, reader io.Reader) ([]string, error)

	// ListImages enumerates images in the specified containerd namespace.
	ListImages(namespace string) ([]ImageInfo, error)

	// GetImage returns metadata for the named image ref in the specified
	// containerd namespace. Returns errdefs.ErrImageNotFound if
	// the ref is absent.
	GetImage(namespace, ref string) (ImageInfo, error)

	// DeleteImage removes the named image ref from the specified
	// containerd namespace. Returns errdefs.ErrImageNotFound if the ref
	// is absent so callers can distinguish missing from operational
	// failures.
	DeleteImage(namespace, ref string) error
}
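The EnsureSubtreeControllers comment describes three steps: filter the requested controllers against what the host root advertises, build a "+<ctrl>" payload, and write it to the group's cgroup.subtree_control and to every ancestor's. A pure-function sketch of those steps follows; the helper names are hypothetical, nothing is written to disk, and the top-down ordering (root first) is an assumption based on cgroup v2 requiring a controller to be enabled in the parent before a child can delegate it.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// filterControllers keeps only the requested controllers that appear in the
// root cgroup.controllers content (a space-separated list).
func filterControllers(requested []string, rootControllers string) []string {
	available := map[string]bool{}
	for _, c := range strings.Fields(rootControllers) {
		available[c] = true
	}
	var effective []string
	for _, c := range requested {
		if available[c] {
			effective = append(effective, c)
		}
	}
	return effective
}

// payload renders the string written to cgroup.subtree_control.
func payload(controllers []string) string {
	parts := make([]string, len(controllers))
	for i, c := range controllers {
		parts[i] = "+" + c
	}
	return strings.Join(parts, " ")
}

// subtreeControlFiles lists the files to update, from the mount root down
// to the group itself.
func subtreeControlFiles(mountpoint, group string) []string {
	files := []string{path.Join(mountpoint, "cgroup.subtree_control")}
	dir := mountpoint
	for _, part := range strings.Split(strings.Trim(group, "/"), "/") {
		if part == "" {
			continue
		}
		dir = path.Join(dir, part)
		files = append(files, path.Join(dir, "cgroup.subtree_control"))
	}
	return files
}

func main() {
	effective := filterControllers([]string{"cpu", "memory", "hugetlb"}, "cpuset cpu io memory pids")
	for _, f := range subtreeControlFiles("/sys/fs/cgroup", "/kukeon/realm") {
		fmt.Printf("write %q to %s\n", payload(effective), f)
	}
}
```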

func NewClient

func NewClient(ctx context.Context, logger *slog.Logger, socket string) Client

type ContainerDeleteOptions

type ContainerDeleteOptions struct {
	// SnapshotCleanup indicates whether to clean up snapshots.
	SnapshotCleanup bool
}

ContainerDeleteOptions describes options for deleting a container.

type ContainerRuntime

type ContainerRuntime struct {
	// Name is the runtime name (e.g., "io.containerd.runc.v2").
	Name string
	// Options are runtime-specific options.
	Options interface{}
}

ContainerRuntime describes the runtime configuration.

type ContainerSpec

type ContainerSpec struct {
	// ID is the unique identifier for the container.
	ID string
	// Image is the image reference to use for the container.
	Image string
	// SnapshotKey is the key for the snapshot. If empty, defaults to ID.
	SnapshotKey string
	// Snapshotter is the snapshotter to use. If empty, uses default.
	Snapshotter string
	// Runtime is the runtime configuration.
	Runtime *ContainerRuntime
	// SpecOpts are OCI spec options to apply.
	SpecOpts []oci.SpecOpts
	// Labels are key-value pairs to attach to the container.
	Labels map[string]string
	// CNIConfigPath is the path to the CNI configuration to use for this container.
	CNIConfigPath string
}

ContainerSpec describes how to create a new container.

func BuildContainerSpec

func BuildContainerSpec(
	containerSpec intmodel.ContainerSpec,
	options ...BuildOption,
) ContainerSpec

BuildContainerSpec converts an internal ContainerSpec to ctr.ContainerSpec with the expected defaults applied. Uses ContainerdID if available, otherwise falls back to ID.

func BuildRootContainerSpec

func BuildRootContainerSpec(
	rootSpec intmodel.ContainerSpec,
	labels map[string]string,
) ContainerSpec

BuildRootContainerSpec converts the internal root container spec into a ctr.ContainerSpec with the expected defaults applied. Uses ContainerdID if available, otherwise falls back to ID.

func JoinContainerNamespaces

func JoinContainerNamespaces(spec ContainerSpec, ns NamespacePaths) ContainerSpec

JoinContainerNamespaces returns a copy of spec with namespace spec options applied.

type IOResources

type IOResources struct {
	Weight   uint16
	Throttle []IOThrottleEntry
}

IOResources exposes IO weight + throttling.

type IOThrottleEntry

type IOThrottleEntry struct {
	Type  IOThrottleType
	Major int64
	Minor int64
	Rate  uint64
}

IOThrottleEntry represents a single io.max entry.

type IOThrottleType

type IOThrottleType string

IOThrottleType identifies the throttle file to target.
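In cgroup v2, an io.max entry has the form "<major>:<minor> <key>=<value>", where the key is one of rbps, wbps, riops, or wiops. The sketch below formats such a line from a local copy of the entry struct. The concrete IOThrottleType values are not listed on this page, so treating them as the io.max keys is an assumption, as are the lower-case names.

```go
package main

import "fmt"

// throttleType is a hypothetical stand-in for IOThrottleType; the values
// below are the cgroup v2 io.max keys, assumed rather than taken from the
// package.
type throttleType string

const (
	readBPS  throttleType = "rbps"
	writeBPS throttleType = "wbps"
)

// throttleEntry mirrors the fields of IOThrottleEntry.
type throttleEntry struct {
	Type  throttleType
	Major int64
	Minor int64
	Rate  uint64
}

// ioMaxLine renders one line in the format the kernel accepts for io.max.
func ioMaxLine(e throttleEntry) string {
	return fmt.Sprintf("%d:%d %s=%d", e.Major, e.Minor, e.Type, e.Rate)
}

func main() {
	// Limit reads on block device 8:0 to 1 MiB/s.
	fmt.Println(ioMaxLine(throttleEntry{Type: readBPS, Major: 8, Minor: 0, Rate: 1 << 20}))
	// Limit writes on the same device to 512 KiB/s.
	fmt.Println(ioMaxLine(throttleEntry{Type: writeBPS, Major: 8, Minor: 0, Rate: 512 << 10}))
}
```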

type ImageInfo added in v0.3.0

type ImageInfo struct {
	Name      string
	Size      int64
	CreatedAt time.Time
	Digest    string
	MediaType string
	Labels    map[string]string
}

ImageInfo is the ctr-layer view of a containerd image. The fields are the common subset surfaced to operators by `kuke image get`; downstream layers re-encode this onto their own wire types so the ctr package does not leak into pkg/api.

type MemoryResources

type MemoryResources struct {
	Min  *int64
	Max  *int64
	Low  *int64
	High *int64
	Swap *int64
}

MemoryResources maps to memory controller knobs.

type NamespacePaths

type NamespacePaths struct {
	Net string
	IPC string
	UTS string
	PID string
}

NamespacePaths describes the namespace file paths a container should join.

type RegistryCredentials

type RegistryCredentials struct {
	// Username is the registry username.
	Username string
	// Password is the registry password or token.
	Password string
	// ServerAddress is the registry server address (e.g., "docker.io", "registry.example.com").
	// If empty, credentials apply to the registry extracted from the image reference.
	ServerAddress string
}

RegistryCredentials contains authentication information for a container registry. This type matches the modelhub RegistryCredentials structure for use in the ctr package.

func ConvertRealmCredentials

func ConvertRealmCredentials(creds []intmodel.RegistryCredentials) []RegistryCredentials

ConvertRealmCredentials converts modelhub RegistryCredentials slice to ctr RegistryCredentials slice.

type StopContainerOptions

type StopContainerOptions struct {
	// Signal is the signal to send (defaults to SIGTERM).
	Signal string
	// Timeout is the timeout for graceful shutdown.
	Timeout *time.Duration
	// Force indicates whether to force kill if timeout is exceeded.
	Force bool
}

StopContainerOptions describes options for stopping a container.
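One plausible reading of these fields is the usual graceful-stop sequence: send Signal, wait up to Timeout for the task to exit, then force-kill if Force is set. The sketch below simulates that flow with a channel instead of a real task. The stop function, its return strings, and the behaviour for a nil Timeout (wait indefinitely) are assumptions, not the package's documented semantics.

```go
package main

import (
	"fmt"
	"time"
)

// stopOptions mirrors the fields of StopContainerOptions.
type stopOptions struct {
	Signal  string
	Timeout *time.Duration
	Force   bool
}

// demoTimeout is a short timeout used by the demo below.
var demoTimeout = 20 * time.Millisecond

// stop (hypothetical) sends the signal, waits for exit, and escalates to
// SIGKILL when the timeout elapses and Force is set.
func stop(opts stopOptions, exited <-chan struct{}, send func(sig string)) string {
	sig := opts.Signal
	if sig == "" {
		sig = "SIGTERM" // documented default
	}
	send(sig)
	if opts.Timeout == nil {
		<-exited // assumption: no timeout means wait indefinitely
		return "exited"
	}
	select {
	case <-exited:
		return "exited"
	case <-time.After(*opts.Timeout):
	}
	if !opts.Force {
		return "timeout"
	}
	send("SIGKILL")
	return "killed"
}

func main() {
	neverExits := make(chan struct{})
	result := stop(stopOptions{Timeout: &demoTimeout, Force: true}, neverExits,
		func(sig string) { fmt.Println("send", sig) })
	fmt.Println("result:", result)
}
```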

type TaskIO

type TaskIO struct {
	// Stdin is the path to stdin (if any).
	Stdin string
	// Stdout is the path to stdout (if any).
	Stdout string
	// Stderr is the path to stderr (if any).
	Stderr string
	// Terminal indicates if the task should have a TTY.
	Terminal bool
	// LogFilePath, when set, makes the runtime shim write the task's
	// stdout and stderr to this host path via cio.LogFile. The shim
	// opens the file in append mode; if no file exists yet it is
	// created. Mutually exclusive with Terminal — log files do not
	// pair with a TTY.
	LogFilePath string
}

TaskIO describes the IO configuration for a task.
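Because LogFilePath is documented as mutually exclusive with Terminal, a caller may want to check the combination before starting a task. The sketch below is one hypothetical way to do so; whether the package performs such a check itself is not stated on this page, and the log path is invented for the demo.

```go
package main

import (
	"errors"
	"fmt"
)

// taskIO is a trimmed, hypothetical copy of TaskIO.
type taskIO struct {
	Terminal    bool
	LogFilePath string
}

// validateTaskIO rejects the documented invalid combination: a log file
// cannot be paired with a TTY.
func validateTaskIO(io taskIO) error {
	if io.Terminal && io.LogFilePath != "" {
		return errors.New("LogFilePath and Terminal are mutually exclusive")
	}
	return nil
}

func main() {
	fmt.Println(validateTaskIO(taskIO{LogFilePath: "/var/log/demo.log"}))
	fmt.Println(validateTaskIO(taskIO{Terminal: true, LogFilePath: "/var/log/demo.log"}))
}
```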

type TaskSpec

type TaskSpec struct {
	// IO is the IO configuration for the task.
	IO *TaskIO
	// Options are task creation options.
	Options []containerd.NewTaskOpts
}

TaskSpec describes how to create a new task.
