instances

package

Version: v0.0.7 (latest) | Published: Mar 3, 2026 | License: MIT | Imports: 40 | Imported by: 0

Repository

github.com/kernel/hypeman

README ¶

Instance Manager

Manages the VM instance lifecycle across multiple hypervisors (Cloud Hypervisor and QEMU on Linux; vz on macOS).

Design Decisions

Why State Machine? (state.go)

What: Single-hop state transitions matching hypervisor states

Why:

Validates transitions before execution (prevents invalid operations)
Manager orchestrates multi-hop flows (e.g., Running → Paused → Standby)
Clear separation: state machine = rules, manager = orchestration

States:

Stopped - No VMM, no snapshot
Created - VMM created but not booted (CH native)
Running - VM actively running (CH native)
Paused - VM paused (CH native)
Shutdown - VM shutdown, VMM exists (CH native)
Standby - No VMM, snapshot exists (can restore)

Why Config Disk? (configdisk.go)

What: Read-only erofs disk with instance configuration

Why:

Zero modifications to OCI images (images used as-is)
Config injected at boot time (not baked into image)
Efficient (compressed erofs, a few KB)
Contains: entrypoint, cmd, env vars, workdir

Filesystem Layout (storage.go)

/var/lib/hypeman/
  guests/
    {instance-id}/              # ULID-based ID
      metadata.json             # State, versions, timestamps
      overlay.raw               # Sparse writable overlay (size set by OverlaySize)
      config.erofs              # Compressed config disk
      ch.sock                   # Hypervisor API socket (abbreviated for SUN_LEN limit)
      logs/
        app.log                 # Guest application log (serial console output)
        vmm.log                 # Hypervisor log (stdout+stderr)
        hypeman.log             # Hypeman operations log
      snapshots/
        snapshot-latest/        # Snapshot directory
          config.json           # VM configuration
          memory-ranges         # Memory state

Benefits:

Sortable unique IDs (ULID = time-ordered)
Self-contained: all instance data in one directory
Easy cleanup: delete directory = full cleanup
Sparse overlays: only store diffs from base image
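The layout above can be expressed as a small path helper. This is a minimal sketch: the `instancePaths` type and `pathsFor` function are hypothetical names chosen for illustration, and only the file and directory names are taken from the tree shown above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// instancePaths groups the per-instance locations from the layout.
// The type itself is hypothetical; it is not part of the package API.
type instancePaths struct {
	Dir        string
	Metadata   string
	Overlay    string
	ConfigDisk string
	Socket     string
	LogDir     string
	Snapshot   string
}

// pathsFor derives every path from the data directory and instance ID,
// so deleting Dir removes all data that belongs to the instance.
func pathsFor(dataDir, id string) instancePaths {
	dir := filepath.Join(dataDir, "guests", id)
	return instancePaths{
		Dir:        dir,
		Metadata:   filepath.Join(dir, "metadata.json"),
		Overlay:    filepath.Join(dir, "overlay.raw"),
		ConfigDisk: filepath.Join(dir, "config.erofs"),
		Socket:     filepath.Join(dir, "ch.sock"),
		LogDir:     filepath.Join(dir, "logs"),
		Snapshot:   filepath.Join(dir, "snapshots", "snapshot-latest"),
	}
}

func main() {
	p := pathsFor("/var/lib/hypeman", "example-id")
	fmt.Println(p.Metadata)
	fmt.Println(p.Snapshot)
}
```

Keeping every path a function of one directory is what makes the "delete directory = full cleanup" property hold.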

Multi-Hop Orchestrations (manager.go)

Manager orchestrates multiple single-hop state transitions:

CreateInstance:

Stopped → Created → Running
1. Start VMM process
2. Create VM config
3. Boot VM
4. Expand memory (if hotplug configured)

StandbyInstance:

Running → Paused → Standby
1. Reduce memory (virtio-mem hotplug)
2. Pause VM
3. Create snapshot
4. Stop VMM

RestoreInstance:

Standby → Paused → Running
1. Start VMM
2. Restore from snapshot
3. Resume VM

DeleteInstance:

Any State → Stopped
1. Stop VMM (if running)
2. Delete all instance data
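One way to picture these flows is as an ordered list of steps that stops at the first failure. The sketch below is illustrative only: the `step` type, `runFlow`, and the placeholder step bodies are assumptions, not the manager's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// errDiskFull is a made-up failure used to show how a flow aborts.
var errDiskFull = errors.New("disk full")

// step is one action in a flow. Real steps would call the hypervisor;
// the ones in main are placeholders.
type step struct {
	name string
	run  func() error
}

// runFlow executes steps in order and stops at the first failure. It
// returns the number of completed steps and an error naming the step.
func runFlow(steps []step) (int, error) {
	for i, s := range steps {
		if err := s.run(); err != nil {
			return i, fmt.Errorf("step %d (%s): %w", i+1, s.name, err)
		}
	}
	return len(steps), nil
}

func main() {
	ok := func() error { return nil }
	// Mirrors the StandbyInstance steps, with a simulated snapshot failure.
	standby := []step{
		{"reduce memory", ok},
		{"pause VM", ok},
		{"create snapshot", func() error { return errDiskFull }},
		{"stop VMM", ok},
	}
	done, err := runFlow(standby)
	fmt.Println(done, err)
}
```

Returning the completed-step count lets a caller decide how much to roll back, for example resuming a paused VM when the snapshot step fails.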

Snapshot Optimization (standby.go, restore.go)

Reduce snapshot size:

Memory hotplug: Reduce to base size before snapshot (virtio-mem)
Sparse overlays: Only store diffs from base image

Fast restore:

Don't prefault pages (lazy loading)
Snapshot restore runs in parallel with TAP device setup
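Overlapping restore with TAP device setup can be sketched with goroutines and a wait group. The `runParallel` helper and the two placeholder tasks are hypothetical; the package's real restore path is not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTAP is a made-up failure used in the demonstration.
var errTAP = errors.New("tap setup failed")

// runParallel starts every task in its own goroutine, waits for all of
// them, and returns the joined errors (nil if every task succeeded).
func runParallel(tasks ...func() error) error {
	errs := make([]error, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() error) {
			defer wg.Done()
			errs[i] = task() // each goroutine writes its own slot
		}(i, task)
	}
	wg.Wait()
	return errors.Join(errs...)
}

func main() {
	restoreSnapshot := func() error { return nil } // placeholder
	setupTAP := func() error { return errTAP }     // placeholder
	fmt.Println(runParallel(restoreSnapshot, setupTAP))
}
```

`errors.Join` requires Go 1.20 or later and returns nil when every element is nil.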

Reference Handling

Instances use OCI image references directly:

req := CreateInstanceRequest{
    Image: "docker.io/library/alpine:latest",  // OCI reference
}
// Validates image exists and is ready via image manager

Testing

Tests focus on testable components:

# State machine (pure logic, no VM needed)
TestStateTransitions - validates all transition rules

# Storage operations (filesystem only, no VM needed)
TestStorageOperations - metadata persistence, directory cleanup

# Full integration (requires kernel/initrd)
# Skipped by default, needs system files from system manager

Dependencies

lib/images - Image manager for OCI image validation
lib/system - System manager for kernel/initrd files
lib/hypervisor - Hypervisor abstraction for VM operations
System tools: mkfs.erofs, cpio, gzip (Linux); mkfs.ext4 (macOS)

Documentation ¶

Index ¶

Constants
Variables
func NewLivenessChecker(m Manager) devices.InstanceLivenessChecker
func WaitForProcessExit(pid int, timeout time.Duration) bool
type AttachVolumeRequest
type CreateInstanceRequest
type ForkInstanceRequest
type GPUConfig
type HostTopology
type IngressResolver
- func NewIngressResolver(manager Manager) *IngressResolver
- func (r *IngressResolver) InstanceExists(ctx context.Context, nameOrID string) (bool, error)
- func (r *IngressResolver) ResolveInstance(ctx context.Context, nameOrID string) (string, string, error)
- func (r *IngressResolver) ResolveInstanceIP(ctx context.Context, nameOrID string) (string, error)
type Instance
- func (i *Instance) GetHypervisorType() string
type ListInstancesFilter
- func (f *ListInstancesFilter) Matches(inst *Instance) bool
type LogSource
type Manager
- func NewManager(p *paths.Paths, imageManager images.Manager, systemManager system.Manager, ...) Manager
type Metrics
type ResourceLimits
type ResourceValidator
type StartInstanceRequest
type State
- func (s State) CanTransitionTo(target State) error
- func (s State) IsTerminal() bool
- func (s State) RequiresVMM() bool
- func (s State) String() string
type StoredMetadata
type VolumeAttachment

Constants ¶

const DefaultStopTimeout = 5

DefaultStopTimeout is the default grace period for graceful shutdown (seconds).

const (
	// MaxVolumesPerInstance is the maximum number of volumes that can be attached
	// to a single instance. This limit exists because volume devices are named
	// /dev/vdd, /dev/vde, ... /dev/vdz (letters d-z = 23 devices).
	// Devices a-c are reserved for rootfs, overlay, and config disk.
	MaxVolumesPerInstance = 23
)
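The arithmetic behind the limit can be checked with a short sketch. `volumeDevice` is a hypothetical helper, and it assumes volumes are assigned device letters in index order starting at `d`.

```go
package main

import "fmt"

// maxVolumes mirrors MaxVolumesPerInstance: letters d through z.
const maxVolumes = 23

// volumeDevice maps a zero-based volume index to a guest device path.
// Index 0 maps to /dev/vdd and index 22 maps to /dev/vdz.
func volumeDevice(index int) (string, error) {
	if index < 0 || index >= maxVolumes {
		return "", fmt.Errorf("volume index %d out of range [0, %d)", index, maxVolumes)
	}
	return "/dev/vd" + string(rune('d'+index)), nil
}

func main() {
	for _, i := range []int{0, 22, 23} {
		dev, err := volumeDevice(i)
		fmt.Println(i, dev, err)
	}
}
```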

Variables ¶

var (
	// ErrNotFound is returned when an instance is not found
	ErrNotFound = errors.New("instance not found")

	// ErrInvalidState is returned when a state transition is not valid
	ErrInvalidState = errors.New("invalid state transition")

	// ErrInvalidRequest is returned when request validation fails
	ErrInvalidRequest = errors.New("invalid request")

	// ErrAlreadyExists is returned when creating an instance that already exists
	ErrAlreadyExists = errors.New("instance already exists")

	// ErrImageNotReady is returned when the image is not ready for use
	ErrImageNotReady = errors.New("image not ready")

	// ErrAmbiguousName is returned when multiple instances have the same name
	ErrAmbiguousName = errors.New("multiple instances with the same name")

	// ErrInsufficientResources is returned when resources (CPU, memory, network, GPU) are not available
	ErrInsufficientResources = errors.New("insufficient resources")

	// ErrNotSupported is returned when an operation is not supported for the instance hypervisor
	ErrNotSupported = errors.New("operation not supported")
)
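Sentinel errors like these are usually matched with `errors.Is`, which still works after a caller has wrapped the error with extra context. The sketch below uses a local `errNotFound` and a hypothetical `lookup` function rather than the package's own API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the package's ErrNotFound.
var errNotFound = errors.New("instance not found")

// lookup is a hypothetical function that wraps the sentinel with the
// ID that was requested.
func lookup(instances map[string]string, id string) (string, error) {
	name, ok := instances[id]
	if !ok {
		return "", fmt.Errorf("lookup %q: %w", id, errNotFound)
	}
	return name, nil
}

// isNotFound reports whether err is, or wraps, errNotFound.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	_, err := lookup(map[string]string{"a": "web"}, "b")
	fmt.Println(err)
	fmt.Println(isNotFound(err))
}
```

Comparing with `==` would miss the wrapped error, so `errors.Is` is the safer check for callers.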

var ErrLogNotFound = fmt.Errorf("log file not found")

ErrLogNotFound is returned when the requested log file doesn't exist

var ErrTailNotFound = fmt.Errorf("tail command not found: required for log streaming")

ErrTailNotFound is returned when the tail command is not available

var ValidTransitions = map[State][]State{

	StateCreated: {
		StateRunning,
		StateShutdown,
	},
	StateRunning: {
		StatePaused,
		StateShutdown,
	},
	StatePaused: {
		StateRunning,
		StateShutdown,
		StateStandby,
	},
	StateShutdown: {
		StateRunning,
		StateStopped,
	},

	StateStopped: {
		StateCreated,
	},
	StateStandby: {
		StatePaused,
		StateStopped,
	},

	StateUnknown: {},
}

ValidTransitions defines the allowed single-hop state transitions. It is based on Cloud Hypervisor's actual state machine plus our additions.
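A lookup against such a table might look like the following sketch. The table is copied from the listing above; the `canTransition` helper and the lowercase names are illustrative stand-ins, and the real `CanTransitionTo` may differ in its error text and details.

```go
package main

import (
	"errors"
	"fmt"
)

type State string

const (
	Stopped  State = "Stopped"
	Created  State = "Created"
	Running  State = "Running"
	Paused   State = "Paused"
	Shutdown State = "Shutdown"
	Standby  State = "Standby"
	Unknown  State = "Unknown"
)

var errInvalidState = errors.New("invalid state transition")

// transitions mirrors the ValidTransitions table.
var transitions = map[State][]State{
	Created:  {Running, Shutdown},
	Running:  {Paused, Shutdown},
	Paused:   {Running, Shutdown, Standby},
	Shutdown: {Running, Stopped},
	Stopped:  {Created},
	Standby:  {Paused, Stopped},
	Unknown:  {},
}

// canTransition returns nil when target is listed for from, otherwise
// an error wrapping errInvalidState so callers can use errors.Is.
func canTransition(from, target State) error {
	for _, allowed := range transitions[from] {
		if allowed == target {
			return nil
		}
	}
	return fmt.Errorf("%w: %s -> %s", errInvalidState, from, target)
}

func main() {
	fmt.Println(canTransition(Running, Paused))  // allowed single hop
	fmt.Println(canTransition(Running, Standby)) // rejected: must pass through Paused
}
```

Because Running to Standby is not a single hop, the manager has to sequence Running to Paused and then Paused to Standby.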

Functions ¶

func NewLivenessChecker ¶

func NewLivenessChecker(m Manager) devices.InstanceLivenessChecker

NewLivenessChecker creates a new InstanceLivenessChecker that wraps the instances manager. This adapter allows the devices package to query instance state without a circular import.

func WaitForProcessExit ¶

func WaitForProcessExit(pid int, timeout time.Duration) bool

WaitForProcessExit polls for a process to exit and returns true if it exited within the timeout. Exported for use in tests.
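A polling loop of this kind is commonly built on signal 0, which asks the kernel whether a PID exists without delivering a signal. The sketch below is an assumption about how such a helper could work on Unix systems, not the package's implementation; `processAlive`, `waitForExit`, and the 10 ms poll interval are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

// processAlive probes pid with signal 0 (Unix only). EPERM means the
// process exists but belongs to another user, so it counts as alive.
func processAlive(pid int) bool {
	err := syscall.Kill(pid, 0)
	return err == nil || err == syscall.EPERM
}

// waitForExit polls until the process is gone or the timeout elapses.
// It returns true only if the process was observed to be gone.
func waitForExit(pid int, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if !processAlive(pid) {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(10 * time.Millisecond)
	}
}

func main() {
	// The current process is still running, so the wait times out.
	fmt.Println(waitForExit(os.Getpid(), 50*time.Millisecond))
}
```

Note that a zombie child still answers signal 0 until its parent reaps it, so a caller that spawned the process should also call `Wait` on it.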

Types ¶

type AttachVolumeRequest ¶

type AttachVolumeRequest struct {
	MountPath string
	Readonly  bool
}

AttachVolumeRequest is the domain request for attaching a volume (used for API compatibility)

type CreateInstanceRequest ¶

type CreateInstanceRequest struct {
	Name                     string             // Required
	Image                    string             // Required: OCI reference
	Size                     int64              // Base memory in bytes (default: 1GB)
	HotplugSize              int64              // Hotplug memory in bytes (default: 0, set explicitly to enable)
	OverlaySize              int64              // Overlay disk size in bytes (default: 10GB)
	Vcpus                    int                // Default 2
	NetworkBandwidthDownload int64              // Download rate limit bytes/sec (0 = auto, proportional to CPU)
	NetworkBandwidthUpload   int64              // Upload rate limit bytes/sec (0 = auto, proportional to CPU)
	DiskIOBps                int64              // Disk I/O rate limit bytes/sec (0 = auto, proportional to CPU)
	Env                      map[string]string  // Optional environment variables
	Metadata                 map[string]string  // Optional user-defined key-value metadata
	NetworkEnabled           bool               // Whether to enable networking (uses default network)
	Devices                  []string           // Device IDs or names to attach (GPU passthrough)
	Volumes                  []VolumeAttachment // Volumes to attach at creation time
	Hypervisor               hypervisor.Type    // Optional: hypervisor type (defaults to config)
	GPU                      *GPUConfig         // Optional: vGPU configuration
	Entrypoint               []string           // Override image entrypoint (nil = use image default)
	Cmd                      []string           // Override image cmd (nil = use image default)
	SkipKernelHeaders        bool               // Skip kernel headers installation (disables DKMS)
	SkipGuestAgent           bool               // Skip guest-agent installation (disables exec/stat API)
}

CreateInstanceRequest is the domain request for creating an instance
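The defaults named in the field comments could be applied as in this sketch. The `createRequest` type is a trimmed, hypothetical copy of the request, and reading "GB" as 2^30 bytes is an assumption; the document does not say whether the package uses binary or decimal units.

```go
package main

import "fmt"

// gib assumes "GB" in the field comments means 2^30 bytes.
const gib = int64(1) << 30

// createRequest keeps only the fields that have documented defaults.
type createRequest struct {
	Size        int64 // base memory in bytes
	OverlaySize int64 // overlay disk size in bytes
	Vcpus       int
}

// withDefaults fills zero-valued fields with the documented defaults
// and leaves explicit values untouched.
func withDefaults(r createRequest) createRequest {
	if r.Size == 0 {
		r.Size = 1 * gib
	}
	if r.OverlaySize == 0 {
		r.OverlaySize = 10 * gib
	}
	if r.Vcpus == 0 {
		r.Vcpus = 2
	}
	return r
}

func main() {
	fmt.Printf("%+v\n", withDefaults(createRequest{Vcpus: 4}))
}
```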

type ForkInstanceRequest ¶ added in v0.0.7

type ForkInstanceRequest struct {
	Name        string // Required: name for the new forked instance
	FromRunning bool   // Optional: allow forking from Running by auto standby/fork/restore
	TargetState State  // Optional: desired final state of forked instance (Stopped, Standby, Running). Empty means inherit source state.
}

ForkInstanceRequest is the domain request for forking an instance.

type GPUConfig ¶ added in v0.0.5

type GPUConfig struct {
	Profile string // vGPU profile name (e.g., "L40S-1Q")
}

GPUConfig contains GPU configuration for instance creation

type HostTopology ¶

type HostTopology struct {
	ThreadsPerCore int
	CoresPerSocket int
	Sockets        int
}

HostTopology represents the CPU topology of the host machine

type IngressResolver ¶

type IngressResolver struct {
	// contains filtered or unexported fields
}

IngressResolver provides instance resolution for the ingress package. It implements the ingress.InstanceResolver interface without importing the ingress package, which avoids an import cycle.

func NewIngressResolver ¶

func NewIngressResolver(manager Manager) *IngressResolver

NewIngressResolver creates a new IngressResolver that wraps an instance manager.

func (*IngressResolver) InstanceExists ¶

func (r *IngressResolver) InstanceExists(ctx context.Context, nameOrID string) (bool, error)

InstanceExists checks if an instance with the given name, ID, or ID prefix exists.

func (*IngressResolver) ResolveInstance ¶

func (r *IngressResolver) ResolveInstance(ctx context.Context, nameOrID string) (string, string, error)

ResolveInstance resolves an instance name, ID, or ID prefix to its canonical name and ID.

func (*IngressResolver) ResolveInstanceIP ¶

func (r *IngressResolver) ResolveInstanceIP(ctx context.Context, nameOrID string) (string, error)

ResolveInstanceIP resolves an instance name, ID, or ID prefix to its IP address.

type Instance ¶

type Instance struct {
	StoredMetadata

	// Derived fields (not stored in metadata.json)
	State       State   // Derived from socket + VMM query
	StateError  *string // Error message if state couldn't be determined (non-nil when State=Unknown)
	HasSnapshot bool    // Derived from filesystem check
}

Instance represents a virtual machine instance with derived runtime state

func (*Instance) GetHypervisorType ¶

func (i *Instance) GetHypervisorType() string

GetHypervisorType returns the hypervisor type as a string. This implements the middleware.HypervisorTyper interface for OTEL enrichment.

type ListInstancesFilter ¶ added in v0.0.7

type ListInstancesFilter struct {
	State    *State            // Filter by instance state
	Metadata map[string]string // Filter by metadata key-value pairs (all must match)
}

ListInstancesFilter contains optional filters for listing instances. All fields are ANDed together: an instance must match every specified filter.

func (*ListInstancesFilter) Matches ¶ added in v0.0.7

func (f *ListInstancesFilter) Matches(inst *Instance) bool

Matches returns true if the given instance satisfies all filter criteria.
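The AND semantics can be sketched as follows. The types are trimmed local copies, and treating a nil filter as "match everything" is an assumption rather than documented behavior.

```go
package main

import "fmt"

type State string

// Instance keeps only the fields the filter inspects.
type Instance struct {
	State    State
	Metadata map[string]string
}

// Filter mirrors the shape of ListInstancesFilter.
type Filter struct {
	State    *State
	Metadata map[string]string
}

// matches returns true only if every specified criterion holds.
// A nil filter is assumed to match every instance.
func (f *Filter) matches(inst *Instance) bool {
	if f == nil {
		return true
	}
	if f.State != nil && inst.State != *f.State {
		return false
	}
	for key, want := range f.Metadata {
		if got, ok := inst.Metadata[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	running := State("Running")
	inst := &Instance{State: running, Metadata: map[string]string{"team": "infra"}}
	f := &Filter{State: &running, Metadata: map[string]string{"team": "infra"}}
	fmt.Println(f.matches(inst))
}
```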

type LogSource ¶

type LogSource string

LogSource represents a log source type

const (
	// LogSourceApp is the guest application log (serial console)
	LogSourceApp LogSource = "app"
	// LogSourceVMM is the Cloud Hypervisor VMM log
	LogSourceVMM LogSource = "vmm"
	// LogSourceHypeman is the hypeman operations log
	LogSourceHypeman LogSource = "hypeman"
)

type Manager ¶

type Manager interface {
	ListInstances(ctx context.Context, filter *ListInstancesFilter) ([]Instance, error)
	CreateInstance(ctx context.Context, req CreateInstanceRequest) (*Instance, error)
	// GetInstance returns an instance by ID, name, or ID prefix.
	// Lookup order: exact ID match -> exact name match -> ID prefix match.
	// Returns ErrAmbiguousName if prefix matches multiple instances.
	GetInstance(ctx context.Context, idOrName string) (*Instance, error)
	DeleteInstance(ctx context.Context, id string) error
	ForkInstance(ctx context.Context, id string, req ForkInstanceRequest) (*Instance, error)
	StandbyInstance(ctx context.Context, id string) (*Instance, error)
	RestoreInstance(ctx context.Context, id string) (*Instance, error)
	StopInstance(ctx context.Context, id string) (*Instance, error)
	StartInstance(ctx context.Context, id string, req StartInstanceRequest) (*Instance, error)
	StreamInstanceLogs(ctx context.Context, id string, tail int, follow bool, source LogSource) (<-chan string, error)
	RotateLogs(ctx context.Context, maxBytes int64, maxFiles int) error
	AttachVolume(ctx context.Context, id string, volumeId string, req AttachVolumeRequest) (*Instance, error)
	DetachVolume(ctx context.Context, id string, volumeId string) (*Instance, error)
	// ListInstanceAllocations returns resource allocations for all instances.
	// Used by the resource manager for capacity tracking.
	ListInstanceAllocations(ctx context.Context) ([]resources.InstanceAllocation, error)
	// ListRunningInstancesInfo returns info needed for utilization metrics collection.
	// Used by the resource manager for VM utilization tracking.
	ListRunningInstancesInfo(ctx context.Context) ([]resources.InstanceUtilizationInfo, error)
	// SetResourceValidator sets the validator for aggregate resource limit checking.
	// Called after initialization to avoid circular dependencies.
	SetResourceValidator(v ResourceValidator)
	// GetVsockDialer returns a VsockDialer for the specified instance.
	GetVsockDialer(ctx context.Context, instanceID string) (hypervisor.VsockDialer, error)
}
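The lookup order documented on GetInstance (exact ID, then exact name, then ID prefix) can be sketched over an in-memory slice. Everything here is illustrative: `resolve`, `pickOne`, and the `instance` type are hypothetical, and returning the ambiguity error for duplicate names as well as duplicate prefixes is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type instance struct{ ID, Name string }

var (
	errNotFound  = errors.New("instance not found")
	errAmbiguous = errors.New("multiple instances with the same name")
)

// pickOne returns the single match, nil if there is none, or
// errAmbiguous if more than one instance matches.
func pickOne(all []instance, match func(instance) bool) (*instance, error) {
	var found *instance
	for i := range all {
		if !match(all[i]) {
			continue
		}
		if found != nil {
			return nil, errAmbiguous
		}
		found = &all[i]
	}
	return found, nil
}

// resolve tries exact ID, then exact name, then ID prefix.
func resolve(all []instance, idOrName string) (*instance, error) {
	if idOrName == "" {
		return nil, errNotFound
	}
	for i := range all {
		if all[i].ID == idOrName {
			return &all[i], nil
		}
	}
	byName := func(in instance) bool { return in.Name == idOrName }
	if hit, err := pickOne(all, byName); hit != nil || err != nil {
		return hit, err
	}
	byPrefix := func(in instance) bool { return strings.HasPrefix(in.ID, idOrName) }
	hit, err := pickOne(all, byPrefix)
	if hit == nil && err == nil {
		return nil, errNotFound
	}
	return hit, err
}

func main() {
	all := []instance{{ID: "abc123", Name: "web"}, {ID: "abd456", Name: "db"}}
	for _, q := range []string{"web", "abd", "ab", "zzz"} {
		got, err := resolve(all, q)
		fmt.Println(q, got, err)
	}
}
```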

func NewManager ¶

func NewManager(p *paths.Paths, imageManager images.Manager, systemManager system.Manager, networkManager network.Manager, deviceManager devices.Manager, volumeManager volumes.Manager, limits ResourceLimits, defaultHypervisor hypervisor.Type, meter metric.Meter, tracer trace.Tracer) Manager

NewManager creates a new instances manager. If meter is nil, metrics are disabled. defaultHypervisor specifies which hypervisor to use when not specified in requests.

type Metrics ¶

type Metrics struct {
	// contains filtered or unexported fields
}

Metrics holds the metrics instruments for instance operations.

type ResourceLimits ¶

type ResourceLimits struct {
	MaxOverlaySize       int64 // Maximum overlay disk size in bytes per instance
	MaxVcpusPerInstance  int   // Maximum vCPUs per instance (0 = unlimited)
	MaxMemoryPerInstance int64 // Maximum memory in bytes per instance (0 = unlimited)
}

ResourceLimits contains configurable resource limits for instances
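A per-instance check that honors the "0 = unlimited" convention might look like this sketch. The `limits` type and `check` method are hypothetical, and the overlay limit is left out because its zero-value behavior is not documented.

```go
package main

import "fmt"

// limits keeps the two fields whose comments define 0 as unlimited.
type limits struct {
	MaxVcpus  int
	MaxMemory int64
}

// check rejects a request that exceeds a configured limit. A limit of
// zero disables the corresponding check.
func (l limits) check(vcpus int, memoryBytes int64) error {
	if l.MaxVcpus > 0 && vcpus > l.MaxVcpus {
		return fmt.Errorf("vcpus %d exceeds per-instance limit %d", vcpus, l.MaxVcpus)
	}
	if l.MaxMemory > 0 && memoryBytes > l.MaxMemory {
		return fmt.Errorf("memory %d exceeds per-instance limit %d", memoryBytes, l.MaxMemory)
	}
	return nil
}

func main() {
	l := limits{MaxVcpus: 8} // memory is unlimited
	fmt.Println(l.check(4, 64<<30))
	fmt.Println(l.check(16, 1<<30))
}
```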

type ResourceValidator ¶ added in v0.0.6

type ResourceValidator interface {
	// ValidateAllocation checks if the requested resources are available.
	// Returns nil if allocation is allowed, or a detailed error describing
	// which resource is insufficient and the current capacity/usage.
	ValidateAllocation(ctx context.Context, vcpus int, memoryBytes int64, networkDownloadBps int64, networkUploadBps int64, diskIOBps int64, needsGPU bool) error
}

ResourceValidator validates if resources can be allocated

type StartInstanceRequest ¶ added in v0.0.6

type StartInstanceRequest struct {
	Entrypoint []string // Override entrypoint (nil = keep previous/image default)
	Cmd        []string // Override cmd (nil = keep previous/image default)
}

StartInstanceRequest is the domain request for starting a stopped instance

type State ¶

type State string

State represents the instance state

const (
	StateStopped  State = "Stopped"  // No VMM, no snapshot
	StateCreated  State = "Created"  // VMM created but not booted (CH native)
	StateRunning  State = "Running"  // VM running (CH native)
	StatePaused   State = "Paused"   // VM paused (CH native)
	StateShutdown State = "Shutdown" // VM shutdown, VMM exists (CH native)
	StateStandby  State = "Standby"  // No VMM, snapshot exists
	StateUnknown  State = "Unknown"  // Failed to determine state (VMM query failed)
)

func (State) CanTransitionTo ¶

func (s State) CanTransitionTo(target State) error

CanTransitionTo checks if a transition from current state to target state is valid

func (State) IsTerminal ¶

func (s State) IsTerminal() bool

IsTerminal returns true if this state represents a terminal transition point

func (State) RequiresVMM ¶

func (s State) RequiresVMM() bool

RequiresVMM returns true if this state requires a running VMM process

func (State) String ¶

func (s State) String() string

String returns the string representation of the state

type StoredMetadata ¶

type StoredMetadata struct {
	// Identification
	Id    string // Auto-generated CUID2
	Name  string
	Image string // OCI reference

	// Resources (matching Cloud Hypervisor terminology)
	Size                     int64 // Base memory in bytes
	HotplugSize              int64 // Hotplug memory in bytes
	OverlaySize              int64 // Overlay disk size in bytes
	Vcpus                    int
	NetworkBandwidthDownload int64 // Download rate limit in bytes/sec (external→VM), 0 = auto
	NetworkBandwidthUpload   int64 // Upload rate limit in bytes/sec (VM→external), 0 = auto
	DiskIOBps                int64 // Disk I/O rate limit in bytes/sec, 0 = auto

	// Configuration
	Env            map[string]string
	Metadata       map[string]string // User-defined key-value metadata
	NetworkEnabled bool              // Whether instance has networking enabled (uses default network)
	IP             string            // Assigned IP address (empty if NetworkEnabled=false)
	MAC            string            // Assigned MAC address (empty if NetworkEnabled=false)

	// Attached volumes
	Volumes []VolumeAttachment // Volumes attached to this instance

	// Timestamps (stored for historical tracking)
	CreatedAt time.Time
	StartedAt *time.Time // Last time VM was started
	StoppedAt *time.Time // Last time VM was stopped

	// Versions
	KernelVersion string // Kernel version (e.g., "ch-v6.12.9")

	// Hypervisor configuration
	HypervisorType    hypervisor.Type // Hypervisor type (e.g., "cloud-hypervisor")
	HypervisorVersion string          // Hypervisor version (e.g., "v49.0")
	HypervisorPID     *int            // Hypervisor process ID (may be stale after host restart)

	// Paths
	SocketPath string // Path to API socket
	DataDir    string // Instance data directory

	// vsock configuration
	VsockCID    int64  // Guest vsock Context ID
	VsockSocket string // Host-side vsock socket path

	// Attached devices (GPU passthrough)
	Devices []string // Device IDs attached to this instance

	// GPU configuration (vGPU mode)
	GPUProfile  string // vGPU profile name (e.g., "L40S-1Q")
	GPUMdevUUID string // mdev device UUID

	// Command overrides (like docker run <image> <command>)
	Entrypoint []string // Override image entrypoint (nil = use image default)
	Cmd        []string // Override image cmd (nil = use image default)

	// Boot optimizations
	SkipKernelHeaders bool // Skip kernel headers installation (disables DKMS)
	SkipGuestAgent    bool // Skip guest-agent installation (disables exec/stat API)

	// Shutdown configuration
	StopTimeout int // Grace period in seconds for graceful stop (0 = use default 5s)

	// Exit information (populated from serial console sentinel when VM stops)
	ExitCode    *int   // App exit code, nil if VM hasn't exited
	ExitMessage string // Human-readable description of exit (e.g., "command not found", "killed by signal 9 (SIGKILL) - OOM")
}

StoredMetadata represents instance metadata that is persisted to disk

type VolumeAttachment ¶

type VolumeAttachment struct {
	VolumeID    string // Volume ID
	MountPath   string // Mount path in guest
	Readonly    bool   // Whether mounted read-only
	Overlay     bool   // If true, create per-instance overlay for writes (requires Readonly=true)
	OverlaySize int64  // Size of overlay disk in bytes (max diff from base)
}

VolumeAttachment represents a volume attached to an instance
