runtime

package
v0.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 12, 2026 License: Apache-2.0 Imports: 21 Imported by: 0

Documentation

Overview

Package runtime provides runtime management for model instances.

Package runtime provides runtime implementations for model inference backends.

This file implements the ExtSandbox, a configuration-driven device sandbox that enables support for additional AI accelerator chips without code changes.

Package runtime provides runtime management for model instances.

This package implements a manager that coordinates multiple runtime implementations (e.g., vLLM-Docker, MindIE-Docker), handles device allocation, and provides lifecycle management for model instances.

Package runtime provides the core runtime abstraction for model lifecycle management.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BoolPtr

func BoolPtr(b bool) *bool

BoolPtr returns a pointer to a boolean value.

This utility function is useful for Docker API fields that require pointer types to distinguish between false and unset.

Parameters:

  • b: Boolean value to create pointer for

Returns:

  • Pointer to boolean value

Example:

hostConfig.Init = BoolPtr(true)

func CheckDockerImageExists

func CheckDockerImageExists(ctx context.Context, imageName string) (bool, error)

CheckDockerImageExists checks if a Docker image exists locally using docker CLI.

This function queries Docker to determine if an image is available in the local Docker image cache. It uses the docker CLI command for simplicity and compatibility.

Parameters:

  • ctx: Context for cancellation and timeout control
  • imageName: Full image name (e.g., "ubuntu:20.04", "quay.io/ascend/vllm-ascend:v0.11.0rc0")

Returns:

  • true if image exists locally
  • Error if Docker query fails

Thread Safety: Safe for concurrent calls

func GetImageForEngine

func GetImageForEngine(configMap map[string]map[string]map[string]string, devices []DeviceInfo, engineName string) (string, error)

GetImageForEngine is a helper function to get Docker image for specific engine.

This function encapsulates the common logic for sandbox implementations to get the appropriate Docker image based on device information and engine type. It:

  1. Extracts chip model name from device list
  2. Maps model name to configuration key
  3. Auto-detects system architecture
  4. Looks up image from RuntimeImagesConfig
  5. Returns error if any step fails (no fallback)

This is used by both vLLM and MindIE sandbox implementations to avoid code duplication.

Parameters:

  • runtimeImages: RuntimeImagesConfig instance (as interface{} to avoid import cycle)
  • devices: List of devices (uses first device's ModelName for chip identification)
  • engineName: Inference engine name (e.g., "vllm", "mindie")

Returns:

  • Docker image URL if found
  • Error if configuration is invalid or image not found

func LoadExtendedSandboxes

func LoadExtendedSandboxes(engineName string) []func() DeviceSandbox

LoadExtendedSandboxes loads extended sandbox configurations for a specific engine.

This helper function provides a centralized way for runtimes to discover and load configuration-based sandboxes. It encapsulates the common logic of reading ext_sandboxes from devices.yaml and creating ExtSandbox instances for the specified engine (vllm, mindie, or mlguider).

The function is designed to be called during runtime initialization (init) to register extended sandboxes alongside core code-based sandboxes.

Configuration structure in devices.yaml:

chip_models:
  - config_key: kunlun-r200
    model_name: Kunlun XPU R200
    device_id: "0x1234"
    ext_sandboxes:
      # Common configuration (shared by all engines)
      devices: [/dev/xpu0, /dev/xpu1, /dev/xpu_ctl]
      volumes: [/usr/local/xpu:/usr/local/xpu]
      runtime: runc
      # Engine-specific configurations
      vllm:
        device_env: XPU_VISIBLE_DEVICES
        privileged: true
      mindie:
        device_env: XPU_VISIBLE_DEVICES
        privileged: true

Parameters:

  • engineName: Engine name to load configs for ("vllm", "mindie", "mlguider")

Returns:

  • List of sandbox constructor functions for the specified engine
  • Empty list if device config is not loaded or no extended sandboxes are defined

Example usage in vllm-docker runtime initialization:

func init() {
    extSandboxes := runtime.LoadExtendedSandboxes("vllm")
    if len(extSandboxes) > 0 {
        sandboxRegistry = append(sandboxRegistry, extSandboxes...)
    }
}

func PullDockerImage

func PullDockerImage(ctx context.Context, imageName string, eventCh chan<- string) error

PullDockerImage pulls a Docker image from registry using docker CLI with PTY.

This function pulls an image from the configured registry (or Docker Hub by default). The pull operation shows progress through an optional event channel.

Progress Events:

  • "DOCKER_CR|..." - Carriage return update (overwrite current line)
  • "DOCKER_LF|..." - Line feed update (new line)
  • Regular messages for status updates

The function uses PTY to capture Docker's native progress bars and formatting.

Parameters:

  • ctx: Context for cancellation and timeout control
  • imageName: Full image name to pull
  • eventCh: Optional channel for progress events (can be nil)

Returns:

  • nil on success
  • Error if pull fails or is cancelled

Thread Safety: Safe for concurrent calls

func UpdateInstanceStateFromContainer

func UpdateInstanceStateFromContainer(ctx context.Context, dockerClient *client.Client, instance *Instance) bool

UpdateInstanceStateFromContainer checks the actual container state and updates the instance.

This function should be called periodically to keep instance state in sync with Docker container reality. It only updates instances that are expected to be running (starting, running, or ready states).

Parameters:

  • ctx: Context for cancellation
  • dockerClient: Docker API client
  • instance: Instance to update (modified in place)

Returns:

  • true if state was changed
  • false if state remained the same or instance doesn't need checking

Types

type BackendType

type BackendType = api.BackendType

Type aliases for convenience

type ContainerStateInfo

type ContainerStateInfo struct {
	// State is the mapped instance state
	State InstanceState

	// ErrorMessage contains details if state is StateError
	ErrorMessage string

	// ExitCode contains the container exit code (only valid for exited containers)
	ExitCode int

	// IsRunning indicates if the container is currently running
	IsRunning bool
}

ContainerStateInfo holds the result of container state inspection.

func InspectContainerState

func InspectContainerState(ctx context.Context, dockerClient *client.Client, containerID string) (*ContainerStateInfo, error)

InspectContainerState inspects a Docker container and maps its state to our instance state model.

State Mapping Rules:

  • Container running -> StateRunning
  • Container created -> StateCreated
  • Container exited/failed -> StateError (since xw stop now removes containers)
  • Inspect failure -> Returns error

This function encapsulates all container-to-instance state mapping logic in one place, ensuring consistency across the codebase.

Parameters:

  • ctx: Context for cancellation
  • dockerClient: Docker API client
  • containerID: Container ID to inspect

Returns:

  • ContainerStateInfo with state details
  • Error if inspection fails

type CreateParams

type CreateParams struct {
	InstanceID     string
	ModelID        string
	Alias          string // Instance alias for inference (defaults to ModelID)
	ModelPath      string
	ModelVersion   string
	BackendType    string // Backend type (e.g., "vllm")
	DeploymentMode string // Deployment mode (e.g., "docker")
	ServerName     string // Server unique identifier (added as container name suffix)
	DataDir        string // Data directory for runtime files (e.g., converted models)
	Devices        []DeviceInfo
	Port           int
	Environment    map[string]string
	ExtraConfig    map[string]interface{}

	// Template parameters from runtime_params.yaml
	// Format: ["key=value", "tensor_parallel=4"]
	// Will be converted to environment variables (KEY=value, TENSOR_PARALLEL=4)
	TemplateParams []string

	// Unified parallelism parameters (set by Manager)
	TensorParallel   int // Number of devices for tensor parallelism
	PipelineParallel int // Number of devices for pipeline parallelism (default: 1)
	WorldSize        int // Total number of devices (TENSOR_PARALLEL * PIPELINE_PARALLEL)

	// EventChannel for sending progress messages to client (optional, for SSE streams)
	EventChannel chan<- string
}

CreateParams contains standard parameters for creating an instance.

type DeploymentMode

type DeploymentMode = api.DeploymentMode

type DeviceInfo

type DeviceInfo struct {
	Type       api.DeviceType
	Index      int
	PCIAddress string
	DevicePath string
	ModelName  string
	ConfigKey  string // Base model config key (for sandbox selection, image lookup)
	VariantKey string // Specific variant key (for runtime_params matching), empty if no variant
	Properties map[string]string
}

DeviceInfo contains information about a hardware device.

type DeviceSandbox

type DeviceSandbox interface {
	// PrepareEnvironment generates device-specific environment variables.
	//
	// This method prepares the environment configuration that makes devices
	// visible and accessible to processes inside the container. Different
	// device types have different environment variable conventions.
	//
	// Examples:
	//   - Ascend NPU: ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
	//   - NVIDIA GPU: CUDA_VISIBLE_DEVICES=0,1,2,3
	//   - Kunlun XPU: XPU_VISIBLE_DEVICES=0,1,2,3
	//
	// Parameters:
	//   - devices: List of devices to make visible to the container
	//
	// Returns:
	//   - Map of environment variable name to value
	//   - Error if device configuration is invalid
	PrepareEnvironment(devices []DeviceInfo) (map[string]string, error)

	// GetDeviceMounts returns device files that must be mounted into the container.
	//
	// Device files provide direct hardware access to the container. These
	// files are typically under /dev and require special permissions (rwm).
	//
	// Examples:
	//   - Ascend NPU: ["/dev/davinci0", "/dev/davinci_manager", "/dev/devmm_svm"]
	//   - NVIDIA GPU: ["/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"]
	//
	// Parameters:
	//   - devices: List of devices to mount
	//
	// Returns:
	//   - List of device paths (e.g., ["/dev/davinci0", "/dev/davinci_manager"])
	//   - Error if device paths are invalid
	GetDeviceMounts(devices []DeviceInfo) ([]string, error)

	// GetAdditionalMounts returns additional volume mounts required by the device.
	//
	// Many device types require access to host libraries, tools, and configuration
	// files beyond just the device files. This method returns a mapping of host
	// paths to container paths for these additional requirements.
	//
	// Common mount types:
	//   - Driver libraries: Shared libraries required by device SDK
	//   - Management tools: Utilities for device monitoring (npu-smi, nvidia-smi)
	//   - Configuration files: Device installation info and version metadata
	//   - Cache directories: For model downloads and compilation artifacts
	//
	// Returns:
	//   - Map of host path to container path (e.g., {"/usr/local/dcmi": "/usr/local/dcmi"})
	GetAdditionalMounts() map[string]string

	// RequiresPrivileged indicates whether the container needs privileged mode.
	//
	// Privileged mode grants the container extended permissions and capabilities,
	// including access to all devices and the ability to modify system settings.
	// While less secure, some device types require it for proper operation.
	//
	// Security Considerations:
	//   - Privileged containers can potentially access host resources
	//   - Prefer capability-based security when possible
	//   - Document why privileged mode is required if true
	//
	// Returns:
	//   - true if --privileged flag is required
	RequiresPrivileged() bool

	// Supports checks if this sandbox supports the given device type.
	//
	// This method allows sandboxes to declare which device types they support.
	// Runtimes use this to automatically select the appropriate sandbox
	// implementation for a given device type without hardcoding device lists.
	//
	// Each sandbox should explicitly list the device types it has been
	// tested and validated to work with. Device types must match the
	// config_key values from the device configuration (devices.yaml).
	//
	// Parameters:
	//   - deviceType: Device type string (e.g., "ascend-910b", "ascend-310p")
	//
	// Returns:
	//   - true if this sandbox supports the device type
	//
	// Example:
	//
	//	func (s *AscendSandbox) Supports(deviceType string) bool {
	//	    supportedTypes := []string{"ascend-910b", "ascend-310p"}
	//	    for _, t := range supportedTypes {
	//	        if deviceType == t { return true }
	//	    }
	//	    return false
	//	}
	Supports(deviceType string) bool

	// GetCapabilities returns Linux capabilities needed by the container.
	//
	// Linux capabilities provide fine-grained privilege control. Even when
	// privileged mode is used, documenting required capabilities helps
	// understand security requirements and may support future migration
	// to non-privileged mode.
	//
	// Common capabilities:
	//   - SYS_ADMIN: System administration operations
	//   - SYS_RAWIO: Direct device I/O access
	//   - IPC_LOCK: Memory locking for device buffers
	//   - SYS_RESOURCE: Resource limit adjustments
	//
	// Returns:
	//   - List of capability names (e.g., ["SYS_ADMIN", "SYS_RAWIO"])
	GetCapabilities() []string

	// GetDefaultImage returns the default Docker image for this device type.
	//
	// Different device types typically require different container images
	// with device-specific drivers and libraries pre-installed. This method
	// provides a sensible default that can be overridden by users.
	//
	// The method receives device information to determine the appropriate image
	// based on the specific chip model (e.g., Ascend 910B vs 310P) and system
	// architecture (ARM64 vs x86_64).
	//
	// Image Guidelines:
	//   - Use official or verified images when available
	//   - Pin to specific versions for reproducibility
	//   - Include full registry path for clarity
	//
	// Parameters:
	//   - devices: List of devices to get the image for
	//
	// Returns:
	//   - Docker image URL (e.g., "quay.io/ascend/vllm-ascend:v0.11.0rc0")
	//   - Error if image configuration is not found or invalid
	GetDefaultImage(devices []DeviceInfo) (string, error)

	// GetDockerRuntime returns the Docker runtime to use for this device type.
	//
	// Docker supports pluggable runtimes (via OCI spec) that can provide
	// device-specific functionality. Common runtimes:
	//   - "runc": Default OCI runtime (standard containers)
	//   - "nvidia": NVIDIA Container Runtime (automatic GPU setup)
	//   - "kata-runtime": Lightweight VM isolation
	//
	// Returns:
	//   - Runtime name (e.g., "runc", "nvidia", "ascend")
	GetDockerRuntime() string
}

DeviceSandbox defines the interface for device-specific configuration.

Each device type (Ascend NPU, Kunlun XPU, NVIDIA GPU, etc.) implements this interface to provide chip-specific Docker configuration. This abstraction isolates device-specific logic from the core runtime implementation, enabling runtime backends to support multiple hardware accelerators.

Responsibilities:

  • Environment variable preparation (device visibility, logging, etc.)
  • Device file mounting (e.g., /dev/davinci*, /dev/npu*, /dev/nvidia*)
  • Additional volume mounts (driver libs, tools, config files)
  • Security requirements (privileged mode, capabilities)
  • Docker runtime selection (runc, nvidia, ascend, etc.)
  • Default container image selection

Implementation Guidelines:

  • PrepareEnvironment: Return device visibility and configuration variables
  • GetDeviceMounts: Return device files that need rwm access
  • GetAdditionalMounts: Return host->container path mappings
  • RequiresPrivileged: Return true if privileged mode is required
  • GetCapabilities: Document required Linux capabilities
  • GetDefaultImage: Return device-optimized container image
  • GetDockerRuntime: Return Docker runtime name (e.g., "runc", "nvidia")

Thread Safety:

Implementations should be stateless and safe for concurrent use.

Example Implementation:

type AscendSandbox struct{}

func (s *AscendSandbox) PrepareEnvironment(devices []DeviceInfo) (map[string]string, error) {
    return map[string]string{
        "ASCEND_RT_VISIBLE_DEVICES": "0,1,2,3",
        "ASCEND_SLOG_PRINT_TO_STDOUT": "1",
    }, nil
}

func NewExtSandbox

func NewExtSandbox(deviceType string, engineName string, conf *config.ExtSandboxConfig) DeviceSandbox

NewExtSandbox creates a new configuration-driven device sandbox.

Parameters:

  • deviceType: The device config_key (e.g., "kunlun-r200", "cambricon-mlu370")
  • conf: The parsed configuration from sandboxes.yaml

Returns:

  • A fully initialized ExtSandbox implementing the DeviceSandbox interface

Example usage:

cfg := &config.ExtSandboxConfig{
    DeviceEnv: "XPU_VISIBLE_DEVICES",
    Devices: []string{"/dev/xpu0", "/dev/xpu1", "/dev/xpu_ctl"},
    // ... other fields
}
sandbox := runtime.NewExtSandbox("kunlun-r200", cfg)

type DockerRuntimeBase

type DockerRuntimeBase struct {
	// contains filtered or unexported fields
}

DockerRuntimeBase provides common Docker operations for runtime implementations.

This base implementation handles the shared Docker infrastructure used by different runtime backends (vLLM, MindIE, etc.). It provides:

  • Docker client lifecycle management with connection pooling
  • Container lifecycle operations (start, stop, remove)
  • Instance state tracking and synchronization
  • Log streaming with proper cleanup
  • Container discovery and restoration after restarts

Concrete runtime implementations should embed this struct and implement the Create() method with their specific container configuration logic.

Thread Safety:

All methods are thread-safe through RWMutex synchronization.
The instances map is protected for concurrent access by multiple goroutines.

func NewDockerRuntimeBase

func NewDockerRuntimeBase(runtimeName string) (*DockerRuntimeBase, error)

NewDockerRuntimeBase creates and initializes a new Docker runtime base.

This function performs the following initialization steps:

  1. Creates Docker client with environment-based configuration (DOCKER_HOST, etc.)
  2. Negotiates API version with Docker daemon for compatibility
  3. Verifies Docker daemon connectivity with timeout
  4. Initializes instance tracking structures

The created base must be embedded in a concrete runtime implementation. Call LoadExistingContainers() after construction to restore previous state.

Parameters:

  • runtimeName: Unique identifier for this runtime type (used in container labels)

Returns:

  • Initialized base runtime instance
  • Error if Docker daemon is unreachable or client creation fails

Example:

base, err := NewDockerRuntimeBase("vllm:docker")
if err != nil {
    return nil, fmt.Errorf("failed to initialize: %w", err)
}

func (*DockerRuntimeBase) ApplyTemplateParams

func (b *DockerRuntimeBase) ApplyTemplateParams(env map[string]string, params *CreateParams)

ApplyTemplateParams applies template parameters from CreateParams to the environment map.

This is a common Docker operation that converts template parameters (key=value format) into environment variables (KEY=VALUE format) suitable for Docker containers.

Template parameter keys are converted to environment variable format:

  • camelCase -> CAMEL_CASE
  • kebab-case -> KEBAB_CASE

Template environment variables will NOT override existing environment variables. This allows explicit environment variables to take precedence over template defaults.

Parameters:

  • env: Existing environment map to merge template parameters into
  • params: CreateParams containing TemplateParams to apply

Example:

env := map[string]string{"CUSTOM_VAR": "value"}
ApplyTemplateParams(env, params) // params.TemplateParams = ["tensorParallel=4"]
// Result: env = {"CUSTOM_VAR": "value", "TENSOR_PARALLEL": "4"}

func (*DockerRuntimeBase) CreateContainerWithLabels

func (b *DockerRuntimeBase) CreateContainerWithLabels(
	ctx context.Context,
	params *CreateParams,
	containerConfig *container.Config,
	hostConfig *container.HostConfig,
	containerName string,
	extraLabels map[string]string,
) (container.CreateResponse, error)

CreateContainerWithLabels creates a Docker container with automatic common label injection.

This method wraps Docker's ContainerCreate API and automatically adds common xw labels to ensure all containers can be discovered and managed consistently.

Common labels added automatically:

  • xw.runtime: Runtime type (e.g., "vllm:docker", "mindie:docker")
  • xw.model_id: Model identifier
  • xw.alias: Instance alias for inference
  • xw.instance_id: Unique instance identifier
  • xw.backend_type: Backend type (e.g., "vllm", "mindie")
  • xw.deployment_mode: Deployment mode (e.g., "docker")
  • xw.server_name: Server identifier for multi-server support
  • xw.max_concurrent: Max concurrent requests (if specified in ExtraConfig)

Runtime-specific labels can be passed via the extraLabels parameter.

Parameters:

  • ctx: Context for cancellation and timeout
  • params: Creation parameters containing model info and configuration
  • containerConfig: Docker container configuration
  • hostConfig: Docker host configuration
  • containerName: Name for the container
  • extraLabels: Additional runtime-specific labels (optional)

Returns:

  • Container creation response with ID
  • Error if container creation fails

func (*DockerRuntimeBase) EnsureImage

func (b *DockerRuntimeBase) EnsureImage(ctx context.Context, imageName string, params *CreateParams) error

EnsureImage checks if an image exists locally and pulls it if not.

This method combines CheckDockerImageExists and PullDockerImage to ensure a Docker image is available before creating containers. It's designed to be called by concrete runtime implementations (vllm:docker, mindie:docker) in their Create methods.

The method sends progress events through the CreateParams.EventChannel:

  • Checking image availability
  • Image found/not found status
  • Pull progress (if needed)

Parameters:

  • ctx: Context for cancellation and timeout control
  • imageName: Full image name to ensure
  • params: CreateParams containing EventChannel for progress updates

Returns:

  • nil if image is available (either found or pulled successfully)
  • Error if check fails or pull fails

Thread Safety: Safe for concurrent calls

Example:

if err := r.EnsureImage(ctx, imageName, params); err != nil {
    return nil, fmt.Errorf("failed to ensure image: %w", err)
}

func (*DockerRuntimeBase) Get

func (b *DockerRuntimeBase) Get(ctx context.Context, instanceID string) (*Instance, error)

Get retrieves instance information by ID.

This method returns a pointer to the instance struct, allowing callers to read instance metadata, state, and configuration. The returned pointer should not be modified directly - use runtime methods for state changes.

This method also checks the actual container status and updates instance state if the container has exited unexpectedly.

Parameters:

  • ctx: Context for cancellation
  • instanceID: Unique identifier of the instance to retrieve

Returns:

  • Instance pointer on success
  • Error if instance not found

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) GetDockerClient

func (b *DockerRuntimeBase) GetDockerClient() *client.Client

GetDockerClient returns the underlying Docker client.

This method exposes the Docker client for advanced operations not covered by the base implementation. Use with caution as direct client access bypasses base state management.

Returns:

  • Docker API client pointer

Thread Safety: Safe for concurrent calls (client itself is thread-safe)

func (*DockerRuntimeBase) GetInstances

func (b *DockerRuntimeBase) GetInstances() map[string]*Instance

GetInstances returns the instances map for direct access.

This method is intended for concrete runtime implementations that need to access or modify the instances map. Callers must handle locking appropriately using GetMutex().

WARNING: Direct map access requires external synchronization.

Returns:

  • Map of instance ID to Instance pointer

Thread Safety: NOT safe - caller must lock GetMutex()

func (*DockerRuntimeBase) GetMutex

func (b *DockerRuntimeBase) GetMutex() *sync.RWMutex

GetMutex returns the mutex for synchronizing instance map access.

Concrete runtime implementations should use this mutex when accessing the instances map directly via GetInstances().

Returns:

  • RWMutex pointer for synchronization

Thread Safety: Always safe to call

func (*DockerRuntimeBase) GetServerName

func (b *DockerRuntimeBase) GetServerName() string

GetServerName returns the current server name.

Returns:

  • Server name string, or empty if not configured

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) List

func (b *DockerRuntimeBase) List(ctx context.Context) ([]*Instance, error)

List returns all instances managed by this runtime.

The returned slice contains pointers to all tracked instances, regardless of their state (running, stopped, etc.). The list is a snapshot at call time.

This method also checks the actual container status and updates instance state if the container has exited unexpectedly (e.g., due to errors).

Parameters:

  • ctx: Context for cancellation

Returns:

  • Slice of instance pointers (empty if no instances)
  • Error (currently always nil, reserved for future use)

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) LoadExistingContainers

func (b *DockerRuntimeBase) LoadExistingContainers(ctx context.Context) error

LoadExistingContainers discovers and registers containers from previous runs.

This method performs container restoration by:

  1. Querying Docker for containers with matching runtime label
  2. Filtering by server name (if configured) for multi-server support
  3. Inspecting each container using centralized state management
  4. Extracting container metadata and port mappings
  5. Marking allocated ports as used to prevent conflicts
  6. Registering instances in the tracking map

This allows the runtime to resume managing containers after a restart, enabling seamless server upgrades and crash recovery.

State Management:

Uses InspectContainerState() for consistent state mapping across the codebase.
See state_manager.go for detailed state mapping rules.

Port Allocation:

Discovered ports are marked as used in the global port allocator to
prevent new instances from conflicting with existing ones.

Parameters:

  • ctx: Context for cancellation and timeout control

Returns:

  • nil on success
  • Error if Docker query fails (individual container errors are logged as warnings)

Thread Safety: Safe for concurrent calls (but typically called once during initialization)

func (*DockerRuntimeBase) Logs

func (b *DockerRuntimeBase) Logs(ctx context.Context, instanceID string, follow bool) (LogStream, error)

Logs retrieves container logs for an instance.

This method streams logs from the Docker container with the following options:

  • Both stdout and stderr are included
  • Timestamps are prepended to each log line
  • All historical logs are returned ("tail=all")
  • Optionally follows new logs in real-time

The returned LogStream must be closed by the caller to release resources.

Parameters:

  • ctx: Context for cancellation and timeout control
  • instanceID: Unique identifier of the instance
  • follow: If true, stream continues with new logs; if false, return existing logs and close

Returns:

  • LogStream for reading log data
  • Error if instance not found or Docker operation fails

Example:

stream, err := runtime.Logs(ctx, "my-instance", true)
if err != nil {
    return err
}
defer stream.Close()
io.Copy(os.Stdout, stream)

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) RegisterCoreSandboxes

func (b *DockerRuntimeBase) RegisterCoreSandboxes(sandboxes []func() DeviceSandbox)

RegisterCoreSandboxes registers core device sandboxes for this runtime.

This method should be called during runtime initialization to register code-based sandbox implementations for mainstream accelerators.

Core sandboxes serve as fallbacks when no configuration-based sandboxes are found for a device type. This design allows users to override default implementations via configuration.

Parameters:

  • sandboxes: List of sandbox constructor functions

Thread Safety: Should be called during initialization before concurrent access

Example:

base.RegisterCoreSandboxes([]func() runtime.DeviceSandbox{
    func() runtime.DeviceSandbox { return NewAscendSandbox() },
    func() runtime.DeviceSandbox { return NewMetaXSandbox() },
})

func (*DockerRuntimeBase) ReloadContainers

func (b *DockerRuntimeBase) ReloadContainers(ctx context.Context) error

ReloadContainers clears and reloads all containers from Docker.

This method is useful when:

  • Server name changes (multi-server scenarios)
  • External container modifications need to be detected
  • Recovering from state inconsistencies

WARNING: This clears all tracked instances and reloads from Docker. Any in-memory state not persisted in container labels will be lost.

Parameters:

  • ctx: Context for cancellation and timeout control

Returns:

  • nil on success
  • Error if Docker query fails

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) Remove

func (b *DockerRuntimeBase) Remove(ctx context.Context, instanceID string) error

Remove permanently removes a Docker container and its instance tracking.

This method performs the following cleanup:

  1. Force stops the container if still running
  2. Removes the container and its anonymous volumes
  3. Unregisters instance from tracking map
  4. Releases associated resources

The operation is idempotent - removing an already-removed container will return an error from Docker but won't corrupt state.

Parameters:

  • ctx: Context for cancellation and timeout control
  • instanceID: Unique identifier of the instance to remove

Returns:

  • nil on success
  • Error if instance not found or Docker operation fails

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) SelectSandbox

func (b *DockerRuntimeBase) SelectSandbox(deviceType string) (DeviceSandbox, error)

SelectSandbox selects an appropriate sandbox for the given device type.

This method implements a configuration-first selection strategy:

  1. Check extended sandboxes from configuration (higher priority)
  2. Fall back to core sandboxes from code (lower priority)

Extended sandboxes are loaded lazily on first call to avoid timing issues with configuration file loading.

Parameters:

  • deviceType: Device config_key to find sandbox for (e.g., "ascend-910b")

Returns:

  • DeviceSandbox instance if found
  • Error if no sandbox supports the device type

Thread Safety: Safe for concurrent calls

Example:

sandbox, err := base.SelectSandbox("ascend-910b")
if err != nil {
    return fmt.Errorf("no sandbox for device: %w", err)
}

func (*DockerRuntimeBase) SetServerName

func (b *DockerRuntimeBase) SetServerName(name string)

SetServerName configures the server name for multi-server support.

The server name is used as a suffix in container names and as a filter when loading existing containers. This allows multiple xw servers to coexist on the same Docker host without conflicts.

This method should be called before LoadExistingContainers() to ensure proper container filtering.

Parameters:

  • name: Unique server identifier (e.g., hostname, UUID)

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) Start

func (b *DockerRuntimeBase) Start(ctx context.Context, instanceID string) error

Start starts a created Docker container instance.

This method transitions a container from "created" or "stopped" state to "running". The container begins executing its configured entrypoint/command.

The method performs the following:

  1. Validates instance exists in tracking map
  2. Extracts container ID from instance metadata
  3. Issues Docker start command
  4. Updates instance state and start timestamp

Parameters:

  • ctx: Context for cancellation and timeout control
  • instanceID: Unique identifier of the instance to start

Returns:

  • nil on success
  • Error if instance not found, container ID missing, or Docker operation fails

Thread Safety: Safe for concurrent calls

func (*DockerRuntimeBase) Stop

func (b *DockerRuntimeBase) Stop(ctx context.Context, instanceID string) error

Stop gracefully stops a running Docker container.

This method sends SIGTERM to the container and waits up to 30 seconds for graceful shutdown. If the container doesn't stop within the timeout, Docker will send SIGKILL to force termination.

The 30-second timeout allows models to complete any in-flight inference requests and perform proper cleanup before shutdown.

After stopping, the container is preserved (not removed) to allow:

  • Inspection of final state and logs
  • Quick restart without recreating container
  • Manual cleanup decision by operators

Parameters:

  • ctx: Context for cancellation (separate from container stop timeout)
  • instanceID: Unique identifier of the instance to stop

Returns:

  • nil on success
  • Error if instance not found or Docker operation fails

Thread Safety: Safe for concurrent calls

type ExtSandbox

type ExtSandbox struct {
	// contains filtered or unexported fields
}

ExtSandbox is a configuration-driven implementation of the DeviceSandbox interface.

ExtSandbox allows users to define device sandbox behavior through YAML configuration files, making it easy to add support for new AI accelerator chips without modifying source code or recompiling the application.

Design philosophy:

  • Simple, common scenarios: Use ExtSandbox with YAML configuration
  • Complex, custom scenarios: Implement code-based sandboxes (e.g., AscendSandbox)

The configuration-driven approach is ideal for:

  • Niche accelerator chips with small user bases
  • Rapid prototyping and testing of new devices
  • Environment-specific customizations

Limitations:

  • Device environment variable must use comma-separated format
  • Device paths must follow predictable naming patterns
  • No support for dynamic runtime configuration

For advanced use cases requiring custom logic, implement a dedicated sandbox in code (see ascend_sandbox.go, metax_sandbox.go for examples).

func (*ExtSandbox) GetAdditionalMounts

func (s *ExtSandbox) GetAdditionalMounts() map[string]string

GetAdditionalMounts returns volume mounts required by the device runtime.

These mounts typically include:

  • Driver libraries and shared objects
  • Device management tools (e.g., npu-smi, xpu-smi)
  • Configuration files and metadata
  • Cache directories for model downloads

Volume paths are specified in "host:container" format in the configuration. If no container path is provided (no colon), the same path is used for both.

Returns:

  • Map of host path to container path

Example:

// Configuration:
// volumes: ["/usr/local/xpu:/usr/local/xpu", "/root/.cache"]
//
// Result:
mounts := sandbox.GetAdditionalMounts()
// mounts = {
//   "/usr/local/xpu": "/usr/local/xpu",
//   "/root/.cache": "/root/.cache",
// }

func (*ExtSandbox) GetCapabilities

func (s *ExtSandbox) GetCapabilities() []string

GetCapabilities returns Linux capabilities required by the container.

Linux capabilities provide fine-grained privilege control as an alternative to full privileged mode. Common capabilities for AI accelerators include:

  • SYS_ADMIN: System administration operations
  • SYS_RAWIO: Direct device I/O access
  • IPC_LOCK: Memory locking for device buffers
  • SYS_RESOURCE: Resource limit adjustments

Note: Even when privileged mode is used, documenting required capabilities helps understand security requirements and may support future migration to non-privileged mode.

Returns:

  • List of capability names (e.g., ["SYS_ADMIN", "SYS_RAWIO"])

func (*ExtSandbox) GetDefaultImage

func (s *ExtSandbox) GetDefaultImage(devices []DeviceInfo) (string, error)

GetDefaultImage returns the default Docker image for this device type.

Note: ExtSandbox delegates image selection to the runtime_images.yaml configuration file. This method is not directly used by ExtSandbox but is required by the DeviceSandbox interface.

The actual image lookup is performed by the runtime using the device's ConfigKey and engine name.

Returns:

  • Empty string (image selection handled by runtime)
  • Error indicating this method is not supported

func (*ExtSandbox) GetDeviceMounts

func (s *ExtSandbox) GetDeviceMounts(devices []DeviceInfo) ([]string, error)

GetDeviceMounts returns device node paths that must be mounted into the container.

Device mounting follows an intelligent pattern-matching approach:

  • Paths ending with a digit (e.g., /dev/xpu0, /dev/xpu5): Mounted only when the corresponding device index is allocated
  • Paths not ending with a digit (e.g., /dev/xpu_ctl, /dev/xpu_manager): Mounted as shared devices for all containers

This approach allows users to list all possible device paths in the configuration, and the system automatically selects the appropriate subset based on actual device allocation.

Parameters:

  • devices: List of allocated devices

Returns:

  • List of device paths to mount with "rwm" permissions
  • Error if device configuration is invalid

Example:

// Configuration lists:
// devices: [/dev/xpu0, /dev/xpu1, /dev/xpu2, /dev/xpu_ctl]
//
// Allocated devices: 0, 2
//
// Result:
mounts, _ := sandbox.GetDeviceMounts([]DeviceInfo{{Index: 0}, {Index: 2}})
// mounts = ["/dev/xpu0", "/dev/xpu2", "/dev/xpu_ctl"]

func (*ExtSandbox) GetDockerRuntime

func (s *ExtSandbox) GetDockerRuntime() string

GetDockerRuntime returns the Docker runtime to use for this device.

Docker supports pluggable runtimes that can provide device-specific functionality. Common runtimes include:

  • runc: Standard OCI runtime (most common)
  • nvidia: NVIDIA Container Runtime
  • kata-runtime: Lightweight VM isolation

Returns:

  • Runtime name from configuration, or "runc" if not specified

func (*ExtSandbox) GetSharedMemorySize

func (s *ExtSandbox) GetSharedMemorySize() int64

GetSharedMemorySize returns the shared memory size required for the container.

Shared memory (--shm-size) is critical for:

  • PyTorch DataLoader workers in multi-process mode
  • KV cache management across processes
  • Tensor sharing in distributed inference
  • Inter-process communication in multi-device setups

The default Docker shared memory size (64MB) is often insufficient for large model inference workloads.

Returns:

  • Shared memory size in bytes (configured value or 16GB default)

func (*ExtSandbox) PrepareEnvironment

func (s *ExtSandbox) PrepareEnvironment(devices []DeviceInfo) (map[string]string, error)

PrepareEnvironment generates environment variables for the container.

This method sets up the runtime environment required for the device to function properly inside the container. It combines:

  1. Static environment variables from the configuration
  2. Dynamic device visibility variable (comma-separated device indices)

The device visibility environment variable (specified by DeviceEnv in config) is automatically populated with a comma-separated list of allocated device indices. For example, if devices 0, 2, and 5 are allocated:

XPU_VISIBLE_DEVICES=0,2,5

Parameters:

  • devices: List of allocated devices to make visible in the container

Returns:

  • Map of environment variable names to values
  • Error if device configuration is invalid

Example:

env, err := sandbox.PrepareEnvironment([]DeviceInfo{
    {Index: 0, ...},
    {Index: 2, ...},
})
// env = {
//   "XPU_VISIBLE_DEVICES": "0,2",
//   "XPU_WORKER_MULTIPROC_METHOD": "spawn",
// }

func (*ExtSandbox) RequiresPrivileged

func (s *ExtSandbox) RequiresPrivileged() bool

RequiresPrivileged indicates whether the container needs privileged mode.

Privileged mode (--privileged=true) grants the container extended permissions, including access to all devices and the ability to modify system settings.

While less secure than capability-based access control, some accelerator drivers require privileged mode for proper operation, particularly for:

  • Direct hardware access beyond standard device mounts
  • Kernel module interactions
  • Memory management operations

Returns:

  • true if privileged mode is required (from configuration)

func (*ExtSandbox) Supports

func (s *ExtSandbox) Supports(deviceType string) bool

Supports checks if this sandbox supports the specified device type.

This method is used by runtimes to automatically select the appropriate sandbox implementation for a given device type during instance creation.

Parameters:

  • deviceType: Device config_key to check (e.g., "kunlun-r200")

Returns:

  • true if this sandbox was created for the specified device type

type Instance

type Instance struct {
	ID           string
	RuntimeName  string
	CreatedAt    time.Time
	ModelID      string
	Alias        string // Instance alias for inference
	ModelVersion string
	State        InstanceState
	StartedAt    time.Time
	StoppedAt    time.Time
	Error        string
	Port         int
	Endpoint     string
	CPUUsage     float64
	MemoryUsage  int64
	Metadata     map[string]string
}

Instance represents a running model instance.

type InstanceState

type InstanceState string

InstanceState represents the state of an instance.

const (
	StateCreating  InstanceState = "creating"
	StateCreated   InstanceState = "created"
	StateStarting  InstanceState = "starting"
	StateRunning   InstanceState = "running"
	StateReady     InstanceState = "ready"     // Running and endpoint is accessible
	StateUnhealthy InstanceState = "unhealthy" // Running but endpoint is not accessible
	StateStopping  InstanceState = "stopping"
	StateStopped   InstanceState = "stopped"
	StateError     InstanceState = "error"
	StateUnknown   InstanceState = "unknown" // Unable to determine real state
)

type LogStream

type LogStream interface {
	Read(p []byte) (n int, err error)
	Close() error
}

LogStream provides access to instance logs.

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager manages multiple runtime implementations.

func NewManager

func NewManager(serverName string, cfg *config.Config) (*Manager, error)

NewManager creates a new runtime manager with the given server name and configuration. The server name is used as a suffix for container names to support multiple xw servers. Configuration is passed in to provide access to runtime parameters and other settings.

func (*Manager) Close

func (m *Manager) Close() error

Close shuts down the manager.

func (*Manager) Create

func (m *Manager) Create(ctx context.Context, runtimeName string, params *CreateParams) (*Instance, error)

Create creates an instance using the specified runtime.

This method handles unified parallelism parameter management:

  1. Calculates TensorParallel from ExtraConfig or uses device count
  2. Gets PipelineParallel from ExtraConfig (defaults to 1)
  3. Calculates WorldSize = TensorParallel * PipelineParallel
  4. Validates WorldSize matches allocated device count
  5. Passes computed parameters to runtime implementation

func (*Manager) Get

func (m *Manager) Get(ctx context.Context, instanceID string) (*Instance, error)

Get retrieves a specific instance by ID across all runtimes.

This method searches all registered runtimes to find the instance with the specified ID. It returns the first matching instance found.

Returns:

  • The instance if found
  • Error if instance not found or lookup fails

func (*Manager) GetLogsByAlias

func (m *Manager) GetLogsByAlias(ctx context.Context, alias string, follow bool) (LogStream, error)

GetLogsByAlias retrieves the log stream for an instance by its alias.

Parameters:

  • ctx: Context for cancellation and timeout
  • alias: Instance alias
  • follow: If true, stream logs in real-time

Returns:

  • LogStream reader
  • Error if instance not found

func (*Manager) GetServerName

func (m *Manager) GetServerName() string

GetServerName returns the server name

func (*Manager) List

func (m *Manager) List(ctx context.Context) ([]*Instance, error)

List lists all instances across all runtimes.

func (*Manager) ListCompat

func (m *Manager) ListCompat() []*RunInstance

ListCompat lists all instances in legacy API format.

This method provides backward compatibility with the legacy API by converting instances to the RunInstance format.

Returns:

  • Array of RunInstance objects

func (*Manager) RegisterRuntime

func (m *Manager) RegisterRuntime(runtime Runtime) error

RegisterRuntime registers a runtime implementation.

func (*Manager) Remove

func (m *Manager) Remove(ctx context.Context, instanceID string) error

Remove removes an instance and releases its allocated devices.

func (*Manager) RemoveByAliasCompat

func (m *Manager) RemoveByAliasCompat(alias string, force bool) error

RemoveByAliasCompat removes an instance by its alias.

This method provides a convenient way to remove instances using their alias instead of the internal instance ID. If force is true, it stops the instance before removing it.

Parameters:

  • alias: The alias of the instance to remove
  • force: If true, stops the instance before removing

Returns:

  • Error if the instance is not found or remove fails

func (*Manager) RemoveCompat

func (m *Manager) RemoveCompat(instanceID string, force bool) error

RemoveCompat removes an instance with legacy API compatibility.

This method provides backward compatibility by wrapping the Remove method with a timeout. If force is true, it stops the instance before removing it.

Parameters:

  • instanceID: ID of the instance to remove
  • force: If true, stops the instance before removing

Returns:

  • Error if remove fails

func (*Manager) Run

func (m *Manager) Run(configDir, dataDir string, opts *RunOptions) (*RunInstance, error)

Run creates and starts a model instance (legacy API compatibility).

This method bridges the legacy API to the new runtime system. It:

  1. Determines the runtime name from backend type and deployment mode
  2. Allocates devices for the instance
  3. Creates the instance via the appropriate runtime
  4. Starts the instance

Parameters:

  • configDir: Configuration directory for storing allocation state
  • dataDir: Data directory for runtime files (e.g., converted models)
  • opts: Legacy run options from API handler

Returns:

  • RunInstance with instance metadata
  • Error if any step fails

func (*Manager) SetServerName

func (m *Manager) SetServerName(name string)

SetServerName sets the server name (used during initialization)

func (*Manager) Start

func (m *Manager) Start(ctx context.Context, instanceID string) error

Start starts an instance.

func (*Manager) StartBackgroundTasks

func (m *Manager) StartBackgroundTasks()

StartBackgroundTasks starts background maintenance tasks.

func (*Manager) Stop

func (m *Manager) Stop(ctx context.Context, instanceID string) error

Stop stops an instance and releases its allocated devices.

This method stops the instance and removes its container. Allocated devices are released back to the pool.

func (*Manager) StopByAliasCompat

func (m *Manager) StopByAliasCompat(alias string) error

StopByAliasCompat stops an instance by its alias.

This method provides a convenient way to stop instances using their alias instead of the internal instance ID.

Parameters:

  • alias: The alias of the instance to stop

Returns:

  • Error if the instance is not found or stop fails

func (*Manager) StopCompat

func (m *Manager) StopCompat(instanceID string) error

StopCompat stops an instance with legacy API compatibility.

This method provides backward compatibility by wrapping the Stop method with a timeout.

Parameters:

  • instanceID: ID of the instance to stop

Returns:

  • Error if stop fails

type PortAllocator

type PortAllocator struct {
	// contains filtered or unexported fields
}

PortAllocator manages dynamic port allocation for model instances.

This allocator ensures that each instance gets a unique port and handles port conflicts gracefully. It uses the operating system's dynamic port allocation mechanism to find available ports.

func GetGlobalPortAllocator

func GetGlobalPortAllocator() *PortAllocator

GetGlobalPortAllocator returns the global port allocator instance.

This allocator is shared across all runtimes to prevent port conflicts. It uses lazy initialization to create the allocator on first access. Ports are allocated starting from 10881.

Returns:

  • The global PortAllocator instance

func NewPortAllocator

func NewPortAllocator(minPort, maxPort int) *PortAllocator

NewPortAllocator creates a new port allocator.

The allocator will assign ports in the range [minPort, maxPort]. By default, it uses the range [10881, 11881] for model inference services.

func (*PortAllocator) GetFreePort

func (pa *PortAllocator) GetFreePort() (int, error)

GetFreePort finds and returns an available port.

This method sequentially tries ports starting from minPort until it finds one that is available and not already allocated.

Returns:

  • Available port number
  • Error if no ports are available

func (*PortAllocator) MarkPortUsed

func (pa *PortAllocator) MarkPortUsed(port int)

MarkPortUsed marks a port as in use.

This is useful when loading existing instances that already have ports assigned, to prevent double allocation.

Parameters:

  • port: The port number to mark as used

func (*PortAllocator) ReleasePort

func (pa *PortAllocator) ReleasePort(port int)

ReleasePort marks a port as available for reuse.

Parameters:

  • port: The port number to release

type RunInstance

type RunInstance struct {
	ID             string                 `json:"id"`
	ModelID        string                 `json:"model_id"`
	Alias          string                 `json:"alias"` // Instance alias for inference
	BackendType    string                 `json:"backend_type"`
	DeploymentMode string                 `json:"deployment_mode"`
	State          InstanceState          `json:"state"`
	CreatedAt      time.Time              `json:"created_at"`
	StartedAt      time.Time              `json:"started_at,omitempty"`
	Port           int                    `json:"port"`
	ContainerID    string                 `json:"container_id,omitempty"` // Docker container ID
	Error          string                 `json:"error,omitempty"`
	Config         map[string]interface{} `json:"config,omitempty"`
}

RunInstance represents legacy API response (for handlers).

type RunOptions

type RunOptions struct {
	ModelID          string
	Alias            string // Instance alias for inference (defaults to ModelID)
	ModelPath        string // Path to the model files on disk
	BackendType      string
	DeploymentMode   string
	Port             int
	Interactive      bool
	AdditionalConfig map[string]interface{}
	EventChannel     chan<- string // Optional: for sending progress events via SSE
}

RunOptions contains legacy API parameters (for handlers).

type Runtime

type Runtime interface {
	Create(ctx context.Context, params *CreateParams) (*Instance, error)
	Start(ctx context.Context, instanceID string) error
	Stop(ctx context.Context, instanceID string) error
	Remove(ctx context.Context, instanceID string) error
	Get(ctx context.Context, instanceID string) (*Instance, error)
	List(ctx context.Context) ([]*Instance, error)
	Logs(ctx context.Context, instanceID string, follow bool) (LogStream, error)
	Name() string
}

Runtime defines the interface for model runtime backends.

Directories

Path Synopsis
Package mindiedocker implements MindIE runtime with Docker deployment.
Package mindiedocker implements MindIE runtime with Docker deployment.
Package mlguiderdocker implements MLGuider runtime with Docker deployment.
Package mlguiderdocker implements MLGuider runtime with Docker deployment.
Package omniinferdocker implements Omni-Infer runtime with Docker deployment.
Package omniinferdocker implements Omni-Infer runtime with Docker deployment.
Package vllmdocker implements vLLM runtime with Docker deployment.
Package vllmdocker implements vLLM runtime with Docker deployment.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL