Documentation
¶
Overview ¶
Package vllmdocker implements vLLM runtime with Docker deployment.
Overview ¶
This package provides a Docker-based runtime for running vLLM inference engine with support for various AI accelerators (Ascend NPU, MetaX, etc.). The runtime manages the complete lifecycle of containerized model instances.
Architecture ¶
The runtime uses a configuration-driven approach where device-specific behavior is defined in YAML configuration files (configs/devices.yaml) rather than hard-coded implementations. This design allows:
- Adding new chip support without recompiling binaries
- Quick configuration updates for driver changes
- User customization of device behavior
Configuration-Driven Device Support ¶
All device configurations are now defined in configs/devices.yaml under the ext_sandboxes section. For example:
chip_models:
- config_key: ascend-910b
ext_sandboxes:
# Common configuration (shared by all engines)
devices:
- /dev/davinci0
- /dev/davinci_manager
volumes:
- /usr/local/Ascend/driver:/usr/local/Ascend/driver
- /root/.cache:/root/.cache
runtime: runc
# Engine-specific configurations
vllm:
device_env: ASCEND_RT_VISIBLE_DEVICES
privileged: true
shm_size_gb: 100
When to Use Code-Based Sandboxes ¶
While configuration-driven approach is preferred for most cases, you may need to implement a code-based DeviceSandbox when:
Complex Logic Required: - Dynamic device path generation based on system state - Conditional mounting based on driver version detection - Custom device index to path mapping algorithms
Runtime Decisions: - Detect and handle different hardware variants at runtime - Fallback logic when certain devices are unavailable - Performance optimizations based on device capabilities
Integration with External Systems: - Query device management APIs for configuration - Interact with vendor-specific device managers - Handle complex multi-step initialization sequences
Implementing a Code-Based Sandbox ¶
To implement a custom device sandbox, follow these steps:
Step 1: Create a sandbox struct implementing runtime.DeviceSandbox interface:
package vllmdocker
import "github.com/tsingmaoai/xw-cli/internal/runtime"
// CustomSandbox handles device-specific configuration for CustomChip.
type CustomSandbox struct{}
func NewCustomSandbox() *CustomSandbox {
return &CustomSandbox{}
}
Step 2: Implement all required interface methods:
// PrepareEnvironment generates environment variables for the container.
func (s *CustomSandbox) PrepareEnvironment(devices []runtime.DeviceInfo) (map[string]string, error) {
// Build comma-separated device indices
indices := make([]string, len(devices))
for i, dev := range devices {
indices[i] = fmt.Sprintf("%d", dev.Index)
}
return map[string]string{
"CUSTOM_VISIBLE_DEVICES": strings.Join(indices, ","),
"CUSTOM_LOG_LEVEL": "3",
}, nil
}
// GetDeviceMounts returns device files to mount into container.
func (s *CustomSandbox) GetDeviceMounts(devices []runtime.DeviceInfo) ([]string, error) {
mounts := make([]string, 0, len(devices)+1)
// Mount individual device nodes
for _, dev := range devices {
mounts = append(mounts, fmt.Sprintf("/dev/custom%d", dev.Index))
}
// Add shared control device
mounts = append(mounts, "/dev/custom_manager")
return mounts, nil
}
// GetAdditionalMounts returns additional volume mounts (drivers, libs, etc.).
func (s *CustomSandbox) GetAdditionalMounts() map[string]string {
return map[string]string{
"/usr/local/custom/driver": "/usr/local/custom/driver",
"/usr/local/bin/custom-smi": "/usr/local/bin/custom-smi",
"/root/.cache": "/root/.cache",
}
}
// RequiresPrivileged indicates if container needs privileged mode.
func (s *CustomSandbox) RequiresPrivileged() bool {
return true
}
// GetCapabilities returns Linux capabilities required.
func (s *CustomSandbox) GetCapabilities() []string {
return []string{"SYS_ADMIN", "SYS_RAWIO", "IPC_LOCK"}
}
// GetDefaultImage returns Docker image for this device type.
func (s *CustomSandbox) GetDefaultImage(devices []runtime.DeviceInfo) (string, error) {
runtimeImages, err := config.LoadRuntimeImagesConfig()
if err != nil {
return "", fmt.Errorf("failed to load runtime images: %w", err)
}
return runtime.GetImageForEngine(runtimeImages, devices, "vllm")
}
// GetDockerRuntime returns Docker runtime to use.
func (s *CustomSandbox) GetDockerRuntime() string {
return "runc"
}
// GetSharedMemorySize returns shared memory size in bytes.
func (s *CustomSandbox) GetSharedMemorySize() int64 {
return 100 * 1024 * 1024 * 1024 // 100GB
}
// Supports checks if this sandbox supports the given device type.
func (s *CustomSandbox) Supports(deviceType string) bool {
return deviceType == "custom-chip-x100"
}
Step 3: Register the sandbox in NewRuntime():
func NewRuntime() (*Runtime, error) {
base, err := runtime.NewDockerRuntimeBase("vllm-docker")
if err != nil {
return nil, err
}
// Register your custom sandbox
base.RegisterCoreSandboxes([]func() runtime.DeviceSandbox{
func() runtime.DeviceSandbox { return NewCustomSandbox() },
})
// ... rest of initialization
}
Selection Priority ¶
When a device type is used, the runtime selects sandboxes in this order:
- Extended sandboxes from configs/devices.yaml (highest priority)
- Core sandboxes registered via RegisterCoreSandboxes()
- Error if no sandbox found
This means configuration always overrides code, allowing users to customize behavior without modifying binaries.
Best Practices ¶
- Prefer Configuration: Use ext_sandboxes in devices.yaml for simple cases
- Document Behavior: Add comprehensive comments explaining device requirements
- Error Handling: Provide clear error messages for configuration issues
- Validate Inputs: Check device indices, paths, and other parameters
- Test Thoroughly: Verify sandbox works with actual hardware
Example: Complex Logic Requiring Code ¶
Here's a scenario that requires code-based implementation:
// DynamicSandbox queries system at runtime to determine device paths
type DynamicSandbox struct{}
func (s *DynamicSandbox) GetDeviceMounts(devices []runtime.DeviceInfo) ([]string, error) {
mounts := []string{}
// Query vendor API to get actual device paths
vendorAPI := NewVendorAPI()
for _, dev := range devices {
path, err := vendorAPI.GetDevicePathForIndex(dev.Index)
if err != nil {
return nil, fmt.Errorf("failed to query device path: %w", err)
}
mounts = append(mounts, path)
}
// Detect driver version and add version-specific mounts
driverVersion, _ := vendorAPI.GetDriverVersion()
if strings.HasPrefix(driverVersion, "2.") {
// New driver requires additional control device
mounts = append(mounts, "/dev/vendor_v2_ctrl")
} else {
// Legacy driver uses old control device
mounts = append(mounts, "/dev/vendor_ctrl")
}
return mounts, nil
}
This type of dynamic logic cannot be expressed in static YAML configuration and requires a code-based sandbox implementation.
Migration from Code to Config ¶
If you find your code-based sandbox contains mostly static configuration, consider migrating it to ext_sandboxes in devices.yaml:
Before (Code):
func (s *AscendSandbox) GetDeviceMounts(devices []runtime.DeviceInfo) ([]string, error) {
mounts := []string{"/dev/davinci_manager", "/dev/devmm_svm"}
for _, dev := range devices {
mounts = append(mounts, fmt.Sprintf("/dev/davinci%d", dev.Index))
}
return mounts, nil
}
After (Config):
ext_sandboxes:
devices:
- /dev/davinci0 # Auto-matched by index
- /dev/davinci1
- /dev/davinci_manager # Always mounted
- /dev/devmm_svm
runtime: runc
vllm:
device_env: ASCEND_RT_VISIBLE_DEVICES
privileged: true
The configuration-driven approach reduces code maintenance and allows users to adapt to new hardware without waiting for releases.
Package vllmdocker implements vLLM runtime with Docker deployment.
This package provides a Docker-based runtime for running vLLM inference engine. It handles the complete lifecycle of containerized model instances, including:
- Container creation with proper device access and mounts
- Device-specific configuration via sandbox abstraction
- Instance state tracking and monitoring
- Model serving with vLLM backend
The runtime uses device-specific sandboxes to handle chip-specific configurations (Ascend NPU, etc.) and embeds DockerRuntimeBase for common Docker operations.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Runtime ¶
type Runtime struct {
*runtime.DockerRuntimeBase // Embedded base provides common Docker operations
}
Runtime implements the runtime.Runtime interface for vLLM with Docker.
This runtime manages vLLM model instances running in Docker containers. Each instance is an isolated container with access to specified hardware devices.
Architecture:
- Embeds DockerRuntimeBase for common Docker operations
- Uses DeviceSandbox abstraction for device-specific configuration
- Implements Create() for vLLM-specific container setup
Thread Safety:
All public methods are thread-safe via inherited mutex protection.
func NewRuntime ¶
NewRuntime creates a new vLLM Docker runtime instance.
This function:
- Initializes Docker base with "vllm-docker" runtime name
- Registers core sandbox implementations
- Verifies Docker daemon connectivity
- Loads any existing containers from previous runs
Returns:
- Configured runtime instance ready for use
- Error if Docker is unavailable or initialization fails
func (*Runtime) Create ¶
func (r *Runtime) Create(ctx context.Context, params *runtime.CreateParams) (*runtime.Instance, error)
Create creates a new model instance but does not start it.
This method implements vLLM-specific container creation:
- Validates parameters and checks for duplicate instance IDs
- Selects appropriate device sandbox based on device type
- Prepares device-specific configuration (env, mounts, devices)
- Configures vLLM command with model path and serving options
- Creates Docker container with all required settings
- Registers instance in runtime's instance map
The created container is in "created" state and must be started separately via the Start method (inherited from DockerRuntimeBase).
Container Configuration:
- Image: Device-specific vLLM image or custom from params.ExtraConfig["image"]
- Command: vLLM serve with model path and instance alias
- Network: Bridge mode with port mapping (container:8000 -> host:params.Port)
- Restart: unless-stopped for automatic recovery
- Init: Enabled for proper signal handling
Labels:
Containers are labeled with metadata for discovery and filtering: - xw.runtime: Runtime type (vllm-docker) - xw.model_id: Model identifier - xw.alias: Instance alias for inference - xw.instance_id: Unique instance identifier - xw.backend_type: Backend type (vllm) - xw.deployment_mode: Deployment mode (docker) - xw.device_indices: Comma-separated device indices - xw.server_name: Server identifier for multi-server support
Parameters:
- ctx: Context for cancellation and timeout
- params: Standard creation parameters including model info and devices
Returns:
- Instance metadata with container information
- Error if creation fails at any step