Documentation ¶
Overview ¶
Package mindiedocker implements MindIE runtime with Docker deployment.
This package provides a Docker-based runtime for running the MindIE inference engine on Huawei Ascend NPUs. MindIE is optimized for large language models and supports multi-device tensor parallelism and pipeline parallelism.
Configuration-Driven Architecture ¶
This runtime uses configuration-driven device support via configs/devices.yaml. All device-specific behavior (environment variables, device mounts, volumes) is defined in YAML configuration rather than code.
Example configuration in devices.yaml:
    chip_models:
      - config_key: ascend-910b
        ext_sandboxes:
          # Common configuration (shared by all engines)
          devices:
            - /dev/davinci0
            - /dev/davinci1
            - /dev/davinci_manager
            - /dev/devmm_svm
            - /dev/hisi_hdc
          volumes:
            - /usr/local/Ascend/driver:/usr/local/Ascend/driver
            - /root/.cache:/root/.cache
          runtime: runc
          # Engine-specific configurations
          mindie:
            device_env: MINDIE_NPU_DEVICE_IDS
            privileged: true
            shm_size_gb: 100
Implementing Code-Based Sandboxes ¶
For complex scenarios that cannot be expressed in configuration, you can implement a code-based DeviceSandbox. See package vllmdocker documentation for detailed instructions and examples:
import "github.com/tsingmaoai/xw-cli/internal/runtime/vllm-docker" // See vllmdocker package doc for implementation guide
Key scenarios requiring code-based sandboxes:
- Dynamic device path detection at runtime
- Complex conditional logic based on driver versions
- Integration with vendor-specific device management APIs
- Custom device allocation algorithms
MindIE-Specific Configuration ¶
MindIE has unique requirements compared to other runtimes:
Device Visibility: Uses MINDIE_NPU_DEVICE_IDS instead of ASCEND_RT_VISIBLE_DEVICES
Log Directories: Requires extensive log directory mounts for profiling and debugging:
- /var/log/npu/slog: System logs from CANN runtime
- /var/log/npu/profiling: Performance profiling data
- /var/log/npu/dump: Core dumps and diagnostics
Multi-Device Support: WORLD_SIZE environment variable must match total device count (TENSOR_PARALLEL * PIPELINE_PARALLEL)
Migration from Code to Configuration ¶
This package previously used a code-based AscendSandbox. It has been migrated to a configuration-driven approach so that device support can be updated without recompiling.
Benefits of the configuration-driven approach:
- Add new Ascend chip variants without code changes
- Adjust log paths when driver updates change locations
- Users can customize behavior for their environment
- Faster iteration on device support
See configs/devices.yaml for current device configurations.
This package provides a Docker-based runtime for running the MindIE inference engine. It handles the complete lifecycle of containerized model instances, including:
- Container creation with proper device access and extensive mounts
- Device-specific configuration via sandbox abstraction
- Multi-device distributed inference support
- Model serving with MindIE backend
MindIE Features:
- Optimized for Ascend NPU distributed inference
- Large shared memory support for multi-device communication
- Extensive logging and profiling capabilities
- Compatible with Huawei CANN ecosystem
The runtime uses device-specific sandboxes to handle chip-specific configurations (Ascend NPU, etc.) and embeds DockerRuntimeBase for common Docker operations.
Types ¶
type MindIESandbox ¶
type MindIESandbox interface {
runtime.DeviceSandbox
// Supports checks if this sandbox supports the given device type.
//
// Parameters:
// - deviceType: Device type string (e.g., "ascend-910b", "ascend-310p")
//
// Returns:
// - true if this sandbox supports the device type
Supports(deviceType string) bool
}
MindIESandbox extends the base DeviceSandbox interface with MindIE-specific methods.
Note: Sandboxes can optionally implement a GetSharedMemorySize() int64 method. If it is not implemented, a default of 16GB of shared memory is used.
type Runtime ¶
type Runtime struct {
*runtime.DockerRuntimeBase // Embedded base provides common Docker operations
}
Runtime implements the runtime.Runtime interface for MindIE with Docker.
This runtime manages MindIE model instances running in Docker containers. Each instance is an isolated container with access to specified hardware devices and configured for distributed inference workloads.
Architecture:
- Embeds DockerRuntimeBase for common Docker operations
- Uses DeviceSandbox abstraction for device-specific configuration
- Implements Create() for MindIE-specific container setup
- Supports large shared memory for multi-device inference
Thread Safety:
All public methods are thread-safe via inherited mutex protection.
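Mutex protection inherited through embedding can be pictured with the following sketch. The base type, its fields, and the Register and Count methods are hypothetical stand-ins; the real DockerRuntimeBase is not shown in this documentation and may be organized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// base is a hypothetical stand-in for DockerRuntimeBase.
type base struct {
	mu        sync.RWMutex
	instances map[string]string // instance ID -> container ID
}

// Register stores a new instance and rejects duplicate IDs.
func (b *base) Register(id, containerID string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.instances == nil {
		b.instances = make(map[string]string)
	}
	if _, exists := b.instances[id]; exists {
		return fmt.Errorf("instance %q already exists", id)
	}
	b.instances[id] = containerID
	return nil
}

// Count returns the number of registered instances.
func (b *base) Count() int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return len(b.instances)
}

// Runtime embeds the base, so the locked methods are promoted to it.
type Runtime struct {
	*base
}

func main() {
	r := Runtime{&base{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			_ = r.Register(fmt.Sprintf("inst-%d", n), "container")
		}(i)
	}
	wg.Wait()
	fmt.Println(r.Count())
}
```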
func NewRuntime ¶
NewRuntime creates a new MindIE Docker runtime instance.
This function:
- Initializes Docker base with "mindie-docker" runtime name
- Registers core sandbox implementations
- Verifies Docker daemon connectivity
- Loads any existing containers from previous runs
Returns:
- Configured runtime instance ready for use
- Error if Docker is unavailable or initialization fails
func (*Runtime) Create ¶
func (r *Runtime) Create(ctx context.Context, params *runtime.CreateParams) (*runtime.Instance, error)
Create creates a new model instance but does not start it.
This method implements MindIE-specific container creation:
- Validates parameters and checks for duplicate instance IDs
- Selects appropriate device sandbox based on device type
- Prepares device-specific configuration (env, mounts, devices)
- Configures MindIE environment (WORLD_SIZE, device IDs, model lengths)
- Sets large shared memory for distributed inference
- Creates Docker container with all required settings
- Registers instance in runtime's instance map
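The sandbox selection step can be sketched as a first-match search over the registered sandboxes. The Sandbox interface here is reduced to the Supports method, and SelectSandbox and staticSandbox are hypothetical names; the actual selection logic inside Create is not shown in this documentation.

```go
package main

import "fmt"

// Sandbox is a reduced stand-in that keeps only the Supports method.
type Sandbox interface {
	Supports(deviceType string) bool
}

// staticSandbox supports exactly one device type. Hypothetical type.
type staticSandbox struct {
	deviceType string
}

func (s staticSandbox) Supports(deviceType string) bool {
	return s.deviceType == deviceType
}

// SelectSandbox returns the first sandbox that supports deviceType.
func SelectSandbox(sandboxes []Sandbox, deviceType string) (Sandbox, error) {
	for _, sb := range sandboxes {
		if sb.Supports(deviceType) {
			return sb, nil
		}
	}
	return nil, fmt.Errorf("no sandbox supports device type %q", deviceType)
}

func main() {
	sandboxes := []Sandbox{
		staticSandbox{deviceType: "ascend-310p"},
		staticSandbox{deviceType: "ascend-910b"},
	}
	if _, err := SelectSandbox(sandboxes, "ascend-910b"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sandbox found for ascend-910b")
}
```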
The created container is in "created" state and must be started separately via the Start method (inherited from DockerRuntimeBase).
MindIE-Specific Configuration:
- Port: Container internal port 1025 mapped to host port
- Shared Memory: 500GB for multi-device communication
- Environment Variables:
  - MINDIE_NPU_DEVICE_IDS: Comma-separated device indices (e.g., "0,1,2,3")
  - WORLD_SIZE: Number of devices for distributed inference
  - MODEL_PATH: Container path to model files (/mnt/model)
  - MODEL_NAME: Model name for inference requests (alias or model ID)
  - MAX_MODEL_LEN: Maximum sequence length (optional, from ExtraConfig)
  - MAX_INPUT_LEN: Maximum input length (optional, from ExtraConfig)
- Mounts: Extensive log directories for profiling and debugging
Container Configuration:
- Image: MindIE image with Ascend support, or a custom image from params.ExtraConfig["image"]
- Command: Custom command from params.ExtraConfig["command"] or default entrypoint
- Network: Bridge mode with port mapping (container:1025 -> host:params.Port)
- Restart: unless-stopped for automatic recovery
- Init: Enabled for proper signal handling
- ShmSize: 500GB for distributed inference
Labels:
Containers are labeled with metadata for discovery and filtering:
- xw.runtime: Runtime type (mindie-docker)
- xw.model_id: Model identifier
- xw.alias: Instance alias for inference
- xw.instance_id: Unique instance identifier
- xw.backend_type: Backend type (mindie)
- xw.deployment_mode: Deployment mode (docker)
- xw.device_indices: Comma-separated device indices
- xw.server_name: Server identifier for multi-server support
Parameters:
- ctx: Context for cancellation and timeout
- params: Standard creation parameters including model info and devices
Returns:
- Instance metadata with container information
- Error if creation fails at any step