Documentation ¶
Overview ¶
Package mlguiderdocker implements MLGuider runtime with Docker deployment.
MLGuider is a high-performance inference engine optimized for large language models on Huawei Ascend NPUs. This package provides Docker-based deployment with support for multi-device tensor parallelism, expert parallelism (for MoE models), and OpenAI-compatible API endpoints.
Configuration-Driven Architecture ¶
This runtime uses configuration-driven device support via configs/devices.yaml. All device-specific behavior is defined in YAML configuration.
Example configuration in devices.yaml:
chip_models:
  - config_key: ascend-310p
    ext_sandboxes:
      # Common configuration (shared by all engines)
      devices:
        - /dev/davinci0
        - /dev/davinci1
        - /dev/davinci_manager
        - /dev/devmm_svm
        - /dev/hisi_hdc
      volumes:
        - /usr/local/Ascend/driver:/usr/local/Ascend/driver
        - /root/.cache:/root/.cache
      runtime: runc
      # Engine-specific configurations
      mlguider:
        device_env: DEVICES
        privileged: true
        shm_size_gb: 100
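To make the configuration keys concrete, the following is a minimal sketch of how such an entry could be translated into container options, expressed here as `docker run` flags. The types SandboxConfig and EngineConfig and the function DockerRunArgs are hypothetical names invented for this illustration; the actual runtime may use the Docker API rather than CLI flags, and its real types are not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// SandboxConfig mirrors the common keys under ext_sandboxes.
// Hypothetical type: the real package's structs are not documented here.
type SandboxConfig struct {
	Devices []string
	Volumes []string
	Runtime string
}

// EngineConfig mirrors the engine-specific keys under "mlguider".
// Hypothetical type, for illustration only.
type EngineConfig struct {
	DeviceEnv  string
	Privileged bool
	ShmSizeGB  int
}

// DockerRunArgs converts the configuration into `docker run` flags.
// deviceList is the comma-separated device index string, e.g. "0,1".
func DockerRunArgs(c SandboxConfig, e EngineConfig, deviceList string) []string {
	var args []string
	for _, d := range c.Devices {
		args = append(args, "--device", d)
	}
	for _, v := range c.Volumes {
		args = append(args, "-v", v)
	}
	if c.Runtime != "" {
		args = append(args, "--runtime", c.Runtime)
	}
	if e.Privileged {
		args = append(args, "--privileged")
	}
	if e.ShmSizeGB > 0 {
		args = append(args, "--shm-size", fmt.Sprintf("%dg", e.ShmSizeGB))
	}
	if e.DeviceEnv != "" && deviceList != "" {
		// device_env names the variable that carries the device indices.
		args = append(args, "-e", e.DeviceEnv+"="+deviceList)
	}
	return args
}

func main() {
	common := SandboxConfig{
		Devices: []string{"/dev/davinci0", "/dev/davinci1", "/dev/davinci_manager"},
		Volumes: []string{"/usr/local/Ascend/driver:/usr/local/Ascend/driver"},
		Runtime: "runc",
	}
	engine := EngineConfig{DeviceEnv: "DEVICES", Privileged: true, ShmSizeGB: 100}
	fmt.Println(strings.Join(DockerRunArgs(common, engine, "0,1"), " "))
}
```

Keeping the translation a pure function of the configuration is one way to get the benefit described above: a new device variant only needs a new YAML entry.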
Implementing Code-Based Sandboxes ¶
For complex scenarios requiring runtime logic, you can implement a code-based DeviceSandbox. See the vllmdocker package documentation for a detailed guide:
import "github.com/tsingmaoai/xw-cli/internal/runtime/vllm-docker"
// See vllmdocker package doc for complete implementation examples
MLGuider-Specific Features ¶
MLGuider has unique characteristics:
Model Path Management:
- ORIGIN_MODEL_PATH: Original model files (HuggingFace format)
- MODEL_PATH: Converted/optimized model files (MLGuider format)
- Supports automatic model conversion on first run
Parallelism Support:
- TENSOR_PARALLEL: Tensor parallelism across devices
- EXPERT_PARALLEL: Expert parallelism for MoE models
- WORLD_SIZE: Must equal TENSOR_PARALLEL * EXPERT_PARALLEL
Device Specification: Uses a comma-separated format for the DEVICES environment variable: DEVICES='0,1,2,3'
Networking: Uses bridge networking with port mapping for multi-instance support
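The WORLD_SIZE constraint and the DEVICES format can be checked before a container is created. The sketch below is illustrative only: ParallelEnv is a hypothetical helper, not part of this package, and it assumes that WORLD_SIZE must also match the number of allocated devices (WORLD_SIZE is described elsewhere in this documentation as the total number of NPU devices).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParallelEnv builds the parallelism-related environment variables and
// rejects combinations where TENSOR_PARALLEL * EXPERT_PARALLEL does not
// match the number of devices. Hypothetical helper for illustration.
func ParallelEnv(devices []int, tensorParallel, expertParallel int) (map[string]string, error) {
	if tensorParallel < 1 || expertParallel < 1 {
		return nil, fmt.Errorf("parallel degrees must be >= 1 (tensor=%d, expert=%d)",
			tensorParallel, expertParallel)
	}
	worldSize := tensorParallel * expertParallel
	if worldSize != len(devices) {
		return nil, fmt.Errorf("WORLD_SIZE %d does not match %d allocated devices",
			worldSize, len(devices))
	}

	// DEVICES uses a comma-separated list of device indices.
	ids := make([]string, len(devices))
	for i, d := range devices {
		ids[i] = strconv.Itoa(d)
	}

	return map[string]string{
		"DEVICES":         strings.Join(ids, ","),
		"TENSOR_PARALLEL": strconv.Itoa(tensorParallel),
		"EXPERT_PARALLEL": strconv.Itoa(expertParallel),
		"WORLD_SIZE":      strconv.Itoa(worldSize),
	}, nil
}

func main() {
	env, err := ParallelEnv([]int{0, 1, 2, 3}, 2, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(env["DEVICES"], env["WORLD_SIZE"])
}
```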
Dual Model Directory Support ¶
MLGuider supports two model storage patterns:
Single Directory (Original Models Only):
- Mount original model to /mnt/model
- MLGuider converts on first run
- Converted models stored in same directory
Separate Directories (Original + Converted):
- Mount original model to /mnt/model (ORIGIN_MODEL_PATH)
- Mount data directory to /data (MODEL_PATH for converted models)
- Faster startup when using pre-converted models
Example with separate directories:
params := &runtime.CreateParams{
	ModelPath: "/models/llama-3-70b",         // Original model
	DataDir:   "/data/converted/llama-3-70b", // Converted model cache
	// ...
}
The converted models in DataDir can be reused across container restarts, avoiding repeated conversion overhead.
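The choice between the two storage patterns can be sketched as a small planning function. ModelLayout and PlanModelLayout are hypothetical names used only for this illustration. The sketch also assumes that, in the single-directory pattern, MODEL_PATH points at the same /mnt/model mount as ORIGIN_MODEL_PATH; the documentation states only that converted models are stored in the same directory.

```go
package main

import "fmt"

const (
	containerModelDir = "/mnt/model" // mount point for the original model
	containerDataDir  = "/data"      // mount point for converted models
)

// ModelLayout describes the bind mounts and environment variables for one
// container. Hypothetical type, for illustration only.
type ModelLayout struct {
	Volumes []string          // "host:container" bind mounts
	Env     map[string]string // model path environment variables
}

// PlanModelLayout selects the single-directory pattern when dataDir is
// empty and the separate-directory pattern otherwise.
func PlanModelLayout(modelPath, dataDir string) ModelLayout {
	layout := ModelLayout{
		Volumes: []string{modelPath + ":" + containerModelDir},
		Env:     map[string]string{"ORIGIN_MODEL_PATH": containerModelDir},
	}
	if dataDir == "" {
		// Assumption: converted files live next to the originals, so both
		// variables refer to the same mount.
		layout.Env["MODEL_PATH"] = containerModelDir
		return layout
	}
	layout.Volumes = append(layout.Volumes, dataDir+":"+containerDataDir)
	layout.Env["MODEL_PATH"] = containerDataDir
	return layout
}

func main() {
	layout := PlanModelLayout("/models/llama-3-70b", "/data/converted/llama-3-70b")
	for _, v := range layout.Volumes {
		fmt.Println("mount:", v)
	}
	fmt.Println("MODEL_PATH =", layout.Env["MODEL_PATH"])
}
```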
Configuration vs Code ¶
This package previously used a code-based AscendSandbox. Migrating to the configuration-driven approach provides:
- Easier updates when Ascend driver paths change
- Support for new 310P variants without code changes
- User customization of log directories and mount paths
- Faster deployment cycles without binary recompilation
See configs/devices.yaml for current device configurations.
This package provides Docker-based deployment with support for:
- Multi-device tensor parallelism and expert parallelism
- Dynamic model path management (original and converted models)
- Automatic device allocation and environment configuration
- OpenAI-compatible API endpoints
Architecture:
- Uses bridge networking with port mapping for multi-instance support
- Requires Ascend driver mounted from host system
- Supports multi-NPU distributed inference via WORLD_SIZE and device arrays
- Exposes port 8000 for OpenAI-compatible inference API
Container Requirements:
- Ascend driver mounted at /usr/local/Ascend/driver
- Model directory mounted (supports both original and converted models)
- Privileged mode for hardware access
Environment Variables:
- ORIGIN_MODEL_PATH: Path to original model files
- MODEL_PATH: Path to converted/optimized model files
- MAX_MODEL_LEN: Maximum sequence length for inference
- TENSOR_PARALLEL: Number of devices for tensor parallelism
- EXPERT_PARALLEL: Number of devices for expert parallelism (MoE models)
- WORLD_SIZE: Total number of NPU devices (should equal TENSOR_PARALLEL * EXPERT_PARALLEL)
- DEVICES: Comma-separated device indices (e.g., "0,1,2,3")
- API_PORT: HTTP server port (default: 8000)
- MODEL_NAME: Model name for API requests
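Once a container is running, a client addresses it through API_PORT and names the model with the MODEL_NAME value. The sketch below builds such a request without sending it. It assumes the conventional OpenAI chat completions path /v1/chat/completions, which this documentation does not confirm for MLGuider; NewChatRequest and the request types are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest follow the OpenAI chat completions request
// shape. Only the fields needed for this sketch are included.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// NewChatRequest prepares a POST request against an OpenAI-compatible
// endpoint. The /v1/chat/completions path is an assumption.
func NewChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Port 8000 is the documented default for API_PORT.
	req, err := NewChatRequest("http://localhost:8000", "llama-3-70b", "Hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to an http.Client would send it; that step is omitted so the example runs without a live container.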
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Runtime ¶
type Runtime struct {
*runtime.DockerRuntimeBase
}
Runtime implements the Runtime interface for MLGuider with Docker deployment.
MLGuider Runtime Architecture:
- Embeds DockerRuntimeBase for common Docker operations (start, stop, logs, etc.)
- Implements Create() to configure MLGuider-specific containers
- Uses AscendSandbox for device-specific parameter transformation
- Supports multi-device distributed inference with automatic configuration
Container Lifecycle:
- Create: Configures container with MLGuider-specific environment and mounts
- Start: Launches the container (via base.Start)
- Running: Container exposes OpenAI-compatible API on port 8000
- Stop: Gracefully stops the container (via base.Stop)
- Remove: Cleans up container and releases resources (via base.Remove)
Thread Safety: All operations are thread-safe through the DockerRuntimeBase mutex.
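The embedding pattern described above can be illustrated with a simplified stand-in. The types dockerBase and sketchRuntime below are hypothetical and do not reproduce the real DockerRuntimeBase; they only show how methods guarded by a mutex on an embedded base are promoted to the outer type and remain safe under concurrent use.

```go
package main

import (
	"fmt"
	"sync"
)

// dockerBase is a hypothetical stand-in for runtime.DockerRuntimeBase.
type dockerBase struct {
	mu        sync.RWMutex
	instances map[string]string // instance ID -> state
}

func newDockerBase() *dockerBase {
	return &dockerBase{instances: make(map[string]string)}
}

// Start records the instance as running while holding the write lock.
func (b *dockerBase) Start(id string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.instances[id] = "running"
}

// Count returns the number of tracked instances under the read lock.
func (b *dockerBase) Count() int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return len(b.instances)
}

// sketchRuntime embeds the base, so Start and Count are promoted to it.
type sketchRuntime struct {
	*dockerBase
}

// startConcurrently starts n instances from n goroutines and reports how
// many were tracked afterwards.
func startConcurrently(n int) int {
	rt := sketchRuntime{newDockerBase()}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			rt.Start(fmt.Sprintf("inst-%d", i))
		}(i)
	}
	wg.Wait()
	return rt.Count()
}

func main() {
	fmt.Println("tracked instances:", startConcurrently(8))
}
```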
func NewRuntime ¶
NewRuntime creates and initializes a new MLGuider Docker runtime.
Initialization Steps:
- Creates Docker runtime base with "mlguider-docker" identifier
- Verifies Docker daemon connectivity
- Loads any existing MLGuider containers from previous sessions
- Validates Ascend driver availability (warning only, not fatal)
The runtime is immediately ready to create and manage MLGuider instances. Existing containers are automatically tracked and restored to the instance map.
Returns:
- Configured Runtime instance ready for use
- Error if Docker daemon is unreachable or initialization fails
Example:
rt, err := NewRuntime()
if err != nil {
	return nil, fmt.Errorf("failed to initialize MLGuider runtime: %w", err)
}
func (*Runtime) Create ¶
func (r *Runtime) Create(ctx context.Context, params *runtime.CreateParams) (*runtime.Instance, error)
Create creates a new MLGuider model instance with Docker deployment.
This method orchestrates the complete container creation process:
- Validates parameters (instance ID, devices, model path)
- Selects appropriate device sandbox based on hardware type
- Prepares device-specific environment variables (DEVICES, WORLD_SIZE, etc.)
- Configures MLGuider-specific environment (MODEL_PATH, TENSOR_PARALLEL, etc.)
- Sets up volume mounts for drivers, models, and system binaries
- Creates Docker container with bridge networking and privileged mode
- Registers instance in tracking map
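The first step in the list above, parameter validation, might look roughly like the following. This is a sketch under assumptions: createParams is a hypothetical, reduced stand-in for runtime.CreateParams, and the specific checks (non-empty ID and model path, at least one device, no negative or duplicate indices) are illustrative rather than taken from the package source.

```go
package main

import (
	"errors"
	"fmt"
)

// createParams is a hypothetical, reduced stand-in for runtime.CreateParams.
type createParams struct {
	InstanceID string
	ModelPath  string
	Devices    []int
}

// validateCreateParams checks the instance ID, model path, and devices
// before any container work begins.
func validateCreateParams(p *createParams) error {
	if p == nil {
		return errors.New("params must not be nil")
	}
	if p.InstanceID == "" {
		return errors.New("instance ID is required")
	}
	if p.ModelPath == "" {
		return errors.New("model path is required")
	}
	if len(p.Devices) == 0 {
		return errors.New("at least one device is required")
	}
	seen := make(map[int]bool, len(p.Devices))
	for _, d := range p.Devices {
		if d < 0 {
			return fmt.Errorf("invalid device index %d", d)
		}
		if seen[d] {
			return fmt.Errorf("duplicate device index %d", d)
		}
		seen[d] = true
	}
	return nil
}

func main() {
	err := validateCreateParams(&createParams{
		InstanceID: "llama-1",
		ModelPath:  "/models/llama-3-70b",
		Devices:    []int{0, 1},
	})
	fmt.Println("validation error:", err)
}
```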
MLGuider-Specific Configuration:
- Uses bridge network mode with port mapping for multi-instance support
- Mounts Ascend driver from /usr/local/Ascend/driver
- Configures multi-device parallelism based on device count
- Supports both original and converted model paths
- Sets up OpenAI-compatible API on port 8000
Parallelism Strategy:
- Single device: No parallelism (TENSOR_PARALLEL=1)
- Multiple devices: Tensor parallelism across all devices
- MoE models: Can be configured with EXPERT_PARALLEL via ExtraConfig
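The strategy above can be written as a function of the device count and an optional expert-parallel setting. PlanParallelism is a hypothetical helper. It assumes that when EXPERT_PARALLEL is set, the remaining factor becomes TENSOR_PARALLEL so that their product equals the device count; the documentation does not spell out how the two degrees are split.

```go
package main

import "fmt"

// PlanParallelism derives (tensor, expert) parallel degrees from the number
// of devices. expertParallel <= 0 means "not configured" and is treated as 1.
// Assumption: tensor parallelism takes the remaining factor of deviceCount.
func PlanParallelism(deviceCount, expertParallel int) (tensor, expert int, err error) {
	if deviceCount < 1 {
		return 0, 0, fmt.Errorf("device count must be >= 1, got %d", deviceCount)
	}
	expert = expertParallel
	if expert <= 0 {
		expert = 1
	}
	if deviceCount%expert != 0 {
		return 0, 0, fmt.Errorf("device count %d is not divisible by expert parallel %d",
			deviceCount, expert)
	}
	// A single device yields tensor = 1; multiple devices with no expert
	// parallelism use tensor parallelism across all of them.
	return deviceCount / expert, expert, nil
}

func main() {
	for _, c := range []struct{ devices, ep int }{{1, 0}, {4, 0}, {4, 2}} {
		tp, ep, err := PlanParallelism(c.devices, c.ep)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("devices=%d TENSOR_PARALLEL=%d EXPERT_PARALLEL=%d\n", c.devices, tp, ep)
	}
}
```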
Container Labels:
- xw.runtime: mlguider-docker
- xw.instance_id: Unique instance identifier
- xw.model_id: Model identifier from registry
- xw.alias: Instance alias for API requests
- xw.backend_type: mlguider
- xw.deployment_mode: docker
- xw.device_indices: Comma-separated device indices
- xw.server_name: Server identifier for multi-server setups
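For illustration, the label set above could be assembled as follows. BuildLabels is a hypothetical helper, not the package's actual function; the label keys and the fixed values for xw.runtime, xw.backend_type, and xw.deployment_mode are taken from the list above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BuildLabels returns the container labels for one MLGuider instance.
// Hypothetical helper; parameter names are chosen for this sketch.
func BuildLabels(instanceID, modelID, alias, serverName string, devices []int) map[string]string {
	ids := make([]string, len(devices))
	for i, d := range devices {
		ids[i] = strconv.Itoa(d)
	}
	return map[string]string{
		"xw.runtime":         "mlguider-docker",
		"xw.instance_id":     instanceID,
		"xw.model_id":        modelID,
		"xw.alias":           alias,
		"xw.backend_type":    "mlguider",
		"xw.deployment_mode": "docker",
		"xw.device_indices":  strings.Join(ids, ","),
		"xw.server_name":     serverName,
	}
}

func main() {
	labels := BuildLabels("inst-1", "llama-3-70b", "llama", "server-a", []int{0, 1})
	fmt.Println(labels["xw.runtime"], labels["xw.device_indices"])
}
```

Labels like these are what allow a restarted process to find its containers again, for example by filtering on xw.runtime.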
Parameters:
- ctx: Context for cancellation and timeout control
- params: Standard creation parameters including model info, devices, and config
Returns:
- Instance metadata with container information and port assignment
- Error if creation fails at any step
Example ExtraConfig:
{
	"image": "harbor.tsingmao.com/mlguider/release:0123-xw",
	"max_model_len": 66000,
	"tensor_parallel": 2,
	"expert_parallel": 1
}
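An ExtraConfig document like the one above can be decoded into a typed struct with encoding/json. The ExtraConfig type and ParseExtraConfig function below are hypothetical; the JSON keys come from the example, while the Go field names and the handling of unknown keys are assumptions of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExtraConfig holds the keys shown in the example. Hypothetical type: the
// package may well store ExtraConfig as a generic map instead.
type ExtraConfig struct {
	Image          string `json:"image"`
	MaxModelLen    int    `json:"max_model_len"`
	TensorParallel int    `json:"tensor_parallel"`
	ExpertParallel int    `json:"expert_parallel"`
}

// ParseExtraConfig decodes raw JSON into an ExtraConfig. Keys that are
// absent keep their zero values; unknown keys are ignored.
func ParseExtraConfig(raw []byte) (ExtraConfig, error) {
	var cfg ExtraConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return ExtraConfig{}, fmt.Errorf("invalid extra config: %w", err)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
		"image": "harbor.tsingmao.com/mlguider/release:0123-xw",
		"max_model_len": 66000,
		"tensor_parallel": 2,
		"expert_parallel": 1
	}`)
	cfg, err := ParseExtraConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```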