mlguiderdocker

package

v0.0.1 Published: Feb 12, 2026 License: Apache-2.0 Imports: 11 Imported by: 0

Repository

github.com/tsingmaoai/xw-cli

Documentation ¶

Overview ¶

Package mlguiderdocker implements MLGuider runtime with Docker deployment.

MLGuider is a high-performance inference engine optimized for large language models on Huawei Ascend NPUs. This package provides Docker-based deployment with support for multi-device tensor parallelism, expert parallelism (for MoE models), and OpenAI-compatible API endpoints.

Configuration-Driven Architecture ¶

This runtime uses configuration-driven device support via configs/devices.yaml. All device-specific behavior is defined in YAML configuration.

Example configuration in devices.yaml:

chip_models:
  - config_key: ascend-310p
    ext_sandboxes:
      # Common configuration (shared by all engines)
      devices:
        - /dev/davinci0
        - /dev/davinci1
        - /dev/davinci_manager
        - /dev/devmm_svm
        - /dev/hisi_hdc
      volumes:
        - /usr/local/Ascend/driver:/usr/local/Ascend/driver
        - /root/.cache:/root/.cache
      runtime: runc
      # Engine-specific configurations
      mlguider:
        device_env: DEVICES
        privileged: true
        shm_size_gb: 100

Implementing Code-Based Sandboxes ¶

For complex scenarios that require runtime logic, you can implement a code-based DeviceSandbox. See the vllmdocker package documentation for a detailed guide:

import "github.com/tsingmaoai/xw-cli/internal/runtime/vllm-docker"
// See vllmdocker package doc for complete implementation examples

MLGuider-Specific Features ¶

MLGuider has unique characteristics:

Model Path Management:
  - ORIGIN_MODEL_PATH: Original model files (HuggingFace format)
  - MODEL_PATH: Converted/optimized model files (MLGuider format)
  - Supports automatic model conversion on first run
Parallelism Support:
  - TENSOR_PARALLEL: Tensor parallelism across devices
  - EXPERT_PARALLEL: Expert parallelism for MoE models
  - WORLD_SIZE: Must equal TENSOR_PARALLEL * EXPERT_PARALLEL
Device Specification: Uses a comma-separated format for the DEVICES environment variable: DEVICES='0,1,2,3'
Networking: Uses bridge networking with port mapping for multi-instance support
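
The WORLD_SIZE constraint and the DEVICES format can be illustrated with a short sketch. The helper name parallelEnv is hypothetical and not part of this package; it only assumes that WORLD_SIZE equals the number of allocated devices, as the environment variable reference for this package states.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parallelEnv is a hypothetical helper, not part of the mlguiderdocker API.
// It renders the device and parallelism variables in KEY=value form and
// rejects combinations where TENSOR_PARALLEL * EXPERT_PARALLEL does not
// match the number of devices (WORLD_SIZE).
func parallelEnv(devices []int, tensorParallel, expertParallel int) ([]string, error) {
	if len(devices) == 0 {
		return nil, fmt.Errorf("at least one device is required")
	}
	if tensorParallel < 1 || expertParallel < 1 {
		return nil, fmt.Errorf("parallel degrees must be >= 1, got tp=%d ep=%d",
			tensorParallel, expertParallel)
	}
	worldSize := len(devices)
	if tensorParallel*expertParallel != worldSize {
		return nil, fmt.Errorf("TENSOR_PARALLEL (%d) * EXPERT_PARALLEL (%d) must equal WORLD_SIZE (%d)",
			tensorParallel, expertParallel, worldSize)
	}

	// DEVICES uses the comma-separated index format, e.g. "0,1,2,3".
	indices := make([]string, len(devices))
	for i, d := range devices {
		indices[i] = strconv.Itoa(d)
	}
	return []string{
		"DEVICES=" + strings.Join(indices, ","),
		"TENSOR_PARALLEL=" + strconv.Itoa(tensorParallel),
		"EXPERT_PARALLEL=" + strconv.Itoa(expertParallel),
		"WORLD_SIZE=" + strconv.Itoa(worldSize),
	}, nil
}

func main() {
	env, err := parallelEnv([]int{0, 1, 2, 3}, 2, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```

Failing early on a mismatched product is a design choice of this sketch; the real runtime may validate differently or leave the check to MLGuider itself.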

Dual Model Directory Support ¶

MLGuider supports two model storage patterns:

Single Directory (Original Models Only):
  - Mount original model to /mnt/model
  - MLGuider converts on first run
  - Converted models stored in same directory
Separate Directories (Original + Converted):
  - Mount original model to /mnt/model (ORIGIN_MODEL_PATH)
  - Mount data directory to /data (MODEL_PATH for converted models)
  - Faster startup when using pre-converted models

Example with separate directories:

params := &runtime.CreateParams{
    ModelPath: "/models/llama-3-70b",        // Original model
    DataDir: "/data/converted/llama-3-70b", // Converted model cache
    // ...
}

The converted models in DataDir can be reused across container restarts, avoiding repeated conversion overhead.
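
As a rough illustration of the two storage patterns, the sketch below derives bind mounts and model path variables from a model path and an optional data directory. The type modelMounts and the function planModelMounts are hypothetical names, and pointing MODEL_PATH at /mnt/model in single-directory mode is an assumption inferred from "converted models stored in same directory", not confirmed behavior.

```go
package main

import "fmt"

// modelMounts is a hypothetical container for the result; it is not a type
// from this package.
type modelMounts struct {
	Binds []string // Docker bind specs in "host:container" form
	Env   []string // KEY=value pairs for the container
}

// planModelMounts sketches how the two storage patterns could map to mounts.
// An empty dataDir selects the single-directory pattern.
func planModelMounts(modelPath, dataDir string) modelMounts {
	m := modelMounts{
		Binds: []string{modelPath + ":/mnt/model"},
		Env:   []string{"ORIGIN_MODEL_PATH=/mnt/model"},
	}
	if dataDir == "" {
		// Assumption: converted files live next to the originals, so
		// MODEL_PATH points at the same mount.
		m.Env = append(m.Env, "MODEL_PATH=/mnt/model")
		return m
	}
	// Separate directories: converted models are cached under /data.
	m.Binds = append(m.Binds, dataDir+":/data")
	m.Env = append(m.Env, "MODEL_PATH=/data")
	return m
}

func main() {
	m := planModelMounts("/models/llama-3-70b", "/data/converted/llama-3-70b")
	fmt.Println(m.Binds)
	fmt.Println(m.Env)
}
```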

Configuration vs Code ¶

This package previously used a code-based AscendSandbox. Migrating to the configuration-driven approach provides:

Easier updates when Ascend driver paths change
Support for new 310P variants without code changes
User customization of log directories and mount paths
Faster deployment cycles without binary recompilation

See configs/devices.yaml for current device configurations.

This package provides Docker-based deployment with support for:

Multi-device tensor parallelism and expert parallelism
Dynamic model path management (original and converted models)
Automatic device allocation and environment configuration
OpenAI-compatible API endpoints

Architecture:

Uses bridge networking with port mapping for multi-instance support
Requires Ascend driver mounted from host system
Supports multi-NPU distributed inference via WORLD_SIZE and device arrays
Exposes port 8000 for OpenAI-compatible inference API
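
For orientation, a client request against the exposed port might be built as in the sketch below. The /v1/chat/completions path and the request body shape follow the usual OpenAI chat convention and are assumptions here; this documentation does not list the exact routes MLGuider serves. The helper newChatRequest, the host, and the model name are likewise illustrative.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest mirror the common OpenAI chat payload.
// Assumption: MLGuider accepts this shape.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest is a hypothetical helper that builds, but does not send,
// a POST request for a single user prompt.
func newChatRequest(host string, port int, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, fmt.Errorf("encode request: %w", err)
	}
	endpoint := fmt.Sprintf("http://%s:%d/v1/chat/completions", host, port)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("localhost", 8000, "llama-3-70b", "Hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Sending is left to the caller, e.g. http.DefaultClient.Do(req).
	fmt.Println(req.Method, req.URL)
}
```

With bridge networking, the host port is whatever was mapped for the instance, which may differ from the container's port 8000.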

Container Requirements:

Ascend driver mounted at /usr/local/Ascend/driver
Model directory mounted (supports both original and converted models)
Privileged mode for hardware access

Environment Variables:

ORIGIN_MODEL_PATH: Path to original model files
MODEL_PATH: Path to converted/optimized model files
MAX_MODEL_LEN: Maximum sequence length for inference
TENSOR_PARALLEL: Number of devices for tensor parallelism
EXPERT_PARALLEL: Number of devices for expert parallelism (MoE models)
WORLD_SIZE: Total number of NPU devices (should equal TENSOR_PARALLEL * EXPERT_PARALLEL)
DEVICES: Comma-separated device indices (e.g., "0,1,2,3")
API_PORT: HTTP server port (default: 8000)
MODEL_NAME: Model name for API requests

Index ¶

type Runtime
- func NewRuntime() (*Runtime, error)
- func (r *Runtime) Create(ctx context.Context, params *runtime.CreateParams) (*runtime.Instance, error)
- func (r *Runtime) Name() string

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Runtime ¶

type Runtime struct {
	*runtime.DockerRuntimeBase
}

Runtime implements the Runtime interface for MLGuider with Docker deployment.

MLGuider Runtime Architecture:

Embeds DockerRuntimeBase for common Docker operations (start, stop, logs, etc.)
Implements Create() to configure MLGuider-specific containers
Uses a device sandbox (configured via configs/devices.yaml) for device-specific parameter transformation
Supports multi-device distributed inference with automatic configuration

Container Lifecycle:

Create: Configures container with MLGuider-specific environment and mounts
Start: Launches the container (via base.Start)
Running: Container exposes OpenAI-compatible API on port 8000
Stop: Gracefully stops the container (via base.Stop)
Remove: Cleans up container and releases resources (via base.Remove)

Thread Safety: All operations are thread-safe; access is serialized through the DockerRuntimeBase mutex.

func NewRuntime ¶

func NewRuntime() (*Runtime, error)

NewRuntime creates and initializes a new MLGuider Docker runtime.

Initialization Steps:

Creates Docker runtime base with "mlguider-docker" identifier
Verifies Docker daemon connectivity
Loads any existing MLGuider containers from previous sessions
Validates Ascend driver availability (warning only, not fatal)

The runtime is immediately ready to create and manage MLGuider instances. Existing containers are automatically tracked and restored to the instance map.

Returns:

Configured Runtime instance ready for use
Error if Docker daemon is unreachable or initialization fails

Example:

rt, err := NewRuntime()
if err != nil {
    return nil, fmt.Errorf("failed to initialize MLGuider runtime: %w", err)
}

func (*Runtime) Create ¶

func (r *Runtime) Create(ctx context.Context, params *runtime.CreateParams) (*runtime.Instance, error)

Create creates a new MLGuider model instance with Docker deployment.

This method orchestrates the complete container creation process:

Validates parameters (instance ID, devices, model path)
Selects appropriate device sandbox based on hardware type
Prepares device-specific environment variables (DEVICES, WORLD_SIZE, etc.)
Configures MLGuider-specific environment (MODEL_PATH, TENSOR_PARALLEL, etc.)
Sets up volume mounts for drivers, models, and system binaries
Creates Docker container with bridge networking and privileged mode
Registers instance in tracking map

MLGuider-Specific Configuration:

Uses bridge network mode with port mapping for multi-instance support
Mounts Ascend driver from /usr/local/Ascend/driver
Configures multi-device parallelism based on device count
Supports both original and converted model paths
Sets up OpenAI-compatible API on port 8000

Parallelism Strategy:

Single device: No parallelism (TENSOR_PARALLEL=1)
Multiple devices: Tensor parallelism across all devices
MoE models: Can be configured with EXPERT_PARALLEL via ExtraConfig
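
The strategy above can be sketched as a small defaulting function. The name defaultParallelism is hypothetical, and deriving the tensor-parallel degree as device count divided by EXPERT_PARALLEL is an assumption that follows from the WORLD_SIZE rule rather than something this page states directly.

```go
package main

import "fmt"

// defaultParallelism is a hypothetical helper, not part of this package.
// expertParallel <= 0 means "not set in ExtraConfig" and falls back to 1,
// so all devices are used for tensor parallelism.
func defaultParallelism(deviceCount, expertParallel int) (tp, ep int, err error) {
	if deviceCount < 1 {
		return 0, 0, fmt.Errorf("device count must be >= 1, got %d", deviceCount)
	}
	if expertParallel <= 0 {
		expertParallel = 1
	}
	// Assumption: devices are split evenly between the two dimensions so
	// that tp * ep equals the device count.
	if deviceCount%expertParallel != 0 {
		return 0, 0, fmt.Errorf("device count %d is not divisible by EXPERT_PARALLEL %d",
			deviceCount, expertParallel)
	}
	return deviceCount / expertParallel, expertParallel, nil
}

func main() {
	for _, c := range []struct{ devices, ep int }{{1, 0}, {4, 0}, {8, 2}} {
		tp, ep, err := defaultParallelism(c.devices, c.ep)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("devices=%d TENSOR_PARALLEL=%d EXPERT_PARALLEL=%d\n", c.devices, tp, ep)
	}
}
```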

Container Labels:

xw.runtime: mlguider-docker
xw.instance_id: Unique instance identifier
xw.model_id: Model identifier from registry
xw.alias: Instance alias for API requests
xw.backend_type: mlguider
xw.deployment_mode: docker
xw.device_indices: Comma-separated device indices
xw.server_name: Server identifier for multi-server setups
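
A minimal sketch of how such a label set could be assembled is shown below. The function instanceLabels and its parameters are hypothetical; only the label keys and the fixed values (mlguider-docker, mlguider, docker) come from the list above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// instanceLabels is a hypothetical helper that builds the container label
// map described in the documentation.
func instanceLabels(instanceID, modelID, alias, serverName string, devices []int) map[string]string {
	indices := make([]string, len(devices))
	for i, d := range devices {
		indices[i] = strconv.Itoa(d)
	}
	return map[string]string{
		"xw.runtime":         "mlguider-docker",
		"xw.instance_id":     instanceID,
		"xw.model_id":        modelID,
		"xw.alias":           alias,
		"xw.backend_type":    "mlguider",
		"xw.deployment_mode": "docker",
		"xw.device_indices":  strings.Join(indices, ","),
		"xw.server_name":     serverName,
	}
}

func main() {
	labels := instanceLabels("inst-1", "llama-3-70b", "llama", "server-a", []int{0, 1})
	fmt.Println(labels["xw.runtime"], labels["xw.device_indices"])
}
```

Labels in this form can be matched with Docker's label filter, for example `docker ps --filter label=xw.runtime=mlguider-docker`.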

Parameters:

ctx: Context for cancellation and timeout control
params: Standard creation parameters including model info, devices, and config

Returns:

Instance metadata with container information and port assignment
Error if creation fails at any step

Example ExtraConfig:

{
  "image": "harbor.tsingmao.com/mlguider/release:0123-xw",
  "max_model_len": 66000,
  "tensor_parallel": 2,
  "expert_parallel": 1
}
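
One way to read such a document into typed fields is sketched below. The struct extraConfig and the function parseExtraConfig are hypothetical, and the default of 1 for expert_parallel is an assumption; the runtime may represent ExtraConfig differently, for example as a generic map.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extraConfig is a hypothetical typed view of the ExtraConfig keys shown
// in the example; it is not a type exported by this package.
type extraConfig struct {
	Image          string `json:"image"`
	MaxModelLen    int    `json:"max_model_len"`
	TensorParallel int    `json:"tensor_parallel"`
	ExpertParallel int    `json:"expert_parallel"`
}

// parseExtraConfig decodes the JSON document. Keys missing from the input
// keep the defaults set before decoding.
func parseExtraConfig(raw []byte) (extraConfig, error) {
	cfg := extraConfig{ExpertParallel: 1} // assumed default when the key is absent
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return extraConfig{}, fmt.Errorf("decode ExtraConfig: %w", err)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
	  "image": "harbor.tsingmao.com/mlguider/release:0123-xw",
	  "max_model_len": 66000,
	  "tensor_parallel": 2,
	  "expert_parallel": 1
	}`)
	cfg, err := parseExtraConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```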

func (*Runtime) Name ¶

func (r *Runtime) Name() string

Name returns the runtime name identifier.

This name is used for:

Runtime registration and discovery
Container labeling and filtering
Logging and monitoring

Returns: "mlguider-docker"
