mlguiderdocker

package
v0.0.1
Warning: This package is not in the latest version of its module.
Published: Feb 12, 2026 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Overview

Package mlguiderdocker implements MLGuider runtime with Docker deployment.


MLGuider is a high-performance inference engine optimized for large language models on Huawei Ascend NPUs. This package provides Docker-based deployment with support for multi-device tensor parallelism, expert parallelism (for MoE models), and OpenAI-compatible API endpoints.

Configuration-Driven Architecture

This runtime uses configuration-driven device support via configs/devices.yaml. All device-specific behavior is defined in YAML configuration.

Example configuration in devices.yaml:

chip_models:
  - config_key: ascend-310p
    ext_sandboxes:
      # Common configuration (shared by all engines)
      devices:
        - /dev/davinci0
        - /dev/davinci1
        - /dev/davinci_manager
        - /dev/devmm_svm
        - /dev/hisi_hdc
      volumes:
        - /usr/local/Ascend/driver:/usr/local/Ascend/driver
        - /root/.cache:/root/.cache
      runtime: runc
      # Engine-specific configurations
      mlguider:
        device_env: DEVICES
        privileged: true
        shm_size_gb: 100
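Because the YAML above is declarative, the runtime mostly copies its values into the container definition. The following is a minimal sketch of that mapping under several assumptions. `extSandbox` and `containerOptions` are hypothetical types and not the package's real API. The common section and the `mlguider` section are assumed to be merged into one struct already. `shm_size_gb` is assumed to mean GiB when it is converted to the byte count Docker expects for shared memory.

```go
package main

import (
	"fmt"
	"strings"
)

// extSandbox is a hypothetical Go mirror of one merged ext_sandboxes entry
// (common section plus the mlguider section) from configs/devices.yaml.
type extSandbox struct {
	Devices    []string
	Volumes    []string
	Runtime    string
	Privileged bool
	ShmSizeGB  int64
}

// containerOptions is a hypothetical, Docker-client-agnostic summary of the
// settings the sandbox contributes to a container.
type containerOptions struct {
	Devices    []string
	Binds      []string
	Runtime    string
	Privileged bool
	ShmSize    int64 // shared memory size in bytes
}

// toContainerOptions validates volume specs and converts shm_size_gb to bytes,
// assuming GB here means GiB (1 << 30 bytes).
func toContainerOptions(s extSandbox) (containerOptions, error) {
	for _, v := range s.Volumes {
		// Volumes are expected in "host_path:container_path" form.
		if !strings.Contains(v, ":") {
			return containerOptions{}, fmt.Errorf("invalid volume %q: want host:container", v)
		}
	}
	return containerOptions{
		Devices:    s.Devices,
		Binds:      s.Volumes,
		Runtime:    s.Runtime,
		Privileged: s.Privileged,
		ShmSize:    s.ShmSizeGB << 30,
	}, nil
}

func main() {
	// Values taken from the ascend-310p example above.
	opts, err := toContainerOptions(extSandbox{
		Devices:    []string{"/dev/davinci0", "/dev/davinci1", "/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"},
		Volumes:    []string{"/usr/local/Ascend/driver:/usr/local/Ascend/driver", "/root/.cache:/root/.cache"},
		Runtime:    "runc",
		Privileged: true,
		ShmSizeGB:  100,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts.ShmSize) // 107374182400
}
```

The configuration-driven design depends on keeping this conversion thin. A new 310P variant or a moved driver path then changes only the YAML, not the Go code.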

Implementing Code-Based Sandboxes

For complex scenarios requiring runtime logic, you can implement a code-based DeviceSandbox. See the vllmdocker package documentation for a detailed guide:

import "github.com/tsingmaoai/xw-cli/internal/runtime/vllm-docker"
// See vllmdocker package doc for complete implementation examples

MLGuider-Specific Features

MLGuider has the following engine-specific characteristics:

  1. Model Path Management:
       - ORIGIN_MODEL_PATH: Original model files (HuggingFace format)
       - MODEL_PATH: Converted/optimized model files (MLGuider format)
       - Supports automatic model conversion on first run

  2. Parallelism Support:
       - TENSOR_PARALLEL: Tensor parallelism across devices
       - EXPERT_PARALLEL: Expert parallelism for MoE models
       - WORLD_SIZE: Must equal TENSOR_PARALLEL * EXPERT_PARALLEL

  3. Device Specification: The DEVICES environment variable takes a comma-separated list of device indices, e.g. DEVICES='0,1,2,3'

  4. Networking: Uses bridge networking with port mapping for multi-instance support
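The WORLD_SIZE rule and the DEVICES format can both be checked before a container is created. The following is a minimal sketch with hypothetical helper names (`devicesEnv`, `parallelEnv`). It assumes that WORLD_SIZE must also equal the number of allocated devices, as the environment variable list in this documentation describes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// devicesEnv formats NPU indices in the comma-separated form used by DEVICES,
// e.g. []int{0, 1, 2, 3} -> "0,1,2,3".
func devicesEnv(indices []int) string {
	parts := make([]string, len(indices))
	for i, idx := range indices {
		parts[i] = strconv.Itoa(idx)
	}
	return strings.Join(parts, ",")
}

// parallelEnv (hypothetical) checks WORLD_SIZE = TENSOR_PARALLEL * EXPERT_PARALLEL
// against the allocated device count and returns the related variables.
func parallelEnv(indices []int, tp, ep int) (map[string]string, error) {
	if tp < 1 || ep < 1 {
		return nil, fmt.Errorf("parallel sizes must be positive: tp=%d ep=%d", tp, ep)
	}
	world := tp * ep
	if world != len(indices) {
		return nil, fmt.Errorf("WORLD_SIZE %d (= %d * %d) does not match %d devices", world, tp, ep, len(indices))
	}
	return map[string]string{
		"DEVICES":         devicesEnv(indices),
		"TENSOR_PARALLEL": strconv.Itoa(tp),
		"EXPERT_PARALLEL": strconv.Itoa(ep),
		"WORLD_SIZE":      strconv.Itoa(world),
	}, nil
}

func main() {
	env, err := parallelEnv([]int{0, 1, 2, 3}, 2, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(env["DEVICES"], env["WORLD_SIZE"]) // 0,1,2,3 4
}
```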

Dual Model Directory Support

MLGuider supports two model storage patterns:

  1. Single Directory (Original Models Only):
       - Mount original model to /mnt/model
       - MLGuider converts on first run
       - Converted models stored in same directory

  2. Separate Directories (Original + Converted):
       - Mount original model to /mnt/model (ORIGIN_MODEL_PATH)
       - Mount data directory to /data (MODEL_PATH for converted models)
       - Faster startup when using pre-converted models

Example with separate directories:

params := &runtime.CreateParams{
	ModelPath: "/models/llama-3-70b",         // Original model
	DataDir:   "/data/converted/llama-3-70b", // Converted model cache
	// ...
}

The converted models in DataDir can be reused across container restarts, avoiding repeated conversion overhead.
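The choice between the two storage patterns can be written as a small path-resolution step. In the following sketch, `resolveModelLayout` and `modelLayout` are hypothetical. Setting MODEL_PATH to /mnt/model in the single-directory case is an assumption, based on the statement that converted models are stored in the same directory.

```go
package main

import "fmt"

// Container-side mount points described above.
const (
	originMountPoint = "/mnt/model"
	dataMountPoint   = "/data"
)

// modelLayout holds the bind mounts and path variables for one instance.
type modelLayout struct {
	Binds []string          // "host:container" bind mounts
	Env   map[string]string // ORIGIN_MODEL_PATH and MODEL_PATH
}

// resolveModelLayout (hypothetical): an empty dataDir selects the
// single-directory pattern; otherwise dataDir is mounted at /data for
// converted models so they survive container restarts.
func resolveModelLayout(modelPath, dataDir string) modelLayout {
	l := modelLayout{
		Binds: []string{modelPath + ":" + originMountPoint},
		Env:   map[string]string{"ORIGIN_MODEL_PATH": originMountPoint},
	}
	if dataDir == "" {
		// Assumption: converted output lands next to the original model.
		l.Env["MODEL_PATH"] = originMountPoint
		return l
	}
	l.Binds = append(l.Binds, dataDir+":"+dataMountPoint)
	l.Env["MODEL_PATH"] = dataMountPoint
	return l
}

func main() {
	l := resolveModelLayout("/models/llama-3-70b", "/data/converted/llama-3-70b")
	fmt.Println(l.Binds)
	fmt.Println(l.Env["ORIGIN_MODEL_PATH"], l.Env["MODEL_PATH"])
}
```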

Configuration vs Code

This package previously used a code-based AscendSandbox. Migrating to the configuration-driven approach provides:

  • Easier updates when Ascend driver paths change
  • Support for new 310P variants without code changes
  • User customization of log directories and mount paths
  • Faster deployment cycles without binary recompilation

See configs/devices.yaml for current device configurations.


The Docker-based deployment supports:

  • Multi-device tensor parallelism and expert parallelism
  • Dynamic model path management (original and converted models)
  • Automatic device allocation and environment configuration
  • OpenAI-compatible API endpoints

Architecture:

  • Uses bridge networking with port mapping for multi-instance support
  • Requires Ascend driver mounted from host system
  • Supports multi-NPU distributed inference via WORLD_SIZE and device arrays
  • Exposes port 8000 for OpenAI-compatible inference API

Container Requirements:

  • Ascend driver mounted at /usr/local/Ascend/driver
  • Model directory mounted (supports both original and converted models)
  • Privileged mode for hardware access

Environment Variables:

  • ORIGIN_MODEL_PATH: Path to original model files
  • MODEL_PATH: Path to converted/optimized model files
  • MAX_MODEL_LEN: Maximum sequence length for inference
  • TENSOR_PARALLEL: Number of devices for tensor parallelism
  • EXPERT_PARALLEL: Number of devices for expert parallelism (MoE models)
  • WORLD_SIZE: Total number of NPU devices (must equal TENSOR_PARALLEL * EXPERT_PARALLEL)
  • DEVICES: Comma-separated device indices (e.g., "0,1,2,3")
  • API_PORT: HTTP server port (default: 8000)
  • MODEL_NAME: Model name for API requests
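The variables above can be combined into a container environment as `KEY=VALUE` strings, which is the form Docker accepts for container environment variables. The following is a sketch only, and `envSpec` and `buildEnv` are hypothetical. WORLD_SIZE is computed as TENSOR_PARALLEL * EXPERT_PARALLEL, and API_PORT falls back to the documented default of 8000. Omitting MAX_MODEL_LEN when it is unset is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// envSpec holds inputs for the variables listed above (hypothetical type).
type envSpec struct {
	OriginModelPath string
	ModelPath       string
	ModelName       string
	MaxModelLen     int // 0 means "not set"
	TensorParallel  int
	ExpertParallel  int
	Devices         string // already comma-separated, e.g. "0,1"
	APIPort         int    // 0 means "use the default"
}

const defaultAPIPort = 8000

// buildEnv renders the spec as sorted "KEY=VALUE" strings.
func buildEnv(s envSpec) []string {
	port := s.APIPort
	if port == 0 {
		port = defaultAPIPort
	}
	vars := map[string]string{
		"ORIGIN_MODEL_PATH": s.OriginModelPath,
		"MODEL_PATH":        s.ModelPath,
		"MODEL_NAME":        s.ModelName,
		"TENSOR_PARALLEL":   strconv.Itoa(s.TensorParallel),
		"EXPERT_PARALLEL":   strconv.Itoa(s.ExpertParallel),
		"WORLD_SIZE":        strconv.Itoa(s.TensorParallel * s.ExpertParallel),
		"DEVICES":           s.Devices,
		"API_PORT":          strconv.Itoa(port),
	}
	if s.MaxModelLen > 0 {
		vars["MAX_MODEL_LEN"] = strconv.Itoa(s.MaxModelLen)
	}
	env := make([]string, 0, len(vars))
	for k, v := range vars {
		env = append(env, k+"="+v)
	}
	sort.Strings(env) // deterministic order; map iteration order is random
	return env
}

func main() {
	for _, kv := range buildEnv(envSpec{
		OriginModelPath: "/mnt/model",
		ModelPath:       "/data",
		ModelName:       "llama-3-70b",
		MaxModelLen:     66000,
		TensorParallel:  2,
		ExpertParallel:  1,
		Devices:         "0,1",
	}) {
		fmt.Println(kv)
	}
}
```

Docker does not require sorted output. Sorting only makes the environment deterministic, which helps when diffing container definitions or writing tests.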


Types

type Runtime

type Runtime struct {
	*runtime.DockerRuntimeBase
}

Runtime implements the Runtime interface for MLGuider with Docker deployment.

MLGuider Runtime Architecture:

  • Embeds DockerRuntimeBase for common Docker operations (start, stop, logs, etc.)
  • Implements Create() to configure MLGuider-specific containers
  • Uses a device sandbox, selected by hardware type, for device-specific parameter transformation
  • Supports multi-device distributed inference with automatic configuration

Container Lifecycle:

  1. Create: Configures container with MLGuider-specific environment and mounts
  2. Start: Launches the container (via base.Start)
  3. Running: Container exposes OpenAI-compatible API on port 8000
  4. Stop: Gracefully stops the container (via base.Stop)
  5. Remove: Cleans up container and releases resources (via base.Remove)

Thread Safety: All operations are thread-safe via the mutex in DockerRuntimeBase.
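The lifecycle and the thread-safety guarantee can be shown together with a mutex-guarded instance map. The following sketch does not call Docker. `tracker`, its methods, and the transition table are hypothetical stand-ins for what DockerRuntimeBase does. Two transitions in the table are assumptions: restarting a stopped container, and removing a container that was never started.

```go
package main

import (
	"fmt"
	"sync"
)

// state models the lifecycle stages listed above.
type state int

const (
	created state = iota
	running
	stopped
	removed
)

// allowed lists legal transitions (partly assumed, see lead-in).
var allowed = map[state][]state{
	created: {running, removed},
	running: {stopped},
	stopped: {running, removed},
}

// tracker is a hypothetical mutex-protected instance map.
type tracker struct {
	mu        sync.Mutex
	instances map[string]state
}

func newTracker() *tracker { return &tracker{instances: make(map[string]state)} }

// create registers a new instance, rejecting duplicate IDs.
func (t *tracker) create(id string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.instances[id]; ok {
		return fmt.Errorf("instance %q already exists", id)
	}
	t.instances[id] = created
	return nil
}

// transition moves an instance to next under the lock, rejecting illegal moves.
func (t *tracker) transition(id string, next state) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	cur, ok := t.instances[id]
	if !ok {
		return fmt.Errorf("unknown instance %q", id)
	}
	for _, s := range allowed[cur] {
		if s == next {
			if next == removed {
				delete(t.instances, id) // Remove releases the tracking entry.
			} else {
				t.instances[id] = next
			}
			return nil
		}
	}
	return fmt.Errorf("instance %q: illegal transition %d -> %d", id, cur, next)
}

func main() {
	t := newTracker()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			id := fmt.Sprintf("inst-%d", i)
			_ = t.create(id)
			_ = t.transition(id, running)
		}(i)
	}
	wg.Wait()
	fmt.Println(len(t.instances)) // 8
}
```

The lookup and the update happen under one lock, so each transition is atomic. For example, two goroutines cannot both see `running` and both stop the same instance.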

func NewRuntime

func NewRuntime() (*Runtime, error)

NewRuntime creates and initializes a new MLGuider Docker runtime.

Initialization Steps:

  1. Creates the Docker runtime base with the "mlguider-docker" identifier
  2. Verifies Docker daemon connectivity
  3. Loads any existing MLGuider containers from previous sessions
  4. Validates Ascend driver availability (warning only, not fatal)

The runtime is immediately ready to create and manage MLGuider instances. Existing containers are automatically tracked and restored to the instance map.

Returns:

  • Configured Runtime instance ready for use
  • Error if Docker daemon is unreachable or initialization fails

Example:

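// initRuntime is an illustrative caller.
func initRuntime() (*mlguiderdocker.Runtime, error) {
	rt, err := mlguiderdocker.NewRuntime()
	if err != nil {
		return nil, fmt.Errorf("failed to initialize MLGuider runtime: %w", err)
	}
	return rt, nil
}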

func (*Runtime) Create

func (r *Runtime) Create(ctx context.Context, params *runtime.CreateParams) (*runtime.Instance, error)

Create creates a new MLGuider model instance with Docker deployment.

This method orchestrates the complete container creation process:

  1. Validates parameters (instance ID, devices, model path)
  2. Selects appropriate device sandbox based on hardware type
  3. Prepares device-specific environment variables (DEVICES, WORLD_SIZE, etc.)
  4. Configures MLGuider-specific environment (MODEL_PATH, TENSOR_PARALLEL, etc.)
  5. Sets up volume mounts for drivers, models, and system binaries
  6. Creates Docker container with bridge networking and privileged mode
  7. Registers instance in tracking map

MLGuider-Specific Configuration:

  • Uses bridge network mode with port mapping for multi-instance support
  • Mounts Ascend driver from /usr/local/Ascend/driver
  • Configures multi-device parallelism based on device count
  • Supports both original and converted model paths
  • Sets up OpenAI-compatible API on port 8000

Parallelism Strategy:

  • Single device: No parallelism (TENSOR_PARALLEL=1)
  • Multiple devices: Tensor parallelism across all devices
  • MoE models: EXPERT_PARALLEL can be set through ExtraConfig (the expert_parallel key)

Container Labels:

  • xw.runtime: mlguider-docker
  • xw.instance_id: Unique instance identifier
  • xw.model_id: Model identifier from registry
  • xw.alias: Instance alias for API requests
  • xw.backend_type: mlguider
  • xw.deployment_mode: docker
  • xw.device_indices: Comma-separated device indices
  • xw.server_name: Server identifier for multi-server setups
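The label set can be built with a simple map builder. In the following sketch, `labelInput` and `containerLabels` are hypothetical. Only the keys and the fixed values (`mlguider-docker`, `mlguider`, `docker`) come from the list above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// labelInput gathers per-instance values (hypothetical type).
type labelInput struct {
	InstanceID string
	ModelID    string
	Alias      string
	Devices    []int
	ServerName string
}

// containerLabels builds the documented label set for one container.
func containerLabels(in labelInput) map[string]string {
	idx := make([]string, len(in.Devices))
	for i, d := range in.Devices {
		idx[i] = strconv.Itoa(d)
	}
	return map[string]string{
		"xw.runtime":         "mlguider-docker",
		"xw.instance_id":     in.InstanceID,
		"xw.model_id":        in.ModelID,
		"xw.alias":           in.Alias,
		"xw.backend_type":    "mlguider",
		"xw.deployment_mode": "docker",
		"xw.device_indices":  strings.Join(idx, ","),
		"xw.server_name":     in.ServerName,
	}
}

func main() {
	labels := containerLabels(labelInput{
		InstanceID: "inst-1",
		ModelID:    "llama-3-70b",
		Alias:      "llama",
		Devices:    []int{0, 1},
		ServerName: "server-a",
	})
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // print in a stable order
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```

Every container carries `xw.runtime=mlguider-docker`, so existing instances can be found again after a restart. For example, `docker ps --filter "label=xw.runtime=mlguider-docker"` lists them.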

Parameters:

  • ctx: Context for cancellation and timeout control
  • params: Standard creation parameters including model info, devices, and config

Returns:

  • Instance metadata with container information and port assignment
  • Error if creation fails at any step

Example ExtraConfig:

{
  "image": "harbor.tsingmao.com/mlguider/release:0123-xw",
  "max_model_len": 66000,
  "tensor_parallel": 2,
  "expert_parallel": 1
}
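The parallelism strategy above can be combined with ExtraConfig decoding. The following is a minimal sketch that assumes the JSON keys shown in the example. `extraConfig` and `parallelism` are hypothetical. Pointer fields distinguish an absent key from an explicit value. When only expert_parallel is given, the sketch gives the remaining device factor to tensor parallelism, which is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extraConfig mirrors the ExtraConfig keys shown above (hypothetical type).
type extraConfig struct {
	Image          string `json:"image"`
	MaxModelLen    *int   `json:"max_model_len"`
	TensorParallel *int   `json:"tensor_parallel"`
	ExpertParallel *int   `json:"expert_parallel"`
}

// parallelism applies the documented defaults: tensor parallelism spans all
// devices and EXPERT_PARALLEL is 1 unless ExtraConfig overrides them.
func parallelism(raw []byte, deviceCount int) (tp, ep int, err error) {
	var cfg extraConfig
	if err = json.Unmarshal(raw, &cfg); err != nil {
		return 0, 0, fmt.Errorf("decode ExtraConfig: %w", err)
	}
	tp, ep = deviceCount, 1
	if cfg.ExpertParallel != nil {
		ep = *cfg.ExpertParallel
	}
	switch {
	case cfg.TensorParallel != nil:
		tp = *cfg.TensorParallel
	case ep > 1:
		// Assumption: with only expert_parallel set, the rest goes to TP.
		tp = deviceCount / ep
	}
	if tp < 1 || ep < 1 || tp*ep != deviceCount {
		return 0, 0, fmt.Errorf("tensor_parallel (%d) * expert_parallel (%d) != %d devices", tp, ep, deviceCount)
	}
	return tp, ep, nil
}

func main() {
	raw := []byte(`{"image": "harbor.tsingmao.com/mlguider/release:0123-xw",
		"max_model_len": 66000, "tensor_parallel": 2, "expert_parallel": 1}`)
	tp, ep, err := parallelism(raw, 2)
	fmt.Println(tp, ep, err) // 2 1 <nil>
}
```

This sketch decodes into a typed struct instead of `map[string]any`. That avoids handling JSON numbers, which `encoding/json` decodes as `float64` when the target is an interface value.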

func (*Runtime) Name

func (r *Runtime) Name() string

Name returns the runtime name identifier.

This name is used for:

  • Runtime registration and discovery
  • Container labeling and filtering
  • Logging and monitoring

Returns: "mlguider:docker"
