Documentation ¶
Overview ¶
Package device provides domestic chip device detection and management.
The file config_loader.go provides configuration-based device loading.
This module loads device and chip information from configuration files, making them available to the device detection and management system.
This package handles detection and management of Chinese-made chip devices including:
- Hardware detection and capability querying
- Device availability checking
- Device metadata and properties
- Thread-safe access to device information
The package currently supports Huawei Ascend NPU chips (910B, 310P). In production deployments, this package integrates with vendor-specific drivers and libraries for actual hardware detection.
Index ¶
- Variables
- func FindAIChips() (map[string][]DetectedChip, error)
- type Allocator
- type ChipModel
- type ChipVendor
- type DetectedChip
- type Device
- type DeviceInfo
- type DeviceTopology
- type Manager
- func (m *Manager) GetDetectedDeviceTypes() []api.DeviceType
- func (m *Manager) GetDevice(deviceType api.DeviceType) (*Device, error)
- func (m *Manager) GetSupportedTypes() []api.DeviceType
- func (m *Manager) IsAvailable(deviceType api.DeviceType) bool
- func (m *Manager) ListAvailable() []*Device
- func (m *Manager) ListDetectedChips() ([]DetectedChip, error)
- type PCIDevice
Constants ¶
This section is empty.
Variables ¶
var KnownChips = LoadChipsFromConfig()
KnownChips holds the chip models loaded from configuration at package initialization.
var KnownVendors = LoadVendorsFromConfig()
KnownVendors holds the vendors loaded from configuration at package initialization.
Functions ¶
func FindAIChips ¶
func FindAIChips() (map[string][]DetectedChip, error)
FindAIChips scans the system for known AI chips.
This function combines PCI device scanning with the chip configuration to identify AI accelerators present in the system.
For multi-chip cards (where chips_per_device > 1 in config), each physical PCI device is expanded into multiple logical chips with consecutive indices. For example, if 4 dual-chip cards are detected, this returns 8 logical chips with indices 0-7, allowing the allocator to treat them as independent devices.
Returns:
- Map of device type to slice of detected chips (logical chips with consecutive indices)
- Error if scanning fails
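The expansion of multi-chip cards described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the `logicalChip` type and the `expandChips` helper are hypothetical names, and the sketch assumes logical indices are assigned in scan order, card by card.

```go
package main

import "fmt"

// logicalChip is a hypothetical, reduced stand-in for DetectedChip.
type logicalChip struct {
	Index               int // consecutive logical index across all cards
	PhysicalDeviceIndex int // which physical PCI device the chip sits on
	ChipIndex           int // position within the card (0-based)
}

// expandChips turns physicalCount cards, each carrying chipsPerDevice chips,
// into a flat list of logical chips with consecutive indices.
func expandChips(physicalCount, chipsPerDevice int) []logicalChip {
	if physicalCount < 0 {
		physicalCount = 0
	}
	if chipsPerDevice < 1 {
		chipsPerDevice = 1 // assumption: a missing or invalid setting means single-chip
	}
	chips := make([]logicalChip, 0, physicalCount*chipsPerDevice)
	for p := 0; p < physicalCount; p++ {
		for c := 0; c < chipsPerDevice; c++ {
			chips = append(chips, logicalChip{
				Index:               p*chipsPerDevice + c,
				PhysicalDeviceIndex: p,
				ChipIndex:           c,
			})
		}
	}
	return chips
}

func main() {
	// Four dual-chip cards yield eight logical chips with indices 0-7.
	for _, c := range expandChips(4, 2) {
		fmt.Printf("logical %d -> card %d, chip %d\n", c.Index, c.PhysicalDeviceIndex, c.ChipIndex)
	}
}
```

Keeping both the physical index and the per-card chip index alongside the logical index lets a caller map a logical chip back to its card, which is what the PhysicalDeviceIndex and ChipIndex fields of DetectedChip record.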
Types ¶
type Allocator ¶
type Allocator struct {
// contains filtered or unexported fields
}
Allocator manages the allocation and release of physical devices. It scans for available AI accelerators and dynamically tracks which devices are allocated by querying running Docker containers.
The allocator supports topology-aware allocation, which optimizes placement for devices linked by high-speed interconnects (e.g., NVLink, HCCS).
func NewAllocator ¶
NewAllocator creates and initializes a new Allocator.
The allocator scans the system for AI accelerators and dynamically tracks device allocation by querying running Docker containers.
Returns:
- Configured allocator
- Error if device scanning or Docker client creation fails
func (*Allocator) Allocate ¶
func (a *Allocator) Allocate(instanceID string, count int) ([]DeviceInfo, error)
Allocate attempts to allocate 'count' devices for a given instance.
This method selects free devices by checking current Docker container allocations. The device allocation is tracked in the container labels, not in a separate state file.
Parameters:
- instanceID: Unique identifier for the instance
- count: Number of devices to allocate
Returns:
- Slice of allocated DeviceInfo
- Error if insufficient devices are available
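The selection of free devices can be illustrated with a small sketch. It is not the allocator's code: `pickFree` is a hypothetical helper, and a plain map stands in for the allocation data that the real allocator is described as reading from Docker container labels.

```go
package main

import (
	"errors"
	"fmt"
)

// pickFree returns the first count device indices from all that do not
// appear in inUse. inUse maps an instance ID to the device indices it holds
// (an assumption standing in for the container-label lookup).
func pickFree(all []int, inUse map[string][]int, count int) ([]int, error) {
	if count <= 0 {
		return nil, errors.New("count must be positive")
	}
	taken := make(map[int]bool)
	for _, idxs := range inUse {
		for _, i := range idxs {
			taken[i] = true
		}
	}
	free := make([]int, 0, count)
	for _, i := range all {
		if taken[i] {
			continue
		}
		free = append(free, i)
		if len(free) == count {
			return free, nil
		}
	}
	return nil, fmt.Errorf("need %d devices, only %d free", count, len(free))
}

func main() {
	all := []int{0, 1, 2, 3}
	inUse := map[string][]int{"instance-a": {0, 2}}
	got, err := pickFree(all, inUse, 2)
	fmt.Println(got, err)
}
```

Because the set of taken devices is rebuilt on every call, there is no separate state to keep in sync, which matches the stated design of tracking allocations in container labels rather than in a state file.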
func (*Allocator) GetAllDevices ¶
func (a *Allocator) GetAllDevices() []DeviceInfo
GetAllDevices returns information about all detected devices.
Returns:
- Slice of all DeviceInfo (both allocated and free)
func (*Allocator) GetAllocations ¶
func (a *Allocator) GetAllocations() map[string][]DeviceInfo
GetAllocations returns a map of current device allocations from Docker containers.
Returns:
- Map from instanceID to slice of allocated devices
func (*Allocator) GetAvailableDevices ¶
func (a *Allocator) GetAvailableDevices() []DeviceInfo
GetAvailableDevices returns information about all available (free) devices.
Returns:
- Slice of DeviceInfo for devices that are currently unallocated
func (*Allocator) Release ¶
Release frees devices previously allocated to an instance.
Because devices are tracked via Docker containers, this method only logs the release. The devices are actually freed when the container is stopped or removed.
Parameters:
- instanceID: Unique identifier for the instance
Returns:
- Always returns nil (kept for API compatibility)
type ChipModel ¶
type ChipModel struct {
	// VendorID is the PCI vendor ID
	VendorID string
	// DeviceID is the PCI device ID
	DeviceID string
	// ModelName is the human-readable model name
	ModelName string
	// ConfigKey is the key used in runtime configuration (e.g., "ascend-910b")
	ConfigKey string
	// DeviceType is the corresponding xw device type
	DeviceType api.DeviceType
	// Generation is the chip generation (optional)
	Generation string
	// Capabilities lists the chip's capabilities
	Capabilities []string
}
ChipModel represents a specific chip model, identified by its PCI vendor and device IDs.
func GetChipByID ¶
GetChipByID looks up a chip model by PCI vendor and device IDs.
This function searches the configured chip models for a match with the specified PCI identifiers. It's commonly used during device detection to identify discovered hardware.
Parameters:
- vendorID: PCI vendor ID (e.g., "0x19e5")
- deviceID: PCI device ID (e.g., "0xd802")
Returns:
- Pointer to ChipModel if found
- nil if not found
Example:
chip := GetChipByID("0x19e5", "0xd802")
if chip != nil {
	fmt.Printf("Found: %s\n", chip.ModelName)
}
func GetChipsByDeviceType ¶
func GetChipsByDeviceType(deviceType api.DeviceType) []ChipModel
GetChipsByDeviceType returns all chip models for a specific device type.
This function filters chip models by device type, useful for showing all chips that support a particular device type.
Parameters:
- deviceType: The device type to filter by
Returns:
- Slice of ChipModel structs matching the device type
Example:
chips := GetChipsByDeviceType(api.DeviceType("ascend-910b"))
for _, chip := range chips {
	fmt.Printf("- %s\n", chip.ModelName)
}
func GetChipsByVendor ¶
GetChipsByVendor returns all chip models for a specific vendor.
This function filters chip models by vendor ID, returning only those that match. Useful for displaying vendor-specific chip information.
Parameters:
- vendorID: PCI vendor ID to filter by
Returns:
- Slice of ChipModel structs matching the vendor
Example:
chips := GetChipsByVendor("0x19e5")
fmt.Printf("Huawei has %d chip model(s)\n", len(chips))
func LoadChipsFromConfig ¶
func LoadChipsFromConfig() []ChipModel
LoadChipsFromConfig loads chip model information from device configuration.
This function reads the device configuration file and extracts chip model details, including PCI IDs, capabilities, and device types.
Returns:
- Slice of ChipModel structs
- Empty slice if configuration loading fails
Example:
chips := LoadChipsFromConfig()
for _, chip := range chips {
	fmt.Printf("Chip: %s (%s:%s)\n", chip.ModelName, chip.VendorID, chip.DeviceID)
}
type ChipVendor ¶
type ChipVendor struct {
	// VendorID is the PCI vendor ID (e.g., "0x19e5" for Huawei)
	VendorID string
	// VendorName is the human-readable vendor name
	VendorName string
}
ChipVendor pairs a chip vendor's PCI vendor ID with its name.
func LoadVendorsFromConfig ¶
func LoadVendorsFromConfig() []ChipVendor
LoadVendorsFromConfig loads vendor information from device configuration.
This function reads the device configuration file and extracts vendor information, making it available for device identification and display.
Returns:
- Slice of ChipVendor structs
- Empty slice if configuration loading fails
Example:
vendors := LoadVendorsFromConfig()
for _, vendor := range vendors {
	fmt.Printf("Vendor: %s (%s)\n", vendor.VendorName, vendor.VendorID)
}
type DetectedChip ¶
type DetectedChip struct {
	// VendorID is the PCI vendor ID
	VendorID string `json:"vendor_id"`
	// DeviceID is the PCI device ID
	DeviceID string `json:"device_id"`
	// BusAddress is the PCI bus address
	BusAddress string `json:"bus_address"`
	// ModelName is the chip model name
	ModelName string `json:"model_name"`
	// ConfigKey is the base model config key (e.g., "ascend-910b")
	// Used for sandbox selection and image lookup
	ConfigKey string `json:"config_key"`
	// VariantKey is the specific variant key if matched (e.g., "ascend-910b1")
	// Used for runtime_params matching, empty if no variant matched
	VariantKey string `json:"variant_key,omitempty"`
	// DeviceType is the xw device type (same as VariantKey if variant matched, otherwise ConfigKey)
	DeviceType api.DeviceType `json:"device_type"`
	// Generation is the chip generation
	Generation string `json:"generation"`
	// Capabilities lists the chip's capabilities
	Capabilities []string `json:"capabilities"`
	// PhysicalDeviceIndex is the index of the physical PCI device (0-based)
	// Used to identify which physical card this chip belongs to
	PhysicalDeviceIndex int `json:"physical_device_index"`
	// ChipIndex is the chip index within a multi-chip card (0-based)
	// For single-chip cards: 0
	// For dual-chip cards: 0 or 1
	ChipIndex int `json:"chip_index"`
	// ChipsPerDevice indicates total chips on this physical device
	ChipsPerDevice int `json:"chips_per_device"`
}
DetectedChip represents a detected AI chip together with its PCI identity, model information, and position on its physical card.
type Device ¶
type Device struct {
	// Type is the device type identifying the chip architecture.
	// Example: DeviceTypeAscend
	Type api.DeviceType `json:"type"`
	// Name is the human-readable device name.
	// Example: "Huawei Ascend 910B"
	Name string `json:"name"`
	// Available indicates if the device is currently available for use.
	// A device may be unavailable if it's in use, has an error, or lacks drivers.
	Available bool `json:"available"`
	// Properties contains device-specific metadata and capabilities.
	// Common keys: "vendor", "version", "memory", "cores"
	// Values are stored as strings for flexibility.
	Properties map[string]string `json:"properties"`
}
Device represents a detected domestic chip device with its metadata.
A Device instance contains information about a specific hardware device including its type, availability status, and vendor-specific properties. This information is used to determine model compatibility and optimize model execution.
type DeviceInfo ¶
type DeviceInfo struct {
	Type string `json:"type"` // Device type (e.g., "ascend", "cuda")
	Index int `json:"index"` // Device index (0-based)
	BusAddress string `json:"bus_address"` // PCI bus address
	ModelName string `json:"model_name"` // Device model name
	ConfigKey string `json:"config_key"` // Base model config key (for sandbox, image lookup)
	VariantKey string `json:"variant_key,omitempty"` // Specific variant key (for runtime_params)
	Properties map[string]string `json:"properties"` // Additional properties
}
DeviceInfo represents information about a device for runtime use. This is a simplified version focused on runtime needs.
type DeviceTopology ¶
type DeviceTopology struct {
// contains filtered or unexported fields
}
DeviceTopology provides distance information between logical chips.
Topology enables distance-aware device allocation to minimize inter-chip communication latency by preferring chips with shorter distances.
func NewDeviceTopology ¶
func NewDeviceTopology(topologyConfig *config.TopologyConfig) *DeviceTopology
NewDeviceTopology creates a topology from configuration.
Parameters:
- topologyConfig: Topology configuration from devices.yaml
Returns:
- Initialized DeviceTopology, or nil if no topology is configured
func (*DeviceTopology) GetDistance ¶
func (dt *DeviceTopology) GetDistance(chipA, chipB int) int
GetDistance calculates the distance between two logical chips.
Distance rules:
- Same box: distance = 0 (high-speed interconnect)
- Different boxes: distance = |box_a - box_b|
- Unknown chip: distance = 999 (avoid allocation)
Parameters:
- chipA: Logical chip index A
- chipB: Logical chip index B
Returns:
- Distance value (0 = closest, higher = farther)
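The distance rules listed above can be expressed in a few lines. The sketch below is an illustration under assumptions, not the DeviceTopology implementation: the chip-to-box mapping is modeled as a plain `map[int]int`, and `distance` and `unknownDistance` are hypothetical names.

```go
package main

import "fmt"

// unknownDistance mirrors the documented value for chips missing from the topology.
const unknownDistance = 999

// distance applies the documented rules to a hypothetical chip-to-box map:
// same box gives 0, different boxes give |box_a - box_b|, unknown chips give 999.
func distance(boxOf map[int]int, chipA, chipB int) int {
	a, okA := boxOf[chipA]
	b, okB := boxOf[chipB]
	if !okA || !okB {
		return unknownDistance
	}
	if a > b {
		return a - b
	}
	return b - a // zero when both chips share a box
}

func main() {
	// Chips 0 and 1 share box 0; chip 2 is in box 1; chip 3 is in box 3.
	boxOf := map[int]int{0: 0, 1: 0, 2: 1, 3: 3}
	fmt.Println(distance(boxOf, 0, 1)) // same box
	fmt.Println(distance(boxOf, 1, 3)) // boxes 0 and 3
	fmt.Println(distance(boxOf, 0, 9)) // chip 9 is not in the topology
}
```

The large sentinel for unknown chips means any allocation strategy that minimizes distance will pick such chips last, which is the stated intent of the 999 value.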
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager manages device detection and maintains device availability state.
The Manager provides thread-safe access to information about detected hardware devices. It performs initial device detection at creation and maintains a registry of available devices.
In production, the Manager would integrate with vendor-specific APIs and drivers to perform actual hardware probing and capability detection.
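One common way to provide this kind of thread-safe access in Go is a map guarded by a `sync.RWMutex`. The sketch below shows that pattern only; whether Manager is implemented this way is an assumption, and the `registry` and `entry` types are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical, reduced stand-in for Device.
type entry struct {
	Name      string
	Available bool
}

// registry guards a device map with a read-write mutex so that many
// readers can proceed concurrently while writers get exclusive access.
type registry struct {
	mu      sync.RWMutex
	devices map[string]entry
}

func newRegistry() *registry {
	return &registry{devices: make(map[string]entry)}
}

// set records or replaces the entry for a device type.
func (r *registry) set(deviceType string, e entry) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.devices[deviceType] = e
}

// isAvailable reports whether the device type exists and is marked available.
func (r *registry) isAvailable(deviceType string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	e, ok := r.devices[deviceType]
	return ok && e.Available
}

func main() {
	r := newRegistry()
	r.set("ascend-910b", entry{Name: "Huawei Ascend 910B", Available: true})

	// Concurrent readers are safe because each takes the read lock.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = r.isAvailable("ascend-910b")
		}()
	}
	wg.Wait()
	fmt.Println(r.isAvailable("ascend-910b"), r.isAvailable("unknown"))
}
```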
func NewManager ¶
func NewManager() *Manager
NewManager creates and initializes a new device manager.
The manager is created with an empty devices map and immediately performs device detection through detectDevices(). This identifies all available domestic chip devices on the system.
Device detection happens synchronously during initialization to ensure device information is available immediately after creation.
Returns:
- A pointer to a fully initialized Manager with detected devices.
Example:
manager := device.NewManager()
if manager.IsAvailable(ConfigKeyAscend910B) {
	fmt.Println("Ascend 910B NPU is available")
}
func (*Manager) GetDetectedDeviceTypes ¶
func (m *Manager) GetDetectedDeviceTypes() []api.DeviceType
GetDetectedDeviceTypes returns the types of all detected devices.
This method returns only the device types that have been detected on the current system. It's used to filter models to show only those compatible with available hardware.
Returns:
- A slice of DeviceType values for detected devices. Returns an empty slice if no devices are detected.
Example:
detected := manager.GetDetectedDeviceTypes()
if len(detected) == 0 {
	fmt.Println("No AI accelerators detected")
} else {
	fmt.Printf("Detected: %v\n", detected)
}
func (*Manager) GetDevice ¶
func (m *Manager) GetDevice(deviceType api.DeviceType) (*Device, error)
GetDevice retrieves detailed information for a specific device type.
This method returns the full Device struct for a specified device type, including all properties and metadata. Unlike IsAvailable(), this method returns detailed information even if the device is unavailable, allowing callers to determine why a device isn't available.
The method is thread-safe and can be called concurrently.
Parameters:
- deviceType: The device type to retrieve information for
Returns:
- A pointer to the Device struct if the device type is detected
- An error if the device type was not detected on the system
Example:
// dev is used instead of device to avoid shadowing the package name.
dev, err := manager.GetDevice(ConfigKeyAscend910B)
if err != nil {
	log.Printf("Ascend device not found: %v", err)
	return
}
fmt.Printf("Device version: %s\n", dev.Properties["version"])
func (*Manager) GetSupportedTypes ¶
func (m *Manager) GetSupportedTypes() []api.DeviceType
GetSupportedTypes returns all device types supported by the application.
This method returns a complete list of all device types that the xw application is designed to work with, regardless of whether they are currently detected on the system. This is useful for:
- Displaying supported hardware to users
- Validation of configuration files
- Documentation and help text generation
The method reads device types from configuration. If configuration loading fails, it returns a fallback list of known device types.
Returns:
- A slice of all supported DeviceType values.
Example:
supported := manager.GetSupportedTypes()
fmt.Printf("This application supports %d device types\n", len(supported))
for _, dt := range supported {
	fmt.Printf("- %s\n", dt)
}
func (*Manager) IsAvailable ¶
func (m *Manager) IsAvailable(deviceType api.DeviceType) bool
IsAvailable checks if a specific device type is currently available.
This method performs a quick check to determine if a device of the specified type exists and is marked as available. It's commonly used before attempting to run models on specific hardware.
The method is thread-safe and can be called concurrently.
Parameters:
- deviceType: The device type to check (e.g., ConfigKeyAscend910B)
Returns:
- true if the device type exists and is available
- false if the device doesn't exist or is unavailable
Example:
if !manager.IsAvailable(ConfigKeyAscend910B) {
	return fmt.Errorf("Ascend 910B device required but not available")
}
func (*Manager) ListAvailable ¶
ListAvailable returns all currently available devices.
This method returns only devices that are marked as available, filtering out devices that are unavailable due to errors, being in use, or missing drivers.
The method is thread-safe and can be called concurrently. It returns pointers to Device structs, allowing callers to inspect detailed device properties.
Returns:
- A slice of pointers to Device structs for all available devices. Returns an empty slice if no devices are available.
Example:
devices := manager.ListAvailable()
for _, dev := range devices {
	fmt.Printf("%s (%s): vendor=%s\n",
		dev.Name, dev.Type, dev.Properties["vendor"])
}
func (*Manager) ListDetectedChips ¶
func (m *Manager) ListDetectedChips() ([]DetectedChip, error)
ListDetectedChips returns detailed information for all detected AI chips.
This method performs a fresh scan of the system and returns individual chip information, including PCI addresses, vendor/device IDs, and capabilities. Unlike ListAvailable(), which returns aggregated Device entries, this method returns one entry per physical chip.
Returns:
- A slice of DetectedChip with details for each physical chip
- An error if hardware scanning fails
Example:
chips, err := manager.ListDetectedChips()
if err != nil {
	return err
}
for _, chip := range chips {
	fmt.Printf("%s: %s at %s\n", chip.DeviceType, chip.ModelName, chip.BusAddress)
}
type PCIDevice ¶
type PCIDevice struct {
	// VendorID is the PCI vendor ID (e.g., "0x1db7")
	VendorID string
	// DeviceID is the PCI device ID
	DeviceID string
	// SubsystemVendorID is the subsystem vendor ID (optional)
	SubsystemVendorID string
	// SubsystemDeviceID is the subsystem device ID (optional)
	SubsystemDeviceID string
	// BusAddress is the PCI bus address (e.g., "0000:01:00.0")
	BusAddress string
	// Class is the PCI device class
	Class string
}
PCIDevice represents a PCI device with its identifiers.
func ParseLspciOutput ¶
ParseLspciOutput parses the output of the `lspci -nn` command.
This is an alternative to sysfs scanning for systems where sysfs access is restricted. Each line is expected to have the form: "bus:dev.fn Class [class]: Vendor [vid:did]"
Parameters:
- output: The output of the `lspci -nn` command
Returns:
- Slice of PCIDevice parsed from the output
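As an illustration of the line format described above, the following sketch parses `lspci -nn` style lines with a regular expression. It is not the package's parser: `parseLspci`, `pciEntry`, and `lineRE` are hypothetical names, the sample line is made up for the example, and adding a "0x" prefix to the IDs is an assumption chosen to match the ID style used elsewhere in this documentation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pciEntry is a hypothetical, reduced stand-in for PCIDevice.
type pciEntry struct {
	BusAddress, Class, VendorID, DeviceID string
}

// lineRE captures, in order: bus address, class code, vendor ID, device ID.
// The greedy ".*" before the final bracket pair picks the last [vid:did] on the line.
var lineRE = regexp.MustCompile(
	`^(\S+)\s+.*?\[([0-9a-fA-F]{4})\]:\s+.*\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]`)

// parseLspci extracts one pciEntry per recognizable line and skips the rest.
func parseLspci(output string) []pciEntry {
	var entries []pciEntry
	for _, line := range strings.Split(output, "\n") {
		m := lineRE.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue // blank or unrecognized line
		}
		entries = append(entries, pciEntry{
			BusAddress: m[1],
			Class:      "0x" + strings.ToLower(m[2]),
			VendorID:   "0x" + strings.ToLower(m[3]),
			DeviceID:   "0x" + strings.ToLower(m[4]),
		})
	}
	return entries
}

func main() {
	sample := "01:00.0 Processing accelerators [1200]: Example Vendor Device [19e5:d802] (rev 20)\n"
	for _, e := range parseLspci(sample) {
		fmt.Printf("%+v\n", e)
	}
}
```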
func ScanPCIDevices ¶
ScanPCIDevices scans the system for PCI devices.
This function reads PCI device information from /sys/bus/pci/devices, the standard location on Linux systems.
Returns:
- Slice of PCIDevice found on the system
- Error if scanning fails
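A sysfs scan of this kind can be sketched as below. This is a minimal illustration rather than the package's code: `scanSysfs`, `readAttr`, and `pciEntry` are hypothetical names. It relies on the Linux sysfs layout, where each entry under /sys/bus/pci/devices is named after the device's bus address and exposes attribute files such as `vendor`, `device`, and `class`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// pciEntry is a hypothetical, reduced stand-in for PCIDevice.
type pciEntry struct {
	BusAddress, VendorID, DeviceID, Class string
}

// readAttr returns the trimmed contents of one sysfs attribute file,
// or an empty string if it cannot be read.
func readAttr(dir, name string) string {
	data, err := os.ReadFile(filepath.Join(dir, name))
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(data))
}

// scanSysfs lists the entries under root (normally /sys/bus/pci/devices)
// and reads the identifying attributes of each one.
func scanSysfs(root string) ([]pciEntry, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", root, err)
	}
	var devices []pciEntry
	for _, e := range entries {
		dir := filepath.Join(root, e.Name())
		vendor := readAttr(dir, "vendor")
		if vendor == "" {
			continue // not a readable PCI device entry
		}
		devices = append(devices, pciEntry{
			BusAddress: e.Name(),
			VendorID:   vendor,
			DeviceID:   readAttr(dir, "device"),
			Class:      readAttr(dir, "class"),
		})
	}
	return devices, nil
}

func main() {
	devices, err := scanSysfs("/sys/bus/pci/devices")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, d := range devices {
		fmt.Printf("%s %s:%s class=%s\n", d.BusAddress, d.VendorID, d.DeviceID, d.Class)
	}
}
```

Taking the root directory as a parameter keeps the sketch usable on non-Linux systems and in tests, where a temporary directory with the same layout can stand in for sysfs.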