README
Overview
AMD Container Toolkit offers tools to streamline the use of AMD GPUs with containers. The toolkit includes the following packages:
- amd-container-runtime - the AMD Container Runtime
- amd-ctk - the AMD Container Toolkit CLI
Requirements
- Ubuntu 22.04 or 24.04, or RHEL/CentOS 9
- Docker version 25 or later
- All amd-ctk runtime configure commands must be run as root or with sudo
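As a quick sanity check, the Docker version requirement above can be verified from the `docker --version` string. This is an illustrative sketch: the version string below is mocked, and on a real host you would substitute the actual command output.

```shell
# Sketch: check the "Docker version 25 or later" requirement.
# docker_version is mocked here; on a real host use: docker_version=$(docker --version)
docker_version='Docker version 27.1.1, build 6312585'

# Extract the major version number from the version string
major=$(printf '%s\n' "$docker_version" | sed -n 's/^Docker version \([0-9][0-9]*\)\..*/\1/p')
if [ "$major" -ge 25 ]; then
    result="supported"
else
    result="too old"
fi
echo "Docker major version $major is $result"
```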
Quick Start
Install the AMD Container Toolkit.
Installing on Ubuntu
To install the AMD Container Toolkit on Ubuntu systems, follow these steps:
- Ensure prerequisites are installed:

  apt update && apt install -y wget gnupg2

- Add the GPG key for the repository:

  wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | tee /etc/apt/keyrings/rocm.gpg > /dev/null

- Add the repository to your system. Replace noble with jammy if you are using Ubuntu 22.04:

  echo "deb [signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amd-container-toolkit/apt/ noble main" > /etc/apt/sources.list.d/amd-container-toolkit.list

- Update the package list and install the toolkit:

  apt update && apt install amd-container-toolkit
Installing on RHEL/CentOS 9
To install the AMD Container Toolkit on RHEL/CentOS 9 systems, follow these steps:
- Add the repository configuration:

  tee --append /etc/yum.repos.d/amd-container-toolkit.repo <<EOF
  [amd-container-toolkit]
  name=amd-container-toolkit
  baseurl=https://repo.radeon.com/amd-container-toolkit/el9/main/
  enabled=1
  priority=50
  gpgcheck=1
  gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
  EOF

- Clean the package cache and install the toolkit:

  dnf clean all
  dnf install -y amd-container-toolkit
Configuring Docker
- Configure the AMD container runtime for Docker. The following command modifies the Docker configuration file, /etc/docker/daemon.json, so that Docker can use the AMD container runtime.

  > sudo amd-ctk runtime configure

- Restart the Docker daemon.

  > sudo systemctl restart docker
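After configuration, /etc/docker/daemon.json should contain a runtimes entry along the following lines. This is a minimal sketch, matching the runtimes stanza shown in the Docker Swarm section later in this README; your file may carry additional keys.

```json
{
  "runtimes": {
    "amd": {
      "path": "amd-container-runtime",
      "runtimeArgs": []
    }
  }
}
```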
Docker Runtime Integration
- Configure Docker to use the AMD container runtime.

  > amd-ctk runtime configure --runtime=docker

- Specify the required GPUs. There are three ways to do this.

  - Using the AMD_VISIBLE_DEVICES environment variable:
    - To use all available GPUs:
      > docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=all rocm/rocm-terminal rocm-smi
    - To use a subset of available GPUs:
      > docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0,1,2 rocm/rocm-terminal rocm-smi
    - To use ranges of contiguously numbered GPUs:
      > docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0-3,5,8 rocm/rocm-terminal rocm-smi

  - Using CDI style:
    - First, generate the CDI spec.
      > amd-ctk cdi generate --output=/etc/cdi/amd.json
    - Validate the generated CDI spec.
      > amd-ctk cdi validate --path=/etc/cdi/amd.json
    - To use all available GPUs:
      > docker run --rm --device amd.com/gpu=all rocm/rocm-terminal rocm-smi
    - To use a subset of available GPUs:
      > docker run --rm --device amd.com/gpu=0 --device amd.com/gpu=1 rocm/rocm-terminal rocm-smi
    - Note that once the CDI spec, /etc/cdi/amd.json, is available, --runtime=amd is not required in the docker run command.

  - Using explicit device paths. Note that --runtime=amd is not required here.
    > docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD129 rocm/rocm-terminal rocm-smi
- List available GPUs. If this command is run as root, the container-toolkit logs go to /var/log/amd-container-runtime.log; otherwise they go to the user's home directory.
> amd-ctk cdi list
Found 1 AMD GPU device
amd.com/gpu=all
amd.com/gpu=0
/dev/dri/card1
/dev/dri/renderD128
- Make the AMD container runtime the default runtime. Setting it as the default for Docker lets you omit the --runtime=amd option from the docker run command.
> amd-ctk runtime configure --runtime=docker --set-as-default
- Remove the AMD container runtime as the default runtime.
> amd-ctk runtime configure --runtime=docker --unset-as-default
- Remove the AMD container runtime configuration from Docker (undo the earlier configuration).
> amd-ctk runtime configure --runtime=docker --remove
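The device-selection strings above accept single indices, comma lists, and ranges. As a rough illustration, the helper below expands a string such as 0-3,5,8 into the individual device indices it resolves to. The function is a hypothetical sketch written for this README, not part of amd-ctk.

```shell
# Hypothetical helper (not part of amd-ctk): expand an AMD_VISIBLE_DEVICES-style
# string -- comma-separated indices with optional ranges -- into one index per line.
expand_amd_visible_devices() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r part; do
        case "$part" in
            *-*) seq "${part%%-*}" "${part##*-}" ;;  # range, e.g. 0-3
            *)   printf '%s\n' "$part" ;;            # single index
        esac
    done
}

expand_amd_visible_devices "0-3,5,8"
```

Running the function on "0-3,5,8" yields the indices 0 through 3, then 5, then 8, one per line.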
Device discovery and enumeration
The following command lists the GPUs available on the system and their enumeration. The GPUs are listed in the CDI format, but the same enumeration applies when using the OCI environment variable AMD_VISIBLE_DEVICES.
> amd-ctk cdi list
Found 1 AMD GPU device
amd.com/gpu=all
amd.com/gpu=0
/dev/dri/card1
/dev/dri/renderD128
GPU UUID Support
The AMD Container Toolkit supports GPU selection using unique identifiers (UUIDs) in addition to device indices. This enables more precise and reliable GPU targeting, especially in multi-GPU systems and orchestrated environments.
Getting GPU UUIDs
You can obtain GPU UUIDs using different tools:
Using ROCm SMI
rocm-smi --showuniqueid
This will display output similar to:
GPU[0] : Unique ID: 0xef2c1799a1f3e2ed
GPU[1] : Unique ID: 0x1234567890abcdef
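The Unique IDs can be collected into a comma-separated value suitable for AMD_VISIBLE_DEVICES with a little text processing. In the sketch below, the rocm-smi output is mocked from the sample above; on a real node you would capture the command's output instead.

```shell
# Sketch: turn `rocm-smi --showuniqueid` output into a comma-separated
# UUID list. smi_output is mocked from this README's sample output.
smi_output='GPU[0]          : Unique ID: 0xef2c1799a1f3e2ed
GPU[1]          : Unique ID: 0x1234567890abcdef'

# Pull out each 0x... Unique ID, then join the lines with commas
uuids=$(printf '%s\n' "$smi_output" \
    | sed -n 's/.*Unique ID: *\(0x[0-9a-fA-F]*\).*/\1/p' \
    | tr '\n' ',' | sed 's/,$//')
echo "$uuids"
```

The resulting value can then be passed as `-e AMD_VISIBLE_DEVICES="$uuids"` in a docker run command.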
Using AMD-SMI
You can also use amd-smi to get the ASIC_SERIAL, which serves as the GPU UUID:
amd-smi static -aB
This will display output similar to:
GPU: 0
ASIC:
MARKET_NAME: AMD Instinct MI210
VENDOR_ID: 0x1002
VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
SUBVENDOR_ID: 0x1002
DEVICE_ID: 0x740f
SUBSYSTEM_ID: 0x0c34
REV_ID: 0x02
ASIC_SERIAL: 0xD1CC3F11CFDD5112
OAM_ID: N/A
NUM_COMPUTE_UNITS: 104
TARGET_GRAPHICS_VERSION: gfx90a
BOARD:
MODEL_NUMBER: 102-D67302-00
PRODUCT_SERIAL: 692231000131
FRU_ID: 113-HPED67302000B.009
PRODUCT_NAME: Instinct MI210
MANUFACTURER_NAME: AMD
Use the ASIC_SERIAL value (e.g., 0xD1CC3F11CFDD5112) as the GPU UUID in your container configurations.
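The ASIC_SERIAL field can likewise be extracted from the amd-smi output programmatically. The snippet of output below is mocked (and trimmed) from the sample above; on a real node you would capture the output of `amd-smi static -aB` instead.

```shell
# Sketch: pull the ASIC_SERIAL (usable as the GPU UUID) out of
# `amd-smi static -aB` output. amd_smi_output is a trimmed mock.
amd_smi_output='GPU: 0
  ASIC:
    MARKET_NAME: AMD Instinct MI210
    ASIC_SERIAL: 0xD1CC3F11CFDD5112'

serial=$(printf '%s\n' "$amd_smi_output" \
    | sed -n 's/.*ASIC_SERIAL: *\(0x[0-9A-Fa-f]*\).*/\1/p')
echo "$serial"
```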
Using UUIDs with Environment Variables
Both AMD_VISIBLE_DEVICES and DOCKER_RESOURCE_* environment variables support UUID specification:
Using AMD_VISIBLE_DEVICES
# Use specific GPUs by UUID
docker run --rm --runtime=amd \
-e AMD_VISIBLE_DEVICES=0xef2c1799a1f3e2ed,0x1234567890abcdef \
rocm/dev-ubuntu-24.04 rocm-smi
# Mix device indices and UUIDs
docker run --rm --runtime=amd \
-e AMD_VISIBLE_DEVICES=0,0xef2c1799a1f3e2ed \
rocm/dev-ubuntu-24.04 rocm-smi
Using DOCKER_RESOURCE_* Variables
# Docker Swarm generic resource format
docker run --rm --runtime=amd \
-e DOCKER_RESOURCE_GPU=0xef2c1799a1f3e2ed \
rocm/dev-ubuntu-24.04 rocm-smi
Docker Swarm Integration
GPU UUID support significantly improves Docker Swarm deployments by enabling precise GPU allocation across cluster nodes.
Docker Daemon Configuration for Swarm
Configure each swarm node's Docker daemon with GPU resources in /etc/docker/daemon.json:
{
"runtimes": {
"amd": {
"path": "amd-container-runtime",
"runtimeArgs": []
}
},
"node-generic-resources": [
"AMD_GPU=0x378041e1ada6015",
"AMD_GPU=0xef39dad16afb86ad",
"GPU_COMPUTE=0x583de6f2d99dc333"
]
}
After updating the configuration, restart the Docker daemon:
sudo systemctl restart docker
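Since a malformed daemon.json can prevent the Docker daemon from starting, it is worth validating the file before restarting. The sketch below validates a temporary mocked copy with Python's stdlib JSON parser; on a real node you would point cfg at /etc/docker/daemon.json.

```shell
# Sketch: validate daemon.json before restarting Docker, so a JSON typo does
# not take the daemon down. A temp mocked copy is used here for illustration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "runtimes": { "amd": { "path": "amd-container-runtime", "runtimeArgs": [] } },
  "node-generic-resources": [ "AMD_GPU=0x378041e1ada6015" ]
}
EOF

# json.tool exits non-zero on invalid JSON
if python3 -m json.tool "$cfg" > /dev/null 2>&1; then
    valid=yes
else
    valid=no
fi
echo "daemon.json valid: $valid"
rm -f "$cfg"
```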
Service Definition
Deploy services with specific GPU requirements using docker-compose:
# docker-compose.yml for Swarm deployment
version: '3.8'
services:
rocm-service:
image: rocm/dev-ubuntu-24.04
command: rocm-smi
runtime: amd
deploy:
replicas: 1
resources:
reservations:
generic_resources:
- discrete_resource_spec:
kind: 'AMD_GPU' # Matches daemon.json key
value: 1
Deploy the service:
docker stack deploy -c docker-compose.yml rocm-stack
GPU Tracker
Plain Docker provides no way to track which containers have been granted access to which GPUs, and by default multiple containers can be granted access to the same GPU simultaneously. GPU Tracker is a lightweight feature of the AMD Container Toolkit that addresses both issues.
GPU Tracker state is initialized during AMD Container Toolkit installation and is disabled by default. Users can enable or disable the feature with the enable and disable commands. When enabled, GPU Tracker automatically maintains the state of GPUs and the containers they are made accessible to, but only for containers that are launched and granted GPU access via the AMD_VISIBLE_DEVICES environment variable. When a container process completes or is stopped, the GPU Tracker state is automatically updated to reflect the GPUs released by that container.
NOTE: The GPU Tracker feature is currently supported only if containers are started with the docker run command and GPUs are made accessible via the AMD_VISIBLE_DEVICES environment variable. If containers are started and granted access to GPUs in any other manner, the feature is not supported.
GPU Tracker provides commands to control the accessibility of GPUs in containers. The accessibility of a GPU can be set to either shared or exclusive.
- The shared accessibility indicates that the GPU can be made accessible to multiple containers simultaneously. By default, all GPUs have shared accessibility, reflecting Docker's default behavior.
- The exclusive accessibility indicates that the GPU can be made accessible to at most one container at any point in time.
GPU Tracker status can be queried at any time using the status command and reset using the reset command.
> sudo amd-ctk gpu-tracker -h
NAME:
AMD Container Toolkit CLI gpu-tracker - GPU Tracker related commands
USAGE:
amd-ctk gpu-tracker [gpu-ids] [accessibility]
Arguments:
gpu-ids Comma-separated list of GPU IDs (comma separated list, range operator, all)
accessibility Must be either 'exclusive' or 'shared'
Examples:
amd-ctk gpu-tracker 0,1,2 exclusive
amd-ctk gpu-tracker 0,1-2 shared
amd-ctk gpu-tracker all shared
OR
amd-ctk gpu-tracker [command] [options]
COMMANDS:
disable Disable the GPU Tracker
enable Enable the GPU Tracker
reset Reset the GPU Tracker
status Show Status of GPUs
help, h Shows a list of commands or help for one command
OPTIONS:
--help, -h show help
Using GPU Tracker
Assume the node has 4 GPUs, as shown below.
> rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
====================================================================================================================
0 4 0x740f, 12261 33.0°C 42.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
1 5 0x740f, 13566 38.0°C 40.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
2 3 0x740f, 57300 34.0°C 42.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
3 2 0x740f, 1997 38.0°C 41.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================
- Show GPU Tracker status:

  Once the AMD Container Toolkit is installed, GPU Tracker is initialized and its status can be queried with the status command. If GPU Tracker is enabled, by default all GPUs have shared accessibility.

  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           -
  1         0x89CAA15875FF5A43    Shared           -
  2         0x6E32F10EFC982B4C    Shared           -
  3         0x12FE4F7FDAF06B9     Shared           -

  If the GPU Tracker feature is not enabled, a message indicating this is printed instead.

  > amd-ctk gpu-tracker status
  GPU Tracker is disabled
- Enabling GPU Tracker:

  GPU Tracker can be enabled using the enable command. When newly enabled, it starts tracking GPU usage in containers with no prior knowledge of GPU state. If GPU Tracker is already enabled, nothing happens and a message indicating this is printed.

  > amd-ctk gpu-tracker status
  GPU Tracker is disabled
  > amd-ctk gpu-tracker enable
  GPU Tracker has been enabled
  > amd-ctk gpu-tracker enable
  GPU Tracker is already enabled
- Disabling GPU Tracker:

  GPU Tracker can be disabled using the disable command. If GPU Tracker is re-enabled later, all previously tracked GPU state is lost.

  > amd-ctk gpu-tracker disable
  GPU Tracker has been disabled
  > amd-ctk gpu-tracker status
  GPU Tracker is disabled
- Granting access to GPUs in Docker containers:

  If GPU Tracker is enabled before containers are launched, it automatically tracks GPU usage in those containers, as shown below.

  > docker run --runtime=amd -itd -e AMD_VISIBLE_DEVICES=0-2 rocm/rocm-terminal bash
  36b012bb34c96149a6ef5b28623e6e75cf9f71eb2b824b2c8f44e0449c7a1aa8
  > docker run --runtime=amd -itd -e AMD_VISIBLE_DEVICES=1,3 rocm/rocm-terminal bash
  90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           36b012bb34c96149a6ef5b28623e6e75cf9f71eb2b824b2c8f44e0449c7a1aa8
  1         0x89CAA15875FF5A43    Shared           36b012bb34c96149a6ef5b28623e6e75cf9f71eb2b824b2c8f44e0449c7a1aa8
                                                   90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  2         0x6E32F10EFC982B4C    Shared           36b012bb34c96149a6ef5b28623e6e75cf9f71eb2b824b2c8f44e0449c7a1aa8
  3         0x12FE4F7FDAF06B9     Shared           90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  > docker rm -f 36b012bb34c96149a6ef5b28623e6e75cf9f71eb2b824b2c8f44e0449c7a1aa8
  36b012bb34c96149a6ef5b28623e6e75cf9f71eb2b824b2c8f44e0449c7a1aa8
  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           -
  1         0x89CAA15875FF5A43    Shared           90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  2         0x6E32F10EFC982B4C    Shared           -
  3         0x12FE4F7FDAF06B9     Shared           90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
- Setting GPUs to have exclusive accessibility:

  If GPU Tracker is enabled, GPUs can be set to allow exclusive access in containers. If the user tries to make GPUs exclusive while GPU Tracker is disabled, nothing happens and a message indicating that GPU Tracker is disabled is printed.

  > amd-ctk gpu-tracker 1-3 exclusive
  GPUs [1 2 3] have been made exclusive
  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           -
  1         0x89CAA15875FF5A43    Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  2         0x6E32F10EFC982B4C    Exclusive        -
  3         0x12FE4F7FDAF06B9     Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  > docker run --runtime=amd -itd -e AMD_VISIBLE_DEVICES=0-2 rocm/rocm-terminal bash
  d23ff3dce1839cbf8ce7ad362641ab85e80b315c319edf73b269c460e348053a
  docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/d23ff3dce1839cbf8ce7ad362641ab85e80b315c319edf73b269c460e348053a/log.json: no such file or directory): amd-container-runtime did not terminate successfully: exit status 1: GPUs [0 2] allocated GPUs [1] are exclusive and already in use Released GPUs [2 0] used by container d23ff3dce1839cbf8ce7ad362641ab85e80b315c319edf73b269c460e348053a : unknown.
  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           -
  1         0x89CAA15875FF5A43    Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  2         0x6E32F10EFC982B4C    Exclusive        -
  3         0x12FE4F7FDAF06B9     Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd

  In the above example, GPUs 1, 2 and 3 have been granted exclusive access. When a new container d23ff3dce1839cbf8ce7ad362641ab85e80b315c319edf73b269c460e348053a that requests access to GPUs 0, 1 and 2 is launched, the following happens:
  - The new container is created.
  - The new container is granted access to GPU 0, since no container is currently using GPU 0.
  - GPU 1 is already being used by container 90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd. Because GPU 1 has exclusive accessibility, the new container is not granted access to it.
  - The new container is granted access to GPU 2: although GPU 2 has exclusive accessibility, no container is currently using it.
  - The container is not started, since it has not been granted access to all the required GPU resources.
  - The GPUs that had been granted to the new container are released.
NOTE:
- Even though the new container d23ff3dce1839cbf8ce7ad362641ab85e80b315c319edf73b269c460e348053a is not successfully started, it is still visible in the docker ps -a output.

  > docker ps -a
  CONTAINER ID   IMAGE                COMMAND   CREATED          STATUS          PORTS     NAMES
  d23ff3dce183   rocm/rocm-terminal   "bash"    11 seconds ago   Created                   funny_gagarin
  90cb29e11e83   rocm/rocm-terminal   "bash"    45 seconds ago   Up 44 seconds             practical_williams

  This is because Docker has already created the container by the time the runtime errors out due to unavailable resources. The behavior matches what Docker does whenever a container fails to start at any stage after creation; in those cases, too, the container appears in the docker ps -a output with status Created, as shown below.

  > docker run -itd ubuntu incorrect_command
  94f11c132e8cd0a35d05bcc8bcaf77264563998d07f6ad5c73798cf9ddd94726
  docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "incorrect_command": executable file not found in $PATH: unknown.
  > docker ps -a
  CONTAINER ID   IMAGE     COMMAND               CREATED          STATUS    PORTS     NAMES
  94f11c132e8c   ubuntu    "incorrect_command"   17 seconds ago   Created             elastic_ardinghelli

- Only GPUs that are currently in use by at most one container can be set to exclusive accessibility.

  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           8463b475b55b104b30edec8ddf6249b6214b27127106aa0ff4a8a514b856810e
  1         0x89CAA15875FF5A43    Shared           90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
                                                   8463b475b55b104b30edec8ddf6249b6214b27127106aa0ff4a8a514b856810e
  2         0x6E32F10EFC982B4C    Exclusive        8463b475b55b104b30edec8ddf6249b6214b27127106aa0ff4a8a514b856810e
  3         0x12FE4F7FDAF06B9     Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  > amd-ctk gpu-tracker 1 exclusive
  GPUs [1] have not been made exclusive because more than one container is currently using it
- Setting GPUs to have shared accessibility:

  If GPU Tracker is enabled, GPUs can be set to allow shared access in containers. If the user tries to make GPUs shared while GPU Tracker is disabled, nothing happens and a message indicating that GPU Tracker is disabled is printed. When GPU Tracker is disabled, GPUs have shared accessibility by default.

  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           -
  1         0x89CAA15875FF5A43    Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  2         0x6E32F10EFC982B4C    Exclusive        -
  3         0x12FE4F7FDAF06B9     Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
  > amd-ctk gpu-tracker 1 shared
  GPUs [1] have been made shared
  > docker run --runtime=amd -itd -e AMD_VISIBLE_DEVICES=0-2 rocm/rocm-terminal bash
  a8ce87c99727107ab467508bd431a170b148001fe8a866fcf96d5cc6af9a7f5e
  > amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           a8ce87c99727107ab467508bd431a170b148001fe8a866fcf96d5cc6af9a7f5e
  1         0x89CAA15875FF5A43    Shared           90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd
                                                   a8ce87c99727107ab467508bd431a170b148001fe8a866fcf96d5cc6af9a7f5e
  2         0x6E32F10EFC982B4C    Exclusive        a8ce87c99727107ab467508bd431a170b148001fe8a866fcf96d5cc6af9a7f5e
  3         0x12FE4F7FDAF06B9     Exclusive        90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd

  In the above example, GPU 1 has been changed from exclusive to shared access. When a new container a8ce87c99727107ab467508bd431a170b148001fe8a866fcf96d5cc6af9a7f5e that requests access to GPUs 0, 1 and 2 is launched, the following happens:
  - The new container is created.
  - The new container is granted access to GPU 0, since no container is currently using GPU 0.
  - GPU 1 is already being used by container 90cb29e11e83aa3ae497c68c90e1f0894b85262188c1ef9c7284457a9bc35ffd. However, the new container is also granted access to GPU 1, since GPU 1 has shared accessibility.
  - The new container is granted access to GPU 2: although GPU 2 has exclusive accessibility, no container is currently using it.
  - The container starts successfully, since it has been granted access to all the required GPU resources.
- Resetting GPU Tracker:

  Resetting GPU Tracker clears its state: the accessibility of all GPUs is set to shared, and all information about which GPUs have been made accessible to which containers is cleared. The enabled/disabled setting survives a reset: if GPU Tracker is enabled, it remains enabled after the reset; if it is disabled, it remains disabled.

  Resetting GPU Tracker is primarily useful when GPU Tracker is enabled and the partitioning scheme of the GPUs has been altered. Changing the partitioning scheme invalidates the CDI spec and the GPU Tracker state. In that case, it is required to:
  - Stop all running containers
  - Reset GPU Tracker
  - Regenerate the CDI spec
  - Restart the containers

  If GPU Tracker is disabled when the partitioning scheme is altered, GPU Tracker need not be reset, though the other steps are still recommended. Resetting GPU Tracker while it is disabled makes no difference.

  > amd-ctk gpu-tracker status
  GPUs info is invalid. Please reset GPU Tracker.
  > amd-ctk gpu-tracker reset
  GPU Tracker has been reset
  Since GPU Tracker was enabled, it is recommended to stop and restart running containers to get the most accurate GPU Tracker status
  > sudo amd-ctk cdi generate
  Generated CDI spec: /etc/cdi/amd.json
  > sudo docker run --runtime=amd -itd -e AMD_VISIBLE_DEVICES=0-2 rocm/rocm-terminal bash
  988135dafcd94bf98fbd92ca97f4a07c9bcfff0521359ee9bc8a6973cc3e25ce
  > sudo amd-ctk gpu-tracker status
  ------------------------------------------------------------------------------------------------------------------------
  GPU Id    UUID                  Accessibility    Container Ids
  ------------------------------------------------------------------------------------------------------------------------
  0         0xEA35F57CC80DEB35    Shared           988135dafcd94bf98fbd92ca97f4a07c9bcfff0521359ee9bc8a6973cc3e25ce
  1         0x89CAA15875FF5A43    Shared           988135dafcd94bf98fbd92ca97f4a07c9bcfff0521359ee9bc8a6973cc3e25ce
  2         0x6E32F10EFC982B4C    Shared           988135dafcd94bf98fbd92ca97f4a07c9bcfff0521359ee9bc8a6973cc3e25ce
  3         0x12FE4F7FDAF06B9     Shared           -
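For scripting around GPU Tracker, the status table can be parsed to find GPUs with no attached containers. The sketch below works on a mocked table based on the transcripts above; the exact column layout of the real output is an assumption, and on a real node you would capture the command's output instead.

```shell
# Sketch: pick out GPUs with no attached containers from
# `amd-ctk gpu-tracker status` output. status_output is a mocked table.
status_output='GPU Id    UUID                  Accessibility    Container Ids
0         0xEA35F57CC80DEB35    Shared           -
1         0x89CAA15875FF5A43    Exclusive        90cb29e11e83
2         0x6E32F10EFC982B4C    Exclusive        -
3         0x12FE4F7FDAF06B9     Exclusive        90cb29e11e83'

# Keep rows whose first field is a GPU index and whose Container Ids field is "-"
free_gpus=$(printf '%s\n' "$status_output" \
    | awk '$1 ~ /^[0-9]+$/ && $4 == "-" { print $1 }' \
    | tr '\n' ',' | sed 's/,$//')
echo "$free_gpus"
```

The resulting comma-separated list could then feed AMD_VISIBLE_DEVICES for the next container launch.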
Release notes
| Release | Features | Known Issues |
|---|---|---|
| v1.2.0 | 1. GPU Tracker feature support. 2. Docker Swarm support. | None |
| v1.1.0 | 1. GPU partitioning support. 2. Full RPM package support. 3. Support for the range operator in the AMD_VISIBLE_DEVICES environment variable. | None |
| v1.0.0 | Initial release | 1. Partitioned GPUs are not supported. 2. RPM builds are experimental. |
Building from Source
To build the Debian package, use the following commands.
make
make pkg-deb
To build the RPM package, use the following commands.
make build-dev-container-rpm
make pkg-rpm
The packages will be generated in the bin folder.
Documentation
For detailed documentation including installation guides and configuration options, see the documentation.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.