processes

package
v0.11.4
Published: Apr 4, 2026 License: Apache-2.0 Imports: 24 Imported by: 0

Documentation

Overview

Package processes tracks the NVIDIA per-GPU processes.

Constants

const Name = "accelerator-nvidia-processes"

Name is the component name for NVIDIA GPU process monitoring.

const SubSystem = "accelerator_nvidia_processes"

SubSystem is the Prometheus subsystem name for process metrics.
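As a rough illustration of how a subsystem name ends up in a metric name, the sketch below joins a namespace, the subsystem, and a metric name with underscores, which mirrors the joining behavior of prometheus.BuildFQName in client_golang. The metric name "running_total" and the empty namespace are hypothetical; the page does not list the metrics this package actually exports.

```go
package main

import (
	"fmt"
	"strings"
)

// SubSystem is copied from the package constant documented above.
const SubSystem = "accelerator_nvidia_processes"

// fqName joins the non-empty parts with "_". It is a local stand-in for
// prometheus.BuildFQName so the example has no external dependencies.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, s := range []string{namespace, subsystem, name} {
		if s != "" {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// "running_total" is a hypothetical metric name used only for illustration.
	fmt.Println(fqName("", SubSystem, "running_total"))
}
```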

Variables

This section is empty.

Functions

func New

func New(gpudInstance *components.GPUdInstance) (components.Component, error)

New returns the NVIDIA processes component.

Types

type Process added in v0.9.0

type Process struct {
	PID    uint32   `json:"pid"`
	Status []string `json:"status,omitempty"`

	// ZombieStatus is set to true if the process is defunct
	// (terminated but not reaped by its parent).
	ZombieStatus bool `json:"zombie_status,omitempty"`

	// BadEnvVarsForCUDA holds the environment variables set for this
	// specific process that are known to hurt CUDA.
	// Empty if there is no bad environment variable found for this process.
	// This implements "DCGM_FR_BAD_CUDA_ENV" logic in DCGM.
	BadEnvVarsForCUDA map[string]string `json:"bad_env_vars_for_cuda,omitempty"`

	CmdArgs                     []string    `json:"cmd_args,omitempty"`
	CreateTime                  metav1.Time `json:"create_time,omitzero"`
	GPUUsedPercent              uint32      `json:"gpu_used_percent,omitempty"`
	GPUUsedMemoryBytes          uint64      `json:"gpu_used_memory_bytes,omitempty"`
	GPUUsedMemoryBytesHumanized string      `json:"gpu_used_memory_bytes_humanized,omitempty"`
}

Process describes a single GPU-backed process observed through NVML.
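The struct tags determine what a Process looks like on the wire. The sketch below shows the effect of omitempty on the encoded JSON. It uses a local mirror of the struct so that it compiles without the gpud module; CreateTime is deliberately left out of the mirror because it depends on metav1.Time.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Process is a local mirror of the documented struct, minus CreateTime.
// It is an assumption made for this example, not the package's own type.
type Process struct {
	PID                         uint32            `json:"pid"`
	Status                      []string          `json:"status,omitempty"`
	ZombieStatus                bool              `json:"zombie_status,omitempty"`
	BadEnvVarsForCUDA           map[string]string `json:"bad_env_vars_for_cuda,omitempty"`
	CmdArgs                     []string          `json:"cmd_args,omitempty"`
	GPUUsedPercent              uint32            `json:"gpu_used_percent,omitempty"`
	GPUUsedMemoryBytes          uint64            `json:"gpu_used_memory_bytes,omitempty"`
	GPUUsedMemoryBytesHumanized string            `json:"gpu_used_memory_bytes_humanized,omitempty"`
}

// encodeProcess marshals a Process to its JSON string form.
func encodeProcess(p Process) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Only PID is set, so every omitempty field is dropped from the output.
	minimal, err := encodeProcess(Process{PID: 4242})
	if err != nil {
		panic(err)
	}
	fmt.Println(minimal)

	// Non-zero fields appear in struct declaration order.
	fuller, err := encodeProcess(Process{PID: 4242, ZombieStatus: true, GPUUsedMemoryBytes: 1 << 30})
	if err != nil {
		panic(err)
	}
	fmt.Println(fuller)
}
```

Because GPUUsedPercent and GPUUsedMemoryBytes are tagged omitempty, a consumer cannot distinguish "zero usage" from "not reported" in the JSON alone.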

type Processes added in v0.9.0

type Processes struct {
	// UUID is the GPU UUID.
	UUID string `json:"uuid"`

	// BusID is the GPU bus ID from the nvml API.
	//  e.g., "0000:0f:00.0"
	BusID string `json:"bus_id"`

	// RunningProcesses lists the processes currently running on the GPU.
	RunningProcesses []Process `json:"running_processes"`

	// GetComputeRunningProcessesSupported is true if the device supports the getComputeRunningProcesses API.
	GetComputeRunningProcessesSupported bool `json:"get_compute_running_processes_supported"`

	// GetProcessUtilizationSupported is true if the device supports the getProcessUtilization API.
	GetProcessUtilizationSupported bool `json:"get_process_utilization_supported"`
}

Processes describes the processes running on a single GPU, identified by its UUID and bus ID, together with flags that report whether the device supports the NVML queries used to collect them. ref. https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html
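A consumer that receives this structure as JSON can decode it using the documented field tags. The following is a minimal sketch with trimmed local mirror types; the payload, including its UUID and PIDs, is invented for illustration and is not real output from the component.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Process and Processes are trimmed local mirrors of the documented structs.
type Process struct {
	PID                uint32 `json:"pid"`
	ZombieStatus       bool   `json:"zombie_status,omitempty"`
	GPUUsedMemoryBytes uint64 `json:"gpu_used_memory_bytes,omitempty"`
}

type Processes struct {
	UUID                                string    `json:"uuid"`
	BusID                               string    `json:"bus_id"`
	RunningProcesses                    []Process `json:"running_processes"`
	GetComputeRunningProcessesSupported bool      `json:"get_compute_running_processes_supported"`
	GetProcessUtilizationSupported      bool      `json:"get_process_utilization_supported"`
}

// samplePayload is a hypothetical document shaped after the JSON tags.
const samplePayload = `{
  "uuid": "GPU-00000000-0000-0000-0000-000000000000",
  "bus_id": "0000:0f:00.0",
  "running_processes": [
    {"pid": 1001, "gpu_used_memory_bytes": 1073741824},
    {"pid": 1002, "zombie_status": true}
  ],
  "get_compute_running_processes_supported": true,
  "get_process_utilization_supported": false
}`

// decodeProcesses parses one per-GPU Processes document.
func decodeProcesses(data []byte) (Processes, error) {
	var ps Processes
	err := json.Unmarshal(data, &ps)
	return ps, err
}

func main() {
	ps, err := decodeProcesses([]byte(samplePayload))
	if err != nil {
		panic(err)
	}
	fmt.Printf("GPU %s at %s\n", ps.UUID, ps.BusID)
	for _, p := range ps.RunningProcesses {
		fmt.Printf("pid=%d zombie=%t mem=%d\n", p.PID, p.ZombieStatus, p.GPUUsedMemoryBytes)
	}
}
```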

func GetProcesses added in v0.9.0

func GetProcesses(uuid string, dev device.Device) (Processes, error)

GetProcesses returns the running GPU processes observed for the given device.
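Calling GetProcesses requires an NVML-backed device.Device, so the sketch below starts from a hand-built result and shows one way a caller might interpret it. The summarize helper, the Summary type, and the sample values (including the "EXAMPLE_VAR" key) are hypothetical and not part of the package. Treating an unsupported device as "unknown" rather than "idle" is an assumption about how the support flags are meant to be read.

```go
package main

import "fmt"

// Process and Processes are trimmed local mirrors of the documented structs.
type Process struct {
	PID                uint32
	ZombieStatus       bool
	BadEnvVarsForCUDA  map[string]string
	GPUUsedMemoryBytes uint64
}

type Processes struct {
	UUID                                string
	RunningProcesses                    []Process
	GetComputeRunningProcessesSupported bool
}

// Summary is a hypothetical roll-up of one GPU's process list.
type Summary struct {
	Known            bool     // false when the device cannot list processes
	TotalMemoryBytes uint64   // sum of GPUUsedMemoryBytes across processes
	ZombiePIDs       []uint32 // processes flagged as defunct
	BadEnvPIDs       []uint32 // processes with at least one bad CUDA env var
}

// summarize aggregates the fields a health check would likely care about.
func summarize(ps Processes) Summary {
	s := Summary{Known: ps.GetComputeRunningProcessesSupported}
	for _, p := range ps.RunningProcesses {
		s.TotalMemoryBytes += p.GPUUsedMemoryBytes
		if p.ZombieStatus {
			s.ZombiePIDs = append(s.ZombiePIDs, p.PID)
		}
		if len(p.BadEnvVarsForCUDA) > 0 {
			s.BadEnvPIDs = append(s.BadEnvPIDs, p.PID)
		}
	}
	return s
}

func main() {
	// Hand-built stand-in for a value returned by GetProcesses.
	ps := Processes{
		UUID:                                "GPU-00000000-0000-0000-0000-000000000000",
		GetComputeRunningProcessesSupported: true,
		RunningProcesses: []Process{
			{PID: 1001, GPUUsedMemoryBytes: 2 << 30},
			{PID: 1002, GPUUsedMemoryBytes: 1 << 30, ZombieStatus: true},
			{PID: 1003, BadEnvVarsForCUDA: map[string]string{"EXAMPLE_VAR": "1"}},
		},
	}
	s := summarize(ps)
	fmt.Printf("known=%t total=%d zombies=%v badenv=%v\n",
		s.Known, s.TotalMemoryBytes, s.ZombiePIDs, s.BadEnvPIDs)
}
```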
