nvproxy

package
v0.0.0-...-75bfadc Latest
Published: Apr 14, 2026 License: Apache-2.0, BSD-3-Clause, MIT Imports: 36 Imported by: 0

README

nvproxy

The nvproxy package is a core component of gVisor that enables support for NVIDIA GPUs, allowing sandboxed applications to perform GPU-accelerated computations. This is achieved by intercepting and forwarding NVIDIA driver calls from the sandboxed application to the host's NVIDIA driver.

How it Works

The nvproxy driver operates by implementing virtual character devices within the gVisor sandbox that mimic actual NVIDIA device files (like /dev/nvidiactl, /dev/nvidia-uvm and /dev/nvidia#). When an application inside the sandbox opens and interacts with these devices, nvproxy intercepts the ioctl and mmap system calls. These calls, which are typically used for communication with the NVIDIA driver, are then forwarded to the actual host NVIDIA driver after necessary translations.

This proxying mechanism allows gVisor to maintain a strong security boundary while still providing applications with access to the powerful computational capabilities of the GPU. All other system calls from the application continue to be handled by the gVisor Sentry.

For more information about gVisor GPU support, see the user guide.

Adding Support for New Driver Versions

The nvproxy package is sensitive to changes in the NVIDIA driver's Application Binary Interface (ABI), which can occur between driver releases. This mainly happens when ioctl(2) structs are modified. To manage this, nvproxy is designed to support multiple driver versions explicitly.

This is accomplished using a sparse version tree defined in version.go. This tree doesn't list every NVIDIA driver release; instead, it only contains the specific versions required to model the ABI's evolution across all supported versions.

The tree's structure mimics the commit history of the NVIDIA kernel driver repo, including releases from both the master branch and separate development branches. This is critical because ABI changes introduced in a parent version affect all subsequent child versions. An accurate tree allows nvproxy to correctly compose the final ABI for any given version.

At runtime, nvproxy performs the following steps:

  1. Reads the host's NVIDIA driver version.
  2. Finds the corresponding version in its tree.
  3. Traverses the tree from the root to that version's node, applying all ABI modifications along the path.
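
The three runtime steps above can be sketched as follows. This is a minimal illustration, not nvproxy's actual code: the versionNode and abi types, the composeABI helper, the struct names, and the version strings "1.0" through "1.2" are all made up for the example.

```go
package main

import "fmt"

// abi is a toy stand-in for a driver ABI: it maps an ioctl struct name to its
// size in bytes. The real nvproxy ABI is far richer; this is illustrative only.
type abi map[string]int

// versionNode is a hypothetical node in a sparse version tree.
type versionNode struct {
	version string
	parent  *versionNode
	// apply records the ABI changes introduced at this version (may be nil).
	apply func(a abi)
}

// composeABI collects the path from n up to the root, then applies each
// node's changes from the root downwards, so that a child sees every change
// made by its ancestors before applying its own.
func composeABI(n *versionNode) abi {
	var path []*versionNode
	for cur := n; cur != nil; cur = cur.parent {
		path = append(path, cur)
	}
	a := abi{}
	for i := len(path) - 1; i >= 0; i-- {
		if path[i].apply != nil {
			path[i].apply(a)
		}
	}
	return a
}

// buildTree returns a lookup table for a three-node example tree.
// The version strings are placeholders, not real driver versions.
func buildTree() map[string]*versionNode {
	root := &versionNode{version: "1.0", apply: func(a abi) { a["PARAMS_X"] = 16 }}
	mid := &versionNode{version: "1.1", parent: root, apply: func(a abi) { a["PARAMS_X"] = 24 }}
	leaf := &versionNode{version: "1.2", parent: mid, apply: func(a abi) { a["PARAMS_Y"] = 8 }}
	return map[string]*versionNode{"1.0": root, "1.1": mid, "1.2": leaf}
}

func main() {
	tree := buildTree()
	hostVersion := "1.2"          // Step 1: in practice this is read from the host.
	node, ok := tree[hostVersion] // Step 2: find the matching node.
	if !ok {
		fmt.Println("unsupported driver version:", hostVersion)
		return
	}
	fmt.Println(composeABI(node)) // Step 3: compose the ABI along the path.
}
```

Because changes are applied root-first, the later modification of PARAMS_X in "1.1" overrides the root's definition, which is the property that makes an accurate tree important.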

Here is the step-by-step process for adding support for a new driver version.

Step 1: Place the New Version in the Tree

First, determine the new version's correct position in nvproxy's version tree.

  1. Find the parent commit: Go to the NVIDIA driver releases page and find the commit hash for your target version. Traverse the commit history upwards (following parent links) until you find a version that already exists in nvproxy's version tree.
  2. Insert the new version: Add the new version to the tree under its identified parent.
  3. Mimic the branch structure: If the new version was created on a separate development branch (i.e., not master), you must replicate that branch structure in the nvproxy tree. If the branch point is a version nvproxy doesn't officially support, add it as an "unqualified" node (a version without a checksum or official support) to maintain structural integrity.

Step 2: Calculate the Driver Checksum

The version tree requires a SHA256 checksum of the official NVIDIA driver installer (.run file) for verification. You can calculate this using the provided tool:

bazel run tools/gpu:main checksum -- --version=<DRIVER_VERSION>
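
For reference, a SHA256 checksum of a local file can also be computed with the Go standard library alone. The sketch below is not the tools/gpu implementation; the helper names sha256HexBytes and sha256HexFile are made up, and it assumes the installer has already been downloaded to a local path.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// sha256HexBytes returns the lowercase hex SHA-256 digest of b.
func sha256HexBytes(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// sha256HexFile streams the file at path through SHA-256, so that a large
// installer does not need to be held in memory all at once.
func sha256HexFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	if len(os.Args) < 2 {
		// No file given: hash empty input so the sketch still runs end to end.
		fmt.Println(sha256HexBytes(nil))
		return
	}
	for _, path := range os.Args[1:] {
		sum, err := sha256HexFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s  %s\n", sum, path)
	}
}
```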

Step 3: Account for ABI Changes

Use our nvidia_driver_differ tool to detect changes to proxied ABI structs between the parent and the new version. The tool analyzes the NVIDIA kernel driver source code and outputs the impacted structs.

bazel run tools/nvidia_driver_differ:run_differ -- --base <PARENT_VERSION> --next <NEW_VERSION>
  • <PARENT_VERSION> is the version of the parent node in nvproxy's version tree.
  • <NEW_VERSION> is the version you are adding.

Warning: This tool is for assistance and does not guarantee completeness. You must still perform manual verification and testing. GPU tests are run against all supported driver versions during Buildkite presubmits.

To verify changes in nvproxy, you can run the nvproxy_driver_parity_test test, which compares nvproxy's struct definitions with driver struct definitions:

bazel test pkg/sentry/devices/nvproxy:nvproxy_driver_parity_test

Handling Intermediate Versions

It is crucial to introduce ABI changes at the exact version they appear in the driver source, even if nvproxy doesn't officially support that intermediate version. When you identify an ABI struct change, go to its source code in the NVIDIA kernel driver repo and see which commit introduced the change (using the Blame view).

For example, imagine nvproxy supports version [A] and you want to add support for [C]. However, an ABI change that affects nvproxy was introduced in an intermediate version B.

Incorrect:
Do not apply the changes from B directly into [C].

Correct:
Create an intermediate, unqualified node for B that contains the necessary code changes. The new node for [C] can then inherit these changes from B. This ensures the version history is accurate.

This approach is essential for long-term maintainability. If you later need to support another version [D] that also branched from B, it can accurately inherit the same changes.

[A] -> B -> [C]
        \
         -> [D]
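
The diagram above can be modeled in a few lines of Go. This is a hypothetical sketch rather than the layout of version.go: the node type, its qualified flag, and the change description string are assumptions made for illustration. It shows that the unqualified node B is excluded from the supported set while both [C] and [D] inherit its change.

```go
package main

import "fmt"

// node is a hypothetical version-tree node. An unqualified node
// (qualified == false) exists only to carry ABI changes at the right place.
type node struct {
	name      string
	qualified bool
	parent    *node
	changes   []string // ABI changes introduced at exactly this version
}

// abiChanges returns every change from the root down to n, in order.
// Calling it on a nil node (the root's parent) yields no changes.
func (n *node) abiChanges() []string {
	if n == nil {
		return nil
	}
	return append(n.parent.abiChanges(), n.changes...)
}

// supported lists only the qualified nodes.
func supported(nodes []*node) []string {
	var out []string
	for _, n := range nodes {
		if n.qualified {
			out = append(out, n.name)
		}
	}
	return out
}

// exampleTree builds [A] -> B -> [C] with a second child [D] under B.
func exampleTree() []*node {
	a := &node{name: "A", qualified: true}
	b := &node{name: "B", parent: a, changes: []string{"FOO_PARAMS: new trailing field"}}
	c := &node{name: "C", qualified: true, parent: b}
	d := &node{name: "D", qualified: true, parent: b}
	return []*node{a, b, c, d}
}

func main() {
	nodes := exampleTree()
	fmt.Println("supported:", supported(nodes))
	for _, n := range nodes[2:] {
		fmt.Println(n.name, "inherits", n.abiChanges())
	}
}
```

Had the change been attached to [C] directly, [D] would silently miss it, which is the failure mode the "Incorrect" case describes.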

Step 4: Add Support for New or Modified Ioctls

After running the nvidia_driver_differ tool, you may need to add or update ioctl command handlers. To do this correctly, you must find the ioctl's implementation in the NVIDIA kernel driver source code to understand its function and data structures.

The implementation details depend on the ioctl type:

  • Frontend Ioctls (/dev/nvidiactl or /dev/nvidia#):
    • For top-level commands: See documentation in frontendFD.Ioctl() in frontend.go.
    • For sub-commands of NV_ESC_RM_ALLOC (allocation classes): See documentation in rmAlloc() in frontend.go.
    • For sub-commands of NV_ESC_RM_CONTROL (control commands): See documentation in rmControl() in frontend.go.
  • UVM Ioctls (/dev/nvidia-uvm): These require manual implementation by studying the kernel driver source and replicating the logic within nvproxy.

Handling File Descriptors and Pointers

A critical responsibility of nvproxy is to translate file descriptors (FDs) and pointers within ioctl data structures.

  • File Descriptors: FDs used by the sandboxed application are local to the gVisor Sentry's FD table. They must be translated to the corresponding host FDs before being passed to the host driver.
  • Pointers: Pointers in ioctl structs are virtual addresses within the sandboxed application's memory space. These are invalid on the host. The structs containing them must be copied from the application's memory into the Sentry's memory. The host ioctl call must then be made using a pointer to this Sentry-managed memory.
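
The two translations above can be illustrated with a toy model. Everything in this sketch is an assumption made for the example: eventParams is a made-up parameter struct, fdTable is a plain map standing in for the Sentry's FD table, and a byte slice stands in for the application's address space. The real nvproxy code uses gVisor's own memory and FD abstractions.

```go
package main

import (
	"errors"
	"fmt"
)

// eventParams is a made-up ioctl parameter struct that carries an FD.
type eventParams struct {
	Handle uint32
	FD     int32
}

// fdTable is a hypothetical mapping from sandbox FD numbers to host FDs.
type fdTable map[int32]int32

var (
	errBadFD   = errors.New("sandbox FD has no host counterpart")
	errBadAddr = errors.New("address range is outside application memory")
)

// translateFD returns a copy of p whose FD field holds the host FD. The
// caller's struct is passed by value and therefore left untouched.
func translateFD(p eventParams, t fdTable) (eventParams, error) {
	hostFD, ok := t[p.FD]
	if !ok {
		return eventParams{}, errBadFD
	}
	p.FD = hostFD
	return p, nil
}

// copyIn models copying size bytes at application address addr into a
// Sentry-owned buffer. appMem stands in for the application's address space.
func copyIn(appMem []byte, addr, size uint64) ([]byte, error) {
	if addr > uint64(len(appMem)) || size > uint64(len(appMem))-addr {
		return nil, errBadAddr
	}
	buf := make([]byte, size)
	copy(buf, appMem[addr:addr+size])
	return buf, nil
}

func main() {
	p, err := translateFD(eventParams{Handle: 1, FD: 3}, fdTable{3: 42})
	fmt.Println(p, err)

	buf, err := copyIn([]byte{1, 2, 3, 4}, 1, 2)
	fmt.Println(buf, err)
}
```

In both cases the host only ever sees values that are meaningful on the host: a host FD and a pointer to a buffer owned by the proxy.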

Simple Ioctls

If an ioctl data structure contains neither pointers nor FDs and has no special mmap semantics, it requires no translation and is considered "simple". Helper utilities exist in nvproxy to proxy these simple ioctls directly, which you should use whenever possible. The majority of the ioctls proxied today are simple.
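
As a rough sketch of what "simple" means in practice, the handler below copies the parameter bytes in, forwards them unchanged, and copies the result back out. The names hostCall, ioctlHandler, simpleHandler, dispatch, and fakeHost, as well as the command numbers, are hypothetical and do not correspond to nvproxy's actual helpers. Rejecting commands that have no handler is also an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// hostCall stands in for invoking the host driver; it may modify buf in place.
type hostCall func(cmd uint32, buf []byte) error

// ioctlHandler processes the parameter bytes for one command.
type ioctlHandler func(host hostCall, cmd uint32, params []byte) ([]byte, error)

var errUnsupported = errors.New("ioctl not supported")

// simpleHandler forwards the bytes without interpreting them. This is only
// valid when the struct contains no pointers and no file descriptors.
func simpleHandler(host hostCall, cmd uint32, params []byte) ([]byte, error) {
	buf := append([]byte(nil), params...) // copy in
	if err := host(cmd, buf); err != nil {
		return nil, err
	}
	return buf, nil // the caller copies this back out to the application
}

// dispatch looks up the handler for cmd and rejects unknown commands.
func dispatch(table map[uint32]ioctlHandler, host hostCall, cmd uint32, params []byte) ([]byte, error) {
	h, ok := table[cmd]
	if !ok {
		return nil, errUnsupported
	}
	return h(host, cmd, params)
}

// fakeHost pretends to be the driver by writing a marker into the first byte.
func fakeHost(cmd uint32, buf []byte) error {
	if len(buf) > 0 {
		buf[0] = 0xAA
	}
	return nil
}

func main() {
	table := map[uint32]ioctlHandler{0x01: simpleHandler}
	params := []byte{0, 1, 2, 3}

	out, err := dispatch(table, fakeHost, 0x01, params)
	fmt.Println(out, err)

	_, err = dispatch(table, fakeHost, 0x02, params)
	fmt.Println(err)
}
```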

Documentation

Overview

Package nvproxy implements proxying for the Nvidia GPU Linux kernel driver: https://github.com/NVIDIA/open-gpu-kernel-modules.

Supported Nvidia GPUs: T4, L4, A100, A10G and H100.

Lock ordering:

  • nvproxy.fdsMu
  • rootClient.objsMu
  • nvproxy.clientsMu

Constants

const (
	// ChecksumNoDriver is a special value that indicates that the driver runfile does not exist. This
	// is mostly for ARM drivers for which NVIDIA does not provide a driver installer.
	ChecksumNoDriver = "NO_DRIVER"
)

Variables

This section is empty.

Functions

func Filters

func Filters(enabledCaps nvconf.DriverCaps) seccomp.SyscallRules

Filters returns seccomp-bpf filters for this package when using the given set of capabilities.

func ForEachSupportDriver

func ForEachSupportDriver(f func(version nvconf.DriverVersion, checksums Checksums))

ForEachSupportDriver calls f on all supported drivers. Precondition: Init() must have been called.

func HostDriverVersion

func HostDriverVersion() (string, error)

HostDriverVersion returns the version of the host Nvidia driver.

func Init

func Init()

Init initializes the global abis map.

func LatestDriver

func LatestDriver() nvconf.DriverVersion

LatestDriver returns the latest supported driver. Precondition: Init() must have been called.

func SupportedDrivers

func SupportedDrivers() []nvconf.DriverVersion

SupportedDrivers returns a list of all supported drivers. Precondition: Init() must have been called.

func SupportedIoctlsNumbers

func SupportedIoctlsNumbers(version nvconf.DriverVersion) (frontendIoctls map[uint32]struct{}, uvmIoctls map[uint32]struct{}, controlCmds map[uint32]struct{}, allocClasses map[uint32]struct{}, ok bool)

SupportedIoctlsNumbers returns the ioctl numbers that are supported by nvproxy at a given version.

Types

type Checksums

type Checksums struct {
	// contains filtered or unexported fields
}

Checksums is a struct containing the SHA256 checksum of the linux .run driver installer file from NVIDIA.

func ExpectedDriverChecksum

func ExpectedDriverChecksum(version nvconf.DriverVersion) (Checksums, bool)

ExpectedDriverChecksum returns the expected checksum for a given version. Precondition: Init() must have been called.

func NewChecksums

func NewChecksums(checksumX86_64, checksumARM64 string) Checksums

NewChecksums creates a new Checksums struct.

func (Checksums) Arm64

func (c Checksums) Arm64() string

Arm64 returns the SHA256 checksum of the linux .run driver installer file from NVIDIA for ARM64.

func (Checksums) Checksum

func (c Checksums) Checksum() (string, error)

Checksum returns the SHA256 checksum of the linux .run driver installer file from NVIDIA for the given architecture.

func (Checksums) X86_64

func (c Checksums) X86_64() string

X86_64 returns the SHA256 checksum of the linux .run driver installer file from NVIDIA for X86_64.

type DeviceInfo

type DeviceInfo struct {
	// CapsDevMajor is nvidia-caps' device major number. If CapsDevMajor is 0,
	// nvidia-caps is not enabled.
	CapsDevMajor uint32

	// If HaveFabricIMEXManagement is true, FabricIMEXManagementDevMinor is the
	// fabric-imex-mgmt capability's device minor number, which matches the
	// value on the host. (Its device major number is CapsDevMajor, which must
	// be non-zero and might not match the host's value.)
	HaveFabricIMEXManagement     bool
	FabricIMEXManagementDevMinor uint32

	// CapsIMEXChannelsDevMajor is nvidia-caps-imex-channels's device major
	// number. If CapsIMEXChannelsDevMajor is 0, nvidia-caps-imex-channels is
	// not enabled.
	CapsIMEXChannelsDevMajor uint32

	// UVMDevMajor is nvidia-uvm's device major number. If UVMDevMajor is 0,
	// nvidia-uvm is not enabled.
	UVMDevMajor uint32
}

DeviceInfo contains information on registered nvproxy devices.

+stateify savable

func DeviceInfoFromVFS

func DeviceInfoFromVFS(vfsObj *vfs.VirtualFilesystem) *DeviceInfo

DeviceInfoFromVFS returns device information for nvproxy devices registered in vfsObj. The returned DeviceInfo must not be mutated. If DeviceInfoFromVFS returns nil, nvproxy.Register(vfsObj) has not been called.

func Register

func Register(vfsObj *vfs.VirtualFilesystem, opts *Options) (*DeviceInfo, error)

Register registers all devices implemented by this package, and specified by opts, in vfsObj. If it succeeds, it returns information about registered devices; the returned DeviceInfo must not be mutated.

type DriverABIInfo

type DriverABIInfo struct {
	FrontendInfos   map[uint32]IoctlInfo
	UvmInfos        map[uint32]IoctlInfo
	ControlInfos    map[uint32]IoctlInfo
	AllocationInfos map[nvgpu.ClassID]IoctlInfo
}

DriverABIInfo defines all the structs and ioctls used by a driverABI. It is used to help with verifying and supporting new driver versions by keeping track of all the driver structs and ioctls that nvproxy currently supports. It does so by mapping each ioctl number to its name in the driver and a list of the DriverStructs used by that ioctl.
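
To illustrate why each entry carries a reflect.Type, the sketch below redeclares simplified local copies of DriverStruct and IoctlInfo (with plain string names) and walks a map of them to report the size of each Go struct. The exampleParams struct, the ioctl number 0x2a, the driver-side names, and the structSizes helper are all made up for illustration. A parity check could compare such sizes against the driver's definitions, though how nvproxy_driver_parity_test actually works is not described here.

```go
package main

import (
	"fmt"
	"reflect"
)

// DriverStruct and IoctlInfo are simplified local copies of the documented
// types, redeclared here so the sketch is self-contained.
type DriverStruct struct {
	Name string
	Type reflect.Type
}

type IoctlInfo struct {
	Name    string
	Structs []DriverStruct
}

// exampleParams is a made-up Go mirror of a driver struct.
type exampleParams struct {
	Handle uint32
	Flags  uint32
	Size   uint64
}

// exampleInfos returns a one-entry table keyed by a made-up ioctl number.
func exampleInfos() map[uint32]IoctlInfo {
	return map[uint32]IoctlInfo{
		0x2a: {
			Name: "EXAMPLE_IOCTL",
			Structs: []DriverStruct{
				{Name: "EXAMPLE_PARAMS", Type: reflect.TypeOf(exampleParams{})},
			},
		},
	}
}

// structSizes maps each driver struct name to the size of its Go mirror.
func structSizes(infos map[uint32]IoctlInfo) map[string]uintptr {
	sizes := map[string]uintptr{}
	for _, info := range infos {
		for _, s := range info.Structs {
			sizes[s.Name] = s.Type.Size()
		}
	}
	return sizes
}

func main() {
	for name, size := range structSizes(exampleInfos()) {
		fmt.Printf("%s: %d bytes\n", name, size)
	}
}
```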

func SupportedIoctls

func SupportedIoctls(version nvconf.DriverVersion) (*DriverABIInfo, bool)

SupportedIoctls returns the DriverABIInfo struct for the given version, which describes the ioctls supported in nvproxy for the given version.

type DriverStruct

type DriverStruct struct {
	Name DriverStructName
	Type reflect.Type
}

DriverStruct ties an nvproxy struct type to its corresponding driver struct name.

type DriverStructName

type DriverStructName = string

DriverStructName is the name of a struct used by the Nvidia driver.

type IoctlInfo

type IoctlInfo struct {
	Name    IoctlName
	Structs []DriverStruct
}

IoctlInfo contains information about an ioctl defined by the Nvidia driver.

type IoctlName

type IoctlName = string

IoctlName is the name of the constant used by the Nvidia driver to define the ioctl number/control command/allocation class.

type NvidiaDeviceFD

type NvidiaDeviceFD interface {
	IsNvidiaDeviceFD()
}

NvidiaDeviceFD is an interface that should be implemented by all vfs.FileDescriptionImpl of Nvidia devices.

type Options

type Options struct {
	// DriverVersion is the Nvidia GPU driver version.
	DriverVersion nvconf.DriverVersion

	// DriverCaps is the set of driver capabilities exposed to applications.
	DriverCaps nvconf.DriverCaps

	HostSettings *nvconf.HostSettings

	// If UseDevGofer is true, open device files via gofer.
	UseDevGofer bool
}

Options holds arguments to Register.

type ProcfsInfo

type ProcfsInfo struct {
	// StaticFiles maps paths relative to /proc/driver/nvidia/ to the contents of
	// files at those paths.
	StaticFiles map[string]string
}

ProcfsInfo contains information about procfs files maintained by nvproxy.

func ProcfsInfoFromVFS

func ProcfsInfoFromVFS(vfsObj *vfs.VirtualFilesystem) *ProcfsInfo

ProcfsInfoFromVFS returns procfs information for nvproxy devices registered in vfsObj. If ProcfsInfoFromVFS returns nil, nvproxy.Register(vfsObj) has not been called.

Directories

Path Synopsis
Package nvconf provides configuration structures and utilities for nvproxy.
