fleet-intelligence-agent

module
v1.2.1 Latest
Published: Apr 28, 2026 License: Apache-2.0

README

NVIDIA Fleet Intelligence Agent

NVIDIA Fleet Intelligence Agent - Host agent for GPU telemetry collection and attestation.

Built on top of leptonai/gpud

Overview

For installation prerequisites and setup details, see: Helm Installation, DEB Installation, and RPM Installation.

What It Monitors:

  • GPU Metrics: Power, temperature, clocks, utilization, memory, Xid events
  • System Metrics: CPU, memory, disk, network usage
  • Infrastructure: NVIDIA drivers, CUDA runtime, InfiniBand, containers

Export Formats:

  • HTTP API Server: Serves data via REST endpoints (JSON) and Prometheus metrics (/metrics)
  • File Export (Offline Mode): Writes data to local files in CSV or JSON format
  • Remote Export: Sends telemetry data to OpenTelemetry-compatible endpoints via OTLP over HTTP
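
As an illustration of consuming the Prometheus endpoint listed above, here is a minimal Go sketch that fetches a `/metrics` URL and parses the standard Prometheus text exposition format. The agent's listen address and metric names are not stated in this README, so the URL is taken from the command line, and the helper names `parseMetricsText` and `scrape` are hypothetical and not part of the agent's API.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"strconv"
	"strings"
	"time"
)

// parseMetricsText extracts samples from Prometheus text exposition format.
// Keys are the full series string (metric name plus any label set).
// Comment lines (# HELP, # TYPE) and blank lines are skipped; an optional
// trailing timestamp is ignored.
func parseMetricsText(text string) map[string]float64 {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		var series string
		var rest []string
		if i := strings.LastIndex(line, "}"); i >= 0 {
			// Labels may contain spaces, so split after the closing brace.
			series, rest = line[:i+1], strings.Fields(line[i+1:])
		} else if f := strings.Fields(line); len(f) > 0 {
			series, rest = f[0], f[1:]
		}
		if len(rest) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(rest[0], 64); err == nil {
			samples[series] = v
		}
	}
	return samples
}

// scrape fetches the given URL and parses the response body.
func scrape(url string) (map[string]float64, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseMetricsText(string(body)), nil
}

func main() {
	// The agent's address is an assumption: pass the full /metrics URL.
	if len(os.Args) < 2 {
		fmt.Println("usage: scrape <metrics-url>")
		return
	}
	samples, err := scrape(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "scrape failed:", err)
		os.Exit(1)
	}
	for series, v := range samples {
		fmt.Printf("%s = %g\n", series, v)
	}
}
```

In practice a Prometheus server would scrape this endpoint directly; a hand-rolled parser like this is mainly useful for quick checks or scripts. See the Usage document for the actual endpoint address and API details.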

Key Features:

  • Lightweight: <100MB RAM, <1% CPU usage
  • Non-intrusive: Read-only operations, no system modifications
  • Production-ready: 24/7 datacenter operation

Supported Platforms

OS Family    | Supported Versions | Architecture  | GPU
Ubuntu       | 22.04, 24.04       | x86_64, ARM64 | Hopper, Blackwell, Rubin
RHEL         | 8, 9, 10           | x86_64, ARM64 | Hopper, Blackwell, Rubin
Rocky Linux  | 8, 9, 10           | x86_64, ARM64 | Hopper, Blackwell, Rubin
AlmaLinux    | 8, 9, 10           | x86_64, ARM64 | Hopper, Blackwell, Rubin
Amazon Linux | 2023               | x86_64, ARM64 | Hopper, Blackwell, Rubin
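
The support matrix above can be checked against a host with a short Go sketch that reads `/etc/os-release` and compares the result with the listed OS versions and architectures. This is an illustration only and not the agent's own precheck logic: the function names are hypothetical, the `ID` values (`ubuntu`, `rhel`, `rocky`, `almalinux`, `amzn`) are the ones these distributions conventionally report, and the GPU column is not checked.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// supportedVersions maps an os-release ID to the versions in the table above.
var supportedVersions = map[string][]string{
	"ubuntu":    {"22.04", "24.04"},
	"rhel":      {"8", "9", "10"},
	"rocky":     {"8", "9", "10"},
	"almalinux": {"8", "9", "10"},
	"amzn":      {"2023"},
}

// parseOSRelease returns the ID and VERSION_ID fields of os-release content.
func parseOSRelease(text string) (id, versionID string) {
	for _, line := range strings.Split(text, "\n") {
		key, val, ok := strings.Cut(strings.TrimSpace(line), "=")
		if !ok {
			continue
		}
		val = strings.Trim(val, `"'`)
		switch key {
		case "ID":
			id = val
		case "VERSION_ID":
			versionID = val
		}
	}
	return id, versionID
}

// isSupported reports whether the OS, version, and Go architecture fall
// inside the documented matrix. A minor release such as "9.4" matches the
// listed major version "9".
func isSupported(osID, versionID, goarch string) bool {
	// Go names x86_64 "amd64" and ARM64 "arm64".
	if goarch != "amd64" && goarch != "arm64" {
		return false
	}
	for _, v := range supportedVersions[osID] {
		if versionID == v || strings.HasPrefix(versionID, v+".") {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/etc/os-release")
	if err != nil {
		fmt.Println("cannot read /etc/os-release:", err)
		return
	}
	id, ver := parseOSRelease(string(data))
	fmt.Printf("%s %s (%s): supported=%v\n",
		id, ver, runtime.GOARCH, isSupported(id, ver, runtime.GOARCH))
}
```

Matching on the major version keeps the check stable across RHEL-family minor releases, while Ubuntu and Amazon Linux versions are compared exactly.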

Documentation

  • Helm Installation - Kubernetes (Helm) installation and troubleshooting
  • DEB Installation - Ubuntu package install, update, and uninstall
  • RPM Installation - RHEL/Rocky/Alma/Amazon package install, update, and uninstall
  • Architecture - Bare metal and Kubernetes architecture, dependencies, and runtime flow
  • Usage - Commands, HTTP API, integration, and troubleshooting
  • Configuration - Environment variables and service configuration
  • Development - Building from source and contributing

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Related: leptonai/gpud (upstream dependency)

License

Apache License 2.0 - see LICENSE for details.

Directories

Path                 | Synopsis
cmd                  |
  fleetint           | command
internal             |
  attestation        | Package attestation provides functionality for GPU attestation
  enrollment         | Package enrollment provides shared enrollment functionality for the Fleet Intelligence agent
  exporter           | Package healthexporter provides functionality to export health data from local SQLite to a global health endpoint for centralized monitoring and long-term storage using OTLP format.
  exporter/collector | Package collector handles health data collection from various sources
  exporter/converter | Package converter handles conversion of health data to different formats
  exporter/writer    | Package writer handles writing health data to various outputs
  machineinfo        | Package machineinfo provides a shim layer over gpud's machine-info package to customize version information for Fleet Intelligence.
  precheck           | Package precheck evaluates prerequisite checks for enrollment and installation flows.
  registry           | Package registry provides component registration and management for fleetint, allowing fine-grained control over which components are enabled.
  scan               | Package scan provides system scanning functionality for Fleet Intelligence monitoring.
  server             | Package healthserver provides a simplified HTTP server for Fleet Intelligence metrics export.
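
To make the converter step more concrete, the Go sketch below turns a slice of health records into the two file formats mentioned under Export Formats (JSON and CSV). It does not reflect the actual types or functions of the internal packages, which are not shown here: `HealthRecord`, its fields, `toJSON`, `toCSV`, and the sample metric name are all hypothetical.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// HealthRecord is a hypothetical record shape used only for this sketch.
type HealthRecord struct {
	UnixSec   int64   `json:"unix_sec"`
	Component string  `json:"component"`
	Name      string  `json:"name"`
	Value     float64 `json:"value"`
}

// toJSON encodes the records as a single JSON array.
func toJSON(records []HealthRecord) (string, error) {
	b, err := json.Marshal(records)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// toCSV encodes the records as CSV with a header row and RFC 3339 UTC times.
func toCSV(records []HealthRecord) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write([]string{"time", "component", "name", "value"}); err != nil {
		return "", err
	}
	for _, r := range records {
		row := []string{
			time.Unix(r.UnixSec, 0).UTC().Format(time.RFC3339),
			r.Component,
			r.Name,
			strconv.FormatFloat(r.Value, 'f', -1, 64),
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	// Sample data is invented for illustration.
	records := []HealthRecord{
		{UnixSec: 1700000000, Component: "gpu", Name: "temperature_celsius", Value: 61.5},
	}
	if out, err := toJSON(records); err == nil {
		fmt.Println(out)
	}
	if out, err := toCSV(records); err == nil {
		fmt.Print(out)
	}
}
```

Keeping each format behind its own function mirrors the collector, converter, and writer split in the directory listing, so a new output format would only need another conversion function.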
