Directories
| Path | Synopsis |
|---|---|
| api | |
| client | |
| v1 | Package v1 provides the gpud v1 client for the server. |
| cmd | |
| common | Package common provides common utilities for the gpud command. |
| gpud (command) | |
| gpud/inject-fault | Package injectfault provides a command to inject faults into the system. |
| gpud/run | Package run implements the "run" command. |
| gpud/up | Package up implements the "up" command. |
| swagger (command) | |
| accelerator | Package accelerator contains the accelerator components and their query interface. |
| accelerator/nvidia | Package nvidia contains the NVIDIA accelerator components and their query interface. |
| accelerator/nvidia/clock-speed | Package clockspeed tracks the NVIDIA per-GPU clock speed. |
| accelerator/nvidia/ecc | Package ecc tracks the NVIDIA per-GPU ECC errors and other ECC related information. |
| accelerator/nvidia/fabric-manager | Package fabricmanager tracks NVIDIA fabric manager and fabric health monitoring services. |
| accelerator/nvidia/gpm | Package gpm tracks the NVIDIA per-GPU GPM metrics. |
| accelerator/nvidia/gpu-counts | Package gpucounts monitors the GPU count of the system. |
| accelerator/nvidia/hw-slowdown | Package hwslowdown monitors NVIDIA GPU hardware clock events of all GPUs, such as HW Slowdown events. |
| accelerator/nvidia/infiniband | Package infiniband monitors the infiniband status of the system. |
| accelerator/nvidia/infiniband/class | Package class implements the infiniband class sysfs interface. |
| accelerator/nvidia/infiniband/store | Package store stores infiniband states in time-series. |
| accelerator/nvidia/infiniband/types | Package types contains shared types for the infiniband package to avoid import cycles. |
| accelerator/nvidia/memory | Package memory tracks the NVIDIA per-GPU memory usage. |
| accelerator/nvidia/nccl | Package nccl monitors the NCCL status. |
| accelerator/nvidia/nvlink | Package nvlink monitors the NVIDIA per-GPU nvlink devices. |
| accelerator/nvidia/peermem | Package peermem monitors the peermem module status. |
| accelerator/nvidia/persistence-mode | Package persistencemode tracks the NVIDIA persistence mode. |
| accelerator/nvidia/power | Package power tracks the NVIDIA per-GPU power usage. |
| accelerator/nvidia/processes | Package processes tracks the NVIDIA per-GPU processes. |
| accelerator/nvidia/remapped-rows | Package remappedrows tracks the NVIDIA per-GPU remapped rows. |
| accelerator/nvidia/sxid | Package sxid tracks the NVIDIA GPU SXid errors by scanning the kmsg. |
| accelerator/nvidia/temperature | Package temperature tracks the NVIDIA per-GPU temperatures. |
| accelerator/nvidia/utilization | Package utilization tracks the NVIDIA per-GPU utilization. |
| accelerator/nvidia/xid | Package xid tracks the NVIDIA GPU Xid errors by scanning the kmsg. See Xid messages: https://docs.nvidia.com/deploy/gpu-debug-guidelines/index.html#xid-messages. |
| all | Package all contains all the components. |
| containerd | Package containerd tracks the current containerd status. |
| cpu | Package cpu tracks the combined usage of all CPUs (not per-CPU). |
| disk | Package disk tracks the disk usage of all the mount points specified in the configuration. |
| docker | Package docker tracks the current docker status. |
| fuse | Package fuse monitors the FUSE (Filesystem in Userspace). |
| kernel-module | Package kernelmodule provides a component that checks the kernel modules in Linux. |
| kubelet | Package kubelet tracks the current kubelet status. |
| library | Package library provides a component that returns healthy if and only if all the specified libraries exist. |
| memory | Package memory tracks the memory usage of the host. |
| network/latency | Package latency tracks the global network connectivity statistics. |
| nfs | Package nfs writes to and reads from the specified NFS mount points. |
| os | Package os queries the host OS information (e.g., kernel version). |
| pci | Package pci tracks the PCI devices and their Access Control Services (ACS) status. |
| tailscale | Package tailscale tracks the current tailscale status. |
| docs | |
| e2e | |
| | Package pkg contains a set of generic Go packages that are useful to gpud and possibly to other projects. |
| config | Package config provides the gpud configuration data for the server. |
| custom-plugins | Package customplugins provides a way to register and run custom plugins. |
| disk | Package disk provides utilities for disk operations. |
| errdefs | Package errdefs provides common error definitions for gpud. |
| fault-injector | Package faultinjector provides a way to inject failures into the system. |
| file | Package file implements file utils. |
| fuse | Package fuse provides a client for the FUSE (Filesystem in Userspace) protocol. |
| gpud-manager/systemd | Package systemd provides the systemd artifacts and variables for the gpud server. |
| host | Package host provides the host information. |
| httputil | Package httputil provides utilities for HTTP requests. |
| kmsg/writer | Package writer implements the kmsg writer. |
| log | Package log provides the logging functionality for gpud. |
| login | Package login provides login functionality for GPUd. |
| machine-info | Package machineinfo provides information about the machine. |
| memory | Package memory provides utilities for memory usage. |
| metadata | Package metadata provides the persistent storage layer for GPUd metadata. |
| metrics/recorder | Package recorder records internal GPUd metrics to Prometheus. |
| metrics/scraper | Package scraper scrapes internal GPUd metrics from Prometheus. |
| metrics/store | Package store provides the persistent storage layer for the metrics. |
| metrics/syncer | Package syncer provides a syncer for the metrics. |
| netutil | Package netutil provides utility functions for network operations. |
| netutil/latency | Package latency contains logic for egress traffic from each device. |
| netutil/latency/edge | Package edge provides a client for the Tailscale DERP (Designated Edge Router Protocol) service. |
| netutil/latency/edge/derpmap | Package derpmap provides the tailscale derp map implementation. |
| netutil/latency/edge/derpmap/sync (command) | "sync" syncs the tailscale derp map. |
| nfs-checker | Package nfschecker checks the health of the NFS mount points. |
| nvidia-query/nvml | Package nvml implements the NVIDIA Management Library (NVML) interface. |
| nvidia-query/nvml/device | Package device provides a wrapper around the "github.com/NVIDIA/go-nvlib/pkg/nvlib/device".Device type that adds a PCIBusID and UUID method, with support for test failure injection. |
| nvidia-query/nvml/lib | Package lib implements the NVIDIA Management Library (NVML) interface. |
| osutil | Package osutil provides utilities for the operating system. |
| process | Package process provides the process runner implementation on the host. |
| process/examples/simple-command (command) | |
| process/examples/simple-commands (command) | |
| process/examples/stream-commands (command) | |
| providers | Package providers contains machine/cloud providers. |
| providers/all | Package all provides a list of known providers. |
| providers/aws | Package aws implements the "AWS" provider and helpers. |
| providers/aws/imds | Package imds provides functions for interacting with the AWS Instance Metadata Service. |
| providers/azure | Package azure implements the "azure" provider and helpers. |
| providers/azure/imds | Package imds provides functions for interacting with the Azure Instance Metadata Service. |
| providers/gcp | Package gcp implements the Google Cloud Platform (GCP) provider and helpers. |
| providers/gcp/imds | Package imds provides functions for interacting with the Google Cloud Platform Instance Metadata Service. |
| pstore | Package pstore provides operations for Linux pstore, mainly to read the pstore log on reboot. |
| release | Package release provides utilities for releasing new versions of gpud. |
| release/distsign | Package distsign implements signature and validation of arbitrary distributable files. |
| session/states | Package states provides tracking of login success and failure events as well as the state of ongoing session loops (token expiration, etc.). |
| sqlite | Package sqlite provides SQLite3 database utilities. |
| systemd | Package systemd provides the common systemd helper functions. |
| update | Package update provides the update functionality for the server. |
| uptime | Package uptime provides utilities for uptime. |
| | Package version provides the version information for the gpud server. |