nvme

package
v0.45.0 Latest

Warning: This package is not in the latest version of its module.
Published: Nov 24, 2022 License: GPL-3.0 Imports: 13 Imported by: 0

README

NVMe devices monitoring with Netdata

Monitors health metrics (estimated endurance, space capacity, critical warnings, temperature, etc.) using the nvme CLI tool.

The module uses nvme, which can only be executed by root. It uses sudo and assumes that it is configured such that the netdata user can execute nvme as root without a password.

Requirements

  • Install nvme-cli.

  • Add the netdata user to the /etc/sudoers file (use which nvme to find the full path to the binary):

    netdata ALL=(root) NOPASSWD: /usr/sbin/nvme
    

Additionally, you may need to adjust Netdata's system unit on Linux distributions using systemd.

Note: This is an optional step. Only do this if adding netdata to /etc/sudoers didn't help.

The default CapabilityBoundingSet doesn't allow using sudo, and is quite strict in general. Resetting it is not optimal, but it is the next-best solution when nvme cannot be executed using sudo.

As the root user, do the following:

    mkdir -p /etc/systemd/system/netdata.service.d
    echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf
    systemctl daemon-reload
    systemctl restart netdata.service

Metrics

All metrics have the "nvme." prefix.

Labels per scope:

  • device: device.

| Metric | Scope | Dimensions | Units |
|---|---|---|---|
| device_estimated_endurance_perc | device | used | % |
| device_available_spare_perc | device | spare | % |
| device_composite_temperature | device | temperature | celsius |
| device_io_transferred_count | device | read, written | bytes |
| device_power_cycles_count | device | power | cycles |
| device_power_on_time | device | power-on | seconds |
| device_critical_warnings_state | device | available_spare, temp_threshold, nvm_subsystem_reliability, read_only, volatile_mem_backup_failed, persistent_memory_read_only | state |
| device_unsafe_shutdowns_count | device | unsafe | shutdowns |
| device_media_errors_rate | device | media | errors/s |
| device_error_log_entries_rate | device | error_log | entries/s |
| device_warning_composite_temperature_time | device | wctemp | seconds |
| device_critical_composite_temperature_time | device | cctemp | seconds |
| device_thermal_mgmt_temp1_transitions_rate | device | temp1 | transitions/s |
| device_thermal_mgmt_temp2_transitions_rate | device | temp2 | transitions/s |
| device_thermal_mgmt_temp1_time | device | temp1 | seconds |
| device_thermal_mgmt_temp2_time | device | temp2 | seconds |
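
As a rough illustration of how raw SMART values could map onto the units and dimensions listed above, here is a sketch. It assumes the NVMe SMART / Health Information log conventions: the critical warning is a bitmask with bit 0 meaning available spare, temperature is reported in Kelvin, power-on time in hours, and one data unit equals 1000 blocks of 512 bytes. All function names are hypothetical, and the collector's actual conversion code may differ.

```go
package main

import "fmt"

// Dimension names of device_critical_warnings_state, listed in bit order
// (bit 0 first) per the assumed NVMe critical warning layout.
var criticalWarningDims = []string{
	"available_spare",
	"temp_threshold",
	"nvm_subsystem_reliability",
	"read_only",
	"volatile_mem_backup_failed",
	"persistent_memory_read_only",
}

// decodeCriticalWarning turns the bitmask into one 0/1 state per dimension.
func decodeCriticalWarning(cw uint8) map[string]int64 {
	states := make(map[string]int64, len(criticalWarningDims))
	for bit, name := range criticalWarningDims {
		states[name] = int64(cw>>uint(bit)) & 1
	}
	return states
}

// kelvinToCelsius converts to whole degrees (273.15 is truncated to 273).
func kelvinToCelsius(k int64) int64 { return k - 273 }

// hoursToSeconds converts power-on hours to seconds.
func hoursToSeconds(h int64) int64 { return h * 3600 }

// dataUnitsToBytes converts data units (1000 x 512 bytes each) to bytes.
func dataUnitsToBytes(units int64) int64 { return units * 1000 * 512 }

func main() {
	states := decodeCriticalWarning(0b00001010) // bits 1 and 3 set
	for _, name := range criticalWarningDims {
		fmt.Printf("%s=%d\n", name, states[name])
	}
	fmt.Println(kelvinToCelsius(310), hoursToSeconds(2), dataUnitsToBytes(1))
}
```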

Configuration

No configuration required.

Troubleshooting

To troubleshoot issues with the nvme collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    cd /usr/libexec/netdata/plugins.d/
    
  • Switch to the netdata user.

    sudo -u netdata -s
    
  • Run the go.d.plugin to debug the collector:

    ./go.d.plugin -d -m nvme
    

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	Timeout    web.Duration
	BinaryPath string `yaml:"binary_path"`
}

type NVMe

type NVMe struct {
	module.Base
	Config `yaml:",inline"`
	// contains filtered or unexported fields
}

func New

func New() *NVMe

func (*NVMe) Charts

func (n *NVMe) Charts() *module.Charts

func (*NVMe) Check

func (n *NVMe) Check() bool

func (*NVMe) Cleanup

func (n *NVMe) Cleanup()

func (*NVMe) Collect

func (n *NVMe) Collect() map[string]int64

func (*NVMe) Init

func (n *NVMe) Init() bool
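
The methods above suggest a collector lifecycle. The sketch below drives one cycle against a local interface that mirrors their signatures. The call order (Init, Check, Collect, Cleanup) is an assumption about how the plugin framework uses a module, and `collector`, `fakeNVMe`, `runOnce`, and the metric key are hypothetical stand-ins so the example runs without the real package or an NVMe device. Charts is left out because it depends on the module package's types.

```go
package main

import "fmt"

// collector mirrors a subset of the NVMe methods listed above.
type collector interface {
	Init() bool
	Check() bool
	Collect() map[string]int64
	Cleanup()
}

// fakeNVMe is a hypothetical stand-in that returns canned values.
type fakeNVMe struct{ ready bool }

func (f *fakeNVMe) Init() bool  { f.ready = true; return true }
func (f *fakeNVMe) Check() bool { return len(f.Collect()) > 0 }
func (f *fakeNVMe) Cleanup()    { f.ready = false }

func (f *fakeNVMe) Collect() map[string]int64 {
	if !f.ready {
		return nil
	}
	// Hypothetical metric key, for illustration only.
	return map[string]int64{"device_nvme0n1_temperature": 37}
}

// runOnce performs one collection cycle in the assumed order and always
// calls Cleanup once Init and Check have succeeded.
func runOnce(c collector) (map[string]int64, bool) {
	if !c.Init() || !c.Check() {
		return nil, false
	}
	defer c.Cleanup()
	return c.Collect(), true
}

func main() {
	mx, ok := runOnce(&fakeNVMe{})
	fmt.Println(ok, mx)
}
```

With the real package, `New()` would supply the value passed to such a loop, and the returned map would feed the charts described by `Charts()`.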
