sys

package
v1.4.4
Published: Apr 8, 2026 · License: MIT · Imports: 15 · Imported by: 2

README

sys package

The sys package provides lightweight runtime visibility into host and container resources, including:

  • CPU count and load estimation
  • memory statistics
  • container-aware reporting on Linux (cgroup v2 and v1)
  • simple, dependency-free fallbacks when preferred sources are unavailable

Initialization

Package initialization is split into two phases:

  • init(): sets NumCPU to runtime.NumCPU() - always safe, no dependencies.
  • Init(forceCont bool): performs container auto-detection, applies the cgroup-aware CPU count, and adjusts GOMAXPROCS. External modules may skip calling Init() and still get a sane NumCPU().

A package-level cgroupVer variable (0, 1, or 2) is set once during Init() and drives all subsequent CPU and memory reads. This avoids re-probing cgroup paths on every sample.

The ForceContainerCPUMem feature flag (in cmn/feat) can override failed auto-detection for deployments where the heuristics miss. Requires restart.

CPU reporting

Function                Description
NumCPU()                effective CPU count (container-aware)
MaxParallelism()        derives internal parallelism from NumCPU()
Refresh(now, periodic)  samples CPU, updates ring and EMA; returns (util, throttled, error)
CPU(periodic)           returns smoothed CPU utilization percentage (0–100) plus isExtreme boolean
HighLoadWM()            high-load watermark derived from CPU count
LoadAverage()           system load averages (fallback only)

Effective CPU count

At startup, the package initializes a process-wide CPU count:

  • default: runtime.NumCPU()
  • container override: cgroup-based CPU quota when detected

The cgroup version is determined once, during the package's Init() call:

  1. Try cgroup v2 (cpu.max).
  2. Fall back to cgroup v1 (cpu.cfs_quota_us / cpu.cfs_period_us).
  3. If both fail or are missing, keep runtime.NumCPU().

Errors from both paths are aggregated and reported to stderr (nlog is not yet available at init time).
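The cgroup v2 step can be sketched as follows. This is a minimal illustration, not the package's code: cpusFromCPUMax is a hypothetical helper, and both rounding the quota up and capping it at the host count are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpusFromCPUMax parses the contents of a cgroup v2 cpu.max file
// ("<quota> <period>" in microseconds, or "max <period>" when no quota is set).
// It returns ok=false, together with hostCPUs, when no usable quota is found.
// Rounding up and capping at hostCPUs are assumptions of this sketch.
func cpusFromCPUMax(content string, hostCPUs int) (n int, ok bool) {
	fields := strings.Fields(content)
	if len(fields) != 2 || fields[0] == "max" {
		return hostCPUs, false
	}
	quota, err1 := strconv.ParseInt(fields[0], 10, 64)
	period, err2 := strconv.ParseInt(fields[1], 10, 64)
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return hostCPUs, false
	}
	n = int((quota + period - 1) / period) // ceil(quota / period)
	if n > hostCPUs {
		n = hostCPUs
	}
	return n, true
}

func main() {
	fmt.Println(cpusFromCPUMax("150000 100000\n", 8)) // 2 true (1.5 CPUs rounded up)
	fmt.Println(cpusFromCPUMax("max 100000\n", 8))    // 8 false (no quota)
}
```

The cgroup v1 path would divide cpu.cfs_quota_us by cpu.cfs_period_us in the same way.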

Utilization model

CPU utilization is sampled as a delta of cumulative CPU time over wall-clock time. Utilization is computed as:

(delta_cpu_usage * 100) / (delta_wall_time * NumCPU)

All values are integer percentages. Utilization and throttling are computed atomically in a single pass from the same time delta.
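A worked instance of the formula, using integer arithmetic throughout. The function name utilPct and the clamping to the 0-100 range are assumptions of this sketch.

```go
package main

import "fmt"

// utilPct computes integer CPU utilization from two deltas expressed
// in the same time unit (for example, microseconds).
// Clamping the result to [0, 100] is an assumption of this sketch.
func utilPct(deltaUsage, deltaWall int64, numCPU int) int64 {
	if deltaWall <= 0 || numCPU <= 0 {
		return 0
	}
	pct := (deltaUsage * 100) / (deltaWall * int64(numCPU))
	if pct < 0 {
		return 0
	}
	if pct > 100 {
		return 100
	}
	return pct
}

func main() {
	// 3s of CPU time consumed over 2s of wall-clock time on 4 CPUs:
	// (3_000_000 * 100) / (2_000_000 * 4) = 37 (37.5 truncated)
	fmt.Println(utilPct(3_000_000, 2_000_000, 4)) // 37
}
```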

Sampling and smoothing

CPU samples are collected in a circular ring of 4 entries, each containing cumulative usage, throttled time, and a monotonic timestamp. The ring is advanced on each Refresh() call; instantaneous utilization is computed as a delta between the current and previous ring entries.

Refresh() is called from multiple paths:

  • on-demand (gated at minIvalLong, 8s): by CPU() callers that need a current value but aren't periodic
  • periodic (gated at MinIvalShort, 2s): piggybacked on three existing ticks:
    • ios.refresh() (disk stats)
    • stats-runner (periodic.stats_time), and
    • memsys housekeeping callback

When gated, Refresh() returns the cached EMA value without reading /proc/stat or cgroup files.
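The ring-plus-gating behavior described above might look roughly like this. All names here (sample, sampler, refresh) are hypothetical, smoothing is omitted, and the cached value is the raw utilization rather than the EMA.

```go
package main

import (
	"fmt"
	"time"
)

const ringSize = 4

// sample is one ring entry: cumulative counters plus a monotonic timestamp.
type sample struct {
	usage     int64 // cumulative CPU time, microseconds
	throttled int64 // cumulative throttled time, microseconds
	ts        int64 // monotonic timestamp, nanoseconds (0 means "empty")
}

type sampler struct {
	ring   [ringSize]sample
	idx    int   // index of the most recent entry
	cached int64 // last computed utilization, percent
	numCPU int64
}

// refresh stores cur in the ring and recomputes utilization, unless the
// previous sample is younger than minIval; in that case it returns the
// cached value and leaves the ring untouched.
func (s *sampler) refresh(cur sample, minIval time.Duration) (util int64, gated bool) {
	prev := s.ring[s.idx]
	if prev.ts != 0 && cur.ts-prev.ts < int64(minIval) {
		return s.cached, true
	}
	s.idx = (s.idx + 1) % ringSize
	s.ring[s.idx] = cur
	if prev.ts != 0 {
		dwall := (cur.ts - prev.ts) / 1000 // nanoseconds to microseconds
		if dwall > 0 && s.numCPU > 0 {
			s.cached = (cur.usage - prev.usage) * 100 / (dwall * s.numCPU)
		}
	}
	return s.cached, false
}

func main() {
	s := &sampler{numCPU: 2}
	sec := int64(time.Second)
	fmt.Println(s.refresh(sample{usage: 0, ts: 1 * sec}, 2*time.Second))         // 0 false
	fmt.Println(s.refresh(sample{usage: 500_000, ts: 2 * sec}, 2*time.Second))   // 0 true
	fmt.Println(s.refresh(sample{usage: 2_000_000, ts: 5 * sec}, 2*time.Second)) // 25 false
}
```

The second call arrives 1s after the first and is gated; the third computes 2s of CPU time over 4s of wall time on 2 CPUs, or 25%.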

Raw instantaneous utilization is smoothed using a time-scaled exponential moving average (EMA). The smoothing alpha is adjusted based on the elapsed time since the previous sample; for details, see the compute() method in sys/cpu.go.

The smoothed value and throttled percentage are stored atomically; CPU() reads them without locking.
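For readers unfamiliar with time-scaled EMAs, here is one common formulation. It is not necessarily what compute() does: the exponential form of alpha and the time constant tau are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// emaStep blends a raw sample into the previous smoothed value.
// alpha = 1 - exp(-dt/tau): the longer the gap since the previous sample,
// the more weight the new sample receives.
// Both the formula and tau are assumptions of this sketch.
func emaStep(prev, raw float64, dt, tau time.Duration) float64 {
	if dt <= 0 || tau <= 0 {
		return prev
	}
	alpha := 1 - math.Exp(-dt.Seconds()/tau.Seconds())
	return prev + alpha*(raw-prev)
}

func main() {
	tau := 8 * time.Second // hypothetical time constant
	fmt.Printf("%.1f\n", emaStep(0, 100, 2*time.Second, tau)) // 22.1
	fmt.Printf("%.1f\n", emaStep(0, 100, 8*time.Second, tau)) // 63.2
}
```

With this shape, a sample that arrives after a long gap pulls the average harder than one that arrives right after the previous sample, which is the property the text describes.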

This approach is reminiscent of the disk utilization smoothing in the ios package, but it is much simpler: CPU has a single global value (not per-mountpath), so no ring walk or per-device map lookup is needed.

Linux source hierarchy

The cgroup version is determined at init time and stored in cgroupVer. The cpu.read() method switches on it - there is no fallback cascade per sample:

cgroupVer  Source         Fields
2          cpu.stat       usage_usec, throttled_usec
1          cpuacct.usage  cumulative nanoseconds
0          /proc/stat     aggregate jiffy line

If all sources fail at read time, CPU() falls back to /proc/loadavg converted to a percentage.

Bare-metal /proc/stat parsing

The aggregate cpu line from /proc/stat is parsed using a whitelist of fields:

  • user (1), nice (2), system (3), irq (6), softirq (7), steal (8)

Explicitly excluded:

  • idle (4), iowait (5): not active CPU time
  • guest (9), guest_nice (10): already included in user and nice by the kernel - summing them would double-count

Steal is included because it represents CPU time unavailable to the node, which is what load-based throttling and worker tuning need to make decisions.
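The whitelist can be illustrated with a small parser. activeJiffies is a hypothetical name, and the sample line is made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// activeJiffies sums the whitelisted fields of the aggregate "cpu" line:
// user(1), nice(2), system(3), irq(6), softirq(7), steal(8).
// idle, iowait, guest, and guest_nice are deliberately skipped.
func activeJiffies(line string) (uint64, error) {
	f := strings.Fields(line)
	if len(f) < 9 || f[0] != "cpu" {
		return 0, errors.New("not an aggregate cpu line")
	}
	var sum uint64
	for _, i := range []int{1, 2, 3, 6, 7, 8} {
		v, err := strconv.ParseUint(f[i], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("field %d: %w", i, err)
		}
		sum += v
	}
	return sum, nil
}

func main() {
	// Columns: user nice system idle iowait irq softirq steal guest guest_nice
	line := "cpu  100 5 50 1000 20 3 2 10 7 1"
	fmt.Println(activeJiffies(line)) // 170 <nil>
}
```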

CPU starvation vs utilization

CPU() distinguishes between:

  • high utilization: CPU is busy
  • CPU starvation: the container is being throttled

In cgroup v2 environments, throttled_usec from cpu.stat is tracked as a percentage of wall-clock time. If throttling exceeds the extreme threshold (>10%), the system is reported as under extreme CPU pressure - even if raw utilization appears moderate.

This is intentional: throttling indicates lack of CPU availability, not just high usage.

Operational thresholds are compile-time constants: HighLoad (85%), ExtremeLoad (95%), and throttleExtremeThresh (10%).
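A sketch of how the two signals could combine into the isExtreme result. The constants mirror the values quoted above; the function name and the exact comparison operators are assumptions.

```go
package main

import "fmt"

const (
	extremeLoad           = 95 // percent, mirrors ExtremeLoad
	throttleExtremeThresh = 10 // percent of wall-clock time spent throttled
)

// isExtreme reports extreme CPU pressure when either utilization or
// throttling crosses its threshold.
// Using >= for utilization and > for throttling is an assumption.
func isExtreme(utilPct, throttledPct int64) bool {
	return utilPct >= extremeLoad || throttledPct > throttleExtremeThresh
}

func main() {
	fmt.Println(isExtreme(40, 15)) // true: moderate utilization, heavy throttling
	fmt.Println(isExtreme(90, 0))  // false: busy, but not starved
	fmt.Println(isExtreme(97, 0))  // true: utilization alone is extreme
}
```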

Memory reporting

Function       Description
MemStat.Get()  populates memory statistics
MemStat.Str()  formats a compact summary

Get() switches on cgroupVer and calls the appropriate stateless reader:

cgroupVer  Reader             Source
2          readMemCgroupV2()  memory.max, memory.current, memory.stat
1          readMemCgroupV1()  memory.limit_in_bytes, memory.usage_in_bytes, memory.stat
0          readMemHost()      /proc/meminfo

All readers are stateless free functions returning (MemStat, error).

Host memory

Host memory is read from /proc/meminfo. Fields used:

  • MemTotal, MemFree, MemAvailable, Cached, Buffers, SwapTotal, SwapFree

If MemAvailable is not present (older kernels), ActualFree falls back to MemFree + BuffCache.
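The MemAvailable fallback can be sketched as below. parseMeminfo and actualFree are hypothetical helpers, and treating BuffCache as Buffers + Cached is an assumption of this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo returns /proc/meminfo fields converted to bytes
// (the file reports values in kB).
func parseMeminfo(content string) map[string]uint64 {
	out := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		key, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		f := strings.Fields(rest)
		if len(f) == 0 {
			continue
		}
		if v, err := strconv.ParseUint(f[0], 10, 64); err == nil {
			out[key] = v * 1024
		}
	}
	return out
}

// actualFree prefers MemAvailable; on older kernels that lack it,
// it falls back to MemFree + Buffers + Cached (assumed BuffCache).
func actualFree(m map[string]uint64) uint64 {
	if v, ok := m["MemAvailable"]; ok {
		return v
	}
	return m["MemFree"] + m["Buffers"] + m["Cached"]
}

func main() {
	old := "MemTotal: 4000 kB\nMemFree: 1000 kB\nBuffers: 200 kB\nCached: 300 kB\n"
	fmt.Println(actualFree(parseMeminfo(old))) // 1536000

	cur := old + "MemAvailable: 1400 kB\n"
	fmt.Println(actualFree(parseMeminfo(cur))) // 1433600
}
```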

Container memory (cgroup v2)

  1. memory.max - limit in bytes, or "max" (no limit → fall back to host)
  2. memory.current - current usage including kernel caches
  3. memory.stat - inactive_file used as reclaimable cache (BuffCache)

Derived: ActualUsed = Used - BuffCache, ActualFree = Total - ActualUsed.

Usage is capped at the limit to handle transient kernel overshoot before OOM.
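The derivation and the cap can be put together as follows. memStat and deriveV2 are hypothetical names; clamping BuffCache to Used (to avoid unsigned underflow) is an extra assumption of this sketch.

```go
package main

import "fmt"

// memStat holds the subset of fields derived in this sketch (bytes).
type memStat struct {
	Total, Used, Free, BuffCache, ActualUsed, ActualFree uint64
}

// deriveV2 computes container memory stats from three raw cgroup v2 values:
// limit (memory.max), current (memory.current), and inactiveFile
// (inactive_file from memory.stat).
func deriveV2(limit, current, inactiveFile uint64) memStat {
	used := current
	if used > limit {
		used = limit // cap transient overshoot at the limit
	}
	cache := inactiveFile
	if cache > used {
		cache = used // assumption: keep Used - BuffCache from underflowing
	}
	m := memStat{Total: limit, Used: used, BuffCache: cache}
	m.Free = m.Total - m.Used
	m.ActualUsed = m.Used - m.BuffCache
	m.ActualFree = m.Total - m.ActualUsed
	return m
}

func main() {
	const MiB = 1 << 20
	m := deriveV2(512*MiB, 600*MiB, 100*MiB) // usage overshoots the limit
	fmt.Printf("used=%dMiB buffcache=%dMiB actused=%dMiB actfree=%dMiB\n",
		m.Used/MiB, m.BuffCache/MiB, m.ActualUsed/MiB, m.ActualFree/MiB)
	// used=512MiB buffcache=100MiB actused=412MiB actfree=100MiB
}
```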

Container memory (cgroup v1)

  1. memory.limit_in_bytes - values > MaxInt64/2 treated as "no limit" (fall back to host)
  2. memory.usage_in_bytes - current usage
  3. memory.stat - total_cache used as reclaimable cache

Swap statistics are always host statistics regardless of cgroup version.
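The cgroup v1 "no limit" check is small enough to show in full. v1Limit is a hypothetical helper; the large sample value is what v1 kernels commonly report when no limit is configured, used here only as an illustration.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// v1Limit parses the contents of memory.limit_in_bytes.
// ok=false means "no usable limit": the caller would fall back to host stats.
func v1Limit(content string) (limit uint64, ok bool) {
	v, err := strconv.ParseUint(strings.TrimSpace(content), 10, 64)
	if err != nil || v > math.MaxInt64/2 {
		return 0, false
	}
	return v, true
}

func main() {
	fmt.Println(v1Limit("536870912\n"))           // 536870912 true (512 MiB)
	fmt.Println(v1Limit("9223372036854771712\n")) // 0 false (effectively unlimited)
}
```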

Container detection

Container detection uses a best-effort heuristic at init time:

  1. Check for /.dockerenv
  2. Scan /proc/1/cgroup for markers: docker, containerd, kubepods, kube, lxc, libpod, podman

If auto-detection fails but the deployment is known to be containerized, set the ForceContainerCPUMem feature flag. This forces cgroup-based CPU and memory accounting. Requires restart.
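The marker scan from step 2 might be written like this. looksContainerized is a hypothetical name, the sample inputs are made up, and the /.dockerenv check (an os.Stat call) is left out to keep the sketch deterministic.

```go
package main

import (
	"fmt"
	"strings"
)

// markers lists the substrings looked for in /proc/1/cgroup.
var markers = []string{"docker", "containerd", "kubepods", "kube", "lxc", "libpod", "podman"}

// looksContainerized reports whether the given /proc/1/cgroup content
// contains any known container marker.
func looksContainerized(procCgroup string) bool {
	for _, m := range markers {
		if strings.Contains(procCgroup, m) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksContainerized("0::/kubepods.slice/pod1234/abcd\n")) // true
	fmt.Println(looksContainerized("0::/init.scope\n"))                  // false
}
```

A plain substring match like this can miss runtimes that use none of these markers, which is the case the ForceContainerCPUMem flag is meant to cover.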

Fallback

The package follows these rules:

  • preferred source first, determined once at init time
  • degrade to older or coarser source when the preferred source is unavailable
  • for CPU: preserve a usable percentage whenever possible
  • for memory: prefer host stats over failing when container-specific files cannot be read

Example: testing sys package inside a constrained container

A simple way to validate container-aware CPU and memory reporting is to compare the same test run on the host and inside a Docker container with explicit CPU and memory limits.

Host run

go test -v -tags=debug

Example output:

=== RUN   TestNumCPU
--- PASS: TestNumCPU (0.00s)
=== RUN   TestLoadAvg
    sys_test.go:41: Load average: 0.63, 0.49, 0.49
--- PASS: TestLoadAvg (0.00s)
=== RUN   TestMaxProcs
--- PASS: TestMaxProcs (0.00s)
=== RUN   TestMemoryStats
    sys_test.go:76: Memory stats: {used 29GiB, free 2GiB, buffcache 20GiB, actfree 23GiB}
    sys_test.go:79: Either swap is off or failed to read its stats
--- PASS: TestMemoryStats (0.00s)
=== RUN   TestProcAndMaxLoad
    sys_test.go:110: First call: load=0, extreme=false
    ...
    sys_test.go:133: Second call: load=3, extreme=false
    sys_test.go:145: Process CPU usage:   1.85%
--- PASS: TestProcAndMaxLoad (5.69s)
PASS
ok      github.com/NVIDIA/aistore/sys   5.696s

Container run

docker run --rm \
  --cpus=1.5 \
  --memory=512m \
  -v "$PWD":/src -w /src \
  -v "$HOME/go/pkg/mod":/go/pkg/mod \
  -v "$HOME/.cache/go-build":/root/.cache/go-build \
  golang:1.26 \
  go test ./sys -run . -v -count=1 2>&1

Example output:

=== RUN   TestNumCPU
--- PASS: TestNumCPU (0.00s)
=== RUN   TestLoadAvg
    sys_test.go:41: Load average: 0.36, 0.41, 0.47
--- PASS: TestLoadAvg (0.00s)
=== RUN   TestMaxProcs
--- PASS: TestMaxProcs (0.00s)
=== RUN   TestMemoryStats
    sys_test.go:76: Memory stats: {used 29MiB, free 483MiB, buffcache 152KiB, actfree 483MiB}
    sys_test.go:79: Either swap is off or failed to read its stats
--- PASS: TestMemoryStats (0.00s)
=== RUN   TestProcAndMaxLoad
    sys_test.go:110: First call: load=0, extreme=false
    ...
    sys_test.go:133: Second call: load=15, extreme=false
    sys_test.go:145: Process CPU usage:  14.82%
--- PASS: TestProcAndMaxLoad (5.70s)

The comparison illustrates several points:

  • On the host, TestMemoryStats reports host-scale memory totals.
  • Inside the container, the same test reports memory bounded by the cgroup limit (--memory=512m) rather than the host's physical RAM.
  • TestNumCPU and TestMaxProcs exercise init-time CPU detection and container-aware CPU count.
  • TestProcAndMaxLoad burns CPU in-process and verifies that CPU() reports non-zero utilization on a subsequent sample.
  • Swap may report as zero or be unavailable inside short-lived containers; this is not unusual.

To further confirm container-scoped memory accounting, rerun the container example with a different limit (for example, --memory=4G). TestMemoryStats should then report a total close to 4 GiB instead of 512 MiB.

The container example above assumes cgroup v2 - the default on modern Linux distributions and container runtimes.

Current limitations and future plans

cgroup v1 deprecation

Note that cgroup v1 support is deprecated and will be removed in a future (post-4.4) release.

All major container runtimes and orchestrators now default to cgroup v2.

Documentation

Overview

Package sys provides helpers to read system info (CPU, memory, loadavg, processes) with support for cgroup v2 and a moving-average CPU estimator.

  • Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Constants

const (
	ExtremeLoad = 95
	HighLoad    = 85
)

CPU utilization thresholds (percentage, 0-100): HighLoad < HighLoadWM() < ExtremeLoad

const (
	// cpu samples ring
	MinIvalShort = 2 * time.Second
)

Variables

This section is empty.

Functions

func CPU added in v1.4.4

func CPU(periodic bool) (load int64, isExtreme bool)

func HighLoadWM added in v1.3.26

func HighLoadWM() int64

HighLoadWM: "high-load watermark" as a percentage. For 8 CPUs: max(100 - 100/8, 1) = 88, which falls between HighLoad (85) and ExtremeLoad (95). See also the (ExtremeLoad, HighLoad) defaults.
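The arithmetic in this comment, spelled out with integer division. highLoadWM is a hypothetical stand-in that implements only the quoted expression; any further clamping the real function may apply is not shown.

```go
package main

import "fmt"

// highLoadWM evaluates max(100 - 100/numCPU, 1) using integer division.
func highLoadWM(numCPU int) int64 {
	if numCPU <= 0 {
		return 1
	}
	wm := int64(100 - 100/numCPU)
	if wm < 1 {
		wm = 1
	}
	return wm
}

func main() {
	fmt.Println(highLoadWM(8)) // 88 (100/8 truncates to 12)
	fmt.Println(highLoadWM(1)) // 1  (100 - 100 = 0, raised to 1)
}
```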

func Init added in v1.4.4

func Init(forceCont bool) string

Init: container-aware CPU count; GOMAXPROCS

  • AIS node (`aisnode`) calls Init() once upon startup
  • external modules that skip it still get a sane NumCPU() (see above)
  • returns one of the enumerated tags ("cgroup-v2", etc.)

func MaxParallelism added in v1.3.26

func MaxParallelism() int

number of intra-cluster broadcasting goroutines

func NumCPU

func NumCPU() int

func ProcFDSize added in v1.3.31

func ProcFDSize() int

func Refresh added in v1.4.4

func Refresh(now int64, periodic bool) (int64, int64, error)

Types

type LoadAvg

type LoadAvg struct {
	One, Five, Fifteen float64
}

func LoadAverage

func LoadAverage() (avg LoadAvg, _ error)

type MemStat

type MemStat struct {
	Total      uint64
	Used       uint64
	Free       uint64
	BuffCache  uint64
	ActualFree uint64
	ActualUsed uint64
	SwapTotal  uint64
	SwapFree   uint64
	SwapUsed   uint64
}

func (*MemStat) Get

func (mem *MemStat) Get() error

func (*MemStat) Str added in v1.3.26

func (mem *MemStat) Str(sb *cos.SB)

type ProcCPUStats

type ProcCPUStats struct {
	User     uint64
	System   uint64
	Total    uint64
	LastTime int64
	Percent  float64 // lifetime-average CPU percent since process start
}

type ProcMemStats

type ProcMemStats struct {
	Size     uint64
	Resident uint64
	Share    uint64
}

type ProcStats

type ProcStats struct {
	CPU ProcCPUStats
	Mem ProcMemStats
}

func ProcessStats

func ProcessStats(pid int) (ProcStats, error)
