vm_metrics

package
v0.0.7 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 3, 2026 License: MIT Imports: 12 Imported by: 0

README

VM Metrics

This package provides real-time resource utilization metrics for VMs managed by Hypeman.

Overview

VM metrics are collected from the host's perspective by reading:

  • /proc/<pid>/stat - CPU time (user + system) for the hypervisor process
  • /proc/<pid>/statm - Memory usage (RSS and VMS) for the hypervisor process
  • /sys/class/net/<tap>/statistics/ - Network I/O from TAP interfaces

This approach works for both Cloud Hypervisor and QEMU without requiring any in-guest agents.

Metrics

Metric                               Type     Description
hypeman_vm_cpu_seconds_total         Counter  Total CPU time consumed by VM
hypeman_vm_allocated_vcpus           Gauge    Number of vCPUs allocated
hypeman_vm_memory_rss_bytes          Gauge    Resident Set Size (physical memory)
hypeman_vm_memory_vms_bytes          Gauge    Virtual Memory Size
hypeman_vm_allocated_memory_bytes    Gauge    Total allocated memory
hypeman_vm_network_rx_bytes_total    Counter  Network bytes received
hypeman_vm_network_tx_bytes_total    Counter  Network bytes transmitted
hypeman_vm_memory_utilization_ratio  Gauge    RSS / allocated memory

All metrics include instance_id and instance_name labels.

API Endpoint

GET /instances/{id}/stats

Returns current utilization for a specific instance:

{
  "instance_id": "abc123",
  "instance_name": "my-vm",
  "cpu_seconds": 42.5,
  "memory_rss_bytes": 536870912,
  "memory_vms_bytes": 4294967296,
  "network_rx_bytes": 1048576,
  "network_tx_bytes": 524288,
  "allocated_vcpus": 2,
  "allocated_memory_bytes": 4294967296,
  "memory_utilization_ratio": 0.125
}
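
A client could decode this response roughly as follows. This is a minimal sketch: the instanceStats type and the decodeStats helper are hypothetical names for illustration, not part of the package, and the ratio is modeled as a pointer on the assumption that it may be null when no memory is allocated.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// instanceStats mirrors the JSON keys of the example response.
// The Go type and its field names are assumptions made for this sketch.
type instanceStats struct {
	InstanceID             string   `json:"instance_id"`
	InstanceName           string   `json:"instance_name"`
	CPUSeconds             float64  `json:"cpu_seconds"`
	MemoryRSSBytes         uint64   `json:"memory_rss_bytes"`
	MemoryVMSBytes         uint64   `json:"memory_vms_bytes"`
	NetworkRxBytes         uint64   `json:"network_rx_bytes"`
	NetworkTxBytes         uint64   `json:"network_tx_bytes"`
	AllocatedVcpus         int      `json:"allocated_vcpus"`
	AllocatedMemoryBytes   int64    `json:"allocated_memory_bytes"`
	MemoryUtilizationRatio *float64 `json:"memory_utilization_ratio"` // nil when absent or null
}

// decodeStats parses a /instances/{id}/stats response body.
func decodeStats(body []byte) (instanceStats, error) {
	var s instanceStats
	err := json.Unmarshal(body, &s)
	return s, err
}

func main() {
	body := []byte(`{
	  "instance_id": "abc123",
	  "instance_name": "my-vm",
	  "cpu_seconds": 42.5,
	  "memory_rss_bytes": 536870912,
	  "allocated_memory_bytes": 4294967296,
	  "memory_utilization_ratio": 0.125
	}`)
	s, err := decodeStats(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s: cpu=%.1fs rss=%d bytes\n", s.InstanceName, s.CPUSeconds, s.MemoryRSSBytes)
	if s.MemoryUtilizationRatio != nil {
		fmt.Printf("memory utilization: %.1f%%\n", *s.MemoryUtilizationRatio*100)
	}
}
```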

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                          Host                                   │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │ /proc/<pid>  │    │ /proc/<pid>  │    │ /sys/class/  │      │
│  │    /stat     │    │   /statm     │    │ net/<tap>/   │      │
│  │  (CPU time)  │    │  (memory)    │    │ statistics/  │      │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘      │
│         │                   │                   │               │
│         └───────────────────┼───────────────────┘               │
│                             │                                   │
│                    ┌────────▼────────┐                          │
│                    │  vm_metrics     │                          │
│                    │    Manager      │                          │
│                    └────────┬────────┘                          │
│                             │                                   │
│              ┌──────────────┼──────────────┐                    │
│              │              │              │                    │
│       ┌──────▼──────┐ ┌─────▼─────┐ ┌─────▼─────┐              │
│       │  OTel/OTLP  │ │  REST API │ │  Grafana  │              │
│       │  Exporter   │ │ /stats    │ │ Dashboard │              │
│       └─────────────┘ └───────────┘ └───────────┘              │
└─────────────────────────────────────────────────────────────────┘

Limitations

These metrics measure the hypervisor process, not the guest OS:

  • CPU: Time spent by the hypervisor process, not guest CPU utilization
  • Memory RSS: Physical memory used by the hypervisor process; closely correlates with guest memory usage
  • Memory VMS: Virtual address space of hypervisor process
  • Network: Bytes through TAP interface (accurate for guest traffic)

For detailed in-guest metrics (per-process CPU, filesystem usage, etc.), consider running an exporter like Prometheus Node Exporter inside the guest.

Usage

// Create manager
mgr := vm_metrics.NewManager()

// Set instance source (implements InstanceSource interface)
mgr.SetInstanceSource(instanceManager)

// Initialize OTel metrics (optional)
meter := otel.GetMeterProvider().Meter("hypeman")
if err := mgr.InitializeOTel(meter); err != nil {
    return err
}

// Get stats for a specific instance
info := vm_metrics.BuildInstanceInfo(
    inst.Id,
    inst.Name,
    inst.HypervisorPID,
    inst.NetworkEnabled,
    inst.Vcpus,
    inst.Size+inst.HotplugSize,
)
stats := mgr.GetInstanceStats(ctx, info)
if stats == nil {
    // Instance is not running, or its stats could not be collected
}

Prometheus Queries

# CPU utilization rate (per vCPU)
rate(hypeman_vm_cpu_seconds_total[1m]) / hypeman_vm_allocated_vcpus

# Memory utilization percentage
hypeman_vm_memory_rss_bytes / hypeman_vm_allocated_memory_bytes * 100

# Network throughput (bytes/sec)
rate(hypeman_vm_network_rx_bytes_total[1m])
rate(hypeman_vm_network_tx_bytes_total[1m])
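
The per-vCPU CPU query can also be worked through by hand from two samples of the counter. The sketch below does that arithmetic in Go. The function name cpuUtilizationPerVCPU is hypothetical, and the calculation is a simplification: unlike PromQL's rate(), it does not extrapolate or recover from counter resets, it just rejects a decreasing counter.

```go
package main

import "fmt"

// cpuUtilizationPerVCPU returns the average fraction of one vCPU used between
// two samples of a cumulative CPU-seconds counter. ok is false when the inputs
// cannot produce a meaningful rate (no elapsed time, no vCPUs, or a counter
// that went backwards, e.g. after a hypervisor restart).
func cpuUtilizationPerVCPU(prevCPUSeconds, curCPUSeconds, elapsedSeconds float64, vcpus int) (utilization float64, ok bool) {
	if elapsedSeconds <= 0 || vcpus <= 0 || curCPUSeconds < prevCPUSeconds {
		return 0, false
	}
	return (curCPUSeconds - prevCPUSeconds) / elapsedSeconds / float64(vcpus), true
}

func main() {
	// 6 CPU-seconds consumed over a 60 s window on a 2-vCPU instance.
	u, ok := cpuUtilizationPerVCPU(42.5, 48.5, 60, 2)
	if !ok {
		fmt.Println("not enough data to compute a rate")
		return
	}
	fmt.Printf("%.1f%% of each vCPU on average\n", u*100)
}
```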

Documentation

Overview

Package vm_metrics provides real-time resource utilization metrics for VMs. It collects CPU, memory, and network statistics from the host's perspective by reading /proc/<pid>/stat, /proc/<pid>/statm, and TAP interface statistics.

Constants

This section is empty.

Variables

This section is empty.

Functions

func ReadProcStat

func ReadProcStat(pid int) (uint64, error)

ReadProcStat reads CPU time from /proc/<pid>/stat. Returns total CPU time (user + system) in microseconds. Fields 14 and 15 are utime and stime in clock ticks.
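
As an illustration of the parsing described above, here is a minimal sketch that extracts fields 14 and 15 from the contents of a stat file. It is not the package's implementation: parseStatCPUUsec is a hypothetical helper, and the tick rate is assumed to be 100 per second (the usual USER_HZ on Linux) rather than queried from the system.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// assumedClockTicksPerSec is an assumption for this sketch; 100 is the
// common USER_HZ value on Linux, but it is not guaranteed.
const assumedClockTicksPerSec = 100

// parseStatCPUUsec takes the contents of /proc/<pid>/stat and returns
// utime + stime (fields 14 and 15) converted to microseconds.
func parseStatCPUUsec(stat string) (uint64, error) {
	// Field 2 (comm) is wrapped in parentheses and may contain spaces,
	// so split only the text after the last ')'.
	end := strings.LastIndexByte(stat, ')')
	if end < 0 {
		return 0, fmt.Errorf("malformed stat line: no closing parenthesis")
	}
	fields := strings.Fields(stat[end+1:])
	// fields[0] is field 3 (state), so field 14 is fields[11] and field 15 is fields[12].
	if len(fields) < 13 {
		return 0, fmt.Errorf("malformed stat line: only %d fields after comm", len(fields))
	}
	utime, err := strconv.ParseUint(fields[11], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parse utime: %w", err)
	}
	stime, err := strconv.ParseUint(fields[12], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parse stime: %w", err)
	}
	return (utime + stime) * 1_000_000 / assumedClockTicksPerSec, nil
}

func main() {
	// Sample line with a process name that contains a space.
	sample := "1234 (cloud hypervisor) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 150 0 0 20 0 4 0 1000"
	usec, err := parseStatCPUUsec(sample)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("CPU time: %d microseconds\n", usec)
}
```

Splitting after the last closing parenthesis matters because the process name can itself contain spaces or parentheses, which would shift the field positions if the line were split naively.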

func ReadProcStatm

func ReadProcStatm(pid int) (rssBytes, vmsBytes uint64, err error)

ReadProcStatm reads memory stats from /proc/<pid>/statm. Returns RSS (resident set size) and VMS (virtual memory size) in bytes. Format: size resident shared text lib data dt (all in pages)
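
The page-to-byte conversion can be sketched as below. This is an illustration only, not the package's code: parseStatm is a hypothetical helper that takes the page size as a parameter, and main uses os.Getpagesize() on the assumption that it matches the page size the kernel reports in statm.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseStatm takes the contents of /proc/<pid>/statm and returns RSS and VMS
// in bytes. The first field is the total program size and the second is the
// resident set size, both counted in pages.
func parseStatm(statm string, pageSize uint64) (rssBytes, vmsBytes uint64, err error) {
	fields := strings.Fields(statm)
	if len(fields) < 2 {
		return 0, 0, fmt.Errorf("malformed statm: got %d fields, want at least 2", len(fields))
	}
	sizePages, err := strconv.ParseUint(fields[0], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("parse size: %w", err)
	}
	residentPages, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("parse resident: %w", err)
	}
	return residentPages * pageSize, sizePages * pageSize, nil
}

func main() {
	pageSize := uint64(os.Getpagesize())
	rss, vms, err := parseStatm("1048576 131072 2048 512 0 65536 0", pageSize)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("page size %d: rss=%d bytes, vms=%d bytes\n", pageSize, rss, vms)
}
```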

func ReadTAPStats

func ReadTAPStats(tapName string) (rxBytes, txBytes uint64, err error)

ReadTAPStats reads network statistics from a TAP device. Reads from /sys/class/net/<tap>/statistics/{rx,tx}_bytes. Note: Returns stats from host perspective. Caller must swap for VM perspective:

  • rxBytes = host receives = VM transmits
  • txBytes = host transmits = VM receives
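
The perspective swap is easy to get backwards, so a small sketch may help. The helpers readCounter, statPath, and toVMPerspective are hypothetical names used only for illustration, and main falls back to sample values when the TAP device does not exist on the machine running it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readCounter reads a single numeric sysfs counter file.
func readCounter(path string) (uint64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}

// statPath builds the sysfs path for one counter of a network device.
func statPath(tapName, counter string) string {
	return filepath.Join("/sys/class/net", tapName, "statistics", counter)
}

// toVMPerspective swaps host-side counters: what the host receives on the
// TAP device is what the VM transmitted, and vice versa.
func toVMPerspective(hostRx, hostTx uint64) (vmRx, vmTx uint64) {
	return hostTx, hostRx
}

func main() {
	tap := "hype-01234567" // example device name taken from the InstanceInfo docs
	hostRx, errRx := readCounter(statPath(tap, "rx_bytes"))
	hostTx, errTx := readCounter(statPath(tap, "tx_bytes"))
	if errRx != nil || errTx != nil {
		fmt.Println("TAP device not readable on this host; using sample values")
		hostRx, hostTx = 524288, 1048576
	}
	vmRx, vmTx := toVMPerspective(hostRx, hostTx)
	fmt.Printf("VM received %d bytes, VM transmitted %d bytes\n", vmRx, vmTx)
}
```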

Types

type InstanceData

type InstanceData struct {
	ID                   string
	Name                 string
	HypervisorPID        *int
	NetworkEnabled       bool
	AllocatedVcpus       int
	AllocatedMemoryBytes int64
}

InstanceData contains the minimal instance data needed for metrics.

type InstanceInfo

type InstanceInfo struct {
	ID            string
	Name          string
	HypervisorPID *int   // PID of the hypervisor process (nil if not running)
	TAPDevice     string // Name of the TAP device (e.g., "hype-01234567")

	// Allocated resources
	AllocatedVcpus       int   // Number of allocated vCPUs
	AllocatedMemoryBytes int64 // Allocated memory in bytes (Size + HotplugSize)
}

InstanceInfo contains the minimal info needed to collect VM metrics. This is provided by the instances package.

func BuildInstanceInfo

func BuildInstanceInfo(id, name string, pid *int, networkEnabled bool, vcpus int, memoryBytes int64) InstanceInfo

BuildInstanceInfo creates an InstanceInfo from instance metadata. This is a helper for the API layer to avoid duplicating TAP name logic.

type InstanceListerAdapter

type InstanceListerAdapter struct {
	// contains filtered or unexported fields
}

InstanceListerAdapter adapts an instance lister that provides Instance structs to the vm_metrics.InstanceSource interface. This is useful when you need to build InstanceInfo directly from Instance data.

func NewInstanceListerAdapter

func NewInstanceListerAdapter(listFunc func(ctx context.Context) ([]InstanceData, error)) *InstanceListerAdapter

NewInstanceListerAdapter creates an adapter with a custom list function.

func (*InstanceListerAdapter) ListRunningInstancesForMetrics

func (a *InstanceListerAdapter) ListRunningInstancesForMetrics() ([]InstanceInfo, error)

ListRunningInstancesForMetrics implements InstanceSource.

type InstanceManagerAdapter

type InstanceManagerAdapter struct {
	// contains filtered or unexported fields
}

InstanceManagerAdapter adapts an instance manager that returns resources.InstanceUtilizationInfo to the vm_metrics.InstanceSource interface.

func NewInstanceManagerAdapter

func NewInstanceManagerAdapter(manager interface {
	ListRunningInstancesInfo(ctx context.Context) ([]resources.InstanceUtilizationInfo, error)
}) *InstanceManagerAdapter

NewInstanceManagerAdapter creates an adapter for the given instance manager.

func (*InstanceManagerAdapter) ListRunningInstancesForMetrics

func (a *InstanceManagerAdapter) ListRunningInstancesForMetrics() ([]InstanceInfo, error)

ListRunningInstancesForMetrics implements InstanceSource.

type InstanceSource

type InstanceSource interface {
	// ListRunningInstancesForMetrics returns info for all running instances.
	ListRunningInstancesForMetrics() ([]InstanceInfo, error)
}

InstanceSource provides access to running instance information. Implemented by instances.Manager.
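
Because InstanceSource has a single method, a custom source (for example a test double) is short to write. The sketch below redeclares InstanceInfo and InstanceSource locally so that it compiles on its own; real code would import them from vm_metrics instead. The staticSource type is hypothetical, and treating a nil HypervisorPID as "not running" follows the field comment on InstanceInfo.

```go
package main

import "fmt"

// Local copies of the documented types, so the sketch is self-contained.
type InstanceInfo struct {
	ID                   string
	Name                 string
	HypervisorPID        *int
	TAPDevice            string
	AllocatedVcpus       int
	AllocatedMemoryBytes int64
}

type InstanceSource interface {
	ListRunningInstancesForMetrics() ([]InstanceInfo, error)
}

// staticSource serves a fixed list of instances (hypothetical helper).
type staticSource struct {
	instances []InstanceInfo
}

// ListRunningInstancesForMetrics returns only instances with a hypervisor PID,
// since a nil PID means the instance is not running.
func (s staticSource) ListRunningInstancesForMetrics() ([]InstanceInfo, error) {
	running := make([]InstanceInfo, 0, len(s.instances))
	for _, inst := range s.instances {
		if inst.HypervisorPID != nil {
			running = append(running, inst)
		}
	}
	return running, nil
}

// Compile-time check that staticSource satisfies the interface.
var _ InstanceSource = staticSource{}

func main() {
	pid := 4242
	src := staticSource{instances: []InstanceInfo{
		{ID: "abc123", Name: "my-vm", HypervisorPID: &pid, AllocatedVcpus: 2},
		{ID: "def456", Name: "stopped-vm"},
	}}
	running, err := src.ListRunningInstancesForMetrics()
	if err != nil {
		fmt.Println("list failed:", err)
		return
	}
	for _, inst := range running {
		fmt.Printf("%s (%s) pid=%d\n", inst.Name, inst.ID, *inst.HypervisorPID)
	}
}
```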

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager collects and exposes VM resource utilization metrics. It reads from /proc and TAP interfaces to gather real-time statistics.

func NewManager

func NewManager() *Manager

NewManager creates a new VM metrics manager.

func (*Manager) CollectAll

func (m *Manager) CollectAll(ctx context.Context) ([]VMStats, error)

CollectAll gathers metrics for all running VMs. Used by OTel metrics callback.

func (*Manager) GetInstanceStats

func (m *Manager) GetInstanceStats(ctx context.Context, info InstanceInfo) *VMStats

GetInstanceStats collects metrics for a single instance. Returns nil if the instance is not running or stats cannot be collected.

func (*Manager) InitializeOTel

func (m *Manager) InitializeOTel(meter metric.Meter) error

InitializeOTel sets up OpenTelemetry metrics. If meter is nil, OTel metrics are disabled.

func (*Manager) SetInstanceSource

func (m *Manager) SetInstanceSource(source InstanceSource)

SetInstanceSource sets the source for instance information. Must be called before collecting metrics.

type VMStats

type VMStats struct {
	InstanceID   string
	InstanceName string

	// CPU stats (from /proc/<pid>/stat)
	CPUUsec uint64 // Total CPU time in microseconds (user + system)

	// Memory stats (from /proc/<pid>/statm)
	MemoryRSSBytes uint64 // Resident Set Size - actual physical memory used
	MemoryVMSBytes uint64 // Virtual Memory Size - total allocated virtual memory

	// Network stats (from TAP interface)
	NetRxBytes uint64 // Total network bytes received
	NetTxBytes uint64 // Total network bytes transmitted

	// Allocated resources (for computing utilization ratios)
	AllocatedVcpus       int   // Number of allocated vCPUs
	AllocatedMemoryBytes int64 // Allocated memory in bytes
}

VMStats holds resource utilization metrics for a single VM. These are point-in-time values collected from the hypervisor process.

func (*VMStats) CPUSeconds

func (s *VMStats) CPUSeconds() float64

CPUSeconds returns CPU time in seconds (for API responses).

func (*VMStats) MemoryUtilizationRatio

func (s *VMStats) MemoryUtilizationRatio() *float64

MemoryUtilizationRatio returns RSS / allocated memory (0.0 to 1.0+). Returns nil if allocated memory is 0.
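
The two derived values are simple arithmetic over the VMStats fields. The sketch below reproduces that arithmetic on a local stand-in struct; vmStats and its methods are hypothetical and may differ from the package's implementation. In particular, the stand-in also returns nil for a negative allocation, which goes slightly beyond the documented zero check.

```go
package main

import "fmt"

// vmStats is a local stand-in holding the subset of VMStats fields used here.
type vmStats struct {
	CPUUsec              uint64
	MemoryRSSBytes       uint64
	AllocatedMemoryBytes int64
}

// cpuSeconds converts microseconds of CPU time to seconds.
func (s *vmStats) cpuSeconds() float64 {
	return float64(s.CPUUsec) / 1e6
}

// memoryUtilizationRatio returns RSS / allocated memory, or nil when there is
// no positive allocation to divide by. Values above 1.0 are possible because
// the hypervisor process has memory overhead beyond the guest's allocation.
func (s *vmStats) memoryUtilizationRatio() *float64 {
	if s.AllocatedMemoryBytes <= 0 {
		return nil
	}
	ratio := float64(s.MemoryRSSBytes) / float64(s.AllocatedMemoryBytes)
	return &ratio
}

func main() {
	s := &vmStats{
		CPUUsec:              42_500_000,
		MemoryRSSBytes:       536870912,  // 512 MiB
		AllocatedMemoryBytes: 4294967296, // 4 GiB
	}
	fmt.Printf("cpu: %.1f s\n", s.cpuSeconds())
	if r := s.memoryUtilizationRatio(); r != nil {
		fmt.Printf("memory utilization: %.3f\n", *r)
	} else {
		fmt.Println("memory utilization: unavailable")
	}
}
```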
