watchdog

package
v1.0.27
Published: Jan 7, 2026 | License: MPL-2.0 | Imports: 22 | Imported by: 0

Documentation

Overview

=============================================================================
NFTBan v1.0 - Dynamic Watchdog Package
=============================================================================
SPDX-License-Identifier: MPL-2.0

Package watchdog implements a dynamic runtime watchdog for nftban that:

  • Continuously monitors system, process, kernel/netfilter, and nftables signals
  • Computes a pressure state (OK/WARN/CRITICAL) per dimension (CPU, MEM, IO, NET)
  • Automatically adjusts daemon behavior (worker count, collector intervals, optional work) based on the pressure state
  • Triggers forensic capture (pprof CPU, heap, and goroutine profiles) during incidents
  • Exposes metrics in Prometheus format
  • Maintains an on-disk flight recorder for post-mortem analysis

Architecture:

┌─────────────────────────────────────────────────────────────┐
│                     Watchdog Core                           │
│  ┌─────────────────────────────────────────────────────────┐│
│  │                   State Machine                         ││
│  │  NORMAL ←→ DEGRADED ←→ SURVIVAL                         ││
│  └─────────────────────────────────────────────────────────┘│
│                           │                                 │
│     ┌─────────────────────┴─────────────────────┐           │
│     ▼                                           ▼           │
│  ┌─────────────────┐                 ┌─────────────────────┐│
│  │   Collectors    │                 │      Actions        ││
│  │  - Process      │                 │  - Throttle         ││
│  │  - Runtime      │                 │  - Profile          ││
│  │  - System       │                 │  - Memory Valve     ││
│  │  - Netfilter    │                 │  - Degrade Mode     ││
│  │  - nftables     │                 └─────────────────────┘│
│  └─────────────────┘                                        │
│                           │                                 │
│                           ▼                                 │
│              ┌─────────────────────────┐                    │
│              │    Flight Recorder      │                    │
│              │  - Event JSON           │                    │
│              │  - Periodic Snapshots   │                    │
│              │  - pprof Profiles       │                    │
│              └─────────────────────────┘                    │
└─────────────────────────────────────────────────────────────┘

Pressure Dimensions:

  • CPU: Process CPU%, system load, softnet drops
  • MEM: RSS vs budget, heap vs GOMEMLIMIT, GC fraction, RSS slope
  • IO: iowait%, disk usage, log partition fullness
  • NET: conntrack utilization, softnet drops rate, nft apply latency
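The package does not document how individual signals are combined into a dimension score. One plausible scheme, sketched here purely as an assumption, normalizes each signal against its critical threshold from Config (for example CPUCritPercent or LoadNormalizedCrit) onto a 0-100 scale and lets the worst signal set the dimension score. The helpers signalScore and dimensionScore are hypothetical.

```go
package main

import "fmt"

// signalScore maps a raw signal onto 0-100, where 100 means the signal has
// reached its critical threshold. Assumption: a linear ratio, clamped.
func signalScore(value, critThreshold float64) float64 {
	if critThreshold <= 0 {
		return 0
	}
	s := 100 * value / critThreshold
	if s < 0 {
		return 0
	}
	if s > 100 {
		return 100
	}
	return s
}

// dimensionScore keeps the worst (highest) signal score, so one saturated
// signal is enough to drive the whole dimension.
func dimensionScore(scores ...float64) float64 {
	worst := 0.0
	for _, s := range scores {
		if s > worst {
			worst = s
		}
	}
	return worst
}

func main() {
	// Hypothetical CPU inputs: process CPU 45% against a 90% threshold,
	// normalized load 1.2 against a 2.0 threshold.
	cpu := dimensionScore(signalScore(45, 90), signalScore(1.2, 2.0))
	fmt.Printf("cpu score: %.1f\n", cpu) // cpu score: 60.0
}
```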

Operating Modes:

  • NORMAL: All dimensions OK. Full functionality.
  • DEGRADED: Any dimension WARN. Reduced frequency, backpressure enabled.
  • SURVIVAL: Any dimension CRITICAL. Essential operations only.
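The mode rules above reduce to "worst level wins". A minimal sketch using local copies of the package's Level and Mode constants; severity, worstLevel, and modeFor are hypothetical helpers, not the package's code.

```go
package main

import "fmt"

type (
	Level     string
	Mode      string
	Dimension string
)

const (
	LevelOK       Level = "ok"
	LevelWarn     Level = "warn"
	LevelCritical Level = "critical"

	ModeNormal   Mode = "normal"
	ModeDegraded Mode = "degraded"
	ModeSurvival Mode = "survival"
)

// severity orders levels so the worst one can be selected.
func severity(l Level) int {
	switch l {
	case LevelCritical:
		return 2
	case LevelWarn:
		return 1
	default:
		return 0
	}
}

// worstLevel returns the most severe level across all dimensions.
func worstLevel(levels map[Dimension]Level) Level {
	worst := LevelOK
	for _, l := range levels {
		if severity(l) > severity(worst) {
			worst = l
		}
	}
	return worst
}

// modeFor maps the worst level to a mode: any CRITICAL gives SURVIVAL,
// any WARN gives DEGRADED, otherwise NORMAL.
func modeFor(levels map[Dimension]Level) Mode {
	switch worstLevel(levels) {
	case LevelCritical:
		return ModeSurvival
	case LevelWarn:
		return ModeDegraded
	default:
		return ModeNormal
	}
}

func main() {
	levels := map[Dimension]Level{"cpu": LevelOK, "mem": LevelWarn, "io": LevelOK, "net": LevelOK}
	fmt.Println(modeFor(levels)) // degraded
}
```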

Thread Safety:

All components are designed for concurrent access. The watchdog runs
as a single goroutine with collectors running in parallel.

Usage:

controls := watchdog.NewRuntimeControls()
w, err := watchdog.New(watchdog.DefaultConfig(), controls)
if err != nil {
	log.Fatal(err)
}
go w.Run(ctx)

=============================================================================

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Action

type Action struct {
	Type      ActionType `json:"type"`
	Timestamp time.Time  `json:"timestamp"`
	Reason    string     `json:"reason"`
	Details   string     `json:"details,omitempty"`
	Success   bool       `json:"success"`
}

Action represents an action taken by the watchdog

type ActionExecutor

type ActionExecutor struct {
	// contains filtered or unexported fields
}

ActionExecutor handles action execution with cooldowns

func NewActionExecutor

func NewActionExecutor(cfg *Config, controls *RuntimeControls) *ActionExecutor

NewActionExecutor creates a new action executor

func (*ActionExecutor) ApplyMode

func (e *ActionExecutor) ApplyMode(mode Mode, snapshot *Snapshot) error

ApplyMode applies settings for the given mode

func (*ActionExecutor) CaptureCPUProfile

func (e *ActionExecutor) CaptureCPUProfile(reason string, snapshot *Snapshot) (string, error)

CaptureCPUProfile captures a CPU profile

func (*ActionExecutor) CaptureGoroutineProfile

func (e *ActionExecutor) CaptureGoroutineProfile(reason string, snapshot *Snapshot) (string, error)

CaptureGoroutineProfile captures a goroutine profile

func (*ActionExecutor) CaptureHeapProfile

func (e *ActionExecutor) CaptureHeapProfile(reason string, snapshot *Snapshot) (string, error)

CaptureHeapProfile captures a heap profile

func (*ActionExecutor) DisableOptional

func (e *ActionExecutor) DisableOptional(reason string) error

DisableOptional disables optional collectors

func (*ActionExecutor) SetOnAction

func (e *ActionExecutor) SetOnAction(cb func(Action))

SetOnAction sets the callback for when actions are taken

func (*ActionExecutor) Throttle

func (e *ActionExecutor) Throttle(reason string) error

Throttle reduces worker count and disables optional work

func (*ActionExecutor) TryFreeOSMemory

func (e *ActionExecutor) TryFreeOSMemory(
	memCriticalDuration time.Duration,
	cpuPercent float64,
	snapshot *Snapshot,
) (bool, error)

TryFreeOSMemory calls debug.FreeOSMemory only if all of the following safety conditions are met:

  • MEM CRITICAL for >= 30s
  • CPU below threshold (not during high load)
  • Cooldown passed

type ActionType

type ActionType string

ActionType represents a watchdog action

const (
	ActionThrottle         ActionType = "throttle"
	ActionDisableOptional  ActionType = "disable_optional"
	ActionProfileCPU       ActionType = "profile_cpu"
	ActionProfileHeap      ActionType = "profile_heap"
	ActionProfileGoroutine ActionType = "profile_goroutine"
	ActionFreeOSMemory     ActionType = "free_os_memory"
	ActionDegradeMode      ActionType = "degrade_mode"
)

type BaseCollector

type BaseCollector struct {
	// contains filtered or unexported fields
}

BaseCollector provides common functionality for collectors

func NewBaseCollector

func NewBaseCollector(name string) BaseCollector

NewBaseCollector creates a new base collector

func (*BaseCollector) Enabled

func (b *BaseCollector) Enabled() bool

Enabled returns whether the collector is enabled

func (*BaseCollector) Name

func (b *BaseCollector) Name() string

Name returns the collector name

func (*BaseCollector) SetEnabled

func (b *BaseCollector) SetEnabled(enabled bool)

SetEnabled sets the enabled state

type Collector

type Collector interface {
	Name() string
	Collect(ctx context.Context, snapshot *Snapshot) error
	Enabled() bool
	SetEnabled(enabled bool)
}

Collector defines the interface for metrics collectors

type Config

type Config struct {
	// Enable/disable watchdog
	Enabled bool

	// Base collection interval (adjusts dynamically)
	BaseInterval time.Duration

	// ==========================================================================
	// Hysteresis Settings (prevents flapping)
	// ==========================================================================
	HysteresisWarnEnter float64 // Enter WARN at this score
	HysteresisWarnExit  float64 // Exit WARN when below this for WarnExitDuration
	HysteresisCritEnter float64 // Enter CRITICAL at this score
	HysteresisCritExit  float64 // Exit CRITICAL when below this for CritExitDuration
	WarnExitDuration    time.Duration
	CritExitDuration    time.Duration

	// Memory
	MemBudgetBytes         int64   // Align with GOMEMLIMIT
	RSSCritPercentOfBudget float64 // RSS critical threshold as % of budget
	HeapCritPercentOfLimit float64 // Heap critical threshold as % of GOMEMLIMIT

	// CPU
	CPUCritPercent     float64 // Process CPU critical threshold
	LoadNormalizedCrit float64 // Load/NumCPU critical threshold

	// IO
	IOWaitCritPercent  float64 // iowait critical threshold
	DiskUseCritPercent float64 // Disk usage critical threshold

	// Network/Conntrack
	ConntrackUtilCrit    float64 // Conntrack utilization critical threshold
	SoftnetDropsRateCrit float64 // Softnet drops per second critical threshold

	// Goroutines
	GoroutinesCrit int // Goroutine count critical threshold

	// ==========================================================================
	// Collector Intervals (dynamic - change based on mode)
	// ==========================================================================
	ProcessInterval      time.Duration // Process/runtime metrics
	SystemInterval       time.Duration // System metrics
	KernelInterval       time.Duration // Kernel/netfilter metrics
	NFTSetInterval       time.Duration // nft set counts (cheap)
	NFTRulesetInterval   time.Duration // nft full ruleset (expensive)
	TopProcessesInterval time.Duration // Top hogs

	// Degraded mode multipliers
	DegradedIntervalMultiplier float64 // Multiply intervals by this in DEGRADED

	// ==========================================================================
	// Profiling Settings
	// ==========================================================================
	ProfileAutoEnabled       bool
	ProfileDir               string
	ProfileCPUCooldown       time.Duration
	ProfileHeapCooldown      time.Duration
	ProfileGoroutineCooldown time.Duration
	ProfileCPUDuration       time.Duration // Duration of CPU profile
	ProfileMaxCount          int           // Max profiles to keep

	// ==========================================================================
	// Memory Valve Settings
	// ==========================================================================
	FreeOSMemoryEnabled        bool
	FreeOSMemoryCooldown       time.Duration
	FreeOSMemoryOnlyIfCPUBelow float64 // Only trigger if CPU below this %

	// ==========================================================================
	// Degradation Settings
	// ==========================================================================
	DegradeDisableNFTRulesetScan bool
	DegradeReduceWorkersTo       int
	SurvivalReduceWorkersTo      int

	// ==========================================================================
	// Flight Recorder Settings
	// ==========================================================================
	RecorderEnabled          bool
	RecorderDir              string
	RecorderMaxEvents        int           // Ring buffer size
	RecorderSnapshotInterval time.Duration // How often to write snapshots
	RecorderRetentionDays    int           // Days to keep old files

	// ==========================================================================
	// Alerting
	// ==========================================================================
	AlertThrottleDuration time.Duration // Don't repeat same alert within this
}

Config holds watchdog configuration

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns configuration with sensible defaults

func LoadConfig

func LoadConfig(configPath string) *Config

LoadConfig loads watchdog configuration from a file, falling back to defaults for missing values

func (*Config) GetIntervalForMode

func (c *Config) GetIntervalForMode(base time.Duration, mode Mode) time.Duration

GetIntervalForMode returns the adjusted interval for the given mode

func (*Config) Validate

func (c *Config) Validate()

Validate ensures config values are within safe bounds

type Cooldowns

type Cooldowns struct {
	// contains filtered or unexported fields
}

Cooldowns tracks action cooldowns to prevent repeated actions

func NewCooldowns

func NewCooldowns() *Cooldowns

NewCooldowns creates a new cooldown tracker

func (*Cooldowns) CanExecute

func (c *Cooldowns) CanExecute(action ActionType, cooldown time.Duration) bool

CanExecute returns true if the action is not in cooldown

func (*Cooldowns) Record

func (c *Cooldowns) Record(action ActionType)

Record marks an action as executed

func (*Cooldowns) TimeSince

func (c *Cooldowns) TimeSince(action ActionType) time.Duration

TimeSince returns time since last action (or max duration if never)

type Dimension

type Dimension string

Dimension represents a monitored pressure dimension

const (
	DimCPU Dimension = "cpu"
	DimMEM Dimension = "mem"
	DimIO  Dimension = "io"
	DimNET Dimension = "net"
)

func AllDimensions

func AllDimensions() []Dimension

AllDimensions returns all pressure dimensions

type Event

type Event struct {
	Type      EventType `json:"type"`
	Timestamp time.Time `json:"timestamp"`
	Message   string    `json:"message"`
	Snapshot  *Snapshot `json:"snapshot,omitempty"`
	Action    *Action   `json:"action,omitempty"`
	OldMode   Mode      `json:"old_mode,omitempty"`
	NewMode   Mode      `json:"new_mode,omitempty"`
	Dimension Dimension `json:"dimension,omitempty"`
	Score     float64   `json:"score,omitempty"`
	Level     Level     `json:"level,omitempty"`
}

Event represents a recorded event for post-mortem analysis
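Because most Event fields carry omitempty, a recorded event only contains the fields relevant to its type. The sketch below marshals a mode-change event with a local copy of the struct; Snapshot and Action are reduced to empty placeholders, and the message text is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type (
	EventType string
	Mode      string
	Dimension string
	Level     string
	Snapshot  struct{} // placeholder for the real Snapshot
	Action    struct{} // placeholder for the real Action
)

// Event copies the documented fields and JSON tags.
type Event struct {
	Type      EventType `json:"type"`
	Timestamp time.Time `json:"timestamp"`
	Message   string    `json:"message"`
	Snapshot  *Snapshot `json:"snapshot,omitempty"`
	Action    *Action   `json:"action,omitempty"`
	OldMode   Mode      `json:"old_mode,omitempty"`
	NewMode   Mode      `json:"new_mode,omitempty"`
	Dimension Dimension `json:"dimension,omitempty"`
	Score     float64   `json:"score,omitempty"`
	Level     Level     `json:"level,omitempty"`
}

// modeChangeJSON encodes a sample mode-change event.
func modeChangeJSON() (string, error) {
	ev := Event{
		Type:      "mode_change",
		Timestamp: time.Date(2026, 1, 7, 12, 0, 0, 0, time.UTC),
		Message:   "pressure rose to critical",
		OldMode:   "degraded",
		NewMode:   "survival",
	}
	b, err := json.Marshal(ev)
	return string(b), err
}

func main() {
	s, err := modeChangeJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The nil Snapshot and Action pointers and the empty Dimension, Score, and Level fields are all dropped from the output. One side effect of the tags: since Score is tagged omitempty, a genuine score of 0 is dropped from the JSON as well.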

type EventType

type EventType string

EventType categorizes recorder events

const (
	EventModeChange      EventType = "mode_change"
	EventPressureChange  EventType = "pressure_change"
	EventActionTaken     EventType = "action_taken"
	EventThresholdBreach EventType = "threshold_breach"
	EventProfileCapture  EventType = "profile_capture"
)

type KernelCollector

type KernelCollector struct {
	BaseCollector
	// contains filtered or unexported fields
}

KernelCollector collects kernel/netfilter metrics

func NewKernelCollector

func NewKernelCollector() *KernelCollector

NewKernelCollector creates a new kernel collector

func (*KernelCollector) Collect

func (c *KernelCollector) Collect(ctx context.Context, snapshot *Snapshot) error

Collect gathers kernel metrics

func (*KernelCollector) GetSoftnetDropsRate

func (c *KernelCollector) GetSoftnetDropsRate() float64

GetSoftnetDropsRate returns the current softnet drops rate per second

type KernelMetrics

type KernelMetrics struct {
	ConntrackCount       int     `json:"conntrack_count"`
	ConntrackMax         int     `json:"conntrack_max"`
	ConntrackUtilization float64 `json:"conntrack_utilization"` // count/max
	SoftnetDrops         uint64  `json:"softnet_drops_total"`   // Aggregated across CPUs
	SoftnetTimeSqueeze   uint64  `json:"softnet_time_squeeze_total"`
	NICDrops             uint64  `json:"nic_rx_dropped_total"` // Aggregated across interfaces
}

KernelMetrics contains kernel/netfilter metrics
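On Linux these values typically come from /proc/sys/net/netfilter/nf_conntrack_count, /proc/sys/net/netfilter/nf_conntrack_max, and /proc/net/softnet_stat, where each row is one CPU and the second and third hexadecimal columns are the dropped and time_squeeze counters. The sketch below assumes the collector reads those files (the package does not say which sources it uses) and works on file contents, so it runs without a Linux host. dropRate shows one way GetSoftnetDropsRate could turn two cumulative samples into a per-second rate.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// conntrackUtilization returns count/max from the contents of
// nf_conntrack_count and nf_conntrack_max.
func conntrackUtilization(countFile, maxFile string) (float64, error) {
	count, err := strconv.Atoi(strings.TrimSpace(countFile))
	if err != nil {
		return 0, err
	}
	maxEntries, err := strconv.Atoi(strings.TrimSpace(maxFile))
	if err != nil {
		return 0, err
	}
	if maxEntries <= 0 {
		return 0, fmt.Errorf("invalid nf_conntrack_max: %d", maxEntries)
	}
	return float64(count) / float64(maxEntries), nil
}

// softnetTotals sums the dropped (2nd) and time_squeeze (3rd) hex columns
// of /proc/net/softnet_stat across all CPU rows.
func softnetTotals(content string) (uint64, uint64, error) {
	var drops, squeezes uint64
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue
		}
		d, err := strconv.ParseUint(fields[1], 16, 64)
		if err != nil {
			return 0, 0, err
		}
		s, err := strconv.ParseUint(fields[2], 16, 64)
		if err != nil {
			return 0, 0, err
		}
		drops += d
		squeezes += s
	}
	return drops, squeezes, sc.Err()
}

// dropRate turns two cumulative samples taken dtSeconds apart into a
// per-second rate; a counter that went backwards yields 0.
func dropRate(prev, cur uint64, dtSeconds float64) float64 {
	if dtSeconds <= 0 || cur < prev {
		return 0
	}
	return float64(cur-prev) / dtSeconds
}

// sampleSoftnet is made-up two-CPU input, truncated to four columns.
const sampleSoftnet = "0000a1b2 00000003 00000010 00000000\n0000ffff 00000002 00000001 00000000\n"

func main() {
	u, err := conntrackUtilization("1024\n", "65536\n")
	fmt.Println(u, err) // 0.015625 <nil>
	drops, squeezes, err := softnetTotals(sampleSoftnet)
	fmt.Println(drops, squeezes, err)   // 5 17 <nil>
	fmt.Println(dropRate(100, 150, 10)) // 5
}
```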

type Level

type Level string

Level represents a pressure level

const (
	LevelOK       Level = "ok"
	LevelWarn     Level = "warn"
	LevelCritical Level = "critical"
)

type MetricsExporter

type MetricsExporter struct {
	// contains filtered or unexported fields
}

MetricsExporter updates Prometheus metrics from watchdog data

func NewMetricsExporter

func NewMetricsExporter() *MetricsExporter

NewMetricsExporter creates a new metrics exporter

func (*MetricsExporter) RecordAction

func (m *MetricsExporter) RecordAction(action Action)

RecordAction records an action in metrics

func (*MetricsExporter) Update

func (m *MetricsExporter) Update(snapshot *Snapshot, state *PressureState)

Update updates all metrics from a snapshot and state

type Mode

type Mode string

Mode represents the operating mode, derived from the worst level across all dimensions

const (
	ModeNormal   Mode = "normal"   // All OK
	ModeDegraded Mode = "degraded" // Any WARN
	ModeSurvival Mode = "survival" // Any CRITICAL
)

type NFTablesCollector

type NFTablesCollector struct {
	BaseCollector
	// contains filtered or unexported fields
}

NFTablesCollector collects nftables metrics

func NewNFTablesCollector

func NewNFTablesCollector(cacheDuration time.Duration) *NFTablesCollector

NewNFTablesCollector creates a new nftables collector

func (*NFTablesCollector) Collect

func (c *NFTablesCollector) Collect(ctx context.Context, snapshot *Snapshot) error

Collect gathers nftables metrics

func (*NFTablesCollector) GetLastApplyLatency

func (c *NFTablesCollector) GetLastApplyLatency() time.Duration

GetLastApplyLatency returns the last recorded apply latency

func (*NFTablesCollector) InvalidateCache

func (c *NFTablesCollector) InvalidateCache()

InvalidateCache forces a refresh on next collection

func (*NFTablesCollector) RecordApplyLatency

func (c *NFTablesCollector) RecordApplyLatency(latency time.Duration)

RecordApplyLatency records the latency of an nft apply operation; the nft backend calls it after each apply

type NFTablesMetrics

type NFTablesMetrics struct {
	RulesTotal       int            `json:"rules_total"`
	SetsTotal        int            `json:"sets_total"`
	SetElements      map[string]int `json:"set_elements"`       // set_name -> count
	CountersEnabled  bool           `json:"counters_enabled"`   // Whether rules have counters
	LastApplyLatency float64        `json:"last_apply_latency"` // Seconds
}

NFTablesMetrics contains nftables metrics

type PressureCalculator

type PressureCalculator struct {
	// contains filtered or unexported fields
}

PressureCalculator computes pressure scores from snapshots

func NewPressureCalculator

func NewPressureCalculator(cfg *Config) *PressureCalculator

NewPressureCalculator creates a new pressure calculator

func (*PressureCalculator) Calculate

func (p *PressureCalculator) Calculate(snapshot *Snapshot) map[Dimension]float64

Calculate computes pressure scores from a snapshot

type PressureState

type PressureState struct {
	Timestamp time.Time

	// Per-dimension scores (0-100)
	Scores map[Dimension]float64

	// Per-dimension levels
	Levels map[Dimension]Level

	// Derived operating mode
	Mode Mode

	// Time in current mode
	ModeEnteredAt time.Time
	ModeDuration  time.Duration
}

PressureState holds the current pressure state for all dimensions

func NewPressureState

func NewPressureState() *PressureState

NewPressureState creates a new pressure state with all dimensions at OK

func (*PressureState) WorstLevel

func (ps *PressureState) WorstLevel() Level

WorstLevel returns the worst level across all dimensions

type ProcessCollector

type ProcessCollector struct {
	BaseCollector
	// contains filtered or unexported fields
}

ProcessCollector collects process-level metrics

func NewProcessCollector

func NewProcessCollector() (*ProcessCollector, error)

NewProcessCollector creates a new process collector

func (*ProcessCollector) Collect

func (c *ProcessCollector) Collect(ctx context.Context, snapshot *Snapshot) error

Collect gathers process metrics

type ProcessMetrics

type ProcessMetrics struct {
	PID     int     `json:"pid"`
	RSS     uint64  `json:"rss_bytes"`      // Resident Set Size
	VMS     uint64  `json:"vms_bytes"`      // Virtual Memory Size
	CPUPct  float64 `json:"cpu_percent"`    // CPU percentage (smoothed)
	FDs     int     `json:"open_fds"`       // Open file descriptors
	Threads int     `json:"threads"`        // Thread count
	Uptime  float64 `json:"uptime_seconds"` // Process uptime
}

ProcessMetrics contains process-level metrics

type Recorder

type Recorder struct {
	// contains filtered or unexported fields
}

Recorder maintains a flight recorder for watchdog events

func NewRecorder

func NewRecorder(cfg *Config) *Recorder

NewRecorder creates a new flight recorder

func (*Recorder) Cleanup

func (r *Recorder) Cleanup() error

Cleanup removes old files beyond retention period

func (*Recorder) Flush

func (r *Recorder) Flush() error

Flush writes all pending data to disk

func (*Recorder) GetRecentEvents

func (r *Recorder) GetRecentEvents(count int) []Event

GetRecentEvents returns recent events from the ring buffer

func (*Recorder) GetStats

func (r *Recorder) GetStats() RecorderStats

GetStats returns recorder statistics

func (*Recorder) RecordAction

func (r *Recorder) RecordAction(action Action, snapshot *Snapshot)

RecordAction records an action taken

func (*Recorder) RecordEvent

func (r *Recorder) RecordEvent(event Event)

RecordEvent adds an event to the ring buffer

func (*Recorder) RecordModeChange

func (r *Recorder) RecordModeChange(oldMode, newMode Mode, snapshot *Snapshot)

RecordModeChange records a mode transition event

func (*Recorder) RecordPressureChange

func (r *Recorder) RecordPressureChange(dim Dimension, oldLevel, newLevel Level, score float64)

RecordPressureChange records a pressure level change

func (*Recorder) RecordSnapshot

func (r *Recorder) RecordSnapshot(snapshot *Snapshot)

RecordSnapshot records a periodic snapshot

func (*Recorder) RecordThresholdBreach

func (r *Recorder) RecordThresholdBreach(dim Dimension, score float64, threshold float64, snapshot *Snapshot)

RecordThresholdBreach records a threshold breach

type RecorderStats

type RecorderStats struct {
	EventsInBuffer    int       `json:"events_in_buffer"`
	MaxEvents         int       `json:"max_events"`
	LastEventWrite    time.Time `json:"last_event_write"`
	LastSnapshotWrite time.Time `json:"last_snapshot_write"`
}

RecorderStats contains recorder statistics
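RecorderMaxEvents is described as a ring buffer size. A minimal fixed-size ring buffer with a GetRecentEvents-style accessor might look like the sketch below; returning events oldest first is an assumption, and Event is trimmed to a single field.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is trimmed to one field for this sketch.
type Event struct{ Message string }

// ringBuffer keeps the most recent len(buf) events.
type ringBuffer struct {
	mu   sync.Mutex
	buf  []Event
	next int  // slot the next event is written to
	full bool // true once the buffer has wrapped
}

func newRingBuffer(maxEvents int) *ringBuffer {
	if maxEvents < 1 {
		maxEvents = 1
	}
	return &ringBuffer{buf: make([]Event, maxEvents)}
}

// Add stores e, overwriting the oldest event when the buffer is full.
func (r *ringBuffer) Add(e Event) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Recent returns up to count of the newest events, oldest first.
func (r *ringBuffer) Recent(count int) []Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	n := r.next
	if r.full {
		n = len(r.buf)
	}
	if count > n {
		count = n
	}
	if count <= 0 {
		return nil
	}
	out := make([]Event, 0, count)
	start := (r.next - count + len(r.buf)) % len(r.buf)
	for i := 0; i < count; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}

func main() {
	r := newRingBuffer(3)
	for _, m := range []string{"a", "b", "c", "d"} {
		r.Add(Event{Message: m})
	}
	fmt.Println(r.Recent(2)) // [{c} {d}]
}
```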

type RuntimeCollector

type RuntimeCollector struct {
	BaseCollector
}

RuntimeCollector collects Go runtime metrics

func NewRuntimeCollector

func NewRuntimeCollector() *RuntimeCollector

NewRuntimeCollector creates a new runtime collector

func (*RuntimeCollector) Collect

func (c *RuntimeCollector) Collect(ctx context.Context, snapshot *Snapshot) error

Collect gathers Go runtime metrics

type RuntimeControls

type RuntimeControls struct {
	// Worker pool sizing
	MaxWorkers atomic.Int32

	// Feature toggles
	EnableExpensiveCollectors atomic.Bool
	EnableNFTRulesetScan      atomic.Bool
	EnableTopProcesses        atomic.Bool
	EnableVerboseLogging      atomic.Bool

	// Sampling factors (1.0 = normal, 0.5 = half, etc.)
	TelemetrySamplingFactor atomic.Uint32 // stored as percent (100 = 1.0)
	// contains filtered or unexported fields
}

RuntimeControls allows the watchdog to adjust daemon behavior dynamically. All values are atomic for thread-safe access without locks

func NewRuntimeControls

func NewRuntimeControls() *RuntimeControls

NewRuntimeControls creates controls with defaults for NORMAL mode

func (*RuntimeControls) GetMode

func (rc *RuntimeControls) GetMode() Mode

GetMode returns the current mode

func (*RuntimeControls) GetSamplingFactor

func (rc *RuntimeControls) GetSamplingFactor() float64

GetSamplingFactor returns the sampling factor as a float (0.0-1.0)

func (*RuntimeControls) Reset

func (rc *RuntimeControls) Reset()

Reset sets all controls to NORMAL mode defaults

func (*RuntimeControls) SetMode

func (rc *RuntimeControls) SetMode(mode Mode)

SetMode applies settings for the given operating mode
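A trimmed sketch of the atomic controls pattern: the watchdog stores new values, and worker goroutines load them without locking. TelemetrySamplingFactor follows the documented percent encoding (100 = 1.0). The per-mode worker counts and sampling percentages are hypothetical; in the package they presumably derive from Config fields such as DegradeReduceWorkersTo, SurvivalReduceWorkersTo, and DegradeDisableNFTRulesetScan. atomic.Int32, atomic.Bool, and atomic.Uint32 need Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Mode string

const (
	ModeNormal   Mode = "normal"
	ModeDegraded Mode = "degraded"
	ModeSurvival Mode = "survival"
)

// controls is a trimmed, hypothetical stand-in for RuntimeControls.
type controls struct {
	MaxWorkers              atomic.Int32
	EnableNFTRulesetScan    atomic.Bool
	TelemetrySamplingFactor atomic.Uint32 // stored as percent (100 = 1.0)
}

// SamplingFactor converts the stored percent back to a fraction.
func (c *controls) SamplingFactor() float64 {
	return float64(c.TelemetrySamplingFactor.Load()) / 100
}

// setMode stores per-mode settings; the numbers are illustrative only.
func (c *controls) setMode(m Mode) {
	switch m {
	case ModeSurvival:
		c.MaxWorkers.Store(1)
		c.EnableNFTRulesetScan.Store(false)
		c.TelemetrySamplingFactor.Store(25)
	case ModeDegraded:
		c.MaxWorkers.Store(2)
		c.EnableNFTRulesetScan.Store(false)
		c.TelemetrySamplingFactor.Store(50)
	default: // ModeNormal and anything unrecognized
		c.MaxWorkers.Store(4)
		c.EnableNFTRulesetScan.Store(true)
		c.TelemetrySamplingFactor.Store(100)
	}
}

func main() {
	c := &controls{}
	c.setMode(ModeDegraded)
	// A worker would read these on each iteration, without locking.
	fmt.Println(c.MaxWorkers.Load(), c.EnableNFTRulesetScan.Load(), c.SamplingFactor()) // 2 false 0.5
}
```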

type RuntimeMetrics

type RuntimeMetrics struct {
	Goroutines     int     `json:"goroutines"`
	HeapAlloc      uint64  `json:"heap_alloc_bytes"`
	HeapInuse      uint64  `json:"heap_inuse_bytes"`
	HeapReleased   uint64  `json:"heap_released_bytes"`
	HeapSys        uint64  `json:"heap_sys_bytes"`
	StackInuse     uint64  `json:"stack_inuse_bytes"`
	GCCPUFraction  float64 `json:"gc_cpu_fraction"`
	GCPauseNs      uint64  `json:"gc_pause_ns"`       // Last GC pause
	GCPauseTotalNs uint64  `json:"gc_pause_total_ns"` // Total GC pause time
	NumGC          uint32  `json:"num_gc"`            // Number of completed GC cycles
	GOMEMLIMIT     int64   `json:"gomemlimit_bytes"`  // GOMEMLIMIT value (-1 if not set)
}

RuntimeMetrics contains Go runtime metrics
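Most RuntimeMetrics fields map directly onto runtime.MemStats. The sketch below is one way to fill them; the -1 convention for GOMEMLIMIT follows the field comment, while detecting "not set" via math.MaxInt64 (the value debug.SetMemoryLimit reports when no limit is configured) is an assumption about how the package does it.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"runtime/debug"
)

// RuntimeMetrics mirrors the documented struct (JSON tags omitted).
type RuntimeMetrics struct {
	Goroutines     int
	HeapAlloc      uint64
	HeapInuse      uint64
	HeapReleased   uint64
	HeapSys        uint64
	StackInuse     uint64
	GCCPUFraction  float64
	GCPauseNs      uint64
	GCPauseTotalNs uint64
	NumGC          uint32
	GOMEMLIMIT     int64
}

func collectRuntime() RuntimeMetrics {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // briefly stops the world, so avoid tight loops

	// A negative argument leaves the limit unchanged and returns it. With
	// no limit configured it is math.MaxInt64, reported here as -1.
	limit := debug.SetMemoryLimit(-1)
	if limit == math.MaxInt64 {
		limit = -1
	}

	return RuntimeMetrics{
		Goroutines:     runtime.NumGoroutine(),
		HeapAlloc:      ms.HeapAlloc,
		HeapInuse:      ms.HeapInuse,
		HeapReleased:   ms.HeapReleased,
		HeapSys:        ms.HeapSys,
		StackInuse:     ms.StackInuse,
		GCCPUFraction:  ms.GCCPUFraction,
		GCPauseNs:      ms.PauseNs[(ms.NumGC+255)%256], // most recent pause (circular buffer)
		GCPauseTotalNs: ms.PauseTotalNs,
		NumGC:          ms.NumGC,
		GOMEMLIMIT:     limit,
	}
}

func main() {
	m := collectRuntime()
	fmt.Printf("goroutines=%d heap_alloc=%d num_gc=%d gomemlimit=%d\n",
		m.Goroutines, m.HeapAlloc, m.NumGC, m.GOMEMLIMIT)
}
```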

type Snapshot

type Snapshot struct {
	Timestamp time.Time `json:"timestamp"`

	// Process metrics
	Process ProcessMetrics `json:"process"`

	// Go runtime metrics
	Runtime RuntimeMetrics `json:"runtime"`

	// System metrics
	System SystemMetrics `json:"system"`

	// Kernel/netfilter metrics
	Kernel KernelMetrics `json:"kernel"`

	// nftables metrics
	NFTables NFTablesMetrics `json:"nftables"`
}

Snapshot contains all collected metrics at a point in time

type StateMachine

type StateMachine struct {
	// contains filtered or unexported fields
}

StateMachine manages pressure state transitions

func NewStateMachine

func NewStateMachine(cfg *Config) *StateMachine

NewStateMachine creates a new state machine

func (*StateMachine) GetLevel

func (sm *StateMachine) GetLevel(dim Dimension) Level

GetLevel returns the level for a specific dimension

func (*StateMachine) GetLevelDuration

func (sm *StateMachine) GetLevelDuration(dim Dimension) time.Duration

GetLevelDuration returns time at current level for a dimension

func (*StateMachine) GetMode

func (sm *StateMachine) GetMode() Mode

GetMode returns the current operating mode

func (*StateMachine) GetModeDuration

func (sm *StateMachine) GetModeDuration() time.Duration

GetModeDuration returns time in current mode

func (*StateMachine) GetScore

func (sm *StateMachine) GetScore(dim Dimension) float64

GetScore returns the score for a specific dimension

func (*StateMachine) GetState

func (sm *StateMachine) GetState() *PressureState

GetState returns a copy of the current state

func (*StateMachine) IsStable

func (sm *StateMachine) IsStable() bool

IsStable returns true if no dimension is transitioning

func (*StateMachine) SetOnModeChange

func (sm *StateMachine) SetOnModeChange(cb func(old, new Mode))

SetOnModeChange sets the callback for mode changes

func (*StateMachine) SetOnStateChange

func (sm *StateMachine) SetOnStateChange(cb func(old, new *PressureState))

SetOnStateChange sets the callback for state changes

func (*StateMachine) Update

func (sm *StateMachine) Update(scores map[Dimension]float64) *PressureState

Update processes new scores and updates state

type SystemCollector

type SystemCollector struct {
	BaseCollector
	// contains filtered or unexported fields
}

SystemCollector collects OS-level metrics

func NewSystemCollector

func NewSystemCollector(diskPath string) *SystemCollector

NewSystemCollector creates a new system collector

func (*SystemCollector) Collect

func (c *SystemCollector) Collect(ctx context.Context, snapshot *Snapshot) error

Collect gathers system metrics

type SystemMetrics

type SystemMetrics struct {
	LoadAvg1   float64 `json:"load_avg_1"`
	LoadAvg5   float64 `json:"load_avg_5"`
	LoadAvg15  float64 `json:"load_avg_15"`
	NumCPU     int     `json:"num_cpu"`
	IOWaitPct  float64 `json:"iowait_percent"`
	MemTotal   uint64  `json:"mem_total_bytes"`
	MemFree    uint64  `json:"mem_free_bytes"`
	MemAvail   uint64  `json:"mem_available_bytes"`
	SwapTotal  uint64  `json:"swap_total_bytes"`
	SwapFree   uint64  `json:"swap_free_bytes"`
	DiskUsePct float64 `json:"disk_use_percent"`  // Log partition
	Entropy    int     `json:"entropy_available"` // /proc/sys/kernel/random/entropy_avail
}

SystemMetrics contains OS-level metrics

type Watchdog

type Watchdog struct {
	// contains filtered or unexported fields
}

Watchdog is the main watchdog coordinator

func New

func New(cfg *Config, controls *RuntimeControls) (*Watchdog, error)

New creates a new watchdog

func (*Watchdog) GetControls

func (w *Watchdog) GetControls() *RuntimeControls

GetControls returns the runtime controls

func (*Watchdog) GetMode

func (w *Watchdog) GetMode() Mode

GetMode returns the current operating mode

func (*Watchdog) GetRecentEvents

func (w *Watchdog) GetRecentEvents(count int) []Event

GetRecentEvents returns recent events from the flight recorder

func (*Watchdog) GetRecorderStats

func (w *Watchdog) GetRecorderStats() RecorderStats

GetRecorderStats returns flight recorder statistics

func (*Watchdog) GetSnapshot

func (w *Watchdog) GetSnapshot() *Snapshot

GetSnapshot returns the last collected snapshot

func (*Watchdog) GetState

func (w *Watchdog) GetState() *PressureState

GetState returns the current pressure state

func (*Watchdog) IsRunning

func (w *Watchdog) IsRunning() bool

IsRunning returns whether the watchdog is running

func (*Watchdog) RecordNFTApplyLatency

func (w *Watchdog) RecordNFTApplyLatency(latency time.Duration)

RecordNFTApplyLatency records nft apply latency

func (*Watchdog) Run

func (w *Watchdog) Run(ctx context.Context) error

Run starts the watchdog loop

func (*Watchdog) SetOnMetrics

func (w *Watchdog) SetOnMetrics(cb func(*Snapshot, *PressureState))

SetOnMetrics sets a callback for metrics updates
