Documentation ¶
Overview ¶
=============================================================================
NFTBan v1.0 - Dynamic Watchdog Package
=============================================================================
SPDX-License-Identifier: MPL-2.0
Package watchdog implements a dynamic runtime watchdog for nftban that:
- Continuously monitors system + process + kernel/netfilter + nftables signals
- Computes a pressure state (OK/WARN/CRITICAL) per dimension (CPU, MEM, IO, NET)
- Automatically adjusts daemon behavior to match the current pressure state
- Triggers forensic capture (pprof) during incidents
- Exposes metrics in Prometheus format
- Maintains an on-disk flight recorder for post-mortem analysis
Architecture:
┌─────────────────────────────────────────────────────────────┐
│                        Watchdog Core                        │
│  ┌─────────────────────────────────────────────────────────┐│
│  │                      State Machine                      ││
│  │             NORMAL ←→ DEGRADED ←→ SURVIVAL              ││
│  └─────────────────────────────────────────────────────────┘│
│                             │                               │
│       ┌─────────────────────┴─────────────────────┐         │
│       ▼                                           ▼         │
│  ┌─────────────────┐                 ┌─────────────────────┐│
│  │   Collectors    │                 │       Actions       ││
│  │  - Process      │                 │  - Throttle         ││
│  │  - Runtime      │                 │  - Profile          ││
│  │  - System       │                 │  - Memory Valve     ││
│  │  - Netfilter    │                 │  - Degrade Mode     ││
│  │  - nftables     │                 └─────────────────────┘│
│  └─────────────────┘                                        │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────────────┐                                │
│  │     Flight Recorder     │                                │
│  │  - Event JSON           │                                │
│  │  - Periodic Snapshots   │                                │
│  │  - pprof Profiles       │                                │
│  └─────────────────────────┘                                │
└─────────────────────────────────────────────────────────────┘
Pressure Dimensions:
- CPU: Process CPU%, system load, softnet drops
- MEM: RSS vs budget, heap vs GOMEMLIMIT, GC fraction, RSS slope
- IO: iowait%, disk usage, log partition fullness
- NET: conntrack utilization, softnet drops rate, nft apply latency
Operating Modes:
- NORMAL: All dimensions OK. Full functionality.
- DEGRADED: Any dimension WARN. Reduced frequency, backpressure enabled.
- SURVIVAL: Any dimension CRITICAL. Essential operations only.
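The mapping from per-dimension levels to an operating mode can be sketched as a "worst level wins" rule. The snippet below is a minimal illustration, not the package's implementation: the `Level` and `Mode` definitions, the constant names, and the helpers `worstLevel` and `deriveMode` are assumptions introduced here, since the package does not show its own definitions on this page.

```go
package main

// Level and Mode are hypothetical stand-ins for the package's own types.
type Level int

const (
	LevelOK Level = iota
	LevelWarn
	LevelCritical
)

type Mode string

const (
	ModeNormal   Mode = "NORMAL"
	ModeDegraded Mode = "DEGRADED"
	ModeSurvival Mode = "SURVIVAL"
)

// worstLevel returns the highest level found across all dimensions.
// An empty map yields LevelOK.
func worstLevel(levels map[string]Level) Level {
	worst := LevelOK
	for _, l := range levels {
		if l > worst {
			worst = l
		}
	}
	return worst
}

// deriveMode maps the worst per-dimension level to an operating mode:
// any CRITICAL gives SURVIVAL, otherwise any WARN gives DEGRADED.
func deriveMode(levels map[string]Level) Mode {
	switch worstLevel(levels) {
	case LevelCritical:
		return ModeSurvival
	case LevelWarn:
		return ModeDegraded
	default:
		return ModeNormal
	}
}
```

With this rule, a single dimension at WARN is enough to leave NORMAL, and a single dimension at CRITICAL overrides any number of WARN dimensions.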
Thread Safety:
All components are designed for concurrent access. The watchdog's main loop runs in a single goroutine, while the collectors run in parallel.
Usage:
cfg := watchdog.DefaultConfig()
w, err := watchdog.New(cfg, controls)
if err != nil {
	return err
}
go w.Run(ctx)
=============================================================================
Index ¶
- type Action
- type ActionExecutor
- func (e *ActionExecutor) ApplyMode(mode Mode, snapshot *Snapshot) error
- func (e *ActionExecutor) CaptureCPUProfile(reason string, snapshot *Snapshot) (string, error)
- func (e *ActionExecutor) CaptureGoroutineProfile(reason string, snapshot *Snapshot) (string, error)
- func (e *ActionExecutor) CaptureHeapProfile(reason string, snapshot *Snapshot) (string, error)
- func (e *ActionExecutor) DisableOptional(reason string) error
- func (e *ActionExecutor) SetOnAction(cb func(Action))
- func (e *ActionExecutor) Throttle(reason string) error
- func (e *ActionExecutor) TryFreeOSMemory(memCriticalDuration time.Duration, cpuPercent float64, snapshot *Snapshot) (bool, error)
- type ActionType
- type BaseCollector
- type Collector
- type Config
- type Cooldowns
- type Dimension
- type Event
- type EventType
- type KernelCollector
- type KernelMetrics
- type Level
- type MetricsExporter
- type Mode
- type NFTablesCollector
- type NFTablesMetrics
- type PressureCalculator
- type PressureState
- type ProcessCollector
- type ProcessMetrics
- type Recorder
- func (r *Recorder) Cleanup() error
- func (r *Recorder) Flush() error
- func (r *Recorder) GetRecentEvents(count int) []Event
- func (r *Recorder) GetStats() RecorderStats
- func (r *Recorder) RecordAction(action Action, snapshot *Snapshot)
- func (r *Recorder) RecordEvent(event Event)
- func (r *Recorder) RecordModeChange(oldMode, newMode Mode, snapshot *Snapshot)
- func (r *Recorder) RecordPressureChange(dim Dimension, oldLevel, newLevel Level, score float64)
- func (r *Recorder) RecordSnapshot(snapshot *Snapshot)
- func (r *Recorder) RecordThresholdBreach(dim Dimension, score float64, threshold float64, snapshot *Snapshot)
- type RecorderStats
- type RuntimeCollector
- type RuntimeControls
- type RuntimeMetrics
- type Snapshot
- type StateMachine
- func (sm *StateMachine) GetLevel(dim Dimension) Level
- func (sm *StateMachine) GetLevelDuration(dim Dimension) time.Duration
- func (sm *StateMachine) GetMode() Mode
- func (sm *StateMachine) GetModeDuration() time.Duration
- func (sm *StateMachine) GetScore(dim Dimension) float64
- func (sm *StateMachine) GetState() *PressureState
- func (sm *StateMachine) IsStable() bool
- func (sm *StateMachine) SetOnModeChange(cb func(old, new Mode))
- func (sm *StateMachine) SetOnStateChange(cb func(old, new *PressureState))
- func (sm *StateMachine) Update(scores map[Dimension]float64) *PressureState
- type SystemCollector
- type SystemMetrics
- type Watchdog
- func (w *Watchdog) GetControls() *RuntimeControls
- func (w *Watchdog) GetMode() Mode
- func (w *Watchdog) GetRecentEvents(count int) []Event
- func (w *Watchdog) GetRecorderStats() RecorderStats
- func (w *Watchdog) GetSnapshot() *Snapshot
- func (w *Watchdog) GetState() *PressureState
- func (w *Watchdog) IsRunning() bool
- func (w *Watchdog) RecordNFTApplyLatency(latency time.Duration)
- func (w *Watchdog) Run(ctx context.Context) error
- func (w *Watchdog) SetOnMetrics(cb func(*Snapshot, *PressureState))
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Action ¶
type Action struct {
Type ActionType `json:"type"`
Timestamp time.Time `json:"timestamp"`
Reason string `json:"reason"`
Details string `json:"details,omitempty"`
Success bool `json:"success"`
}
Action represents an action taken by the watchdog
type ActionExecutor ¶
type ActionExecutor struct {
// contains filtered or unexported fields
}
ActionExecutor handles action execution with cooldowns
func NewActionExecutor ¶
func NewActionExecutor(cfg *Config, controls *RuntimeControls) *ActionExecutor
NewActionExecutor creates a new action executor
func (*ActionExecutor) ApplyMode ¶
func (e *ActionExecutor) ApplyMode(mode Mode, snapshot *Snapshot) error
ApplyMode applies settings for the given mode
func (*ActionExecutor) CaptureCPUProfile ¶
func (e *ActionExecutor) CaptureCPUProfile(reason string, snapshot *Snapshot) (string, error)
CaptureCPUProfile captures a CPU profile
func (*ActionExecutor) CaptureGoroutineProfile ¶
func (e *ActionExecutor) CaptureGoroutineProfile(reason string, snapshot *Snapshot) (string, error)
CaptureGoroutineProfile captures a goroutine profile
func (*ActionExecutor) CaptureHeapProfile ¶
func (e *ActionExecutor) CaptureHeapProfile(reason string, snapshot *Snapshot) (string, error)
CaptureHeapProfile captures a heap profile
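As a rough illustration of what a heap capture involves, the sketch below writes a heap profile to a uniquely named file using the standard `runtime/pprof` package. The function name `captureHeapProfile`, its parameters, and the file naming scheme are assumptions made for this example; the package's own method also takes a `*Snapshot` and applies cooldowns, which are omitted here.

```go
package main

import (
	"os"
	"runtime/pprof"
)

// captureHeapProfile writes a heap profile into dir and returns the file path.
// An empty dir means the default temporary directory. The reason becomes part
// of the file name, so it must not contain a path separator.
func captureHeapProfile(dir, reason string) (string, error) {
	f, err := os.CreateTemp(dir, "heap-"+reason+"-*.pprof")
	if err != nil {
		return "", err
	}
	if err := pprof.WriteHeapProfile(f); err != nil {
		f.Close()
		os.Remove(f.Name()) // do not leave a truncated profile behind
		return "", err
	}
	if err := f.Close(); err != nil {
		return "", err
	}
	return f.Name(), nil
}
```

The resulting file can be inspected with `go tool pprof`. A production version would probably also enforce a maximum number of retained profiles, in the spirit of `ProfileMaxCount`.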
func (*ActionExecutor) DisableOptional ¶
func (e *ActionExecutor) DisableOptional(reason string) error
DisableOptional disables optional collectors
func (*ActionExecutor) SetOnAction ¶
func (e *ActionExecutor) SetOnAction(cb func(Action))
SetOnAction sets the callback for when actions are taken
func (*ActionExecutor) Throttle ¶
func (e *ActionExecutor) Throttle(reason string) error
Throttle reduces worker count and disables optional work
func (*ActionExecutor) TryFreeOSMemory ¶
func (e *ActionExecutor) TryFreeOSMemory(
	memCriticalDuration time.Duration,
	cpuPercent float64,
	snapshot *Snapshot,
) (bool, error)
TryFreeOSMemory calls debug.FreeOSMemory only if all of the following safe conditions are met:
- MEM CRITICAL for >= 30s
- CPU below threshold (not during high load)
- Cooldown passed
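The three conditions above amount to a simple guard in front of `debug.FreeOSMemory`. The following sketch shows one way such a guard could look; the `memoryValve` type and its field names are hypothetical, and unlike the real executor it is not safe for concurrent use.

```go
package main

import (
	"runtime/debug"
	"time"
)

// memoryValve is a hypothetical helper; the field names are assumptions.
type memoryValve struct {
	minCriticalFor time.Duration // e.g. 30s, per the conditions above
	cpuBelowPct    float64       // plays the role of FreeOSMemoryOnlyIfCPUBelow
	cooldown       time.Duration // plays the role of FreeOSMemoryCooldown
	lastRun        time.Time     // zero value means "never ran"
}

// tryFree releases memory to the OS only when all three conditions hold.
// It reports whether debug.FreeOSMemory was actually called.
func (v *memoryValve) tryFree(memCriticalFor time.Duration, cpuPct float64) bool {
	if memCriticalFor < v.minCriticalFor {
		return false // MEM has not been CRITICAL for long enough
	}
	if cpuPct >= v.cpuBelowPct {
		return false // avoid forcing a GC while the CPU is busy
	}
	if !v.lastRun.IsZero() && time.Since(v.lastRun) < v.cooldown {
		return false // still cooling down from the previous call
	}
	debug.FreeOSMemory()
	v.lastRun = time.Now()
	return true
}
```

The CPU check matters because `debug.FreeOSMemory` forces a garbage collection before returning memory, which adds load at exactly the wrong moment if the process is already CPU-bound.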
type ActionType ¶
type ActionType string
ActionType represents a watchdog action
const (
	ActionThrottle         ActionType = "throttle"
	ActionDisableOptional  ActionType = "disable_optional"
	ActionProfileCPU       ActionType = "profile_cpu"
	ActionProfileHeap      ActionType = "profile_heap"
	ActionProfileGoroutine ActionType = "profile_goroutine"
	ActionFreeOSMemory     ActionType = "free_os_memory"
	ActionDegradeMode      ActionType = "degrade_mode"
)
type BaseCollector ¶
type BaseCollector struct {
// contains filtered or unexported fields
}
BaseCollector provides common functionality for collectors
func NewBaseCollector ¶
func NewBaseCollector(name string) BaseCollector
NewBaseCollector creates a new base collector
func (*BaseCollector) Enabled ¶
func (b *BaseCollector) Enabled() bool
Enabled returns whether the collector is enabled
func (*BaseCollector) SetEnabled ¶
func (b *BaseCollector) SetEnabled(enabled bool)
SetEnabled sets the enabled state
type Collector ¶
type Collector interface {
Name() string
Collect(ctx context.Context, snapshot *Snapshot) error
Enabled() bool
SetEnabled(enabled bool)
}
Collector defines the interface for metrics collectors
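To make the interface concrete, here is a sketch of running every enabled collector in parallel and waiting for all of them. It is an illustration under several assumptions: `Snapshot` is replaced by a simplified mutex-guarded map, `staticCollector` and `collectAll` are names invented for this example, and `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"context"
	"errors"
	"sync"
)

// Snapshot is a simplified stand-in: the real type has one struct field per
// metric group, whereas this one uses a mutex-guarded map.
type Snapshot struct {
	mu     sync.Mutex
	Values map[string]float64
}

// Collector mirrors the interface documented above.
type Collector interface {
	Name() string
	Collect(ctx context.Context, snapshot *Snapshot) error
	Enabled() bool
	SetEnabled(enabled bool)
}

// staticCollector is a hypothetical collector that reports a fixed value.
type staticCollector struct {
	name    string
	value   float64
	enabled bool
}

func (s *staticCollector) Name() string            { return s.name }
func (s *staticCollector) Enabled() bool           { return s.enabled }
func (s *staticCollector) SetEnabled(enabled bool) { s.enabled = enabled }

// Collect ignores ctx because it does no blocking work; a collector that
// reads files or runs commands should honor cancellation.
func (s *staticCollector) Collect(_ context.Context, snap *Snapshot) error {
	snap.mu.Lock()
	defer snap.mu.Unlock()
	snap.Values[s.name] = s.value
	return nil
}

// collectAll runs each enabled collector in its own goroutine, waits for all
// of them, and returns the combined errors (nil if none failed).
func collectAll(ctx context.Context, collectors []Collector, snap *Snapshot) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, c := range collectors {
		if !c.Enabled() {
			continue
		}
		wg.Add(1)
		go func(c Collector) {
			defer wg.Done()
			if err := c.Collect(ctx, snap); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return errors.Join(errs...)
}
```

Disabled collectors are skipped entirely, which is one plausible way for `DisableOptional` to shed load without changing the collection loop.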
type Config ¶
type Config struct {
// Enable/disable watchdog
Enabled bool
// Base collection interval (adjusts dynamically)
BaseInterval time.Duration
// ==========================================================================
// Hysteresis Settings (prevents flapping)
// ==========================================================================
HysteresisWarnEnter float64 // Enter WARN at this score
HysteresisWarnExit float64 // Exit WARN when below this for WarnExitDuration
HysteresisCritEnter float64 // Enter CRITICAL at this score
HysteresisCritExit float64 // Exit CRITICAL when below this for CritExitDuration
WarnExitDuration time.Duration
CritExitDuration time.Duration
// Memory
MemBudgetBytes int64 // Align with GOMEMLIMIT
RSSCritPercentOfBudget float64 // RSS critical threshold as % of budget
HeapCritPercentOfLimit float64 // Heap critical threshold as % of GOMEMLIMIT
// CPU
CPUCritPercent float64 // Process CPU critical threshold
LoadNormalizedCrit float64 // Load/NumCPU critical threshold
// IO
IOWaitCritPercent float64 // iowait critical threshold
DiskUseCritPercent float64 // Disk usage critical threshold
// Network/Conntrack
ConntrackUtilCrit float64 // Conntrack utilization critical threshold
SoftnetDropsRateCrit float64 // Softnet drops per second critical threshold
// Goroutines
GoroutinesCrit int // Goroutine count critical threshold
// ==========================================================================
// Collector Intervals (dynamic - change based on mode)
// ==========================================================================
ProcessInterval time.Duration // Process/runtime metrics
SystemInterval time.Duration // System metrics
KernelInterval time.Duration // Kernel/netfilter metrics
NFTSetInterval time.Duration // nft set counts (cheap)
NFTRulesetInterval time.Duration // nft full ruleset (expensive)
TopProcessesInterval time.Duration // Top hogs
// Degraded mode multipliers
DegradedIntervalMultiplier float64 // Multiply intervals by this in DEGRADED
// ==========================================================================
// Profiling Settings
// ==========================================================================
ProfileAutoEnabled bool
ProfileDir string
ProfileCPUCooldown time.Duration
ProfileHeapCooldown time.Duration
ProfileGoroutineCooldown time.Duration
ProfileCPUDuration time.Duration // Duration of CPU profile
ProfileMaxCount int // Max profiles to keep
// ==========================================================================
// Memory Valve Settings
// ==========================================================================
FreeOSMemoryEnabled bool
FreeOSMemoryCooldown time.Duration
FreeOSMemoryOnlyIfCPUBelow float64 // Only trigger if CPU below this %
// ==========================================================================
// Degradation Settings
// ==========================================================================
DegradeDisableNFTRulesetScan bool
DegradeReduceWorkersTo int
SurvivalReduceWorkersTo int
// ==========================================================================
// Flight Recorder Settings
// ==========================================================================
RecorderEnabled bool
RecorderDir string
RecorderMaxEvents int // Ring buffer size
RecorderSnapshotInterval time.Duration // How often to write snapshots
RecorderRetentionDays int // Days to keep old files
// ==========================================================================
// Alerting
// ==========================================================================
AlertThrottleDuration time.Duration // Don't repeat same alert within this
}
Config holds watchdog configuration
func DefaultConfig ¶
func DefaultConfig() *Config
DefaultConfig returns configuration with sensible defaults
func LoadConfig ¶
LoadConfig loads the watchdog configuration from a file. It falls back to defaults for missing values.
func (*Config) GetIntervalForMode ¶
GetIntervalForMode returns the adjusted interval for the given mode
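The signature of GetIntervalForMode is not shown on this page, so the following is only a sketch of the idea behind `DegradedIntervalMultiplier`: stretch a base interval when the watchdog is under pressure. The function name `intervalForMode`, the `Mode` constants, and the treatment of SURVIVAL are all assumptions.

```go
package main

import "time"

// Mode is a hypothetical stand-in for the package's Mode type.
type Mode string

const (
	ModeNormal   Mode = "NORMAL"
	ModeDegraded Mode = "DEGRADED"
	ModeSurvival Mode = "SURVIVAL"
)

// intervalForMode scales a base collection interval for the given mode.
// Multipliers of 1 or less are ignored so that pressure never makes
// collection more frequent.
func intervalForMode(base time.Duration, mode Mode, degradedMultiplier float64) time.Duration {
	if mode == ModeNormal || degradedMultiplier <= 1 {
		return base
	}
	// Assumption: SURVIVAL reuses the DEGRADED multiplier; the real package
	// may slow down further or skip some collectors entirely.
	return time.Duration(float64(base) * degradedMultiplier)
}
```

For example, a 10s base interval with a multiplier of 2.0 becomes 20s in DEGRADED mode and stays at 10s in NORMAL mode.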
type Cooldowns ¶
type Cooldowns struct {
// contains filtered or unexported fields
}
Cooldowns tracks action cooldowns to prevent repeated actions
func (*Cooldowns) CanExecute ¶
func (c *Cooldowns) CanExecute(action ActionType, cooldown time.Duration) bool
CanExecute returns true if the action is not in cooldown
func (*Cooldowns) Record ¶
func (c *Cooldowns) Record(action ActionType)
Record marks an action as executed
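A cooldown tracker of this shape can be built from a mutex and a map of last-execution times. The sketch below mirrors the two documented method signatures, but the lowercase `cooldowns` type, its fields, and the `newCooldowns` constructor are assumptions, since the real struct's fields are unexported and not shown.

```go
package main

import (
	"sync"
	"time"
)

// ActionType is a stand-in for the package's string-based action type.
type ActionType string

// cooldowns remembers when each action last ran.
type cooldowns struct {
	mu   sync.Mutex
	last map[ActionType]time.Time
}

func newCooldowns() *cooldowns {
	return &cooldowns{last: make(map[ActionType]time.Time)}
}

// CanExecute reports whether the action has never run, or last ran at least
// one cooldown period ago.
func (c *cooldowns) CanExecute(action ActionType, cooldown time.Duration) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	last, ok := c.last[action]
	return !ok || time.Since(last) >= cooldown
}

// Record marks the action as executed now.
func (c *cooldowns) Record(action ActionType) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.last[action] = time.Now()
}
```

Callers would typically check `CanExecute` first and call `Record` only after the action actually ran. Note that two goroutines could both pass the check before either records; a combined check-and-record method would close that gap if it matters.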
type Event ¶
type Event struct {
Type EventType `json:"type"`
Timestamp time.Time `json:"timestamp"`
Message string `json:"message"`
Snapshot *Snapshot `json:"snapshot,omitempty"`
Action *Action `json:"action,omitempty"`
OldMode Mode `json:"old_mode,omitempty"`
NewMode Mode `json:"new_mode,omitempty"`
Dimension Dimension `json:"dimension,omitempty"`
Score float64 `json:"score,omitempty"`
Level Level `json:"level,omitempty"`
}
Event represents a recorded event for post-mortem analysis
type KernelCollector ¶
type KernelCollector struct {
BaseCollector
// contains filtered or unexported fields
}
KernelCollector collects kernel/netfilter metrics
func NewKernelCollector ¶
func NewKernelCollector() *KernelCollector
NewKernelCollector creates a new kernel collector
func (*KernelCollector) Collect ¶
func (c *KernelCollector) Collect(ctx context.Context, snapshot *Snapshot) error
Collect gathers kernel metrics
func (*KernelCollector) GetSoftnetDropsRate ¶
func (c *KernelCollector) GetSoftnetDropsRate() float64
GetSoftnetDropsRate returns the current softnet drops rate per second
type KernelMetrics ¶
type KernelMetrics struct {
ConntrackCount int `json:"conntrack_count"`
ConntrackMax int `json:"conntrack_max"`
ConntrackUtilization float64 `json:"conntrack_utilization"` // count/max
SoftnetDrops uint64 `json:"softnet_drops_total"` // Aggregated across CPUs
SoftnetTimeSqueeze uint64 `json:"softnet_time_squeeze_total"`
NICDrops uint64 `json:"nic_rx_dropped_total"` // Aggregated across interfaces
}
KernelMetrics contains kernel/netfilter metrics
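Two derived values appear in this area: conntrack utilization (count/max) and the softnet drops rate returned by GetSoftnetDropsRate. The arithmetic can be sketched as below; the function names and the handling of edge cases (unknown maximum, counter reset) are assumptions, not the collector's documented behavior.

```go
package main

import "time"

// conntrackUtilization returns count/limit as a fraction, or 0 when the
// limit is unknown or zero.
func conntrackUtilization(count, limit int) float64 {
	if limit <= 0 {
		return 0
	}
	return float64(count) / float64(limit)
}

// dropsRate converts two readings of a monotonically increasing counter into
// a per-second rate. It returns 0 if no time has passed or if the counter
// went backwards (for example after a reset).
func dropsRate(prev, curr uint64, elapsed time.Duration) float64 {
	if elapsed <= 0 || curr < prev {
		return 0
	}
	return float64(curr-prev) / elapsed.Seconds()
}
```

For instance, 50 tracked connections against a limit of 200 give a utilization of 0.25, and a counter that moves from 100 to 160 over two seconds gives a rate of 30 drops per second.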
type MetricsExporter ¶
type MetricsExporter struct {
// contains filtered or unexported fields
}
MetricsExporter updates Prometheus metrics from watchdog data
func NewMetricsExporter ¶
func NewMetricsExporter() *MetricsExporter
NewMetricsExporter creates a new metrics exporter
func (*MetricsExporter) RecordAction ¶
func (m *MetricsExporter) RecordAction(action Action)
RecordAction records an action in metrics
func (*MetricsExporter) Update ¶
func (m *MetricsExporter) Update(snapshot *Snapshot, state *PressureState)
Update updates all metrics from a snapshot and state
type NFTablesCollector ¶
type NFTablesCollector struct {
BaseCollector
// contains filtered or unexported fields
}
NFTablesCollector collects nftables metrics
func NewNFTablesCollector ¶
func NewNFTablesCollector(cacheDuration time.Duration) *NFTablesCollector
NewNFTablesCollector creates a new nftables collector
func (*NFTablesCollector) Collect ¶
func (c *NFTablesCollector) Collect(ctx context.Context, snapshot *Snapshot) error
Collect gathers nftables metrics
func (*NFTablesCollector) GetLastApplyLatency ¶
func (c *NFTablesCollector) GetLastApplyLatency() time.Duration
GetLastApplyLatency returns the last recorded apply latency
func (*NFTablesCollector) InvalidateCache ¶
func (c *NFTablesCollector) InvalidateCache()
InvalidateCache forces a refresh on next collection
func (*NFTablesCollector) RecordApplyLatency ¶
func (c *NFTablesCollector) RecordApplyLatency(latency time.Duration)
RecordApplyLatency records the latency of an nft apply operation. It is called by the nft backend after each apply operation.
type NFTablesMetrics ¶
type NFTablesMetrics struct {
RulesTotal int `json:"rules_total"`
SetsTotal int `json:"sets_total"`
SetElements map[string]int `json:"set_elements"` // set_name -> count
CountersEnabled bool `json:"counters_enabled"` // Whether rules have counters
LastApplyLatency float64 `json:"last_apply_latency"` // Seconds
}
NFTablesMetrics contains nftables metrics
type PressureCalculator ¶
type PressureCalculator struct {
// contains filtered or unexported fields
}
PressureCalculator computes pressure scores from snapshots
func NewPressureCalculator ¶
func NewPressureCalculator(cfg *Config) *PressureCalculator
NewPressureCalculator creates a new pressure calculator
type PressureState ¶
type PressureState struct {
Timestamp time.Time
// Per-dimension scores (0-100)
Scores map[Dimension]float64
// Per-dimension levels
Levels map[Dimension]Level
// Derived operating mode
Mode Mode
// Time in current mode
ModeEnteredAt time.Time
ModeDuration time.Duration
}
PressureState holds the current pressure state for all dimensions
func NewPressureState ¶
func NewPressureState() *PressureState
NewPressureState creates a new pressure state with all OK
func (*PressureState) WorstLevel ¶
func (ps *PressureState) WorstLevel() Level
WorstLevel returns the worst level across all dimensions
type ProcessCollector ¶
type ProcessCollector struct {
BaseCollector
// contains filtered or unexported fields
}
ProcessCollector collects process-level metrics
func NewProcessCollector ¶
func NewProcessCollector() (*ProcessCollector, error)
NewProcessCollector creates a new process collector
type ProcessMetrics ¶
type ProcessMetrics struct {
PID int `json:"pid"`
RSS uint64 `json:"rss_bytes"` // Resident Set Size
VMS uint64 `json:"vms_bytes"` // Virtual Memory Size
CPUPct float64 `json:"cpu_percent"` // CPU percentage (smoothed)
FDs int `json:"open_fds"` // Open file descriptors
Threads int `json:"threads"` // Thread count
Uptime float64 `json:"uptime_seconds"` // Process uptime
}
ProcessMetrics contains process-level metrics
type Recorder ¶
type Recorder struct {
// contains filtered or unexported fields
}
Recorder maintains a flight recorder for watchdog events
func NewRecorder ¶
NewRecorder creates a new flight recorder
func (*Recorder) GetRecentEvents ¶
GetRecentEvents returns recent events from the ring buffer
func (*Recorder) GetStats ¶
func (r *Recorder) GetStats() RecorderStats
GetStats returns recorder statistics
func (*Recorder) RecordAction ¶
RecordAction records an action taken
func (*Recorder) RecordEvent ¶
RecordEvent adds an event to the ring buffer
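A fixed-size ring buffer keeps the most recent events and silently overwrites the oldest ones. The sketch below shows the general technique behind RecordEvent and GetRecentEvents; the `ring` type, its fields, and the trimmed-down `Event` are stand-ins invented for this example.

```go
package main

import "sync"

// Event is a trimmed-down stand-in for the package's Event type.
type Event struct {
	Message string
}

// ring is a fixed-capacity buffer that overwrites its oldest entry when full.
type ring struct {
	mu    sync.Mutex
	buf   []Event
	next  int // index where the next event will be written
	count int // number of valid events, at most len(buf)
}

func newRing(size int) *ring {
	if size < 1 {
		size = 1
	}
	return &ring{buf: make([]Event, size)}
}

// add stores an event, replacing the oldest one once the buffer is full.
func (r *ring) add(e Event) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// recent returns up to n of the most recent events, oldest first.
func (r *ring) recent(n int) []Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	if n > r.count {
		n = r.count
	}
	if n < 0 {
		n = 0
	}
	out := make([]Event, 0, n)
	start := (r.next - n + len(r.buf)) % len(r.buf)
	for i := 0; i < n; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}
```

Returning a copy, as `recent` does here, lets callers keep the slice without holding the lock or racing with later writes.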
func (*Recorder) RecordModeChange ¶
RecordModeChange records a mode transition event
func (*Recorder) RecordPressureChange ¶
RecordPressureChange records a pressure level change
func (*Recorder) RecordSnapshot ¶
RecordSnapshot records a periodic snapshot
type RecorderStats ¶
type RecorderStats struct {
EventsInBuffer int `json:"events_in_buffer"`
MaxEvents int `json:"max_events"`
LastEventWrite time.Time `json:"last_event_write"`
LastSnapshotWrite time.Time `json:"last_snapshot_write"`
}
RecorderStats contains recorder statistics
type RuntimeCollector ¶
type RuntimeCollector struct {
BaseCollector
}
RuntimeCollector collects Go runtime metrics
func NewRuntimeCollector ¶
func NewRuntimeCollector() *RuntimeCollector
NewRuntimeCollector creates a new runtime collector
type RuntimeControls ¶
type RuntimeControls struct {
// Worker pool sizing
MaxWorkers atomic.Int32
// Feature toggles
EnableExpensiveCollectors atomic.Bool
EnableNFTRulesetScan atomic.Bool
EnableTopProcesses atomic.Bool
EnableVerboseLogging atomic.Bool
// Sampling factors (1.0 = normal, 0.5 = half, etc.)
TelemetrySamplingFactor atomic.Uint32 // stored as percent (100 = 1.0)
// contains filtered or unexported fields
}
RuntimeControls allows the watchdog to dynamically adjust daemon behavior. All values are atomic for thread-safe access without locks.
func NewRuntimeControls ¶
func NewRuntimeControls() *RuntimeControls
NewRuntimeControls creates controls with defaults for NORMAL mode
func (*RuntimeControls) GetMode ¶
func (rc *RuntimeControls) GetMode() Mode
GetMode returns the current mode
func (*RuntimeControls) GetSamplingFactor ¶
func (rc *RuntimeControls) GetSamplingFactor() float64
GetSamplingFactor returns the sampling factor as a float (0.0-1.0)
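Because the standard library has no atomic float type, storing the factor as an integer percent in an `atomic.Uint32` is a common workaround. The sketch below illustrates the conversion in both directions; the `controls` type and the setter are assumptions, as the page documents only the getter.

```go
package main

import "sync/atomic"

// controls is a hypothetical stand-in holding only the sampling field.
type controls struct {
	samplingPercent atomic.Uint32 // stored as percent (100 = 1.0)
}

// setSamplingFactor clamps f to [0, 1] and stores it as a rounded percent.
func (c *controls) setSamplingFactor(f float64) {
	if f < 0 {
		f = 0
	}
	if f > 1 {
		f = 1
	}
	c.samplingPercent.Store(uint32(f*100 + 0.5))
}

// samplingFactor returns the stored percent as a float in [0.0, 1.0].
func (c *controls) samplingFactor() float64 {
	return float64(c.samplingPercent.Load()) / 100
}
```

The trade-off is resolution: values are rounded to whole percents, which is usually more than enough for a sampling knob.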
func (*RuntimeControls) Reset ¶
func (rc *RuntimeControls) Reset()
Reset sets all controls to NORMAL mode defaults
func (*RuntimeControls) SetMode ¶
func (rc *RuntimeControls) SetMode(mode Mode)
SetMode applies settings for the given operating mode
type RuntimeMetrics ¶
type RuntimeMetrics struct {
Goroutines int `json:"goroutines"`
HeapAlloc uint64 `json:"heap_alloc_bytes"`
HeapInuse uint64 `json:"heap_inuse_bytes"`
HeapReleased uint64 `json:"heap_released_bytes"`
HeapSys uint64 `json:"heap_sys_bytes"`
StackInuse uint64 `json:"stack_inuse_bytes"`
GCCPUFraction float64 `json:"gc_cpu_fraction"`
GCPauseNs uint64 `json:"gc_pause_ns"` // Last GC pause
GCPauseTotalNs uint64 `json:"gc_pause_total_ns"` // Total GC pause time
NumGC uint32 `json:"num_gc"` // Number of completed GC cycles
GOMEMLIMIT int64 `json:"gomemlimit_bytes"` // GOMEMLIMIT value (-1 if not set)
}
RuntimeMetrics contains Go runtime metrics
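Most of these fields can be read from the standard `runtime` and `runtime/debug` packages. The sketch below fills a reduced, hypothetical `runtimeSample` struct; how the real RuntimeCollector gathers its values is not shown on this page. It relies on `debug.SetMemoryLimit` (Go 1.19+), which only queries the limit when given a negative argument and reports `math.MaxInt64` when no limit is set.

```go
package main

import (
	"math"
	"runtime"
	"runtime/debug"
)

// runtimeSample is a reduced, hypothetical version of RuntimeMetrics.
type runtimeSample struct {
	Goroutines    int
	HeapAlloc     uint64
	HeapInuse     uint64
	HeapSys       uint64
	NumGC         uint32
	GCCPUFraction float64
	MemLimit      int64 // -1 if no memory limit is set
}

// sampleRuntime reads the current Go runtime statistics.
// runtime.ReadMemStats briefly stops the world, so it should not be called
// at a very high frequency.
func sampleRuntime() runtimeSample {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// A negative argument leaves the limit unchanged and returns it.
	limit := debug.SetMemoryLimit(-1)
	if limit == math.MaxInt64 {
		limit = -1 // the runtime reports "no limit" as MaxInt64
	}

	return runtimeSample{
		Goroutines:    runtime.NumGoroutine(),
		HeapAlloc:     ms.HeapAlloc,
		HeapInuse:     ms.HeapInuse,
		HeapSys:       ms.HeapSys,
		NumGC:         ms.NumGC,
		GCCPUFraction: ms.GCCPUFraction,
		MemLimit:      limit,
	}
}
```

Comparing `HeapInuse` against `MemLimit` (when it is not -1) is one way to approximate the "heap vs GOMEMLIMIT" signal listed under the MEM dimension.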
type Snapshot ¶
type Snapshot struct {
Timestamp time.Time `json:"timestamp"`
// Process metrics
Process ProcessMetrics `json:"process"`
// Go runtime metrics
Runtime RuntimeMetrics `json:"runtime"`
// System metrics
System SystemMetrics `json:"system"`
// Kernel/netfilter metrics
Kernel KernelMetrics `json:"kernel"`
// nftables metrics
NFTables NFTablesMetrics `json:"nftables"`
}
Snapshot contains all collected metrics at a point in time
type StateMachine ¶
type StateMachine struct {
// contains filtered or unexported fields
}
StateMachine manages pressure state transitions
func NewStateMachine ¶
func NewStateMachine(cfg *Config) *StateMachine
NewStateMachine creates a new state machine
func (*StateMachine) GetLevel ¶
func (sm *StateMachine) GetLevel(dim Dimension) Level
GetLevel returns the level for a specific dimension
func (*StateMachine) GetLevelDuration ¶
func (sm *StateMachine) GetLevelDuration(dim Dimension) time.Duration
GetLevelDuration returns time at current level for a dimension
func (*StateMachine) GetMode ¶
func (sm *StateMachine) GetMode() Mode
GetMode returns the current operating mode
func (*StateMachine) GetModeDuration ¶
func (sm *StateMachine) GetModeDuration() time.Duration
GetModeDuration returns time in current mode
func (*StateMachine) GetScore ¶
func (sm *StateMachine) GetScore(dim Dimension) float64
GetScore returns the score for a specific dimension
func (*StateMachine) GetState ¶
func (sm *StateMachine) GetState() *PressureState
GetState returns a copy of the current state
func (*StateMachine) IsStable ¶
func (sm *StateMachine) IsStable() bool
IsStable returns true if no dimension is transitioning
func (*StateMachine) SetOnModeChange ¶
func (sm *StateMachine) SetOnModeChange(cb func(old, new Mode))
SetOnModeChange sets the callback for mode changes
func (*StateMachine) SetOnStateChange ¶
func (sm *StateMachine) SetOnStateChange(cb func(old, new *PressureState))
SetOnStateChange sets the callback for state changes
func (*StateMachine) Update ¶
func (sm *StateMachine) Update(scores map[Dimension]float64) *PressureState
Update processes new scores and updates state
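The hysteresis settings in Config suggest that a dimension escalates as soon as its score crosses an enter threshold, but steps down only after the score has stayed below the matching exit threshold for a configured duration. The sketch below illustrates that rule for a single dimension. The `hysteresis` and `dimState` types, the one-level-at-a-time step down, and the use of an elapsed `dt` instead of timestamps are all assumptions, not the state machine's actual code.

```go
package main

import "time"

// Level is a hypothetical stand-in for the package's Level type.
type Level int

const (
	LevelOK Level = iota
	LevelWarn
	LevelCritical
)

// hysteresis holds thresholds analogous to the Hysteresis* Config fields.
type hysteresis struct {
	warnEnter, warnExit      float64
	critEnter, critExit      float64
	warnExitFor, critExitFor time.Duration
}

// dimState is the per-dimension state carried between updates.
type dimState struct {
	Level Level
	Below time.Duration // how long the score has stayed under the exit threshold
}

// step applies one score sample; dt is the time since the previous sample.
func (h hysteresis) step(s dimState, score float64, dt time.Duration) dimState {
	// Escalation is immediate.
	switch {
	case score >= h.critEnter:
		return dimState{Level: LevelCritical}
	case s.Level == LevelOK && score >= h.warnEnter:
		return dimState{Level: LevelWarn}
	}

	// Pick the exit threshold and hold time for the current level.
	exit, hold := h.warnExit, h.warnExitFor
	if s.Level == LevelCritical {
		exit, hold = h.critExit, h.critExitFor
	}
	if s.Level == LevelOK || score >= exit {
		s.Below = 0 // nothing to exit, or the score bounced back up
		return s
	}

	// De-escalation requires staying below the exit threshold for hold.
	s.Below += dt
	if s.Below >= hold {
		return dimState{Level: s.Level - 1}
	}
	return s
}
```

Keeping the exit threshold below the enter threshold, and requiring a hold time, is what prevents a score hovering around a single cutoff from flipping the mode on every sample.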
type SystemCollector ¶
type SystemCollector struct {
BaseCollector
// contains filtered or unexported fields
}
SystemCollector collects OS-level metrics
func NewSystemCollector ¶
func NewSystemCollector(diskPath string) *SystemCollector
NewSystemCollector creates a new system collector
type SystemMetrics ¶
type SystemMetrics struct {
LoadAvg1 float64 `json:"load_avg_1"`
LoadAvg5 float64 `json:"load_avg_5"`
LoadAvg15 float64 `json:"load_avg_15"`
NumCPU int `json:"num_cpu"`
IOWaitPct float64 `json:"iowait_percent"`
MemTotal uint64 `json:"mem_total_bytes"`
MemFree uint64 `json:"mem_free_bytes"`
MemAvail uint64 `json:"mem_available_bytes"`
SwapTotal uint64 `json:"swap_total_bytes"`
SwapFree uint64 `json:"swap_free_bytes"`
DiskUsePct float64 `json:"disk_use_percent"` // Log partition
Entropy int `json:"entropy_available"` // /proc/sys/kernel/random/entropy_avail
}
SystemMetrics contains OS-level metrics
type Watchdog ¶
type Watchdog struct {
// contains filtered or unexported fields
}
Watchdog is the main watchdog coordinator
func New ¶
func New(cfg *Config, controls *RuntimeControls) (*Watchdog, error)
New creates a new watchdog
func (*Watchdog) GetControls ¶
func (w *Watchdog) GetControls() *RuntimeControls
GetControls returns the runtime controls
func (*Watchdog) GetRecentEvents ¶
GetRecentEvents returns recent events from the flight recorder
func (*Watchdog) GetRecorderStats ¶
func (w *Watchdog) GetRecorderStats() RecorderStats
GetRecorderStats returns flight recorder statistics
func (*Watchdog) GetSnapshot ¶
GetSnapshot returns the last collected snapshot
func (*Watchdog) GetState ¶
func (w *Watchdog) GetState() *PressureState
GetState returns the current pressure state
func (*Watchdog) RecordNFTApplyLatency ¶
RecordNFTApplyLatency records nft apply latency
func (*Watchdog) SetOnMetrics ¶
func (w *Watchdog) SetOnMetrics(cb func(*Snapshot, *PressureState))
SetOnMetrics sets a callback for metrics updates