Documentation ¶
Overview ¶
Package monitor provides a high-performance, thread-safe health monitoring framework for Go applications. It is designed to track the operational status of internal and external components (databases, APIs, microservices) using a robust state machine that handles health transitions with built-in hysteresis to prevent status flapping.
Core Philosophy: Performance & Resilience ¶
The monitor is architected for zero-contention status reporting and reliable periodic execution. It separates configuration management from the performance-critical status reporting path.
Key design principles:
- Atomic Status Reporting: Reads (Status, Latency, Uptime) use lock-free atomic primitives.
- Dampened Transitions: Configurable Fall/Rise thresholds prevent noise during transient failures.
- Dynamic Polling: Intervals automatically adjust based on the current health state (Rise/Fall/Stable).
- Middleware Pipeline: Extensible execution chain for logging, metrics, and tracing.
Internal Architecture & Data Flow ¶
The monitor operates as a background orchestrator managed by an atomic ticker runner.
Internal Dataflow Diagram:
[ Ticker Loop ] <-------------------------------------------+
       |                                                    |
       v                                                    |
[ Interval Resolver ] --(IntervalCheck/Fall/Rise)-----------+
       |
       v
[ Middleware Chain ]
       |-- (mdlStatus) --+--> [ Start Timer ]
       |                 |
       |-- (User Fct) ---+--> [ Health Check Execution ]
       |                 |
       |-- (mdlStatus) --+--> [ Stop Timer & Capture Latency ]
       |                 |
       |                 +--> [ State Machine Transition Logic ]
       |                 |
       |                 +--> [ Atomic Update of Metrics Container ]
       v
[ Metrics Dispatch ] --(RegisterCollectMetrics)--> [ Prometheus / Loggers ]
State Machine & Hysteresis ¶
The monitor implements a 3-state machine (OK, Warn, KO) with directional transition counters.
Transition State Diagram:
+--------+   (Fail >= fallCountWarn)   +--------+   (Fail >= fallCountKO)   +--------+
|   OK   | --------------------------> |  Warn  | ------------------------> |   KO   |
+--------+ <-------------------------- +--------+ <------------------------ +--------+
             (Succ >= riseCountWarn)                (Succ >= riseCountKO)
Transition Logic Table:
Current Status | Event   | Counter Logic        | Transition Action
---------------|---------|----------------------|--------------------------------------
OK             | Failure | cntFall++            | If cntFall >= fallCountWarn: -> Warn
OK             | Success | cntFall=0, cntRise=0 | Stay OK (Uptime++)
Warn           | Failure | cntFall++            | If cntFall >= fallCountKO: -> KO
Warn           | Success | cntRise++            | If cntRise >= riseCountWarn: -> OK
KO             | Success | cntRise++            | If cntRise >= riseCountKO: -> Warn
KO             | Failure | cntRise=0, cntFall=0 | Stay KO (Downtime++)
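The transition table can be read as a small state machine. The following is a minimal sketch of that logic, not the package's implementation: the `machine`, `thresholds`, and `observe` names are hypothetical. Two further assumptions are made: both counters are reset on every transition, and a result in one direction clears the opposite counter so that only consecutive results count.

```go
package main

import "fmt"

type status int

// The zero value is KO, matching the initial status shown in the examples.
const (
	KO status = iota
	Warn
	OK
)

func (s status) String() string { return [...]string{"KO", "Warn", "OK"}[s] }

// thresholds mirrors FallCountWarn, FallCountKO, RiseCountWarn, RiseCountKO.
type thresholds struct{ fallWarn, fallKO, riseWarn, riseKO int }

type machine struct {
	cur              status
	cntFall, cntRise int
	th               thresholds
}

// move switches state and clears both counters (assumption, see lead-in).
func (m *machine) move(next status) { m.cur, m.cntFall, m.cntRise = next, 0, 0 }

// observe applies one health check result and returns the resulting status.
func (m *machine) observe(success bool) status {
	switch {
	case m.cur == OK && success, m.cur == KO && !success:
		// Stable at either end of the scale: clear both counters.
		m.cntFall, m.cntRise = 0, 0
	case !success:
		m.cntFall++
		m.cntRise = 0
		if m.cur == OK && m.cntFall >= m.th.fallWarn {
			m.move(Warn)
		} else if m.cur == Warn && m.cntFall >= m.th.fallKO {
			m.move(KO)
		}
	default:
		m.cntRise++
		m.cntFall = 0
		if m.cur == KO && m.cntRise >= m.th.riseKO {
			m.move(Warn)
		} else if m.cur == Warn && m.cntRise >= m.th.riseWarn {
			m.move(OK)
		}
	}
	return m.cur
}

func main() {
	m := &machine{th: thresholds{fallWarn: 2, fallKO: 3, riseWarn: 2, riseKO: 2}}
	for _, ok := range []bool{true, true, true, true, false, false} {
		fmt.Println(m.observe(ok))
	}
}
```

With these thresholds a single failed check never changes the reported status, which is the hysteresis the table describes.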
High-Performance Read Path ¶
Status retrieval methods (Status, Latency, Uptime, Downtime) are optimized for high-frequency polling (e.g., thousands of reads per second from a metrics exporter or a load-balancer probe).
Implementation Details:
- No Mutexes: The read hot path uses atomic.LoadUint64 and atomic.LoadInt64 on the metrics container.
- Zero Allocations: Status and metric reads perform no heap allocations.
- Near-Zero Latency: Typical read latency is in the single-digit nanosecond range.
Middleware Extensibility ¶
The health check runs through a LIFO (last-in, first-out) middleware stack. Each middleware can intercept the execution to perform actions before and after the next element in the chain.
Execution Stack Example:
- [ mdlStatus ] (Core: Latency & State logic)
- [ CustomLogger ] (Optional: logs failures)
- [ OpenTelemetry ] (Optional: traces health check)
- [ User HealthCheck Function ] (The actual diagnostic)
Sub-Packages & Modules ¶
- info: Metadata management (name, version, environment data).
- status: Status enumeration and parsing (OK, Warn, KO, Unknown).
- types: Public interfaces and configuration structures.
- pool: (Optional) Management for collections of monitors.
Usage Example: Database Health Monitoring ¶
import (
	"context"
	"database/sql"
	"time"

	"github.com/nabbar/golib/duration"
	"github.com/nabbar/golib/monitor"
	"github.com/nabbar/golib/monitor/info"
	"github.com/nabbar/golib/monitor/types"
)

func setupMonitor(db *sql.DB) types.Monitor {
	// 1. Initialize Info with metadata (errors are ignored here for brevity)
	inf, _ := info.New("postgres-db")

	// 2. Create the monitor instance
	mon, _ := monitor.New(context.Background(), inf)

	// 3. Configure intervals and thresholds
	cfg := types.Config{
		Name:          "main-database",
		CheckTimeout:  duration.ParseDuration(5 * time.Second),
		IntervalCheck: duration.ParseDuration(30 * time.Second), // Normal polling frequency
		IntervalFall:  duration.ParseDuration(2 * time.Second),  // Aggressive polling when failing
		FallCountWarn: 2, // 2 consecutive failures to trigger 'Warn'
		FallCountKO:   3, // 3 more failures to trigger 'KO'
	}
	_ = mon.SetConfig(context.Background(), cfg)

	// 4. Register the actual check logic
	mon.SetHealthCheck(func(ctx context.Context) error {
		return db.PingContext(ctx)
	})

	// 5. Start the background runner
	_ = mon.Start(context.Background())

	return mon
}
Thread Safety & Concurrency ¶
Every component is safe for concurrent use. Configuration updates (SetConfig) and metadata updates (InfoUpd) use atomic swaps or thread-safe containers, so a configuration reload neither introduces a data race nor blocks the periodic runner or status readers.
Prometheus & Metrics Integration ¶
The package is designed to integrate seamlessly with Prometheus via the RegisterCollectMetrics method. At the end of each diagnostic run, the monitor dispatches its current state to the registered collector, updating Gauges for latency, status code, and transition timers.
Package monitor provides a robust, thread-safe framework for periodic health monitoring of components. It supports state machine transitions (OK, Warn, KO), middleware execution chains, and integrated metrics collection for Prometheus.
Example ¶
Example demonstrates a complete monitor setup and execution.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create info
inf, _ := moninf.New("example-service")

// Create monitor
mon, _ := libmon.New(context.Background(), inf)

// Configure monitor
cfg := montps.Config{
	Name:          "example-monitor",
	CheckTimeout:  libdur.ParseDuration(2 * time.Second),
	IntervalCheck: libdur.ParseDuration(1 * time.Second),
	RiseCountKO:   2,
	RiseCountWarn: 2,
	FallCountKO:   2,
	FallCountWarn: 2,
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

// Set health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Simulate health check
	return nil
})

// Start monitoring
_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)

// Check status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Running: %v\n", mon.IsRunning())

// Stop monitoring
_ = mon.Stop(ctx)
fmt.Printf("Running after stop: %v\n", mon.IsRunning())

Output:
Status: KO
Running: true
Running after stop: false
Constants ¶
const (
	// ErrorParamEmpty indicates an empty parameter was provided.
	ErrorParamEmpty liberr.CodeError = iota + liberr.MinPkgMonitor
	// ErrorMissingHealthCheck indicates no health check function was registered.
	ErrorMissingHealthCheck
	// ErrorValidatorError indicates configuration validation failed.
	ErrorValidatorError
	// ErrorLoggerError indicates logger initialization failed.
	ErrorLoggerError
	// ErrorTimeout indicates a timeout occurred during an operation.
	ErrorTimeout
	// ErrorInvalid indicates an invalid monitor instance.
	ErrorInvalid
)

const (
	// Structured logging constants for consistency across monitor log entries.
	LogFieldProcess = "process" // LogFieldProcess is the field key for identifying the monitor process in logs.
	LogValueProcess = "monitor" // LogValueProcess is the constant value for the process field.
	LogFieldName    = "name"    // LogFieldName is the field key for the monitor instance name.
)

const (
	// MaxPoolStart defines the maximum duration the Start method will wait for the internal ticker to become active.
	MaxPoolStart = 3 * time.Second
	// MaxTickPooler defines the interval at which the Start method checks if the internal ticker has successfully started.
	MaxTickPooler = 5 * time.Millisecond
)
Functions ¶
func New ¶
New initializes and returns a new Monitor instance.
Parameters:
- ctx: The base context used for the monitor's internal state management. If nil, context.Background() is used.
- info: An implementation of montps.Info containing metadata (name, version, etc.) for the monitored component.
The returned monitor is initialized in a stopped state with default configuration values. It uses an atomic internal structure to ensure thread-safety across all operations.
Returns:
- montps.Monitor: A thread-safe monitor instance.
- error: Returns an error if the mandatory 'info' parameter is nil.
Example:
inf, _ := info.New("database-service")
mon, err := monitor.New(context.Background(), inf)
if err != nil {
	log.Fatalf("failed to create monitor: %v", err)
}
_ = mon.Start(context.Background())
Example ¶
ExampleNew demonstrates creating a new monitor instance.
package main

import (
	"context"
	"fmt"

	libmon "github.com/nabbar/golib/monitor"
	moninf "github.com/nabbar/golib/monitor/info"
)

func main() {
	// Create info metadata
	inf, err := moninf.New("database-monitor")
	if err != nil {
		panic(err)
	}

	// Create monitor
	mon, err := libmon.New(context.Background(), inf)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Monitor created: %s\n", mon.Name())
}

Output:
Monitor created: not named
Types ¶
type Encode ¶
type Encode interface {
	// String returns a formatted, human-readable string representation of the monitor's state.
	// Format: "<STATUS>: <Name> (<Info>) | <Latency> / <Uptime> / <Downtime> | <Message>"
	String() string
	// Bytes returns the byte slice representation of the string generated by String().
	Bytes() []byte
}
Encode is an interface that defines the contract for converting a monitor's current state into various formats.
Implementation Note: It is primarily used to generate human-readable strings or byte slices for logging, reporting, or API responses while ensuring consistency across formats.
type Monitor ¶
Monitor is the primary interface for managing component health checks. It embeds the base Monitor interface from the types package, which defines methods for configuration, lifecycle management (Start/Stop), and status retrieval.
Example (Metrics) ¶
ExampleMonitor_metrics demonstrates collecting metrics.
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "my-service",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	time.Sleep(10 * time.Millisecond)
	return nil
})

_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)
_ = mon.Stop(ctx)

// Collect metrics
latency := mon.Latency()
uptime := mon.Uptime()
downtime := mon.Downtime()

fmt.Printf("Latency recorded: %v\n", latency > 0)
fmt.Printf("Uptime recorded: %v\n", uptime >= 0)
fmt.Printf("Downtime recorded: %v\n", downtime >= 0)

Output:
Latency recorded: true
Uptime recorded: true
Downtime recorded: true
Example (Transitions) ¶
ExampleMonitor_transitions demonstrates status transitions.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "transition-demo",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	RiseCountKO:   1,
	RiseCountWarn: 1,
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	return nil // Always healthy
})

_ = mon.Start(ctx)

time.Sleep(100 * time.Millisecond)
fmt.Printf("Initial: KO=%v\n", mon.Status().String() == "KO")

time.Sleep(300 * time.Millisecond)
fmt.Printf("Rising: %v\n", mon.IsRise() || mon.Status().String() != "KO")

_ = mon.Stop(ctx)

Output:
Initial: KO=true
Rising: true
Directories ¶
| Path | Synopsis |
|---|---|
| info | Package info provides a robust and thread-safe metadata management system for monitored components. |
| pool | Package pool provides a thread-safe pool implementation for managing multiple health monitors. |
| status | Package status provides a robust enumeration type for representing monitor health status. |
| types | Package types provides core type definitions and interfaces for the monitor system. |