Documentation ¶
Overview ¶
Package monitor provides a robust health check monitoring system with automatic status transitions, configurable thresholds, and comprehensive metrics tracking.
The monitor package periodically executes health checks and tracks the state of monitored components. It features:
- Automatic status transitions with configurable thresholds (OK ↔ Warn ↔ KO)
- Adaptive check intervals based on status (normal, rising, falling)
- Comprehensive metrics (uptime, downtime, latency, rise/fall times)
- Thread-safe concurrent operations
- Prometheus metrics integration
- Flexible configuration with validation
- Middleware chain for extensibility
Status Transitions ¶
The monitor uses a three-state model with hysteresis to prevent flapping:
- KO: Component is not healthy
- Warn: Component is degraded but functional
- OK: Component is fully healthy
Transitions between states require multiple consecutive successes or failures:
KO --[riseCountKO successes]--> Warn --[riseCountWarn successes]--> OK
OK --[fallCountWarn failures]--> Warn --[fallCountKO failures]--> KO
Basic Usage ¶
import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/nabbar/golib/duration"
	"github.com/nabbar/golib/monitor"
	"github.com/nabbar/golib/monitor/info"
	"github.com/nabbar/golib/monitor/types"
)

// Create info metadata
inf, err := info.New("database-monitor")
if err != nil {
	log.Fatal(err)
}

// Create monitor
mon, err := monitor.New(context.Background(), inf)
if err != nil {
	log.Fatal(err)
}

// Configure monitor
cfg := types.Config{
	Name:          "database",
	CheckTimeout:  duration.ParseDuration(5 * time.Second),
	IntervalCheck: duration.ParseDuration(10 * time.Second),
	IntervalFall:  duration.ParseDuration(5 * time.Second),
	IntervalRise:  duration.ParseDuration(5 * time.Second),
	FallCountKO:   3,
	FallCountWarn: 2,
	RiseCountKO:   3,
	RiseCountWarn: 2,
}
if err := mon.SetConfig(context.Background(), cfg); err != nil {
	log.Fatal(err)
}

// Register health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Check database connectivity (db is assumed to be an existing *sql.DB)
	return db.PingContext(ctx)
})

// Start monitoring
if err := mon.Start(context.Background()); err != nil {
	log.Fatal(err)
}
defer mon.Stop(context.Background())

// Query status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Latency: %s\n", mon.Latency())
fmt.Printf("Uptime: %s\n", mon.Uptime())
Configuration ¶
The monitor supports extensive configuration:
- CheckTimeout: Maximum duration for a health check to complete (min: 5s)
- IntervalCheck: Interval between checks in normal state (min: 1s)
- IntervalFall: Interval when status is falling (min: 1s, default: IntervalCheck)
- IntervalRise: Interval when status is rising (min: 1s, default: IntervalCheck)
- FallCountKO: Failures needed to go from Warn to KO (min: 1)
- FallCountWarn: Failures needed to go from OK to Warn (min: 1)
- RiseCountKO: Successes needed to go from KO to Warn (min: 1)
- RiseCountWarn: Successes needed to go from Warn to OK (min: 1)
Any value set below its minimum is automatically raised to that minimum.
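The normalization rules listed above can be sketched as a clamping pass over the configuration. This is a sketch under stated assumptions, not the package's validation code: Settings and Normalize are hypothetical names, plain time.Duration is used instead of the library's duration type, and "unset" is assumed to mean the zero value.

```go
package main

import (
	"fmt"
	"time"
)

// Settings is a hypothetical stand-in for the monitor configuration.
type Settings struct {
	CheckTimeout  time.Duration
	IntervalCheck time.Duration
	IntervalFall  time.Duration
	IntervalRise  time.Duration
	FallCountKO   uint8
	FallCountWarn uint8
	RiseCountKO   uint8
	RiseCountWarn uint8
}

// atLeast raises d to floor when it is below it.
func atLeast(d, floor time.Duration) time.Duration {
	if d < floor {
		return floor
	}
	return d
}

// atLeastOne raises a counter to 1 when it is zero.
func atLeastOne(n uint8) uint8 {
	if n < 1 {
		return 1
	}
	return n
}

// Normalize applies the documented minimums and defaults.
func Normalize(s Settings) Settings {
	s.CheckTimeout = atLeast(s.CheckTimeout, 5*time.Second)
	s.IntervalCheck = atLeast(s.IntervalCheck, time.Second)

	// Assumption: a zero IntervalFall/IntervalRise means "not set",
	// in which case the documented default is IntervalCheck.
	if s.IntervalFall == 0 {
		s.IntervalFall = s.IntervalCheck
	}
	if s.IntervalRise == 0 {
		s.IntervalRise = s.IntervalCheck
	}
	s.IntervalFall = atLeast(s.IntervalFall, time.Second)
	s.IntervalRise = atLeast(s.IntervalRise, time.Second)

	s.FallCountKO = atLeastOne(s.FallCountKO)
	s.FallCountWarn = atLeastOne(s.FallCountWarn)
	s.RiseCountKO = atLeastOne(s.RiseCountKO)
	s.RiseCountWarn = atLeastOne(s.RiseCountWarn)
	return s
}

func main() {
	n := Normalize(Settings{CheckTimeout: 2 * time.Second, IntervalCheck: 10 * time.Second})
	fmt.Println(n.CheckTimeout, n.IntervalFall, n.IntervalRise, n.FallCountKO)
}
```

Under these rules a CheckTimeout of 2s is raised to 5s, and both transition intervals fall back to the 10s IntervalCheck.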
Metrics Tracking ¶
The monitor tracks comprehensive timing metrics:
- Latency: Duration of the last health check execution
- Uptime: Total time in OK status
- Downtime: Total time in KO or Warn status
- RiseTime: Total time spent transitioning to better status
- FallTime: Total time spent transitioning to worse status
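As a rough illustration of how such counters could be accumulated, the sketch below attributes the time elapsed since the previous check to either Uptime or Downtime and overwrites Latency with the last check's duration. The Metrics type and Record method are hypothetical, RiseTime and FallTime are left out for brevity, and the attribution rule is an assumption rather than the package's documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

// Metrics is a hypothetical accumulator, not the library's internal type.
type Metrics struct {
	Latency  time.Duration // duration of the most recent check
	Uptime   time.Duration // total time attributed to OK
	Downtime time.Duration // total time attributed to KO or Warn
}

// Record stores the latency of the check that just ran and attributes the
// time elapsed since the previous check to the current status.
func (m *Metrics) Record(statusOK bool, elapsed, latency time.Duration) {
	m.Latency = latency
	if statusOK {
		m.Uptime += elapsed
	} else {
		m.Downtime += elapsed
	}
}

func main() {
	var m Metrics
	m.Record(false, 10*time.Second, 4*time.Millisecond)
	m.Record(true, 10*time.Second, 5*time.Millisecond)
	m.Record(true, 10*time.Second, 3*time.Millisecond)
	fmt.Println(m.Latency, m.Uptime, m.Downtime) // prints "3ms 20s 10s"
}
```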
Prometheus Integration ¶
The monitor can export metrics to Prometheus:
import "github.com/nabbar/golib/prometheus"
// Register metric names
mon.RegisterMetricsName("my_service_health")
// Register collection function
mon.RegisterCollectMetrics(prometheusCollector)
// Metrics are automatically collected after each health check
Encoding Support ¶
The monitor supports multiple encoding formats:
// Text encoding
text, _ := mon.MarshalText()
fmt.Println(string(text))
// Output: OK: database (version: 1.0) | 5ms / 1h30m / 0s
// JSON encoding
json, _ := mon.MarshalJSON()
Thread Safety ¶
All monitor operations are thread-safe and can be called concurrently from multiple goroutines. The monitor uses fine-grained locking to minimize contention while ensuring data consistency.
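The general pattern behind this guarantee can be sketched with a read/write mutex guarding a single field, so that many readers proceed in parallel while a writer gets exclusive access. The guardedStatus type is hypothetical and says nothing about how the package actually organizes its locks.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedStatus is a hypothetical holder for one piece of monitor state.
type guardedStatus struct {
	mu     sync.RWMutex
	status string
}

// Set replaces the status under the write lock.
func (g *guardedStatus) Set(s string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.status = s
}

// Get reads the status under the read lock, so readers do not block each other.
func (g *guardedStatus) Get() string {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.status
}

func main() {
	g := &guardedStatus{status: "KO"}

	var wg sync.WaitGroup
	// Several concurrent readers.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = g.Get()
		}()
	}
	// One concurrent writer.
	wg.Add(1)
	go func() {
		defer wg.Done()
		g.Set("OK")
	}()
	wg.Wait()

	fmt.Println(g.Get()) // prints "OK": the writer has finished before Wait returns
}
```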
Best Practices ¶
1. Configure appropriate check intervals to balance responsiveness and resource usage
2. Set fall/rise counts to prevent status flapping during temporary issues
3. Use shorter intervals during transitions (IntervalFall/Rise) for faster detection
4. Set CheckTimeout lower than IntervalCheck to prevent overlapping checks
5. Register a logger for debugging and troubleshooting
6. Always call Stop() when shutting down to clean up resources
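To see why a shorter IntervalFall speeds up detection, the following back-of-the-envelope sketch estimates the time from the first failed check until the status reaches KO. It assumes every subsequent check also fails, that each is scheduled one interval after the previous one, that the failure streak restarts at the Warn transition, and that check latency is negligible. The timeToKO helper is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// timeToKO estimates the delay between the first failed check and the KO
// status, given the two fall thresholds and the interval between checks
// while the status is falling.
func timeToKO(fallCountWarn, fallCountKO int, interval time.Duration) time.Duration {
	checks := fallCountWarn + fallCountKO // failures needed in total
	// The first failure happens at time zero; each further one costs an interval.
	return time.Duration(checks-1) * interval
}

func main() {
	// With FallCountWarn=2 and FallCountKO=3, as in the Basic Usage configuration:
	fmt.Println(timeToKO(2, 3, 5*time.Second))  // prints "20s" with IntervalFall = 5s
	fmt.Println(timeToKO(2, 3, 10*time.Second)) // prints "40s" if checks stayed at 10s
}
```

Under these assumptions, halving the falling interval halves the time to report KO without increasing the check rate while the component is healthy.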
Error Handling ¶
The monitor defines several error codes:
- ErrorParamEmpty: Empty parameter provided
- ErrorMissingHealthCheck: No health check function registered
- ErrorValidatorError: Configuration validation failed
- ErrorLoggerError: Logger initialization failed
- ErrorTimeout: Operation timeout
- ErrorInvalid: Invalid monitor instance
All errors implement the liberr.Error interface for structured error handling.
Related Packages ¶
- github.com/nabbar/golib/monitor/info: Dynamic metadata management
- github.com/nabbar/golib/monitor/status: Health status type
- github.com/nabbar/golib/monitor/types: Type definitions and interfaces
- github.com/nabbar/golib/monitor/pool: Monitor pool management
Example ¶
Example demonstrates a complete monitor setup and execution.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
// Create info
inf, _ := moninf.New("example-service")
// Create monitor
mon, _ := libmon.New(context.Background(), inf)
// Configure monitor
cfg := montps.Config{
	Name:          "example-monitor",
	CheckTimeout:  libdur.ParseDuration(2 * time.Second),
	IntervalCheck: libdur.ParseDuration(1 * time.Second),
	RiseCountKO:   2,
	RiseCountWarn: 2,
	FallCountKO:   2,
	FallCountWarn: 2,
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)
// Set health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Simulate health check
	return nil
})
// Start monitoring
_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)
// Check status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Running: %v\n", mon.IsRunning())
// Stop monitoring
_ = mon.Stop(ctx)
fmt.Printf("Running after stop: %v\n", mon.IsRunning())
Output:
Status: KO
Running: true
Running after stop: false
Constants ¶
const (
	// ErrorParamEmpty indicates an empty parameter was provided.
	ErrorParamEmpty liberr.CodeError = iota + liberr.MinPkgMonitor
	// ErrorMissingHealthCheck indicates no health check function was registered.
	ErrorMissingHealthCheck
	// ErrorValidatorError indicates configuration validation failed.
	ErrorValidatorError
	// ErrorLoggerError indicates logger initialization failed.
	ErrorLoggerError
	// ErrorTimeout indicates a timeout occurred during an operation.
	ErrorTimeout
	// ErrorInvalid indicates an invalid monitor instance.
	ErrorInvalid
)
const (
	// Log field constants for structured logging
	LogFieldProcess = "process"
	LogValueProcess = "monitor"
	LogFieldName    = "name"
)
const (
	// MaxPoolStart is the maximum time to wait for the monitor to start.
	MaxPoolStart = 3 * time.Second
	// MaxTickPooler is the polling interval when waiting for the monitor to start.
	MaxTickPooler = 5 * time.Millisecond
)
Variables ¶
This section is empty.
Functions ¶
func New ¶
New creates a new Monitor instance with the given context provider and info. The ctx parameter provides a function that returns the current context. The info parameter provides metadata about the monitored component. Returns an error if info is nil.
Example:
inf, _ := info.New("my-service")
mon, err := monitor.New(context.Background, inf)
if err != nil {
	log.Fatal(err)
}
Example ¶
ExampleNew demonstrates creating a new monitor instance.
package main

import (
	"context"
	"fmt"

	libmon "github.com/nabbar/golib/monitor"
	moninf "github.com/nabbar/golib/monitor/info"
)

func main() {
	// Create info metadata
	inf, err := moninf.New("database-monitor")
	if err != nil {
		panic(err)
	}
	// Create monitor
	mon, err := libmon.New(context.Background(), inf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Monitor created: %s\n", mon.Name())
}
Output:
Monitor created: not named
Types ¶
type Encode ¶
type Encode interface {
	String() string // Returns a human-readable string representation
	Bytes() []byte  // Returns the byte representation of the string
}
Encode provides methods for converting monitor state to different formats.
type Monitor ¶
Monitor is the interface that wraps all monitor functionalities. It extends montps.Monitor and provides health check monitoring capabilities.
Example (Metrics) ¶
ExampleMonitor_metrics demonstrates collecting metrics.
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()
inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)
cfg := montps.Config{
	Name:          "my-service",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)
mon.SetHealthCheck(func(ctx context.Context) error {
	time.Sleep(10 * time.Millisecond)
	return nil
})
_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)
_ = mon.Stop(ctx)
// Collect metrics
latency := mon.Latency()
uptime := mon.Uptime()
downtime := mon.Downtime()
fmt.Printf("Latency recorded: %v\n", latency > 0)
fmt.Printf("Uptime recorded: %v\n", uptime >= 0)
fmt.Printf("Downtime recorded: %v\n", downtime >= 0)
Output:
Latency recorded: true
Uptime recorded: true
Downtime recorded: true
Example (Transitions) ¶
ExampleMonitor_transitions demonstrates status transitions.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)
cfg := montps.Config{
	Name:          "transition-demo",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	RiseCountKO:   1,
	RiseCountWarn: 1,
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)
mon.SetHealthCheck(func(ctx context.Context) error {
	return nil // Always healthy
})
_ = mon.Start(ctx)
time.Sleep(100 * time.Millisecond)
fmt.Printf("Initial: KO=%v\n", mon.Status().String() == "KO")
time.Sleep(300 * time.Millisecond)
fmt.Printf("Rising: %v\n", mon.IsRise() || mon.Status().String() != "KO")
_ = mon.Stop(ctx)
Output:
Initial: KO=true
Rising: true
Directories ¶
| Path | Synopsis |
|---|---|
| info | Package info provides a thread-safe, caching implementation for monitor information. |
| pool | Package pool provides a thread-safe pool implementation for managing multiple health monitors. |
| status | Package status provides a robust enumeration type for representing monitor health status. |
| types | Package types provides core type definitions and interfaces for the monitor system. |