monitor

Version: v1.19.2 (this package is not in the latest version of its module)
Published: Feb 4, 2026 License: MIT Imports: 20 Imported by: 0

README

Monitor Package

Production-ready health monitoring system for Go applications with automatic status transitions, configurable thresholds, and comprehensive metrics tracking.

AI Disclaimer: AI tools are used solely to assist with testing, documentation, and bug fixes under human supervision, in compliance with EU AI Act Article 50.4.


Overview

The monitor package provides a sophisticated health monitoring system for production Go applications. It implements automatic health check execution with intelligent status transitions, hysteresis to prevent flapping, and comprehensive metrics collection.

Design Philosophy
  1. Reliability First: Hysteresis-based transitions prevent status flapping during temporary issues
  2. Observability: Track latency, uptime, downtime, and state transitions for complete visibility
  3. Flexibility: Configurable intervals, thresholds, and extensible middleware chain
  4. Thread-Safe: Fine-grained locking and atomic operations for concurrent access
  5. Composable: Independent subpackages (info, status, pool, types) work together seamlessly
Value Proposition
  • Prevent Alert Fatigue: Hysteresis prevents flapping during transient failures
  • Adaptive Monitoring: Automatically adjusts check frequency based on component health
  • Production Ready: Thread-safe, tested, and battle-proven in production
  • Observable: Complete visibility into health status and performance metrics
  • Scalable: Efficiently manage hundreds of monitors with pool management

Key Features

  • Three-State Model: OK → Warn → KO transitions with configurable thresholds
  • Adaptive Intervals: Different check frequencies for normal, rising, and falling states
  • Comprehensive Metrics: Latency, uptime, downtime, rise/fall times
  • Thread-Safe: Concurrent access safe with fine-grained locking
  • Pool Management: Group and manage multiple monitors with batch operations
  • Prometheus Integration: Built-in metrics export
  • Middleware Chain: Extensible health check pipeline
  • Dynamic Metadata: Runtime-generated component information
  • Shell Commands: CLI-style operational control

Installation

go get github.com/nabbar/golib/monitor

Architecture

Package Structure
monitor/
├── monitor          # Core health monitoring
├── pool/            # Monitor pool management
├── info/            # Component metadata
├── status/          # Status enumeration
└── types/           # Type definitions
Component Hierarchy
┌────────────────────────────────────────┐
│         Monitor Package                 │
│    Health Check Monitoring System       │
└──────┬──────┬────────┬─────────┬───────┘
       │      │        │         │
   ┌───▼──┐ ┌─▼───┐ ┌─▼────┐ ┌──▼──────┐
   │ Pool │ │Info │ │Status│ │  Types  │
   └──────┘ └─────┘ └──────┘ └─────────┘
Status Transition Model
       ┌──────────────┐
       │      KO      │  ← Component unhealthy
       └──────┬───────┘
              │ riseCountKO successes
              ▼
       ┌──────────────┐
       │     Warn     │  ← Component degraded
       └──────┬───────┘
              │ riseCountWarn successes
              ▼
       ┌──────────────┐
       │      OK      │  ← Component healthy
       └──────┬───────┘
              │ fallCountWarn failures
              ▼
       (returns to Warn, then KO)

Quick Start

Basic Monitor
import (
    "context"
    "fmt"
    "time"

    "github.com/nabbar/golib/duration"
    "github.com/nabbar/golib/monitor"
    "github.com/nabbar/golib/monitor/info"
    "github.com/nabbar/golib/monitor/types"
)

// Create monitor (errors ignored for brevity)
inf, _ := info.New("database-monitor")
mon, _ := monitor.New(context.Background(), inf)

// Configure
cfg := types.Config{
    Name:          "postgres",
    CheckTimeout:  duration.ParseDuration(5 * time.Second),
    IntervalCheck: duration.ParseDuration(30 * time.Second),
    FallCountKO:   3,
    RiseCountKO:   3,
}
mon.SetConfig(context.Background(), cfg)

// Register health check (db is assumed to be an existing *sql.DB)
mon.SetHealthCheck(func(ctx context.Context) error {
    return db.PingContext(ctx)
})

// Start
mon.Start(context.Background())
defer mon.Stop(context.Background())

// Query
fmt.Printf("Status: %s\n", mon.Status())
Monitor Pool
import "github.com/nabbar/golib/monitor/pool"

// ctxFunc, promFunc, logFunc, and ctx are assumed to be defined by the caller.
// The variable is named p so that it does not shadow the pool package.
p := pool.New(ctxFunc)

// Add monitors
p.MonitorAdd(createDBMonitor())
p.MonitorAdd(createAPIMonitor())

// Register metrics
p.RegisterMetrics(promFunc, logFunc)
defer p.UnregisterMetrics()

// Start all
p.Start(ctx)
defer p.Stop(ctx)

Performance

Operation                  Time     Memory   Allocations
Monitor Creation           1.2 µs   2.1 KB   18 allocs
Health Check               15 µs    448 B    5 allocs
Status Transition          800 ns   0 B      0 allocs
Metrics Collection         2.5 µs   0 B      0 allocs
Pool.Start (10 monitors)   85 µs    8 KB     120 allocs

Use Cases

1. Microservice Health Monitoring

Monitor multiple services with automatic transitions and metrics collection.

2. Database Connection Pooling

Track database health with adaptive intervals for faster issue detection.

3. External Service Dependencies

Monitor third-party API availability with configurable timeouts.

4. Kubernetes Probes

Integrate with liveness and readiness probes.
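
As a rough illustration of the probe integration, the sketch below exposes a status value over HTTP. It is not part of the package: StatusFunc, ReadinessHandler, ProbeCode, and the /readyz path are hypothetical names, and the mapping (OK and Warn answer 200, KO answers 503) is an assumption that treats a degraded component as still able to serve traffic.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// StatusFunc is a hypothetical accessor returning "OK", "Warn" or "KO",
// for example a closure around mon.Status().String().
type StatusFunc func() string

// ReadinessHandler answers 503 when the component is KO and 200 otherwise.
// This mapping is an assumption, not behavior defined by the package.
func ReadinessHandler(status StatusFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		s := status()
		if s == "KO" {
			http.Error(w, s, http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, s)
	})
}

// ProbeCode runs the handler in memory and returns the HTTP status code.
func ProbeCode(s string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/readyz", nil)
	ReadinessHandler(func() string { return s }).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	for _, s := range []string{"OK", "Warn", "KO"} {
		fmt.Printf("%s -> %d\n", s, ProbeCode(s))
	}
}
```

A liveness probe would typically use a looser rule than readiness, since restarting a pod rarely fixes a failing external dependency.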

5. Custom Middleware

Extend health checks with logging, metrics, or custom logic.
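
The exact middleware signature is not shown in this document, so the following is only a sketch of how a chain of wrappers around a health check can work. HealthCheck, Middleware, Chain, Trace, and RunTrace are hypothetical names defined here for illustration, not the package's types.

```go
package main

import (
	"context"
	"fmt"
)

// HealthCheck and Middleware are hypothetical shapes used for this sketch.
type HealthCheck func(ctx context.Context) error
type Middleware func(next HealthCheck) HealthCheck

// Chain wraps hc so that the first middleware listed runs outermost.
func Chain(hc HealthCheck, mws ...Middleware) HealthCheck {
	for i := len(mws) - 1; i >= 0; i-- {
		hc = mws[i](hc)
	}
	return hc
}

// Trace returns a middleware that records when it is entered and left.
func Trace(label string, log *[]string) Middleware {
	return func(next HealthCheck) HealthCheck {
		return func(ctx context.Context) error {
			*log = append(*log, label+":before")
			err := next(ctx)
			*log = append(*log, label+":after")
			return err
		}
	}
}

// RunTrace executes a two-layer chain once and returns the recorded order.
func RunTrace() []string {
	var log []string
	check := func(ctx context.Context) error {
		log = append(log, "check")
		return nil
	}
	chained := Chain(check, Trace("logging", &log), Trace("metrics", &log))
	_ = chained(context.Background())
	return log
}

func main() {
	for _, step := range RunTrace() {
		fmt.Println(step)
	}
}
```

Each layer sees the error returned by the layers below it, which is where logging or latency measurement would hook in.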


Subpackages

monitor (Core)

Core health check monitoring with status transitions, metrics, and lifecycle management.

GoDoc: pkg.go.dev/github.com/nabbar/golib/monitor

pool

Manage multiple monitors as a group with batch operations and Prometheus integration.

Documentation: pool/README.md

info

Dynamic metadata management with caching and lazy evaluation.

Documentation: info/README.md

status

Type-safe status enumeration (OK, Warn, KO) with multi-format encoding.

types

Shared interfaces, configuration types, and error codes.


Configuration

type Config struct {
    Name          string            // Component name
    CheckTimeout  duration.Duration // Health check timeout (min: 5s)
    IntervalCheck duration.Duration // Normal check interval (min: 1s)
    IntervalFall  duration.Duration // Interval when falling (min: 1s)
    IntervalRise  duration.Duration // Interval when rising (min: 1s)
    FallCountKO   int              // Failures for Warn→KO (min: 1)
    FallCountWarn int              // Failures for OK→Warn (min: 1)
    RiseCountKO   int              // Successes for KO→Warn (min: 1)
    RiseCountWarn int              // Successes for Warn→OK (min: 1)
}

Best Practices:

  • CheckTimeout < IntervalCheck (prevent overlapping checks)
  • Use shorter IntervalFall for faster issue detection
  • Set counts ≥ 2 to prevent flapping

Status Transitions

Transition Rules
From   To     Condition                             Resets
KO     Warn   riseCountKO consecutive successes     Fall counters
Warn   OK     riseCountWarn consecutive successes   Fall counters
OK     Warn   fallCountWarn consecutive failures    Rise counters
Warn   KO     fallCountKO consecutive failures      Rise counters
Example Sequence

Configuration: FallCountWarn:2, FallCountKO:3, RiseCountKO:3, RiseCountWarn:2

Check 1: ✓ → OK
Check 2: ✗ → OK (1 failure)
Check 3: ✗ → Warn (2 failures, threshold reached)
Check 4: ✗ → Warn (1 KO failure)
Check 5: ✗ → Warn (2 KO failures)
Check 6: ✗ → KO (3 KO failures, threshold reached)
Check 7-9: ✓✓✓ → Warn (3 successes, KO threshold reached)
Check 10-11: ✓✓ → OK (2 successes, Warn threshold reached)
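
The sequence above can be reproduced with a small state machine. This is a sketch of the documented rules, not the package's implementation: Status, Thresholds, Tracker, and Replay are hypothetical names, and the tracker starts in OK only to match the example.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the package's status type.
type Status int

const (
	KO Status = iota
	Warn
	OK
)

func (s Status) String() string {
	return [...]string{"KO", "Warn", "OK"}[s]
}

// Thresholds mirrors the four count fields of the configuration.
type Thresholds struct {
	FallCountWarn, FallCountKO int
	RiseCountKO, RiseCountWarn int
}

// Tracker keeps the current status and the consecutive result counters.
type Tracker struct {
	th         Thresholds
	status     Status
	rise, fall int
}

// Observe records one check result and returns the status after it.
func (t *Tracker) Observe(success bool) Status {
	if success {
		t.fall = 0 // a success resets the fall counter
		t.rise++
		switch {
		case t.status == KO && t.rise >= t.th.RiseCountKO:
			t.status, t.rise = Warn, 0
		case t.status == Warn && t.rise >= t.th.RiseCountWarn:
			t.status, t.rise = OK, 0
		case t.status == OK:
			t.rise = 0 // already at the best status
		}
		return t.status
	}
	t.rise = 0 // a failure resets the rise counter
	t.fall++
	switch {
	case t.status == OK && t.fall >= t.th.FallCountWarn:
		t.status, t.fall = Warn, 0
	case t.status == Warn && t.fall >= t.th.FallCountKO:
		t.status, t.fall = KO, 0
	case t.status == KO:
		t.fall = 0 // already at the worst status
	}
	return t.status
}

// Replay feeds a sequence of results and returns the status after each one.
func Replay(th Thresholds, start Status, results []bool) []string {
	t := &Tracker{th: th, status: start}
	out := make([]string, 0, len(results))
	for _, r := range results {
		out = append(out, t.Observe(r).String())
	}
	return out
}

func main() {
	th := Thresholds{FallCountWarn: 2, FallCountKO: 3, RiseCountKO: 3, RiseCountWarn: 2}
	seq := []bool{true, false, false, false, false, false, true, true, true, true, true}
	for i, s := range Replay(th, OK, seq) {
		fmt.Printf("Check %d: %s\n", i+1, s)
	}
}
```

Because a single opposite result resets the counter, one stray failure between successes never moves the status, which is the hysteresis effect described earlier.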

Best Practices

Configuration
  • Set CheckTimeout < IntervalCheck
  • Use faster IntervalFall for issue detection
  • Configure counts ≥ 2 to prevent flapping
Health Checks
  • Respect context timeout
  • Return specific errors
  • Keep checks lightweight
  • Handle transient failures
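
A minimal sketch of a check that follows these points, using only the standard library. The probe function simulates a dependency call and is hypothetical, as are NewCheck, RunWithTimeout, and TimedOut; only the func(ctx context.Context) error shape comes from the examples in this document.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// probe is a hypothetical dependency call that takes d to complete
// unless the context is cancelled first.
func probe(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// NewCheck builds a health check with the func(context.Context) error shape.
func NewCheck(name string, d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		if err := probe(ctx, d); err != nil {
			// Wrap with %w so callers can still match the cause.
			return fmt.Errorf("health check %q: %w", name, err)
		}
		return nil
	}
}

// RunWithTimeout calls check under a deadline, similar in spirit to CheckTimeout.
func RunWithTimeout(check func(context.Context) error, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return check(ctx)
}

// TimedOut reports whether err was caused by an expired deadline.
func TimedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	slow := NewCheck("database", time.Second)
	err := RunWithTimeout(slow, 20*time.Millisecond)
	fmt.Println(err, TimedOut(err))
}
```

The check returns as soon as the deadline expires instead of blocking for the full second, and the wrapped error still identifies both the component and the cause.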
Lifecycle
  • Always call Stop() when done
  • Use defer for cleanup
  • Check IsRunning() before operations
Pool Management
  • Use pools for related monitors
  • Call UnregisterMetrics() on shutdown
  • Register Prometheus metrics early

API Reference

Monitor Interface
type Monitor interface {
    // Lifecycle
    Start(ctx context.Context) error
    Stop(ctx context.Context) error
    Restart(ctx context.Context) error
    IsRunning() bool
    
    // Configuration
    SetConfig(ctx context.Context, cfg Config) error
    GetConfig() Config
    
    // Health Check
    SetHealthCheck(hc HealthCheck)
    RegisterMiddleware(mw Middleware)
    
    // Status & Metrics
    Status() status.Status
    Latency() time.Duration
    Uptime() time.Duration
    Downtime() time.Duration
    
    // Info
    InfoGet() Info
    InfoMap() map[string]interface{}
    
    // Encoding
    MarshalText() ([]byte, error)
    MarshalJSON() ([]byte, error)
}
Pool Interface
type Pool interface {
    // Monitor Management
    MonitorAdd(mon Monitor) error
    MonitorGet(name string) Monitor
    MonitorDel(name string)
    MonitorList() []string
    
    // Lifecycle
    Start(ctx context.Context) error
    Stop(ctx context.Context) error
    Restart(ctx context.Context) error
    
    // Metrics
    RegisterMetrics(prom, log func) error
    UnregisterMetrics()
    
    // Shell
    GetShellCommand(ctx context.Context) []Command
}

Testing

Test Suite: 595 specs across 4 packages with 86.1% overall coverage

# Run all tests
go test ./...

# With coverage
go test -cover ./...

# With race detection (recommended)
CGO_ENABLED=1 go test -race ./...

Test Results

monitor/                122 specs    68.5% coverage   0.23s
monitor/info/           139 specs   100.0% coverage   0.12s
monitor/pool/           153 specs    76.2% coverage  11.78s
monitor/status/         181 specs    98.4% coverage   0.02s

Quality Assurance

  • ✅ Zero data races (verified with -race)
  • ✅ Thread-safe concurrent operations
  • ✅ Comprehensive edge case testing
  • ✅ Time-dependent behavior validation

See TESTING.md for detailed testing documentation.


Contributing

Contributions welcome! Please follow these guidelines:

Code Standards
  • Write tests for new features
  • Update documentation
  • Add GoDoc comments for public APIs
  • Run go fmt and go vet
  • Test with race detector (-race)
AI Usage Policy
  • DO NOT use AI tools to generate package code or core logic
  • DO use AI to assist with:
    • Writing and improving tests
    • Documentation and comments
    • Debugging and bug fixes

All AI-assisted work must be reviewed and validated by a human maintainer.

Pull Request Process
  1. Fork the repository
  2. Create a feature branch
  3. Write tests (coverage > 70%)
  4. Update documentation
  5. Run full test suite with race detection
  6. Submit PR with clear description

Future Enhancements

Potential improvements under consideration:

  • Circuit Breaker Pattern: Automatic service isolation during failures
  • Distributed Monitoring: Cluster-wide health coordination
  • Historical Metrics: Long-term trend analysis
  • Custom Exporters: Support for other metrics systems (StatsD, InfluxDB)
  • Health Check Templates: Predefined checks for common services
  • Dynamic Thresholds: Adaptive thresholds based on historical data

Contributions and suggestions are welcome!


AI Transparency Notice

In accordance with Article 50.4 of the EU AI Act, AI assistance has been used for testing, documentation, and bug fixing under human supervision.


License

MIT License - See LICENSE file for details.


Resources

Version: Go 1.18+ on Linux, macOS, Windows
Maintained By: Monitor Package Contributors

Documentation

Overview

Package monitor provides a robust health check monitoring system with automatic status transitions, configurable thresholds, and comprehensive metrics tracking.

The monitor package implements a sophisticated health monitoring system that periodically executes health checks and tracks the state of monitored components. It features:

  • Automatic status transitions with configurable thresholds (OK ↔ Warn ↔ KO)
  • Adaptive check intervals based on status (normal, rising, falling)
  • Comprehensive metrics (uptime, downtime, latency, rise/fall times)
  • Thread-safe concurrent operations
  • Prometheus metrics integration
  • Flexible configuration with validation
  • Middleware chain for extensibility

Status Transitions

The monitor uses a three-state model with hysteresis to prevent flapping:

  • KO: Component is not healthy
  • Warn: Component is degraded but functional
  • OK: Component is fully healthy

Transitions between states require multiple consecutive successes or failures:

KO --[riseCountKO successes]--> Warn --[riseCountWarn successes]--> OK
OK --[fallCountWarn failures]--> Warn --[fallCountKO failures]--> KO

Basic Usage

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/nabbar/golib/duration"
	"github.com/nabbar/golib/monitor"
	"github.com/nabbar/golib/monitor/info"
	"github.com/nabbar/golib/monitor/types"
)

// Create info metadata
inf, err := info.New("database-monitor")
if err != nil {
	log.Fatal(err)
}

// Create monitor
mon, err := monitor.New(context.Background(), inf)
if err != nil {
	log.Fatal(err)
}

// Configure monitor
cfg := types.Config{
	Name:          "database",
	CheckTimeout:  duration.ParseDuration(5 * time.Second),
	IntervalCheck: duration.ParseDuration(10 * time.Second),
	IntervalFall:  duration.ParseDuration(5 * time.Second),
	IntervalRise:  duration.ParseDuration(5 * time.Second),
	FallCountKO:   3,
	FallCountWarn: 2,
	RiseCountKO:   3,
	RiseCountWarn: 2,
}
if err := mon.SetConfig(context.Background(), cfg); err != nil {
	log.Fatal(err)
}

// Register health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Check database connectivity
	return db.PingContext(ctx)
})

// Start monitoring
if err := mon.Start(context.Background()); err != nil {
	log.Fatal(err)
}
defer mon.Stop(context.Background())

// Query status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Latency: %s\n", mon.Latency())
fmt.Printf("Uptime: %s\n", mon.Uptime())

Configuration

The monitor supports extensive configuration:

  • CheckTimeout: Maximum duration for a health check to complete (min: 5s)
  • IntervalCheck: Interval between checks in normal state (min: 1s)
  • IntervalFall: Interval when status is falling (min: 1s, default: IntervalCheck)
  • IntervalRise: Interval when status is rising (min: 1s, default: IntervalCheck)
  • FallCountKO: Failures needed to go from Warn to KO (min: 1)
  • FallCountWarn: Failures needed to go from OK to Warn (min: 1)
  • RiseCountKO: Successes needed to go from KO to Warn (min: 1)
  • RiseCountWarn: Successes needed to go from Warn to OK (min: 1)

All values are automatically normalized to their minimums if set below threshold.
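
The normalization described above might look roughly like the following. This is a sketch under stated assumptions, not the package's code: the Config here is a simplified copy that uses time.Duration, Normalize is a hypothetical function, and applying the IntervalCheck default before clamping is an assumed order.

```go
package main

import (
	"fmt"
	"time"
)

// Config is a simplified, hypothetical copy of the documented fields.
type Config struct {
	CheckTimeout  time.Duration
	IntervalCheck time.Duration
	IntervalFall  time.Duration
	IntervalRise  time.Duration
	FallCountKO   int
	FallCountWarn int
	RiseCountKO   int
	RiseCountWarn int
}

func atLeastDur(v, floor time.Duration) time.Duration {
	if v < floor {
		return floor
	}
	return v
}

func atLeastInt(v, floor int) int {
	if v < floor {
		return floor
	}
	return v
}

// Normalize raises every field to its documented minimum and lets
// IntervalFall and IntervalRise default to IntervalCheck when unset.
func Normalize(c Config) Config {
	c.CheckTimeout = atLeastDur(c.CheckTimeout, 5*time.Second)
	c.IntervalCheck = atLeastDur(c.IntervalCheck, time.Second)
	if c.IntervalFall <= 0 {
		c.IntervalFall = c.IntervalCheck
	}
	if c.IntervalRise <= 0 {
		c.IntervalRise = c.IntervalCheck
	}
	c.IntervalFall = atLeastDur(c.IntervalFall, time.Second)
	c.IntervalRise = atLeastDur(c.IntervalRise, time.Second)
	c.FallCountKO = atLeastInt(c.FallCountKO, 1)
	c.FallCountWarn = atLeastInt(c.FallCountWarn, 1)
	c.RiseCountKO = atLeastInt(c.RiseCountKO, 1)
	c.RiseCountWarn = atLeastInt(c.RiseCountWarn, 1)
	return c
}

func main() {
	c := Normalize(Config{CheckTimeout: 2 * time.Second, IntervalCheck: 10 * time.Second})
	fmt.Println(c.CheckTimeout, c.IntervalFall, c.FallCountKO) // prints: 5s 10s 1
}
```

Under these rules a CheckTimeout of 2s is silently raised to 5s, so it is worth reading the effective values back with GetConfig rather than assuming the values that were passed in.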

Metrics Tracking

The monitor tracks comprehensive timing metrics:

  • Latency: Duration of the last health check execution
  • Uptime: Total time in OK status
  • Downtime: Total time in KO or Warn status
  • RiseTime: Total time spent transitioning to better status
  • FallTime: Total time spent transitioning to worse status
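
As an illustration of the Uptime and Downtime definitions, the sketch below attributes elapsed time to one bucket or the other. Timeline and its methods are hypothetical, and the Availability ratio is a derived figure added for the example, not a metric the package is documented to expose.

```go
package main

import (
	"fmt"
	"time"
)

// Timeline accumulates time following the documented definitions:
// Uptime is time spent in OK, Downtime is time spent in Warn or KO.
type Timeline struct {
	Uptime   time.Duration
	Downtime time.Duration
}

// Add attributes elapsed to uptime when the status was OK, else to downtime.
func (t *Timeline) Add(statusOK bool, elapsed time.Duration) {
	if statusOK {
		t.Uptime += elapsed
		return
	}
	t.Downtime += elapsed
}

// Availability returns uptime divided by total recorded time,
// or 0 when nothing has been recorded yet.
func (t *Timeline) Availability() float64 {
	total := t.Uptime + t.Downtime
	if total == 0 {
		return 0
	}
	return float64(t.Uptime) / float64(total)
}

func main() {
	var tl Timeline
	tl.Add(true, 90*time.Second)  // 90s in OK
	tl.Add(false, 30*time.Second) // 30s in Warn or KO
	fmt.Printf("uptime=%s downtime=%s availability=%.2f\n",
		tl.Uptime, tl.Downtime, tl.Availability())
	// prints: uptime=1m30s downtime=30s availability=0.75
}
```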

Prometheus Integration

The monitor can export metrics to Prometheus:

import "github.com/nabbar/golib/prometheus"

// Register metric names
mon.RegisterMetricsName("my_service_health")

// Register collection function
mon.RegisterCollectMetrics(prometheusCollector)

// Metrics are automatically collected after each health check

Encoding Support

The monitor supports multiple encoding formats:

// Text encoding
text, _ := mon.MarshalText()
fmt.Println(string(text))
// Output: OK: database (version: 1.0) | 5ms / 1h30m / 0s

// JSON encoding
json, _ := mon.MarshalJSON()

Thread Safety

All monitor operations are thread-safe and can be called concurrently from multiple goroutines. The monitor uses fine-grained locking to minimize contention while ensuring data consistency.

Best Practices

  1. Configure appropriate check intervals to balance responsiveness and resource usage
  2. Set fall/rise counts to prevent status flapping during temporary issues
  3. Use shorter intervals during transitions (IntervalFall/Rise) for faster detection
  4. Set CheckTimeout lower than IntervalCheck to prevent overlapping checks
  5. Register a logger for debugging and troubleshooting
  6. Always call Stop() when shutting down to clean up resources

Error Handling

The monitor defines several error codes:

  • ErrorParamEmpty: Empty parameter provided
  • ErrorMissingHealthCheck: No health check function registered
  • ErrorValidatorError: Configuration validation failed
  • ErrorLoggerError: Logger initialization failed
  • ErrorTimeout: Operation timeout
  • ErrorInvalid: Invalid monitor instance

All errors implement the liberr.Error interface for structured error handling.
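
The liberr API is not reproduced in this document, so the sketch below only illustrates the general pattern of numeric error codes offset by a per-package base, using the standard errors package. Code, basePkgMonitor (standing in for liberr.MinPkgMonitor with an invented value), Start, and HasCode are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Code is a hypothetical numeric error code that also satisfies error.
type Code uint16

// basePkgMonitor is an invented offset standing in for liberr.MinPkgMonitor.
const basePkgMonitor Code = 100

const (
	ErrorParamEmpty Code = iota + basePkgMonitor
	ErrorMissingHealthCheck
	ErrorValidatorError
	ErrorLoggerError
	ErrorTimeout
	ErrorInvalid
)

var messages = map[Code]string{
	ErrorParamEmpty:         "empty parameter provided",
	ErrorMissingHealthCheck: "no health check function registered",
	ErrorValidatorError:     "configuration validation failed",
	ErrorLoggerError:        "logger initialization failed",
	ErrorTimeout:            "operation timeout",
	ErrorInvalid:            "invalid monitor instance",
}

// Error implements the error interface.
func (c Code) Error() string {
	if m, ok := messages[c]; ok {
		return m
	}
	return fmt.Sprintf("unknown error code %d", uint16(c))
}

// Start is a hypothetical operation that fails when no check is registered.
func Start(hasCheck bool) error {
	if !hasCheck {
		return fmt.Errorf("start monitor: %w", ErrorMissingHealthCheck)
	}
	return nil
}

// HasCode reports whether err, or any error it wraps, is the given code.
func HasCode(err error, c Code) bool {
	return errors.Is(err, c)
}

func main() {
	err := Start(false)
	fmt.Println(err)
	fmt.Println(HasCode(err, ErrorMissingHealthCheck), HasCode(err, ErrorTimeout))
}
```

Matching on a code rather than on message text keeps callers stable when wording changes.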

Related Packages

  • github.com/nabbar/golib/monitor/info: Dynamic metadata management
  • github.com/nabbar/golib/monitor/status: Health status type
  • github.com/nabbar/golib/monitor/types: Type definitions and interfaces
  • github.com/nabbar/golib/monitor/pool: Monitor pool management

Example

Example demonstrates a complete monitor setup and execution.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create info
inf, _ := moninf.New("example-service")

// Create monitor
mon, _ := libmon.New(context.Background(), inf)

// Configure monitor
cfg := montps.Config{
	Name:          "example-monitor",
	CheckTimeout:  libdur.ParseDuration(2 * time.Second),
	IntervalCheck: libdur.ParseDuration(1 * time.Second),
	RiseCountKO:   2,
	RiseCountWarn: 2,
	FallCountKO:   2,
	FallCountWarn: 2,
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

// Set health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Simulate health check
	return nil
})

// Start monitoring
_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)

// Check status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Running: %v\n", mon.IsRunning())

// Stop monitoring
_ = mon.Stop(ctx)
fmt.Printf("Running after stop: %v\n", mon.IsRunning())
Output:

Status: KO
Running: true
Running after stop: false

Constants

const (
	// ErrorParamEmpty indicates an empty parameter was provided.
	ErrorParamEmpty liberr.CodeError = iota + liberr.MinPkgMonitor
	// ErrorMissingHealthCheck indicates no health check function was registered.
	ErrorMissingHealthCheck
	// ErrorValidatorError indicates configuration validation failed.
	ErrorValidatorError
	// ErrorLoggerError indicates logger initialization failed.
	ErrorLoggerError
	// ErrorTimeout indicates a timeout occurred during an operation.
	ErrorTimeout
	// ErrorInvalid indicates an invalid monitor instance.
	ErrorInvalid
)
const (

	// Log field constants for structured logging
	LogFieldProcess = "process"
	LogValueProcess = "monitor"
	LogFieldName    = "name"
)
const (
	// MaxPoolStart is the maximum time to wait for the monitor to start.
	MaxPoolStart = 3 * time.Second
	// MaxTickPooler is the polling interval when waiting for the monitor to start.
	MaxTickPooler = 5 * time.Millisecond
)

Functions

func New

func New(ctx context.Context, info montps.Info) (montps.Monitor, error)

New creates a new Monitor instance with the given context and info. The ctx parameter is the context.Context used by the monitor, as shown in the signature above. The info parameter provides metadata about the monitored component. Returns an error if info is nil.

Example:

inf, _ := info.New("my-service")
mon, err := monitor.New(context.Background(), inf)
if err != nil {
    log.Fatal(err)
}
Example

ExampleNew demonstrates creating a new monitor instance.

package main

import (
	"context"
	"fmt"

	libmon "github.com/nabbar/golib/monitor"

	moninf "github.com/nabbar/golib/monitor/info"
)

func main() {
	// Create info metadata
	inf, err := moninf.New("database-monitor")
	if err != nil {
		panic(err)
	}

	// Create monitor
	mon, err := libmon.New(context.Background(), inf)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Monitor created: %s\n", mon.Name())
}
Output:

Monitor created: not named

Types

type Encode

type Encode interface {
	String() string // Returns a human-readable string representation
	Bytes() []byte  // Returns the byte representation of the string
}

Encode provides methods for converting monitor state to different formats.

type Monitor

type Monitor interface {
	montps.Monitor
}

Monitor is the interface that wraps all monitor functionalities. It extends montps.Monitor and provides health check monitoring capabilities.

Example (Metrics)

ExampleMonitor_metrics demonstrates collecting metrics.

ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "my-service",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	time.Sleep(10 * time.Millisecond)
	return nil
})

_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)
_ = mon.Stop(ctx)

// Collect metrics
latency := mon.Latency()
uptime := mon.Uptime()
downtime := mon.Downtime()

fmt.Printf("Latency recorded: %v\n", latency > 0)
fmt.Printf("Uptime recorded: %v\n", uptime >= 0)
fmt.Printf("Downtime recorded: %v\n", downtime >= 0)
Output:

Latency recorded: true
Uptime recorded: true
Downtime recorded: true
Example (Transitions)

ExampleMonitor_transitions demonstrates status transitions.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "transition-demo",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	RiseCountKO:   1,
	RiseCountWarn: 1,
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	return nil // Always healthy
})

_ = mon.Start(ctx)
time.Sleep(100 * time.Millisecond)

fmt.Printf("Initial: KO=%v\n", mon.Status().String() == "KO")

time.Sleep(300 * time.Millisecond)
fmt.Printf("Rising: %v\n", mon.IsRise() || mon.Status().String() != "KO")

_ = mon.Stop(ctx)
Output:

Initial: KO=true
Rising: true

Directories

Path     Synopsis
info     Package info provides a thread-safe, caching implementation for monitor information.
pool     Package pool provides a thread-safe pool implementation for managing multiple health monitors.
status   Package status provides a robust enumeration type for representing monitor health status.
types    Package types provides core type definitions and interfaces for the monitor system.
