monitor (package)

v1.19.2 (latest) | Published: Feb 4, 2026 | License: MIT | Imports: 20 | Imported by: 0

Repository

github.com/nabbar/golib

README

Monitor Package

Production-ready health monitoring system for Go applications with automatic status transitions, configurable thresholds, and comprehensive metrics tracking.

AI Disclaimer: AI tools are used solely to assist with testing, documentation, and bug fixes under human supervision, in compliance with EU AI Act Article 50.4.

Overview

The monitor package provides a sophisticated health monitoring system for production Go applications. It implements automatic health check execution with intelligent status transitions, hysteresis to prevent flapping, and comprehensive metrics collection.

Design Philosophy

Reliability First: Hysteresis-based transitions prevent status flapping during temporary issues
Observability: Track latency, uptime, downtime, and state transitions for complete visibility
Flexibility: Configurable intervals, thresholds, and extensible middleware chain
Thread-Safe: Fine-grained locking and atomic operations for concurrent access
Composable: Independent subpackages (info, status, pool, types) work together seamlessly

Value Proposition

Prevent Alert Fatigue: Hysteresis prevents flapping during transient failures
Adaptive Monitoring: Automatically adjusts check frequency based on component health
Production Ready: Thread-safe and covered by an extensive test suite run with the race detector
Observable: Complete visibility into health status and performance metrics
Scalable: Efficiently manage hundreds of monitors with pool management

Key Features

Three-State Model: OK → Warn → KO transitions with configurable thresholds
Adaptive Intervals: Different check frequencies for normal, rising, and falling states
Comprehensive Metrics: Latency, uptime, downtime, rise/fall times
Thread-Safe: Concurrent access safe with fine-grained locking
Pool Management: Group and manage multiple monitors with batch operations
Prometheus Integration: Built-in metrics export
Middleware Chain: Extensible health check pipeline
Dynamic Metadata: Runtime-generated component information
Shell Commands: CLI-style operational control

Installation

go get github.com/nabbar/golib/monitor

Architecture

Package Structure

monitor/
├── *.go             # Core health monitoring (package monitor)
├── pool/            # Monitor pool management
├── info/            # Component metadata
├── status/          # Status enumeration
└── types/           # Type definitions

Component Hierarchy

┌────────────────────────────────────────┐
│         Monitor Package                │
│    Health Check Monitoring System      │
└──────┬──────┬────────┬─────────┬───────┘
       │      │        │         │
   ┌───▼──┐ ┌─▼───┐ ┌──▼───┐ ┌───▼─────┐
   │ Pool │ │Info │ │Status│ │  Types  │
   └──────┘ └─────┘ └──────┘ └─────────┘

Status Transition Model

       ┌──────────────┐
       │      KO      │  ← Component unhealthy
       └──────┬───────┘
              │ riseCountKO successes
              ▼
       ┌──────────────┐
       │     Warn     │  ← Component degraded
       └──────┬───────┘
              │ riseCountWarn successes
              ▼
       ┌──────────────┐
       │      OK      │  ← Component healthy
       └──────┬───────┘
              │ fallCountWarn failures
              ▼
       (back to Warn; fallCountKO further failures lead to KO)

Quick Start

Basic Monitor

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/nabbar/golib/duration"
    "github.com/nabbar/golib/monitor"
    "github.com/nabbar/golib/monitor/info"
    "github.com/nabbar/golib/monitor/types"
)

// Create monitor
inf, err := info.New("database-monitor")
if err != nil {
    log.Fatal(err)
}
mon, err := monitor.New(context.Background(), inf)
if err != nil {
    log.Fatal(err)
}

// Configure (counts left at zero are raised to their minimum of 1)
cfg := types.Config{
    Name:          "postgres",
    CheckTimeout:  duration.ParseDuration(5 * time.Second),
    IntervalCheck: duration.ParseDuration(30 * time.Second),
    FallCountKO:   3,
    RiseCountKO:   3,
}
if err := mon.SetConfig(context.Background(), cfg); err != nil {
    log.Fatal(err)
}

// Register health check (db is assumed to be an *sql.DB opened elsewhere)
mon.SetHealthCheck(func(ctx context.Context) error {
    return db.PingContext(ctx)
})

// Start
if err := mon.Start(context.Background()); err != nil {
    log.Fatal(err)
}
defer mon.Stop(context.Background())

// Query
fmt.Printf("Status: %s\n", mon.Status())

Monitor Pool

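import "github.com/nabbar/golib/monitor/pool"

// ctx, ctxFunc, promFunc, logFunc, createDBMonitor and createAPIMonitor are
// placeholders for values and helpers provided by your application.
p := pool.New(ctxFunc)

// Add monitors (MonitorAdd returns an error worth checking)
p.MonitorAdd(createDBMonitor())
p.MonitorAdd(createAPIMonitor())

// Register metrics
p.RegisterMetrics(promFunc, logFunc)
defer p.UnregisterMetrics()

// Start all
p.Start(ctx)
defer p.Stop(ctx)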

Performance

Operation                 Time     Memory   Allocations
Monitor Creation          1.2 µs   2.1 KB   18 allocs
Health Check              15 µs    448 B    5 allocs
Status Transition         800 ns   0 B      0 allocs
Metrics Collection        2.5 µs   0 B      0 allocs
Pool.Start (10 monitors)  85 µs    8 KB     120 allocs

Use Cases

1. Microservice Health Monitoring

Monitor multiple services with automatic transitions and metrics collection.

2. Database Connection Pooling

Track database health with adaptive intervals for faster issue detection.

3. External Service Dependencies

Monitor third-party API availability with configurable timeouts.

4. Kubernetes Probes

Integrate with liveness and readiness probes.

5. Custom Middleware

Extend health checks with logging, metrics, or custom logic.

Subpackages

monitor (Core)

Core health check monitoring with status transitions, metrics, and lifecycle management.

GoDoc: pkg.go.dev/github.com/nabbar/golib/monitor

pool

Manage multiple monitors as a group with batch operations and Prometheus integration.

Documentation: pool/README.md

info

Dynamic metadata management with caching and lazy evaluation.

Documentation: info/README.md

status

Type-safe status enumeration (OK, Warn, KO) with multi-format encoding.

types

Shared interfaces, configuration types, and error codes.

Configuration

type Config struct {
    Name          string            // Component name
    CheckTimeout  duration.Duration // Health check timeout (min: 5s)
    IntervalCheck duration.Duration // Normal check interval (min: 1s)
    IntervalFall  duration.Duration // Interval when falling (min: 1s)
    IntervalRise  duration.Duration // Interval when rising (min: 1s)
    FallCountKO   int               // Failures for Warn→KO (min: 1)
    FallCountWarn int               // Failures for OK→Warn (min: 1)
    RiseCountKO   int               // Successes for KO→Warn (min: 1)
    RiseCountWarn int               // Successes for Warn→OK (min: 1)
    // Other fields (such as Logger, used in the examples) are omitted here.
}

Best Practices:

CheckTimeout < IntervalCheck (prevent overlapping checks)
Use shorter IntervalFall for faster issue detection
Set counts ≥ 2 to prevent flapping

Status Transitions

Transition Rules

From   To     Condition                                Resets
KO     Warn   `riseCountKO` consecutive successes      Fall counters
Warn   OK     `riseCountWarn` consecutive successes    Fall counters
OK     Warn   `fallCountWarn` consecutive failures     Rise counters
Warn   KO     `fallCountKO` consecutive failures       Rise counters

Example Sequence

Configuration: FallCountWarn:2, FallCountKO:3, RiseCountKO:3, RiseCountWarn:2 (the monitor is already OK before check 1)

Check 1: ✓ → OK
Check 2: ✗ → OK (1 failure)
Check 3: ✗ → Warn (2 failures, threshold reached)
Check 4: ✗ → Warn (1 KO failure)
Check 5: ✗ → Warn (2 KO failures)
Check 6: ✗ → KO (3 KO failures, threshold reached)
Check 7-9: ✓✓✓ → Warn (3 successes, KO threshold reached)
Check 10-11: ✓✓ → OK (2 successes, Warn threshold reached)

Best Practices

Configuration

Set CheckTimeout < IntervalCheck
Use faster IntervalFall for issue detection
Configure counts ≥ 2 to prevent flapping

Health Checks

Respect context timeout
Return specific errors
Keep checks lightweight
Handle transient failures

Lifecycle

Always call Stop() when done
Use defer for cleanup
Check IsRunning() before operations

Pool Management

Use pools for related monitors
Call UnregisterMetrics() on shutdown
Register Prometheus metrics early

API Reference

Monitor Interface

type Monitor interface {
    // Lifecycle
    Start(ctx context.Context) error
    Stop(ctx context.Context) error
    Restart(ctx context.Context) error
    IsRunning() bool
    
    // Configuration
    SetConfig(ctx context.Context, cfg Config) error
    GetConfig() Config
    
    // Health Check
    SetHealthCheck(hc HealthCheck)
    RegisterMiddleware(mw Middleware)
    
    // Status & Metrics
    Status() status.Status
    Latency() time.Duration
    Uptime() time.Duration
    Downtime() time.Duration
    
    // Info
    InfoGet() Info
    InfoMap() map[string]interface{}
    
    // Encoding
    MarshalText() ([]byte, error)
    MarshalJSON() ([]byte, error)
}

Pool Interface

type Pool interface {
    // Monitor Management
    MonitorAdd(mon Monitor) error
    MonitorGet(name string) Monitor
    MonitorDel(name string)
    MonitorList() []string
    
    // Lifecycle
    Start(ctx context.Context) error
    Stop(ctx context.Context) error
    Restart(ctx context.Context) error
    
    // Metrics (callback parameter types simplified here; see the pool GoDoc for exact types)
    RegisterMetrics(prom, log func) error
    UnregisterMetrics()
    
    // Shell
    GetShellCommand(ctx context.Context) []Command
}

Testing

Test Suite: 595 specs across 4 packages with 86.1% overall coverage

# Run all tests
go test ./...

# With coverage
go test -cover ./...

# With race detection (recommended)
CGO_ENABLED=1 go test -race ./...

Test Results

monitor/                122 specs    68.5% coverage   0.23s
monitor/info/           139 specs   100.0% coverage   0.12s
monitor/pool/           153 specs    76.2% coverage  11.78s
monitor/status/         181 specs    98.4% coverage   0.02s

Quality Assurance

✅ Zero data races (verified with -race)
✅ Thread-safe concurrent operations
✅ Comprehensive edge case testing
✅ Time-dependent behavior validation

See TESTING.md for detailed testing documentation.

Contributing

Contributions welcome! Please follow these guidelines:

Code Standards

Write tests for new features
Update documentation
Add GoDoc comments for public APIs
Run go fmt and go vet
Test with race detector (-race)

AI Usage Policy

DO NOT use AI tools to generate package code or core logic
DO use AI to assist with:
- Writing and improving tests
- Documentation and comments
- Debugging and bug fixes

All AI-assisted work must be reviewed and validated by a human maintainer.

Pull Request Process

Fork the repository
Create a feature branch
Write tests (coverage > 70%)
Update documentation
Run full test suite with race detection
Submit PR with clear description

Future Enhancements

Potential improvements under consideration:

Circuit Breaker Pattern: Automatic service isolation during failures
Distributed Monitoring: Cluster-wide health coordination
Historical Metrics: Long-term trend analysis
Custom Exporters: Support for other metrics systems (StatsD, InfluxDB)
Health Check Templates: Predefined checks for common services
Dynamic Thresholds: Adaptive thresholds based on historical data

Contributions and suggestions are welcome!

AI Transparency Notice

In accordance with Article 50.4 of the EU AI Act, AI assistance has been used for testing, documentation, and bug fixing under human supervision.

License

MIT License - See LICENSE file for details.

Resources

Issues: GitHub Issues
Documentation: GoDoc
Testing Guide: TESTING.md
Contributing: CONTRIBUTING.md

Related Packages:

context - Context management
runner - Ticker and lifecycle management
prometheus - Metrics export
status - Status aggregation

Version: Go 1.18+ on Linux, macOS, Windows
Maintained By: Monitor Package Contributors

Documentation

Overview

Package monitor provides a robust health check monitoring system with automatic status transitions, configurable thresholds, and comprehensive metrics tracking.

Overview

The monitor package implements a sophisticated health monitoring system that periodically executes health checks and tracks the state of monitored components. It features:

Automatic status transitions with configurable thresholds (OK ↔ Warn ↔ KO)
Adaptive check intervals based on status (normal, rising, falling)
Comprehensive metrics (uptime, downtime, latency, rise/fall times)
Thread-safe concurrent operations
Prometheus metrics integration
Flexible configuration with validation
Middleware chain for extensibility

Status Transitions

The monitor uses a three-state model with hysteresis to prevent flapping:

KO: Component is not healthy
Warn: Component is degraded but functional
OK: Component is fully healthy

Transitions between states require multiple consecutive successes or failures:

KO --[riseCountKO successes]--> Warn --[riseCountWarn successes]--> OK
OK --[fallCountWarn failures]--> Warn --[fallCountKO failures]--> KO

Basic Usage

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/nabbar/golib/duration"
	"github.com/nabbar/golib/monitor"
	"github.com/nabbar/golib/monitor/info"
	"github.com/nabbar/golib/monitor/types"
)

// Create info metadata
inf, err := info.New("database-monitor")
if err != nil {
	log.Fatal(err)
}

// Create monitor
mon, err := monitor.New(context.Background(), inf)
if err != nil {
	log.Fatal(err)
}

// Configure monitor
cfg := types.Config{
	Name:          "database",
	CheckTimeout:  duration.ParseDuration(5 * time.Second),
	IntervalCheck: duration.ParseDuration(10 * time.Second),
	IntervalFall:  duration.ParseDuration(5 * time.Second),
	IntervalRise:  duration.ParseDuration(5 * time.Second),
	FallCountKO:   3,
	FallCountWarn: 2,
	RiseCountKO:   3,
	RiseCountWarn: 2,
}
if err := mon.SetConfig(context.Background(), cfg); err != nil {
	log.Fatal(err)
}

// Register health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Check database connectivity (db is assumed to be an *sql.DB opened elsewhere)
	return db.PingContext(ctx)
})

// Start monitoring
if err := mon.Start(context.Background()); err != nil {
	log.Fatal(err)
}
defer mon.Stop(context.Background())

// Query status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Latency: %s\n", mon.Latency())
fmt.Printf("Uptime: %s\n", mon.Uptime())

Configuration

The monitor supports extensive configuration:

CheckTimeout: Maximum duration for a health check to complete (min: 5s)
IntervalCheck: Interval between checks in normal state (min: 1s)
IntervalFall: Interval when status is falling (min: 1s, default: IntervalCheck)
IntervalRise: Interval when status is rising (min: 1s, default: IntervalCheck)
FallCountKO: Failures needed to go from Warn to KO (min: 1)
FallCountWarn: Failures needed to go from OK to Warn (min: 1)
RiseCountKO: Successes needed to go from KO to Warn (min: 1)
RiseCountWarn: Successes needed to go from Warn to OK (min: 1)

Any value set below its minimum is automatically raised to that minimum.

Metrics Tracking

The monitor tracks comprehensive timing metrics:

Latency: Duration of the last health check execution
Uptime: Total time in OK status
Downtime: Total time in KO or Warn status
RiseTime: Total time spent transitioning to better status
FallTime: Total time spent transitioning to worse status

Prometheus Integration

The monitor can export metrics to Prometheus:

import "github.com/nabbar/golib/prometheus"

// Register metric names
mon.RegisterMetricsName("my_service_health")

// Register collection function (prometheusCollector is a placeholder for your own collector)
mon.RegisterCollectMetrics(prometheusCollector)

// Metrics are automatically collected after each health check

Encoding Support

The monitor supports multiple encoding formats:

// Text encoding
text, _ := mon.MarshalText()
fmt.Println(string(text))
// Example output: OK: database (version: 1.0) | 5ms / 1h30m / 0s

// JSON encoding (named data to avoid shadowing the encoding/json package)
data, _ := mon.MarshalJSON()
fmt.Println(string(data))

Thread Safety

All monitor operations are thread-safe and can be called concurrently from multiple goroutines. The monitor uses fine-grained locking to minimize contention while ensuring data consistency.

Best Practices

1. Configure appropriate check intervals to balance responsiveness and resource usage.
2. Set fall/rise counts to prevent status flapping during temporary issues.
3. Use shorter intervals during transitions (IntervalFall/IntervalRise) for faster detection.
4. Set CheckTimeout lower than IntervalCheck to prevent overlapping checks.
5. Register a logger for debugging and troubleshooting.
6. Always call Stop() when shutting down to clean up resources.

Error Handling

The monitor defines several error codes:

ErrorParamEmpty: Empty parameter provided
ErrorMissingHealthCheck: No health check function registered
ErrorValidatorError: Configuration validation failed
ErrorLoggerError: Logger initialization failed
ErrorTimeout: Operation timeout
ErrorInvalid: Invalid monitor instance

All errors implement the liberr.Error interface for structured error handling.

Related Packages

github.com/nabbar/golib/monitor/info: Dynamic metadata management
github.com/nabbar/golib/monitor/status: Health status type
github.com/nabbar/golib/monitor/types: Type definitions and interfaces
github.com/nabbar/golib/monitor/pool: Monitor pool management

Example

Example demonstrates a complete monitor setup and execution.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create info
inf, _ := moninf.New("example-service")

// Create monitor
mon, _ := libmon.New(context.Background(), inf)

// Configure monitor
cfg := montps.Config{
	Name:          "example-monitor",
	CheckTimeout:  libdur.ParseDuration(2 * time.Second),
	IntervalCheck: libdur.ParseDuration(1 * time.Second),
	RiseCountKO:   2,
	RiseCountWarn: 2,
	FallCountKO:   2,
	FallCountWarn: 2,
	Logger:        lo.Clone(), // lo is defined elsewhere in the package's example test file (not shown)
}
_ = mon.SetConfig(context.Background(), cfg)

// Set health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Simulate health check
	return nil
})

// Start monitoring
_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)

// Check status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Running: %v\n", mon.IsRunning())

// Stop monitoring
_ = mon.Stop(ctx)
fmt.Printf("Running after stop: %v\n", mon.IsRunning())

Output:

Status: KO
Running: true
Running after stop: false

Index

Constants
func New(ctx context.Context, info montps.Info) (montps.Monitor, error)
type Encode
type Monitor

Constants

const (
	// ErrorParamEmpty indicates an empty parameter was provided.
	ErrorParamEmpty liberr.CodeError = iota + liberr.MinPkgMonitor
	// ErrorMissingHealthCheck indicates no health check function was registered.
	ErrorMissingHealthCheck
	// ErrorValidatorError indicates configuration validation failed.
	ErrorValidatorError
	// ErrorLoggerError indicates logger initialization failed.
	ErrorLoggerError
	// ErrorTimeout indicates a timeout occurred during an operation.
	ErrorTimeout
	// ErrorInvalid indicates an invalid monitor instance.
	ErrorInvalid
)

const (

	// Log field constants for structured logging
	LogFieldProcess = "process"
	LogValueProcess = "monitor"
	LogFieldName    = "name"
)

const (
	// MaxPoolStart is the maximum time to wait for the monitor to start.
	MaxPoolStart = 3 * time.Second
	// MaxTickPooler is the polling interval when waiting for the monitor to start.
	MaxTickPooler = 5 * time.Millisecond
)

Variables

This section is empty.

Functions

func New

func New(ctx context.Context, info montps.Info) (montps.Monitor, error)

New creates a new Monitor instance from the given context and info. The ctx parameter is a context.Context, as the signature above shows (for example, context.Background()). The info parameter provides metadata about the monitored component. New returns an error if info is nil.

Example:

inf, _ := info.New("my-service")
mon, err := monitor.New(context.Background(), inf)
if err != nil {
    log.Fatal(err)
}

Example

ExampleNew demonstrates creating a new monitor instance.

package main

import (
	"context"
	"fmt"

	libmon "github.com/nabbar/golib/monitor"

	moninf "github.com/nabbar/golib/monitor/info"
)

func main() {
	// Create info metadata
	inf, err := moninf.New("database-monitor")
	if err != nil {
		panic(err)
	}

	// Create monitor
	mon, err := libmon.New(context.Background(), inf)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Monitor created: %s\n", mon.Name())
}

Output:

Monitor created: not named

Types

type Encode

type Encode interface {
	String() string // Returns a human-readable string representation
	Bytes() []byte  // Returns the byte representation of the string
}

Encode provides methods for converting monitor state to different formats.

type Monitor

type Monitor interface {
	montps.Monitor
}

Monitor is the interface that wraps all monitor functionalities. It extends montps.Monitor and provides health check monitoring capabilities.

Example (Metrics)

ExampleMonitor_metrics demonstrates collecting metrics.

ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "my-service",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	Logger:        lo.Clone(), // lo is defined elsewhere in the package's example test file (not shown)
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	time.Sleep(10 * time.Millisecond)
	return nil
})

_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)
_ = mon.Stop(ctx)

// Collect metrics
latency := mon.Latency()
uptime := mon.Uptime()
downtime := mon.Downtime()

fmt.Printf("Latency recorded: %v\n", latency > 0)
fmt.Printf("Uptime recorded: %v\n", uptime >= 0)
fmt.Printf("Downtime recorded: %v\n", downtime >= 0)

Output:

Latency recorded: true
Uptime recorded: true
Downtime recorded: true

Example (Transitions)

ExampleMonitor_transitions demonstrates status transitions.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "transition-demo",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	RiseCountKO:   1,
	RiseCountWarn: 1,
	Logger:        lo.Clone(), // lo is defined elsewhere in the package's example test file (not shown)
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	return nil // Always healthy
})

_ = mon.Start(ctx)
time.Sleep(100 * time.Millisecond)

fmt.Printf("Initial: KO=%v\n", mon.Status().String() == "KO")

time.Sleep(300 * time.Millisecond)
fmt.Printf("Rising: %v\n", mon.IsRise() || mon.Status().String() != "KO")

_ = mon.Stop(ctx)

Output:

Initial: KO=true
Rising: true

Directories

Path     Synopsis
info     Package info provides a thread-safe, caching implementation for monitor information.
pool     Package pool provides a thread-safe pool implementation for managing multiple health monitors.
status   Package status provides a robust enumeration type for representing monitor health status.
types    Package types provides core type definitions and interfaces for the monitor system.
