package monitor

v1.21.0
Published: Mar 27, 2026 License: MIT Imports: 20 Imported by: 0

README

Monitor Package


The Monitor Package is a high-performance, production-ready health monitoring system for Go applications. It provides a robust framework for tracking the operational status of internal components and external dependencies using an intelligent state machine with built-in hysteresis and lock-free metrics reporting.


Overview

The monitor package is designed to provide "observation without interference". It allows developers to register periodic health checks that automatically transition through health states (OK, Warn, KO) based on configurable failure and recovery thresholds.

Design Philosophy
  1. Lock-Free Hot Path: Reading a monitor's status or metrics uses atomic operations, ensuring zero contention even under thousands of concurrent requests.
  2. Dampened Transitions: Hysteresis logic prevents "alert flapping" by requiring consecutive successes or failures before triggering a state change.
  3. Context-Aware: Every health check execution is bounded by a context timeout, ensuring that hanging diagnostics do not block the system.
  4. Middleware-First: Execution is wrapped in a LIFO stack, allowing for easy injection of tracing, logging, or custom metrics logic.
Key Features
  • Three-State Machine: Full lifecycle tracking (OK ↔ Warn ↔ KO).
  • Adaptive Ticker: Dynamically adjusts polling frequency during transition phases (Rise/Fall).
  • Atomic Metrics: High-precision tracking of Latency, Uptime, Downtime, and Transition times.
  • Prometheus Integration: Built-in dispatching logic for automated metrics exporting.
  • Metadata Management: Dynamic runtime information through the info subpackage.
  • Zero-Allocation Reads: Optimized memory path for high-frequency status polling.
Key Benefits
  • vs Standard Tickers: Provides a complete state machine and metrics container out-of-the-box, rather than just a periodic trigger.
  • vs Basic Maps: Thread-safety is guaranteed through atomic primitives rather than global mutexes, offering superior scalability on multi-core systems.
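As a minimal illustration of the atomic approach (the constants and `atomicStatus` type here are invented for the sketch, not the package's internals), a status word can be stored and read without any lock:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative status codes; the real package uses a richer status type.
const (
	statusKO int32 = iota
	statusWarn
	statusOK
)

// atomicStatus stores the current health as a single atomic word,
// so readers never take a lock and never allocate.
type atomicStatus struct{ v atomic.Int32 }

func (s *atomicStatus) Set(code int32) { s.v.Store(code) }
func (s *atomicStatus) Get() int32     { return s.v.Load() }

func main() {
	var s atomicStatus
	s.Set(statusOK)
	fmt.Println(s.Get() == statusOK) // true
}
```

Because the read is a single atomic load, many goroutines can poll the status concurrently without ever serializing on a mutex.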

Architecture

Package Structure
monitor/
├── monitor.go               # Implementation of the core Monitor orchestrator
├── interface.go             # Public interface definitions and factory
├── model.go                 # Internal structures and atomic containers
├── last.go                  # High-performance metrics & status storage
├── server.go                # Ticker runner and periodic execution logic
├── internalConfig.go        # Configuration normalization and validation
├── middleware.go            # Execution pipeline implementation
├── encode.go                # JSON/Text serialization logic
├── doc.go                   # GoDoc package documentation
│
├── info/                    # Metadata management sub-package
├── pool/                    # Group management and batch operations
├── status/                  # Status enumeration and multi-format parsing
└── types/                   # Cross-package shared interfaces
Package Architecture

The monitor uses a Split-State Architecture. Configuration and Metadata are stored in thread-safe but high-level containers, while operational metrics are stored in a dedicated lastRun structure using low-level sync/atomic primitives.

[ Monitor Instance ]
       |
       +--- [ Config Context ] ---> (Logger, Ticker Intervals, Thresholds)
       |
       +--- [ Metadata Container ] ---> (Atomic Name, Version, Custom Data)
       |
       +--- [ Background Runner ] ---> (Ticker Goroutine)
       |
       +--- [ Performance Metrics ] ---> (Atomic Status, Latency, Uptime Counters)
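The split can be pictured with a hypothetical struct (field names are illustrative only, not the package's real internals): administrative data sits behind a mutex, while hot-path metrics are plain atomics.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// splitState mirrors the Split-State idea: slow-changing configuration
// behind a lock on the administrative path, hot-path metrics in
// lock-free atomics. All names here are invented for the sketch.
type splitState struct {
	mu   sync.RWMutex // guards config/metadata only
	name string

	status  atomic.Int32 // hot path: read without locking
	latency atomic.Int64 // last check duration in nanoseconds
}

func (s *splitState) SetName(n string) { s.mu.Lock(); s.name = n; s.mu.Unlock() }
func (s *splitState) Name() string     { s.mu.RLock(); defer s.mu.RUnlock(); return s.name }

func main() {
	var s splitState
	s.SetName("api")
	s.status.Store(2)     // e.g. OK
	s.latency.Store(3140) // ~3.14µs
	fmt.Println(s.Name(), s.status.Load(), s.latency.Load())
}
```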
Dataflow

The periodic check cycle follows a structured pipeline:

1. Ticker Tick --------> 2. Interval Resolver ----> 3. Middleware Stack
                               |                          |
   (Adjusts speed if           |                          |-- [ mdlStatus ] (Start Time)
    Rising or Falling)         |                          |-- [ User Function ] (Diagnostic)
                               |                          |-- [ mdlStatus ] (Set Result)
                               |                          |
4. Metrics Export <---- 5. State Transition <-------------+
      |                 (Update Counters & Status)
      v
[ Prometheus / Logs ]

Performance

The monitor is optimized for zero-contention on the read path. The following benchmarks were captured on an Intel Core i7-4700HQ.

Operation       | Performance | Memory     | Efficiency
----------------|-------------|------------|--------------------
Status Read     | ~3.14 ns/op | 0 B/op     | Zero Garbage
Latency Read    | ~2.23 ns/op | 0 B/op     | Zero Garbage
Concurrent Read | ~0.85 ns/op | 0 B/op     | Linear Scaling
Check Execution | ~15.0 µs/op | 448 B/op   | Low Overhead
Configuration   | ~49.4 µs/op | 24.8 KB/op | Administrative Path

Note: Status and Metric reads are lock-free and do not produce pressure on the Garbage Collector.


Subpackages

info

Dynamic metadata management. It allows attaching functions to retrieve runtime data (like version or git hash) only when requested.

pool

Manages monitor groups. Provides batch control (Start/Stop all) and aggregated Prometheus exporters.

status

Type-safe status enumeration. Handles conversions and multi-format marshalling (JSON/YAML/TOML).


Use Cases

1. External API Resilience

Monitor third-party services with "dampened" transitions to avoid false alarms on transient glitches.

cfg := types.Config{
    FallCountWarn: 3, // require 3 consecutive failures before leaving OK
    IntervalCheck: duration.ParseDuration(30 * time.Second),
}
2. High-Frequency Telemetry

Feed Prometheus scrapers or liveness probes using the lock-free read path without impacting diagnostic performance.

// Atomic read (~3ns) - No impact on system latency
status := mon.Status() 

Quick Start

import (
    "context"
    "time"

    "github.com/nabbar/golib/duration"
    "github.com/nabbar/golib/monitor"
    "github.com/nabbar/golib/monitor/info"
    "github.com/nabbar/golib/monitor/types"
)

func main() {
    inf, _ := info.New("api-service")
    mon, _ := monitor.New(context.Background(), inf)

    _ = mon.SetConfig(context.Background(), types.Config{
        IntervalCheck: duration.ParseDuration(10 * time.Second),
        FallCountKO:   3,
    })

    // db is assumed to be an already-opened *sql.DB.
    mon.SetHealthCheck(func(ctx context.Context) error {
        return db.PingContext(ctx)
    })

    _ = mon.Start(context.Background())
    defer mon.Stop(context.Background())
}

Best Practices

✅ DO
  • Use Eventually in tests: monitoring is asynchronous, so assert with polling matchers (such as Gomega's Eventually) instead of fixed sleeps.
  • Respect Context: Ensure your diagnostic function honors the ctx provided to handle timeouts.
  • Register Metrics Early: Association with Prometheus should be done during initialization.
❌ DON'T
  • Don't use time.Sleep: The monitor orchestrator already handles intervals.
  • Don't block the Read Path: The package provides atomic counters; do not wrap them in heavy mutex-guarded logic.

API Reference

1. Primary Factory
Function | Parameters                | Returns          | Description
---------|---------------------------|------------------|--------------------------------------------
New      | ctx (Context), nfo (Info) | (Monitor, error) | Initializes a thread-safe monitor instance.
2. Monitor Interface

The Monitor interface aggregates multiple specialized behaviors.

Lifecycle Methods
Method    | Parameters | Returns | Description
----------|------------|---------|---------------------------------------------------------------------
Start     | ctx        | error   | Launches the background ticker. Waits for operational confirmation.
Stop      | ctx        | error   | Halts the background ticker and waits for current check completion.
Restart   | ctx        | error   | Performs a synchronized full Stop followed by a Start cycle.
IsRunning | -          | bool    | Thread-safe check of the background runner status.
Configuration & Core Logic
Method         | Parameters        | Returns          | Description
---------------|-------------------|------------------|------------------------------------------------------------------------
SetConfig      | ctx, cfg (Config) | error            | Validates and applies runtime parameters and logging options.
GetConfig      | -                 | Config           | Returns a deep-copy snapshot of the current effective configuration.
SetHealthCheck | fct (HealthCheck) | -                | Registers the function responsible for the component diagnostic.
GetHealthCheck | -                 | HealthCheck      | Retrieves the currently registered diagnostic function.
Clone          | ctx               | (Monitor, error) | Deep copy of the monitor instance, inheriting state and running status.
Status & State (MonitorStatus)
Method   | Returns       | Performance | Description
---------|---------------|-------------|-----------------------------------------------------------------------
Status   | status.Status | ~3ns        | Atomic retrieval of current health (OK/Warn/KO).
Latency  | time.Duration | ~2ns        | Atomic duration of the last executed health check.
Uptime   | time.Duration | ~2ns        | Total cumulative duration spent in the OK health status.
Downtime | time.Duration | ~2ns        | Total cumulative duration spent in Warn or KO statuses.
Message  | string        | -           | Returns the last error or status message captured during execution.
IsRise   | bool          | -           | Reports if the monitor is currently recovering from a degraded state.
IsFall   | bool          | -           | Reports if the monitor is currently degrading toward a failure state.
Metrics & Prometheus (MonitorMetrics)
Method                 | Parameters  | Description
-----------------------|-------------|-----------------------------------------------------------------------------
RegisterMetricsName    | ...string   | Defines the Prometheus metric identifiers for this monitor instance.
RegisterMetricsAddName | ...string   | Appends new identifiers to the existing list (handles de-duplication).
RegisterCollectMetrics | FuncCollect | Associates a provider function for metrics extraction during scrape cycles.
Metadata Management (MonitorInfo)
Method   | Returns        | Description
---------|----------------|---------------------------------------------------------------
InfoName | string         | Atomic retrieval of the monitor descriptive name.
InfoMap  | map[string]any | Dynamic retrieval of component metadata (version, env, etc.).
InfoUpd  | -              | Thread-safe update of the monitor metadata implementation.
3. Config Structure (types.Config)
Field         | Type     | Default     | Description
--------------|----------|-------------|--------------------------------------------------------------
Name          | string   | "not named" | Unique identifier for logging and metrics.
CheckTimeout  | Duration | 5s          | Maximum allowed execution time for a single HealthCheck.
IntervalCheck | Duration | 1s          | Normal polling frequency when the status is stable.
IntervalFall  | Duration | 1s          | Polling frequency adjustment during degradation (Fall phase).
IntervalRise  | Duration | 1s          | Polling frequency adjustment during recovery (Rise phase).
FallCountKO   | uint8    | 1           | Consecutive failures required to transition from Warn to KO.
FallCountWarn | uint8    | 1           | Consecutive failures required to transition from OK to Warn.
RiseCountKO   | uint8    | 1           | Consecutive successes required to transition from KO to Warn.
RiseCountWarn | uint8    | 1           | Consecutive successes required to transition from Warn to OK.
Logger        | Options  | -           | Integrated structured logging configuration.

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Code Quality
  • Follow Go best practices and idioms
  • Maintain or improve code coverage (target: >80%)
  • Pass all tests, including the race detector
  • Use gofmt, golangci-lint, and gosec
  2. AI Usage Policy
  • AI must NEVER be used to generate package code or core functionality
  • AI assistance is limited to:
    • Testing (writing and improving tests)
    • Debugging (troubleshooting and bug resolution)
    • Documentation (comments, README, TESTING.md)
  • All AI-assisted work must be reviewed and validated by humans
  3. Testing
  • Add tests for new features
  • Use Ginkgo v2 / Gomega as the test framework
  • Ensure zero race conditions
  • Maintain coverage above 80%
  4. Documentation
  • Update GoDoc comments for public APIs
  • Add examples for new features
  • Update README.md and TESTING.md if needed
  5. Pull Request Process
  • Fork the repository
  • Create a feature branch
  • Write clear commit messages
  • Ensure all tests pass
  • Update documentation
  • Submit PR with description of changes

Resources

Documentation
  • TESTING.md: Exhaustive test inventory, performance benchmarks, and CPU/Memory profiling data.
Monitoring Standards & Industry References
Summary

These resources provide the context for why this package exists. The Monitor Package is a Go implementation of the hysteresis and state-machine patterns used by orchestrators like Kubernetes, with the atomic-read performance required for SRE-grade telemetry.


AI Transparency

In compliance with EU AI Act Article 50.4: AI assistance was used for performance profiling, test inventory generation, and documentation synchronization under human supervision. Core monitoring logic is human-designed and validated.


License

MIT License - See LICENSE file for details.

Copyright (c) 2022-2025 Nicolas JUHEL

Documentation

Overview

Package monitor provides a high-performance, thread-safe health monitoring framework for Go applications. It is designed to track the operational status of internal and external components (databases, APIs, microservices) using a robust state machine that handles health transitions with built-in hysteresis to prevent status flapping.

Core Philosophy: Performance & Resilience

The monitor is architected for zero-contention status reporting and reliable periodic execution. It separates configuration management from the performance-critical status reporting path.

Key design principles:

  • Atomic Status Reporting: Reads (Status, Latency, Uptime) use lock-free atomic primitives.
  • Dampened Transitions: Configurable Fall/Rise thresholds prevent noise during transient failures.
  • Dynamic Polling: Intervals automatically adjust based on the current health state (Rise/Fall/Stable).
  • Middleware Pipeline: Extensible execution chain for logging, metrics, and tracing.

Internal Architecture & Data Flow

The monitor operates as a background orchestrator managed by an atomic ticker runner.

Internal Dataflow Diagram:

[ Ticker Loop ] <-------------------------------------------+
      |                                                     |
      v                                                     |
[ Interval Resolver ] --(IntervalCheck/Fall/Rise)-----------+
      |
      v
[ Middleware Chain ]
      |-- (mdlStatus) --+--> [ Start Timer ]
      |                 |
      |-- (User Fct) ---+--> [ Health Check Execution ]
      |                 |
      |-- (mdlStatus) --+--> [ Stop Timer & Capture Latency ]
      |                 |
      |                 +--> [ State Machine Transition Logic ]
      |                 |
      |                 +--> [ Atomic Update of Metrics Container ]
      v
[ Metrics Dispatch ] --(RegisterCollectMetrics)--> [ Prometheus / Loggers ]

State Machine & Hysteresis

The monitor implements a 3-state machine (OK, Warn, KO) with directional transition counters.

Transition State Diagram:

+--------+   (Fail >= fallCountWarn)   +--------+   (Fail >= fallCountKO)   +--------+
|   OK   | --------------------------> |  Warn  | ------------------------> |   KO   |
+--------+ <-------------------------- +--------+ <------------------------ +--------+
           (Succ >= riseCountWarn)                (Succ >= riseCountKO)

Transition Logic Table:

Current Status | Event   | Counter Logic               | Transition Action
---------------|---------|-----------------------------|---------------------------
OK             | Failure | cntFall++                   | If cntFall >= fallCountWarn: -> Warn
OK             | Success | cntFall=0, cntRise=0        | Stay OK (Uptime++)
Warn           | Failure | cntFall++                   | If cntFall >= fallCountKO:   -> KO
Warn           | Success | cntRise++                   | If cntRise >= riseCountWarn: -> OK
KO             | Success | cntRise++                   | If cntRise >= riseCountKO:   -> Warn
KO             | Failure | cntRise=0, cntFall=0        | Stay KO (Downtime++)
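The transition table above can be sketched as a small state machine. This is a hedged reconstruction from the documented rules, not the package's actual implementation; threshold names mirror the Config fields:

```go
package main

import "fmt"

type state int

const (
	ko state = iota
	warn
	ok
)

// machine applies the hysteresis rules from the transition table:
// consecutive failures walk OK -> Warn -> KO, consecutive successes
// walk KO -> Warn -> OK, and counters reset on each transition.
type machine struct {
	st               state
	cntRise, cntFall uint8

	fallCountWarn, fallCountKO uint8
	riseCountWarn, riseCountKO uint8
}

func (m *machine) observe(success bool) {
	switch {
	case success && m.st == ok: // stable healthy: reset counters
		m.cntRise, m.cntFall = 0, 0
	case !success && m.st == ko: // stable unhealthy: reset counters
		m.cntRise, m.cntFall = 0, 0
	case success: // recovering (Rise)
		m.cntRise++
		if m.st == ko && m.cntRise >= m.riseCountKO {
			m.st, m.cntRise = warn, 0
		} else if m.st == warn && m.cntRise >= m.riseCountWarn {
			m.st, m.cntRise = ok, 0
		}
	default: // degrading (Fall)
		m.cntFall++
		if m.st == ok && m.cntFall >= m.fallCountWarn {
			m.st, m.cntFall = warn, 0
		} else if m.st == warn && m.cntFall >= m.fallCountKO {
			m.st, m.cntFall = ko, 0
		}
	}
}

func main() {
	m := machine{st: ok, fallCountWarn: 2, fallCountKO: 3, riseCountWarn: 1, riseCountKO: 1}
	for i := 0; i < 5; i++ { // 2 failures reach Warn, 3 more reach KO
		m.observe(false)
	}
	fmt.Println(m.st == ko) // true
}
```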

High-Performance Read Path

Status retrieval methods (Status, Latency, Uptime, Downtime) are optimized for high-frequency polling (e.g., thousands of reads per second from a metrics exporter or a load-balancer probe).

Implementation Details:

  • No Mutexes: The "hot path" for reads uses atomic.LoadUint64/Int64 from the metrics container.
  • Zero Allocations: Status and metric reads perform no heap allocations.
  • Near-Zero Latency: Typical read latency is in the single-digit nanosecond range.
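A quick way to see the zero-allocation claim in action, using a hypothetical metrics container (the real `lastRun` structure is internal to the package):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// A lastRun-style container: every hot metric is one atomic word.
type metrics struct {
	latencyNs atomic.Int64
	uptimeNs  atomic.Int64
}

func main() {
	var m metrics
	m.latencyNs.Store(3140)
	// testing.AllocsPerRun measures heap allocations per call; an
	// atomic load touches no heap memory at all.
	allocs := testing.AllocsPerRun(1000, func() {
		_ = m.latencyNs.Load()
	})
	fmt.Println(allocs == 0) // true
}
```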

Middleware Extensibility

The health check execution uses a LIFO (Last-In-First-Out) middleware stack. Middlewares can intercept the execution to perform pre/post actions.

Execution Stack Example:

  1. [ mdlStatus ] (Core: Latency & State logic)
  2. [ CustomLogger ] (Optional: logs failures)
  3. [ OpenTelemetry ] (Optional: traces health check)
  4. [ User HealthCheck Function ] (The actual diagnostic)

Sub-Packages & Modules

  • info: Metadata management (name, version, environment data).
  • status: Status enumeration and parsing (OK, Warn, KO, Unknown).
  • types: Public interfaces and configuration structures.
  • pool: (Optional) Management for collections of monitors.

Usage Example: Database Health Monitoring

import (
	"context"
	"database/sql"
	"time"

	"github.com/nabbar/golib/duration"
	"github.com/nabbar/golib/monitor"
	"github.com/nabbar/golib/monitor/info"
	"github.com/nabbar/golib/monitor/types"
)

func setupMonitor(db *sql.DB) types.Monitor {
	// 1. Initialize Info with metadata
	inf, _ := info.New("postgres-db")

	// 2. Create the monitor instance
	mon, _ := monitor.New(context.Background(), inf)

	// 3. Configure intervals and thresholds
	cfg := types.Config{
		Name:          "main-database",
		CheckTimeout:  duration.ParseDuration(5 * time.Second),
		IntervalCheck: duration.ParseDuration(30 * time.Second), // Normal polling frequency
		IntervalFall:  duration.ParseDuration(2 * time.Second),  // Aggressive polling when failing
		FallCountWarn: 2,                                        // 2 consecutive failures to trigger 'Warn'
		FallCountKO:   3,                                        // 3 more consecutive failures to trigger 'KO'
	}
	_ = mon.SetConfig(context.Background(), cfg)

	// 4. Register the actual check logic
	mon.SetHealthCheck(func(ctx context.Context) error {
		return db.PingContext(ctx)
	})

	// 5. Start the background runner
	_ = mon.Start(context.Background())

	return mon
}

Thread Safety & Concurrency

Every component is safe for concurrent use. Configuration updates (SetConfig) and Metadata updates (InfoUpd) use atomic swaps or thread-safe containers to ensure that a configuration reload never causes a race condition or performance degradation for the periodic runner or status readers.

Prometheus & Metrics Integration

The package is designed to integrate seamlessly with Prometheus via the RegisterCollectMetrics method. At the end of each diagnostic run, the monitor dispatches its current state to the registered collector, updating Gauges for latency, status code, and transition timers.
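A hedged sketch of the kind of status-to-gauge mapping a collector might perform at dispatch time; the numeric codes below are illustrative, not the package's actual encoding:

```go
package main

import "fmt"

// statusCode maps health states to the numeric values a Prometheus
// Gauge would carry; the mapping is an assumption for illustration.
func statusCode(status string) float64 {
	switch status {
	case "OK":
		return 1
	case "Warn":
		return 0.5
	default: // KO or unknown
		return 0
	}
}

func main() {
	// In a real collector, these values would feed Gauges registered
	// through RegisterCollectMetrics at each scrape cycle.
	for _, s := range []string{"OK", "Warn", "KO"} {
		fmt.Printf("monitor_status{state=%q} %v\n", s, statusCode(s))
	}
}
```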

Package monitor provides a robust, thread-safe framework for periodic health monitoring of components. It supports state machine transitions (OK, Warn, KO), middleware execution chains, and integrated metrics collection for Prometheus.

Example

Example demonstrates a complete monitor setup and execution.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create info
inf, _ := moninf.New("example-service")

// Create monitor
mon, _ := libmon.New(context.Background(), inf)

// Configure monitor
cfg := montps.Config{
	Name:          "example-monitor",
	CheckTimeout:  libdur.ParseDuration(2 * time.Second),
	IntervalCheck: libdur.ParseDuration(1 * time.Second),
	RiseCountKO:   2,
	RiseCountWarn: 2,
	FallCountKO:   2,
	FallCountWarn: 2,
	Logger:        lo.Clone(), // lo is an assumed, preconfigured logger instance
}
_ = mon.SetConfig(context.Background(), cfg)

// Set health check function
mon.SetHealthCheck(func(ctx context.Context) error {
	// Simulate health check
	return nil
})

// Start monitoring
_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)

// Check status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Running: %v\n", mon.IsRunning())

// Stop monitoring
_ = mon.Stop(ctx)
fmt.Printf("Running after stop: %v\n", mon.IsRunning())
Output:
Status: KO
Running: true
Running after stop: false

Constants

const (
	// ErrorParamEmpty indicates an empty parameter was provided.
	ErrorParamEmpty liberr.CodeError = iota + liberr.MinPkgMonitor
	// ErrorMissingHealthCheck indicates no health check function was registered.
	ErrorMissingHealthCheck
	// ErrorValidatorError indicates configuration validation failed.
	ErrorValidatorError
	// ErrorLoggerError indicates logger initialization failed.
	ErrorLoggerError
	// ErrorTimeout indicates a timeout occurred during an operation.
	ErrorTimeout
	// ErrorInvalid indicates an invalid monitor instance.
	ErrorInvalid
)
const (

	// Structured logging constants for consistency across monitor log entries.
	LogFieldProcess = "process" // LogFieldProcess is the field key for identifying the monitor process in logs.
	LogValueProcess = "monitor" // LogValueProcess is the constant value for the process field.
	LogFieldName    = "name"    // LogFieldName is the field key for the monitor instance name.
)
const (
	// MaxPoolStart defines the maximum duration the Start method will wait for the internal ticker to become active.
	MaxPoolStart = 3 * time.Second

	// MaxTickPooler defines the interval at which the Start method checks if the internal ticker has successfully started.
	MaxTickPooler = 5 * time.Millisecond
)

Variables

This section is empty.

Functions

func New

func New(ctx context.Context, info montps.Info) (montps.Monitor, error)

New initializes and returns a new Monitor instance.

Parameters:

  • ctx: The base context used for the monitor's internal state management. If nil, context.Background() is used.
  • info: An implementation of montps.Info containing metadata (name, version, etc.) for the monitored component.

The returned monitor is initialized in a stopped state with default configuration values. It uses an atomic internal structure to ensure thread-safety across all operations.

Returns:

  • montps.Monitor: A thread-safe monitor instance.
  • error: Returns an error if the mandatory 'info' parameter is nil.

Example:

inf, _ := info.New("database-service")
mon, err := monitor.New(context.Background(), inf)
if err != nil {
    log.Fatalf("failed to create monitor: %v", err)
}
_ = mon.Start(context.Background())
Example

ExampleNew demonstrates creating a new monitor instance.

package main

import (
	"context"
	"fmt"

	libmon "github.com/nabbar/golib/monitor"

	moninf "github.com/nabbar/golib/monitor/info"
)

func main() {
	// Create info metadata
	inf, err := moninf.New("database-monitor")
	if err != nil {
		panic(err)
	}

	// Create monitor
	mon, err := libmon.New(context.Background(), inf)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Monitor created: %s\n", mon.Name())
}
Output:
Monitor created: not named

Types

type Encode

type Encode interface {
	// String returns a formatted, human-readable string representation of the monitor's state.
	// Format: "<STATUS>: <Name> (<Info>) | <Latency> / <Uptime> / <Downtime> | <Message>"
	String() string

	// Bytes returns the byte slice representation of the string generated by String().
	Bytes() []byte
}

Encode is an interface that defines the contract for converting a monitor's current state into various formats.

Implementation Note: It is primarily used to generate human-readable strings or byte slices for logging, reporting, or API responses while ensuring consistency across formats.
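A sketch of the documented layout; the values and the `encode` helper are illustrative stand-ins, not the package's actual implementation:

```go
package main

import (
	"fmt"
	"time"
)

// encode renders the documented layout:
// "<STATUS>: <Name> (<Info>) | <Latency> / <Uptime> / <Downtime> | <Message>"
func encode(status, name, info string, lat, up, down time.Duration, msg string) string {
	return fmt.Sprintf("%s: %s (%s) | %s / %s / %s | %s",
		status, name, info, lat, up, down, msg)
}

func main() {
	fmt.Println(encode("OK", "api", "v1.2.3", 3*time.Millisecond, time.Hour, 0, "healthy"))
	// OK: api (v1.2.3) | 3ms / 1h0m0s / 0s | healthy
}
```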

type Monitor

type Monitor interface {
	montps.Monitor
}

Monitor is the primary interface for managing component health checks. It embeds the base Monitor interface from the types package, which defines methods for configuration, lifecycle management (Start/Stop), and status retrieval.

Example (Metrics)

ExampleMonitor_metrics demonstrates collecting metrics.

ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "my-service",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	time.Sleep(10 * time.Millisecond)
	return nil
})

_ = mon.Start(ctx)
time.Sleep(500 * time.Millisecond)
_ = mon.Stop(ctx)

// Collect metrics
latency := mon.Latency()
uptime := mon.Uptime()
downtime := mon.Downtime()

fmt.Printf("Latency recorded: %v\n", latency > 0)
fmt.Printf("Uptime recorded: %v\n", uptime >= 0)
fmt.Printf("Downtime recorded: %v\n", downtime >= 0)
Output:
Latency recorded: true
Uptime recorded: true
Downtime recorded: true
Example (Transitions)

ExampleMonitor_transitions demonstrates status transitions.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

inf, _ := moninf.New("service")
mon, _ := libmon.New(context.Background(), inf)

cfg := montps.Config{
	Name:          "transition-demo",
	CheckTimeout:  libdur.ParseDuration(5 * time.Second),
	IntervalCheck: libdur.ParseDuration(200 * time.Millisecond),
	RiseCountKO:   1,
	RiseCountWarn: 1,
	Logger:        lo.Clone(),
}
_ = mon.SetConfig(context.Background(), cfg)

mon.SetHealthCheck(func(ctx context.Context) error {
	return nil // Always healthy
})

_ = mon.Start(ctx)
time.Sleep(100 * time.Millisecond)

fmt.Printf("Initial: KO=%v\n", mon.Status().String() == "KO")

time.Sleep(300 * time.Millisecond)
fmt.Printf("Rising: %v\n", mon.IsRise() || mon.Status().String() != "KO")

_ = mon.Stop(ctx)
Output:
Initial: KO=true
Rising: true

Directories

Path Synopsis
Package info provides a robust and thread-safe metadata management system for monitored components.
Package pool provides a thread-safe pool implementation for managing multiple health monitors.
Package status provides a robust enumeration type for representing monitor health status.
Package types provides core type definitions and interfaces for the monitor system.
