health

package
v0.9.2
Published: Feb 6, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Package health provides health monitoring for vMCP backend MCP servers.

This package implements the HealthChecker interface and provides periodic health monitoring with configurable intervals and failure thresholds.

Functions

func IsHealthCheck

func IsHealthCheck(ctx context.Context) bool

IsHealthCheck returns true if the context is marked as a health check. Authentication strategies use this to bypass authentication for health checks, since health checks verify backend availability and should not require user credentials. Returns false for nil contexts.

func NewHealthChecker

func NewHealthChecker(
	client vmcp.BackendClient,
	timeout time.Duration,
	degradedThreshold time.Duration,
) vmcp.HealthChecker

NewHealthChecker creates a new health checker that uses BackendClient.ListCapabilities as the health check mechanism. This validates the full MCP communication stack: network connectivity, MCP protocol compliance, authentication, and responsiveness.

Parameters:

  • client: BackendClient for communicating with backend MCP servers
  • timeout: Maximum duration for health check operations (0 = no timeout)
  • degradedThreshold: Response time threshold for marking backend as degraded (0 = disabled)

Returns a new HealthChecker implementation.
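
The interaction between timeout and degradedThreshold can be sketched without the real package. The block below is a minimal illustration, not the package's implementation: classify, runProbe, errProbeFailed, and the status strings are hypothetical names, and treating a single failed probe as "unhealthy" is an assumption made for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical status labels; the real values are vmcp.BackendHealthStatus constants.
const (
	statusHealthy   = "healthy"
	statusDegraded  = "degraded"
	statusUnhealthy = "unhealthy"
)

// errProbeFailed is a placeholder error for a failed probe.
var errProbeFailed = errors.New("probe failed")

// classify maps one probe outcome to a status.
// A degradedThreshold of 0 disables the slow-response check.
func classify(err error, elapsed, degradedThreshold time.Duration) string {
	switch {
	case err != nil:
		return statusUnhealthy
	case degradedThreshold > 0 && elapsed > degradedThreshold:
		return statusDegraded
	default:
		return statusHealthy
	}
}

// runProbe applies the timeout (0 = no timeout), times the probe, and classifies it.
func runProbe(ctx context.Context, probe func(context.Context) error, timeout, degradedThreshold time.Duration) string {
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	start := time.Now()
	err := probe(ctx)
	return classify(err, time.Since(start), degradedThreshold)
}

func main() {
	ok := func(context.Context) error { return nil }
	failing := func(context.Context) error { return errProbeFailed }
	fmt.Println(runProbe(context.Background(), ok, time.Second, 5*time.Second))
	fmt.Println(runProbe(context.Background(), failing, time.Second, 5*time.Second))
}
```

Keeping the classification separate from the timing makes the threshold rules easy to test with fixed durations.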

func WithHealthCheckMarker

func WithHealthCheckMarker(ctx context.Context) context.Context

WithHealthCheckMarker marks a context as a health check request. Authentication layers can use IsHealthCheck to identify and skip authentication for health check requests.
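
A context marker of this kind is commonly built on context.WithValue with an unexported key type. The sketch below assumes that approach; it is not taken from the package source. markHealthCheck, isMarked, newHealthCheckContext, and authenticate are hypothetical stand-ins for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// healthCheckKey is an unexported key type, so other packages cannot collide with it.
type healthCheckKey struct{}

// markHealthCheck is a hypothetical stand-in for WithHealthCheckMarker.
func markHealthCheck(ctx context.Context) context.Context {
	return context.WithValue(ctx, healthCheckKey{}, true)
}

// isMarked is a hypothetical stand-in for IsHealthCheck.
// It returns false for nil contexts, as the documentation describes.
func isMarked(ctx context.Context) bool {
	if ctx == nil {
		return false
	}
	marked, ok := ctx.Value(healthCheckKey{}).(bool)
	return ok && marked
}

// newHealthCheckContext returns a marked background context for probes.
func newHealthCheckContext() context.Context {
	return markHealthCheck(context.Background())
}

// authenticate sketches an authentication layer that skips the credential
// check when the context carries the health check marker.
func authenticate(ctx context.Context, token string) error {
	if isMarked(ctx) {
		return nil
	}
	if token == "" {
		return errors.New("missing credentials")
	}
	return nil
}

func main() {
	fmt.Println(authenticate(newHealthCheckContext(), "")) // marked: no credentials needed
	fmt.Println(authenticate(context.Background(), ""))    // unmarked: rejected
}
```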

Types

type CircuitBreaker added in v0.9.2

type CircuitBreaker interface {
	// RecordSuccess records a successful operation
	RecordSuccess()
	// RecordFailure records a failed operation
	RecordFailure()
	// CanAttempt checks if an operation should be allowed based on circuit state
	CanAttempt() bool
	// GetState returns the current state of the circuit breaker
	GetState() CircuitState
	// GetLastStateChange returns the time when the state last changed
	GetLastStateChange() time.Time
	// GetFailureCount returns the current failure count
	GetFailureCount() int
	// GetSnapshot returns an immutable snapshot of the circuit breaker state
	GetSnapshot() circuitBreakerSnapshot
}

CircuitBreaker defines the interface for circuit breaker implementations.

type CircuitBreakerConfig added in v0.9.2

type CircuitBreakerConfig struct {
	// Enabled controls whether circuit breaker is active.
	// +kubebuilder:default=false
	Enabled bool

	// FailureThreshold is the number of failures before opening the circuit.
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=5
	// Must be >= 1. Recommended: 5 failures.
	FailureThreshold int

	// Timeout is the duration to wait in open state before attempting recovery.
	// +kubebuilder:validation:Type=string
	// +kubebuilder:validation:Pattern="^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$"
	// +kubebuilder:default="60s"
	// Recommended: 60s.
	Timeout time.Duration
}

CircuitBreakerConfig contains circuit breaker configuration.

type CircuitState added in v0.9.2

type CircuitState string

CircuitState represents the state of a circuit breaker.

const (
	// CircuitClosed indicates normal operation - requests pass through
	CircuitClosed CircuitState = "closed"
	// CircuitOpen indicates failing state - requests fail immediately
	CircuitOpen CircuitState = "open"
	// CircuitHalfOpen indicates recovery testing - limited requests allowed
	CircuitHalfOpen CircuitState = "half-open"
)
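
The three states follow the usual circuit breaker pattern. The sketch below shows one plausible set of transitions, assuming the circuit opens after FailureThreshold failures, moves to half-open once Timeout has elapsed, and closes again on the next success. The breaker and fakeClock types are hypothetical, the code is not goroutine-safe, and the package's actual transition rules may differ.

```go
package main

import (
	"fmt"
	"time"
)

type state string

const (
	closed   state = "closed"
	open     state = "open"
	halfOpen state = "half-open"
)

// fakeClock is a manually advanced clock so the example is deterministic.
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time          { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

// breaker is a hypothetical single-goroutine sketch; a real one needs a mutex.
type breaker struct {
	threshold int
	timeout   time.Duration
	now       func() time.Time
	state     state
	failures  int
	changed   time.Time
}

func newBreaker(threshold int, timeout time.Duration, now func() time.Time) *breaker {
	return &breaker{threshold: threshold, timeout: timeout, now: now, state: closed, changed: now()}
}

// transition records the new state and the time of the change.
func (b *breaker) transition(s state) {
	b.state = s
	b.changed = b.now()
}

// canAttempt moves an open circuit to half-open once the timeout has elapsed,
// then reports whether a request may proceed.
func (b *breaker) canAttempt() bool {
	if b.state == open && b.now().Sub(b.changed) >= b.timeout {
		b.transition(halfOpen)
	}
	return b.state != open
}

// recordSuccess resets the failure count and closes the circuit.
func (b *breaker) recordSuccess() {
	b.failures = 0
	if b.state != closed {
		b.transition(closed)
	}
}

// recordFailure opens the circuit at the threshold, or immediately when
// the failure happens during a half-open recovery test.
func (b *breaker) recordFailure() {
	b.failures++
	if b.state == halfOpen || (b.state == closed && b.failures >= b.threshold) {
		b.transition(open)
	}
}

func main() {
	clock := &fakeClock{}
	b := newBreaker(2, time.Minute, clock.now)
	b.recordFailure()
	b.recordFailure()
	fmt.Println(b.state) // open
	clock.advance(time.Minute)
	allowed := b.canAttempt()
	fmt.Println(allowed, b.state) // true half-open
	b.recordSuccess()
	fmt.Println(b.state) // closed
}
```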

type Monitor

type Monitor struct {
	// contains filtered or unexported fields
}

Monitor performs periodic health checks on backend MCP servers. It runs background goroutines for each backend, tracking their health status and consecutive failure counts. The monitor supports graceful shutdown and provides thread-safe access to backend health information.
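
The consecutive failure tracking can be illustrated in isolation. This is a minimal sketch under stated assumptions: failureTracker, errProbeFailed, and the status strings are hypothetical, and it assumes a backend keeps its previous status until the failure count reaches the threshold.

```go
package main

import (
	"errors"
	"fmt"
)

// errProbeFailed is a placeholder error for a failed health check.
var errProbeFailed = errors.New("probe failed")

// failureTracker is a hypothetical per-backend record of consecutive failures.
type failureTracker struct {
	unhealthyThreshold  int
	consecutiveFailures int
	status              string
}

func newFailureTracker(threshold int) *failureTracker {
	return &failureTracker{unhealthyThreshold: threshold, status: "unknown"}
}

// record applies one health check result and returns the resulting status.
// A success resets the counter; the status only flips to "unhealthy" once
// the number of consecutive failures reaches the threshold.
func (f *failureTracker) record(err error) string {
	if err == nil {
		f.consecutiveFailures = 0
		f.status = "healthy"
		return f.status
	}
	f.consecutiveFailures++
	if f.consecutiveFailures >= f.unhealthyThreshold {
		f.status = "unhealthy"
	}
	return f.status
}

func main() {
	tracker := newFailureTracker(3)
	results := []error{nil, errProbeFailed, errProbeFailed, errProbeFailed, nil}
	for _, err := range results {
		status := tracker.record(err)
		fmt.Println(status, tracker.consecutiveFailures)
	}
}
```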

func NewMonitor

func NewMonitor(
	client vmcp.BackendClient,
	backends []vmcp.Backend,
	config MonitorConfig,
) (*Monitor, error)

NewMonitor creates a new health monitor for the given backends.

Parameters:

  • client: BackendClient for communicating with backend MCP servers
  • backends: List of backends to monitor
  • config: Configuration for health monitoring

Returns (monitor, error). Error is returned if configuration is invalid.

func (*Monitor) BuildStatus added in v0.8.3

func (m *Monitor) BuildStatus() *vmcp.Status

BuildStatus builds a vmcp.Status from the current health monitor state. This converts backend health information into the format needed for status reporting to the Kubernetes API or CLI output.

Phase determination:

  • Ready: All backends healthy, or no backends configured (cold start)
  • Pending: Backends configured but no health check data yet (waiting for first check)
  • Degraded: Some backends healthy, some degraded/unhealthy
  • Failed: No healthy backends (and at least one backend exists)

Returns a Status instance with current health information and discovered backends.

Takes a single snapshot of backend states to ensure internal consistency under concurrent updates.
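
The phase rules can be written as a small pure function. The sketch below is one reading of those rules, not the package's code: phase is a hypothetical helper, and the order in which the cases are evaluated (for example, how a mix of healthy and not-yet-checked backends is classified) is an assumption.

```go
package main

import "fmt"

// phase derives a status phase from backend counts.
// total is the number of configured backends, healthy the number currently
// healthy, and unknown the number with no health check data yet.
func phase(total, healthy, unknown int) string {
	switch {
	case total == 0:
		return "Ready" // cold start: nothing configured
	case unknown == total:
		return "Pending" // waiting for the first checks
	case healthy == 0:
		return "Failed"
	case healthy == total:
		return "Ready"
	default:
		return "Degraded"
	}
}

func main() {
	fmt.Println(phase(0, 0, 0)) // Ready
	fmt.Println(phase(3, 0, 3)) // Pending
	fmt.Println(phase(3, 3, 0)) // Ready
	fmt.Println(phase(3, 1, 0)) // Degraded
	fmt.Println(phase(3, 0, 0)) // Failed
}
```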

func (*Monitor) GetAllBackendStates

func (m *Monitor) GetAllBackendStates() map[string]*State

GetAllBackendStates returns health states for all monitored backends. Returns a map of backend ID to State.

func (*Monitor) GetBackendState

func (m *Monitor) GetBackendState(backendID string) (*State, error)

GetBackendState returns the full health state for a backend. Returns (state, error). Error is returned if the backend is not being monitored.

func (*Monitor) GetBackendStatus

func (m *Monitor) GetBackendStatus(backendID string) (vmcp.BackendHealthStatus, error)

GetBackendStatus returns the current health status for a backend. Returns (status, error). Error is returned if the backend is not being monitored.

func (*Monitor) GetHealthSummary

func (m *Monitor) GetHealthSummary() Summary

GetHealthSummary returns a summary of backend health for logging/monitoring. Returns counts of healthy, degraded, unhealthy, unknown, unauthenticated, and total backends.
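
Aggregating per-backend statuses into counts might look like the following sketch. The summary type mirrors the fields of Summary, but summarize and the status strings are hypothetical; the real status values are vmcp.BackendHealthStatus constants that are not listed on this page.

```go
package main

import "fmt"

// summary mirrors the fields of the documented Summary type.
type summary struct {
	Total           int
	Healthy         int
	Degraded        int
	Unhealthy       int
	Unknown         int
	Unauthenticated int
}

// summarize counts backends by status. Any status it does not recognize
// is counted as Unknown (an assumption of this sketch).
func summarize(statuses map[string]string) summary {
	s := summary{Total: len(statuses)}
	for _, status := range statuses {
		switch status {
		case "healthy":
			s.Healthy++
		case "degraded":
			s.Degraded++
		case "unhealthy":
			s.Unhealthy++
		case "unauthenticated":
			s.Unauthenticated++
		default:
			s.Unknown++
		}
	}
	return s
}

func main() {
	s := summarize(map[string]string{
		"backend-a": "healthy",
		"backend-b": "degraded",
		"backend-c": "unhealthy",
		"backend-d": "",
	})
	fmt.Printf("%+v\n", s)
}
```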

func (*Monitor) IsBackendHealthy

func (m *Monitor) IsBackendHealthy(backendID string) bool

IsBackendHealthy returns true if the backend is currently healthy. Returns false if the backend is not being monitored or is unhealthy.

func (*Monitor) Start

func (m *Monitor) Start(ctx context.Context) error

Start begins health monitoring for all backends. This spawns a background goroutine for each backend that performs periodic health checks. Returns an error if the monitor is already started, has been stopped, or if the parent context is invalid.

The monitor respects the parent context for cancellation. When the parent context is cancelled, all health check goroutines will stop gracefully.

Note: A monitor cannot be restarted after it has been stopped. Create a new monitor instead.

func (*Monitor) Stop

func (m *Monitor) Stop() error

Stop gracefully stops health monitoring. This cancels all health check goroutines and waits for them to complete. Returns an error if the monitor was not started.

After stopping, the monitor cannot be restarted. Create a new monitor if needed.
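
The start-once, stop-once lifecycle can be sketched with a cancellable context and a sync.WaitGroup. The miniMonitor type below is hypothetical and much simpler than Monitor. It assumes one goroutine per backend, an immediate first check, and an error from Stop whenever the monitor is not running.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// miniMonitor is a hypothetical, stripped-down monitor.
type miniMonitor struct {
	mu       sync.Mutex
	started  bool
	stopped  bool
	cancel   context.CancelFunc
	wg       sync.WaitGroup
	interval time.Duration
	backends []string
	check    func(backendID string)
}

// Start spawns one goroutine per backend. It fails if the monitor is already
// started, has been stopped, or receives a nil parent context.
func (m *miniMonitor) Start(ctx context.Context) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	switch {
	case ctx == nil:
		return errors.New("nil parent context")
	case m.stopped:
		return errors.New("monitor was stopped; create a new one")
	case m.started:
		return errors.New("monitor already started")
	}
	ctx, m.cancel = context.WithCancel(ctx)
	m.started = true
	for _, id := range m.backends {
		m.wg.Add(1)
		go m.loop(ctx, id)
	}
	return nil
}

// loop checks one backend immediately, then on every tick until cancelled.
func (m *miniMonitor) loop(ctx context.Context, id string) {
	defer m.wg.Done()
	ticker := time.NewTicker(m.interval)
	defer ticker.Stop()
	for {
		m.check(id)
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// Stop cancels all goroutines and waits for them to exit.
func (m *miniMonitor) Stop() error {
	m.mu.Lock()
	if !m.started || m.stopped {
		m.mu.Unlock()
		return errors.New("monitor is not running")
	}
	m.stopped = true
	m.mu.Unlock()
	m.cancel()
	m.wg.Wait()
	return nil
}

// demoLifecycle runs start, double start, stop, and restart in sequence.
func demoLifecycle() (first, second, stop, restart error) {
	m := &miniMonitor{
		interval: 10 * time.Millisecond,
		backends: []string{"backend-a", "backend-b"},
		check:    func(string) {},
	}
	first = m.Start(context.Background())
	second = m.Start(context.Background())
	stop = m.Stop()
	restart = m.Start(context.Background())
	return first, second, stop, restart
}

func main() {
	first, second, stop, restart := demoLifecycle()
	fmt.Println(first, second, stop, restart)
}
```

Because the worker context derives from the parent, cancelling the parent also ends the loops, which matches the cancellation behaviour described for Start.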

func (*Monitor) UpdateBackends added in v0.8.3

func (m *Monitor) UpdateBackends(newBackends []vmcp.Backend)

UpdateBackends updates the list of backends being monitored. Starts monitoring new backends and stops monitoring removed backends. This method is safe to call while the monitor is running.
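
Reconciling the monitored set against a new backend list comes down to a set difference. The helper below is a hypothetical sketch that works on backend IDs only; how the real method starts and stops the per-backend goroutines is not shown on this page.

```go
package main

import (
	"fmt"
	"sort"
)

// diffBackends compares the currently monitored IDs with the desired IDs and
// returns which backends to start monitoring and which to stop.
// Results are sorted so the output is deterministic.
func diffBackends(current map[string]bool, next []string) (added, removed []string) {
	wanted := make(map[string]bool, len(next))
	for _, id := range next {
		if wanted[id] {
			continue // ignore duplicate IDs in the new list
		}
		wanted[id] = true
		if !current[id] {
			added = append(added, id)
		}
	}
	for id := range current {
		if !wanted[id] {
			removed = append(removed, id)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	current := map[string]bool{"backend-a": true, "backend-b": true}
	added, removed := diffBackends(current, []string{"backend-b", "backend-c"})
	fmt.Println(added, removed) // [backend-c] [backend-a]
}
```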

func (*Monitor) WaitForInitialHealthChecks added in v0.8.3

func (m *Monitor) WaitForInitialHealthChecks()

WaitForInitialHealthChecks blocks until all backends have completed their initial health check. This is useful for ensuring that health status is accurate before relying on it (e.g., before reporting initial status to an external system).

If the monitor was not started, this returns immediately (no initial checks to wait for). This method is safe to call multiple times and from multiple goroutines.

type MonitorConfig

type MonitorConfig struct {
	// CheckInterval is how often to perform health checks.
	// Must be > 0. Recommended: 30s.
	CheckInterval time.Duration

	// UnhealthyThreshold is the number of consecutive failures before marking unhealthy.
	// Must be >= 1. Recommended: 3 failures.
	UnhealthyThreshold int

	// Timeout is the maximum duration for a single health check operation.
	// Zero means no timeout (not recommended).
	Timeout time.Duration

	// DegradedThreshold is the response time threshold for marking a backend as degraded.
	// If a health check succeeds but takes longer than this duration, the backend is marked degraded.
	// Zero means disabled (backends will never be marked degraded based on response time alone).
	// Recommended: 5s.
	DegradedThreshold time.Duration

	// CircuitBreaker contains circuit breaker configuration.
	// nil means circuit breaker is disabled.
	CircuitBreaker *CircuitBreakerConfig
}

MonitorConfig contains configuration for the health monitor.
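
The documented constraints suggest validation along the following lines. The types here are local mirrors of MonitorConfig and CircuitBreakerConfig so the example is self-contained. validate and recommended are hypothetical helpers, the 10s Timeout is an arbitrary illustrative value, and these values are not necessarily what DefaultConfig returns.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// circuitBreakerConfig mirrors the documented CircuitBreakerConfig fields.
type circuitBreakerConfig struct {
	Enabled          bool
	FailureThreshold int
	Timeout          time.Duration
}

// monitorConfig mirrors the documented MonitorConfig fields.
type monitorConfig struct {
	CheckInterval      time.Duration
	UnhealthyThreshold int
	Timeout            time.Duration
	DegradedThreshold  time.Duration
	CircuitBreaker     *circuitBreakerConfig
}

// validate enforces only the constraints stated in the field comments.
func validate(cfg monitorConfig) error {
	if cfg.CheckInterval <= 0 {
		return errors.New("CheckInterval must be > 0")
	}
	if cfg.UnhealthyThreshold < 1 {
		return errors.New("UnhealthyThreshold must be >= 1")
	}
	if cb := cfg.CircuitBreaker; cb != nil && cb.Enabled && cb.FailureThreshold < 1 {
		return errors.New("CircuitBreaker.FailureThreshold must be >= 1")
	}
	return nil
}

// recommended builds a config from the values the field comments recommend.
func recommended() monitorConfig {
	return monitorConfig{
		CheckInterval:      30 * time.Second,
		UnhealthyThreshold: 3,
		Timeout:            10 * time.Second, // arbitrary; no value is recommended in the docs
		DegradedThreshold:  5 * time.Second,
		CircuitBreaker: &circuitBreakerConfig{
			Enabled:          true,
			FailureThreshold: 5,
			Timeout:          60 * time.Second,
		},
	}
}

func main() {
	fmt.Println(validate(recommended()))
	fmt.Println(validate(monitorConfig{UnhealthyThreshold: 3}))
}
```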

func DefaultConfig

func DefaultConfig() MonitorConfig

DefaultConfig returns sensible default configuration values.

type State

type State struct {
	// Status is the current health status.
	Status vmcp.BackendHealthStatus

	// ConsecutiveFailures is the number of consecutive failed health checks.
	ConsecutiveFailures int

	// LastCheckTime is when the last health check was performed.
	LastCheckTime time.Time

	// LastErrorCategory is a sanitized error category for API responses.
	// Values: "authentication_failed", "timeout", "connection_failed", "backend_unavailable", etc.
	// This field is safe to serialize and expose in API responses.
	LastErrorCategory string

	// LastError is the raw error encountered (if any).
	// DEPRECATED: This field may contain sensitive information (paths, URLs, credentials)
	// and should not be serialized to API responses. Use LastErrorCategory instead.
	// The json:"-" tag prevents this field from being included in JSON marshaling.
	LastError error `json:"-"`

	// LastTransitionTime is when the status last changed.
	LastTransitionTime time.Time

	// CircuitState is the current circuit breaker state.
	// When circuit breaker is disabled, this will be CircuitClosed (via alwaysClosedCircuit).
	CircuitState CircuitState

	// CircuitLastChanged is when the circuit breaker state last changed.
	// When circuit breaker is disabled, this will be zero time (via alwaysClosedCircuit).
	CircuitLastChanged time.Time
}

State is an immutable snapshot of a backend's health state. This is returned by GetBackendState and GetAllBackendStates to provide thread-safe access to health information without holding locks.

type Summary

type Summary struct {
	Total           int
	Healthy         int
	Degraded        int
	Unhealthy       int
	Unknown         int
	Unauthenticated int
}

Summary provides aggregate health statistics for all backends.

func (Summary) String

func (s Summary) String() string

String returns a human-readable summary.
