health

package
v0.0.75 (latest)
Warning

This package is not in the latest version of its module.

Published: Apr 27, 2026 | License: Apache-2.0 | Imports: 14 | Imported by: 0

Documentation

Constants

const (
	ComponentCollectorManager    = "collector_manager"
	ComponentBufferQueue         = "buffer_queue"
	ComponentDakrTransport       = "dakr_transport"
	ComponentMpaServer           = "mpa_server"
	ComponentPrometheus          = "prometheus"
	ComponentMonitor             = "monitor"
	ComponentEBPFTracer          = "ebpf_tracer"
	ComponentPodCache            = "pod_cache"
	ComponentKarpenterDeployment = "karpenter_deployment"
)

Component name constants used for HealthManager registration.

Variables

This section is empty.

Functions

func BuildHeartbeatRequest added in v0.0.64

func BuildHeartbeatRequest(hm *HealthManager, clusterID string, operatorType gen.OperatorType, version, commit string, startTime time.Time) *gen.ReportHealthRequest

BuildHeartbeatRequest constructs a ReportHealthRequest from the current HealthManager state.

func BuildHeartbeatRequestFromReport added in v0.0.64

func BuildHeartbeatRequestFromReport(report map[string]ComponentStatus, clusterID string, operatorType gen.OperatorType, version, commit string, startTime time.Time) *gen.ReportHealthRequest

BuildHeartbeatRequestFromReport constructs a ReportHealthRequest from an already-built report map. Use this when you need to log and send the same snapshot to avoid a double lock acquisition on HealthManager.
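
The snapshot-once pattern described above can be sketched as follows. This is a minimal illustration, not the package's code: registry, heartbeat, and heartbeatFromReport are hypothetical local stand-ins for HealthManager, gen.ReportHealthRequest, and BuildHeartbeatRequestFromReport, and statuses are reduced to plain strings.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for HealthManager.
type registry struct {
	mu        sync.Mutex
	status    map[string]string
	snapshots int // number of lock acquisitions spent building reports
}

// buildReport copies the current statuses under one lock acquisition.
func (r *registry) buildReport() map[string]string {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.snapshots++
	out := make(map[string]string, len(r.status))
	for name, s := range r.status {
		out[name] = s
	}
	return out
}

// heartbeat is a hypothetical stand-in for the generated request type.
type heartbeat struct {
	ClusterID  string
	Components map[string]string
}

// heartbeatFromReport builds the payload from an existing snapshot,
// so it never touches the registry lock.
func heartbeatFromReport(report map[string]string, clusterID string) heartbeat {
	return heartbeat{ClusterID: clusterID, Components: report}
}

func main() {
	r := &registry{status: map[string]string{"buffer_queue": "healthy"}}

	report := r.buildReport() // the only lock acquisition
	fmt.Println("logging:", report)

	hb := heartbeatFromReport(report, "cluster-a")
	fmt.Println("sending:", hb.Components, "snapshots taken:", r.snapshots)
}
```

Because the log line and the request are both derived from the same map, they cannot disagree even if a component changes status between the two steps.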

Types

type ComponentResponse added in v0.0.63

type ComponentResponse struct {
	Status   string            `json:"status"`
	Message  string            `json:"message,omitempty"`
	Metadata map[string]string `json:"metadata,omitempty"`
}

ComponentResponse represents the JSON structure for /components/{component} responses.

type ComponentStatus

type ComponentStatus struct {
	Status   HealthStatus
	Message  string
	Metadata map[string]string
}

ComponentStatus holds the health status, message, and metadata for a component.

type HealthManager

type HealthManager struct {
	// contains filtered or unexported fields
}

func NewHealthManager

func NewHealthManager() *HealthManager

NewHealthManager creates a new HealthManager.

func (*HealthManager) BuildReport

func (hm *HealthManager) BuildReport() map[string]ComponentStatus

BuildReport returns a snapshot of all component statuses.

func (*HealthManager) CheckLiveness added in v0.0.63

func (hm *HealthManager) CheckLiveness() (map[string]ComponentStatus, error)

CheckLiveness returns the report and liveness error atomically under a single lock acquisition, avoiding TOCTOU between BuildReport and LivenessCheck.
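
To illustrate why a combined call avoids the time-of-check/time-of-use gap, here is a minimal sketch under stated assumptions. The manager and status types are hypothetical local stand-ins, and the rule "any Unhealthy component fails liveness" is taken from the LivenessCheck description; the package's internals are not shown in these docs.

```go
package main

import (
	"fmt"
	"sync"
)

type status int

const (
	unspecified status = iota
	healthy
	degraded
	unhealthy
)

// manager is a hypothetical stand-in for HealthManager.
type manager struct {
	mu         sync.RWMutex
	components map[string]status
}

func (m *manager) update(name string, s status) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.components[name] = s
}

// checkLiveness copies the statuses and evaluates them inside one
// critical section, so the report and the error describe the same state.
func (m *manager) checkLiveness() (map[string]status, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()

	report := make(map[string]status, len(m.components))
	var err error
	for name, s := range m.components {
		report[name] = s
		if s == unhealthy && err == nil {
			err = fmt.Errorf("component %q is unhealthy", name)
		}
	}
	return report, err
}

func main() {
	m := &manager{components: map[string]status{}}
	m.update("ebpf_tracer", unhealthy)

	report, err := m.checkLiveness()
	fmt.Println(len(report), err)
}
```

Calling a separate report method and a separate check method would take the lock twice, and an update landing between the two calls could yield a report that contradicts the returned error.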

func (*HealthManager) CheckReadiness added in v0.0.63

func (hm *HealthManager) CheckReadiness() (map[string]ComponentStatus, error)

CheckReadiness returns the report and readiness error atomically.

func (*HealthManager) ClearLivenessSuppression added in v0.0.63

func (hm *HealthManager) ClearLivenessSuppression()

ClearLivenessSuppression removes any active grace period so LivenessCheck resumes normal evaluation. Call this after collectors are back up.

func (*HealthManager) ClearReadinessSuppression added in v0.0.65

func (hm *HealthManager) ClearReadinessSuppression()

ClearReadinessSuppression removes any active readiness grace period so ReadinessCheck resumes normal evaluation.

func (*HealthManager) Deregister

func (hm *HealthManager) Deregister(name string)

Deregister removes a component from the health registry.

func (*HealthManager) GetStatus

func (hm *HealthManager) GetStatus(name string) (ComponentStatus, bool)

GetStatus retrieves the current status for a component. The boolean result presumably reports whether the component is registered.

func (*HealthManager) LivenessCheck added in v0.0.63

func (hm *HealthManager) LivenessCheck() error

LivenessCheck checks if all components are at least Degraded (not Unhealthy). During an active grace period (set via SuppressLiveness) it always returns nil so that planned restarts do not trigger pod kills.

func (*HealthManager) ReadinessCheck added in v0.0.63

func (hm *HealthManager) ReadinessCheck() error

ReadinessCheck checks if all required components are Healthy or Degraded.

func (*HealthManager) Register

func (hm *HealthManager) Register(name string)

Register adds a component to the health registry.

func (*HealthManager) SetStandby added in v0.0.74

func (hm *HealthManager) SetStandby(standby bool)

SetStandby marks the pod as a standby (non-leader) replica. While in standby, ReadinessCheck passes unconditionally: the pod is healthy and ready to take over leadership, but it is not running collectors yet. Call SetStandby(false) once leader election is won so that normal readiness checks resume.
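
The standby behaviour can be sketched as a flag that short-circuits the readiness evaluation. This is an assumption-laden illustration: gate, setCollectorsReady, and the single ready boolean are hypothetical stand-ins for HealthManager and its per-component evaluation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// gate is a hypothetical stand-in for the readiness side of HealthManager.
type gate struct {
	mu      sync.Mutex
	standby bool
	ready   bool // stands in for "all required components are Healthy or Degraded"
}

func (g *gate) setStandby(v bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.standby = v
}

func (g *gate) setCollectorsReady(v bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.ready = v
}

// readinessCheck passes unconditionally while the replica is in standby.
func (g *gate) readinessCheck() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.standby {
		return nil
	}
	if !g.ready {
		return errors.New("collectors not ready")
	}
	return nil
}

func main() {
	g := &gate{}

	g.setStandby(true) // non-leader replica: ready, but idle
	fmt.Println("standby:", g.readinessCheck())

	g.setStandby(false) // leader election won, collectors still starting
	fmt.Println("leader, starting:", g.readinessCheck())

	g.setCollectorsReady(true)
	fmt.Println("leader, running:", g.readinessCheck())
}
```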

func (*HealthManager) SuppressLiveness added in v0.0.63

func (hm *HealthManager) SuppressLiveness(d time.Duration)

SuppressLiveness makes LivenessCheck pass unconditionally for the given duration. Use this before a planned collector restart so that the transient Unhealthy window does not trigger a pod kill. The grace period is cleared automatically when StartAll succeeds (via ClearLivenessSuppression) or when the deadline expires.
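
A deadline-based grace period like the one described above might look like the following sketch. The liveness type, the injectable fakeClock, and the scenario helper are all hypothetical and exist only to make the behaviour deterministic; they are not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// fakeClock lets the example advance time without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time          { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

// liveness is a hypothetical stand-in for the liveness side of HealthManager.
type liveness struct {
	mu            sync.Mutex
	now           func() time.Time
	suppressUntil time.Time
	unhealthy     bool
}

// suppress opens a grace period that ends d from now.
func (l *liveness) suppress(d time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.suppressUntil = l.now().Add(d)
}

// clear ends any active grace period immediately.
func (l *liveness) clear() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.suppressUntil = time.Time{}
}

// check passes during the grace period, otherwise evaluates normally.
func (l *liveness) check() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.now().Before(l.suppressUntil) {
		return nil
	}
	if l.unhealthy {
		return errors.New("component unhealthy")
	}
	return nil
}

// scenario walks a planned restart and reports whether check passed
// during the grace period, after it expired, and after an early clear.
func scenario() (duringGrace, afterExpiry, afterClear bool) {
	clock := &fakeClock{t: time.Unix(0, 0)}
	l := &liveness{now: clock.now, unhealthy: true}

	l.suppress(30 * time.Second)
	duringGrace = l.check() == nil

	clock.advance(31 * time.Second)
	afterExpiry = l.check() == nil

	l.suppress(30 * time.Second)
	l.clear()
	afterClear = l.check() == nil
	return
}

func main() {
	fmt.Println(scenario()) // prints "true false false"
}
```

Storing a deadline rather than a boolean means a forgotten clear call cannot suppress the probe forever; the grace period lapses on its own.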

func (*HealthManager) SuppressReadiness added in v0.0.65

func (hm *HealthManager) SuppressReadiness(d time.Duration)

SuppressReadiness makes ReadinessCheck pass unconditionally for the given duration. Use this at startup so the pod can become ready while waiting for leader election. The grace period is cleared automatically when collectors start (via ClearReadinessSuppression) or when the deadline expires.

func (*HealthManager) UpdateStatus

func (hm *HealthManager) UpdateStatus(
	name string,
	status HealthStatus,
	message string,
	metadata map[string]string,
)

UpdateStatus updates the health status, message, and metadata for a component.

type HealthResponse added in v0.0.63

type HealthResponse struct {
	Status     string                       `json:"status"`
	Error      string                       `json:"error,omitempty"`
	Components map[string]ComponentResponse `json:"components,omitempty"`
}

HealthResponse represents the JSON structure for /healthz and /readyz responses.
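
As an illustration of the wire format implied by the struct tags, the sketch below copies the two response types from this page and marshals them with encoding/json. The status strings ("healthy", "unhealthy") and the render helper are assumptions; the values the package actually emits are not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Copied from the documented type definitions.
type ComponentResponse struct {
	Status   string            `json:"status"`
	Message  string            `json:"message,omitempty"`
	Metadata map[string]string `json:"metadata,omitempty"`
}

type HealthResponse struct {
	Status     string                       `json:"status"`
	Error      string                       `json:"error,omitempty"`
	Components map[string]ComponentResponse `json:"components,omitempty"`
}

// render is a hypothetical helper that encodes a response as JSON text.
func render(resp HealthResponse) (string, error) {
	b, err := json.Marshal(resp)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	ok := HealthResponse{
		Status: "healthy", // assumed label
		Components: map[string]ComponentResponse{
			"prometheus": {Status: "healthy"},
		},
	}
	failing := HealthResponse{
		Status: "unhealthy", // assumed label
		Error:  "ebpf_tracer is unhealthy",
		Components: map[string]ComponentResponse{
			"ebpf_tracer": {Status: "unhealthy", Message: "probe attach failed"},
		},
	}
	for _, resp := range []HealthResponse{ok, failing} {
		out, err := render(resp)
		if err != nil {
			fmt.Println("encode error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Because of the omitempty tags, the error, message, and metadata keys disappear from the output whenever they are empty, so a healthy response stays compact.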

type HealthServer added in v0.0.63

type HealthServer struct {
	// contains filtered or unexported fields
}

HealthServer serves health and readiness endpoints.

func NewHealthServer added in v0.0.63

func NewHealthServer(manager *HealthManager, addr string) *HealthServer

NewHealthServer creates a new HealthServer bound to the specified address.

func (*HealthServer) Start added in v0.0.63

func (s *HealthServer) Start() error

Start begins serving health endpoints.

func (*HealthServer) Stop added in v0.0.63

func (s *HealthServer) Stop(ctx context.Context) error

Stop gracefully shuts down the server.
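
How probe endpoints of this kind are commonly wired with net/http can be sketched as below. Only the /healthz and /readyz paths come from this page; probeHandler, newMux, statusFor, the plain-text bodies, and the 200/503 status codes are assumptions for illustration and may differ from what HealthServer actually does.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probeHandler answers 200 when check passes and 503 otherwise.
// The status codes are an assumption, not documented behaviour.
func probeHandler(check func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if err := check(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	}
}

// newMux routes the two probe paths to their checks.
func newMux(liveness, readiness func() error) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/healthz", probeHandler(liveness))
	mux.Handle("/readyz", probeHandler(readiness))
	return mux
}

// statusFor issues an in-memory GET and returns the response status code.
func statusFor(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

// Two example checks standing in for LivenessCheck and ReadinessCheck.
func alwaysOK() error { return nil }
func notReady() error { return errors.New("collectors not started") }

func main() {
	mux := newMux(alwaysOK, notReady)
	fmt.Println(statusFor(mux, "/healthz"), statusFor(mux, "/readyz")) // prints "200 503"
}
```

In a long-running process the mux would be set as the Handler of an http.Server listening on addr, and a graceful stop would typically call that server's Shutdown method with the caller's context.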

type HealthStatus

type HealthStatus int

HealthStatus mirrors the proto enum so that values map across directly.

const (
	HealthStatusUnspecified HealthStatus = iota
	HealthStatusHealthy
	HealthStatusDegraded
	HealthStatusUnhealthy
)

HealthStatus values

func (HealthStatus) String added in v0.0.63

func (s HealthStatus) String() string

String returns a human-readable representation of the HealthStatus.
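
A String method over the iota constants above could be sketched as follows. The constant block is copied from this page, but the lowercase labels are an assumption: the strings the package really returns are not shown in the documentation.

```go
package main

import "fmt"

type HealthStatus int

// Copied from the documented constant block.
const (
	HealthStatusUnspecified HealthStatus = iota
	HealthStatusHealthy
	HealthStatusDegraded
	HealthStatusUnhealthy
)

// String returns an assumed lowercase label for each status.
func (s HealthStatus) String() string {
	switch s {
	case HealthStatusHealthy:
		return "healthy"
	case HealthStatusDegraded:
		return "degraded"
	case HealthStatusUnhealthy:
		return "unhealthy"
	default:
		return "unspecified"
	}
}

func main() {
	// fmt uses the String method automatically for Stringer values.
	fmt.Println(HealthStatusDegraded, int(HealthStatusDegraded)) // prints "degraded 2"
}
```

Because the zero value is HealthStatusUnspecified, a ComponentStatus that was never updated reads as unspecified rather than as healthy.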

type NodeOperatorMonitor added in v0.0.68

type NodeOperatorMonitor struct {
	// contains filtered or unexported fields
}

func NewNodeOperatorMonitor added in v0.0.68

func NewNodeOperatorMonitor(logger logr.Logger, clientset kubernetes.Interface, httpClient *http.Client) *NodeOperatorMonitor

func (*NodeOperatorMonitor) BuildNodeOperatorReport added in v0.0.68

func (m *NodeOperatorMonitor) BuildNodeOperatorReport(ctx context.Context) (map[string]ComponentStatus, string, string, time.Time)
