Documentation
Index ¶
- Constants
- func BuildHeartbeatRequest(hm *HealthManager, clusterID string, operatorType gen.OperatorType, ...) *gen.ReportHealthRequest
- func BuildHeartbeatRequestFromReport(report map[string]ComponentStatus, clusterID string, ...) *gen.ReportHealthRequest
- type ComponentResponse
- type ComponentStatus
- type HealthManager
- func (hm *HealthManager) BuildReport() map[string]ComponentStatus
- func (hm *HealthManager) CheckLiveness() (map[string]ComponentStatus, error)
- func (hm *HealthManager) CheckReadiness() (map[string]ComponentStatus, error)
- func (hm *HealthManager) ClearLivenessSuppression()
- func (hm *HealthManager) ClearReadinessSuppression()
- func (hm *HealthManager) Deregister(name string)
- func (hm *HealthManager) GetStatus(name string) (ComponentStatus, bool)
- func (hm *HealthManager) LivenessCheck() error
- func (hm *HealthManager) ReadinessCheck() error
- func (hm *HealthManager) Register(name string)
- func (hm *HealthManager) SetStandby(standby bool)
- func (hm *HealthManager) SetTransitionObserver(obs TransitionObserver)
- func (hm *HealthManager) SuppressLiveness(d time.Duration)
- func (hm *HealthManager) SuppressReadiness(d time.Duration)
- func (hm *HealthManager) UpdateStatus(name string, status HealthStatus, message string, metadata map[string]string)
- type HealthResponse
- type HealthServer
- type HealthStatus
- type NodeOperatorMonitor
- type TransitionObserver
Constants ¶
const (
    ComponentCollectorManager    = "collector_manager"
    ComponentBufferQueue         = "buffer_queue"
    ComponentDakrTransport       = "dakr_transport"
    ComponentMpaServer           = "mpa_server"
    ComponentPrometheus          = "prometheus"
    ComponentMonitor             = "monitor"
    ComponentEBPFTracer          = "ebpf_tracer"
    ComponentPodCache            = "pod_cache"
    ComponentKarpenterDeployment = "karpenter_deployment"
)
Component name constants used for HealthManager registration.
Variables ¶
This section is empty.
Functions ¶
func BuildHeartbeatRequest ¶ added in v0.0.64
func BuildHeartbeatRequest(hm *HealthManager, clusterID string, operatorType gen.OperatorType, version, commit string, startTime time.Time) *gen.ReportHealthRequest
BuildHeartbeatRequest constructs a ReportHealthRequest from the current HealthManager state.
func BuildHeartbeatRequestFromReport ¶ added in v0.0.64
func BuildHeartbeatRequestFromReport(report map[string]ComponentStatus, clusterID string, operatorType gen.OperatorType, version, commit string, startTime time.Time) *gen.ReportHealthRequest
BuildHeartbeatRequestFromReport constructs a ReportHealthRequest from an already-built report map. Use this when you need to log and send the same snapshot to avoid a double lock acquisition on HealthManager.
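The single-snapshot pattern described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `manager`, `request`, `buildReport`, and `fromReport` are hypothetical stand-ins for HealthManager, gen.ReportHealthRequest, BuildReport, and BuildHeartbeatRequestFromReport, and the `locks` counter exists only to make the number of lock acquisitions visible.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical stand-in for HealthManager that counts
// how many times its lock is taken.
type manager struct {
	mu         sync.Mutex
	locks      int
	components map[string]string
}

// buildReport copies the component map while holding the lock.
func (m *manager) buildReport() map[string]string {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.locks++
	out := make(map[string]string, len(m.components))
	for k, v := range m.components {
		out[k] = v
	}
	return out
}

// request is a hypothetical stand-in for gen.ReportHealthRequest.
type request struct {
	ClusterID  string
	Components map[string]string
}

// fromReport builds a request from an existing snapshot without
// touching the manager again.
func fromReport(report map[string]string, clusterID string) *request {
	return &request{ClusterID: clusterID, Components: report}
}

func main() {
	m := &manager{components: map[string]string{"monitor": "healthy"}}

	report := m.buildReport()            // the only lock acquisition
	fmt.Println("heartbeat:", report)    // log the snapshot
	req := fromReport(report, "cluster") // send the same snapshot

	fmt.Println(req.ClusterID, len(req.Components), m.locks)
}
```

Because the logged map and the request share one snapshot, the log line and the heartbeat cannot disagree about component state.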
Types ¶
type ComponentResponse ¶ added in v0.0.63
type ComponentResponse struct {
Status string `json:"status"`
Message string `json:"message,omitempty"`
Metadata map[string]string `json:"metadata,omitempty"`
}
ComponentResponse represents the JSON structure for /components/{component} responses
type ComponentStatus ¶
type ComponentStatus struct {
Status HealthStatus
Message string
Metadata map[string]string
}
ComponentStatus holds the health status, message, and metadata for a component
type HealthManager ¶
type HealthManager struct {
// contains filtered or unexported fields
}
func NewHealthManager ¶
func NewHealthManager() *HealthManager
NewHealthManager creates a new HealthManager
func (*HealthManager) BuildReport ¶
func (hm *HealthManager) BuildReport() map[string]ComponentStatus
BuildReport returns a snapshot of all component statuses
func (*HealthManager) CheckLiveness ¶ added in v0.0.63
func (hm *HealthManager) CheckLiveness() (map[string]ComponentStatus, error)
CheckLiveness returns the report and liveness error atomically under a single lock acquisition, avoiding TOCTOU between BuildReport and LivenessCheck.
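To illustrate why a combined call avoids the time-of-check/time-of-use gap, here is a minimal sketch under stated assumptions. `manager`, `Status`, and `checkLiveness` are hypothetical stand-ins, and the real HealthManager may use a different lock type or error format.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Status is a stand-in for HealthStatus with the same ordering of values.
type Status int

const (
	Unspecified Status = iota
	Healthy
	Degraded
	Unhealthy
)

// manager is a hypothetical stand-in for HealthManager.
type manager struct {
	mu         sync.RWMutex
	components map[string]Status
}

// checkLiveness copies the report and evaluates liveness while holding
// the lock once, so both results describe the same instant.
func (m *manager) checkLiveness() (map[string]Status, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()

	report := make(map[string]Status, len(m.components))
	var err error
	for name, s := range m.components {
		report[name] = s
		if s == Unhealthy && err == nil {
			err = errors.New("component unhealthy: " + name)
		}
	}
	return report, err
}

func main() {
	m := &manager{components: map[string]Status{
		"buffer_queue": Healthy,
		"prometheus":   Unhealthy,
	}}
	report, err := m.checkLiveness()
	fmt.Println(len(report), err)
}
```

Calling a report function and a check function separately would take the lock twice, and a status update landing between the two calls could make the returned report contradict the returned error.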
func (*HealthManager) CheckReadiness ¶ added in v0.0.63
func (hm *HealthManager) CheckReadiness() (map[string]ComponentStatus, error)
CheckReadiness returns the report and readiness error atomically; it is the readiness counterpart of CheckLiveness.
func (*HealthManager) ClearLivenessSuppression ¶ added in v0.0.63
func (hm *HealthManager) ClearLivenessSuppression()
ClearLivenessSuppression removes any active grace period so LivenessCheck resumes normal evaluation. Call this after collectors are back up.
func (*HealthManager) ClearReadinessSuppression ¶ added in v0.0.65
func (hm *HealthManager) ClearReadinessSuppression()
ClearReadinessSuppression removes any active readiness grace period so ReadinessCheck resumes normal evaluation.
func (*HealthManager) Deregister ¶
func (hm *HealthManager) Deregister(name string)
Deregister removes a component from the health registry
func (*HealthManager) GetStatus ¶
func (hm *HealthManager) GetStatus(name string) (ComponentStatus, bool)
GetStatus retrieves the current status for a component
func (*HealthManager) LivenessCheck ¶ added in v0.0.63
func (hm *HealthManager) LivenessCheck() error
LivenessCheck checks if all components are at least Degraded (not Unhealthy). During an active grace period (set via SuppressLiveness) it always returns nil so that planned restarts do not trigger pod kills.
func (*HealthManager) ReadinessCheck ¶ added in v0.0.63
func (hm *HealthManager) ReadinessCheck() error
ReadinessCheck checks if all required components are Healthy or Degraded.
func (*HealthManager) Register ¶
func (hm *HealthManager) Register(name string)
Register adds a component to the health registry
func (*HealthManager) SetStandby ¶ added in v0.0.74
func (hm *HealthManager) SetStandby(standby bool)
SetStandby marks the pod as a standby (non-leader) replica. While in standby, ReadinessCheck passes unconditionally: the pod is healthy and ready to take over leadership, but it is not running collectors yet. Call SetStandby(false) once leader election is won so that normal readiness checks resume.
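The standby behavior can be sketched as below. This is an illustration only: `readiness`, `setStandby`, and `check` are hypothetical stand-ins, and the assumption that a collector which has not started reports an Unspecified status is made for the example rather than taken from the package.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a stand-in for HealthStatus with the same ordering of values.
type Status int

const (
	Unspecified Status = iota
	Healthy
	Degraded
	Unhealthy
)

// readiness is a hypothetical stand-in for the readiness part of HealthManager.
type readiness struct {
	mu         sync.RWMutex
	standby    bool
	components map[string]Status
}

// setStandby toggles the standby flag under the write lock.
func (r *readiness) setStandby(standby bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.standby = standby
}

// check passes unconditionally in standby; otherwise every component
// must be Healthy or Degraded.
func (r *readiness) check() error {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if r.standby {
		return nil
	}
	for name, s := range r.components {
		if s != Healthy && s != Degraded {
			return fmt.Errorf("component %q not ready", name)
		}
	}
	return nil
}

func main() {
	// Assumption: collectors have not started, so the status is Unspecified.
	r := &readiness{components: map[string]Status{"collector_manager": Unspecified}}

	r.setStandby(true)
	fmt.Println(r.check()) // standby replica reports ready

	r.setStandby(false) // leader election won
	fmt.Println(r.check()) // normal evaluation applies again
}
```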
func (*HealthManager) SetTransitionObserver ¶ added in v0.0.77
func (hm *HealthManager) SetTransitionObserver(obs TransitionObserver)
SetTransitionObserver registers (or clears, if nil) a callback invoked on every component status transition. Only one observer is held at a time. The observer runs outside the lock so it may safely re-enter the HealthManager.
func (*HealthManager) SuppressLiveness ¶ added in v0.0.63
func (hm *HealthManager) SuppressLiveness(d time.Duration)
SuppressLiveness makes LivenessCheck pass unconditionally for the given duration. Use this before a planned collector restart so that the transient Unhealthy window does not trigger a pod kill. The grace period is cleared automatically when StartAll succeeds (via ClearLivenessSuppression) or when the deadline expires.
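One way to model such a grace period is a deadline that the check consults before evaluating anything else. The sketch below assumes that design; `gate`, `suppress`, `clear`, and `check` are hypothetical names, and the real HealthManager may store and expire the deadline differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// gate is a hypothetical stand-in for the liveness part of HealthManager.
type gate struct {
	mu            sync.Mutex
	unhealthy     bool      // stand-in for "some component is Unhealthy"
	suppressUntil time.Time // zero value means no active grace period
}

// suppress opens a grace period lasting d from now.
func (g *gate) suppress(d time.Duration) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.suppressUntil = time.Now().Add(d)
}

// clear ends the grace period early.
func (g *gate) clear() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.suppressUntil = time.Time{}
}

// check returns nil while the grace period is active; otherwise it
// reports the real result.
func (g *gate) check() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if time.Now().Before(g.suppressUntil) {
		return nil
	}
	if g.unhealthy {
		return errors.New("liveness failed")
	}
	return nil
}

func main() {
	g := &gate{unhealthy: true}
	fmt.Println(g.check() != nil) // no grace period: the failure is reported

	g.suppress(time.Minute)
	fmt.Println(g.check() == nil) // inside the grace period: passes

	g.clear()
	fmt.Println(g.check() != nil) // cleared: normal evaluation resumes
}
```

Storing a deadline rather than a boolean means a forgotten clear call cannot suppress liveness forever, which matches the documented expiry behavior.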
func (*HealthManager) SuppressReadiness ¶ added in v0.0.65
func (hm *HealthManager) SuppressReadiness(d time.Duration)
SuppressReadiness makes ReadinessCheck pass unconditionally for the given duration. Use this at startup so the pod can become ready while waiting for leader election. The grace period is cleared automatically when collectors start (via ClearReadinessSuppression) or when the deadline expires.
func (*HealthManager) UpdateStatus ¶
func (hm *HealthManager) UpdateStatus(name string, status HealthStatus, message string, metadata map[string]string)
UpdateStatus updates the health status, message, and metadata for a component. If the new status differs from the previous one, the registered TransitionObserver (if any) is invoked outside the lock with old and new status.
type HealthResponse ¶ added in v0.0.63
type HealthResponse struct {
Status string `json:"status"`
Error string `json:"error,omitempty"`
Components map[string]ComponentResponse `json:"components,omitempty"`
}
HealthResponse represents the JSON structure for /healthz and /readyz responses
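The effect of the `omitempty` tags can be seen by marshalling the two structs directly. The struct definitions below are copied from this page; the status strings and messages are illustrative assumptions, not values the server is known to emit, and `encode` is a helper introduced for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ComponentResponse and HealthResponse are copied from the documented types.
type ComponentResponse struct {
	Status   string            `json:"status"`
	Message  string            `json:"message,omitempty"`
	Metadata map[string]string `json:"metadata,omitempty"`
}

type HealthResponse struct {
	Status     string                       `json:"status"`
	Error      string                       `json:"error,omitempty"`
	Components map[string]ComponentResponse `json:"components,omitempty"`
}

// encode is a hypothetical helper that marshals a response to a string.
func encode(r HealthResponse) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Empty Error and Components are omitted from the output.
	fmt.Println(encode(HealthResponse{Status: "ok"}))

	// Populated fields appear in struct order.
	fmt.Println(encode(HealthResponse{
		Status: "unhealthy",
		Error:  "buffer_queue is unhealthy",
		Components: map[string]ComponentResponse{
			"buffer_queue": {Status: "unhealthy", Message: "queue full"},
		},
	}))
}
```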
type HealthServer ¶ added in v0.0.63
type HealthServer struct {
// contains filtered or unexported fields
}
HealthServer serves health and readiness endpoints
func NewHealthServer ¶ added in v0.0.63
func NewHealthServer(manager *HealthManager, addr string) *HealthServer
NewHealthServer creates a new HealthServer bound to the specified address
func (*HealthServer) Start ¶ added in v0.0.63
func (s *HealthServer) Start() error
Start begins serving health endpoints
type HealthStatus ¶
type HealthStatus int
HealthStatus mirrors the proto enum so that values map directly between the two.
const (
    HealthStatusUnspecified HealthStatus = iota
    HealthStatusHealthy
    HealthStatusDegraded
    HealthStatusUnhealthy
)
HealthStatus values
func (HealthStatus) String ¶ added in v0.0.63
func (s HealthStatus) String() string
String returns a human-readable representation of the HealthStatus
type NodeOperatorMonitor ¶ added in v0.0.68
type NodeOperatorMonitor struct {
// contains filtered or unexported fields
}
func NewNodeOperatorMonitor ¶ added in v0.0.68
func NewNodeOperatorMonitor(logger logr.Logger, clientset kubernetes.Interface, httpClient *http.Client) *NodeOperatorMonitor
func (*NodeOperatorMonitor) BuildNodeOperatorReport ¶ added in v0.0.68
func (m *NodeOperatorMonitor) BuildNodeOperatorReport(ctx context.Context) (map[string]ComponentStatus, string, string, time.Time)
type TransitionObserver ¶ added in v0.0.77
type TransitionObserver func(component string, oldStatus, newStatus HealthStatus, message string, metadata map[string]string)
TransitionObserver is invoked whenever a component's status changes via UpdateStatus. It is called outside the HealthManager lock so observers may safely call back into the manager (e.g. to read other component statuses) without deadlocking. The observer is invoked synchronously, but the typical implementation should hand off to a queue so the UpdateStatus call site stays fast.
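The calling convention described above (observer invoked after the lock is released, with the observer handing work to a queue) can be sketched as follows. Everything here is a hypothetical stand-in: `manager`, `update`, `get`, and `transition` are not part of the package, and the observer signature is shortened by dropping the message and metadata parameters.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a stand-in for HealthStatus with the same ordering of values.
type Status int

const (
	Unspecified Status = iota
	Healthy
	Degraded
	Unhealthy
)

// observer mirrors the documented TransitionObserver, minus message and metadata.
type observer func(component string, oldStatus, newStatus Status)

// manager is a hypothetical stand-in for HealthManager.
type manager struct {
	mu         sync.Mutex
	components map[string]Status
	obs        observer
}

// update stores the new status under the lock, then calls the observer
// after releasing it so the observer may call back into the manager.
func (m *manager) update(name string, s Status) {
	m.mu.Lock()
	old := m.components[name]
	m.components[name] = s
	obs := m.obs
	m.mu.Unlock()

	if obs != nil && old != s {
		obs(name, old, s)
	}
}

// get reads a status; it is safe to call from inside an observer.
func (m *manager) get(name string) Status {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.components[name]
}

// transition is the queued record of one status change.
type transition struct {
	component string
	from, to  Status
}

func main() {
	events := make(chan transition, 16) // buffered so the observer rarely blocks
	m := &manager{components: map[string]Status{}}
	m.obs = func(c string, from, to Status) {
		_ = m.get(c) // re-entering the manager does not deadlock
		select {
		case events <- transition{c, from, to}:
		default: // queue full: drop rather than stall the caller
		}
	}

	m.update("pod_cache", Healthy)
	m.update("pod_cache", Healthy) // same status: observer is not called
	m.update("pod_cache", Unhealthy)

	close(events)
	for e := range events {
		fmt.Println(e.component, e.from, e.to)
	}
}
```

The non-blocking send is a design choice for this sketch: it keeps the update call fast at the cost of dropping events when the queue is full. An implementation that must not lose transitions would need a different overflow policy.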