Documentation
Constants ¶
This section is empty.
Variables ¶
var (
	// ControllerPanic is a counter to record the number of panics in the controller.
	ControllerPanic = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "tidb_operator",
			Subsystem: "controller",
			Name:      "panic_total",
			Help:      "The total number of panics in the controller",
		}, []string{},
	)

	// AbnormalInstance is 1 when the named condition on the instance is False
	// (abnormal), 0 otherwise. The series stays present while the operator
	// manages the instance and is removed only when the instance is finalized.
	//
	// Use `metric == 1` together with PromQL `for: <duration>` to alert on
	// instances stuck in an abnormal state, e.g. a rolling restart that cannot
	// converge or a pod that is up but cannot serve.
	AbnormalInstance = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "tidb_operator",
			Name:      "abnormal_instance",
			Help: "1 when the named condition on the instance is False, 0 otherwise. " +
				"Use `metric == 1` with PromQL `for: <duration>` to alert on stuck state.",
		}, InstanceAbnormalMetricLabels,
	)
)
var InstanceAbnormalMetricLabels = []string{"namespace", "cluster", "component", "group", "instance", "condition"}
InstanceAbnormalMetricLabels is the canonical label order for the per-instance abnormal-condition gauge. Keep in sync with WithLabelValues / DeleteLabelValues callers.
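Because label values are matched to label names by position, one way to keep callers in sync is to build the value slice in a single helper. The following is a minimal sketch under that assumption, using only the standard library; `instanceRef` and `labelValues` are hypothetical names for illustration and are not part of this package.

```go
package main

import "fmt"

// Copy of the documented label order, kept local so the sketch is self-contained.
var instanceAbnormalMetricLabels = []string{
	"namespace", "cluster", "component", "group", "instance", "condition",
}

// instanceRef is a hypothetical holder for the fields that identify one instance.
type instanceRef struct {
	Namespace, Cluster, Component, Group, Name string
}

// labelValues returns values positionally aligned with
// instanceAbnormalMetricLabels, so set and delete calls cannot drift apart.
func labelValues(ref instanceRef, condType string) []string {
	return []string{ref.Namespace, ref.Cluster, ref.Component, ref.Group, ref.Name, condType}
}

func main() {
	ref := instanceRef{
		Namespace: "db", Cluster: "basic", Component: "tikv",
		Group: "tikv-a", Name: "tikv-a-0",
	}
	vals := labelValues(ref, "Ready")
	// Print each label name next to the value that would be passed for it.
	for i, name := range instanceAbnormalMetricLabels {
		fmt.Printf("%s=%q\n", name, vals[i])
	}
}
```

A slice built this way could be passed to a variadic call with `vals...`, which is the usual calling pattern for `WithLabelValues` and `DeleteLabelValues`.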
Functions ¶
func ClearInstanceConditionMetrics ¶
ClearInstanceConditionMetrics removes every tracked-condition series for the given instance.
Called from TaskInstanceFinalizerDel after the finalizer is removed, so every component that uses the standard finalize task is covered without per-builder wiring.
Component builders short-circuit the deletion path with task.IfBreak around CondClusterIsDeleting / CondObjectIsDeleting, so the normal TaskInstanceConditionSynced / TaskInstanceConditionReady tasks (where ObserveCondition lives) never run during finalization. Without this explicit cleanup, each gauge series would keep its last value forever. A series left at 1 would trigger false-positive `metric == 1 for: <duration>` alerts on an instance that no longer exists, and the leftover series would grow label cardinality with every cluster lifecycle.
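The cleanup described here amounts to deleting one series per tracked condition for the finalized instance. The sketch below illustrates that idea with a plain map standing in for the gauge vector. It is not the package's implementation: `seriesStore`, `seriesKey`, and `clearInstance` are hypothetical, and the contents of `trackedConditions` are assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// trackedConditions is an assumed set for illustration; the real list lives
// in the package and is not shown in this documentation.
var trackedConditions = []string{"Ready", "Synced"}

// seriesStore is a toy stand-in for a gauge vector: one value per
// combination of label values.
type seriesStore map[string]float64

// seriesKey joins label values with a separator that cannot appear in them here.
func seriesKey(values ...string) string { return strings.Join(values, "\x00") }

// set records a value for the series identified by the given label values.
func (s seriesStore) set(v float64, values ...string) { s[seriesKey(values...)] = v }

// clearInstance deletes the series of every tracked condition for one instance
// and leaves all other instances untouched.
func (s seriesStore) clearInstance(ns, cluster, component, group, instance string) {
	for _, cond := range trackedConditions {
		delete(s, seriesKey(ns, cluster, component, group, instance, cond))
	}
}

func main() {
	store := seriesStore{}
	store.set(1, "db", "basic", "tikv", "tikv-a", "tikv-a-0", "Ready")
	store.set(0, "db", "basic", "tikv", "tikv-a", "tikv-a-0", "Synced")
	store.set(0, "db", "basic", "tikv", "tikv-a", "tikv-a-1", "Ready")

	// Finalizing tikv-a-0 removes both of its series; tikv-a-1 keeps its own.
	store.clearInstance("db", "basic", "tikv", "tikv-a", "tikv-a-0")
	fmt.Println(len(store))
}
```

Iterating over the same condition list that the write path uses is what keeps the cleanup complete: a condition written but not listed would leave a stale series behind.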
func ObserveCondition ¶
ObserveCondition writes 1 to the abnormal-instance gauge when the named condition is False, and 0 otherwise (a condition that is True or absent is treated as healthy). The series stays present so PromQL `for:` alerts can fire reliably without gaps, and so dashboards never see missing samples for managed instances.
condType must be one of trackedConditions so the finalize-time cleanup in ClearInstanceConditionMetrics covers the same set of series this writes.
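The value mapping that ObserveCondition is described as applying can be sketched as a small pure function. This is an illustration under assumptions, not the package's code: `condition` is a hypothetical, trimmed-down stand-in for a Kubernetes status condition, and `abnormalValue` is a made-up helper name.

```go
package main

import "fmt"

// condition is a hypothetical stand-in for a Kubernetes status condition,
// reduced to the two fields the mapping needs.
type condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// abnormalValue returns 1 only when the named condition is present with
// status "False". Every other case, including an absent condition, maps to 0,
// following the "0 otherwise" wording of the documentation.
func abnormalValue(conds []condition, condType string) float64 {
	for _, c := range conds {
		if c.Type == condType && c.Status == "False" {
			return 1
		}
	}
	return 0
}

func main() {
	conds := []condition{
		{Type: "Ready", Status: "False"},
		{Type: "Synced", Status: "True"},
	}
	fmt.Println(abnormalValue(conds, "Ready"))     // condition is False
	fmt.Println(abnormalValue(conds, "Synced"))    // condition is True
	fmt.Println(abnormalValue(conds, "Suspended")) // condition is absent
}
```

Writing an explicit 0 for healthy instances, rather than deleting the series, is what lets a `for:` clause measure how long the value has stayed at 1 without gaps in the samples.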
Types ¶
This section is empty.