metrics

package
v2.1.0-beta.1 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 21, 2026 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (

	// ControllerPanic is a counter to record the number of panics in the controller.
	ControllerPanic = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "tidb_operator",
			Subsystem: "controller",
			Name:      "panic_total",
			Help:      "The total number of panics in the controller",
		}, []string{},
	)

	// AbnormalInstance is 1 when the named condition on the instance is False
	// (abnormal), 0 otherwise. The series stays present while the operator
	// manages the instance and is removed only when the instance is finalized.
	//
	// Use `metric == 1` together with an alerting rule's `for: <duration>`
	// clause to alert on instances stuck in an abnormal state, e.g. a rolling
	// restart that cannot converge or a pod that is up but cannot serve.
	AbnormalInstance = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "tidb_operator",
			Name:      "abnormal_instance",
			Help: "1 when the named condition on the instance is False, 0 otherwise. " +
				"Use `metric == 1` with PromQL `for: <duration>` to alert on stuck state.",
		}, InstanceAbnormalMetricLabels,
	)
)
var InstanceAbnormalMetricLabels = []string{"namespace", "cluster", "component", "group", "instance", "condition"}

InstanceAbnormalMetricLabels is the canonical label order for the per-instance abnormal-condition gauge. Keep in sync with WithLabelValues / DeleteLabelValues callers.
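
One way to keep callers aligned with this order is to build the value slice in a single helper and reuse it on both the write and the delete path. The sketch below is illustrative only: instanceLabels and labelValues are hypothetical names that do not exist in this package, and the sample values are invented.

```go
package main

import "fmt"

// InstanceAbnormalMetricLabels mirrors the label order documented above.
var InstanceAbnormalMetricLabels = []string{"namespace", "cluster", "component", "group", "instance", "condition"}

// instanceLabels is a hypothetical holder for the label values of one series.
type instanceLabels struct {
	Namespace, Cluster, Component, Group, Instance, Condition string
}

// labelValues returns the values in the same order as
// InstanceAbnormalMetricLabels, so the set and delete paths cannot drift apart.
func labelValues(l instanceLabels) []string {
	return []string{l.Namespace, l.Cluster, l.Component, l.Group, l.Instance, l.Condition}
}

func main() {
	vals := labelValues(instanceLabels{
		Namespace: "db", Cluster: "basic", Component: "tikv",
		Group: "tikv-a", Instance: "tikv-a-0", Condition: "Ready",
	})
	// Print each label name next to the value that would be bound to it.
	for i, name := range InstanceAbnormalMetricLabels {
		fmt.Printf("%s=%s\n", name, vals[i])
	}
}
```

With the real gauge, the same slice would presumably be passed as AbnormalInstance.WithLabelValues(vals...) when writing and AbnormalInstance.DeleteLabelValues(vals...) when cleaning up.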

Functions

func ClearInstanceConditionMetrics

func ClearInstanceConditionMetrics(obj client.Object)

ClearInstanceConditionMetrics removes every tracked-condition series for the given instance.

It is called from TaskInstanceFinalizerDel after the finalizer is removed, so every component that uses the standard finalize task is covered without per-builder wiring.

Component builders short-circuit the deletion path with task.IfBreak around CondClusterIsDeleting / CondObjectIsDeleting, so the normal TaskInstanceConditionSynced / TaskInstanceConditionReady tasks (where ObserveCondition lives) never run during finalization. Without this explicit cleanup, the gauge series would stay at its last value forever. A series left at 1 would trigger false-positive `metric == 1` alerts (with `for: <duration>`) on an instance that no longer exists, and label cardinality would grow with every cluster lifecycle.
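
The effect of the cleanup can be modelled with a plain map standing in for the gauge vector. This is a minimal sketch, not the package's implementation: seriesKey, gauge, and clearInstance are hypothetical, the real series carry six labels rather than two, and the contents of trackedConditions shown here ("Synced", "Ready") are an assumption inferred from the task names above.

```go
package main

import "fmt"

// seriesKey identifies one gauge series: an instance plus a condition type.
// The real metric has more labels; two are enough for this model.
type seriesKey struct {
	Instance  string
	Condition string
}

// trackedConditions is an assumed list; the real set is private to the package.
var trackedConditions = []string{"Synced", "Ready"}

// gauge is a toy stand-in for a labelled gauge vector.
type gauge map[seriesKey]float64

// clearInstance deletes every tracked-condition series for one instance and
// reports how many series were removed.
func (g gauge) clearInstance(instance string) int {
	removed := 0
	for _, cond := range trackedConditions {
		k := seriesKey{Instance: instance, Condition: cond}
		if _, ok := g[k]; ok {
			delete(g, k)
			removed++
		}
	}
	return removed
}

func main() {
	g := gauge{
		{Instance: "tikv-0", Condition: "Ready"}:  1, // stale value left by the last observation
		{Instance: "tikv-0", Condition: "Synced"}: 0,
		{Instance: "tikv-1", Condition: "Ready"}:  0,
	}
	// Finalizing tikv-0 removes both of its series and leaves tikv-1 untouched.
	fmt.Println(g.clearInstance("tikv-0"), len(g)) // prints "2 1"
}
```

Iterating over the same condition list on the write and the cleanup side is what keeps a finalized instance from leaving a series behind.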

func ObserveCondition

func ObserveCondition(obj client.Object, conds []metav1.Condition, condType string)

ObserveCondition sets the abnormal-instance gauge to 1 when the named condition is False and to 0 otherwise (a condition that is True or absent is treated as healthy). The series stays present so that alerting rules with a `for:` clause can fire reliably without gaps, and so that dashboards never see missing samples for managed instances.

condType must be one of trackedConditions so the finalize-time cleanup in ClearInstanceConditionMetrics covers the same set of series this writes.
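
The mapping from condition status to gauge value can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: condition is a local stand-in for metav1.Condition, abnormalValue is a hypothetical helper, the condition names are invented examples, and treating an Unknown status as 0 is read from the "0 otherwise" wording above.

```go
package main

import "fmt"

// condition is a local stand-in for metav1.Condition, reduced to the two
// fields this sketch needs.
type condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// abnormalValue returns 1 only when the named condition is present and False.
// A True, absent, or otherwise-valued condition maps to 0.
func abnormalValue(conds []condition, condType string) float64 {
	for _, c := range conds {
		if c.Type == condType && c.Status == "False" {
			return 1
		}
	}
	return 0
}

func main() {
	conds := []condition{
		{Type: "Ready", Status: "False"},
		{Type: "Synced", Status: "True"},
	}
	fmt.Println(abnormalValue(conds, "Ready"))  // prints 1
	fmt.Println(abnormalValue(conds, "Synced")) // prints 0
	fmt.Println(abnormalValue(nil, "Ready"))    // prints 0: an absent condition is healthy
}
```

Always writing a value, including 0, is what keeps the series continuously present between observations.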

Types

This section is empty.
