package metrics

v2.1.0-beta.1 | Published: Apr 21, 2026 | License: Apache-2.0 | Imports: 6 | Imported by: 0

Repository

github.com/pingcap/tidb-operator

Documentation

Index

Variables
func ClearInstanceConditionMetrics(obj client.Object)
func ObserveCondition(obj client.Object, conds []metav1.Condition, condType string)

Constants

This section is empty.

Variables

var (

	// ControllerPanic is a counter to record the number of panics in the controller.
	ControllerPanic = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "tidb_operator",
			Subsystem: "controller",
			Name:      "panic_total",
			Help:      "The total number of panics in the controller",
		}, []string{},
	)

	// AbnormalInstance is 1 when the named condition on the instance is False
	// (abnormal), 0 otherwise. The series stays present while the operator
	// manages the instance and is removed only when the instance is finalized.
	//
	// Use `metric == 1` together with PromQL `for: <duration>` to alert on
	// instances stuck in an abnormal state, e.g. a rolling restart that cannot
	// converge or a pod that is up but cannot serve.
	AbnormalInstance = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "tidb_operator",
			Name:      "abnormal_instance",
			Help: "1 when the named condition on the instance is False, 0 otherwise. " +
				"Use `metric == 1` with PromQL `for: <duration>` to alert on stuck state.",
		}, InstanceAbnormalMetricLabels,
	)
)
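The page does not show where ControllerPanic is incremented. A common pattern is to count panics in a deferred recover around each reconcile. The sketch below assumes that pattern and uses a sync/atomic counter in place of the Prometheus counter so it runs with the standard library only; runReconcile and panicTotal are hypothetical names, not part of this package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// panicTotal is a hypothetical stand-in for ControllerPanic. With
// client_golang, the increment would be ControllerPanic.WithLabelValues().Inc(),
// since the vector is declared with no label names.
var panicTotal atomic.Int64

// runReconcile is a hypothetical wrapper: it runs fn, converts a panic into
// an error, and counts the panic instead of letting it crash the process.
func runReconcile(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			panicTotal.Add(1)
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	_ = runReconcile(func() error { return nil })
	err := runReconcile(func() error { panic("boom") })
	fmt.Println(err)               // recovered from panic: boom
	fmt.Println(panicTotal.Load()) // 1
}
```

Recovering inside the wrapper keeps the panic count accurate even when the process survives, which is what makes a `rate(tidb_operator_controller_panic_total[...])` alert meaningful.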

var InstanceAbnormalMetricLabels = []string{"namespace", "cluster", "component", "group", "instance", "condition"}

InstanceAbnormalMetricLabels is the canonical label order for the per-instance abnormal-condition gauge. Keep in sync with WithLabelValues / DeleteLabelValues callers.
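Because WithLabelValues and DeleteLabelValues match values by position, every caller has to emit them in exactly this order. A minimal sketch, assuming a hypothetical instanceRef struct that summarizes the fields read from the instance object; the condition name "Ready" is only an example.

```go
package main

import "fmt"

// instanceAbnormalMetricLabels mirrors InstanceAbnormalMetricLabels.
var instanceAbnormalMetricLabels = []string{"namespace", "cluster", "component", "group", "instance", "condition"}

// instanceRef is a hypothetical summary of the fields the operator would read
// from the instance object.
type instanceRef struct {
	Namespace, Cluster, Component, Group, Name string
}

// labelValues returns values positionally aligned with
// instanceAbnormalMetricLabels, the order WithLabelValues expects.
func labelValues(ref instanceRef, condType string) []string {
	return []string{ref.Namespace, ref.Cluster, ref.Component, ref.Group, ref.Name, condType}
}

func main() {
	ref := instanceRef{"db", "basic", "tikv", "tikv", "basic-tikv-0"}
	vals := labelValues(ref, "Ready")
	for i, name := range instanceAbnormalMetricLabels {
		fmt.Printf("%s=%s\n", name, vals[i])
	}
}
```

Centralizing the ordering in one helper is one way to keep writers and deleters in sync. client_golang's `GaugeVec.With(prometheus.Labels{...})` avoids positional mistakes entirely, at the cost of building a map on each call.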

Functions

func ClearInstanceConditionMetrics

func ClearInstanceConditionMetrics(obj client.Object)

ClearInstanceConditionMetrics removes every tracked-condition series for the given instance.

It is called from TaskInstanceFinalizerDel after the finalizer is removed, so every component that uses the standard finalize task is covered without per-builder wiring.

The explicit cleanup is needed because component builders short-circuit the deletion path with task.IfBreak around CondClusterIsDeleting / CondObjectIsDeleting. As a result, the normal TaskInstanceConditionSynced and TaskInstanceConditionReady tasks, which call ObserveCondition, never run during finalization. Without this cleanup, each gauge series would remain at its last value indefinitely, causing false-positive `metric == 1 for: <duration>` alerts for instances that no longer exist and growing label cardinality with every cluster lifecycle.
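The cleanup amounts to one delete per tracked condition for a single instance. The sketch below models that with a toy map in place of *prometheus.GaugeVec (whose real method is DeleteLabelValues). The trackedConditions contents ("Synced", "Ready") are an assumption inferred from the task names above; the package's unexported list may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// trackedConditions is hypothetical; the package's unexported list may differ.
var trackedConditions = []string{"Synced", "Ready"}

// gaugeVec is a toy stand-in for *prometheus.GaugeVec: one value per label tuple.
type gaugeVec map[string]float64

// key joins label values with a NUL separator, which Kubernetes names cannot contain.
func key(lvs ...string) string { return strings.Join(lvs, "\x00") }

func (g gaugeVec) set(v float64, lvs ...string) { g[key(lvs...)] = v }

// remove plays the role of GaugeVec.DeleteLabelValues.
func (g gaugeVec) remove(lvs ...string) { delete(g, key(lvs...)) }

// clearInstance deletes one series per tracked condition for one instance,
// leaving series for other instances untouched.
func clearInstance(g gaugeVec, ns, cluster, component, group, instance string) {
	for _, cond := range trackedConditions {
		g.remove(ns, cluster, component, group, instance, cond)
	}
}

func main() {
	g := gaugeVec{}
	g.set(1, "db", "basic", "tikv", "tikv", "basic-tikv-0", "Ready")
	g.set(0, "db", "basic", "tikv", "tikv", "basic-tikv-0", "Synced")
	g.set(0, "db", "basic", "tikv", "tikv", "basic-tikv-1", "Ready")

	// Simulates the finalizer path for basic-tikv-0 only.
	clearInstance(g, "db", "basic", "tikv", "tikv", "basic-tikv-0")
	fmt.Println(len(g)) // 1
}
```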

func ObserveCondition

func ObserveCondition(obj client.Object, conds []metav1.Condition, condType string)

ObserveCondition sets the abnormal-instance gauge to 1 when the named condition is False and to 0 otherwise; True and absent conditions are treated as healthy. The series always stays present, so PromQL `for:` alerts can fire reliably without gaps and dashboards never see missing samples for managed instances.
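The value rule can be sketched as a lookup by condition type. This version uses a minimal hypothetical condition struct instead of metav1.Condition so it runs without Kubernetes dependencies, and it assumes "otherwise" includes status Unknown. In real code, `meta.IsStatusConditionFalse` from k8s.io/apimachinery/pkg/api/meta expresses the same check.

```go
package main

import "fmt"

// condition mirrors the two metav1.Condition fields this logic reads.
type condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// abnormalValue returns 1 when the condition named condType is present with
// status "False", and 0 otherwise (True, Unknown, or absent).
func abnormalValue(conds []condition, condType string) float64 {
	for _, c := range conds {
		if c.Type == condType {
			if c.Status == "False" {
				return 1
			}
			return 0
		}
	}
	return 0
}

func main() {
	conds := []condition{{Type: "Ready", Status: "False"}, {Type: "Synced", Status: "True"}}
	fmt.Println(abnormalValue(conds, "Ready"), abnormalValue(conds, "Synced"), abnormalValue(conds, "Missing"))
	// 1 0 0
}
```

Writing 0 rather than deleting the series for healthy instances is what keeps `for:` timers from resetting on gaps.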

condType must be one of trackedConditions so that the finalize-time cleanup in ClearInstanceConditionMetrics covers the same set of series that ObserveCondition writes.
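The contract above is documented rather than enforced in the signature. One hypothetical way to enforce it is a guard that rejects untracked condition types before writing, so nothing can outlive the finalizer cleanup. observeChecked, clearTracked, and the trackedConditions contents are all illustrative assumptions, not this package's behavior.

```go
package main

import (
	"fmt"
	"slices"
)

// trackedConditions is hypothetical; the package's unexported list may differ.
var trackedConditions = []string{"Synced", "Ready"}

// seriesKey holds the six label values in InstanceAbnormalMetricLabels order.
type seriesKey [6]string

// observeChecked is a hypothetical guard: it refuses condition types that the
// finalize-time cleanup would not delete, so no series can outlive its instance.
func observeChecked(s map[seriesKey]float64, inst [5]string, condType string, v float64) error {
	if !slices.Contains(trackedConditions, condType) {
		return fmt.Errorf("condition type %q is not tracked; its series would leak after finalization", condType)
	}
	s[seriesKey{inst[0], inst[1], inst[2], inst[3], inst[4], condType}] = v
	return nil
}

// clearTracked removes every tracked-condition series for one instance.
func clearTracked(s map[seriesKey]float64, inst [5]string) {
	for _, c := range trackedConditions {
		delete(s, seriesKey{inst[0], inst[1], inst[2], inst[3], inst[4], c})
	}
}

func main() {
	s := map[seriesKey]float64{}
	inst := [5]string{"db", "basic", "tikv", "tikv", "basic-tikv-0"}
	fmt.Println(observeChecked(s, inst, "Ready", 1))            // <nil>
	fmt.Println(observeChecked(s, inst, "Suspended", 0) != nil) // true
	clearTracked(s, inst)
	fmt.Println(len(s)) // 0
}
```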

Types

This section is empty.
