metrics

package

Version: v0.1.0-alpha.8 | Published: Dec 30, 2025 | License: Apache-2.0 | Imports: 6 | Imported by: 0

Repository

gitlab.com/haproxy-haptic/haptic

README ¶

pkg/controller/metrics - Controller Domain Metrics

Domain-specific Prometheus metrics for the HAProxy Template Ingress Controller.

Overview

This package provides controller-specific metrics and an event adapter component that translates controller events into Prometheus metric updates.

Architecture:

metrics.go - Defines controller-specific Prometheus metrics
component.go - Event adapter that subscribes to controller events and updates metrics

Metrics

Reconciliation Metrics

Track reconciliation cycle performance and errors.

haptic_reconciliation_total (counter)

Total number of reconciliation cycles triggered
Increments on both successful and failed reconciliations

haptic_reconciliation_duration_seconds (histogram)

Time spent in reconciliation cycles
Buckets: 10ms to 10s (see pkg/metrics.DurationBuckets)

haptic_reconciliation_errors_total (counter)

Total number of failed reconciliation cycles
Increments when reconciliation fails due to template errors, validation failures, etc.

Example Queries:

# Reconciliation rate per second
rate(haptic_reconciliation_total[5m])

# Average reconciliation duration
rate(haptic_reconciliation_duration_seconds_sum[5m]) /
rate(haptic_reconciliation_duration_seconds_count[5m])

# Error rate
rate(haptic_reconciliation_errors_total[5m])

# Success rate percentage
100 * (1 - (
  rate(haptic_reconciliation_errors_total[5m]) /
  rate(haptic_reconciliation_total[5m])
))
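The relationship between the two counters and the success-rate query can be sketched in plain Go. The `recorder` type below is a hypothetical in-memory stand-in, not the package's `Metrics` type, and it works on lifetime totals, whereas the PromQL query uses per-window rates.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for the two reconciliation counters.
type recorder struct {
	total  int // mirrors haptic_reconciliation_total
	errors int // mirrors haptic_reconciliation_errors_total
}

// record counts every cycle and additionally counts failed ones.
func (r *recorder) record(success bool) {
	r.total++
	if !success {
		r.errors++
	}
}

// successRate returns 100 * (1 - errors/total). It reports ok=false when no
// cycle has been recorded, the case in which the PromQL division yields NaN.
func (r *recorder) successRate() (percent float64, ok bool) {
	if r.total == 0 {
		return 0, false
	}
	return 100 * (1 - float64(r.errors)/float64(r.total)), true
}

func main() {
	r := &recorder{}
	for _, success := range []bool{true, true, false, true} {
		r.record(success)
	}
	pct, ok := r.successRate()
	fmt.Println(r.total, r.errors, pct, ok)
}
```

Because the total counter includes failures, the error ratio can never exceed 1 and the success rate stays between 0 and 100.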

Deployment Metrics

Track HAProxy configuration deployment performance.

haptic_deployment_total (counter)

Total number of deployment attempts
Increments regardless of success/failure

haptic_deployment_duration_seconds (histogram)

Time spent deploying configurations to HAProxy instances
Buckets: 10ms to 10s

haptic_deployment_errors_total (counter)

Total number of failed deployments
Increments when deployment to at least one instance fails

Example Queries:

# Deployment rate
rate(haptic_deployment_total[5m])

# 95th percentile deployment latency
histogram_quantile(0.95, rate(haptic_deployment_duration_seconds_bucket[5m]))

# Failed deployment rate
rate(haptic_deployment_errors_total[5m])

Validation Metrics

Track configuration validation performance.

haptic_validation_total (counter)

Total number of validation attempts
Increments for both successful and failed validations

haptic_validation_errors_total (counter)

Total number of failed validations
Increments when configuration has syntax errors or validation warnings

Example Queries:

# Validation rate
rate(haptic_validation_total[5m])

# Validation error rate
rate(haptic_validation_errors_total[5m])

# Validation success rate
100 * (1 - (
  rate(haptic_validation_errors_total[5m]) /
  rate(haptic_validation_total[5m])
))

Resource Metrics

Track Kubernetes resources being watched.

haptic_resource_count (gauge with type label)

Current number of resources indexed by type
Labels: type (e.g., "ingresses", "services", "endpoints", "haproxy-pods")
Updates on every index change

Example Queries:

# Current resource counts
haptic_resource_count

# Ingress count
haptic_resource_count{type="ingresses"}

# Raw service count samples from the last hour (range vector; returns samples in table view, cannot be graphed directly)
haptic_resource_count{type="services"}[1h]

Event Metrics

Track event bus activity.

haptic_event_subscribers (gauge)

Current number of active event subscribers
Reflects component health (subscribers should remain constant)

haptic_events_published_total (counter)

Total number of events published to the event bus
Indicates overall controller activity level

Example Queries:

# Event publishing rate
rate(haptic_events_published_total[5m])

# Current subscribers (should be constant)
haptic_event_subscribers

# Subscriber changes (indicator of component restarts)
delta(haptic_event_subscribers[5m])

Leader Election Metrics

Track leadership status and transitions for high availability deployments.

haptic_leader_election_is_leader (gauge)

Indicates if this replica is currently the leader
Values: 1 (leader), 0 (follower)
Only one replica should report 1 across all controller instances

haptic_leader_election_transitions_total (counter)

Total number of leadership transitions (becoming leader or losing leadership)
Increments on both gain and loss of leadership
Frequent transitions may indicate cluster instability

haptic_leader_election_time_as_leader_seconds_total (counter)

Cumulative time this replica has spent as leader (in seconds)
Updates when losing leadership
Useful for understanding leadership distribution

Example Queries:

# Current leader count (should be 1 across all replicas)
sum(haptic_leader_election_is_leader)

# Leadership transition rate
rate(haptic_leader_election_transitions_total[1h])

# Average time as leader per transition (a full leadership term counts as two transitions: gain and loss)
haptic_leader_election_time_as_leader_seconds_total /
haptic_leader_election_transitions_total

# Identify current leader pod
haptic_leader_election_is_leader{pod=~".*"} == 1

# Alert on split-brain (multiple leaders)
sum(haptic_leader_election_is_leader) > 1

# Alert on no leader
sum(haptic_leader_election_is_leader) < 1

# Alert on frequent leadership changes (> 5 per hour)
increase(haptic_leader_election_transitions_total[1h]) > 5

Operational Notes:

In single-replica deployments (leader election disabled), metrics still exist
Normal failover causes one transition on each replica involved (the old leader loses leadership, the new leader gains it)
High transition rates may indicate: clock skew, network issues, or resource contention
Leadership distribution should be relatively balanced over time

Component Architecture

Metrics Struct

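Holds the controller-specific Prometheus metrics (excerpt; the type Metrics documentation lists every field, including the validation test and webhook metrics):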

type Metrics struct {
    ReconciliationDuration prometheus.Histogram
    ReconciliationTotal    prometheus.Counter
    ReconciliationErrors   prometheus.Counter
    DeploymentDuration     prometheus.Histogram
    DeploymentTotal        prometheus.Counter
    DeploymentErrors       prometheus.Counter
    ValidationTotal        prometheus.Counter
    ValidationErrors       prometheus.Counter
    ResourceCount          *prometheus.GaugeVec  // type label
    EventSubscribers       prometheus.Gauge
    EventsPublished        prometheus.Counter
    LeaderElectionIsLeader            prometheus.Gauge
    LeaderElectionTransitionsTotal    prometheus.Counter
    LeaderElectionTimeAsLeaderSeconds prometheus.Counter
}

Component (Event Adapter)

Subscribes to controller events and updates metrics accordingly:

type Component struct {
    metrics        *Metrics
    eventBus       *events.EventBus
    eventChan      <-chan events.Event
    resourceCounts map[string]int  // Track resource counts
}

Lifecycle:

Create component: New(metrics, eventBus)
Start event loop: go component.Start(ctx) (Start blocks until the context is cancelled)
Start the event bus: eventBus.Start()
Stop on context cancellation

Usage

Basic Setup

import (
    "github.com/prometheus/client_golang/prometheus"
    "haptic/pkg/controller/metrics"
    "haptic/pkg/events"
)

// Create instance-based registry
registry := prometheus.NewRegistry()

// Create controller metrics
domainMetrics := metrics.NewMetrics(registry)

// Create event bus
bus := events.NewEventBus(100)

// Create metrics component (event adapter)
metricsComponent := metrics.New(domainMetrics, bus)

// Start the metrics event loop; Start blocks until ctx is cancelled,
// so run it in a goroutine
go metricsComponent.Start(ctx)

// Start event bus
bus.Start()

Direct Metric Updates

You can also update metrics directly (without events):

// Durations are passed in seconds; start is a time.Time captured before the operation.

// Record reconciliation
domainMetrics.RecordReconciliation(time.Since(start).Seconds(), success)

// Record deployment
domainMetrics.RecordDeployment(time.Since(start).Seconds(), success)

// Record validation
domainMetrics.RecordValidation(success)

// Update resource count
domainMetrics.SetResourceCount("ingresses", 42)

// Update event subscribers
domainMetrics.SetEventSubscribers(10)

// Record event published
domainMetrics.RecordEvent()

Event-Driven Updates

The component automatically updates metrics based on these events:

Reconciliation Events:

ReconciliationCompletedEvent → Increments total, records duration
ReconciliationFailedEvent → Increments total and errors

Deployment Events:

DeploymentCompletedEvent → Increments total, records duration
InstanceDeploymentFailedEvent → Increments total and errors

Validation Events:

ValidationCompletedEvent → Increments total (success)
ValidationFailedEvent → Increments total and errors

Resource Events:

IndexSynchronizedEvent → Initializes resource counts
ResourceIndexUpdatedEvent → Updates resource counts incrementally

Leader Election Events:

BecameLeaderEvent → Sets is_leader to 1, increments transitions, starts time tracking
LostLeadershipEvent → Sets is_leader to 0, increments transitions, records time as leader
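The event-to-metric mapping above amounts to a type switch over incoming events. The sketch below shows the pattern for the reconciliation and resource events only; the event types, their fields, and the `state` struct are hypothetical stand-ins, since the real event definitions live in the controller's events package.

```go
package main

import "fmt"

// Hypothetical event types; the real ones may carry different fields.
type reconciliationCompleted struct{ durationSeconds float64 }
type reconciliationFailed struct{}
type indexSynchronized struct{ counts map[string]int }
type resourceIndexUpdated struct {
	resourceType string
	delta        int
}

// state is an in-memory stand-in for the metrics the adapter updates.
type state struct {
	reconTotal, reconErrors int
	resourceCounts          map[string]int
}

// handle maps one event to metric updates, following the table above.
func (s *state) handle(event any) {
	switch e := event.(type) {
	case reconciliationCompleted:
		s.reconTotal++
	case reconciliationFailed:
		s.reconTotal++
		s.reconErrors++
	case indexSynchronized:
		// Initial sync replaces all counts.
		s.resourceCounts = make(map[string]int, len(e.counts))
		for k, v := range e.counts {
			s.resourceCounts[k] = v
		}
	case resourceIndexUpdated:
		// Later updates adjust a single type incrementally.
		if s.resourceCounts == nil {
			s.resourceCounts = map[string]int{}
		}
		s.resourceCounts[e.resourceType] += e.delta
	}
}

func main() {
	s := &state{}
	s.handle(indexSynchronized{counts: map[string]int{"ingresses": 3, "services": 5}})
	s.handle(resourceIndexUpdated{resourceType: "ingresses", delta: 2})
	s.handle(reconciliationCompleted{durationSeconds: 0.2})
	s.handle(reconciliationFailed{})
	fmt.Println(s.reconTotal, s.reconErrors, s.resourceCounts["ingresses"])
}
```

Unknown event types fall through the switch untouched, which lets the adapter share a bus with components that publish events it does not track.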

Testing

Metrics Tests

Test metric creation and updates:

func TestMetrics_RecordReconciliation(t *testing.T) {
    registry := prometheus.NewRegistry()
    metrics := NewMetrics(registry)

    // Record a successful 1.5s reconciliation (durations are in seconds)
    metrics.RecordReconciliation(1.5, true)

    assert.Equal(t, 1.0, testutil.ToFloat64(metrics.ReconciliationTotal))
    assert.Equal(t, 0.0, testutil.ToFloat64(metrics.ReconciliationErrors))
}

Component Tests

Test event-driven metric updates:

func TestComponent_ReconciliationEvents(t *testing.T) {
    registry := prometheus.NewRegistry()
    metrics := NewMetrics(registry)
    eventBus := events.NewEventBus(100)

    component := New(metrics, eventBus)

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Start blocks until ctx is cancelled, so run it in a goroutine
    go component.Start(ctx)
    eventBus.Start()

    // Publish event
    eventBus.Publish(events.NewReconciliationCompletedEvent(1500))

    // Give the event loop time to process the event
    time.Sleep(100 * time.Millisecond)

    // Verify metrics updated
    assert.Equal(t, 1.0, testutil.ToFloat64(metrics.ReconciliationTotal))
}

Alerting Examples

Prometheus Alerts

groups:
  - name: haptic
    rules:
      - alert: HighReconciliationErrorRate
        expr: |
          rate(haptic_reconciliation_errors_total[5m]) /
          rate(haptic_reconciliation_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High reconciliation error rate (>10%)"

      - alert: HighDeploymentLatency
        expr: |
          histogram_quantile(0.95,
            rate(haptic_deployment_duration_seconds_bucket[5m])
          ) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile deployment latency >5s"

      - alert: ValidationFailures
        expr: |
          rate(haptic_validation_errors_total[5m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Configuration validation failing"

      - alert: ComponentStopped
        expr: |
          delta(haptic_event_subscribers[5m]) < 0
        labels:
          severity: critical
        annotations:
          summary: "Event subscriber count decreased (component crash?)"

Dashboard Examples

Grafana Queries

Reconciliation Performance:

# Reconciliation rate
rate(haptic_reconciliation_total[5m])

# Success rate
100 * (1 - (
  rate(haptic_reconciliation_errors_total[5m]) /
  rate(haptic_reconciliation_total[5m])
))

# P50, P95, P99 latencies
histogram_quantile(0.50, rate(haptic_reconciliation_duration_seconds_bucket[5m]))
histogram_quantile(0.95, rate(haptic_reconciliation_duration_seconds_bucket[5m]))
histogram_quantile(0.99, rate(haptic_reconciliation_duration_seconds_bucket[5m]))

Deployment Performance:

# Deployment rate
rate(haptic_deployment_total[5m])

# Average deployment duration
rate(haptic_deployment_duration_seconds_sum[5m]) /
rate(haptic_deployment_duration_seconds_count[5m])

# Deployment success rate
100 * (1 - (
  rate(haptic_deployment_errors_total[5m]) /
  rate(haptic_deployment_total[5m])
))

Resource Tracking:

# All resource counts
haptic_resource_count

# Ingress count
haptic_resource_count{type="ingresses"}

# HAProxy pod count
haptic_resource_count{type="haproxy-pods"}

Best Practices

DO

✅ Record duration for all async operations
✅ Increment error counters for all failure cases
✅ Update resource counts on every index change
✅ Keep metrics simple and focused
✅ Use histogram for latency, counter for totals

DON'T

❌ Create metrics with unbounded labels (e.g., pod names)
❌ Skip error tracking (every failure should increment error counter)
❌ Use gauges for cumulative values (use counters instead)
❌ Update metrics manually in business logic (use events)

Architecture Integration

This package integrates with the controller architecture:

Pure metrics (metrics.go) - No event dependencies
Event adapter (component.go) - Bridges events to metrics
Controller orchestration (pkg/controller) - Wires everything together

Resources

Development context: pkg/controller/metrics/CLAUDE.md
Generic metrics utilities: pkg/metrics/README.md
Controller events: pkg/controller/events/types.go
Prometheus documentation: https://prometheus.io/docs/

Documentation ¶

Index ¶

type Component
- func New(metrics *Metrics, eventBus *pkgevents.EventBus) *Component
- func (c *Component) Metrics() *Metrics
- func (c *Component) Start(ctx context.Context) error
type Metrics
- func NewMetrics(registry prometheus.Registerer) *Metrics

Types ¶

type Component ¶

type Component struct {
	// contains filtered or unexported fields
}

Component is an event-driven metrics collector.

Subscribes to controller events and updates metrics via the Metrics struct. This is an event adapter that bridges domain events to Prometheus metrics.

IMPORTANT: Instance-based, created fresh per application iteration. When the iteration ends (context cancelled), the component stops and the metrics it was updating become eligible for garbage collection.

func New ¶

func New(metrics *Metrics, eventBus *pkgevents.EventBus) *Component

New creates a new metrics component that listens to events.

Parameters:

metrics: The Metrics instance to update (created with metrics.NewMetrics)
eventBus: The EventBus to subscribe to for events

Usage:

registry := prometheus.NewRegistry()
m := metrics.NewMetrics(registry)  // named m so the metrics package is not shadowed
component := metrics.New(m, eventBus)
go component.Start(ctx)
eventBus.Start()

func (*Component) Metrics ¶

func (c *Component) Metrics() *Metrics

Metrics returns the underlying Metrics instance for direct access.

This allows other components (like webhook) to record metrics directly without going through the event bus.

func (*Component) Start ¶

func (c *Component) Start(ctx context.Context) error

Start begins the metrics event processing loop.

This method blocks until the context is cancelled.

type Metrics ¶

type Metrics struct {
	// Reconciliation metrics
	ReconciliationDuration prometheus.Histogram
	ReconciliationTotal    prometheus.Counter
	ReconciliationErrors   prometheus.Counter

	// Deployment metrics
	DeploymentDuration prometheus.Histogram
	DeploymentTotal    prometheus.Counter
	DeploymentErrors   prometheus.Counter

	// Validation metrics
	ValidationTotal  prometheus.Counter
	ValidationErrors prometheus.Counter

	// Validation test metrics
	ValidationTestsTotal     prometheus.Counter
	ValidationTestsPassTotal prometheus.Counter
	ValidationTestsFailTotal prometheus.Counter
	ValidationTestDuration   prometheus.Histogram

	// Resource metrics
	ResourceCount *prometheus.GaugeVec

	// Event metrics
	EventSubscribers prometheus.Gauge
	EventsPublished  prometheus.Counter

	// Webhook metrics
	WebhookRequestsTotal   *prometheus.CounterVec
	WebhookRequestDuration prometheus.Histogram
	WebhookValidationTotal *prometheus.CounterVec
	WebhookCertExpiry      prometheus.Gauge
	WebhookCertRotations   prometheus.Counter

	// Leader election metrics
	LeaderElectionIsLeader            prometheus.Gauge
	LeaderElectionTransitionsTotal    prometheus.Counter
	LeaderElectionTimeAsLeaderSeconds prometheus.Counter
}

Metrics holds all controller-specific Prometheus metrics.

IMPORTANT: Create one instance per application iteration. When the iteration ends (e.g., on config reload), metrics are garbage collected. This prevents stale state from surviving across reinitialization cycles.

func NewMetrics ¶

func NewMetrics(registry prometheus.Registerer) *Metrics

NewMetrics creates all controller metrics and registers them with the provided registry.

IMPORTANT: Pass an instance-based registry (prometheus.NewRegistry()), NOT prometheus.DefaultRegisterer. Metrics are scoped to the registry's lifetime. When the registry is garbage collected (iteration ends), metrics are freed.

This is critical for supporting application reinitialization on configuration changes without leaking metrics or accumulating stale state.

Example:

registry := prometheus.NewRegistry()  // Create per iteration
metrics := metrics.NewMetrics(registry)  // Metrics tied to iteration
// ... use metrics ...
// When iteration ends, both registry and metrics are GC'd

func (*Metrics) AddTimeAsLeader ¶

func (m *Metrics) AddTimeAsLeader(seconds float64)

AddTimeAsLeader adds time spent as leader to the cumulative counter.

Parameters:

seconds: Time spent as leader in seconds

func (*Metrics) RecordDeployment ¶

func (m *Metrics) RecordDeployment(durationSeconds float64, success bool)

RecordDeployment records a deployment attempt.

Parameters:

durationSeconds: Time spent deploying (use time.Since(start).Seconds())
success: Whether the deployment completed successfully

func (*Metrics) RecordEvent ¶

func (m *Metrics) RecordEvent()

RecordEvent records an event publication. Call this for every event published to the EventBus.

func (*Metrics) RecordLeadershipTransition ¶

func (m *Metrics) RecordLeadershipTransition()

RecordLeadershipTransition records a leadership state change. Call this whenever leadership is gained or lost.

func (*Metrics) RecordReconciliation ¶

func (m *Metrics) RecordReconciliation(durationSeconds float64, success bool)

RecordReconciliation records a completed reconciliation cycle.

Parameters:

durationSeconds: Time spent in reconciliation (use time.Since(start).Seconds())
success: Whether the reconciliation completed successfully

func (*Metrics) RecordValidation ¶

func (m *Metrics) RecordValidation(success bool)

RecordValidation records a validation attempt.

Parameters:

success: Whether the validation passed

func (*Metrics) RecordValidationTests ¶

func (m *Metrics) RecordValidationTests(total, passed, failed int, durationSeconds float64)

RecordValidationTests records validation test execution results.

Parameters:

total: Total number of tests executed
passed: Number of tests that passed
failed: Number of tests that failed
durationSeconds: Time spent running tests (use time.Duration.Seconds())

func (*Metrics) RecordWebhookCertRotation ¶

func (m *Metrics) RecordWebhookCertRotation()

RecordWebhookCertRotation records a webhook certificate rotation.

func (*Metrics) RecordWebhookRequest ¶

func (m *Metrics) RecordWebhookRequest(gvk, result string, durationSeconds float64)

RecordWebhookRequest records a webhook admission request.

Parameters:

gvk: The GVK of the resource being validated (e.g., "v1.ConfigMap")
result: The result of the request ("allowed", "denied", or "error")
durationSeconds: Time spent processing the request

func (*Metrics) RecordWebhookValidation ¶

func (m *Metrics) RecordWebhookValidation(gvk, result string)

RecordWebhookValidation records a webhook validation result.

Parameters:

gvk: The GVK of the resource being validated
result: The validation result ("allowed", "denied", or "error")

func (*Metrics) SetEventSubscribers ¶

func (m *Metrics) SetEventSubscribers(count int)

SetEventSubscribers sets the number of active event subscribers.

Parameters:

count: The current number of event subscribers

func (*Metrics) SetIsLeader ¶

func (m *Metrics) SetIsLeader(isLeader bool)

SetIsLeader sets whether this replica is the leader.

Parameters:

isLeader: true if this replica is the leader, false otherwise

func (*Metrics) SetResourceCount ¶

func (m *Metrics) SetResourceCount(resourceType string, count int)

SetResourceCount sets the count for a specific resource type.

Parameters:

resourceType: The type of resource (e.g., "ingresses", "services")
count: The current number of resources of this type

func (*Metrics) SetWebhookCertExpiry ¶

func (m *Metrics) SetWebhookCertExpiry(expiryTime int64)

SetWebhookCertExpiry sets the webhook certificate expiry timestamp.

Parameters:

expiryTime: The time when the certificate expires
