metrics

package
v1.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 25, 2026 License: MIT Imports: 6 Imported by: 0

README

ChronoQueue Metrics Package

This package provides comprehensive Prometheus metrics for monitoring ChronoQueue operations.

Structure

The metrics are organized into separate files by domain:

Core Files
  • metrics.go - Base infrastructure, HTTP/gRPC metrics, and legacy business metrics
  • message_metrics.go - Message lifecycle tracking (state transitions, processing latency)
  • dlq_metrics.go - Dead letter queue health metrics
  • lease_metrics.go - Lease renewal and expiration tracking
  • schedule_metrics.go - Scheduler performance and execution metrics
  • database_metrics.go - Database query performance and connection pool metrics
  • background_metrics.go - Background service health metrics

Metrics Categories

Message Lifecycle Metrics

State Transitions (chronoqueue_message_state_transitions_total)

  • Tracks all state changes: INVISIBLE→PENDING, PENDING→RUNNING, RUNNING→COMPLETED, etc.
  • Labels: queue_name, from_state, to_state

Message Counts by State (chronoqueue_messages_by_state)

  • Current count of messages in each state per queue
  • Labels: queue_name, state
  • Update periodically via GetQueueState()

Claim Latency (chronoqueue_message_claim_duration_seconds)

  • Time to claim a message from queue
  • Labels: queue_name
  • Histogram buckets: 1ms to 5s

Processing Duration (chronoqueue_message_processing_duration_seconds)

  • End-to-end processing time (claim to acknowledgment)
  • Labels: queue_name
  • Histogram buckets: 100ms to 10min
DLQ Metrics

DLQ Message Count (chronoqueue_dlq_messages_total)

  • Current number of messages in each DLQ
  • Labels: dlq_name, source_queue

DLQ Ingestion Rate (chronoqueue_dlq_ingestion_total)

  • Messages moved to DLQ with failure reason
  • Labels: dlq_name, source_queue, reason
  • Reasons: max_attempts, lease_timeout, nack

DLQ Retry Count (chronoqueue_dlq_retry_total)

  • Messages retried from DLQ back to source queue
  • Labels: dlq_name, destination_queue
Lease Management Metrics

Lease Renewals (chronoqueue_lease_renewals_total)

  • Renewal attempts and outcomes
  • Labels: queue_name, status
  • Status: success, denied_max_renewals, failed

Lease Expirations (chronoqueue_lease_expirations_total)

  • Messages reclaimed due to expired leases
  • Labels: queue_name, expiry_type
  • Types: lease, heartbeat

Heartbeat Timeouts (chronoqueue_heartbeat_timeouts_total)

  • Heartbeat timeout events
  • Labels: queue_name
Schedule Metrics

Schedule Executions (chronoqueue_schedule_executions_total)

  • Schedule trigger events
  • Labels: schedule_id, queue_name, status

Schedule Activations (chronoqueue_schedule_activations_total)

  • Messages activated from INVISIBLE to PENDING
  • Labels: queue_name

Schedule Lag (chronoqueue_schedule_lag_seconds)

  • How far behind schedule the scheduler is running
  • Labels: queue_name
  • Negative = ahead, Positive = behind
Database Metrics

Query Duration (chronoqueue_db_query_duration_seconds)

  • Individual query execution time
  • Labels: backend (sqlite/postgres), operation
  • Histogram buckets: 0.1ms to 1s

Transaction Duration (chronoqueue_db_transaction_duration_seconds)

  • Complete transaction execution time
  • Labels: backend, operation
  • Histogram buckets: 1ms to 5s

Connection Pool Stats

  • chronoqueue_db_connections_active - Active connections
  • chronoqueue_db_connections_idle - Idle connections
  • chronoqueue_db_connections_wait - Connections waiting for availability
  • Labels: backend
Background Service Metrics

Service Iterations (chronoqueue_background_service_iterations_total)

  • Iteration count per service
  • Labels: service (scheduler/reclaim), status (success/error)

Processed Messages (chronoqueue_background_service_processed_messages_total)

  • Messages handled by each service
  • Labels: service, queue_name

Iteration Duration (chronoqueue_background_service_iteration_duration_seconds)

  • How long each iteration takes
  • Labels: service
  • Histogram buckets: 10ms to 30s

Usage Examples

Recording Message State Transitions
import "github.com/adrien19/chronoqueue/pkg/metrics"

// When claiming a message
metrics.RecordStateTransition(queueName, "PENDING", "RUNNING")

// When acknowledging
metrics.RecordStateTransition(queueName, "RUNNING", "COMPLETED")
Recording DLQ Operations
// Message moved to DLQ after max attempts
metrics.IncrementDLQIngestion(dlqName, sourceQueue, "max_attempts")

// Message retried from DLQ
metrics.IncrementDLQRetry(dlqName, targetQueue)
Recording Lease Operations
// Successful lease renewal
metrics.IncrementLeaseRenewals(queueName, "success")

// Lease renewal denied due to max_renewals limit
metrics.IncrementLeaseRenewals(queueName, "denied_max_renewals")

// Lease expired and message reclaimed
metrics.IncrementLeaseExpirations(queueName, "lease")
Recording Database Operations
start := time.Now()
// ... execute query ...
metrics.ObserveDBQuery("sqlite", "claim_message", time.Since(start))
Recording Background Service Work
// In scheduler service
metrics.IncrementBackgroundServiceIterations("scheduler", "success")
metrics.IncrementBackgroundServiceProcessedMessages("scheduler", queueName)

Prometheus Queries

Message Processing Rate
rate(chronoqueue_messages_enqueued_total[5m])
rate(chronoqueue_messages_dequeued_total[5m])
P95 Message Claim Latency
histogram_quantile(0.95, rate(chronoqueue_message_claim_duration_seconds_bucket[5m]))
DLQ Health
# Total messages in all DLQs
sum(chronoqueue_dlq_messages_total)

# DLQ ingestion rate by reason
rate(chronoqueue_dlq_ingestion_total[5m]) by (reason)
Lease Expiration Rate
rate(chronoqueue_lease_expirations_total[5m]) by (expiry_type)
Scheduler Performance
# Scheduler lag
chronoqueue_schedule_lag_seconds

# Activation rate
rate(chronoqueue_schedule_activations_total[5m])
Database Performance
# P99 query latency
histogram_quantile(0.99, rate(chronoqueue_db_query_duration_seconds_bucket[5m])) by (operation)

# Connection pool utilization
chronoqueue_db_connections_active / (chronoqueue_db_connections_active + chronoqueue_db_connections_idle)

Alerting Examples

# High DLQ ingestion rate
- alert: HighDLQIngestionRate
  expr: rate(chronoqueue_dlq_ingestion_total[5m]) > 10
  for: 5m
  annotations:
    summary: "High DLQ ingestion for {{ $labels.source_queue }}"

# Scheduler falling behind
- alert: SchedulerLag
  expr: chronoqueue_schedule_lag_seconds > 60
  for: 2m
  annotations:
    summary: "Scheduler >60s behind for {{ $labels.queue_name }}"

# High lease expiration rate
- alert: HighLeaseExpirationRate
  expr: rate(chronoqueue_lease_expirations_total[5m]) > 5
  for: 3m
  annotations:
    summary: "High lease expiration for {{ $labels.queue_name }}"

# Database performance degradation
- alert: SlowDatabaseQueries
  expr: histogram_quantile(0.95, rate(chronoqueue_db_query_duration_seconds_bucket[5m])) > 0.5
  for: 5m
  annotations:
    summary: "P95 query latency >500ms for {{ $labels.operation }}"

Integration Points

The repository layer should call these metric functions at appropriate points:

  1. Message operations - EnqueueMessage, ClaimMessage, AcknowledgeMessage, NackMessage
  2. Lease operations - ExtendMessageLease
  3. DLQ operations - RetryDLQMessage, DeleteDLQMessage
  4. Background services - Scheduler and Reclaim service iterations
  5. Periodic updates - Queue state, DLQ counts, connection pool stats

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DecrementQueuesTotal

func DecrementQueuesTotal()

DecrementQueuesTotal decrements the total queue count

func HTTPMetricsMiddleware

func HTTPMetricsMiddleware(handler http.Handler) http.Handler

HTTPMetricsMiddleware wraps an HTTP handler with Prometheus metrics

func IncrementBackgroundServiceIterations added in v1.2.0

func IncrementBackgroundServiceIterations(service, status string)

IncrementBackgroundServiceIterations records a background service iteration Service should be: "scheduler" or "reclaim" Status should be: "success" or "error"

func IncrementBackgroundServiceProcessedMessages added in v1.2.0

func IncrementBackgroundServiceProcessedMessages(service, queueName string)

IncrementBackgroundServiceProcessedMessages records a message processed by a background service Service should be: "scheduler" or "reclaim"

func IncrementCronScheduleExecutions added in v1.2.0

func IncrementCronScheduleExecutions(scheduleID, queueName string)

IncrementCronScheduleExecutions records a cron schedule execution success

func IncrementCronScheduleFailures added in v1.2.0

func IncrementCronScheduleFailures(scheduleID, queueName string)

IncrementCronScheduleFailures records a cron schedule failure

func IncrementDLQIngestion added in v1.2.0

func IncrementDLQIngestion(dlqName, sourceQueue, reason string)

IncrementDLQIngestion records a message being moved to DLQ Reason should be: "max_attempts", "lease_timeout", or "nack"

func IncrementDLQRetry added in v1.2.0

func IncrementDLQRetry(dlqName, destinationQueue string)

IncrementDLQRetry records a message being retried from DLQ Call this when RetryDLQMessage succeeds

func IncrementHeartbeatTimeouts added in v1.2.0

func IncrementHeartbeatTimeouts(queueName string)

IncrementHeartbeatTimeouts records a heartbeat timeout event Call this when reclaim service finds a message with expired heartbeat_expiry

func IncrementLeaseExpirations added in v1.2.0

func IncrementLeaseExpirations(queueName, expiryType string)

IncrementLeaseExpirations records a message being reclaimed due to lease/heartbeat expiry Expiry type should be: "lease" or "heartbeat"

func IncrementLeaseRenewals added in v1.2.0

func IncrementLeaseRenewals(queueName, status string)

IncrementLeaseRenewals records a lease renewal attempt Status should be: "success", "denied_max_renewals", or "failed"

func IncrementMessagesDequeued

func IncrementMessagesDequeued(queueName string)

IncrementMessagesDequeued increments dequeued message count for a queue

func IncrementMessagesEnqueued

func IncrementMessagesEnqueued(queueName string)

IncrementMessagesEnqueued increments enqueued message count for a queue

func IncrementMessagesValidated added in v1.2.0

func IncrementMessagesValidated(queueName string)

IncrementMessagesValidated increments validated message count for a queue

func IncrementQueuesTotal

func IncrementQueuesTotal()

IncrementQueuesTotal increments the total queue count

func IncrementScheduleActivations added in v1.2.0

func IncrementScheduleActivations(queueName string)

IncrementScheduleActivations records a message being activated by the scheduler Call this when scheduler moves a message from INVISIBLE to PENDING

func IncrementScheduleExecutions added in v1.2.0

func IncrementScheduleExecutions(scheduleID, queueName, status string)

IncrementScheduleExecutions records a schedule execution attempt Status should be: "success" or "failed"

func IncrementValidationFailures added in v1.2.0

func IncrementValidationFailures(queueName string, reason string)

IncrementValidationFailures increments validation failure count for a queue with reason

func ObserveBackgroundServiceIterationDuration added in v1.2.0

func ObserveBackgroundServiceIterationDuration(service string, durationSeconds float64)

ObserveBackgroundServiceIterationDuration records how long a service iteration took Service should be: "scheduler" or "reclaim"

func ObserveDBQuery added in v1.2.0

func ObserveDBQuery(backend, operation string, duration time.Duration)

ObserveDBQuery records a database query execution time Backend should be: "sqlite" or "postgres" Operation should be: "claim_message", "enqueue_message", etc.

func ObserveDBTransaction added in v1.2.0

func ObserveDBTransaction(backend, operation string, duration time.Duration)

ObserveDBTransaction records a database transaction execution time This should be called for the entire transaction (begin to commit/rollback)

func ObserveMessageClaimLatency added in v1.2.0

func ObserveMessageClaimLatency(queueName string, duration time.Duration)

ObserveMessageClaimLatency records how long it took to claim a message Call this after ClaimMessage operation completes

func ObserveMessageProcessingDuration added in v1.2.0

func ObserveMessageProcessingDuration(queueName string, duration time.Duration)

ObserveMessageProcessingDuration records how long a message took to process This requires tracking claim time and comparing to ack time

func RecordGRPCMetrics

func RecordGRPCMetrics(method string, duration time.Duration, err error)

RecordGRPCMetrics records gRPC request metrics (called from the interceptor)

func RecordMessagesCleanedUp added in v1.2.0

func RecordMessagesCleanedUp(count int64)

RecordMessagesCleanedUp records the number of messages permanently deleted by cleanup service.

func RecordStateTransition added in v1.2.0

func RecordStateTransition(queueName, fromState, toState string)

RecordStateTransition records a message state change Use this whenever a message transitions between states

func SetDBConnectionsActive added in v1.2.0

func SetDBConnectionsActive(backend string, count float64)

SetDBConnectionsActive sets the current count of active database connections Call this periodically using sql.DB.Stats().OpenConnections

func SetDBConnectionsIdle added in v1.2.0

func SetDBConnectionsIdle(backend string, count float64)

SetDBConnectionsIdle sets the current count of idle database connections Call this periodically using sql.DB.Stats().Idle

func SetDBConnectionsWait added in v1.2.0

func SetDBConnectionsWait(backend string, count float64)

SetDBConnectionsWait sets the current count of connections waiting Call this periodically using sql.DB.Stats().WaitCount

func SetDLQMessagesTotal added in v1.2.0

func SetDLQMessagesTotal(dlqName, sourceQueue string, count float64)

SetDLQMessagesTotal sets the current count of messages in a DLQ Call this after DLQ operations or periodically via GetDLQStats

func SetMessagesByState added in v1.2.0

func SetMessagesByState(queueName, state string, count float64)

SetMessagesByState updates the count of messages in a specific state for a queue This should be called periodically (e.g., every 30s) by querying GetQueueState

func SetMessagesPending

func SetMessagesPending(queueName string, count float64)

SetMessagesPending sets the pending message count for a queue

func SetQueuesTotal

func SetQueuesTotal(count float64)

SetQueuesTotal sets the total queue count

func SetScheduleLag added in v1.2.0

func SetScheduleLag(queueName string, lagSeconds float64)

SetScheduleLag updates the scheduler lag metric for a queue lagSeconds should be: current_time - scheduled_time Negative values are normal (scheduler runs slightly ahead) Positive values indicate scheduler is behind schedule

Types

type MetricsRegistry

type MetricsRegistry struct {
	// contains filtered or unexported fields
}

MetricsRegistry holds the Prometheus registry and provides methods for metrics

func NewMetricsRegistry

func NewMetricsRegistry() *MetricsRegistry

NewMetricsRegistry creates a new metrics registry with all ChronoQueue metrics

func (*MetricsRegistry) Handler

func (m *MetricsRegistry) Handler() http.Handler

Handler returns an HTTP handler for Prometheus metrics

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL