Documentation ¶
Overview ¶
Package observability provides OpenTelemetry distributed tracing and Prometheus metrics for cd-operator. Tracing gives end-to-end visibility from PR discovery through deployment.
Index ¶
- Constants
- func AddEvent(span trace.Span, name string, message string)
- func ExtractSpanID(span trace.Span) string
- func ExtractTraceID(span trace.Span) string
- func GetTracer() trace.Tracer
- func InitTracing(ctx context.Context, cfg TracingConfig) (*sdktrace.TracerProvider, error)
- func NewMetricsWriter(writer io.Writer, metrics *Metrics) io.Writer
- func RecordError(span trace.Span, err error)
- func RecordSuccess(span trace.Span)
- func SetActionResult(span trace.Span, action string, success bool)
- func SetArgoCDAttributes(span trace.Span, appName string, healthStatus string, syncStatus string, ...)
- func SetEnvironmentAttributes(span trace.Span, sourceEnv string, targetEnv string)
- func SetPRAttributes(span trace.Span, prNumber int, repository string, headSHA string)
- func SetPolicyAttributes(span trace.Span, policyName string, autoPromote bool)
- func SetTestAttributes(span trace.Span, provider string, jobName string, runID string, status string)
- func Shutdown(ctx context.Context, tp *sdktrace.TracerProvider) error
- func StartSpanWithCluster(ctx context.Context, tracer trace.Tracer, name string, clusterName string) (context.Context, trace.Span)
- func StartSpanWithEnv(ctx context.Context, tracer trace.Tracer, name string, sourceEnv string, ...) (context.Context, trace.Span)
- func StartSpanWithPR(ctx context.Context, tracer trace.Tracer, name string, prNumber int, ...) (context.Context, trace.Span)
- type ComponentMetrics
- func (m *ComponentMetrics) DecPRState(state, repository string)
- func (m *ComponentMetrics) IncPRState(state, repository string)
- func (m *ComponentMetrics) ObserveArgoCDAPIDuration(start time.Time, cluster, method string)
- func (m *ComponentMetrics) ObserveDriftResolutionDuration(start time.Time, cluster string)
- func (m *ComponentMetrics) ObserveExternalTestDuration(start time.Time, provider string)
- func (m *ComponentMetrics) ObserveGitHubAPIDuration(start time.Time, method string)
- func (m *ComponentMetrics) ObservePRProcessingDuration(start time.Time, action, repository string)
- func (m *ComponentMetrics) ObservePromotionDuration(start time.Time, sourceEnv, targetEnv string)
- func (m *ComponentMetrics) RecordArgoCDAPICall(cluster, method, status string)
- func (m *ComponentMetrics) RecordDriftDetection(cluster, result string)
- func (m *ComponentMetrics) RecordExternalTestExecution(provider, result string)
- func (m *ComponentMetrics) RecordGitHubAPICall(method, status string)
- func (m *ComponentMetrics) RecordPRDiscovery(repository, result string)
- func (m *ComponentMetrics) RecordPRMerge(repository, result string)
- func (m *ComponentMetrics) RecordPRQualification(repository, result, reason string)
- func (m *ComponentMetrics) RecordPromotion(sourceEnv, targetEnv, result string)
- func (m *ComponentMetrics) SetDriftStatus(cluster, application, status string, value float64)
- func (m *ComponentMetrics) SetGitHubRateLimitRemaining(resource string, remaining float64)
- func (m *ComponentMetrics) SetPRState(state, repository string, count float64)
- type Exporter
- type Metrics
- type MetricsConfig
- type MetricsExporter
- type TracingConfig
Constants ¶
const (
	// PR attributes
	AttrPRNumber     = "pr.number"
	AttrPRRepository = "pr.repository"
	AttrPRHeadSHA    = "pr.head.sha"
	AttrPRTitle      = "pr.title"
	AttrPRAuthor     = "pr.author"
	AttrPRState      = "pr.state"
	AttrPRMergeable  = "pr.mergeable"

	// Environment attributes
	AttrEnvSource = "environment.source"
	AttrEnvTarget = "environment.target"
	AttrEnvName   = "environment.name"

	// Cluster attributes
	AttrClusterName     = "cluster.name"
	AttrClusterEndpoint = "cluster.endpoint"

	// Test attributes
	AttrTestProvider = "test.provider"
	AttrTestJobName  = "test.job"
	AttrTestRunID    = "test.run_id"
	AttrTestStatus   = "test.status"
	AttrTestURL      = "test.url"

	// Policy attributes
	AttrPolicyName        = "policy.name"
	AttrPolicyAutoPromote = "policy.auto_promote"

	// ArgoCD attributes
	AttrArgoApplication  = "argocd.application"
	AttrArgoHealthStatus = "argocd.health.status"
	AttrArgoSyncStatus   = "argocd.sync.status"
	AttrArgoRevision     = "argocd.revision"

	// Error attributes
	AttrErrorType    = "error.type"
	AttrErrorMessage = "error.message"

	// Action attributes
	AttrAction = "action"
	AttrResult = "result"
)
Attribute keys for common span metadata, following OpenTelemetry semantic conventions where applicable.
Functions ¶
func AddEvent ¶
AddEvent adds a timestamped event to a span. Use this to mark significant milestones within an operation.
Example:
observability.AddEvent(span, "tests-passed", "All external tests completed successfully")
func ExtractSpanID ¶
ExtractSpanID extracts the span ID from a span context. It returns an empty string if no span ID is present. Use this for correlation in logs and metrics.
func ExtractTraceID ¶
ExtractTraceID extracts the trace ID from a span context. It returns an empty string if no trace ID is present. Use this for correlation in logs and metrics.
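The "empty string when absent" behavior can be illustrated without the OpenTelemetry SDK. The sketch below is an assumption about the shape of the logic, not the package's implementation: `traceID`, `isValid`, and `traceIDString` are hypothetical stand-ins that model a 16-byte trace ID and treat the all-zero value as "not present".

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// traceID models the 16-byte trace identifier used by W3C Trace Context.
// It is a stand-in for this sketch, not the OpenTelemetry type.
type traceID [16]byte

// isValid reports whether the ID has at least one non-zero byte.
func (t traceID) isValid() bool {
	return t != traceID{}
}

// traceIDString returns the lowercase hex form of the ID, or "" when the
// ID is all zeros (that is, no trace is present).
func traceIDString(t traceID) string {
	if !t.isValid() {
		return ""
	}
	return hex.EncodeToString(t[:])
}

func main() {
	// A log line can carry the ID so it can be joined with the trace later.
	fmt.Printf("msg=%q trace_id=%q\n", "promotion started", traceIDString(traceID{0x4b, 0xf9}))
	fmt.Printf("msg=%q trace_id=%q\n", "no active span", traceIDString(traceID{}))
}
```

Returning an empty string rather than a string of zeros lets log formatters omit the field entirely when no trace is active.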
func GetTracer ¶
GetTracer returns the global tracer instance for cd-operator. This should be used by all packages to create spans.
func InitTracing ¶
func InitTracing(ctx context.Context, cfg TracingConfig) (*sdktrace.TracerProvider, error)
InitTracing initializes OpenTelemetry distributed tracing with an OTLP exporter. It returns a TracerProvider that must be shut down on application exit.
Features:
- OTLP gRPC exporter for Jaeger/Tempo/etc.
- W3C TraceContext + Baggage propagation
- Resource attributes (service name, version, environment)
- Configurable sampling rate
- Graceful shutdown support
Example:
cfg := observability.TracingConfig{
	Enabled:      true,
	Endpoint:     "localhost:4317",
	SamplingRate: 0.1,
	ServiceName:  "cd-operator",
}
tp, err := observability.InitTracing(ctx, cfg)
if err != nil {
	log.Fatal("failed to init tracing", zap.Error(err))
}
defer observability.Shutdown(ctx, tp)
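W3C TraceContext propagation, listed among the features above, carries trace identity between services in a `traceparent` header. The following is a minimal stdlib-only sketch of the version-00 header layout. It is not the propagator that InitTracing installs, and `formatTraceparent` and `parseTraceparent` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatTraceparent builds a version-00 traceparent value from hex-encoded
// IDs: 32 hex characters of trace ID, 16 of span ID, and 2 of flags.
func formatTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return strings.Join([]string{"00", traceID, spanID, flags}, "-")
}

// parseTraceparent splits a header value, checks the field lengths, and
// reads the sampled bit (bit 0) from the flags field.
func parseTraceparent(h string) (traceID, spanID string, sampled bool, err error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[0]) != 2 || len(parts[1]) != 32 ||
		len(parts[2]) != 16 || len(parts[3]) != 2 {
		return "", "", false, fmt.Errorf("malformed traceparent %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return "", "", false, fmt.Errorf("bad flags in traceparent %q: %w", h, err)
	}
	return parts[1], parts[2], flags&1 == 1, nil
}

func main() {
	h := formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	fmt.Println(h)
	fmt.Println(parseTraceparent(h))
}
```

A downstream service that receives this header can continue the same trace instead of starting a new one, which is what makes end-to-end visibility possible.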
func NewMetricsWriter ¶
NewMetricsWriter wraps a writer to add Prometheus metrics collection. Each write operation updates log counters and duration histograms.
Example:
m := observability.NewMetrics()
writer := observability.NewMetricsWriter(os.Stdout, m)
// Use a distinct variable name so the logger package is not shadowed.
appLogger, _ := logger.New(logger.Config{
	Writer: writer,
})
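The wrapping technique itself is a small decorator over `io.Writer`. The sketch below shows the general idea with plain atomic counters in place of Prometheus collectors. `writeStats`, `countingWriter`, `newCountingWriter`, and `discard` are hypothetical names, and the real NewMetricsWriter may record different values.

```go
package main

import (
	"fmt"
	"io"
	"sync/atomic"
	"time"
)

// writeStats is a hypothetical stand-in for the package's Metrics type.
type writeStats struct {
	writes atomic.Int64 // number of Write calls
	bytes  atomic.Int64 // bytes accepted by the wrapped writer
	nanos  atomic.Int64 // total time spent in the wrapped writer
}

// countingWriter forwards writes to w and records count, size, and duration.
type countingWriter struct {
	w     io.Writer
	stats *writeStats
}

func (c *countingWriter) Write(p []byte) (int, error) {
	start := time.Now()
	n, err := c.w.Write(p)
	c.stats.writes.Add(1)
	c.stats.bytes.Add(int64(n))
	c.stats.nanos.Add(int64(time.Since(start)))
	return n, err
}

// newCountingWriter wraps w so that every write updates s.
func newCountingWriter(w io.Writer, s *writeStats) io.Writer {
	return &countingWriter{w: w, stats: s}
}

// discard is a destination that drops everything written to it.
type discard struct{}

func (discard) Write(p []byte) (int, error) { return len(p), nil }

func main() {
	stats := &writeStats{}
	w := newCountingWriter(discard{}, stats)
	fmt.Fprintln(w, "info: reconciling")
	fmt.Fprintln(w, "error: sync failed")
	fmt.Printf("writes=%d bytes=%d\n", stats.writes.Load(), stats.bytes.Load())
}
```

Because the wrapper only implements `io.Writer`, any logger that accepts a writer can be instrumented without changes to the logger itself.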
func RecordError ¶
RecordError records an error on a span with standardized attributes. This marks the span as failed and includes error details.
Example:
if err != nil {
	observability.RecordError(span, err)
	return err
}
func RecordSuccess ¶
RecordSuccess marks a span as successful. Use this at the end of an operation to indicate completion without errors.
Example:
defer span.End()
// ... do work ...
observability.RecordSuccess(span)
func SetActionResult ¶
SetActionResult records the result of an action. Use this to track success/failure rates of different operations.
func SetArgoCDAttributes ¶
func SetArgoCDAttributes(span trace.Span, appName string, healthStatus string, syncStatus string, revision string)
SetArgoCDAttributes adds ArgoCD application attributes to an existing span. Use this when querying or updating ArgoCD applications.
func SetEnvironmentAttributes ¶
SetEnvironmentAttributes adds environment-related attributes to an existing span. Use this for promotion operations.
func SetPRAttributes ¶
SetPRAttributes adds PR-related attributes to an existing span. Use this when PR information becomes available mid-operation.
func SetPolicyAttributes ¶
SetPolicyAttributes adds promotion policy attributes to an existing span. Use this when loading or applying promotion policies.
func SetTestAttributes ¶
func SetTestAttributes(span trace.Span, provider string, jobName string, runID string, status string)
SetTestAttributes adds test-related attributes to an existing span. Use this when triggering or monitoring external tests.
func Shutdown ¶
func Shutdown(ctx context.Context, tp *sdktrace.TracerProvider) error
Shutdown gracefully shuts down the tracer provider, flushing any pending spans to the backend. Call it before the application exits to avoid losing traces.
Example:
defer observability.Shutdown(context.Background(), tp)
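One common way to keep a flush from hanging at exit is to run it on a fresh context with a deadline, since the application's root context is often already cancelled by then. The sketch below illustrates that pattern only. `flusher`, `shutdownWithTimeout`, and `slowFlush` are hypothetical names, and the timeout values are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// flusher is a hypothetical stand-in for a tracer provider's shutdown hook.
type flusher func(ctx context.Context) error

// shutdownWithTimeout runs the flush on a fresh context with a deadline, so
// it still gets a bounded window when the caller's context is cancelled.
func shutdownWithTimeout(f flusher, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return f(ctx)
}

// slowFlush simulates an exporter that needs d to drain its queue.
func slowFlush(d time.Duration) flusher {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	// Finishes well inside the deadline.
	fmt.Println(shutdownWithTimeout(slowFlush(time.Millisecond), time.Second))

	// Exceeds the deadline, so the context error is returned.
	err := shutdownWithTimeout(slowFlush(time.Second), 10*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```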
func StartSpanWithCluster ¶
func StartSpanWithCluster(ctx context.Context, tracer trace.Tracer, name string, clusterName string) (context.Context, trace.Span)
StartSpanWithCluster creates a span with cluster-specific attributes. Use this for operations that interact with ArgoCD clusters.
Example:
ctx, span := observability.StartSpanWithCluster(ctx, tracer, "query-argocd", "production")
defer span.End()
func StartSpanWithEnv ¶
func StartSpanWithEnv(ctx context.Context, tracer trace.Tracer, name string, sourceEnv string, targetEnv string) (context.Context, trace.Span)
StartSpanWithEnv creates a span with environment-specific attributes. Use this for promotion operations that move between environments.
Example:
ctx, span := observability.StartSpanWithEnv(ctx, tracer, "promote", "dev", "staging")
defer span.End()
func StartSpanWithPR ¶
func StartSpanWithPR(ctx context.Context, tracer trace.Tracer, name string, prNumber int, headSHA string) (context.Context, trace.Span)
StartSpanWithPR creates a span with PR-specific attributes. Use this for operations that process a specific pull request.
Example:
ctx, span := observability.StartSpanWithPR(ctx, tracer, "qualify-pr", 123, "abc123")
defer span.End()
Types ¶
type ComponentMetrics ¶
type ComponentMetrics struct {
// contains filtered or unexported fields
}
ComponentMetrics holds all Prometheus metrics for cd-operator components. This is the central registry for all custom metrics, following the naming convention: cd_operator_<component>_<metric>_<unit>
All metrics are registered with the default Prometheus registry on package initialization. Metrics are designed to be non-blocking and best-effort to avoid impacting core operations.
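The naming convention can be expressed as a one-line helper. This is only an illustration of the documented pattern `cd_operator_<component>_<metric>_<unit>`. `metricName` is a hypothetical function, and the second name printed below is an invented example rather than a metric the package is known to expose.

```go
package main

import (
	"fmt"
	"strings"
)

// metricName joins the parts of the documented convention
// cd_operator_<component>_<metric>_<unit>.
func metricName(component, metric, unit string) string {
	return strings.Join([]string{"cd_operator", component, metric, unit}, "_")
}

func main() {
	// Matches the log duration metric listed under NewMetrics.
	fmt.Println(metricName("log", "duration", "seconds"))

	// Hypothetical name, shown only to illustrate the pattern.
	fmt.Println(metricName("github", "api_duration", "seconds"))
}
```

Keeping the unit as the final segment follows the usual Prometheus practice of making units visible in the metric name.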
func GetGlobalMetrics ¶
func GetGlobalMetrics() *ComponentMetrics
GetGlobalMetrics returns the global metrics instance. Returns nil if InitGlobalMetrics has not been called yet.
func InitGlobalMetrics ¶
func InitGlobalMetrics(registerer prometheus.Registerer) *ComponentMetrics
InitGlobalMetrics initializes the global metrics instance with the provided registry. This should be called once during application startup.
Example:
observability.InitGlobalMetrics(prometheus.DefaultRegisterer)
func NewComponentMetrics ¶
func NewComponentMetrics(registerer prometheus.Registerer) *ComponentMetrics
NewComponentMetrics creates and registers all component metrics with the provided registry. If registerer is nil, it uses prometheus.DefaultRegisterer.
All metrics are registered atomically. If any metric fails to register (e.g., duplicate), the function panics to fail fast during operator startup.
Example:
registry := prometheus.NewRegistry()
metrics := observability.NewComponentMetrics(registry)
func (*ComponentMetrics) DecPRState ¶
func (m *ComponentMetrics) DecPRState(state, repository string)
DecPRState decrements the count of PRs in a given state.
func (*ComponentMetrics) IncPRState ¶
func (m *ComponentMetrics) IncPRState(state, repository string)
IncPRState increments the count of PRs in a given state.
func (*ComponentMetrics) ObserveArgoCDAPIDuration ¶
func (m *ComponentMetrics) ObserveArgoCDAPIDuration(start time.Time, cluster, method string)
ObserveArgoCDAPIDuration records the duration of an ArgoCD API call.
func (*ComponentMetrics) ObserveDriftResolutionDuration ¶
func (m *ComponentMetrics) ObserveDriftResolutionDuration(start time.Time, cluster string)
ObserveDriftResolutionDuration records the duration of a drift resolution operation.
func (*ComponentMetrics) ObserveExternalTestDuration ¶
func (m *ComponentMetrics) ObserveExternalTestDuration(start time.Time, provider string)
ObserveExternalTestDuration records the duration of an external test execution.
func (*ComponentMetrics) ObserveGitHubAPIDuration ¶
func (m *ComponentMetrics) ObserveGitHubAPIDuration(start time.Time, method string)
ObserveGitHubAPIDuration records the duration of a GitHub API call.
func (*ComponentMetrics) ObservePRProcessingDuration ¶
func (m *ComponentMetrics) ObservePRProcessingDuration(start time.Time, action, repository string)
ObservePRProcessingDuration records the duration of a PR processing operation. Use with defer for automatic timing:
defer metrics.ObservePRProcessingDuration(time.Now(), "qualify", "owner/repo")
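The defer idiom works because Go evaluates the arguments of a deferred call when the `defer` statement executes, while the call itself runs when the surrounding function returns. The sketch below demonstrates this with stdlib types only. `observe` and `timedWork` are hypothetical stand-ins for an Observe*Duration method and its caller.

```go
package main

import (
	"fmt"
	"time"
)

// observe is a hypothetical stand-in for an Observe*Duration method: it
// reports the time elapsed since start through the record callback.
func observe(start time.Time, record func(time.Duration)) {
	record(time.Since(start))
}

// timedWork sleeps for d and returns the duration that observe recorded.
func timedWork(d time.Duration) (elapsed time.Duration) {
	// time.Now() is evaluated here, when the defer statement executes.
	// observe itself runs as timedWork returns, so the measurement covers
	// the whole function body.
	defer observe(time.Now(), func(e time.Duration) { elapsed = e })
	time.Sleep(d)
	return
}

func main() {
	fmt.Println(timedWork(20*time.Millisecond) >= 20*time.Millisecond)
}
```

The same reasoning explains why the label arguments, such as the action and repository, must already be known at the top of the function: they are captured at the defer statement too.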
func (*ComponentMetrics) ObservePromotionDuration ¶
func (m *ComponentMetrics) ObservePromotionDuration(start time.Time, sourceEnv, targetEnv string)
ObservePromotionDuration records the duration of a promotion operation.
func (*ComponentMetrics) RecordArgoCDAPICall ¶
func (m *ComponentMetrics) RecordArgoCDAPICall(cluster, method, status string)
RecordArgoCDAPICall records an ArgoCD API call with its result status.
func (*ComponentMetrics) RecordDriftDetection ¶
func (m *ComponentMetrics) RecordDriftDetection(cluster, result string)
RecordDriftDetection records a drift detection operation result.
func (*ComponentMetrics) RecordExternalTestExecution ¶
func (m *ComponentMetrics) RecordExternalTestExecution(provider, result string)
RecordExternalTestExecution records an external test execution result.
func (*ComponentMetrics) RecordGitHubAPICall ¶
func (m *ComponentMetrics) RecordGitHubAPICall(method, status string)
RecordGitHubAPICall records a GitHub API call with its result status.
func (*ComponentMetrics) RecordPRDiscovery ¶
func (m *ComponentMetrics) RecordPRDiscovery(repository, result string)
RecordPRDiscovery records a PR discovery operation result.
func (*ComponentMetrics) RecordPRMerge ¶
func (m *ComponentMetrics) RecordPRMerge(repository, result string)
RecordPRMerge records a PR merge operation result.
func (*ComponentMetrics) RecordPRQualification ¶
func (m *ComponentMetrics) RecordPRQualification(repository, result, reason string)
RecordPRQualification records a PR qualification operation result.
func (*ComponentMetrics) RecordPromotion ¶
func (m *ComponentMetrics) RecordPromotion(sourceEnv, targetEnv, result string)
RecordPromotion records a promotion operation result.
func (*ComponentMetrics) SetDriftStatus ¶
func (m *ComponentMetrics) SetDriftStatus(cluster, application, status string, value float64)
SetDriftStatus sets the drift status for an application in a cluster.
func (*ComponentMetrics) SetGitHubRateLimitRemaining ¶
func (m *ComponentMetrics) SetGitHubRateLimitRemaining(resource string, remaining float64)
SetGitHubRateLimitRemaining sets the remaining GitHub API rate limit.
func (*ComponentMetrics) SetPRState ¶
func (m *ComponentMetrics) SetPRState(state, repository string, count float64)
SetPRState sets the current count of PRs in a given state.
type Exporter ¶
type Exporter interface {
// Push sends metrics immediately
Push(ctx context.Context) error
// Shutdown gracefully stops the exporter and pushes final metrics
Shutdown(ctx context.Context) error
}
Exporter defines the interface for metrics exporters. This interface can be mocked for testing.
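A hand-written fake is usually enough to mock this interface in tests. The sketch below redeclares the interface locally so it compiles on its own. `fakeExporter`, `flushAndStop`, and `errGateway` are hypothetical names, and `flushAndStop` stands in for whatever application code consumes an Exporter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Exporter repeats the documented interface so the sketch is self-contained.
type Exporter interface {
	Push(ctx context.Context) error
	Shutdown(ctx context.Context) error
}

// fakeExporter records calls and returns a configurable Push error.
type fakeExporter struct {
	pushes    int
	shutdowns int
	pushErr   error
}

// Compile-time check that the fake satisfies the interface.
var _ Exporter = (*fakeExporter)(nil)

func (f *fakeExporter) Push(ctx context.Context) error {
	f.pushes++
	return f.pushErr
}

func (f *fakeExporter) Shutdown(ctx context.Context) error {
	f.shutdowns++
	return nil
}

// errGateway simulates an unreachable push gateway.
var errGateway = errors.New("gateway unreachable")

// flushAndStop is hypothetical caller code: it pushes once and always shuts
// the exporter down, even when the push fails.
func flushAndStop(e Exporter) error {
	ctx := context.Background()
	pushErr := e.Push(ctx)
	return errors.Join(pushErr, e.Shutdown(ctx))
}

func main() {
	ok := &fakeExporter{}
	fmt.Println(flushAndStop(ok), ok.pushes, ok.shutdowns)

	failing := &fakeExporter{pushErr: errGateway}
	fmt.Println(flushAndStop(failing), failing.shutdowns)
}
```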
type Metrics ¶
type Metrics struct {
// contains filtered or unexported fields
}
Metrics holds Prometheus collectors for logging observability.
func NewMetrics ¶
func NewMetrics() *Metrics
NewMetrics creates and registers Prometheus metrics for logging. Uses the default Prometheus registry.
Exposed metrics:
- cd_operator_log_total: Total number of log entries by level
- cd_operator_error_total: Total number of errors by level
- cd_operator_log_duration_seconds: Time spent writing logs
func NewMetricsWithRegistry ¶
func NewMetricsWithRegistry(registerer prometheus.Registerer) *Metrics
NewMetricsWithRegistry creates metrics with a custom registry. This allows for isolated metrics collection and custom exporters.
func (*Metrics) RecordError ¶
RecordError increments the error counter.
func (*Metrics) SetExporter ¶
SetExporter configures a metrics exporter for this Metrics instance. This enables push mode or other export strategies.
type MetricsConfig ¶
type MetricsConfig struct {
// Enabled controls whether metrics collection is active
Enabled bool
// Mode determines how metrics are exported
// "pull" - HTTP server for Prometheus scraping (default)
// "push" - Push to Prometheus Push Gateway on exit
// "disabled" - Collect but don't export
Mode string
// PullPort is the HTTP port for pull mode (e.g., ":8081")
PullPort string
// PushGatewayURL is the URL for push mode (e.g., "http://localhost:9091")
PushGatewayURL string
// PushOnExit controls whether to push metrics when application exits
PushOnExit bool
// JobName identifies this application in the push gateway
JobName string
// InstanceID uniquely identifies this process instance
InstanceID string
// PushInterval for periodic pushing (0 = disabled, only push on exit)
PushInterval time.Duration
}
MetricsConfig controls how metrics are collected and exported.
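The three modes imply different required fields. The sketch below shows one way such a configuration could be checked before use. It is an assumption for illustration: `config` copies only a subset of the MetricsConfig fields, `validate` is a hypothetical function, and the package may apply different rules.

```go
package main

import (
	"errors"
	"fmt"
)

// config copies the MetricsConfig fields that this sketch inspects.
type config struct {
	Enabled        bool
	Mode           string
	PullPort       string
	PushGatewayURL string
	JobName        string
}

// validate is a hypothetical check of mode-specific requirements.
func validate(c config) error {
	if !c.Enabled {
		return nil // nothing to check when metrics are off
	}
	switch c.Mode {
	case "", "pull": // the field comment names "pull" as the default
		if c.PullPort == "" {
			return errors.New("pull mode requires PullPort")
		}
	case "push":
		if c.PushGatewayURL == "" || c.JobName == "" {
			return errors.New("push mode requires PushGatewayURL and JobName")
		}
	case "disabled":
		// collect only; no export settings needed
	default:
		return fmt.Errorf("unknown mode %q", c.Mode)
	}
	return nil
}

func main() {
	fmt.Println(validate(config{Enabled: true, Mode: "pull", PullPort: ":8081"}))
	fmt.Println(validate(config{Enabled: true, Mode: "push", JobName: "cd-operator"}))
	fmt.Println(validate(config{Enabled: true, Mode: "stream"}))
}
```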
func DefaultMetricsConfig ¶
func DefaultMetricsConfig() MetricsConfig
DefaultMetricsConfig returns sensible defaults for cd-operator.
type MetricsExporter ¶
type MetricsExporter struct {
// contains filtered or unexported fields
}
MetricsExporter handles exporting metrics to various backends.
func NewMetricsExporter ¶
func NewMetricsExporter(config MetricsConfig, registry *prometheus.Registry) *MetricsExporter
NewMetricsExporter creates a new metrics exporter with the given configuration.
func (*MetricsExporter) Push ¶
func (e *MetricsExporter) Push(ctx context.Context) error
Push sends metrics to the push gateway immediately. Safe to call even if push mode is not configured (no-op). Handles transient failures gracefully by logging warnings instead of failing hard.
func (*MetricsExporter) Shutdown ¶
func (e *MetricsExporter) Shutdown(ctx context.Context) error
Shutdown gracefully stops the exporter and pushes final metrics if configured.
func (*MetricsExporter) WithLogger ¶
func (e *MetricsExporter) WithLogger(logger httpclient.LeveledLogger) *MetricsExporter
WithLogger configures the exporter to use the provided logger. This should be called after creating the exporter to integrate with application logging.
type TracingConfig ¶
type TracingConfig struct {
// Enabled controls whether tracing is active.
// Default: false (tracing disabled)
Enabled bool
// Endpoint is the OTLP gRPC endpoint for trace export.
// Example: "localhost:4317" (Jaeger), "tempo:4317" (Grafana Tempo)
// Default: "localhost:4317"
Endpoint string
// SamplingRate determines the fraction of traces to record.
// 0.0 = sample nothing, 1.0 = sample everything.
// Default: 0.1 (10% sampling)
SamplingRate float64
// ServiceName identifies this service in the trace backend.
// Default: "cd-operator"
ServiceName string
// ServiceVersion is the version of the operator (e.g., from git tag).
// Default: "dev"
ServiceVersion string
// Environment identifies the deployment environment (dev, staging, prod).
// Default: "development"
Environment string
// Insecure disables TLS for the OTLP exporter (useful for local dev).
// Default: true (no TLS)
Insecure bool
}
TracingConfig controls OpenTelemetry distributed tracing behavior.
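A sampling rate such as 0.1 is typically applied deterministically to the trace ID, so every service that sees the same trace reaches the same decision. The sketch below shows that idea in the style of ratio-based samplers. It is not taken from InitTracing, and `shouldSample` is a hypothetical function.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample decides from the trace ID alone whether to record a trace.
// The low 8 bytes of the ID are read as a number and compared with a
// threshold proportional to rate, so about rate of all random IDs pass.
func shouldSample(traceID [16]byte, rate float64) bool {
	if rate >= 1 {
		return true // 1.0 = sample everything
	}
	if rate <= 0 {
		return false // 0.0 = sample nothing
	}
	bound := uint64(rate * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1 // value in [0, 2^63)
	return x < bound
}

func main() {
	low := [16]byte{0: 1} // low 8 bytes are zero
	var high [16]byte
	for i := 8; i < 16; i++ {
		high[i] = 0xff // low 8 bytes are all ones
	}
	fmt.Println(shouldSample(low, 0.1), shouldSample(high, 0.1))
}
```

Because the decision depends only on the ID and the rate, no coordination between services is needed to keep traces complete.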
func DefaultTracingConfig ¶
func DefaultTracingConfig() TracingConfig
DefaultTracingConfig returns sensible defaults for cd-operator.