monitoring

package
v0.0.0-...-bd75e23
Published: Apr 19, 2026 License: MIT Imports: 28 Imported by: 0

README

Advanced Monitoring and Alerting System

This package provides a monitoring and alerting system for NovaCron. It collects, stores, and analyzes metrics from various sources and raises alerts on them.

Features

  • Metrics Collection: Collect metrics from different sources (system, VMs, network, storage)
  • Metric Types: Support for counters, gauges, histograms, timers, and state metrics
  • Alerting: Define alert conditions based on metric thresholds and patterns
  • Notifications: Send alerts through multiple channels (email, webhook, Slack, etc.)
  • Historical Data: Store and analyze historical metric data
  • Trend Analysis: Analyze metric trends and predict future values
  • Multi-tenant Support: Metrics and alerts can be tenant-specific

Architecture

The monitoring system consists of the following components:

  1. Metric Registry: Central registry for all metrics
  2. Metric Collectors: Collect metrics from various sources
  3. Alert Manager: Evaluates alert conditions and triggers alerts
  4. Notification Manager: Manages notification channels and templates
  5. History Manager: Manages historical metric data and cleanup

Usage Examples

Basic Metric Collection
// Create metric registry
registry := monitoring.NewMetricRegistry()

// Create a gauge metric
cpuUsage := monitoring.NewGaugeMetric(
    "system.cpu.usage", 
    "CPU Usage", 
    "Percentage of CPU usage",
    "system",
)
cpuUsage.SetUnit("percent")

// Register the metric
registry.RegisterMetric(cpuUsage)

// Record a value
cpuUsage.RecordValue(45.6, nil)

// Get the latest value
value := cpuUsage.GetLastValue()
fmt.Printf("Current CPU usage: %.2f%%\n", value.Value)

Creating and Using a Collector
// Create metric registry
registry := monitoring.NewMetricRegistry()

// Create system collector with 5-second interval
collector := monitoring.NewSystemCollector(registry, 5*time.Second)

// Start the collector
collector.Start()

// Get all metrics provided by the collector
metrics := collector.GetMetrics()
for _, metric := range metrics {
    fmt.Printf("Metric: %s (%s)\n", metric.Name, metric.Description)
}

// Manually trigger a collection (normally automatic)
batches, err := collector.Collect()
if err != nil {
    fmt.Printf("Error collecting metrics: %v\n", err)
}

// Stop collector when done
collector.Stop()

Setting Up Alerting
// Create alert registry
alertRegistry := monitoring.NewAlertRegistry()

// Create a CPU usage alert
cpuAlert := monitoring.NewAlert(
    "cpu-usage-alert",
    "High CPU Usage",
    "Alert when CPU usage exceeds 80%",
    monitoring.AlertSeverityHigh,
    monitoring.AlertCondition{
        Type:     monitoring.ThresholdCondition,
        MetricID: "system.cpu.usage",
        Operator: monitoring.GreaterThanOrEqual,
        Threshold: func() *float64 {
            val := 80.0
            return &val
        }(),
        Period: func() *time.Duration {
            period := 30 * time.Second
            return &period
        }(),
    },
)

// Register the alert
alertRegistry.RegisterAlert(cpuAlert)

// Create alert manager
alertManager := monitoring.NewAlertManager(alertRegistry, registry, 5*time.Second)
alertManager.Start()

Setting Up Notifications
// Create notification manager
notificationManager := monitoring.NewNotificationManager()

// Add default email template
emailTemplate := monitoring.DefaultEmailTemplate()
notificationManager.AddTemplate(emailTemplate)

// Configure email notification
emailConfig := &monitoring.NotificationConfig{
    ID:      "email-config",
    Name:    "Email Notifications",
    Channel: monitoring.EmailChannel,
    Enabled: true,
    Settings: map[string]interface{}{
        "server":      "smtp.example.com",
        "port":        587,
        "username":    "alerts@example.com",
        "password":    "password",
        "fromAddress": "alerts@example.com",
        "toAddresses": []string{"admin@example.com"},
        "useTLS":      true,
    },
}
notificationManager.AddConfig(emailConfig)
notificationManager.CreateEmailNotifier("email-config")

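// Send notification. The alert value is assumed to have been created
// elsewhere, for example cpuAlert from the alerting example.
notificationManager.SendNotification(alert, "default-email-template")

Historical Metrics and Analysis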
// Create metrics history manager
historyManager := monitoring.NewMetricHistoryManager(
    registry, 
    24*time.Hour, // Data retention
    1*time.Hour,  // Cleanup interval
)
historyManager.Start()

// Get historical values
start := time.Now().Add(-1 * time.Hour)
end := time.Now()
values, err := historyManager.GetHistoricalValues("system.cpu.usage", start, end)
if err != nil {
    fmt.Printf("Error getting historical values: %v\n", err)
}

// Analyze trend
slope, err := historyManager.AnalyzeMetricTrend("system.cpu.usage", 1*time.Hour)
if err != nil {
    fmt.Printf("Error analyzing trend: %v\n", err)
}
fmt.Printf("CPU usage trend: %.2f%%/s\n", slope)

// Predict future value
future := time.Now().Add(30 * time.Minute)
predictedValue, err := historyManager.PredictMetricValue("system.cpu.usage", future)
if err != nil {
    fmt.Printf("Error predicting value: %v\n", err)
}
fmt.Printf("Predicted CPU usage in 30 minutes: %.2f%%\n", predictedValue)
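The README does not say how AnalyzeMetricTrend and PredictMetricValue compute their results. One plausible reading of the "%/s" slope is an ordinary least-squares line fitted to (time, value) samples and extrapolated forward. The sketch below shows that technique only. The sample type and the fitLine function are hypothetical and are not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// sample is a hypothetical data point: t is seconds since the start of the
// window and v is the metric value (for example CPU percent).
type sample struct {
	t, v float64
}

// fitLine returns the least-squares slope (value units per second) and
// intercept of v as a function of t.
func fitLine(samples []sample) (slope, intercept float64, err error) {
	if len(samples) < 2 {
		return 0, 0, errors.New("need at least two samples")
	}
	n := float64(len(samples))
	var sumT, sumV, sumTT, sumTV float64
	for _, s := range samples {
		sumT += s.t
		sumV += s.v
		sumTT += s.t * s.t
		sumTV += s.t * s.v
	}
	denom := n*sumTT - sumT*sumT
	if denom == 0 {
		return 0, 0, errors.New("all samples share the same timestamp")
	}
	slope = (n*sumTV - sumT*sumV) / denom
	intercept = (sumV - slope*sumT) / n
	return slope, intercept, nil
}

func main() {
	// One reading per minute, rising by 0.6 points each time.
	samples := []sample{{0, 40}, {60, 40.6}, {120, 41.2}, {180, 41.8}}
	slope, intercept, err := fitLine(samples)
	if err != nil {
		fmt.Printf("Error fitting trend: %v\n", err)
		return
	}
	fmt.Printf("CPU usage trend: %.2f%%/s\n", slope)
	// Extrapolate to 30 minutes (1800 s) after the start of the window.
	fmt.Printf("Predicted CPU usage: %.2f%%\n", intercept+slope*1800)
}
```

A straight-line extrapolation is not bounded, so a percentage metric would likely need clamping to the 0 to 100 range before it is reported.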

Integration Points

The monitoring system integrates with other NovaCron components:

  • VM Manager: Collects VM performance metrics
  • Network Manager: Collects network metrics
  • Storage Manager: Collects storage metrics
  • Multi-tenant Architecture: Filters metrics and alerts by tenant
  • Authentication System: Controls access to metrics and alerts

Future Enhancements

  • Visual Dashboard: Provide real-time visualization of metrics and alerts
  • Custom Query Language: Allow for complex metric queries and aggregations
  • Machine Learning: Improve anomaly detection and prediction accuracy
  • Event Correlation: Correlate metrics and events across components
  • Auto-scaling Triggers: Use metrics to automatically adjust resources

Documentation

Overview

Package monitoring provides production-grade metrics collection and telemetry for the DWCP v3 Phase 6 rollout, with real-time monitoring capabilities.

This file contains type definitions for the VM telemetry system.

Constants

const (
	// DefaultCollectionInterval is the default interval for metric collection
	DefaultCollectionInterval = 30 * time.Second

	// DefaultRetentionPeriod is the default period to retain metrics
	DefaultRetentionPeriod = 30 * 24 * time.Hour // 30 days
)
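To give a feel for what these defaults imply, the following sketch computes how many data points a single series would accumulate if it were sampled at DefaultCollectionInterval and kept for DefaultRetentionPeriod. The constants are copied from above. The samplesRetained helper is hypothetical and only does the arithmetic.

```go
package main

import (
	"fmt"
	"time"
)

// Values copied from the package constants.
const (
	DefaultCollectionInterval = 30 * time.Second
	DefaultRetentionPeriod    = 30 * 24 * time.Hour // 30 days
)

// samplesRetained reports how many data points one series accumulates when
// it is collected every interval and kept for retention.
func samplesRetained(interval, retention time.Duration) int64 {
	if interval <= 0 {
		return 0
	}
	return int64(retention / interval)
}

func main() {
	n := samplesRetained(DefaultCollectionInterval, DefaultRetentionPeriod)
	fmt.Println(n) // prints 86400
}
```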

Variables

This section is empty.

Functions

This section is empty.

Types

type AggregationMethod

type AggregationMethod string

AggregationMethod defines how metrics are aggregated

const (
	// AggregationMethodSum sums the metric values
	AggregationMethodSum AggregationMethod = "sum"

	// AggregationMethodAvg averages the metric values
	AggregationMethodAvg AggregationMethod = "avg"

	// AggregationMethodMin takes the minimum value
	AggregationMethodMin AggregationMethod = "min"

	// AggregationMethodMax takes the maximum value
	AggregationMethodMax AggregationMethod = "max"

	// AggregationMethodCount counts occurrences
	AggregationMethodCount AggregationMethod = "count"

	// AggregationMethodP50 returns the 50th percentile (median)
	AggregationMethodP50 AggregationMethod = "p50"

	// AggregationMethodP90 returns the 90th percentile
	AggregationMethodP90 AggregationMethod = "p90"

	// AggregationMethodP95 returns the 95th percentile
	AggregationMethodP95 AggregationMethod = "p95"

	// AggregationMethodP99 returns the 99th percentile
	AggregationMethodP99 AggregationMethod = "p99"
)
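The package does not document how each method is evaluated. As a rough illustration, the sketch below applies the methods to a slice of values. The aggregate function is hypothetical, and the nearest-rank percentile definition is an assumption (the package may interpolate instead).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type AggregationMethod string

const (
	AggregationMethodSum   AggregationMethod = "sum"
	AggregationMethodAvg   AggregationMethod = "avg"
	AggregationMethodMin   AggregationMethod = "min"
	AggregationMethodMax   AggregationMethod = "max"
	AggregationMethodCount AggregationMethod = "count"
	AggregationMethodP50   AggregationMethod = "p50"
	AggregationMethodP90   AggregationMethod = "p90"
	AggregationMethodP95   AggregationMethod = "p95"
	AggregationMethodP99   AggregationMethod = "p99"
)

// percentiles maps the percentile methods to their rank in percent.
var percentiles = map[AggregationMethod]float64{
	AggregationMethodP50: 50,
	AggregationMethodP90: 90,
	AggregationMethodP95: 95,
	AggregationMethodP99: 99,
}

// aggregate is a hypothetical helper that reduces values with method.
// Percentiles use the nearest-rank definition (an assumption).
func aggregate(method AggregationMethod, values []float64) (float64, error) {
	if method == AggregationMethodCount {
		return float64(len(values)), nil
	}
	if len(values) == 0 {
		return 0, fmt.Errorf("no values to aggregate with %q", method)
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	if p, ok := percentiles[method]; ok {
		rank := int(math.Ceil(p * float64(len(sorted)) / 100))
		return sorted[rank-1], nil
	}
	switch method {
	case AggregationMethodMin:
		return sorted[0], nil
	case AggregationMethodMax:
		return sorted[len(sorted)-1], nil
	case AggregationMethodSum, AggregationMethodAvg:
		var total float64
		for _, v := range sorted {
			total += v
		}
		if method == AggregationMethodAvg {
			total /= float64(len(sorted))
		}
		return total, nil
	}
	return 0, fmt.Errorf("unknown aggregation method %q", method)
}

func main() {
	latencies := []float64{120, 80, 95, 110, 300}
	for _, m := range []AggregationMethod{AggregationMethodAvg, AggregationMethodP95, AggregationMethodMax} {
		v, err := aggregate(m, latencies)
		fmt.Println(m, v, err)
	}
}
```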

type Alert

type Alert struct {
	// ID is a unique identifier for the alert
	ID string `json:"id"`

	// Name is a human-readable name for the alert
	Name string `json:"name"`

	// Description describes the alert
	Description string `json:"description"`

	// Severity indicates the severity of the alert
	Severity AlertSeverity `json:"severity"`

	// Type indicates the type of alert
	Type AlertType `json:"type"`

	// Condition defines when the alert should trigger
	Condition AlertCondition `json:"condition"`

	// Labels are additional metadata for the alert
	Labels map[string]string `json:"labels,omitempty"`

	// Annotations are additional information for the alert
	Annotations map[string]string `json:"annotations,omitempty"`

	// NotificationChannels are the channels to notify when the alert fires
	NotificationChannels []string `json:"notification_channels,omitempty"`

	// Enabled indicates whether the alert is enabled
	Enabled bool `json:"enabled"`

	// Status is the current status of the alert
	Status AlertStatus `json:"status"`
}

Alert represents an alert definition

type AlertCondition

type AlertCondition struct {
	// MetricName is the name of the metric to check
	MetricName string `json:"metric_name"`

	// Operator is the comparison operator
	Operator AlertConditionOperator `json:"operator"`

	// Threshold is the threshold value
	Threshold float64 `json:"threshold"`

	// Duration is the duration the condition must be true for
	Duration time.Duration `json:"duration"`

	// Tags to filter metrics by
	Tags map[string]string `json:"tags,omitempty"`
}

AlertCondition represents a condition for triggering an alert

type AlertConditionOperator

type AlertConditionOperator string

AlertConditionOperator represents comparison operators for alert conditions

const (
	// AlertConditionOperatorEqual represents the equal operator
	AlertConditionOperatorEqual AlertConditionOperator = "eq"

	// AlertConditionOperatorNotEqual represents the not equal operator
	AlertConditionOperatorNotEqual AlertConditionOperator = "ne"

	// AlertConditionOperatorGreaterThan represents the greater than operator
	AlertConditionOperatorGreaterThan AlertConditionOperator = "gt"

	// AlertConditionOperatorGreaterThanOrEqual represents the greater than or equal operator
	AlertConditionOperatorGreaterThanOrEqual AlertConditionOperator = "gte"

	// AlertConditionOperatorLessThan represents the less than operator
	AlertConditionOperatorLessThan AlertConditionOperator = "lt"

	// AlertConditionOperatorLessThanOrEqual represents the less than or equal operator
	AlertConditionOperatorLessThanOrEqual AlertConditionOperator = "lte"
)
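The operators above are plain string codes, so something has to map them onto comparisons against AlertCondition.Threshold. The sketch below shows one way that mapping might look. The holds function is hypothetical, and the Duration field (how long the condition must stay true) is deliberately not modeled.

```go
package main

import "fmt"

type AlertConditionOperator string

const (
	AlertConditionOperatorEqual              AlertConditionOperator = "eq"
	AlertConditionOperatorNotEqual           AlertConditionOperator = "ne"
	AlertConditionOperatorGreaterThan        AlertConditionOperator = "gt"
	AlertConditionOperatorGreaterThanOrEqual AlertConditionOperator = "gte"
	AlertConditionOperatorLessThan           AlertConditionOperator = "lt"
	AlertConditionOperatorLessThanOrEqual    AlertConditionOperator = "lte"
)

// holds is a hypothetical helper that reports whether
// "value <op> threshold" is true. Unknown operators are an error.
func holds(op AlertConditionOperator, value, threshold float64) (bool, error) {
	switch op {
	case AlertConditionOperatorEqual:
		return value == threshold, nil
	case AlertConditionOperatorNotEqual:
		return value != threshold, nil
	case AlertConditionOperatorGreaterThan:
		return value > threshold, nil
	case AlertConditionOperatorGreaterThanOrEqual:
		return value >= threshold, nil
	case AlertConditionOperatorLessThan:
		return value < threshold, nil
	case AlertConditionOperatorLessThanOrEqual:
		return value <= threshold, nil
	default:
		return false, fmt.Errorf("unknown operator %q", op)
	}
}

func main() {
	// CPU at 85.2% against a threshold of 80% with the "gte" operator.
	ok, err := holds(AlertConditionOperatorGreaterThanOrEqual, 85.2, 80)
	fmt.Println(ok, err) // prints "true <nil>"
}
```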

type AlertInstance

type AlertInstance struct {
	// Alert is the alert definition
	Alert *Alert `json:"alert"`

	// Value is the value that triggered the alert
	Value float64 `json:"value"`

	// StartTime is when the alert started firing
	StartTime time.Time `json:"start_time"`

	// EndTime is when the alert stopped firing (if resolved)
	EndTime time.Time `json:"end_time,omitempty"`

	// Status is the current status of the alert instance
	Status AlertStatus `json:"status"`

	// AcknowledgedBy is who acknowledged the alert
	AcknowledgedBy string `json:"acknowledged_by,omitempty"`

	// AcknowledgedTime is when the alert was acknowledged
	AcknowledgedTime time.Time `json:"acknowledged_time,omitempty"`

	// AcknowledgementComment is a comment about the acknowledgement
	AcknowledgementComment string `json:"acknowledgement_comment,omitempty"`
}

AlertInstance represents an instance of a triggered alert

type AlertManager

type AlertManager struct {
	// contains filtered or unexported fields
}

AlertManager manages alerts

func NewAlertManager

func NewAlertManager(metricCollector *DistributedMetricCollector) *AlertManager

NewAlertManager creates a new alert manager

func (*AlertManager) AcknowledgeAlert

func (m *AlertManager) AcknowledgeAlert(instanceID, acknowledgedBy, comment string) error

AcknowledgeAlert acknowledges an alert instance

func (*AlertManager) CheckMetric

func (m *AlertManager) CheckMetric(metric *Metric)

CheckMetric checks if a metric triggers any alerts

func (*AlertManager) DeregisterAlert

func (m *AlertManager) DeregisterAlert(alertID string) bool

DeregisterAlert deregisters an alert

func (*AlertManager) GetAlert

func (m *AlertManager) GetAlert(alertID string) (*Alert, error)

GetAlert gets an alert by ID

func (*AlertManager) ListAlertInstances

func (m *AlertManager) ListAlertInstances() []*AlertInstance

ListAlertInstances lists all alert instances

func (*AlertManager) ListAlerts

func (m *AlertManager) ListAlerts() []*Alert

ListAlerts lists all alerts

func (*AlertManager) RegisterAlert

func (m *AlertManager) RegisterAlert(alert *Alert) error

RegisterAlert registers an alert

func (*AlertManager) Start

func (m *AlertManager) Start() error

Start starts the alert manager

func (*AlertManager) Stop

func (m *AlertManager) Stop() error

Stop stops the alert manager

type AlertSeverity

type AlertSeverity string

AlertSeverity represents the severity of an alert

const (
	// AlertSeverityInfo represents an informational alert
	AlertSeverityInfo AlertSeverity = "info"

	// AlertSeverityWarning represents a warning alert
	AlertSeverityWarning AlertSeverity = "warning"

	// AlertSeverityError represents an error alert
	AlertSeverityError AlertSeverity = "error"

	// AlertSeverityCritical represents a critical alert
	AlertSeverityCritical AlertSeverity = "critical"
)

func (AlertSeverity) String

func (s AlertSeverity) String() string

String returns the string representation of an AlertSeverity

type AlertStatus

type AlertStatus string

AlertStatus represents the status of an alert

const (
	// AlertStatusFiring indicates the alert is currently firing
	AlertStatusFiring AlertStatus = "firing"

	// AlertStatusResolved indicates the alert has been resolved
	AlertStatusResolved AlertStatus = "resolved"

	// AlertStatusAcknowledged indicates the alert has been acknowledged
	AlertStatusAcknowledged AlertStatus = "acknowledged"

	// AlertStatusSuppressed indicates the alert is suppressed
	AlertStatusSuppressed AlertStatus = "suppressed"
)

type AlertType

type AlertType string

AlertType represents the type of alert

const (
	// AlertTypeThreshold represents a threshold-based alert
	AlertTypeThreshold AlertType = "threshold"

	// AlertTypeRateOfChange represents a rate-of-change alert
	AlertTypeRateOfChange AlertType = "rate_of_change"

	// AlertTypeAnomaly represents an anomaly detection alert
	AlertTypeAnomaly AlertType = "anomaly"

	// AlertTypeEvent represents an event-based alert
	AlertTypeEvent AlertType = "event"
)

type AnalyticsEngine

type AnalyticsEngine struct {
	// contains filtered or unexported fields
}

AnalyticsEngine processes metrics to generate insights

func NewAnalyticsEngine

func NewAnalyticsEngine(config *AnalyticsEngineConfig, metricCollector *DistributedMetricCollector) *AnalyticsEngine

NewAnalyticsEngine creates a new analytics engine

func (*AnalyticsEngine) AddProcessor

func (e *AnalyticsEngine) AddProcessor(processor AnalyticsProcessor)

AddProcessor adds an analytics processor

func (*AnalyticsEngine) GetResult

func (e *AnalyticsEngine) GetResult(resultID string) (*AnalyticsResult, error)

GetResult retrieves an analytics result by ID

func (*AnalyticsEngine) ListResults

func (e *AnalyticsEngine) ListResults() []*AnalyticsResult

ListResults lists all analytics results

func (*AnalyticsEngine) QueryResults

func (e *AnalyticsEngine) QueryResults(query AnalyticsQuery) ([]*AnalyticsResult, error)

QueryResults queries analytics results

func (*AnalyticsEngine) RemoveProcessor

func (e *AnalyticsEngine) RemoveProcessor(processorID string) bool

RemoveProcessor removes an analytics processor

func (*AnalyticsEngine) RunAdhocAnalysis

func (e *AnalyticsEngine) RunAdhocAnalysis(ctx context.Context, processorID string, parameters map[string]interface{}) (*AnalyticsResult, error)

RunAdhocAnalysis runs analytics on demand

func (*AnalyticsEngine) Start

func (e *AnalyticsEngine) Start() error

Start begins analytics processing

func (*AnalyticsEngine) Stop

func (e *AnalyticsEngine) Stop() error

Stop halts analytics processing

type AnalyticsEngineConfig

type AnalyticsEngineConfig struct {
	// ProcessingInterval is how often analytics are processed
	ProcessingInterval time.Duration

	// RetentionPeriod is how long analytics results are retained
	RetentionPeriod time.Duration

	// Processors is the list of analytic processors to use
	Processors []AnalyticsProcessor

	// EnablePredictiveAnalytics enables predictive analytics
	EnablePredictiveAnalytics bool

	// PredictionWindow is how far to predict
	PredictionWindow time.Duration
}

AnalyticsEngineConfig contains configuration for the analytics engine

func DefaultAnalyticsEngineConfig

func DefaultAnalyticsEngineConfig() *AnalyticsEngineConfig

DefaultAnalyticsEngineConfig returns the default configuration

type AnalyticsProcessor

type AnalyticsProcessor interface {
	// ID returns the processor ID
	ID() string

	// Enabled returns whether the processor is enabled
	Enabled() bool

	// RequiredMetrics returns the metric patterns required for processing
	RequiredMetrics() []string

	// RequiredPreviousResults returns the prior result IDs required for processing
	RequiredPreviousResults() []string

	// Process processes metrics and generates an analytics result
	Process(ctx context.Context, inputs *AnalyticsProcessorInputs) (*AnalyticsResult, error)
}

AnalyticsProcessor processes metrics to generate analytics
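As an illustration of what implementing this interface might involve, the sketch below defines a small processor that averages one metric pattern. It uses trimmed local stand-ins rather than the real package types: MetricSeries is not documented in this section, so its Values field is purely hypothetical, and meanProcessor and runMean are invented for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Trimmed stand-ins for the package types. The Values field of MetricSeries
// is an assumption; the real type is not shown in this section.
type MetricSeries struct{ Values []float64 }

type AnalyticsResult struct {
	ID      string
	Type    string
	Summary string
	Details map[string]interface{}
}

type AnalyticsProcessorInputs struct {
	MetricData map[string][]*MetricSeries
	Parameters map[string]interface{}
}

type AnalyticsProcessor interface {
	ID() string
	Enabled() bool
	RequiredMetrics() []string
	RequiredPreviousResults() []string
	Process(ctx context.Context, inputs *AnalyticsProcessorInputs) (*AnalyticsResult, error)
}

// meanProcessor is a hypothetical processor that averages one metric pattern.
type meanProcessor struct{ pattern string }

// Compile-time check that meanProcessor satisfies the interface.
var _ AnalyticsProcessor = (*meanProcessor)(nil)

func (p *meanProcessor) ID() string                        { return "mean:" + p.pattern }
func (p *meanProcessor) Enabled() bool                     { return true }
func (p *meanProcessor) RequiredMetrics() []string         { return []string{p.pattern} }
func (p *meanProcessor) RequiredPreviousResults() []string { return nil }

func (p *meanProcessor) Process(ctx context.Context, inputs *AnalyticsProcessorInputs) (*AnalyticsResult, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honor cancellation before doing any work
	}
	if inputs == nil {
		return nil, errors.New("nil inputs")
	}
	var sum float64
	var n int
	for _, series := range inputs.MetricData[p.pattern] {
		for _, v := range series.Values {
			sum += v
			n++
		}
	}
	if n == 0 {
		return nil, errors.New("no data for " + p.pattern)
	}
	mean := sum / float64(n)
	return &AnalyticsResult{
		ID:      p.ID(),
		Type:    "summary",
		Summary: fmt.Sprintf("mean=%.2f over %d points", mean, n),
		Details: map[string]interface{}{"mean": mean},
	}, nil
}

// runMean feeds values to a meanProcessor as a single series.
func runMean(pattern string, values ...float64) (*AnalyticsResult, error) {
	p := &meanProcessor{pattern: pattern}
	inputs := &AnalyticsProcessorInputs{
		MetricData: map[string][]*MetricSeries{pattern: {{Values: values}}},
	}
	return p.Process(context.Background(), inputs)
}

func main() {
	res, err := runMean("system.cpu.usage", 40, 50, 60)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.ID, res.Summary)
}
```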

type AnalyticsProcessorInputs

type AnalyticsProcessorInputs struct {
	// MetricData is a map of metric patterns to metric series
	MetricData map[string][]*MetricSeries

	// PriorResults is a map of result IDs to prior results
	PriorResults map[string]*AnalyticsResult

	// TimeRange is the time range to process
	TimeRange TimeRange

	// Parameters contains additional parameters for processing
	Parameters map[string]interface{}
}

AnalyticsProcessorInputs contains inputs for analytics processing

type AnalyticsQuery

type AnalyticsQuery struct {
	// Type filters by result type
	Type string `json:"type"`

	// Category filters by result category
	Category string `json:"category"`

	// Start time of the query range
	Start time.Time `json:"start"`

	// End time of the query range
	End time.Time `json:"end"`

	// Tags to filter by
	Tags map[string]string `json:"tags"`
}

AnalyticsQuery defines parameters for querying analytics results

type AnalyticsResult

type AnalyticsResult struct {
	// ID is the result ID
	ID string `json:"id"`

	// Type is the result type (e.g., "anomaly", "prediction")
	Type string `json:"type"`

	// Category is the result category (e.g., "system", "network")
	Category string `json:"category"`

	// Timestamp is when the result was generated
	Timestamp time.Time `json:"timestamp"`

	// TimeRange is the time range the result covers
	TimeRange TimeRange `json:"time_range"`

	// Tags are additional metadata for the result
	Tags map[string]string `json:"tags,omitempty"`

	// Summary is a short summary of the result
	Summary string `json:"summary"`

	// Details contains detailed information about the result
	Details map[string]interface{} `json:"details,omitempty"`

	// Confidence is the confidence level of the result (0-1)
	Confidence float64 `json:"confidence"`

	// Severity indicates the severity of the result (0-1)
	Severity float64 `json:"severity"`
}

AnalyticsResult represents the result of analytics processing

type Anomaly

type Anomaly struct {
	Value    float64
	Mean     float64
	StdDev   float64
	Severity string
	Message  string
}

Anomaly represents a detected anomaly

type AnomalyDetectionProcessor

type AnomalyDetectionProcessor struct {
	// contains filtered or unexported fields
}

AnomalyDetectionProcessor detects anomalies in metrics

func (*AnomalyDetectionProcessor) Enabled

func (p *AnomalyDetectionProcessor) Enabled() bool

Enabled returns whether the processor is enabled

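func (*AnomalyDetectionProcessor) ID

func (p *AnomalyDetectionProcessor) ID() string

ID returns the processor ID

func (*AnomalyDetectionProcessor) Process

func (p *AnomalyDetectionProcessor) Process(ctx context.Context, inputs *AnalyticsProcessorInputs) (*AnalyticsResult, error)

Process processes metrics and generates an analytics result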

func (*AnomalyDetectionProcessor) RequiredMetrics

func (p *AnomalyDetectionProcessor) RequiredMetrics() []string

RequiredMetrics returns the metric patterns required for processing

func (*AnomalyDetectionProcessor) RequiredPreviousResults

func (p *AnomalyDetectionProcessor) RequiredPreviousResults() []string

RequiredPreviousResults returns the prior result IDs required for processing

type AnomalyDetector

type AnomalyDetector struct {
	// contains filtered or unexported fields
}

AnomalyDetector detects statistical anomalies

func NewAnomalyDetector

func NewAnomalyDetector(stdThreshold float64) *AnomalyDetector

NewAnomalyDetector creates a new anomaly detector

func (*AnomalyDetector) Detect

func (ad *AnomalyDetector) Detect(values []float64) *Anomaly

Detect detects anomalies in the given values using z-score
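The documentation states only that Detect uses a z-score. A minimal sketch of that idea follows, assuming the newest value is compared against the mean and population standard deviation of the values before it, and that stdThreshold is the cut-off in standard deviations. The zScore and isAnomalous functions are hypothetical, and the package's actual choice of baseline may differ.

```go
package main

import (
	"fmt"
	"math"
)

// zScore returns how many standard deviations the last value lies from the
// mean of the values before it. ok is false when there are fewer than two
// baseline values or when the baseline has zero variance.
func zScore(values []float64) (z, mean, stdDev float64, ok bool) {
	if len(values) < 3 {
		return 0, 0, 0, false
	}
	baseline := values[:len(values)-1]
	for _, v := range baseline {
		mean += v
	}
	mean /= float64(len(baseline))

	var sumSquares float64
	for _, v := range baseline {
		sumSquares += (v - mean) * (v - mean)
	}
	stdDev = math.Sqrt(sumSquares / float64(len(baseline))) // population std dev
	if stdDev == 0 {
		return 0, mean, 0, false
	}
	return (values[len(values)-1] - mean) / stdDev, mean, stdDev, true
}

// isAnomalous reports whether the newest value deviates from the baseline
// by more than stdThreshold standard deviations in either direction.
func isAnomalous(values []float64, stdThreshold float64) bool {
	z, _, _, ok := zScore(values)
	return ok && math.Abs(z) > stdThreshold
}

func main() {
	values := []float64{10, 12, 10, 12, 20}
	z, mean, stdDev, _ := zScore(values)
	fmt.Printf("mean=%.1f stddev=%.1f z=%.1f anomalous=%v\n",
		mean, stdDev, z, isAnomalous(values, 3))
}
```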

type CircularBuffer

type CircularBuffer struct {
	// contains filtered or unexported fields
}

CircularBuffer implements a fixed-size circular buffer for metrics

func NewCircularBuffer

func NewCircularBuffer(size int) *CircularBuffer

NewCircularBuffer creates a new circular buffer

func (*CircularBuffer) Add

func (cb *CircularBuffer) Add(value float64)

Add adds a value to the buffer

func (*CircularBuffer) Latest

func (cb *CircularBuffer) Latest() float64

Latest returns the most recently added value

func (*CircularBuffer) Max

func (cb *CircularBuffer) Max() float64

Max returns the maximum value

func (*CircularBuffer) Mean

func (cb *CircularBuffer) Mean() float64

Mean calculates the mean of all values

func (*CircularBuffer) Percentile

func (cb *CircularBuffer) Percentile(p float64) float64

Percentile calculates the p-th percentile

func (*CircularBuffer) Values

func (cb *CircularBuffer) Values() []float64

Values returns all values in the buffer
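The internals of CircularBuffer are unexported, so the following is only a sketch of how a fixed-size ring of float64 values is commonly built. The ring type is hypothetical. It overwrites the oldest value once full, and unlike a production implementation it is not safe for concurrent use.

```go
package main

import "fmt"

// ring is a hypothetical fixed-size circular buffer.
type ring struct {
	data  []float64
	next  int // index where the next value will be written
	count int // number of valid values, at most len(data)
}

// newRing creates a ring that holds up to size values (minimum 1).
func newRing(size int) *ring {
	if size < 1 {
		size = 1
	}
	return &ring{data: make([]float64, size)}
}

// Add stores v, overwriting the oldest value when the ring is full.
func (r *ring) Add(v float64) {
	r.data[r.next] = v
	r.next = (r.next + 1) % len(r.data)
	if r.count < len(r.data) {
		r.count++
	}
}

// Values returns a copy of the stored values, oldest first.
func (r *ring) Values() []float64 {
	out := make([]float64, 0, r.count)
	start := (r.next - r.count + len(r.data)) % len(r.data)
	for i := 0; i < r.count; i++ {
		out = append(out, r.data[(start+i)%len(r.data)])
	}
	return out
}

// Latest returns the most recently added value, or 0 when empty.
func (r *ring) Latest() float64 {
	if r.count == 0 {
		return 0
	}
	return r.data[(r.next-1+len(r.data))%len(r.data)]
}

// Mean returns the average of the stored values, or 0 when empty.
func (r *ring) Mean() float64 {
	if r.count == 0 {
		return 0
	}
	var sum float64
	for _, v := range r.Values() {
		sum += v
	}
	return sum / float64(r.count)
}

func main() {
	r := newRing(3)
	for _, v := range []float64{1, 2, 3, 4} {
		r.Add(v)
	}
	fmt.Println(r.Values(), r.Latest(), r.Mean()) // prints "[2 3 4] 4 3"
}
```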

type CollectorManager

type CollectorManager struct {
	// contains filtered or unexported fields
}

CollectorManager manages multiple collectors

func NewCollectorManager

func NewCollectorManager() *CollectorManager

NewCollectorManager creates a new collector manager

func (*CollectorManager) AddCollector

func (m *CollectorManager) AddCollector(collector managedCollector)

AddCollector adds a collector

func (*CollectorManager) GetCollectors

func (m *CollectorManager) GetCollectors() []MetricCollector

GetCollectors gets all collectors

func (*CollectorManager) StartAll

func (m *CollectorManager) StartAll() error

StartAll starts all collectors

func (*CollectorManager) StopAll

func (m *CollectorManager) StopAll() error

StopAll stops all collectors

type ComponentHealth

type ComponentHealth struct {
	Status  string `json:"status"`
	Message string `json:"message"`
}

type ConsoleChannel

type ConsoleChannel struct {
	// contains filtered or unexported fields
}

ConsoleChannel is a notification channel that logs to the console

func NewConsoleChannel

func NewConsoleChannel(id string) *ConsoleChannel

NewConsoleChannel creates a new console channel

func (*ConsoleChannel) ID

func (c *ConsoleChannel) ID() string

ID returns the channel ID

func (*ConsoleChannel) IsEnabled

func (c *ConsoleChannel) IsEnabled() bool

IsEnabled returns whether the channel is enabled

func (*ConsoleChannel) Send

func (c *ConsoleChannel) Send(notification *Notification) error

Send sends a notification to the console

func (*ConsoleChannel) Type

func (c *ConsoleChannel) Type() string

Type returns the channel type

type DistributedMetricCollector

type DistributedMetricCollector struct {
	// contains filtered or unexported fields
}

DistributedMetricCollector collects metrics across a distributed cluster

func NewDistributedMetricCollector

func NewDistributedMetricCollector(config *DistributedMetricCollectorConfig, storage *storage.InMemoryStorage) *DistributedMetricCollector

NewDistributedMetricCollector creates a new distributed metric collector

func (*DistributedMetricCollector) AddCollector

func (d *DistributedMetricCollector) AddCollector(collector MetricCollector)

AddCollector adds a metric collector

func (*DistributedMetricCollector) DeregisterAlert

func (d *DistributedMetricCollector) DeregisterAlert(alertID string) bool

DeregisterAlert deregisters an alert

func (*DistributedMetricCollector) GetMetric

func (d *DistributedMetricCollector) GetMetric(ctx context.Context, name string, tags map[string]string, start, end time.Time) (*MetricSeries, error)

GetMetric retrieves a metric by name and tags

func (*DistributedMetricCollector) QueryMetrics

func (d *DistributedMetricCollector) QueryMetrics(ctx context.Context, query MetricQuery) ([]*MetricSeries, error)

QueryMetrics performs a query across metrics

func (*DistributedMetricCollector) RegisterAlert

func (d *DistributedMetricCollector) RegisterAlert(alert *Alert) error

RegisterAlert registers an alert

func (*DistributedMetricCollector) RemoveCollector

func (d *DistributedMetricCollector) RemoveCollector(collectorID string) bool

RemoveCollector removes a metric collector

func (*DistributedMetricCollector) Start

func (d *DistributedMetricCollector) Start() error

Start begins metric collection

func (*DistributedMetricCollector) Stop

Stop halts metric collection

func (*DistributedMetricCollector) StoreMetric

func (d *DistributedMetricCollector) StoreMetric(ctx context.Context, metric *Metric) error

StoreMetric stores a metric

type DistributedMetricCollectorConfig

type DistributedMetricCollectorConfig struct {
	// CollectionInterval is how often metrics are collected
	CollectionInterval time.Duration

	// RetentionPeriod is how long metrics are retained
	RetentionPeriod time.Duration

	// StoragePath is where metrics are stored
	StoragePath string

	// Collectors is the list of metric collectors to use
	Collectors []MetricCollector

	// EnableAggregation enables cluster-wide metric aggregation
	EnableAggregation bool

	// AggregationEndpoints is the list of endpoints to send aggregated metrics
	AggregationEndpoints []string

	// NodeID is the unique identifier for this node
	NodeID string

	// ClusterID is the identifier for the cluster
	ClusterID string

	// Tags are additional metadata tags for metrics
	Tags map[string]string
}

DistributedMetricCollectorConfig contains configuration for the distributed metric collector

func DefaultDistributedMetricCollectorConfig

func DefaultDistributedMetricCollectorConfig() *DistributedMetricCollectorConfig

DefaultDistributedMetricCollectorConfig returns the default configuration

type EmailChannel

type EmailChannel struct {
	// contains filtered or unexported fields
}

EmailChannel is a notification channel that sends emails

func NewEmailChannel

func NewEmailChannel(id, smtpServer string, smtpPort int, smtpUsername, smtpPassword, fromAddress string, recipients []string) *EmailChannel

NewEmailChannel creates a new email channel

func (*EmailChannel) ID

func (c *EmailChannel) ID() string

ID returns the channel ID

func (*EmailChannel) IsEnabled

func (c *EmailChannel) IsEnabled() bool

IsEnabled returns whether the channel is enabled

func (*EmailChannel) Send

func (c *EmailChannel) Send(notification *Notification) error

Send sends a notification via email

func (*EmailChannel) Type

func (c *EmailChannel) Type() string

Type returns the channel type

type ErrorRateTracker

type ErrorRateTracker struct {
	// contains filtered or unexported fields
}

ErrorRateTracker tracks error rate over a time window

func NewErrorRateTracker

func NewErrorRateTracker(windowMinutes int) *ErrorRateTracker

NewErrorRateTracker creates a new error rate tracker

func (*ErrorRateTracker) Rate

func (ert *ErrorRateTracker) Rate() float64

Rate returns the current error rate (errors per second)

func (*ErrorRateTracker) RecordError

func (ert *ErrorRateTracker) RecordError()

RecordError records an error occurrence
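A sliding-window rate like this is often kept as a list of timestamps that is pruned on read. The sketch below shows that approach under a few assumptions: rateTracker and fakeClock are hypothetical names, the rate is the number of errors inside the window divided by the window length in seconds, and the clock is injectable so the behavior can be checked deterministically.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rateTracker is a hypothetical sliding-window error counter.
type rateTracker struct {
	mu     sync.Mutex
	window time.Duration
	now    func() time.Time // injectable clock
	events []time.Time      // timestamps of recorded errors, oldest first
}

func newRateTracker(windowMinutes int, now func() time.Time) *rateTracker {
	if windowMinutes < 1 {
		windowMinutes = 1
	}
	return &rateTracker{window: time.Duration(windowMinutes) * time.Minute, now: now}
}

// RecordError notes that an error happened at the current clock time.
func (t *rateTracker) RecordError() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.events = append(t.events, t.now())
}

// Rate drops events older than the window and returns errors per second,
// averaged over the whole window.
func (t *rateTracker) Rate() float64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	cutoff := t.now().Add(-t.window)
	keep := 0
	for keep < len(t.events) && !t.events[keep].After(cutoff) {
		keep++
	}
	t.events = t.events[keep:]
	return float64(len(t.events)) / t.window.Seconds()
}

// fakeClock is a manually advanced clock for demos and tests.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time { return c.t }

func (c *fakeClock) Advance(seconds int) {
	c.t = c.t.Add(time.Duration(seconds) * time.Second)
}

func main() {
	clock := &fakeClock{t: time.Unix(0, 0)}
	tracker := newRateTracker(1, clock.Now)
	for i := 0; i < 6; i++ {
		tracker.RecordError()
	}
	fmt.Printf("%.2f errors/s\n", tracker.Rate())
	clock.Advance(61) // move past the one-minute window
	fmt.Printf("%.2f errors/s\n", tracker.Rate())
}
```

In real code the clock argument would simply be time.Now.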

type KVMVMManager

type KVMVMManager struct {
	// contains filtered or unexported fields
}

KVMVMManager is an implementation of VMManagerInterface for KVM hypervisors. It collects VM metrics using the libvirt API.

func NewKVMVMManager

func NewKVMVMManager(ctx context.Context, config *KVMVMManagerConfig, nodeID string) (*KVMVMManager, error)

NewKVMVMManager creates a new KVM VM Manager with the given config

func (*KVMVMManager) Close

func (m *KVMVMManager) Close() error

Close closes the KVM VM Manager and any associated resources

func (*KVMVMManager) GetVMStats

func (m *KVMVMManager) GetVMStats(ctx context.Context, vmID string, detailLevel VMMetricDetailLevel) (*VMStats, error)

GetVMStats retrieves stats for a specific VM. Implements VMManagerInterface.

func (*KVMVMManager) GetVMs

func (m *KVMVMManager) GetVMs(ctx context.Context) ([]string, error)

GetVMs returns a list of all VM IDs. Implements VMManagerInterface.

type KVMVMManagerConfig

type KVMVMManagerConfig struct {
	// URI is the libvirt connection URI (e.g., qemu:///system)
	URI string
	// RefreshInterval is how often to refresh the VM cache
	RefreshInterval time.Duration
	// MetricCacheTTL is how long to cache metrics before re-collecting
	MetricCacheTTL time.Duration
	// Timeout for libvirt operations
	Timeout time.Duration
	// Detailed metrics collection (may increase overhead)
	DetailedMetrics bool
}

KVMVMManagerConfig contains configuration for the KVM VM Manager

func DefaultKVMVMManagerConfig

func DefaultKVMVMManagerConfig() *KVMVMManagerConfig

DefaultKVMVMManagerConfig returns a default configuration for KVM VM Manager

type LatencyMetrics

type LatencyMetrics struct {
	P50 float64 `json:"p50_ms"`
	P95 float64 `json:"p95_ms"`
	P99 float64 `json:"p99_ms"`
	Avg float64 `json:"avg_ms"`
	Max float64 `json:"max_ms"`
}

LatencyMetrics contains latency percentiles

type LibvirtConnection

type LibvirtConnection interface {
	// GetDomains returns a list of all domains (VMs)
	GetDomains(ctx context.Context) ([]LibvirtDomain, error)
	// GetDomainByID returns a domain by ID
	GetDomainByID(ctx context.Context, id string) (LibvirtDomain, error)
	// Close closes the connection
	Close() error
}

LibvirtConnection interface abstracts the libvirt connection. This allows for easier testing and mocking.

type LibvirtConnectionImpl

type LibvirtConnectionImpl struct {
}

LibvirtConnectionImpl is a concrete implementation of LibvirtConnection. This would be implemented using libvirt-go in a real deployment.

func (*LibvirtConnectionImpl) Close

func (l *LibvirtConnectionImpl) Close() error

Close implements LibvirtConnection.Close

func (*LibvirtConnectionImpl) GetDomainByID

func (l *LibvirtConnectionImpl) GetDomainByID(ctx context.Context, id string) (LibvirtDomain, error)

GetDomainByID implements LibvirtConnection.GetDomainByID

func (*LibvirtConnectionImpl) GetDomains

func (l *LibvirtConnectionImpl) GetDomains(ctx context.Context) ([]LibvirtDomain, error)

GetDomains implements LibvirtConnection.GetDomains

type LibvirtDomain

type LibvirtDomain interface {
	// GetID returns the domain ID
	GetID() string
	// GetName returns the domain name
	GetName() string
	// GetState returns the domain state
	GetState() (vm.VMState, error)
	// GetCPUStats returns CPU statistics
	GetCPUStats(ctx context.Context) (*VMCPUStats, error)
	// GetMemoryStats returns memory statistics
	GetMemoryStats(ctx context.Context) (*VMMemoryStats, error)
	// GetDiskStats returns disk statistics
	GetDiskStats(ctx context.Context) (map[string]*VMDiskStats, error)
	// GetNetworkStats returns network statistics
	GetNetworkStats(ctx context.Context) (map[string]*VMNetworkStats, error)
}

LibvirtDomain interface abstracts a libvirt domain (VM)
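Because both libvirt types are interfaces, code that consumes them can be exercised without a hypervisor. The sketch below illustrates the idea with trimmed local copies of the two interfaces: only GetID and GetName are kept on the domain, since the stats types and vm.VMState are not reproduced here. The fakeDomain, fakeConn, domainNames, and demo names are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// Domain and Connection are trimmed local copies of LibvirtDomain and
// LibvirtConnection; the stats and state methods are omitted.
type Domain interface {
	GetID() string
	GetName() string
}

type Connection interface {
	GetDomains(ctx context.Context) ([]Domain, error)
	GetDomainByID(ctx context.Context, id string) (Domain, error)
	Close() error
}

// fakeDomain and fakeConn are in-memory test doubles.
type fakeDomain struct{ id, name string }

func (d fakeDomain) GetID() string   { return d.id }
func (d fakeDomain) GetName() string { return d.name }

type fakeConn struct{ domains []fakeDomain }

func (c *fakeConn) GetDomains(ctx context.Context) ([]Domain, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	out := make([]Domain, 0, len(c.domains))
	for _, d := range c.domains {
		out = append(out, d)
	}
	return out, nil
}

func (c *fakeConn) GetDomainByID(ctx context.Context, id string) (Domain, error) {
	for _, d := range c.domains {
		if d.id == id {
			return d, nil
		}
	}
	return nil, fmt.Errorf("domain %q not found", id)
}

func (c *fakeConn) Close() error { return nil }

// domainNames depends only on the Connection interface, so it works with
// either a real connection or the fake.
func domainNames(ctx context.Context, conn Connection) ([]string, error) {
	domains, err := conn.GetDomains(ctx)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(domains))
	for _, d := range domains {
		names = append(names, d.GetName())
	}
	return names, nil
}

// demo wires the fake connection into domainNames.
func demo() ([]string, error) {
	conn := &fakeConn{domains: []fakeDomain{{"1", "web-01"}, {"2", "db-01"}}}
	defer conn.Close()
	return domainNames(context.Background(), conn)
}

func main() {
	names, err := demo()
	fmt.Println(names, err) // prints "[web-01 db-01] <nil>"
}
```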

type LibvirtDomainImpl

type LibvirtDomainImpl struct {
	// contains filtered or unexported fields
}

LibvirtDomainImpl is a concrete implementation of LibvirtDomain. This would be implemented using libvirt-go in a real deployment.

func (*LibvirtDomainImpl) GetCPUStats

func (d *LibvirtDomainImpl) GetCPUStats(ctx context.Context) (*VMCPUStats, error)

GetCPUStats implements LibvirtDomain.GetCPUStats

func (*LibvirtDomainImpl) GetDiskStats

func (d *LibvirtDomainImpl) GetDiskStats(ctx context.Context) (map[string]*VMDiskStats, error)

GetDiskStats implements LibvirtDomain.GetDiskStats

func (*LibvirtDomainImpl) GetID

func (d *LibvirtDomainImpl) GetID() string

GetID implements LibvirtDomain.GetID

func (*LibvirtDomainImpl) GetMemoryStats

func (d *LibvirtDomainImpl) GetMemoryStats(ctx context.Context) (*VMMemoryStats, error)

GetMemoryStats implements LibvirtDomain.GetMemoryStats

func (*LibvirtDomainImpl) GetName

func (d *LibvirtDomainImpl) GetName() string

GetName implements LibvirtDomain.GetName

func (*LibvirtDomainImpl) GetNetworkStats

func (d *LibvirtDomainImpl) GetNetworkStats(ctx context.Context) (map[string]*VMNetworkStats, error)

GetNetworkStats implements LibvirtDomain.GetNetworkStats

func (*LibvirtDomainImpl) GetState

func (d *LibvirtDomainImpl) GetState() (vm.VMState, error)

GetState implements LibvirtDomain.GetState

type Metric

type Metric struct {
	// Name of the metric
	Name string `json:"name"`

	// Type of metric
	Type MetricType `json:"type"`

	// Value of the metric
	Value float64 `json:"value"`

	// Timestamp of the metric
	Timestamp time.Time `json:"timestamp"`

	// Tags associated with the metric
	Tags map[string]string `json:"tags"`

	// Unit of the metric (e.g., bytes, seconds, count)
	Unit string `json:"unit,omitempty"`

	// Source of the metric (e.g., node ID, component name)
	Source string `json:"source,omitempty"`
}

Metric represents a single metric data point

func NewGaugeMetric

func NewGaugeMetric(name, displayName, description string, category string) *Metric

NewGaugeMetric creates a new gauge metric

func NewMetric

func NewMetric(name string, metricType MetricType, value float64, tags map[string]string) *Metric

NewMetric creates a new metric

func (*Metric) WithSource

func (m *Metric) WithSource(source string) *Metric

WithSource sets the source of the metric

func (*Metric) WithTimestamp

func (m *Metric) WithTimestamp(timestamp time.Time) *Metric

WithTimestamp sets the timestamp of the metric

func (*Metric) WithUnit

func (m *Metric) WithUnit(unit string) *Metric

WithUnit sets the unit of the metric
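The With* setters return *Metric, which suggests they can be chained. The sketch below re-declares Metric locally so that it compiles on its own. It assumes each setter mutates the receiver and returns it, which the listing does not confirm, and newMetric is a hypothetical local stand-in for NewMetric.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type MetricType string

const MetricTypeGauge MetricType = "gauge"

// Metric mirrors the documented struct, including its JSON tags.
type Metric struct {
	Name      string            `json:"name"`
	Type      MetricType        `json:"type"`
	Value     float64           `json:"value"`
	Timestamp time.Time         `json:"timestamp"`
	Tags      map[string]string `json:"tags"`
	Unit      string            `json:"unit,omitempty"`
	Source    string            `json:"source,omitempty"`
}

// Assumption: each setter mutates the receiver and returns it for chaining.
func (m *Metric) WithUnit(unit string) *Metric       { m.Unit = unit; return m }
func (m *Metric) WithSource(source string) *Metric   { m.Source = source; return m }
func (m *Metric) WithTimestamp(ts time.Time) *Metric { m.Timestamp = ts; return m }

// newMetric is a hypothetical stand-in for NewMetric; it stamps the current time.
func newMetric(name string, t MetricType, v float64, tags map[string]string) *Metric {
	return &Metric{Name: name, Type: t, Value: v, Timestamp: time.Now(), Tags: tags}
}

func main() {
	m := newMetric("system.cpu.usage", MetricTypeGauge, 45.6, map[string]string{"host": "node-1"}).
		WithUnit("percent").
		WithSource("node-1").
		WithTimestamp(time.Date(2026, 1, 2, 3, 4, 5, 0, time.UTC))

	out, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because Unit and Source carry omitempty, a metric built without them serializes with those keys absent.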

type MetricAggregationConfig

type MetricAggregationConfig struct {
	// MetricName is the name of the metric to aggregate
	MetricName string

	// Method is the aggregation method
	Method AggregationMethod

	// TagsToAggregate are tags to group by for aggregation
	TagsToAggregate []string

	// RemoveTags are tags to remove before forwarding
	RemoveTags []string

	// AddTags are additional tags to add
	AddTags map[string]string

	// Interval is how often to aggregate and forward
	Interval time.Duration
}

MetricAggregationConfig defines aggregation configuration for a metric
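The comment on TagsToAggregate describes a group-by. The sketch below illustrates that idea with a plain average. The sample type, groupKey, and averageByTags are hypothetical names, and the methods that AggregationMethod supports are not shown in this section, so this is one possible reading rather than the aggregator's actual logic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical, trimmed stand-in for a buffered Metric.
type sample struct {
	Value float64
	Tags  map[string]string
}

// groupKey builds a stable key from the values of the group-by tags,
// in the order the tags are listed.
func groupKey(tags map[string]string, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, k := range groupBy {
		parts = append(parts, k+"="+tags[k])
	}
	return strings.Join(parts, ",")
}

// averageByTags groups samples by the given tags and averages each group.
func averageByTags(samples []sample, groupBy []string) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, s := range samples {
		k := groupKey(s.Tags, groupBy)
		sums[k] += s.Value
		counts[k]++
	}
	out := make(map[string]float64, len(sums))
	for k, sum := range sums {
		out[k] = sum / float64(counts[k])
	}
	return out
}

func main() {
	samples := []sample{
		{10, map[string]string{"region": "eu", "host": "a"}},
		{30, map[string]string{"region": "eu", "host": "b"}},
		{50, map[string]string{"region": "us", "host": "c"}},
	}
	avg := averageByTags(samples, []string{"region"})

	// Sort keys so the printed order is deterministic.
	keys := make([]string, 0, len(avg))
	for k := range avg {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, avg[k])
	}
}
```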

type MetricAggregator

type MetricAggregator struct {
	// contains filtered or unexported fields
}

MetricAggregator aggregates metrics from multiple sources

func NewMetricAggregator

func NewMetricAggregator(nodeID, clusterID string, endpoints []string) *MetricAggregator

NewMetricAggregator creates a new metric aggregator

func (*MetricAggregator) AddAggregationConfig

func (m *MetricAggregator) AddAggregationConfig(config *MetricAggregationConfig)

AddAggregationConfig adds an aggregation configuration

func (*MetricAggregator) AddMetric

func (m *MetricAggregator) AddMetric(metric *Metric)

AddMetric adds a metric to the aggregation buffer

func (*MetricAggregator) GetMetricBuffer

func (m *MetricAggregator) GetMetricBuffer(metricName string) int

GetMetricBuffer gets current buffer size for a metric

func (*MetricAggregator) RemoveAggregationConfig

func (m *MetricAggregator) RemoveAggregationConfig(metricName string) bool

RemoveAggregationConfig removes an aggregation configuration

func (*MetricAggregator) Start

func (m *MetricAggregator) Start() error

Start starts the aggregator

func (*MetricAggregator) Stop

func (m *MetricAggregator) Stop() error

Stop stops the aggregator

type MetricBatch

type MetricBatch struct {
	Metrics   []*Metric `json:"metrics"`
	Timestamp time.Time `json:"timestamp"`
	Source    string    `json:"source"`
}

MetricBatch represents a batch of metrics to be processed together

func NewMetricBatch

func NewMetricBatch(source string) *MetricBatch

NewMetricBatch creates a new metric batch

func (*MetricBatch) AddMetric

func (b *MetricBatch) AddMetric(metric *Metric)

AddMetric adds a metric to the batch

func (*MetricBatch) IsEmpty

func (b *MetricBatch) IsEmpty() bool

IsEmpty returns true if the batch is empty

func (*MetricBatch) Size

func (b *MetricBatch) Size() int

Size returns the number of metrics in the batch

type MetricCollector

type MetricCollector interface {
	// ID returns the ID of the collector
	ID() string

	// Collect collects metrics
	Collect(ctx context.Context) ([]*Metric, error)

	// Enabled returns whether the collector is enabled
	Enabled() bool
}

MetricCollector is an interface for collecting metrics
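A custom collector only needs the three methods above. The sketch below re-declares the interface and a trimmed Metric locally so that it is self-contained. queueDepthCollector, collectAll, and collectOnce are hypothetical names, and skipping disabled collectors is an assumption about how a caller would use Enabled.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Metric is a trimmed copy of the documented struct.
type Metric struct {
	Name      string
	Value     float64
	Timestamp time.Time
	Tags      map[string]string
}

// MetricCollector mirrors the documented interface.
type MetricCollector interface {
	ID() string
	Collect(ctx context.Context) ([]*Metric, error)
	Enabled() bool
}

// queueDepthCollector is a hypothetical collector that reads one value
// from a caller-supplied function.
type queueDepthCollector struct {
	enabled bool
	depth   func() int
}

func (c *queueDepthCollector) ID() string    { return "queue-depth" }
func (c *queueDepthCollector) Enabled() bool { return c.enabled }

func (c *queueDepthCollector) Collect(ctx context.Context) ([]*Metric, error) {
	// Honor cancellation before doing any work.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return []*Metric{{
		Name:      "app.queue.depth",
		Value:     float64(c.depth()),
		Timestamp: time.Now(),
		Tags:      map[string]string{"collector": c.ID()},
	}}, nil
}

// collectAll runs every enabled collector and concatenates the results.
func collectAll(ctx context.Context, collectors []MetricCollector) ([]*Metric, error) {
	var all []*Metric
	for _, c := range collectors {
		if !c.Enabled() {
			continue
		}
		ms, err := c.Collect(ctx)
		if err != nil {
			return nil, fmt.Errorf("collector %s: %w", c.ID(), err)
		}
		all = append(all, ms...)
	}
	return all, nil
}

// collectOnce bounds a single collection pass with a timeout.
func collectOnce(collectors []MetricCollector, timeout time.Duration) ([]*Metric, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return collectAll(ctx, collectors)
}

func main() {
	c := &queueDepthCollector{enabled: true, depth: func() int { return 7 }}
	ms, err := collectOnce([]MetricCollector{c}, time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(ms[0].Name, ms[0].Value)
}
```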

type MetricHistoryManager

type MetricHistoryManager struct {
	// contains filtered or unexported fields
}

MetricHistoryManager manages historical metrics

func NewMetricHistoryManager

func NewMetricHistoryManager(registry *MetricRegistry, retentionTime, cleanupInterval time.Duration) *MetricHistoryManager

NewMetricHistoryManager creates a new metric history manager

func (*MetricHistoryManager) AnalyzeMetricTrend

func (m *MetricHistoryManager) AnalyzeMetricTrend(metricID string, period time.Duration) (float64, error)

AnalyzeMetricTrend analyzes the trend of a metric

func (*MetricHistoryManager) GetHistoricalValues

func (m *MetricHistoryManager) GetHistoricalValues(metricID string, start, end time.Time) ([]MetricValue, error)

GetHistoricalValues gets historical values for a metric

func (*MetricHistoryManager) PredictMetricValue

func (m *MetricHistoryManager) PredictMetricValue(metricID string, when time.Time) (float64, error)

PredictMetricValue predicts a future metric value
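The listing does not say how the trend or the prediction is computed. A common baseline is an ordinary least-squares line through the historical points, with the slope read as the trend and the line extended to the target time. The sketch below shows that technique under that assumption; point, fitLine, predict, and demoPoints are hypothetical names and are not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// point is a hypothetical stand-in for a historical MetricValue.
type point struct {
	T time.Time
	V float64
}

// fitLine fits v = intercept + slope*x by least squares, where x is the
// number of seconds elapsed since the first point.
func fitLine(pts []point) (slope, intercept float64, err error) {
	if len(pts) < 2 {
		return 0, 0, errors.New("need at least two points")
	}
	t0 := pts[0].T
	var sx, sy, sxx, sxy float64
	for _, p := range pts {
		x := p.T.Sub(t0).Seconds()
		sx += x
		sy += p.V
		sxx += x * x
		sxy += x * p.V
	}
	n := float64(len(pts))
	den := n*sxx - sx*sx
	if den == 0 {
		return 0, 0, errors.New("all points share one timestamp")
	}
	slope = (n*sxy - sx*sy) / den
	intercept = (sy - slope*sx) / n
	return slope, intercept, nil
}

// predict extends the fitted line to the given time.
func predict(pts []point, when time.Time) (float64, error) {
	slope, intercept, err := fitLine(pts)
	if err != nil {
		return 0, err
	}
	return intercept + slope*when.Sub(pts[0].T).Seconds(), nil
}

// demoPoints returns three samples that rise by 1 unit per second.
func demoPoints() []point {
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	return []point{
		{start, 10},
		{start.Add(10 * time.Second), 20},
		{start.Add(20 * time.Second), 30},
	}
}

func main() {
	pts := demoPoints()
	slope, _, err := fitLine(pts)
	if err != nil {
		panic(err)
	}
	next, err := predict(pts, pts[2].T.Add(10*time.Second))
	if err != nil {
		panic(err)
	}
	fmt.Printf("slope=%.2f/s predicted=%.2f\n", slope, next)
}
```

A linear fit is only a reasonable predictor over short horizons and for metrics without strong seasonality.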

func (*MetricHistoryManager) Start

func (m *MetricHistoryManager) Start() error

Start starts the metric history manager

func (*MetricHistoryManager) Stop

func (m *MetricHistoryManager) Stop() error

Stop stops the metric history manager

type MetricQuery

type MetricQuery struct {
	// Pattern is the name pattern to match (supports wildcards)
	Pattern string `json:"pattern"`

	// Tags to filter by
	Tags map[string]string `json:"tags"`

	// Start time of the query range
	Start time.Time `json:"start"`

	// End time of the query range
	End time.Time `json:"end"`

	// Aggregation function to apply
	Aggregation string `json:"aggregation,omitempty"`

	// GroupBy defines how to group metrics
	GroupBy []string `json:"group_by,omitempty"`
}

MetricQuery defines parameters for querying metrics
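The Pattern comment mentions wildcards but not their syntax. The sketch below uses the standard library's path.Match as a stand-in, so that * matches any run of characters other than /, and treats Tags as exact-match filters. Both choices are assumptions; the registry's real matching rules may differ, and query and matches are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
)

// query is a trimmed, hypothetical copy of MetricQuery.
type query struct {
	Pattern string
	Tags    map[string]string
}

// matches reports whether a metric name and tag set satisfy the query.
// Every tag in the query must be present with an equal value; extra
// tags on the metric are ignored.
func matches(q query, name string, tags map[string]string) (bool, error) {
	ok, err := path.Match(q.Pattern, name)
	if err != nil || !ok {
		return false, err
	}
	for k, want := range q.Tags {
		got, present := tags[k]
		if !present || got != want {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	q := query{Pattern: "system.cpu.*", Tags: map[string]string{"host": "node-1"}}
	ok, err := matches(q, "system.cpu.usage", map[string]string{"host": "node-1", "core": "0"})
	fmt.Println(ok, err)
}
```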

type MetricRegistry

type MetricRegistry struct {
	// contains filtered or unexported fields
}

MetricRegistry manages metrics in memory

func NewMetricRegistry

func NewMetricRegistry() *MetricRegistry

NewMetricRegistry creates a new metric registry

func (*MetricRegistry) Cleanup

func (r *MetricRegistry) Cleanup(maxAge time.Duration)

Cleanup removes metrics older than the specified duration

func (*MetricRegistry) GetMetric

func (r *MetricRegistry) GetMetric(name string) (*MetricSeries, error)

GetMetric returns a merged metric series for the given metric name.

func (*MetricRegistry) GetMetricSeries

func (r *MetricRegistry) GetMetricSeries(name string) (*MetricSeries, error)

GetMetricSeries is a compatibility alias for GetMetric.

func (*MetricRegistry) GetMetrics

func (r *MetricRegistry) GetMetrics(name string) ([]*MetricSeries, error)

GetMetrics returns all metric series registered under the given name

func (*MetricRegistry) Query

func (r *MetricRegistry) Query(query MetricQuery) ([]*MetricSeries, error)

Query queries metrics from the registry

func (*MetricRegistry) Register

func (r *MetricRegistry) Register(metric *Metric)

Register adds a metric to the registry

func (*MetricRegistry) RegisterMetric

func (r *MetricRegistry) RegisterMetric(metric *Metric)

RegisterMetric is an alias for Register to match the expected interface

type MetricSeries

type MetricSeries struct {
	// Name of the metric series
	Name string `json:"name"`

	// Tags associated with the metric series
	Tags map[string]string `json:"tags"`

	// Metrics in the series
	Metrics []*Metric `json:"metrics"`
}

MetricSeries represents a series of metrics with the same name and tags

func NewMetricSeries

func NewMetricSeries(name string, tags map[string]string) *MetricSeries

NewMetricSeries creates a new metric series

func ParseMetricSeries

func ParseMetricSeries(data []byte) (*MetricSeries, error)

ParseMetricSeries parses a metric series from JSON

func (*MetricSeries) AddMetric

func (s *MetricSeries) AddMetric(metric *Metric)

AddMetric adds a metric to the series

func (*MetricSeries) Covers

func (s *MetricSeries) Covers(start, end time.Time) bool

Covers checks if the series covers the given time range

func (*MetricSeries) GetLastValue

func (s *MetricSeries) GetLastValue() *Metric

GetLastValue returns the most recent metric in the series.

func (*MetricSeries) GetValues

func (s *MetricSeries) GetValues(start, end time.Time) []*Metric

GetValues returns metrics in the requested time range.

func (*MetricSeries) PruneOlderThan

func (s *MetricSeries) PruneOlderThan(cutoff time.Time)

PruneOlderThan removes metrics older than the given time

func (*MetricSeries) Serialize

func (s *MetricSeries) Serialize() ([]byte, error)

Serialize serializes the metric series to JSON

func (*MetricSeries) Slice

func (s *MetricSeries) Slice(start, end time.Time) *MetricSeries

Slice returns a subset of the series within the given time range
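GetValues, Slice, and PruneOlderThan all filter by timestamp, but the listing does not say whether the bounds are inclusive. The sketch below assumes an inclusive range and a prune that keeps anything at or after the cutoff. The types are trimmed local copies, and valuesBetween, pruneOlderThan, and demoSeries are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Metric and MetricSeries are trimmed copies of the documented structs.
type Metric struct {
	Value     float64
	Timestamp time.Time
}

type MetricSeries struct {
	Name    string
	Metrics []*Metric
}

// valuesBetween returns metrics with start <= Timestamp <= end.
// Inclusive bounds are an assumption.
func (s *MetricSeries) valuesBetween(start, end time.Time) []*Metric {
	var out []*Metric
	for _, m := range s.Metrics {
		if !m.Timestamp.Before(start) && !m.Timestamp.After(end) {
			out = append(out, m)
		}
	}
	return out
}

// pruneOlderThan drops metrics strictly older than cutoff, reusing the
// existing backing array.
func (s *MetricSeries) pruneOlderThan(cutoff time.Time) {
	kept := s.Metrics[:0]
	for _, m := range s.Metrics {
		if !m.Timestamp.Before(cutoff) {
			kept = append(kept, m)
		}
	}
	s.Metrics = kept
}

// demoSeries builds five samples, one minute apart, with values 0..4.
func demoSeries() (*MetricSeries, time.Time) {
	base := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	s := &MetricSeries{Name: "system.cpu.usage"}
	for i := 0; i < 5; i++ {
		s.Metrics = append(s.Metrics, &Metric{
			Value:     float64(i),
			Timestamp: base.Add(time.Duration(i) * time.Minute),
		})
	}
	return s, base
}

func main() {
	s, base := demoSeries()
	window := s.valuesBetween(base.Add(time.Minute), base.Add(3*time.Minute))
	fmt.Println("in window:", len(window))

	s.pruneOlderThan(base.Add(2 * time.Minute))
	fmt.Println("after prune:", len(s.Metrics))
}
```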

type MetricType

type MetricType string

MetricType represents the type of metric

const (
	// MetricTypeGauge represents a gauge metric (a value that can go up and down)
	MetricTypeGauge MetricType = "gauge"

	// MetricTypeCounter represents a counter metric (a value that only increases)
	MetricTypeCounter MetricType = "counter"

	// MetricTypeHistogram represents a histogram metric (distribution of values)
	MetricTypeHistogram MetricType = "histogram"
)

type MetricValue

type MetricValue struct {
	Value     float64
	Timestamp time.Time
	Tags      map[string]string
}

MetricValue represents a metric value with metadata

type MetricsCollector

type MetricsCollector struct {
	// contains filtered or unexported fields
}

MetricsCollector provides application-specific metrics collection

func NewMetricsCollector

func NewMetricsCollector(monitoring *UnifiedMonitoringSystem) (*MetricsCollector, error)

NewMetricsCollector creates a new metrics collector

func (*MetricsCollector) RecordRequest

func (m *MetricsCollector) RecordRequest(ctx context.Context, method, endpoint string, status int, duration time.Duration)

RecordRequest records an HTTP request

func (*MetricsCollector) SetActiveConnections

func (m *MetricsCollector) SetActiveConnections(ctx context.Context, count int64)

SetActiveConnections updates the active connections gauge

type MetricsConfig

type MetricsConfig struct {
	CollectionInterval     time.Duration
	RetentionPeriod        time.Duration
	AnomalyThresholdStd    float64
	SLALatencyTarget       time.Duration
	SLAThroughputTarget    float64
	SLAErrorRateTarget     float64
	EnableDistributedTrace bool
	EnableProfiling        bool
	PrometheusPort         int
	PushgatewayURL         string
}

MetricsConfig configures production metrics collection

func DefaultMetricsConfig

func DefaultMetricsConfig() MetricsConfig

DefaultMetricsConfig returns production-ready configuration

type MetricsSummary

type MetricsSummary struct {
	Latency    LatencyMetrics    `json:"latency"`
	Throughput ThroughputMetrics `json:"throughput"`
	ErrorRate  float64           `json:"error_rate"`
	Timestamp  time.Time         `json:"timestamp"`
}

MetricsSummary represents a snapshot of current metrics

type MockVMManager

type MockVMManager struct {
	// contains filtered or unexported fields
}

MockVMManager implements VMManagerInterface for testing and examples

func NewMockVMManager

func NewMockVMManager(vmIDs []string) *MockVMManager

NewMockVMManager creates a new mock VM manager with the specified VM IDs

func (*MockVMManager) GetVMStats

func (m *MockVMManager) GetVMStats(ctx context.Context, vmID string, detailLevel VMMetricDetailLevel) (*VMStats, error)

GetVMStats retrieves stats for a specific VM

func (*MockVMManager) GetVMs

func (m *MockVMManager) GetVMs(ctx context.Context) ([]string, error)

GetVMs returns the list of VM IDs

type MonitoringConfig

type MonitoringConfig struct {
	// Service identification
	ServiceName string `json:"service_name"`
	Environment string `json:"environment"`
	Version     string `json:"version"`

	// Component configurations
	Dashboard  *dashboard.EngineConfig      `json:"dashboard"`
	Prometheus *prometheus.PrometheusConfig `json:"prometheus"`
	Tracing    *tracing.TracingConfig       `json:"tracing"`
	Anomaly    *ml_anomaly.DetectorConfig   `json:"anomaly"`

	// Integration settings
	EnableDashboards       bool `json:"enable_dashboards"`
	EnablePrometheus       bool `json:"enable_prometheus"`
	EnableTracing          bool `json:"enable_tracing"`
	EnableAnomalyDetection bool `json:"enable_anomaly_detection"`

	// Data retention
	MetricRetention  time.Duration `json:"metric_retention"`
	TraceRetention   time.Duration `json:"trace_retention"`
	AnomalyRetention time.Duration `json:"anomaly_retention"`

	// Performance settings
	MetricBufferSize    int           `json:"metric_buffer_size"`
	BatchProcessingSize int           `json:"batch_processing_size"`
	ProcessingInterval  time.Duration `json:"processing_interval"`

	// Security
	EnableAuthentication bool              `json:"enable_authentication"`
	TLSConfig            *TLSConfiguration `json:"tls_config"`
}

MonitoringConfig represents the configuration for the entire monitoring system
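When this configuration is loaded from JSON, note that encoding/json encodes a plain time.Duration field as an integer number of nanoseconds, not as a string such as "24h". The sketch below uses a trimmed local copy of the struct with the same JSON tags. It assumes the package adds no custom unmarshalling for these fields, and config and parseConfig are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// config is a trimmed copy of MonitoringConfig with the same JSON tags.
type config struct {
	ServiceName        string        `json:"service_name"`
	EnablePrometheus   bool          `json:"enable_prometheus"`
	MetricRetention    time.Duration `json:"metric_retention"`
	ProcessingInterval time.Duration `json:"processing_interval"`
}

// parseConfig decodes a JSON document into a config.
func parseConfig(data []byte) (*config, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	// Durations are written in nanoseconds: 24 hours and 5 seconds.
	raw := []byte(`{
		"service_name": "novacron",
		"enable_prometheus": true,
		"metric_retention": 86400000000000,
		"processing_interval": 5000000000
	}`)
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.ServiceName, c.MetricRetention, c.ProcessingInterval)

	// A human-readable string is rejected by the default decoder.
	_, err = parseConfig([]byte(`{"metric_retention": "24h"}`))
	fmt.Println("string duration accepted:", err == nil)
}
```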

func DefaultMonitoringConfig

func DefaultMonitoringConfig() *MonitoringConfig

DefaultMonitoringConfig returns a default monitoring configuration

type NetworkCollector

type NetworkCollector struct {
	// contains filtered or unexported fields
}

NetworkCollector collects network metrics

type Notification

type Notification struct {
	// Type is the notification type
	Type NotificationType `json:"type"`

	// Title is the notification title
	Title string `json:"title"`

	// Message is the notification message
	Message string `json:"message"`

	// Severity is the notification severity
	Severity string `json:"severity,omitempty"`

	// Timestamp is when the notification was created
	Timestamp time.Time `json:"timestamp"`

	// Details contains additional information
	Details map[string]interface{} `json:"details,omitempty"`
}

Notification represents a notification

func NewNotification

func NewNotification(notificationType NotificationType, title, message, severity string, details map[string]interface{}) *Notification

NewNotification creates a new notification

type NotificationChannel

type NotificationChannel interface {
	// ID returns the channel ID
	ID() string

	// Send sends a notification
	Send(notification *Notification) error

	// IsEnabled returns whether the channel is enabled
	IsEnabled() bool

	// Type returns the channel type
	Type() string
}

NotificationChannel represents a notification channel

type NotificationManager

type NotificationManager struct {
	// contains filtered or unexported fields
}

NotificationManager manages notification channels and sending notifications

func NewNotificationManager

func NewNotificationManager() *NotificationManager

NewNotificationManager creates a new notification manager

func (*NotificationManager) BroadcastNotification

func (m *NotificationManager) BroadcastNotification(notification *Notification) map[string]error

BroadcastNotification sends a notification to all channels

func (*NotificationManager) DeregisterChannel

func (m *NotificationManager) DeregisterChannel(channelID string) bool

DeregisterChannel deregisters a notification channel

func (*NotificationManager) GetChannel

func (m *NotificationManager) GetChannel(channelID string) (NotificationChannel, error)

GetChannel gets a notification channel by ID

func (*NotificationManager) ListChannels

func (m *NotificationManager) ListChannels() []NotificationChannel

ListChannels lists all notification channels

func (*NotificationManager) RegisterChannel

func (m *NotificationManager) RegisterChannel(channel NotificationChannel) error

RegisterChannel registers a notification channel

func (*NotificationManager) SendNotification

func (m *NotificationManager) SendNotification(channelID string, notification *Notification) error

SendNotification sends a notification to a channel
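BroadcastNotification returns a map[string]error, which suggests per-channel results keyed by channel ID. The sketch below implements the NotificationChannel interface with an in-memory channel and a broadcast helper. It assumes that disabled channels are skipped and that only failures are recorded in the map; neither is stated in the listing. memoryChannel and broadcast are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Notification is a trimmed copy of the documented struct.
type Notification struct {
	Title    string
	Message  string
	Severity string
}

// NotificationChannel mirrors the documented interface.
type NotificationChannel interface {
	ID() string
	Send(notification *Notification) error
	IsEnabled() bool
	Type() string
}

// memoryChannel is a hypothetical channel that stores what it is sent.
type memoryChannel struct {
	id      string
	enabled bool
	fail    bool
	sent    []*Notification
}

func (c *memoryChannel) ID() string      { return c.id }
func (c *memoryChannel) IsEnabled() bool { return c.enabled }
func (c *memoryChannel) Type() string    { return "memory" }

func (c *memoryChannel) Send(n *Notification) error {
	if c.fail {
		return errors.New("delivery failed")
	}
	c.sent = append(c.sent, n)
	return nil
}

// broadcast sends to every enabled channel and collects failures by
// channel ID. An empty map means every delivery succeeded.
func broadcast(channels []NotificationChannel, n *Notification) map[string]error {
	errs := map[string]error{}
	for _, ch := range channels {
		if !ch.IsEnabled() {
			continue
		}
		if err := ch.Send(n); err != nil {
			errs[ch.ID()] = err
		}
	}
	return errs
}

func main() {
	ok := &memoryChannel{id: "ok", enabled: true}
	bad := &memoryChannel{id: "bad", enabled: true, fail: true}
	n := &Notification{Title: "High CPU", Message: "usage above 90%", Severity: "warning"}

	for id, err := range broadcast([]NotificationChannel{ok, bad}, n) {
		fmt.Printf("channel %s: %v\n", id, err)
	}
	fmt.Println("delivered to ok:", len(ok.sent))
}
```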

type NotificationType

type NotificationType string

NotificationType represents the type of notification

const (
	// NotificationTypeAlert is a notification for an alert
	NotificationTypeAlert NotificationType = "alert"

	// NotificationTypeSystem is a notification for a system event
	NotificationTypeSystem NotificationType = "system"

	// NotificationTypeInfo is an informational notification
	NotificationTypeInfo NotificationType = "info"
)

type NovaCronMonitoringSystem

type NovaCronMonitoringSystem struct {
	// contains filtered or unexported fields
}

NovaCronMonitoringSystem is the main monitoring system that integrates all components

func NewNovaCronMonitoringSystem

func NewNovaCronMonitoringSystem(config *MonitoringConfig) (*NovaCronMonitoringSystem, error)

NewNovaCronMonitoringSystem creates a new monitoring system

func (*NovaCronMonitoringSystem) CreateDashboard

func (s *NovaCronMonitoringSystem) CreateDashboard(ctx context.Context, dashboard *dashboard.Dashboard) (*dashboard.Dashboard, error)

CreateDashboard creates a new dashboard

func (*NovaCronMonitoringSystem) GetAnomalyDetector

func (s *NovaCronMonitoringSystem) GetAnomalyDetector() *ml_anomaly.AnomalyDetector

GetAnomalyDetector returns the anomaly detector

func (*NovaCronMonitoringSystem) GetDashboardEngine

func (s *NovaCronMonitoringSystem) GetDashboardEngine() *dashboard.DashboardEngine

GetDashboardEngine returns the dashboard engine

func (*NovaCronMonitoringSystem) GetMetrics

func (s *NovaCronMonitoringSystem) GetMetrics() SystemMetrics

GetMetrics returns system metrics

func (*NovaCronMonitoringSystem) GetPrometheusIntegration

func (s *NovaCronMonitoringSystem) GetPrometheusIntegration() *prometheus.PrometheusIntegration

GetPrometheusIntegration returns the prometheus integration

func (*NovaCronMonitoringSystem) GetSystemHealth

func (s *NovaCronMonitoringSystem) GetSystemHealth() SystemHealth

GetSystemHealth returns the health status of the monitoring system

func (*NovaCronMonitoringSystem) GetTracingIntegration

func (s *NovaCronMonitoringSystem) GetTracingIntegration() *tracing.TracingIntegration

GetTracingIntegration returns the tracing integration

func (*NovaCronMonitoringSystem) RecordMetric

func (s *NovaCronMonitoringSystem) RecordMetric(metric *Metric) error

RecordMetric records a metric in the system

func (*NovaCronMonitoringSystem) Start

func (s *NovaCronMonitoringSystem) Start() error

Start starts the monitoring system

func (*NovaCronMonitoringSystem) StartSpan

func (s *NovaCronMonitoringSystem) StartSpan(ctx context.Context, component, operation string) (context.Context, *tracing.NovaCronSpan)

StartSpan starts a new trace span

func (*NovaCronMonitoringSystem) Stop

func (s *NovaCronMonitoringSystem) Stop() error

Stop stops the monitoring system

type PredictiveAnalyticsProcessor

type PredictiveAnalyticsProcessor struct {
	// contains filtered or unexported fields
}

PredictiveAnalyticsProcessor predicts future metric values

func (*PredictiveAnalyticsProcessor) Enabled

func (p *PredictiveAnalyticsProcessor) Enabled() bool

Enabled returns whether the processor is enabled

func (*PredictiveAnalyticsProcessor) ID

ID returns the processor ID

func (*PredictiveAnalyticsProcessor) Process

Process processes metrics and generates an analytics result

func (*PredictiveAnalyticsProcessor) RequiredMetrics

func (p *PredictiveAnalyticsProcessor) RequiredMetrics() []string

RequiredMetrics returns the metric patterns required for processing

func (*PredictiveAnalyticsProcessor) RequiredPreviousResults

func (p *PredictiveAnalyticsProcessor) RequiredPreviousResults() []string

RequiredPreviousResults returns the prior result IDs required for processing

type ProductionMetrics

type ProductionMetrics struct {
	// contains filtered or unexported fields
}

ProductionMetrics provides comprehensive telemetry for DWCP v3 production rollout

func NewProductionMetrics

func NewProductionMetrics(config MetricsConfig) *ProductionMetrics

NewProductionMetrics creates a new production metrics collector

func (*ProductionMetrics) GetMetricsSummary

func (pm *ProductionMetrics) GetMetricsSummary() MetricsSummary

GetMetricsSummary returns a summary of current metrics

func (*ProductionMetrics) RecordError

func (pm *ProductionMetrics) RecordError(component, errorType, severity string)

RecordError records an error occurrence

func (*ProductionMetrics) RecordMigrationLatency

func (pm *ProductionMetrics) RecordMigrationLatency(duration time.Duration, source, destination, vmSize, mode string)

RecordMigrationLatency records VM migration latency

func (*ProductionMetrics) RecordRolloutProgress

func (pm *ProductionMetrics) RecordRolloutProgress(percentage float64, stage, region string)

RecordRolloutProgress updates rollout progress

func (*ProductionMetrics) RecordThroughput

func (pm *ProductionMetrics) RecordThroughput(bytesPerSecond float64, component, direction, transport string)

RecordThroughput records data transfer throughput

func (*ProductionMetrics) Start

func (pm *ProductionMetrics) Start(ctx context.Context) error

Start begins the metrics collection loop

func (*ProductionMetrics) Stop

func (pm *ProductionMetrics) Stop()

Stop gracefully stops metrics collection

func (*ProductionMetrics) TraceOperation

func (pm *ProductionMetrics) TraceOperation(ctx context.Context, operationName string) (context.Context, trace.Span)

TraceOperation creates a traced operation context

type SimpleAlertManager

type SimpleAlertManager struct{}

SimpleAlertManager implements a basic alert manager for anomaly detection

func (*SimpleAlertManager) SendAnomalyAlert

func (am *SimpleAlertManager) SendAnomalyAlert(ctx context.Context, anomaly *ml_anomaly.Anomaly) error

func (*SimpleAlertManager) SendPredictiveAlert

func (am *SimpleAlertManager) SendPredictiveAlert(ctx context.Context, prediction *ml_anomaly.Prediction) error

func (*SimpleAlertManager) ShouldAlert

func (am *SimpleAlertManager) ShouldAlert(anomaly *ml_anomaly.Anomaly) bool

type StorageCollector

type StorageCollector struct {
	// contains filtered or unexported fields
}

StorageCollector collects storage metrics

type SystemCollector

type SystemCollector struct {
	// contains filtered or unexported fields
}

SystemCollector collects system metrics

func NewSystemCollector

func NewSystemCollector(registry *MetricRegistry, interval time.Duration) *SystemCollector

NewSystemCollector creates a new system collector

func (*SystemCollector) Collect

func (c *SystemCollector) Collect() ([]*MetricBatch, error)

Collect collects metrics

func (*SystemCollector) GetMetrics

func (c *SystemCollector) GetMetrics() []*Metric

GetMetrics gets the metrics this collector provides

func (*SystemCollector) SetCollectInterval

func (c *SystemCollector) SetCollectInterval(interval time.Duration)

SetCollectInterval sets the collection interval

func (*SystemCollector) Start

func (c *SystemCollector) Start() error

Start starts the collector

func (*SystemCollector) Stop

func (c *SystemCollector) Stop() error

Stop stops the collector

type SystemHealth

type SystemHealth struct {
	Timestamp  time.Time                  `json:"timestamp"`
	Status     string                     `json:"status"` // "healthy", "degraded", "unhealthy"
	Components map[string]ComponentHealth `json:"components"`
}

SystemHealth reports the overall status of the monitoring system together with the health of each component

type SystemMetrics

type SystemMetrics struct {
	Timestamp         time.Time `json:"timestamp"`
	DashboardCount    int       `json:"dashboard_count"`
	ActiveUsers       int       `json:"active_users"`
	AnomaliesDetected int       `json:"anomalies_detected"`
	TracesCollected   int64     `json:"traces_collected"`
	MetricsIngested   int64     `json:"metrics_ingested"`
}

type SystemUtilizationProcessor

type SystemUtilizationProcessor struct {
	// contains filtered or unexported fields
}

SystemUtilizationProcessor processes system utilization metrics

func (*SystemUtilizationProcessor) Enabled

func (p *SystemUtilizationProcessor) Enabled() bool

Enabled returns whether the processor is enabled

func (*SystemUtilizationProcessor) ID

ID returns the processor ID

func (*SystemUtilizationProcessor) Process

Process processes metrics and generates an analytics result

func (*SystemUtilizationProcessor) RequiredMetrics

func (p *SystemUtilizationProcessor) RequiredMetrics() []string

RequiredMetrics returns the metric patterns required for processing

func (*SystemUtilizationProcessor) RequiredPreviousResults

func (p *SystemUtilizationProcessor) RequiredPreviousResults() []string

RequiredPreviousResults returns the prior result IDs required for processing

type TLSConfiguration

type TLSConfiguration struct {
	CertFile string `json:"cert_file"`
	KeyFile  string `json:"key_file"`
	CAFile   string `json:"ca_file"`
}

TLSConfiguration represents TLS configuration

type ThroughputMetrics

type ThroughputMetrics struct {
	Current float64 `json:"current_gbps"`
	Average float64 `json:"average_gbps"`
	Max     float64 `json:"max_gbps"`
}

ThroughputMetrics contains throughput statistics

type TimeRange

type TimeRange struct {
	Start time.Time
	End   time.Time
}

TimeRange represents a time range

type UnifiedMonitoringSystem

type UnifiedMonitoringSystem struct {
	// contains filtered or unexported fields
}

UnifiedMonitoringSystem provides comprehensive observability

func NewUnifiedMonitoringSystem

func NewUnifiedMonitoringSystem() (*UnifiedMonitoringSystem, error)

NewUnifiedMonitoringSystem creates a new monitoring system

func (*UnifiedMonitoringSystem) GetMeter

func (u *UnifiedMonitoringSystem) GetMeter(name string) metric.Meter

GetMeter returns a meter for creating instruments

func (*UnifiedMonitoringSystem) Start

func (u *UnifiedMonitoringSystem) Start(addr string) error

Start begins the monitoring system

func (*UnifiedMonitoringSystem) Stop

func (u *UnifiedMonitoringSystem) Stop() error

Stop shuts down the monitoring system

type VMCPUStats

type VMCPUStats struct {
	// Usage percentage (0-100)
	Usage float64

	// Usage per core if available
	CoreUsage []float64

	// Number of vCPUs
	NumCPUs int

	// Time spent in steal (hypervisor overhead)
	StealTime float64

	// Ready time (time VM was ready but couldn't get CPU time)
	ReadyTime float64

	// System time percentage
	SystemTime float64

	// User time percentage
	UserTime float64

	// IO wait time percentage
	IOWaitTime float64
}

VMCPUStats contains CPU metrics for a VM

type VMDiskStats

type VMDiskStats struct {
	// Disk identifier
	DiskID string

	// Path or name
	Path string

	// Total size in bytes
	Size int64

	// Used space in bytes
	Used int64

	// Used percentage (0-100)
	UsagePercent float64

	// Read operations per second
	ReadIOPS float64

	// Write operations per second
	WriteIOPS float64

	// Read throughput in bytes per second
	ReadThroughput float64

	// Write throughput in bytes per second
	WriteThroughput float64

	// Average read latency in milliseconds
	ReadLatency float64

	// Average write latency in milliseconds
	WriteLatency float64

	// Disk type (e.g., system, data)
	Type string
}

VMDiskStats contains disk metrics for a VM disk

type VMManagerInterface

type VMManagerInterface interface {
	// GetVMs returns a list of all VM IDs
	GetVMs(ctx context.Context) ([]string, error)

	// GetVMStats retrieves stats for a specific VM
	GetVMStats(ctx context.Context, vmID string, detailLevel VMMetricDetailLevel) (*VMStats, error)
}

VMManagerInterface defines the interface for VM management

type VMMemoryStats

type VMMemoryStats struct {
	// Total memory allocated to the VM in bytes
	Total int64

	// Used memory in bytes
	Used int64

	// Used percentage (0-100)
	UsagePercent float64

	// Free memory in bytes
	Free int64

	// Swap usage in bytes
	SwapUsed int64

	// Swap total in bytes
	SwapTotal int64

	// Page faults per second
	PageFaults float64

	// Major page faults per second
	MajorPageFaults float64

	// Ballooning target if using dynamic memory
	BalloonTarget int64

	// Current balloon size if using dynamic memory
	BalloonCurrent int64
}

VMMemoryStats contains memory metrics for a VM

type VMMetricDetailLevel

type VMMetricDetailLevel int

VMMetricDetailLevel represents the level of detail for VM metrics

const (
	// BasicMetrics collects only essential metrics
	BasicMetrics VMMetricDetailLevel = iota

	// StandardMetrics collects normal operational metrics
	StandardMetrics

	// DetailedMetrics collects comprehensive metrics including per-process stats
	DetailedMetrics

	// DiagnosticMetrics collects all available metrics for troubleshooting
	DiagnosticMetrics
)

type VMMetricTypes

type VMMetricTypes struct {
	CPU              bool
	Memory           bool
	Disk             bool
	Network          bool
	IOPs             bool
	ProcessStats     bool
	ApplicationStats bool
	GuestMetrics     bool
}

VMMetricTypes represents a set of VM metric collection options

type VMNetworkStats

type VMNetworkStats struct {
	// Interface identifier
	InterfaceID string

	// Interface name
	Name string

	// Bytes received per second
	RxBytes float64

	// Bytes transmitted per second
	TxBytes float64

	// Packets received per second
	RxPackets float64

	// Packets transmitted per second
	TxPackets float64

	// Dropped packets received
	RxDropped float64

	// Dropped packets transmitted
	TxDropped float64

	// Error packets received
	RxErrors float64

	// Error packets transmitted
	TxErrors float64
}

VMNetworkStats contains network metrics for a VM interface

type VMProcessStats

type VMProcessStats struct {
	// Process ID
	PID int64

	// Process name
	Name string

	// Process command line
	Command string

	// CPU usage percentage
	CPUUsage float64

	// Memory usage percentage
	MemoryPercent float64

	// Memory usage in bytes
	MemoryUsage int64

	// Read operations per second
	ReadIOPS float64

	// Write operations per second
	WriteIOPS float64

	// Read throughput in bytes per second
	ReadThroughput float64

	// Write throughput in bytes per second
	WriteThroughput float64

	// Open file descriptors
	OpenFiles int64

	// Running time in seconds
	RunTime float64
}

VMProcessStats contains metrics for a process running in a VM

type VMState

type VMState int

VMState represents the current state of a VM

const (
	// VMStateUnknown indicates the VM state is unknown
	VMStateUnknown VMState = iota

	// VMStateRunning indicates the VM is running
	VMStateRunning

	// VMStateStopped indicates the VM is stopped
	VMStateStopped

	// VMStateStopping indicates the VM is in the process of stopping
	VMStateStopping

	// VMStateStarting indicates the VM is in the process of starting
	VMStateStarting

	// VMStateTerminated indicates the VM is terminated or deleted
	VMStateTerminated

	// VMStatePaused indicates the VM is paused
	VMStatePaused

	// VMStateSuspended indicates the VM is suspended
	VMStateSuspended
)

func (VMState) String

func (s VMState) String() string

String returns a string representation of the VM state
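The exact strings returned by String are not shown in this listing. The sketch below re-declares the constants and shows one conventional way to implement the method, with a bounds check so that out-of-range values do not panic. The lowercase names and the "unknown" fallback are assumptions.

```go
package main

import "fmt"

type VMState int

// The constants mirror the documented order, so their numeric values match.
const (
	VMStateUnknown VMState = iota
	VMStateRunning
	VMStateStopped
	VMStateStopping
	VMStateStarting
	VMStateTerminated
	VMStatePaused
	VMStateSuspended
)

// vmStateNames is indexed by VMState; the spellings are assumed.
var vmStateNames = [...]string{
	"unknown", "running", "stopped", "stopping",
	"starting", "terminated", "paused", "suspended",
}

// String implements fmt.Stringer and falls back to "unknown" for
// values outside the declared range.
func (s VMState) String() string {
	if s < 0 || int(s) >= len(vmStateNames) {
		return "unknown"
	}
	return vmStateNames[s]
}

func main() {
	// fmt picks up the String method automatically.
	fmt.Println(VMStateRunning, VMStateSuspended, VMState(42))
}
```

Because the constants use iota, inserting a new state in the middle of the block would shift every later value, so persisted integers should be treated with care.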

type VMStats

type VMStats struct {
	// VMID is the unique identifier for the VM
	VMID string

	// CPU statistics
	CPU VMCPUStats

	// Memory statistics
	Memory VMMemoryStats

	// Disk statistics
	Disks []VMDiskStats

	// Network statistics
	Networks []VMNetworkStats

	// Process statistics if available
	Processes []VMProcessStats

	// Timestamp when the stats were collected
	Timestamp time.Time
}

VMStats contains all stats for a VM

type VMTelemetryCollector

type VMTelemetryCollector struct {
	// contains filtered or unexported fields
}

VMTelemetryCollector collects detailed metrics from VMs

func NewVMTelemetryCollector

func NewVMTelemetryCollector(config *VMTelemetryCollectorConfig, collector *DistributedMetricCollector) *VMTelemetryCollector

NewVMTelemetryCollector creates a new VM telemetry collector

func (*VMTelemetryCollector) Collect

func (c *VMTelemetryCollector) Collect(ctx context.Context) ([]*Metric, error)

Collect gathers metrics from all VMs

func (*VMTelemetryCollector) Enabled

func (c *VMTelemetryCollector) Enabled() bool

Enabled returns whether the collector is enabled

func (*VMTelemetryCollector) ID

func (c *VMTelemetryCollector) ID() string

ID returns the collector ID

func (*VMTelemetryCollector) Start

func (c *VMTelemetryCollector) Start() error

Start begins metric collection

func (*VMTelemetryCollector) Stop

func (c *VMTelemetryCollector) Stop() error

Stop halts metric collection

type VMTelemetryCollectorConfig

type VMTelemetryCollectorConfig struct {
	// CollectionInterval is how often metrics are collected
	CollectionInterval time.Duration

	// VMManager is used to access VM-specific APIs
	VMManager VMManagerInterface

	// EnabledMetrics configures which metrics to collect
	EnabledMetrics VMMetricTypes

	// Tags are default tags to apply to all metrics
	Tags map[string]string

	// NodeID is the unique identifier for this node
	NodeID string

	// DetailLevel controls the granularity of metrics
	DetailLevel VMMetricDetailLevel
}

VMTelemetryCollectorConfig contains configuration for VM telemetry collection

func DefaultVMTelemetryCollectorConfig

func DefaultVMTelemetryCollectorConfig() *VMTelemetryCollectorConfig

DefaultVMTelemetryCollectorConfig returns a default configuration

type VirtualMachineCollector

type VirtualMachineCollector struct {
	// contains filtered or unexported fields
}

VirtualMachineCollector collects VM metrics

func NewVirtualMachineCollector

func NewVirtualMachineCollector(registry *MetricRegistry, interval time.Duration, vmManager interface{}) *VirtualMachineCollector

NewVirtualMachineCollector creates a new VM collector

func (*VirtualMachineCollector) Collect

func (c *VirtualMachineCollector) Collect() ([]*MetricBatch, error)

Collect collects VM metrics

func (*VirtualMachineCollector) GetMetrics

func (c *VirtualMachineCollector) GetMetrics() []*Metric

GetMetrics returns the metrics this collector provides

func (*VirtualMachineCollector) SetCollectInterval

func (c *VirtualMachineCollector) SetCollectInterval(interval time.Duration)

SetCollectInterval sets the collection interval

func (*VirtualMachineCollector) Start

func (c *VirtualMachineCollector) Start() error

Start starts the collector

func (*VirtualMachineCollector) Stop

func (c *VirtualMachineCollector) Stop() error

Stop stops the collector

type WebhookChannel

type WebhookChannel struct {
	// contains filtered or unexported fields
}

WebhookChannel is a notification channel that sends webhooks

func NewWebhookChannel

func NewWebhookChannel(id, url string, headers map[string]string, authentication, authenticationToken string) *WebhookChannel

NewWebhookChannel creates a new webhook channel

func (*WebhookChannel) ID

func (c *WebhookChannel) ID() string

ID returns the channel ID

func (*WebhookChannel) IsEnabled

func (c *WebhookChannel) IsEnabled() bool

IsEnabled returns whether the channel is enabled

func (*WebhookChannel) Send

func (c *WebhookChannel) Send(notification *Notification) error

Send sends a notification via webhook

func (*WebhookChannel) Type

func (c *WebhookChannel) Type() string

Type returns the channel type
