failover

package
v0.0.0-...-0e8ba2f
Published: Apr 3, 2026 License: Apache-2.0 Imports: 28 Imported by: 0

Documentation

Constants

const (
	// PipelineReuseFailoverReservation is used to check if a VM can reuse an existing reservation.
	// It validates host compatibility without checking capacity (since reservation already has capacity).
	PipelineReuseFailoverReservation = "kvm-valid-host-reuse-failover-reservation"

	// PipelineNewFailoverReservation is used to find a host for creating a new reservation.
	// It validates host compatibility AND checks capacity.
	PipelineNewFailoverReservation = "kvm-new-failover-reservation"

	// PipelineAcknowledgeFailoverReservation is used to validate that a failover reservation
	// is still valid for all its allocated VMs. It sends an evacuation-style scheduling request
	// for each VM with only the reservation's host as the eligible target.
	PipelineAcknowledgeFailoverReservation = "kvm-acknowledge-failover-reservation"
)

Pipeline names for failover reservation scheduling

Variables

This section is empty.

Functions

func CheckVMsStillEligible

func CheckVMsStillEligible(
	vms map[string]VM,
	failoverReservations []v1alpha1.Reservation,
) map[string][]string

CheckVMsStillEligible checks if VMs in reservations are still eligible. Returns a map of reservation name -> list of VM UUIDs that are no longer eligible.

func FindEligibleReservations

func FindEligibleReservations(
	vm VM,
	failoverReservations []v1alpha1.Reservation,
) []v1alpha1.Reservation

FindEligibleReservations finds all reservations that a VM is eligible to use.

func IsVMEligibleForReservation

func IsVMEligibleForReservation(vm VM, reservation v1alpha1.Reservation, allFailoverReservations []v1alpha1.Reservation) bool

IsVMEligibleForReservation checks if a VM is eligible to use a specific reservation. A VM is eligible if it satisfies all of the following constraints:

(1) A VM cannot reserve a slot on its own hypervisor.
(2) A VM's N reservation slots must be placed on N distinct hypervisors.
(3) For any reservation r, no two VMs that use r may be on the same hypervisor.
(4) For VM v with slots R, any other VM that uses a slot in R must not run on v's host or on the host of any slot in R.
(5) For VM v with slots R, no two other VMs that use slots in R may be on the same hypervisor.
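As an illustration of the first three constraints, here is a minimal sketch. It does not use the package's real types: vm, reservation, and their fields are simplified stand-ins invented for this example, and constraints (4) and (5) are left out.

```go
package main

import "fmt"

// vm and reservation are simplified stand-ins for the package's VM and
// v1alpha1.Reservation types. All field names here are assumptions.
type vm struct {
	UUID       string
	Hypervisor string
}

type reservation struct {
	Name    string
	Host    string   // hypervisor holding the reserved slot
	VMUUIDs []string // VMs allocated to this slot
}

// uses reports whether the VM with the given UUID is allocated to r.
func uses(r reservation, uuid string) bool {
	for _, u := range r.VMUUIDs {
		if u == uuid {
			return true
		}
	}
	return false
}

// eligible checks constraints (1) to (3) only; (4) and (5) are omitted.
func eligible(v vm, r reservation, all []reservation, vms map[string]vm) bool {
	// (1) A VM cannot reserve a slot on its own hypervisor.
	if r.Host == v.Hypervisor {
		return false
	}
	// (2) Slots the VM already uses must not share a hypervisor with r.
	for _, other := range all {
		if other.Name != r.Name && other.Host == r.Host && uses(other, v.UUID) {
			return false
		}
	}
	// (3) No VM already using r may run on the candidate's hypervisor.
	for _, uuid := range r.VMUUIDs {
		peer, ok := vms[uuid]
		if ok && uuid != v.UUID && peer.Hypervisor == v.Hypervisor {
			return false
		}
	}
	return true
}

func main() {
	vms := map[string]vm{
		"a": {UUID: "a", Hypervisor: "h1"},
		"b": {UUID: "b", Hypervisor: "h1"},
		"c": {UUID: "c", Hypervisor: "h2"},
	}
	r1 := reservation{Name: "r1", Host: "h3", VMUUIDs: []string{"b"}}
	r2 := reservation{Name: "r2", Host: "h2"}
	all := []reservation{r1, r2}

	fmt.Println(eligible(vms["a"], r1, all, vms)) // false: a and b both run on h1
	fmt.Println(eligible(vms["c"], r1, all, vms)) // true
	fmt.Println(eligible(vms["c"], r2, all, vms)) // false: r2 sits on c's own hypervisor
}
```

The intuition behind constraint (3) is that VMs sharing a slot must not fail together: if two VMs on the same hypervisor shared one slot, a single host failure would leave one of them without capacity.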

func LoggerFromContext

func LoggerFromContext(ctx context.Context) logr.Logger

LoggerFromContext returns a logger with greq and req values from the context. This creates a child logger with the request tracking values pre-attached, so you don't need to repeat them in every log call.

func ValidateFailoverReservationResources

func ValidateFailoverReservationResources(res *v1alpha1.Reservation) error

ValidateFailoverReservationResources validates that a failover reservation has valid resource keys. Returns an error if the reservation has invalid resource keys (only "cpu" and "memory" are allowed). This ensures reservations are properly considered by the scheduling filters.
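A minimal sketch of this kind of key check follows. It assumes the resources are available as a plain map keyed by resource name; the real function takes a *v1alpha1.Reservation, whose layout is not shown on this page, so validateResourceKeys and its parameter are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// allowedResourceKeys lists the only keys the documentation permits.
var allowedResourceKeys = map[string]bool{"cpu": true, "memory": true}

// validateResourceKeys is a hypothetical helper: it returns an error naming
// every key that is not "cpu" or "memory". Values are not inspected.
func validateResourceKeys(resources map[string]string) error {
	var invalid []string
	for key := range resources {
		if !allowedResourceKeys[key] {
			invalid = append(invalid, key)
		}
	}
	if len(invalid) == 0 {
		return nil
	}
	sort.Strings(invalid) // stable error text regardless of map order
	return fmt.Errorf("invalid resource keys %v: only \"cpu\" and \"memory\" are allowed", invalid)
}

func main() {
	fmt.Println(validateResourceKeys(map[string]string{"cpu": "4", "memory": "8Gi"}))
	fmt.Println(validateResourceKeys(map[string]string{"vcpus": "4", "memory": "8Gi"}))
}
```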

func WithNewGlobalRequestID

func WithNewGlobalRequestID(ctx context.Context) context.Context

WithNewGlobalRequestID creates a new context with a failover-prefixed global request ID.
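The general pattern of attaching a request ID to a context can be sketched as below. Only the "failover" prefix comes from the description above; the ID format, the context key type, and the helper names are assumptions made for illustration.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// globalRequestIDKey is an unexported key type, so other packages cannot
// collide with this context value.
type globalRequestIDKey struct{}

// newGlobalRequestID returns "failover-" followed by 16 random hex digits.
// The exact format is an assumption; the real package may differ.
func newGlobalRequestID() string {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // no usable randomness source
	}
	return "failover-" + hex.EncodeToString(buf)
}

// withNewGlobalRequestID returns a child context carrying a fresh ID.
func withNewGlobalRequestID(ctx context.Context) context.Context {
	return context.WithValue(ctx, globalRequestIDKey{}, newGlobalRequestID())
}

// globalRequestID reads the ID back, or returns "" if none was set.
func globalRequestID(ctx context.Context) string {
	id, _ := ctx.Value(globalRequestIDKey{}).(string)
	return id
}

func main() {
	ctx := withNewGlobalRequestID(context.Background())
	fmt.Println(globalRequestID(ctx))
}
```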

Types

type DBVMSource

type DBVMSource struct {
	NovaReader external.NovaReaderInterface
}

DBVMSource implements VMSource by reading directly from the database. This is the preferred implementation as it avoids the size limitations of Knowledge CRDs.

func NewDBVMSource

func NewDBVMSource(novaReader external.NovaReaderInterface) *DBVMSource

NewDBVMSource creates a new DBVMSource.

func (*DBVMSource) GetVM

func (s *DBVMSource) GetVM(ctx context.Context, vmUUID string) (*VM, error)

GetVM returns a specific VM by UUID. Returns nil, nil if the VM is not found (not an error, just doesn't exist).
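Because "not found" is reported as nil, nil, callers have to check the pointer as well as the error. The sketch below shows the three outcomes; vm, vmGetter, and memorySource are hypothetical reductions of VM, VMSource, and DBVMSource written for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// vm is a reduced stand-in for the package's VM type.
type vm struct{ UUID, CurrentHypervisor string }

// vmGetter is a hypothetical one-method slice of the VMSource interface.
type vmGetter interface {
	GetVM(ctx context.Context, vmUUID string) (*vm, error)
}

// memorySource is an in-memory stand-in for a database-backed source.
type memorySource struct {
	vms  map[string]vm
	fail bool // simulate a backend error
}

func (s memorySource) GetVM(_ context.Context, vmUUID string) (*vm, error) {
	if s.fail {
		return nil, errors.New("backend unavailable")
	}
	v, ok := s.vms[vmUUID]
	if !ok {
		return nil, nil // not found is not an error
	}
	return &v, nil
}

// describe distinguishes lookup failure, missing VM, and success.
func describe(ctx context.Context, src vmGetter, uuid string) string {
	v, err := src.GetVM(ctx, uuid)
	switch {
	case err != nil:
		return "lookup failed: " + err.Error()
	case v == nil:
		return "not found"
	default:
		return "found on " + v.CurrentHypervisor
	}
}

func main() {
	ctx := context.Background()
	src := memorySource{vms: map[string]vm{"vm-1": {UUID: "vm-1", CurrentHypervisor: "h1"}}}
	fmt.Println(describe(ctx, src, "vm-1"))                     // found on h1
	fmt.Println(describe(ctx, src, "vm-2"))                     // not found
	fmt.Println(describe(ctx, memorySource{fail: true}, "vm-1")) // lookup failed: backend unavailable
}
```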

func (*DBVMSource) ListVMs

func (s *DBVMSource) ListVMs(ctx context.Context) ([]VM, error)

ListVMs returns all VMs by joining server and flavor data from the database.

func (*DBVMSource) ListVMsOnHypervisors

func (s *DBVMSource) ListVMsOnHypervisors(
	ctx context.Context,
	hypervisorList *hv1.HypervisorList,
	trustHypervisorLocation bool,
) ([]VM, error)

ListVMsOnHypervisors returns VMs that are on the given hypervisors. If trustHypervisorLocation is true, uses hypervisor CRD as source of truth for VM location. If trustHypervisorLocation is false, uses postgres as source of truth but filters to VMs on known hypervisors. Also logs warnings about data sync issues between postgres and hypervisor CRD.
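The two modes can be pictured with the sketch below. It is a simplified model, not the package's implementation: dbVM and hypervisor are invented types, and skipping VMs that have no database row in the trusted mode is an assumption (the description only says that size and flavor data come from postgres).

```go
package main

import (
	"fmt"
	"sort"
)

// dbVM models a database row; hypervisor models a hypervisor CRD entry.
// Both types are hypothetical simplifications.
type dbVM struct {
	UUID, Host, Flavor string
}

type hypervisor struct {
	Name      string
	Instances []string // VM UUIDs the hypervisor reports as running
}

func listVMsOnHypervisors(db []dbVM, hvs []hypervisor, trustHypervisorLocation bool) []dbVM {
	rows := make(map[string]dbVM, len(db))
	for _, row := range db {
		rows[row.UUID] = row
	}
	var out []dbVM
	for _, hv := range hvs {
		if !trustHypervisorLocation {
			// Database location is authoritative; keep rows on known hypervisors.
			for _, row := range db {
				if row.Host == hv.Name {
					out = append(out, row)
				}
			}
			continue
		}
		// Hypervisor location is authoritative; flavor still comes from the row.
		for _, uuid := range hv.Instances {
			row, ok := rows[uuid]
			if !ok {
				fmt.Printf("warning: %s on %s has no database row\n", uuid, hv.Name)
				continue // assumption: no flavor data means the VM is skipped
			}
			if row.Host != hv.Name {
				fmt.Printf("warning: %s is on %s per hypervisor, %s per database\n", uuid, hv.Name, row.Host)
			}
			row.Host = hv.Name
			out = append(out, row)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].UUID < out[j].UUID })
	return out
}

func main() {
	db := []dbVM{{"vm-1", "h1", "small"}, {"vm-2", "h9", "large"}}
	hvs := []hypervisor{{"h1", []string{"vm-1"}}, {"h2", []string{"vm-2"}}}
	fmt.Println(listVMsOnHypervisors(db, hvs, true))  // vm-2 is placed on h2
	fmt.Println(listVMsOnHypervisors(db, hvs, false)) // vm-2 is dropped: h9 is unknown
}
```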

type DependencyGraph

type DependencyGraph struct {
	// contains filtered or unexported fields
}

DependencyGraph encapsulates the data structures needed for eligibility checking. It tracks relationships between VMs, reservations, and hypervisors.

type FailoverConfig

type FailoverConfig struct {
	// FlavorFailoverRequirements maps flavor name patterns to required failover count.
	// Example: {"hana_*": 2, "m1.xlarge": 1}
	// A VM with a matching flavor will need this many failover reservations.
	FlavorFailoverRequirements map[string]int `json:"flavorFailoverRequirements"`

	// ReconcileInterval is how often to check for missing failover reservations.
	// Supports Go duration strings like "30s", "1m", "15m".
	ReconcileInterval metav1.Duration `json:"reconcileInterval"`

	// Creator tag for failover reservations (for identification and cleanup).
	Creator string `json:"creator"`

	// DatasourceName is the name of the Datasource CRD that provides database connection info.
	// This is used to read VM data from the Nova database.
	DatasourceName string `json:"datasourceName"`

	// SchedulerURL is the URL of the nova external scheduler API.
	// Example: "http://localhost:8080/scheduler/nova/external"
	SchedulerURL string `json:"schedulerURL"`

	// MaxVMsToProcess limits the number of VMs to process per reconciliation cycle.
	// Set to negative to process all VMs (default behavior).
	// Useful for debugging and testing with large VM counts.
	MaxVMsToProcess int `json:"maxVMsToProcess"`

	// ShortReconcileInterval is used when MaxVMsToProcess limits processing.
	// This allows faster catch-up when there are more VMs to process.
	// Set to 0 to use ReconcileInterval (default behavior).
	// Supports Go duration strings like "100ms", "1s", "1m".
	ShortReconcileInterval metav1.Duration `json:"shortReconcileInterval"`

	// MinSuccessForShortInterval is the minimum number of successful reservations (created + reused)
	// required to use ShortReconcileInterval. Default: 1. Use 0 to require no minimum.
	MinSuccessForShortInterval *int `json:"minSuccessForShortInterval"`

	// MaxFailuresForShortInterval is the maximum number of failures allowed to still use
	// ShortReconcileInterval. Default: 99. Use 0 to allow no failures.
	MaxFailuresForShortInterval *int `json:"maxFailuresForShortInterval"`

	// TrustHypervisorLocation when true, uses the hypervisor CRD as the source of truth
	// for VM location instead of postgres (OSEXTSRVATTRHost). This is useful when there
	// are data sync issues between nova and the hypervisor operator.
	// When enabled:
	// - VM location comes from hypervisor CRD (which hypervisor lists the VM in its instances)
	// - VM size/flavor still comes from postgres (needed for scheduling)
	// Default: false (use postgres OSEXTSRVATTRHost for location)
	TrustHypervisorLocation bool `json:"trustHypervisorLocation"`

	// RevalidationInterval is how often to re-validate acknowledged failover reservations.
	// After a reservation is acknowledged, it will be re-validated after this interval
	// to ensure the reservation host is still valid for all allocated VMs.
	// Default: 30 minutes
	// Supports Go duration strings like "15m", "30m", "1h".
	RevalidationInterval metav1.Duration `json:"revalidationInterval"`

	// LimitOneNewReservationPerHypervisor when true, prevents creating multiple new
	// reservations on the same hypervisor within a single reconcile cycle.
	// This helps spread reservations across hypervisors.
	// Default: true
	LimitOneNewReservationPerHypervisor bool `json:"limitOneNewReservationPerHypervisor"`

	// VMSelectionRotationInterval controls how often the VM selection offset rotates
	// when MaxVMsToProcess limits processing. Every N reconcile cycles, the offset
	// rotates to process different VMs. This ensures all VMs eventually get processed.
	// Default: 4 (rotate every 4th reconcile cycle). Use 0 to disable rotation.
	VMSelectionRotationInterval *int `json:"vmSelectionRotationInterval"`
}

FailoverConfig defines the configuration for failover reservation management.
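How ShortReconcileInterval, MaxVMsToProcess, MinSuccessForShortInterval, and MaxFailuresForShortInterval might combine is sketched below. This is one reading of the field comments, not the controller's actual logic; config, nextInterval, and exampleConfig are hypothetical, and plain time.Duration and int replace metav1.Duration and *int for brevity.

```go
package main

import (
	"fmt"
	"time"
)

// config mirrors only the fields that influence the reconcile interval.
type config struct {
	ReconcileInterval           time.Duration
	ShortReconcileInterval      time.Duration
	MaxVMsToProcess             int
	MinSuccessForShortInterval  int
	MaxFailuresForShortInterval int
}

// nextInterval picks the delay before the next periodic reconcile.
// totalVMs, successes, and failures describe the cycle that just finished.
func nextInterval(c config, totalVMs, successes, failures int) time.Duration {
	// A negative limit means all VMs were processed, so no backlog remains.
	limited := c.MaxVMsToProcess >= 0 && totalVMs > c.MaxVMsToProcess
	if !limited || c.ShortReconcileInterval == 0 {
		return c.ReconcileInterval
	}
	// The short interval is only worthwhile if the last cycle made progress
	// without failing too often.
	if successes < c.MinSuccessForShortInterval || failures > c.MaxFailuresForShortInterval {
		return c.ReconcileInterval
	}
	return c.ShortReconcileInterval
}

// exampleConfig uses the documented defaults for the two thresholds.
func exampleConfig() config {
	return config{
		ReconcileInterval:           time.Minute,
		ShortReconcileInterval:      time.Second,
		MaxVMsToProcess:             10,
		MinSuccessForShortInterval:  1,
		MaxFailuresForShortInterval: 99,
	}
}

func main() {
	c := exampleConfig()
	fmt.Println(nextInterval(c, 50, 3, 0)) // 1s: backlog remains and progress was made
	fmt.Println(nextInterval(c, 50, 0, 0)) // 1m0s: no successful reservations
	fmt.Println(nextInterval(c, 5, 3, 0))  // 1m0s: the limit did not apply
}
```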

func DefaultConfig

func DefaultConfig() FailoverConfig

DefaultConfig returns a default configuration.

func (*FailoverConfig) ApplyDefaults

func (c *FailoverConfig) ApplyDefaults()

ApplyDefaults fills in any unset values with defaults.
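Several FailoverConfig fields are *int so that an explicit 0 can be told apart from "unset". The sketch below shows that pattern with the defaults documented in the field comments (1, 99, 4, and 30 minutes). The reduced config type and intPtr helper are hypothetical, and the real ApplyDefaults may cover more fields.

```go
package main

import (
	"fmt"
	"time"
)

// config is a reduced stand-in for FailoverConfig.
type config struct {
	RevalidationInterval        time.Duration
	MinSuccessForShortInterval  *int
	MaxFailuresForShortInterval *int
	VMSelectionRotationInterval *int
}

func intPtr(v int) *int { return &v }

// applyDefaults fills only fields that were left unset. A nil pointer means
// unset; a pointer to 0 is an explicit choice and is preserved.
func (c *config) applyDefaults() {
	if c.RevalidationInterval == 0 {
		c.RevalidationInterval = 30 * time.Minute
	}
	if c.MinSuccessForShortInterval == nil {
		c.MinSuccessForShortInterval = intPtr(1)
	}
	if c.MaxFailuresForShortInterval == nil {
		c.MaxFailuresForShortInterval = intPtr(99)
	}
	if c.VMSelectionRotationInterval == nil {
		c.VMSelectionRotationInterval = intPtr(4)
	}
}

func main() {
	c := config{VMSelectionRotationInterval: intPtr(0)} // rotation explicitly disabled
	c.applyDefaults()
	fmt.Println(c.RevalidationInterval, *c.MinSuccessForShortInterval,
		*c.MaxFailuresForShortInterval, *c.VMSelectionRotationInterval)
	// prints: 30m0s 1 99 0
}
```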

type FailoverReservationController

type FailoverReservationController struct {
	client.Client
	VMSource        VMSource
	Config          FailoverConfig
	SchedulerClient *reservations.SchedulerClient
	Recorder        events.EventRecorder // Event recorder for emitting Kubernetes events
	// contains filtered or unexported fields
}

FailoverReservationController manages failover reservations for VMs. It provides two reconciliation modes:

1. Periodic bulk reconciliation (ReconcilePeriodic): processes all VMs to ensure proper failover coverage.
2. Watch-based per-reservation reconciliation (Reconcile): handles acknowledgment and validation of individual reservations.

func NewFailoverReservationController

func NewFailoverReservationController(c client.Client, vmSource VMSource, config FailoverConfig, schedulerClient *reservations.SchedulerClient) *FailoverReservationController

func (*FailoverReservationController) Reconcile

Reconcile handles watch-based reconciliation for a single failover reservation. It validates the reservation and acknowledges it if valid, or deletes it if invalid. After processing, it requeues for periodic re-validation.

func (*FailoverReservationController) ReconcilePeriodic

func (c *FailoverReservationController) ReconcilePeriodic(ctx context.Context) (ctrl.Result, error)

ReconcilePeriodic handles the periodic bulk reconciliation of all VMs and reservations. This ensures VMs have proper failover coverage by creating, reusing, and cleaning up reservations.

TODO: consider moving steps 3-5 in particular to the watch-based reconciliation.

func (*FailoverReservationController) SetupWithManager

func (c *FailoverReservationController) SetupWithManager(mgr ctrl.Manager, mcl *multicluster.Client) error

SetupWithManager sets up the watch-based reconciler with the Manager. This handles per-reservation reconciliation triggered by CRD changes.

func (*FailoverReservationController) Start

Start implements manager.Runnable. It runs the periodic reconciliation loop at the configured interval. This can be called directly when the controller is created after the manager starts.
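A periodic loop of this shape can be sketched with the standard library alone. The code below is an assumption about the general pattern, not the controller's implementation: runPeriodic, reconcileFunc, and runFor are hypothetical names, and the reconcile callback returns a plain duration where the real ReconcilePeriodic returns a ctrl.Result.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// reconcileFunc performs one cycle and may request a custom delay before the
// next one; zero means "use the default interval".
type reconcileFunc func(ctx context.Context) (requeueAfter time.Duration, err error)

// runPeriodic calls reconcile repeatedly until ctx is cancelled.
func runPeriodic(ctx context.Context, interval time.Duration, reconcile reconcileFunc) error {
	timer := time.NewTimer(interval)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil // shutdown requested; not treated as an error here
		case <-timer.C:
			next, err := reconcile(ctx)
			if err != nil {
				fmt.Println("reconcile failed:", err) // keep looping after errors
			}
			if next <= 0 {
				next = interval
			}
			timer.Reset(next) // safe: the timer channel was just drained
		}
	}
}

// runFor drives runPeriodic for totalMs milliseconds and reports how many
// cycles ran. It exists only to make the sketch easy to try out.
func runFor(totalMs, intervalMs int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(totalMs)*time.Millisecond)
	defer cancel()
	runs := 0
	err := runPeriodic(ctx, time.Duration(intervalMs)*time.Millisecond,
		func(context.Context) (time.Duration, error) {
			runs++
			return 0, nil
		})
	return runs, err
}

func main() {
	runs, err := runFor(200, 10)
	fmt.Println("cycles:", runs, "error:", err)
}
```

Using a timer that is reset after each cycle, instead of a fixed ticker, lets a cycle shorten or lengthen the next delay, which is what a setting such as ShortReconcileInterval would need.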

type VM

type VM struct {
	// UUID is the unique identifier of the VM.
	UUID string
	// FlavorName is the name of the flavor used by the VM.
	FlavorName string
	// ProjectID is the OpenStack project ID that owns the VM.
	ProjectID string
	// CurrentHypervisor is the hypervisor where the VM is currently running.
	CurrentHypervisor string
	// AvailabilityZone is the availability zone where the VM is located.
	// This is used to ensure failover reservations are created in the same AZ.
	AvailabilityZone string
	// Resources contains the VM's resource allocations (e.g., "memory", "vcpus").
	Resources map[string]resource.Quantity
	// FlavorExtraSpecs contains the flavor's extra specifications (e.g., traits, capabilities).
	// This is used by filters like filter_has_requested_traits and filter_capabilities.
	FlavorExtraSpecs map[string]string
}

VM represents a virtual machine that may need failover reservations.
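How many failover reservations a VM needs depends on its FlavorName and on FailoverConfig.FlavorFailoverRequirements. The sketch below assumes the patterns are shell-style globs, which fits the "hana_*" example, and that the highest matching count wins. Both are assumptions, as are the helper name and the flavor name "hana_c128_m2048".

```go
package main

import (
	"fmt"
	"path"
)

// requiredFailoverCount returns the largest count among all patterns that
// match flavorName, or 0 if none match. Malformed patterns are ignored.
func requiredFailoverCount(flavorName string, requirements map[string]int) int {
	required := 0
	for pattern, count := range requirements {
		matched, err := path.Match(pattern, flavorName)
		if err != nil || !matched {
			continue
		}
		if count > required {
			required = count
		}
	}
	return required
}

func main() {
	requirements := map[string]int{"hana_*": 2, "m1.xlarge": 1}
	for _, flavor := range []string{"hana_c128_m2048", "m1.xlarge", "m1.small"} {
		fmt.Println(flavor, requiredFailoverCount(flavor, requirements))
	}
	// prints:
	// hana_c128_m2048 2
	// m1.xlarge 1
	// m1.small 0
}
```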

type VMSource

type VMSource interface {
	// ListVMs returns all VMs that might need failover reservations.
	ListVMs(ctx context.Context) ([]VM, error)
	// ListVMsOnHypervisors returns VMs that are on the given hypervisors.
	// If trustHypervisorLocation is true, uses hypervisor CRD as source of truth for VM location.
	// If trustHypervisorLocation is false, uses postgres as source of truth but filters to VMs on known hypervisors.
	// Also logs warnings about data sync issues between postgres and hypervisor CRD.
	ListVMsOnHypervisors(ctx context.Context, hypervisorList *hv1.HypervisorList, trustHypervisorLocation bool) ([]VM, error)
	// GetVM returns a specific VM by UUID.
	// Returns nil, nil if the VM is not found (not an error, just doesn't exist).
	GetVM(ctx context.Context, vmUUID string) (*VM, error)
}

VMSource provides VMs that may need failover reservations. This interface allows swapping the implementation when a VM CRD arrives.
