Documentation
Overview
Package podautoscaler provides controllers for managing PodAutoscaler resources. The controller supports three scaling strategies:

- HPA: Creates and manages Kubernetes HorizontalPodAutoscaler resources (KEDA-like wrapper)
- KPA: Knative-style Pod Autoscaling with panic/stable windows
- APA: Application-specific Pod Autoscaling with custom metrics
Architecture:

- Stateless autoscaler management: AutoScalers are created on-demand for each reconciliation
- HPA wrapper: For the HPA strategy, we create and manage actual K8s HPA resources
- Custom scaling: For KPA/APA, we directly compute and apply scaling decisions
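The strategy split above can be sketched as a simple dispatch. This is an illustrative, self-contained sketch, not the controller's actual code; the strategy names and return strings are assumptions made for the example.

```go
package main

import "fmt"

// ScalingStrategy mirrors the three strategies described above.
// The values are illustrative; the real constants live in the aibrix API types.
type ScalingStrategy string

const (
	HPA ScalingStrategy = "HPA" // wrap a native HorizontalPodAutoscaler
	KPA ScalingStrategy = "KPA" // Knative-style panic/stable windows
	APA ScalingStrategy = "APA" // application-specific custom metrics
)

// reconcileByStrategy sketches the dispatch the controller performs:
// HPA delegates to a managed K8s HPA object, while KPA/APA compute
// and apply scaling decisions directly.
func reconcileByStrategy(s ScalingStrategy) string {
	switch s {
	case HPA:
		return "ensure managed HorizontalPodAutoscaler exists"
	case KPA, APA:
		return "compute desired replicas and apply directly"
	default:
		return "invalid scaling strategy"
	}
}

func main() {
	fmt.Println(reconcileByStrategy(HPA))
	fmt.Println(reconcileByStrategy(KPA))
}
```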
Index
- Constants
- Variables
- func Add(mgr manager.Manager, runtimeConfig config.RuntimeConfig) error
- func GetReadyPodsCount(ctx context.Context, client client.Client, namespace string, ...) (int, error)
- type AutoScaler
- type DefaultAutoScaler
- type PodAutoscalerReconciler
- type ReplicaComputeRequest
- type ReplicaComputeResult
- type ScaleDecision
- type ScalingTargetKey
- type ValidationResult
- type WorkloadScale
Constants
const (
	RayClusterFleet = "RayClusterFleet"
	StormService    = "StormService"
)
const (
	ConditionReady         = "Ready"
	ConditionValidSpec     = "ValidSpec"
	ConditionConflict      = "MultiPodAutoscalerConflict"
	ConditionScalingActive = "ScalingActive"
	ConditionAbleToScale   = "AbleToScale"

	ReasonAsExpected             = "AsExpected"
	ReasonReconcilingScaleDiff   = "ReconcilingScaleDiff"
	ReasonStable                 = "Stable"
	ReasonInvalidScalingStrategy = "InvalidScalingStrategy"
	ReasonInvalidBounds          = "InvalidBounds"
	ReasonMissingTargetRef       = "MissingScaleTargetRef"
	ReasonMetricsConfigError     = "MetricsConfigError"
	ReasonInvalidSpec            = "InvalidSpec"
	ReasonConfigured             = "Configured"
)
const AutoscalingStormServiceModeAnnotationKey = "autoscaling.aibrix.ai/storm-service-mode"
Variables

Functions

Types

type AutoScaler
type AutoScaler interface {
// ComputeDesiredReplicas performs metric-based scaling calculation.
// This is the primary method for scaling decisions and returns only the recommendation.
// It does NOT perform any actual scaling operations.
// All per-PA configuration is extracted from the PodAutoscaler spec on each call.
ComputeDesiredReplicas(ctx context.Context, request ReplicaComputeRequest) (*ReplicaComputeResult, error)
}
AutoScaler provides scaling decision capabilities based on metrics and algorithms. This interface focuses purely on scaling logic without actual resource manipulation. All implementations are stateless and thread-safe, supporting concurrent reconciliation.
type DefaultAutoScaler
type DefaultAutoScaler struct {
// contains filtered or unexported fields
}
DefaultAutoScaler implements the complete scaling pipeline. All components are stateless or thread-safe, allowing concurrent reconciliation.
func NewDefaultAutoScaler
func NewDefaultAutoScaler(
	factory metrics.MetricFetcherFactory,
	client client.Client,
) *DefaultAutoScaler
NewDefaultAutoScaler creates a new default autoscaler.
func (*DefaultAutoScaler) ComputeDesiredReplicas
func (a *DefaultAutoScaler) ComputeDesiredReplicas(ctx context.Context, request ReplicaComputeRequest) (*ReplicaComputeResult, error)
ComputeDesiredReplicas computes desired replicas based on all metrics in MetricsSources. It returns the maximum recommended replicas across all valid metrics.
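The "maximum across all valid metrics" aggregation can be isolated as a small pure function. This is an illustrative sketch with a simplified local result type, not the package's actual implementation.

```go
package main

import "fmt"

// result is a simplified stand-in for ReplicaComputeResult, keeping
// only the fields needed to show the aggregation rule.
type result struct {
	DesiredReplicas int32
	Valid           bool
}

// maxValidReplicas mirrors the documented behavior: return the maximum
// recommendation across all valid metric results. Invalid results are
// skipped; ok is false when no metric produced a valid recommendation.
func maxValidReplicas(results []result) (max int32, ok bool) {
	for _, r := range results {
		if !r.Valid {
			continue
		}
		if !ok || r.DesiredReplicas > max {
			max = r.DesiredReplicas
			ok = true
		}
	}
	return max, ok
}

func main() {
	got, ok := maxValidReplicas([]result{
		{DesiredReplicas: 3, Valid: true},
		{DesiredReplicas: 7, Valid: true},
		{DesiredReplicas: 9, Valid: false}, // invalid metrics are ignored
	})
	fmt.Println(got, ok)
}
```

Taking the maximum is the conservative choice for scale-out: if any one metric indicates more capacity is needed, the workload gets it.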
type PodAutoscalerReconciler
type PodAutoscalerReconciler struct {
client.Client
Scheme *runtime.Scheme
EventRecorder record.EventRecorder
Mapper apimeta.RESTMapper
RuntimeConfig config.RuntimeConfig
// contains filtered or unexported fields
}
PodAutoscalerReconciler reconciles a PodAutoscaler object. It uses stateless autoscaler management where AutoScalers are created on-demand for each reconciliation cycle, avoiding memory leaks and stale state issues.
type ReplicaComputeRequest
type ReplicaComputeRequest struct {
PodAutoscaler autoscalingv1alpha1.PodAutoscaler
ScalingContext scalingctx.ScalingContext // Single source of truth for PA-level configuration
CurrentReplicas int32
Pods []corev1.Pod
Timestamp time.Time
}
ReplicaComputeRequest represents a request for replica calculation. This type is used both for the public interface and internal pipeline processing.
type ReplicaComputeResult
type ReplicaComputeResult struct {
DesiredReplicas int32
Algorithm string
Reason string
Valid bool
}
ReplicaComputeResult represents the result of replica calculation.
type ScaleDecision
ScaleDecision represents a scaling decision made by the autoscaler.
type ScalingTargetKey

type ValidationResult

type WorkloadScale
type WorkloadScale interface {
// Validate checks if the target is valid and scalable
Validate(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler) error
// GetCurrentReplicasFromScale extracts the current replica count from a scale object
GetCurrentReplicasFromScale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler, scale *unstructured.Unstructured) (int32, error)
// SetDesiredReplicas updates the replica count
SetDesiredReplicas(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler, replicas int32) error
// GetPodSelectorFromScale extracts the label selector from an existing scale object
// For role-level scaling, it adds the role label requirement
// This avoids re-fetching the scale object when the controller already has it
GetPodSelectorFromScale(ctx context.Context, pa *autoscalingv1alpha1.PodAutoscaler, scale *unstructured.Unstructured) (labels.Selector, error)
}
WorkloadScale provides scaling operations for different workload types. It provides the mechanism to get/set replica counts on workload resources, while AutoScaler provides the intelligence to compute desired replica counts. The interface is stateless - all methods take PodAutoscaler as a parameter.
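The compute/apply split between AutoScaler (intelligence) and WorkloadScale (mechanism) can be shown with two tiny interfaces. These local interfaces and the `reconcileOnce` helper are simplified stand-ins invented for the example; the real interfaces take a context and the PodAutoscaler object.

```go
package main

import "fmt"

// computer stands in for the AutoScaler side: decide how many replicas.
type computer interface{ Compute(current int32) int32 }

// scaler stands in for the WorkloadScale side: apply the decision to
// the workload's scale subresource.
type scaler interface{ SetDesiredReplicas(n int32) error }

// doubler is a toy scaling algorithm used only for illustration.
type doubler struct{}

func (doubler) Compute(current int32) int32 { return current * 2 }

// fakeScale records the applied value instead of patching a real
// workload, standing in for a WorkloadScale implementation.
type fakeScale struct{ applied int32 }

func (f *fakeScale) SetDesiredReplicas(n int32) error {
	f.applied = n
	return nil
}

// reconcileOnce shows the division of labor: the AutoScaler-like
// component decides, the WorkloadScale-like component applies.
func reconcileOnce(c computer, s scaler, current int32) error {
	return s.SetDesiredReplicas(c.Compute(current))
}

func main() {
	fs := &fakeScale{}
	_ = reconcileOnce(doubler{}, fs, 4)
	fmt.Println(fs.applied)
}
```

Keeping both sides stateless, as the package documentation describes, means each reconciliation cycle can construct them fresh without caching or cleanup.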
func NewWorkloadScale
func NewWorkloadScale(
	client client.Client,
	restMapper meta.RESTMapper,
) WorkloadScale
NewWorkloadScale creates a stateless WorkloadScale implementation.