Documentation ¶
Overview ¶
Package upgrade implements the cluster upgrade orchestrator: plan generation, the control-plane / addon / nodegroup phases, and the sequencing engine that runs a plan with health gates between phases.
EKS upgrades one minor version at a time, so a multi-minor upgrade expands into sequential "hops" (1.31 → 1.32 → 1.33). Each hop runs control plane → addons (dependency order, versions compatible with the hop target) → nodegroup rolls, with a gate after every phase.
The orchestrator is resumable by re-derivation rather than state files: BuildPlan inspects actual cluster state and marks already-satisfied steps completed, so rerunning `refresh cluster upgrade --to X` after a failure (or after a success) executes only what remains.
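The hop expansion described above can be illustrated with a small standalone sketch. It is not the package's implementation: the `hop` type, `parseMinor`, and `expandHops` are hypothetical names, and the sketch assumes versions are plain "major.minor" strings within a single major version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hop mirrors the From/To pair of the package's Hop type (Steps omitted).
// It is a hypothetical stand-in so the sketch compiles on its own.
type hop struct {
	From, To string
}

// parseMinor splits a "major.minor" version string such as "1.31".
func parseMinor(v string) (major, minor int, err error) {
	parts := strings.Split(v, ".")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("version %q is not major.minor", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major in %q: %w", v, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("bad minor in %q: %w", v, err)
	}
	return major, minor, nil
}

// expandHops returns one hop per minor version between current and target.
// It returns no hops when current is already at or past target.
func expandHops(current, target string) ([]hop, error) {
	cMaj, cMin, err := parseMinor(current)
	if err != nil {
		return nil, err
	}
	tMaj, tMin, err := parseMinor(target)
	if err != nil {
		return nil, err
	}
	if cMaj != tMaj {
		return nil, fmt.Errorf("major version change %s -> %s is not handled by this sketch", current, target)
	}
	var hops []hop
	for m := cMin; m < tMin; m++ {
		hops = append(hops, hop{
			From: fmt.Sprintf("%d.%d", cMaj, m),
			To:   fmt.Sprintf("%d.%d", cMaj, m+1),
		})
	}
	return hops, nil
}

func main() {
	hops, err := expandHops("1.31", "1.33")
	if err != nil {
		panic(err)
	}
	for _, h := range hops {
		fmt.Printf("%s -> %s\n", h.From, h.To)
	}
}
```

Comparing minors numerically rather than as strings matters once versions such as 1.9 and 1.10 are involved.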
Index ¶
- Variables
- type ConfirmFunc
- type EKSAPI
- type ExecuteOptions
- type Hop
- type NodegroupGate
- type NodegroupRollOptions
- type Plan
- type PlanOptions
- type ProgressFunc
- type Report
- type Service
- func (s *Service) BuildPlan(ctx context.Context, clusterName, targetVersion string, opts PlanOptions) (*Plan, error)
- func (s *Service) Execute(ctx context.Context, plan *Plan, opts ExecuteOptions) (*Report, error)
- func (s *Service) UpgradeAddons(ctx context.Context, clusterName, targetVersion string, skip []string, ...) error
- func (s *Service) UpgradeControlPlane(ctx context.Context, clusterName, targetVersion string, progress ProgressFunc) error
- func (s *Service) UpgradeNodegroups(ctx context.Context, clusterName, targetVersion string, ...) error
- type Step
- type StepStatus
- type StepType
Constants ¶
This section is empty.
Variables ¶
var ErrAborted = errors.New("upgrade aborted by user")
ErrAborted is returned by Execute when the user declines a phase confirmation. The declined phase makes no changes to the cluster.
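A caller will usually want to tell a user abort apart from a real failure. The sketch below shows one way to do that with errors.Is. It is standalone: `errAborted` stands in for ErrAborted, `runPhase` is a hypothetical stand-in for one confirmed phase, and the confirm callback's signature is an assumption, since the ConfirmFunc declaration is not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// errAborted stands in for the package's ErrAborted so this sketch
// compiles on its own.
var errAborted = errors.New("upgrade aborted by user")

// runPhase is a hypothetical stand-in for one confirmed phase.
// The confirm signature is assumed, not taken from the package.
func runPhase(name string, confirm func(phase string) bool, mutate func() error) error {
	if !confirm(name) {
		// Declined: return before mutate runs, so nothing is changed.
		return fmt.Errorf("phase %s: %w", name, errAborted)
	}
	return mutate()
}

// outcome classifies an execution error. errors.Is matches the sentinel
// whether or not it has been wrapped.
func outcome(err error) string {
	switch {
	case err == nil:
		return "done"
	case errors.Is(err, errAborted):
		return "aborted"
	default:
		return "failed"
	}
}

func main() {
	touched := false
	err := runPhase("control-plane",
		func(string) bool { return false }, // the user answers "no"
		func() error { touched = true; return nil })
	fmt.Println(outcome(err), touched)
}
```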
Functions ¶
This section is empty.
Types ¶
type ConfirmFunc ¶
ConfirmFunc asks the user to approve a mutating phase; returning false aborts the run before the phase starts.
type EKSAPI ¶
type EKSAPI interface {
	DescribeCluster(ctx context.Context, params *eks.DescribeClusterInput, optFns ...func(*eks.Options)) (*eks.DescribeClusterOutput, error)
	UpdateClusterVersion(ctx context.Context, params *eks.UpdateClusterVersionInput, optFns ...func(*eks.Options)) (*eks.UpdateClusterVersionOutput, error)
	DescribeUpdate(ctx context.Context, params *eks.DescribeUpdateInput, optFns ...func(*eks.Options)) (*eks.DescribeUpdateOutput, error)
	DescribeClusterVersions(ctx context.Context, params *eks.DescribeClusterVersionsInput, optFns ...func(*eks.Options)) (*eks.DescribeClusterVersionsOutput, error)
	ListInsights(ctx context.Context, params *eks.ListInsightsInput, optFns ...func(*eks.Options)) (*eks.ListInsightsOutput, error)
	ListAddons(ctx context.Context, params *eks.ListAddonsInput, optFns ...func(*eks.Options)) (*eks.ListAddonsOutput, error)
	DescribeAddon(ctx context.Context, params *eks.DescribeAddonInput, optFns ...func(*eks.Options)) (*eks.DescribeAddonOutput, error)
	DescribeAddonVersions(ctx context.Context, params *eks.DescribeAddonVersionsInput, optFns ...func(*eks.Options)) (*eks.DescribeAddonVersionsOutput, error)
	UpdateAddon(ctx context.Context, params *eks.UpdateAddonInput, optFns ...func(*eks.Options)) (*eks.UpdateAddonOutput, error)
	ListNodegroups(ctx context.Context, params *eks.ListNodegroupsInput, optFns ...func(*eks.Options)) (*eks.ListNodegroupsOutput, error)
	DescribeNodegroup(ctx context.Context, params *eks.DescribeNodegroupInput, optFns ...func(*eks.Options)) (*eks.DescribeNodegroupOutput, error)
	UpdateNodegroupVersion(ctx context.Context, params *eks.UpdateNodegroupVersionInput, optFns ...func(*eks.Options)) (*eks.UpdateNodegroupVersionOutput, error)
}
EKSAPI abstracts the EKS client methods the upgrade orchestrator uses. It is a superset of addons.EKSAPI so one client (or mock) serves both.
type ExecuteOptions ¶
type ExecuteOptions struct {
	// Yes skips all phase confirmations (--yes).
	Yes bool
	// Confirm prompts before each mutating phase when Yes is false. Required
	// unless Yes is true.
	Confirm ConfirmFunc
	// Progress receives human-readable progress lines.
	Progress ProgressFunc
	// SkipAddons / SkipNodegroups / Force mirror the plan options and are
	// passed through to the phase executors.
	SkipAddons     []string
	SkipNodegroups []string
	Force          bool
	// NodegroupGate overrides the built-in pre-roll health gate.
	NodegroupGate NodegroupGate
}
ExecuteOptions tunes plan execution.
type Hop ¶
type Hop struct {
	From  string `json:"from" yaml:"from"`
	To    string `json:"to" yaml:"to"`
	Steps []Step `json:"steps" yaml:"steps"`
}
Hop is a single minor-version upgrade cycle within the plan.
type NodegroupGate ¶
NodegroupGate is a pre-flight check run before each nodegroup roll. A nil gate falls back to the built-in check (nodegroup ACTIVE with no reported health issues).
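The nil-means-built-in rule can be sketched as follows. The NodegroupGate declaration is not shown on this page, so the `gate` signature and the `nodegroupHealth` struct here are assumptions made for illustration; only the rule itself (ACTIVE and no reported health issues) comes from the description above.

```go
package main

import "fmt"

// nodegroupHealth is a hypothetical snapshot of what a gate inspects.
type nodegroupHealth struct {
	Name   string
	Status string   // for example "ACTIVE" or "DEGRADED"
	Issues []string // reported health issue codes, if any
}

// gate is an assumed signature; the real NodegroupGate type may differ.
type gate func(ng nodegroupHealth) error

// builtinGate passes only an ACTIVE nodegroup with no reported health issues.
func builtinGate(ng nodegroupHealth) error {
	if ng.Status != "ACTIVE" {
		return fmt.Errorf("nodegroup %s is %s, want ACTIVE", ng.Name, ng.Status)
	}
	if len(ng.Issues) > 0 {
		return fmt.Errorf("nodegroup %s reports %d health issue(s)", ng.Name, len(ng.Issues))
	}
	return nil
}

// resolveGate applies the fallback: a nil gate means the built-in check.
func resolveGate(g gate) gate {
	if g == nil {
		return builtinGate
	}
	return g
}

func main() {
	check := resolveGate(nil)
	fmt.Println(check(nodegroupHealth{Name: "workers", Status: "ACTIVE"}))
	fmt.Println(check(nodegroupHealth{Name: "workers", Status: "DEGRADED"}))
}
```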
type NodegroupRollOptions ¶
type NodegroupRollOptions struct {
	// SkipPatterns are substring patterns for nodegroups to leave alone.
	SkipPatterns []string
	// Force terminates pods that can't be drained due to PDBs (passed
	// through to UpdateNodegroupVersion).
	Force bool
	// Gate overrides the built-in pre-flight health gate.
	Gate NodegroupGate
}
NodegroupRollOptions tunes the nodegroup phase.
type Plan ¶
type Plan struct {
	ClusterName    string   `json:"clusterName" yaml:"clusterName"`
	CurrentVersion string   `json:"currentVersion" yaml:"currentVersion"`
	TargetVersion  string   `json:"targetVersion" yaml:"targetVersion"`
	Hops           []Hop    `json:"hops" yaml:"hops"`
	Warnings       []string `json:"warnings,omitempty" yaml:"warnings,omitempty"`
}
Plan is the full ordered upgrade plan for a cluster.
func (*Plan) PendingSteps ¶
PendingSteps counts mutating steps the engine would actually execute. Readiness steps are checks, not work, so they don't count.
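A minimal sketch of the counting rule, using local copies of the plan shape. The step type strings ("readiness", "control-plane", "addon", "nodegroup") are hypothetical, because the StepType values are not listed on this page; "pending" and "completed" are the documented StepStatus values. Treating every pending non-readiness step as mutating is an assumption of this sketch.

```go
package main

import "fmt"

// step, hop and plan are trimmed local copies of Step, Hop and Plan.
type step struct {
	Type   string // hypothetical values; the real StepType constants are not shown
	Status string // "pending", "completed", "blocked" or "manual"
}

type hop struct {
	Steps []step
}

type plan struct {
	Hops []hop
}

// pendingSteps counts steps that are still pending and are not readiness
// checks, across all hops.
func pendingSteps(p plan) int {
	n := 0
	for _, h := range p.Hops {
		for _, s := range h.Steps {
			if s.Status == "pending" && s.Type != "readiness" {
				n++
			}
		}
	}
	return n
}

func main() {
	p := plan{Hops: []hop{{Steps: []step{
		{Type: "readiness", Status: "pending"},
		{Type: "control-plane", Status: "completed"},
		{Type: "addon", Status: "pending"},
		{Type: "nodegroup", Status: "pending"},
	}}}}
	fmt.Println(pendingSteps(p))
}
```

A result of zero corresponds to the rerun-after-success case, where nothing is left to execute.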
type PlanOptions ¶
type PlanOptions struct {
	// SkipAddons are addon names the user manages out-of-band (Helm/GitOps);
	// they appear in the plan as manual steps and are never mutated.
	SkipAddons []string
	// SkipNodegroups are substring patterns for nodegroups to leave alone.
	SkipNodegroups []string
}
PlanOptions tunes plan generation.
type ProgressFunc ¶
ProgressFunc receives human-readable progress lines during execution.
type Report ¶
type Report struct {
	Completed []string `json:"completed,omitempty" yaml:"completed,omitempty"`
	FailedAt  string   `json:"failedAt,omitempty" yaml:"failedAt,omitempty"`
	Remaining []string `json:"remaining,omitempty" yaml:"remaining,omitempty"`
}
Report describes how far an execution got: what ran, where it stopped, and what remains. Rerunning the same command resumes from live cluster state.
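Because every field carries omitempty, a report serializes only the parts that apply. The sketch below uses a local copy of the struct with the same JSON tags; the step labels are made-up strings, not the package's actual wording.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// report copies the field layout and JSON tags of the Report type.
type report struct {
	Completed []string `json:"completed,omitempty"`
	FailedAt  string   `json:"failedAt,omitempty"`
	Remaining []string `json:"remaining,omitempty"`
}

// encode marshals a report to a JSON string.
func encode(r report) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A run that failed partway: all three fields are present.
	fmt.Println(encode(report{
		Completed: []string{"control plane 1.32"},
		FailedAt:  "addon coredns",
		Remaining: []string{"nodegroup workers"},
	}))
	// A clean run: failedAt and remaining are omitted.
	fmt.Println(encode(report{Completed: []string{"control plane 1.32"}}))
}
```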
type Service ¶
type Service struct {
	// PollInterval is how often in-flight updates are re-checked.
	// Tests shrink it; defaults to defaultPollInterval.
	PollInterval time.Duration
	// contains filtered or unexported fields
}
Service builds and executes cluster upgrade plans.
func NewService ¶
NewService creates the upgrade orchestrator service.
func (*Service) BuildPlan ¶
func (s *Service) BuildPlan(ctx context.Context, clusterName, targetVersion string, opts PlanOptions) (*Plan, error)
BuildPlan derives the full ordered upgrade plan for clusterName to reach targetVersion. The plan is also the resume mechanism: steps whose desired state is already satisfied by the live cluster are marked completed, so a rerun after a partial upgrade (or a second run after success) executes only what remains.
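func (*Service) Execute ¶
func (s *Service) Execute(ctx context.Context, plan *Plan, opts ExecuteOptions) (*Report, error)
Execute runs the plan: hops run in order, and within each hop the phases run in order (control plane → addons → nodegroups). It asks for confirmation before every mutating phase unless opts.Yes is set, and on the first failure it halts and returns a precise report of what completed, where it failed, and what remains.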
Execution state lives in the cluster itself, not in Execute: steps already marked completed by BuildPlan are skipped, and each phase executor re-checks live state, so rerunning after a failure (or a SIGINT, or a complete success) is safe and only performs the remaining work.
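The skip-completed, halt-on-first-failure behavior can be modeled in a few lines. This is a simplified sketch, not the engine itself: the `step` struct with its `Run` callback is hypothetical, confirmations are left out, and the sketch assumes the report's completed list holds only steps that ran in this invocation.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical flattened plan entry with a callback to run it.
type step struct {
	Name   string
	Status string // "pending" or "completed", as derived at plan time
	Run    func() error
}

// report mirrors the three fields of the Report type.
type report struct {
	Completed []string
	FailedAt  string
	Remaining []string
}

var errBoom = errors.New("boom")

func ok() error   { return nil }
func fail() error { return errBoom }

// execute runs steps in order, skips those already completed, and halts on
// the first failure with everything not yet done listed as remaining.
func execute(steps []step) (report, error) {
	var r report
	for i, s := range steps {
		if s.Status == "completed" {
			continue // already satisfied by live cluster state
		}
		if err := s.Run(); err != nil {
			r.FailedAt = s.Name
			for _, rest := range steps[i+1:] {
				if rest.Status != "completed" {
					r.Remaining = append(r.Remaining, rest.Name)
				}
			}
			return r, fmt.Errorf("step %q: %w", s.Name, err)
		}
		r.Completed = append(r.Completed, s.Name)
	}
	return r, nil
}

func main() {
	r, err := execute([]step{
		{Name: "control plane 1.32", Status: "completed", Run: ok},
		{Name: "addon vpc-cni", Status: "pending", Run: ok},
		{Name: "nodegroup a", Status: "pending", Run: fail},
		{Name: "nodegroup b", Status: "pending", Run: ok},
	})
	fmt.Printf("%+v\nerr: %v\n", r, err)
}
```

Because the function keeps no state of its own, calling it again with a freshly derived step list is what makes a rerun safe.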
func (*Service) UpgradeAddons ¶
func (s *Service) UpgradeAddons(ctx context.Context, clusterName, targetVersion string, skip []string, progress ProgressFunc) error
UpgradeAddons updates every installed addon (minus the skip list) to the latest version compatible with targetVersion, serially in dependency order (vpc-cni → coredns/kube-proxy → the rest), waiting for each to go ACTIVE.
It runs after the control-plane step of a hop, so targetVersion is also the cluster's (new) current version; versions are still chosen explicitly against targetVersion rather than "latest for whatever the cluster runs" so the intent survives mid-phase retries. The addon service's built-in pre/post health checks act as the phase gate: the first failure halts the phase (and therefore the hop) with the failing addon named.
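The dependency ordering named above (vpc-cni, then coredns and kube-proxy, then the rest) can be sketched as a tiered stable sort. `tier` and `orderAddons` are hypothetical helpers; ordering alphabetically within a tier is an assumption of this sketch, not documented behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// tier returns the ordering tier of an addon: vpc-cni first, coredns and
// kube-proxy next, everything else last.
func tier(addon string) int {
	switch addon {
	case "vpc-cni":
		return 0
	case "coredns", "kube-proxy":
		return 1
	default:
		return 2
	}
}

// orderAddons drops skipped addons (matched by exact name) and returns the
// rest in tier order, alphabetical within a tier.
func orderAddons(installed, skip []string) []string {
	skipped := make(map[string]bool, len(skip))
	for _, s := range skip {
		skipped[s] = true
	}
	var out []string
	for _, a := range installed {
		if !skipped[a] {
			out = append(out, a)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		ti, tj := tier(out[i]), tier(out[j])
		if ti != tj {
			return ti < tj
		}
		return out[i] < out[j]
	})
	return out
}

func main() {
	installed := []string{"kube-proxy", "aws-ebs-csi-driver", "coredns", "vpc-cni"}
	fmt.Println(orderAddons(installed, nil))
}
```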
func (*Service) UpgradeControlPlane ¶
func (s *Service) UpgradeControlPlane(ctx context.Context, clusterName, targetVersion string, progress ProgressFunc) error
UpgradeControlPlane moves the cluster's control plane to targetVersion and waits until the update finishes and the cluster is ACTIVE again.
Idempotent: if the control plane already runs targetVersion (or newer) it returns immediately; if an update is already in flight it attaches and watches instead of failing. On context cancellation (Ctrl+C) it returns ctx.Err(); the upgrade keeps running server-side, and a rerun re-attaches.
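The three idempotency cases reduce to a small decision function. The sketch below is illustrative only: `decide` and its "skip" / "attach" / "start" results are hypothetical names, and versions are assumed to be "major.minor" strings.

```go
package main

import "fmt"

// parseMinor reads a "major.minor" version string such as "1.32".
func parseMinor(v string) (major, minor int, err error) {
	if _, err = fmt.Sscanf(v, "%d.%d", &major, &minor); err != nil {
		return 0, 0, fmt.Errorf("version %q is not major.minor: %w", v, err)
	}
	return major, minor, nil
}

// decide picks what a control-plane upgrade call should do:
//   - "skip"   when the cluster already runs target or newer
//   - "attach" when an update is already in flight
//   - "start"  otherwise
func decide(current, target string, updateInFlight bool) (string, error) {
	cMaj, cMin, err := parseMinor(current)
	if err != nil {
		return "", err
	}
	tMaj, tMin, err := parseMinor(target)
	if err != nil {
		return "", err
	}
	if cMaj > tMaj || (cMaj == tMaj && cMin >= tMin) {
		return "skip", nil
	}
	if updateInFlight {
		return "attach", nil
	}
	return "start", nil
}

func main() {
	for _, c := range []struct {
		current  string
		inFlight bool
	}{{"1.32", false}, {"1.31", true}, {"1.31", false}} {
		action, err := decide(c.current, "1.32", c.inFlight)
		fmt.Println(c.current, c.inFlight, action, err)
	}
}
```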
func (*Service) UpgradeNodegroups ¶
func (s *Service) UpgradeNodegroups(ctx context.Context, clusterName, targetVersion string, opts NodegroupRollOptions, progress ProgressFunc) error
UpgradeNodegroups rolls every managed nodegroup to targetVersion, serially and in listing order, with a pre-flight gate before each roll. It uses the same UpdateNodegroupVersion machinery as the AMI refresh: a version roll is an AMI refresh with Version set.
Already-current nodegroups are skipped (idempotent rerun); custom-AMI nodegroups are surfaced as manual actions, never mutated. A gate failure halts the remaining nodegroups so the operator can intervene.
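The per-nodegroup decision described above can be sketched as a classifier. The `nodegroup` struct, the result labels, and the order of the checks are assumptions for illustration; the substring matching follows the SkipPatterns description.

```go
package main

import (
	"fmt"
	"strings"
)

// nodegroup is a hypothetical, trimmed view of a managed nodegroup.
type nodegroup struct {
	Name      string
	Version   string
	CustomAMI bool
}

// classify decides what the nodegroup phase does with one nodegroup:
// "skipped" (matches a skip pattern), "manual" (custom AMI, never mutated),
// "current" (already at target), or "roll".
func classify(ng nodegroup, target string, skipPatterns []string) string {
	for _, p := range skipPatterns {
		// Ignore empty patterns, which would otherwise match every name.
		if p != "" && strings.Contains(ng.Name, p) {
			return "skipped"
		}
	}
	if ng.CustomAMI {
		return "manual"
	}
	if ng.Version == target {
		return "current"
	}
	return "roll"
}

func main() {
	groups := []nodegroup{
		{Name: "workers-a", Version: "1.31"},
		{Name: "workers-b", Version: "1.32"},
		{Name: "gpu-custom", Version: "1.31", CustomAMI: true},
		{Name: "canary-spot", Version: "1.31"},
	}
	for _, ng := range groups {
		fmt.Println(ng.Name, classify(ng, "1.32", []string{"canary"}))
	}
}
```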
type Step ¶
type Step struct {
	Type        StepType   `json:"type" yaml:"type"`
	Description string     `json:"description" yaml:"description"`
	Target      string     `json:"target,omitempty" yaml:"target,omitempty"`
	Version     string     `json:"version,omitempty" yaml:"version,omitempty"`
	Status      StepStatus `json:"status" yaml:"status"`
	Reason      string     `json:"reason,omitempty" yaml:"reason,omitempty"`
}
Step is one entry in the ordered upgrade plan.
type StepStatus ¶
type StepStatus string
StepStatus is the planned/derived state of a step.
const (
	// StatusPending means the step still needs to run.
	StatusPending StepStatus = "pending"
	// StatusCompleted means live cluster state already satisfies the step,
	// so the engine skips it (this is what makes reruns resumable/no-ops).
	StatusCompleted StepStatus = "completed"
	// StatusBlocked means the step cannot run until the blocker is resolved;
	// a plan containing blocked steps refuses to execute.
	StatusBlocked StepStatus = "blocked"
	// StatusManual means the orchestrator will not perform this step (e.g.
	// custom-AMI nodegroups); it is surfaced for the operator instead.
	StatusManual StepStatus = "manual"
)
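The rule that a plan containing blocked steps refuses to execute could be enforced with a pre-check along these lines. This is a standalone sketch: the lowercase types mirror StepStatus and Step, and `checkExecutable` is a hypothetical helper, not a function of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// stepStatus and its constants mirror the StepStatus values above.
type stepStatus string

const (
	statusPending   stepStatus = "pending"
	statusCompleted stepStatus = "completed"
	statusBlocked   stepStatus = "blocked"
	statusManual    stepStatus = "manual"
)

// step is a trimmed local copy of the Step type.
type step struct {
	Description string
	Status      stepStatus
	Reason      string
}

// checkExecutable returns an error naming every blocked step, or nil when
// the plan has none. Manual and completed steps do not block execution.
func checkExecutable(steps []step) error {
	var blockers []string
	for _, s := range steps {
		if s.Status == statusBlocked {
			blockers = append(blockers, s.Description+": "+s.Reason)
		}
	}
	if len(blockers) > 0 {
		return fmt.Errorf("plan has %d blocked step(s): %s",
			len(blockers), strings.Join(blockers, "; "))
	}
	return nil
}

func main() {
	steps := []step{
		{Description: "upgrade control plane", Status: statusCompleted},
		{Description: "roll nodegroup workers", Status: statusBlocked, Reason: "nodegroup is DEGRADED"},
		{Description: "roll nodegroup gpu", Status: statusManual, Reason: "custom AMI"},
	}
	fmt.Println(checkExecutable(steps))
}
```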