upgrade

package
v0.9.0 Latest
Published: Jun 16, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package upgrade implements the cluster upgrade orchestrator: plan generation, the control-plane / addon / nodegroup phases, and the sequencing engine that runs a plan with health gates between phases.

EKS upgrades one minor version at a time, so a multi-minor upgrade expands into sequential "hops" (1.31 → 1.32 → 1.33). Each hop runs control plane → addons (dependency order, versions compatible with the hop target) → nodegroup rolls, with a gate after every phase.
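The hop expansion described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the `hop` type and the `parseMinor` and `expandHops` helpers are hypothetical names introduced here, and the sketch assumes versions are plain `MAJOR.MINOR` strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hop is a hypothetical stand-in for the package's Hop type (From/To only).
type hop struct {
	From, To string
}

// parseMinor splits "1.31" into major 1 and minor 31.
func parseMinor(v string) (major, minor int, err error) {
	parts := strings.Split(v, ".")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("version %q is not MAJOR.MINOR", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major in %q: %w", v, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("bad minor in %q: %w", v, err)
	}
	return major, minor, nil
}

// expandHops returns one hop per minor version between from and to.
// An empty result means the cluster is already at the target.
func expandHops(from, to string) ([]hop, error) {
	fMaj, fMin, err := parseMinor(from)
	if err != nil {
		return nil, err
	}
	tMaj, tMin, err := parseMinor(to)
	if err != nil {
		return nil, err
	}
	if fMaj != tMaj {
		return nil, fmt.Errorf("cannot hop across major versions: %s to %s", from, to)
	}
	if tMin < fMin {
		return nil, fmt.Errorf("downgrade not supported: %s to %s", from, to)
	}
	hops := make([]hop, 0, tMin-fMin)
	for m := fMin; m < tMin; m++ {
		hops = append(hops, hop{
			From: fmt.Sprintf("%d.%d", fMaj, m),
			To:   fmt.Sprintf("%d.%d", fMaj, m+1),
		})
	}
	return hops, nil
}

func main() {
	hops, err := expandHops("1.31", "1.33")
	if err != nil {
		panic(err)
	}
	for _, h := range hops {
		fmt.Printf("%s -> %s\n", h.From, h.To)
	}
}
```

Parsing the minor as an integer matters once minors reach two digits: a string comparison would order "1.9" after "1.10".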

The orchestrator is resumable by re-derivation rather than state files: BuildPlan inspects actual cluster state and marks already-satisfied steps completed, so rerunning `refresh cluster upgrade --to X` after a failure (or after success) only executes what remains.

Constants

This section is empty.

Variables

var ErrAborted = errors.New("upgrade aborted by user")

ErrAborted is returned by Execute when the user declines a phase confirmation. The cluster was not touched by the declined phase.

Functions

This section is empty.

Types

type ConfirmFunc

type ConfirmFunc func(prompt string) bool

ConfirmFunc asks the user to approve a mutating phase; returning false aborts the run before the phase starts.

type EKSAPI

type EKSAPI interface {
	DescribeCluster(ctx context.Context, params *eks.DescribeClusterInput, optFns ...func(*eks.Options)) (*eks.DescribeClusterOutput, error)
	UpdateClusterVersion(ctx context.Context, params *eks.UpdateClusterVersionInput, optFns ...func(*eks.Options)) (*eks.UpdateClusterVersionOutput, error)
	DescribeUpdate(ctx context.Context, params *eks.DescribeUpdateInput, optFns ...func(*eks.Options)) (*eks.DescribeUpdateOutput, error)
	DescribeClusterVersions(ctx context.Context, params *eks.DescribeClusterVersionsInput, optFns ...func(*eks.Options)) (*eks.DescribeClusterVersionsOutput, error)
	ListInsights(ctx context.Context, params *eks.ListInsightsInput, optFns ...func(*eks.Options)) (*eks.ListInsightsOutput, error)
	ListAddons(ctx context.Context, params *eks.ListAddonsInput, optFns ...func(*eks.Options)) (*eks.ListAddonsOutput, error)
	DescribeAddon(ctx context.Context, params *eks.DescribeAddonInput, optFns ...func(*eks.Options)) (*eks.DescribeAddonOutput, error)
	DescribeAddonVersions(ctx context.Context, params *eks.DescribeAddonVersionsInput, optFns ...func(*eks.Options)) (*eks.DescribeAddonVersionsOutput, error)
	UpdateAddon(ctx context.Context, params *eks.UpdateAddonInput, optFns ...func(*eks.Options)) (*eks.UpdateAddonOutput, error)
	ListNodegroups(ctx context.Context, params *eks.ListNodegroupsInput, optFns ...func(*eks.Options)) (*eks.ListNodegroupsOutput, error)
	DescribeNodegroup(ctx context.Context, params *eks.DescribeNodegroupInput, optFns ...func(*eks.Options)) (*eks.DescribeNodegroupOutput, error)
	UpdateNodegroupVersion(ctx context.Context, params *eks.UpdateNodegroupVersionInput, optFns ...func(*eks.Options)) (*eks.UpdateNodegroupVersionOutput, error)
}

EKSAPI abstracts the EKS client methods the upgrade orchestrator uses. It is a superset of addons.EKSAPI so one client (or mock) serves both.

type ExecuteOptions

type ExecuteOptions struct {
	// Yes skips all phase confirmations (--yes).
	Yes bool
	// Confirm prompts before each mutating phase when Yes is false. Required
	// unless Yes is true.
	Confirm ConfirmFunc
	// Progress receives human-readable progress lines.
	Progress ProgressFunc
	// SkipAddons and SkipNodegroups mirror PlanOptions, and Force mirrors
	// NodegroupRollOptions.Force; all three are passed through to the phase
	// executors.
	SkipAddons     []string
	SkipNodegroups []string
	Force          bool
	// NodegroupGate overrides the built-in pre-roll health gate.
	NodegroupGate NodegroupGate
	// NodegroupObserver, when set, renders a live per-node roll view during each
	// nodegroup roll. Supplied by the command (view) layer; nil → text progress.
	NodegroupObserver RollObserver
}

ExecuteOptions tunes plan execution.

type Hop

type Hop struct {
	From  string `json:"from" yaml:"from"`
	To    string `json:"to" yaml:"to"`
	Steps []Step `json:"steps" yaml:"steps"`
}

Hop is a single minor-version upgrade cycle within the plan.

type NodegroupGate

type NodegroupGate func(ctx context.Context, nodegroupName string) error

NodegroupGate is a pre-flight check run before each nodegroup roll. A nil gate falls back to the built-in check (nodegroup ACTIVE with no reported health issues).

type NodegroupRollOptions

type NodegroupRollOptions struct {
	// SkipPatterns are substring patterns for nodegroups to leave alone.
	SkipPatterns []string
	// Force terminates pods that can't be drained due to PDBs (passed
	// through to UpdateNodegroupVersion).
	Force bool
	// Gate overrides the built-in pre-flight health gate.
	Gate NodegroupGate
	// Observer, when set, renders a live per-node roll view during each roll.
	Observer RollObserver
}

NodegroupRollOptions tunes the nodegroup phase.

type Plan

type Plan struct {
	ClusterName    string   `json:"clusterName" yaml:"clusterName"`
	CurrentVersion string   `json:"currentVersion" yaml:"currentVersion"`
	TargetVersion  string   `json:"targetVersion" yaml:"targetVersion"`
	Hops           []Hop    `json:"hops" yaml:"hops"`
	Warnings       []string `json:"warnings,omitempty" yaml:"warnings,omitempty"`
}

Plan is the full ordered upgrade plan for a cluster.

func (*Plan) Blocked

func (p *Plan) Blocked() bool

Blocked reports whether any step in the plan is blocked.

func (*Plan) Blockers

func (p *Plan) Blockers() []string

Blockers returns the descriptions of all blocked steps across hops.

func (*Plan) PendingSteps

func (p *Plan) PendingSteps() int

PendingSteps counts mutating steps the engine would actually execute. Readiness steps are checks, not work, so they don't count.
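The three Plan helpers can be pictured with the sketch below. It uses local mirror types and lower-case method names to make clear that this is an illustration of the documented behavior, not the package source. In particular, treating "pending and not a readiness step" as the PendingSteps rule is an assumption drawn from the description above.

```go
package main

import "fmt"

// Local mirrors of Step, Hop, and Plan with only the fields used here.
type step struct {
	Type        string // "readiness", "control-plane", "addon", "nodegroup"
	Description string
	Status      string // "pending", "completed", "blocked", "manual"
}

type hop struct{ Steps []step }

type plan struct{ Hops []hop }

// blockers collects the descriptions of blocked steps across all hops.
func (p *plan) blockers() []string {
	var out []string
	for _, h := range p.Hops {
		for _, s := range h.Steps {
			if s.Status == "blocked" {
				out = append(out, s.Description)
			}
		}
	}
	return out
}

// blocked reports whether any step is blocked.
func (p *plan) blocked() bool { return len(p.blockers()) > 0 }

// pendingSteps counts pending steps, excluding readiness checks.
func (p *plan) pendingSteps() int {
	n := 0
	for _, h := range p.Hops {
		for _, s := range h.Steps {
			if s.Status == "pending" && s.Type != "readiness" {
				n++
			}
		}
	}
	return n
}

// samplePlan is a one-hop plan resumed after the control plane finished.
func samplePlan() *plan {
	return &plan{Hops: []hop{{Steps: []step{
		{"readiness", "upgrade insights", "pending"},
		{"control-plane", "control plane 1.31 -> 1.32", "completed"},
		{"addon", "addon vpc-cni", "pending"},
		{"nodegroup", "nodegroup general", "pending"},
		{"nodegroup", "nodegroup custom-ami", "manual"},
	}}}}
}

func main() {
	p := samplePlan()
	if p.blocked() {
		fmt.Println("refusing to execute:", p.blockers())
		return
	}
	fmt.Println("steps to run:", p.pendingSteps())
}
```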

type PlanOptions

type PlanOptions struct {
	// SkipAddons are addon names the user manages out-of-band (Helm/GitOps);
	// they appear in the plan as manual steps and are never mutated.
	SkipAddons []string
	// SkipNodegroups are substring patterns for nodegroups to leave alone.
	SkipNodegroups []string
}

PlanOptions tunes plan generation.

type ProgressFunc

type ProgressFunc func(format string, args ...any)

ProgressFunc receives human-readable progress lines during execution.

type Report

type Report struct {
	Completed []string `json:"completed,omitempty" yaml:"completed,omitempty"`
	FailedAt  string   `json:"failedAt,omitempty" yaml:"failedAt,omitempty"`
	Remaining []string `json:"remaining,omitempty" yaml:"remaining,omitempty"`
}

Report describes how far an execution got: what ran, where it stopped, and what remains. Rerunning the same command resumes from live cluster state.

type RollObserver added in v0.8.0

type RollObserver func(ctx context.Context, nodegroupName string)

RollObserver renders a live view of a single nodegroup roll. It is supplied by the command (view) layer and invoked by the nodegroup phase after a roll starts and before the authoritative DescribeUpdate wait, so the service itself never renders anything. It must be best-effort and bounded: it must not block the roll or affect its result. A nil observer means text progress only.

type Service

type Service struct {

	// PollInterval is how often in-flight updates are re-checked.
	// Tests shrink it; defaults to defaultPollInterval.
	PollInterval time.Duration
	// contains filtered or unexported fields
}

Service builds and executes cluster upgrade plans.

func NewService

func NewService(eksClient EKSAPI, logger *slog.Logger) *Service

NewService creates the upgrade orchestrator service.

func (*Service) BuildPlan

func (s *Service) BuildPlan(ctx context.Context, clusterName, targetVersion string, opts PlanOptions) (*Plan, error)

BuildPlan derives the full ordered upgrade plan for clusterName to reach targetVersion. The plan is also the resume mechanism: steps whose desired state is already satisfied by the live cluster are marked completed, so a rerun after a partial upgrade (or a second run after success) executes only what remains.
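The resume-by-re-derivation idea can be sketched in a few lines. The `step` type and `derive` function below are hypothetical, and the sketch simplifies "desired state is already satisfied" to an exact version match against an observed map. The real check may be more lenient (for example, accepting a newer version).

```go
package main

import "fmt"

// step is a cut-down, hypothetical mirror of Step.
type step struct {
	Target  string // addon or nodegroup name, or "control-plane"
	Version string // version the step should leave the target at
	Status  string
}

// derive returns a copy of steps with Status recomputed from observed state:
// "completed" when the target already runs the desired version, "pending"
// otherwise. No state file is consulted; the cluster is the source of truth.
func derive(steps []step, observed map[string]string) []step {
	out := make([]step, len(steps))
	for i, st := range steps {
		if observed[st.Target] == st.Version {
			st.Status = "completed"
		} else {
			st.Status = "pending"
		}
		out[i] = st
	}
	return out
}

func main() {
	steps := []step{
		{Target: "control-plane", Version: "1.32"},
		{Target: "vpc-cni", Version: "v1.19.0"},
		{Target: "general", Version: "1.32"},
	}
	// State after a run that failed partway through the addon phase.
	observed := map[string]string{
		"control-plane": "1.32",
		"vpc-cni":       "v1.18.0",
		"general":       "1.31",
	}
	for _, st := range derive(steps, observed) {
		fmt.Printf("%-14s %s\n", st.Target, st.Status)
	}
}
```

The addon version strings in `main` are placeholders chosen for the example, not versions known to be compatible with any cluster version.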

func (*Service) Execute

func (s *Service) Execute(ctx context.Context, plan *Plan, opts ExecuteOptions) (*Report, error)

Execute runs the plan: hops in order, and within each hop the phases in order (control plane → addons → nodegroups). It asks for confirmation before every mutating phase unless opts.Yes is set, and halts on the first failure with a report of what completed, where it failed, and what remains.

Execution state lives in the cluster itself, not in Execute: steps already marked completed by BuildPlan are skipped, and each phase executor re-checks live state, so rerunning after a failure (or a SIGINT, or a complete success) is safe and only performs the remaining work.

func (*Service) UpgradeAddons

func (s *Service) UpgradeAddons(ctx context.Context, clusterName, targetVersion string, skip []string, progress ProgressFunc) error

UpgradeAddons updates every installed addon (minus the skip list) to the latest version compatible with targetVersion, serially in dependency order (vpc-cni → coredns/kube-proxy → the rest), waiting for each to go ACTIVE.
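The dependency ordering can be sketched as a tiered sort. The `tier` and `orderAddons` helpers are hypothetical. The tiers follow the order given above (vpc-cni, then coredns and kube-proxy, then everything else), while the alphabetical tie-break within a tier and the exact-name matching of the skip list are assumptions made for determinism.

```go
package main

import (
	"fmt"
	"sort"
)

// tier ranks an addon by how early it must be upgraded.
func tier(name string) int {
	switch name {
	case "vpc-cni":
		return 0
	case "coredns", "kube-proxy":
		return 1
	default:
		return 2
	}
}

// orderAddons drops skipped addons and returns the rest in upgrade order.
func orderAddons(installed, skip []string) []string {
	skipped := make(map[string]bool, len(skip))
	for _, s := range skip {
		skipped[s] = true
	}
	var out []string
	for _, name := range installed {
		if !skipped[name] {
			out = append(out, name)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		ti, tj := tier(out[i]), tier(out[j])
		if ti != tj {
			return ti < tj
		}
		return out[i] < out[j] // assumption: alphabetical within a tier
	})
	return out
}

func main() {
	installed := []string{"aws-ebs-csi-driver", "kube-proxy", "vpc-cni", "coredns", "metrics-server"}
	for _, name := range orderAddons(installed, []string{"metrics-server"}) {
		fmt.Println(name)
	}
}
```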

It runs after the control-plane step of a hop, so targetVersion is also the cluster's (new) current version; versions are still chosen explicitly against targetVersion rather than "latest for whatever the cluster runs" so the intent survives mid-phase retries. The addon service's built-in pre/post health checks act as the phase gate: the first failure halts the phase (and therefore the hop) with the failing addon named.

func (*Service) UpgradeControlPlane

func (s *Service) UpgradeControlPlane(ctx context.Context, clusterName, targetVersion string, progress ProgressFunc) error

UpgradeControlPlane moves the cluster's control plane to targetVersion and waits until the update finishes and the cluster is ACTIVE again.

Idempotent: if the control plane already runs targetVersion (or newer) it returns immediately; if an update is already in flight it attaches and watches instead of failing. On context cancellation (Ctrl+C) it returns ctx.Err(); the upgrade keeps running server-side, and a rerun re-attaches to it.

func (*Service) UpgradeNodegroups

func (s *Service) UpgradeNodegroups(ctx context.Context, clusterName, targetVersion string, opts NodegroupRollOptions, progress ProgressFunc) error

UpgradeNodegroups rolls every managed nodegroup to targetVersion, serially and in listing order, with a pre-flight gate before each roll. It uses the same UpdateNodegroupVersion machinery as the AMI refresh: a version roll is an AMI refresh with Version set.

Already-current nodegroups are skipped (idempotent rerun); custom-AMI nodegroups are surfaced as manual actions, never mutated. A gate failure halts the remaining nodegroups so the operator can intervene.
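Selecting which nodegroups to roll can be sketched as a substring filter that preserves listing order. `skipped` and `toRoll` are hypothetical helpers. Ignoring empty patterns is an assumption added here, because an empty substring matches every name and would silently skip the whole phase.

```go
package main

import (
	"fmt"
	"strings"
)

// skipped reports whether name contains any non-empty skip pattern.
func skipped(name string, patterns []string) bool {
	for _, p := range patterns {
		if p != "" && strings.Contains(name, p) {
			return true
		}
	}
	return false
}

// toRoll returns the nodegroups to roll, in the order they were listed.
func toRoll(nodegroups, patterns []string) []string {
	var out []string
	for _, name := range nodegroups {
		if !skipped(name, patterns) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	nodegroups := []string{"general-a", "gpu-training", "general-b", "gpu-inference"}
	for _, name := range toRoll(nodegroups, []string{"gpu"}) {
		fmt.Println("roll:", name)
	}
}
```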

type Step

type Step struct {
	Type        StepType   `json:"type" yaml:"type"`
	Description string     `json:"description" yaml:"description"`
	Target      string     `json:"target,omitempty" yaml:"target,omitempty"`
	Version     string     `json:"version,omitempty" yaml:"version,omitempty"`
	Status      StepStatus `json:"status" yaml:"status"`
	Reason      string     `json:"reason,omitempty" yaml:"reason,omitempty"`
}

Step is one entry in the ordered upgrade plan.

type StepStatus

type StepStatus string

StepStatus is the planned/derived state of a step.

const (
	// StatusPending means the step still needs to run.
	StatusPending StepStatus = "pending"
	// StatusCompleted means live cluster state already satisfies the step,
	// so the engine skips it (this is what makes reruns resumable/no-ops).
	StatusCompleted StepStatus = "completed"
	// StatusBlocked means the step cannot run until the blocker is resolved;
	// a plan containing blocked steps refuses to execute.
	StatusBlocked StepStatus = "blocked"
	// StatusManual means the orchestrator will not perform this step (e.g.
	// custom-AMI nodegroups); it is surfaced for the operator instead.
	StatusManual StepStatus = "manual"
)

type StepType

type StepType string

StepType identifies which phase a plan step belongs to.

const (
	StepReadiness    StepType = "readiness"
	StepControlPlane StepType = "control-plane"
	StepAddon        StepType = "addon"
	StepNodegroup    StepType = "nodegroup"
)
