GO-2026-6002: ToolHive: SSRF in remote MCP server authentication discovery (host-side, bypasses container isolation) in github.com/stacklok/toolhive

package upgrade

Version: v0.29.1 (latest) | Published: Jun 4, 2026 | License: Apache-2.0 | Imports: 17 | Imported by: 0

Repository

github.com/stacklok/toolhive

Documentation ¶

Overview ¶

Package upgrade implements registry-sourced workload upgrade checks for ToolHive (RFC THV-0068). It compares the image and configuration of a running workload against the latest metadata served by the registry and reports whether an upgrade is available, along with any environment-variable or posture (transport/network/permission) drift.

Dependency direction (cycle guard) ¶

This package depends on pkg/workloads (for the apply path in later phases) and pkg/runner. To avoid an import cycle, pkg/workloads MUST NEVER import this package. Higher-level entry points (CLI, API handlers) wire the two together; the manager itself stays unaware of upgrade logic.

No rollback ¶

The apply path (Phase D) intentionally provides no rollback. It resolves, verifies, and pulls the candidate image BEFORE destroying the existing workload so that a failure during preparation leaves the running workload untouched. Once the workload is deleted and recreated, there is no automatic revert to the previous image or configuration; recovery is a forward operation (re-running the previous configuration explicitly).

Index ¶

Variables
type Applier
- func NewApplier(manager workloads.Manager, checker *Checker, appConfig config.Provider) (*Applier, error)
- func (a *Applier) Apply(ctx context.Context, name string, opts ApplyOptions) (*CheckResult, error)
type ApplyOptions
type CheckResult
type Checker
- func NewChecker(provider registry.Provider) (*Checker, error)
- func (c *Checker) Check(_ context.Context, cfg *runner.RunConfig) (*CheckResult, error)
- func (c *Checker) CheckAll(ctx context.Context, configs []*runner.RunConfig) []*CheckResult
type ConfigDrift
type EnvVarDrift
type EnvVarInfo
type StringChange
type UpgradeStatus

Constants ¶

This section is empty.

Variables ¶

var ErrApplyAfterDestroy = errors.New("upgrade failed after workload recreate began; workload state is uncertain")

ErrApplyAfterDestroy marks an Apply failure that occurred AFTER the destructive recreate began (the manager has already started to stop/delete the workload). Preparation failures leave the running workload untouched; a failure tagged with this sentinel means the workload may be stopped, deleted, or only partially recreated. Its state is uncertain, and the failure is NOT safely retryable against an intact workload. Callers use errors.Is(err, ErrApplyAfterDestroy) to distinguish this 5xx-class condition from the 4xx-class "candidate could not be prepared" failures.

Functions ¶

This section is empty.

Types ¶

type Applier ¶

type Applier struct {
	// contains filtered or unexported fields
}

Applier is the canonical, security-critical path for applying an available registry-sourced upgrade to a running workload. It re-resolves and verifies the candidate image, pulls it, and only then asks the workload manager to recreate the workload with the new image.

Applier is the single place where an upgrade is materialized; both the CLI (Phase D2) and the API (Phase D3) delegate here so that the verify-then-pull ordering and TOCTOU guard live in exactly one place.

func NewApplier ¶

func NewApplier(manager workloads.Manager, checker *Checker, appConfig config.Provider) (*Applier, error)

NewApplier creates an Applier.

All dependencies are required and validated; the constructor fails loudly rather than producing an Applier that would panic or silently no-op at apply time. The appConfig provider supplies the registry source URLs recorded on the upgraded workload's config.

func (*Applier) Apply ¶

func (a *Applier) Apply(ctx context.Context, name string, opts ApplyOptions) (*CheckResult, error)

Apply upgrades the named workload to the candidate image the registry currently reports for the workload's registry server.

Sequence (see RFC THV-0068 design §4; step numbers match the inline comments):

1. Load the workload's current RunConfig (treated as read-only).
2. Re-run the upgrade check against the registry. If the workload is not in StatusUpgradeAvailable, the current CheckResult is returned unchanged as a no-op (NOT an error) so the caller can decide how to message it. The check is ALWAYS re-derived here; a CheckResult computed earlier by the caller is never trusted, closing the time-of-check/time-of-use window.
3. Re-resolve and VERIFY the candidate image from the registry by server name (passing the name, not the image ref, is what triggers provenance verification inside ResolveMCPServer).
4. Reject non-image (remote) registry entries: there is nothing to pull and recreate, so refuse rather than risk destroying the workload.
5. Build a merged RunConfig that preserves the workload's user configuration while bumping the image to the candidate and re-validating env vars against the candidate's fresh metadata.
6. Run the policy gate and perform a verified pull of the candidate image.
7. Only after all of the above succeed, ask the manager to recreate the workload with the new config.

No rollback: steps 3-6 all happen BEFORE the destructive recreate in step 7, so any failure while preparing the candidate (resolution, verification, policy, pull) leaves the running workload completely untouched. Once step 7 begins, the manager deletes and recreates the workload; there is no automatic revert to the previous image or configuration if recreation fails. Recovery is a forward operation (re-running the previous configuration explicitly).

Runtime boundary: the "verified pull before destruction" guarantee holds in full only for local container runtimes. On Kubernetes, EnforcePolicyAndPullImage runs the verification and policy gate but DELEGATES the byte-level image pull to the kubelet, which pulls AFTER the workload is recreated. Verification and the policy gate precede destruction on every runtime; only the local runtime additionally guarantees the image bytes are present before destruction. (Local runtimes are the scope of this phase; the boundary is documented for completeness.)

On success the *CheckResult that drove the upgrade is returned (including any detected drift) so the caller can report what changed.

type ApplyOptions ¶

type ApplyOptions struct {
	// EnvVars are additional or overriding environment variables to merge into
	// the upgraded workload's configuration.
	EnvVars map[string]string

	// Secrets are additional secret parameters (`<name>,target=<env>`) to merge
	// into the upgraded workload's configuration.
	Secrets []string

	// EnvVarValidator validates that required environment variables and secrets
	// are supplied for the candidate registry entry.
	EnvVarValidator runner.EnvVarValidator

	// VerifySetting controls image signature verification. Empty defaults to
	// retriever.VerifyImageWarn.
	VerifySetting string

	// CACertPath is an optional path to a CA certificate bundle used when
	// resolving the candidate image from a registry over TLS.
	CACertPath string
}

ApplyOptions controls how an upgrade is applied to a workload.

It is consumed by Applier.Apply (Phase D).

type CheckResult ¶

type CheckResult struct {
	// WorkloadName is the name of the workload that was checked.
	WorkloadName string `json:"workload_name"`

	// RegistryServer is the registry entry name the workload was sourced from.
	// Empty when the workload is not registry-sourced.
	RegistryServer string `json:"registry_server,omitempty"`

	// Status is the upgrade status for the workload.
	Status UpgradeStatus `json:"status"`

	// CurrentImage is the image reference the workload is currently running.
	CurrentImage string `json:"current_image,omitempty"`

	// CandidateImage is the image reference the registry currently reports.
	CandidateImage string `json:"candidate_image,omitempty"`

	// Reason provides additional context, primarily for StatusUnknown.
	Reason string `json:"reason,omitempty"`

	// EnvVarDrift describes environment variables the candidate registry entry
	// declares that differ from the workload's current configuration.
	EnvVarDrift *EnvVarDrift `json:"env_var_drift,omitempty"`

	// ConfigDrift describes posture differences (transport, permission profile)
	// between the workload and the candidate registry entry.
	ConfigDrift *ConfigDrift `json:"config_drift,omitempty"`
}

CheckResult is the outcome of checking a single workload for an available upgrade. EnvVarDrift and ConfigDrift are only populated when an upgrade is available and the relevant drift was detected.
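To show how the JSON tags shape the serialized result, here is a small sketch using a local mirror of CheckResult. The mirror omits the two drift fields for brevity, and the workload and image names are made-up example values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkResult mirrors the scalar fields and JSON tags of upgrade.CheckResult.
// The EnvVarDrift and ConfigDrift fields are left out of this sketch.
type checkResult struct {
	WorkloadName   string `json:"workload_name"`
	RegistryServer string `json:"registry_server,omitempty"`
	Status         string `json:"status"`
	CurrentImage   string `json:"current_image,omitempty"`
	CandidateImage string `json:"candidate_image,omitempty"`
	Reason         string `json:"reason,omitempty"`
}

// mustJSON marshals v and panics on failure; fine for a demo, not for servers.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Only workload_name and status are always present; the rest is omitted
	// when empty.
	fmt.Println(mustJSON(checkResult{WorkloadName: "local-tool", Status: "not-registry-sourced"}))
	// {"workload_name":"local-tool","status":"not-registry-sourced"}

	fmt.Println(mustJSON(checkResult{
		WorkloadName:   "fetch",
		RegistryServer: "fetch",
		Status:         "upgrade-available",
		CurrentImage:   "example.com/fetch:1.0.0",
		CandidateImage: "example.com/fetch:1.1.0",
	}))
}
```

Consumers of the JSON should therefore treat every key except workload_name and status as optional.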

type Checker ¶

type Checker struct {
	// contains filtered or unexported fields
}

Checker determines whether registry-sourced workloads have an available upgrade by comparing their current image and configuration against the metadata the injected registry provider reports.

func NewChecker ¶

func NewChecker(provider registry.Provider) (*Checker, error)

NewChecker creates a Checker backed by the given registry provider.

The provider is the source of truth for candidate image metadata; callers typically pass the shared singleton from registry.GetDefaultProvider so the provider's response cache is reused across checks. It returns an error if the provider is nil.

func (*Checker) Check ¶

func (c *Checker) Check(_ context.Context, cfg *runner.RunConfig) (*CheckResult, error)

Check evaluates a single workload's RunConfig against the registry and returns the upgrade status. It never mutates the supplied config. Per-item problems (missing server, unparsable tags, non-image entries) are encoded in the returned CheckResult's Status/Reason rather than returned as an error; an error is returned only for an invalid call (nil config).

func (*Checker) CheckAll ¶

func (c *Checker) CheckAll(ctx context.Context, configs []*runner.RunConfig) []*CheckResult

CheckAll evaluates a batch of workloads. It never returns an error: each workload's outcome (including per-item failures) is encoded in its own CheckResult. The returned slice preserves the input order. Nil entries in the input are skipped.
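The batch contract (no error return, input order preserved, nil entries skipped) can be illustrated with a simplified model. The runConfig and checkResult types below are reduced stand-ins, the registry lookup is not modelled, and folding a per-item error into an "unknown" result is an assumption about how such a loop could be written rather than a description of the real code.

```go
package main

import (
	"errors"
	"fmt"
)

// runConfig and checkResult are reduced stand-ins for runner.RunConfig and
// upgrade.CheckResult.
type runConfig struct {
	Name               string
	RegistryServerName string
}

type checkResult struct {
	WorkloadName string
	Status       string
	Reason       string
}

// check returns an error only for an invalid call (nil config); per-item
// problems are encoded in the result.
func check(cfg *runConfig) (*checkResult, error) {
	if cfg == nil {
		return nil, errors.New("run config is nil")
	}
	if cfg.RegistryServerName == "" {
		return &checkResult{WorkloadName: cfg.Name, Status: "not-registry-sourced"}, nil
	}
	return &checkResult{WorkloadName: cfg.Name, Status: "unknown", Reason: "registry lookup not modelled in this sketch"}, nil
}

// checkAll never returns an error, keeps input order, and skips nil entries.
func checkAll(configs []*runConfig) []*checkResult {
	results := make([]*checkResult, 0, len(configs))
	for _, cfg := range configs {
		if cfg == nil {
			continue
		}
		res, err := check(cfg)
		if err != nil {
			res = &checkResult{WorkloadName: cfg.Name, Status: "unknown", Reason: err.Error()}
		}
		results = append(results, res)
	}
	return results
}

func main() {
	configs := []*runConfig{{Name: "local-tool"}, nil, {Name: "fetch", RegistryServerName: "fetch"}}
	for _, r := range checkAll(configs) {
		fmt.Println(r.WorkloadName, r.Status)
	}
}
```

Note that skipping nil entries means the result slice can be shorter than the input, so callers should match results to workloads by WorkloadName rather than by index.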

type ConfigDrift ¶

type ConfigDrift struct {
	// Transport is set when the candidate's transport differs from the
	// workload's current transport.
	Transport *StringChange `json:"transport,omitempty"`

	// PermissionProfile is set when the candidate's permission profile differs
	// from the workload's current profile.
	PermissionProfile *StringChange `json:"permission_profile,omitempty"`
}

ConfigDrift describes posture differences between a workload's current configuration and the candidate registry entry. A nil field means that aspect did not drift (or could not be compared).

type EnvVarDrift ¶

type EnvVarDrift struct {
	// Added lists environment variables the candidate declares that the
	// workload does not currently supply (via plain env vars or secrets).
	Added []EnvVarInfo `json:"added,omitempty"`

	// Removed lists environment variables the workload supplies that the
	// candidate no longer declares. Populated on a best-effort basis; may be
	// empty even when removals exist (forward-compatible field).
	Removed []EnvVarInfo `json:"removed,omitempty"`
}

EnvVarDrift describes how the candidate registry entry's declared environment variables differ from those the workload currently satisfies.
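The Added list can be thought of as a set difference: variables the candidate declares minus those the workload already supplies through plain env vars or secret targets. The sketch below illustrates that idea with a hypothetical addedEnvVars helper and a reduced envVarInfo type; it is not the package's actual drift computation.

```go
package main

import "fmt"

// envVarInfo mirrors a subset of upgrade.EnvVarInfo.
type envVarInfo struct {
	Name     string
	Required bool
}

// addedEnvVars returns the declared variables that are satisfied neither by a
// plain env var nor by a secret target, in declaration order.
func addedEnvVars(declared []envVarInfo, env map[string]string, secretTargets []string) []envVarInfo {
	supplied := make(map[string]struct{}, len(env)+len(secretTargets))
	for name := range env {
		supplied[name] = struct{}{}
	}
	for _, target := range secretTargets {
		supplied[target] = struct{}{}
	}
	var added []envVarInfo
	for _, v := range declared {
		if _, ok := supplied[v.Name]; !ok {
			added = append(added, v)
		}
	}
	return added
}

func main() {
	declared := []envVarInfo{{"API_URL", true}, {"API_TOKEN", true}, {"LOG_LEVEL", false}}
	env := map[string]string{"API_URL": "https://example.com"}
	secretTargets := []string{"API_TOKEN"} // env names that secrets are mapped to
	for _, v := range addedEnvVars(declared, env, secretTargets) {
		fmt.Println(v.Name, v.Required) // LOG_LEVEL false
	}
}
```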

type EnvVarInfo ¶

type EnvVarInfo struct {
	// Name is the environment variable name.
	Name string `json:"name"`

	// Description is the human-readable purpose of the variable.
	Description string `json:"description,omitempty"`

	// Required indicates whether the candidate marks the variable as required.
	Required bool `json:"required"`

	// Secret indicates whether the variable holds sensitive data.
	Secret bool `json:"secret,omitempty"`

	// Default is the candidate's default value. It is cleared (left empty)
	// whenever Secret is true: a secret env var's default could carry sensitive
	// data, and surfacing it in a drift report (which may be logged or returned
	// over the API) would leak it. Non-secret defaults are safe to display.
	Default string `json:"default,omitempty"`
}

EnvVarInfo is a registry-declared environment variable surfaced in drift.
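The rule that a secret's default is never surfaced can be sketched as a small sanitizing step applied before an EnvVarInfo enters a drift report. The redact helper is hypothetical, the struct is a local mirror with the same JSON tags, and the variable names and values are examples.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envVarInfo mirrors upgrade.EnvVarInfo, including its JSON tags.
type envVarInfo struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
	Required    bool   `json:"required"`
	Secret      bool   `json:"secret,omitempty"`
	Default     string `json:"default,omitempty"`
}

// redact clears Default whenever Secret is true so that a sensitive default
// cannot leak through logs or API responses. It works on a copy.
func redact(v envVarInfo) envVarInfo {
	if v.Secret {
		v.Default = ""
	}
	return v
}

// mustJSON marshals v and panics on failure; acceptable for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	secret := redact(envVarInfo{Name: "API_TOKEN", Required: true, Secret: true, Default: "changeme"})
	fmt.Println(mustJSON(secret)) // {"name":"API_TOKEN","required":true,"secret":true}

	plain := redact(envVarInfo{Name: "LOG_LEVEL", Default: "info"})
	fmt.Println(mustJSON(plain)) // {"name":"LOG_LEVEL","required":false,"default":"info"}
}
```

Because Default carries omitempty, a cleared secret default disappears from the JSON entirely instead of appearing as an empty string.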

type StringChange ¶

type StringChange struct {
	From string `json:"from"`
	To   string `json:"to"`
}

StringChange records a string-valued configuration change from the workload's current value to the candidate registry value.

type UpgradeStatus ¶

type UpgradeStatus string

UpgradeStatus is the outcome of an upgrade check for a single workload.

const (
	// StatusUpToDate indicates the workload is already running the candidate
	// image reported by the registry (or a newer one).
	StatusUpToDate UpgradeStatus = "up-to-date"

	// StatusUpgradeAvailable indicates the registry reports a newer image than
	// the one the workload is currently running.
	StatusUpgradeAvailable UpgradeStatus = "upgrade-available"

	// StatusNotRegistrySourced indicates the workload was not created from a
	// registry entry (no RegistryServerName), so no upgrade can be determined.
	StatusNotRegistrySourced UpgradeStatus = "not-registry-sourced"

	// StatusServerNotFound indicates the workload references a registry server
	// name that no longer exists in the configured registry.
	StatusServerNotFound UpgradeStatus = "server-not-found"

	// StatusUnknown indicates the upgrade status could not be determined, e.g.
	// the registry lookup failed, the entry is a remote (non-image) server, or
	// the image tags are not comparable. The Reason field explains why.
	StatusUnknown UpgradeStatus = "unknown"
)
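As a usage illustration, a caller might translate each status into a message and decide whether an apply is worth attempting. The constants below copy the documented string values into a local type; the describe helper and its message wording are hypothetical.

```go
package main

import "fmt"

// upgradeStatus mirrors upgrade.UpgradeStatus with the documented values.
type upgradeStatus string

const (
	statusUpToDate           upgradeStatus = "up-to-date"
	statusUpgradeAvailable   upgradeStatus = "upgrade-available"
	statusNotRegistrySourced upgradeStatus = "not-registry-sourced"
	statusServerNotFound     upgradeStatus = "server-not-found"
	statusUnknown            upgradeStatus = "unknown"
)

// describe returns a short message and whether an upgrade could be applied.
// Only upgrade-available is actionable; every other status is a no-op.
func describe(s upgradeStatus, reason string) (string, bool) {
	switch s {
	case statusUpgradeAvailable:
		return "upgrade available", true
	case statusUpToDate:
		return "already up to date", false
	case statusNotRegistrySourced:
		return "not created from a registry entry", false
	case statusServerNotFound:
		return "registry server no longer exists", false
	default:
		// statusUnknown and any future value: surface the reason if present.
		if reason == "" {
			reason = "no reason given"
		}
		return "status unknown: " + reason, false
	}
}

func main() {
	msg, ok := describe(statusUnknown, "image tags are not comparable")
	fmt.Println(msg, ok) // status unknown: image tags are not comparable false

	msg, ok = describe(statusUpgradeAvailable, "")
	fmt.Println(msg, ok) // upgrade available true
}
```

Routing unrecognized values through the default branch keeps the caller safe if new statuses are added later.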
