coldstart

package
v0.1.19 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 25, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package coldstart implements the / the design doc rule cascade that decides how an agent resolves a digest when its local cache misses and the DHT lookup did not return enough providers.

The orchestrator is invoked by the mirror miss path after a FindProviders call returns empty. It runs the following pipeline:

1. Compute HRW top-K from the local membership snapshot. 2. Dial all K in parallel with `pull_intent_query`, collecting responses up to a 2 s timeout (the step 4). 3. Apply the 7-rule cascade in priority order (the step 5): 1. failure short-circuit -> return ErrFailureShortCircuit (5xx) 2. cache hit -> return the responder's transfer addr 3. in-flight piggyback -> DHT-poll until provider appears 4. transient cooldown -> return ErrCooldownActive (5xx) 5. all-unreachable expand -> re-run step 2 with top-2K 6. degraded eager expand -> re-run step 2 with top-2K 7. cold-start -> please_pull to lowest-rank reachable, then DHT-poll for the provider 4. While DHT-polling (rules 3 and 7), bound by the per-digest timeout from the design doc (manifest/config 5 s; layer max(10 s, size/50 MB/s) × 3) and the per-kind poll interval (200 ms manifest/config; 1 s layer).

The Resolver is stateless across calls; concurrent Resolve invocations for the same digest are safe and will independently arrive at the same outcome (the inflight map at the puller side dedupes the origin pull itself).

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrFailureShortCircuit fires rule 1.
	ErrFailureShortCircuit = errors.New("coldstart: failure short-circuit")
	// ErrCooldownActive fires rule 4.
	ErrCooldownActive = errors.New("coldstart: transient cooldown active")
	// ErrExhausted fires when the cascade reaches its terminal state
	// without producing a provider (e.g., expanded-2K exhausted, or
	// please_pull completed but the DHT poll timed out).
	ErrExhausted = errors.New("coldstart: cascade exhausted")
)

Sentinel errors. Mirror layer maps all of these to 5xx; tests distinguish to validate which rule fired.

View Source
var ErrPrefetchInvalid = errors.New("coldstart: prefetch invalid arguments")

ErrPrefetchInvalid signals a programmer error in the prefetch call: registry or repository was empty, which would produce a malformed PleasePull RPC.

View Source
var ErrPrefetchPartial = errors.New("coldstart: prefetch had per-puller failures")

ErrPrefetchPartial signals at least one puller's PleasePull RPC failed. The other pullers were still asked to pull. Callers can errors.Is against this for retry / logging logic; the prefetch is best-effort so this is informational rather than fatal.

Functions

This section is empty.

Types

type ChildDigest

type ChildDigest struct {
	Digest digest.Digest
	Kind   ifaces.OriginRefKind
}

ChildDigest pairs a child digest with the OCI URL-family kind the puller MUST target. Kind is one of ifaces.KindConfig (the manifest's image-config blob) or ifaces.KindBlob (every layer descriptor); both are pulled from /v2/<repo>/blobs/<digest> at the registry level but are carried separately on the wire so per-kind metrics agree end-to-end across the please_pull boundary. internal/manifest's TypedChildren is the canonical producer of these values.

type Discovery

type Discovery interface {
	FindProviders(ctx context.Context, d digest.Digest) ([]ifaces.Provider, error)
	Health() float64
}

Discovery is the subset of the libp2p discovery host that the orchestrator needs. Kept narrow for ease of mocking.

type MetricsHooks

type MetricsHooks struct {
	// OnRankMismatch fires once per pull_intent response whose
	// reported hrw_rank disagrees with the requester's computed
	// rank for that responder. kindLabel is "manifest", "config",
	// or "layer".
	OnRankMismatch func(kindLabel string, responder ifaces.NodeID)
	// OnDhtFalseEmpty fires when the orchestrator observes the
	// false-empty case: DHT had returned 0 providers, but a
	// pull_intent_query reports has_cached=true.
	OnDhtFalseEmpty func()
	// OnTopKProbeHit fires when any rule before rule 7 (cold-start)
	// resolves the request. Used to track how often the probe saves
	// an origin pull.
	OnTopKProbeHit func()
	// OnColdStartDuration is called once per Resolve with the total
	// elapsed time and the outcome rule that fired ("rule1".."rule7"
	// or "expanded_rule_N").
	OnColdStartDuration func(kindLabel, outcome string, d time.Duration)
	// OnDesignatedPullerTakeover fires when a pull_intent_query
	// responder reports in_flight=true but its started_at is older
	// than the per-the design doc stall threshold, so the requester excludes
	// it from rule-3 piggyback and routes via the next-ranked node
	// (rule 6 / rule 7). kindLabel is "manifest", "config", or
	// "layer". Maps to the design doc metric
	// `p2p_designated_puller_takeover_total`.
	OnDesignatedPullerTakeover func(kindLabel string)
	// OnTopKExpansion fires once per expansion pass to top-2K (or
	// top-(K × TopKExpansionFactor) when the factor is configured).
	// reason is "degraded" (rule-6 DHT-degraded expand) or
	// "all_unreachable" (rule-5 expansion). Maps to the design doc metric
	// `p2p_topk_expansion_total{reason=}`.
	OnTopKExpansion func(reason string)
	// OnPrefetchBatch fires once per PrefetchLayers call with the
	// number of distinct pullers contacted and the number of layer
	// digests grouped into those batches (after self / unreachable
	// filtering). Maps to the design doc / the design doc batched-please_pull metrics:
	// `p2p_prefetch_batches_total` (count) and
	// `p2p_prefetch_digests_batched_total` (sum).
	OnPrefetchBatch func(pullers, digests int)
}

MetricsHooks lets the metrics package wire Prometheus counters without coupling the orchestrator to client_golang. All hooks are nil-safe.

type Options

type Options struct {
	Members   ifaces.Members
	Discovery Discovery
	Coord     ifaces.Coordinator
	Inflight  *inflight.Map
	Logger    *slog.Logger
	Metrics   MetricsHooks
	Now       func() time.Time
	HrwK      int       // default 3
	HrwScope  hrw.Scope // default ScopeCluster
	SelfZone  string    // required when HrwScope == ScopeZone

	// LocalIntent computes self's PullIntent synchronously, without
	// the libp2p coord round-trip. When non-nil, the cold-start
	// orchestrator includes self as a first-class participant in the
	// the design doc rule cascade - rule 2 (cache hit on self), rule 3 (self
	// in-flight), rule 4 (self in cooldown), and rule 7 (self picked
	// as designated puller) all behave the same as for any peer.
	//
	// Without LocalIntent, the resolver excludes self from
	// queryTargets and from `reachable`, which means a self-as-HRW-
	// rank-0 case routes please_pull to rank 1 - two nodes both
	// trying to delegate to each other can each origin-pull the same
	// digest, violating the cache-hit "one origin pull per digest"
	// invariant. New deployments MUST wire LocalIntent.
	LocalIntent ifaces.LocalIntentProvider
	// LocalPull starts an origin pull on self without the libp2p
	// please_pull RPC. Used when rule 7's lowest-rank-reachable puller
	// is self. nil + LocalIntent non-nil + rule 7 picks self ->
	// resolver falls back to Coord.PleasePull(self, ...), which will
	// either fail to dial or burn a stream slot to no benefit; tests
	// should set both together.
	LocalPull ifaces.LocalPullStarter

	// Tunables (defaults applied if zero).
	QueryTimeout         time.Duration // default 2s - step 5 wait window
	PollManifest         time.Duration // default 200ms -
	PollLayer            time.Duration // default 1s -
	TransientCooldownCap time.Duration // default 30s - rule 4
	HonorSweepInterval   time.Duration // default 1m, negative disables full sweeps

	// TopKExpansionFactor is the multiplier applied to HrwK on the
	// expansion pass under rule 5 / rule 6 (the step 5; the design doc
	// `topk_expansion_factor_degraded`). Defaults to 2 when ≤1.
	TopKExpansionFactor int

	// TrustedFailureClasses is the set of the design doc origin-error classes
	// the requester accepts as a cluster-wide 5xx-immediate signal
	// when a top-K responder reports recently_failed in rule 1.
	// Classes outside this set (notably `transient`) are honored
	// only locally by the reporting puller. Empty defaults to
	// {auth, not_found, rate_limited} per design the design doc / config
	// `origin_failure_classes_trusted_cluster_wide`.
	TrustedFailureClasses []ifaces.FailureClass
}

Options configures a Resolver.

type Resolution

type Resolution struct {
	// Providers are transfer endpoints (host:port) the caller should
	// fetch from, in priority order. Non-empty on success.
	Providers []ifaces.Provider
	// Outcome names which rule fired. Useful for tests and metrics.
	Outcome string
}

Resolution carries the orchestrator's verdict.

type Resolver

type Resolver struct {
	// contains filtered or unexported fields
}

Resolver runs the cascade.

func New

func New(opts Options) *Resolver

New builds a Resolver. Required fields: Members, Discovery, Coord, Inflight.

func (*Resolver) PrefetchChildren

func (r *Resolver) PrefetchChildren(ctx context.Context, children []ChildDigest, registry, repository string) error

PrefetchChildren is the kind-preserving sibling of PrefetchLayers. It groups children by (HRW puller, kind) and emits one PleasePull (or StartLocalPull) RPC per group, so the single-repo-per- batch invariant ("all digests in a batch MUST share kind") is honored while still preserving the per-kind metric label all the way through the wire.

A manifest typically yields one KindConfig digest and N KindBlob digests. If all N+1 children HRW to the same puller, PrefetchChildren issues TWO RPCs (one per kind) rather than one mixed RPC - that's the trade for keeping the kind label honest. The CPU/RPC overhead is negligible (one extra round-trip per manifest serve, dwarfed by the layer pull itself), and the alternative (collapsing config into blob on the wire) leaves p2p_origin_pull_total{kind="config"} permanently zero.

func (*Resolver) PrefetchLayers

func (r *Resolver) PrefetchLayers(ctx context.Context, digests []digest.Digest, registry, repository string) error

PrefetchLayers groups digests by their HRW rank-0 reachable designated puller and issues one PleasePull RPC per puller. Digests HRW'ing to self are diverted to the local LocalPullStarter (if configured) and batched as a single StartLocalPull call; if no LocalPullStarter is configured, the self-bucket is skipped (the per-digest Resolve cascade will still recover via rule 7 when containerd actually asks for the layer). Already-cached or otherwise filtered digests should be removed by the caller before invoking.

PrefetchLayers blocks until every per-puller RPC has completed or errored, but each RPC is bounded by QueryTimeout. Callers run it in a goroutine for fire-and-forget semantics. A returned error means at least one puller's RPC failed; callers can log it and otherwise ignore - the next per-digest Resolve call from containerd will fall back to the full the design doc cascade.

registry and repository identify the upstream and OCI repo. They MUST be non-empty (the design doc single-repo-per-batch invariant); an empty value returns ErrPrefetchInvalid without issuing any RPC.

All digests passed via PrefetchLayers are sent as KindBlob; this is a back-compat shim that pre-dates per-kind labelling. New callers should use PrefetchChildren which preserves the config-vs-layer kind end-to-end through the wire so per-kind metrics ("manifest | config | layer") remain honest.

func (*Resolver) Resolve

func (r *Resolver) Resolve(ctx context.Context, d digest.Digest, kind ifaces.OriginRefKind, registry, repository string, expectedSize int64) (*Resolution, error)

Resolve runs the cascade for d. The returned error is one of the sentinel errors above, or a context cancellation, or a transport error. expectedSize is 0 if unknown (e.g., manifest digest before parsing); it is used to compute the per-the design doc stall threshold for rule 3 (in_flight) and the DHT-polling deadline overall.

registry+repository identify the upstream and OCI repo for the please_pull RPC (the design doc single-repo-per-batch invariant). Both must be non-empty for rule 7 to fire; the orchestrator otherwise falls through to ErrExhausted.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL