pauseresume

package
v1.3.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 11, 2026 License: Apache-2.0 Imports: 16 Imported by: 0

Documentation

Overview

Package pauseresume ships Harbor's ONE pause/resume primitive — the unified Coordinator that HITL approval, tool-side OAuth, A2A AUTH_REQUIRED / INPUT_REQUIRED, and operator/Console PAUSE all converge on (CLAUDE.md §7 rule 4, RFC §3.3 + §6.3). There are not four parallel pause implementations; there is this one.

The shape

The Coordinator interface is three methods — Request, Resume, Status — keyed on an opaque, runtime-owned Token. Planners never instantiate the Coordinator and never reach into it: a planner signals "I need a pause" by returning the planner.RequestPause Decision shape (Phase 42 / D-047), and the runtime executor (a later phase) drives the Coordinator. This package never imports the planner package — it consumes only the planner.PauseReason enum, via the byte-stable Reason typedef bridge below. That one-directional dependency on a pure enum keeps the swappable-planner property intact (the master-plan flags Phase 50 as a critical-path phase for exactly this reason).

Durability

A pause is always recorded in a process-local registry keyed by Token. Durability — surviving a Runtime restart — rides on the existing state.StateStore (Phase 07), handed to the Coordinator as an OPTIONAL checkpoint store via WithCheckpointStore. When a checkpoint store is configured, Request serialises the pause record (including the trajectory, via trajectory.Trajectory.Serialize) and Saves it; a fresh Coordinator over the same store rehydrates the pause on demand. When NO checkpoint store is configured, pauses are process-local only and explicitly do NOT survive restart. This is the master-plan acceptance criterion verbatim: "pauses survive Runtime restart only when StateStore-backed checkpoint is configured."

Phase 50 deliberately does NOT mint a second persistence-driver seam. state.StateStore is already the §4.4 persistence seam, with three V1 drivers (in-mem / SQLite / Postgres) at conformance parity; a parallel CheckpointStore interface would be the §13 two-parallel-implementations smell. See D-067.

Fail loudly

There is no silent-degradation path (RFC §3.4, CLAUDE.md §13). trajectory.ErrUnserializable from Trajectory.Serialize propagates verbatim out of Request — a pause whose trajectory cannot serialise is rejected loud, never half-persisted. trajectory.ErrToolContextLost from HandleRegistry.Get propagates verbatim out of Resume — a run is never resumed with a nil tool context. Missing identity fails closed with ErrIdentityRequired; an unknown token surfaces ErrPauseNotFound.

Concurrent reuse (D-025)

The Coordinator is a compiled artifact: immutable after construction, with per-call state living in ctx + arguments and the pause registry behind a documented-invariant mutex. One Coordinator is safe to share across N concurrent goroutines; concurrent_test.go pins N≥100 under -race.

§13 primitive-with-consumer

Phase 50 ships the primitive. The first end-to-end RequestPause-driven-through-the-Coordinator consumer is Phase 53 (steering wiring) — same wave, Stage 3. Phase 51 (Stage 2) also consumes this package's surface (the pause-record serialise contract). See the phase-50 plan's "§13 primitive-with-consumer obligation" section and D-067.

Index

Constants

View Source
const (
	// EventTypePauseRequested — emitted by Coordinator.Request when a
	// pause record is created. Payload is PauseRequestedPayload.
	EventTypePauseRequested events.EventType = "pause.requested"
	// EventTypePauseResumed — emitted by Coordinator.Resume when a
	// pause record terminates. Payload is PauseResumedPayload.
	EventTypePauseResumed events.EventType = "pause.resumed"
	// EventTypePausePayloadArtifactRouted — emitted by the Phase 72e
	// `pause.list` snapshot path when a pause record's Payload
	// serialised size meets or exceeds the configured heavy-content
	// threshold and is routed through the ArtifactStore instead of
	// shipped inline (D-026 — context-window safety net applied to
	// Protocol read snapshots). The emit makes the bypass LOUD — a
	// heavy payload is never silently truncated. Payload is
	// PausePayloadArtifactRoutedPayload.
	EventTypePausePayloadArtifactRouted events.EventType = "pause.payload_artifact_routed"
)

Canonical event types emitted by the Coordinator. Registered into the events package's canonical registry from this package's init() so a Publish never trips events.ErrUnknownEventType.

View Source
const (
	// DefaultListPageSize is the per-page row count applied when a
	// ListRequest leaves PageSize zero.
	DefaultListPageSize = 50
	// MaxListPageSize is the largest PageSize Coordinator.List accepts;
	// a larger value fails closed with ErrInvalidPage.
	MaxListPageSize = 200
)

Pause-list pagination bounds (Phase 72e / D-110). The defaults mirror the Protocol-edge types.PauseListRequest contract — a List caller that leaves Page / PageSize zero gets the documented defaults; a PageSize above the max fails closed with ErrInvalidPage (never a silent clamp). Kept here as the runtime-side single source so the Coordinator validation and the Protocol-edge handler agree.

View Source
const (
	// ReasonApprovalRequired — a human needs to approve a
	// planner-chosen tool call before execution (HITL).
	ReasonApprovalRequired = planner.PauseApprovalRequired
	// ReasonAwaitInput — the run needs additional input from the
	// user / supervisor before continuing.
	ReasonAwaitInput = planner.PauseAwaitInput
	// ReasonExternalEvent — the run is waiting on an external event
	// (webhook, scheduled trigger, A2A callback, tool-side OAuth
	// completion).
	ReasonExternalEvent = planner.PauseExternalEvent
	// ReasonConstraintsConflict — a constraint conflict (budget vs.
	// tool requirement, identity scope mismatch) requires operator
	// resolution.
	ReasonConstraintsConflict = planner.PauseConstraintsConflict
)

The four canonical pause reasons (RFC §6.3 — settled). Re-exported from the planner package via the Reason typedef bridge so callers in the runtime tree reach them under the pauseresume namespace.

View Source
const DefaultSweepInterval = time.Minute

DefaultSweepInterval is the sweep cadence applied when no WithSweepInterval option is given. Mirrors the documented `pauseresume.sweep_interval` config default.

View Source
const FormatVersion = 1

FormatVersion is the pause-record wire-format version. RFC §6.3 settles the pause-state serialisation format as "JSON with format_version: 1" — aligned with the event bus (also JSON) and operational simplicity (resolves brief 02 Q-2). Phase 50 shipped the checkpointRecord envelope with this field as the forward-compat hinge; Phase 51 closes the fail-loudly serialise contract ON it.

FormatVersion is an int so a future format bump is a single-line change with a deterministic, comparable guard on load (see DeserializeRecord). Bumping it is an RFC change.

Variables

View Source
var (
	// ErrIdentityRequired — Request / Resume / Status was called with
	// an identity triple missing one of (tenant, user, session). The
	// Coordinator fails closed (CLAUDE.md §6 rule 9 + D-001); there is
	// no identity-downgrading knob.
	ErrIdentityRequired = errors.New("pauseresume: identity triple incomplete")

	// ErrPauseNotFound — Resume / Status was called for a Token with
	// no live pause record (and, when a checkpoint store is configured,
	// no persisted checkpoint either). Typical cause: an already-cleared
	// resume, or a token from a different Runtime process with no
	// shared checkpoint store.
	ErrPauseNotFound = errors.New("pauseresume: pause token not found")

	// ErrAlreadyResumed — Resume was called for a Token whose pause
	// record is already in StatusResumed. Resume is idempotent: the
	// second call is rejected loud rather than re-applying side
	// effects.
	ErrAlreadyResumed = errors.New("pauseresume: pause already resumed")

	// ErrScopeMismatch — Resume was called with an identity triple
	// whose (tenant, user, session) does not match the triple the
	// pause was Requested under. Authentication on resume is checked
	// against the original pause's identity scope (RFC §3.3).
	ErrScopeMismatch = errors.New("pauseresume: resume identity scope does not match pause")

	// ErrInvalidReason — Request was called with a Reason that is not
	// one of the four canonical pause reasons (RFC §6.3). Fails closed
	// rather than recording a malformed pause record.
	ErrInvalidReason = errors.New("pauseresume: invalid pause reason")

	// ErrCheckpointCorrupt — a checkpoint loaded from the StateStore
	// failed to decode into a pause record. Surfaces store corruption
	// loud rather than resuming with a half-decoded record.
	ErrCheckpointCorrupt = errors.New("pauseresume: checkpoint record is corrupt")

	// ErrInvalidDecision — Resume was called with a Decision that is
	// not one of the four canonical values (approve / reject / resume
	// / timeout). Fails closed rather than emitting a `pause.resumed`
	// event with an untyped Decision — the §13 fail-loudly contract
	// the typed Decision marker exists to enforce (issue #113, D-096).
	ErrInvalidDecision = errors.New("pauseresume: invalid resume decision")

	// ErrUnsupportedFormatVersion — a pause record loaded from the
	// StateStore carries a format_version this Runtime does not
	// recognise (a zero/absent version, or a higher version written by
	// a newer Runtime). The load-side half of the RFC §6.3 "JSON with
	// format_version: 1" contract: rather than silently mis-decoding a
	// forward-incompatible record against the current schema, the load
	// fails loud (Phase 51 / D-069).
	ErrUnsupportedFormatVersion = errors.New("pauseresume: unsupported pause-record format_version")

	// ErrInvalidPage — Coordinator.List was called with a pagination
	// shape outside the accepted bounds: a negative Page, a negative
	// PageSize, or a PageSize above MaxListPageSize. The List path
	// fails closed rather than silently clamping — a silent clamp would
	// defeat the per-row identity boundary the snapshot guarantees
	// (Phase 72e / D-110).
	ErrInvalidPage = errors.New("pauseresume: invalid pause-list pagination")

	// ErrCrossTenantScope — Coordinator.List was called with a
	// ListFilter naming a tenant other than the caller's own (or more
	// than one tenant) without ListRequest.AdminScoped set. Cross-tenant
	// pause visibility requires the verified auth.ScopeAdmin claim
	// (D-079); the Coordinator fails closed rather than leaking
	// foreign-tenant pause records (Phase 72e / D-110).
	ErrCrossTenantScope = errors.New("pauseresume: cross-tenant pause-list requires the admin scope claim")

	// ErrSweeperMisconfigured — RunSweeper was started against a
	// Coordinator the sweeper cannot maintain: a foreign Coordinator
	// implementation (the maintenance scan needs this package's
	// concrete registry), or one constructed without
	// WithMaxParkDuration (nothing would ever expire — a sweeper that
	// silently spins forever reaping nothing is the §13
	// silent-degradation shape). Phase 111c / D-200.
	ErrSweeperMisconfigured = errors.New("pauseresume: sweeper misconfigured")
)

Sentinel errors. Callers compare via errors.Is.

Two fail-loudly errors are NOT redefined here: trajectory.ErrUnserializable (raised by trajectory.Trajectory.Serialize when a pause request's trajectory carries a non-JSON-encodable leaf) and trajectory.ErrToolContextLost (raised by trajectory.HandleRegistry.Get when a handle cannot be re-attached on resume). The Coordinator propagates both verbatim; callers reach them via errors.As against the trajectory package's struct sentinels. Redefining them here would fork the fail-loudly contract Phase 43 already owns.

Functions

func DeserializeRecord

func DeserializeRecord(b []byte) (checkpointRecord, error)

DeserializeRecord parses canonical pause-record JSON bytes back into a checkpointRecord, failing LOUD on a corrupt envelope or an unsupported format_version — never returning a half-decoded record.

  • Malformed JSON or a type mismatch on a typed field surfaces ErrCheckpointCorrupt (wrapping the underlying json error).
  • A format_version the runtime does not recognise surfaces ErrUnsupportedFormatVersion naming the version read — a record written by a newer Runtime is rejected loud, not silently mis-decoded against the current schema.

The format_version guard is the load-side half of the RFC §6.3 "JSON with format_version: 1" contract: SerializeRecord stamps it, DeserializeRecord enforces it. A pause record with a missing / zero / unknown version is a corruption or a forward-incompatible write — both fail loud here (D-069).

func IsValidDecision

func IsValidDecision(d Decision) bool

IsValidDecision reports whether d is one of the four canonical pause-resume Decision values. An empty Decision is rejected loud — a `pause.resumed` event without a Decision is the bug shape this enum closes.

func IsValidReason

func IsValidReason(r Reason) bool

IsValidReason reports whether r is one of the four canonical pause reasons. Delegates to planner.IsValidPauseReason — the planner package owns the enum, this package owns the bridge.

func RunSweeper added in v1.3.0

func RunSweeper(ctx context.Context, coord Coordinator, opts ...SweeperOption) error

RunSweeper runs the pause sweeper until ctx is cancelled. It is the ONE exported entry into the pause-lifecycle maintenance loop (Phase 111c / D-200): on every tick it scans the Coordinator's pause registry and resumes each pause past its max-park deadline with DecisionTimeout, under the pause's own identity scope.

RunSweeper blocks; callers run it on a goroutine they cancel + join on shutdown (CLAUDE.md §5 concurrency rules — the assembly registers it on the closer chain). It returns ctx.Err() when cancelled — the caller treats context.Canceled as a clean shutdown.

Fail-loud preconditions (ErrSweeperMisconfigured):

  • coord must be this package's Coordinator (pauseresume.New) — the maintenance scan reads the concrete registry (see the file doc for why List cannot serve it).
  • the Coordinator must carry a max-park duration (WithMaxParkDuration > 0) — a sweeper over never-expiring pauses would silently reap nothing forever.

Per-record reap failures (a store error on the checkpoint delete) are logged at Error and do NOT stop the loop: one wedged record must not shield every other expired pause from being reaped. A PRE-flip failure leaves the entry paused, so the next pass's expired scan retries it (a lost tool-context handle is no longer in this class — timeout resumes skip the handle re-attach, D-207); a POST-flip failure (the checkpoint delete after Resume already flipped the entry) marks the entry delete-pending and the next pass's retryPendingDeletes phase re-attempts the delete + the skipped pause.resumed emit — a resumed-but-undeleted checkpoint must never orphan silently (Wave C checkpoint audit). Losing a reap race to a legitimate concurrent Resume (ErrAlreadyResumed / ErrPauseNotFound) is benign — the pause resolved exactly once — and is not logged as a failure.

Each pass also rescues CRASH-ORPHANED checkpoints — store rows whose pause was never rehydrated because the process that parked it died — into the registry via the StateStore's maintenance scan (rescanCrashOrphans, D-207), so the max-park ceiling applies to them like any live pause.

func SerializeRecord

func SerializeRecord(rec checkpointRecord) ([]byte, error)

SerializeRecord returns the canonical JSON bytes of a pause-record envelope, failing LOUD on ANY non-JSON-encodable leaf — never silently dropping a field.

The contract (RFC §6.3 + §3.4, D-069)

This is the Phase 51 closure of the silent-context-loss bug for the pause record's OWN envelope. Phase 43 already closed it for the trajectory (trajectory.Trajectory.Serialize); Phase 50 propagated trajectory.ErrUnserializable verbatim out of Request. But the pause record carries one more caller-controlled, JSON-tree-shaped field — Payload map[string]any — and Phase 50's checkpoint save reached it via a bare json.Marshal. A bare json.Marshal on a non-encodable Payload leaf returns *json.UnsupportedTypeError: technically loud, but WITHOUT the actionable dotted field path the fail-loudly contract requires (RFC §3.4: "MUST return ErrUnserializable naming the offending field path"). SerializeRecord closes that gap.

  • Success: returns canonical JSON bytes with format_version set to the current FormatVersion. The round-trip SerializeRecord → DeserializeRecord → SerializeRecord is byte-identical for any record whose Payload holds JSON-tree shapes (map[string]any / []any / primitives).
  • Failure: returns (nil, trajectory.ErrUnserializable{Field: "PauseRecord.payload.<key>"}) — the offending leaf is named in the caller's own envelope vocabulary. No silent-drop path; no half-persisted checkpoint (coordinator.go rejects the Request before touching the in-memory registry or the store).

The pre-flight reflective walk is trajectory.ValidateEncodable — the SAME walker Phase 43 uses for the trajectory. Phase 51 does NOT re-implement a second fail-loudly serialiser (that would be the CLAUDE.md §13 two-parallel-implementations anti-pattern, exactly the shape the Wave 8 audit's capfilter extraction killed); it shares the Phase 43 primitive. See D-069's "reuse vs share" call.

FormatVersion is stamped here, not trusted from the caller: the caller hands SerializeRecord a checkpointRecord and SerializeRecord owns the version field. This keeps "what version did we write" a single source of truth.

Types

type Coordinator

type Coordinator interface {
	// Request records a pause and returns its opaque Token. When a
	// checkpoint store is configured the pause record (including the
	// optional trajectory) is persisted; a non-serialisable trajectory
	// fails loud with trajectory.ErrUnserializable and nothing is
	// persisted.
	Request(ctx context.Context, req PauseRequest) (Pause, error)

	// Resume terminates a pause: it validates the resuming identity
	// scope, re-attaches any tool-context handles via the
	// HandleRegistry (propagating trajectory.ErrToolContextLost on a
	// lost handle), marks the record resumed, and clears the
	// checkpoint. A second Resume of the same Token returns
	// ErrAlreadyResumed.
	//
	// `decision` is the typed marker carried on the emitted
	// `pause.resumed` event so wire consumers (the Console, third-party
	// clients, integration tests) can distinguish approve from reject
	// from generic resume from timeout without parsing free-form
	// `Reason` strings. An invalid Decision is rejected loud with
	// ErrInvalidDecision — there is no untyped default (§13 fail-loudly,
	// D-096).
	Resume(ctx context.Context, token Token, decision Decision, payload map[string]any) error

	// Status returns a read-only snapshot of a pause record's
	// lifecycle. When the Token is absent from the in-memory registry
	// but a checkpoint store is configured, Status falls back to a
	// checkpoint load (the restart-survival path). An unknown Token
	// returns ErrPauseNotFound.
	Status(ctx context.Context, token Token) (Status, error)

	// List returns a paginated snapshot of pause records visible under
	// the caller's identity scope (Phase 72e — the `pause.list`
	// Protocol surface). Read-only: it does NOT mutate the registry,
	// does NOT call Resume, and does NOT clear checkpoints.
	//
	// Identity-mandatory: a missing (tenant, user, session) triple on
	// req.Identity returns wrapped ErrIdentityRequired. Pagination is
	// mandatory — a 0 / negative / over-max PageSize, or a negative
	// Page, returns wrapped ErrInvalidPage; the snapshot is never
	// silently clamped.
	//
	// Cross-tenant visibility: when req.Filter.TenantIDs names a tenant
	// other than req.Identity.TenantID (or more than one tenant), the
	// caller MUST set req.AdminScoped — otherwise List returns wrapped
	// ErrCrossTenantScope. The Coordinator does NOT read the scope from
	// ctx; the Protocol-edge handler is responsible for verifying the
	// `auth.ScopeAdmin` claim and setting AdminScoped (separation of
	// concerns — D-079).
	List(ctx context.Context, req ListRequest) (ListResponse, error)
}

Coordinator is Harbor's unified pause/resume primitive. One Coordinator is built per Runtime process and shared across all runs; it is safe for concurrent use by N goroutines (D-025).

Implementations MUST:

  • mint an opaque, unique Token per Request;
  • fail closed on a missing identity triple (ErrIdentityRequired);
  • propagate trajectory.ErrUnserializable verbatim out of Request and trajectory.ErrToolContextLost verbatim out of Resume — no silent-degradation path;
  • validate the resuming identity scope against the pause's scope (ErrScopeMismatch);
  • treat Resume as idempotent — a second Resume of the same Token returns ErrAlreadyResumed, never a double-apply.

func New

func New(opts ...Option) Coordinator

New constructs the V1 process-local Coordinator. The returned value is immutable after construction (D-025) and safe for concurrent use by N goroutines.

With no options, the Coordinator is fully process-local: no checkpoint store (pauses do not survive restart), a fresh process-local handle registry, no event bus, time.Now as the clock.

type Decision

type Decision string

Decision is the typed marker carried on a `pause.resumed` event so wire consumers can distinguish the *kind* of resume (approve vs. reject vs. generic resume vs. timeout) without parsing free-form `Reason` strings.

The Phase 31 approval gate already owns its own narrower enum (`approval.ApprovalDecision` — approve / reject / pending) for the gate-internal resolution channel; that enum is approval-specific and stays where it lives. This `pauseresume.Decision` is the broader runtime-level enum that ALSO covers tool-side OAuth completion (`DecisionResume`) and deadline-driven resumes (`DecisionTimeout`). A single enum on `Coordinator.Resume` would force either pollution of `approval.ApprovalDecision` with non-approval values or a parallel "ApprovalDecision vs PauseResumeDecision" split — both are §13 "two parallel implementations of the same conceptual feature" smells. Keeping the gate-internal enum narrow and the coordinator-edge enum broad is the right factoring.

Wire consumers (the Console, third-party clients, integration tests) switch on this typed value rather than parsing `Reason` strings — the audit-flagged anti-pattern issue #113 / D-096 closes.

const (
	// DecisionApprove — the gate's approver said yes (HITL approval).
	// Mirrors the steering inbox's `ControlApprove` and the approval
	// package's `DecisionApprove`.
	DecisionApprove Decision = "approve"

	// DecisionReject — the gate's approver said no (HITL rejection).
	// Mirrors the steering inbox's `ControlReject` and the approval
	// package's `DecisionReject`. A REJECT terminates the run with
	// `Finish{ConstraintsConflict}` in the RunLoop (D-071).
	DecisionReject Decision = "reject"

	// DecisionResume — a generic resume of a non-approval pause. The
	// canonical producer is the tool-side OAuth provider completing a
	// flow (Phase 30), but any future non-approval resume (a steering
	// `RESUME` not tied to an APPROVE / REJECT, an A2A `INPUT_REQUIRED`
	// fulfilled) lands here.
	DecisionResume Decision = "resume"

	// DecisionTimeout — a deadline-driven resume (the pause's max-park
	// window elapsed and the runtime resumed it to surface a
	// constraint-conflict). Produced by the pause sweeper (sweeper.go,
	// Phase 111c / D-200) when a pause outlives the Coordinator's
	// WithMaxParkDuration ceiling. Terminal for the waiting run: the
	// steering RunLoop finishes it with Finish{ConstraintsConflict}
	// (the D-071 REJECT posture applied to deadlines), never a silent
	// unpark-and-continue.
	DecisionTimeout Decision = "timeout"
)

type ListFilter

type ListFilter struct {
	// States narrows by lifecycle state. Empty defaults to
	// [StatusPaused] — the intervention-queue use case.
	States []State
	// TenantIDs narrows to a tenant set. Empty defaults to the
	// caller's own tenant. A foreign tenant OR len>1 requires
	// ListRequest.AdminScoped.
	TenantIDs []string
	// UserIDs narrows to a user set within the visible tenants.
	UserIDs []string
	// SessionIDs narrows to a session set.
	SessionIDs []string
	// RunIDs narrows to a run set.
	RunIDs []string
	// Reasons narrows to one or more canonical pause reasons.
	Reasons []Reason
	// Since is an optional lower bound on PausedAt (inclusive).
	Since time.Time
	// Until is an optional upper bound on PausedAt (inclusive).
	Until time.Time
}

ListFilter is the runtime-internal filter shape for Coordinator.List. Empty slices are wildcards; a zero Since / Until is "no bound".

type ListRequest

type ListRequest struct {
	// Identity is the caller's (tenant, user, session) triple.
	// Mandatory — an incomplete triple fails closed with
	// ErrIdentityRequired.
	Identity identity.Identity
	// Filter narrows the snapshot. An empty filter means "the caller's
	// own identity scope, status=paused".
	Filter ListFilter
	// Page is the 1-based page number. 0 is treated as 1; a negative
	// Page fails closed with ErrInvalidPage.
	Page int
	// PageSize is the per-page row count. 0 is treated as the default
	// (DefaultListPageSize); a negative or over-max value fails closed
	// with ErrInvalidPage.
	PageSize int
	// AdminScoped is true when the caller carries the verified
	// auth.ScopeAdmin claim. Set by the Protocol-edge handler; the
	// Coordinator itself does NOT read the scope from ctx.
	AdminScoped bool
}

ListRequest is the input to Coordinator.List — the runtime-internal projection of the Protocol-edge types.PauseListRequest.

type ListResponse

type ListResponse struct {
	// Snapshots is the page of pause records, ordered PausedAt
	// descending (newest first).
	Snapshots []Pause
	// Statuses is parallel to Snapshots — Statuses[i] is the lifecycle
	// status of Snapshots[i].
	Statuses []Status
	// Page is the 1-based page number this response covers.
	Page int
	// PageSize is the per-page row count applied.
	PageSize int
	// PageCount is the total number of pages over the filtered set.
	PageCount int
	// TotalRows is the total filtered row count across all pages.
	TotalRows int
	// Truncated is true when a status=resumed filter was requested but
	// the resumed slice has aged out of the in-memory registry — see
	// the Coordinator's destructive-on-resume contract (coordinator.go).
	Truncated bool
}

ListResponse is the value returned by Coordinator.List.

type Option

type Option func(*coordinator)

Option configures a coordinator at construction. Options are applied in order; later options override earlier ones for the same field.

func WithBus

func WithBus(b events.EventBus) Option

WithBus hands the Coordinator an event bus. When set, Request emits pause.requested and Resume emits pause.resumed. When not set, no events are emitted (the Coordinator still functions — event emission is observability, not correctness).

func WithCheckpointStore

func WithCheckpointStore(s state.StateStore) Option

WithCheckpointStore hands the Coordinator a state.StateStore for durable pauses. When set, Request persists every pause record and a fresh Coordinator over the same store rehydrates pauses on demand — pauses survive a Runtime restart. When NOT set, pauses are process-local only and explicitly do not survive restart.

Phase 50 deliberately does not mint a parallel persistence-driver seam: state.StateStore is already the §4.4 persistence seam (three V1 drivers at conformance parity). See D-067.

func WithClock

func WithClock(now func() time.Time) Option

WithClock overrides the wall-clock source. Defaults to time.Now. Tests pass a controllable clock so PausedAt / ResumedAt are deterministic (CLAUDE.md §11).

func WithHandleRegistry

func WithHandleRegistry(r trajectory.HandleRegistry) Option

WithHandleRegistry overrides the handle registry used to re-attach the non-serialisable half of ToolContext on resume. Defaults to a fresh process-local registry (trajectory.NewProcessLocalRegistry). Pass a shared registry when tool dispatch and pause/resume must see the same handle table.

func WithMaxParkDuration added in v1.3.0

func WithMaxParkDuration(d time.Duration) Option

WithMaxParkDuration sets the operator-configured ceiling on how long a pause may stay parked (Phase 111c / D-200, RFC §3.3's typed `timeout` Decision — D-096). When d > 0, every pause carries an expiry derived from PausedAt + d, and the pause sweeper (RunSweeper) resumes expired pauses with DecisionTimeout — terminal for the waiting run (a deadline the human missed is a constraint the planner cannot resolve; mirrors D-071's REJECT posture). A non-positive d is the documented "never expire" default — today's pre-111c behaviour, not an error.

type Pause

type Pause struct {
	// Token is the opaque runtime-issued handle for this pause.
	Token Token
	// Reason is one of the four canonical pause reasons.
	Reason Reason
	// Payload is the sanitised, bounded pause payload (auth URL +
	// scopes for OAuth, approval context for HITL, etc.).
	Payload map[string]any
	// PausedAt is the wall-clock time the pause was recorded.
	PausedAt time.Time
	// Identity is the (tenant, user, session) triple the pause was
	// recorded under. Resume validates the resuming scope against it.
	Identity identity.Identity
}

Pause is the value returned by Coordinator.Request: the opaque Token plus the sanitised reason / payload / timestamp / identity of the paused run. Payload is depth/size-bounded by the caller (the Protocol edge enforces the RFC §6.3 steering-payload bounds before a pause request ever reaches the Coordinator); this package treats Payload as opaque sanitised data.

type PausePayloadArtifactRoutedPayload

type PausePayloadArtifactRoutedPayload struct {
	events.SafeSealed
	// Token is the opaque pause Token whose Payload was routed.
	Token string
	// ArtifactID is the content-addressed ID of the artifact the
	// heavy Payload was materialised into.
	ArtifactID string
	// PayloadBytes is the serialised byte length of the routed Payload.
	PayloadBytes int
	// ThresholdBytes is the configured heavy-content threshold the
	// PayloadBytes met or exceeded.
	ThresholdBytes int
}

PausePayloadArtifactRoutedPayload is the typed payload for a `pause.payload_artifact_routed` event. SafePayload by construction: every field is the runtime's own bookkeeping — the opaque pause Token, the content-addressed artifact ref ID, and the byte sizes involved. NO caller-controlled payload bytes are carried (the heavy payload itself went to the ArtifactStore, which is the whole point of the bypass). Phase 72e (D-110, D-026).

type PauseRequest

type PauseRequest struct {
	// Identity is the (tenant, user, session) triple the paused run
	// belongs to. Mandatory — a partial triple fails closed with
	// ErrIdentityRequired (CLAUDE.md §6 rule 9).
	Identity identity.Identity
	// Reason is one of the four canonical pause reasons. An invalid
	// reason fails closed with ErrInvalidReason.
	Reason Reason
	// Payload is the sanitised, bounded pause payload. Optional;
	// may be nil.
	Payload map[string]any
	// Trajectory, when non-nil, is checkpointed alongside the pause
	// record when a checkpoint store is configured. A non-serialisable
	// trajectory fails Request loud with trajectory.ErrUnserializable
	// — the pause is NOT half-persisted. Optional; may be nil (a pause
	// with no trajectory is valid — e.g. a pre-run approval gate).
	Trajectory *trajectory.Trajectory
}

PauseRequest is the input to Coordinator.Request.

type PauseRequestedPayload

type PauseRequestedPayload struct {
	events.SafeSealed
	// Token is the opaque pause Token.
	Token string
	// Reason is the canonical pause reason string.
	Reason string
}

PauseRequestedPayload is the typed payload for a pause.requested event. SafePayload by construction: every field is the Coordinator's own bookkeeping. The Token is opaque and carries no caller bytes; the Reason is one of four canonical enum values. The pause Payload itself is NOT carried on the event — it may carry caller-controlled data (an OAuth auth URL, approval context) and is left to the Protocol-edge projection (a later phase) to redact/bound. Observers that need the payload read it via Coordinator.Status.

type PauseResumedPayload

type PauseResumedPayload struct {
	events.SafeSealed
	// Token is the opaque pause Token.
	Token string
	// Reason is the canonical pause reason string (one of the four
	// canonical pause reasons — the reason the pause was *requested*).
	Reason string
	// Decision is the typed marker indicating *how* the pause resumed
	// (approve / reject / resume / timeout). Wire consumers switch on
	// this value rather than parsing `Reason` strings. See D-096.
	Decision Decision
}

PauseResumedPayload is the typed payload for a pause.resumed event. SafePayload by construction — Token + Reason + Decision only, no caller bytes.

`Decision` is the load-bearing typed marker wire consumers (the Console, third-party clients, integration tests) switch on to distinguish approve from reject from generic resume from timeout — the §13 "overloaded `Reason` string" anti-pattern issue #113 / D-096 closes. `Reason` is the human-readable pause-reason classification preserved for context; `Decision` is the typed channel observers branch on.

type Reason

type Reason = planner.PauseReason

Reason is the pause-reason enum carried on a pause record. It is a byte-stable typedef bridge onto the planner-side planner.PauseReason (RFC §6.3, D-047): the planner package is the truth source for the four canonical values; this package re-exports them under the pauseresume namespace so runtime call sites do not import the planner package directly. The typedef (not a fresh type) keeps the two byte-identical.

type State

type State string

State is the lifecycle state of a pause record.

const (
	// StatusPaused — the run is paused; Resume has not been called.
	StatusPaused State = "paused"
	// StatusResumed — Resume has been called for this Token; the
	// record is terminal (a second Resume returns ErrAlreadyResumed).
	StatusResumed State = "resumed"
)

type Status

type Status struct {
	// State is StatusPaused or StatusResumed.
	State State
	// Reason is the pause reason recorded at Request time.
	Reason Reason
	// PausedAt is the wall-clock time the pause was recorded.
	PausedAt time.Time
	// ResumedAt is the wall-clock time Resume was called; the zero
	// value unless State == StatusResumed.
	ResumedAt time.Time
	// Decision is the typed marker the pause was resumed with
	// (approve / reject / resume / timeout — D-096); the zero value
	// while State == StatusPaused. Lets an in-process observer (the
	// steering RunLoop's timeout detection — Phase 111c / D-200)
	// distinguish a sweeper-reaped timeout from a legitimate resume
	// without parsing event payloads. Process-local: a rehydrated
	// (restart-survival) Status never carries a Decision because a
	// resumed checkpoint is deleted, not kept.
	Decision Decision
}

Status is the value returned by Coordinator.Status: a read-only snapshot of a pause record's lifecycle without mutating it.

type SweeperOption added in v1.3.0

type SweeperOption func(*sweeperConfig)

SweeperOption configures RunSweeper. Options are applied in order; later options override earlier ones for the same field.

func WithSweepInterval added in v1.3.0

func WithSweepInterval(d time.Duration) SweeperOption

WithSweepInterval overrides the sweep cadence. Non-positive values fall back to DefaultSweepInterval.

func WithSweeperLogger added in v1.3.0

func WithSweeperLogger(l *slog.Logger) SweeperOption

WithSweeperLogger hands the sweeper a logger for reap / failure lines. Defaults to slog.Default().

type Token

type Token string

Token is the opaque, runtime-issued handle for a paused run. Clients never construct or parse a Token — the runtime owns the encoding (Phase 50 uses a ULID source). A Token is unique per Request call.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL