Documentation
¶
Overview ¶
Package pauseresume ships Harbor's ONE pause/resume primitive — the unified Coordinator that HITL approval, tool-side OAuth, A2A AUTH_REQUIRED / INPUT_REQUIRED, and operator/Console PAUSE all converge on (CLAUDE.md §7 rule 4, RFC §3.3 + §6.3). There are not four parallel pause implementations; there is this one.
The shape ¶
The Coordinator interface is three methods — Request, Resume, Status — keyed on an opaque, runtime-owned Token. Planners never instantiate the Coordinator and never reach into it: a planner signals "I need a pause" by returning the planner.RequestPause Decision shape (Phase 42 / D-047), and the runtime executor (a later phase) drives the Coordinator. This package never imports the planner package — it consumes only the planner.PauseReason enum, via the byte-stable Reason typedef bridge below. That one-directional dependency on a pure enum keeps the swappable-planner property intact (the master-plan flags Phase 50 as a critical-path phase for exactly this reason).
Durability ¶
A pause is always recorded in a process-local registry keyed by Token. Durability — surviving a Runtime restart — rides on the existing state.StateStore (Phase 07), handed to the Coordinator as an OPTIONAL checkpoint store via WithCheckpointStore. When a checkpoint store is configured, Request serialises the pause record (including the trajectory, via trajectory.Trajectory.Serialize) and Saves it; a fresh Coordinator over the same store rehydrates the pause on demand. When NO checkpoint store is configured, pauses are process-local only and explicitly do NOT survive restart. This is the master-plan acceptance criterion verbatim: "pauses survive Runtime restart only when StateStore-backed checkpoint is configured."
Phase 50 deliberately does NOT mint a second persistence-driver seam. state.StateStore is already the §4.4 persistence seam, with three V1 drivers (in-mem / SQLite / Postgres) at conformance parity; a parallel CheckpointStore interface would be the §13 two-parallel-implementations smell. See D-067.
Fail loudly ¶
There is no silent-degradation path (RFC §3.4, CLAUDE.md §13). trajectory.ErrUnserializable from Trajectory.Serialize propagates verbatim out of Request — a pause whose trajectory cannot serialise is rejected loud, never half-persisted. trajectory.ErrToolContextLost from HandleRegistry.Get propagates verbatim out of Resume — a run is never resumed with a nil tool context. Missing identity fails closed with ErrIdentityRequired; an unknown token surfaces ErrPauseNotFound.
Concurrent reuse (D-025) ¶
The Coordinator is a compiled artifact: immutable after construction, with per-call state living in ctx + arguments and the pause registry behind a documented-invariant mutex. One Coordinator is safe to share across N concurrent goroutines; concurrent_test.go pins N≥100 under -race.
§13 primitive-with-consumer ¶
Phase 50 ships the primitive. The first end-to-end RequestPause-driven-through-the-Coordinator consumer is Phase 53 (steering wiring) — same wave, Stage 3. Phase 51 (Stage 2) also consumes this package's surface (the pause-record serialise contract). See the phase-50 plan's "§13 primitive-with-consumer obligation" section and D-067.
Index ¶
- Constants
- Variables
- func DeserializeRecord(b []byte) (checkpointRecord, error)
- func IsValidDecision(d Decision) bool
- func IsValidReason(r Reason) bool
- func RunSweeper(ctx context.Context, coord Coordinator, opts ...SweeperOption) error
- func SerializeRecord(rec checkpointRecord) ([]byte, error)
- type Coordinator
- type Decision
- type ListFilter
- type ListRequest
- type ListResponse
- type Option
- type Pause
- type PausePayloadArtifactRoutedPayload
- type PauseRequest
- type PauseRequestedPayload
- type PauseResumedPayload
- type Reason
- type State
- type Status
- type SweeperOption
- type Token
Constants ¶
const ( // EventTypePauseRequested — emitted by Coordinator.Request when a // pause record is created. Payload is PauseRequestedPayload. EventTypePauseRequested events.EventType = "pause.requested" // EventTypePauseResumed — emitted by Coordinator.Resume when a // pause record terminates. Payload is PauseResumedPayload. EventTypePauseResumed events.EventType = "pause.resumed" // EventTypePausePayloadArtifactRouted — emitted by the Phase 72e // `pause.list` snapshot path when a pause record's Payload // serialised size meets or exceeds the configured heavy-content // threshold and is routed through the ArtifactStore instead of // shipped inline (D-026 — context-window safety net applied to // Protocol read snapshots). The emit makes the bypass LOUD — a // heavy payload is never silently truncated. Payload is // PausePayloadArtifactRoutedPayload. EventTypePausePayloadArtifactRouted events.EventType = "pause.payload_artifact_routed" )
Canonical event types emitted by the Coordinator. Registered into the events package's canonical registry from this package's init() so a Publish never trips events.ErrUnknownEventType.
const ( // DefaultListPageSize is the per-page row count applied when a // ListRequest leaves PageSize zero. DefaultListPageSize = 50 // MaxListPageSize is the largest PageSize Coordinator.List accepts; // a larger value fails closed with ErrInvalidPage. MaxListPageSize = 200 )
Pause-list pagination bounds (Phase 72e / D-110). The defaults mirror the Protocol-edge types.PauseListRequest contract — a List caller that leaves Page / PageSize zero gets the documented defaults; a PageSize above the max fails closed with ErrInvalidPage (never a silent clamp). Kept here as the runtime-side single source so the Coordinator validation and the Protocol-edge handler agree.
const ( // ReasonApprovalRequired — a human needs to approve a // planner-chosen tool call before execution (HITL). ReasonApprovalRequired = planner.PauseApprovalRequired // ReasonAwaitInput — the run needs additional input from the // user / supervisor before continuing. ReasonAwaitInput = planner.PauseAwaitInput // ReasonExternalEvent — the run is waiting on an external event // (webhook, scheduled trigger, A2A callback, tool-side OAuth // completion). ReasonExternalEvent = planner.PauseExternalEvent // ReasonConstraintsConflict — a constraint conflict (budget vs. // tool requirement, identity scope mismatch) requires operator // resolution. ReasonConstraintsConflict = planner.PauseConstraintsConflict )
The four canonical pause reasons (RFC §6.3 — settled). Re-exported from the planner package via the Reason typedef bridge so callers in the runtime tree reach them under the pauseresume namespace.
const DefaultSweepInterval = time.Minute
DefaultSweepInterval is the sweep cadence applied when no WithSweepInterval option is given. Mirrors the documented `pauseresume.sweep_interval` config default.
const FormatVersion = 1
FormatVersion is the pause-record wire-format version. RFC §6.3 settles the pause-state serialisation format as "JSON with format_version: 1" — aligned with the event bus (also JSON) and operational simplicity (resolves brief 02 Q-2). Phase 50 shipped the checkpointRecord envelope with this field as the forward-compat hinge; Phase 51 closes the fail-loudly serialise contract ON it.
FormatVersion is an int so a future format bump is a single-line change with a deterministic, comparable guard on load (see DeserializeRecord). Bumping it is an RFC change.
Variables ¶
var ( // ErrIdentityRequired — Request / Resume / Status was called with // an identity triple missing one of (tenant, user, session). The // Coordinator fails closed (CLAUDE.md §6 rule 9 + D-001); there is // no identity-downgrading knob. ErrIdentityRequired = errors.New("pauseresume: identity triple incomplete") // ErrPauseNotFound — Resume / Status was called for a Token with // no live pause record (and, when a checkpoint store is configured, // no persisted checkpoint either). Typical cause: an already-cleared // resume, or a token from a different Runtime process with no // shared checkpoint store. ErrPauseNotFound = errors.New("pauseresume: pause token not found") // ErrAlreadyResumed — Resume was called for a Token whose pause // record is already in StatusResumed. Resume is idempotent: the // second call is rejected loud rather than re-applying side // effects. ErrAlreadyResumed = errors.New("pauseresume: pause already resumed") // ErrScopeMismatch — Resume was called with an identity triple // whose (tenant, user, session) does not match the triple the // pause was Requested under. Authentication on resume is checked // against the original pause's identity scope (RFC §3.3). ErrScopeMismatch = errors.New("pauseresume: resume identity scope does not match pause") // ErrInvalidReason — Request was called with a Reason that is not // one of the four canonical pause reasons (RFC §6.3). Fails closed // rather than recording a malformed pause record. ErrInvalidReason = errors.New("pauseresume: invalid pause reason") // ErrCheckpointCorrupt — a checkpoint loaded from the StateStore // failed to decode into a pause record. Surfaces store corruption // loud rather than resuming with a half-decoded record. ErrCheckpointCorrupt = errors.New("pauseresume: checkpoint record is corrupt") // ErrInvalidDecision — Resume was called with a Decision that is // not one of the four canonical values (approve / reject / resume // / timeout). Fails closed rather than emitting a `pause.resumed` // event with an untyped Decision — the §13 fail-loudly contract // the typed Decision marker exists to enforce (issue #113, D-096). ErrInvalidDecision = errors.New("pauseresume: invalid resume decision") // ErrUnsupportedFormatVersion — a pause record loaded from the // StateStore carries a format_version this Runtime does not // recognise (a zero/absent version, or a higher version written by // a newer Runtime). The load-side half of the RFC §6.3 "JSON with // format_version: 1" contract: rather than silently mis-decoding a // forward-incompatible record against the current schema, the load // fails loud (Phase 51 / D-069). ErrUnsupportedFormatVersion = errors.New("pauseresume: unsupported pause-record format_version") // ErrInvalidPage — Coordinator.List was called with a pagination // shape outside the accepted bounds: a negative Page, a negative // PageSize, or a PageSize above MaxListPageSize. The List path // fails closed rather than silently clamping — a silent clamp would // defeat the per-row identity boundary the snapshot guarantees // (Phase 72e / D-110). ErrInvalidPage = errors.New("pauseresume: invalid pause-list pagination") // ErrCrossTenantScope — Coordinator.List was called with a // ListFilter naming a tenant other than the caller's own (or more // than one tenant) without ListRequest.AdminScoped set. Cross-tenant // pause visibility requires the verified auth.ScopeAdmin claim // (D-079); the Coordinator fails closed rather than leaking // foreign-tenant pause records (Phase 72e / D-110). ErrCrossTenantScope = errors.New("pauseresume: cross-tenant pause-list requires the admin scope claim") // ErrSweeperMisconfigured — RunSweeper was started against a // Coordinator the sweeper cannot maintain: a foreign Coordinator // implementation (the maintenance scan needs this package's // concrete registry), or one constructed without // WithMaxParkDuration (nothing would ever expire — a sweeper that // silently spins forever reaping nothing is the §13 // silent-degradation shape). Phase 111c / D-200. ErrSweeperMisconfigured = errors.New("pauseresume: sweeper misconfigured") )
Sentinel errors. Callers compare via errors.Is.
Two fail-loudly errors are NOT redefined here: trajectory.ErrUnserializable (raised by trajectory.Trajectory.Serialize when a pause request's trajectory carries a non-JSON-encodable leaf) and trajectory.ErrToolContextLost (raised by trajectory.HandleRegistry.Get when a handle cannot be re-attached on resume). The Coordinator propagates both verbatim; callers reach them via errors.As against the trajectory package's struct sentinels. Redefining them here would fork the fail-loudly contract Phase 43 already owns.
Functions ¶
func DeserializeRecord ¶
DeserializeRecord parses canonical pause-record JSON bytes back into a checkpointRecord, failing LOUD on a corrupt envelope or an unsupported format_version — never returning a half-decoded record.
- Malformed JSON or a type mismatch on a typed field surfaces ErrCheckpointCorrupt (wrapping the underlying json error).
- A format_version the runtime does not recognise surfaces ErrUnsupportedFormatVersion naming the version read — a record written by a newer Runtime is rejected loud, not silently mis-decoded against the current schema.
The format_version guard is the load-side half of the RFC §6.3 "JSON with format_version: 1" contract: SerializeRecord stamps it, DeserializeRecord enforces it. A pause record with a missing / zero / unknown version is a corruption or a forward-incompatible write — both fail loud here (D-069).
func IsValidDecision ¶
IsValidDecision reports whether d is one of the four canonical pause-resume Decision values. An empty Decision is rejected loud — a `pause.resumed` event without a Decision is the bug shape this enum closes.
func IsValidReason ¶
IsValidReason reports whether r is one of the four canonical pause reasons. Delegates to planner.IsValidPauseReason — the planner package owns the enum, this package owns the bridge.
func RunSweeper ¶ added in v1.3.0
func RunSweeper(ctx context.Context, coord Coordinator, opts ...SweeperOption) error
RunSweeper runs the pause sweeper until ctx is cancelled. It is the ONE exported entry into the pause-lifecycle maintenance loop (Phase 111c / D-200): on every tick it scans the Coordinator's pause registry and resumes each pause past its max-park deadline with DecisionTimeout, under the pause's own identity scope.
RunSweeper blocks; callers run it on a goroutine they cancel + join on shutdown (CLAUDE.md §5 concurrency rules — the assembly registers it on the closer chain). It returns ctx.Err() when cancelled — the caller treats context.Canceled as a clean shutdown.
Fail-loud preconditions (ErrSweeperMisconfigured):
- coord must be this package's Coordinator (pauseresume.New) — the maintenance scan reads the concrete registry (see the file doc for why List cannot serve it).
- the Coordinator must carry a max-park duration (WithMaxParkDuration > 0) — a sweeper over never-expiring pauses would silently reap nothing forever.
Per-record reap failures (a store error on the checkpoint delete) are logged at Error and do NOT stop the loop: one wedged record must not shield every other expired pause from being reaped. A PRE-flip failure leaves the entry paused, so the next pass's expired scan retries it (a lost tool-context handle is no longer in this class — timeout resumes skip the handle re-attach, D-207); a POST-flip failure (the checkpoint delete after Resume already flipped the entry) marks the entry delete-pending and the next pass's retryPendingDeletes phase re-attempts the delete + the skipped pause.resumed emit — a resumed-but-undeleted checkpoint must never orphan silently (Wave C checkpoint audit). Losing a reap race to a legitimate concurrent Resume (ErrAlreadyResumed / ErrPauseNotFound) is benign — the pause resolved exactly once — and is not logged as a failure.
Each pass also rescues CRASH-ORPHANED checkpoints — store rows whose pause was never rehydrated because the process that parked it died — into the registry via the StateStore's maintenance scan (rescanCrashOrphans, D-207), so the max-park ceiling applies to them like any live pause.
func SerializeRecord ¶
SerializeRecord returns the canonical JSON bytes of a pause-record envelope, failing LOUD on ANY non-JSON-encodable leaf — never silently dropping a field.
The contract (RFC §6.3 + §3.4, D-069) ¶
This is the Phase 51 closure of the silent-context-loss bug for the pause record's OWN envelope. Phase 43 already closed it for the trajectory (trajectory.Trajectory.Serialize); Phase 50 propagated trajectory.ErrUnserializable verbatim out of Request. But the pause record carries one more caller-controlled, JSON-tree-shaped field — Payload map[string]any — and Phase 50's checkpoint save reached it via a bare json.Marshal. A bare json.Marshal on a non-encodable Payload leaf returns *json.UnsupportedTypeError: technically loud, but WITHOUT the actionable dotted field path the fail-loudly contract requires (RFC §3.4: "MUST return ErrUnserializable naming the offending field path"). SerializeRecord closes that gap.
- Success: returns canonical JSON bytes with format_version set to the current FormatVersion. The round-trip SerializeRecord → DeserializeRecord → SerializeRecord is byte-identical for any record whose Payload holds JSON-tree shapes (map[string]any / []any / primitives).
- Failure: returns (nil, trajectory.ErrUnserializable{Field: "PauseRecord.payload.<key>"}) — the offending leaf is named in the caller's own envelope vocabulary. No silent-drop path; no half-persisted checkpoint (coordinator.go rejects the Request before touching the in-memory registry or the store).
The pre-flight reflective walk is trajectory.ValidateEncodable — the SAME walker Phase 43 uses for the trajectory. Phase 51 does NOT re-implement a second fail-loudly serialiser (that would be the CLAUDE.md §13 two-parallel-implementations anti-pattern, exactly the shape the Wave 8 audit's capfilter extraction killed); it shares the Phase 43 primitive. See D-069's "reuse vs share" call.
FormatVersion is stamped here, not trusted from the caller: the caller hands SerializeRecord a checkpointRecord and SerializeRecord owns the version field. This keeps "what version did we write" a single source of truth.
Types ¶
type Coordinator ¶
type Coordinator interface {
// Request records a pause and returns its opaque Token. When a
// checkpoint store is configured the pause record (including the
// optional trajectory) is persisted; a non-serialisable trajectory
// fails loud with trajectory.ErrUnserializable and nothing is
// persisted.
Request(ctx context.Context, req PauseRequest) (Pause, error)
// Resume terminates a pause: it validates the resuming identity
// scope, re-attaches any tool-context handles via the
// HandleRegistry (propagating trajectory.ErrToolContextLost on a
// lost handle), marks the record resumed, and clears the
// checkpoint. A second Resume of the same Token returns
// ErrAlreadyResumed.
//
// `decision` is the typed marker carried on the emitted
// `pause.resumed` event so wire consumers (the Console, third-party
// clients, integration tests) can distinguish approve from reject
// from generic resume from timeout without parsing free-form
// `Reason` strings. An invalid Decision is rejected loud with
// ErrInvalidDecision — there is no untyped default (§13 fail-loudly,
// D-096).
Resume(ctx context.Context, token Token, decision Decision, payload map[string]any) error
// Status returns a read-only snapshot of a pause record's
// lifecycle. When the Token is absent from the in-memory registry
// but a checkpoint store is configured, Status falls back to a
// checkpoint load (the restart-survival path). An unknown Token
// returns ErrPauseNotFound.
Status(ctx context.Context, token Token) (Status, error)
// List returns a paginated snapshot of pause records visible under
// the caller's identity scope (Phase 72e — the `pause.list`
// Protocol surface). Read-only: it does NOT mutate the registry,
// does NOT call Resume, and does NOT clear checkpoints.
//
// Identity-mandatory: a missing (tenant, user, session) triple on
// req.Identity returns wrapped ErrIdentityRequired. Pagination is
// mandatory — a 0 / negative / over-max PageSize, or a negative
// Page, returns wrapped ErrInvalidPage; the snapshot is never
// silently clamped.
//
// Cross-tenant visibility: when req.Filter.TenantIDs names a tenant
// other than req.Identity.TenantID (or more than one tenant), the
// caller MUST set req.AdminScoped — otherwise List returns wrapped
// ErrCrossTenantScope. The Coordinator does NOT read the scope from
// ctx; the Protocol-edge handler is responsible for verifying the
// `auth.ScopeAdmin` claim and setting AdminScoped (separation of
// concerns — D-079).
List(ctx context.Context, req ListRequest) (ListResponse, error)
}
Coordinator is Harbor's unified pause/resume primitive. One Coordinator is built per Runtime process and shared across all runs; it is safe for concurrent use by N goroutines (D-025).
Implementations MUST:
- mint an opaque, unique Token per Request;
- fail closed on a missing identity triple (ErrIdentityRequired);
- propagate trajectory.ErrUnserializable verbatim out of Request and trajectory.ErrToolContextLost verbatim out of Resume — no silent-degradation path;
- validate the resuming identity scope against the pause's scope (ErrScopeMismatch);
- treat Resume as idempotent — a second Resume of the same Token returns ErrAlreadyResumed, never a double-apply.
func New ¶
func New(opts ...Option) Coordinator
New constructs the V1 process-local Coordinator. The returned value is immutable after construction (D-025) and safe for concurrent use by N goroutines.
With no options, the Coordinator is fully process-local: no checkpoint store (pauses do not survive restart), a fresh process-local handle registry, no event bus, time.Now as the clock.
type Decision ¶
type Decision string
Decision is the typed marker carried on a `pause.resumed` event so wire consumers can distinguish the *kind* of resume (approve vs. reject vs. generic resume vs. timeout) without parsing free-form `Reason` strings.
The Phase 31 approval gate already owns its own narrower enum (`approval.ApprovalDecision` — approve / reject / pending) for the gate-internal resolution channel; that enum is approval-specific and stays where it lives. This `pauseresume.Decision` is the broader runtime-level enum that ALSO covers tool-side OAuth completion (`DecisionResume`) and deadline-driven resumes (`DecisionTimeout`). A single enum on `Coordinator.Resume` would force either pollution of `approval.ApprovalDecision` with non-approval values or a parallel "ApprovalDecision vs PauseResumeDecision" split — both are §13 "two parallel implementations of the same conceptual feature" smells. Keeping the gate-internal enum narrow and the coordinator-edge enum broad is the right factoring.
Wire consumers (the Console, third-party clients, integration tests) switch on this typed value rather than parsing `Reason` strings — the audit-flagged anti-pattern issue #113 / D-096 closes.
const ( // DecisionApprove — the gate's approver said yes (HITL approval). // Mirrors the steering inbox's `ControlApprove` and the approval // package's `DecisionApprove`. DecisionApprove Decision = "approve" // DecisionReject — the gate's approver said no (HITL rejection). // Mirrors the steering inbox's `ControlReject` and the approval // package's `DecisionReject`. A REJECT terminates the run with // `Finish{ConstraintsConflict}` in the RunLoop (D-071). DecisionReject Decision = "reject" // DecisionResume — a generic resume of a non-approval pause. The // canonical producer is the tool-side OAuth provider completing a // flow (Phase 30), but any future non-approval resume (a steering // `RESUME` not tied to an APPROVE / REJECT, an A2A `INPUT_REQUIRED` // fulfilled) lands here. DecisionResume Decision = "resume" // DecisionTimeout — a deadline-driven resume (the pause's max-park // window elapsed and the runtime resumed it to surface a // constraint-conflict). Produced by the pause sweeper (sweeper.go, // Phase 111c / D-200) when a pause outlives the Coordinator's // WithMaxParkDuration ceiling. Terminal for the waiting run: the // steering RunLoop finishes it with Finish{ConstraintsConflict} // (the D-071 REJECT posture applied to deadlines), never a silent // unpark-and-continue. DecisionTimeout Decision = "timeout" )
type ListFilter ¶
type ListFilter struct {
// States narrows by lifecycle state. Empty defaults to
// [StatusPaused] — the intervention-queue use case.
States []State
// TenantIDs narrows to a tenant set. Empty defaults to the
// caller's own tenant. A foreign tenant OR len>1 requires
// ListRequest.AdminScoped.
TenantIDs []string
// UserIDs narrows to a user set within the visible tenants.
UserIDs []string
// SessionIDs narrows to a session set.
SessionIDs []string
// RunIDs narrows to a run set.
RunIDs []string
// Reasons narrows to one or more canonical pause reasons.
Reasons []Reason
// Since is an optional lower bound on PausedAt (inclusive).
Since time.Time
// Until is an optional upper bound on PausedAt (inclusive).
Until time.Time
}
ListFilter is the runtime-internal filter shape for Coordinator.List. Empty slices are wildcards; a zero Since / Until is "no bound".
type ListRequest ¶
type ListRequest struct {
// Identity is the caller's (tenant, user, session) triple.
// Mandatory — an incomplete triple fails closed with
// ErrIdentityRequired.
Identity identity.Identity
// Filter narrows the snapshot. An empty filter means "the caller's
// own identity scope, status=paused".
Filter ListFilter
// Page is the 1-based page number. 0 is treated as 1; a negative
// Page fails closed with ErrInvalidPage.
Page int
// PageSize is the per-page row count. 0 is treated as the default
// (DefaultListPageSize); a negative or over-max value fails closed
// with ErrInvalidPage.
PageSize int
// AdminScoped is true when the caller carries the verified
// auth.ScopeAdmin claim. Set by the Protocol-edge handler; the
// Coordinator itself does NOT read the scope from ctx.
AdminScoped bool
}
ListRequest is the input to Coordinator.List — the runtime-internal projection of the Protocol-edge types.PauseListRequest.
type ListResponse ¶
type ListResponse struct {
// Snapshots is the page of pause records, ordered PausedAt
// descending (newest first).
Snapshots []Pause
// Statuses is parallel to Snapshots — Statuses[i] is the lifecycle
// status of Snapshots[i].
Statuses []Status
// Page is the 1-based page number this response covers.
Page int
// PageSize is the per-page row count applied.
PageSize int
// PageCount is the total number of pages over the filtered set.
PageCount int
// TotalRows is the total filtered row count across all pages.
TotalRows int
// Truncated is true when a status=resumed filter was requested but
// the resumed slice has aged out of the in-memory registry — see
// the Coordinator's destructive-on-resume contract (coordinator.go).
Truncated bool
}
ListResponse is the value returned by Coordinator.List.
type Option ¶
type Option func(*coordinator)
Option configures a coordinator at construction. Options are applied in order; later options override earlier ones for the same field.
func WithBus ¶
WithBus hands the Coordinator an event bus. When set, Request emits pause.requested and Resume emits pause.resumed. When not set, no events are emitted (the Coordinator still functions — event emission is observability, not correctness).
func WithCheckpointStore ¶
func WithCheckpointStore(s state.StateStore) Option
WithCheckpointStore hands the Coordinator a state.StateStore for durable pauses. When set, Request persists every pause record and a fresh Coordinator over the same store rehydrates pauses on demand — pauses survive a Runtime restart. When NOT set, pauses are process-local only and explicitly do not survive restart.
Phase 50 deliberately does not mint a parallel persistence-driver seam: state.StateStore is already the §4.4 persistence seam (three V1 drivers at conformance parity). See D-067.
func WithClock ¶
WithClock overrides the wall-clock source. Defaults to time.Now. Tests pass a controllable clock so PausedAt / ResumedAt are deterministic (CLAUDE.md §11).
func WithHandleRegistry ¶
func WithHandleRegistry(r trajectory.HandleRegistry) Option
WithHandleRegistry overrides the handle registry used to re-attach the non-serialisable half of ToolContext on resume. Defaults to a fresh process-local registry (trajectory.NewProcessLocalRegistry). Pass a shared registry when tool dispatch and pause/resume must see the same handle table.
func WithMaxParkDuration ¶ added in v1.3.0
WithMaxParkDuration sets the operator-configured ceiling on how long a pause may stay parked (Phase 111c / D-200, RFC §3.3's typed `timeout` Decision — D-096). When d > 0, every pause carries an expiry derived from PausedAt + d, and the pause sweeper (RunSweeper) resumes expired pauses with DecisionTimeout — terminal for the waiting run (a deadline the human missed is a constraint the planner cannot resolve; mirrors D-071's REJECT posture). A non-positive d is the documented "never expire" default — today's pre-111c behaviour, not an error.
type Pause ¶
type Pause struct {
// Token is the opaque runtime-issued handle for this pause.
Token Token
// Reason is one of the four canonical pause reasons.
Reason Reason
// Payload is the sanitised, bounded pause payload (auth URL +
// scopes for OAuth, approval context for HITL, etc.).
Payload map[string]any
// PausedAt is the wall-clock time the pause was recorded.
PausedAt time.Time
// Identity is the (tenant, user, session) triple the pause was
// recorded under. Resume validates the resuming scope against it.
Identity identity.Identity
}
Pause is the value returned by Coordinator.Request: the opaque Token plus the sanitised reason / payload / timestamp / identity of the paused run. Payload is depth/size-bounded by the caller (the Protocol edge enforces the RFC §6.3 steering-payload bounds before a pause request ever reaches the Coordinator); this package treats Payload as opaque sanitised data.
type PausePayloadArtifactRoutedPayload ¶
type PausePayloadArtifactRoutedPayload struct {
events.SafeSealed
// Token is the opaque pause Token whose Payload was routed.
Token string
// ArtifactID is the content-addressed ID of the artifact the
// heavy Payload was materialised into.
ArtifactID string
// PayloadBytes is the serialised byte length of the routed Payload.
PayloadBytes int
// ThresholdBytes is the configured heavy-content threshold the
// PayloadBytes met or exceeded.
ThresholdBytes int
}
PausePayloadArtifactRoutedPayload is the typed payload for a `pause.payload_artifact_routed` event. SafePayload by construction: every field is the runtime's own bookkeeping — the opaque pause Token, the content-addressed artifact ref ID, and the byte sizes involved. NO caller-controlled payload bytes are carried (the heavy payload itself went to the ArtifactStore, which is the whole point of the bypass). Phase 72e (D-110, D-026).
type PauseRequest ¶
type PauseRequest struct {
// Identity is the (tenant, user, session) triple the paused run
// belongs to. Mandatory — a partial triple fails closed with
// ErrIdentityRequired (CLAUDE.md §6 rule 9).
Identity identity.Identity
// Reason is one of the four canonical pause reasons. An invalid
// reason fails closed with ErrInvalidReason.
Reason Reason
// Payload is the sanitised, bounded pause payload. Optional;
// may be nil.
Payload map[string]any
// Trajectory, when non-nil, is checkpointed alongside the pause
// record when a checkpoint store is configured. A non-serialisable
// trajectory fails Request loud with trajectory.ErrUnserializable
// — the pause is NOT half-persisted. Optional; may be nil (a pause
// with no trajectory is valid — e.g. a pre-run approval gate).
Trajectory *trajectory.Trajectory
}
PauseRequest is the input to Coordinator.Request.
type PauseRequestedPayload ¶
type PauseRequestedPayload struct {
events.SafeSealed
// Token is the opaque pause Token.
Token string
// Reason is the canonical pause reason string.
Reason string
}
PauseRequestedPayload is the typed payload for a pause.requested event. SafePayload by construction: every field is the Coordinator's own bookkeeping. The Token is opaque and carries no caller bytes; the Reason is one of four canonical enum values. The pause Payload itself is NOT carried on the event — it may carry caller-controlled data (an OAuth auth URL, approval context) and is left to the Protocol-edge projection (a later phase) to redact/bound. Observers that need the payload read it via Coordinator.Status.
type PauseResumedPayload ¶
type PauseResumedPayload struct {
events.SafeSealed
// Token is the opaque pause Token.
Token string
// Reason is the canonical pause reason string (one of the four
// canonical pause reasons — the reason the pause was *requested*).
Reason string
// Decision is the typed marker indicating *how* the pause resumed
// (approve / reject / resume / timeout). Wire consumers switch on
// this value rather than parsing `Reason` strings. See D-096.
Decision Decision
}
PauseResumedPayload is the typed payload for a pause.resumed event. SafePayload by construction — Token + Reason + Decision only, no caller bytes.
`Decision` is the load-bearing typed marker wire consumers (the Console, third-party clients, integration tests) switch on to distinguish approve from reject from generic resume from timeout — the §13 "overloaded `Reason` string" anti-pattern issue #113 / D-096 closes. `Reason` is the human-readable pause-reason classification preserved for context; `Decision` is the typed channel observers branch on.
type Reason ¶
type Reason = planner.PauseReason
Reason is the pause-reason enum carried on a pause record. It is a byte-stable typedef bridge onto the planner-side planner.PauseReason (RFC §6.3, D-047): the planner package is the truth source for the four canonical values; this package re-exports them under the pauseresume namespace so runtime call sites do not import the planner package directly. The typedef (not a fresh type) keeps the two byte-identical.
type Status ¶
type Status struct {
// State is StatusPaused or StatusResumed.
State State
// Reason is the pause reason recorded at Request time.
Reason Reason
// PausedAt is the wall-clock time the pause was recorded.
PausedAt time.Time
// ResumedAt is the wall-clock time Resume was called; the zero
// value unless State == StatusResumed.
ResumedAt time.Time
// Decision is the typed marker the pause was resumed with
// (approve / reject / resume / timeout — D-096); the zero value
// while State == StatusPaused. Lets an in-process observer (the
// steering RunLoop's timeout detection — Phase 111c / D-200)
// distinguish a sweeper-reaped timeout from a legitimate resume
// without parsing event payloads. Process-local: a rehydrated
// (restart-survival) Status never carries a Decision because a
// resumed checkpoint is deleted, not kept.
Decision Decision
}
Status is the value returned by Coordinator.Status: a read-only snapshot of a pause record's lifecycle without mutating it.
type SweeperOption ¶ added in v1.3.0
type SweeperOption func(*sweeperConfig)
SweeperOption configures RunSweeper. Options are applied in order; later options override earlier ones for the same field.
func WithSweepInterval ¶ added in v1.3.0
func WithSweepInterval(d time.Duration) SweeperOption
WithSweepInterval overrides the sweep cadence. Non-positive values fall back to DefaultSweepInterval.
func WithSweeperLogger ¶ added in v1.3.0
func WithSweeperLogger(l *slog.Logger) SweeperOption
WithSweeperLogger hands the sweeper a logger for reap / failure lines. Defaults to slog.Default().