lifecycle

package
v1.0.0-beta.85
Published: May 28, 2026 License: MIT Imports: 14 Imported by: 0

Documentation

Overview

Package lifecycle provides a substrate convention layer for workflow-shaped entities — named instances with declared phases, restart recovery, operator visibility, and rule integration.

What this is (and is not)

This package is a SUBSTRATE CONVENTION LAYER, not a workflow engine. It provides:

  • A Participant interface apps implement on their domain state structs
  • A Manager harness that handles KV storage, restart recovery, and operator API integration
  • A Transitions table that declares valid phase transitions per workflow
  • Struct-tag-based operator-writability declaration (default-deny)

It does NOT provide:

  • A runtime, DSL, or state-machine interpreter (apps own work logic)
  • A separate event bus (uses existing NATS KV primitives)
  • A process orchestrator (orchestration stays in the rule engine)
  • A replacement for components (components remain the execution layer)

See ADR-047 for the full rationale.

Participation is per-entity, not per-deployment

The lifecycle harness has ZERO dependencies on agentic-loop or any other consumer class. Apps that ship no agentic features get the full harness benefit. This package MUST NOT import any processor/agentic-* package — verified by import lint.

Participant is opt-in per ENTITY-TYPE within a single app:

  • Drone missions, sensor lifecycles, manufacturing batches, scenario executions implement Participant
  • Raw telemetry, log entries, agent loops, and transient inputs stay outside the harness and pay zero cost

Usage

Apps annotate state-struct fields with the lifecycle struct tag:

type SystemState struct {
    EntityID_     string `json:"entity_id"    lifecycle:"id"`
    Phase_        string `json:"phase"        lifecycle:"phase,readonly"`
    OwnerOrgID    string `json:"owner_org_id" lifecycle:"operator_writable"`
    InternalState string `json:"internal_state,omitempty"` // no tag = not operator-writable (default-deny)
}

Tag values:

  • id — marks the EntityID field (one per struct)
  • phase — marks the Phase field (one per struct)
  • readonly — never operator-writable
  • operator_writable — opt-in operator-writability via Manager.UpdateFromOperator
  • indexable — flagged for v2 secondary indexing (v1 ignores)
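To illustrate how these tag values could be read, here is a minimal sketch using reflect. The helper fieldWithOption and the NoID type are hypothetical and not part of the package; the package's own tag parser may work differently. The sketch assumes tag options are comma-separated, as in `phase,readonly` above.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// SystemState repeats the annotated struct from the example above.
type SystemState struct {
	EntityID_     string `json:"entity_id"    lifecycle:"id"`
	Phase_        string `json:"phase"        lifecycle:"phase,readonly"`
	OwnerOrgID    string `json:"owner_org_id" lifecycle:"operator_writable"`
	InternalState string `json:"internal_state,omitempty"`
}

// NoID is a hypothetical struct with no field tagged lifecycle:"id".
type NoID struct {
	Phase_ string `json:"phase" lifecycle:"phase"`
}

// fieldWithOption is a hypothetical helper: it returns the Go name of the
// first field of struct value v whose lifecycle tag lists option.
func fieldWithOption(v any, option string) (string, bool) {
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		for _, opt := range strings.Split(f.Tag.Get("lifecycle"), ",") {
			if opt == option {
				return f.Name, true
			}
		}
	}
	return "", false
}

func main() {
	id, _ := fieldWithOption(SystemState{}, "id")
	phase, _ := fieldWithOption(SystemState{}, "phase")
	fmt.Println(id, phase) // EntityID_ Phase_

	// A struct without an id-tagged field is the case ErrMissingIDField describes.
	_, ok := fieldWithOption(NoID{}, "id")
	fmt.Println(ok) // false
}
```

Untagged fields such as InternalState never match any option, which is how default-deny falls out of the tag scan.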

Apps register a Workflow type at startup with its valid transitions:

transitions := lifecycle.Transitions{
    "planning":  {"flying", "aborted"},
    "flying":    {"capturing", "landing", "aborted"},
    "capturing": {"flying"},
    "landing":   {"completed", "failed"},
    "completed": {}, // terminal: no out-edges
    "failed":    {},
    "aborted":   {},
}
// Register returns an error (e.g. ErrInvalidTransitionsTable); check it at startup.
err := mgr.Register("drone-survey", func() lifecycle.Participant {
    return &MissionState{}
}, transitions)
if err != nil {
    log.Fatalf("register drone-survey: %v", err)
}

See ADR-047 for the complete worked example.
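The consistency rule for a transitions table (every out-edge must reference a phase declared as a key) can be sketched as follows. This is an illustration only: the local Transitions type assumes the package type is a map from phase to out-edges, and undeclaredTargets is a hypothetical helper, not the package's Transitions.Validate.

```go
package main

import (
	"fmt"
	"sort"
)

// Transitions is a local stand-in, assumed to map a phase to its out-edges.
type Transitions map[string][]string

// undeclaredTargets is a hypothetical check: it lists out-edge targets that
// are not declared as keys, sorted for stable output.
func undeclaredTargets(t Transitions) []string {
	seen := map[string]bool{}
	var bad []string
	for _, outs := range t {
		for _, to := range outs {
			if _, declared := t[to]; !declared && !seen[to] {
				seen[to] = true
				bad = append(bad, to)
			}
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	t := Transitions{
		"planning":  {"flying", "aborted"},
		"flying":    {"landing", "abortd"}, // typo: "abortd" is never declared
		"landing":   {"completed"},
		"completed": {},
		"aborted":   {},
	}
	fmt.Println(undeclaredTargets(t)) // [abortd]
}
```

A non-empty result corresponds to the kind of table that Register is documented to reject with ErrInvalidTransitionsTable.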

Package lifecycle — query-ops surface (List, Watch, History).

Index

Constants

This section is empty.

Variables

var (
	// ErrWorkflowNotRegistered is returned by Manager operations when the
	// referenced workflow type was never passed to Manager.Register. This
	// is always a programming error — registration belongs at startup,
	// before any Get/Create/Transition calls land.
	ErrWorkflowNotRegistered = errors.New("lifecycle: workflow not registered")

	// ErrWorkflowAlreadyRegistered is returned by Manager.Register when a
	// workflow type is registered twice. Registration is expected to
	// happen exactly once at startup; a duplicate registration likely
	// indicates a wire-up bug, not a benign re-init.
	ErrWorkflowAlreadyRegistered = errors.New("lifecycle: workflow already registered")

	// ErrEntityNotFound is returned by Manager.Get when no instance exists
	// at the given EntityID. Distinct from a KV-layer error so callers can
	// branch cleanly on "doesn't exist yet" vs "KV is broken."
	ErrEntityNotFound = errors.New("lifecycle: entity not found")

	// ErrInvalidTransition is returned by Manager.Transition when the
	// requested (from → to) edge is not declared in the registered
	// Transitions table for the workflow. Surfaces misconfigured rules
	// at runtime rather than letting the state machine drift.
	ErrInvalidTransition = errors.New("lifecycle: invalid transition")

	// ErrTerminalPhase is returned by Manager.Transition when the current
	// phase has no declared out-edges (i.e. is terminal). Distinguishes
	// "you tried to transition from completed/failed" from a generic
	// invalid-edge error so operator dashboards can show the right hint.
	ErrTerminalPhase = errors.New("lifecycle: entity is in terminal phase")

	// ErrMissingIDField is returned by struct-tag parsing when a
	// registered factory produces a Participant struct with no field
	// tagged `lifecycle:"id"`. Validates app-side wiring at Register
	// time, not at first use, so the bug surfaces during startup.
	ErrMissingIDField = errors.New("lifecycle: state struct missing field with lifecycle:\"id\" tag")

	// ErrMissingPhaseField mirrors ErrMissingIDField for the phase field.
	ErrMissingPhaseField = errors.New("lifecycle: state struct missing field with lifecycle:\"phase\" tag")

	// ErrFieldNotOperatorWritable is returned by Manager.UpdateFromOperator
	// when the patch attempts to mutate a field NOT tagged
	// `lifecycle:"operator_writable"`. Default-deny: unflagged fields are
	// not operator-writable.
	ErrFieldNotOperatorWritable = errors.New("lifecycle: field is not operator_writable")

	// ErrInvalidTransitionsTable is returned by Manager.Register when the
	// Transitions table is internally inconsistent (e.g. an out-edge
	// references a phase not declared as a key). Catches typos at startup.
	ErrInvalidTransitionsTable = errors.New("lifecycle: invalid transitions table")

	// ErrUpdateRetriesExhausted is returned by Manager.Update (and
	// any Update-using caller — Transition, Complete, Fail,
	// UpdateFromOperator) when the per-call CAS-conflict retry
	// budget (updateRetries) is consumed under persistent contention.
	//
	// Operationally this signals one of:
	//   - A stuck rule writing to the same entity in a tight loop
	//   - An upstream system hammering the same key past framework
	//     capacity
	//   - Misconfigured per-entity fan-in (multiple coordinators
	//     racing on one state)
	//
	// Callers wanting application-layer retry semantics distinct
	// from the framework's bounded retry can branch on
	// errors.Is(err, lifecycle.ErrUpdateRetriesExhausted) and apply
	// their own backoff + retry policy. Operators benefit from
	// dashboards / metrics / alerts differentiating budget exhaustion
	// from other Update failure classes.
	ErrUpdateRetriesExhausted = errors.New("lifecycle: Update retry budget exhausted (persistent CAS contention)")
)

Package error sentinels. Callers compare with errors.Is.
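A minimal sketch of branching on the sentinels with errors.Is, including through a wrapped error. The two sentinels are redeclared locally so the example compiles on its own, and classify is a hypothetical caller-side helper.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for two of the package sentinels.
var (
	ErrEntityNotFound         = errors.New("lifecycle: entity not found")
	ErrUpdateRetriesExhausted = errors.New("lifecycle: Update retry budget exhausted (persistent CAS contention)")
)

// classify is a hypothetical helper mapping an error to a caller action.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrEntityNotFound):
		return "create"
	case errors.Is(err, ErrUpdateRetriesExhausted):
		return "backoff"
	default:
		return "fail"
	}
}

func main() {
	// errors.Is sees through %w wrapping, so added context does not hide the sentinel.
	wrapped := fmt.Errorf("update mission-7: %w", ErrUpdateRetriesExhausted)
	fmt.Println(classify(wrapped))           // backoff
	fmt.Println(classify(ErrEntityNotFound)) // create
}
```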

Functions

This section is empty.

Types

type ListOptions

type ListOptions struct {
	// Phase filters to instances whose current phase equals this
	// value. Empty string means any phase.
	Phase string

	// Active=true filters to non-terminal instances. Active=false
	// (default) does not filter on terminal status. To list only
	// terminal instances, use Phase with a specific terminal phase
	// or Match on a discriminating field.
	Active bool

	// Match is a field-equality map (JSON field name → expected value).
	// Multi-tenancy and other app-specific filters live here. Field
	// resolution is reflection-based against the registered state
	// struct.
	Match map[string]any

	// Limit caps the returned slice size. 0 (default) means unlimited.
	Limit int

	// Offset skips the first Offset matching entries before applying
	// Limit. Use for paged operator-API responses.
	Offset int
}

ListOptions parameterizes Manager.List queries against a registered workflow type. All fields are optional; the zero value lists every instance in the workflow's KV bucket (subject to Limit / Offset).

Filter semantics:

  • Phase filters to entries whose Participant.Phase() equals the given value. Empty string means any phase.

  • Active=true filters to entries whose Participant.IsTerminal() returns false. Active=false (the zero value) does not filter on terminal status — to list only terminal entries, set ListOptions.Phase to a specific terminal phase OR add a Match entry on a field that distinguishes them.

  • Match is a field-equality map; key is the JSON field name (per `json:` struct tag), value is the expected value. Multi-tenancy is expressed app-side via Match{"org_id": "acme"} — the framework knows nothing about org/tenant semantics. Reflection is used to read field values at match time; this is acceptable for v1 linear-scan but flag fields as `lifecycle:"indexable"` for v2 secondary-index migration once an operator demonstrates a scaling bottleneck.

    Match comparison semantics are STRICT reflect.DeepEqual against the target field's concrete type. This means:

  • Numeric literals are type-strict: Match{"replica_count": 3} (Go int) does NOT match a field typed uint32(3). Callers constructing Match maps in Go code must use the exact field type (e.g. Match{"replica_count": uint32(3)}).

  • The zero value matches an unset omitempty field. Match{"deployed_to": ""} matches every instance whose DeployedTo field was never set. To skip a filter entirely, omit the key; there is no "match anything" sentinel.

  • String comparison is case-sensitive byte-equal. No fold, no trim, no glob.

    HTTP gateway components (PR 3 of the ADR-047 bundle) handle query-string-to-Go-type coercion at the gateway boundary — that is the right place for permissive parsing of operator-supplied filter values, NOT inside Manager.List. Keeping Manager strict makes its behavior predictable from Go callers AND keeps the wire-level coercion layer's responsibility cleanly separated.

  • Limit caps the returned slice size. 0 means unlimited (default).

  • Offset is a pagination cursor — skip the first Offset matching entries. Combine with Limit for paged operator-API responses.
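The strict Match comparison described in the list above can be reproduced with reflect.DeepEqual directly. fieldMatches is a hypothetical stand-in for a single Match comparison, not the package's internal code.

```go
package main

import (
	"fmt"
	"reflect"
)

// fieldMatches is a hypothetical stand-in for one Match comparison: strict
// reflect.DeepEqual between the stored field value and the expected value.
func fieldMatches(field, want any) bool {
	return reflect.DeepEqual(field, want)
}

func main() {
	var replicaCount uint32 = 3
	var deployedTo string // never set: zero value

	fmt.Println(fieldMatches(replicaCount, 3))         // false: int vs uint32
	fmt.Println(fieldMatches(replicaCount, uint32(3))) // true
	fmt.Println(fieldMatches(deployedTo, ""))          // true: zero value matches
	fmt.Println(fieldMatches("Acme", "acme"))          // false: case-sensitive
}
```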

The v1 implementation is a linear scan over the bucket; complexity is O(N) per List call, where N is the total instance count. Roughly 10K active instances is a reasonable guideline for where this starts to hurt. An operator demonstrating the bottleneck triggers the v2 secondary-index work; the API stays stable across the migration.

Dropped from v1 surface — UpdatedAfter time.Time

An earlier draft exposed UpdatedAfter for incremental pagination of large active sets ("show me changes since T"). It was removed before v1 shipped because plumbing the per-entity revision timestamp from the kvStore layer into the filter chain would have required widening kvStore.Get's return signature, leaking store-layer concerns (CreatedAt) into the Participant model. Apps that need since-timestamp pagination today should store their own LastUpdatedAt unix-timestamp field on the state struct and Match against it. v2 secondary indexing will add the filter back as a first-class API when an operator demonstrates the need; the API stays additive across that migration.

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager is the framework-provided harness over Participant implementations. One Manager per process; workflow types register at startup via Manager.Register.

Concurrency model: the registrations map is protected by an RWMutex (read-heavy: Register writes at startup, every Get/Update reads). Per-entity concurrency is delegated to the kvStore's revision-CAS semantics: Manager.Update's mutator closure runs under retry-on-conflict, so concurrent updates to the same entity serialize without an in-process per-entity lock.

func NewManager

func NewManager(client *natsclient.Client, logger *slog.Logger) *Manager

NewManager constructs a Manager that talks to NATS via the given client. Logger may be nil — falls back to slog.Default.

Workflow registration happens via Manager.Register at app startup; this constructor itself does not touch NATS (per-bucket kvStore handles open lazily at Register time so deployments with no workflows registered pay no KV cost).

func (*Manager) Ancestors

func (m *Manager) Ancestors(ctx context.Context, entityID string) ([]Participant, error)

Ancestors returns Participants walking from the given entityID up the parent chain, nearest ancestor first: the entity itself is NOT included, ancestors[0] is the immediate parent, and the last element is the root. The walk stops when ParentEntityID() returns the empty string or when a parent can't be resolved (logged, and the walk is truncated).

Cross-workflow: each ancestor may be in a different workflow than the starting entity. v1 implementation resolves each ancestor by scanning every registered workflow's bucket for the entityID — O(workflows × bucket-size) per ancestor lookup. Apps with deep parent chains in high-cardinality workflows feel this; the v2 secondary-index work (ADR-047 § KV indexing) makes the per-ancestor lookup constant-time.

Returns empty slice (not error) for entities with no parent (already at root). Returns ErrEntityNotFound if the starting entityID itself can't be resolved.
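The walk can be sketched with an in-memory parent map standing in for the KV lookups. The parents type and ancestors function are hypothetical; the sketch assumes the chain has no cycles and follows the ordering in which element 0 is the immediate parent.

```go
package main

import "fmt"

// parents is a hypothetical in-memory stand-in for the KV lookup: it maps an
// entity ID to its ParentEntityID ("" for a root).
type parents map[string]string

// ancestors walks up from id and returns the chain with the immediate parent
// first. The bool is false when the starting entity itself is unknown.
func ancestors(p parents, id string) ([]string, bool) {
	cur, ok := p[id]
	if !ok {
		return nil, false
	}
	chain := []string{}
	for cur != "" {
		next, known := p[cur]
		if !known {
			break // parent cannot be resolved: truncate the walk
		}
		chain = append(chain, cur)
		cur = next
	}
	return chain, true
}

func main() {
	p := parents{"plan-1": "", "req-7": "plan-1", "task-3": "req-7"}

	chain, _ := ancestors(p, "task-3")
	fmt.Println(chain) // [req-7 plan-1]

	root, _ := ancestors(p, "plan-1")
	fmt.Println(len(root)) // 0

	_, found := ancestors(p, "missing")
	fmt.Println(found) // false
}
```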

func (*Manager) AssertRuleWritable

func (m *Manager) AssertRuleWritable(workflow, fieldJSONName string) error

AssertRuleWritable returns nil when fieldJSONName is a field on the registered workflow that the rule layer's lifecycle_transition `set` clause is allowed to mutate. Returns ErrFieldNotOperatorWritable otherwise. The rule layer enforces the SAME default-deny as the operator API (UpdateFromOperator) — fields without `lifecycle:"operator_writable"` are protected, plus `lifecycle:"id"` and `lifecycle:"phase"` fields can NEVER be set via rule actions (identity is immutable; phase is owned by Manager.Transition's transitions-table validation).

Rationale: rules and operators converge on one allowed-mutation convention. A rule definition is operator-authored config; allowing the rule path strictly MORE privileges than the operator API would create an asymmetric attack surface where a typo or attacker-influenced rule could rewrite entity_id or read-only state-machine fields the harness protects everywhere else.

Apps that want rules to mutate fields beyond operator scope must either tag those fields `lifecycle:"operator_writable"` (widening both paths) or stamp the change via add_triple/update_triple instead of through the lifecycle_transition `set` clause.

Returns ErrWorkflowNotRegistered if workflow isn't registered. The field-not-found case maps to ErrFieldNotOperatorWritable too — a typo on a JSON field name and a deliberate write to a protected field should both surface as "you can't write that," not as two distinct error classes for the same operator action.

func (*Manager) Children

func (m *Manager) Children(ctx context.Context, parentEntityID string) ([]Participant, error)

Children returns Participants whose ParentEntityID() equals the given parentEntityID, across ALL registered workflows. v1 implementation is a linear scan over every registered workflow's bucket; complexity is O(sum-of-bucket-sizes) per call.

Cross-workflow semantics: a parent in workflow A can have children in workflow B (e.g. semspec's Plan-owns-Requirements pattern — Plan and Requirement are distinct workflows). Children returns all of them merged into one slice; sort caller-side if needed.

Scaling consideration: high parent-child fan-out apps should prefer List(workflow, Match{"parent_field": parentID}) where parent_field is the JSON name of the ParentEntityID-backing struct field. That stays within one workflow's bucket and works with the v2 secondary-index path when it lands. Children is the generic helper for the cross-workflow case; List + Match is the scaling-friendly path for the common intra-workflow case.

func (*Manager) Complete

func (m *Manager) Complete(ctx context.Context, workflow, entityID string) error

Complete transitions the entity to the first terminal phase REACHABLE from its current phase: declared terminal phases are iterated in sorted order, and the first one with an in-edge from the current phase is chosen. The choice is deterministic across processes, since the same transitions table always yields the same target.

Errors with ErrTerminalPhase if the entity is already terminal, or ErrInvalidTransition if no terminal phase is reachable from the current phase (e.g. transitions table declares terminals but the current phase has no edge to any of them — wiring bug worth surfacing).

Apps with semantic terminal-name distinctions (failed vs aborted vs cancelled vs completed) should call Transition directly with the specific terminal phase rather than relying on Complete's selection. Complete is the convenience for "mark this done" in workflows where the natural success-terminal is unambiguous from the current phase's edge set.

There is a small race window between the internal Get-for-current-phase and the subsequent Transition: if a concurrent writer transitions the entity to terminal between the two operations, the Transition call surfaces ErrTerminalPhase. The benign-race outcome (entity is terminal as desired) reads as an error to the caller; this is an accepted trade-off for the simpler API.

func (*Manager) Create

func (m *Manager) Create(ctx context.Context, initial Participant) error

Create commits a fresh Participant to KV. Returns an error if an instance already exists at the same EntityID (create-only semantics — apps that want to overwrite must Delete first OR use Update with a fresh Get).

The initial.EntityID() value is used as both the JSON identity and the KV key derivation source (via Participant.KVKey()).

func (*Manager) Fail

func (m *Manager) Fail(ctx context.Context, workflow, entityID, reason string) error

Fail transitions the entity to a terminal phase, carrying the reason in the TransitionEvent.Note for audit. Selects the terminal phase by exact-name match against the reserved name "failed" if present; otherwise errors (apps that don't declare a "failed" phase must call Transition directly with their preferred error-state phase).

The reason argument is required (empty string rejected) — a Fail with no reason defeats the audit purpose.

func (*Manager) FieldType

func (m *Manager) FieldType(workflow, fieldJSONName string) (reflect.Type, error)

FieldType returns the reflect.Type of the named field on the registered participant, for callers (the rule executor) that need to apply typed numeric ops (increment/decrement) without reimplementing field-name resolution. Returns ErrFieldNotOperatorWritable when the field doesn't exist or isn't rule-writable, so callers can reuse the same gate before mutating.

func (*Manager) Get

func (m *Manager) Get(ctx context.Context, workflow, entityID string) (Participant, error)

Get loads the instance at entityID for the given workflow type. Returns ErrEntityNotFound when no instance exists at the key, ErrWorkflowNotRegistered when the workflow type isn't registered.

The returned Participant is a fresh instance — mutating it does NOT persist. Use Manager.Update with a mutator closure to commit changes.

Drift validation: TODO(manager) — see Participant.Phase doc-comment. Currently this just returns whatever Phase() the deserialized struct reports, even if it's not in the registered Transitions table. PR 1b's query-ops slice adds the drift check.

func (*Manager) GetWorkflowDefinition

func (m *Manager) GetWorkflowDefinition(workflow string) (WorkflowDef, error)

GetWorkflowDefinition returns the WorkflowDef for the given workflow type, suitable for operator-dashboard introspection (state-machine diagram rendering, patchable-field listing, etc.).

Returns ErrWorkflowNotRegistered if the workflow isn't registered.

func (*Manager) History

func (m *Manager) History(ctx context.Context, workflow, entityID string) ([]TransitionEvent, error)

History returns the phase-transition history for the entity at entityID, derived from KV revision replay. Each TransitionEvent represents one write to the entity's KV key.

Reconstruction semantics (v1 — best-effort from KV alone):

  • Each revision is decoded as a Participant snapshot
  • From = previous revision's Phase (empty for the first revision)
  • To = this revision's Phase
  • At = revision's CreatedAt timestamp
  • Triggered = TransitionSourceFramework (cannot recover the original source from KV alone — source/note persistence is a future enhancement that would write an audit triple alongside the state update)
  • Note = "" (same limitation as Triggered)

Returns ErrEntityNotFound if the entity has no KV history (never existed). An entity that existed and was deleted returns its history including the final delete-marker as a synthetic transition with To="<deleted>" so audit trails can show the terminal event.

Revisions where the Phase did NOT change (Update calls that mutated other fields but not Phase) are skipped — History surfaces phase-transition events specifically, not every state mutation. Apps wanting the full revision stream should drop to the kvStore primitive directly.
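The reconstruction rules can be sketched over a slice of decoded revisions. The revision and event types and the phaseHistory function are hypothetical stand-ins; the real implementation reads revisions from KV and fills TransitionEvent.

```go
package main

import (
	"fmt"
	"time"
)

// revision is a hypothetical decoded KV revision.
type revision struct {
	Phase   string
	At      time.Time
	Deleted bool // delete marker
}

// event is a reduced stand-in for TransitionEvent.
type event struct {
	From, To string
	At       time.Time
}

// phaseHistory replays revisions in order and keeps only phase changes.
func phaseHistory(revs []revision) []event {
	var out []event
	prev := ""
	for i, r := range revs {
		to := r.Phase
		if r.Deleted {
			to = "<deleted>" // synthetic terminal event
		}
		if i > 0 && to == prev {
			continue // non-phase update: not a transition
		}
		out = append(out, event{From: prev, To: to, At: r.At})
		prev = to
	}
	return out
}

func main() {
	base := time.Unix(1700000000, 0).UTC()
	revs := []revision{
		{Phase: "planning", At: base},
		{Phase: "planning", At: base.Add(1 * time.Minute)}, // skipped
		{Phase: "flying", At: base.Add(2 * time.Minute)},
		{Deleted: true, At: base.Add(3 * time.Minute)},
	}
	for _, e := range phaseHistory(revs) {
		fmt.Printf("%q -> %q\n", e.From, e.To)
	}
}
```

The four revisions above yield three events: the create into "planning", the move to "flying", and the synthetic "<deleted>" entry.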

func (*Manager) List

func (m *Manager) List(ctx context.Context, workflow string, opts ListOptions) ([]Participant, error)

List returns instances of the given workflow type matching opts. v1 implementation is linear-scan over the workflow's bucket: every key is fetched, deserialized, and run through the filter chain. Complexity is O(N) per call where N is the bucket size.

Scaling cliff: ~10K active instances is fine for v1 linear scan; beyond that the secondary-index v2 work (ADR-047 § KV indexing) is the upgrade path. Operators hitting the cliff demonstrate the bottleneck and trigger v2 work; the ListOptions surface stays stable across the migration so callers don't need to change.

Filters are applied in cheap-to-expensive order, so Match's reflect-based comparisons only run on candidates that have already passed the cheap Phase / Active method-call filters. This keeps the reflect cost bounded.

Returned slice ownership: each Participant is a fresh instance (factory-produced + json-deserialized). Mutating returned pointers does NOT persist — use Update with a mutator closure to commit changes. Order is the bucket's ListKeys order (sorted for the mock; jetstream returns key insertion order plus pagination — operator dashboards should sort caller-side if needed).
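A minimal sketch of the linear scan with cheap-first filtering and Offset / Limit paging. All types here are hypothetical stand-ins: instance replaces a deserialized Participant, and its Fields map replaces the reflection-based field lookup.

```go
package main

import (
	"fmt"
	"reflect"
)

// instance is a hypothetical stand-in for a deserialized Participant.
type instance struct {
	ID       string
	Phase    string
	Terminal bool
	Fields   map[string]any // JSON field name -> value
}

// listOptions mirrors the documented ListOptions fields.
type listOptions struct {
	Phase  string
	Active bool
	Match  map[string]any
	Limit  int
	Offset int
}

// matches applies the filters cheap-first: Phase, Active, then Match.
func matches(in instance, o listOptions) bool {
	if o.Phase != "" && in.Phase != o.Phase {
		return false
	}
	if o.Active && in.Terminal {
		return false
	}
	for k, want := range o.Match {
		if !reflect.DeepEqual(in.Fields[k], want) {
			return false
		}
	}
	return true
}

// list scans every instance, skips the first Offset matches, and stops at Limit.
func list(all []instance, o listOptions) []string {
	var ids []string
	skipped := 0
	for _, in := range all {
		if !matches(in, o) {
			continue
		}
		if skipped < o.Offset {
			skipped++
			continue
		}
		ids = append(ids, in.ID)
		if o.Limit > 0 && len(ids) == o.Limit {
			break
		}
	}
	return ids
}

func main() {
	all := []instance{
		{ID: "m1", Phase: "flying", Fields: map[string]any{"org_id": "acme"}},
		{ID: "m2", Phase: "completed", Terminal: true, Fields: map[string]any{"org_id": "acme"}},
		{ID: "m3", Phase: "flying", Fields: map[string]any{"org_id": "other"}},
		{ID: "m4", Phase: "planning", Fields: map[string]any{"org_id": "acme"}},
	}
	fmt.Println(list(all, listOptions{Active: true, Match: map[string]any{"org_id": "acme"}})) // [m1 m4]
	fmt.Println(list(all, listOptions{Offset: 1, Limit: 2}))                                   // [m2 m3]
}
```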

func (*Manager) ListWorkflows

func (m *Manager) ListWorkflows() []WorkflowDef

ListWorkflows returns the WorkflowDef for every registered workflow type, sorted by workflow name for deterministic operator-dashboard output across restarts.

func (*Manager) LookupByEntityID

func (m *Manager) LookupByEntityID(ctx context.Context, entityID string) (Participant, error)

LookupByEntityID resolves an entityID to a Participant by scanning every registered workflow's bucket. Returns ErrEntityNotFound if the ID isn't present in any bucket.

O(workflows × bucket-size) per call. Suitable for the rule engine's `lifecycle_*` action path and `$entity.lifecycle.*` substitution path (low-frequency, rule-fire-driven). NOT per-message-shaped — apps with high-frequency lookup needs should track their own entityID→workflow mapping and call the workflow-typed Get directly.

func (*Manager) Register

func (m *Manager) Register(workflow string, factory func() Participant, transitions Transitions) error

Register declares a workflow type to the Manager. It must be called at app startup, before any Get / Create / Update calls land. Register is not idempotent: re-registering the same workflow returns ErrWorkflowAlreadyRegistered, which surfaces a duplicate-init wiring bug.

The factory returns a fresh zero-value Participant; Manager.Get uses it as the json.Unmarshal target so the codec can populate the app's typed state struct. The factory must return a pointer to the state struct (json.Unmarshal can't populate value-typed Participant implementations).

transitions is validated against Transitions.Validate before registration commits; an invalid table is rejected with the validation error wrapped as ErrInvalidTransitionsTable.

func (*Manager) Transition

func (m *Manager) Transition(ctx context.Context, workflow, entityID, newPhase string, source TransitionSource, note string) error

Transition moves the entity at entityID from its current phase to newPhase, atomically with any CAS retries. Validates the transition against the registered Transitions table:

  • newPhase must be declared in the table (typo catcher)
  • (current → newPhase) must be a declared edge
  • current phase must NOT be terminal (terminal entities cannot transition further — surfaces ErrTerminalPhase distinctly so dashboards can show "cannot abort an already-completed mission")

source identifies what triggered the transition (rule action, operator API, component direct call, framework auto). Persisted in the History stream for audit (the History query op lands in the next commit).

Note is an optional free-text annotation; Manager.Fail uses it to carry the failure reason. Pass "" when not relevant.

func (*Manager) TransitionWith

func (m *Manager) TransitionWith(ctx context.Context, workflow, entityID, newPhase string, source TransitionSource, note string, mutator func(Participant) error) error

TransitionWith is Transition + a caller-supplied mutator that runs inside the same Update closure (and therefore the same atomic KV write) as the phase change. mutator runs AFTER transitions-table validation but BEFORE the phase field is rewritten — failures from the mutator abort the whole transition (no partial state write, CAS retry replays the mutator on the next revision).

Intended for the rule engine's `lifecycle_transition` action's `set` clause: state-field mutations + phase change live in one optimistic-concurrency-protected write. nil mutator is equivalent to Transition (and is the path Transition itself takes).

func (*Manager) Update

func (m *Manager) Update(ctx context.Context, workflow, entityID string, mutator func(Participant) error) error

Update mutates the entity at entityID via the given mutator closure, under CAS retry semantics:

  1. Get current state + revision
  2. Call mutator(participant) — mutator mutates the pointer in place
  3. Marshal + KV.Update with the revision
  4. On CAS conflict, retry from step 1 (up to updateRetries)

mutator errors abort the update immediately with the wrapped error — no retry, no KV write. The mutator must NOT call Manager methods (re-entry would deadlock the registrations RWMutex and risk CAS-loop tail-chases).

Returns ErrEntityNotFound if the entity doesn't exist at any point during the retry budget. Returns the wrapped mutator error if the closure rejected the proposed change. Returns ErrUpdateRetriesExhausted if the retry budget is exhausted under persistent contention; callers wanting application-layer retry can branch on errors.Is(err, ErrUpdateRetriesExhausted).
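The four-step loop can be sketched against a single-key in-memory store. Everything here is a hypothetical stand-in: the store type replaces the kvStore, the budget of 3 is an assumed value for updateRetries, and the state is a plain string rather than a Participant. The store is not goroutine-safe; contention is simulated inline.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins; errConflict models a KV revision mismatch.
var (
	ErrUpdateRetriesExhausted = errors.New("lifecycle: Update retry budget exhausted (persistent CAS contention)")
	errConflict               = errors.New("revision conflict")
)

// store is a hypothetical single-key, revisioned in-memory KV.
type store struct {
	value string
	rev   uint64
}

func (s *store) get() (string, uint64) { return s.value, s.rev }

// update writes only if rev still matches the stored revision (compare-and-swap).
func (s *store) update(v string, rev uint64) error {
	if rev != s.rev {
		return errConflict
	}
	s.value, s.rev = v, s.rev+1
	return nil
}

const updateRetries = 3 // assumed budget

// updateWithRetry follows the documented steps: get, mutate, CAS write, retry on conflict.
func updateWithRetry(s *store, mutator func(string) (string, error)) error {
	for attempt := 0; attempt < updateRetries; attempt++ {
		cur, rev := s.get()
		next, err := mutator(cur)
		if err != nil {
			return fmt.Errorf("mutator: %w", err) // no retry, no write
		}
		if err := s.update(next, rev); err == nil {
			return nil
		} else if !errors.Is(err, errConflict) {
			return err
		}
	}
	return ErrUpdateRetriesExhausted
}

func main() {
	s := &store{value: "planning"}
	calls := 0
	err := updateWithRetry(s, func(cur string) (string, error) {
		calls++
		if calls == 1 {
			// Simulate a concurrent writer bumping the revision (demo only).
			_ = s.update(cur, s.rev)
		}
		return "flying", nil
	})
	fmt.Println(err, calls, s.value, s.rev) // <nil> 2 flying 2
}
```

The mutator runs twice in this run, which is why mutators should be safe to replay.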

func (*Manager) UpdateFromOperator

func (m *Manager) UpdateFromOperator(ctx context.Context, workflow, entityID string, patch map[string]any) error

UpdateFromOperator applies a patch map to the entity, validating that every patched field is tagged `lifecycle:"operator_writable"` in the registered state struct. Default-deny: fields without the tag are rejected with ErrFieldNotOperatorWritable. The patch is applied atomically under Manager.Update's CAS-retry contract.

Patch values are STRICT-typed against the target field's concrete type (reflect.Type.AssignableTo); a Go-typed mismatch (e.g. int against a uint32 field) is rejected. HTTP gateway components (PR 3 of the ADR-047 bundle) handle query-string-to-Go-type coercion at the wire boundary — that is the right place for permissive parsing of operator-supplied values, NOT inside UpdateFromOperator.

Returns ErrEntityNotFound if the entity doesn't exist (same as Update). Returns ErrFieldNotOperatorWritable for the first disallowed key encountered. Returns a wrapped type-mismatch error for the first incompatible patch value.
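A sketch of the default-deny, strict-typed patch check using reflect. applyPatch is hypothetical and covers only validation and assignment on an in-memory struct, without the KV write or CAS retry; validating all keys before assigning is a choice made here so that a rejected patch changes nothing.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"strings"
)

// Local stand-in for the package sentinel.
var ErrFieldNotOperatorWritable = errors.New("lifecycle: field is not operator_writable")

// SystemState is an illustrative state struct.
type SystemState struct {
	EntityID_    string `json:"entity_id" lifecycle:"id"`
	OwnerOrgID   string `json:"owner_org_id" lifecycle:"operator_writable"`
	ReplicaCount uint32 `json:"replica_count" lifecycle:"operator_writable"`
}

// applyPatch is a hypothetical sketch; target must be a pointer to a struct.
func applyPatch(target any, patch map[string]any) error {
	v := reflect.ValueOf(target).Elem()
	t := v.Type()
	writable := map[string]int{} // JSON name -> field index, writable fields only
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if strings.Contains(","+f.Tag.Get("lifecycle")+",", ",operator_writable,") {
			name, _, _ := strings.Cut(f.Tag.Get("json"), ",")
			writable[name] = i
		}
	}
	for key, val := range patch {
		i, ok := writable[key]
		if !ok {
			return fmt.Errorf("%w: %q", ErrFieldNotOperatorWritable, key)
		}
		if val == nil || !reflect.TypeOf(val).AssignableTo(t.Field(i).Type) {
			return fmt.Errorf("patch %q: %T is not assignable to %s", key, val, t.Field(i).Type)
		}
	}
	for key, val := range patch {
		v.Field(writable[key]).Set(reflect.ValueOf(val))
	}
	return nil
}

func main() {
	s := &SystemState{EntityID_: "sys-1", ReplicaCount: 1}

	err := applyPatch(s, map[string]any{"replica_count": uint32(3)})
	fmt.Println(err, s.ReplicaCount) // <nil> 3

	err = applyPatch(s, map[string]any{"replica_count": 3}) // int, not uint32
	fmt.Println(err != nil)                                 // true

	err = applyPatch(s, map[string]any{"entity_id": "x"})
	fmt.Println(errors.Is(err, ErrFieldNotOperatorWritable)) // true
}
```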

func (*Manager) Watch

func (m *Manager) Watch(ctx context.Context, workflow string) (<-chan Participant, error)

Watch streams Participant snapshots for every write to the workflow's bucket. Bootstrap-then-live: the first batch is the snapshot of current state, then live updates as KV writes land. The returned channel closes when ctx is canceled OR the underlying watcher errors (caller treats both as "stop iterating").

Each delivered Participant is a fresh instance. Mutating it does NOT persist (same contract as Get).

Delete-marker events are NOT delivered as Participants — they're skipped (no instance to render). Apps that need to react to deletions should use the kvStore primitive directly via a follow-up tool — Watch's contract is "current Participants you can read."

CALLER MUST CANCEL ctx when done iterating. The watcher goroutine and its underlying jetstream subscription pin until ctx.Done(); a caller that simply stops reading the channel without canceling leaks the goroutine and the NATS subscription until the next write-then-block triggers buffer back-pressure. Pattern:

ctx, cancel := context.WithCancel(parentCtx)
defer cancel() // releases the watcher goroutine and its NATS subscription
ch, err := mgr.Watch(ctx, workflow)
if err != nil {
    return err
}
// ...drain ch...

type Participant

type Participant interface {
	// EntityID returns the 6-part federated graph entity ID
	// (org.platform.domain.system.type.instance). Used as the
	// canonical identifier across the graph, KV, and operator API.
	EntityID() string

	// Workflow returns the workflow type identifier (e.g.
	// "drone-survey", "csapi-system"). Must match a string passed
	// to Manager.Register at startup. Stable per implementation
	// type — never varies per instance.
	Workflow() string

	// Phase returns the current lifecycle phase (e.g. "planning",
	// "flying", "completed"). Must be a key in the Transitions
	// table registered for this Workflow.
	//
	// Drift detection: Manager.Get / Manager.List emit a Warn log
	// when Phase() returns a value not declared in the registered
	// Transitions table — the entity is silently degraded
	// (Transitions.IsTerminal defaults unknown phases to terminal
	// as a defensive fallback) but the log signal makes the
	// degradation visible to operators. Apps wanting structured
	// detection rather than log-scraping can add a Degraded-bool
	// wrapper in a future API extension; log-only is the v1
	// surface.
	Phase() string

	// IsTerminal returns true when the entity is in a phase with no
	// declared out-edges (i.e. completed, failed, aborted). Used by
	// Manager.List with ListOptions.Active to filter active vs
	// finished instances. By convention, derived from Phase() against
	// the registered Transitions table — see DefaultIsTerminal helper.
	IsTerminal() bool

	// KVBucket returns the NATS KV bucket this entity lives in
	// (e.g. "MISSIONS", "CSAPI_SYSTEMS"). Stable per implementation
	// type. The framework does NOT create the bucket — apps
	// provision their own buckets per their deployment topology.
	KVBucket() string

	// KVKey returns the KV key shape for the given entity ID within
	// the bucket. Apps choose the convention — common shapes include
	// the bare entityID, a workflow-prefixed key (e.g. "mission."+id),
	// or a multi-segment key for partitioned access.
	//
	// IMPORTANT: KVKey is a pure function of entityID. Implementations
	// MUST NOT read other fields of the Participant struct — Manager
	// caches one sample instance per registration and calls KVKey on
	// it with the resolved entityID, so per-instance struct state
	// (Phase, OwnerOrgID, etc.) is not populated at the call site.
	// Apps needing multi-field key derivation should encode the
	// extra context into the entityID itself (e.g. "<slug>.<id>")
	// or capture immutable workflow-level state in package vars at
	// Register time. This signature shape was chosen to keep
	// Manager.Get/Create/Update reflect-free in the per-message
	// hotpath — see ADR-047's "Participation is per-entity" section.
	KVKey(entityID string) string

	// ParentEntityID returns the parent workflow instance's EntityID,
	// or empty string for root workflows. Enables parent/child
	// workflow relationships (e.g. semspec's Plan-owns-Requirements
	// pattern, or a research-pack-spawns-investigators pattern).
	// Implementations with no parent-child relationships return ""
	// unconditionally.
	ParentEntityID() string
}

Participant is the contract a lifecycle-tracked entity satisfies. Apps implement this on their domain state structs to get framework infrastructure (KV storage, restart recovery, rule integration, operator API).

The interface is small by design: six core methods plus a parent-link method, ParentEntityID, which implementations without parent/child relationships satisfy by returning the empty string. All state and behavior beyond the declared phases lives in the implementing struct's own fields and methods; the harness only reads what it needs to operate (identity, current phase, terminal-detection, KV placement).

Implementations are expected to be plain data structs with these methods, NOT runtime objects with mutable internal state. The Manager round-trips Participant values through JSON (or another codec) on every Get/Update, so any non-serializable fields are silently lost whenever the value is reloaded from KV, including after a restart. Apps that need transient runtime state should keep it OUTSIDE the Participant struct.
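The two constraints above (KVKey as a pure function of entityID, and state that must survive a codec round-trip) can be illustrated with a minimal sketch. MissionState, its fields, and roundTrip are hypothetical names for illustration; only the two KV methods of the interface are shown, and encoding/json is assumed as the codec.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MissionState is a hypothetical state struct; the remaining Participant
// methods are omitted to keep the sketch short.
type MissionState struct {
	ID    string `json:"entity_id"`
	Phase string `json:"phase"`

	// retries is unexported, so encoding/json skips it and the value
	// does not survive a store-and-reload cycle.
	retries int
}

// KVBucket is stable per implementation type.
func (MissionState) KVBucket() string { return "MISSIONS" }

// KVKey depends only on its argument, never on receiver fields, so it
// gives the same answer when called on a zero-value sample instance.
func (MissionState) KVKey(entityID string) string { return "mission." + entityID }

// roundTrip mimics a store-and-reload cycle using encoding/json.
func roundTrip(in MissionState) (MissionState, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return MissionState{}, err
	}
	var out MissionState
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	var sample MissionState // zero value, like a cached sample instance
	fmt.Println(sample.KVBucket(), sample.KVKey("m-42"))

	out, err := roundTrip(MissionState{ID: "m-42", Phase: "flying", retries: 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.ID, out.Phase, out.retries) // retries comes back as 0
}
```

Because KVKey ignores the receiver, calling it on an empty sample is safe; the retries counter shows why transient state belongs outside the struct.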

type TransitionEvent

type TransitionEvent struct {
	// From is the phase the entity was in before the transition.
	// Empty string for the Create event (entity didn't exist before).
	From string `json:"from"`

	// To is the phase the entity entered. Equal to the entity's
	// Phase() at the time of the transition.
	To string `json:"to"`

	// At is the wallclock time the transition was committed to KV.
	// Sourced from the KV revision's metadata, not from app code
	// — so it's authoritative across restarts and clock skew.
	At time.Time `json:"at"`

	// Triggered identifies what caused the transition. Closed set
	// of values defined by TransitionSource. Operator dashboards
	// can filter / color-code by this field; audit trails
	// distinguish operator-initiated from automated transitions.
	Triggered TransitionSource `json:"triggered"`

	// Note is an optional free-text annotation. Manager.Fail uses
	// this to carry the failure reason; Manager.Transition can pass
	// arbitrary context. Empty by default.
	Note string `json:"note,omitempty"`
}

TransitionEvent is one entry in an entity's phase-transition history. Manager.History returns these in chronological order.

History is derived from KV revision replay (every write to the entity's KV key is one revision; Manager.Update synthesizes a TransitionEvent when From != To). No separate history bucket is required — the KV bucket's own revision log IS the history.

type TransitionSource

type TransitionSource string

TransitionSource identifies what caused a phase transition. It is a closed set of typed constants: Manager.Transition takes a TransitionSource parameter, and rule actions, the operator API, and direct component calls each pass the appropriate constant instead of authoring a string literal at the call site. A typo such as "oprator" in a literal would otherwise silently break dashboard filtering.

The type is wire-compatible with plain string serialization: string(TransitionSourceRule) is "rule", and values round-trip through JSON / KV as-is.

const (
	// TransitionSourceRule is set when a rule action invoked
	// Manager.Transition (the common case for state-machine
	// progression driven by the rule engine).
	TransitionSourceRule TransitionSource = "rule"

	// TransitionSourceOperator is set when the operator API
	// (POST /workflows/{type}/{id}/transition) invoked
	// Manager.Transition. Audit trails distinguish operator-
	// initiated transitions from automated ones via this value.
	TransitionSourceOperator TransitionSource = "operator"

	// TransitionSourceComponent is set when a component invoked
	// Manager.Transition directly (rare; usually rules orchestrate
	// transitions and components only call Update / Complete /
	// Fail). Reserved for cases where the component's work IS the
	// transition (e.g. landing-executor → landed phase).
	TransitionSourceComponent TransitionSource = "component"

	// TransitionSourceFramework is set by Manager.Create,
	// Manager.Complete, and Manager.Fail — i.e. the harness's own
	// transition-emitting operations rather than caller-driven ones.
	TransitionSourceFramework TransitionSource = "framework"
)

Defined TransitionSource values. The set is closed; adding a new kind requires both a constant here and a Manager call site that produces it.

type Transitions

type Transitions map[string][]string

Transitions declares the valid phase transitions for a workflow type. Keys are source phases; values are slices of valid destination phases. Terminal phases have empty out-edges.

Example (drone-survey mission):

transitions := lifecycle.Transitions{
    "planning":   {"flying", "aborted"},
    "flying":     {"capturing", "landing", "aborted"},
    "capturing":  {"flying"},
    "landing":    {"completed", "failed"},
    "completed":  {},  // terminal
    "failed":     {},  // terminal
    "aborted":    {},  // terminal
}

The table is checked structurally by Validate at Register time and then consulted by Manager.Transition on every transition request. Edges not declared in the table are rejected with ErrInvalidTransition.

func (Transitions) IsTerminal

func (t Transitions) IsTerminal(phase string) bool

IsTerminal returns true when the given phase has no declared out-edges in the table (or isn't declared at all — defensive fallback to treat unknown phases as terminal so a misconfigured instance can't transition unexpectedly).

Apps that derive Participant.IsTerminal from the Transitions table can use this helper directly:

func (m *MissionState) IsTerminal() bool {
    return missionTransitions.IsTerminal(m.Phase())
}

func (Transitions) IsValidTransition

func (t Transitions) IsValidTransition(from, to string) bool

IsValidTransition returns true when (from → to) is a declared edge in the table. Used by Manager.Transition to gate transitions, but safe for app-side use too (e.g. to filter operator-facing "available next phases" UI).

func (Transitions) Phases

func (t Transitions) Phases() []string

Phases returns the set of declared phases in sorted order. Useful for operator dashboards rendering state-machine diagrams or for validation tooling enumerating the workflow's surface.

func (Transitions) TerminalPhases

func (t Transitions) TerminalPhases() []string

TerminalPhases returns the subset of declared phases that are terminal (no out-edges), sorted. Operator dashboards use this to group instances by terminal-vs-active when ListOptions.Active is not set.

func (Transitions) Validate

func (t Transitions) Validate() error

Validate checks structural consistency of the transitions table. Returns ErrInvalidTransitionsTable wrapped with a descriptive message when:

  • The table is empty (no phases declared)
  • An out-edge references a phase that isn't a key in the table (typo catcher — every reachable phase must be declared, even terminal ones with empty out-edges)
  • A phase has duplicate out-edges (e.g. {"flying": ["landing", "landing"]}) — defends against copy-paste bugs in rule packs

Returns nil when the table is internally consistent.

Note: Validate does NOT check reachability from an initial phase (apps may have multiple entry points) or absence of cycles (cycles are legitimate — e.g. capturing → flying → capturing in the drone example).

type WorkflowDef

type WorkflowDef struct {
	// Workflow is the workflow type identifier (matches what was
	// passed to Manager.Register and what Participant.Workflow returns).
	Workflow string `json:"workflow"`

	// Transitions is the declared phase-transition table registered
	// for this workflow. The state-machine diagram is derivable
	// directly from this map.
	Transitions Transitions `json:"transitions"`

	// KVBucket is the NATS KV bucket this workflow's instances live
	// in. Operator API uses this to route List/Watch operations.
	KVBucket string `json:"kv_bucket"`

	// OperatorWritableFields lists the JSON field paths that are
	// tagged `lifecycle:"operator_writable"` on the registered state
	// struct. Used by UpdateFromOperator to enforce default-deny;
	// also exposed in the operator API for clients to know which
	// fields are patchable.
	OperatorWritableFields []string `json:"operator_writable_fields"`
}

WorkflowDef describes a registered workflow type. Returned by Manager.GetWorkflowDefinition and Manager.ListWorkflows; intended primarily for operator dashboard introspection (e.g. rendering a state-machine diagram derivable directly from the Transitions table).

Apps don't construct WorkflowDef directly — the framework synthesizes it from the data passed to Manager.Register.
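On the consuming side, a dashboard client could decode a WorkflowDef payload and check which fields are patchable. The sketch below mirrors the documented struct and JSON tags locally; the payload contents, decode, and isPatchable are hypothetical and chosen only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of the documented types and JSON tags.
type Transitions map[string][]string

type WorkflowDef struct {
	Workflow               string      `json:"workflow"`
	Transitions            Transitions `json:"transitions"`
	KVBucket               string      `json:"kv_bucket"`
	OperatorWritableFields []string    `json:"operator_writable_fields"`
}

// payload is a made-up example of what an introspection response might carry.
const payload = `{
  "workflow": "mission",
  "transitions": {"planning": ["completed"], "completed": []},
  "kv_bucket": "MISSIONS",
  "operator_writable_fields": ["owner_org_id"]
}`

// decode parses a JSON document into a WorkflowDef.
func decode(raw string) (WorkflowDef, error) {
	var def WorkflowDef
	err := json.Unmarshal([]byte(raw), &def)
	return def, err
}

// isPatchable reports whether field appears in the operator-writable list;
// anything absent is treated as read-only (default-deny).
func isPatchable(def WorkflowDef, field string) bool {
	for _, f := range def.OperatorWritableFields {
		if f == field {
			return true
		}
	}
	return false
}

func main() {
	def, err := decode(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(def.Workflow, def.KVBucket)       // mission MISSIONS
	fmt.Println(isPatchable(def, "owner_org_id")) // true
	fmt.Println(isPatchable(def, "phase"))        // false
}
```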
