Documentation
¶
Overview ¶
Package coord is the single public entry point for bones.
Agents construct a Coord via Open, call methods to Claim file-scoped work, list Ready tasks, CloseTask, Post chat messages, Ask synchronous questions, and Close the Coord at shutdown. Substrate details — NATS subjects, fossil transactions, hold encodings — live below this boundary in internal packages and never appear in these signatures.
See docs/invariants.md for the runtime invariants every public method enforces. See docs/adr/ for the architectural commitments that shaped this surface (ADR 0001 public surface, 0002 scoped holds, 0003 substrate hiding, 0004 conflict resolution).
coord/hub.go
coord/leaf.go
Index ¶
- Variables
- type ChatMessage
- type ChatThread
- type Claim
- type CompactInput
- type CompactOptions
- type CompactResult
- type CompactedTask
- type Config
- type Coord
- func (c *Coord) Answer(ctx context.Context, ...) (func() error, error)
- func (c *Coord) Ask(ctx context.Context, recipient string, question string) (string, error)
- func (c *Coord) AskAdmin(ctx context.Context, recipient string, question string) (string, error)
- func (c *Coord) Blocked(ctx context.Context) ([]Task, error)
- func (c *Coord) Claim(ctx context.Context, taskID TaskID, ttl time.Duration) (func() error, error)
- func (c *Coord) Close() error
- func (c *Coord) CloseTask(ctx context.Context, taskID TaskID, reason string) error
- func (c *Coord) HandoffClaim(ctx context.Context, taskID TaskID, fromAgent string, ttl time.Duration) (func() error, error)
- func (c *Coord) Link(ctx context.Context, from, to TaskID, edgeType EdgeType) error
- func (c *Coord) OpenTask(ctx context.Context, title string, files []string) (TaskID, error)
- func (c *Coord) Post(ctx context.Context, thread string, msg []byte) error
- func (c *Coord) Prime(ctx context.Context) (PrimeResult, error)
- func (c *Coord) React(ctx context.Context, thread, messageID, reaction string) error
- func (c *Coord) Ready(ctx context.Context) ([]Task, error)
- func (c *Coord) Reclaim(ctx context.Context, taskID TaskID, ttl time.Duration) (func() error, error)
- func (c *Coord) Subscribe(ctx context.Context, pattern string) (<-chan Event, func() error, error)
- func (c *Coord) SubscribePattern(ctx context.Context, pattern string) (<-chan Event, func() error, error)
- func (c *Coord) WatchPresence(ctx context.Context) (<-chan Event, func() error, error)
- func (c *Coord) Who(ctx context.Context) ([]Presence, error)
- type Edge
- type EdgeType
- type Event
- type File
- type Hub
- type Leaf
- func (l *Leaf) Claim(ctx context.Context, taskID TaskID) (*Claim, error)
- func (l *Leaf) Close(ctx context.Context, claim *Claim) error
- func (l *Leaf) Commit(ctx context.Context, claim *Claim, files []File) (string, error)
- func (l *Leaf) Compact(ctx context.Context, opts CompactOptions) (CompactResult, error)
- func (l *Leaf) Metadata(key string) string
- func (l *Leaf) OpenTask(ctx context.Context, title string, files []string) (TaskID, error)
- func (l *Leaf) PostMedia(ctx context.Context, thread, mimeType string, data []byte) error
- func (l *Leaf) Stop() error
- func (l *Leaf) Tip(ctx context.Context) (string, error)
- func (l *Leaf) WT() string
- type LeafConfig
- type MediaMessage
- type Presence
- type PresenceChange
- type PrimeResult
- type Reaction
- type RevID
- type Summarizer
- type Task
- type TaskID
- type TuningConfig
Constants ¶
This section is empty.
Variables ¶
var ErrAgentMismatch = errors.New("coord: agent mismatch")
ErrAgentMismatch reports that a mutation was attempted by an agent that does not own the claim.
var ErrAgentOffline = errors.New("coord: agent offline")
ErrAgentOffline reports that AskAdmin's presence pre-flight could not find the recipient in the project's presence bucket. Distinct from ErrAskTimeout: ErrAgentOffline is a pre-flight check against a known directory (the presence KV), whereas ErrAskTimeout fires only after the reply-wait deadline elapses against an actual substrate publish. Callers that want the old "send and hope" behavior continue to use Ask; AskAdmin is the opt-in to the stronger pre-flight.
Entries can age out between the pre-flight and the publish, so a clean AskAdmin that returns ErrAskTimeout is still possible. The sentinel only narrows the "no one was listening at pre-flight time" branch.
var ErrAlreadyClaimer = errors.New("coord: caller is already the claimer")
ErrAlreadyClaimer reports that Reclaim was called by an agent that is already the current claimed_by — self-reclaim is nonsensical. ADR 0013.
var ErrAskTimeout = errors.New("coord: ask timed out")
ErrAskTimeout reports that Ask's ctx deadline elapsed before a reply arrived on the inbox subject. Per ADR 0008, ErrAskTimeout also fires when no recipient is subscribed to the ask subject — the substrate cannot distinguish "no one listening" from "listener is slow" cheaply, and the caller-observable behavior is identical either way. Callers that need presence semantics layer a registry on top (Phase 4 work). Distinct from context.Canceled: ErrAskTimeout is the reply- wait boundary; context.Canceled is upstream cancellation.
var ErrClaimTimeout = errors.New("coord: claim timed out")
ErrClaimTimeout reports that hold acquisition did not complete within the configured OperationTimeout.
var ErrClaimerLive = errors.New("coord: current claimer is still live")
ErrClaimerLive reports that Reclaim saw the current claimed_by agent as still present in coord.Who — presence staleness has not yet converged (3 × HeartbeatInterval per Invariant 19). The caller must retry after the window closes. ADR 0013.
var ErrConflict = errors.New("coord: commit conflict (planner overlap)")
ErrConflict is a defense-in-depth assertion: post-SyncNow, Leaf.Commit detected that the local tip diverged from the parent expected at commit time. Disjoint-slot orchestrator-validator contracts make this impossible in practice; if it fires at runtime the planner missed an overlap. There is no auto-recovery (fork+merge has been deleted). Callers treat it as planner failure and stop the run.
var ErrEpochStale = errors.New("coord: claim epoch is stale")
ErrEpochStale reports that a mutation from a claimed position was attempted with a stale claim_epoch view — typically a zombie writer (killed agent, partition-returning slow agent) after a peer has Reclaimed the task. Commit and CloseTask fence against this. Per ADR 0013 and Invariant 24, claim_epoch is monotonic and bumped on every Claim/Reclaim; a CAS check against the current record's epoch refuses the write. Callers should discard in-flight work; no rollback at the coord layer.
var ErrHeldByAnother = errors.New("coord: file(s) held by another agent")
ErrHeldByAnother reports that one or more requested file holds are currently owned by a different agent.
var ErrInvalidEdgeType = errors.New("coord: invalid edge type")
ErrInvalidEdgeType is returned from Link when the supplied EdgeType is not one of the defined constants. Invariant 26 (ADR 0014).
var ErrNotHeld = errors.New("coord: file(s) not held by caller")
ErrNotHeld reports that coord.Commit was called on one or more files the caller does not hold per Invariant 20. Commit is hold-gated: every file named in files must be held by cfg.AgentID at precheck time or the write is refused. Callers that see this should re-Claim the affected task or investigate lost holds.
var ErrNotImplemented = errors.New("coord: not implemented")
ErrNotImplemented is returned by Phase 1 stub methods.
var ErrTaskAlreadyClaimed = errors.New(
"coord: task already claimed",
)
ErrTaskAlreadyClaimed reports that Claim lost the task-CAS race: the task record is already claimed by another agent, or has moved out of the open status entirely (closed tasks are terminal per invariant 13 and cannot be re-claimed). Per ADR 0007, this sentinel is the race-loser signal at the task layer and is distinct from ErrHeldByAnother, which reports a hold-layer collision on the files the task declared.
var ErrTaskAlreadyClosed = errors.New("coord: task already closed")
ErrTaskAlreadyClosed reports that CloseTask was invoked on a task whose status is already closed. Invariant 13 makes closed terminal, so this is the caller-observable surrender boundary for that rule — callers that re-drive close on retry see this sentinel rather than a substrate transition error.
var ErrTaskNotClaimed = errors.New("coord: task is not claimed")
ErrTaskNotClaimed reports that Reclaim was called on a task whose status is not 'claimed' — an 'open' task wants Claim; a 'closed' task is terminal per invariant 13. ADR 0013.
var ErrTaskNotFound = errors.New("coord: task not found")
ErrTaskNotFound reports that the requested task does not exist.
var ErrTooManySubscribers = errors.New(
"coord: too many subscribers",
)
ErrTooManySubscribers reports that Subscribe was called when the number of active subscribers on this Coord already equals Config.MaxSubscribers. Per ADR 0008 and the invariant-9 bound on MaxSubscribers, this is an operator-config-shaped error returned at the Subscribe entry; the caller may retry after an existing subscription's close closure has run.
Functions ¶
This section is empty.
Types ¶
type ChatMessage ¶
type ChatMessage struct {
// contains filtered or unexported fields
}
ChatMessage is the coord.Subscribe surface for a notify chat message. Fields are narrowed to the subset ADR 0008 promises — callers cannot see notify.Message directly, consistent with ADR 0003. Additional fields (Priority, Actions, Media, FromName) are deferred until a Phase 3+ consumer asks for them; adding fields to ChatMessage later is source-compatible, removing them would not be.
All fields are unexported to keep the Coord public surface migration- ready (parallel with Task / taskFromRecord in types.go).
func (ChatMessage) Body ¶
func (m ChatMessage) Body() string
Body returns the message payload as a string.
func (ChatMessage) From ¶
func (m ChatMessage) From() string
From returns the agent identifier of the message sender.
func (ChatMessage) MessageID ¶
func (m ChatMessage) MessageID() string
MessageID returns the substrate-assigned identifier for this message. Opaque to callers — consumers pass it back to coord.React as the target identifier without interpreting the contents. Added in Phase 4 per ADR 0009; source-compatible extension.
func (ChatMessage) ReplyTo ¶
func (m ChatMessage) ReplyTo() string
ReplyTo returns the parent message ID when the message is a reply, or the empty string for a top-level post.
func (ChatMessage) Thread ¶
func (m ChatMessage) Thread() string
Thread returns the 8-character short thread identifier the message was posted under. This matches coord.Post's caller-visible thread identity, which collapses to notify's ThreadShort form; later phases may add a separate accessor for the full UUID-shaped ID if a consumer needs it.
func (ChatMessage) Timestamp ¶
func (m ChatMessage) Timestamp() time.Time
Timestamp returns the UTC wall-clock time the message was created.
type ChatThread ¶
type ChatThread = chat.ThreadSummary
ChatThread is a read-only view of a chat thread for PrimeResult. It is a type alias for chat.ThreadSummary — the two types are identical at the reflect layer, eliminating the translation loop in Prime and the duplicate field-for-field copy.
type Claim ¶
type Claim struct {
// contains filtered or unexported fields
}
Claim is the handle a Leaf returns from Claim. It carries the TaskID and the release closure so callers can rel() at end-of-scope. The release closure is idempotent.
type CompactInput ¶
type CompactInput struct {
TaskID TaskID
Title string
Files []string
Context map[string]string
CreatedAt time.Time
ClosedAt time.Time
ClosedBy string
ClosedReason string
CompactLevel uint8
}
CompactInput is the read-only view of a task that the Summarizer sees.
type CompactOptions ¶
type CompactOptions struct {
MinAge time.Duration
Limit int
Now func() time.Time
Summarizer Summarizer
Prune bool
}
CompactOptions parameterizes a Compact run.
type CompactResult ¶
type CompactResult struct {
Tasks []CompactedTask
}
CompactResult aggregates the per-task results from a single Compact invocation.
type CompactedTask ¶
CompactedTask is the per-task output of Compact.
type Config ¶
type Config struct {
// AgentID identifies this coord instance across the substrate.
AgentID string
// NATSURL is the URL coord.Open dials to reach the substrate. It
// never appears in any coord public method signature per ADR 0003;
// it lives on Config because it is operator-supplied input.
NATSURL string
// ChatFossilRepoPath is the filesystem path at which coord.Open
// creates or opens this agent's chat Fossil repo. The operator owns
// cleanup; coord never calls RemoveAll. In tests, pass t.TempDir().
ChatFossilRepoPath string
// CheckoutRoot is the absolute directory under which per-agent
// working-copy checkouts live per ADR 0010. Coord writes to
// CheckoutRoot/<AgentID>/. In tests, pass t.TempDir().
CheckoutRoot string
// Tuning carries substrate knobs. Zero means "use sane defaults".
Tuning TuningConfig
}
Config is the operator-supplied configuration for a Coord instance. Only the four identity/routing fields are required; Tuning is zero-safe — Open fills missing fields from defaultTuning.
func (Config) Validate ¶
Validate checks every Config field against its documented bounds and returns the first violation as an error. Validate is pure; it does not panic on bad operator input per invariant 9.
Callers must apply defaultTuning before Validate; Open does this automatically. Direct Validate callers (tests) should call defaultTuning themselves if they rely on zero-value Tuning fields.
type Coord ¶
type Coord struct {
// contains filtered or unexported fields
}
Coord is the public entry point for bones. Construct one via Open and Close it at shutdown. All coordination — hold acquisition, task ready queries, chat messaging, presence — flows through methods on *Coord.
Methods are safe to call concurrently; the closed-state check is mutex-guarded so Close may race with in-flight calls without a data race.
Substrate-backed Managers live on an unexported substrate aggregate (see substrate.go). ADR 0008 foreshadowed this refactor; ADR 0009's presence work was the trigger. Accessors within the coord package use c.sub.<mgr>; external callers see only method names per ADR 0001.
func Open ¶
Open constructs a Coord and validates its configuration per invariant 9. The returned *Coord must be Closed by the caller at shutdown. An invalid Config aborts Open with a wrapped error; a nil ctx is a programmer error and panics.
Open dials the NATS substrate and opens the holds KV bucket. If any step fails mid-construction, earlier steps are torn down before returning so no substrate resources leak.
func (*Coord) Answer ¶
func (c *Coord) Answer( ctx context.Context, handler func(ctx context.Context, question string) (string, error), ) (func() error, error)
Answer registers handler as the responder for peer questions addressed to this agent. The subject subscribed on is <proj>.ask.<c.cfg.AgentID>: peers call Ask with this agent's ID as the recipient and their request lands in handler. The handler returns either (reply, nil) to send a reply or ("", err) to drop the request — ADR 0008 does not model error payloads, so a handler error is indistinguishable from "no handler registered" from the Ask caller's side (both surface as ErrAskTimeout).
The handler receives context.Background, not the Ask caller's ctx: the notify/NATS substrate has no per-message ctx to thread through, and fabricating one with a timeout would introduce a second deadline unrelated to the caller's. Handlers that need bounded work should construct their own ctx inside.
The returned closure is an idempotent unsubscribe: the first call tears down the subscription; subsequent calls are no-ops and return nil. sync.Once-guarded so concurrent defer sites cannot double-close. Safe to call after Coord.Close: the underlying chat Manager will have been torn down by then and the closure silently no-ops on the already-closed subscription.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed). handler non-nil likewise panics.
Operator errors returned:
Any substrate error from chat.Respond — e.g. a NATS subscribe
failure — surfaces wrapped with the coord.Answer prefix.
func (*Coord) Ask ¶
Ask sends a synchronous question to a peer agent and waits for a reply on the <proj>.ask.<recipient> subject. The reply, when it arrives, is returned as a string. Recipient is opaque: coord does not verify that anyone is listening before sending — if no peer has Answer registered for the subject, the ctx deadline elapses and Ask returns ErrAskTimeout. Per ADR 0008 this is deliberate: the substrate cannot cheaply distinguish "no one listening" from "listener is slow", and surfacing them identically keeps the caller's retry strategy honest (both cases want another attempt or a give-up).
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed). Recipient and question non-empty preconditions likewise panic.
Operator errors returned:
ErrAskTimeout — ctx deadline elapsed, or no responder answered
within the deadline. Distinct from context.Canceled.
context.Canceled — ctx was canceled (not deadlined) before or
during the request. Surfaces wrapped with the coord.Ask
prefix. Distinct from ErrAskTimeout.
Any other substrate error — e.g. a NATS publish failure — is
wrapped with the coord.Ask prefix and returned verbatim.
func (*Coord) AskAdmin ¶
AskAdmin sends a synchronous question to a peer agent after first pre-flighting the recipient against the project's presence bucket. If the pre-flight finds no live presence entry for recipient, AskAdmin returns ErrAgentOffline without touching the chat substrate. When the pre-flight succeeds, AskAdmin delegates to the same chat.Request path as Ask — same subject shape (<proj>.ask.<recipient>), same reply-wait semantics, same error translation table.
The "Admin" naming is a holdover from ADR 0008's scope-out language around admin-override semantics. Any agent may call AskAdmin; there is no role-based restriction in Phase 4. Role-based auth is deferred to Phase 6+.
ErrAgentOffline vs. ErrAskTimeout: the pre-flight sentinel narrows the "no one was listening at call time" branch to a directory-based answer, so callers that want to distinguish "recipient does not exist" from "recipient is slow" have a machine-checkable boundary. A successful pre-flight that then times out returns ErrAskTimeout as usual — presence entries can age out between pre-flight and publish, and the timeout path remains the source of truth for substrate-observed non-delivery. Callers that do not want the pre-flight cost continue to use Ask.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed). Recipient and question non-empty preconditions likewise panic.
Operator errors returned:
ErrAgentOffline — presence pre-flight found no live entry for
recipient in this project. Distinct from ErrAskTimeout.
ErrAskTimeout — pre-flight succeeded but the reply-wait
deadline elapsed before a response arrived. Same semantics
as Ask's ErrAskTimeout.
context.Canceled — ctx was canceled (not deadlined) before or
during the call. Surfaces wrapped with the coord.AskAdmin
prefix. Distinct from ErrAskTimeout.
Any other substrate error — e.g. a presence Get failure or a
NATS publish failure — is wrapped with the coord.AskAdmin
prefix and returned verbatim.
func (*Coord) Blocked ¶
Blocked returns open, unclaimed tasks that are currently blocked by at least one non-closed task via an incoming `blocks` edge. Results are sorted oldest-first and capped by Config.MaxReadyReturn.
func (*Coord) Claim ¶
func (c *Coord) Claim( ctx context.Context, taskID TaskID, ttl time.Duration, ) (func() error, error)
Claim atomically acquires a task for this agent. Reads the task record from NATS KV; CAS-updates it to status=claimed, claimed_by=cfg.AgentID; then acquires file-scoped holds on every file declared in the record. If the CAS loses (task already claimed by another agent, or closed), returns ErrTaskAlreadyClaimed and does not attempt holds. If any hold fails, the task CAS is undone before the error return.
The returned release closure is idempotent (invariant 7) and symmetric with Claim: it CAS-un-claims the task record (status back to open, claimed_by cleared) AND releases every hold. A task that was concurrently closed by the claimer via CloseTask will NOT be un-claimed by release — the closed state is terminal. Callers should defer release; it is safe to defer even if CloseTask has already run.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 2 (TaskID non-empty), 5 (ttl > 0 and <= HoldTTLMax), 8 (Coord not closed). Invariant 16 governs the release closure.
Operator errors returned:
ErrTaskNotFound, ErrTaskAlreadyClaimed, ErrHeldByAnother.
func (*Coord) Close ¶
Close shuts down the Coord. Safe to call more than once; subsequent calls are no-ops and return nil. Close itself never panics once the receiver is non-nil (invariant 8 governs method calls after Close, not Close itself).
Release closures returned by Claim remain callable after Close; they silently no-op (see releaseClosure). This keeps defer-style shutdown from racing the Coord lifecycle.
func (*Coord) CloseTask ¶
CloseTask marks a task closed with an explanatory reason. The caller must be the current claimed_by on the task record (invariant 12); a mismatch returns ErrAgentMismatch. Transition rules follow invariant 13 (open→closed and claimed→closed allowed; closed→closed returns ErrTaskAlreadyClosed; no other transitions are legal).
Because invariant 11 couples claimed_by to status, an open task has ClaimedBy == "" and the identity check ("" != AgentID) fires as ErrAgentMismatch. This is intentional: Phase 2 has no admin override, so only the agent holding the claim may close. Operators that need to close an un-claimed task must first claim it.
Invariants asserted (panic on violation — programmer errors):
1 (ctx non-nil), 2 (TaskID non-empty), 8 (Coord not closed).
Operator errors returned:
ErrTaskNotFound, ErrAgentMismatch, ErrTaskAlreadyClosed.
Other errors from the CAS write path are returned wrapped.
func (*Coord) HandoffClaim ¶
func (c *Coord) HandoffClaim( ctx context.Context, taskID TaskID, fromAgent string, ttl time.Duration, ) (func() error, error)
HandoffClaim transfers an already-claimed task from fromAgent to the caller. Unlike Reclaim, the current claimer may still be live; this is an intentional cooperative handoff used by the dispatch harness so a worker process can own Commit/CloseTask itself.
Preconditions:
- task must exist and be in claimed status
- current claimed_by must match fromAgent
- caller must not already be the claimer
On success the task record stays claimed, claimed_by becomes the caller's AgentID, claim_epoch is bumped, old holds are released, and new holds are acquired under the caller. The returned release closure is identical in behavior to Claim/Reclaim's release closure.
func (*Coord) Link ¶
Link records an outgoing typed edge from one task to another per ADR 0014. Any agent may Link; no claimed_by check (Phase 6 posture).
Preconditions:
- edgeType must be one of EdgeBlocks, EdgeDiscoveredFrom, EdgeSupersedes, EdgeDuplicates. Other values return ErrInvalidEdgeType (invariant 26).
- from and to must both exist. The to task may be in any status, including closed (supersedes/duplicates are valid against closed targets).
Link is idempotent on (from, to, edgeType): a second call with the same triple is a no-op (invariant 25). CAS-retry is inherited from tasks.Manager.Update — concurrent Link calls converge without caller involvement.
func (*Coord) OpenTask ¶
OpenTask creates a new task record with status=open and returns its generated TaskID. Files must be non-empty, bounded by cfg.MaxTaskFiles, every path absolute, and are sorted+deduplicated here before Create (callers need not pre-sort).
Title is the human-readable summary; it must be non-empty. No upper bound is enforced beyond internal/tasks's MaxValueSize check on the encoded record.
Invariants asserted (panics on violation — programmer errors):
1 (ctx non-nil), 4 (files shape), 11 (status=open iff claimed_by empty), 15 (generated TaskID matches ADR 0005 shape).
Other underlying errors are returned wrapped; a CAS-Create collision on a generated ID is a broken-generator condition that panics via assert.Postcondition rather than silently retrying.
func (*Coord) Post ¶
Post publishes a message body to a chat thread via the internal chat.Manager, which routes through EdgeSync notify per ADR 0008. Persistence is delegated to notify's Fossil backing — coord itself owns no chat-message state.
ctx is pre-checked inside chat.Send before any repo or NATS work, so a canceled ctx short-circuits cleanly. Once notify.Service.Send is entered, it runs to completion: the upstream API takes no ctx and cannot be interrupted mid-write. ADR 0008 documents the limitation; observed write latency is sub-millisecond in normal operation.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed). The thread-non-empty precondition is likewise a programmer error and panics.
Operator errors returned:
context.Canceled / context.DeadlineExceeded — ctx finalized
before chat.Send entered notify; surfaces wrapped with the
coord.Post prefix.
chat.ErrClosed — the chat manager was closed underneath (usually
via Coord.Close racing with an in-flight Post).
Any substrate error from notify — e.g. a NATS publish or Fossil
write failure — surfaces wrapped with the coord.Post prefix.
func (*Coord) Prime ¶
func (c *Coord) Prime(ctx context.Context) (PrimeResult, error)
Prime returns a full snapshot of the workspace: open tasks, tasks ready for this agent, tasks claimed by this agent, recent chat threads this agent participates in, and live peers.
Prime is safe to call concurrently. It is the recommended entry point for session-start context recovery (ADR 0015).
func (*Coord) React ¶
React posts a reaction to the message identified by messageID in the given thread. The reaction payload is opaque to coord — emoji, text, or arbitrary bytes all pass through untouched — and is surfaced to peers subscribed to the same thread as a Reaction event on the coord.Subscribe channel.
Per ADR 0009 reactions piggyback on the chat substrate: they are encoded in-band as a notify.Message body with the REACT prefix and routed through the same chat.Send path as Post. No new KV bucket, no new NATS subject. The encoding is a substrate detail and never appears on the public surface — callers emit reactions through React and receive them through Subscribe as Reaction events.
messageID is whatever ChatMessage.MessageID returned for the target message. Coord does not verify that messageID corresponds to a real message on the thread: a reaction to a non-existent message will still publish, and consumers receive it as a Reaction event with a target that matches no visible chat. This is deliberate — the alternative would couple reactions to a per-message lookup on the write path, which the chat substrate was designed to avoid.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed). Thread, messageID, and reaction non-empty preconditions likewise panic.
Operator errors returned: any substrate error from chat.Send, wrapped with the coord.React prefix.
func (*Coord) Ready ¶
Ready returns open, unclaimed tasks eligible to be worked on, sorted oldest-first and capped by Config.MaxReadyReturn. A task is eligible iff ALL of the following hold:
- status == open
- claimed_by == "" (not held by another agent)
- no incoming blocks edge from a non-closed task (ADR 0014)
- no incoming supersedes edge from a non-closed task
- no incoming duplicates edge from a non-closed task
- no non-closed task names it as Parent (parent waits on children)
Cost is O(N+E) where N is the task count and E is the total edge count across non-closed tasks; the reverse index is rebuilt on every call. If this becomes a bottleneck, a cached reverse index is a future optimization (see ADR 0014 §Consequences).
discovered-from edges are stored but intentionally ignored by the filter — they are audit metadata, not a ready-blocker.
func (*Coord) Reclaim ¶
func (c *Coord) Reclaim( ctx context.Context, taskID TaskID, ttl time.Duration, ) (func() error, error)
Reclaim transfers an abandoned claim from a crashed or unreachable agent to the caller. Preconditions per ADR 0013:
- Task must exist and be in 'claimed' status — Reclaim on an 'open' task returns ErrTaskNotClaimed (the caller wants Claim).
- The current claimed_by agent must be absent from coord.Who — otherwise ErrClaimerLive. Presence entries expire after 3 × HeartbeatInterval per Invariant 19.
- Caller must not be the current claimed_by — self-reclaim is nonsensical; returns ErrAlreadyClaimer.
On success: the task record is CAS-re-claimed with the caller's AgentID, claim_epoch bumped (Invariant 24), holds re-acquired under the caller's ID, and a single-line reclaim notice posted to the task's chat thread best-effort.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 2 (TaskID non-empty), 5 (ttl > 0 and <= HoldTTLMax), 8 (Coord not closed).
Operator errors returned:
ErrTaskNotFound, ErrTaskNotClaimed, ErrClaimerLive, ErrAlreadyClaimer, ErrHeldByAnother. Other substrate errors are wrapped with the coord.Reclaim prefix.
func (*Coord) Subscribe ¶
func (c *Coord) Subscribe( ctx context.Context, pattern string, ) (<-chan Event, func() error, error)
Subscribe returns a channel of coord.Event values for messages that match pattern, plus a close closure the caller must invoke to tear the subscription down. Phase 3 ships ChatMessage as the only concrete event type; consumers read with a type switch per ADR 0008.
pattern is the coord-facing thread filter. An empty pattern selects every thread in this agent's project (the documented project-wide path, used by the 2w5 smoke test); the non-empty pattern is a caller-supplied thread name, mapped deterministically to a notify ThreadShort via SHA-256 — two Coords watching the same name subscribe to the same stream (see ADR 0008's 2026-04-19 update). Glob patterns are a Phase 4 follow-up.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed). pattern is NOT asserted non-empty — empty is valid and selects the project-wide stream.
Runtime enforcement:
ErrTooManySubscribers — returned if the live-subscription count on
this Coord already equals Config.MaxSubscribers at entry. The
caller may retry after an existing subscription's close closure
has run (invariant 17 ensures decrement happens exactly once).
The returned close closure cancels an internal ctx derived from the caller's ctx, waits for the relay goroutine to drain, and decrements the live-subscription counter. The Event channel itself is closed by the relay goroutine as it exits, so both the explicit-close and the caller-ctx-canceled paths funnel through the same close site — no second close(chan) call is needed here, and the channel is never double-closed. sync.Once-guarded so subsequent calls return nil and take no action (invariant 17).
func (*Coord) SubscribePattern ¶
func (c *Coord) SubscribePattern( ctx context.Context, pattern string, ) (<-chan Event, func() error, error)
SubscribePattern returns a channel of coord.Event values for every thread whose ThreadShort matches the given NATS subject-segment pattern, plus a close closure the caller must invoke to tear the subscription down. Patterns use NATS subject-wildcard syntax: "*" matches every ThreadShort (equivalent to Subscribe(ctx, "") on the project-wide path), ">" matches every ThreadShort plus any hypothetical subsegments, and a literal pattern matches a single ThreadShort — usefully, the string returned by ChatMessage.Thread() on an earlier event, so a consumer can bootstrap a pattern subscription from observed traffic without round-tripping through the thread-name hash.
Unlike Subscribe, SubscribePattern does NOT hash its pattern — callers supply a raw substrate-level pattern. This is the deliberate substrate leak called out in ADR 0009's Open Questions (ticket agent-infra-6wv): option 1 lands a Phase-4 glob-Subscribe commitment without the new KV registry bucket and per-Post write cost of option 3. Callers that need name-level pattern matching (e.g. "deploy.*" matching thread names) must layer a registry themselves or wait for a future option-3 ticket.
pattern is asserted non-empty to keep the "project-wide" path on Subscribe(ctx, "") and the "pattern" path on SubscribePattern. Callers wanting everything should prefer Subscribe's empty-pattern shorthand over SubscribePattern("*") — the two subscriptions are behaviorally identical but Subscribe is the documented entry point for the wildcard-all case.
Invariants asserted: 1 (ctx non-nil), 8 (Coord not closed). Pattern non-empty panics before any substrate work.
Runtime enforcement:
ErrTooManySubscribers — returned if the live-subscription count on
this Coord already equals Config.MaxSubscribers at entry.
Shared with Subscribe; the slot is freed by the close closure
(invariant 17).
The returned close closure shares the cancel/drain/decrement shape of Subscribe's closer — sync.Once-guarded and safe to call from multiple goroutines.
func (*Coord) WatchPresence ¶
WatchPresence returns a channel of coord.Event values that fire whenever an agent comes online or goes offline in this Coord's project. Concrete type is PresenceChange. Consumers discriminate via the standard Event type switch (ADR 0008); mixing chat and presence events on the same switch is permitted because the sealed interface carries both types.
The initial snapshot is NOT replayed: Watch starts from the moment of subscription. Use Who for a snapshot; use WatchPresence for deltas. Consumers that want both sequence them explicitly.
The returned close closure is idempotent per invariant 17 — the first call cancels the internal ctx, waits for the relay goroutine to drain, and closes the event channel; subsequent calls return nil and take no action.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed).
Operator errors returned: any substrate error from the KV watch path, wrapped with the coord.WatchPresence prefix.
func (*Coord) Who ¶
Who returns the live-presence snapshot for this Coord's project. Each Presence describes one agent whose heartbeat KV entry is still current (not yet TTL-expired). The list includes this Coord itself — Open writes our initial entry before returning, so the caller's own Coord is always visible in Who by the time Who can be called.
Project scoping matches the Post/Ask scheme: agents in other projects are not surfaced. This is a fresh read; presence state is read-through the KV with no client-side caching.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), 8 (Coord not closed).
Operator errors returned: any substrate error from the KV list/scan path, wrapped with the coord.Who prefix.
type Edge ¶
Edge is a typed outgoing relationship between two tasks. Using a type definition (not alias) keeps tasks.Edge out of coord's diagnostic output per Ousterhout review #11.
type EdgeType ¶
type EdgeType string
EdgeType names a typed outgoing relationship from one task to another. Using a type definition (not alias) keeps tasks.EdgeType out of coord's diagnostic output per Ousterhout review #11.
type Event ¶
type Event interface {
// contains filtered or unexported methods
}
Event is the type of value delivered on coord.Subscribe's channel. Phase 3 ships ChatMessage as the only concrete type; later phases may add task-state or hold-state events carried on the same channel.
The interface is sealed by an unexported eventTag method so external packages cannot implement Event — ADR 0003's substrate-hiding rule applies to the event interface itself, not only to struct fields. Callers discriminate the concrete type with a type switch:
for e := range events {
switch m := e.(type) {
case coord.ChatMessage:
// handle
}
}
type File ¶
File is a single file body paired with its logical path. The path is whatever naming scheme the caller uses for its holds (typically absolute). Content is the raw bytes to commit. Fossil stores both verbatim — coord applies no path normalization.
type Hub ¶
type Hub struct {
// contains filtered or unexported fields
}
Hub owns the orchestrator's hub fossil repo + HTTP xfer endpoint and exposes the leaf.Agent's embedded NATS mesh as the hub's NATS bus.
Hub wraps a *leaf.Agent with serve flags set (Pull=false, Push=false, Autosync=AutosyncOff). The agent's serve_http handler exposes repo.XferHandler() so a stock fossil client can pull/push. The agent's mesh runs a standalone NATS server (no upstream); peer leaves solicit it via NATSUpstream and publish/subscribe land directly on the hub's mesh — single-hop subject-interest propagation. See EdgeSync PR #77 (MeshClientURL/MeshLeafAddr).
func OpenHub ¶
OpenHub starts a hub at workdir/hub.fossil that serves HTTP on httpAddr (e.g. "127.0.0.1:8765") and runs the embedded leaf.Agent mesh NATS as the hub's NATS bus. The hub is a passive receiver of pushes from peer leaves; it never client-syncs out.
workdir is created if missing; hub.fossil is created if missing. Caller owns workdir and is responsible for cleanup.
func (*Hub) HTTPAddr ¶
HTTPAddr returns the hub's HTTP listen address, suitable as the hubHTTPAddr argument to OpenLeaf.
func (*Hub) LeafUpstream ¶
LeafUpstream returns the URL remote agents pass as NATSUpstream to peer their meshes into the hub's mesh as leaf nodes. The hub mesh accepts these solicits on its leaf-node port (separate from the client port returned by NATSURL).
type Leaf ¶
type Leaf struct {
// contains filtered or unexported fields
}
Leaf is a per-slot wrapper around leaf.Agent + a *Coord for claim/task scheduling. Each Leaf owns a libfossil repo at workdir/<slotID>/leaf.fossil and a worktree at workdir/<slotID>/wt. Sync flows through the agent's NATS upstream + HTTP pull; claim/task records flow through Coord's NATS KV.
Leaf is a deep type: its public API (OpenLeaf, Stop, Tip, WT, plus the Claim/Commit/Close/Compact/PostMedia methods landed in Tasks 3-5+10) hides the agent's many config knobs.
Architectural invariant: there is exactly one *libfossil.Repo handle to the leaf.fossil file in this process — l.agent.Repo(). All write paths route through it. The substrate (l.coord.sub) does NOT carry its own fossil field.
func OpenLeaf ¶
func OpenLeaf(ctx context.Context, cfg LeafConfig) (*Leaf, error)
OpenLeaf starts a leaf at cfg.Workdir/<cfg.SlotID>/leaf.fossil that joins cfg.Hub's mesh as a leaf-node and uses cfg.Hub's NATS client URL for coord's claim/task KV traffic. Clones leaf.fossil from cfg.Hub's HTTP endpoint at open time.
The slot's worktree is at cfg.Workdir/<cfg.SlotID>/wt.
func (*Leaf) Claim ¶
Claim atomically acquires taskID for this leaf. The returned *Claim carries an idempotent release closure. Delegates to the underlying Coord; Phase 1 keeps the existing claim semantics intact.
func (*Leaf) Close ¶
Close marks the claimed task closed. Delegates to the underlying Coord.CloseTask; the *Claim's release closure is also called so any remaining file holds drop. After Close returns, the *Claim should not be reused.
func (*Leaf) Commit ¶
Commit writes files into the leaf's libfossil repo as a new checkin authored by the slot, then triggers a sync round (SyncNow).
The hold-gate (Invariant 20) and epoch-gate (Invariant 24) are enforced via the leaf's Coord: every File.Path must be held by this leaf at call time, and the *Claim's TaskID must still be active with the same claim_epoch the leaf last observed.
All writes route through l.agent.Repo() — there is no second *libfossil.Repo handle to leaf.fossil in this process. Per architectural invariant: one *libfossil.Repo per fossil file, owned by leaf.Agent.
On success, returns the manifest UUID of the new checkin.
func (*Leaf) Compact ¶
func (l *Leaf) Compact(ctx context.Context, opts CompactOptions) (CompactResult, error)
Compact summarizes eligible closed tasks and writes one artifact per task into the leaf's libfossil repo as a new checkin authored by the slot. Eligibility: status=closed, CompactLevel=0, ClosedAt older than opts.MinAge. The summary body is produced by opts.Summarizer and the artifact path is `compaction/tasks/<TaskID>/level-<N>.md`.
All writes route through l.agent.Repo() — the only *libfossil.Repo handle to leaf.fossil in this process. After each commit, SyncNow triggers a sync round so the artifact propagates to the hub.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), receiver non-nil, opts.Limit > 0, opts.Summarizer non-nil. Operator errors from substrate reads (tasks.List) and the per-task summarize/commit/update/archive paths surface wrapped.
func (*Leaf) Metadata ¶
Metadata returns the value associated with key in the harness-supplied metadata map (from LeafConfig.Metadata). Returns "" if the key is absent or if no metadata was provided.
func (*Leaf) OpenTask ¶
OpenTask is a thin shim onto the leaf's substrate Coord so harnesses and Phase 1 callers can open tasks without reaching into private fields. Phase 2 may relocate task lifecycle entirely onto Leaf.
func (*Leaf) PostMedia ¶
PostMedia stores opaque bytes in the leaf's libfossil repo and publishes an in-band media reference on the chat thread. Subscribers receive a MediaMessage event.
All writes route through l.agent.Repo() — the only *libfossil.Repo handle to leaf.fossil in this process. After the commit, SyncNow triggers a sync round so the artifact propagates to the hub before the chat reference is published; subscribers can then resolve the reference against any peer that has caught up.
Invariants asserted (panics on violation — programmer errors): 1 (ctx non-nil), receiver non-nil, thread/mimeType non-empty, data non-empty.
Operator errors returned: any error from the libfossil commit, the JSON envelope encode, or the chat substrate Send — surfaced wrapped with the coord.Leaf.PostMedia prefix.
type LeafConfig ¶
type LeafConfig struct {
// Hub is the hub this leaf peers against. Required.
Hub *Hub
// Workdir is the root directory for per-slot state. Required.
Workdir string
// SlotID is the unique identifier for this leaf slot. Required.
SlotID string
// ClaimTTL overrides Tuning.HoldTTLDefault for this leaf's claims.
// Zero means use the substrate's default (30s).
ClaimTTL time.Duration
// FossilUser overrides the fossil user set on commits and sync
// handshakes for this leaf. When empty, SlotID is used as the
// commit author and the clone is performed as unauthenticated
// "nobody" (required — SlotID isn't in the hub's user table so
// passing it during clone would fail authentication).
FossilUser string
// PollInterval overrides the leaf.Agent poll cadence (default 5s).
// Zero means use the agent default. Lower for tight-loop tests,
// higher for human-cadence work.
PollInterval time.Duration
// Metadata is opaque key=value pairs the harness wants to attach
// to the leaf for its own bookkeeping. Not used by coord; stored
// on *Leaf so harnesses can call l.Metadata("foo").
Metadata map[string]string
}
LeafConfig is the configuration passed to OpenLeaf. Hub is required and provides all three URL fields (LeafUpstream, NATSURL, HTTPAddr) so callers do not need to thread them individually.
type MediaMessage ¶
type MediaMessage struct {
// contains filtered or unexported fields
}
func (MediaMessage) From ¶
func (m MediaMessage) From() string
func (MediaMessage) MIMEType ¶
func (m MediaMessage) MIMEType() string
func (MediaMessage) Path ¶
func (m MediaMessage) Path() string
func (MediaMessage) Rev ¶
func (m MediaMessage) Rev() RevID
func (MediaMessage) Size ¶
func (m MediaMessage) Size() int
func (MediaMessage) Thread ¶
func (m MediaMessage) Thread() string
func (MediaMessage) Timestamp ¶
func (m MediaMessage) Timestamp() time.Time
type Presence ¶
type Presence struct {
// contains filtered or unexported fields
}
Presence is a read-only view of an agent's liveness entry as returned by coord.Who. Callers obtain Presences via coord.Who and inspect state via accessor methods; direct field access is not possible by design so future schema migrations can evolve the internal shape without breaking callers. Parallel to Task above.
func (Presence) LastSeen ¶
LastSeen returns the UTC wall-clock time of the agent's most recent heartbeat observed in the presence substrate.
type PresenceChange ¶
type PresenceChange struct {
// contains filtered or unexported fields
}
PresenceChange is the Event fired on coord.WatchPresence when an agent appears in or disappears from the presence substrate. Up is true when the agent became reachable (first heartbeat or return after an outage); false when the agent went away (clean shutdown deletes the entry; missed heartbeats expire it via KV TTL).
All fields are unexported to keep the Coord public surface migration-ready (parallel with ChatMessage above and Task/Presence in types.go).
func (PresenceChange) AgentID ¶
func (p PresenceChange) AgentID() string
AgentID returns the identifier of the agent whose presence changed.
func (PresenceChange) Project ¶
func (p PresenceChange) Project() string
Project returns the project segment of the agent.
func (PresenceChange) Timestamp ¶
func (p PresenceChange) Timestamp() time.Time
Timestamp returns the wall-clock time the watcher observed the change, NOT the original event time at the substrate. Observed-time semantics match notify's delivery model.
func (PresenceChange) Up ¶
func (p PresenceChange) Up() bool
Up reports the direction of the change: true for became-online, false for went-offline. A Down event with no prior Up on the same consumer's watch is possible — the watch started after the initial Up — and a consumer that cares about the full picture should seed state from coord.Who before WatchPresence.
type PrimeResult ¶
type PrimeResult struct {
OpenTasks []Task
ReadyTasks []Task
ClaimedTasks []Task
Threads []ChatThread
Peers []Presence
}
PrimeResult is a snapshot of the workspace state for agent context recovery. Returned by coord.Prime; consumed by the agent-tasks prime CLI and Claude Code SessionStart/PreCompact hooks.
type Reaction ¶
type Reaction struct {
// contains filtered or unexported fields
}
Reaction is the coord.Subscribe surface for a peer's reaction to another message. It carries the target messageID (whatever coord. React was called with, opaque substrate identifier) plus the caller- provided reaction string. Reactions are delivered on the same event channel as ChatMessage; consumers discriminate via the Event type switch per ADR 0008.
Reactions piggyback on the chat substrate — no separate KV bucket, no new NATS subject. The in-band encoding is a substrate detail and never appears on the public surface.
func (Reaction) Body ¶
Body returns the reaction payload as a string. Coord does not validate or normalize the content — emoji, text, or arbitrary bytes all pass through untouched.
func (Reaction) Target ¶
Target returns the MessageID of the chat message this reaction applies to. Opaque — the caller passed it into coord.React and received it back unchanged, consistent with ADR 0003's substrate- hiding rule for identifiers.
type RevID ¶
type RevID string
RevID is the opaque identifier of a committed revision in the code-artifact Fossil substrate per ADR 0010. Treated as opaque by coord callers: equality and display are supported, structural parsing is not. Under the hood it is Fossil's 40-character SHA-1 UUID.
type Summarizer ¶
type Summarizer interface {
Summarize(context.Context, CompactInput) (string, error)
}
Summarizer compresses an eligible closed task into a textual summary that becomes the body of the compaction artifact. The Summarizer is caller-supplied so the compaction policy stays orthogonal to the commit substrate.
type Task ¶
type Task struct {
// contains filtered or unexported fields
}
Task is a read-only view of a task record. Callers obtain Tasks from coord.Ready and inspect state via accessor methods; direct field access is not possible by design so future schema migrations can evolve the internal shape without breaking callers.
func (Task) ClaimedBy ¶
ClaimedBy returns the AgentID currently holding the claim, or the empty string when the task is unclaimed.
type TaskID ¶
type TaskID string
TaskID uniquely identifies a task within the substrate. ADR 0005 pins the shape to `<proj>-<8char lowercase alnum>`; generation lives in coord (see open_task.go).
type TuningConfig ¶
type TuningConfig struct {
// HoldTTLDefault is the default TTL callers pass to Claim when they
// do not supply a TTL of their own. Default: 30s.
HoldTTLDefault time.Duration
// HoldTTLMax is the upper bound on any Claim's TTL. Default: 5m.
HoldTTLMax time.Duration
// MaxHoldsPerClaim caps the file count on a single Claim call per
// invariant 4. Default: 16.
MaxHoldsPerClaim int
// MaxSubscribers caps the number of in-flight chat subscribers.
// Default: 8.
MaxSubscribers int
// MaxTaskFiles caps the number of files a single task may touch.
// Default: 16.
MaxTaskFiles int
// MaxReadyReturn caps the number of tasks coord.Ready returns in a
// single call. Default: 32.
MaxReadyReturn int
// MaxTaskValueSize is the upper bound on a task record's serialized
// JSON value, in bytes. Default: 16384.
MaxTaskValueSize int
// TaskHistoryDepth is the per-key JetStream KV history depth for the
// tasks bucket. Default: 8.
TaskHistoryDepth uint8
// HeartbeatInterval is the cadence at which coord refreshes its
// liveness signal. Default: 5s.
HeartbeatInterval time.Duration
// NATSReconnectWait is the delay between NATS reconnection attempts.
// Default: 100ms.
NATSReconnectWait time.Duration
// NATSMaxReconnects caps NATS reconnection attempts. Default: 10.
NATSMaxReconnects int
// Buggify is a test-only flag; zero in production.
Buggify int
}
TuningConfig holds substrate knobs that almost no real caller varies. Zero value in any field means "use sane defaults" — Open applies defaultTuning before Validate so callers only set what they need.