Documentation
Overview ¶
Package hub starts and stops the embedded Fossil hub repo and the embedded NATS JetStream server that together form the orchestrator substrate documented in ADR 0023.
The package replaces the previous bash hub-bootstrap.sh / hub-shutdown.sh scripts. Callers no longer need `fossil` or `nats-server` on PATH; the servers run in-process via libfossil and nats-server/v2/server.
Two entry points:
- Start(ctx, root, opts...): idempotent. Creates .bones/ if missing, seeds the hub from git-tracked files on first run, starts both servers, and writes pid files. With WithDetach(true), it returns once both servers are accepting connections; otherwise it blocks until ctx is canceled.
- Stop(root): sends SIGTERM to the pids written by Start and removes the pid files. Idempotent: missing or stale pid files are not an error.
Index ¶
- Variables
- func FossilURL(root string) string
- func HubFossilPath(root string) string
- func IsRunning(root string) (int, bool)
- func NATSURL(root string) string
- func Start(ctx context.Context, root string, options ...Option) (err error)
- func Stop(root string, options ...StopOption) (err error)
- type LeaseWatcherInfo
- type LeaseWatcherLogger
- type LeaseWatcherStartFunc
- type Option
- type StopOption
Variables ¶
var ErrActiveSlots = errors.New("hub: active swarm slots present")
ErrActiveSlots is returned by Stop when one or more swarm slots have live leaf processes that would be silently disconnected by tearing the hub down (#157). Use WithForce to override.
var ErrSeedPrecondition = errors.New(
	"no git-tracked files to seed hub fossil from; commit at least one " +
		"file (`git add . && git commit -m init`) before running " +
		"bones hub start")
ErrSeedPrecondition is returned by Start when the workspace has no git-tracked files for seedHubRepo to commit. Surfaced before the parent spawns the detached child so the user sees the real error (and an actionable next step) without waiting out the TCP-probe readyTimeout. See #138 item 9.
var ErrStaleClaudeWorktrees = errors.New(
	"bones: refusing to start: legacy .claude/worktrees/agent-*/ dirs " +
		"present (run `bones cleanup --all-worktrees` to migrate; ADR 0050)",
)
ErrStaleClaudeWorktrees is returned by checkStaleClaudeWorktrees when one or more legacy `.claude/worktrees/agent-*/` directories exist under the workspace root. Per ADR 0050 §"Migration: refuse-to-start on stale `.claude/worktrees/`", `bones hub start` refuses to proceed until those directories are cleaned up, because pre-ADR-0050 isolation no longer matches the synthetic slot machinery.
Mirrors swarm.ErrStaleClaudeWorktrees: the duplication exists because hub cannot import swarm (swarm depends on workspace which depends on hub — import cycle), and the migration check has to gate hub.Start before any side effect lands. swarm's package keeps the canonical definition for the bones-up call site; hub keeps its own copy for the bones-hub-start call site.
Functions ¶
func FossilURL ¶ added in v0.4.1
func FossilURL(root string) string
FossilURL returns the hub Fossil HTTP URL recorded for the workspace at root, or "" if no hub is currently running there.
Consumers (`bones up`, `swarm join/commit/close`, `tasks status`, etc.) read this rather than hardcoding 127.0.0.1:8765 so two bones workspaces can run concurrently with port allocations of their own.
func HubFossilPath ¶ added in v0.6.0
func HubFossilPath(root string) string
HubFossilPath returns the on-disk path of the hub fossil for the given workspace root. Use this rather than building the path literally in cli/ so verbs survive future layout changes. Honors BONES_DIR (issue #291) when set.
func IsRunning ¶ added in v0.7.1
func IsRunning(root string) (int, bool)
IsRunning reports whether a hub for the workspace at root is currently running. Returns (pid, true) when hub.pid exists and names a live process; (0, false) otherwise.
Read-only. Used by cli/up to print accurate post-scaffold status without spawning anything (per ADR 0041 the hub is started lazily on first verb, not by `bones up`).
func NATSURL ¶ added in v0.4.1
func NATSURL(root string) string
NATSURL returns the hub NATS URL recorded for the workspace at root, or "" if no hub is running.
func Start ¶
func Start(ctx context.Context, root string, options ...Option) (err error)
Start brings up the orchestrator hub: a Fossil repository at .bones/hub.fossil seeded from git-tracked files, a Fossil HTTP server on the chosen port, and an embedded NATS JetStream server.
Idempotent: if hub.pid exists and the recorded process is alive, Start returns nil immediately.
With WithDetach(true) the calling process fork-execs itself in "foreground" mode, waits for both servers to become reachable, and returns. The child outlives the caller and owns the servers; hub.pid references the child. This is what `bones hub start --detach` uses so a shell can fire-and-forget the hub.
Without detach, Start blocks on ctx.Done(): the calling process is the hub. hub.pid references the calling process. On cancellation, both servers shut down cleanly and hub.pid is removed.
func Stop ¶
func Stop(root string, options ...StopOption) (err error)
Stop terminates the process recorded in hub.pid (written by Start) and removes the pid file. SIGTERM first; if the process is still alive after stopGrace, SIGKILL. The pid file is only removed once the process is confirmed dead so a follow-up Start cannot mistake an orphan-still-alive for "nothing to clean up" (#138).
A missing or stale hub.pid is not an error: Stop is idempotent so callers can shut down without first checking whether Start ran.
As a safety measure, Stop never signals the calling process. If the recorded pid matches os.Getpid(), Stop only removes the pid file. The foreground Start has its own ctx-cancellation path; signaling itself would terminate the caller before it could clean up.
Active-slot guard (#157): without WithForce, Stop refuses if any .bones/swarm/<slot>/leaf.pid points at a live process. That guard surfaces ErrActiveSlots wrapped with the slot names so the operator can close them first or pass --force.
URL files (.bones/hub-{fossil,nats}-url) are preserved across Stop so the next Start re-reads the previously-bound port via resolvePorts. When the port is still free, the new hub binds the same port and active leaves' cached NATS URLs keep working. Full teardown (bones down) clears the URL files separately by removing the entire .bones directory.
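The port-reuse behavior can be illustrated with a small "prefer the previous port, otherwise let the OS choose" helper. This sketch only mirrors the idea described above; listenPreferred and portOf are hypothetical names, and how resolvePorts actually parses the URL files and allocates ports is not documented here.

```go
package main

import (
	"fmt"
	"net"
)

// listenPreferred tries the previously recorded port first and falls back
// to an OS-assigned port when that one is taken (or when prev is zero).
func listenPreferred(prev int) (net.Listener, error) {
	if prev > 0 {
		ln, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", prev))
		if err == nil {
			return ln, nil
		}
	}
	return net.Listen("tcp", "127.0.0.1:0")
}

// portOf extracts the bound TCP port from a listener.
func portOf(ln net.Listener) int {
	return ln.Addr().(*net.TCPAddr).Port
}

func main() {
	// First run: no recorded port, so the OS picks one.
	first, err := listenPreferred(0)
	if err != nil {
		panic(err)
	}
	port := portOf(first)
	first.Close() // the hub stops, but the port stays on record

	// Restart: the recorded port is free again, so it is reused.
	second, err := listenPreferred(port)
	if err != nil {
		panic(err)
	}
	defer second.Close()
	fmt.Println("reused recorded port:", portOf(second) == port)

	// A competing bind while the port is held falls back to another port.
	third, err := listenPreferred(port)
	if err != nil {
		panic(err)
	}
	defer third.Close()
	fmt.Println("fell back to a new port:", portOf(third) != port)
}
```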
Types ¶
type LeaseWatcherInfo ¶ added in v0.11.0
type LeaseWatcherInfo struct {
	// WorkspaceDir is the bones workspace root (absolute path).
	// The watcher uses this to remove `.bones/swarm/<slot>/wt/`
	// when a lease expires.
	WorkspaceDir string

	// NATSURL is the URL of the in-process NATS server the hub
	// just started. The watcher dials this to read the swarm-
	// sessions bucket.
	NATSURL string

	// Logger is the hub.log writer the watcher emits to. Infof
	// fires on each reap; Warnf surfaces transient substrate
	// errors. The watcher MUST be silent when nothing is stale
	// (no log spam on the happy path).
	Logger LeaseWatcherLogger
}
LeaseWatcherInfo is the dependency packet handed to a lease-TTL watcher start function on hub bring-up. The hub package itself does not import swarm (cycle would arise via swarm→workspace→hub); the CLI layer wires the swarm-aware watcher implementation in via WithLeaseWatcher.
type LeaseWatcherLogger ¶ added in v0.11.0
type LeaseWatcherLogger interface {
	Infof(format string, args ...any)
	Warnf(format string, args ...any)
}
LeaseWatcherLogger is the contract the hub exposes to the lease-watcher hook. Mirrors the swarm package's WatcherLogger shape so the hookup is one struct conversion away in the CLI wiring.
type LeaseWatcherStartFunc ¶ added in v0.11.0
type LeaseWatcherStartFunc func(
	ctx context.Context,
	info LeaseWatcherInfo,
) (stop func(), err error)
LeaseWatcherStartFunc constructs and runs a lease-TTL watcher against info, returning a stop function the hub invokes at shutdown. The watcher MUST run in a goroutine; the hub does not block on it. Errors at startup time are surfaced via the returned error and degrade gracefully — the hub continues to serve.
type Option ¶
type Option func(*opts)
Option configures Start.
func WithCoordPort ¶ added in v0.11.0
WithCoordPort pins the coord client port. Zero means "let the hub allocate per-workspace" (default behavior).
func WithDetach ¶
WithDetach controls Start's blocking behavior. When true, Start returns as soon as both readiness probes succeed; as described under Start, the servers keep running in the detached child process until that process exits or Stop is called. When false (the default), Start blocks on ctx.Done() and shuts both servers down cleanly when ctx is canceled.
func WithDrainTimeout ¶ added in v0.7.3
WithDrainTimeout bounds how long runForeground waits for the embedded NATS server and the Fossil child to drain after ctx is canceled. Without a bound, a stuck leaf or fossil checkpoint can keep the hub process alive indefinitely (#158). On timeout the wait is abandoned, a stderr log line records the forced exit, and Start returns a non-nil error so the parent CLI exits non-zero. A zero or negative value falls back to defaultDrainTimeout.
func WithLeaseWatcher ¶ added in v0.11.0
func WithLeaseWatcher(start LeaseWatcherStartFunc) Option
WithLeaseWatcher installs the lease-TTL watcher hook. The hub invokes start(ctx, info) once it is ready (after NATS + Fossil are accepting connections, before the ctx-cancellation wait). The returned stop function runs at hub shutdown, before drain. The hook is optional — the hub package's invariants don't depend on it (the JetStream KV bucket TTL evicts stale records at the substrate level regardless), so a hub started without the option (e.g. the detached parent's spawnDetachedChild path) still works.
The CLI layer wires this in via cli/hub.go so the swarm import stays out of internal/hub and the hub→swarm→workspace→hub cycle never forms.
func WithRepoPort ¶ added in v0.11.0
WithRepoPort pins the repo HTTP port. Zero means "let the hub allocate per-workspace" (default behavior).
type StopOption ¶ added in v0.7.3
type StopOption func(*stopOpts)
StopOption configures Stop. Passed variadically so existing callers using Stop(root) continue to compile without change.
func WithForce ¶ added in v0.7.3
func WithForce(force bool) StopOption
WithForce overrides Stop's safety check that refuses a teardown while any swarm slot has a live leaf process (#157). With force=true, Stop proceeds regardless and the operator owns any active leaves that lose their cached NATS URL when the hub restarts on a different port. bones down passes WithForce(true) since it is an explicit destructive teardown that has already confirmed with the operator.