hub

package
v0.15.5
Published: May 10, 2026 License: Apache-2.0 Imports: 24 Imported by: 0

Documentation

Overview

Hub log entry contract per #322. hub.log is the operator-facing audit trail of "what did the hub do, when, and why" — distinct from the agent-facing task event log (#319) and the JSON CLI output envelope (#321). Storage is NDJSON; each LogEntry value marshals to one line on disk.

Package hub starts and stops the embedded Fossil hub repo and the embedded NATS JetStream server that together form the orchestrator substrate documented in ADR 0023.

The package replaces the previous bash hub-bootstrap.sh / hub-shutdown.sh scripts. Callers no longer need `fossil` or `nats-server` on PATH; the servers run in-process via libfossil and nats-server/v2/server.

Two entry points:

Start(ctx, root, opts...) — idempotent. Creates .bones/ if missing,
  seeds the hub from git-tracked files on first run, starts both servers,
  and writes pid files. With WithDetach(true) returns once both servers
  are accepting connections; otherwise blocks until ctx is canceled.
Stop(root) — sends SIGTERM to the pids written by Start and removes the
  pid files. Idempotent: missing or stale pid files are not an error.

Constants

const (
	// EventRPC is one state-mutating-or-read RPC handler invocation.
	// Carries rpc/agent/task/took_ms/result_count/err.
	EventRPC = "rpc"

	// EventHook is one Claude Code hook firing the hub observed.
	// Carries hook/session/matcher/result fields via Msg.
	EventHook = "hook"

	// EventLifecycle is a hub bring-up / shutdown / drain marker.
	// Carries Msg only.
	EventLifecycle = "lifecycle"

	// EventError is reserved for hub-side errors that don't fit the
	// rpc / lifecycle path. Most rpc errors ride the rpc kind with
	// an Err field; this constant exists so a hub-internal failure
	// (recovery loop crash, watcher stop) has a place to land.
	EventError = "error"
)

Event kind constants. The set is closed: adding an event kind requires a new constant and a comment explaining what the entry means.

Variables

var ErrActiveSlots = errors.New("hub: active swarm slots present")

ErrActiveSlots is returned by Stop when one or more swarm slots have live leaf processes that would be silently disconnected by tearing the hub down (#157). Use WithForce to override.

var ErrSeedPrecondition = errors.New(
	"no git-tracked files to seed hub fossil from; commit at least one " +
		"file (`git add . && git commit -m init`) before running " +
		"bones hub start")

ErrSeedPrecondition is returned by Start when the workspace has no git-tracked files for seedHubRepo to commit. Surfaced before the parent spawns the detached child so the user sees the real error (and an actionable next step) without waiting out the TCP-probe readyTimeout. See #138 item 9.

var ErrStaleClaudeWorktrees = errors.New(
	"bones: refusing to start: legacy .claude/worktrees/agent-*/ dirs " +
		"present (run `bones cleanup --all-worktrees` to migrate; ADR 0050)",
)

ErrStaleClaudeWorktrees is returned by checkStaleClaudeWorktrees when one or more legacy `.claude/worktrees/agent-*/` directories exist under the workspace root. ADR 0050 §"Migration: refuse-to-start on stale `.claude/worktrees/`": `bones hub start` refuses to proceed until those dirs are cleaned, because pre-ADR-0050 isolation no longer matches the synthetic slot machinery.

Mirrors swarm.ErrStaleClaudeWorktrees: the duplication exists because hub cannot import swarm (swarm depends on workspace which depends on hub — import cycle), and the migration check has to gate hub.Start before any side effect lands. swarm's package keeps the canonical definition for the bones-up call site; hub keeps its own copy for the bones-hub-start call site.

Functions

func FossilURL added in v0.4.1

func FossilURL(root string) string

FossilURL returns the hub Fossil HTTP URL recorded for the workspace at root, or "" if no hub is currently running there.

Consumers (`bones up`, `swarm join/commit/close`, `tasks status`, etc.) read this rather than hardcoding 127.0.0.1:8765 so two bones workspaces can run concurrently with port allocations of their own.

func HubFossilPath added in v0.6.0

func HubFossilPath(root string) string

HubFossilPath returns the on-disk path of the hub fossil for the given workspace root. Use this rather than building the path literally in cli/ so verbs survive future layout changes. Honors BONES_DIR (issue #291) when set.

func IsRunning added in v0.7.1

func IsRunning(root string) (int, bool)

IsRunning reports whether a hub for the workspace at root is currently running. Returns (pid, true) when hub.pid exists and names a live process; (0, false) otherwise.

Read-only. Used by cli/up to print accurate post-scaffold status without spawning anything (per ADR 0041 the hub is started lazily on first verb, not by `bones up`).

func NATSURL added in v0.4.1

func NATSURL(root string) string

NATSURL returns the hub NATS URL recorded for the workspace at root, or "" if no hub is running.

func Start

func Start(ctx context.Context, root string, options ...Option) (err error)

Start brings up the orchestrator hub: a Fossil repository at .bones/hub.fossil seeded from git-tracked files, a Fossil HTTP server on the chosen port, and an embedded NATS JetStream server.

Idempotent: if hub.pid exists and the recorded process is alive, Start returns nil immediately.

With WithDetach(true) the calling process fork-execs itself in "foreground" mode, waits for both servers to become reachable, and returns. The child outlives the caller and owns the servers; hub.pid references the child. This is what `bones hub start --detach` uses so a shell can fire-and-forget the hub.

Without detach, Start blocks on ctx.Done(): the calling process is the hub. hub.pid references the calling process. On cancellation, both servers shut down cleanly and hub.pid is removed.

func Stop

func Stop(root string, options ...StopOption) (err error)

Stop terminates the process recorded in hub.pid (written by Start) and removes the pid file. SIGTERM first; if the process is still alive after stopGrace, SIGKILL. The pid file is only removed once the process is confirmed dead so a follow-up Start cannot mistake an orphan-still-alive for "nothing to clean up" (#138).

A missing or stale hub.pid is not an error: Stop is idempotent so callers can shut down without first checking whether Start ran.

As a safety measure, Stop will not signal the calling process. If the recorded pid matches os.Getpid(), Stop only removes the pid file. The foreground Start has its own ctx-cancellation path; signaling self would terminate the caller before it could clean up.

Active-slot guard (#157): without WithForce, Stop refuses if any .bones/swarm/<slot>/leaf.pid points at a live process. That guard surfaces ErrActiveSlots wrapped with the slot names so the operator can close them first or pass --force.

URL files (.bones/hub-{fossil,nats}-url) are preserved across Stop so the next Start re-reads the previously-bound port via resolvePorts. When the port is still free, the new hub binds the same port and active leaves' cached NATS URLs keep working. Full teardown (bones down) clears the URL files separately by removing the entire .bones directory.
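The "re-bind the previous port when it is still free" behavior might look roughly like the following. This is only a sketch: `listenPreferred` and `portOf` are hypothetical, and falling back to an OS-assigned port is an assumption, since the text above does not say how resolvePorts picks a replacement.

```go
package main

import (
	"fmt"
	"net"
)

// listenPreferred tries the previously recorded port first and falls
// back to an OS-assigned port (assumed behavior) when it is taken.
func listenPreferred(prev int) (net.Listener, error) {
	if prev > 0 {
		addr := fmt.Sprintf("127.0.0.1:%d", prev)
		if ln, err := net.Listen("tcp", addr); err == nil {
			return ln, nil
		}
	}
	return net.Listen("tcp", "127.0.0.1:0")
}

// portOf extracts the bound TCP port from a listener.
func portOf(ln net.Listener) int {
	return ln.Addr().(*net.TCPAddr).Port
}

func main() {
	// First start: nothing recorded, so the OS picks a port.
	first, err := listenPreferred(0)
	if err != nil {
		panic(err)
	}
	recorded := portOf(first)
	first.Close() // simulate Stop; the recorded port outlives the listener

	// Second start: the recorded port is free again and gets reused.
	second, err := listenPreferred(recorded)
	if err != nil {
		panic(err)
	}
	defer second.Close()
	fmt.Println("same port reused:", portOf(second) == recorded)
}
```

Reusing the port is what keeps URLs cached by still-running leaves valid across a hub restart.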

Types

type LeaseWatcherInfo added in v0.11.0

type LeaseWatcherInfo struct {
	// WorkspaceDir is the bones workspace root (absolute path).
	// The watcher uses this to remove `.bones/swarm/<slot>/wt/`
	// when a lease expires.
	WorkspaceDir string

	// NATSURL is the URL of the in-process NATS server the hub
	// just started. The watcher dials this to read the swarm-
	// sessions bucket.
	NATSURL string

	// Logger is the hub.log writer the watcher emits to. Infof
	// fires on each reap; Warnf surfaces transient substrate
	// errors. The watcher MUST be silent when nothing is stale
	// (no log spam on the happy path).
	Logger LeaseWatcherLogger
}

LeaseWatcherInfo is the dependency packet handed to a lease-TTL watcher start function on hub bring-up. The hub package itself does not import swarm (cycle would arise via swarm→workspace→hub); the CLI layer wires the swarm-aware watcher implementation in via WithLeaseWatcher.

type LeaseWatcherLogger added in v0.11.0

type LeaseWatcherLogger interface {
	Infof(format string, args ...any)
	Warnf(format string, args ...any)
}

LeaseWatcherLogger is the contract the hub exposes to the lease-watcher hook. Mirrors the swarm package's WatcherLogger shape so the hookup is one struct conversion away in the CLI wiring.

type LeaseWatcherStartFunc added in v0.11.0

type LeaseWatcherStartFunc func(
	ctx context.Context, info LeaseWatcherInfo,
) (stop func(), err error)

LeaseWatcherStartFunc constructs and runs a lease-TTL watcher against info, returning a stop function the hub invokes at shutdown. The watcher MUST run in a goroutine; the hub does not block on it. Errors at startup time are surfaced via the returned error and degrade gracefully — the hub continues to serve.

type LogEntry added in v0.13.0

type LogEntry struct {
	// Ts is the wall-clock instant the entry was emitted. UTC RFC3339
	// per the Logged policy.
	Ts timefmt.LoggedTime `json:"ts"`

	// Level is the severity, wire-encoded as "DEBUG"/"INFO"/"WARN"/
	// "ERROR" via LogLevel.MarshalJSON.
	Level LogLevel `json:"level"`

	// Event names the entry kind: "rpc", "hook", "lifecycle", or
	// "error". Operators grep by event when narrowing an investigation.
	Event string `json:"event"`

	// RPC is the dotted RPC name (e.g. "tasks.create", "tasks.claim").
	// Empty for non-rpc events.
	RPC string `json:"rpc,omitempty"`

	// Agent is the inbound caller identity, or "system" for
	// hub-internal calls. Empty when not applicable (lifecycle).
	Agent string `json:"agent,omitempty"`

	// Task is the task ID this entry is scoped to, when applicable.
	Task string `json:"task,omitempty"`

	// Session is the Claude Code session ID (or other session token)
	// the entry is scoped to, when applicable. Set on hook firings.
	Session string `json:"session,omitempty"`

	// Hook names the Claude Code hook event for event="hook" entries
	// (e.g. "SessionStart", "PreCompact"). Empty otherwise.
	Hook string `json:"hook,omitempty"`

	// Matcher is the hook matcher value (e.g. "compact" for the
	// post-#320 SessionStart-with-matcher form). Empty otherwise.
	Matcher string `json:"matcher,omitempty"`

	// TookMs is the handler duration in milliseconds. Set on rpc
	// entries; zero (omitted) elsewhere.
	TookMs int64 `json:"took_ms,omitempty"`

	// ResultCount is the size of a list-shape RPC result, when
	// available. Zero (omitted) for non-list RPCs.
	ResultCount int `json:"result_count,omitempty"`

	// Err is the error message (if any) the handler returned. Empty
	// on success. Errors always log regardless of configured level
	// per #322's level policy.
	Err string `json:"err,omitempty"`

	// Msg is a free-form message used by lifecycle entries (the boot
	// banner, address lines, ready/stopping/stopped) and hook entries
	// (result summary). Other event kinds prefer typed fields and
	// leave this empty.
	Msg string `json:"msg,omitempty"`
}

LogEntry is the single shape every hub.log row carries. All fields are optional except Ts, Level, and Event, which are present on every row. The struct uses JSON struct tags with omitempty so the on-disk NDJSON row stays compact: a lifecycle line emits {ts, level, event, msg}; an rpc line emits {ts, level, event, rpc, agent, took_ms}; and so on.

Per #324 the Ts field is timefmt.LoggedTime so the marshal path emits UTC RFC3339 with a Z suffix — matching every other persisted timestamp in bones (event log, --json payloads, up.log).

Per #321 this struct does NOT use the {schema, data} envelope: it is internal log format, not CLI output. Operator tooling reads hub.log directly with `bones logs --hub` (which strips/renders) or jq.
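The effect of omitempty on the on-disk row can be seen in a small sketch. `logEntry` below is a trimmed local copy of the documented struct, with Ts and Level simplified to plain strings (the real fields are timefmt.LoggedTime and LogLevel); `encodeLine` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// logEntry is a trimmed, simplified copy of LogEntry for illustration.
type logEntry struct {
	Ts     string `json:"ts"`
	Level  string `json:"level"`
	Event  string `json:"event"`
	RPC    string `json:"rpc,omitempty"`
	Agent  string `json:"agent,omitempty"`
	TookMs int64  `json:"took_ms,omitempty"`
	Msg    string `json:"msg,omitempty"`
}

// encodeLine marshals one entry to a single NDJSON line.
func encodeLine(e logEntry) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

func main() {
	// UTC formatted with RFC3339 ends in "Z".
	ts := time.Date(2026, 5, 10, 12, 0, 0, 0, time.UTC).Format(time.RFC3339)

	entries := []logEntry{
		{Ts: ts, Level: "INFO", Event: "lifecycle", Msg: "ready"},
		{Ts: ts, Level: "INFO", Event: "rpc", RPC: "tasks.create", Agent: "system", TookMs: 3},
	}
	for _, e := range entries {
		line, err := encodeLine(e)
		if err != nil {
			panic(err)
		}
		fmt.Print(line)
	}
}
```

The lifecycle row carries only ts, level, event, and msg, while the rpc row adds rpc, agent, and took_ms and omits msg.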

type LogLevel added in v0.13.0

type LogLevel uint8

LogLevel names the four standard severities used by hub.log. Order matters: Debug < Info < Warn < Error. The level filter (--log-level / BONES_HUB_LOG_LEVEL) compares numeric rank rather than string identity so callers can express ">= INFO" without a switch statement.

const (
	LevelDebug LogLevel = iota
	LevelInfo
	LevelWarn
	LevelError
)

Numeric ranks let logger.shouldEmit compare configured filter level against the entry level via simple <= ordering. The four-level set is closed; --log-level=trace was rejected during scoping (only the standard four).
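A sketch of how a rank-based filter could work, assuming a floor-versus-entry comparison. The real logger.shouldEmit is unexported and its signature is not shown, so `shouldEmit` here and its `hasErr` parameter (modeling the "errors always log" policy from the Err field comment) are hypothetical.

```go
package main

import "fmt"

// Local copy of the documented type and constants.
type LogLevel uint8

const (
	LevelDebug LogLevel = iota
	LevelInfo
	LevelWarn
	LevelError
)

// shouldEmit is a hypothetical filter: an entry passes when the
// configured floor is at or below its level, or when it carries an error.
func shouldEmit(floor, entry LogLevel, hasErr bool) bool {
	return hasErr || floor <= entry
}

func main() {
	fmt.Println(shouldEmit(LevelInfo, LevelWarn, false))  // floor INFO, entry WARN
	fmt.Println(shouldEmit(LevelWarn, LevelInfo, false))  // floor WARN, entry INFO
	fmt.Println(shouldEmit(LevelError, LevelInfo, true))  // entry carries an error
}
```

Because the comparison is on numeric rank, the filter needs no per-level switch.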

func (LogLevel) MarshalJSON added in v0.13.0

func (l LogLevel) MarshalJSON() ([]byte, error)

MarshalJSON emits the wire-form string ("INFO" etc.) so the on-disk "level" field is operator-readable rather than a numeric rank.

func (LogLevel) String added in v0.13.0

func (l LogLevel) String() string

String returns the canonical wire form of the level: the uppercase short token written into the NDJSON "level" field. Operators grepping hub.log see "INFO" / "WARN" / "ERROR" / "DEBUG", not Go constant names. parseLevel performs the inverse mapping.

func (*LogLevel) UnmarshalJSON added in v0.13.0

func (l *LogLevel) UnmarshalJSON(data []byte) error

UnmarshalJSON accepts the wire-form string and decodes back to the numeric rank. Used by the round-trip test and by `bones logs --hub`.

type Option

type Option func(*opts)

Option configures Start.

func WithCoordPort added in v0.11.0

func WithCoordPort(p int) Option

WithCoordPort pins the coord client port. Zero means "let the hub allocate per-workspace" (default behavior).

func WithDetach

func WithDetach(d bool) Option

WithDetach controls Start's blocking behavior. When true, Start returns as soon as both readiness probes succeed; the servers keep running in the detached child process described under Start until that process exits or Stop is called. When false (the default), Start blocks on ctx.Done() and shuts both servers down cleanly when ctx is canceled.

func WithDrainTimeout added in v0.7.3

func WithDrainTimeout(d time.Duration) Option

WithDrainTimeout bounds how long runForeground waits for the embedded NATS server and the Fossil child to drain after ctx is canceled. Without a bound, a stuck leaf or fossil checkpoint can keep the hub process alive indefinitely (#158). On timeout the wait is abandoned, a stderr log line records the forced exit, and Start returns a non-nil error so the parent CLI exits non-zero. A zero or negative value falls back to defaultDrainTimeout.

func WithLeaseWatcher added in v0.11.0

func WithLeaseWatcher(start LeaseWatcherStartFunc) Option

WithLeaseWatcher installs the lease-TTL watcher hook. The hub invokes start(ctx, info) once it is ready (after NATS + Fossil are accepting connections, before the ctx-cancellation wait). The returned stop function runs at hub shutdown, before drain. The hook is optional — the hub package's invariants don't depend on it (the JetStream KV bucket TTL evicts stale records at the substrate level regardless), so a hub started without the option (e.g. the detached parent's spawnDetachedChild path) still works.

The CLI layer wires this in via cli/hub.go so the swarm import stays out of internal/hub and the hub→swarm→workspace→hub cycle never forms.

func WithLogLevel added in v0.13.0

func WithLogLevel(level string) Option

WithLogLevel pins hub.log's minimum entry level for this hub start. Per #322 the four standard severities are accepted: debug, info, warn, error (case-insensitive). Errors always log regardless of the floor. The empty string defers to BONES_HUB_LOG_LEVEL, which in turn defaults to INFO.

Flag wins over env var: when both --log-level and BONES_HUB_LOG_LEVEL are set, the flag value takes precedence.
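The precedence rule can be sketched as a small resolver. `resolveLevel` is a hypothetical helper and its error message is invented for the example; only the flag-over-env ordering, the INFO default, the four accepted names, and the BONES_HUB_LOG_LEVEL variable come from the text above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveLevel picks the effective level: the flag value wins, then
// the environment value, then the INFO default.
func resolveLevel(flag, env string) (string, error) {
	raw := flag
	if raw == "" {
		raw = env
	}
	if raw == "" {
		raw = "info"
	}
	// Matching is case-insensitive; the result is the wire-form token.
	switch lv := strings.ToUpper(strings.TrimSpace(raw)); lv {
	case "DEBUG", "INFO", "WARN", "ERROR":
		return lv, nil
	default:
		return "", fmt.Errorf("unknown log level %q", raw)
	}
}

func main() {
	env := os.Getenv("BONES_HUB_LOG_LEVEL")

	lv, err := resolveLevel("debug", env) // flag set: env is ignored
	fmt.Println(lv, err)

	lv, err = resolveLevel("", "warn") // flag empty: env decides
	fmt.Println(lv, err)
}
```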

func WithRepoPort added in v0.11.0

func WithRepoPort(p int) Option

WithRepoPort pins the repo HTTP port. Zero means "let the hub allocate per-workspace" (default behavior).

type StopOption added in v0.7.3

type StopOption func(*stopOpts)

StopOption configures Stop. Passed variadically so existing callers using Stop(root) continue to compile without change.

func WithForce added in v0.7.3

func WithForce(force bool) StopOption

WithForce overrides Stop's safety check that refuses a teardown while any swarm slot has a live leaf process (#157). With force=true, Stop proceeds regardless and the operator owns any active leaves that lose their cached NATS URL when the hub restarts on a different port. bones down passes WithForce(true) since it is an explicit destructive teardown that has already confirmed with the operator.
