daemon

package
v0.17.1 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 12, 2026 License: MIT Imports: 20 Imported by: 0

README

pkg/daemon

Background sync daemon for provider transcripts: Claude Code transcript JSONL, Codex rollout JSONL, or an OpenCode session materialized from its local SQLite DB. One daemon runs per active Claude session, Codex root tree, or OpenCode root session.

Files

| File | Role |
| --- | --- |
| daemon.go | Daemon struct, Run loop, sync cycles, shutdown, inbox I/O, parent monitoring. Parent-PID liveness lives in a dedicated monitorParent goroutine that ticks at parentCheckInterval (5s; a var so tests can override it) and closes parentDeathCh on death; the main loop's select drains that and shuts down with reason "parent process exited". The goroutine runs under a context.WithCancel(ctx) with a deferred cancel, so it exits on every Run() return path, not just when the caller's ctx cancels. For OpenCode (d.providerName == provider.NameOpencode) it also starts/stops the root provider.OpenCodeCollector goroutine (backed by provider.OpenCodeDBReader) and derives the materialized transcript path. Holds the shared dbReader, childCollectorBase context, childCollectorCancel, and childCollectors map used by the CF-538 subagent sidechain logic in opencode_children.go. Carries configDir (from Config.ConfigDir, set by the SessionStart hook); binding() resolves it via provider.BindingFor and tryInit reads the backend via config.EnsureAuthenticatedFor, so a custom config dir syncs to its own backend (kata hpec). A missing binding surfaces as not-authenticated (retry; never falls back to the default backend). |
| opencode_children.go | CF-538 OpenCode subagent sidechain capture: opencodeChildCollector (per-descendant cancel/done handles), opencodeRegistrar (the provider.OpencodeDescendantRegistrar implementation injected via engine.SetDescendantRegistrar), startChildCollector (idempotent goroutine spawn under the daemon's childCollectorBase context), childCollectorDones (snapshot for shutdown to wait on), and waitForCollectors (single shared timeout for root + children). |
| state.go | State persistence (~/.confab/sync/{provider}/{id}.json, with legacy flat-path fallback), process liveness checks, listing. Path builders are thin wrappers over pkg/confabpath. (*State).DeleteWithInbox removes both the state file and the inbox file together; it is used by both shutdown and the reaper so the two-file cleanup stays consistent. |
| reaper.go | ReapStaleStates(): provider-agnostic sweep that removes state + inbox files whose PID is no longer alive. Files younger than reapMinAge (5s) are skipped to protect freshly-spawned daemons. Called as a goroutine from cmd/hook_sessionstart.go on every session-start so cleanup is opportunistic and invisible to the user (CF-549 F-up A). |

Lifecycle

spawn ──> waitForTranscript (poll 2s, timeout 60s)
              │
              ▼
         save state file
              │
              ▼
         sync loop ◄──────────────────┐
           │                          │
           ├── tryInit (lazy auth)    │
           ├── SyncAll (engine)       │
           ├── check parent alive     │
           └── sleep(30s ± 5s jitter)─┘
              │
              ▼ (stop signal / parent dead / context cancel)
         shutdown
           ├── read inbox events (SessionEnd payload)
           ├── final sync (with 30s timeout)
           ├── send session_end event
           ├── delete state file
           └── delete inbox file

Key Types

  • Config — Daemon configuration: external ID, transcript path, CWD, parent PID, sync interval/jitter
  • Daemon — Runtime state: engine, stop/done channels, consecutive error counter
  • State — Persisted to disk: external ID, paths, PIDs, start time, backend session ID

How to Extend

Adding daemon behavior during sync: Hook into the sync loop in Run(). New behavior should go after the tryInit() / engine.SyncAll() calls. Follow the existing error handling pattern — log errors, don't crash.

Adding a new inbox event type: Add the type string constant. writeInboxEvent() and readInboxEvents() are generic — they serialize/deserialize InboxEvent structs. Handle the new type in shutdown() where inbox events are processed.

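Adding new state fields: Add to the State struct in state.go. The state is JSON-serialized, so a new field tagged omitempty is backwards-compatible: state files written before the field existed still load (the field takes its zero value), and the field is left out of the file while it is unset.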

Invariants

  • State directory permissions are 0700. ~/.confab/sync/ is created with restrictive permissions since state files may contain session metadata.
  • Signal channel buffer is 2 to avoid dropping signals when both SIGINT and SIGTERM arrive in quick succession.
  • Shutdown goroutine has panic recovery to ensure state file cleanup even if shutdown logic panics.
  • State file must be deleted on exit. If a state file exists with a dead PID, it blocks future daemon spawns until cleanup. The panic recovery handler also deletes the state file.
  • Shutdown must have a timeout (shutdownTimeout, default 30s). The backend may be unresponsive, and the daemon must not hang forever.
  • Parent PID monitoring uses signal(0), not /proc. os.FindProcess + Signal(0) works on both macOS and Linux. /proc is Linux-only.
  • Daemon must be resilient to backend unavailability. Never crash on network errors. Log the error and retry on the next sync interval.
  • Inbox file must be cleaned up on shutdown. Stale inbox files don't cause bugs but are unnecessary clutter.
  • Stop() is idempotent (uses sync.Once). Multiple callers (signal handler, parent monitor, explicit stop) can all call Stop() safely.
  • Consecutive 404 detection. After 3 consecutive 404 errors (maxConsecutiveNotFound), the daemon shuts down — the session was deleted from the backend.
  • Auth recovery. On ErrUnauthorized, the engine is reset to force config re-read on the next cycle. This allows users to fix their API key without restarting the daemon.
  • Codex: one daemon per root tree, not per rollout. The hook handler walks every Codex SessionStart event up to its top-most root before spawning, so state files are keyed by root UUID. The running root daemon calls provider descendant discovery each sync cycle and uploads verified subagent rollouts as sidechain files. SessionStart events for already-running trees become no-ops.
  • OpenCode: collector materializes the data source. OpenCode has no transcript file, so when d.providerName == provider.NameOpencode the daemon derives ~/.confab/opencode/<id>/messages.jsonl (via openCodeMaterializedPath), points transcriptPath at it, and runs a provider.OpenCodeCollector goroutine. The collector reads OpenCode's local SQLite DB via provider.NewOpenCodeDBReader(provider.OpenCodeDBPath()); the path is resolved in order from CONFAB_OPENCODE_DB, then $XDG_DATA_HOME/opencode/opencode.db, then ~/.local/share/opencode/opencode.db. The collector polls at d.syncInterval, so the same CONFAB_SYNC_INTERVAL_MS knob tunes both the backend sync and the SQLite poll. The collector is started after the no-op waitForTranscript (the file does not exist yet), and backendSyncEnabled() gates Init/SyncAll on the file existing, so no empty backend session is created before the first complete message. Root-session subagents never reach here: Opencode.ShouldSpawnForInput refuses them at spawn time.
  • OpenCode subagent sidechain capture (CF-538, in opencode_children.go). Alongside the root collector, the daemon owns a childCollectors pool of per-descendant OpenCodeCollector goroutines. opencodeRegistrar wraps *sync.FileTracker, satisfies provider.OpencodeDescendantRegistrar, and is injected via engine.SetDescendantRegistrar inside tryInit (rebuilt fresh after auth-failure reset). Each SyncAll cycle the OpenCode provider's DiscoverDescendants calls RegisterOpencodeChild(childID, localPath); the registrar checks engine.OpencodeChildFilesAllowed() (the opencode_subagent_files capability flag, paired with CF-539), registers the child file (backend file_name = opencode/<child>/messages.jsonl, file_type = agent) via FileTracker.RegisterSidechainFile, and idempotently spawns a collector goroutine through startChildCollector. Children share the daemon's *OpenCodeDBReader instance and the childCollectorBase context (a child of the daemon's main ctx). shutdown() cancels the root + every child collector and waits for all done channels under a single 2s ceiling (waitForCollectors) before the final sync; a wedged collector logs Warn but cannot block shutdown indefinitely. Vanished children (deleted in OpenCode mid-session) keep their collectors running — the collector's 1-Warn-per-minute reconcile-error cadence surfaces the stuck state.
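The idempotent Stop() invariant above relies on sync.Once. A minimal sketch of that pattern follows; the stopper type, its fields, and the Stopped helper are hypothetical and only illustrate how several callers can race to stop without a double close.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a hypothetical stand-in for the part of Daemon that handles
// stop requests.
type stopper struct {
	stopOnce sync.Once
	stopCh   chan struct{}
}

func newStopper() *stopper {
	return &stopper{stopCh: make(chan struct{})}
}

// Stop closes stopCh exactly once, however many times it is called.
// Without sync.Once, a second close would panic.
func (s *stopper) Stop() {
	s.stopOnce.Do(func() { close(s.stopCh) })
}

// Stopped reports whether Stop has been called.
func (s *stopper) Stopped() bool {
	select {
	case <-s.stopCh:
		return true
	default:
		return false
	}
}

func main() {
	s := newStopper()

	// Simulate the signal handler, the parent monitor, and an explicit
	// stop all firing at about the same time.
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Stop()
		}()
	}
	wg.Wait()

	fmt.Println(s.Stopped()) // true
}
```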

Design Decisions

Lazy authentication. The daemon starts immediately when the provider launches a session, but the user may not have authenticated yet. tryInit() defers backend communication until the first sync cycle, and handles auth failures gracefully.

Jittered sync interval. The base interval is 30s with ±5s random jitter. This prevents thundering herd when multiple sessions start simultaneously. The jitter is applied per-cycle, not just at startup.

State files with PID-based liveness check. The state file stores the daemon PID. IsDaemonRunning() sends signal 0 to check if the process is still alive. This is more reliable than lock files (which can be orphaned) and simpler than IPC.

Panic recovery deletes state file. If the daemon panics, the recovery handler logs the panic and deletes the state file. This prevents a corrupt daemon from permanently blocking future spawns. A clean restart is preferred over trying to recover from unknown state.

Inbox file for IPC. The sync stop command needs to pass the SessionEnd hook payload to the running daemon. Rather than building an IPC mechanism (socket, pipe), the stop command appends the event to an inbox JSONL file, then sends SIGTERM. The daemon reads the inbox during shutdown. This is simple and reliable.

Testing

go test ./pkg/daemon/...
Unit tests (daemon_test.go, state_test.go)
  • Stop/cancel behavior, idempotency
  • Inbox write/read/cleanup
  • Auth recovery on 401
  • State CRUD, process checks, listing
Integration tests (integration_test.go)

Full lifecycle tests with mock HTTP backend. Key scenarios:

  • Sync cycle (init + upload)
  • Retry on backend errors
  • Agent discovery (including late-appearing agents)
  • Incremental sync (only new lines)
  • Multiple sync cycles with appended content
  • Late-appearing transcript file
  • Shutdown with final sync
  • Concurrent startup protection
  • File truncation resilience
  • Large files and chunk size limits

Override shutdownTimeout (package var) in tests for fast execution. Use CONFAB_CLAUDE_DIR / CONFAB_CODEX_DIR to isolate test directories per provider.

Dependencies

Uses: pkg/sync, pkg/config, pkg/confabpath, pkg/http, pkg/types, pkg/logger

Used by: cmd/ (spawn, sync start/stop, status)

Documentation

Constants

const (
	// DefaultSyncInterval is the base interval for syncing files
	DefaultSyncInterval = 30 * time.Second
)

Variables

This section is empty.

Functions

func GetInboxPathForProvider added in v0.16.0

func GetInboxPathForProvider(provider, externalID string) (string, error)

GetInboxPathForProvider returns the namespaced inbox file path.

func GetStatePathForProvider added in v0.16.0

func GetStatePathForProvider(provider, externalID string) (string, error)

GetStatePathForProvider returns the namespaced state file path.

func GetSyncDir

func GetSyncDir() (string, error)

GetSyncDir returns the path to the sync state directory

func ReapStaleStates added in v0.17.0

func ReapStaleStates() (reaped int, err error)

ReapStaleStates walks every provider subdirectory under ~/.confab/sync and removes state + inbox files whose daemon PID is no longer alive. Provider-agnostic: the signal-0 liveness check is OS-level, not provider-specific, so one pass covers Claude / Codex / OpenCode.

Returns the number of state files removed and the first error from the directory walk; per-file removal errors are logged at debug and skipped. Intended to be called in a goroutine from session-start handlers so the cleanup is invisible to the user (CF-549 F-up A).

func StopDaemon

func StopDaemon(externalID string, hookInput *types.ClaudeHookInput) error

StopDaemon sends SIGTERM to a running daemon by external ID. If hookInput is provided, it writes a session_end event to the daemon's inbox before signaling, so the daemon can access the full SessionEnd payload.

func StopDaemonForProvider added in v0.16.0

func StopDaemonForProvider(providerName, externalID string, hookInput *types.ClaudeHookInput) error

StopDaemonForProvider sends SIGTERM to a running daemon by provider and external ID.

Types

type Config

type Config struct {
	Provider           string
	ExternalID         string
	TranscriptPath     string
	CWD                string
	ConfigDir          string // canonical provider config dir; "" = default binding (kata hpec)
	ParentPID          int    // Claude Code process ID to monitor (0 to disable)
	SyncInterval       time.Duration
	SyncIntervalJitter time.Duration // 0 to disable jitter (for testing)
}

Config holds daemon configuration

type Daemon

type Daemon struct {
	// contains filtered or unexported fields
}

Daemon is the background sync process.

The daemon is resilient to backend unavailability - it will keep running and retry connecting to the backend on each sync interval. Once connected, it will sync any accumulated changes.

If ParentPID is set, the daemon monitors the parent process and shuts down gracefully when it exits. This handles cases where Claude Code crashes or is killed without firing the SessionEnd hook.

For OpenCode, transcriptPath starts empty and is set lazily to the materialized file path once the SQLite-backed collector goroutine starts. The collector reads from OpenCode's local SQLite DB; the daemon does not hold a per-session server URL.

func New

func New(cfg Config) *Daemon

New creates a new daemon instance

func (*Daemon) Run

func (d *Daemon) Run(ctx context.Context) error

Run starts the daemon and blocks until stopped

func (*Daemon) Stop

func (d *Daemon) Stop()

Stop signals the daemon to stop. Safe to call multiple times.

type State

type State struct {
	Provider        string    `json:"provider,omitempty"`
	ExternalID      string    `json:"external_id"`
	TranscriptPath  string    `json:"transcript_path"`
	CWD             string    `json:"cwd"`
	PID             int       `json:"pid"`
	ParentPID       int       `json:"parent_pid,omitempty"` // Claude Code process ID
	InboxPath       string    `json:"inbox_path"`           // Path to event inbox (JSONL)
	StartedAt       time.Time `json:"started_at"`
	ConfabSessionID string    `json:"confab_session_id,omitempty"` // Backend session ID (set after Init)
}

State represents the daemon's persistent state

func ListAllStates

func ListAllStates() ([]*State, error)

ListAllStates returns all active sync states

func LoadStateForProvider added in v0.16.0

func LoadStateForProvider(provider, externalID string) (*State, error)

LoadStateForProvider reads a provider-namespaced state file. Claude Code falls back to the legacy flat path so old daemons and existing hooks keep working.

func NewStateForProvider added in v0.16.0

func NewStateForProvider(provider, externalID, transcriptPath, cwd string, parentPID int) *State

NewStateForProvider creates a daemon state under a provider namespace.

func (*State) Delete

func (s *State) Delete() error

Delete removes the state file from disk

func (*State) DeleteWithInbox added in v0.17.0

func (s *State) DeleteWithInbox() error

DeleteWithInbox removes both the state file and the per-state inbox file. Best-effort: both deletes are attempted even if one fails, and the first non-nil error is returned so the caller can log it. Idempotent: missing files are not errors.

Used by daemon shutdown and the reaper (CF-549 F-up A) so the two-file cleanup is consistent and a single failure can't strand the other file.

func (*State) IsDaemonRunning

func (s *State) IsDaemonRunning() bool

IsDaemonRunning checks if the daemon process is still alive

func (*State) IsParentRunning

func (s *State) IsParentRunning() bool

IsParentRunning checks if the parent Claude Code process is still alive

func (*State) Save

func (s *State) Save() error

Save writes the state to disk
