workerheal

package
v1.22.1
Warning

This package is not in the latest version of its module.

Published: May 27, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package workerheal detects and recovers worker units stuck in systemd's "failed" state. The detector is deliberately cheap: it walks the batched unit-state cache already shared with the dashboard, so polling adds no extra systemctl calls even on busy installs. The healer is a single primitive, reset-failed followed by start. It never writes .lerd.yaml or rewrites unit files; those changes belong to `lerd worker add/remove/start/stop` and `lerd init`.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func HealUnit

func HealUnit(unit string) error

HealUnit clears any failed state and starts the named worker unit. It is the single "fix this" primitive: every surface (CLI, UI, TUI, MCP) goes through it. Crucially, it does NOT touch .lerd.yaml or rewrite the unit file, because a failed worker is a transient runtime condition, not a change of user intent. The reset-failed step is implicit: on Linux, systemd.DBusStartUnit calls DBusResetFailed first; on macOS, launchd's bootstrap path replaces the job entirely.

func Summary

func Summary(r Result) string

Summary renders a one-line CLI-friendly summary of a Result.
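The exact wording Summary produces is not documented here, so the following is only a hypothetical rendering of a Result into one line. It uses trimmed copies of the documented UnhealthyWorker, Failure, and Result types (JSON tags omitted), and the `summary` function name and message format are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Trimmed copies of the documented types; JSON tags are omitted.
type UnhealthyWorker struct {
	Site, Worker, Unit, State, LastError string
}

type Failure struct {
	Worker UnhealthyWorker
	Err    string
}

type Result struct {
	Healed []UnhealthyWorker
	Failed []Failure
}

// summary is a hypothetical one-line rendering; the real Summary may differ.
func summary(r Result) string {
	if len(r.Healed) == 0 && len(r.Failed) == 0 {
		return "all workers healthy"
	}
	parts := []string{fmt.Sprintf("healed %d", len(r.Healed))}
	if n := len(r.Failed); n > 0 {
		names := make([]string, 0, n)
		for _, f := range r.Failed {
			names = append(names, f.Worker.Site+"/"+f.Worker.Worker)
		}
		parts = append(parts, fmt.Sprintf("failed %d (%s)", n, strings.Join(names, ", ")))
	}
	return strings.Join(parts, ", ")
}

func main() {
	r := Result{
		Healed: []UnhealthyWorker{{Site: "shop", Worker: "queue"}},
		Failed: []Failure{{Worker: UnhealthyWorker{Site: "blog", Worker: "horizon"}, Err: "exit status 1"}},
	}
	fmt.Println(summary(r)) // healed 1, failed 1 (blog/horizon)
}
```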

Types

type Event

type Event struct {
	Phase string `json:"phase"` // "starting" | "healed" | "failed" | "done"
	Site  string `json:"site,omitempty"`
	Unit  string `json:"unit,omitempty"`
	Error string `json:"error,omitempty"`
}

Event is one line in the streaming heal report. The dashboard, MCP tool, and TUI all consume these events, so progress is visible without polling.

type Failure

type Failure struct {
	Worker UnhealthyWorker `json:"worker"`
	Err    string          `json:"error"`
}

Failure is one heal attempt that errored.

type Result

type Result struct {
	Healed []UnhealthyWorker `json:"healed"`
	Failed []Failure         `json:"failed"`
}

Result is the aggregate report for non-streaming callers.

func HealAll

func HealAll(emit func(Event)) (Result, error)

HealAll detects every unhealthy worker and heals them in order. emit, when non-nil, receives one Event per phase transition so the dashboard's banner and the MCP tool can stream progress instead of blocking on a final summary.

type UnhealthyWorker

type UnhealthyWorker struct {
	Site      string `json:"site"`
	Worker    string `json:"worker"`
	Unit      string `json:"unit"`
	State     string `json:"state"` // "failed" today; reserved for future "start-limit-hit", "expected-but-stopped"
	LastError string `json:"last_error,omitempty"`
}

UnhealthyWorker is a single failing/stuck worker unit.

func Detect

func Detect() ([]UnhealthyWorker, error)

Detect returns every worker unit systemd considers "failed". Cheap by design: it reads only the existing batched unit-state cache (one systemctl call per 3s, shared with the dashboard's enrichment path) plus sites.yaml. No per-site .lerd.yaml or composer.json reads, no extra subprocess calls. Safe to invoke from a hot endpoint.

The heuristic is kept narrow on purpose: worker units that hit Restart= rate limits or crash repeatedly land in "failed" and stay there until something resets them. "Inactive" is too broad, because users routinely stop individual workers on purpose, and without an explicit per-worker desired-state field there is no way to tell intent apart from drift.

func Enrich

func Enrich(in []UnhealthyWorker) []UnhealthyWorker

Enrich populates LastError by reading the journal once per unit. It walks the slice in order until the per-call budget is exhausted; entries after that point keep an empty LastError. It is safe to call with a nil or empty slice. It is intended for the dashboard's pre-serialization step, where there are typically 0–3 entries, so the budget is rarely exercised.
