workerheal

package

Version: v1.22.1 | Published: May 27, 2026 | License: MIT | Imports: 7 | Imported by: 0

Repository

github.com/geodro/lerd

Documentation ¶

Overview ¶

Package workerheal detects and recovers worker units stuck in systemd's "failed" state. The detector is deliberately cheap: it walks the existing batched unit-state cache shared with the dashboard, so polling adds no extra systemctl calls even on busy installs. The healer is a single primitive: reset-failed followed by start. It never writes .lerd.yaml or rewrites unit files; that belongs to `lerd worker add/remove/start/stop` and `lerd init`.

Index ¶

func HealUnit(unit string) error
func Summary(r Result) string
type Event
type Failure
type Result
- func HealAll(emit func(Event)) (Result, error)
type UnhealthyWorker
- func Detect() ([]UnhealthyWorker, error)
- func Enrich(in []UnhealthyWorker) []UnhealthyWorker

Functions ¶

func HealUnit ¶

func HealUnit(unit string) error

HealUnit clears any failed state and starts the named worker unit. It is the single "fix this" primitive: every surface (CLI, UI, TUI, MCP) goes through it. Crucially, it does NOT touch .lerd.yaml or rewrite the unit file, because a failed worker is a transient runtime condition, not a change of user intent. The reset-failed step is implicit: on Linux, systemd.DBusStartUnit calls DBusResetFailed first; on macOS, launchd's bootstrap path replaces the job entirely.
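The reset-then-start ordering described above can be sketched as follows. This is only an illustration, not the package's implementation: the unitManager interface, the fakeManager type, the errStartRefused value, and the unit name are hypothetical, and the sketch makes the reset-failed step explicit even though the real code performs it implicitly inside the start call.

```go
package main

import (
	"errors"
	"fmt"
)

// unitManager is a hypothetical abstraction over the service manager
// (systemd or launchd). It is not part of the documented API.
type unitManager interface {
	ResetFailed(unit string) error
	Start(unit string) error
}

// healUnit sketches the documented primitive: clear the failed state,
// then start the unit. It reads and writes no config or unit files.
func healUnit(m unitManager, unit string) error {
	if unit == "" {
		return errors.New("empty unit name")
	}
	if err := m.ResetFailed(unit); err != nil {
		return fmt.Errorf("reset-failed %s: %w", unit, err)
	}
	if err := m.Start(unit); err != nil {
		return fmt.Errorf("start %s: %w", unit, err)
	}
	return nil
}

// errStartRefused is a made-up error used to demonstrate the failure path.
var errStartRefused = errors.New("start refused")

// fakeManager records calls so the ordering can be inspected.
type fakeManager struct {
	calls    []string
	startErr error
}

func (f *fakeManager) ResetFailed(unit string) error {
	f.calls = append(f.calls, "reset-failed "+unit)
	return nil
}

func (f *fakeManager) Start(unit string) error {
	f.calls = append(f.calls, "start "+unit)
	return f.startErr
}

func main() {
	ok := &fakeManager{}
	fmt.Println(healUnit(ok, "demo-worker.service"), ok.calls)

	bad := &fakeManager{startErr: errStartRefused}
	fmt.Println(healUnit(bad, "demo-worker.service"))
}
```

Injecting the manager keeps the sketch testable without a running service manager; a real caller would simply call HealUnit with the unit name.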

func Summary ¶

func Summary(r Result) string

Summary renders a one-line CLI-friendly summary of a Result.
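As a rough illustration of what a one-line summary of a Result might look like, consider the sketch below. The wording of the output is an assumption; the actual text Summary produces is not shown in this documentation. The types are local, trimmed mirrors of the documented ones, and summarize is a hypothetical stand-in.

```go
package main

import "fmt"

// Local mirrors of the documented types, trimmed to the fields used here.
type UnhealthyWorker struct {
	Site, Worker, Unit string
}

type Failure struct {
	Worker UnhealthyWorker
	Err    string
}

type Result struct {
	Healed []UnhealthyWorker
	Failed []Failure
}

// summarize is a hypothetical stand-in for Summary. The phrasing below is
// invented for illustration and is not the package's actual output.
func summarize(r Result) string {
	switch {
	case len(r.Healed) == 0 && len(r.Failed) == 0:
		return "no unhealthy workers"
	case len(r.Failed) == 0:
		return fmt.Sprintf("healed %d worker(s)", len(r.Healed))
	default:
		first := r.Failed[0]
		return fmt.Sprintf("healed %d worker(s), %d failed (first: %s: %s)",
			len(r.Healed), len(r.Failed), first.Worker.Unit, first.Err)
	}
}

func main() {
	fmt.Println(summarize(Result{}))
	fmt.Println(summarize(Result{
		Healed: []UnhealthyWorker{{Site: "shop", Worker: "queue", Unit: "a.service"}},
		Failed: []Failure{{Worker: UnhealthyWorker{Unit: "b.service"}, Err: "start refused"}},
	}))
}
```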

Types ¶

type Event ¶

type Event struct {
	Phase string `json:"phase"` // "starting" | "healed" | "failed" | "done"
	Site  string `json:"site,omitempty"`
	Unit  string `json:"unit,omitempty"`
	Error string `json:"error,omitempty"`
}

Event is one line in the streaming heal report. Dashboard, MCP, and TUI all consume these so progress is visible without polling.
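One way a consumer might stream these events is as newline-delimited JSON. The sketch below reuses the struct tags shown above; encodeLine and lineEmitter are hypothetical helpers, and the site and unit names are made up. Whether the dashboard or MCP tool actually uses this wire format is an assumption.

```go
package main

import (
	"encoding/json"
	"io"
	"os"
)

// Event mirrors the documented struct, including its JSON tags.
type Event struct {
	Phase string `json:"phase"`
	Site  string `json:"site,omitempty"`
	Unit  string `json:"unit,omitempty"`
	Error string `json:"error,omitempty"`
}

// encodeLine renders one Event as a single JSON line (hypothetical helper).
func encodeLine(ev Event) (string, error) {
	b, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

// lineEmitter returns a callback with the func(Event) shape that HealAll
// accepts. It writes each event to w as one line and drops encode errors.
func lineEmitter(w io.Writer) func(Event) {
	return func(ev Event) {
		line, err := encodeLine(ev)
		if err != nil {
			return
		}
		_, _ = io.WriteString(w, line)
	}
}

func main() {
	emit := lineEmitter(os.Stdout)
	emit(Event{Phase: "starting", Site: "shop", Unit: "demo-worker.service"})
	emit(Event{Phase: "healed", Site: "shop", Unit: "demo-worker.service"})
	emit(Event{Phase: "done"})
}
```

Because Site, Unit, and Error carry omitempty, a bare "done" event serializes to just its phase field.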

type Failure ¶

type Failure struct {
	Worker UnhealthyWorker `json:"worker"`
	Err    string          `json:"error"`
}

Failure is one heal attempt that errored.

type Result ¶

type Result struct {
	Healed []UnhealthyWorker `json:"healed"`
	Failed []Failure         `json:"failed"`
}

Result is the aggregate report for non-streaming callers.

func HealAll ¶

func HealAll(emit func(Event)) (Result, error)

HealAll detects every unhealthy worker and heals them in order. emit, when non-nil, receives one Event per phase transition so the dashboard's banner and the MCP tool can stream progress instead of blocking on a final summary.
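The detect-then-heal loop with streamed progress might look roughly like the sketch below. It is not the package's code: detect and heal are injected hypothetical stand-ins for Detect and HealUnit, the exact sequence of events is an assumption based on the phase names listed on Event, and recording a per-unit heal error in Result.Failed (rather than returning it) is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local, trimmed mirrors of the documented types.
type Event struct{ Phase, Site, Unit, Error string }

type UnhealthyWorker struct{ Site, Worker, Unit, State string }

type Failure struct {
	Worker UnhealthyWorker
	Err    string
}

type Result struct {
	Healed []UnhealthyWorker
	Failed []Failure
}

// errStartRefused is a made-up error for the demo.
var errStartRefused = errors.New("start refused")

// healAll is a hypothetical sketch of the documented flow.
func healAll(
	detect func() ([]UnhealthyWorker, error),
	heal func(unit string) error,
	emit func(Event),
) (Result, error) {
	if emit == nil {
		emit = func(Event) {} // a nil emit means the caller wants no streaming
	}
	var res Result
	workers, err := detect()
	if err != nil {
		return res, err
	}
	for _, w := range workers {
		emit(Event{Phase: "starting", Site: w.Site, Unit: w.Unit})
		if err := heal(w.Unit); err != nil {
			res.Failed = append(res.Failed, Failure{Worker: w, Err: err.Error()})
			emit(Event{Phase: "failed", Site: w.Site, Unit: w.Unit, Error: err.Error()})
			continue // keep going so one bad unit does not block the rest
		}
		res.Healed = append(res.Healed, w)
		emit(Event{Phase: "healed", Site: w.Site, Unit: w.Unit})
	}
	emit(Event{Phase: "done"})
	return res, nil
}

func main() {
	detect := func() ([]UnhealthyWorker, error) {
		return []UnhealthyWorker{
			{Site: "shop", Worker: "queue", Unit: "a.service", State: "failed"},
			{Site: "blog", Worker: "queue", Unit: "b.service", State: "failed"},
		}, nil
	}
	heal := func(unit string) error {
		if unit == "b.service" {
			return errStartRefused
		}
		return nil
	}
	res, err := healAll(detect, heal, func(ev Event) { fmt.Println(ev.Phase, ev.Unit) })
	fmt.Println(len(res.Healed), len(res.Failed), err)
}
```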

type UnhealthyWorker ¶

type UnhealthyWorker struct {
	Site      string `json:"site"`
	Worker    string `json:"worker"`
	Unit      string `json:"unit"`
	State     string `json:"state"` // "failed" today; reserved for future values such as "start-limit-hit" or "expected-but-stopped"
	LastError string `json:"last_error,omitempty"`
}

UnhealthyWorker is a single failing/stuck worker unit.

func Detect ¶

func Detect() ([]UnhealthyWorker, error)

Detect returns every worker unit systemd considers "failed". Cheap by design: it reads only the existing batched unit-state cache (one systemctl call per 3s, shared with the dashboard's enrichment path) plus sites.yaml. No per-site .lerd.yaml or composer.json reads, no extra subprocess calls. Safe to invoke from a hot endpoint.

The heuristic is kept narrow on purpose: worker units that hit Restart= rate limits or crash repeatedly land in "failed" and stay there until something resets them. "Inactive" is too broad, because users routinely stop individual workers on purpose, and intent cannot be told apart from drift without an explicit per-worker desired-state field.
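The narrow "failed only" filter can be illustrated with a small sketch. The shape of the unit-state cache (a map from unit name to state string) and the workerRef record standing in for data derived from sites.yaml are both assumptions made for the example; the package's real data structures are not shown in this documentation.

```go
package main

import "fmt"

// UnhealthyWorker is a trimmed local mirror of the documented type.
type UnhealthyWorker struct{ Site, Worker, Unit, State string }

// workerRef is a hypothetical record tying a unit name to its site and
// worker, as might be derived from sites.yaml.
type workerRef struct{ Site, Worker, Unit string }

// detectFailed keeps only known worker units whose cached state is exactly
// "failed". Units in "inactive" are skipped on purpose, since a user may
// have stopped them deliberately.
func detectFailed(states map[string]string, known []workerRef) []UnhealthyWorker {
	var out []UnhealthyWorker
	for _, ref := range known {
		if states[ref.Unit] == "failed" {
			out = append(out, UnhealthyWorker{
				Site:   ref.Site,
				Worker: ref.Worker,
				Unit:   ref.Unit,
				State:  "failed",
			})
		}
	}
	return out
}

func main() {
	states := map[string]string{
		"a.service": "active",
		"b.service": "failed",
		"c.service": "inactive",
	}
	known := []workerRef{
		{Site: "shop", Worker: "queue", Unit: "a.service"},
		{Site: "shop", Worker: "mail", Unit: "b.service"},
		{Site: "blog", Worker: "queue", Unit: "c.service"},
	}
	fmt.Println(detectFailed(states, known))
}
```

The function does only map lookups over data already in memory, which matches the stated goal of being safe to call from a hot endpoint.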

func Enrich ¶

func Enrich(in []UnhealthyWorker) []UnhealthyWorker

Enrich populates LastError on each entry by reading the journal once per unit. It walks the slice in order until the per-call budget is hit, leaving LastError empty on any remaining entries. It is safe to call with a nil or empty slice. It is intended for the dashboard's pre-serialization step, where there are typically 0–3 entries, so the budget is rarely exercised.
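A budgeted walk of this kind could be sketched as below. The documentation does not say what form the budget takes, so this example assumes a time budget. The injected clock, the readLast journal reader, the copy-instead-of-mutate behavior, and the demo values are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// UnhealthyWorker is a trimmed local mirror of the documented type.
type UnhealthyWorker struct{ Unit, LastError string }

// enrich fills LastError in slice order until the (assumed) time budget is
// spent. Entries not reached keep an empty LastError. A nil or empty input
// yields an empty result.
func enrich(
	in []UnhealthyWorker,
	budget time.Duration,
	now func() time.Time,
	readLast func(unit string) string,
) []UnhealthyWorker {
	out := make([]UnhealthyWorker, len(in))
	copy(out, in)
	deadline := now().Add(budget)
	for i := range out {
		if !now().Before(deadline) {
			break // budget exhausted; leave the rest untouched
		}
		out[i].LastError = readLast(out[i].Unit)
	}
	return out
}

// demo runs enrich against a fake clock where each journal read costs 2s
// and the budget is 3s, so the third entry is never read.
func demo() []UnhealthyWorker {
	clock := time.Unix(0, 0)
	now := func() time.Time { return clock }
	readLast := func(unit string) string {
		clock = clock.Add(2 * time.Second)
		return "last error for " + unit
	}
	in := []UnhealthyWorker{{Unit: "a.service"}, {Unit: "b.service"}, {Unit: "c.service"}}
	return enrich(in, 3*time.Second, now, readLast)
}

func main() {
	for _, w := range demo() {
		fmt.Printf("%s: %q\n", w.Unit, w.LastError)
	}
}
```

Checking the deadline before each read, rather than after, means a single slow read can overshoot the budget but never starts once the budget is already spent.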
