Documentation ¶
Overview ¶
Package lifecycle defines the shared vocabulary used by long-running toolsets (MCP servers, remote MCP, LSP servers): typed error sentinels, a State enum, a Tracker, and a Supervisor that drives a Connector through connect / watch / restart / stop.
Index ¶
- Variables
- func Classify(err error) error
- func IsPermanent(err error) bool
- func IsTransient(err error) bool
- type Backoff
- type Connector
- type Policy
- type Restart
- type Session
- type State
- type StateInfo
- type Supervisor
- func (s *Supervisor) IsReady() bool
- func (s *Supervisor) MarkReadyForTesting()
- func (s *Supervisor) RestartAndWait(ctx context.Context, timeout time.Duration) error
- func (s *Supervisor) Restarted() <-chan struct{}
- func (s *Supervisor) Start(ctx context.Context) error
- func (s *Supervisor) State() StateInfo
- func (s *Supervisor) Stop(ctx context.Context) error
- type Tracker
Constants ¶
This section is empty.
Variables ¶
var (
    // ErrTransport is a transport-level failure (connection lost or never
    // established). Usually restartable.
    ErrTransport = errors.New("transport failure")

    // ErrServerUnavailable means the server could not be started or reached
    // (binary missing, immediate EOF on stdin, connection refused).
    // Restartable on a slower cadence.
    ErrServerUnavailable = errors.New("server unavailable")

    // ErrServerCrashed means the process started but exited unexpectedly.
    // Restartable per policy.
    ErrServerCrashed = errors.New("server crashed")

    // ErrInitTimeout means the initialize handshake did not complete
    // within the configured deadline.
    ErrInitTimeout = errors.New("initialize timed out")

    // ErrInitNotification means the server accepted initialize but the
    // client failed to send the followup "initialized" notification.
    // A retryable, transient failure documented upstream.
    ErrInitNotification = errors.New("failed to send initialized notification")

    // ErrCapabilityMissing means the server doesn't advertise a required
    // capability. Restarting won't help; supervisor should not retry.
    ErrCapabilityMissing = errors.New("capability not supported")

    // ErrAuthRequired means OAuth (or similar) is required. Supervisor
    // should park, not loop; resumption happens after the user authenticates.
    ErrAuthRequired = errors.New("authentication required")

    // ErrSessionMissing means the server lost the client's session
    // (e.g. a remote MCP server restarted). Force a reconnect.
    ErrSessionMissing = errors.New("session missing")

    // ErrNotStarted means an operation was attempted on a toolset that has
    // not yet successfully started.
    ErrNotStarted = errors.New("toolset not started")
)
Sentinel errors used to classify failures across MCP and LSP transports.
Concrete transports wrap their underlying SDK errors with these (via Classify) so supervisors can decide policy via errors.Is rather than substring matching. New error categories should be added here rather than as ad-hoc strings.
Functions ¶
func Classify ¶
Classify maps a transport-level error (stdio MCP, remote MCP, LSP) to one of the typed sentinels in this package, wrapping it so errors.Is matches both the sentinel and the original error.
Already-classified errors (any error for which errors.Is already matches one of the sentinels) are returned unchanged. Unknown errors are returned as-is so callers can decide their own policy.
Substring matching is used as a last resort because some upstream SDKs wrap their errors with %v (which drops the chain).
func IsPermanent ¶
IsPermanent reports whether err wraps a sentinel that must NOT be retried (currently ErrCapabilityMissing and ErrAuthRequired).
func IsTransient ¶
IsTransient reports whether err wraps a sentinel that warrants a retry.
Types ¶
type Backoff ¶
type Backoff struct {
Initial time.Duration // first wait (default 1s)
Max time.Duration // cap (default 32s)
Multiplier float64 // (default 2.0)
Jitter float64 // 0..1 fraction; 0 disables (default)
}
Backoff parameters for restart attempts. Zero values default to 1s..32s exponential (matching historical MCP behaviour).
type Connector ¶
type Connector interface {
// Connect establishes a new underlying connection (e.g. spawns a
// process, dials HTTP, runs the initialize handshake). The returned
// Session is owned by the supervisor; the supervisor calls Close on
// it. Errors should be classified via Classify so the supervisor can
// apply policy via errors.Is.
Connect(ctx context.Context) (Session, error)
}
Connector creates new sessions for a Supervisor. Implementations are transport-specific: stdio MCP, remote MCP, LSP stdio.
type Policy ¶
type Policy struct {
Restart Restart // see Restart constants; default RestartOnFailure
MaxAttempts int // 0 = default (5); negative = unlimited
Backoff Backoff // zero fields use Backoff defaults
// OnDisconnect is called when the session ends, with Wait()'s result.
// Useful for cache invalidation.
OnDisconnect func(err error)
// OnRestart is called after each successful reconnect. Useful for
// re-fetching server-side state (tools, prompts).
OnRestart func()
// OnFailed is called once when the supervisor enters StateFailed.
OnFailed func(err error)
// Logger is used for lifecycle logs. Defaults to slog.Default().
Logger *slog.Logger
}
Policy controls how a Supervisor manages a connection over time. The zero value gives the historical mcp.Toolset behaviour: RestartOnFailure, 5 attempts, 1s..32s backoff, no jitter, no callbacks.
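The MaxAttempts convention (0 means the default of 5, negative means unlimited) can be read as the following budget check. The attemptsLeft helper and the strict less-than comparison are assumptions made for illustration; the package may count attempts differently.

```go
package main

import "fmt"

// defaultMaxAttempts is the documented default used when MaxAttempts is 0.
const defaultMaxAttempts = 5

// attemptsLeft reports whether another restart attempt fits the budget,
// given the configured MaxAttempts and the number of attempts already used.
func attemptsLeft(maxAttempts, used int) bool {
	switch {
	case maxAttempts < 0:
		return true // negative: unlimited
	case maxAttempts == 0:
		maxAttempts = defaultMaxAttempts
	}
	return used < maxAttempts
}

func main() {
	fmt.Println(attemptsLeft(0, 4))     // true: default budget of 5 not yet used up
	fmt.Println(attemptsLeft(0, 5))     // false: default budget exhausted
	fmt.Println(attemptsLeft(-1, 1000)) // true: unlimited
	fmt.Println(attemptsLeft(2, 2))     // false: explicit budget exhausted
}
```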
type Restart ¶
type Restart int
Restart controls how the Supervisor reacts to an unexpected disconnect.
const (
    // RestartOnFailure reconnects after a non-nil Wait result or a forced
    // reconnect via RestartAndWait. Default; matches historical mcp.Toolset.
    RestartOnFailure Restart = iota

    // RestartNever transitions to Failed when the session ends.
    RestartNever

    // RestartAlways reconnects even after a clean (nil) Wait result.
    RestartAlways
)
type Session ¶
Session is the supervisor's view of an active connection. Wait blocks until the session ends; Close terminates it. Close must be idempotent and safe to call concurrently with an in-flight Wait.
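One way to satisfy the idempotency and concurrency requirements is to guard Close with sync.Once and let Wait block on a channel. The interface below is reconstructed from the description only: the exact method signatures (Wait() error, Close() error) are assumptions, and fakeSession is a hypothetical test double that ends with a nil Wait result.

```go
package main

import (
	"fmt"
	"sync"
)

// Session is reconstructed from the prose above; signatures are assumptions.
type Session interface {
	Wait() error
	Close() error
}

// fakeSession is a hypothetical in-memory session.
type fakeSession struct {
	once sync.Once
	done chan struct{}
}

func newFakeSession() *fakeSession {
	return &fakeSession{done: make(chan struct{})}
}

// Wait blocks until the session has been closed.
func (s *fakeSession) Wait() error {
	<-s.done
	return nil
}

// Close ends the session. sync.Once makes repeated and concurrent calls safe.
func (s *fakeSession) Close() error {
	s.once.Do(func() { close(s.done) })
	return nil
}

func main() {
	var s Session = newFakeSession()

	result := make(chan error, 1)
	go func() { result <- s.Wait() }() // Wait is in flight while Close runs

	_ = s.Close()
	_ = s.Close() // second call is a no-op rather than a double-close panic

	fmt.Println("wait returned:", <-result)
}
```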
type State ¶
type State int32
State is the high-level lifecycle state of a toolset, surfaced in logs, the TUI, and OTel attributes.
State machine:
Stopped ──Start()──▶ Starting ──ok──▶ Ready
   ▲                    │ err           │ Wait()/Close()
   │                    ▼               ▼
   └─────── Stop() ── Failed ◀──── Restarting ──ok──▶ Ready
                        ▲               │
                        └── budget ─────┘
Degraded is a transient state used when a Ready toolset starts failing health checks but has not yet been demoted by the supervisor.
const (
    // StateStopped is the initial state and the post-Stop state.
    StateStopped State = iota

    // StateStarting is set during the first connect/initialize handshake.
    StateStarting

    // StateReady means the toolset is connected, initialized, and serving.
    StateReady

    // StateDegraded means usable but the last health check or call failed.
    StateDegraded

    // StateRestarting means the supervisor is reconnecting after a failure.
    StateRestarting

    // StateFailed means the supervisor has given up restarting.
    StateFailed
)
func (State) IsTerminal ¶
IsTerminal reports whether s requires external action (Start/Restart/Stop) to leave.
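The RestartAndWait documentation names Stopped and Failed as the terminal states, so the check plausibly reduces to the sketch below. The enum is re-declared locally and the names slice exists only for printing; neither is taken from the package source.

```go
package main

import "fmt"

// State mirrors the documented enum.
type State int32

const (
	StateStopped State = iota
	StateStarting
	StateReady
	StateDegraded
	StateRestarting
	StateFailed
)

// names is a hypothetical lookup table used only for the output below.
var names = []string{"Stopped", "Starting", "Ready", "Degraded", "Restarting", "Failed"}

// IsTerminal reports whether s needs external action to leave.
func (s State) IsTerminal() bool {
	return s == StateStopped || s == StateFailed
}

func main() {
	for s := StateStopped; s <= StateFailed; s++ {
		fmt.Printf("%-10s terminal=%v\n", names[s], s.IsTerminal())
	}
}
```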
type Supervisor ¶
type Supervisor struct {
// contains filtered or unexported fields
}
Supervisor manages the lifecycle of a single connection: initial connect, watcher goroutine, restart with backoff, graceful Stop. It is the shared implementation for MCP (stdio + remote) and LSP transports; per-transport behaviour is captured in the Connector.
Supervisor is safe for concurrent use.
func New ¶
func New(name string, connector Connector, policy Policy) *Supervisor
New returns a Supervisor that drives connector with policy. The name is used in lifecycle log messages and should uniquely identify the toolset.
func (*Supervisor) IsReady ¶
func (s *Supervisor) IsReady() bool
IsReady reports whether the supervisor is in a state that should serve requests (Ready or Degraded).
func (*Supervisor) MarkReadyForTesting ¶
func (s *Supervisor) MarkReadyForTesting()
MarkReadyForTesting forces the supervisor into StateReady without going through Connect. Test-only backdoor; production code must not call this.
func (*Supervisor) RestartAndWait ¶
RestartAndWait closes the current session (if any) so the watcher reconnects, then blocks until the next successful reconnect, ctx cancellation, supervisor shutdown (Stop or Failed), or timeout.
RestartAndWait does NOT recover from a terminal state (Stopped/Failed): callers that want "restart even if Failed" should consult State() and call Start when terminal. The Toolset.Restart wrappers in pkg/tools/mcp and pkg/tools/builtin do exactly that.
func (*Supervisor) Restarted ¶
func (s *Supervisor) Restarted() <-chan struct{}
Restarted returns a channel closed the next time the supervisor completes a successful restart. The channel is replaced after each restart, so callers should re-read it on each new wait.
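The close-and-replace behaviour is a common Go broadcast idiom. The sketch below shows the idiom in isolation and why a caller must re-read the channel after each restart; restartSignal, notify, and isClosed are hypothetical names, not part of the package.

```go
package main

import (
	"fmt"
	"sync"
)

// restartSignal is a hypothetical stand-in for the supervisor's internals.
type restartSignal struct {
	mu sync.Mutex
	ch chan struct{}
}

func newRestartSignal() *restartSignal {
	return &restartSignal{ch: make(chan struct{})}
}

// Restarted returns the channel that will be closed on the next restart.
func (r *restartSignal) Restarted() <-chan struct{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.ch
}

// notify wakes every current waiter and installs a fresh channel.
func (r *restartSignal) notify() {
	r.mu.Lock()
	defer r.mu.Unlock()
	close(r.ch)
	r.ch = make(chan struct{})
}

// isClosed reports whether ch has been closed, without blocking.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	sig := newRestartSignal()

	first := sig.Restarted()
	sig.notify()

	fmt.Println(isClosed(first))           // true: the old channel fired
	fmt.Println(isClosed(sig.Restarted())) // false: the new channel awaits the next restart
}
```

A waiter that held on to the first channel would see every later wait return immediately, which is why the channel has to be fetched again before each wait.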
func (*Supervisor) Start ¶
func (s *Supervisor) Start(ctx context.Context) error
Start performs the initial connect. On Connector error the supervisor stays in StateStopped and the caller is expected to retry. On success the watcher goroutine is launched (if not already alive) and state moves to Ready. Concurrent Start calls serialize.
func (*Supervisor) State ¶
func (s *Supervisor) State() StateInfo
State returns a snapshot of the supervisor's current state.
type Tracker ¶
type Tracker struct {
// contains filtered or unexported fields
}
Tracker is a small thread-safe state holder shared by all transports. It records the current state, transition time, last error, and a restart counter. Transition validity is the supervisor's job.
The zero value is valid and starts in StateStopped.
func NewTracker ¶
func NewTracker() *Tracker
NewTracker returns a Tracker initialised in StateStopped.
func (*Tracker) Fail ¶
Fail transitions to s and records err as the last error. Use this when entering Failed/Restarting after a failure so the supervisor's snapshot surfaces a useful message.
func (*Tracker) IncRestarts ¶
IncRestarts increments the restart counter and returns the new value.
func (*Tracker) LastError ¶
LastError returns the most recent error recorded by Fail, or nil if a clean Set has happened since.
func (*Tracker) ResetRestarts ¶
func (t *Tracker) ResetRestarts()
ResetRestarts zeroes the restart counter. Called after a sustained Ready period to forget transient incidents.
func (*Tracker) Set ¶
Set transitions to s, recording the transition time and clearing the last error. Same-state Set is a no-op (preserves Since/LastError).