lifecycle

package
v1.54.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 28, 2026 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package lifecycle defines the shared vocabulary used by long-running toolsets (MCP servers, remote MCP, LSP servers): typed error sentinels, a State enum, a Tracker, and a Supervisor that drives a Connector through connect / watch / restart / stop.

Variables

var (
	// ErrTransport is a transport-level failure (connection lost or never
	// established). Usually restartable.
	ErrTransport = errors.New("transport failure")

	// ErrServerUnavailable means the server could not be reached at all
	// (binary missing, immediate EOF on stdin, connection refused).
	// Restartable on a slower cadence.
	ErrServerUnavailable = errors.New("server unavailable")

	// ErrServerCrashed means the process started but exited unexpectedly.
	// Restartable per policy.
	ErrServerCrashed = errors.New("server crashed")

	// ErrInitTimeout means the initialize handshake did not complete
	// within the configured deadline.
	ErrInitTimeout = errors.New("initialize timed out")

	// ErrInitNotification means the server accepted initialize but the
	// client failed to send the follow-up "initialized" notification.
	// This is a retryable transient failure, documented upstream.
	ErrInitNotification = errors.New("failed to send initialized notification")

	// ErrCapabilityMissing means the server doesn't advertise a required
	// capability. Restarting won't help; supervisor should not retry.
	ErrCapabilityMissing = errors.New("capability not supported")

	// ErrAuthRequired means OAuth (or similar) is required. Supervisor
	// should park, not loop; resumption happens after the user authenticates.
	ErrAuthRequired = errors.New("authentication required")

	// ErrSessionMissing means the server lost the client's session
	// (e.g. a remote MCP server restarted). Force a reconnect.
	ErrSessionMissing = errors.New("session missing")

	// ErrNotStarted means an operation was attempted on a toolset that has
	// not yet successfully started.
	ErrNotStarted = errors.New("toolset not started")
)

Sentinel errors used to classify failures across MCP and LSP transports.

Concrete transports wrap their underlying SDK errors with these (via Classify) so supervisors can decide policy via errors.Is rather than substring matching. New error categories should be added here rather than as ad-hoc strings.
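As a minimal sketch of the errors.Is approach described above, the following wraps a sentinel with %w and checks it by identity. The names errTransport, dial, and shouldRestart are hypothetical stand-ins so the example compiles without this package; they are not part of its API.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransport is a local stand-in for lifecycle.ErrTransport so the
// example compiles on its own.
var errTransport = errors.New("transport failure")

// dial is a hypothetical transport call that always fails. It wraps the
// sentinel with %w so the error chain is preserved.
func dial(addr string) error {
	return fmt.Errorf("dial %s: %w", addr, errTransport)
}

// shouldRestart decides policy by sentinel identity, not by message text.
func shouldRestart(err error) bool {
	return errors.Is(err, errTransport)
}

func main() {
	err := dial("localhost:9000")
	fmt.Println(err)                // dial localhost:9000: transport failure
	fmt.Println(shouldRestart(err)) // true
}
```

Because the check goes through errors.Is, rewording the message in dial would not change the policy decision.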

Functions

func Classify

func Classify(err error) error

Classify maps a transport-level error (stdio MCP, remote MCP, LSP) to one of the typed sentinels in this package, wrapping it so errors.Is matches both the sentinel and the original error.

Errors that are already classified (errors.Is matches one of the sentinels) are returned unchanged. Unknown errors are returned as-is so callers can decide their own policy.

Substring matching is used as a last resort because some upstream SDKs wrap their errors with %v (which drops the chain).
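The behaviour described above could be sketched roughly as follows. This is not the package's implementation: classify, the two local sentinels, the io.EOF rule, the "unauthorized" substring rule, and the nil handling are all assumptions chosen for illustration. It relies on fmt.Errorf accepting two %w verbs, which requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// Local stand-ins for two of the package's sentinels.
var (
	errTransport    = errors.New("transport failure")
	errAuthRequired = errors.New("authentication required")
)

// flattened imitates an SDK error built with %v, which drops the chain.
var flattened = fmt.Errorf("request failed: %v", errors.New("401 unauthorized"))

// classify is a hypothetical, much-reduced version of Classify.
// The matching rules below are assumptions, not the real rule set.
func classify(err error) error {
	switch {
	case err == nil:
		return nil
	case errors.Is(err, errTransport), errors.Is(err, errAuthRequired):
		return err // already classified: returned unchanged
	case errors.Is(err, io.EOF):
		// Two %w verbs keep both the sentinel and the original in the chain.
		return fmt.Errorf("%w: %w", errTransport, err)
	case strings.Contains(err.Error(), "unauthorized"):
		// Last resort: the chain was lost, so only the text is available.
		return fmt.Errorf("%w: %w", errAuthRequired, err)
	default:
		return err // unknown: returned as-is
	}
}

func main() {
	got := classify(flattened)
	fmt.Println(errors.Is(got, errAuthRequired))                      // true
	fmt.Println(errors.Is(got, flattened))                            // true
	fmt.Println(classify(io.ErrUnexpectedEOF) == io.ErrUnexpectedEOF) // true
}
```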

func IsPermanent

func IsPermanent(err error) bool

IsPermanent reports whether err wraps a sentinel that must NOT be retried (currently ErrCapabilityMissing and ErrAuthRequired).

func IsTransient

func IsTransient(err error) bool

IsTransient reports whether err wraps a sentinel that warrants a retry.
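A supervisor-style decision built on these two predicates might look like the sketch below. The permanent set follows the documentation above; the transient set used here (transport failure and server crash) is an assumption, since the documentation does not enumerate it. All names are local stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package's sentinels.
var (
	errTransport         = errors.New("transport failure")
	errServerCrashed     = errors.New("server crashed")
	errCapabilityMissing = errors.New("capability not supported")
	errAuthRequired      = errors.New("authentication required")
)

// isPermanent mirrors the documented set of non-retryable sentinels.
func isPermanent(err error) bool {
	return errors.Is(err, errCapabilityMissing) || errors.Is(err, errAuthRequired)
}

// isTransient uses an assumed set; the real IsTransient may differ.
func isTransient(err error) bool {
	return errors.Is(err, errTransport) || errors.Is(err, errServerCrashed)
}

// nextAction is a hypothetical policy helper.
func nextAction(err error) string {
	switch {
	case err == nil:
		return "ok"
	case isPermanent(err):
		return "park"
	case isTransient(err):
		return "retry"
	default:
		return "caller decides"
	}
}

func main() {
	fmt.Println(nextAction(fmt.Errorf("oauth: %w", errAuthRequired))) // park
	fmt.Println(nextAction(fmt.Errorf("pipe: %w", errTransport)))     // retry
	fmt.Println(nextAction(errors.New("something else")))             // caller decides
}
```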

Types

type Backoff

type Backoff struct {
	Initial    time.Duration // first wait (default 1s)
	Max        time.Duration // cap (default 32s)
	Multiplier float64       // (default 2.0)
	Jitter     float64       // 0..1 fraction; 0 disables (default)
}

Backoff parameters for restart attempts. Zero values default to 1s..32s exponential (matching historical MCP behaviour).
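To make the defaults concrete, here is a sketch of how the wait before each attempt could be derived from these fields. The delay function is hypothetical, and jitter is left out because the way it is applied is not documented here.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff is a local copy of the struct documented above.
type Backoff struct {
	Initial    time.Duration
	Max        time.Duration
	Multiplier float64
	Jitter     float64 // ignored in this sketch
}

// delay returns the wait before restart attempt n (0-based), applying the
// documented defaults to zero fields. Hypothetical helper, no jitter.
func delay(b Backoff, attempt int) time.Duration {
	if b.Initial <= 0 {
		b.Initial = time.Second
	}
	if b.Max <= 0 {
		b.Max = 32 * time.Second
	}
	if b.Multiplier <= 0 {
		b.Multiplier = 2.0
	}
	d := float64(b.Initial)
	for i := 0; i < attempt; i++ {
		d *= b.Multiplier
		if d >= float64(b.Max) {
			return b.Max // stop early once the cap is reached
		}
	}
	if time.Duration(d) > b.Max {
		return b.Max
	}
	return time.Duration(d)
}

func main() {
	var delays []time.Duration
	for n := 0; n < 7; n++ {
		delays = append(delays, delay(Backoff{}, n))
	}
	fmt.Println(delays) // [1s 2s 4s 8s 16s 32s 32s]
}
```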

type Connector

type Connector interface {
	// Connect establishes a new underlying connection (e.g. spawns a
	// process, dials HTTP, runs the initialize handshake). The returned
	// Session is owned by the supervisor; the supervisor calls Close on
	// it. Errors should be classified via Classify so the supervisor can
	// apply policy via errors.Is.
	Connect(ctx context.Context) (Session, error)
}

Connector creates new sessions for a Supervisor. Implementations are transport-specific: stdio MCP, remote MCP, LSP stdio.
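For tests, a fake Connector that fails a fixed number of times before succeeding can be useful. The sketch below defines local copies of the interfaces so it stands alone; flakyConnector, nopSession, and attemptsUntilConnected are hypothetical names, and the fake is not safe for concurrent use.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local copies of the interfaces documented in this package.
type Session interface {
	Wait() error
	Close(ctx context.Context) error
}

type Connector interface {
	Connect(ctx context.Context) (Session, error)
}

// errServerUnavailable is a stand-in for ErrServerUnavailable.
var errServerUnavailable = errors.New("server unavailable")

// nopSession ends immediately and closes cleanly.
type nopSession struct{}

func (nopSession) Wait() error                 { return nil }
func (nopSession) Close(context.Context) error { return nil }

// flakyConnector fails the first `failures` Connect calls.
type flakyConnector struct{ failures int }

var _ Connector = (*flakyConnector)(nil)

func (c *flakyConnector) Connect(ctx context.Context) (Session, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honour cancellation before doing any work
	}
	if c.failures > 0 {
		c.failures--
		return nil, fmt.Errorf("%w: dial refused", errServerUnavailable)
	}
	return nopSession{}, nil
}

// attemptsUntilConnected returns how many Connect calls were needed,
// or 0 if the connector never succeeded within max attempts.
func attemptsUntilConnected(c Connector, max int) int {
	for n := 1; n <= max; n++ {
		if _, err := c.Connect(context.Background()); err == nil {
			return n
		}
	}
	return 0
}

func main() {
	fmt.Println(attemptsUntilConnected(&flakyConnector{failures: 2}, 5)) // 3
}
```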

type Policy

type Policy struct {
	Restart     Restart // see Restart constants; default RestartOnFailure
	MaxAttempts int     // 0 = default (5); negative = unlimited
	Backoff     Backoff // zero fields use Backoff defaults

	// OnDisconnect is called when the session ends, with Wait()'s result.
	// Useful for cache invalidation.
	OnDisconnect func(err error)
	// OnRestart is called after each successful reconnect. Useful for
	// re-fetching server-side state (tools, prompts).
	OnRestart func()
	// OnFailed is called once when the supervisor enters StateFailed.
	OnFailed func(err error)

	// Logger is used for lifecycle logs. Defaults to slog.Default().
	Logger *slog.Logger
}

Policy controls how a Supervisor manages a connection over time. The zero value gives the historical mcp.Toolset behaviour: RestartOnFailure, 5 attempts, 1s..32s backoff, no jitter, no callbacks.
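The MaxAttempts convention (0 means the default of 5, negative means unlimited) can be sketched as a small budget check. The function withinBudget and the 1-based counting are assumptions; the real supervisor may count attempts differently.

```go
package main

import "fmt"

// defaultMaxAttempts matches the documented default for Policy.MaxAttempts.
const defaultMaxAttempts = 5

// withinBudget reports whether restart attempt number `attempt` (1-based)
// is still allowed for the given MaxAttempts value. Hypothetical helper.
func withinBudget(maxAttempts, attempt int) bool {
	switch {
	case maxAttempts < 0:
		return true // negative: unlimited
	case maxAttempts == 0:
		return attempt <= defaultMaxAttempts // zero: use the default
	default:
		return attempt <= maxAttempts
	}
}

func main() {
	fmt.Println(withinBudget(0, 5))    // true
	fmt.Println(withinBudget(0, 6))    // false
	fmt.Println(withinBudget(-1, 999)) // true
	fmt.Println(withinBudget(2, 3))    // false
}
```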

type Restart

type Restart int

Restart controls how the Supervisor reacts to an unexpected disconnect.

const (
	// RestartOnFailure reconnects after a non-nil Wait result or a forced
	// reconnect via RestartAndWait. Default; matches historical mcp.Toolset.
	RestartOnFailure Restart = iota
	// RestartNever transitions to Failed when the session ends.
	RestartNever
	// RestartAlways reconnects even after a clean (nil) Wait result.
	RestartAlways
)
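The three modes reduce to a small decision on the result of Wait. The sketch below uses a local copy of the type and a hypothetical shouldReconnect helper; it does not model the forced reconnect triggered by RestartAndWait.

```go
package main

import (
	"errors"
	"fmt"
)

// Restart is a local copy of the type documented above.
type Restart int

const (
	RestartOnFailure Restart = iota
	RestartNever
	RestartAlways
)

// errCrashed is a stand-in for a non-nil Wait result.
var errCrashed = errors.New("server crashed")

// shouldReconnect is a hypothetical helper: given the policy and the
// value returned by Session.Wait, decide whether to reconnect.
func shouldReconnect(r Restart, waitErr error) bool {
	switch r {
	case RestartAlways:
		return true
	case RestartNever:
		return false
	default: // RestartOnFailure
		return waitErr != nil
	}
}

func main() {
	fmt.Println(shouldReconnect(RestartOnFailure, nil))        // false
	fmt.Println(shouldReconnect(RestartOnFailure, errCrashed)) // true
	fmt.Println(shouldReconnect(RestartAlways, nil))           // true
	fmt.Println(shouldReconnect(RestartNever, errCrashed))     // false
}
```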

type Session

type Session interface {
	Wait() error
	Close(ctx context.Context) error
}

Session is the supervisor's view of an active connection. Wait blocks until the session ends; Close terminates it. Close must be idempotent and safe to call concurrently with an in-flight Wait.
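One way to satisfy the idempotency and concurrency requirements is to guard Close with sync.Once and let Wait block on a channel. The following is a minimal sketch with a local copy of the interface; chanSession and run are hypothetical names, and a real session would also tear down a process or connection.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Session is a local copy of the interface documented above.
type Session interface {
	Wait() error
	Close(ctx context.Context) error
}

// chanSession ends when its done channel is closed.
type chanSession struct {
	once sync.Once
	done chan struct{}
}

var _ Session = (*chanSession)(nil)

func newChanSession() *chanSession {
	return &chanSession{done: make(chan struct{})}
}

// Wait blocks until Close has been called.
func (s *chanSession) Wait() error {
	<-s.done
	return nil
}

// Close is idempotent: sync.Once ensures the channel is closed only once,
// so repeated or concurrent calls cannot panic.
func (s *chanSession) Close(ctx context.Context) error {
	s.once.Do(func() { close(s.done) })
	return nil
}

// run closes a session twice while a Wait is in flight and returns
// the value Wait produced.
func run() error {
	s := newChanSession()
	waited := make(chan error, 1)
	go func() { waited <- s.Wait() }()
	_ = s.Close(context.Background())
	_ = s.Close(context.Background()) // second Close is a no-op
	return <-waited
}

func main() {
	fmt.Println(run()) // <nil>
}
```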

type State

type State int32

State is the high-level lifecycle state of a toolset, surfaced in logs, the TUI, and OTel attributes.

State machine:

Stopped ──Start()──▶ Starting ──ok──▶ Ready
   ▲                    │ err          │ Wait()/Close()
   │                    ▼              ▼
   └─────── Stop() ── Failed ◀──── Restarting ──ok──▶ Ready
                       ▲              │
                       └── budget ────┘

Degraded is a transient state used when a Ready toolset starts failing health checks but has not yet been demoted by the supervisor.

const (
	// StateStopped is the initial state and the post-Stop state.
	StateStopped State = iota
	// StateStarting is set during the first connect/initialize handshake.
	StateStarting
	// StateReady means the toolset is connected, initialized, and serving.
	StateReady
	// StateDegraded means usable but the last health check or call failed.
	StateDegraded
	// StateRestarting means the supervisor is reconnecting after a failure.
	StateRestarting
	// StateFailed means the supervisor has given up restarting.
	StateFailed
)

func (State) IsTerminal

func (s State) IsTerminal() bool

IsTerminal reports whether s requires external action (Start/Restart/Stop) to leave.

func (State) IsUsable

func (s State) IsUsable() bool

IsUsable reports whether the toolset can serve requests in this state (Ready or Degraded).

func (State) String

func (s State) String() string

String returns a short, lowercase human-readable name. Out-of-range or negative State values fall back to "state(N)" instead of panicking on the lookup array.
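The three State methods could be implemented along these lines. The exact lowercase names in stateNames are assumptions, and the terminal set (Stopped and Failed) is inferred from the RestartAndWait documentation rather than stated here.

```go
package main

import "fmt"

// State is a local copy of the type documented above.
type State int32

const (
	StateStopped State = iota
	StateStarting
	StateReady
	StateDegraded
	StateRestarting
	StateFailed
)

// stateNames is the lookup array; the exact strings are assumed.
var stateNames = [...]string{
	"stopped", "starting", "ready", "degraded", "restarting", "failed",
}

// String bounds-checks before indexing so bad values cannot panic.
func (s State) String() string {
	if s < 0 || int(s) >= len(stateNames) {
		return fmt.Sprintf("state(%d)", int32(s))
	}
	return stateNames[s]
}

// IsUsable is true for Ready and Degraded, as documented.
func (s State) IsUsable() bool { return s == StateReady || s == StateDegraded }

// IsTerminal is true for Stopped and Failed (inferred set).
func (s State) IsTerminal() bool { return s == StateStopped || s == StateFailed }

func main() {
	fmt.Println(StateReady, StateReady.IsUsable())   // ready true
	fmt.Println(StateFailed, StateFailed.IsTerminal()) // failed true
	fmt.Println(State(42), State(-1))                // state(42) state(-1)
}
```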

type StateInfo

type StateInfo struct {
	State        State
	Since        time.Time
	LastError    error
	RestartCount int
}

StateInfo is a copyable snapshot of a Tracker.

type Supervisor

type Supervisor struct {
	// contains filtered or unexported fields
}

Supervisor manages the lifecycle of a single connection: initial connect, watcher goroutine, restart with backoff, graceful Stop. It is the shared implementation for MCP (stdio + remote) and LSP transports; per-transport behaviour is captured in the Connector.

Supervisor is safe for concurrent use.

func New

func New(name string, connector Connector, policy Policy) *Supervisor

New returns a Supervisor that drives connector with policy. The name is used in lifecycle log messages and should uniquely identify the toolset.

func (*Supervisor) IsReady

func (s *Supervisor) IsReady() bool

IsReady reports whether the supervisor is in a state that should serve requests (Ready or Degraded).

func (*Supervisor) MarkReadyForTesting

func (s *Supervisor) MarkReadyForTesting()

MarkReadyForTesting forces the supervisor into StateReady without going through Connect. Test-only backdoor; production code must not call this.

func (*Supervisor) RestartAndWait

func (s *Supervisor) RestartAndWait(ctx context.Context, timeout time.Duration) error

RestartAndWait closes the current session (if any) so the watcher reconnects, then blocks until the next successful reconnect, ctx cancellation, supervisor shutdown (Stop or Failed), or timeout.

RestartAndWait does NOT recover from a terminal state (Stopped/Failed): callers that want "restart even if Failed" should consult State() and call Start when terminal. The Toolset.Restart wrappers in pkg/tools/mcp and pkg/tools/builtin do exactly that.

func (*Supervisor) Restarted

func (s *Supervisor) Restarted() <-chan struct{}

Restarted returns a channel closed the next time the supervisor completes a successful restart. The channel is replaced after each restart, so callers should re-read it on each new wait.
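The "closed once, then replaced" behaviour is a common broadcast pattern. The sketch below shows why a caller must re-read the channel after each restart; notifier, signal, and isClosed are hypothetical names, not part of the Supervisor.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier holds a channel that is closed on each event and then replaced.
type notifier struct {
	mu sync.Mutex
	ch chan struct{}
}

func newNotifier() *notifier { return &notifier{ch: make(chan struct{})} }

// Restarted returns the channel for the next event.
func (n *notifier) Restarted() <-chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.ch
}

// signal wakes every current waiter and installs a fresh channel.
func (n *notifier) signal() {
	n.mu.Lock()
	defer n.mu.Unlock()
	close(n.ch)
	n.ch = make(chan struct{})
}

// isClosed reports whether ch has been closed, without blocking.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	n := newNotifier()
	first := n.Restarted()
	n.signal()
	fmt.Println(isClosed(first))         // true: the old channel fired
	fmt.Println(isClosed(n.Restarted())) // false: a new channel awaits the next restart
}
```

A caller that keeps waiting on the old channel would see it as permanently closed, which is why the channel should be fetched again before each wait.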

func (*Supervisor) Start

func (s *Supervisor) Start(ctx context.Context) error

Start performs the initial connect. On Connector error the supervisor stays in StateStopped and the caller is expected to retry. On success the watcher goroutine is launched (if not already alive) and state moves to Ready. Concurrent Start calls serialize.

func (*Supervisor) State

func (s *Supervisor) State() StateInfo

State returns a snapshot of the supervisor's current state.

func (*Supervisor) Stop

func (s *Supervisor) Stop(ctx context.Context) error

Stop tears the supervisor down. Idempotent. Blocks until the underlying session is closed.

type Tracker

type Tracker struct {
	// contains filtered or unexported fields
}

Tracker is a small thread-safe state holder shared by all transports. It records the current state, transition time, last error, and a restart counter. Transition validity is the supervisor's job.

The zero value is valid and starts in StateStopped.

func NewTracker

func NewTracker() *Tracker

NewTracker returns a Tracker initialised in StateStopped.

func (*Tracker) Fail

func (t *Tracker) Fail(s State, err error)

Fail transitions to s and records err as the last error. Use this when entering Failed/Restarting after a failure so the supervisor's snapshot surfaces a useful message.

func (*Tracker) IncRestarts

func (t *Tracker) IncRestarts() int

IncRestarts increments the restart counter and returns the new value.

func (*Tracker) LastError

func (t *Tracker) LastError() error

LastError returns the most recent error recorded by Fail, or nil if a clean Set has happened since.

func (*Tracker) ResetRestarts

func (t *Tracker) ResetRestarts()

ResetRestarts zeroes the restart counter. Called after a sustained Ready period to forget transient incidents.

func (*Tracker) Set

func (t *Tracker) Set(s State)

Set transitions to s, recording the transition time and clearing the last error. Same-state Set is a no-op (preserves Since/LastError).
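The interaction between Set and Fail can be sketched with a mutex-guarded struct. This is an illustrative stand-in, not the package's Tracker: the lowercase tracker type is hypothetical, and whether Fail refreshes the transition time on a same-state call is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// State is a local copy of the enum documented above.
type State int32

const (
	StateStopped State = iota
	StateStarting
	StateReady
	StateDegraded
	StateRestarting
	StateFailed
)

var errBoom = errors.New("boom")

// tracker is a hypothetical, reduced Tracker. Its zero value is valid
// and starts in StateStopped.
type tracker struct {
	mu      sync.Mutex
	state   State
	since   time.Time
	lastErr error
}

// Set clears the last error on a real transition; same-state is a no-op.
func (t *tracker) Set(s State) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.state == s {
		return
	}
	t.state, t.since, t.lastErr = s, time.Now(), nil
}

// Fail transitions to s and records err.
func (t *tracker) Fail(s State, err error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.state, t.since, t.lastErr = s, time.Now(), err
}

func (t *tracker) LastError() error {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.lastErr
}

func main() {
	var t tracker
	t.Fail(StateRestarting, errBoom)
	t.Set(StateRestarting)     // same state: error is preserved
	fmt.Println(t.LastError()) // boom
	t.Set(StateReady)          // real transition: error is cleared
	fmt.Println(t.LastError()) // <nil>
}
```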

func (*Tracker) Snapshot

func (t *Tracker) Snapshot() StateInfo

Snapshot returns a point-in-time copy of the tracker.

func (*Tracker) State

func (t *Tracker) State() State

State returns the current state.
