supervisor

package

v0.0.0-...-8fcbe07 Latest Latest Go to latest Published: Nov 22, 2025 License: Apache-2.0 Imports: 21 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/monogon/monogon

Links

Open Source Insights

Documentation ¶

Index ¶

Variables
func LogTree(ctx context.Context) *logtree.LogTree
func Logger(ctx context.Context) logging.Leveled
func MustSubLogger(ctx context.Context, name string) logging.Leveled
func New(ctx context.Context, rootRunnable Runnable, opts ...SupervisorOpt) *supervisor
func RawLogger(ctx context.Context) io.Writer
func Run(ctx context.Context, name string, runnable Runnable) error
func RunCommand(ctx context.Context, cmd *exec.Cmd, opts ...RunCommandOption) error
func RunGroup(ctx context.Context, runnables map[string]Runnable) error
func Signal(ctx context.Context, signal SignalType)
func SubLogger(ctx context.Context, name string) (logging.Leveled, error)
func TestHarness(t testing.TB, r func(ctx context.Context) error) (context.CancelFunc, *logtree.LogTree)
func WithPropagatePanic(s *supervisor)
type DNState
type InMemoryMetrics
- func (m *InMemoryMetrics) DNs() map[string]DNState
- func (m *InMemoryMetrics) NotifyNodeState(dn string, state NodeState)
type Metrics
type MetricsPrometheus
- func NewMetricsPrometheus(registry *prometheus.Registry) *MetricsPrometheus
- func (m *MetricsPrometheus) NotifyNodeState(dn string, state NodeState)
type NodeState
- func (s NodeState) String() string
type RunCommandOption
- func ParseKLog() RunCommandOption
- func SignalChan(s <-chan os.Signal) RunCommandOption
type Runnable
- func GRPCServer(srv *grpc.Server, lis net.Listener, graceful bool) Runnable
type SignalType
type SupervisorOpt
- func WithExistingLogtree(lt *logtree.LogTree) SupervisorOpt
- func WithMetrics(m Metrics) SupervisorOpt

Examples ¶

New

Constants ¶

This section is empty.

Variables ¶

View Source

var NodeStates = []NodeState{
	NodeStateNew,
	NodeStateHealthy,
	NodeStateDead,
	NodeStateDone,
	NodeStateCanceled,
}

NodeStates is a list of all possible values of a NodeState.

Functions ¶

func LogTree ¶

func LogTree(ctx context.Context) *logtree.LogTree

LogTree returns the LogTree used by this supervisor instance. This should only be used for reading logs. For writing logs use SubLogger instead.

func Logger ¶

func Logger(ctx context.Context) logging.Leveled

func MustSubLogger ¶

func MustSubLogger(ctx context.Context, name string) logging.Leveled

MustSubLogger is a wrapper around SubLogger which panics on error. Errors should only happen due to invalid names, so as long as the given name is compile-time constant and valid, this function is safe to use.

func New ¶

func New(ctx context.Context, rootRunnable Runnable, opts ...SupervisorOpt) *supervisor

New creates a new supervisor with its root running the given root runnable. The given context can be used to cancel the entire supervision tree.

For tests, we reccomend using TestHarness instead, which will also stream logs to stderr and take care of propagating root runnable errors to the test output.

Example ¶

// Minimal runnable that is immediately done.
childC := make(chan struct{})
child := func(ctx context.Context) error {
	Signal(ctx, SignalHealthy)
	close(childC)
	Signal(ctx, SignalDone)
	return nil
}

// Start a supervision tree with a root runnable.
ctx, ctxC := context.WithCancel(context.Background())
defer ctxC()
New(ctx, func(ctx context.Context) error {
	err := Run(ctx, "child", child)
	if err != nil {
		return fmt.Errorf("could not run 'child': %w", err)
	}
	Signal(ctx, SignalHealthy)

	t := time.NewTicker(time.Second)
	defer t.Stop()

	// Do something in the background, and exit on context cancel.
	for {
		select {
		case <-t.C:
			fmt.Printf("tick!")
		case <-ctx.Done():
			return ctx.Err()
		}
	}
})

// root.child will close this channel.
<-childC

func RawLogger ¶

func RawLogger(ctx context.Context) io.Writer

func Run ¶

func Run(ctx context.Context, name string, runnable Runnable) error

Run starts a single runnable in its own group.

func RunCommand ¶

func RunCommand(ctx context.Context, cmd *exec.Cmd, opts ...RunCommandOption) error

RunCommand will create a Runnable that starts a long-running command, whose exit is determined to be a failure. cmd should be created with exec.CommandContext so that it will be killed when the context is canceled.

func RunGroup ¶

func RunGroup(ctx context.Context, runnables map[string]Runnable) error

RunGroup starts a set of runnables as a group. These runnables will run together, and if any one of them quits unexpectedly, the result will be canceled and restarted. The context here must be an existing Runnable context, and the spawned runnables will run under the node that this context represents.

func Signal ¶

func Signal(ctx context.Context, signal SignalType)

Signal tells the supervisor that the calling runnable has reached a certain state of its lifecycle. All runnables should SignalHealthy when they are ready with set up, running other child runnables and are now 'serving'.

func SubLogger ¶

func SubLogger(ctx context.Context, name string) (logging.Leveled, error)

SubLogger returns a LeveledLogger for a given name. The name is used to placed that logger within the logtree hierarchy. For example, if the runnable `root.foo` requests a SubLogger for name `bar`, the returned logger will log to `root.foo.bar` in the logging tree.

An error is returned if the given name is invalid or conflicts with a child runnable of the current runnable. In addition, whenever a node uses a sub-logger with a given name, that name also becomes unavailable for use as a child runnable (no runnable and sub-logger may ever log into the same logtree DN).

func TestHarness ¶

func TestHarness(t testing.TB, r func(ctx context.Context) error) (context.CancelFunc, *logtree.LogTree)

TestHarness runs a supervisor in a harness designed for unit testing runnables and runnable trees.

The given runnable will be run in a new supervisor, and the logs from this supervisor will be streamed to stderr. If the runnable returns a non-context error, the harness will throw a test error, but will not abort the test.

The harness also returns a context cancel function that can be used to terminate the started supervisor early. Regardless of manual cancellation, the supervisor will always be terminated up at the end of the test/benchmark it's running in. The supervision tree will also be cleaned up and the test will block until all runnables have exited.

The second returned value is the logtree used by this supervisor. It can be used to assert some log messages are emitted in tests that exercise some log-related functionality.

func WithPropagatePanic ¶

func WithPropagatePanic(s *supervisor)

WithPropagatePanic prevents the Supervisor from catching panics in runnables and treating them as failures. This is useful to enable for testing and local debugging.

Types ¶

type DNState ¶

type DNState struct {
	// State is the current state of the runnable.
	State NodeState
	// Transition is the time at which the runnable reached its State.
	Transition time.Time
}

DNState is the state of a supervisor runnable, recorded alongside a timestamp of when the State changed.

type InMemoryMetrics ¶

type InMemoryMetrics struct {
	// contains filtered or unexported fields
}

InMemoryMetrics is a simple Metrics implementation that keeps an in-memory mirror of the state of all DNs in the supervisor. The zero value for InMemoryMetrics is ready to use.

func (*InMemoryMetrics) DNs ¶

func (m *InMemoryMetrics) DNs() map[string]DNState

DNs returns a copy (snapshot in time) of the recorded DN states, in a map from DN to DNState. The returned value can be mutated.

func (*InMemoryMetrics) NotifyNodeState ¶

func (m *InMemoryMetrics) NotifyNodeState(dn string, state NodeState)

type Metrics ¶

type Metrics interface {
	// NotifyNodeState is called whenever a given runnable at a given DN changes
	// state. Called synchronously from the supervisor's processor loop, so must not
	// block, but is also guaranteed to only be called from a single goroutine.
	NotifyNodeState(dn string, state NodeState)
}

Metrics is an interface from the supervisor to any kind of metrics-collecting component.

type MetricsPrometheus ¶

type MetricsPrometheus struct {
	// contains filtered or unexported fields
}

MetricsPrometheus is a Metrics implementation which exports the supervisor metrics over some prometheus registry.

This structure must be constructed with NewMetricsPrometheus.

The metrics exported are:

monogon_supervisor_dn_state_total
monogon_superfisor_dn_state_transition_count

func NewMetricsPrometheus ¶

func NewMetricsPrometheus(registry *prometheus.Registry) *MetricsPrometheus

NewMetricsPrometheus initializes Supervisor metrics in a prometheus registry and return a Metrics instance to be used with WithMetrics.

This should only be called once for a given registry.

func (*MetricsPrometheus) NotifyNodeState ¶

func (m *MetricsPrometheus) NotifyNodeState(dn string, state NodeState)

type NodeState ¶

type NodeState int

NodeState is the state of a runnable within a node, and in a way the node itself. This follows the state diagram from go/supervision.

const (
	// A node that has just been created, and whose runnable has been started
	// already but hasn't signaled anything yet.
	NodeStateNew NodeState = iota
	// A node whose runnable has signaled being healthy - this means it's ready
	// to serve/act.
	NodeStateHealthy
	// A node that has unexpectedly returned or panicked.
	NodeStateDead
	// A node that has declared that its done with its work and should not be
	// restarted, unless a supervision tree failure requires that.
	NodeStateDone
	// A node that has returned after being requested to cancel.
	NodeStateCanceled
)

func (NodeState) String ¶

func (s NodeState) String() string

type RunCommandOption ¶

type RunCommandOption struct {
	// contains filtered or unexported fields
}

func ParseKLog ¶

func ParseKLog() RunCommandOption

ParseKLog signals that the command being run will return klog-compatible logs to stdout and/or stderr, and these will be re-interpreted as structured logging and emitted to the supervisor's logger.

func SignalChan ¶

func SignalChan(s <-chan os.Signal) RunCommandOption

SignalChan takes a channel which can be used to send signals to the supervised process.

The given channel will be read from as long as the underlying process is running. If the process doesn't start successfully the channel will not be read. When the process exits, the channel will stop being read.

With the above in mind, and also taking into account the inherent lack of reliability in delivering any process-handled signals in POSIX/Linux, it is recommended to use unbuffered channels, always write to them in a non-blocking fashion (eg. in a select { ... default: } block), and to not rely only on the signal delivery mechanism for the intended behaviour.

For example, if the signals are used to trigger some configuration reload, these configuration reloads should either be verified and signal delivery should be retried until confirmed successful, or there should be a backup periodic reload performed by the target process independently of signal-based reload triggers.

Another example: if the signal delivered is a SIGTERM used to gracefully terminate some process, it should be attempted to be delivered a number of times before finally SIGKILLing the process.

type Runnable ¶

type Runnable func(ctx context.Context) error

A Runnable is a function that will be run in a goroutine, and supervised throughout its lifetime. It can in turn start more runnables as its children, and those will form part of a supervision tree. The context passed to a runnable is very important and needs to be handled properly. It will be live (non-errored) as long as the runnable should be running, and canceled (ctx.Err() will be non-nil) when the supervisor wants it to exit. This means this context is also perfectly usable for performing any blocking operations.

func GRPCServer ¶

func GRPCServer(srv *grpc.Server, lis net.Listener, graceful bool) Runnable

GRPCServer creates a Runnable that serves gRPC requests as longs as it's not canceled. If graceful is set to true, the server will be gracefully stopped instead of plain stopped. This means all pending RPCs will finish, but also requires streaming gRPC handlers to check their context liveliness and exit accordingly. If the server code does not support this, `graceful` should be false and the server will be killed violently instead.

type SignalType ¶

type SignalType int

const (
	// The runnable is healthy, done with setup, done with spawning more
	// Runnables, and ready to serve in a loop.  The runnable needs to check
	// the parent context and ensure that if that context is done, the runnable
	// exits.
	SignalHealthy SignalType = iota
	// The runnable is done - it does not need to run any loop. This is useful
	// for Runnables that only set up other child runnables. This runnable will
	// be restarted if a related failure happens somewhere in the supervision
	// tree.
	SignalDone
)

type SupervisorOpt ¶

type SupervisorOpt func(s *supervisor)

SupervisorOpt are runtime configurable options for the supervisor.

func WithExistingLogtree ¶

func WithExistingLogtree(lt *logtree.LogTree) SupervisorOpt

func WithMetrics ¶

func WithMetrics(m Metrics) SupervisorOpt

WithMetrics makes the Supervisor export per-DN metrics into a given Metrics implementation. This can be called repeatedly to export the same data into multiple Metrics implementations.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL