Documentation
¶
Overview ¶
Package manager implements the Stream Manager — the failover engine. It monitors input health (bitrate, FPS, packet loss, timeout) and seamlessly switches to the best available input when the active one degrades or fails. Failover is handled entirely in Go — FFmpeg is never restarted for this purpose.
Concurrency model ¶
Each stream has its own [streamState] protected by state.mu. The global Service.mu is an RWMutex that guards only the streams map and is never held while performing I/O or while acquiring state.mu (consistent lock ordering: Service.mu → state.mu prevents deadlocks).
RecordPacket (hot path) does a global RLock to locate the state pointer, then a per-stream Lock to update LastPacketAt and Status. At typical packet rates (< 100/s per stream) the per-packet mutex overhead is negligible while being completely race-free.
Failover state machine ¶
StatusIdle → StatusActive (first packet arrives on the active input) StatusActive → StatusDegraded (timeout detected by monitor, or error from ingestor) StatusDegraded → StatusIdle (background probe succeeds; input is a failback candidate) StatusIdle → StatusActive (ingestor switches to this input after tryFailover)
Index ¶
- type InputHealth
- type InputHealthSnapshot
- type RuntimeStatus
- type Service
- func (s *Service) IsRegistered(streamID domain.StreamCode) bool
- func (s *Service) RecordPacket(streamID domain.StreamCode, inputPriority int)
- func (s *Service) Register(ctx context.Context, stream *domain.Stream, bufferWriteID domain.StreamCode) error
- func (s *Service) RegisteredStreams() []domain.StreamCode
- func (s *Service) ReportInputError(streamID domain.StreamCode, inputPriority int, err error)
- func (s *Service) RuntimeStatus(streamID domain.StreamCode) (RuntimeStatus, bool)
- func (s *Service) SetExhaustedCallback(fn func(domain.StreamCode))
- func (s *Service) SetRestoredCallback(fn func(domain.StreamCode))
- func (s *Service) SwitchInput(streamID domain.StreamCode, priority int) error
- func (s *Service) Unregister(streamID domain.StreamCode)
- func (s *Service) UpdateBufferWriteID(streamID domain.StreamCode, newBufID domain.StreamCode)
- func (s *Service) UpdateInputs(streamID domain.StreamCode, added, removed, updated []domain.Input)
- type SwitchEvent
- type SwitchReason
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type InputHealth ¶
type InputHealth struct {
Input domain.Input
LastPacketAt time.Time
Bitrate float64 // kbps (reserved for future bitrate estimator)
PacketLoss float64 // percent (reserved)
Status domain.StreamStatus
// Errors is a bounded rolling history (newest at index 0, max maxInputErrorHistory).
// Persists for the lifetime of the manager registration — cleared only when
// the stream pipeline is stopped/restarted (Unregister drops the whole state).
Errors []domain.ErrorEntry
}
InputHealth tracks the runtime health of one input source. All mutable fields are protected by the parent streamState.mu.
type InputHealthSnapshot ¶
type InputHealthSnapshot struct {
InputPriority int `json:"input_priority"`
LastPacketAt time.Time `json:"last_packet_at"`
BitrateKbps float64 `json:"bitrate_kbps"`
PacketLoss float64 `json:"packet_loss"`
Status domain.StreamStatus `json:"status"`
Errors []domain.ErrorEntry `json:"errors,omitempty"`
}
InputHealthSnapshot is a serialisable copy of one input's health. Errors is a bounded rolling history (newest first, max maxInputErrorHistory) of human-readable failure reasons (packet timeout, ingestor error, …). History persists for the lifetime of the manager registration and is only cleared when the stream pipeline is stopped. Frontend should treat each message as a diagnostic string, not a code.
type RuntimeStatus ¶
type RuntimeStatus struct {
Status domain.StreamStatus `json:"status"`
PipelineActive bool `json:"pipeline_active"`
ActiveInputPriority int `json:"active_input_priority"`
OverrideInputPriority *int `json:"override_input_priority,omitempty"`
Exhausted bool `json:"exhausted"`
Inputs []InputHealthSnapshot `json:"inputs"`
// Switches is the rolling history of active-input changes (newest at
// index 0, capped at maxSwitchHistory). Stream-level — switches happen
// BETWEEN inputs, so this lives next to Inputs rather than inside one.
Switches []SwitchEvent `json:"switches,omitempty"`
Transcoder *transcoder.RuntimeStatus `json:"transcoder,omitempty"`
Publisher *publisher.RuntimeStatus `json:"publisher,omitempty"`
}
RuntimeStatus is the single "runtime" envelope returned by the API for a stream. Sub-systems contribute their own sections — manager owns input health; transcoder owns per-profile state — but the API exposes one shape so clients have a single root for all live data.
Status and PipelineActive are populated by the API handler from the coordinator (not by manager itself). Transcoder and Publisher are also handler-populated and named after the subsystem they wrap, so frontend reads `runtime.transcoder.profiles[]` and `runtime.publisher.pushes[]` — no collision with the persisted `transcoder` / `push` config fields on domain.Stream.
Exhausted is true when every input has degraded and no failover candidate remains — the stream is effectively offline at the source.
type Service ¶
type Service struct {
// contains filtered or unexported fields
}
Service monitors all streams and orchestrates source failover.
func NewForTesting ¶ added in v0.0.23
func NewForTesting(bus events.Bus, ing ingestorDep, m *metrics.Metrics, packetTimeoutSec int) *Service
NewForTesting builds a Service from pre-constructed deps. Used by unit tests that stub the ingestor (no real pull workers needed) and skip the DI plumbing. packetTimeoutSec mirrors ManagerConfig.InputPacketTimeoutSec (defaults to 30s when ≤ 0).
func (*Service) IsRegistered ¶
func (s *Service) IsRegistered(streamID domain.StreamCode) bool
IsRegistered reports whether streamID is under manager supervision.
func (*Service) RecordPacket ¶
func (s *Service) RecordPacket(streamID domain.StreamCode, inputPriority int)
RecordPacket updates the last-seen timestamp for an input. Called by the ingestor per packet — must be fast and contention-free.
func (*Service) Register ¶
func (s *Service) Register(ctx context.Context, stream *domain.Stream, bufferWriteID domain.StreamCode) error
Register starts health monitoring for a stream and begins ingesting on the best input. bufferWriteID is the Buffer Hub slot for ingest writes; empty defaults to stream.Code.
func (*Service) RegisteredStreams ¶ added in v0.0.31
func (s *Service) RegisteredStreams() []domain.StreamCode
RegisteredStreams returns the codes of every stream currently under manager supervision (snapshot — caller is free to mutate the slice).
func (*Service) ReportInputError ¶
func (s *Service) ReportInputError(streamID domain.StreamCode, inputPriority int, err error)
ReportInputError marks an input degraded immediately and triggers failover. Called by the ingestor on non-retriable source errors.
func (*Service) RuntimeStatus ¶
func (s *Service) RuntimeStatus(streamID domain.StreamCode) (RuntimeStatus, bool)
RuntimeStatus returns a snapshot of runtime input health, or ok=false if not registered.
func (*Service) SetExhaustedCallback ¶
func (s *Service) SetExhaustedCallback(fn func(domain.StreamCode))
SetExhaustedCallback registers a function called when all inputs for a stream are degraded and no failover candidate is available.
func (*Service) SetRestoredCallback ¶
func (s *Service) SetRestoredCallback(fn func(domain.StreamCode))
SetRestoredCallback registers a function called when a failover succeeds after a period where all inputs were exhausted.
func (*Service) SwitchInput ¶
func (s *Service) SwitchInput(streamID domain.StreamCode, priority int) error
SwitchInput forces the stream to use the given input priority regardless of its configured priority value. The override persists until the input degrades permanently, at which point the manager reverts to normal priority-based selection. Switching to a different priority replaces any previous override.
func (*Service) Unregister ¶
func (s *Service) Unregister(streamID domain.StreamCode)
Unregister stops ingest and health monitoring for streamID.
func (*Service) UpdateBufferWriteID ¶
func (s *Service) UpdateBufferWriteID(streamID domain.StreamCode, newBufID domain.StreamCode)
UpdateBufferWriteID changes the buffer slot where the ingestor writes packets. Used when adding/removing the transcoder (buffer topology change). The active ingestor is restarted to write to the new buffer.
func (*Service) UpdateInputs ¶
func (s *Service) UpdateInputs( streamID domain.StreamCode, added, removed, updated []domain.Input, )
UpdateInputs patches the live input routing table while the monitor is running.
- removed: deleted from state.inputs; if the active input is removed, failover is triggered.
- added: inserted as StatusIdle; if a higher-priority input is added, failover is triggered.
- updated: Input field replaced; if the active input is updated, the ingestor is restarted.
type SwitchEvent ¶ added in v0.0.31
type SwitchEvent struct {
At time.Time `json:"at"`
From int `json:"from"`
To int `json:"to"`
Reason SwitchReason `json:"reason"`
// Detail is human-readable extra context (error message, timeout
// duration, …). Empty for reasons that have no extra context (manual,
// failback, input_added, input_removed).
Detail string `json:"detail,omitempty"`
}
SwitchEvent records one active-input switch for the rolling history. From = -1 is reserved for "no previous active" (only the very first activation in Register would qualify, but Register intentionally does not record one — history shows real switches only).
type SwitchReason ¶ added in v0.0.31
type SwitchReason string
SwitchReason names why the active input changed. The set is closed — every callsite of tryFailover must pick one so the rolling history is always interpretable in the UI.
const ( // SwitchReasonError — ingestor reported a non-recoverable error on // the active input (RTMP disconnect, HLS playlist 404, SRT broken …). SwitchReasonError SwitchReason = "error" // SwitchReasonTimeout — no packet from the active input for the full // packet-timeout window (default 30s). SwitchReasonTimeout SwitchReason = "timeout" // SwitchReasonManual — operator forced the switch via the // /streams/{code}/inputs/switch API. SwitchReasonManual SwitchReason = "manual" // SwitchReasonFailback — a higher-priority input recovered after // having been degraded; manager switched back to honour priority. SwitchReasonFailback SwitchReason = "failback" // SwitchReasonRecovery — every input had degraded (exhausted state) // and one probed clean; pipeline restarted on it. May land on the // same priority as before in the single-input case. SwitchReasonRecovery SwitchReason = "recovery" // SwitchReasonInputAdded — UpdateInputs added an input with a // priority lower (= higher precedence) than the current active one. SwitchReasonInputAdded SwitchReason = "input_added" // SwitchReasonInputRemoved — UpdateInputs removed the active input; // manager promoted the next-best candidate. SwitchReasonInputRemoved SwitchReason = "input_removed" )
SwitchReason values. Add a new constant before introducing a new trigger path so the UI legend and frontend filter dropdowns can stay in sync (currently maintained manually).