Documentation ¶
Overview ¶
Package metrics is a tiny, dependency-free Prometheus exposition backend tailored to VoiceServer's needs. It provides:
- counters (monotonic, label-keyed)
- gauges (up/down, label-keyed)
- summary-style histograms with P50/P90/P95/P99 quantiles
We deliberately avoid pulling in prometheus/client_golang — it adds ~100 transitive deps for features (proto, gRPC, exemplars, …) we don't use. The text exposition format is small and stable.
Concurrency: every method on Registry is safe for concurrent use. Cost per observation: one atomic add (counters and gauges) or one read-write lock acquisition plus an append (histograms). Both take under 200 ns on modern x86, which is negligible next to the network latencies we measure.
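To give a sense of how small the text exposition format is, here is a minimal sketch that renders one labelled sample line. `formatSample` and the HELP text are hypothetical and exist only for illustration; they are not part of this package, and `%q` only approximates Prometheus label escaping for plain ASCII values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample renders one sample line in Prometheus text exposition format.
// Hypothetical helper for illustration; not part of package metrics.
func formatSample(name string, labels map[string]string, value float64) string {
	if len(labels) == 0 {
		return fmt.Sprintf("%s %g", name, value)
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // sorted keys give deterministic output
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q matches Prometheus escaping for simple ASCII values only.
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(parts, ","), value)
}

func main() {
	// The HELP text below is made up for the example.
	fmt.Println("# HELP voiceserver_calls_total Total calls.")
	fmt.Println("# TYPE voiceserver_calls_total counter")
	fmt.Println(formatSample("voiceserver_calls_total",
		map[string]string{"transport": "sip", "status": "ok"}, 42))
}
```

The whole format is a comment line per metric plus one `name{labels} value` line per series, which is why a hand-written encoder stays small.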
Index ¶
- Constants
- Variables
- func ASRError(transport string)
- func AsyncDroppedCount() uint64
- func BargeIn(transport string)
- func CallEnded(transport, status string)
- func CallStarted(transport string)
- func DialogReconnect(transport, outcome string)
- func Handler() http.Handler
- func LabelsCall(transport, status string) map[string]string
- func LabelsDialogOutcome(transport, outcome string) map[string]string
- func ObserveAsync(name, help string, v float64)
- func ObserveE2EFirstByte(ms int)
- func ObserveLLMFirstByte(ms int)
- func ObserveTTSFirstByte(ms int)
- func RegisterLabels(metric string, keys ...string)
- func TTSError(transport string)
- type Registry
- func NewRegistry() *Registry
- func (r *Registry) AddCounter(name, help string, labels map[string]string, n uint64)
- func (r *Registry) AddGauge(name, help string, labels map[string]string, v float64)
- func (r *Registry) IncCounter(name, help string, labels map[string]string)
- func (r *Registry) Observe(name, help string, v float64)
- func (r *Registry) ObserveN(name, help string, v float64, maxSamples int)
- func (r *Registry) SetGauge(name, help string, labels map[string]string, v float64)
- func (r *Registry) WritePromText(w io.Writer)
Constants ¶
const (
	// Calls.
	MetricActiveCalls = "voiceserver_active_calls"
	MetricCallsTotal  = "voiceserver_calls_total"

	// Recognizer / synthesizer errors.
	MetricASRErrors = "voiceserver_asr_errors_total"
	MetricTTSErrors = "voiceserver_tts_errors_total"

	// User-interrupts-AI events.
	MetricBargeInTotal = "voiceserver_barge_in_total"

	// Latencies (milliseconds).
	MetricE2EFirstByteMs = "voiceserver_e2e_first_byte_ms"
	MetricTTSFirstByteMs = "voiceserver_tts_first_byte_ms"
	MetricLLMFirstByteMs = "voiceserver_llm_first_byte_ms"

	// Dialog plane.
	MetricDialogReconnectTotal = "voiceserver_dialog_reconnect_total"
)
Metric name constants. Kept in one place so dashboards can grep for a single source of truth. Names follow Prometheus convention: `<namespace>_<subsystem>_<name>_<unit>`.
const MetricObserveDroppedTotal = "voiceserver_metrics_observe_dropped_total"
MetricObserveDroppedTotal counts samples lost because the async Observe buffer was full. If this is non-zero in production the drain goroutine isn't keeping up — usually a downstream stall rather than a real load issue.
const MetricUnknownLabelTotal = "voiceserver_metrics_unknown_label_total"
MetricUnknownLabelTotal counts soft-whitelist violations. Visible via /metrics so on-call can spot "someone is shipping a metric the declared whitelist doesn't cover" without grepping logs.
Variables ¶
var (
	LabelsTransportSIP    = map[string]string{"transport": "sip"}
	LabelsTransportWebRTC = map[string]string{"transport": "webrtc"}
)
LabelsTransportSIP / LabelsTransportWebRTC are the two transports we use today. The whitelist for any metric labelled by transport should be: RegisterLabels(metric, "transport").
var Default = NewRegistry()
Default is the process-wide registry. Use this for application-level metrics so a single /metrics handler serves everything.
Functions ¶
func ASRError ¶
func ASRError(transport string)
ASRError bumps the ASR error counter. Called from the recognizer error callback in the gateway client.
func AsyncDroppedCount ¶
func AsyncDroppedCount() uint64
AsyncDroppedCount returns the total samples dropped since process start. Exposed for tests and self-observability tooling.
func BargeIn ¶
func BargeIn(transport string)
BargeIn counts how often the VAD interrupted the AI's TTS because the user started talking. Good predictor of conversation health — a high rate usually means the AI is too verbose or VAD is too twitchy.
func CallEnded ¶
func CallEnded(transport, status string)
CallEnded mirrors CallStarted. status is a short classification like "ok", "dialog-hangup", "ice-failed", "pipeline-error" — use the same vocabulary you use in call_events.kind so dashboards line up.
func CallStarted ¶
func CallStarted(transport string)
CallStarted increments the active-calls gauge and the calls_total counter for the given transport. Call at the moment the session becomes "live" (ASR/TTS wired + dialog plane connected).
func DialogReconnect ¶
func DialogReconnect(transport, outcome string)
DialogReconnect counts reconnect attempts to the dialog plane regardless of outcome. A growing counter means the dialog app is flaky; pair with the ok/fail counters for success rate.
func Handler ¶
func Handler() http.Handler
Handler returns an http.Handler that writes the Default registry in Prometheus text exposition format. Mount at /metrics — no auth by default; add middleware if the listener is internet-exposed.
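A minimal sketch of mounting a metrics handler at /metrics and scraping it in-process. `stubHandler`, `newMux`, and `scrape` are hypothetical stand-ins so the example runs without this package; in real code the line would be `mux.Handle("/metrics", metrics.Handler())`. The Content-Type shown is the conventional one for the text format, and whether Handler sets it is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// stubHandler stands in for metrics.Handler() so the sketch is self-contained.
func stubHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		// Assumed header; the real handler may differ.
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		fmt.Fprintln(w, "voiceserver_active_calls{transport=\"sip\"} 0")
	})
}

// newMux mounts the handler at /metrics. Wrap it in auth middleware here
// if the listener is reachable from the internet.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", stubHandler())
	return mux
}

// scrape issues one in-process GET /metrics and returns status and body.
func scrape(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return rec.Code, string(body)
}

func main() {
	code, body := scrape(newMux())
	fmt.Printf("%d %s", code, body)
}
```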
func LabelsCall ¶
func LabelsCall(transport, status string) map[string]string
LabelsCall composes a 2-key label set for the common (transport, status) shape used by voiceserver_calls_total. We pre-build the known combinations rather than allocating per-call. Add more statuses here if dashboards need to slice on them.
Return type is map[string]string to fit the existing API; the same pre-built map is returned on every call for a given combination, so map-key dedupe inside the registry stays cheap.
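The pre-building idea can be sketched as a lookup table filled once at init. This is an illustration under stated assumptions, not the package's code: `prebuilt` and `labelsCall` are hypothetical names, the status list is taken from the CallEnded vocabulary above, and the allocate-on-miss fallback is a guess at reasonable behaviour.

```go
package main

import "fmt"

// prebuilt holds one shared label map per known (transport, status) pair.
var prebuilt = map[[2]string]map[string]string{}

func init() {
	for _, t := range []string{"sip", "webrtc"} {
		for _, s := range []string{"ok", "dialog-hangup", "ice-failed", "pipeline-error"} {
			prebuilt[[2]string{t, s}] = map[string]string{"transport": t, "status": s}
		}
	}
}

// labelsCall returns the shared map for known pairs. Callers must treat the
// result as read-only because it is reused across calls.
func labelsCall(transport, status string) map[string]string {
	if m, ok := prebuilt[[2]string{transport, status}]; ok {
		return m
	}
	// Assumed fallback: unknown pairs allocate a fresh map.
	return map[string]string{"transport": transport, "status": status}
}

func main() {
	fmt.Println(labelsCall("webrtc", "ice-failed"))
}
```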
func LabelsDialogOutcome ¶
func LabelsDialogOutcome(transport, outcome string) map[string]string
LabelsDialogOutcome is used by DialogReconnect — bounded set of outcomes per the original API contract.
func ObserveAsync ¶
func ObserveAsync(name, help string, v float64)
ObserveAsync queues a histogram sample on the global async drain. Hot-path safe: non-blocking, zero allocation, drops on full (incrementing the dropped-samples counter).
This is the recommended call for any observation that fires more than ~10x/sec per process. For one-off latencies (per turn, per call) the synchronous Default.Observe is fine and slightly more accurate (no buffering reorder concerns).
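The non-blocking, drop-on-full behaviour described above is commonly built on a buffered channel with a `select`/`default` send. The sketch below assumes that shape; `asyncQueue`, `offer`, and the buffer size are hypothetical and not taken from this package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sample is one queued histogram observation.
type sample struct {
	name string
	v    float64
}

// asyncQueue is a hypothetical stand-in for the package's async drain.
type asyncQueue struct {
	ch      chan sample
	dropped atomic.Uint64
}

func newAsyncQueue(size int) *asyncQueue {
	return &asyncQueue{ch: make(chan sample, size)}
}

// offer never blocks: when the buffer is full it bumps the dropped counter
// and returns false instead of waiting for the drain goroutine.
func (q *asyncQueue) offer(name string, v float64) bool {
	select {
	case q.ch <- sample{name, v}:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

func main() {
	q := newAsyncQueue(2) // tiny buffer so the drop path is visible
	for i := 0; i < 5; i++ {
		q.offer("voiceserver_e2e_first_byte_ms", float64(i))
	}
	fmt.Println("queued:", len(q.ch), "dropped:", q.dropped.Load())
	// prints "queued: 2 dropped: 3" because nothing drains the channel here
}
```

Sending the sample by value keeps the hot path allocation-free; a separate goroutine would range over `q.ch` and apply each sample to the registry.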
func ObserveE2EFirstByte ¶
func ObserveE2EFirstByte(ms int)
ObserveE2EFirstByte records the user-perceived latency from ASR final to first audible AI byte. Only meaningful values (>0) should be passed — 0 means "no ASR final preceded this turn" which shouldn't skew the distribution.
func ObserveLLMFirstByte ¶
func ObserveLLMFirstByte(ms int)
ObserveLLMFirstByte records the dialog app's reported time to first LLM token (ms). Comes from CommandMeta.LLMFirstMs on tts.speak.
func ObserveTTSFirstByte ¶
func ObserveTTSFirstByte(ms int)
ObserveTTSFirstByte records Speak -> first PCM frame latency (ms). Measures the TTS engine's cold-start / TTFB across all turns.
func RegisterLabels ¶
func RegisterLabels(metric string, keys ...string)
RegisterLabels declares the allowed label keys for a metric. Subsequent updates with extra keys will have those keys dropped (soft defense). Calling RegisterLabels twice for the same metric REPLACES the whitelist (last write wins) — intended for tests.
Safe to call from init().
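A minimal sketch of what a soft whitelist filter could look like: non-whitelisted keys are dropped and counted rather than rejected. `filterLabels` is a hypothetical helper, and the sketch does not cover what happens for metrics that have no registered whitelist.

```go
package main

import "fmt"

// filterLabels keeps only whitelisted keys and reports how many keys were
// dropped. Hypothetical helper; not this package's implementation.
func filterLabels(allowed map[string]bool, labels map[string]string) (map[string]string, int) {
	kept := make(map[string]string, len(labels))
	dropped := 0
	for k, v := range labels {
		if allowed[k] {
			kept[k] = v
		} else {
			dropped++ // would feed an unknown-label counter
		}
	}
	return kept, dropped
}

func main() {
	allowed := map[string]bool{"transport": true}
	kept, dropped := filterLabels(allowed, map[string]string{
		"transport": "sip",
		"call_id":   "abc123", // unbounded cardinality, not whitelisted
	})
	fmt.Println(kept, dropped)
}
```

Dropping the key instead of the whole update keeps the sample and bounds series cardinality at the same time.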
Types ¶
type Registry ¶
type Registry struct {
// contains filtered or unexported fields
}
Registry is the single source of truth for VoiceServer process-level metrics. A call-site imports the package, mutates the Default registry via helpers like IncCounter(), and a single HTTP handler serialises the registry to Prometheus text format on /metrics scrape.
func NewRegistry ¶
func NewRegistry() *Registry
NewRegistry returns an empty, ready-to-use registry.
func (*Registry) AddCounter ¶
func (r *Registry) AddCounter(name, help string, labels map[string]string, n uint64)
AddCounter adds `n` to the counter. Prometheus counters are monotonic, and because n is a uint64 a negative delta cannot be passed in the first place, so a buggy call site cannot decrease the series.
Labels are filtered through the cardinality whitelist registered via RegisterLabels (see labels.go). Unknown keys are dropped and reported via voiceserver_metrics_unknown_label_total (MetricUnknownLabelTotal).
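One way a label-keyed counter can combine a map lookup with an atomic add is sketched below. `counterVec` and `seriesKey` are hypothetical; the key encoding is deliberately naive (it would collide if label values contained `=` or `,`) and is not a claim about how this package keys its series.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"sync/atomic"
)

// counterVec is a hypothetical label-keyed counter family.
type counterVec struct {
	mu     sync.RWMutex
	series map[string]*atomic.Uint64
}

func newCounterVec() *counterVec {
	return &counterVec{series: make(map[string]*atomic.Uint64)}
}

// seriesKey flattens a label set into a stable string, independent of
// map iteration order.
func seriesKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k + "=" + labels[k] + ",")
	}
	return b.String()
}

// add takes the read lock on the fast path and the write lock only when a
// series is seen for the first time; the increment itself is atomic.
func (c *counterVec) add(labels map[string]string, n uint64) {
	key := seriesKey(labels)
	c.mu.RLock()
	ctr := c.series[key]
	c.mu.RUnlock()
	if ctr == nil {
		c.mu.Lock()
		if ctr = c.series[key]; ctr == nil { // re-check under write lock
			ctr = new(atomic.Uint64)
			c.series[key] = ctr
		}
		c.mu.Unlock()
	}
	ctr.Add(n)
}

func (c *counterVec) get(labels map[string]string) uint64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if ctr := c.series[seriesKey(labels)]; ctr != nil {
		return ctr.Load()
	}
	return 0
}

func main() {
	c := newCounterVec()
	sip := map[string]string{"transport": "sip"}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); c.add(sip, 2) }()
	}
	wg.Wait()
	fmt.Println(c.get(sip)) // prints 100
}
```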
func (*Registry) AddGauge ¶
func (r *Registry) AddGauge(name, help string, labels map[string]string, v float64)
AddGauge increments (v > 0) / decrements (v < 0) a gauge atomically. Labels run through the cardinality whitelist (see labels.go).
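Go has no atomic float64 add, so an atomic gauge is usually built as a compare-and-swap loop over the value's IEEE-754 bits. The sketch below shows that general technique; `floatGauge` is a hypothetical type and the package may implement AddGauge differently.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// floatGauge stores a float64 as its bit pattern inside an atomic uint64.
// The zero value represents 0.0.
type floatGauge struct{ bits atomic.Uint64 }

// Add applies delta with a CAS loop: read, compute, and retry if another
// goroutine changed the value in between.
func (g *floatGauge) Add(delta float64) {
	for {
		old := g.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if g.bits.CompareAndSwap(old, next) {
			return
		}
	}
}

func (g *floatGauge) Value() float64 {
	return math.Float64frombits(g.bits.Load())
}

func main() {
	var g floatGauge
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Add(1)  // e.g. a call became live
			g.Add(-1) // e.g. the same call ended
		}()
	}
	wg.Wait()
	fmt.Println(g.Value()) // prints 0
}
```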
func (*Registry) IncCounter ¶
func (r *Registry) IncCounter(name, help string, labels map[string]string)
IncCounter bumps a counter by 1. Safe to call from hot paths.
func (*Registry) Observe ¶
func (r *Registry) Observe(name, help string, v float64)
Observe records one sample into a histogram. The registry keeps at most a default number of the most recent observations (the `maxSamples` cap that ObserveN lets you override) to bound memory; older values are dropped in FIFO order. Quantiles are computed at scrape time from the live buffer, so a /metrics request is O(n log n) in buffer size, which is perfectly fine for n up to a few thousand.
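A minimal sketch of a bounded FIFO sample window with quantiles computed on read. `window`, `observe`, and `quantile` are hypothetical names, and the truncating index rule is one simple quantile estimator chosen for the example; the package's exact rule is not stated here.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// window keeps at most limit recent samples. Hypothetical type.
type window struct {
	mu      sync.RWMutex
	samples []float64
	limit   int
}

// observe appends v, dropping the oldest sample once the window is full.
func (w *window) observe(v float64) {
	if w.limit <= 0 {
		return
	}
	w.mu.Lock()
	defer w.mu.Unlock()
	if len(w.samples) >= w.limit {
		w.samples = w.samples[1:] // FIFO: discard the oldest
	}
	w.samples = append(w.samples, v)
}

// quantile copies the buffer under the read lock, then sorts the copy, so
// the O(n log n) work happens outside the lock.
func (w *window) quantile(q float64) float64 {
	w.mu.RLock()
	sorted := append([]float64(nil), w.samples...)
	w.mu.RUnlock()
	if len(sorted) == 0 {
		return 0
	}
	sort.Float64s(sorted)
	return sorted[int(q*float64(len(sorted)-1))] // truncating index
}

func main() {
	w := &window{limit: 5}
	for i := 1; i <= 8; i++ {
		w.observe(float64(i)) // window ends up holding 4..8
	}
	fmt.Println(w.quantile(0.5), w.quantile(1)) // prints 6 8
}
```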
func (*Registry) ObserveN ¶
func (r *Registry) ObserveN(name, help string, v float64, maxSamples int)
ObserveN is Observe with a custom buffer cap. Use when you want finer control over memory vs resolution (e.g. 8192 for a hot call latency signal you scrape every 10s).
func (*Registry) SetGauge ¶
func (r *Registry) SetGauge(name, help string, labels map[string]string, v float64)
SetGauge stores a value for a gauge. Labels run through the cardinality whitelist (see labels.go).
func (*Registry) WritePromText ¶
func (r *Registry) WritePromText(w io.Writer)
WritePromText serialises the registry in Prometheus text exposition format (v0.0.4). Safe to call concurrently with metric updates; snapshot is point-in-time per metric.
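For reference, a summary in the text exposition format is a TYPE line, one line per quantile, and conventionally `_sum` and `_count` lines. The sketch below writes that shape to an io.Writer; `writeSummary`, `summaryText`, the sample values, and the help string are hypothetical, and whether WritePromText emits `_sum` and `_count` is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// quantileValue pairs a quantile (0..1) with its estimated value.
type quantileValue struct {
	q, v float64
}

// writeSummary emits one summary metric in text exposition format.
// Hypothetical helper; not this package's serialiser.
func writeSummary(w io.Writer, name, help string, qs []quantileValue, sum float64, count uint64) {
	fmt.Fprintf(w, "# HELP %s %s\n", name, help)
	fmt.Fprintf(w, "# TYPE %s summary\n", name)
	for _, qv := range qs {
		fmt.Fprintf(w, "%s{quantile=\"%g\"} %g\n", name, qv.q, qv.v)
	}
	fmt.Fprintf(w, "%s_sum %g\n", name, sum)
	fmt.Fprintf(w, "%s_count %d\n", name, count)
}

// summaryText is a convenience wrapper that renders into a string.
func summaryText(name, help string, qs []quantileValue, sum float64, count uint64) string {
	var b strings.Builder
	writeSummary(&b, name, help, qs, sum, count)
	return b.String()
}

func main() {
	// Values are made up for the example.
	fmt.Print(summaryText("voiceserver_e2e_first_byte_ms", "E2E first byte latency.",
		[]quantileValue{{0.5, 120}, {0.9, 210}, {0.95, 260}, {0.99, 400}}, 1500, 10))
}
```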