Documentation
¶
Overview ¶
Package voice provides transport-agnostic voice capabilities for AI calls.
It is deliberately independent of WebSocket, WebRTC, SIP, or any other media transport. Transports feed PCM (or encoded audio decoded upstream) into a voice session and receive synthesized audio back through callbacks.
Layering:
protocol/voice/dialog session orchestration + Event/Command contract protocol/voice/gateway dialog-plane WebSocket client protocol/voice/webrtc browser WebRTC (HTTP SDP + SRTP/Opus) protocol/voice/xiaozhi xiaozhi-esp32 / browser WebSocket (pipeline + realtime) protocol/voice/transport per-call SessionFactory wiring protocol/voice/asr uplink pipeline components protocol/voice/tts downlink synthesis + cache
Uplink order matters: VAD runs on raw microphone PCM before echo suppression so barge-in still works while the recognizer feed is silenced. An optional Denoiser (RNNoise, WebRTC AEC3, hardware AEC) may run after decode and before VAD. PlaybackGate tracks streaming, queued TTS, and a post-playback tail for room-echo suppression when true AEC is unavailable.
Conversation logic (LLM, tools, business rules) lives outside this package. The dialog subpackage defines the event/command contract between the voice plane and an external Dialog application.
Typical integration:
sess, _ := dialog.NewSession(ctx, dialog.Config{
CallID: "call-1",
Engine: recognizerEngine,
TTSService: tts.FromSynthesisEngine(synth),
OnAudioOut: transport.SendDownlink,
OnEvent: dialogApp.HandleEvent,
})
sess.Start(ctx)
transport.OnUplink(func(pcm []byte) { sess.ProcessAudio(ctx, pcm) })
dialogApp.OnCommand(sess.HandleCommand)
Directories
¶
| Path | Synopsis |
|---|---|
|
Package webrtc terminates 1v1 WebRTC AI voice calls over HTTP SDP signaling.
|
Package webrtc terminates 1v1 WebRTC AI voice calls over HTTP SDP signaling. |
|
Package xiaozhi implements the xiaozhi-esp32 WebSocket voice protocol.
|
Package xiaozhi implements the xiaozhi-esp32 WebSocket voice protocol. |