Documentation
Overview
Package gateway provides a provider-agnostic interface for voice gateways.
A Gateway handles full-duplex voice calls via telephony providers like Twilio or Telnyx. It manages HTTP webhooks, WebSocket media streams, and voice processing pipelines.
Pipeline Modes
The gateway supports two pipeline modes:
Text Pipeline (PipelineModeText):
    Phone → Twilio WS → [STT → LLM (text) → TTS] → Twilio WS → Phone
    Latency: ~500-1000ms (STT + LLM + TTS)
Realtime Pipeline (PipelineModeRealtime):
    Phone → Twilio WS → [realtime.Provider] → Twilio WS → Phone
    Latency: ~100-200ms (native voice-to-voice)
Audio Format Conversion
Telephony providers typically use mulaw 8kHz, while realtime providers use:
- OpenAI Realtime: PCM16 24kHz mono
- Gemini Live: PCM16 16kHz input, 24kHz output
The gateway handles format conversion automatically.
Implementations
- github.com/plexusone/omni-twilio/omnivoice/gateway
- github.com/plexusone/omni-telnyx/omnivoice/gateway
- github.com/plexusone/omni-vonage/omnivoice/gateway
- github.com/plexusone/omni-plivo/omnivoice/gateway
- github.com/plexusone/omni-livekit/omnivoice/gateway
Index
- Variables
- type AudioConverter
- type AudioFormat
- type BridgeConfig
- type CallHandler
- type CallInfo
- type Config
- type Event
- type EventType
- type Gateway
- type LLMProvider
- type Metrics
- type ParticipantHandler
- type ParticipantInfo
- type PipelineMode
- type ProviderName
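- type RealtimeBridge
- func NewRealtimeBridge(cfg BridgeConfig) *RealtimeBridge
- func NewRealtimeBridgeForTwilio(provider realtime.Provider, config realtime.ProcessConfig) *RealtimeBridge
- func NewRealtimeBridgeForTwilioGemini(provider realtime.Provider, config realtime.ProcessConfig) *RealtimeBridge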
- func (b *RealtimeBridge) AudioOut() <-chan []byte
- func (b *RealtimeBridge) Close() error
- func (b *RealtimeBridge) Events() <-chan Event
- func (b *RealtimeBridge) Interrupt()
- func (b *RealtimeBridge) Metrics() Metrics
- func (b *RealtimeBridge) SendAudio(audio []byte) error
- func (b *RealtimeBridge) Start(ctx context.Context) error
- func (b *RealtimeBridge) Transcript() []Turn
- type RealtimeConfig
- func (c *RealtimeConfig) ToProcessConfig() realtime.ProcessConfig
- type RealtimeProviderFactory
- type RoomInfo
- type Session
- type ToolCall
- type Turn
- type WebRTCConfig
- type WebRTCGateway
- type WebRTCSession
Constants
This section is empty.
Variables
var (
	// AudioFormatTwilio is Twilio's native format (mulaw 8kHz mono).
	AudioFormatTwilio = format.Twilio
	// AudioFormatTelnyx is Telnyx's native format (mulaw 8kHz mono).
	AudioFormatTelnyx = format.Telnyx
	// AudioFormatOpenAI is OpenAI Realtime's format (PCM16 24kHz mono).
	AudioFormatOpenAI = format.OpenAI
	// AudioFormatGeminiInput is Gemini Live's input format (PCM16 16kHz mono).
	AudioFormatGeminiInput = format.GeminiInput
	// AudioFormatGeminiOutput is Gemini Live's output format (PCM16 24kHz mono).
	AudioFormatGeminiOutput = format.GeminiOutput
)
Common audio formats, re-exported from the format package for convenience.
Functions
This section is empty.
Types
type AudioConverter (added in v0.12.0)
type AudioConverter interface {
// Convert converts audio from the source format to the target format.
Convert(audio []byte, from, to format.AudioFormat) ([]byte, error)
}
AudioConverter converts audio between formats. Used to bridge telephony audio (mulaw 8kHz) with realtime providers (PCM16 16/24kHz).
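The reverse direction, carrying provider output back to the phone, can be sketched the same way. This example assumes little-endian PCM16 24kHz mono input (as from OpenAI Realtime) and produces mulaw 8kHz; it averages each group of three samples as a crude low-pass filter before decimating. The names linearToMulaw and pcm24kToMulaw8k are hypothetical, and a real AudioConverter streaming chunks would also need to carry leftover samples from one chunk to the next.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	mulawBias = 0x84  // G.711 encoder bias (132)
	mulawClip = 32635 // clip magnitude so magnitude+bias fits in 15 bits
)

// linearToMulaw encodes one 16-bit linear PCM sample as a G.711 mulaw byte.
func linearToMulaw(sample int16) byte {
	s := int(sample) // widen so -32768 can be negated safely
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > mulawClip {
		s = mulawClip
	}
	s += mulawBias
	// The exponent (segment) is the position of the highest set bit, minus 7.
	exponent := 7
	for mask := 0x4000; s&mask == 0 && exponent > 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F
	return ^byte(sign | exponent<<4 | mantissa) // mulaw bytes are bit-inverted
}

// pcm24kToMulaw8k converts little-endian PCM16 mono at 24kHz to mulaw at 8kHz.
// Each group of 3 input samples is averaged and encoded as one output byte;
// a trailing partial group is dropped.
func pcm24kToMulaw8k(pcm []byte) ([]byte, error) {
	if len(pcm)%2 != 0 {
		return nil, errors.New("odd PCM16 byte count")
	}
	n := len(pcm) / 2
	out := make([]byte, 0, n/3)
	for i := 0; i+3 <= n; i += 3 {
		sum := 0
		for j := 0; j < 3; j++ {
			sum += int(int16(binary.LittleEndian.Uint16(pcm[2*(i+j):])))
		}
		out = append(out, linearToMulaw(int16(sum/3)))
	}
	return out, nil
}

func main() {
	// Three full-scale 24kHz samples collapse to one 8kHz mulaw byte.
	pcm := make([]byte, 6)
	for i := 0; i < 3; i++ {
		binary.LittleEndian.PutUint16(pcm[2*i:], uint16(int16(32124)))
	}
	out, err := pcm24kToMulaw8k(pcm)
	fmt.Println(out, err) // [128] <nil>
}
```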
type AudioFormat (added in v0.12.0)
type AudioFormat = format.AudioFormat
AudioFormat is an alias for format.AudioFormat.
Deprecated: Use format.AudioFormat directly.
type BridgeConfig (added in v0.12.0)
type BridgeConfig struct {
// Provider is the realtime provider to use.
Provider realtime.Provider
// ProcessConfig is passed to the provider's ProcessAudioStream.
ProcessConfig realtime.ProcessConfig
// FromFormat is the telephony audio format (e.g., format.Twilio).
FromFormat format.AudioFormat
// ToFormat is the realtime provider's format (e.g., format.OpenAI).
ToFormat format.AudioFormat
// BufferSize is the size of audio channel buffers (default: 100).
BufferSize int
}
BridgeConfig configures a RealtimeBridge.
type CallHandler
CallHandler is called when a new call is received. Return nil to accept the call, or an error to reject it.
type Config
type Config struct {
// Server configuration
ListenAddr string // e.g., ":8080"
PublicURL string // e.g., "https://your-server.com"
// Pipeline mode selection
// Default: PipelineModeText if RealtimeProvider is nil
Mode PipelineMode
// Text pipeline configuration (used when Mode == PipelineModeText)
STTProvider string
STTAPIKey string
STTModel string
STTLanguage string
TTSProvider string
TTSAPIKey string
TTSVoiceID string
TTSModel string
LLMProvider string
LLMAPIKey string
LLMModel string
LLMSystemPrompt string
// Realtime pipeline configuration (used when Mode == PipelineModeRealtime)
// Provide either RealtimeProvider directly, or RealtimeConfig to create one.
RealtimeProvider realtime.Provider
RealtimeConfig *RealtimeConfig
// Session configuration
MaxSessionDuration time.Duration
InterruptionMode string // "immediate", "after_sentence", "disabled"
// Logging
Logger *slog.Logger
}
Config provides common configuration for voice gateways.
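The Mode field's comment implies a defaulting rule. Here is a minimal sketch of how an implementation might resolve an unset Mode, using local stand-in types (in the real Config, RealtimeProvider is a realtime.Provider). Treating a non-nil RealtimeConfig as also selecting realtime mode is an assumption drawn from the "provide either" comment, not documented behavior; resolveMode is a hypothetical helper.

```go
package main

import "fmt"

type PipelineMode string

const (
	PipelineModeText     PipelineMode = "text"
	PipelineModeRealtime PipelineMode = "realtime"
)

// Stand-ins for the package types; only the fields used here are mirrored.
type realtimeProvider interface{}

type RealtimeConfig struct{ Provider, APIKey string }

type Config struct {
	Mode             PipelineMode
	RealtimeProvider realtimeProvider
	RealtimeConfig   *RealtimeConfig
}

// resolveMode returns cfg.Mode if it is set. Otherwise it selects the realtime
// pipeline when a provider (or, by assumption, a RealtimeConfig) is present,
// and falls back to the text pipeline.
func resolveMode(cfg Config) PipelineMode {
	if cfg.Mode != "" {
		return cfg.Mode
	}
	if cfg.RealtimeProvider != nil || cfg.RealtimeConfig != nil {
		return PipelineModeRealtime
	}
	return PipelineModeText
}

func main() {
	fmt.Println(resolveMode(Config{}))                                                    // text
	fmt.Println(resolveMode(Config{RealtimeConfig: &RealtimeConfig{Provider: "openai"}})) // realtime
}
```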
type Event
type Event struct {
Type EventType `json:"type"`
Timestamp time.Time `json:"timestamp"`
Data any `json:"data,omitempty"`
Error error `json:"error,omitempty"`
}
Event represents a session event.
type EventType
type EventType string
EventType identifies the type of session event.
const (
	EventSessionStarted   EventType = "session_started"
	EventSessionEnded     EventType = "session_ended"
	EventUserSpeechStart  EventType = "user_speech_start"
	EventUserSpeechEnd    EventType = "user_speech_end"
	EventUserTranscript   EventType = "user_transcript"
	EventAgentThinking    EventType = "agent_thinking"
	EventAgentSpeechStart EventType = "agent_speech_start"
	EventAgentSpeechEnd   EventType = "agent_speech_end"
	EventAgentTranscript  EventType = "agent_transcript"
	EventToolCall         EventType = "tool_call"
	EventInterruption     EventType = "interruption"
	EventError            EventType = "error"
	EventAudioReceived    EventType = "audio_received"
	EventAudioSent        EventType = "audio_sent"
)
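Session.Events and WebRTCSession.Events deliver these values on a receive-only channel. Below is a minimal consumer sketch with local copies of Event and a subset of the EventType constants so it compiles on its own. The payload type carried in Data for transcript events is not documented here, so the sketch formats it with %v; it stops on EventSessionEnded or when the channel is closed, since whether implementations close the channel is not stated. The drain helper is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

type EventType string

const (
	EventSessionEnded    EventType = "session_ended"
	EventUserTranscript  EventType = "user_transcript"
	EventAgentTranscript EventType = "agent_transcript"
	EventError           EventType = "error"
)

type Event struct {
	Type      EventType
	Timestamp time.Time
	Data      any
	Error     error
}

// drain reads events until the session ends or the channel is closed,
// collecting transcript lines and counting error events.
func drain(events <-chan Event) (lines []string, errCount int) {
	for ev := range events {
		switch ev.Type {
		case EventUserTranscript:
			lines = append(lines, fmt.Sprintf("user: %v", ev.Data))
		case EventAgentTranscript:
			lines = append(lines, fmt.Sprintf("agent: %v", ev.Data))
		case EventError:
			errCount++
		case EventSessionEnded:
			return lines, errCount
		}
	}
	return lines, errCount
}

func main() {
	ch := make(chan Event, 4)
	ch <- Event{Type: EventUserTranscript, Data: "hi"}
	ch <- Event{Type: EventAgentTranscript, Data: "hello, how can I help?"}
	ch <- Event{Type: EventSessionEnded}
	lines, errs := drain(ch)
	fmt.Println(lines, errs)
}
```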
type Gateway
type Gateway interface {
// Name returns the provider name.
Name() ProviderName
// Start starts the gateway server.
Start(ctx context.Context) error
// Stop gracefully shuts down the gateway.
Stop() error
// OnCall sets the handler for incoming calls.
OnCall(handler CallHandler)
// MakeCall initiates an outbound call.
MakeCall(ctx context.Context, to string) (Session, error)
// GetSession retrieves an active session by call ID.
GetSession(callID string) (Session, bool)
// ListSessions returns all active sessions.
ListSessions() []Session
}
Gateway defines the interface for voice gateway providers. Implementations handle provider-specific webhooks, media streams, and call control.
type LLMProvider
type LLMProvider interface {
// Generate produces a response given user input and conversation history.
Generate(ctx context.Context, input string, history []Turn) (response string, toolCalls []ToolCall, err error)
}
LLMProvider defines the interface for LLM integration with voice gateways. Used in text pipeline mode (PipelineModeText).
type Metrics
type Metrics struct {
SessionDurationMs int `json:"session_duration_ms"`
TurnCount int `json:"turn_count"`
UserSpeechDurationMs int `json:"user_speech_duration_ms"`
AgentSpeechDurationMs int `json:"agent_speech_duration_ms"`
AvgSTTLatencyMs int `json:"avg_stt_latency_ms"`
AvgLLMLatencyMs int `json:"avg_llm_latency_ms"`
AvgTTSLatencyMs int `json:"avg_tts_latency_ms"`
AvgTotalLatencyMs int `json:"avg_total_latency_ms"`
InterruptionCount int `json:"interruption_count"`
ToolCallCount int `json:"tool_call_count"`
ErrorCount int `json:"error_count"`
AudioBytesReceived int64 `json:"audio_bytes_received"`
AudioBytesSent int64 `json:"audio_bytes_sent"`
}
Metrics contains session performance metrics.
type ParticipantHandler
type ParticipantHandler func(participant *ParticipantInfo) error
ParticipantHandler is called when a participant joins the room. Return nil to accept the participant, or an error to ignore them.
type ParticipantInfo
type ParticipantInfo struct {
// Identity is the unique identifier for the participant.
Identity string
// DisplayName is the human-readable name.
DisplayName string
// RoomName is the room the participant joined.
RoomName string
// Metadata is optional JSON metadata attached to the participant.
Metadata string
// JoinedAt is when the participant joined.
JoinedAt time.Time
}
ParticipantInfo contains information about a participant.
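A ParticipantHandler decides which joining participants the agent will talk to. Here is a hypothetical allow-list handler built on a local copy of ParticipantInfo; the "user-" identity prefix is an illustrative convention, not something the gateway requires.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

type ParticipantInfo struct {
	Identity    string
	DisplayName string
	RoomName    string
	Metadata    string
	JoinedAt    time.Time
}

type ParticipantHandler func(participant *ParticipantInfo) error

// allowPrefix returns a handler that accepts participants whose identity
// starts with prefix and ignores everyone else (for example, other agents).
func allowPrefix(prefix string) ParticipantHandler {
	return func(p *ParticipantInfo) error {
		if p == nil {
			return errors.New("nil participant")
		}
		if !strings.HasPrefix(p.Identity, prefix) {
			return fmt.Errorf("ignoring participant %q", p.Identity)
		}
		return nil
	}
}

func main() {
	h := allowPrefix("user-")
	fmt.Println(h(&ParticipantInfo{Identity: "user-42", RoomName: "support"}))
	fmt.Println(h(&ParticipantInfo{Identity: "agent-7", RoomName: "support"}))
}
```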
type PipelineMode (added in v0.12.0)
type PipelineMode string
PipelineMode determines how audio is processed.
const (
	// PipelineModeText uses the traditional STT → LLM → TTS pipeline.
	// Higher latency (~500-1000ms) but works with any LLM.
	PipelineModeText PipelineMode = "text"
	// PipelineModeRealtime uses native voice-to-voice via realtime.Provider.
	// Lower latency (~100-200ms) but requires OpenAI Realtime or Gemini Live.
	PipelineModeRealtime PipelineMode = "realtime"
)
type ProviderName
type ProviderName string
ProviderName identifies a voice gateway provider.
const (
	// ProviderTwilio uses Twilio Media Streams.
	ProviderTwilio ProviderName = "twilio"
	// ProviderTelnyx uses Telnyx Media Streaming.
	ProviderTelnyx ProviderName = "telnyx"
	// ProviderVonage uses Vonage Voice WebSocket.
	ProviderVonage ProviderName = "vonage"
	// ProviderPlivo uses Plivo Audio Streaming.
	ProviderPlivo ProviderName = "plivo"
	// ProviderLiveKit uses LiveKit WebRTC.
	ProviderLiveKit ProviderName = "livekit"
)
type RealtimeBridge (added in v0.12.0)
type RealtimeBridge struct {
	// contains filtered or unexported fields
}
RealtimeBridge bridges telephony audio streams with a realtime.Provider. It handles audio format conversion and event routing.
func NewRealtimeBridge (added in v0.12.0)
func NewRealtimeBridge(cfg BridgeConfig) *RealtimeBridge
NewRealtimeBridge creates a new bridge between telephony and a realtime provider.
func NewRealtimeBridgeForTwilio (added in v0.12.0)
func NewRealtimeBridgeForTwilio(provider realtime.Provider, config realtime.ProcessConfig) *RealtimeBridge
NewRealtimeBridgeForTwilio creates a bridge configured for Twilio + OpenAI.
func NewRealtimeBridgeForTwilioGemini (added in v0.12.0)
func NewRealtimeBridgeForTwilioGemini(provider realtime.Provider, config realtime.ProcessConfig) *RealtimeBridge
NewRealtimeBridgeForTwilioGemini creates a bridge configured for Twilio + Gemini.
func (*RealtimeBridge) AudioOut (added in v0.12.0)
func (b *RealtimeBridge) AudioOut() <-chan []byte
AudioOut returns the channel that delivers the provider's response audio, converted to the telephony format for playback to the caller.
func (*RealtimeBridge) Close (added in v0.12.0)
func (b *RealtimeBridge) Close() error
Close stops the bridge and releases resources.
func (*RealtimeBridge) Events (added in v0.12.0)
func (b *RealtimeBridge) Events() <-chan Event
Events returns the channel for session events.
func (*RealtimeBridge) Interrupt (added in v0.12.0)
func (b *RealtimeBridge) Interrupt()
Interrupt cancels the current agent response.
func (*RealtimeBridge) Metrics (added in v0.12.0)
func (b *RealtimeBridge) Metrics() Metrics
Metrics returns the bridge performance metrics.
func (*RealtimeBridge) SendAudio (added in v0.12.0)
func (b *RealtimeBridge) SendAudio(audio []byte) error
SendAudio sends telephony audio to the bridge for processing.
func (*RealtimeBridge) Start (added in v0.12.0)
func (b *RealtimeBridge) Start(ctx context.Context) error
Start begins processing audio through the bridge. Once it returns without error, send telephony audio with SendAudio and read response audio from AudioOut.
func (*RealtimeBridge) Transcript (added in v0.12.0)
func (b *RealtimeBridge) Transcript() []Turn
Transcript returns the conversation transcript.
type RealtimeConfig (added in v0.12.0)
type RealtimeConfig struct {
// Provider is the realtime provider name ("openai" or "gemini").
Provider string `json:"provider"`
// APIKey is the API key for the realtime provider.
APIKey string `json:"api_key"`
// Model is the model to use (e.g., "gpt-4o-realtime-preview-2024-12-17").
Model string `json:"model,omitempty"`
// Voice is the voice for audio output (e.g., "alloy", "Puck").
Voice string `json:"voice,omitempty"`
// Instructions is the system prompt for the conversation.
Instructions string `json:"instructions,omitempty"`
// Functions are tools the model can call during conversation.
Functions []realtime.FunctionDeclaration `json:"functions,omitempty"`
// OnFunctionCall handles function calls from the model.
// If nil, function calls are ignored.
OnFunctionCall func(id, name, args string) (result any, err error) `json:"-"`
// Temperature controls response randomness (0-2).
Temperature float64 `json:"temperature,omitempty"`
}
RealtimeConfig configures a realtime provider for voice-to-voice.
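OnFunctionCall receives the call ID, function name, and arguments as strings. Assuming the arguments arrive as a JSON object (the usual convention for OpenAI Realtime and Gemini Live function calling, though not stated here), a handler can decode them and dispatch by name. The get_weather function and its result shape are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// onFunctionCall has the same signature as RealtimeConfig.OnFunctionCall.
func onFunctionCall(id, name, args string) (any, error) {
	switch name {
	case "get_weather":
		var params struct {
			City string `json:"city"`
		}
		if err := json.Unmarshal([]byte(args), &params); err != nil {
			return nil, fmt.Errorf("call %s: bad arguments: %w", id, err)
		}
		if params.City == "" {
			return nil, fmt.Errorf("call %s: city is required", id)
		}
		// A real handler would query a weather service here.
		return map[string]any{"city": params.City, "forecast": "sunny"}, nil
	default:
		return nil, fmt.Errorf("call %s: unknown function %q", id, name)
	}
}

func main() {
	res, err := onFunctionCall("call_1", "get_weather", `{"city":"Paris"}`)
	fmt.Println(res, err) // map[city:Paris forecast:sunny] <nil>
}
```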
func (*RealtimeConfig) ToProcessConfig (added in v0.12.0)
func (c *RealtimeConfig) ToProcessConfig() realtime.ProcessConfig
ToProcessConfig converts RealtimeConfig to realtime.ProcessConfig.
type RealtimeProviderFactory (added in v0.12.0)
type RealtimeProviderFactory interface {
// Create creates a realtime provider from the given configuration.
Create(config *RealtimeConfig) (realtime.Provider, error)
// Name returns the provider name (e.g., "openai", "gemini").
Name() string
}
RealtimeProviderFactory creates realtime providers from configuration. Implementations should be registered for each supported provider.
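The comment suggests registering one factory per provider name. Here is a minimal registry sketch under that reading. Provider is an empty stand-in for realtime.Provider, RealtimeConfig mirrors only the fields used, and the register and create helpers and fakeFactory are hypothetical, not part of this package.

```go
package main

import (
	"fmt"
	"sync"
)

type Provider interface{} // stand-in for realtime.Provider

type RealtimeConfig struct {
	Provider string
	APIKey   string
}

type RealtimeProviderFactory interface {
	Create(config *RealtimeConfig) (Provider, error)
	Name() string
}

var (
	mu        sync.RWMutex
	factories = map[string]RealtimeProviderFactory{}
)

// register adds a factory under its Name, rejecting duplicates.
func register(f RealtimeProviderFactory) error {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := factories[f.Name()]; dup {
		return fmt.Errorf("factory %q already registered", f.Name())
	}
	factories[f.Name()] = f
	return nil
}

// create looks up the factory named by cfg.Provider and builds a provider.
func create(cfg *RealtimeConfig) (Provider, error) {
	mu.RLock()
	f, ok := factories[cfg.Provider]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no realtime factory for %q", cfg.Provider)
	}
	return f.Create(cfg)
}

// fakeFactory is a hypothetical factory used only for illustration.
type fakeFactory struct{ name string }

func (f fakeFactory) Name() string { return f.name }

func (f fakeFactory) Create(cfg *RealtimeConfig) (Provider, error) {
	if cfg.APIKey == "" {
		return nil, fmt.Errorf("%s: missing API key", f.name)
	}
	return struct{}{}, nil
}

func main() {
	_ = register(fakeFactory{name: "openai"})
	_, err := create(&RealtimeConfig{Provider: "openai", APIKey: "sk-test"})
	fmt.Println(err)
	_, err = create(&RealtimeConfig{Provider: "gemini"})
	fmt.Println(err)
}
```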
type RoomInfo
type RoomInfo struct {
// Name is the room identifier.
Name string
// ParticipantCount is the number of participants in the room.
ParticipantCount int
// CreatedAt is when the room was created.
CreatedAt time.Time
// Metadata is optional JSON metadata attached to the room.
Metadata string
}
RoomInfo contains information about a WebRTC room.
type Session
type Session interface {
// ID returns the session identifier.
ID() string
// From returns the caller phone number.
From() string
// To returns the called phone number.
To() string
// Direction returns "inbound" or "outbound".
Direction() string
// StartTime returns when the session started.
StartTime() time.Time
// Duration returns the session duration.
Duration() time.Duration
// Events returns a channel for session events.
Events() <-chan Event
// Transcript returns the conversation transcript.
Transcript() []Turn
// Metrics returns session performance metrics.
Metrics() Metrics
// SendText sends text input to the agent (bypasses STT).
SendText(text string) error
// Interrupt stops the current agent speech.
Interrupt()
// Close ends the session.
Close() error
}
Session represents an active voice conversation session.
type ToolCall
type ToolCall struct {
Name string `json:"name"`
Arguments map[string]any `json:"arguments"`
Result string `json:"result"`
Error string `json:"error,omitempty"`
DurationMs int `json:"duration_ms"`
}
ToolCall represents a tool invocation during conversation.
type Turn
type Turn struct {
Role string `json:"role"` // "user" or "agent"
Text string `json:"text"`
Timestamp time.Time `json:"timestamp"`
DurationMs int `json:"duration_ms"`
ToolCalls []ToolCall `json:"tool_calls,omitempty"`
}
Turn represents a single conversation turn.
type WebRTCConfig
type WebRTCConfig struct {
// Server URL (e.g., "wss://your-app.livekit.cloud")
ServerURL string
// API credentials
APIKey string
APISecret string
// Room configuration
RoomName string
AgentIdentity string
AgentName string
// Audio configuration
SampleRate int // 16000 or 24000 (default: 24000)
Channels int // 1 for mono (default: 1)
// Voice pipeline configuration
STTProvider string
STTAPIKey string
STTModel string
STTLanguage string
TTSProvider string
TTSAPIKey string
TTSVoiceID string
TTSModel string
LLMProvider string
LLMAPIKey string
LLMModel string
LLMSystemPrompt string
// Session configuration
MaxSessionDuration time.Duration
InterruptionMode string // "immediate", "after_sentence", "disabled"
}
WebRTCConfig provides common configuration for WebRTC gateways.
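The audio fields carry documented defaults (SampleRate 24000, Channels 1) and a restricted set of sample rates. Below is a sketch of applying and checking them on a local struct that mirrors only those two fields. The helper names are hypothetical, and rejecting other sample rates with an error is an assumed policy.

```go
package main

import "fmt"

// webRTCAudio mirrors only the audio fields of WebRTCConfig.
type webRTCAudio struct {
	SampleRate int // 16000 or 24000
	Channels   int // 1 for mono
}

// applyAudioDefaults fills zero values with the documented defaults and
// rejects sample rates other than the two documented options.
func applyAudioDefaults(c *webRTCAudio) error {
	if c.SampleRate == 0 {
		c.SampleRate = 24000
	}
	if c.Channels == 0 {
		c.Channels = 1
	}
	if c.SampleRate != 16000 && c.SampleRate != 24000 {
		return fmt.Errorf("unsupported sample rate %d (want 16000 or 24000)", c.SampleRate)
	}
	return nil
}

// frameSamples returns the number of samples in a frame of the given
// duration in milliseconds, counting every channel.
func frameSamples(c webRTCAudio, ms int) int {
	return c.SampleRate * ms / 1000 * c.Channels
}

func main() {
	c := webRTCAudio{}
	err := applyAudioDefaults(&c)
	fmt.Println(c.SampleRate, c.Channels, err) // 24000 1 <nil>
	fmt.Println(frameSamples(c, 20))           // 480
}
```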
type WebRTCGateway
type WebRTCGateway interface {
// Name returns the provider name.
Name() ProviderName
// Start connects to the room and starts handling participants.
Start(ctx context.Context) error
// Stop disconnects from the room and cleans up.
Stop() error
// OnParticipantJoined sets the handler for when participants join.
// Return nil to accept the participant, or an error to ignore them.
OnParticipantJoined(handler ParticipantHandler)
// JoinRoom joins a specific room. Some implementations may join
// automatically on Start() based on configuration.
JoinRoom(ctx context.Context, roomName string) error
// LeaveRoom leaves the current room.
LeaveRoom() error
// CurrentRoom returns the name of the currently joined room, or empty string.
CurrentRoom() string
// GetSession retrieves an active session by participant identity.
GetSession(participantID string) (WebRTCSession, bool)
// ListSessions returns all active sessions.
ListSessions() []WebRTCSession
// GenerateClientToken creates a JWT token for a client to join a room.
GenerateClientToken(roomName, identity, displayName string) (string, error)
}
WebRTCGateway defines the interface for WebRTC-based voice gateways. Unlike PSTN gateways that handle phone calls, WebRTC gateways handle browser and mobile app connections via rooms and participants.
Implementations:
- github.com/plexusone/omni-livekit/omnivoice/gateway
Key differences from Gateway (PSTN):
- No phone numbers; uses room names and participant identities
- No webhooks; direct WebRTC signaling
- Agent joins a room and waits for participants (vs answering calls)
- Lower latency (<200ms vs 500ms+ for PSTN)
type WebRTCSession
type WebRTCSession interface {
// ID returns the session identifier (typically participant identity).
ID() string
// Participant returns information about the remote participant.
Participant() *ParticipantInfo
// RoomName returns the room this session is in.
RoomName() string
// AgentIdentity returns the identity of the AI agent.
AgentIdentity() string
// StartTime returns when the session started.
StartTime() time.Time
// Duration returns the session duration.
Duration() time.Duration
// Events returns a channel for session events.
// Uses the same Event type as PSTN sessions.
Events() <-chan Event
// Transcript returns the conversation transcript.
Transcript() []Turn
// Metrics returns session performance metrics.
Metrics() Metrics
// SendText sends text input to the agent (bypasses STT).
SendText(text string) error
// SendAudio sends PCM16 audio samples to the participant.
// This is the raw audio output method for WebRTC.
SendAudio(samples []int16) error
// Interrupt stops the current agent speech.
Interrupt()
// Close ends the session.
Close() error
}
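WebRTCSession represents an active voice conversation with a WebRTC participant. It mirrors Session, but identifies the remote party by room and participant identity rather than by phone numbers, and adds SendAudio for writing raw PCM16 samples to the participant.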
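WebRTCSession.SendAudio takes []int16 samples, whereas RealtimeBridge works with []byte, so audio from a byte-oriented source has to be unpacked first. The sketch below assumes little-endian PCM16 mono and 20ms frames (480 samples at 24kHz); the sampleSink interface is a hypothetical stand-in for a WebRTCSession, and the frame size is an illustrative choice rather than a documented requirement.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sampleSink is the subset of WebRTCSession used here.
type sampleSink interface {
	SendAudio(samples []int16) error
}

// bytesToPCM16 unpacks little-endian PCM16 bytes into samples.
func bytesToPCM16(b []byte) ([]int16, error) {
	if len(b)%2 != 0 {
		return nil, errors.New("odd PCM16 byte count")
	}
	out := make([]int16, len(b)/2)
	for i := range out {
		out[i] = int16(binary.LittleEndian.Uint16(b[2*i:]))
	}
	return out, nil
}

// sendFramed writes samples to the sink in frames of frameLen samples;
// the final frame may be shorter.
func sendFramed(s sampleSink, samples []int16, frameLen int) error {
	if frameLen <= 0 {
		return errors.New("frameLen must be positive")
	}
	for start := 0; start < len(samples); start += frameLen {
		end := start + frameLen
		if end > len(samples) {
			end = len(samples)
		}
		if err := s.SendAudio(samples[start:end]); err != nil {
			return err
		}
	}
	return nil
}

// recorder is a fake sink that records the size of each frame it receives.
type recorder struct{ frames []int }

func (r *recorder) SendAudio(samples []int16) error {
	r.frames = append(r.frames, len(samples))
	return nil
}

func main() {
	pcm := make([]byte, 1000*2) // 1000 silent samples at 24kHz
	samples, err := bytesToPCM16(pcm)
	if err != nil {
		panic(err)
	}
	rec := &recorder{}
	_ = sendFramed(rec, samples, 480) // 20ms frames at 24kHz
	fmt.Println(rec.frames)          // [480 480 40]
}
```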