gateway

package
v0.14.0
Published: Jun 15, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package gateway provides a provider-agnostic interface for voice gateways.

A Gateway handles full-duplex voice calls via telephony providers like Twilio or Telnyx. It manages HTTP webhooks, WebSocket media streams, and voice processing pipelines.

Pipeline Modes

The gateway supports two pipeline modes:

Text Pipeline (PipelineModeText):

Phone → Twilio WS → [STT → LLM (text) → TTS] → Twilio WS → Phone
Latency: ~500-1000ms (STT + LLM + TTS)

Realtime Pipeline (PipelineModeRealtime):

Phone → Twilio WS → [realtime.Provider] → Twilio WS → Phone
Latency: ~100-200ms (native voice-to-voice)

Audio Format Conversion

Telephony providers typically use mulaw 8kHz, while realtime providers use:

  • OpenAI Realtime: PCM16 24kHz mono
  • Gemini Live: PCM16 16kHz input, 24kHz output

The gateway handles format conversion automatically.
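
To make the conversion step concrete, here is a minimal sketch of decoding mu-law 8kHz audio and upsampling it to PCM16 24kHz. It is not the gateway's own converter: the function names are hypothetical, the output is assumed to be little-endian, and the linear interpolation is a simplification (a production resampler would normally apply a low-pass filter).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// muLawDecode expands one G.711 mu-law byte into a 16-bit linear PCM sample.
func muLawDecode(u byte) int16 {
	u = ^u // mu-law bytes are transmitted bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := int(u & 0x0F)
	// Rebuild the magnitude: add the bias (0x84), scale by the segment, remove the bias.
	sample := (((mantissa << 3) + 0x84) << exponent) - 0x84
	if sign != 0 {
		sample = -sample
	}
	return int16(sample)
}

// mulaw8kToPCM16At24k (hypothetical name) decodes mu-law 8kHz audio and
// upsamples it 3x with linear interpolation, returning little-endian PCM16.
func mulaw8kToPCM16At24k(in []byte) []byte {
	const factor = 3 // 24000 / 8000
	out := make([]byte, 0, len(in)*factor*2)
	var buf [2]byte
	for i := range in {
		cur := int(muLawDecode(in[i]))
		next := cur // hold the final sample at the end of the buffer
		if i+1 < len(in) {
			next = int(muLawDecode(in[i+1]))
		}
		for k := 0; k < factor; k++ {
			s := cur + (next-cur)*k/factor
			binary.LittleEndian.PutUint16(buf[:], uint16(int16(s)))
			out = append(out, buf[:]...)
		}
	}
	return out
}

func main() {
	frame := []byte{0xFF, 0x80, 0x00} // silence, positive peak, negative peak
	pcm := mulaw8kToPCM16At24k(frame)
	fmt.Println(len(frame), "mu-law bytes ->", len(pcm), "PCM16 bytes")
}
```

Each mu-law byte becomes three 16-bit samples, so the output is six times the size of the input in bytes.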

Implementations

  • github.com/plexusone/omni-twilio/omnivoice/gateway
  • github.com/plexusone/omni-telnyx/omnivoice/gateway
  • github.com/plexusone/omni-vonage/omnivoice/gateway
  • github.com/plexusone/omni-plivo/omnivoice/gateway
  • github.com/plexusone/omni-livekit/omnivoice/gateway

Index

Constants

This section is empty.

Variables

var (
	// AudioFormatTwilio is Twilio's native format (mulaw 8kHz mono).
	AudioFormatTwilio = format.Twilio

	// AudioFormatTelnyx is Telnyx's native format (mulaw 8kHz mono).
	AudioFormatTelnyx = format.Telnyx

	// AudioFormatOpenAI is OpenAI Realtime's format (PCM16 24kHz mono).
	AudioFormatOpenAI = format.OpenAI

	// AudioFormatGeminiInput is Gemini Live's input format (PCM16 16kHz mono).
	AudioFormatGeminiInput = format.GeminiInput

	// AudioFormatGeminiOutput is Gemini Live's output format (PCM16 24kHz mono).
	AudioFormatGeminiOutput = format.GeminiOutput
)

Common audio formats, re-exported from the format package for convenience.

Functions

This section is empty.

Types

type AudioConverter added in v0.12.0

type AudioConverter interface {
	// Convert converts audio from the source format to the target format.
	Convert(audio []byte, from, to format.AudioFormat) ([]byte, error)
}

AudioConverter converts audio between formats. Used to bridge telephony audio (mulaw 8kHz) with realtime providers (PCM16 16/24kHz).
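
As an illustration of the interface shape, the sketch below implements a converter that only handles the trivial case where source and target formats match. The AudioFormat struct here is a hypothetical stand-in, since the fields of format.AudioFormat are not shown on this page, and passthroughConverter is not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// AudioFormat is a hypothetical stand-in for format.AudioFormat.
type AudioFormat struct {
	Encoding   string
	SampleRate int
}

// AudioConverter mirrors the documented interface, using the stand-in type.
type AudioConverter interface {
	Convert(audio []byte, from, to AudioFormat) ([]byte, error)
}

var errUnsupported = errors.New("unsupported conversion")

// passthroughConverter copies audio when the formats match and refuses otherwise.
type passthroughConverter struct{}

func (passthroughConverter) Convert(audio []byte, from, to AudioFormat) ([]byte, error) {
	if from != to {
		return nil, fmt.Errorf("%w: %+v -> %+v", errUnsupported, from, to)
	}
	out := make([]byte, len(audio))
	copy(out, audio) // return a copy so callers can reuse their buffer
	return out, nil
}

func main() {
	var c AudioConverter = passthroughConverter{}
	mulaw := AudioFormat{Encoding: "mulaw", SampleRate: 8000}
	pcm := AudioFormat{Encoding: "pcm16", SampleRate: 24000}

	out, err := c.Convert([]byte{1, 2, 3}, mulaw, mulaw)
	fmt.Println(len(out), err)

	_, err = c.Convert([]byte{1, 2, 3}, mulaw, pcm)
	fmt.Println(err)
}
```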

type AudioFormat added in v0.12.0

type AudioFormat = format.AudioFormat

AudioFormat is an alias for format.AudioFormat.

Deprecated: Use format.AudioFormat directly.

type BridgeConfig added in v0.12.0

type BridgeConfig struct {
	// Provider is the realtime provider to use.
	Provider realtime.Provider

	// ProcessConfig is passed to the provider's ProcessAudioStream.
	ProcessConfig realtime.ProcessConfig

	// FromFormat is the telephony audio format (e.g., format.Twilio).
	FromFormat format.AudioFormat

	// ToFormat is the realtime provider's format (e.g., format.OpenAI).
	ToFormat format.AudioFormat

	// BufferSize is the size of audio channel buffers (default: 100).
	BufferSize int
}

BridgeConfig configures a RealtimeBridge.

type CallHandler

type CallHandler func(call *CallInfo) error

CallHandler is called when a new call is received. Return nil to accept the call, or an error to reject it.

type CallInfo

type CallInfo struct {
	CallID    string
	From      string
	To        string
	Direction string
	StartTime time.Time
}

CallInfo contains information about a call.
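
A CallHandler might look like the following sketch, which rejects callers on a blocklist. CallInfo and CallHandler are copied locally so the example compiles on its own; blocklistHandler and the phone numbers are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// CallInfo and CallHandler are local copies of the documented types.
type CallInfo struct {
	CallID    string
	From      string
	To        string
	Direction string
	StartTime time.Time
}

type CallHandler func(call *CallInfo) error

// blocklistHandler (hypothetical) returns a CallHandler that rejects any
// caller whose number appears in blocked and accepts everyone else.
func blocklistHandler(blocked map[string]bool) CallHandler {
	return func(call *CallInfo) error {
		if blocked[call.From] {
			return errors.New("caller is blocked: " + call.From)
		}
		return nil // nil accepts the call
	}
}

func main() {
	h := blocklistHandler(map[string]bool{"+15550100": true})
	fmt.Println(h(&CallInfo{CallID: "a", From: "+15550100", StartTime: time.Now()}))
	fmt.Println(h(&CallInfo{CallID: "b", From: "+15550199", StartTime: time.Now()}))
}
```

With a real implementation, a handler like this would be passed to Gateway.OnCall.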

type Config

type Config struct {
	// Server configuration
	ListenAddr string // e.g., ":8080"
	PublicURL  string // e.g., "https://your-server.com"

	// Pipeline mode selection
	// Default: PipelineModeText if RealtimeProvider is nil
	Mode PipelineMode

	// Text pipeline configuration (used when Mode == PipelineModeText)
	STTProvider string
	STTAPIKey   string
	STTModel    string
	STTLanguage string

	TTSProvider string
	TTSAPIKey   string
	TTSVoiceID  string
	TTSModel    string

	LLMProvider     string
	LLMAPIKey       string
	LLMModel        string
	LLMSystemPrompt string

	// Realtime pipeline configuration (used when Mode == PipelineModeRealtime)
	// Provide either RealtimeProvider directly, or RealtimeConfig to create one.
	RealtimeProvider realtime.Provider
	RealtimeConfig   *RealtimeConfig

	// Session configuration
	MaxSessionDuration time.Duration
	InterruptionMode   string // "immediate", "after_sentence", "disabled"

	// Logging
	Logger *slog.Logger
}

Config provides common configuration for voice gateways.
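
The interaction between Mode, RealtimeProvider, and RealtimeConfig can be sketched as a small validation helper. This is one plausible reading of the field comments, not the package's actual logic: resolveMode is hypothetical, the types are trimmed local copies, Provider stands in for realtime.Provider, and choosing realtime when Mode is empty but a provider is set is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

type PipelineMode string

const (
	PipelineModeText     PipelineMode = "text"
	PipelineModeRealtime PipelineMode = "realtime"
)

// Provider is a hypothetical stand-in for realtime.Provider.
type Provider interface{}

// RealtimeConfig and Config are trimmed local copies of the documented structs.
type RealtimeConfig struct {
	Provider string
	APIKey   string
}

type Config struct {
	Mode             PipelineMode
	RealtimeProvider Provider
	RealtimeConfig   *RealtimeConfig
}

// resolveMode (hypothetical) picks the effective pipeline mode and checks
// that realtime mode has something to build a provider from.
func resolveMode(cfg Config) (PipelineMode, error) {
	mode := cfg.Mode
	if mode == "" {
		// Documented default: text when no realtime provider is set.
		// Falling back to realtime otherwise is this sketch's assumption.
		if cfg.RealtimeProvider == nil {
			mode = PipelineModeText
		} else {
			mode = PipelineModeRealtime
		}
	}
	switch mode {
	case PipelineModeText:
		return mode, nil
	case PipelineModeRealtime:
		if cfg.RealtimeProvider == nil && cfg.RealtimeConfig == nil {
			return "", errors.New("realtime mode needs RealtimeProvider or RealtimeConfig")
		}
		return mode, nil
	default:
		return "", fmt.Errorf("unknown pipeline mode %q", mode)
	}
}

func main() {
	fmt.Println(resolveMode(Config{}))
	fmt.Println(resolveMode(Config{Mode: PipelineModeRealtime}))
	fmt.Println(resolveMode(Config{
		Mode:           PipelineModeRealtime,
		RealtimeConfig: &RealtimeConfig{Provider: "openai", APIKey: "example-key"},
	}))
}
```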

type Event

type Event struct {
	Type      EventType `json:"type"`
	Timestamp time.Time `json:"timestamp"`
	Data      any       `json:"data,omitempty"`
	Error     error     `json:"error,omitempty"`
}

Event represents a session event.

type EventType

type EventType string

EventType identifies the type of session event.

const (
	EventSessionStarted   EventType = "session_started"
	EventSessionEnded     EventType = "session_ended"
	EventUserSpeechStart  EventType = "user_speech_start"
	EventUserSpeechEnd    EventType = "user_speech_end"
	EventUserTranscript   EventType = "user_transcript"
	EventAgentThinking    EventType = "agent_thinking"
	EventAgentSpeechStart EventType = "agent_speech_start"
	EventAgentSpeechEnd   EventType = "agent_speech_end"
	EventAgentTranscript  EventType = "agent_transcript"
	EventToolCall         EventType = "tool_call"
	EventInterruption     EventType = "interruption"
	EventError            EventType = "error"
	EventAudioReceived    EventType = "audio_received"
	EventAudioSent        EventType = "audio_sent"
)
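
A consumer of a session's event channel could dispatch on Type as in the sketch below. The types are trimmed local copies, collect is a hypothetical helper, and treating Data as a string for transcript events is an assumption, since the payload types are not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type EventType string

const (
	EventUserTranscript  EventType = "user_transcript"
	EventAgentTranscript EventType = "agent_transcript"
	EventError           EventType = "error"
)

// Event is a local copy of the documented struct, without JSON tags.
type Event struct {
	Type      EventType
	Timestamp time.Time
	Data      any
	Error     error
}

// collect (hypothetical) drains the channel until it is closed, gathering
// transcript lines and errors and ignoring every other event type.
func collect(events <-chan Event) (lines []string, errs []error) {
	for ev := range events {
		switch ev.Type {
		case EventUserTranscript, EventAgentTranscript:
			// Assumption: transcript events carry their text as a string.
			if text, ok := ev.Data.(string); ok {
				lines = append(lines, string(ev.Type)+": "+text)
			}
		case EventError:
			errs = append(errs, ev.Error)
		}
	}
	return lines, errs
}

func main() {
	ch := make(chan Event, 3)
	ch <- Event{Type: EventUserTranscript, Timestamp: time.Now(), Data: "hello"}
	ch <- Event{Type: EventAgentTranscript, Timestamp: time.Now(), Data: "hi there"}
	ch <- Event{Type: EventError, Timestamp: time.Now(), Error: errors.New("stt timeout")}
	close(ch)

	lines, errs := collect(ch)
	fmt.Println(lines)
	fmt.Println(errs)
}
```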

type Gateway

type Gateway interface {
	// Name returns the provider name.
	Name() ProviderName

	// Start starts the gateway server.
	Start(ctx context.Context) error

	// Stop gracefully shuts down the gateway.
	Stop() error

	// OnCall sets the handler for incoming calls.
	OnCall(handler CallHandler)

	// MakeCall initiates an outbound call.
	MakeCall(ctx context.Context, to string) (Session, error)

	// GetSession retrieves an active session by call ID.
	GetSession(callID string) (Session, bool)

	// ListSessions returns all active sessions.
	ListSessions() []Session
}

Gateway defines the interface for voice gateway providers. Implementations handle provider-specific webhooks, media streams, and call control.

type LLMProvider

type LLMProvider interface {
	// Generate produces a response given user input and conversation history.
	Generate(ctx context.Context, input string, history []Turn) (response string, toolCalls []ToolCall, err error)
}

LLMProvider defines the interface for LLM integration with voice gateways. Used in text pipeline mode (PipelineModeText).
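
For testing a text pipeline without a real model, an LLMProvider can be as small as the echo implementation sketched here. Turn and ToolCall are trimmed local copies, and echoLLM and reply are hypothetical names introduced for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// ToolCall and Turn are trimmed local copies of the documented structs.
type ToolCall struct {
	Name      string
	Arguments map[string]any
}

type Turn struct {
	Role      string // "user" or "agent"
	Text      string
	Timestamp time.Time
}

type LLMProvider interface {
	Generate(ctx context.Context, input string, history []Turn) (response string, toolCalls []ToolCall, err error)
}

// echoLLM (hypothetical) repeats the user's input and never calls tools.
type echoLLM struct{}

func (echoLLM) Generate(ctx context.Context, input string, history []Turn) (string, []ToolCall, error) {
	if err := ctx.Err(); err != nil {
		return "", nil, err // respect cancellation before doing any work
	}
	return fmt.Sprintf("You said %q (turn %d)", input, len(history)+1), nil, nil
}

// reply (hypothetical) calls a provider with a timeout and flattens the result.
func reply(p LLMProvider, input string, history []Turn) string {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	text, _, err := p.Generate(ctx, input, history)
	if err != nil {
		return "error: " + err.Error()
	}
	return text
}

func main() {
	history := []Turn{{Role: "user", Text: "hi", Timestamp: time.Now()}}
	fmt.Println(reply(echoLLM{}, "hello", nil))
	fmt.Println(reply(echoLLM{}, "how are you?", history))
}
```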

type Metrics

type Metrics struct {
	SessionDurationMs     int   `json:"session_duration_ms"`
	TurnCount             int   `json:"turn_count"`
	UserSpeechDurationMs  int   `json:"user_speech_duration_ms"`
	AgentSpeechDurationMs int   `json:"agent_speech_duration_ms"`
	AvgSTTLatencyMs       int   `json:"avg_stt_latency_ms"`
	AvgLLMLatencyMs       int   `json:"avg_llm_latency_ms"`
	AvgTTSLatencyMs       int   `json:"avg_tts_latency_ms"`
	AvgTotalLatencyMs     int   `json:"avg_total_latency_ms"`
	InterruptionCount     int   `json:"interruption_count"`
	ToolCallCount         int   `json:"tool_call_count"`
	ErrorCount            int   `json:"error_count"`
	AudioBytesReceived    int64 `json:"audio_bytes_received"`
	AudioBytesSent        int64 `json:"audio_bytes_sent"`
}

Metrics contains session performance metrics.

type ParticipantHandler

type ParticipantHandler func(participant *ParticipantInfo) error

ParticipantHandler is called when a participant joins the room. Return nil to accept the participant, or an error to ignore them.

type ParticipantInfo

type ParticipantInfo struct {
	// Identity is the unique identifier for the participant.
	Identity string

	// DisplayName is the human-readable name.
	DisplayName string

	// RoomName is the room the participant joined.
	RoomName string

	// Metadata is optional JSON metadata attached to the participant.
	Metadata string

	// JoinedAt is when the participant joined.
	JoinedAt time.Time
}

ParticipantInfo contains information about a participant.

type PipelineMode added in v0.12.0

type PipelineMode string

PipelineMode determines how audio is processed.

const (
	// PipelineModeText uses the traditional STT → LLM → TTS pipeline.
	// Higher latency (~500-1000ms) but works with any LLM.
	PipelineModeText PipelineMode = "text"

	// PipelineModeRealtime uses native voice-to-voice via realtime.Provider.
	// Lower latency (~100-200ms) but requires OpenAI Realtime or Gemini Live.
	PipelineModeRealtime PipelineMode = "realtime"
)

type ProviderName

type ProviderName string

ProviderName identifies a voice gateway provider.

const (
	// ProviderTwilio uses Twilio Media Streams.
	ProviderTwilio ProviderName = "twilio"

	// ProviderTelnyx uses Telnyx Media Streaming.
	ProviderTelnyx ProviderName = "telnyx"

	// ProviderVonage uses Vonage Voice WebSocket.
	ProviderVonage ProviderName = "vonage"

	// ProviderPlivo uses Plivo Audio Streaming.
	ProviderPlivo ProviderName = "plivo"

	// ProviderLiveKit uses LiveKit WebRTC.
	ProviderLiveKit ProviderName = "livekit"
)

type RealtimeBridge added in v0.12.0

type RealtimeBridge struct {
	// contains filtered or unexported fields
}

RealtimeBridge bridges telephony audio streams with a realtime.Provider. It handles audio format conversion and event routing.

func NewRealtimeBridge added in v0.12.0

func NewRealtimeBridge(cfg BridgeConfig) *RealtimeBridge

NewRealtimeBridge creates a new bridge between telephony and a realtime provider.

func NewRealtimeBridgeForTwilio added in v0.12.0

func NewRealtimeBridgeForTwilio(provider realtime.Provider, config realtime.ProcessConfig) *RealtimeBridge

NewRealtimeBridgeForTwilio creates a bridge configured for Twilio + OpenAI.

func NewRealtimeBridgeForTwilioGemini added in v0.12.0

func NewRealtimeBridgeForTwilioGemini(provider realtime.Provider, config realtime.ProcessConfig) *RealtimeBridge

NewRealtimeBridgeForTwilioGemini creates a bridge configured for Twilio + Gemini.

func (*RealtimeBridge) AudioOut added in v0.12.0

func (b *RealtimeBridge) AudioOut() <-chan []byte

AudioOut returns a receive-only channel of telephony-format audio produced by the bridge.

func (*RealtimeBridge) Close added in v0.12.0

func (b *RealtimeBridge) Close() error

Close stops the bridge and releases resources.

func (*RealtimeBridge) Events added in v0.12.0

func (b *RealtimeBridge) Events() <-chan Event

Events returns the channel for session events.

func (*RealtimeBridge) Interrupt added in v0.12.0

func (b *RealtimeBridge) Interrupt()

Interrupt cancels the current agent response.

func (*RealtimeBridge) Metrics added in v0.12.0

func (b *RealtimeBridge) Metrics() Metrics

Metrics returns the bridge performance metrics.

func (*RealtimeBridge) SendAudio added in v0.12.0

func (b *RealtimeBridge) SendAudio(audio []byte) error

SendAudio sends telephony audio to the bridge for processing.

func (*RealtimeBridge) Start added in v0.12.0

func (b *RealtimeBridge) Start(ctx context.Context) error

Start begins processing audio through the bridge. It returns only an error; once started, feed telephony audio in with SendAudio and read outgoing audio from AudioOut.

func (*RealtimeBridge) Transcript added in v0.12.0

func (b *RealtimeBridge) Transcript() []Turn

Transcript returns the conversation transcript.
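
The typical call sequence (Start, SendAudio in one direction, AudioOut in the other, then Close) can be sketched against a small interface that lists only the methods used. Everything below is illustrative: audioBridge, loopbackBridge, pump, and demo are hypothetical, the loopback stands in for a real provider, and the sketch assumes Close causes the AudioOut channel to be closed, which this page does not state.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// audioBridge lists the RealtimeBridge methods this sketch relies on.
type audioBridge interface {
	Start(ctx context.Context) error
	SendAudio(audio []byte) error
	AudioOut() <-chan []byte
	Close() error
}

// loopbackBridge is a hypothetical test double that echoes audio back.
// SendAudio must not be called after Close.
type loopbackBridge struct {
	out  chan []byte
	once sync.Once
}

func newLoopbackBridge(bufferSize int) *loopbackBridge {
	return &loopbackBridge{out: make(chan []byte, bufferSize)}
}

func (b *loopbackBridge) Start(ctx context.Context) error { return ctx.Err() }

func (b *loopbackBridge) SendAudio(audio []byte) error {
	select {
	case b.out <- audio:
		return nil
	default:
		return errors.New("audio buffer full")
	}
}

func (b *loopbackBridge) AudioOut() <-chan []byte { return b.out }

func (b *loopbackBridge) Close() error {
	b.once.Do(func() { close(b.out) })
	return nil
}

// pump starts the bridge, sends every frame, and returns the number of
// bytes read from AudioOut. It assumes Close ends the AudioOut stream.
func pump(ctx context.Context, b audioBridge, frames [][]byte) (int, error) {
	if err := b.Start(ctx); err != nil {
		return 0, err
	}
	received := make(chan int)
	go func() {
		total := 0
		for chunk := range b.AudioOut() {
			total += len(chunk) // a real gateway would write this to the media stream
		}
		received <- total
	}()

	var sendErr error
	for _, f := range frames {
		if sendErr = b.SendAudio(f); sendErr != nil {
			break
		}
	}
	closeErr := b.Close()
	total := <-received
	if sendErr != nil {
		return total, sendErr
	}
	return total, closeErr
}

// demo pushes two 20ms mu-law frames (160 bytes each) through the loopback.
func demo() int {
	frames := [][]byte{make([]byte, 160), make([]byte, 160)}
	n, err := pump(context.Background(), newLoopbackBridge(100), frames)
	if err != nil {
		return -1
	}
	return n
}

func main() {
	fmt.Println("bytes received:", demo())
}
```

Reading AudioOut in its own goroutine keeps the sender from stalling when the output buffer fills up.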

type RealtimeConfig added in v0.12.0

type RealtimeConfig struct {
	// Provider is the realtime provider name ("openai" or "gemini").
	Provider string `json:"provider"`

	// APIKey is the API key for the realtime provider.
	APIKey string `json:"api_key"`

	// Model is the model to use (e.g., "gpt-4o-realtime-preview-2024-12-17").
	Model string `json:"model,omitempty"`

	// Voice is the voice for audio output (e.g., "alloy", "Puck").
	Voice string `json:"voice,omitempty"`

	// Instructions is the system prompt for the conversation.
	Instructions string `json:"instructions,omitempty"`

	// Functions are tools the model can call during conversation.
	Functions []realtime.FunctionDeclaration `json:"functions,omitempty"`

	// OnFunctionCall handles function calls from the model.
	// If nil, function calls are ignored.
	OnFunctionCall func(id, name, args string) (result any, err error) `json:"-"`

	// Temperature controls response randomness (0-2).
	Temperature float64 `json:"temperature,omitempty"`
}

RealtimeConfig configures a realtime provider for voice-to-voice.
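
A function with the OnFunctionCall signature might dispatch on the function name as sketched below. The sketch assumes args arrives as a JSON-encoded object, which is not confirmed on this page; handleFunctionCall, the get_weather function, and its canned result are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// handleFunctionCall matches the OnFunctionCall signature
// func(id, name, args string) (result any, err error).
// Assumption: args is a JSON-encoded object.
func handleFunctionCall(id, name, args string) (any, error) {
	switch name {
	case "get_weather": // hypothetical tool name
		var params struct {
			City string `json:"city"`
		}
		if err := json.Unmarshal([]byte(args), &params); err != nil {
			return nil, fmt.Errorf("call %s: bad arguments: %w", id, err)
		}
		// A real handler would query a weather service here.
		return map[string]any{"city": params.City, "forecast": "sunny"}, nil
	default:
		return nil, fmt.Errorf("call %s: unknown function %q", id, name)
	}
}

func main() {
	res, err := handleFunctionCall("call-1", "get_weather", `{"city":"Oslo"}`)
	fmt.Println(res, err)

	_, err = handleFunctionCall("call-2", "book_flight", `{}`)
	fmt.Println(err)
}
```

Such a function could be assigned to the OnFunctionCall field; leaving the field nil means function calls are ignored, as noted in the struct comment.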

func (*RealtimeConfig) ToProcessConfig added in v0.12.0

func (c *RealtimeConfig) ToProcessConfig() realtime.ProcessConfig

ToProcessConfig converts RealtimeConfig to realtime.ProcessConfig.

type RealtimeProviderFactory added in v0.12.0

type RealtimeProviderFactory interface {
	// Create creates a realtime provider from the given configuration.
	Create(config *RealtimeConfig) (realtime.Provider, error)

	// Name returns the provider name (e.g., "openai", "gemini").
	Name() string
}

RealtimeProviderFactory creates realtime providers from configuration. Implementations should be registered for each supported provider.
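
How factories are registered is not shown on this page. One possible arrangement is a name-keyed registry, sketched here with hypothetical types: registry, fakeFactory, and fakeProvider are invented for illustration, Provider stands in for realtime.Provider, and RealtimeConfig is trimmed to two fields.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider is a hypothetical stand-in for realtime.Provider.
type Provider interface{}

// RealtimeConfig is a trimmed local copy of the documented struct.
type RealtimeConfig struct {
	Provider string
	APIKey   string
}

type RealtimeProviderFactory interface {
	Create(config *RealtimeConfig) (Provider, error)
	Name() string
}

// registry (hypothetical) maps provider names to factories.
type registry struct {
	mu        sync.RWMutex
	factories map[string]RealtimeProviderFactory
}

func newRegistry() *registry {
	return &registry{factories: make(map[string]RealtimeProviderFactory)}
}

// Register stores a factory under the name it reports.
func (r *registry) Register(f RealtimeProviderFactory) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.factories[f.Name()] = f
}

// Create looks up the factory named by cfg.Provider and delegates to it.
func (r *registry) Create(cfg *RealtimeConfig) (Provider, error) {
	r.mu.RLock()
	f, ok := r.factories[cfg.Provider]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no factory registered for provider %q", cfg.Provider)
	}
	return f.Create(cfg)
}

// fakeProvider and fakeFactory are test doubles, not real providers.
type fakeProvider struct{ name string }

type fakeFactory struct{ name string }

func (f fakeFactory) Name() string { return f.name }

func (f fakeFactory) Create(cfg *RealtimeConfig) (Provider, error) {
	if cfg.APIKey == "" {
		return nil, fmt.Errorf("%s: missing API key", f.name)
	}
	return fakeProvider{name: f.name}, nil
}

func main() {
	r := newRegistry()
	r.Register(fakeFactory{name: "openai"})

	p, err := r.Create(&RealtimeConfig{Provider: "openai", APIKey: "example-key"})
	fmt.Println(p, err)

	_, err = r.Create(&RealtimeConfig{Provider: "gemini", APIKey: "example-key"})
	fmt.Println(err)
}
```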

type RoomInfo

type RoomInfo struct {
	// Name is the room identifier.
	Name string

	// ParticipantCount is the number of participants in the room.
	ParticipantCount int

	// CreatedAt is when the room was created.
	CreatedAt time.Time

	// Metadata is optional JSON metadata attached to the room.
	Metadata string
}

RoomInfo contains information about a WebRTC room.

type Session

type Session interface {
	// ID returns the session identifier.
	ID() string

	// From returns the caller phone number.
	From() string

	// To returns the called phone number.
	To() string

	// Direction returns "inbound" or "outbound".
	Direction() string

	// StartTime returns when the session started.
	StartTime() time.Time

	// Duration returns the session duration.
	Duration() time.Duration

	// Events returns a channel for session events.
	Events() <-chan Event

	// Transcript returns the conversation transcript.
	Transcript() []Turn

	// Metrics returns session performance metrics.
	Metrics() Metrics

	// SendText sends text input to the agent (bypasses STT).
	SendText(text string) error

	// Interrupt stops the current agent speech.
	Interrupt()

	// Close ends the session.
	Close() error
}

Session represents an active voice conversation session.

type ToolCall

type ToolCall struct {
	Name       string         `json:"name"`
	Arguments  map[string]any `json:"arguments"`
	Result     string         `json:"result"`
	Error      string         `json:"error,omitempty"`
	DurationMs int            `json:"duration_ms"`
}

ToolCall represents a tool invocation during conversation.

type Turn

type Turn struct {
	Role       string     `json:"role"` // "user" or "agent"
	Text       string     `json:"text"`
	Timestamp  time.Time  `json:"timestamp"`
	DurationMs int        `json:"duration_ms"`
	ToolCalls  []ToolCall `json:"tool_calls,omitempty"`
}

Turn represents a single conversation turn.

type WebRTCConfig

type WebRTCConfig struct {
	// Server URL (e.g., "wss://your-app.livekit.cloud")
	ServerURL string

	// API credentials
	APIKey    string
	APISecret string

	// Room configuration
	RoomName      string
	AgentIdentity string
	AgentName     string

	// Audio configuration
	SampleRate int // 16000 or 24000 (default: 24000)
	Channels   int // 1 for mono (default: 1)

	// Voice pipeline configuration
	STTProvider string
	STTAPIKey   string
	STTModel    string
	STTLanguage string

	TTSProvider string
	TTSAPIKey   string
	TTSVoiceID  string
	TTSModel    string

	LLMProvider     string
	LLMAPIKey       string
	LLMModel        string
	LLMSystemPrompt string

	// Session configuration
	MaxSessionDuration time.Duration
	InterruptionMode   string // "immediate", "after_sentence", "disabled"
}

WebRTCConfig provides common configuration for WebRTC gateways.

type WebRTCGateway

type WebRTCGateway interface {
	// Name returns the provider name.
	Name() ProviderName

	// Start connects to the room and starts handling participants.
	Start(ctx context.Context) error

	// Stop disconnects from the room and cleans up.
	Stop() error

	// OnParticipantJoined sets the handler for when participants join.
	// Return nil to accept the participant, or an error to ignore them.
	OnParticipantJoined(handler ParticipantHandler)

	// JoinRoom joins a specific room. Some implementations may join
	// automatically on Start() based on configuration.
	JoinRoom(ctx context.Context, roomName string) error

	// LeaveRoom leaves the current room.
	LeaveRoom() error

	// CurrentRoom returns the name of the currently joined room, or empty string.
	CurrentRoom() string

	// GetSession retrieves an active session by participant identity.
	GetSession(participantID string) (WebRTCSession, bool)

	// ListSessions returns all active sessions.
	ListSessions() []WebRTCSession

	// GenerateClientToken creates a JWT token for a client to join a room.
	GenerateClientToken(roomName, identity, displayName string) (string, error)
}

WebRTCGateway defines the interface for WebRTC-based voice gateways. Unlike PSTN gateways that handle phone calls, WebRTC gateways handle browser and mobile app connections via rooms and participants.

Implementations:

  • github.com/plexusone/omni-livekit/omnivoice/gateway

Key differences from Gateway (PSTN):

  • No phone numbers; uses room names and participant identities
  • No webhooks; direct WebRTC signaling
  • Agent joins a room and waits for participants (vs answering calls)
  • Lower latency (<200ms vs 500ms+ for PSTN)

type WebRTCSession

type WebRTCSession interface {
	// ID returns the session identifier (typically participant identity).
	ID() string

	// Participant returns information about the remote participant.
	Participant() *ParticipantInfo

	// RoomName returns the room this session is in.
	RoomName() string

	// AgentIdentity returns the identity of the AI agent.
	AgentIdentity() string

	// StartTime returns when the session started.
	StartTime() time.Time

	// Duration returns the session duration.
	Duration() time.Duration

	// Events returns a channel for session events.
	// Uses the same Event type as PSTN sessions.
	Events() <-chan Event

	// Transcript returns the conversation transcript.
	Transcript() []Turn

	// Metrics returns session performance metrics.
	Metrics() Metrics

	// SendText sends text input to the agent (bypasses STT).
	SendText(text string) error

	// SendAudio sends PCM16 audio samples to the participant.
	// This is the raw audio output method for WebRTC.
	SendAudio(samples []int16) error

	// Interrupt stops the current agent speech.
	Interrupt()

	// Close ends the session.
	Close() error
}

WebRTCSession represents an active voice conversation with a WebRTC participant. This is similar to Session but with WebRTC-specific semantics.
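
Because SendAudio takes []int16 while audio buffers elsewhere in this package are []byte, callers may need a small conversion step. The sketch below assumes the byte buffer holds little-endian PCM16, which is common but not stated on this page; pcm16LEToSamples is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// pcm16LEToSamples (hypothetical) reinterprets little-endian PCM16 bytes as
// signed 16-bit samples, suitable for a method that takes []int16.
func pcm16LEToSamples(b []byte) ([]int16, error) {
	if len(b)%2 != 0 {
		return nil, errors.New("PCM16 buffer has an odd number of bytes")
	}
	samples := make([]int16, len(b)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(b[2*i:]))
	}
	return samples, nil
}

func main() {
	raw := []byte{0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80}
	samples, err := pcm16LEToSamples(raw)
	fmt.Println(samples, err)
}
```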
