omnivoice

package module

v0.14.0 (latest) | Published: Jun 15, 2026 | License: MIT | Imports: 6 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/plexusone/omnivoice-core

README ¶

OmniVoice

Voice abstraction layer for AgentPlexus supporting TTS, STT, and Voice Agents across multiple providers and transport protocols.

Voice Architecture: Traditional vs Native

OmniVoice supports two fundamentally different approaches for real-time voice:

Traditional Pipeline (STT → LLM → TTS)

Audio In → [STT Provider] → Text → [LLM] → Text → [TTS Provider] → Audio Out

Latency: 500-1500ms (sum of STT + LLM + TTS)
Flexibility: Mix and match any STT, LLM, and TTS providers
Use case: Custom voice selection, specialized STT for domain-specific terminology

Native Voice-to-Voice

Audio In → [OpenAI Realtime / Gemini Live] → Audio Out

Latency: 100-200ms (model handles audio directly)
Simplicity: Single API, no separate STT/TTS configuration
Use case: Low-latency conversations, natural barge-in handling

Aspect	Traditional	Native Voice-to-Voice
Latency	500-1500ms	100-200ms
STT/TTS Config	Required	Built-in
Core Interface	`stt.Provider`, `tts.Provider`	`realtime.Provider`
Provider Packages	`tts/`, `stt/`	`omni-openai/omnivoice/realtime`, `omni-google/omnivoice/realtime`
Gateway Bridge	Pipeline-based	`RealtimeBridge` in `gateway/`
Barge-in	Via `bargein/` package	Native support

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              OmniVoice                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────────────┐  │
│  │     TTS     │    │     STT     │    │          Voice Agent            │  │
│  │             │    │             │    │                                 │  │
│  │ Text → Audio│    │ Audio → Text│    │  Real-time bidirectional voice  │  │
│  └──────┬──────┘    └──────┬──────┘    └───────────────┬─────────────────┘  │
│         │                  │                           │                    │
│         ▼                  ▼                           ▼                    │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         Provider Layer                              │    │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤    │
│  │ ElevenLabs  │  Deepgram   │ Google Cloud│    AWS      │   Azure     │    │
│  │ Cartesia    │  Whisper    │ AssemblyAI  │   Polly     │   Speech    │    │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         Transport Layer                             │    │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤    │
│  │   WebRTC    │     SIP     │    PSTN     │  WebSocket  │    HTTP     │    │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                      Call System Integration                        │    │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤    │
│  │   Twilio    │   Telnyx    │   Vonage    │    Plivo    │   LiveKit   │    │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Package Structure

omnivoice/
├── tts/                    # Text-to-Speech
│   ├── tts.go              # Interface definitions
│   ├── elevenlabs/         # ElevenLabs provider
│   ├── polly/              # AWS Polly provider
│   ├── google/             # Google Cloud TTS
│   ├── azure/              # Azure Speech
│   └── cartesia/           # Cartesia provider
│
├── stt/                    # Speech-to-Text
│   ├── stt.go              # Interface definitions
│   ├── transcript.go       # Canonical Transcript format
│   ├── whisper/            # OpenAI Whisper
│   ├── deepgram/           # Deepgram provider
│   ├── google/             # Google Speech-to-Text
│   ├── azure/              # Azure Speech
│   └── assemblyai/         # AssemblyAI provider
│
├── schema/                 # Embedded JSON Schemas
│   ├── schema.go           # //go:embed directives
│   └── transcript-v1.schema.json  # Transcript format schema
│
├── agent/                  # Voice Agent orchestration
│   ├── agent.go            # Interface definitions
│   ├── session.go          # Conversation session management
│   ├── elevenlabs/         # ElevenLabs Agents
│   ├── vapi/               # Vapi.ai
│   ├── retell/             # Retell AI
│   └── custom/             # Custom agent (STT + LLM + TTS)
│
├── transport/              # Audio transport protocols
│   ├── transport.go        # Interface definitions
│   ├── webrtc/             # WebRTC transport
│   ├── websocket/          # WebSocket streaming
│   ├── sip/                # SIP protocol
│   └── http/               # HTTP-based (batch)
│
├── callsystem/             # Call system integrations
│   ├── callsystem.go       # Interface definitions
│   ├── client.go           # Multi-provider client with failover
│   ├── sms.go              # SMSProvider interface
│   ├── twilio/             # Twilio ConversationRelay
│   ├── ringcentral/        # RingCentral Voice API
│   ├── zoom/               # Zoom SDK integration
│   ├── livekit/            # LiveKit rooms
│   └── daily/              # Daily.co
│
├── observability/          # Voice instrumentation
│   ├── events.go           # VoiceEvent, VoiceObserver
│   └── hooks.go            # TTSHook, STTHook interfaces
│
├── resilience/             # Error handling and retry logic
│   ├── category.go         # Error categories (transient, rate_limit, auth, etc.)
│   ├── error.go            # ProviderError with classification metadata
│   ├── classifier.go       # ErrorClassifier interface
│   ├── retry.go            # Retry and RetryWithResult functions
│   └── backoff.go          # Backoff strategies (exponential, linear, etc.)
│
├── storage/                # Session state persistence
│   ├── store.go            # SessionStore interface
│   ├── types.go            # SessionState, Turn, Metrics types
│   ├── memory.go           # In-memory implementation
│   └── redis.go            # Redis implementation
│
├── bargein/                # Barge-in detection
│   ├── config.go           # InterruptionMode (immediate, after_sentence, disabled)
│   └── detector.go         # BargeInDetector with TTS/STT integration
│
├── realtime/               # Native voice-to-voice
│   ├── provider.go         # Provider interface for OpenAI Realtime / Gemini Live
│   ├── client.go           # Multi-provider client with fallback
│   └── errors.go           # Common realtime errors
│
├── audio/                  # Audio processing
│   ├── format/             # Audio format definitions
│   │   └── format.go       # Encoding type with normalization, provider format constants
│   ├── converter/          # Audio format conversion
│   │   └── converter.go    # TwilioToOpenAI, OpenAIToTwilio, etc.
│   └── codec/              # Audio codecs (mulaw, alaw, PCM)
│
├── gateway/                # Voice gateway integration
│   ├── gateway.go          # Gateway, Session, Config interfaces
│   └── bridge.go           # RealtimeBridge for telephony ↔ realtime
│
├── registry.go             # Global provider registry (STT, TTS, CallSystem, Gateway, Realtime)
├── registry/               # Provider discovery types
│   ├── registry.go         # Registry interface, factory types, Gateway/RealtimeProvider interfaces
│   └── options.go          # ProviderConfig, ProviderOption (WithVoice, WithModel, etc.)
│
├── subtitle/               # Subtitle generation
│   └── subtitle.go         # SRT/VTT from transcription results
│
└── examples/
    ├── simple-tts/         # Basic TTS example
    ├── voice-agent/        # Voice agent with Twilio
    └── multi-provider/     # Provider fallback example
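The resilience/ entry in the tree above mentions Retry and RetryWithResult functions and exponential backoff. Their real signatures are not shown in this README, so the following is only a sketch of the general technique: `retryWithResult`, `errTransient`, and `demo` are hypothetical names, and treating a single sentinel error as "transient" is an assumption standing in for the package's error classification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical marker for errors that are worth retrying.
var errTransient = errors.New("transient failure")

// retryWithResult calls fn up to attempts times and doubles the delay after
// each failed attempt. It stops early on success, on a non-transient error,
// or when ctx is cancelled.
func retryWithResult[T any](ctx context.Context, attempts int, base time.Duration, fn func() (T, error)) (T, error) {
	var zero T
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	delay := base
	for i := 0; i < attempts; i++ {
		v, err := fn()
		if err == nil {
			return v, nil
		}
		if !errors.Is(err, errTransient) {
			return zero, err // do not retry auth errors, bad input, etc.
		}
		lastErr = err
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2 // exponential backoff
		case <-ctx.Done():
			return zero, ctx.Err()
		}
	}
	return zero, fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

// demo runs an operation that fails twice with a transient error and then succeeds.
func demo() (result string, calls int, err error) {
	op := func() (string, error) {
		calls++
		if calls < 3 {
			return "", errTransient
		}
		return "ok", nil
	}
	result, err = retryWithResult(context.Background(), 5, time.Millisecond, op)
	return result, calls, err
}

func main() {
	result, calls, err := demo()
	fmt.Println(result, calls, err)
}
```

A production version would typically add jitter and a maximum delay; both are left out here to keep the sketch short.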

Voice Gateway Interfaces

OmniVoice provides two gateway interfaces for different use cases:

`Gateway` - PSTN Phone Calls

For traditional phone calls via Twilio, Telnyx, Vonage, or Plivo:

type Gateway interface {
    Name() ProviderName
    Start(ctx context.Context) error
    Stop() error
    OnCall(handler CallHandler)              // Phone call comes in
    MakeCall(ctx context.Context, to string) (Session, error) // Dial phone number
    GetSession(callID string) (Session, bool)
    ListSessions() []Session
}

`WebRTCGateway` - Browser/Mobile Apps

For WebRTC-based voice via LiveKit, Daily, etc.:

type WebRTCGateway interface {
    Name() ProviderName
    Start(ctx context.Context) error
    Stop() error
    OnParticipantJoined(handler ParticipantHandler) // User joins room
    JoinRoom(ctx context.Context, roomName string) error
    LeaveRoom() error
    CurrentRoom() string
    GetSession(participantID string) (WebRTCSession, bool)
    ListSessions() []WebRTCSession
    GenerateClientToken(roomName, identity, displayName string) (string, error)
}

Comparison

Aspect	`Gateway` (PSTN)	`WebRTCGateway`
Connection	Phone number	Room name
Incoming	`OnCall()`	`OnParticipantJoined()`
Outgoing	`MakeCall(phoneNumber)`	`JoinRoom(roomName)`
Latency	500ms+	<200ms
Clients	Phone calls	Browser/mobile apps

Call System Integration

How Voice Agents Connect to Phone/Video Calls

Voice AI agents need a transport layer to receive and send audio. The choice depends on the use case:

┌───────────────────────────────────────────────────────────────────────┐
│                   Voice Gateway Providers (Bidirectional)             │
├────────────────┬───────────────┬─────────────────┬────────────────────┤
│    Platform    │   Protocol    │   Audio Format  │   Auth Method      │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Twilio         │ Media Streams │ mulaw 8kHz      │ Account SID/Token  │
│ Media Streams  │ WebSocket     │                 │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Telnyx         │ Media         │ L16 16kHz       │ API Key            │
│ Media Streaming│ WebSocket     │                 │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Vonage         │ Voice         │ L16 16kHz       │ JWT (RS256)        │
│ Voice WebSocket│ WebSocket     │                 │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Plivo          │ Stream API    │ L16 16kHz       │ Auth ID/Token      │
│ Audio Streaming│ WebSocket     │                 │                    │
└────────────────┴───────────────┴─────────────────┴────────────────────┘
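The table lists mulaw 8kHz for Twilio, while OpenAI Realtime (covered later in this README) expects PCM16 at 24kHz, so a bridge has to decode and resample. The sketch below shows one way this could be done with the standard G.711 mu-law expansion and a simple 3x linear interpolation. It is not the code in audio/converter; `muLawDecode` and `upsample3` are hypothetical names, and a real converter would likely use a proper resampling filter.

```go
package main

import "fmt"

// muLawDecode converts one G.711 mu-law byte to a 16-bit linear PCM sample.
func muLawDecode(u byte) int16 {
	u = ^u // mu-law bytes are transmitted bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F
	// Rebuild the magnitude: add the bias (0x84), scale by the segment, remove the bias.
	magnitude := (((int(mantissa) << 3) + 0x84) << exponent) - 0x84
	if sign != 0 {
		return int16(-magnitude)
	}
	return int16(magnitude)
}

// upsample3 triples the sample rate (for example 8kHz to 24kHz) by linear
// interpolation between neighbouring samples. The last sample is repeated.
func upsample3(in []int16) []int16 {
	out := make([]int16, 0, len(in)*3)
	for i, s := range in {
		next := s
		if i+1 < len(in) {
			next = in[i+1]
		}
		diff := int(next) - int(s)
		out = append(out, s, int16(int(s)+diff/3), int16(int(s)+2*diff/3))
	}
	return out
}

func main() {
	frame := []byte{0xFF, 0x80, 0x00} // silence, positive peak, negative peak
	pcm := make([]int16, len(frame))
	for i, b := range frame {
		pcm[i] = muLawDecode(b)
	}
	fmt.Println(pcm, len(upsample3(pcm)))
}
```

Linear interpolation is cheap but introduces some high-frequency imaging; that trade-off is usually acceptable for a first prototype of telephone-band speech.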

┌───────────────────────────────────────────────────────────────────────┐
│                     Other Call System Options                         │
├────────────────┬───────────────┬─────────────────┬────────────────────┤
│    Platform    │   Protocol    │   Best For      │   Complexity       │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ LiveKit        │ WebRTC        │ Custom apps,    │ Low - open source  │
│                │               │ real-time AI    │ WebRTC rooms       │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Daily.co       │ WebRTC        │ Embedded video, │ Low - simple API   │
│                │               │ browser-based   │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ WebSocket      │ WS/WSS        │ Web apps,       │ Low - direct       │
│ (Direct)       │               │ custom UIs      │ streaming          │
└────────────────┴───────────────┴─────────────────┴────────────────────┘

Wiring Diagram: Voice Agent in a Phone Call

┌────────────────────────────────────────────────────────────────────────────────┐
│                     PSTN/WebSocket Call Flow                                   │
│                                                                                │
│   ┌─────────┐         ┌─────────────┐          ┌───────────────────────────┐   │
│   │  User   │◄───────►│  Provider   │◄────────►│        OmniVoice          │   │
│   │ (Phone) │  PSTN   │  (Twilio/   │ WebSocket│                           │   │
│   │         │         │   Telnyx/   │          │  ┌─────────────────────┐  │   │
│   └─────────┘         │   Vonage/   │          │  │   Voice Agent       │  │   │
│                       │   Plivo)    │          │  │                     │  │   │
│                       └─────────────┘          │  │  ┌───────┐          │  │   │
│                         Audio In ─────────────►│  │  │  STT  │──┐       │  │   │
│                                                │  │  └───────┘  │       │  │   │
│                                                │  │             ▼       │  │   │
│                                                │  │  ┌───────────────┐  │  │   │
│                                                │  │  │  LLM / Agent  │  │  │   │
│                                                │  │  │  (Eino, etc.) │  │  │   │
│                                                │  │  └───────────────┘  │  │   │
│                                                │  │             │       │  │   │
│                                                │  │             ▼       │  │   │
│                                                │  │  ┌───────┐          │  │   │
│                         Audio Out ◄────────────│  │  │  TTS  │◄─┘       │  │   │
│                                                │  │  └───────┘          │  │   │
│                                                │  └─────────────────────┘  │   │
│                                                └───────────────────────────┘   │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘

Wiring Diagram: Voice Agent in a Zoom Meeting

┌────────────────────────────────────────────────────────────────────────────┐
│                     Zoom Meeting Flow                                      │
│                                                                            │
│   ┌────────────────────────────────────────────────────────────────────┐   │
│   │                         Zoom Meeting                               │   │
│   │                                                                    │   │
│   │   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────────────────┐   │   │
│   │   │ User 1  │  │ User 2  │  │ User 3  │  │     Bot Client      │   │   │
│   │   │ (Human) │  │ (Human) │  │ (Human) │  │   (Zoom SDK)        │   │   │
│   │   └─────────┘  └─────────┘  └─────────┘  └──────────┬──────────┘   │   │
│   │                                                     │              │   │
│   └─────────────────────────────────────────────────────┼──────────────┘   │
│                                                         │                  │
│                                        Raw Audio Stream │                  │
│                                                         ▼                  │
│   ┌────────────────────────────────────────────────────────────────────┐   │
│   │                        OmniVoice Agent                             │   │
│   │                                                                    │   │
│   │   Option A: Use Recall.ai (recommended)                            │   │
│   │   ┌─────────────┐                                                  │   │
│   │   │  Recall.ai  │──► Handles Zoom SDK complexity                   │   │
│   │   │     Bot     │──► Provides audio stream via WebSocket           │   │
│   │   └─────────────┘                                                  │   │
│   │                                                                    │   │
│   │   Option B: Self-hosted Zoom SDK Bot                               │   │
│   │   ┌─────────────┐                                                  │   │
│   │   │ Zoom Linux  │──► Complex: requires native SDK                  │   │
│   │   │   SDK Bot   │──► One instance per meeting                      │   │
│   │   └─────────────┘──► Months of engineering                         │   │
│   │                                                                    │   │
│   └────────────────────────────────────────────────────────────────────┘   │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

Use Case Recommendations

Use Case	Call System	Transport	Notes
IVR / Call Center	Twilio, Telnyx, Plivo	PSTN/WebSocket	Managed infrastructure
International Calls	Plivo, Vonage	PSTN/WebSocket	Good international rates
Enterprise Voice	Vonage, Telnyx	PSTN/WebSocket	Flexible call control
Custom Web App	LiveKit or Daily	WebRTC	Open source, flexible
Browser Widget	Direct WebSocket	WebSocket	ElevenLabs widget or custom
Mobile App	LiveKit	WebRTC	Cross-platform support

Latency Considerations

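For a conversation to feel natural, total round-trip latency should stay under about 1000ms, with under 500ms as the target:

User speaks → STT (100-300ms) → LLM (200-500ms) → TTS (100-200ms) → User hears

Run strictly in sequence, these stage estimates add up to 400-1000ms, so the 500ms target is only reachable when every stage is near the low end of its range or when stages overlap through streaming.

Target: < 500ms total for "instant" feel
Acceptable: < 1000ms for natural conversation
Poor: > 1500ms feels laggy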
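The arithmetic behind these thresholds can be checked with a short sketch. It sums the per-stage estimates quoted above and classifies the result. The `stage`, `budget`, and `rate` names are hypothetical, and the label "borderline" for the 1000-1500ms gap is an assumption, since the list above does not name that range.

```go
package main

import "fmt"

// stage holds an estimated latency range in milliseconds.
type stage struct {
	name         string
	minMs, maxMs int
}

// budget sums the best-case and worst-case latency of stages that run one
// after another with no overlap.
func budget(stages []stage) (lo, hi int) {
	for _, s := range stages {
		lo += s.minMs
		hi += s.maxMs
	}
	return lo, hi
}

// rate classifies a total round-trip latency using the thresholds above.
// Assumption: totals from 1000ms to 1500ms are reported as "borderline".
func rate(totalMs int) string {
	switch {
	case totalMs < 500:
		return "instant"
	case totalMs < 1000:
		return "acceptable"
	case totalMs <= 1500:
		return "borderline"
	default:
		return "poor"
	}
}

func main() {
	pipeline := []stage{
		{"STT", 100, 300},
		{"LLM", 200, 500},
		{"TTS", 100, 200},
	}
	lo, hi := budget(pipeline)
	fmt.Printf("best case %dms (%s), worst case %dms (%s)\n", lo, rate(lo), hi, rate(hi))
}
```

Because the worst case lands at 1000ms, the sequential pipeline has little headroom, which is why the streaming strategies matter.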

Optimization Strategies

Streaming STT: Start processing before the user finishes speaking
Streaming TTS: Start playing audio before the full response is generated
Edge inference: Use providers with edge nodes (Deepgram, ElevenLabs)
Turn detection: Use voice activity detection (VAD) for quick turn-taking
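To make the turn-detection strategy concrete, here is a minimal energy-based VAD sketch. It is a generic technique, not the detector used by OmniVoice; `rms`, `endOfTurn`, the threshold, and the hangover count are all hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// rms returns the root-mean-square amplitude of a frame of PCM16 samples.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// endOfTurn returns the index of the frame at which the speaker is considered
// done: the point where hangover consecutive frames have fallen below
// threshold after at least one frame of speech. It returns -1 if the turn
// has not ended within the given frames.
func endOfTurn(frames [][]int16, threshold float64, hangover int) int {
	spoke := false
	quiet := 0
	for i, f := range frames {
		if rms(f) >= threshold {
			spoke = true
			quiet = 0 // speech resumed, restart the silence count
			continue
		}
		if spoke {
			quiet++
			if quiet >= hangover {
				return i
			}
		}
	}
	return -1
}

func main() {
	loud := []int16{4000, -4000, 4000, -4000}
	silent := []int16{10, -10, 10, -10}
	frames := [][]int16{silent, loud, loud, silent, silent, silent}
	fmt.Println(endOfTurn(frames, 500, 2))
}
```

A shorter hangover gives faster turn-taking but cuts speakers off during natural pauses, so the value is usually tuned per application.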

Provider Comparison

TTS Providers

Provider	Latency	Quality	Voices	Streaming	Price
ElevenLabs	Low	Excellent	5000+	Yes	$$$
Cartesia	Very Low	Good	100+	Yes	$$
AWS Polly	Low	Good	60+	Yes	$
Google TTS	Low	Good	200+	Yes	$
Azure Speech	Low	Excellent	400+	Yes	$$

STT Providers

Provider	Latency	Accuracy	Streaming	Languages	Price
Deepgram	Very Low	Excellent	Yes	30+	$$
Whisper (OpenAI)	Medium	Excellent	No*	50+	$
Google Speech	Low	Excellent	Yes	125+	$$
AssemblyAI	Low	Excellent	Yes	20+	$$
Azure Speech	Low	Excellent	Yes	100+	$$

*Whisper requires self-hosting for streaming (e.g., faster-whisper)

Voice Agent Platforms

Provider	Customization	Latency	Telephony	Price
ElevenLabs Agents	Medium	Low	Via Twilio	$$$
Vapi	High	Low	Built-in	$$
Retell AI	High	Low	Built-in	$$
Custom (OmniVoice)	Full	Variable	Via integration	Variable

Provider Conformance Testing

OmniVoice includes conformance test suites that provider implementations can use to verify they correctly implement the TTS and STT interfaces with consistent behavior.

Using Conformance Tests

Provider implementations should import the providertest packages and run the conformance tests:

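// In your provider's conformance_test.go
import (
    "os"
    "testing"

    "github.com/plexusone/omnivoice-core/stt/providertest"
    // or for TTS:
    // "github.com/plexusone/omnivoice-core/tts/providertest"
)

func TestConformance(t *testing.T) {
    // Assumption: the API key is read from an environment variable; the name is a placeholder.
    apiKey := os.Getenv("PROVIDER_API_KEY")
    p, err := New(WithAPIKey(apiKey))
    if err != nil {
        t.Fatal(err)
    }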

    providertest.RunAll(t, providertest.Config{
        Provider:        p,
        TestAudioFile:   "/path/to/test.mp3",
        TestAudioURL:    "https://example.com/test.mp3",
        // ...
    })
}

Test Categories

Category	Description	API Required
Interface	Verify provider implements interface contract (Name, etc.)	No
Behavior	Verify edge case handling (empty input, context cancellation)	Sometimes
Integration	Verify actual synthesis/transcription works	Yes

STT Integration Tests

Test	Description
`Transcribe`	Batch transcription from audio bytes
`TranscribeFile`	Batch transcription from local file path
`TranscribeURL`	Batch transcription from remote URL
`TranscribeStream`	Real-time streaming transcription

TTS Integration Tests

Test	Description
`Synthesize`	Returns valid audio bytes
`SynthesizeStream`	Streams audio chunks
`SynthesizeFromReader`	Handles streaming text input

See Provider Conformance Testing TRD for detailed design documentation.

Mock Providers for Testing

The tts/providertest package includes mock providers and fixtures for testing TTS integrations without API keys:

import (
    "time"

    "github.com/plexusone/omnivoice-core/tts/providertest"
)

// Provider-specific mocks with realistic voices
elevenLabs := providertest.NewElevenLabsMock()  // Rachel, Bella, Antoni
deepgram := providertest.NewDeepgramMock()      // Asteria, Luna, Orion
openai := providertest.NewOpenAIMock()          // Alloy, Echo, Fable, Onyx, Nova, Shimmer

// Configurable mock behaviors
mock := providertest.NewMockProviderWithOptions(
    providertest.WithLatency(100 * time.Millisecond),  // Simulate network delay
    providertest.WithError(providertest.ErrMockRateLimit),  // Error injection
    providertest.WithFailAfterN(3, providertest.ErrMockQuotaExceeded),  // Failover testing
)

// Generate valid WAV fixtures
fixture := providertest.GenerateWAVFixture(1000, 22050)  // 1 second at 22050 Hz
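The fail-after-N option above is described as a failover testing aid. The sketch below shows the idea with local stand-in types rather than the real providertest API: `synthesizer`, `failAfterN`, `synthesizeWithFallback`, and `servedBy` are hypothetical, and it assumes "fail after N" means the first N calls succeed and every later call returns the configured error.

```go
package main

import (
	"errors"
	"fmt"
)

// synthesizer is a hypothetical, trimmed-down stand-in for a TTS provider.
type synthesizer interface {
	Name() string
	Synthesize(text string) ([]byte, error)
}

var errQuota = errors.New("mock: quota exceeded")

// failAfterN succeeds for the first n calls and returns err on every later call.
type failAfterN struct {
	name  string
	n     int
	err   error
	calls int
}

func (m *failAfterN) Name() string { return m.name }

func (m *failAfterN) Synthesize(text string) ([]byte, error) {
	m.calls++
	if m.calls > m.n {
		return nil, m.err
	}
	return []byte(m.name + ":" + text), nil
}

// synthesizeWithFallback tries each provider in order and returns the first
// successful result together with the name of the provider that produced it.
func synthesizeWithFallback(providers []synthesizer, text string) ([]byte, string, error) {
	if len(providers) == 0 {
		return nil, "", errors.New("no providers configured")
	}
	var errs []error
	for _, p := range providers {
		audio, err := p.Synthesize(text)
		if err == nil {
			return audio, p.Name(), nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return nil, "", errors.Join(errs...)
}

// servedBy sends the given number of requests to a primary that fails after
// 3 calls and an always-healthy backup, recording who served each request.
func servedBy(requests int) []string {
	primary := &failAfterN{name: "primary", n: 3, err: errQuota}
	backup := &failAfterN{name: "backup", n: 1 << 30, err: errQuota}
	var names []string
	for i := 0; i < requests; i++ {
		_, name, err := synthesizeWithFallback([]synthesizer{primary, backup}, "hello")
		if err != nil {
			name = "error"
		}
		names = append(names, name)
	}
	return names
}

func main() {
	fmt.Println(servedBy(5))
}
```

Running five requests shows the switch from the primary to the backup on the fourth call, which is the behavior a failover test would assert.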

Native Voice-to-Voice Providers

For lowest latency, use native voice-to-voice APIs that bypass traditional STT/TTS:

Provider	Package	Latency	Audio Format
OpenAI Realtime	`omni-openai/omnivoice/realtime`	~100ms	PCM16 24kHz
Gemini Live	`omni-google/omnivoice`	~200ms	PCM16 16kHz in, 24kHz out

These providers implement the RealtimeProvider interface:

type RealtimeProvider interface {
    ProcessAudioStream(
        ctx context.Context,
        audioIn <-chan []byte,
        config ProcessConfig,
    ) (<-chan AudioChunk, <-chan Transcript, error)
    Name() string
}
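To illustrate the channel-based shape of this interface, the sketch below implements it with a trivial echo provider. `ProcessConfig`, `AudioChunk`, and `Transcript` are declared locally as empty stand-ins because their real fields are not shown here, and `echoProvider` and `echoBytes` are hypothetical names. The point is only the goroutine and channel wiring, not any real provider behavior.

```go
package main

import (
	"context"
	"fmt"
)

// Local stand-ins: the real types live in the library and may look different.
type ProcessConfig struct{}
type AudioChunk struct{ Data []byte }
type Transcript struct{ Text string }

type RealtimeProvider interface {
	ProcessAudioStream(
		ctx context.Context,
		audioIn <-chan []byte,
		config ProcessConfig,
	) (<-chan AudioChunk, <-chan Transcript, error)
	Name() string
}

// echoProvider sends every input frame straight back as output audio.
type echoProvider struct{}

var _ RealtimeProvider = echoProvider{} // compile-time interface check

func (echoProvider) Name() string { return "echo" }

func (echoProvider) ProcessAudioStream(ctx context.Context, audioIn <-chan []byte, _ ProcessConfig) (<-chan AudioChunk, <-chan Transcript, error) {
	audioOut := make(chan AudioChunk)
	transcripts := make(chan Transcript) // never written to in this sketch
	go func() {
		defer close(audioOut)
		defer close(transcripts)
		for {
			select {
			case <-ctx.Done():
				return
			case frame, ok := <-audioIn:
				if !ok {
					return // input closed: end of stream
				}
				select {
				case audioOut <- AudioChunk{Data: frame}:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return audioOut, transcripts, nil
}

// echoBytes streams frames through the provider and counts the bytes returned.
func echoBytes(frames [][]byte) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	in := make(chan []byte)
	out, _, err := echoProvider{}.ProcessAudioStream(ctx, in, ProcessConfig{})
	if err != nil {
		return -1
	}
	go func() {
		defer close(in)
		for _, f := range frames {
			in <- f
		}
	}()
	total := 0
	for chunk := range out {
		total += len(chunk.Data)
	}
	return total
}

func main() {
	fmt.Println(echoProvider{}.Name(), echoBytes([][]byte{{1, 2}, {3}}))
}
```

Closing the output channels when the input closes or the context is cancelled lets callers use a plain `range` loop without leaking goroutines.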

Resources

Other Call Systems

LiveKit Voice AI
Daily.co
Recall.ai - Meeting bot infrastructure

Documentation ¶

Index ¶

Constants
func GetCallSystemProvider(name string, opts ...registry.ProviderOption) (callsystem.CallSystem, error)
func GetCallSystemProviderPriority(name string) int
func GetGatewayProvider(name string, opts ...registry.ProviderOption) (registry.Gateway, error)
func GetGatewayProviderPriority(name string) int
func GetRealtimeProvider(name string, opts ...registry.ProviderOption) (registry.RealtimeProvider, error)
func GetRealtimeProviderPriority(name string) int
func GetSTTProvider(name string, opts ...registry.ProviderOption) (stt.Provider, error)
func GetSTTProviderPriority(name string) int
func GetTTSProvider(name string, opts ...registry.ProviderOption) (tts.Provider, error)
func GetTTSProviderPriority(name string) int
func HasCallSystemProvider(name string) bool
func HasGatewayProvider(name string) bool
func HasRealtimeProvider(name string) bool
func HasSTTProvider(name string) bool
func HasTTSProvider(name string) bool
func ListCallSystemProviders() []string
func ListGatewayProviders() []string
func ListRealtimeProviders() []string
func ListSTTProviders() []string
func ListTTSProviders() []string
func RegisterCallSystemProvider(name string, factory registry.CallSystemProviderFactory, priority int)
func RegisterGatewayProvider(name string, factory registry.GatewayProviderFactory, priority int)
func RegisterRealtimeProvider(name string, factory registry.RealtimeProviderFactory, priority int)
func RegisterSTTProvider(name string, factory registry.STTProviderFactory, priority int)
func RegisterTTSProvider(name string, factory registry.TTSProviderFactory, priority int)

Constants ¶

const (
	// PriorityThin is the priority for thin (stdlib-only) provider implementations.
	// These have no external dependencies beyond the standard library.
	PriorityThin = 0

	// PriorityThick is the priority for thick (official SDK) provider implementations.
	// These use official provider SDKs for full feature support.
	PriorityThick = 10
)

Priority constants for provider registration. Higher priority values override lower priority registrations.
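The override rule can be pictured with a small self-contained registry. This is a sketch of the described behavior, not the code in registry.go: the `registry`, `entry`, and `factory` types are hypothetical, locking is omitted, and the handling of equal priorities (the later registration wins) is an assumption the documentation does not confirm.

```go
package main

import "fmt"

const (
	priorityThin  = 0  // mirrors PriorityThin
	priorityThick = 10 // mirrors PriorityThick
)

// factory is a stand-in that just returns a label for the implementation.
type factory func() string

type entry struct {
	factory  factory
	priority int
}

type registry struct {
	entries map[string]entry
}

func newRegistry() *registry {
	return &registry{entries: map[string]entry{}}
}

// register stores the factory unless a higher-priority one is already present.
// Assumption: on equal priority the newer registration replaces the older one.
func (r *registry) register(name string, f factory, priority int) {
	if cur, ok := r.entries[name]; ok && cur.priority > priority {
		return
	}
	r.entries[name] = entry{factory: f, priority: priority}
}

// priority returns the registered priority, or -1 if the name is unknown.
func (r *registry) priority(name string) int {
	if e, ok := r.entries[name]; ok {
		return e.priority
	}
	return -1
}

// get builds an instance from the registered factory.
func (r *registry) get(name string) (string, error) {
	e, ok := r.entries[name]
	if !ok {
		return "", fmt.Errorf("provider %q is not registered", name)
	}
	return e.factory(), nil
}

func main() {
	r := newRegistry()
	r.register("deepgram", func() string { return "thin" }, priorityThin)
	r.register("deepgram", func() string { return "thick" }, priorityThick)
	r.register("deepgram", func() string { return "thin again" }, priorityThin) // ignored

	impl, _ := r.get("deepgram")
	fmt.Println(impl, r.priority("deepgram"), r.priority("missing"))
}
```

With this rule the order of `init()` calls does not matter: whichever package registers the thick implementation wins regardless of import order.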

Variables ¶

This section is empty.

Functions ¶

func GetCallSystemProvider ¶

func GetCallSystemProvider(name string, opts ...registry.ProviderOption) (callsystem.CallSystem, error)

GetCallSystemProvider creates a CallSystem provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetCallSystemProviderPriority ¶

func GetCallSystemProviderPriority(name string) int

GetCallSystemProviderPriority returns the priority of the registered CallSystem provider. Returns -1 if the provider is not registered.

func GetGatewayProvider ¶ added in v0.14.0

func GetGatewayProvider(name string, opts ...registry.ProviderOption) (registry.Gateway, error)

GetGatewayProvider creates a Gateway provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetGatewayProviderPriority ¶ added in v0.14.0

func GetGatewayProviderPriority(name string) int

GetGatewayProviderPriority returns the priority of the registered Gateway provider. Returns -1 if the provider is not registered.

func GetRealtimeProvider ¶ added in v0.14.0

func GetRealtimeProvider(name string, opts ...registry.ProviderOption) (registry.RealtimeProvider, error)

GetRealtimeProvider creates a Realtime provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetRealtimeProviderPriority ¶ added in v0.14.0

func GetRealtimeProviderPriority(name string) int

GetRealtimeProviderPriority returns the priority of the registered Realtime provider. Returns -1 if the provider is not registered.

func GetSTTProvider ¶

func GetSTTProvider(name string, opts ...registry.ProviderOption) (stt.Provider, error)

GetSTTProvider creates an STT provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetSTTProviderPriority ¶

func GetSTTProviderPriority(name string) int

GetSTTProviderPriority returns the priority of the registered STT provider. Returns -1 if the provider is not registered.

func GetTTSProvider ¶

func GetTTSProvider(name string, opts ...registry.ProviderOption) (tts.Provider, error)

GetTTSProvider creates a TTS provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetTTSProviderPriority ¶

func GetTTSProviderPriority(name string) int

GetTTSProviderPriority returns the priority of the registered TTS provider. Returns -1 if the provider is not registered.

func HasCallSystemProvider ¶

func HasCallSystemProvider(name string) bool

HasCallSystemProvider returns true if a CallSystem provider with the given name is registered.

func HasGatewayProvider ¶ added in v0.14.0

func HasGatewayProvider(name string) bool

HasGatewayProvider returns true if a Gateway provider with the given name is registered.

func HasRealtimeProvider ¶ added in v0.14.0

func HasRealtimeProvider(name string) bool

HasRealtimeProvider returns true if a Realtime provider with the given name is registered.

func HasSTTProvider ¶

func HasSTTProvider(name string) bool

HasSTTProvider returns true if an STT provider with the given name is registered.

func HasTTSProvider ¶

func HasTTSProvider(name string) bool

HasTTSProvider returns true if a TTS provider with the given name is registered.

func ListCallSystemProviders ¶

func ListCallSystemProviders() []string

ListCallSystemProviders returns a list of all registered CallSystem provider names.

func ListGatewayProviders ¶ added in v0.14.0

func ListGatewayProviders() []string

ListGatewayProviders returns a list of all registered Gateway provider names.

func ListRealtimeProviders ¶ added in v0.14.0

func ListRealtimeProviders() []string

ListRealtimeProviders returns a list of all registered Realtime provider names.

func ListSTTProviders ¶

func ListSTTProviders() []string

ListSTTProviders returns a list of all registered STT provider names.

func ListTTSProviders ¶

func ListTTSProviders() []string

ListTTSProviders returns a list of all registered TTS provider names.

func RegisterCallSystemProvider ¶

func RegisterCallSystemProvider(name string, factory registry.CallSystemProviderFactory, priority int)

RegisterCallSystemProvider registers a CallSystem provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-twilio/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterCallSystemProvider("twilio", NewCallSystemProvider, omnivoice.PriorityThick)
}

func RegisterGatewayProvider ¶ added in v0.14.0

func RegisterGatewayProvider(name string, factory registry.GatewayProviderFactory, priority int)

RegisterGatewayProvider registers a Gateway provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-twilio/omnivoice/gateway/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterGatewayProvider("twilio", NewGatewayProvider, omnivoice.PriorityThick)
}

func RegisterRealtimeProvider ¶ added in v0.14.0

func RegisterRealtimeProvider(name string, factory registry.RealtimeProviderFactory, priority int)

RegisterRealtimeProvider registers a Realtime provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-openai/omnivoice/realtime/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterRealtimeProvider("openai", NewRealtimeProvider, omnivoice.PriorityThick)
}

func RegisterSTTProvider ¶

func RegisterSTTProvider(name string, factory registry.STTProviderFactory, priority int)

RegisterSTTProvider registers an STT provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-deepgram/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterSTTProvider("deepgram", NewSTTProvider, omnivoice.PriorityThick)
}

func RegisterTTSProvider ¶

func RegisterTTSProvider(name string, factory registry.TTSProviderFactory, priority int)

RegisterTTSProvider registers a TTS provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-elevenlabs/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterTTSProvider("elevenlabs", NewTTSProvider, omnivoice.PriorityThick)
}

Types ¶

This section is empty.

Source Files ¶

registry.go

Directories ¶

Path	Synopsis
agent	Package agent provides voice agent orchestration for real-time conversations.
audio
codec	Package codec provides audio codec implementations for telephony.
converter	Package converter provides audio format conversion for voice gateways.
format	Package format defines audio format types for voice processing.
bargein	Package bargein provides barge-in detection for voice conversations.
callsystem	Package callsystem provides integrations with telephony and meeting platforms.
providertest	Package providertest provides conformance tests for CallSystem provider implementations.
examples
simple-tts (command)	Example: Simple TTS with provider fallback
zoom-agent (command)	Example: Voice agent in Zoom meetings
gateway	Package gateway provides a provider-agnostic interface for voice gateways.
mcp	Package mcp provides an MCP (Model Context Protocol) server for voice interactions.
observability	Package observability provides instrumentation interfaces for voice operations.
pipeline	Package pipeline provides components for connecting voice processing stages.
provider	Package provider provides generic multi-provider client management.
realtime	Package realtime provides a unified interface for real-time voice-to-voice providers.
registry	Package registry provides types for provider registration and discovery.
resilience	Package resilience provides error handling, retry logic, and backoff strategies for building resilient voice applications.
schema	Package schema provides embedded JSON Schema definitions for OmniVoice formats.
storage	Package storage provides session state persistence for voice calls.
stt	Package stt provides a unified interface for Speech-to-Text providers.
providertest	Package providertest provides conformance tests for STT provider implementations.
subtitle	Package subtitle generates SRT and WebVTT subtitles from STT transcription results.
transport	Package transport provides audio transport protocols for voice agents.
providertest	Package providertest provides conformance tests for Transport provider implementations.
tts	Package tts provides a unified interface for Text-to-Speech providers.
providertest	Package providertest provides conformance tests for TTS provider implementations.
