omnivoice

Version: v0.14.0 (latest)
Published: Jun 15, 2026 License: MIT Imports: 6 Imported by: 0

README

OmniVoice

Voice abstraction layer for AgentPlexus supporting TTS, STT, and Voice Agents across multiple providers and transport protocols.

Voice Architecture: Traditional vs Native

OmniVoice supports two fundamentally different approaches for real-time voice:

Traditional Pipeline (STT → LLM → TTS)

Audio In → [STT Provider] → Text → [LLM] → Text → [TTS Provider] → Audio Out

  • Latency: 500-1500ms (sum of STT + LLM + TTS)
  • Flexibility: Mix and match any STT, LLM, and TTS providers
  • Use case: Custom voice selection, specialized STT for domain-specific terminology

Native Voice-to-Voice

Audio In → [OpenAI Realtime / Gemini Live] → Audio Out

  • Latency: 100-200ms (model handles audio directly)
  • Simplicity: Single API, no separate STT/TTS configuration
  • Use case: Low-latency conversations, natural barge-in handling

Aspect             Traditional                 Native Voice-to-Voice
Latency            500-1500ms                  100-200ms
STT/TTS Config     Required                    Built-in
Core Interface     stt.Provider, tts.Provider  realtime.Provider
Provider Packages  tts/, stt/                  omni-openai/omnivoice/realtime, omni-google/omnivoice/realtime
Gateway Bridge     Pipeline-based              RealtimeBridge in gateway/
Barge-in           Via bargein/ package        Native support
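
To make the latency difference concrete, here is a minimal sketch of the traditional pipeline as three sequential stages. The Transcriber, Responder, and Synthesizer interfaces and the stub stages are hypothetical stand-ins invented for this illustration; they are not the actual stt.Provider or tts.Provider APIs.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Hypothetical stage interfaces for illustration only; the real
// stt.Provider and tts.Provider interfaces may differ.
type Transcriber interface {
	Transcribe(ctx context.Context, audio []byte) (string, error)
}

type Responder interface {
	Respond(ctx context.Context, prompt string) (string, error)
}

type Synthesizer interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// Pipeline runs STT, LLM, and TTS one after another, which is why
// its latency is the sum of the three stages.
type Pipeline struct {
	STT Transcriber
	LLM Responder
	TTS Synthesizer
}

// Process turns input audio into reply audio.
func (p Pipeline) Process(ctx context.Context, audioIn []byte) ([]byte, error) {
	text, err := p.STT.Transcribe(ctx, audioIn)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := p.LLM.Respond(ctx, text)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	audioOut, err := p.TTS.Synthesize(ctx, reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return audioOut, nil
}

// Stub stages so the sketch runs without any provider or API key.
type stubSTT struct{}

func (stubSTT) Transcribe(_ context.Context, audio []byte) (string, error) {
	return string(audio), nil // pretend the audio bytes are already text
}

type stubLLM struct{}

func (stubLLM) Respond(_ context.Context, prompt string) (string, error) {
	return strings.ToUpper(prompt), nil // "reply" by upper-casing the prompt
}

type stubTTS struct{}

func (stubTTS) Synthesize(_ context.Context, text string) ([]byte, error) {
	return []byte(text), nil // pretend the text bytes are audio
}

// runDemo pushes one utterance through the stubbed pipeline.
func runDemo(input string) (string, error) {
	p := Pipeline{STT: stubSTT{}, LLM: stubLLM{}, TTS: stubTTS{}}
	out, err := p.Process(context.Background(), []byte(input))
	return string(out), err
}

func main() {
	out, err := runDemo("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints "HELLO"
}
```

Because each stage waits for the previous one, swapping any single stage is easy, but every stage adds its own delay to the round trip.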

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              OmniVoice                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────────────┐  │
│  │     TTS     │    │     STT     │    │          Voice Agent            │  │
│  │             │    │             │    │                                 │  │
│  │ Text → Audio│    │ Audio → Text│    │  Real-time bidirectional voice  │  │
│  └──────┬──────┘    └──────┬──────┘    └───────────────┬─────────────────┘  │
│         │                  │                           │                    │
│         ▼                  ▼                           ▼                    │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         Provider Layer                              │    │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤    │
│  │ ElevenLabs  │  Deepgram   │ Google Cloud│    AWS      │   Azure     │    │
│  │ Cartesia    │  Whisper    │ AssemblyAI  │   Polly     │   Speech    │    │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         Transport Layer                             │    │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤    │
│  │   WebRTC    │     SIP     │    PSTN     │  WebSocket  │    HTTP     │    │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                      Call System Integration                        │    │
│  ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤    │
│  │   Twilio    │   Telnyx    │   Vonage    │    Plivo    │   LiveKit   │    │
│  └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Package Structure

omnivoice/
├── tts/                    # Text-to-Speech
│   ├── tts.go              # Interface definitions
│   ├── elevenlabs/         # ElevenLabs provider
│   ├── polly/              # AWS Polly provider
│   ├── google/             # Google Cloud TTS
│   ├── azure/              # Azure Speech
│   └── cartesia/           # Cartesia provider
│
├── stt/                    # Speech-to-Text
│   ├── stt.go              # Interface definitions
│   ├── transcript.go       # Canonical Transcript format
│   ├── whisper/            # OpenAI Whisper
│   ├── deepgram/           # Deepgram provider
│   ├── google/             # Google Speech-to-Text
│   ├── azure/              # Azure Speech
│   └── assemblyai/         # AssemblyAI provider
│
├── schema/                 # Embedded JSON Schemas
│   ├── schema.go           # //go:embed directives
│   └── transcript-v1.schema.json  # Transcript format schema
│
├── agent/                  # Voice Agent orchestration
│   ├── agent.go            # Interface definitions
│   ├── session.go          # Conversation session management
│   ├── elevenlabs/         # ElevenLabs Agents
│   ├── vapi/               # Vapi.ai
│   ├── retell/             # Retell AI
│   └── custom/             # Custom agent (STT + LLM + TTS)
│
├── transport/              # Audio transport protocols
│   ├── transport.go        # Interface definitions
│   ├── webrtc/             # WebRTC transport
│   ├── websocket/          # WebSocket streaming
│   ├── sip/                # SIP protocol
│   └── http/               # HTTP-based (batch)
│
├── callsystem/             # Call system integrations
│   ├── callsystem.go       # Interface definitions
│   ├── client.go           # Multi-provider client with failover
│   ├── sms.go              # SMSProvider interface
│   ├── twilio/             # Twilio ConversationRelay
│   ├── ringcentral/        # RingCentral Voice API
│   ├── zoom/               # Zoom SDK integration
│   ├── livekit/            # LiveKit rooms
│   └── daily/              # Daily.co
│
├── observability/          # Voice instrumentation
│   ├── events.go           # VoiceEvent, VoiceObserver
│   └── hooks.go            # TTSHook, STTHook interfaces
│
├── resilience/             # Error handling and retry logic
│   ├── category.go         # Error categories (transient, rate_limit, auth, etc.)
│   ├── error.go            # ProviderError with classification metadata
│   ├── classifier.go       # ErrorClassifier interface
│   ├── retry.go            # Retry and RetryWithResult functions
│   └── backoff.go          # Backoff strategies (exponential, linear, etc.)
│
├── storage/                # Session state persistence
│   ├── store.go            # SessionStore interface
│   ├── types.go            # SessionState, Turn, Metrics types
│   ├── memory.go           # In-memory implementation
│   └── redis.go            # Redis implementation
│
├── bargein/                # Barge-in detection
│   ├── config.go           # InterruptionMode (immediate, after_sentence, disabled)
│   └── detector.go         # BargeInDetector with TTS/STT integration
│
├── realtime/               # Native voice-to-voice
│   ├── provider.go         # Provider interface for OpenAI Realtime / Gemini Live
│   ├── client.go           # Multi-provider client with fallback
│   └── errors.go           # Common realtime errors
│
├── audio/                  # Audio processing
│   ├── format/             # Audio format definitions
│   │   └── format.go       # Encoding type with normalization, provider format constants
│   ├── converter/          # Audio format conversion
│   │   └── converter.go    # TwilioToOpenAI, OpenAIToTwilio, etc.
│   └── codec/              # Audio codecs (mulaw, alaw, PCM)
│
├── gateway/                # Voice gateway integration
│   ├── gateway.go          # Gateway, Session, Config interfaces
│   └── bridge.go           # RealtimeBridge for telephony ↔ realtime
│
├── registry.go             # Global provider registry (STT, TTS, CallSystem, Gateway, Realtime)
├── registry/               # Provider discovery types
│   ├── registry.go         # Registry interface, factory types, Gateway/RealtimeProvider interfaces
│   └── options.go          # ProviderConfig, ProviderOption (WithVoice, WithModel, etc.)
│
├── subtitle/               # Subtitle generation
│   └── subtitle.go         # SRT/VTT from transcription results
│
└── examples/
    ├── simple-tts/         # Basic TTS example
    ├── voice-agent/        # Voice agent with Twilio
    └── multi-provider/     # Provider fallback example

Voice Gateway Interfaces

OmniVoice provides two gateway interfaces for different use cases:

Gateway - PSTN Phone Calls

For traditional phone calls via Twilio, Telnyx, Vonage, or Plivo:

type Gateway interface {
    Name() ProviderName
    Start(ctx context.Context) error
    Stop() error
    OnCall(handler CallHandler)                               // Phone call comes in
    MakeCall(ctx context.Context, to string) (Session, error) // Dial phone number
    GetSession(callID string) (Session, bool)
    ListSessions() []Session
}

WebRTCGateway - Browser/Mobile Apps

For WebRTC-based voice via LiveKit, Daily, etc.:

type WebRTCGateway interface {
    Name() ProviderName
    Start(ctx context.Context) error
    Stop() error
    OnParticipantJoined(handler ParticipantHandler) // User joins room
    JoinRoom(ctx context.Context, roomName string) error
    LeaveRoom() error
    CurrentRoom() string
    GetSession(participantID string) (WebRTCSession, bool)
    ListSessions() []WebRTCSession
    GenerateClientToken(roomName, identity, displayName string) (string, error)
}

Comparison

Aspect      Gateway (PSTN)         WebRTCGateway
Connection  Phone number           Room name
Incoming    OnCall()               OnParticipantJoined()
Outgoing    MakeCall(phoneNumber)  JoinRoom(roomName)
Latency     500ms+                 <200ms
Clients     Phone calls            Browser/mobile apps

Call System Integration

How Voice Agents Connect to Phone/Video Calls

Voice AI agents need a transport layer to receive and send audio. The choice depends on the use case:

┌───────────────────────────────────────────────────────────────────────┐
│                   Voice Gateway Providers (Bidirectional)             │
├────────────────┬───────────────┬─────────────────┬────────────────────┤
│    Platform    │   Protocol    │   Audio Format  │   Auth Method      │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Twilio         │ Media Streams │ mulaw 8kHz      │ Account SID/Token  │
│ Media Streams  │ WebSocket     │                 │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Telnyx         │ Media         │ L16 16kHz       │ API Key            │
│ Media Streaming│ WebSocket     │                 │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Vonage         │ Voice         │ L16 16kHz       │ JWT (RS256)        │
│ Voice WebSocket│ WebSocket     │                 │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Plivo          │ Stream API    │ L16 16kHz       │ Auth ID/Token      │
│ Audio Streaming│ WebSocket     │                 │                    │
└────────────────┴───────────────┴─────────────────┴────────────────────┘
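
Since Twilio streams mulaw at 8kHz while the other providers send linear 16-bit audio, a gateway usually has to decode mu-law before further processing. The sketch below shows the standard G.711 mu-law expansion to PCM16. It is an illustration only, not the code in the audio/codec package, and the function names are made up for this example. It does not resample, so the output stays at the input sample rate.

```go
package main

import "fmt"

// muLawToPCM16 decodes one G.711 mu-law byte into a 16-bit linear PCM sample.
func muLawToPCM16(b byte) int16 {
	b = ^b // mu-law bytes are transmitted bit-inverted
	sign := b & 0x80
	exponent := (b >> 4) & 0x07
	mantissa := b & 0x0F
	// Restore the magnitude: add the 0x84 bias, scale by the segment, remove the bias.
	magnitude := ((int(mantissa)<<3)+0x84)<<exponent - 0x84
	if sign != 0 {
		return int16(-magnitude)
	}
	return int16(magnitude)
}

// decodeFrame converts a mu-law frame into PCM16 samples.
func decodeFrame(frame []byte) []int16 {
	out := make([]int16, len(frame))
	for i, b := range frame {
		out[i] = muLawToPCM16(b)
	}
	return out
}

func main() {
	// 0xFF is silence; 0x80 and 0x00 are the positive and negative extremes.
	fmt.Println(decodeFrame([]byte{0xFF, 0x80, 0x00})) // prints [0 32124 -32124]
}
```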

┌───────────────────────────────────────────────────────────────────────┐
│                     Other Call System Options                         │
├────────────────┬───────────────┬─────────────────┬────────────────────┤
│    Platform    │   Protocol    │   Best For      │   Complexity       │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ LiveKit        │ WebRTC        │ Custom apps,    │ Low - open source  │
│                │               │ real-time AI    │ WebRTC rooms       │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ Daily.co       │ WebRTC        │ Embedded video, │ Low - simple API   │
│                │               │ browser-based   │                    │
├────────────────┼───────────────┼─────────────────┼────────────────────┤
│ WebSocket      │ WS/WSS        │ Web apps,       │ Low - direct       │
│ (Direct)       │               │ custom UIs      │ streaming          │
└────────────────┴───────────────┴─────────────────┴────────────────────┘

Wiring Diagram: Voice Agent in a Phone Call

┌────────────────────────────────────────────────────────────────────────────────┐
│                     PSTN/WebSocket Call Flow                                   │
│                                                                                │
│   ┌─────────┐         ┌─────────────┐          ┌───────────────────────────┐   │
│   │  User   │◄───────►│  Provider   │◄────────►│        OmniVoice          │   │
│   │ (Phone) │  PSTN   │  (Twilio/   │ WebSocket│                           │   │
│   │         │         │   Telnyx/   │          │  ┌─────────────────────┐  │   │
│   └─────────┘         │   Vonage/   │          │  │   Voice Agent       │  │   │
│                       │   Plivo)    │          │  │                     │  │   │
│                       └─────────────┘          │  │  ┌───────┐          │  │   │
│                         Audio In ─────────────►│  │  │  STT  │──┐       │  │   │
│                                                │  │  └───────┘  │       │  │   │
│                                                │  │             ▼       │  │   │
│                                                │  │  ┌───────────────┐  │  │   │
│                                                │  │  │  LLM / Agent  │  │  │   │
│                                                │  │  │  (Eino, etc.) │  │  │   │
│                                                │  │  └───────────────┘  │  │   │
│                                                │  │             │       │  │   │
│                                                │  │             ▼       │  │   │
│                                                │  │  ┌───────┐          │  │   │
│                         Audio Out ◄────────────│  │  │  TTS  │◄─┘       │  │   │
│                                                │  │  └───────┘          │  │   │
│                                                │  └─────────────────────┘  │   │
│                                                └───────────────────────────┘   │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘

Wiring Diagram: Voice Agent in a Zoom Meeting

┌────────────────────────────────────────────────────────────────────────────┐
│                     Zoom Meeting Flow                                      │
│                                                                            │
│   ┌────────────────────────────────────────────────────────────────────┐   │
│   │                         Zoom Meeting                               │   │
│   │                                                                    │   │
│   │   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────────────────┐   │   │
│   │   │ User 1  │  │ User 2  │  │ User 3  │  │     Bot Client      │   │   │
│   │   │ (Human) │  │ (Human) │  │ (Human) │  │   (Zoom SDK)        │   │   │
│   │   └─────────┘  └─────────┘  └─────────┘  └──────────┬──────────┘   │   │
│   │                                                     │              │   │
│   └─────────────────────────────────────────────────────┼──────────────┘   │
│                                                         │                  │
│                                        Raw Audio Stream │                  │
│                                                         ▼                  │
│   ┌────────────────────────────────────────────────────────────────────┐   │
│   │                        OmniVoice Agent                             │   │
│   │                                                                    │   │
│   │   Option A: Use Recall.ai (recommended)                            │   │
│   │   ┌─────────────┐                                                  │   │
│   │   │  Recall.ai  │──► Handles Zoom SDK complexity                   │   │
│   │   │     Bot     │──► Provides audio stream via WebSocket           │   │
│   │   └─────────────┘                                                  │   │
│   │                                                                    │   │
│   │   Option B: Self-hosted Zoom SDK Bot                               │   │
│   │   ┌─────────────┐                                                  │   │
│   │   │ Zoom Linux  │──► Complex: requires native SDK                  │   │
│   │   │   SDK Bot   │──► One instance per meeting                      │   │
│   │   └─────────────┘──► Months of engineering                         │   │
│   │                                                                    │   │
│   └────────────────────────────────────────────────────────────────────┘   │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

Use Case Recommendations

Use Case             Call System            Transport       Notes
IVR / Call Center    Twilio, Telnyx, Plivo  PSTN/WebSocket  Managed infrastructure
International Calls  Plivo, Vonage          PSTN/WebSocket  Good international rates
Enterprise Voice     Vonage, Telnyx         PSTN/WebSocket  Flexible call control
Custom Web App       LiveKit or Daily       WebRTC          Open source, flexible
Browser Widget       Direct WebSocket       WebSocket       ElevenLabs widget or custom
Mobile App           LiveKit                WebRTC          Cross-platform support

Latency Considerations

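For the most natural conversation, total round-trip latency should stay under 500ms, although up to 1000ms is still acceptable. Each pipeline stage contributes to the total: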

User speaks → STT (100-300ms) → LLM (200-500ms) → TTS (100-200ms) → User hears

Target: < 500ms total for "instant" feel
Acceptable: < 1000ms for natural conversation
Poor: > 1500ms feels laggy
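
As a quick sanity check on these numbers, the sketch below sums the per-stage ranges quoted above and maps the totals onto the listed bands. The type and function names are hypothetical, and the "borderline" label for the 1000-1500ms range is an assumption, since the list above does not name that range.

```go
package main

import (
	"fmt"
	"time"
)

// stage is one step of the pipeline with its expected latency range.
type stage struct {
	name   string
	lo, hi time.Duration
}

// pipelineStages returns the per-stage ranges quoted in the text above.
func pipelineStages() []stage {
	return []stage{
		{"STT", 100 * time.Millisecond, 300 * time.Millisecond},
		{"LLM", 200 * time.Millisecond, 500 * time.Millisecond},
		{"TTS", 100 * time.Millisecond, 200 * time.Millisecond},
	}
}

// totalRange sums best-case and worst-case latency across sequential stages.
func totalRange(stages []stage) (lo, hi time.Duration) {
	for _, s := range stages {
		lo += s.lo
		hi += s.hi
	}
	return lo, hi
}

// classify maps a round-trip latency onto the bands listed above.
// The "borderline" label for 1000-1500ms is an assumption; the text
// does not name that range.
func classify(total time.Duration) string {
	switch {
	case total < 500*time.Millisecond:
		return "instant"
	case total < 1000*time.Millisecond:
		return "acceptable"
	case total > 1500*time.Millisecond:
		return "poor"
	default:
		return "borderline"
	}
}

func main() {
	lo, hi := totalRange(pipelineStages())
	fmt.Printf("best case:  %v (%s)\n", lo, classify(lo))
	fmt.Printf("worst case: %v (%s)\n", hi, classify(hi))
}
```

With the figures above, the best case sums to 400ms and the worst case to 1000ms, so a non-streaming pipeline only meets the 500ms target when every stage performs near its best.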

Optimization Strategies

  1. Streaming STT: Start processing before the user finishes speaking
  2. Streaming TTS: Start playing audio before the full response is generated
  3. Edge inference: Use providers with edge nodes (Deepgram, ElevenLabs)
  4. Turn detection: Use voice activity detection (VAD) for quick turn-taking
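
For the turn-detection strategy, the simplest form of VAD compares frame energy against a threshold and declares the turn over after a short run of quiet frames. The sketch below illustrates that idea on PCM16 frames. The EnergyVAD type, its fields, and the threshold values are assumptions chosen for this example; production systems often use model-based VAD instead.

```go
package main

import (
	"fmt"
	"math"
)

// rms returns the root-mean-square amplitude of a PCM16 frame.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// EnergyVAD reports the end of a turn once speech has been heard and is
// followed by HangoverFrames consecutive quiet frames.
type EnergyVAD struct {
	Threshold      float64 // RMS level treated as speech (arbitrary; tune per input)
	HangoverFrames int     // consecutive quiet frames that end a turn
	speaking       bool
	quiet          int
}

// Feed consumes one frame and returns true when a turn has just ended.
func (v *EnergyVAD) Feed(frame []int16) bool {
	if rms(frame) >= v.Threshold {
		v.speaking = true
		v.quiet = 0
		return false
	}
	if !v.speaking {
		return false // silence before any speech is not a turn
	}
	v.quiet++
	if v.quiet >= v.HangoverFrames {
		v.speaking = false
		v.quiet = 0
		return true
	}
	return false
}

// constantFrame builds a test frame where every sample has the same amplitude.
func constantFrame(amplitude int16, samples int) []int16 {
	frame := make([]int16, samples)
	for i := range frame {
		frame[i] = amplitude
	}
	return frame
}

// firstTurnEnd feeds frames in order and returns the index of the frame
// at which a turn end is first reported, or -1 if none is reported.
func firstTurnEnd(v *EnergyVAD, frames [][]int16) int {
	for i, f := range frames {
		if v.Feed(f) {
			return i
		}
	}
	return -1
}

// demoFrames returns two loud frames followed by three quiet ones.
// Each frame holds 160 samples, which is 20ms of audio at 8kHz.
func demoFrames() [][]int16 {
	loud := constantFrame(2000, 160)
	quiet := constantFrame(50, 160)
	return [][]int16{loud, loud, quiet, quiet, quiet}
}

func main() {
	vad := &EnergyVAD{Threshold: 500, HangoverFrames: 3}
	fmt.Println(firstTurnEnd(vad, demoFrames())) // prints 4
}
```

A shorter hangover makes turn-taking faster but increases the chance of cutting the speaker off during a natural pause.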

Provider Comparison

TTS Providers

Provider      Latency   Quality    Voices  Streaming  Price
ElevenLabs    Low       Excellent  5000+   Yes        $$$
Cartesia      Very Low  Good       100+    Yes        $$
AWS Polly     Low       Good       60+     Yes        $
Google TTS    Low       Good       200+    Yes        $
Azure Speech  Low       Excellent  400+    Yes        $$

STT Providers

Provider          Latency   Accuracy   Streaming  Languages  Price
Deepgram          Very Low  Excellent  Yes        30+        $$
Whisper (OpenAI)  Medium    Excellent  No*        50+        $
Google Speech     Low       Excellent  Yes        125+       $$
AssemblyAI        Low       Excellent  Yes        20+        $$
Azure Speech      Low       Excellent  Yes        100+       $$

*Whisper requires self-hosting for streaming (e.g., faster-whisper)

Voice Agent Platforms

Provider            Customization  Latency   Telephony        Price
ElevenLabs Agents   Medium         Low       Via Twilio       $$$
Vapi                High           Low       Built-in         $$
Retell AI           High           Low       Built-in         $$
Custom (OmniVoice)  Full           Variable  Via integration  Variable

Provider Conformance Testing

OmniVoice includes conformance test suites that provider implementations can use to verify they correctly implement the TTS and STT interfaces with consistent behavior.

Using Conformance Tests

Provider implementations should import the providertest packages and run the conformance tests:

// In your provider's conformance_test.go
import (
    "testing"

    "github.com/plexusone/omnivoice-core/stt/providertest"
    // or for TTS:
    // "github.com/plexusone/omnivoice-core/tts/providertest"
)

func TestConformance(t *testing.T) {
    // New, WithAPIKey, and apiKey come from your provider package and test setup.
    p, err := New(WithAPIKey(apiKey))
    if err != nil {
        t.Fatal(err)
    }

    providertest.RunAll(t, providertest.Config{
        Provider:        p,
        TestAudioFile:   "/path/to/test.mp3",
        TestAudioURL:    "https://example.com/test.mp3",
        // ...
    })
}

Test Categories

Category     Description                                                    API Required
Interface    Verify provider implements interface contract (Name, etc.)     No
Behavior     Verify edge case handling (empty input, context cancellation)  Sometimes
Integration  Verify actual synthesis/transcription works                    Yes

STT Integration Tests

Test              Description
Transcribe        Batch transcription from audio bytes
TranscribeFile    Batch transcription from local file path
TranscribeURL     Batch transcription from remote URL
TranscribeStream  Real-time streaming transcription

TTS Integration Tests

Test                  Description
Synthesize            Returns valid audio bytes
SynthesizeStream      Streams audio chunks
SynthesizeFromReader  Handles streaming text input

See Provider Conformance Testing TRD for detailed design documentation.

Mock Providers for Testing

The tts/providertest package includes mock providers and fixtures for testing TTS integrations without API keys:

import (
    "time"

    "github.com/plexusone/omnivoice-core/tts/providertest"
)

// Provider-specific mocks with realistic voices
elevenLabs := providertest.NewElevenLabsMock()  // Rachel, Bella, Antoni
deepgram := providertest.NewDeepgramMock()      // Asteria, Luna, Orion
openai := providertest.NewOpenAIMock()          // Alloy, Echo, Fable, Onyx, Nova, Shimmer

// Configurable mock behaviors
mock := providertest.NewMockProviderWithOptions(
    providertest.WithLatency(100 * time.Millisecond),  // Simulate network delay
    providertest.WithError(providertest.ErrMockRateLimit),  // Error injection
    providertest.WithFailAfterN(3, providertest.ErrMockQuotaExceeded),  // Failover testing
)

// Generate valid WAV fixtures
fixture := providertest.GenerateWAVFixture(1000, 22050)  // 1 second at 22050 Hz
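
To show what a failover test built on such mocks might exercise, here is a self-contained sketch that tries providers in order until one succeeds. It does not use the providertest package: the synthesizer interface, the failAfterN mock (which here succeeds for its first n calls and then fails), and synthesizeWithFailover are all hypothetical names written for this illustration, and the real WithFailAfterN option may behave differently.

```go
package main

import (
	"errors"
	"fmt"
)

var errQuotaExceeded = errors.New("quota exceeded")

// synthesizer is a hypothetical, reduced TTS interface for this sketch.
type synthesizer interface {
	Name() string
	Synthesize(text string) ([]byte, error)
}

// failAfterN succeeds for its first n calls and then returns errQuotaExceeded.
type failAfterN struct {
	name  string
	n     int
	calls int
}

func (m *failAfterN) Name() string { return m.name }

func (m *failAfterN) Synthesize(text string) ([]byte, error) {
	m.calls++
	if m.calls > m.n {
		return nil, errQuotaExceeded
	}
	return []byte(text), nil
}

// synthesizeWithFailover tries each provider in order and returns the first
// successful result together with the name of the provider that served it.
func synthesizeWithFailover(providers []synthesizer, text string) ([]byte, string, error) {
	if len(providers) == 0 {
		return nil, "", errors.New("no providers configured")
	}
	var errs []error
	for _, p := range providers {
		audio, err := p.Synthesize(text)
		if err == nil {
			return audio, p.Name(), nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return nil, "", errors.Join(errs...)
}

// demoFailover makes three requests against a primary that fails after two
// calls and records which provider served each request.
func demoFailover() []string {
	providers := []synthesizer{
		&failAfterN{name: "primary", n: 2},
		&failAfterN{name: "backup", n: 10},
	}
	var servedBy []string
	for i := 0; i < 3; i++ {
		_, name, err := synthesizeWithFailover(providers, "hello")
		if err != nil {
			name = "error"
		}
		servedBy = append(servedBy, name)
	}
	return servedBy
}

func main() {
	fmt.Println(demoFailover()) // prints [primary primary backup]
}
```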

Native Voice-to-Voice Providers

For lowest latency, use native voice-to-voice APIs that bypass traditional STT/TTS:

Provider         Package                         Latency  Audio Format
OpenAI Realtime  omni-openai/omnivoice/realtime  ~100ms   PCM16 24kHz
Gemini Live      omni-google/omnivoice           ~200ms   PCM16 16kHz in, 24kHz out

These providers implement the RealtimeProvider interface:

type RealtimeProvider interface {
    ProcessAudioStream(
        ctx context.Context,
        audioIn <-chan []byte,
        config ProcessConfig,
    ) (<-chan AudioChunk, <-chan Transcript, error)
    Name() string
}
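
To illustrate the channel-based shape of this interface, the sketch below implements it with a trivial provider that echoes incoming audio back. The AudioChunk, Transcript, and ProcessConfig definitions are minimal hypothetical versions, since their real fields are not shown here, and echoProvider and runEcho exist only for this example.

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical minimal versions of the types the interface refers to.
type AudioChunk struct{ Data []byte }
type Transcript struct{ Text string }
type ProcessConfig struct{}

type RealtimeProvider interface {
	ProcessAudioStream(
		ctx context.Context,
		audioIn <-chan []byte,
		config ProcessConfig,
	) (<-chan AudioChunk, <-chan Transcript, error)
	Name() string
}

// echoProvider sends every input frame straight back as output audio.
type echoProvider struct{}

var _ RealtimeProvider = echoProvider{} // compile-time interface check

func (echoProvider) Name() string { return "echo" }

func (echoProvider) ProcessAudioStream(
	ctx context.Context,
	audioIn <-chan []byte,
	_ ProcessConfig,
) (<-chan AudioChunk, <-chan Transcript, error) {
	audioOut := make(chan AudioChunk)
	transcripts := make(chan Transcript)
	go func() {
		// Closing both channels tells the caller the stream has ended.
		defer close(audioOut)
		defer close(transcripts)
		for {
			select {
			case <-ctx.Done():
				return
			case frame, ok := <-audioIn:
				if !ok {
					return // input closed: end of stream
				}
				select {
				case audioOut <- AudioChunk{Data: frame}:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return audioOut, transcripts, nil
}

// runEcho streams frames through the provider and returns the byte count echoed back.
func runEcho(frames [][]byte) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var p RealtimeProvider = echoProvider{}
	audioIn := make(chan []byte)
	audioOut, transcripts, err := p.ProcessAudioStream(ctx, audioIn, ProcessConfig{})
	if err != nil {
		return 0, err
	}

	go func() {
		defer close(audioIn)
		for _, f := range frames {
			select {
			case audioIn <- f:
			case <-ctx.Done():
				return
			}
		}
	}()

	total := 0
	for chunk := range audioOut {
		total += len(chunk.Data)
	}
	for range transcripts { // drain; the echo provider never sends transcripts
	}
	return total, nil
}

func main() {
	n, err := runEcho([][]byte{{1, 2}, {3, 4, 5}})
	fmt.Println(n, err) // prints 5 <nil>
}
```

Returning receive-only channels lets the caller consume audio and transcripts independently, while context cancellation gives both sides a way to stop the stream early.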

Resources

Voice Gateway Providers
Other Call Systems
Voice AI Providers

Documentation

Constants

const (
	// PriorityThin is the priority for thin (stdlib-only) provider implementations.
	// These have no external dependencies beyond the standard library.
	PriorityThin = 0

	// PriorityThick is the priority for thick (official SDK) provider implementations.
	// These use official provider SDKs for full feature support.
	PriorityThick = 10
)

Priority constants for provider registration. Higher priority values override lower priority registrations.
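
The override rule can be pictured with a small, self-contained registry. This is a sketch of the described behavior, not the package's implementation: the providerRegistry type and the string-returning factory are hypothetical, and keeping the existing entry when priorities are equal is an assumption, because only the higher-beats-lower case is documented.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	PriorityThin  = 0
	PriorityThick = 10
)

// factory is a hypothetical stand-in for the registry.*ProviderFactory types.
type factory func() string

type entry struct {
	factory  factory
	priority int
}

type providerRegistry struct {
	mu      sync.RWMutex
	entries map[string]entry
}

func newProviderRegistry() *providerRegistry {
	return &providerRegistry{entries: make(map[string]entry)}
}

// Register stores the factory unless an entry with equal or higher priority
// already exists. Keeping the existing entry on a tie is an assumption.
func (r *providerRegistry) Register(name string, f factory, priority int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.entries[name]; ok && cur.priority >= priority {
		return
	}
	r.entries[name] = entry{factory: f, priority: priority}
}

// Priority returns the registered priority, or -1 if the name is unknown.
func (r *providerRegistry) Priority(name string) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if e, ok := r.entries[name]; ok {
		return e.priority
	}
	return -1
}

// Get builds an instance from the registered factory.
func (r *providerRegistry) Get(name string) (string, error) {
	r.mu.RLock()
	e, ok := r.entries[name]
	r.mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("provider %q is not registered", name)
	}
	return e.factory(), nil
}

// demoRegistry registers thin, then thick, then thin again under one name.
func demoRegistry() *providerRegistry {
	r := newProviderRegistry()
	thin := func() string { return "twilio (thin)" }
	thick := func() string { return "twilio (thick)" }
	r.Register("twilio", thin, PriorityThin)
	r.Register("twilio", thick, PriorityThick) // higher priority overrides thin
	r.Register("twilio", thin, PriorityThin)   // lower priority is ignored
	return r
}

func main() {
	r := demoRegistry()
	got, _ := r.Get("twilio")
	fmt.Println(got, r.Priority("twilio"), r.Priority("missing")) // prints twilio (thick) 10 -1
}
```

With this rule, the order in which init functions run does not matter: the thick implementation wins whenever both are linked into the binary.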

Variables

This section is empty.

Functions

func GetCallSystemProvider

func GetCallSystemProvider(name string, opts ...registry.ProviderOption) (callsystem.CallSystem, error)

GetCallSystemProvider creates a CallSystem provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetCallSystemProviderPriority

func GetCallSystemProviderPriority(name string) int

GetCallSystemProviderPriority returns the priority of the registered CallSystem provider. Returns -1 if the provider is not registered.

func GetGatewayProvider added in v0.14.0

func GetGatewayProvider(name string, opts ...registry.ProviderOption) (registry.Gateway, error)

GetGatewayProvider creates a Gateway provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetGatewayProviderPriority added in v0.14.0

func GetGatewayProviderPriority(name string) int

GetGatewayProviderPriority returns the priority of the registered Gateway provider. Returns -1 if the provider is not registered.

func GetRealtimeProvider added in v0.14.0

func GetRealtimeProvider(name string, opts ...registry.ProviderOption) (registry.RealtimeProvider, error)

GetRealtimeProvider creates a Realtime provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetRealtimeProviderPriority added in v0.14.0

func GetRealtimeProviderPriority(name string) int

GetRealtimeProviderPriority returns the priority of the registered Realtime provider. Returns -1 if the provider is not registered.

func GetSTTProvider

func GetSTTProvider(name string, opts ...registry.ProviderOption) (stt.Provider, error)

GetSTTProvider creates an STT provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetSTTProviderPriority

func GetSTTProviderPriority(name string) int

GetSTTProviderPriority returns the priority of the registered STT provider. Returns -1 if the provider is not registered.

func GetTTSProvider

func GetTTSProvider(name string, opts ...registry.ProviderOption) (tts.Provider, error)

GetTTSProvider creates a TTS provider instance from the registry. Returns an error if the provider is not registered or if creation fails.

func GetTTSProviderPriority

func GetTTSProviderPriority(name string) int

GetTTSProviderPriority returns the priority of the registered TTS provider. Returns -1 if the provider is not registered.

func HasCallSystemProvider

func HasCallSystemProvider(name string) bool

HasCallSystemProvider returns true if a CallSystem provider with the given name is registered.

func HasGatewayProvider added in v0.14.0

func HasGatewayProvider(name string) bool

HasGatewayProvider returns true if a Gateway provider with the given name is registered.

func HasRealtimeProvider added in v0.14.0

func HasRealtimeProvider(name string) bool

HasRealtimeProvider returns true if a Realtime provider with the given name is registered.

func HasSTTProvider

func HasSTTProvider(name string) bool

HasSTTProvider returns true if an STT provider with the given name is registered.

func HasTTSProvider

func HasTTSProvider(name string) bool

HasTTSProvider returns true if a TTS provider with the given name is registered.

func ListCallSystemProviders

func ListCallSystemProviders() []string

ListCallSystemProviders returns a list of all registered CallSystem provider names.

func ListGatewayProviders added in v0.14.0

func ListGatewayProviders() []string

ListGatewayProviders returns a list of all registered Gateway provider names.

func ListRealtimeProviders added in v0.14.0

func ListRealtimeProviders() []string

ListRealtimeProviders returns a list of all registered Realtime provider names.

func ListSTTProviders

func ListSTTProviders() []string

ListSTTProviders returns a list of all registered STT provider names.

func ListTTSProviders

func ListTTSProviders() []string

ListTTSProviders returns a list of all registered TTS provider names.

func RegisterCallSystemProvider

func RegisterCallSystemProvider(name string, factory registry.CallSystemProviderFactory, priority int)

RegisterCallSystemProvider registers a CallSystem provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-twilio/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterCallSystemProvider("twilio", NewCallSystemProvider, omnivoice.PriorityThick)
}

func RegisterGatewayProvider added in v0.14.0

func RegisterGatewayProvider(name string, factory registry.GatewayProviderFactory, priority int)

RegisterGatewayProvider registers a Gateway provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-twilio/omnivoice/gateway/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterGatewayProvider("twilio", NewGatewayProvider, omnivoice.PriorityThick)
}

func RegisterRealtimeProvider added in v0.14.0

func RegisterRealtimeProvider(name string, factory registry.RealtimeProviderFactory, priority int)

RegisterRealtimeProvider registers a Realtime provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-openai/omnivoice/realtime/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterRealtimeProvider("openai", NewRealtimeProvider, omnivoice.PriorityThick)
}

func RegisterSTTProvider

func RegisterSTTProvider(name string, factory registry.STTProviderFactory, priority int)

RegisterSTTProvider registers an STT provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-deepgram/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterSTTProvider("deepgram", NewSTTProvider, omnivoice.PriorityThick)
}

func RegisterTTSProvider

func RegisterTTSProvider(name string, factory registry.TTSProviderFactory, priority int)

RegisterTTSProvider registers a TTS provider factory with the given name and priority. Higher priority values override lower priority registrations.

Example:

// In omni-elevenlabs/init.go (thick, priority 10)
func init() {
    omnivoice.RegisterTTSProvider("elevenlabs", NewTTSProvider, omnivoice.PriorityThick)
}

Types

This section is empty.

Directories

Path                     Synopsis
agent                    Package agent provides voice agent orchestration for real-time conversations.
audio/codec              Package codec provides audio codec implementations for telephony.
audio/converter          Package converter provides audio format conversion for voice gateways.
audio/format             Package format defines audio format types for voice processing.
bargein                  Package bargein provides barge-in detection for voice conversations.
callsystem               Package callsystem provides integrations with telephony and meeting platforms.
callsystem/providertest  Package providertest provides conformance tests for CallSystem provider implementations.
examples/simple-tts      (command) Example: Simple TTS with provider fallback
examples/zoom-agent      (command) Example: Voice agent in Zoom meetings
gateway                  Package gateway provides a provider-agnostic interface for voice gateways.
mcp                      Package mcp provides an MCP (Model Context Protocol) server for voice interactions.
observability            Package observability provides instrumentation interfaces for voice operations.
pipeline                 Package pipeline provides components for connecting voice processing stages.
provider                 Package provider provides generic multi-provider client management.
realtime                 Package realtime provides a unified interface for real-time voice-to-voice providers.
registry                 Package registry provides types for provider registration and discovery.
resilience               Package resilience provides error handling, retry logic, and backoff strategies for building resilient voice applications.
schema                   Package schema provides embedded JSON Schema definitions for OmniVoice formats.
storage                  Package storage provides session state persistence for voice calls.
stt                      Package stt provides a unified interface for Speech-to-Text providers.
stt/providertest         Package providertest provides conformance tests for STT provider implementations.
subtitle                 Package subtitle generates SRT and WebVTT subtitles from STT transcription results.
transport                Package transport provides audio transport protocols for voice agents.
transport/providertest   Package providertest provides conformance tests for Transport provider implementations.
tts                      Package tts provides a unified interface for Text-to-Speech providers.
tts/providertest         Package providertest provides conformance tests for TTS provider implementations.
