omnivoice

package module
v0.6.0
Published: Mar 7, 2026 License: MIT Imports: 6 Imported by: 0

README

OmniVoice

Batteries-included voice pipeline framework for Go. This package provides a unified interface for speech-to-text (STT) and text-to-speech (TTS) with all providers included.

For a minimal dependency footprint, use omnivoice-core instead.

Features

  • Unified Interface: Single API for all STT and TTS providers
  • Provider Registry: Get providers by name - no need to import individual provider packages
  • Multiple Providers: OpenAI, Deepgram, ElevenLabs, Twilio
  • Streaming Support: Real-time transcription and synthesis
  • Easy Integration: Import and use with minimal configuration

Installation

go get github.com/plexusone/omnivoice

Quick Start

import (
    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all" // Register all providers
)

Usage

package main

import (
    "context"
    "log"
    "os"

    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all"
)

func main() {
    ctx := context.Background()

    // Get providers by name using the registry
    sttProvider, err := omnivoice.GetSTTProvider("deepgram",
        omnivoice.WithAPIKey(os.Getenv("DEEPGRAM_API_KEY")))
    if err != nil {
        log.Fatal(err)
    }

    ttsProvider, err := omnivoice.GetTTSProvider("elevenlabs",
        omnivoice.WithAPIKey(os.Getenv("ELEVENLABS_API_KEY")))
    if err != nil {
        log.Fatal(err)
    }

    // Transcribe audio
    result, err := sttProvider.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
        Language:             "en",
        EnableWordTimestamps: true,
    })
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Transcription: %s", result.Text)

    // Synthesize speech
    audio, err := ttsProvider.Synthesize(ctx, "Hello, world!", omnivoice.SynthesisConfig{
        VoiceID: "pNInz6obpgDQGcFmaJgB", // Adam
    })
    if err != nil {
        log.Fatal(err)
    }
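    // audio.Audio contains the audio bytes. Using the value here also keeps
    // the example compiling, since Go rejects unused local variables.
    log.Printf("Synthesized %d bytes of audio", len(audio.Audio))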
}

Provider Registry

Get providers by name at runtime - no need to import individual provider packages:

// Available providers: "openai", "elevenlabs", "deepgram", "twilio"
ttsProvider, _ := omnivoice.GetTTSProvider("elevenlabs", omnivoice.WithAPIKey(key))
sttProvider, _ := omnivoice.GetSTTProvider("deepgram", omnivoice.WithAPIKey(key))

// List registered providers
fmt.Println(omnivoice.ListTTSProviders()) // [openai elevenlabs deepgram twilio]
fmt.Println(omnivoice.ListSTTProviders()) // [openai elevenlabs deepgram twilio]

Language Codes

OmniVoice accepts language codes in BCP-47 format, which includes ISO 639-1 two-letter codes and regional variants.

Common codes:

Code    Language
en      English
en-US   English (US)
en-GB   English (UK)
es      Spanish
es-MX   Spanish (Mexico)
fr      French
de      German
it      Italian
pt      Portuguese
pt-BR   Portuguese (Brazil)
ja      Japanese
ko      Korean
zh      Chinese
zh-CN   Chinese (Simplified)
zh-TW   Chinese (Traditional)
ar      Arabic
hi      Hindi
ru      Russian

Notes:

  • Use simple codes (en) for broad compatibility across providers
  • Use regional variants (en-US) when accent/dialect matters for TTS
  • Provider support varies; see provider documentation for full language lists
  • STT providers generally support automatic language detection when no code is specified
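The notes above suggest a simple fallback order: try the regional variant, then the base code, then leave the language unset so the provider can auto-detect. The following is a minimal sketch of that idea only. The helper names and the `supported` set are hypothetical and are not part of the omnivoice API.

```go
package main

import (
	"fmt"
	"strings"
)

// baseLanguage strips everything after the first subtag separator,
// e.g. "en-US" -> "en". Codes without a region are returned lowercased.
func baseLanguage(code string) string {
	if i := strings.IndexAny(code, "-_"); i > 0 {
		return strings.ToLower(code[:i])
	}
	return strings.ToLower(code)
}

// pickLanguage returns the requested code if the (hypothetical) supported
// set contains it, otherwise the base code, otherwise "" to signal that the
// caller should leave the language unset.
func pickLanguage(requested string, supported map[string]bool) string {
	if supported[requested] {
		return requested
	}
	if base := baseLanguage(requested); supported[base] {
		return base
	}
	return ""
}

func main() {
	// Assumed support list for illustration; real lists come from provider docs.
	supported := map[string]bool{"en": true, "es": true, "pt-BR": true}
	for _, code := range []string{"en-GB", "pt-BR", "ja"} {
		fmt.Printf("%s -> %q\n", code, pickLanguage(code, supported))
	}
}
```

The result could then be passed as the Language field of a TranscriptionConfig, as in the Usage example.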

Included Providers

Provider     STT             TTS               Registry Name
OpenAI       Whisper         TTS-1/TTS-1-HD    "openai"
ElevenLabs   Scribe          Multilingual v2   "elevenlabs"
Deepgram     Nova-2          Aura              "deepgram"
Twilio       Media Streams   Media Streams     "twilio"

License

MIT License - see LICENSE for details.

Documentation

Overview

Package omnivoice provides a unified interface for speech-to-text and text-to-speech. This is the batteries-included package that imports all providers. For a minimal dependency footprint, use github.com/plexusone/omnivoice-core instead.

Quick Start

Import the package with all providers:

import (
    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all"
)

Or import specific providers:

import (
    "github.com/plexusone/omnivoice"
    openai "github.com/plexusone/omnivoice-openai/omnivoice"
)

Creating Providers

// OpenAI provider
sttProvider := openai.NewSTTProvider(apiKey)
ttsProvider := openai.NewTTSProvider(apiKey)

// Create multi-provider client
sttClient := omnivoice.NewSTTClient(sttProvider)
ttsClient := omnivoice.NewTTSClient(ttsProvider)

Constants

const (
	StatusRinging  = callsystem.StatusRinging
	StatusAnswered = callsystem.StatusAnswered
	StatusEnded    = callsystem.StatusEnded
	StatusFailed   = callsystem.StatusFailed
	StatusBusy     = callsystem.StatusBusy
	StatusNoAnswer = callsystem.StatusNoAnswer
	CallInbound    = callsystem.Inbound
	CallOutbound   = callsystem.Outbound
)

Re-exported CallStatus and CallDirection constants.

const (
	SubtitleFormatSRT = subtitle.FormatSRT
	SubtitleFormatVTT = subtitle.FormatVTT
)

Subtitle format constants.

const Version = "0.6.0"

Version is the current version of the omnivoice package.

Variables

var (
	WithFrom             = callsystem.WithFrom
	WithTimeout          = callsystem.WithTimeout
	WithMachineDetection = callsystem.WithMachineDetection
	WithRecording        = callsystem.WithRecording
	WithWhisper          = callsystem.WithWhisper
	WithAgent            = callsystem.WithAgent
	WithStatusCallback   = callsystem.WithStatusCallback
)

Re-export CallOption functions

var (
	ErrNoAvailableProvider   = stt.ErrNoAvailableProvider
	ErrStreamingNotSupported = stt.ErrStreamingNotSupported
	ErrInvalidAudio          = stt.ErrInvalidAudio
	ErrInvalidConfig         = stt.ErrInvalidConfig
	ErrAudioTooLong          = stt.ErrAudioTooLong
	ErrAudioTooShort         = stt.ErrAudioTooShort
	ErrRateLimited           = stt.ErrRateLimited
	ErrQuotaExceeded         = stt.ErrQuotaExceeded
	ErrUnsupportedLanguage   = stt.ErrUnsupportedLanguage
	ErrUnsupportedFormat     = stt.ErrUnsupportedFormat
	ErrStreamClosed          = stt.ErrStreamClosed
)

Re-export STT errors

var (
	// DefaultSubtitleOptions returns sensible defaults for subtitle generation.
	DefaultSubtitleOptions = subtitle.DefaultOptions

	// GenerateSRT generates SRT subtitles from a transcription result.
	GenerateSRT = subtitle.GenerateSRT

	// GenerateVTT generates WebVTT subtitles from a transcription result.
	GenerateVTT = subtitle.GenerateVTT

	// SaveSRT generates and saves SRT to a file.
	SaveSRT = subtitle.SaveSRT

	// SaveVTT generates and saves WebVTT to a file.
	SaveVTT = subtitle.SaveVTT
)

Re-export subtitle functions.

var (
	ErrTTSNoAvailableProvider = tts.ErrNoAvailableProvider
	ErrVoiceNotFound          = tts.ErrVoiceNotFound
	ErrTTSInvalidConfig       = tts.ErrInvalidConfig
	ErrTTSRateLimited         = tts.ErrRateLimited
	ErrTTSQuotaExceeded       = tts.ErrQuotaExceeded
	ErrTTSStreamClosed        = tts.ErrStreamClosed
)

Re-export TTS errors

var NewSTTClient = stt.NewClient

Re-export STT functions

var NewTTSClient = tts.NewClient

Re-export TTS functions

Functions

func GetSTTProvider

func GetSTTProvider(name string, opts ...ProviderOption) (stt.Provider, error)

GetSTTProvider creates an STT provider by name with the given options.

Example:

provider, err := omnivoice.GetSTTProvider("deepgram", omnivoice.WithAPIKey(apiKey))

func GetTTSProvider

func GetTTSProvider(name string, opts ...ProviderOption) (tts.Provider, error)

GetTTSProvider creates a TTS provider by name with the given options.

Example:

provider, err := omnivoice.GetTTSProvider("elevenlabs", omnivoice.WithAPIKey(apiKey))

func HasSTTProvider

func HasSTTProvider(name string) bool

HasSTTProvider checks if an STT provider is registered.

func HasTTSProvider

func HasTTSProvider(name string) bool

HasTTSProvider checks if a TTS provider is registered.

func ListSTTProviders

func ListSTTProviders() []string

ListSTTProviders returns the names of all registered STT providers.

func ListTTSProviders

func ListTTSProviders() []string

ListTTSProviders returns the names of all registered TTS providers.

func RegisterSTTProvider

func RegisterSTTProvider(name string, factory STTProviderFactory)

RegisterSTTProvider registers an STT provider factory by name. This is typically called from provider init() functions.

func RegisterTTSProvider

func RegisterTTSProvider(name string, factory TTSProviderFactory)

RegisterTTSProvider registers a TTS provider factory by name. This is typically called from provider init() functions.
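Registering factories from init() is what lets a blank import such as `_ ".../providers/all"` make providers available by name. The following sketch shows the general pattern with local stand-in types. The names register, get, list, and the echo provider are hypothetical, and the real registry's internals (locking, ordering, duplicate handling) may differ.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// config, provider, and factory are local stand-ins for ProviderConfig,
// a provider interface, and a provider factory type.
type config struct{ APIKey string }
type provider interface{ Name() string }
type factory func(cfg config) (provider, error)

var (
	mu        sync.RWMutex
	factories = map[string]factory{}
)

// register stores a factory under a name; later registrations overwrite.
func register(name string, f factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// get looks up a factory by name and builds a provider from cfg.
func get(name string, cfg config) (provider, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("provider %q is not registered", name)
	}
	return f(cfg)
}

// list returns registered names in sorted order.
func list() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for n := range factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

type echo struct{}

func (echo) Name() string { return "echo" }

// In a real provider package this call would live in that package's init(),
// so importing the package (even with a blank identifier) registers it.
func init() {
	register("echo", func(cfg config) (provider, error) {
		if cfg.APIKey == "" {
			return nil, fmt.Errorf("echo: missing API key")
		}
		return echo{}, nil
	})
}

func main() {
	fmt.Println(list())
	if p, err := get("echo", config{APIKey: "k"}); err == nil {
		fmt.Println("created:", p.Name())
	}
	_, err := get("missing", config{})
	fmt.Println(err)
}
```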

Types

type Call added in v0.6.0

type Call = callsystem.Call

Call represents a phone or video call.

type CallDirection added in v0.6.0

type CallDirection = callsystem.CallDirection

CallDirection indicates inbound or outbound call.

type CallHandler added in v0.6.0

type CallHandler = callsystem.CallHandler

CallHandler is called when a new call arrives.

type CallOption added in v0.6.0

type CallOption = callsystem.CallOption

CallOption configures an outbound call.

type CallOptions added in v0.6.0

type CallOptions = callsystem.CallOptions

CallOptions holds parsed options for MakeCall.

type CallStatus added in v0.6.0

type CallStatus = callsystem.CallStatus

CallStatus represents the call state.

type CallSystem added in v0.6.0

type CallSystem = callsystem.CallSystem

CallSystem defines the interface for telephony/meeting integrations.

type CallSystemConfig added in v0.6.0

type CallSystemConfig = callsystem.CallSystemConfig

CallSystemConfig configures a call system integration.

type ProviderConfig

type ProviderConfig struct {
	// APIKey is the authentication key for the provider.
	APIKey string //nolint:gosec // G117: This is a config struct, not storing secrets

	// BaseURL is an optional custom API endpoint.
	BaseURL string

	// Extensions holds provider-specific configuration.
	Extensions map[string]any
}

ProviderConfig holds common configuration options for creating providers.

type ProviderOption

type ProviderOption func(*ProviderConfig)

ProviderOption configures a ProviderConfig.

func WithAPIKey

func WithAPIKey(apiKey string) ProviderOption

WithAPIKey sets the API key for the provider.

func WithBaseURL

func WithBaseURL(baseURL string) ProviderOption

WithBaseURL sets a custom base URL for the provider.

func WithExtension

func WithExtension(key string, value any) ProviderOption

WithExtension sets a provider-specific configuration value.
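ProviderOption follows the functional options pattern: each option is a function that mutates a ProviderConfig, and the constructor applies them in order. The sketch below mirrors the documented types locally to show how that works. The option bodies and the buildConfig helper are assumptions rather than the package's source, and the URL and extension key are placeholders.

```go
package main

import "fmt"

// ProviderConfig mirrors the struct documented above.
type ProviderConfig struct {
	APIKey     string
	BaseURL    string
	Extensions map[string]any
}

// ProviderOption mirrors the documented option type.
type ProviderOption func(*ProviderConfig)

// The bodies below are assumed implementations for illustration.
func WithAPIKey(apiKey string) ProviderOption {
	return func(c *ProviderConfig) { c.APIKey = apiKey }
}

func WithBaseURL(baseURL string) ProviderOption {
	return func(c *ProviderConfig) { c.BaseURL = baseURL }
}

func WithExtension(key string, value any) ProviderOption {
	return func(c *ProviderConfig) {
		if c.Extensions == nil {
			c.Extensions = map[string]any{} // allocate lazily on first use
		}
		c.Extensions[key] = value
	}
}

// buildConfig is a hypothetical helper that applies options in order,
// so a later option overrides an earlier one for the same field.
func buildConfig(opts ...ProviderOption) ProviderConfig {
	var cfg ProviderConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := buildConfig(
		WithAPIKey("example-key"),
		WithBaseURL("https://example.invalid/v1"),
		WithExtension("model", "example-model"),
	)
	fmt.Println(cfg.BaseURL, cfg.Extensions["model"])
}
```

The pattern keeps GetSTTProvider and GetTTSProvider signatures stable while allowing new settings to be added as further options.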

type STTClient

type STTClient = stt.Client

STTClient is the multi-provider STT client.

type STTProvider

type STTProvider = stt.Provider

STTProvider defines the interface for STT providers.

type STTProviderFactory

type STTProviderFactory func(config ProviderConfig) (stt.Provider, error)

STTProviderFactory creates an STT provider with the given configuration.

type STTStreamingProvider

type STTStreamingProvider = stt.StreamingProvider

STTStreamingProvider extends Provider with streaming support.
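Because streaming is an extension of the base provider interface, a caller holding a plain provider would typically discover streaming support with a type assertion. The sketch below shows that check using local interfaces. The method names Transcribe and TranscribeStream are hypothetical and do not reflect the actual stt.Provider or stt.StreamingProvider method sets.

```go
package main

import "fmt"

// provider and streamingProvider are local stand-ins; the real interfaces
// have their own method sets.
type provider interface {
	Transcribe(audio []byte) (string, error)
}

type streamingProvider interface {
	provider
	TranscribeStream(chunks <-chan []byte) (<-chan string, error)
}

// batchOnly implements only the base interface.
type batchOnly struct{}

func (batchOnly) Transcribe(audio []byte) (string, error) {
	return fmt.Sprintf("%d bytes", len(audio)), nil
}

// streaming embeds batchOnly and adds the streaming method.
type streaming struct{ batchOnly }

func (streaming) TranscribeStream(chunks <-chan []byte) (<-chan string, error) {
	out := make(chan string)
	go func() {
		defer close(out)
		for c := range chunks {
			out <- fmt.Sprintf("%d bytes", len(c))
		}
	}()
	return out, nil
}

// supportsStreaming reports whether p also implements streamingProvider.
func supportsStreaming(p provider) bool {
	_, ok := p.(streamingProvider)
	return ok
}

func main() {
	for _, p := range []provider{batchOnly{}, streaming{}} {
		fmt.Printf("%T streaming=%v\n", p, supportsStreaming(p))
	}
}
```

When the assertion fails, a caller might fall back to batch transcription or return an error such as ErrStreamingNotSupported.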

type Segment

type Segment = stt.Segment

Segment represents a transcription segment.

type StreamEvent

type StreamEvent = stt.StreamEvent

StreamEvent represents a streaming transcription event.

type SubtitleFormat

type SubtitleFormat = subtitle.Format

SubtitleFormat represents the output format for subtitles.

type SubtitleOptions

type SubtitleOptions = subtitle.Options

SubtitleOptions configures subtitle generation.

type SynthesisConfig

type SynthesisConfig = tts.SynthesisConfig

SynthesisConfig configures a TTS synthesis request.

type SynthesisResult

type SynthesisResult = tts.SynthesisResult

SynthesisResult contains the result of a TTS synthesis.

type TTSClient

type TTSClient = tts.Client

TTSClient is the multi-provider TTS client.

type TTSProvider

type TTSProvider = tts.Provider

TTSProvider defines the interface for TTS providers.

type TTSProviderFactory

type TTSProviderFactory func(config ProviderConfig) (tts.Provider, error)

TTSProviderFactory creates a TTS provider with the given configuration.

type TTSStreamChunk

type TTSStreamChunk = tts.StreamChunk

TTSStreamChunk represents a chunk of streaming audio.

type TTSStreamingProvider

type TTSStreamingProvider = tts.StreamingProvider

TTSStreamingProvider extends Provider with input streaming support.

type TranscriptionConfig

type TranscriptionConfig = stt.TranscriptionConfig

TranscriptionConfig configures an STT transcription request.

type TranscriptionResult

type TranscriptionResult = stt.TranscriptionResult

TranscriptionResult contains the result of an STT transcription.

type Voice

type Voice = tts.Voice

Voice represents a voice configuration for TTS.

type Word

type Word = stt.Word

Word represents a word with timing information.

Directories

Path            Synopsis
providers/all   Package all imports and registers all omnivoice providers.