package realtime

Version: v0.14.0 (latest). Published: Jun 15, 2026. License: MIT. Imports: 4. Imported by: 0.

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/plexusone/omnivoice-core

Documentation

Overview

Package realtime provides a unified interface for real-time voice-to-voice providers.

Real-time providers enable native voice-to-voice conversations with roughly 100-300ms latency. They consume and produce audio directly instead of chaining separate speech-to-text (STT) and text-to-speech (TTS) steps.

Supported Providers

The following providers implement the Provider interface:

- OpenAI Realtime API (github.com/plexusone/omni-openai/omnivoice/realtime)
- Gemini Live API (github.com/plexusone/omni-google/omnivoice)

Audio Format

Input audio should be PCM16 (signed 16-bit little-endian) at the provider's expected sample rate:

- OpenAI Realtime: 24kHz mono
- Gemini Live: 16kHz mono (input), 24kHz mono (output)

Output audio is PCM16 24kHz mono for both providers.

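Usage

// apiKey, ctx, microphoneAudio, handleFunction and playAudio are
// assumed to be defined elsewhere by the caller.
provider := openairealtime.NewProvider(apiKey,
    openairealtime.WithVoice("alloy"),
    openairealtime.WithInstructions("You are a helpful assistant."),
)
defer provider.Close()

audioIn := make(chan []byte, 100)
audioCh, transcriptCh, err := provider.ProcessAudioStream(ctx, audioIn, realtime.ProcessConfig{
    OnFunctionCall: func(id, name, args string) (any, error) {
        return handleFunction(name, args)
    },
})
if err != nil {
    log.Printf("start realtime session: %v", err)
    return
}

// Send audio from the microphone. Closing audioIn ends the session.
go func() {
    defer close(audioIn)
    for chunk := range microphoneAudio {
        select {
        case audioIn <- chunk:
        case <-ctx.Done():
            return
        }
    }
}()

// Receive audio and transcripts until both channels are closed.
// A nil channel never becomes ready, so a closed channel drops out of the select.
for audioCh != nil || transcriptCh != nil {
    select {
    case audio, ok := <-audioCh:
        if !ok {
            audioCh = nil
            continue
        }
        playAudio(audio.Audio)
    case transcript, ok := <-transcriptCh:
        if !ok {
            transcriptCh = nil
            continue
        }
        log.Printf("[%s] %s", transcript.Role(), transcript.Text)
    }
}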

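Integration with Telephony

Real-time providers can be connected to telephony gateways (Twilio, Telnyx, Plivo) by bridging the gateway's call audio to the provider. Telephony audio is typically 8kHz G.711 mu-law, so each direction must be transcoded and resampled. The rates below match the OpenAI Realtime provider (24kHz); Gemini Live expects 16kHz input:

// gateway, codec, resample, provider, ctx and config are placeholders for the
// caller's telephony SDK, audio helpers and session setup; they are not
// defined by this package.
gateway.OnCall(func(session gateway.Session) {
    audioIn := make(chan []byte, 100)

    audioCh, transcriptCh, err := provider.ProcessAudioStream(ctx, audioIn, config)
    if err != nil {
        log.Printf("start realtime session: %v", err)
        return
    }

    // Forward gateway audio to the provider.
    go func() {
        defer close(audioIn)
        for chunk := range session.AudioIn() {
            // Convert mu-law 8kHz to PCM16 24kHz.
            pcm := codec.MulawToPCM16(chunk)
            audioIn <- resample(pcm, 8000, 24000)
        }
    }()

    // Drain transcripts in case the provider blocks on an unread channel.
    go func() {
        for range transcriptCh {
        }
    }()

    // Forward provider audio to the gateway.
    for audio := range audioCh {
        // Convert PCM16 24kHz to mu-law 8kHz.
        resampled := resample(audio.Audio, 24000, 8000)
        session.SendAudio(codec.PCM16ToMulaw(resampled))
    }
})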
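The example above relies on codec.MulawToPCM16, codec.PCM16ToMulaw and resample, which this package does not define. A minimal sketch of the inbound direction only, assuming standard G.711 mu-law input: it decodes mu-law bytes to linear samples, upsamples 8kHz to 24kHz by linear interpolation, and packs the result as PCM16 little-endian. The function names are hypothetical, and linear interpolation is a simple stand-in for a proper resampler.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// mulawDecode converts one G.711 mu-law byte to a linear 16-bit sample.
func mulawDecode(u byte) int16 {
	u = ^u // mu-law bytes are stored bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F
	// Rebuild the magnitude, then remove the encoder's bias of 0x84 (132).
	magnitude := ((int(mantissa) << 3) + 0x84) << exponent
	magnitude -= 0x84
	if sign != 0 {
		return int16(-magnitude)
	}
	return int16(magnitude)
}

// resampleLinear changes the sample rate by linearly interpolating between
// neighboring input samples. No anti-aliasing filter is applied.
func resampleLinear(in []int16, fromRate, toRate int) []int16 {
	if len(in) == 0 {
		return nil
	}
	out := make([]int16, len(in)*toRate/fromRate)
	for i := range out {
		pos := float64(i) * float64(fromRate) / float64(toRate)
		idx := int(pos)
		if idx >= len(in)-1 {
			out[i] = in[len(in)-1] // hold the last sample at the edge
			continue
		}
		frac := pos - float64(idx)
		a, b := float64(in[idx]), float64(in[idx+1])
		out[i] = int16(math.Round(a + frac*(b-a)))
	}
	return out
}

// mulawToPCM16At24k decodes 8kHz mu-law audio and returns 24kHz PCM16
// little-endian bytes.
func mulawToPCM16At24k(mulaw []byte) []byte {
	samples := make([]int16, len(mulaw))
	for i, b := range mulaw {
		samples[i] = mulawDecode(b)
	}
	up := resampleLinear(samples, 8000, 24000)
	out := make([]byte, 2*len(up))
	for i, s := range up {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	fmt.Println(mulawDecode(0xFF), mulawDecode(0x00))      // 0 -32124
	fmt.Println(len(mulawToPCM16At24k(make([]byte, 160)))) // 960: 20ms at 24kHz
}
```

Unfiltered linear interpolation is tolerable for a rough 8kHz to 24kHz upsample, but the outbound 24kHz to 8kHz path should low-pass filter before decimating to avoid aliasing.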

Index

Variables
type AudioChunk
type Client
- func NewClient(providers ...Provider) *Client
- func (c *Client) Close() error
- func (c *Client) ProcessAudioStream(ctx context.Context, audioIn <-chan []byte, config ProcessConfig) (<-chan AudioChunk, <-chan Transcript, error)
type FunctionDeclaration
type ProcessConfig
type Provider
type Transcript
- func (t Transcript) Role() string

Constants

This section is empty.

Variables

var (
	// ErrSessionClosed is returned when operating on a closed session.
	ErrSessionClosed = errors.New("session closed")

	// ErrConnectionFailed is returned when the WebSocket connection fails.
	ErrConnectionFailed = errors.New("connection failed")

	// ErrAuthenticationFailed is returned when API authentication fails.
	ErrAuthenticationFailed = errors.New("authentication failed")

	// ErrRateLimited is returned when the provider rate limits the request.
	ErrRateLimited = errors.New("rate limited")

	// ErrInvalidConfig is returned when the configuration is invalid.
	ErrInvalidConfig = errors.New("invalid configuration")

	// ErrProviderUnavailable is returned when the provider service is unavailable.
	ErrProviderUnavailable = errors.New("provider unavailable")

	// ErrContextCancelled is returned when the context is cancelled.
	ErrContextCancelled = errors.New("context cancelled")
)

Common errors returned by real-time providers.

Functions

This section is empty.

Types

type AudioChunk

type AudioChunk struct {
	// Audio is the raw audio data.
	// Format is PCM16 (signed 16-bit little-endian) at 24kHz mono.
	Audio []byte

	// IsFinal indicates this is the last chunk for the current turn.
	// Use this to know when the model has finished speaking.
	IsFinal bool
}

AudioChunk represents a chunk of audio data from the model.
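Since IsFinal marks the last chunk of a model turn, a consumer can buffer audio per turn. A minimal sketch, assuming a turn's chunks arrive contiguously on the channel; collectTurn is a hypothetical helper and AudioChunk is redeclared locally so the example is self-contained.

```go
package main

import "fmt"

// AudioChunk mirrors the realtime package's type for this example.
type AudioChunk struct {
	Audio   []byte
	IsFinal bool
}

// collectTurn reads chunks until one is marked IsFinal and returns the
// turn's concatenated PCM16 audio. ok is false if the channel closed
// before a final chunk arrived (for example, the session ended mid-turn).
func collectTurn(audioCh <-chan AudioChunk) (pcm []byte, ok bool) {
	for chunk := range audioCh {
		pcm = append(pcm, chunk.Audio...)
		if chunk.IsFinal {
			return pcm, true
		}
	}
	return pcm, false
}

func main() {
	ch := make(chan AudioChunk, 2)
	ch <- AudioChunk{Audio: make([]byte, 960)}
	ch <- AudioChunk{Audio: make([]byte, 960), IsFinal: true}
	close(ch)

	pcm, ok := collectTurn(ch)
	fmt.Println(len(pcm), ok) // 1920 true
}
```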

type Client

type Client struct {
	*provider.Client[Provider]
}

Client provides a unified interface across multiple real-time providers. It supports provider selection and fallback.

func NewClient

func NewClient(providers ...Provider) *Client

NewClient creates a new real-time client with the specified providers. The first provider is set as the primary.

func (*Client) Close

func (c *Client) Close() error

Close closes all providers.

func (*Client) ProcessAudioStream

func (c *Client) ProcessAudioStream(ctx context.Context, audioIn <-chan []byte, config ProcessConfig) (<-chan AudioChunk, <-chan Transcript, error)

ProcessAudioStream uses the primary provider with fallback on connection errors.

type FunctionDeclaration

type FunctionDeclaration struct {
	// Name is the function name.
	Name string `json:"name"`

	// Description explains what the function does.
	Description string `json:"description"`

	// Parameters is a JSON Schema describing the function parameters.
	// Use json.RawMessage for flexibility across providers.
	Parameters json.RawMessage `json:"parameters,omitempty"`
}

FunctionDeclaration describes a function the model can call.

type ProcessConfig

type ProcessConfig struct {
	// Instructions is the system prompt for the conversation.
	Instructions string

	// Voice is the voice identifier for audio output.
	// Provider-specific (e.g., "alloy", "Puck").
	Voice string

	// Functions are functions the model can call during the conversation.
	Functions []FunctionDeclaration

	// OnFunctionCall is called when the model invokes a function.
	// The handler should execute the function and return the result.
	//
	// Parameters:
	//   - id: unique identifier for this function call
	//   - name: function name being called
	//   - args: JSON-encoded function arguments
	//
	// Returns:
	//   - result: any JSON-serializable value to return to the model
	//   - error: if non-nil, sent as error response to the model
	OnFunctionCall func(id, name, args string) (result any, err error)

	// Temperature controls response randomness (0.0 to 2.0).
	// Default varies by provider.
	Temperature float64

	// Extensions holds provider-specific settings.
	// Keys should be namespaced by provider (e.g., "openai.turn_detection").
	Extensions map[string]any
}

ProcessConfig configures a real-time audio processing session.
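An OnFunctionCall handler usually switches on name, decodes args into a typed struct, and returns a JSON-serializable result. A sketch, assuming the same hypothetical get_weather function with a city argument; the weather lookup is faked, and only the handler's signature is taken from ProcessConfig.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// weatherArgs is the decoded form of the hypothetical get_weather arguments.
type weatherArgs struct {
	City string `json:"city"`
}

// onFunctionCall has the same signature as ProcessConfig.OnFunctionCall.
// A non-nil error is sent to the model as an error response.
func onFunctionCall(id, name, args string) (any, error) {
	switch name {
	case "get_weather":
		var a weatherArgs
		if err := json.Unmarshal([]byte(args), &a); err != nil {
			return nil, fmt.Errorf("call %s: invalid arguments: %w", id, err)
		}
		if a.City == "" {
			return nil, fmt.Errorf("call %s: city is required", id)
		}
		// Placeholder result; a real handler would query a weather service.
		return map[string]any{"city": a.City, "forecast": "sunny"}, nil
	default:
		return nil, fmt.Errorf("call %s: unknown function %q", id, name)
	}
}

func main() {
	result, err := onFunctionCall("call_1", "get_weather", `{"city":"Paris"}`)
	fmt.Println(result, err)
}
```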

type Provider

type Provider interface {
	// ProcessAudioStream starts a real-time voice session.
	//
	// audioIn receives raw audio chunks from the user (microphone, telephony).
	// The audio format depends on the provider (typically PCM16 16-24kHz mono).
	//
	// Returns two channels:
	//   - audioCh: audio chunks from the model (PCM16 24kHz mono)
	//   - transcriptCh: transcript updates (both user input and model output)
	//
	// Both channels are closed when the session ends (context cancelled,
	// audioIn closed, or error).
	ProcessAudioStream(ctx context.Context, audioIn <-chan []byte, config ProcessConfig) (
		audioCh <-chan AudioChunk,
		transcriptCh <-chan Transcript,
		err error,
	)

	// Name returns the provider name (e.g., "openai-realtime", "gemini-live").
	Name() string

	// Close releases any resources held by the provider.
	Close() error
}

Provider defines the interface for real-time voice-to-voice providers.

Real-time providers handle bidirectional audio streaming, enabling native voice conversations with low latency (~100-300ms). Unlike traditional STT+LLM+TTS pipelines, real-time providers process audio directly.

type Transcript

type Transcript struct {
	// Text is the transcript text.
	Text string

	// IsFinal indicates this is a final (non-interim) transcript.
	// Interim transcripts may be revised; final transcripts are stable.
	IsFinal bool

	// IsInput indicates this is user input transcription.
	// If false, this is model output transcription.
	IsInput bool

	// ItemID is a provider-specific identifier for this transcript item.
	// Can be used to correlate with audio chunks.
	ItemID string
}

Transcript represents a transcript update during the conversation.

func (Transcript) Role

func (t Transcript) Role() string

Role returns the role associated with this transcript.
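Because interim transcripts may be revised, a display typically shows the latest interim text per item and commits text only once IsFinal is set. A sketch, assuming each interim update carries the item's full text so far (this page does not say whether providers send full text or deltas) and grouping updates by ItemID. transcriptView is hypothetical, Transcript is a local stand-in, and the "user"/"model" labels are derived from IsInput here rather than from Role, whose return values this page does not list.

```go
package main

import "fmt"

// Transcript mirrors the realtime package's type for this example.
type Transcript struct {
	Text    string
	IsFinal bool
	IsInput bool
	ItemID  string
}

// transcriptView keeps in-progress text per item and a log of final lines.
type transcriptView struct {
	interim map[string]string // ItemID -> latest interim text
	final   []string          // committed lines, in arrival order
}

func newTranscriptView() *transcriptView {
	return &transcriptView{interim: make(map[string]string)}
}

// Apply records an update: interim text replaces earlier interim text for the
// same item; final text is committed and the interim entry is dropped.
func (v *transcriptView) Apply(t Transcript) {
	if !t.IsFinal {
		v.interim[t.ItemID] = t.Text
		return
	}
	delete(v.interim, t.ItemID)
	speaker := "model"
	if t.IsInput {
		speaker = "user"
	}
	v.final = append(v.final, speaker+": "+t.Text)
}

func main() {
	v := newTranscriptView()
	v.Apply(Transcript{Text: "what's the", IsInput: true, ItemID: "a"})
	v.Apply(Transcript{Text: "what's the weather", IsInput: true, ItemID: "a"})
	v.Apply(Transcript{Text: "What's the weather?", IsInput: true, IsFinal: true, ItemID: "a"})
	fmt.Println(v.final, len(v.interim))
}
```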
