realtime

package
v1.2.0 Latest
Warning

This package is not in the latest version of its module.
Published: Jun 6, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ErrAgentClosed = errors.New("realtime: agent already closed")

ErrAgentClosed is returned by PushAudio / Cancel after Close has run.

Functions

func Register

func Register(provider Provider, slugs ...string)

Register installs a Provider under one or more provider slugs. Slugs are normalised to lowercase. Re-registering a slug overwrites the previous entry, which is useful in tests; production code should call Register exactly once per provider, from the init() of its sub-package.

func RegisteredProviders

func RegisteredProviders() []string

RegisteredProviders returns the sorted list of registered slugs (handy for /api/tenant/voice/providers introspection).
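The registration rules above (lowercase slugs, last write wins, sorted listing) can be sketched with a small map-backed registry. This is an illustration only, not the package's source: the `registry` type, `newRegistry`, and the slug "openai_realtime" are hypothetical, and the `Agent` / `Options` types are trimmed stand-ins.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Trimmed stand-ins for the package types (assumption: only the shape matters here).
type Agent interface{ Close() error }
type Options struct{}
type Provider func(cfg map[string]any, opts Options) (Agent, error)

// registry is a hypothetical map-backed store keyed by lowercase slug.
type registry struct {
	mu sync.RWMutex
	m  map[string]Provider
}

func newRegistry() *registry { return &registry{m: map[string]Provider{}} }

// Register stores p under every slug, lowercased; a later call overwrites an earlier one.
func (r *registry) Register(p Provider, slugs ...string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, s := range slugs {
		r.m[strings.ToLower(s)] = p
	}
}

// RegisteredProviders returns the known slugs in sorted order.
func (r *registry) RegisteredProviders() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]string, 0, len(r.m))
	for s := range r.m {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	noop := func(map[string]any, Options) (Agent, error) { return nil, nil }
	r := newRegistry()
	r.Register(noop, "OpenAI_Realtime", "aliyun_omni")
	r.Register(noop, "ALIYUN_OMNI") // same key after lowercasing, so this overwrites
	fmt.Println(r.RegisteredProviders())
}
```

Keeping the map behind a mutex is one way to make registration from several init() functions and later introspection safe; whether the real package does this is not stated in the documentation.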

func ToolsForSession

func ToolsForSession(tools []Tool) []map[string]any

ToolsForSession builds the `tools` array for DashScope session.update.
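As a rough illustration of what such a `tools` array might look like, the sketch below maps each Tool to a flat OpenAI-style function entry. The exact keys DashScope expects are not given in this documentation, so the "type" / "name" / "description" / "parameters" layout, the helper name `toolsForSession`, and the "transfer_call" tool are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Tool mirrors the documented struct.
type Tool struct {
	Name        string
	Description string
	Parameters  json.RawMessage
}

// toolsForSession is a guess at the output shape: one flat "function" entry per Tool.
func toolsForSession(tools []Tool) []map[string]any {
	out := make([]map[string]any, 0, len(tools))
	for _, t := range tools {
		entry := map[string]any{
			"type":        "function",
			"name":        t.Name,
			"description": t.Description,
		}
		if len(t.Parameters) > 0 {
			// json.RawMessage is emitted as-is by encoding/json, so the schema is not re-encoded.
			entry["parameters"] = t.Parameters
		}
		out = append(out, entry)
	}
	return out
}

func main() {
	tools := []Tool{{
		Name:        "transfer_call", // hypothetical tool
		Description: "Transfer the caller to a human agent.",
		Parameters:  json.RawMessage(`{"type":"object","properties":{"queue":{"type":"string"}}}`),
	}}
	b, err := json.Marshal(map[string]any{"tools": toolsForSession(tools)})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```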

Types

type Agent

type Agent interface {
	// Start opens the underlying transport (WebSocket for current vendors)
	// and configures the session. Returns once the session is ready to
	// receive audio (or with an error before EventSessionOpen fires).
	Start(ctx context.Context) error
	// PushAudio appends one PCM16LE chunk at Options.InputSampleRate.
	// Implementations base64-encode internally as required by the wire
	// protocol. Safe to call from a single goroutine.
	PushAudio(pcm []byte) error
	// CommitInputAudio signals end-of-utterance manually. No-op when
	// server VAD is enabled (the default).
	CommitInputAudio() error
	// Cancel asks the model to stop the current reply (barge-in). The
	// EventAssistantTurnEnd will still fire so callers can drain state
	// machines uniformly.
	Cancel() error
	// Close tears the session down. Idempotent.
	Close() error
	// UpdateInstructions patches session instructions mid-call (e.g. transfer
	// confirm counter). No-op or error if the session is not open.
	UpdateInstructions(instructions string) error
}

Agent is the provider-agnostic full-duplex realtime client. Implementations are NOT required to support concurrent PushAudio calls from multiple goroutines; SIP attach uses a single audio-feed goroutine. Cancel and Close are safe to call from any goroutine at any time.
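The calling pattern described above (one goroutine feeding audio, Close arriving from elsewhere) might look like the following sketch. `fakeAgent` is a hypothetical in-memory stand-in that covers only PushAudio and Close, and `errAgentClosed` is a local copy of the documented sentinel; no real provider is involved.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Local copy of the documented sentinel error.
var errAgentClosed = errors.New("realtime: agent already closed")

// fakeAgent is a hypothetical stand-in implementing part of the Agent contract.
type fakeAgent struct {
	mu     sync.Mutex
	closed bool
	bytes  int
}

func (a *fakeAgent) PushAudio(pcm []byte) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.closed {
		return errAgentClosed
	}
	a.bytes += len(pcm)
	return nil
}

// Close is idempotent: repeated calls only set the flag again.
func (a *fakeAgent) Close() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.closed = true
	return nil
}

// feed is the body of the single audio-feed goroutine; it stops on the first error.
func feed(a *fakeAgent, frames <-chan []byte) error {
	for f := range frames {
		if err := a.PushAudio(f); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	a := &fakeAgent{}
	frames := make(chan []byte, 2) // buffered so main never blocks on send
	done := make(chan error, 1)
	go func() { done <- feed(a, frames) }()

	frames <- make([]byte, 640) // one 20 ms frame at 16 kHz PCM16 mono
	_ = a.Close()               // e.g. a hangup handled on another goroutine
	frames <- make([]byte, 640) // sent after Close, so the feed loop ends with an error
	close(frames)

	fmt.Println(errors.Is(<-done, errAgentClosed)) // prints "true"
}
```

The feed loop treats the closed error as its exit signal, which keeps shutdown logic in one place instead of requiring a separate stop channel.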

func NewAgentFromCredential

func NewAgentFromCredential(cfg map[string]any, opts Options) (Agent, error)

NewAgentFromCredential resolves a Provider by `cfg["provider"]` and constructs an Agent. Returns ErrUnknownProvider when the slug is missing or unregistered.

type ErrUnknownProvider

type ErrUnknownProvider struct {
	Provider string
}

ErrUnknownProvider is returned by NewAgentFromCredential when the "provider" field doesn't match any Register'ed slug.

func (*ErrUnknownProvider) Error

func (e *ErrUnknownProvider) Error() string
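Because Error has a pointer receiver, callers would normally match this error with errors.As on a `*ErrUnknownProvider`. The sketch below assumes the constructor returns that pointer type; `newAgent`, the lowercase lookup, the error message text, and the slug "acme_voice" are illustrative and not taken from the package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Trimmed stand-ins for the package types.
type Agent interface{ Close() error }
type Options struct{}
type Provider func(cfg map[string]any, opts Options) (Agent, error)

type ErrUnknownProvider struct{ Provider string }

func (e *ErrUnknownProvider) Error() string {
	return fmt.Sprintf("realtime: unknown provider %q", e.Provider) // message text is an assumption
}

// newAgent sketches the documented flow: read cfg["provider"], resolve it, construct.
func newAgent(reg map[string]Provider, cfg map[string]any, opts Options) (Agent, error) {
	slug, _ := cfg["provider"].(string) // missing or non-string yields ""
	p, ok := reg[strings.ToLower(slug)]
	if !ok {
		return nil, &ErrUnknownProvider{Provider: slug}
	}
	return p(cfg, opts)
}

func main() {
	_, err := newAgent(map[string]Provider{}, map[string]any{"provider": "acme_voice"}, Options{})

	var unknown *ErrUnknownProvider
	if errors.As(err, &unknown) {
		fmt.Println("unregistered slug:", unknown.Provider)
	}
}
```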

type Event

type Event struct {
	Type    EventType
	Text    string
	Final   bool
	AudioPC []byte // PCM16LE mono @ Options.OutputSampleRate
	Err     error
	Fatal   bool
	// Vendor is the provider slug ("aliyun_omni", …) for logging.
	Vendor string
	// Raw carries the unparsed event JSON for diagnostic logs (optional;
	// implementations may leave nil to avoid the allocation).
	Raw []byte
}

Event is the union type passed to Options.OnEvent. Only the fields relevant to the EventType are populated; the rest are zero-value.

type EventType

type EventType string

EventType enumerates the high-level events surfaced to callers. Provider implementations are responsible for translating their wire protocol into this enum so SIP integration code is provider-agnostic.

const (
	// EventSessionOpen fires once after a successful WS handshake +
	// session.update ack. Used by SIP to mark "ready to forward audio".
	EventSessionOpen EventType = "session.open"
	// EventSessionClose fires once when the WS closes for any reason.
	EventSessionClose EventType = "session.close"

	// EventUserTranscript carries the caller's ASR transcript. `Final=true`
	// means the model finished hearing this utterance.
	EventUserTranscript EventType = "user.transcript"
	// EventUserSpeechStarted fires when the server-side VAD detects the
	// caller began speaking. SIP must immediately stop forwarding any
	// in-flight AI PCM to the call leg (barge-in).
	EventUserSpeechStarted EventType = "user.speech.started"
	// EventUserSpeechEnded fires when the server-side VAD detects the
	// caller stopped speaking.
	EventUserSpeechEnded EventType = "user.speech.ended"

	// EventAssistantText carries an AI text fragment. `Final=true` ends
	// the current assistant response. Used by SIP for hangup-phrase /
	// transfer keyword detection when Tools are not configured.
	EventAssistantText EventType = "assistant.text"
	// EventAssistantAudio carries a chunk of AI audio at OutputSampleRate.
	EventAssistantAudio EventType = "assistant.audio"
	// EventAssistantTurnEnd fires when the model finishes one full reply.
	EventAssistantTurnEnd EventType = "assistant.turn.end"

	// EventError surfaces a server-reported error. `Fatal=true` means the
	// session is unrecoverable and SIP should fall back to playback.
	EventError EventType = "error"
)
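A caller's OnEvent callback would typically switch on the event type. The sketch below shows one possible handler that queues assistant audio, drops the queue on barge-in, and flags a fallback on a fatal error. `callLeg` and its fields are hypothetical, and only the subset of Event fields and constants it needs is redeclared.

```go
package main

import "fmt"

type EventType string

// Subset of the documented constants.
const (
	EventUserSpeechStarted EventType = "user.speech.started"
	EventAssistantAudio    EventType = "assistant.audio"
	EventError             EventType = "error"
)

// Event is trimmed to the fields this handler reads.
type Event struct {
	Type    EventType
	AudioPC []byte
	Err     error
	Fatal   bool
}

// callLeg is a hypothetical playback queue towards the SIP call leg.
type callLeg struct {
	queue    [][]byte
	fallback bool
}

// onEvent has the func(Event) shape expected by Options.OnEvent.
func (c *callLeg) onEvent(ev Event) {
	switch ev.Type {
	case EventAssistantAudio:
		c.queue = append(c.queue, ev.AudioPC)
	case EventUserSpeechStarted:
		c.queue = nil // barge-in: discard AI audio that has not been played yet
	case EventError:
		if ev.Fatal {
			c.fallback = true // session is unrecoverable; switch to plain playback
		}
	}
}

func main() {
	c := &callLeg{}
	c.onEvent(Event{Type: EventAssistantAudio, AudioPC: make([]byte, 960)})
	c.onEvent(Event{Type: EventAssistantAudio, AudioPC: make([]byte, 960)})
	fmt.Println("queued chunks:", len(c.queue)) // prints "queued chunks: 2"
	c.onEvent(Event{Type: EventUserSpeechStarted})
	fmt.Println("after barge-in:", len(c.queue)) // prints "after barge-in: 0"
}
```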

type Options

type Options struct {
	// SystemPrompt is sent as the model's `instructions` (Qwen-Omni) or
	// `instructions` field on session.update (GPT-4o realtime).
	SystemPrompt string
	// Voice selects the AI voice. Provider-specific values (Ethan / Cherry
	// for Qwen, alloy / nova for GPT-4o).
	Voice string
	// InputSampleRate is the sample rate of PCM frames passed to PushAudio.
	// Defaults to 16000 if 0.
	InputSampleRate int
	// OutputSampleRate is the rate the provider will emit audio at.
	// Defaults to 24000 if 0.
	OutputSampleRate int
	// OnEvent receives all session events. Required.
	OnEvent func(Event)
	// DisableServerVAD opts out of server-side voice activity detection.
	// When true, caller is responsible for sending response.create /
	// equivalent boundary signals (advanced; not used by SIP attach
	// today). Defaults to false, i.e. server VAD is enabled.
	DisableServerVAD bool
	// Modalities selects which output streams the model emits. Empty =
	// vendor default (audio + text). The realtime SIP path always wants
	// audio, but text-only is useful for dry-run tests.
	Modalities []string
	// Temperature controls sampling randomness on the model side. 0
	// means "use vendor default" (caller didn't set it). Provider
	// implementations clamp to their supported range. Lower values
	// produce more deterministic / consistent replies which is what
	// telephony deployments usually want (script adherence > variety).
	Temperature float64
	// Tools are sent in session.update (Qwen3.5-Omni-Realtime Function Calling).
	Tools []Tool
	// ToolHandler runs when the model requests a tool (response.done batch).
	// Return value is sent back as function_call_output. Nil skips tool execution.
	ToolHandler ToolHandler
}

Options configure a single realtime session. The caller-facing PCM contract is fixed at PCM16LE mono; rates may differ per provider but 16 kHz in / 24 kHz out is the de-facto standard (Qwen-Omni, GPT-4o).
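To make the PCM contract concrete: PCM16LE mono uses 2 bytes per sample, so a 20 ms frame is 640 bytes at 16 kHz and 960 bytes at 24 kHz. The sketch below applies the documented zero-value defaults and computes those sizes; `withDefaults` and `frameBytes` are hypothetical helpers, and the 20 ms frame length is just a common telephony choice, not something the package prescribes.

```go
package main

import "fmt"

// Options is trimmed to the two sample-rate fields.
type Options struct {
	InputSampleRate  int
	OutputSampleRate int
}

// withDefaults fills in the documented defaults for zero-valued rates.
func withDefaults(o Options) Options {
	if o.InputSampleRate == 0 {
		o.InputSampleRate = 16000
	}
	if o.OutputSampleRate == 0 {
		o.OutputSampleRate = 24000
	}
	return o
}

// frameBytes returns the size of one PCM16LE mono frame of the given duration.
func frameBytes(sampleRate, millis int) int {
	const bytesPerSample = 2 // 16-bit samples, one channel
	return sampleRate * millis / 1000 * bytesPerSample
}

func main() {
	o := withDefaults(Options{})
	fmt.Println(frameBytes(o.InputSampleRate, 20), frameBytes(o.OutputSampleRate, 20)) // prints "640 960"
}
```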

type Provider

type Provider func(cfg map[string]any, opts Options) (Agent, error)

Provider is a credential-driven factory. cfg is the parsed JSON the tenant control plane stored (must include "provider"). Each provider's expected fields are documented next to its registration call.

func Lookup

func Lookup(slug string) Provider

Lookup returns the Provider registered under slug, or nil.

type Tool

type Tool struct {
	Name        string
	Description string
	Parameters  json.RawMessage
}

Tool is an OpenAI-style function tool definition for session.update (Qwen-Omni-Realtime).

type ToolHandler

type ToolHandler func(name string, args map[string]any) string

ToolHandler executes a tool on the WS read path. Must return quickly; heavy work (dial transfer, DB) should be delegated to another goroutine by the caller.
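One way to keep a handler fast is to have it only enqueue the request and let a worker goroutine do the slow part. The sketch below assumes this pattern; `job`, `newQueuedHandler`, the JSON status strings, and the "transfer_call" tool are hypothetical, and the package does not specify what the returned string should contain.

```go
package main

import "fmt"

type ToolHandler func(name string, args map[string]any) string

// job is a hypothetical unit of deferred work.
type job struct {
	name string
	args map[string]any
}

// newQueuedHandler returns a handler that enqueues and returns immediately.
func newQueuedHandler(jobs chan<- job) ToolHandler {
	return func(name string, args map[string]any) string {
		select {
		case jobs <- job{name: name, args: args}:
			return `{"status":"accepted"}`
		default:
			// Queue full: answer right away rather than block the WS read path.
			return `{"status":"busy"}`
		}
	}
}

func main() {
	jobs := make(chan job, 8)
	done := make(chan struct{})
	go func() { // worker goroutine: dial transfer, DB writes, and other slow work
		defer close(done)
		for j := range jobs {
			fmt.Println("running", j.name, j.args["queue"])
		}
	}()

	h := newQueuedHandler(jobs)
	fmt.Println(h("transfer_call", map[string]any{"queue": "billing"}))

	close(jobs)
	<-done // wait for the worker to drain the queue
}
```

The non-blocking send trades completeness for latency: under load a tool call is rejected instead of stalling audio and event delivery on the same read loop.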
