Documentation
¶
Overview ¶
Package voiceagent implements the Voice Agent WebSocket surface on the Server-Target. It provides three things:
- A Session Manager that tracks active Voice Agent sessions, enforces per-identity and global concurrency limits, and mints HMAC-signed one-time tickets so browser clients can upgrade a WebSocket without sending Authorization headers.
- A wire protocol (see protocol.go) that carries control frames as JSON and audio frames as binary.
- An adapter that bridges the WebSocket to the Framework kernel's internal/voiceagent.Session without the kernel needing to know anything about HTTP.
This file covers the Session Manager and ticket machinery. The wire protocol, WebSocket handler, and adapter live in sibling files.
Index ¶
- Constants
- Variables
- func RenderHostInstructionUpdate(cfg LiveConfigFrame) string
- type ActivityDetectionFrame
- type Adapter
- type AdvanceStepFrame
- type CascadedAgent
- type CascadedConfig
- type CascadedDeps
- type CascadedProvider
- func (p *CascadedProvider) Close() error
- func (p *CascadedProvider) Connect(ctx context.Context, cfg LiveConfigFrame) error
- func (p *CascadedProvider) Name() string
- func (p *CascadedProvider) Receive(ctx context.Context) (*LiveMessage, error)
- func (p *CascadedProvider) SendAudio(chunk []byte) error
- func (p *CascadedProvider) SendAudioStreamEnd() error
- func (p *CascadedProvider) SendText(text string) error
- func (p *CascadedProvider) UpdateInstructions(ctx context.Context, cfg LiveConfigFrame) error
- type CascadedSTT
- type CascadedTTS
- type ErrorFrame
- type Handler
- type HandlerOptions
- type Identity
- type InterruptedFrame
- type LiveConfigFrame
- type LiveInstructionUpdater
- type LiveKitJoinInfo
- type LiveKitMediaBridgeFactory
- type LiveKitTokenIssuer
- type LiveKitTransportSupport
- type LiveMessage
- type LiveProviderAdapter
- type LiveToolResponder
- type ManagedSession
- type MediaBridge
- type MediaBridgeFactory
- type MediaBridgeRequest
- type Options
- type PersonaResolver
- type PongFrame
- type ProviderFactory
- type SequenceRunner
- func (r *SequenceRunner) Active() bool
- func (r *SequenceRunner) Advance(ctx context.Context, frame AdvanceStepFrame) (SequenceTransition, error)
- func (r *SequenceRunner) Current() LiveConfigFrame
- func (r *SequenceRunner) InitialEnteredFrame() *SequenceStepFrame
- func (r *SequenceRunner) RecordUserTurn() bool
- type SequenceStepFrame
- type SequenceTransition
- type SessionEndFrame
- type SessionManager
- func (m *SessionManager) Attach(id string) error
- func (m *SessionManager) Create(owner Identity) (*ManagedSession, string, error)
- func (m *SessionManager) Get(id string) (*ManagedSession, error)
- func (m *SessionManager) List(userID string) []*ManagedSession
- func (m *SessionManager) Remove(id string)
- func (m *SessionManager) Stats(userID string) SessionStats
- func (m *SessionManager) VerifyTicket(sessionID, ticket string) error
- type SessionStats
- type StartFrame
- type State
- type StateFrame
- type StepResolver
- type TextFrame
- type ToolCall
- type ToolCallFrame
- type ToolResponseFrame
- type TranscriptFrame
Constants ¶
const ( MsgStart = "start" MsgAudioEnd = "audio_end" MsgText = "text" MsgToolResponse = "tool_response" MsgPing = "ping" MsgStop = "stop" MsgAdvanceStep = "advance_step" )
Client-to-server message types.
const ( MediaTransportWebSocket = "websocket" MediaTransportLiveKit = "livekit" )
const ( MsgState = "state" MsgInputTranscript = "input_transcript" MsgOutputTranscript = "output_transcript" MsgToolCall = "tool_call" MsgSequenceStep = "sequence_step" MsgInterrupted = "interrupted" MsgError = "error" MsgSessionEnd = "session_end" MsgPong = "pong" )
Server-to-client message types.
Variables ¶
var ( ErrSessionNotFound = errors.New("voiceagent: session not found") ErrSessionAlreadyActive = errors.New("voiceagent: session already has an active WS connection") ErrSessionExpired = errors.New("voiceagent: session ticket expired") ErrInvalidTicket = errors.New("voiceagent: ticket signature or payload invalid") ErrGlobalLimitExceeded = errors.New("voiceagent: global session limit exceeded") ErrIdentityLimitExceeded = errors.New("voiceagent: per-identity session limit exceeded") )
Common errors reported by the session manager.
var NewAgentFlowAdapter = cascaded.NewAgentFlowAdapter
NewAgentFlowAdapter re-exports the cross-platform helper used by internal/server/core/voiceagent_wiring.go.
Functions ¶
func RenderHostInstructionUpdate ¶ added in v0.28.2
func RenderHostInstructionUpdate(cfg LiveConfigFrame) string
Types ¶
type ActivityDetectionFrame ¶
type ActivityDetectionFrame struct {
Automatic bool `json:"automatic"`
StartSensitivity string `json:"start_sensitivity,omitempty"`
EndSensitivity string `json:"end_sensitivity,omitempty"`
PrefixPaddingMs int32 `json:"prefix_padding_ms,omitempty"`
SilenceDurationMs int32 `json:"silence_duration_ms,omitempty"`
ActivityHandling string `json:"activity_handling,omitempty"`
TurnCoverage string `json:"turn_coverage,omitempty"`
}
ActivityDetectionFrame maps to voiceagent.ActivityDetectionPolicy.
type Adapter ¶
type Adapter struct {
Session *ManagedSession
Conn *websocket.Conn
Provider LiveProviderAdapter
Persona PersonaResolver
// MediaBridge starts an optional LiveKit media bridge for sessions that
// keep this WebSocket as control transport and move audio through LiveKit.
MediaBridge MediaBridgeFactory
// IdleTimeout terminates a session whose readPump and writePump have
// both been silent for the duration. Zero disables the server-side
// idle watchdog. Defaults to 15 minutes when set by the WS handler.
IdleTimeout time.Duration
// OnClose runs after both pumps have returned. Typically removes the
// session from the manager.
OnClose func()
// contains filtered or unexported fields
}
Adapter bridges a live WebSocket conversation to the Framework kernel's voiceagent session. One Adapter instance handles exactly one session, which matches the manager's concurrency model.
The adapter owns two long-lived goroutines:
readPump: WebSocket → kernel (control + audio frames from client) writePump: kernel → WebSocket (audio + transcript + tool-call frames)
When either pump errors or the provider reports session_end, the adapter closes the socket, calls OnClose, and returns from Run.
type AdvanceStepFrame ¶ added in v0.28.2
type AdvanceStepFrame struct {
Type string `json:"type"` // "advance_step"
StepID string `json:"step_id,omitempty"`
Reason string `json:"reason,omitempty"`
}
AdvanceStepFrame asks the server-side workflow runner to move from the active sequence step to the next step. StepID is reserved for future direct jumps; v1 advances linearly through the authored sequence.
type CascadedAgent ¶
Re-export the cascaded types under their historical server-package names. These aliases are intentionally type aliases (not new types) so existing code that passes app.STTRouter / app.TTSRouter etc. continues to type-check without conversions.
type CascadedConfig ¶
Re-export the cascaded types under their historical server-package names. These aliases are intentionally type aliases (not new types) so existing code that passes app.STTRouter / app.TTSRouter etc. continues to type-check without conversions.
type CascadedDeps ¶
Re-export the cascaded types under their historical server-package names. These aliases are intentionally type aliases (not new types) so existing code that passes app.STTRouter / app.TTSRouter etc. continues to type-check without conversions.
type CascadedProvider ¶
type CascadedProvider struct {
// contains filtered or unexported fields
}
CascadedProvider wraps cascaded.Provider and adapts its SessionConfig/Message types to the server's LiveConfigFrame/ LiveMessage types so the result satisfies LiveProviderAdapter.
func NewCascadedProvider ¶
func NewCascadedProvider(deps CascadedDeps) *CascadedProvider
NewCascadedProvider preserves the historical constructor name.
func (*CascadedProvider) Connect ¶
func (p *CascadedProvider) Connect(ctx context.Context, cfg LiveConfigFrame) error
Connect translates the rich server LiveConfigFrame into a minimal cascaded.SessionConfig and forwards.
func (*CascadedProvider) Name ¶
func (p *CascadedProvider) Name() string
Name returns the provider identifier.
func (*CascadedProvider) Receive ¶
func (p *CascadedProvider) Receive(ctx context.Context) (*LiveMessage, error)
Receive blocks for the next cascaded.Message and translates it into the server's LiveMessage type.
func (*CascadedProvider) SendAudio ¶
func (p *CascadedProvider) SendAudio(chunk []byte) error
SendAudio delegates.
func (*CascadedProvider) SendAudioStreamEnd ¶
func (p *CascadedProvider) SendAudioStreamEnd() error
SendAudioStreamEnd delegates.
func (*CascadedProvider) SendText ¶
func (p *CascadedProvider) SendText(text string) error
SendText delegates.
func (*CascadedProvider) UpdateInstructions ¶ added in v0.28.2
func (p *CascadedProvider) UpdateInstructions(ctx context.Context, cfg LiveConfigFrame) error
UpdateInstructions translates rich LiveConfigFrame to SessionConfig.
type CascadedSTT ¶
Re-export the cascaded types under their historical server-package names. These aliases are intentionally type aliases (not new types) so existing code that passes app.STTRouter / app.TTSRouter etc. continues to type-check without conversions.
type CascadedTTS ¶
Re-export the cascaded types under their historical server-package names. These aliases are intentionally type aliases (not new types) so existing code that passes app.STTRouter / app.TTSRouter etc. continues to type-check without conversions.
type ErrorFrame ¶
type Handler ¶
type Handler struct {
// contains filtered or unexported fields
}
Handler exposes both the HTTP session-creation endpoint and the WS upgrade endpoint under /v1/voiceagent/*.
func New ¶
func New(opts HandlerOptions) (*Handler, error)
New constructs a handler. All options except MaxAllowedClockSkew are required — the adapter cannot function without a manager, provider, and persona resolver.
func (*Handler) Mount ¶
Mount wires the voiceagent endpoints onto mux:
POST /v1/voiceagent/sessions — create session + mint ticket
GET /v1/voiceagent/sessions — list caller's active sessions
DELETE /v1/voiceagent/sessions/{id} — force close a session
GET /v1/voiceagent/sessions/{id}/ws — upgrade to WebSocket with Sec-WebSocket-Protocol: ticket.<ticket>
type HandlerOptions ¶
type HandlerOptions struct {
Manager *SessionManager
Provider ProviderFactory
Persona PersonaResolver
PublicURL string
AllowedOrigins []string
MaxAllowedClockSkew time.Duration
// IdleTimeout terminates a session that hasn't seen any activity
// (client frame OR provider message) within the duration. Zero
// disables the server-side idle watchdog. Defaults to 15 minutes
// when zero is passed; pass a negative value to disable explicitly.
IdleTimeout time.Duration
Store store.Store
LiveKit *LiveKitTokenIssuer
MediaBridge MediaBridgeFactory
// ReadLimit caps per-frame bytes the upgraded WebSocket will read.
// Zero or negative falls back to defaultWSReadLimitBytes (64 KiB).
ReadLimit int64
}
HandlerOptions configures the WebSocket handler.
type Identity ¶
Identity is the caller's resolved identity from the auth middleware. Stored on each session so closes can verify ownership.
type InterruptedFrame ¶
type InterruptedFrame struct {
Type string `json:"type"` // "interrupted"
}
type LiveConfigFrame ¶
type LiveConfigFrame struct {
PersonaID string
RoleID string
// Sequence metadata is internal runtime state derived from the
// persona/role resolver. It lets the WebSocket adapter report and advance
// workflow steps without coupling to the persona package.
SequenceID string
SequenceCompletion string
SequenceMaxTurns int
StepID string
StepIndex int
StepCount int
StepInstruction string
StepExitCriteria string
StepMaxTurns int
Model string
// FallbackModel is forwarded to providers that support same-provider
// fallback (kernel Gemini Live retries the fallback when the primary
// connect fails). Empty disables the fallback.
FallbackModel string
APIKey string
Voice string
SystemPrompt string
RefinementPrompt string
Locale string
// Raw activity-detection passthrough; adapter translates to the
// kernel's internal types.
Automatic bool
StartSensitivity string
EndSensitivity string
PrefixPaddingMs int32
SilenceDurationMs int32
ActivityHandling string
TurnCoverage string
}
LiveConfigFrame is the subset of configuration the adapter derives from a StartFrame and the persona/role resolver. Kept as a separate type so the test double doesn't need to depend on the kernel's concrete LiveConfig (which embeds Google genai types).
type LiveInstructionUpdater ¶ added in v0.28.2
type LiveInstructionUpdater interface {
UpdateInstructions(ctx context.Context, cfg LiveConfigFrame) error
}
LiveInstructionUpdater is implemented by providers that can update their active host instructions without treating the update as a user turn.
type LiveKitJoinInfo ¶ added in v0.31.0
type LiveKitMediaBridgeFactory ¶ added in v0.31.0
type LiveKitMediaBridgeFactory struct {
Issuer *LiveKitTokenIssuer
ConnectTimeout time.Duration
// contains filtered or unexported fields
}
LiveKitMediaBridgeFactory joins the existing session room as an agent participant. It subscribes to client microphone Opus tracks, decodes them into 16 kHz S16 mono PCM for the realtime provider, and publishes provider PCM output as an Opus track back into the room.
func NewLiveKitMediaBridgeFactory ¶ added in v0.31.0
func NewLiveKitMediaBridgeFactory(issuer *LiveKitTokenIssuer) *LiveKitMediaBridgeFactory
func (*LiveKitMediaBridgeFactory) Start ¶ added in v0.31.0
func (f *LiveKitMediaBridgeFactory) Start(ctx context.Context, req MediaBridgeRequest) (MediaBridge, error)
type LiveKitTokenIssuer ¶ added in v0.31.0
type LiveKitTokenIssuer struct {
URL string
APIKey string
APISecret string
TokenTTL time.Duration
RoomPrefix string
Clock func() time.Time
}
LiveKitTokenIssuer mints short-lived LiveKit join tokens for authenticated SpeechKit voice-agent sessions. It deliberately does not create rooms via the LiveKit admin API: a scoped join token is enough for LiveKit to lazily create self-hosted rooms, while keeping this path independent of Twirp.
func (*LiveKitTokenIssuer) Enabled ¶ added in v0.31.0
func (i *LiveKitTokenIssuer) Enabled() bool
func (*LiveKitTokenIssuer) IssueJoinToken ¶ added in v0.31.0
func (i *LiveKitTokenIssuer) IssueJoinToken(_ context.Context, sessionID string, owner Identity) (LiveKitJoinInfo, error)
type LiveKitTransportSupport ¶ added in v0.31.0
type LiveKitTransportSupport interface {
SupportsLiveKitTransport() bool
}
LiveKitTransportSupport marks providers whose realtime APIs consume and emit raw PCM audio without SpeechKit-side transcoding.
type LiveMessage ¶
type LiveMessage struct {
Audio []byte
OutputTranscript string
OutputTranscriptDone bool
InputTranscript string
InputTranscriptDone bool
ToolCalls []ToolCall
Interrupted bool
GoAway bool
}
LiveMessage is the subset of kernel/internal/voiceagent.LiveMessage the adapter relays to the client. Matching field names keep the translation trivial.
type LiveProviderAdapter ¶
type LiveProviderAdapter interface {
Connect(ctx context.Context, cfg LiveConfigFrame) error
SendAudio(chunk []byte) error
SendAudioStreamEnd() error
SendText(text string) error
Receive(ctx context.Context) (*LiveMessage, error)
Close() error
Name() string
}
LiveProviderAdapter is the minimal slice of the kernel's LiveProvider interface the Server-Target adapter needs. Keeping it narrow makes it trivial for tests to stub without pulling in the full genai SDK.
type LiveToolResponder ¶ added in v0.28.2
type LiveToolResponder interface {
SendToolResponse(ToolResponseFrame) error
}
LiveToolResponder is implemented by providers that accept host-side tool results from the client.
type ManagedSession ¶
type ManagedSession struct {
ID string
Owner Identity
CreatedAt time.Time
State State
HasWSClient bool // true once the client has upgraded
}
ManagedSession is the manager's record for one active or pending session. The WebSocket handler and adapter pull additional fields (conn, pumps) onto this struct at handshake time.
type MediaBridge ¶ added in v0.31.0
MediaBridge moves provider output audio to the selected media transport and delivers inbound transport audio to the provider.
type MediaBridgeFactory ¶ added in v0.31.0
type MediaBridgeFactory interface {
Start(ctx context.Context, req MediaBridgeRequest) (MediaBridge, error)
}
MediaBridgeFactory creates the media transport sidecar for a session. The WebSocket stays the control channel; a bridge only carries PCM audio.
type MediaBridgeRequest ¶ added in v0.31.0
type MediaBridgeRequest struct {
SessionID string
Owner Identity
Provider LiveProviderAdapter
}
MediaBridgeRequest is the minimum session/provider state needed to attach LiveKit media to an already accepted Voice Agent session.
type Options ¶
type Options struct {
// TicketSecret signs the one-time WS upgrade tickets. Must be at least
// 16 bytes; when empty, the manager generates a random secret at
// construction time (acceptable only for single-process deployments).
TicketSecret []byte
// TicketTTL limits how long a minted ticket remains valid. Defaults to
// 90 seconds.
TicketTTL time.Duration
// MaxGlobalSessions caps concurrent sessions across all callers.
// Defaults to 100.
MaxGlobalSessions int
// MaxPerIdentitySessions caps concurrent sessions per caller.
// Defaults to 3.
MaxPerIdentitySessions int
// Clock is a time source; nil means time.Now. Tests can override.
Clock func() time.Time
}
Options configures a SessionManager.
type PersonaResolver ¶
type PersonaResolver interface {
Resolve(StartFrame) (LiveConfigFrame, error)
}
PersonaResolver derives a LiveConfigFrame from a StartFrame. The server's persona registry implements this in M5; for M4 a stub resolver is used that echoes the StartFrame through with sensible defaults.
type ProviderFactory ¶
type ProviderFactory interface {
NewProvider() LiveProviderAdapter
}
ProviderFactory builds a Framework kernel voice-agent provider on demand. Each WebSocket session gets its own provider instance so concurrent sessions don't share Gemini Live state.
The concrete production implementation returns a *voiceagent.GeminiLive; tests supply a fake that records frames and replies with canned audio.
type SequenceRunner ¶ added in v0.28.2
type SequenceRunner struct {
// contains filtered or unexported fields
}
SequenceRunner owns per-session workflow state for one Voice Agent WebSocket. It is small on purpose: persona resolution remains the source of truth for prompts and defaults.
func NewSequenceRunner ¶ added in v0.28.2
func NewSequenceRunner(start StartFrame, initial LiveConfigFrame, resolver StepResolver) *SequenceRunner
func (*SequenceRunner) Active ¶ added in v0.28.2
func (r *SequenceRunner) Active() bool
func (*SequenceRunner) Advance ¶ added in v0.28.2
func (r *SequenceRunner) Advance(ctx context.Context, frame AdvanceStepFrame) (SequenceTransition, error)
func (*SequenceRunner) Current ¶ added in v0.28.2
func (r *SequenceRunner) Current() LiveConfigFrame
func (*SequenceRunner) InitialEnteredFrame ¶ added in v0.28.2
func (r *SequenceRunner) InitialEnteredFrame() *SequenceStepFrame
func (*SequenceRunner) RecordUserTurn ¶ added in v0.28.2
func (r *SequenceRunner) RecordUserTurn() bool
type SequenceStepFrame ¶
type SequenceStepFrame struct {
Type string `json:"type"` // "sequence_step"
SequenceID string `json:"sequence_id,omitempty"`
StepID string `json:"step_id"`
StepIndex int `json:"step_index,omitempty"`
Status string `json:"status"` // "entered" | "completed" | "sequence_completed"
Reason string `json:"reason,omitempty"`
}
type SequenceTransition ¶ added in v0.28.2
type SequenceTransition struct {
Completed *SequenceStepFrame
Entered *SequenceStepFrame
NextConfig LiveConfigFrame
SequenceCompleted bool
}
SequenceTransition describes the frames and provider update produced by an advance operation.
type SessionEndFrame ¶
type SessionManager ¶
type SessionManager struct {
// contains filtered or unexported fields
}
SessionManager tracks active Voice Agent sessions.
func NewSessionManager ¶
func NewSessionManager(opts Options) (*SessionManager, error)
NewSessionManager constructs a manager with opts. Unset fields receive sensible defaults.
func (*SessionManager) Attach ¶
func (m *SessionManager) Attach(id string) error
Attach moves a session from pending to active. Returns an error when the session is already attached to another WS client or has been closed.
func (*SessionManager) Create ¶
func (m *SessionManager) Create(owner Identity) (*ManagedSession, string, error)
Create registers a new pending session for the given caller and returns the session record plus a one-time WS upgrade ticket. Fails with a limit error when concurrency caps are exceeded.
func (*SessionManager) Get ¶
func (m *SessionManager) Get(id string) (*ManagedSession, error)
Get returns the session with the given ID, or ErrSessionNotFound.
func (*SessionManager) List ¶
func (m *SessionManager) List(userID string) []*ManagedSession
List returns a snapshot of sessions owned by the given user (or all when userID is empty — reserved for admin callers).
func (*SessionManager) Remove ¶
func (m *SessionManager) Remove(id string)
Remove closes and removes the session. Safe to call multiple times.
func (*SessionManager) Stats ¶ added in v0.40.6
func (m *SessionManager) Stats(userID string) SessionStats
Stats returns a point-in-time session and capacity snapshot.
func (*SessionManager) VerifyTicket ¶
func (m *SessionManager) VerifyTicket(sessionID, ticket string) error
VerifyTicket validates a ticket string against its expected session ID. Returns nil when the ticket is cryptographically valid, un-expired, and bound to this sessionID. Mutates nothing; callers must follow up with Attach to mark the session active.
type SessionStats ¶ added in v0.40.6
type SessionStats struct {
TotalSessions int `json:"total_sessions"`
ActiveSessions int `json:"active_sessions"`
PendingSessions int `json:"pending_sessions"`
IdentitySessions int `json:"identity_sessions,omitempty"`
MaxGlobalSessions int `json:"max_global_sessions"`
MaxPerIdentitySessions int `json:"max_per_identity_sessions"`
}
SessionStats reports current session occupancy and configured capacity.
type StartFrame ¶
type StartFrame struct {
Type string `json:"type"` // must be "start"
PersonaID string `json:"persona_id,omitempty"`
RoleID string `json:"role_id,omitempty"`
SequenceID string `json:"sequence_id,omitempty"`
// MediaTransport selects where microphone and model audio move. Empty
// defaults to "websocket" for existing clients. "livekit" keeps this
// WebSocket as the control channel and moves audio through LiveKit tracks.
MediaTransport string `json:"media_transport,omitempty"`
Voice string `json:"voice,omitempty"`
Locale string `json:"locale,omitempty"`
Model string `json:"model,omitempty"`
Thinking string `json:"thinking,omitempty"` // "off" | "low" | "medium" | "high"
// Raw activity-detection policy override. Pipeline translates these to
// the kernel's internal enums.
ActivityDetection *ActivityDetectionFrame `json:"activity_detection,omitempty"`
// Optional durable instruction layered on top of the role's prompt.
SystemPromptOverride string `json:"system_prompt_override,omitempty"`
}
StartFrame is the mandatory first client frame. Fields marked optional fall back to persona/role defaults when omitted.
type State ¶
type State string
State captures the session's manager-level lifecycle. The Framework kernel (internal/voiceagent.Session) has its own finer-grained state; this one only distinguishes pending ticket vs. active connection.
type StateFrame ¶
type StepResolver ¶ added in v0.28.2
type StepResolver interface {
ResolveStep(StartFrame, int) (LiveConfigFrame, error)
}
StepResolver is the optional extension implemented by persona resolvers that can compose a LiveConfigFrame for a specific sequence step.
type ToolCallFrame ¶
type ToolResponseFrame ¶
type ToolResponseFrame struct {
Type string `json:"type"` // "tool_response"
ID string `json:"id"`
Name string `json:"name"`
Response map[string]any `json:"response"`
}
ToolResponseFrame resolves a tool call issued by the server.