Documentation ¶
Overview ¶
Package voiceagent implements Voice Agent Mode: a real-time, bidirectional voice conversation with native audio-to-audio models (Gemini Live API, OpenAI Realtime API) over WebSocket.
Index ¶
- type ActivityDetectionPolicy
- type ActivityHandling
- type Callbacks
- type ContextCompressionPolicy
- type EndSensitivity
- type GeminiLive
- func (g *GeminiLive) Close() error
- func (g *GeminiLive) Connect(ctx context.Context, cfg LiveConfig) error
- func (g *GeminiLive) Name() string
- func (g *GeminiLive) Receive(ctx context.Context) (*LiveMessage, error)
- func (g *GeminiLive) Reconnect(ctx context.Context) error
- func (g *GeminiLive) SendAudio(chunk []byte) error
- func (g *GeminiLive) SendText(text string) error
- func (g *GeminiLive) SendToolResponse(response ToolResponse) error
- type IdleConfig
- func DefaultIdleConfig() IdleConfig
- type IdleTimer
- func NewIdleTimer(cfg IdleConfig, session *Session) *IdleTimer
- type LiveConfig
- type LiveMessage
- type LivePolicies
- type LiveProvider
- type LiveReconnector
- type Session
- func NewSession(provider LiveProvider, callbacks Callbacks) *Session
- func (s *Session) CurrentState() State
- func (s *Session) SendAudio(chunk []byte) error
- func (s *Session) SendText(text string) error
- func (s *Session) SendToolResponse(response ToolResponse) error
- func (s *Session) Start(ctx context.Context, cfg LiveConfig, idleCfg IdleConfig) error
- func (s *Session) Stop()
- type StartSensitivity
- type State
- type ThinkingLevel
- type ThinkingPolicy
- type ToolBehavior
- type ToolCall
- type ToolDefinition
- type ToolResponse
- type ToolResponseScheduling
- type TurnCoverage
Types ¶
type ActivityDetectionPolicy ¶ added in v0.18.0
type ActivityDetectionPolicy struct {
Automatic bool
StartSensitivity StartSensitivity
EndSensitivity EndSensitivity
PrefixPaddingMs int32
SilenceDurationMs int32
ActivityHandling ActivityHandling
TurnCoverage TurnCoverage
}
ActivityDetectionPolicy defines server-side VAD/session turn behavior.
type ActivityHandling ¶ added in v0.18.0
type ActivityHandling string
ActivityHandling controls what Gemini Live should do when new activity starts.
const (
ActivityHandlingUnspecified ActivityHandling = ""
ActivityHandlingNoInterrupt ActivityHandling = "no_interrupt"
ActivityHandlingStartOfActivityInterrupts ActivityHandling = "start_of_activity_interrupts"
)
type Callbacks ¶
type Callbacks struct {
OnStateChange func(state State)
OnAudio func(audio []byte) // Audio chunk to play
OnText func(text string) // Text for display (speech bubble)
OnError func(err error)
OnInputTranscript func(text string, done bool) // User speech transcribed
OnOutputTranscript func(text string, done bool) // Model speech transcribed
OnToolCall func(call ToolCall)
OnToolCallCancellation func(ids []string)
OnInterrupted func() // User interrupted model (barge-in)
OnSessionEnd func() // Session ended (error, GoAway failure, or deactivation)
}
Callbacks are event handlers for UI integration.
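Since every field of Callbacks is a plain func value, a caller may leave some of them unset. The documentation does not say whether the package guards against nil handlers, so the following is only a sketch of one nil-safe dispatch pattern. The Callbacks type here is a local stand-in that copies two of the documented fields, and emitTranscript is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Callbacks is a local stand-in mirroring two fields of the documented type.
type Callbacks struct {
	OnInputTranscript func(text string, done bool)
	OnInterrupted     func()
}

// emitTranscript is a hypothetical helper. It invokes the transcript handler
// only when one is set and reports whether the handler was called.
func emitTranscript(cb Callbacks, text string, done bool) bool {
	if cb.OnInputTranscript == nil {
		return false
	}
	cb.OnInputTranscript(text, done)
	return true
}

func main() {
	var final []string
	cb := Callbacks{
		// Keep only finished segments; partial segments are ignored.
		OnInputTranscript: func(text string, done bool) {
			if done {
				final = append(final, text)
			}
		},
	}
	emitTranscript(cb, "hel", false)
	emitTranscript(cb, "hello", true)
	fmt.Println(final)
}
```

A UI that only wants final transcripts can filter on the done flag as shown, while one that renders live captions would display every partial segment.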
type ContextCompressionPolicy ¶ added in v0.18.0
ContextCompressionPolicy defines how the live API should compress long sessions.
type EndSensitivity ¶ added in v0.18.0
type EndSensitivity string
EndSensitivity controls how aggressively automatic activity detection commits speech end.
const (
EndSensitivityLow EndSensitivity = "low"
EndSensitivityMedium EndSensitivity = "medium"
EndSensitivityHigh EndSensitivity = "high"
)
type GeminiLive ¶
type GeminiLive struct {
// contains filtered or unexported fields
}
GeminiLive implements LiveProvider using the Google GenAI Live API.
func (*GeminiLive) Close ¶
func (g *GeminiLive) Close() error
func (*GeminiLive) Connect ¶
func (g *GeminiLive) Connect(ctx context.Context, cfg LiveConfig) error
func (*GeminiLive) Name ¶
func (g *GeminiLive) Name() string
func (*GeminiLive) Receive ¶
func (g *GeminiLive) Receive(ctx context.Context) (*LiveMessage, error)
func (*GeminiLive) Reconnect ¶ added in v0.18.0
func (g *GeminiLive) Reconnect(ctx context.Context) error
Reconnect re-establishes the session using the stored resumption handle. If the handle has expired (TTL) or been cleared, a fresh session is opened.
func (*GeminiLive) SendAudio ¶
func (g *GeminiLive) SendAudio(chunk []byte) error
func (*GeminiLive) SendText ¶
func (g *GeminiLive) SendText(text string) error
func (*GeminiLive) SendToolResponse ¶ added in v0.18.0
func (g *GeminiLive) SendToolResponse(response ToolResponse) error
type IdleConfig ¶
type IdleConfig struct {
ReminderAfter time.Duration // Default: 5 minutes
DeactivateAfter time.Duration // Default: 15 minutes
}
IdleConfig configures the idle timer behavior.
func DefaultIdleConfig ¶
func DefaultIdleConfig() IdleConfig
DefaultIdleConfig returns the defaults noted on IdleConfig: a reminder after 5 minutes and deactivation after 15 minutes.
type IdleTimer ¶
type IdleTimer struct {
// contains filtered or unexported fields
}
IdleTimer manages reminder and auto-deactivation for Voice Agent.
func NewIdleTimer ¶
func NewIdleTimer(cfg IdleConfig, session *Session) *IdleTimer
NewIdleTimer creates an idle timer bound to a session.
type LiveConfig ¶
type LiveConfig struct {
Model string // e.g. "gemini-2.5-flash-native-audio-preview-12-2025"
APIKey string
Voice string // Voice name
FrameworkPrompt string
RefinementPrompt string
Instruction string // Deprecated: kept as compat alias for FrameworkPrompt.
SystemPrompt string // Deprecated: kept as compat alias, prefer FrameworkPrompt.
VocabularyHint string
Locale string
Policies LivePolicies
Tools []ToolDefinition
}
LiveConfig configures a real-time session.
type LiveMessage ¶
type LiveMessage struct {
Audio []byte // PCM audio chunk (24kHz 16-bit mono)
Text string // Text transcript (may be partial or empty)
Done bool // True when the model's turn is complete
// Transcription fields (populated when transcription is enabled).
InputTranscript string // User speech transcribed by server
InputTranscriptDone bool // True when input transcription segment is final
OutputTranscript string // Model speech transcribed by server
OutputTranscriptDone bool // True when output transcription segment is final
ToolCalls []ToolCall
ToolCallCancellationIDs []string
Interrupted bool // True when user interrupted model (barge-in)
GoAway bool // True when server signals imminent session end
}
LiveMessage is a message received from the real-time model.
type LivePolicies ¶ added in v0.18.0
type LivePolicies struct {
EnableInputAudioTranscription bool
EnableOutputAudioTranscription bool
EnableAffectiveDialog bool
Thinking ThinkingPolicy
ContextCompression ContextCompressionPolicy
ActivityDetection ActivityDetectionPolicy
}
LivePolicies configures Google Live API features that shape Voice Agent behavior.
type LiveProvider ¶
type LiveProvider interface {
// Connect establishes a WebSocket session to the real-time model.
Connect(ctx context.Context, cfg LiveConfig) error
// SendAudio streams PCM audio chunks to the model.
// Format: 16-bit signed int, little-endian, mono, 16kHz.
SendAudio(chunk []byte) error
// Receive blocks until the next server message arrives.
// Returns audio chunks and/or text from the model.
Receive(ctx context.Context) (*LiveMessage, error)
// SendText injects a text prompt into the session (for idle reminders).
SendText(text string) error
// SendToolResponse sends the result of a host-side tool invocation back to the model.
SendToolResponse(response ToolResponse) error
// Close terminates the WebSocket session.
Close() error
// Name returns the provider identifier.
Name() string
}
LiveProvider abstracts a real-time audio-to-audio model connection.
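SendAudio expects 16-bit signed, little-endian, mono PCM at 16kHz. Capture libraries often deliver samples as []int16, so a caller needs to serialize them first. The following is a minimal sketch of that conversion using encoding/binary; encodePCM16LE is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodePCM16LE is a hypothetical helper that serializes signed 16-bit
// samples into the little-endian byte layout described for SendAudio.
func encodePCM16LE(samples []int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	chunk := encodePCM16LE([]int16{0, 1, -1, 256})
	fmt.Printf("% x\n", chunk)
}
```

At 16kHz, a 20ms frame is 320 samples, which this helper turns into a 640-byte chunk.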
type LiveReconnector ¶ added in v0.18.0
LiveReconnector is an optional interface for providers that support session reconnection.
type Session ¶
type Session struct {
// contains filtered or unexported fields
}
Session manages a Voice Agent conversation.
func NewSession ¶
func NewSession(provider LiveProvider, callbacks Callbacks) *Session
NewSession creates a Voice Agent session with the given provider.
func (*Session) CurrentState ¶
func (s *Session) CurrentState() State
CurrentState returns the current session state.
func (*Session) SendText ¶ added in v0.19.0
func (s *Session) SendText(text string) error
SendText injects a user text turn into the live session.
func (*Session) SendToolResponse ¶ added in v0.18.0
func (s *Session) SendToolResponse(response ToolResponse) error
SendToolResponse forwards the result of a host-side tool invocation to the model.
func (*Session) Start ¶
func (s *Session) Start(ctx context.Context, cfg LiveConfig, idleCfg IdleConfig) error
Start activates the Voice Agent session.
type StartSensitivity ¶ added in v0.18.0
type StartSensitivity string
StartSensitivity controls how aggressively automatic activity detection commits speech start.
const (
StartSensitivityLow StartSensitivity = "low"
StartSensitivityMedium StartSensitivity = "medium"
StartSensitivityHigh StartSensitivity = "high"
)
type ThinkingLevel ¶ added in v0.18.0
type ThinkingLevel string
ThinkingLevel controls how much deliberate reasoning Gemini Live should spend.
const (
ThinkingLevelOff ThinkingLevel = "off"
ThinkingLevelLow ThinkingLevel = "low"
ThinkingLevelMedium ThinkingLevel = "medium"
ThinkingLevelHigh ThinkingLevel = "high"
)
type ThinkingPolicy ¶ added in v0.18.0
type ThinkingPolicy struct {
Enabled bool
IncludeThoughts bool
ThinkingBudget int32
ThinkingLevel ThinkingLevel
}
ThinkingPolicy defines optional Gemini Live thinking behavior.
type ToolBehavior ¶ added in v0.18.0
type ToolBehavior string
ToolBehavior controls whether the model waits for a tool result.
const (
ToolBehaviorUnspecified ToolBehavior = ""
ToolBehaviorBlocking ToolBehavior = "blocking"
ToolBehaviorNonBlocking ToolBehavior = "non_blocking"
)
type ToolCall ¶ added in v0.18.0
ToolCall is a host-side action request emitted by the Voice Agent runtime.
type ToolDefinition ¶ added in v0.18.0
type ToolDefinition struct {
Name string
Description string
ParametersJSONSchema map[string]any
ResponseJSONSchema map[string]any
Behavior ToolBehavior
}
ToolDefinition exposes a host-side action the Voice Agent may call.
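A ToolDefinition pairs a name and description with JSON Schema maps for its parameters and response. The sketch below shows what a definition might look like. The types are local stand-ins copied from the documented declarations, and the set_timer tool with its schema is an invented example, not something the package ships.

```go
package main

import "fmt"

// ToolBehavior and ToolDefinition are local stand-ins mirroring the
// documented declarations.
type ToolBehavior string

const ToolBehaviorNonBlocking ToolBehavior = "non_blocking"

type ToolDefinition struct {
	Name                 string
	Description          string
	ParametersJSONSchema map[string]any
	ResponseJSONSchema   map[string]any
	Behavior             ToolBehavior
}

// setTimerTool builds a hypothetical tool definition with one required
// integer parameter.
func setTimerTool() ToolDefinition {
	return ToolDefinition{
		Name:        "set_timer",
		Description: "Start a countdown timer on the host device.",
		ParametersJSONSchema: map[string]any{
			"type": "object",
			"properties": map[string]any{
				"seconds": map[string]any{"type": "integer", "minimum": 1},
			},
			"required": []string{"seconds"},
		},
		Behavior: ToolBehaviorNonBlocking,
	}
}

func main() {
	def := setTimerTool()
	fmt.Println(def.Name, def.Behavior)
}
```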
type ToolResponse ¶ added in v0.18.0
type ToolResponse struct {
ID string
Name string
Response map[string]any
Scheduling ToolResponseScheduling
WillContinue *bool
}
ToolResponse resolves a previously emitted tool call.
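To resolve a tool call, the host has to run the matching action and return a ToolResponse whose ID and Name identify the call. The fields of ToolCall are not shown in this documentation, so the sketch below uses a hypothetical toolCall type with ID, Name, and Args. ToolResponse is a trimmed stand-in, and dispatch and handler are illustrative names. Reporting failures under an "error" key is an assumption, not a documented convention.

```go
package main

import "fmt"

// ToolResponse is a local stand-in with a subset of the documented fields.
type ToolResponse struct {
	ID       string
	Name     string
	Response map[string]any
}

// toolCall is a hypothetical shape; the real ToolCall fields are not documented here.
type toolCall struct {
	ID   string
	Name string
	Args map[string]any
}

// handler runs one host-side action.
type handler func(args map[string]any) (map[string]any, error)

// dispatch looks up the handler by tool name and always returns a response
// carrying the call's ID and Name, even when the tool is unknown or fails.
func dispatch(handlers map[string]handler, call toolCall) ToolResponse {
	resp := ToolResponse{ID: call.ID, Name: call.Name}
	h, ok := handlers[call.Name]
	if !ok {
		resp.Response = map[string]any{"error": "unknown tool: " + call.Name}
		return resp
	}
	out, err := h(call.Args)
	if err != nil {
		resp.Response = map[string]any{"error": err.Error()}
		return resp
	}
	resp.Response = out
	return resp
}

func main() {
	handlers := map[string]handler{
		"echo": func(args map[string]any) (map[string]any, error) {
			return map[string]any{"echo": args["text"]}, nil
		},
	}
	resp := dispatch(handlers, toolCall{ID: "call-1", Name: "echo", Args: map[string]any{"text": "hi"}})
	fmt.Println(resp.ID, resp.Response)
}
```

The resulting value is what a caller would pass to Session.SendToolResponse.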
type ToolResponseScheduling ¶ added in v0.18.0
type ToolResponseScheduling string
ToolResponseScheduling controls how a non-blocking tool result is reintroduced into the conversation.
const (
ToolResponseSchedulingUnspecified ToolResponseScheduling = ""
ToolResponseSchedulingSilent ToolResponseScheduling = "silent"
ToolResponseSchedulingWhenIdle ToolResponseScheduling = "when_idle"
ToolResponseSchedulingInterrupt ToolResponseScheduling = "interrupt"
)
type TurnCoverage ¶ added in v0.18.0
type TurnCoverage string
TurnCoverage controls how the live API builds a user turn from incoming activity.
const (
TurnCoverageUnspecified TurnCoverage = ""
TurnCoverageTurnIncludesOnlyActivity TurnCoverage = "turn_includes_only_activity"
TurnCoverageTurnIncludesAllInput TurnCoverage = "turn_includes_all_input"
TurnCoverageTurnIncludesAudioActivity TurnCoverage = "turn_includes_audio_activity"
)