voiceagent

Version: v0.22.4
Published: Apr 21, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Package voiceagent implements Voice Agent Mode: a real-time, bidirectional voice conversation that uses native audio-to-audio models (Gemini Live API, OpenAI Realtime API) over WebSocket.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ActivityDetectionPolicy added in v0.18.0

type ActivityDetectionPolicy struct {
	Automatic         bool
	StartSensitivity  StartSensitivity
	EndSensitivity    EndSensitivity
	PrefixPaddingMs   int32
	SilenceDurationMs int32
	ActivityHandling  ActivityHandling
	TurnCoverage      TurnCoverage
}

ActivityDetectionPolicy defines server-side voice activity detection (VAD) and turn-taking behavior for a session.
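
As a rough illustration of how the fields fit together, the sketch below fills in a policy for a session where the user may interrupt the model. The types are re-declared locally as stand-ins because the module's import path is not shown on this page. The helper name bargeInPolicy and the numeric values (300 ms padding, 800 ms silence) are assumptions for illustration, not recommended defaults.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented types. In real code these would
// be imported from the voiceagent package instead.
type (
	StartSensitivity string
	EndSensitivity   string
	ActivityHandling string
	TurnCoverage     string
)

const (
	StartSensitivityHigh                      StartSensitivity = "high"
	EndSensitivityLow                         EndSensitivity   = "low"
	ActivityHandlingStartOfActivityInterrupts ActivityHandling = "start_of_activity_interrupts"
	TurnCoverageTurnIncludesOnlyActivity      TurnCoverage     = "turn_includes_only_activity"
)

type ActivityDetectionPolicy struct {
	Automatic         bool
	StartSensitivity  StartSensitivity
	EndSensitivity    EndSensitivity
	PrefixPaddingMs   int32
	SilenceDurationMs int32
	ActivityHandling  ActivityHandling
	TurnCoverage      TurnCoverage
}

// bargeInPolicy is a hypothetical helper. All values are illustrative only.
func bargeInPolicy() ActivityDetectionPolicy {
	return ActivityDetectionPolicy{
		Automatic:         true, // let the server detect speech start and end
		StartSensitivity:  StartSensitivityHigh,
		EndSensitivity:    EndSensitivityLow,
		PrefixPaddingMs:   300,
		SilenceDurationMs: 800,
		ActivityHandling:  ActivityHandlingStartOfActivityInterrupts,
		TurnCoverage:      TurnCoverageTurnIncludesOnlyActivity,
	}
}

func main() {
	fmt.Printf("%+v\n", bargeInPolicy())
}
```

A policy like this would be placed in LivePolicies.ActivityDetection before the session is started.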

type ActivityHandling added in v0.18.0

type ActivityHandling string

ActivityHandling controls what Gemini Live should do when new activity starts.

const (
	ActivityHandlingUnspecified               ActivityHandling = ""
	ActivityHandlingNoInterrupt               ActivityHandling = "no_interrupt"
	ActivityHandlingStartOfActivityInterrupts ActivityHandling = "start_of_activity_interrupts"
)

type Callbacks

type Callbacks struct {
	OnStateChange          func(state State)
	OnAudio                func(audio []byte) // Audio chunk to play
	OnText                 func(text string)  // Text for display (speech bubble)
	OnError                func(err error)
	OnInputTranscript      func(text string, done bool) // User speech transcribed
	OnOutputTranscript     func(text string, done bool) // Model speech transcribed
	OnToolCall             func(call ToolCall)
	OnToolCallCancellation func(ids []string)
	OnInterrupted          func() // User interrupted model (barge-in)
	OnSessionEnd           func() // Session ended (error, GoAway failure, or deactivation)
}

Callbacks are event handlers for UI integration.
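
A minimal sketch of wiring two of these handlers follows. It uses a trimmed local stand-in for Callbacks and a hypothetical transcriptCollector helper. It assumes each OnInputTranscript call delivers a new fragment rather than the cumulative text, which the documentation above does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// Callbacks is a trimmed local stand-in for the documented struct.
type Callbacks struct {
	OnInputTranscript func(text string, done bool)
	OnInterrupted     func()
}

// transcriptCollector is a hypothetical helper that joins transcript
// fragments until a call arrives with done == true.
type transcriptCollector struct {
	buf   strings.Builder
	Final []string // completed transcript segments
}

func (c *transcriptCollector) add(text string, done bool) {
	c.buf.WriteString(text)
	if done {
		c.Final = append(c.Final, c.buf.String())
		c.buf.Reset()
	}
}

// newCallbacks wires the collector and a barge-in handler into Callbacks.
func newCallbacks(c *transcriptCollector) Callbacks {
	return Callbacks{
		OnInputTranscript: c.add,
		OnInterrupted:     func() { fmt.Println("barge-in: stop local playback") },
	}
}

func main() {
	c := &transcriptCollector{}
	cb := newCallbacks(c)
	cb.OnInputTranscript("turn on ", false)
	cb.OnInputTranscript("the lights", true)
	fmt.Println(c.Final)
}
```

The collector is not synchronized. If the session invokes callbacks from more than one goroutine, a mutex around add would be needed.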

type ContextCompressionPolicy added in v0.18.0

type ContextCompressionPolicy struct {
	Enabled       bool
	TriggerTokens int64
	TargetTokens  int64
}

ContextCompressionPolicy defines how the live API should compress long sessions.

type EndSensitivity added in v0.18.0

type EndSensitivity string

EndSensitivity controls how aggressively automatic activity detection commits speech end.

const (
	EndSensitivityLow    EndSensitivity = "low"
	EndSensitivityMedium EndSensitivity = "medium"
	EndSensitivityHigh   EndSensitivity = "high"
)

type GeminiLive

type GeminiLive struct {
	// contains filtered or unexported fields
}

GeminiLive implements LiveProvider using the Google GenAI Live API.

func NewGeminiLive

func NewGeminiLive() *GeminiLive

NewGeminiLive creates a Gemini Live provider.

func (*GeminiLive) Close

func (g *GeminiLive) Close() error

func (*GeminiLive) Connect

func (g *GeminiLive) Connect(ctx context.Context, cfg LiveConfig) error

func (*GeminiLive) Name

func (g *GeminiLive) Name() string

func (*GeminiLive) Receive

func (g *GeminiLive) Receive(ctx context.Context) (*LiveMessage, error)

func (*GeminiLive) Reconnect added in v0.18.0

func (g *GeminiLive) Reconnect(ctx context.Context) error

Reconnect re-establishes the session using the stored resumption handle. If the handle has expired (TTL) or been cleared, a fresh session is opened.

func (*GeminiLive) SendAudio

func (g *GeminiLive) SendAudio(chunk []byte) error

func (*GeminiLive) SendAudioStreamEnd added in v0.22.4

func (g *GeminiLive) SendAudioStreamEnd() error

func (*GeminiLive) SendText

func (g *GeminiLive) SendText(text string) error

func (*GeminiLive) SendToolResponse added in v0.18.0

func (g *GeminiLive) SendToolResponse(response ToolResponse) error

type IdleConfig

type IdleConfig struct {
	ReminderAfter   time.Duration // Default: 5 minutes
	DeactivateAfter time.Duration // Default: 15 minutes
}

IdleConfig configures the idle timer behavior.

func DefaultIdleConfig

func DefaultIdleConfig() IdleConfig

DefaultIdleConfig returns the defaults noted on IdleConfig: a reminder after 5 minutes and deactivation after 15 minutes.
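
The sketch below shows one way a caller might override a single field while keeping the documented defaults (5 and 15 minutes) for the rest. IdleConfig is re-declared locally as a stand-in, and withDefaults is a hypothetical helper. Whether the real package treats zero values as "use the default" is an assumption not confirmed here.

```go
package main

import (
	"fmt"
	"time"
)

// IdleConfig is a local stand-in for the documented struct.
type IdleConfig struct {
	ReminderAfter   time.Duration
	DeactivateAfter time.Duration
}

// withDefaults is a hypothetical helper that fills unset (non-positive)
// fields with the defaults stated in the IdleConfig field comments.
func withDefaults(cfg IdleConfig) IdleConfig {
	if cfg.ReminderAfter <= 0 {
		cfg.ReminderAfter = 5 * time.Minute
	}
	if cfg.DeactivateAfter <= 0 {
		cfg.DeactivateAfter = 15 * time.Minute
	}
	return cfg
}

func main() {
	cfg := withDefaults(IdleConfig{ReminderAfter: time.Minute})
	fmt.Println(cfg.ReminderAfter, cfg.DeactivateAfter)
}
```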

type IdleTimer

type IdleTimer struct {
	// contains filtered or unexported fields
}

IdleTimer manages reminder and auto-deactivation for Voice Agent.

func NewIdleTimer

func NewIdleTimer(cfg IdleConfig, session *Session) *IdleTimer

NewIdleTimer creates an idle timer bound to a session.

func (*IdleTimer) Reset

func (t *IdleTimer) Reset()

Reset restarts the idle countdown. Call after each user interaction.

func (*IdleTimer) Stop

func (t *IdleTimer) Stop()

Stop cancels all timers.

type LiveConfig

type LiveConfig struct {
	Model            string // e.g. "gemini-2.5-flash-native-audio-preview-12-2025"
	APIKey           string
	Voice            string // Voice name
	FrameworkPrompt  string
	RefinementPrompt string
	Instruction      string // Deprecated: kept as compat alias for FrameworkPrompt.
	SystemPrompt     string // Deprecated: kept as compat alias, prefer FrameworkPrompt.
	VocabularyHint   string
	Locale           string
	Policies         LivePolicies
	Tools            []ToolDefinition
}

LiveConfig configures a real-time session.

type LiveMessage

type LiveMessage struct {
	Audio []byte // PCM audio chunk (24kHz 16-bit mono)
	Text  string // Text transcript (may be partial or empty)
	Done  bool   // True when the model's turn is complete

	// Transcription fields (populated when transcription is enabled).
	InputTranscript      string // User speech transcribed by server
	InputTranscriptDone  bool   // True when input transcription segment is final
	OutputTranscript     string // Model speech transcribed by server
	OutputTranscriptDone bool   // True when output transcription segment is final

	ToolCalls               []ToolCall
	ToolCallCancellationIDs []string
	Interrupted             bool // True when user interrupted model (barge-in)
	GoAway                  bool // True when server signals imminent session end
}

LiveMessage is a message received from the real-time model.
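
Because the Audio field is documented as 24kHz 16-bit mono PCM, the playback length of a chunk follows directly from its byte count. The sketch below shows that arithmetic. The function name pcmDuration is hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"time"
)

const (
	outputSampleRate = 24000 // samples per second, per the Audio field comment
	bytesPerSample   = 2     // 16-bit mono
)

// pcmDuration returns the playback length of a 16-bit mono PCM chunk
// at the documented 24kHz output rate.
func pcmDuration(chunk []byte) time.Duration {
	samples := len(chunk) / bytesPerSample
	return time.Duration(samples) * time.Second / outputSampleRate
}

func main() {
	chunk := make([]byte, 9600) // 4800 samples
	fmt.Println(pcmDuration(chunk)) // prints "200ms"
}
```

A player could use this to size its buffer or to estimate how much audio remains queued when an interruption arrives.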

type LivePolicies added in v0.18.0

type LivePolicies struct {
	EnableInputAudioTranscription  bool
	EnableOutputAudioTranscription bool
	EnableAffectiveDialog          bool
	Thinking                       ThinkingPolicy
	ContextCompression             ContextCompressionPolicy
	ActivityDetection              ActivityDetectionPolicy
}

LivePolicies configures Google Live API features that shape Voice Agent behavior.

type LiveProvider

type LiveProvider interface {
	// Connect establishes a WebSocket session to the real-time model.
	Connect(ctx context.Context, cfg LiveConfig) error

	// SendAudio streams PCM audio chunks to the model.
	// Format: 16-bit signed int, little-endian, mono, 16kHz.
	SendAudio(chunk []byte) error

	// SendAudioStreamEnd signals that microphone input for the current turn ended.
	SendAudioStreamEnd() error

	// Receive blocks until the next server message arrives.
	// Returns audio chunks and/or text from the model.
	Receive(ctx context.Context) (*LiveMessage, error)

	// SendText injects a text prompt into the session (for idle reminders).
	SendText(text string) error

	// SendToolResponse sends the result of a host-side tool invocation back to the model.
	SendToolResponse(response ToolResponse) error

	// Close terminates the WebSocket session.
	Close() error

	// Name returns the provider identifier.
	Name() string
}

LiveProvider abstracts a real-time audio-to-audio model connection.
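
SendAudio expects 16-bit signed, little-endian, mono PCM at 16kHz. The sketch below shows one way to pack captured samples into that byte layout and to compute the byte size of a fixed-length frame. Both helper names are hypothetical, and the 20 ms frame length is only an example value.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodePCM16LE packs signed 16-bit samples into little-endian bytes,
// the layout documented for SendAudio.
func encodePCM16LE(samples []int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

// frameBytes returns the byte size of a mono 16-bit frame of the given
// length in milliseconds at the documented 16kHz input rate.
func frameBytes(ms int) int {
	const sampleRate = 16000
	return sampleRate * ms / 1000 * 2
}

func main() {
	fmt.Println(encodePCM16LE([]int16{1, -2})) // prints "[1 0 254 255]"
	fmt.Println(frameBytes(20))                // prints "640"
}
```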

type LiveReconnector added in v0.18.0

type LiveReconnector interface {
	Reconnect(ctx context.Context) error
}

LiveReconnector is an optional interface for providers that support session reconnection.
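
Since LiveReconnector is optional, a caller would typically discover it with a type assertion on the provider. The sketch below illustrates that pattern with trimmed local stand-ins for the interfaces and two fake providers. tryReconnect, errNoReconnect, runDemo, and the fake types are all hypothetical names introduced for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Trimmed local stand-ins for the documented interfaces.
type LiveProvider interface {
	Name() string
	Close() error
}

type LiveReconnector interface {
	Reconnect(ctx context.Context) error
}

var errNoReconnect = errors.New("provider does not support reconnect")

// tryReconnect reconnects only when the provider implements the optional
// LiveReconnector interface.
func tryReconnect(ctx context.Context, p LiveProvider) error {
	r, ok := p.(LiveReconnector)
	if !ok {
		return errNoReconnect
	}
	return r.Reconnect(ctx)
}

// plainProvider is a fake provider without reconnect support.
type plainProvider struct{}

func (plainProvider) Name() string { return "plain" }
func (plainProvider) Close() error { return nil }

// resumableProvider is a fake provider that counts reconnect attempts.
type resumableProvider struct {
	plainProvider
	reconnects int
}

func (r *resumableProvider) Reconnect(ctx context.Context) error {
	r.reconnects++
	return nil
}

// runDemo exercises both fakes and reports what happened.
func runDemo() (plainErr, resumableErr error, reconnects int) {
	ctx := context.Background()
	plainErr = tryReconnect(ctx, plainProvider{})
	r := &resumableProvider{}
	resumableErr = tryReconnect(ctx, r)
	return plainErr, resumableErr, r.reconnects
}

func main() {
	fmt.Println(runDemo())
}
```

A receive loop might call such a helper when a LiveMessage arrives with GoAway set, then fall back to ending the session if the provider cannot reconnect.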

type Session

type Session struct {
	// contains filtered or unexported fields
}

Session manages a Voice Agent conversation.

func NewSession

func NewSession(provider LiveProvider, callbacks Callbacks) *Session

NewSession creates a Voice Agent session with the given provider.

func (*Session) CurrentState

func (s *Session) CurrentState() State

CurrentState returns the current session state.

func (*Session) EndAudioStream added in v0.22.4

func (s *Session) EndAudioStream() error

EndAudioStream tells the live provider that the current microphone stream ended.

func (*Session) SendAudio

func (s *Session) SendAudio(chunk []byte) error

SendAudio forwards a PCM audio chunk to the real-time model.

func (*Session) SendText added in v0.19.0

func (s *Session) SendText(text string) error

SendText injects a user text turn into the live session.

func (*Session) SendToolResponse added in v0.18.0

func (s *Session) SendToolResponse(response ToolResponse) error

SendToolResponse forwards the result of a host-side tool invocation to the model.

func (*Session) Start

func (s *Session) Start(ctx context.Context, cfg LiveConfig, idleCfg IdleConfig) error

Start activates the Voice Agent session.

func (*Session) Stop

func (s *Session) Stop()

Stop deactivates the Voice Agent session.

type StartSensitivity added in v0.18.0

type StartSensitivity string

StartSensitivity controls how aggressively automatic activity detection commits speech start.

const (
	StartSensitivityLow    StartSensitivity = "low"
	StartSensitivityMedium StartSensitivity = "medium"
	StartSensitivityHigh   StartSensitivity = "high"
)

type State

type State string

State represents the current state of the Voice Agent session.

const (
	StateInactive     State = "inactive"
	StateConnecting   State = "connecting"
	StateListening    State = "listening"
	StateProcessing   State = "processing"
	StateSpeaking     State = "speaking"
	StateDeactivating State = "deactivating"
)

type ThinkingLevel added in v0.18.0

type ThinkingLevel string

ThinkingLevel controls how much deliberate reasoning Gemini Live should spend.

const (
	ThinkingLevelOff    ThinkingLevel = "off"
	ThinkingLevelLow    ThinkingLevel = "low"
	ThinkingLevelMedium ThinkingLevel = "medium"
	ThinkingLevelHigh   ThinkingLevel = "high"
)

type ThinkingPolicy added in v0.18.0

type ThinkingPolicy struct {
	Enabled         bool
	IncludeThoughts bool
	ThinkingBudget  int32
	ThinkingLevel   ThinkingLevel
}

ThinkingPolicy defines optional Gemini Live thinking behavior.

type ToolBehavior added in v0.18.0

type ToolBehavior string

ToolBehavior controls whether the model waits for a tool result.

const (
	ToolBehaviorUnspecified ToolBehavior = ""
	ToolBehaviorBlocking    ToolBehavior = "blocking"
	ToolBehaviorNonBlocking ToolBehavior = "non_blocking"
)

type ToolCall added in v0.18.0

type ToolCall struct {
	ID   string
	Name string
	Args map[string]any
}

ToolCall is a host-side action request emitted by the Voice Agent runtime.

type ToolDefinition added in v0.18.0

type ToolDefinition struct {
	Name                 string
	Description          string
	ParametersJSONSchema map[string]any
	ResponseJSONSchema   map[string]any
	Behavior             ToolBehavior
}

ToolDefinition exposes a host-side action the Voice Agent may call.
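
To make the schema fields concrete, the sketch below declares a tool whose parameters are described with a JSON Schema object. The types are local stand-ins, and the tool itself (set_light, with room and on parameters) is invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins mirroring the documented types.
type ToolBehavior string

const ToolBehaviorBlocking ToolBehavior = "blocking"

type ToolDefinition struct {
	Name                 string
	Description          string
	ParametersJSONSchema map[string]any
	ResponseJSONSchema   map[string]any
	Behavior             ToolBehavior
}

// setLightTool returns a hypothetical tool definition for illustration.
func setLightTool() ToolDefinition {
	return ToolDefinition{
		Name:        "set_light",
		Description: "Turn a named light on or off.",
		ParametersJSONSchema: map[string]any{
			"type": "object",
			"properties": map[string]any{
				"room": map[string]any{"type": "string"},
				"on":   map[string]any{"type": "boolean"},
			},
			"required": []string{"room", "on"},
		},
		Behavior: ToolBehaviorBlocking,
	}
}

func main() {
	schema, err := json.MarshalIndent(setLightTool().ParametersJSONSchema, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(schema))
}
```

Definitions like this would be listed in LiveConfig.Tools so the model can request them through OnToolCall.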

type ToolResponse added in v0.18.0

type ToolResponse struct {
	ID       string
	Name     string
	Response map[string]any

	Scheduling   ToolResponseScheduling
	WillContinue *bool
}

ToolResponse resolves a previously emitted tool call.
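
A minimal sketch of answering a tool call follows. It assumes the response should echo the call's ID and Name so the model can match it to the request, and that a WillContinue value of false marks the result as final. Neither detail is spelled out above. The types are local stand-ins and respond is a hypothetical helper.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented types.
type ToolResponseScheduling string

const ToolResponseSchedulingWhenIdle ToolResponseScheduling = "when_idle"

type ToolCall struct {
	ID   string
	Name string
	Args map[string]any
}

type ToolResponse struct {
	ID       string
	Name     string
	Response map[string]any

	Scheduling   ToolResponseScheduling
	WillContinue *bool
}

// respond builds a final response for the given call (hypothetical helper).
func respond(call ToolCall, result map[string]any) ToolResponse {
	final := false // assumption: false means no further responses will follow
	return ToolResponse{
		ID:           call.ID,
		Name:         call.Name,
		Response:     result,
		Scheduling:   ToolResponseSchedulingWhenIdle,
		WillContinue: &final,
	}
}

func main() {
	call := ToolCall{ID: "call-1", Name: "set_light", Args: map[string]any{"on": true}}
	resp := respond(call, map[string]any{"ok": true})
	fmt.Println(resp.ID, resp.Name, resp.Response, *resp.WillContinue)
}
```

The resulting value would be passed to Session.SendToolResponse from within, or after, the OnToolCall handler.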

type ToolResponseScheduling added in v0.18.0

type ToolResponseScheduling string

ToolResponseScheduling controls how a non-blocking tool result is reintroduced into the conversation.

const (
	ToolResponseSchedulingUnspecified ToolResponseScheduling = ""
	ToolResponseSchedulingSilent      ToolResponseScheduling = "silent"
	ToolResponseSchedulingWhenIdle    ToolResponseScheduling = "when_idle"
	ToolResponseSchedulingInterrupt   ToolResponseScheduling = "interrupt"
)

type TurnCoverage added in v0.18.0

type TurnCoverage string

TurnCoverage controls how the live API builds a user turn from incoming activity.

const (
	TurnCoverageUnspecified               TurnCoverage = ""
	TurnCoverageTurnIncludesOnlyActivity  TurnCoverage = "turn_includes_only_activity"
	TurnCoverageTurnIncludesAllInput      TurnCoverage = "turn_includes_all_input"
	TurnCoverageTurnIncludesAudioActivity TurnCoverage = "turn_includes_audio_activity"
)
