voiceagent

package
v0.40.7 Latest
Warning: This package is not in the latest version of its module.
Published: May 28, 2026 License: Apache-2.0 Imports: 16 Imported by: 0

Documentation

Overview

Package voiceagent is the Voice Agent kernel: a realtime audio-to-audio session manager backed by Gemini Live, with Persona/Role/Sequence resolution from internal/voicebehavior. Both the local desktop provider (cmd/speechkit) and the server adapter (internal/server/voiceagent) depend on this package; the kernel itself stays free of OS and HTTP plumbing.

Audit 2026-05-24 maintainability sweep.

Constants

const (
	StateInactive     = live.StateInactive
	StateConnecting   = live.StateConnecting
	StateListening    = live.StateListening
	StateProcessing   = live.StateProcessing
	StateSpeaking     = live.StateSpeaking
	StateRecovering   = live.StateRecovering
	StateDeactivating = live.StateDeactivating

	ThinkingLevelOff    = live.ThinkingLevelOff
	ThinkingLevelLow    = live.ThinkingLevelLow
	ThinkingLevelMedium = live.ThinkingLevelMedium
	ThinkingLevelHigh   = live.ThinkingLevelHigh

	StartSensitivityLow    = live.StartSensitivityLow
	StartSensitivityMedium = live.StartSensitivityMedium
	StartSensitivityHigh   = live.StartSensitivityHigh

	EndSensitivityLow    = live.EndSensitivityLow
	EndSensitivityMedium = live.EndSensitivityMedium
	EndSensitivityHigh   = live.EndSensitivityHigh

	ActivityHandlingUnspecified               = live.ActivityHandlingUnspecified
	ActivityHandlingNoInterrupt               = live.ActivityHandlingNoInterrupt
	ActivityHandlingStartOfActivityInterrupts = live.ActivityHandlingStartOfActivityInterrupts

	TurnCoverageUnspecified               = live.TurnCoverageUnspecified
	TurnCoverageTurnIncludesOnlyActivity  = live.TurnCoverageTurnIncludesOnlyActivity
	TurnCoverageTurnIncludesAllInput      = live.TurnCoverageTurnIncludesAllInput
	TurnCoverageTurnIncludesAudioActivity = live.TurnCoverageTurnIncludesAudioActivity

	ToolBehaviorUnspecified = live.ToolBehaviorUnspecified
	ToolBehaviorBlocking    = live.ToolBehaviorBlocking
	ToolBehaviorNonBlocking = live.ToolBehaviorNonBlocking

	ToolResponseSchedulingUnspecified = live.ToolResponseSchedulingUnspecified
	ToolResponseSchedulingSilent      = live.ToolResponseSchedulingSilent
	ToolResponseSchedulingWhenIdle    = live.ToolResponseSchedulingWhenIdle
	ToolResponseSchedulingInterrupt   = live.ToolResponseSchedulingInterrupt
)
const DefaultOpenAIRealtimeModel = defaultOpenAIRealtimeModel

DefaultOpenAIRealtimeModel is the public runtime default for OpenAI-backed Voice Agent sessions.

Variables

This section is empty.

Functions

func RenderHostInstructionUpdate added in v0.28.2

func RenderHostInstructionUpdate(cfg LiveConfig) string

RenderHostInstructionUpdate is a thin wrapper around live.RenderHostInstructionUpdate.

Types

type ActivityDetectionPolicy added in v0.18.0

type ActivityDetectionPolicy = live.ActivityDetectionPolicy

type ActivityHandling added in v0.18.0

type ActivityHandling = live.ActivityHandling

type Callbacks

type Callbacks = live.Callbacks

type ContextCompressionPolicy added in v0.18.0

type ContextCompressionPolicy = live.ContextCompressionPolicy

type EndSensitivity added in v0.18.0

type EndSensitivity = live.EndSensitivity

type GeminiLive

type GeminiLive struct {
	// contains filtered or unexported fields
}

GeminiLive implements LiveProvider using the Google GenAI Live API.

func NewGeminiLive

func NewGeminiLive() *GeminiLive

NewGeminiLive creates a Gemini Live provider.

func (*GeminiLive) Close

func (g *GeminiLive) Close() error

func (*GeminiLive) Connect

func (g *GeminiLive) Connect(ctx context.Context, cfg LiveConfig) error

func (*GeminiLive) Name

func (g *GeminiLive) Name() string

func (*GeminiLive) Receive

func (g *GeminiLive) Receive(ctx context.Context) (*LiveMessage, error)

func (*GeminiLive) Reconnect added in v0.18.0

func (g *GeminiLive) Reconnect(ctx context.Context) error

Reconnect re-establishes the session using the stored resumption handle. If the handle has expired (TTL) or been cleared, a fresh session is opened.

func (*GeminiLive) SendAudio

func (g *GeminiLive) SendAudio(chunk []byte) error

func (*GeminiLive) SendAudioStreamEnd added in v0.22.4

func (g *GeminiLive) SendAudioStreamEnd() error

func (*GeminiLive) SendText

func (g *GeminiLive) SendText(text string) error

func (*GeminiLive) SendToolResponse added in v0.18.0

func (g *GeminiLive) SendToolResponse(response ToolResponse) error

type IdleConfig

type IdleConfig = live.IdleConfig

func DefaultIdleConfig

func DefaultIdleConfig() IdleConfig

DefaultIdleConfig is a thin wrapper around live.DefaultIdleConfig.

type IdleTimer

type IdleTimer = live.IdleTimer

func NewIdleTimer

func NewIdleTimer(cfg IdleConfig, session *Session) *IdleTimer

NewIdleTimer is a thin wrapper around live.NewIdleTimer.

type LiveConfig

type LiveConfig = live.LiveConfig

type LiveInstructionUpdater added in v0.28.2

type LiveInstructionUpdater = live.LiveInstructionUpdater

type LiveMessage

type LiveMessage = live.LiveMessage

type LivePolicies added in v0.18.0

type LivePolicies = live.LivePolicies

type LiveProvider

type LiveProvider = live.LiveProvider

type LiveReconnector added in v0.18.0

type LiveReconnector = live.LiveReconnector

type LocalVoiceAgentDeps added in v0.34.9

type LocalVoiceAgentDeps struct {
	STT    cascaded.STT
	Agent  cascaded.Agent
	TTS    cascaded.TTS
	Config cascaded.Config
}

LocalVoiceAgentDeps bundles the kernel-level routers and flows that the Device-Target wires into the local Voice Agent provider.

Why a separate name from cascaded.Deps: a Device-Target caller cares about "I'm wiring the LOCAL provider", not about the underlying cascaded turn-based architecture. The latter is an implementation detail of the local path.

type LocalVoiceAgentProvider added in v0.34.9

type LocalVoiceAgentProvider struct {
	// contains filtered or unexported fields
}

LocalVoiceAgentProvider is the Device-Target-side local Voice Agent provider. It wraps a cascaded.Provider and adapts its minimal SessionConfig / Message types to the public live.LiveConfig / live.LiveMessage interface so a Wails session can drive it the same way it drives Gemini Live.

func NewLocalVoiceAgentProvider added in v0.34.9

func NewLocalVoiceAgentProvider(deps LocalVoiceAgentDeps) *LocalVoiceAgentProvider

NewLocalVoiceAgentProvider constructs the local provider. Panics if STT or Agent is nil — these are hard requirements; nothing in the Device-Target should ever construct this without them.
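
A constructor that panics on missing hard requirements might look like the sketch below. The STT and Agent interfaces, their methods, and the tryNew helper are hypothetical stand-ins; the real cascaded.STT and cascaded.Agent shapes are not shown in this documentation. The helper illustrates how a caller that builds dependencies from untrusted configuration could turn the panic into an error.

```go
package main

import "fmt"

// STT and Agent are hypothetical stand-ins for cascaded.STT and cascaded.Agent.
type STT interface {
	Transcribe(pcm []byte) (string, error)
}

type Agent interface {
	Reply(prompt string) (string, error)
}

type deps struct {
	STT   STT
	Agent Agent
}

type provider struct{ deps deps }

// newProvider panics when a hard requirement is missing.
func newProvider(d deps) *provider {
	if d.STT == nil || d.Agent == nil {
		panic("local voice agent: STT and Agent are required")
	}
	return &provider{deps: d}
}

// tryNew converts the constructor panic into an ordinary error.
func tryNew(d deps) (p *provider, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("construct provider: %v", r)
		}
	}()
	return newProvider(d), nil
}

func main() {
	_, err := tryNew(deps{})
	fmt.Println(err)
}
```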

func (*LocalVoiceAgentProvider) Close added in v0.34.9

func (p *LocalVoiceAgentProvider) Close() error

Close delegates to the wrapped cascaded.Provider.

func (*LocalVoiceAgentProvider) Connect added in v0.34.9

Connect translates the rich live.LiveConfig into cascaded.SessionConfig.

func (*LocalVoiceAgentProvider) Name added in v0.34.9

func (p *LocalVoiceAgentProvider) Name() string

Name overrides cascaded.Provider.Name() so observability tells the Device-Target-embedded path apart from the Server-Target wiring.
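
One common way to override a single method while delegating the rest is struct embedding. The sketch below assumes that approach; whether LocalVoiceAgentProvider embeds or holds its cascaded.Provider as a field is not stated here, and the type names and returned strings are invented for illustration.

```go
package main

import "fmt"

// cascadedProvider is a hypothetical stand-in for cascaded.Provider.
type cascadedProvider struct{}

func (cascadedProvider) Name() string          { return "cascaded" }
func (cascadedProvider) SendText(string) error { return nil }

// localProvider embeds the inner provider and shadows only Name.
type localProvider struct{ cascadedProvider }

func (localProvider) Name() string { return "local-voice-agent" }

func main() {
	p := localProvider{}
	fmt.Println(p.Name())                  // the outer method wins
	fmt.Println(p.cascadedProvider.Name()) // the inner name is still reachable
	fmt.Println(p.SendText("hi"))          // promoted from the embedded type
}
```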

func (*LocalVoiceAgentProvider) Receive added in v0.34.9

Receive blocks for the next cascaded.Message and adapts it into a live.LiveMessage.

func (*LocalVoiceAgentProvider) SendAudio added in v0.34.9

func (p *LocalVoiceAgentProvider) SendAudio(chunk []byte) error

SendAudio delegates to the wrapped cascaded.Provider.

func (*LocalVoiceAgentProvider) SendAudioStreamEnd added in v0.34.9

func (p *LocalVoiceAgentProvider) SendAudioStreamEnd() error

SendAudioStreamEnd delegates to the wrapped cascaded.Provider.

func (*LocalVoiceAgentProvider) SendText added in v0.34.9

func (p *LocalVoiceAgentProvider) SendText(text string) error

SendText delegates to the wrapped cascaded.Provider.

func (*LocalVoiceAgentProvider) SendToolResponse added in v0.34.9

func (p *LocalVoiceAgentProvider) SendToolResponse(_ live.ToolResponse) error

SendToolResponse is a no-op. The local cascaded path is turn-based and does not surface tool calls upstream — clients that ask for tool calls fall back to the cloud Voice Agent.

func (*LocalVoiceAgentProvider) UpdateInstructions added in v0.34.9

func (p *LocalVoiceAgentProvider) UpdateInstructions(ctx context.Context, cfg live.LiveConfig) error

UpdateInstructions translates rich live.LiveConfig to SessionConfig for mid-session prompt updates. Implementing this also satisfies live.LiveInstructionUpdater.

type OpenAILive added in v0.31.0

type OpenAILive struct {
	// contains filtered or unexported fields
}

OpenAILive implements LiveProvider against the OpenAI Realtime API (WebSocket). It mirrors GeminiLive's surface so callers don't need to know which backend is active.

func NewOpenAILive added in v0.31.0

func NewOpenAILive() *OpenAILive

NewOpenAILive returns a fresh OpenAI Realtime provider.

func (*OpenAILive) Close added in v0.31.0

func (p *OpenAILive) Close() error

Close terminates the WebSocket. Idempotent.
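
An idempotent Close is often built on sync.Once. The sketch below shows that pattern with a hypothetical socket type; it is not taken from the OpenAILive source, and the closes counter exists only to make the behavior observable.

```go
package main

import (
	"fmt"
	"sync"
)

// socket is a hypothetical stand-in for the provider's WebSocket handle.
type socket struct {
	once   sync.Once
	err    error
	closes int // counts real teardowns, for demonstration only
}

// Close tears the connection down at most once. Later calls return
// the result of the first call.
func (s *socket) Close() error {
	s.once.Do(func() {
		s.closes++
		s.err = nil // a real implementation would store the transport's close error
	})
	return s.err
}

func main() {
	s := &socket{}
	first := s.Close()
	second := s.Close()
	fmt.Println(first, second, s.closes)
}
```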

func (*OpenAILive) Connect added in v0.31.0

func (p *OpenAILive) Connect(ctx context.Context, cfg LiveConfig) error

Connect dials the OpenAI Realtime WebSocket, sends the configured instructions/voice/tools as a session.update, and waits for the session.updated acknowledgement before returning.

func (*OpenAILive) Name added in v0.31.0

func (p *OpenAILive) Name() string

Name identifies the provider in Voice Agent logs.

func (*OpenAILive) Receive added in v0.31.0

func (p *OpenAILive) Receive(ctx context.Context) (*LiveMessage, error)

Receive translates the next server event into a LiveMessage. Server events that don't map to LiveMessage fields (session.created/updated, rate-limit telemetry, etc.) are swallowed and the loop fetches the next frame so callers see an aligned event stream.

func (*OpenAILive) SendAudio added in v0.31.0

func (p *OpenAILive) SendAudio(chunk []byte) error

SendAudio resamples a 16 kHz mic chunk to 24 kHz and forwards it as a base64-encoded input_audio_buffer.append event. Empty chunks are no-ops.
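
To make the 16 kHz to 24 kHz step concrete, here is a minimal sketch that upsamples mono 16-bit little-endian PCM by a factor of 3/2 and wraps the result in an input_audio_buffer.append event. Linear interpolation and the PCM layout are assumptions made for the example; the resampler the package actually uses is not documented here, and resample16to24 and appendEvent are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// resample16to24 upsamples mono PCM16 by 3/2 with linear interpolation,
// using integer arithmetic only.
func resample16to24(in []int16) []int16 {
	out := make([]int16, len(in)*3/2)
	for i := range out {
		// Output sample i sits at input position i*2/3.
		j, rem := i*2/3, i*2%3
		a := int(in[j])
		b := a
		if j+1 < len(in) {
			b = int(in[j+1])
		}
		out[i] = int16(a + (b-a)*rem/3)
	}
	return out
}

// appendEvent builds the JSON client event for one microphone chunk.
// It returns nil for chunks too short to hold a sample.
func appendEvent(chunk []byte) ([]byte, error) {
	if len(chunk) < 2 {
		return nil, nil
	}
	samples := make([]int16, len(chunk)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(chunk[2*i:]))
	}
	up := resample16to24(samples)
	pcm := make([]byte, 2*len(up))
	for i, s := range up {
		binary.LittleEndian.PutUint16(pcm[2*i:], uint16(s))
	}
	return json.Marshal(map[string]string{
		"type":  "input_audio_buffer.append",
		"audio": base64.StdEncoding.EncodeToString(pcm),
	})
}

func main() {
	fmt.Println(resample16to24([]int16{0, 300})) // [0 200 300]
	ev, err := appendEvent([]byte{0x00, 0x00, 0x2c, 0x01})
	fmt.Println(string(ev), err)
}
```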

func (*OpenAILive) SendAudioStreamEnd added in v0.31.0

func (p *OpenAILive) SendAudioStreamEnd() error

SendAudioStreamEnd flushes the input audio buffer and triggers a model response. With server VAD enabled OpenAI commits the buffer automatically at end-of-speech; this explicit commit covers push-to-talk style turns where the kernel decides when the user stopped speaking.
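
In OpenAI Realtime terms, an explicit end of turn is usually expressed as an input_audio_buffer.commit event followed by response.create. The sketch below assumes SendAudioStreamEnd sends that pair; the documentation above does not name the events, and endOfTurnEvents is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endOfTurnEvents returns the client events for an explicit end of turn:
// commit the buffered audio, then ask the model to respond.
func endOfTurnEvents() ([][]byte, error) {
	var out [][]byte
	for _, typ := range []string{"input_audio_buffer.commit", "response.create"} {
		b, err := json.Marshal(map[string]string{"type": typ})
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	evs, err := endOfTurnEvents()
	if err != nil {
		panic(err)
	}
	for _, ev := range evs {
		fmt.Println(string(ev))
	}
}
```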

func (*OpenAILive) SendText added in v0.31.0

func (p *OpenAILive) SendText(text string) error

SendText injects a text-only user turn and triggers a response.

func (*OpenAILive) SendToolResponse added in v0.31.0

func (p *OpenAILive) SendToolResponse(response ToolResponse) error

SendToolResponse delivers the host-side tool result back to the model and triggers a follow-up response.

func (*OpenAILive) UpdateInstructions added in v0.31.0

func (p *OpenAILive) UpdateInstructions(ctx context.Context, cfg LiveConfig) error

UpdateInstructions sends a fresh session.update with new instructions/tools. Implements LiveInstructionUpdater so the kernel can refresh persona prompts without forcing a reconnect.

type Session

type Session = live.Session

func NewSession

func NewSession(provider LiveProvider, callbacks Callbacks) *Session

NewSession is a thin wrapper around live.NewSession so existing non-OSS call sites can continue to construct sessions through this package.
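
Because Session is declared as a type alias (type Session = live.Session), values built through the wrapper are the same type as the underlying one and need no conversion. The sketch below demonstrates that Go language rule inside a single file; liveSession and its constructor are hypothetical stand-ins for the live package, and the string parameter replaces the real provider and callbacks arguments.

```go
package main

import "fmt"

// liveSession stands in for live.Session.
type liveSession struct{ provider string }

func newLiveSession(provider string) *liveSession {
	return &liveSession{provider: provider}
}

// Session is an alias, not a new type: both names denote one type.
type Session = liveSession

// NewSession forwards to the underlying constructor.
func NewSession(provider string) *Session {
	return newLiveSession(provider)
}

// describe is written against the underlying type only.
func describe(s *liveSession) string { return "session on " + s.provider }

func main() {
	s := NewSession("gemini")
	fmt.Println(describe(s)) // accepted without any conversion
}
```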

type StartSensitivity added in v0.18.0

type StartSensitivity = live.StartSensitivity

type State

type State = live.State

type ThinkingLevel added in v0.18.0

type ThinkingLevel = live.ThinkingLevel

type ThinkingPolicy added in v0.18.0

type ThinkingPolicy = live.ThinkingPolicy

type ToolBehavior added in v0.18.0

type ToolBehavior = live.ToolBehavior

type ToolCall added in v0.18.0

type ToolCall = live.ToolCall

type ToolDefinition added in v0.18.0

type ToolDefinition = live.ToolDefinition

type ToolResponse added in v0.18.0

type ToolResponse = live.ToolResponse

type ToolResponseScheduling added in v0.18.0

type ToolResponseScheduling = live.ToolResponseScheduling

type TurnCoverage added in v0.18.0

type TurnCoverage = live.TurnCoverage

type WorkflowConfig added in v0.28.2

type WorkflowConfig = live.WorkflowConfig

type WorkflowStep added in v0.28.2

type WorkflowStep = live.WorkflowStep

Directories

Path Synopsis
Package cascaded implements a turn-based STT -> LLM -> TTS voice agent provider.
