Documentation ¶
Overview ¶
Package live exposes the low-level Voice Agent realtime-protocol types.
This is the public-API surface for building custom realtime providers (Gemini Live, OpenAI Realtime, or a third-party WebSocket model) and for libraries that drive a SpeechKit session directly without going through the higher-level [Service] in the parent package.
Most embedders use the parent [voiceagent] package or github.com/kombifyio/SpeechKit/pkg/speechkit/agentkit instead; reach for this package only when you need to plug in your own provider or read the raw LiveMessage stream.
Index ¶
- func RenderHostInstructionUpdate(cfg LiveConfig) string
- type ActivityDetectionPolicy
- type ActivityHandling
- type Callbacks
- type ContextCompressionPolicy
- type EndSensitivity
- type IdleConfig
- type IdleTimer
- type LiveConfig
- type LiveInstructionUpdater
- type LiveMessage
- type LivePolicies
- type LiveProvider
- type LiveReconnector
- type Session
- func (s *Session) AdvanceWorkflowStep(ctx context.Context, reason string) error
- func (s *Session) CurrentState() State
- func (s *Session) EndAudioStream() error
- func (s *Session) ProviderName() string
- func (s *Session) SendAudio(chunk []byte) error
- func (s *Session) SendText(text string) error
- func (s *Session) SendToolResponse(response ToolResponse) error
- func (s *Session) Start(ctx context.Context, cfg LiveConfig, idleCfg IdleConfig) error
- func (s *Session) Stop()
- type StartSensitivity
- type State
- type ThinkingLevel
- type ThinkingPolicy
- type ToolBehavior
- type ToolCall
- type ToolDefinition
- type ToolResponse
- type ToolResponseScheduling
- type TurnCoverage
- type WorkflowConfig
- type WorkflowStep
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func RenderHostInstructionUpdate ¶
func RenderHostInstructionUpdate(cfg LiveConfig) string
RenderHostInstructionUpdate turns a workflow step change into a provider text update. Providers that implement LiveInstructionUpdater receive the structured LiveConfig instead.
Types ¶
type ActivityDetectionPolicy ¶
type ActivityDetectionPolicy struct {
Automatic bool
StartSensitivity StartSensitivity
EndSensitivity EndSensitivity
PrefixPaddingMs int32
SilenceDurationMs int32
ActivityHandling ActivityHandling
TurnCoverage TurnCoverage
}
ActivityDetectionPolicy defines server-side VAD/session turn behavior.
type ActivityHandling ¶
type ActivityHandling string
ActivityHandling controls what Gemini Live should do when new activity starts.
const (
ActivityHandlingUnspecified ActivityHandling = ""
ActivityHandlingNoInterrupt ActivityHandling = "no_interrupt"
ActivityHandlingStartOfActivityInterrupts ActivityHandling = "start_of_activity_interrupts"
)
type Callbacks ¶
type Callbacks struct {
OnStateChange func(state State)
OnAudio func(audio []byte) // Audio chunk to play
OnText func(text string) // Text for display (speech bubble)
OnError func(err error)
OnInputTranscript func(text string, done bool) // User speech transcribed
OnOutputTranscript func(text string, done bool) // Model speech transcribed
OnToolCall func(call ToolCall)
OnToolCallCancellation func(ids []string)
OnInterrupted func() // User interrupted model (barge-in)
OnSessionEnd func() // Session ended (error, GoAway failure, or deactivation)
}
Callbacks are event handlers for UI integration with a low-level Session.
This struct is the full handler set used by the low-level Session runtime. The higher-level [voiceagent.Callbacks] in the parent package carries only the three most common handlers (OnAudio, OnText, OnError) and is the right shape for the embedded [voiceagent.Service]; reach for this struct only when wiring a custom realtime host.
type ContextCompressionPolicy ¶
ContextCompressionPolicy defines how the live API should compress long sessions.
type EndSensitivity ¶
type EndSensitivity string
EndSensitivity controls how aggressively automatic activity detection commits speech end.
const (
EndSensitivityLow EndSensitivity = "low"
EndSensitivityMedium EndSensitivity = "medium"
EndSensitivityHigh EndSensitivity = "high"
)
type IdleConfig ¶
type IdleConfig struct {
ReminderAfter time.Duration // Default: 5 minutes
DeactivateAfter time.Duration // Default: 15 minutes
}
IdleConfig configures the idle timer behavior.
func DefaultIdleConfig ¶
func DefaultIdleConfig() IdleConfig
DefaultIdleConfig returns the default idle configuration: a reminder after 5 minutes and deactivation after 15 minutes.
type IdleTimer ¶
type IdleTimer struct {
// contains filtered or unexported fields
}
IdleTimer manages reminder and auto-deactivation for Voice Agent.
func NewIdleTimer ¶
func NewIdleTimer(cfg IdleConfig, session *Session) *IdleTimer
NewIdleTimer creates an idle timer bound to a session.
type LiveConfig ¶
type LiveConfig struct {
Model string // e.g. "gemini-3.1-flash-live-preview"
// FallbackModel is tried when the primary Model's Connect fails. Empty
// disables the fallback. Typical pairing in 2026: a preview model as
// Model + the last GA model as FallbackModel, so transient preview
// outages don't take down a Voice Agent deployment.
FallbackModel string
APIKey string
Voice string // Voice name
FrameworkPrompt string
RefinementPrompt string
Instruction string // Deprecated: kept as compat alias for FrameworkPrompt.
SystemPrompt string // Deprecated: kept as compat alias, prefer FrameworkPrompt.
VocabularyHint string
Locale string
// Region is the Google Cloud region the caller's API key / project is
// pinned to (e.g. "europe-west3", "us-central1"). Used by providers that
// support regional endpoints. For Gemini Live (as of May 2026) the API
// exposes a single global WebSocket endpoint, so this field does NOT
// redirect traffic — it is logged at connect time for compliance evidence
// (byok.key_updated audit event) and reserved for future regional routing
// once Google publishes per-region hostnames. Data residency is controlled
// at the Google Cloud project level; both the project region AND this field
// must agree for audit records to be accurate.
// See docs/compliance/byok-gemini-region-pinning.md.
Region string
Policies LivePolicies
Tools []ToolDefinition
Workflow *WorkflowConfig
}
LiveConfig configures a real-time session.
type LiveInstructionUpdater ¶
type LiveInstructionUpdater interface {
UpdateInstructions(ctx context.Context, cfg LiveConfig) error
}
LiveInstructionUpdater is optionally implemented by providers that can refresh active host instructions without treating the update as a user turn.
type LiveMessage ¶
type LiveMessage struct {
Audio []byte // PCM audio chunk (24kHz 16-bit mono)
Text string // Text transcript (may be partial or empty)
Done bool // True when the model's turn is complete
// Transcription fields (populated when transcription is enabled).
InputTranscript string // User speech transcribed by server
InputTranscriptDone bool // True when input transcription segment is final
OutputTranscript string // Model speech transcribed by server
OutputTranscriptDone bool // True when output transcription segment is final
ToolCalls []ToolCall
ToolCallCancellationIDs []string
Interrupted bool // True when user interrupted model (barge-in)
GoAway bool // True when server signals imminent session end
}
LiveMessage is a message received from the real-time model.
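A single LiveMessage can carry several kinds of payload at once, so a host reading the raw stream usually fans each message out to handlers. The following is a sketch of one way to do that, not the Session's actual dispatch logic. The types are local stand-ins with a subset of the documented fields, and the ordering (interruption first, then audio, text, and transcripts) is an assumption.

```go
package main

import "fmt"

// LiveMessage is a local stand-in with a subset of the documented fields.
type LiveMessage struct {
	Audio               []byte
	Text                string
	InputTranscript     string
	InputTranscriptDone bool
	Interrupted         bool
}

// Callbacks is a local stand-in with a subset of the documented handlers.
type Callbacks struct {
	OnAudio           func(audio []byte)
	OnText            func(text string)
	OnInputTranscript func(text string, done bool)
	OnInterrupted     func()
}

// dispatch forwards each populated part of msg to its handler and
// skips handlers that are nil. Interruption is reported first so a UI
// can stop playback before handling new audio (assumed ordering).
func dispatch(msg *LiveMessage, cb Callbacks) {
	if msg.Interrupted && cb.OnInterrupted != nil {
		cb.OnInterrupted()
	}
	if len(msg.Audio) > 0 && cb.OnAudio != nil {
		cb.OnAudio(msg.Audio)
	}
	if msg.Text != "" && cb.OnText != nil {
		cb.OnText(msg.Text)
	}
	if msg.InputTranscript != "" && cb.OnInputTranscript != nil {
		cb.OnInputTranscript(msg.InputTranscript, msg.InputTranscriptDone)
	}
}

func main() {
	cb := Callbacks{
		OnAudio: func(a []byte) { fmt.Println("audio bytes:", len(a)) },
		OnText:  func(t string) { fmt.Println("text:", t) },
	}
	dispatch(&LiveMessage{Audio: make([]byte, 480), Text: "hello"}, cb)
}
```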
type LivePolicies ¶
type LivePolicies struct {
EnableInputAudioTranscription bool
EnableOutputAudioTranscription bool
EnableAffectiveDialog bool
Thinking ThinkingPolicy
ContextCompression ContextCompressionPolicy
ActivityDetection ActivityDetectionPolicy
}
LivePolicies configures Google Live API features that shape Voice Agent behavior.
type LiveProvider ¶
type LiveProvider interface {
// Connect establishes a WebSocket session to the real-time model.
Connect(ctx context.Context, cfg LiveConfig) error
// SendAudio streams PCM audio chunks to the model.
// Format: 16-bit signed int, little-endian, mono, 16kHz.
SendAudio(chunk []byte) error
// SendAudioStreamEnd signals that microphone input for the current turn ended.
SendAudioStreamEnd() error
// Receive blocks until the next server message arrives.
// Returns audio chunks and/or text from the model.
Receive(ctx context.Context) (*LiveMessage, error)
// SendText injects a text prompt into the session (for idle reminders).
SendText(text string) error
// SendToolResponse sends the result of a host-side tool invocation back to the model.
SendToolResponse(response ToolResponse) error
// Close terminates the WebSocket session.
Close() error
// Name returns the provider identifier.
Name() string
}
LiveProvider abstracts a real-time audio-to-audio model connection.
type LiveReconnector ¶
LiveReconnector is an optional interface for providers that support session reconnection.
type Session ¶
type Session struct {
// contains filtered or unexported fields
}
Session manages a Voice Agent conversation.
func NewSession ¶
func NewSession(provider LiveProvider, callbacks Callbacks) *Session
NewSession creates a Voice Agent session with the given provider.
func (*Session) AdvanceWorkflowStep ¶
AdvanceWorkflowStep moves a configured local workflow to its next step and updates the active provider instructions. It returns nil when no workflow is active or the workflow has already completed.
func (*Session) CurrentState ¶
CurrentState returns the current session state.
func (*Session) EndAudioStream ¶
EndAudioStream tells the live provider that the current microphone stream ended.
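A typical capture loop feeds microphone audio in small frames and then marks the end of the stream. The sketch below shows the arithmetic for the input format documented on LiveProvider.SendAudio (16 kHz, 16-bit, mono): a 20 ms frame is 16000 × 2 × 0.020 = 640 bytes. The 20 ms frame length is an assumption, not something this package specifies, and audioSink, streamPCM, and recorder are hypothetical names standing in for a Session.

```go
package main

import "fmt"

const (
	sampleRateHz   = 16000 // input rate documented on LiveProvider.SendAudio
	bytesPerSample = 2     // 16-bit signed PCM, mono
	frameMs        = 20    // assumed frame length; not specified by the package
	frameBytes     = sampleRateHz * bytesPerSample * frameMs / 1000 // 640
)

// audioSink is a local stand-in for the two Session methods used here.
type audioSink interface {
	SendAudio(chunk []byte) error
	EndAudioStream() error
}

// streamPCM sends pcm in frameBytes-sized chunks (the last chunk may be
// shorter) and then signals that the microphone stream ended.
func streamPCM(s audioSink, pcm []byte) error {
	for len(pcm) > 0 {
		n := frameBytes
		if n > len(pcm) {
			n = len(pcm)
		}
		if err := s.SendAudio(pcm[:n]); err != nil {
			return err
		}
		pcm = pcm[n:]
	}
	return s.EndAudioStream()
}

// recorder is a test double that records chunk sizes.
type recorder struct {
	sizes []int
	ended bool
}

func (r *recorder) SendAudio(chunk []byte) error { r.sizes = append(r.sizes, len(chunk)); return nil }
func (r *recorder) EndAudioStream() error        { r.ended = true; return nil }

func main() {
	r := &recorder{}
	oneSecond := make([]byte, sampleRateHz*bytesPerSample) // 32000 bytes of silence
	if err := streamPCM(r, oneSecond); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(r.sizes), r.ended) // prints "50 true"
}
```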
func (*Session) ProviderName ¶
func (*Session) SendToolResponse ¶
func (s *Session) SendToolResponse(response ToolResponse) error
SendToolResponse forwards the result of a host-side tool invocation to the model.
func (*Session) Start ¶
func (s *Session) Start(ctx context.Context, cfg LiveConfig, idleCfg IdleConfig) error
Start activates the Voice Agent session.
type StartSensitivity ¶
type StartSensitivity string
StartSensitivity controls how aggressively automatic activity detection commits speech start.
const (
StartSensitivityLow StartSensitivity = "low"
StartSensitivityMedium StartSensitivity = "medium"
StartSensitivityHigh StartSensitivity = "high"
)
type ThinkingLevel ¶
type ThinkingLevel string
ThinkingLevel controls how much deliberate reasoning Gemini Live should spend.
const (
ThinkingLevelOff ThinkingLevel = "off"
ThinkingLevelLow ThinkingLevel = "low"
ThinkingLevelMedium ThinkingLevel = "medium"
ThinkingLevelHigh ThinkingLevel = "high"
)
type ThinkingPolicy ¶
type ThinkingPolicy struct {
Enabled bool
IncludeThoughts bool
ThinkingBudget int32
ThinkingLevel ThinkingLevel
}
ThinkingPolicy defines optional Gemini Live thinking behavior.
type ToolBehavior ¶
type ToolBehavior string
ToolBehavior controls whether the model waits for a tool result.
const (
ToolBehaviorUnspecified ToolBehavior = ""
ToolBehaviorBlocking ToolBehavior = "blocking"
ToolBehaviorNonBlocking ToolBehavior = "non_blocking"
)
type ToolDefinition ¶
type ToolDefinition struct {
Name string
Description string
ParametersJSONSchema map[string]any
ResponseJSONSchema map[string]any
Behavior ToolBehavior
}
ToolDefinition exposes a host-side action the Voice Agent may call.
type ToolResponse ¶
type ToolResponse struct {
ID string
Name string
Response map[string]any
Scheduling ToolResponseScheduling
WillContinue *bool
}
ToolResponse resolves a previously emitted tool call.
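A host typically answers each tool call by looking up a handler and echoing the call's identity back in the response. The sketch below shows that round trip under stated assumptions: this page lists ToolCall but not its fields, so the ID, Name, and Args fields used here are guesses. ToolResponse is a local stand-in, and resolve, toolHandler, and the "error" key convention are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is ASSUMED: the page lists the type but not its fields.
type ToolCall struct {
	ID   string
	Name string
	Args map[string]any
}

// ToolResponse is a local stand-in with three of the documented fields.
type ToolResponse struct {
	ID       string
	Name     string
	Response map[string]any
}

// toolHandler is a hypothetical host-side implementation of one tool.
type toolHandler func(args map[string]any) (map[string]any, error)

// resolve runs the handler registered for call.Name and copies the
// call's ID and Name into the response. Failures are reported to the
// model under an "error" key (an assumed convention).
func resolve(call ToolCall, handlers map[string]toolHandler) ToolResponse {
	resp := ToolResponse{ID: call.ID, Name: call.Name}
	h, ok := handlers[call.Name]
	if !ok {
		resp.Response = map[string]any{"error": "unknown tool: " + call.Name}
		return resp
	}
	out, err := h(call.Args)
	if err != nil {
		resp.Response = map[string]any{"error": err.Error()}
		return resp
	}
	resp.Response = out
	return resp
}

func main() {
	handlers := map[string]toolHandler{
		"get_time": func(map[string]any) (map[string]any, error) {
			return map[string]any{"time": "12:00"}, nil
		},
		"fail": func(map[string]any) (map[string]any, error) {
			return nil, errors.New("backend offline")
		},
	}
	fmt.Println(resolve(ToolCall{ID: "1", Name: "get_time"}, handlers).Response)
	fmt.Println(resolve(ToolCall{ID: "2", Name: "fail"}, handlers).Response)
}
```

In a real host the returned value would be passed to Session.SendToolResponse.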
type ToolResponseScheduling ¶
type ToolResponseScheduling string
ToolResponseScheduling controls how a non-blocking tool result is reintroduced into the conversation.
const (
ToolResponseSchedulingUnspecified ToolResponseScheduling = ""
ToolResponseSchedulingSilent ToolResponseScheduling = "silent"
ToolResponseSchedulingWhenIdle ToolResponseScheduling = "when_idle"
ToolResponseSchedulingInterrupt ToolResponseScheduling = "interrupt"
)
type TurnCoverage ¶
type TurnCoverage string
TurnCoverage controls how the live API builds a user turn from incoming activity.
const (
TurnCoverageUnspecified TurnCoverage = ""
TurnCoverageTurnIncludesOnlyActivity TurnCoverage = "turn_includes_only_activity"
TurnCoverageTurnIncludesAllInput TurnCoverage = "turn_includes_all_input"
TurnCoverageTurnIncludesAudioActivity TurnCoverage = "turn_includes_audio_activity"
)
type WorkflowConfig ¶
type WorkflowConfig struct {
SequenceID string
Completion string
MaxTurns int
BasePrompt string
Steps []WorkflowStep
// InitialStep selects the first active step. Out-of-range values fall
// back to zero.
InitialStep int
}
WorkflowConfig describes a deterministic, step-based Voice Agent behavior sequence. Durations are intentionally expressed as turns instead of wall clock time so tests and local installs can exercise long moderation flows quickly.
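The InitialStep rule (out-of-range values fall back to zero) and step-by-step advancement can be sketched as follows. This is an illustration, not the package's implementation: WorkflowConfig is a local stand-in with two of the documented fields, WorkflowStep's Name field is hypothetical because its fields are not shown on this page, and treating the workflow as complete after the last step is an assumption.

```go
package main

import "fmt"

// WorkflowStep is a stand-in; Name is a hypothetical field.
type WorkflowStep struct{ Name string }

// WorkflowConfig is a local stand-in with two of the documented fields.
type WorkflowConfig struct {
	Steps       []WorkflowStep
	InitialStep int
}

// startIndex applies the documented rule: an out-of-range InitialStep
// falls back to zero.
func startIndex(cfg WorkflowConfig) int {
	if cfg.InitialStep < 0 || cfg.InitialStep >= len(cfg.Steps) {
		return 0
	}
	return cfg.InitialStep
}

// nextIndex returns the index after cur, or false when cur is already
// the last step (assumed to mean the workflow has completed).
func nextIndex(cfg WorkflowConfig, cur int) (int, bool) {
	if cur+1 >= len(cfg.Steps) {
		return cur, false
	}
	return cur + 1, true
}

func main() {
	cfg := WorkflowConfig{
		Steps:       []WorkflowStep{{"greet"}, {"moderate"}, {"wrap-up"}},
		InitialStep: 7, // out of range, so the workflow starts at step 0
	}
	i := startIndex(cfg)
	for {
		fmt.Println(i, cfg.Steps[i].Name)
		next, ok := nextIndex(cfg, i)
		if !ok {
			break
		}
		i = next
	}
}
```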