voiceagent

package
v0.17.0 Latest
Warning: This package is not in the latest version of its module.
Published: Apr 12, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package voiceagent implements Voice Agent Mode: a real-time, bidirectional voice conversation with native audio-to-audio models (Gemini Live API, OpenAI Realtime API) over WebSocket.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Callbacks

type Callbacks struct {
	OnStateChange func(state State)
	OnAudio       func(audio []byte) // Audio chunk to play
	OnText        func(text string)  // Text for display (speech bubble)
	OnError       func(err error)
}

Callbacks are event handlers for UI integration.
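
As a rough illustration of how a UI layer might populate these handlers, the sketch below redeclares State and Callbacks locally so that it compiles on its own (the package's import path is not shown on this page). The eventLog type is hypothetical and only records what it is told; when and on which goroutine the session invokes each handler is not documented here.

```go
package main

import (
	"errors"
	"fmt"
)

// State and Callbacks mirror the declarations documented above so the
// sketch compiles on its own; real code would import them from the package.
type State string

type Callbacks struct {
	OnStateChange func(state State)
	OnAudio       func(audio []byte)
	OnText        func(text string)
	OnError       func(err error)
}

// eventLog is a hypothetical stand-in for a UI: it records each event as a line.
type eventLog struct {
	lines []string
}

// callbacks returns handlers that append one line per event.
func (l *eventLog) callbacks() Callbacks {
	return Callbacks{
		OnStateChange: func(s State) { l.lines = append(l.lines, "state:"+string(s)) },
		OnAudio:       func(a []byte) { l.lines = append(l.lines, fmt.Sprintf("audio:%d bytes", len(a))) },
		OnText:        func(t string) { l.lines = append(l.lines, "text:"+t) },
		OnError:       func(err error) { l.lines = append(l.lines, "error:"+err.Error()) },
	}
}

func main() {
	ui := &eventLog{}
	cb := ui.callbacks()

	// Simulate the events a session might deliver.
	cb.OnStateChange(State("listening"))
	cb.OnAudio(make([]byte, 480))
	cb.OnText("hello")
	cb.OnError(errors.New("connection lost"))

	for _, line := range ui.lines {
		fmt.Println(line)
	}
}
```

Setting all four fields avoids having to know whether the session guards against nil handlers, which this page does not state.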

type GeminiLive

type GeminiLive struct {
	// contains filtered or unexported fields
}

GeminiLive implements LiveProvider using the Google GenAI Live API.

func NewGeminiLive

func NewGeminiLive() *GeminiLive

NewGeminiLive creates a Gemini Live provider.

func (*GeminiLive) Close

func (g *GeminiLive) Close() error

func (*GeminiLive) Connect

func (g *GeminiLive) Connect(ctx context.Context, cfg LiveConfig) error

func (*GeminiLive) Name

func (g *GeminiLive) Name() string

func (*GeminiLive) Receive

func (g *GeminiLive) Receive(ctx context.Context) (*LiveMessage, error)

func (*GeminiLive) SendAudio

func (g *GeminiLive) SendAudio(chunk []byte) error

func (*GeminiLive) SendText

func (g *GeminiLive) SendText(text string) error

type IdleConfig

type IdleConfig struct {
	ReminderAfter   time.Duration // Default: 5 minutes
	DeactivateAfter time.Duration // Default: 15 minutes
}

IdleConfig configures the idle timer behavior.

func DefaultIdleConfig

func DefaultIdleConfig() IdleConfig

DefaultIdleConfig returns the documented defaults: a reminder after 5 minutes and deactivation after 15 minutes.

type IdleTimer

type IdleTimer struct {
	// contains filtered or unexported fields
}

IdleTimer manages reminder and auto-deactivation for Voice Agent.

func NewIdleTimer

func NewIdleTimer(cfg IdleConfig, session *Session) *IdleTimer

NewIdleTimer creates an idle timer bound to a session.

func (*IdleTimer) Reset

func (t *IdleTimer) Reset()

Reset restarts the idle countdown. Call after each user interaction.

func (*IdleTimer) Stop

func (t *IdleTimer) Stop()

Stop cancels all timers.

type LiveConfig

type LiveConfig struct {
	Model          string // e.g. "gemini-2.5-flash-native-audio-preview-12-2025"
	APIKey         string
	Voice          string // Voice name
	SystemPrompt   string
	VocabularyHint string
	Locale         string
}

LiveConfig configures a real-time session.

type LiveMessage

type LiveMessage struct {
	Audio []byte // PCM audio chunk (24kHz 16-bit mono)
	Text  string // Text transcript (may be partial or empty)
	Done  bool   // True when the model's turn is complete
}

LiveMessage is a message received from the real-time model.

type LiveProvider

type LiveProvider interface {
	// Connect establishes a WebSocket session to the real-time model.
	Connect(ctx context.Context, cfg LiveConfig) error

	// SendAudio streams PCM audio chunks to the model.
	// Format: 16-bit signed int, little-endian, mono, 16kHz.
	SendAudio(chunk []byte) error

	// Receive blocks until the next server message arrives.
	// Returns audio chunks and/or text from the model.
	Receive(ctx context.Context) (*LiveMessage, error)

	// SendText injects a text prompt into the session (for idle reminders).
	SendText(text string) error

	// Close terminates the WebSocket session.
	Close() error

	// Name returns the provider identifier.
	Name() string
}

LiveProvider abstracts a real-time audio-to-audio model connection.

type Session

type Session struct {
	// contains filtered or unexported fields
}

Session manages a Voice Agent conversation.

func NewSession

func NewSession(provider LiveProvider, callbacks Callbacks) *Session

NewSession creates a Voice Agent session with the given provider.

func (*Session) CurrentState

func (s *Session) CurrentState() State

CurrentState returns the current session state.

func (*Session) SendAudio

func (s *Session) SendAudio(chunk []byte) error

SendAudio forwards a PCM audio chunk to the real-time model.
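
Microphone capture often arrives in buffers of arbitrary size, so a caller may want to split them into fixed-duration frames before forwarding. The sketch below assumes the 16 kHz, 16-bit mono input format from the LiveProvider.SendAudio comment. The 20 ms frame length and the helpers frameBytes and sendInFrames are assumptions made for illustration; this page does not state a required chunk size. The send parameter stands in for a method value such as a session's SendAudio.

```go
package main

import "fmt"

const (
	inputSampleRate = 16000 // Hz, per the LiveProvider.SendAudio comment
	bytesPerSample  = 2     // 16-bit mono
)

// frameBytes returns the size in bytes of a frame lasting ms milliseconds.
func frameBytes(ms int) int {
	return inputSampleRate * bytesPerSample * ms / 1000
}

// sendInFrames splits pcm into frames of frameSize bytes and passes each to
// send in order. A shorter trailing frame is sent as-is. It stops at the
// first error returned by send.
func sendInFrames(pcm []byte, frameSize int, send func([]byte) error) error {
	if frameSize <= 0 {
		return fmt.Errorf("frame size must be positive, got %d", frameSize)
	}
	for len(pcm) > 0 {
		n := frameSize
		if n > len(pcm) {
			n = len(pcm)
		}
		if err := send(pcm[:n]); err != nil {
			return fmt.Errorf("send audio: %w", err)
		}
		pcm = pcm[n:]
	}
	return nil
}

func main() {
	pcm := make([]byte, 1500) // stand-in for captured microphone audio
	err := sendInFrames(pcm, frameBytes(20), func(frame []byte) error {
		fmt.Println("frame of", len(frame), "bytes")
		return nil
	})
	fmt.Println("err:", err)
}
```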

func (*Session) Start

func (s *Session) Start(ctx context.Context, cfg LiveConfig, idleCfg IdleConfig) error

Start activates the Voice Agent session.

func (*Session) Stop

func (s *Session) Stop()

Stop deactivates the Voice Agent session.

type State

type State string

State represents the current state of the Voice Agent session.

const (
	StateInactive     State = "inactive"
	StateConnecting   State = "connecting"
	StateListening    State = "listening"
	StateProcessing   State = "processing"
	StateSpeaking     State = "speaking"
	StateDeactivating State = "deactivating"
)
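
An OnStateChange handler will typically switch on these constants. The sketch below redeclares them locally and maps each to a UI indicator name; the indicator function and its return values are hypothetical, and the default branch guards against states added in later versions.

```go
package main

import "fmt"

// State and its constants mirror the declarations documented above.
type State string

const (
	StateInactive     State = "inactive"
	StateConnecting   State = "connecting"
	StateListening    State = "listening"
	StateProcessing   State = "processing"
	StateSpeaking     State = "speaking"
	StateDeactivating State = "deactivating"
)

// indicator is a hypothetical mapping from session state to a UI icon name.
func indicator(s State) string {
	switch s {
	case StateInactive:
		return "off"
	case StateConnecting, StateDeactivating:
		return "busy"
	case StateListening:
		return "mic"
	case StateProcessing:
		return "thinking"
	case StateSpeaking:
		return "speaker"
	default:
		return "unknown"
	}
}

func main() {
	for _, s := range []State{StateConnecting, StateListening, StateSpeaking, StateInactive} {
		fmt.Printf("%s -> %s\n", s, indicator(s))
	}
}
```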
