audio

package

v1.2.0 (latest) Published: Feb 15, 2026 License: Apache-2.0 Imports: 6 Imported by: 0

Repository

github.com/AltairaLabs/PromptKit

Documentation ¶

Overview ¶

Package audio provides voice activity detection (VAD), turn detection, and audio session management for real-time voice AI applications.

The package follows industry-standard patterns for voice AI:

- VAD (Voice Activity Detection): Detects when someone is speaking vs. silent
- Turn Detection: Determines when a speaker has finished their turn
- Interruption Handling: Manages the user interrupting bot output

Architecture ¶

Audio processing follows a two-stage approach:

- VADAnalyzer detects voice activity in real-time
- TurnDetector uses VAD output plus additional signals to detect turn boundaries

Usage Example ¶

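vad, err := audio.NewSimpleVAD(audio.DefaultVADParams())
if err != nil {
    return err
}
detector := audio.NewSilenceDetector(500 * time.Millisecond)

for chunk := range audioStream {
    if _, err := vad.Analyze(ctx, chunk); err != nil {
        return err
    }
    done, err := detector.ProcessVADState(ctx, vad.State())
    if err != nil {
        return err
    }
    if done {
        // User finished speaking
    }
}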

Package audio provides audio processing utilities.

Index ¶

Constants
func Resample24kTo16k(input []byte) ([]byte, error)
func ResamplePCM16(input []byte, fromRate, toRate int) ([]byte, error)
type AccumulatingTurnDetector
type InterruptionCallback
type InterruptionHandler
- func NewInterruptionHandler(strategy InterruptionStrategy, vad VADAnalyzer) *InterruptionHandler
- func (h *InterruptionHandler) IsBotSpeaking() bool
- func (h *InterruptionHandler) NotifySentenceBoundary()
- func (h *InterruptionHandler) OnInterrupt(callback InterruptionCallback)
- func (h *InterruptionHandler) ProcessAudio(ctx context.Context, audio []byte) (bool, error)
- func (h *InterruptionHandler) ProcessVADState(ctx context.Context, state VADState) (bool, error)
- func (h *InterruptionHandler) Reset()
- func (h *InterruptionHandler) SetBotSpeaking(speaking bool)
- func (h *InterruptionHandler) WasInterrupted() bool
type InterruptionStrategy
- func (s InterruptionStrategy) String() string
type SilenceDetector
- func NewSilenceDetector(threshold time.Duration) *SilenceDetector
- func (d *SilenceDetector) GetAccumulatedAudio() []byte
- func (d *SilenceDetector) IsUserSpeaking() bool
- func (d *SilenceDetector) Name() string
- func (d *SilenceDetector) OnTurnComplete(callback TurnCallback)
- func (d *SilenceDetector) ProcessAudio(ctx context.Context, audio []byte) (bool, error)
- func (d *SilenceDetector) ProcessVADState(ctx context.Context, state VADState) (bool, error)
- func (d *SilenceDetector) Reset()
- func (d *SilenceDetector) SetTranscript(transcript string)
type SimpleVAD
- func NewSimpleVAD(params VADParams) (*SimpleVAD, error)
- func (v *SimpleVAD) Analyze(ctx context.Context, audio []byte) (float64, error)
- func (v *SimpleVAD) Name() string
- func (v *SimpleVAD) OnStateChange() <-chan VADEvent
- func (v *SimpleVAD) Reset()
- func (v *SimpleVAD) State() VADState
type TurnCallback
type TurnDetector
type VADAnalyzer
type VADEvent
type VADParams
- func DefaultVADParams() VADParams
- func (p VADParams) Validate() error
type VADState
- func (s VADState) String() string
type ValidationError
- func (e *ValidationError) Error() string

Constants ¶

const (
	SampleRate24kHz = 24000 // Common TTS output rate
	SampleRate16kHz = 16000 // Common STT/ASR input rate
)

Standard audio sample rates for common use cases.

const (
	DefaultVADConfidence = 0.5
	DefaultVADStartSecs  = 0.2
	DefaultVADStopSecs   = 0.8
	DefaultVADMinVolume  = 0.01
	DefaultVADSampleRate = 16000
)

Default VAD parameter values.

Variables ¶

This section is empty.

Functions ¶

func Resample24kTo16k ¶

func Resample24kTo16k(input []byte) ([]byte, error)

Resample24kTo16k is a convenience function for the common case of resampling from 24kHz (TTS output) to 16kHz (Gemini input).

func ResamplePCM16 ¶

func ResamplePCM16(input []byte, fromRate, toRate int) ([]byte, error)

ResamplePCM16 resamples PCM16 audio data from one sample rate to another. Uses linear interpolation for reasonable quality resampling. Input and output are little-endian 16-bit signed PCM samples.
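
The linear interpolation described above can be sketched as follows. This is an illustrative, self-contained approximation and not the package's source: the function name resampleLinear, its error messages, and its handling of odd-length input are assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// resampleLinear is a hypothetical stand-in for ResamplePCM16.
// Input and output are little-endian signed 16-bit PCM samples.
func resampleLinear(input []byte, fromRate, toRate int) ([]byte, error) {
	if fromRate <= 0 || toRate <= 0 {
		return nil, errors.New("sample rates must be positive")
	}
	if len(input)%2 != 0 {
		return nil, errors.New("input length must be a multiple of 2 bytes")
	}
	n := len(input) / 2
	if n == 0 || fromRate == toRate {
		return append([]byte(nil), input...), nil
	}

	// Decode the input into signed samples.
	in := make([]int16, n)
	for i := range in {
		in[i] = int16(binary.LittleEndian.Uint16(input[2*i:]))
	}

	outLen := n * toRate / fromRate
	out := make([]byte, 2*outLen)
	for i := 0; i < outLen; i++ {
		// Position of output sample i on the input time axis.
		pos := float64(i) * float64(fromRate) / float64(toRate)
		j := int(pos)
		if j > n-1 {
			j = n - 1
		}
		frac := pos - float64(j)
		a := float64(in[j])
		b := a // past the last sample, hold its value
		if j+1 < n {
			b = float64(in[j+1])
		}
		v := int16(a + (b-a)*frac)
		binary.LittleEndian.PutUint16(out[2*i:], uint16(v))
	}
	return out, nil
}

func main() {
	// Three samples (0, 300, 600) at 24 kHz become two samples at 16 kHz.
	in := []byte{0x00, 0x00, 0x2C, 0x01, 0x58, 0x02}
	out, err := resampleLinear(in, 24000, 16000)
	if err != nil {
		fmt.Println("resample failed:", err)
		return
	}
	fmt.Println(len(out)/2, "output samples")
}
```

Linear interpolation is cheap and adequate for speech, but it applies no low-pass filter, so downsampling material with significant high-frequency content may alias.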

Types ¶

type AccumulatingTurnDetector ¶

type AccumulatingTurnDetector interface {
	TurnDetector

	// OnTurnComplete registers a callback for when a complete turn is detected.
	OnTurnComplete(callback TurnCallback)

	// GetAccumulatedAudio returns audio accumulated so far (may be incomplete turn).
	GetAccumulatedAudio() []byte

	// SetTranscript sets the transcript for the current turn (from external STT).
	SetTranscript(transcript string)
}

AccumulatingTurnDetector is a TurnDetector that accumulates audio during a turn.

type InterruptionCallback ¶

type InterruptionCallback func()

InterruptionCallback is called when the user interrupts the bot.

type InterruptionHandler ¶

type InterruptionHandler struct {
	// contains filtered or unexported fields
}

InterruptionHandler manages user interruption logic during bot output.

func NewInterruptionHandler ¶

func NewInterruptionHandler(strategy InterruptionStrategy, vad VADAnalyzer) *InterruptionHandler

NewInterruptionHandler creates an InterruptionHandler with the given strategy and VAD.

func (*InterruptionHandler) IsBotSpeaking ¶

func (h *InterruptionHandler) IsBotSpeaking() bool

IsBotSpeaking returns true if the bot is currently outputting audio.

func (*InterruptionHandler) NotifySentenceBoundary ¶

func (h *InterruptionHandler) NotifySentenceBoundary()

NotifySentenceBoundary notifies the handler of a sentence boundary. For deferred interruption strategy, this may trigger the pending interruption.

func (*InterruptionHandler) OnInterrupt ¶

func (h *InterruptionHandler) OnInterrupt(callback InterruptionCallback)

OnInterrupt registers a callback for when interruption occurs.

func (*InterruptionHandler) ProcessAudio ¶

func (h *InterruptionHandler) ProcessAudio(ctx context.Context, audio []byte) (bool, error)

ProcessAudio processes audio and detects user interruption. Returns true if an interruption was detected and should be acted upon.

func (*InterruptionHandler) ProcessVADState ¶

func (h *InterruptionHandler) ProcessVADState(ctx context.Context, state VADState) (bool, error)

ProcessVADState processes a VAD state update for interruption detection. Returns true if an interruption was detected and should be acted upon.

func (*InterruptionHandler) Reset ¶

func (h *InterruptionHandler) Reset()

Reset clears interruption state for a new turn.

func (*InterruptionHandler) SetBotSpeaking ¶

func (h *InterruptionHandler) SetBotSpeaking(speaking bool)

SetBotSpeaking sets whether the bot is currently outputting audio.

func (*InterruptionHandler) WasInterrupted ¶

func (h *InterruptionHandler) WasInterrupted() bool

WasInterrupted returns true if an interruption occurred.

type InterruptionStrategy ¶

type InterruptionStrategy int

InterruptionStrategy determines how to handle the user interrupting the bot.

const (
	// InterruptionIgnore ignores user speech during bot output.
	InterruptionIgnore InterruptionStrategy = iota
	// InterruptionImmediate immediately stops bot and starts listening.
	InterruptionImmediate
	// InterruptionDeferred waits for bot's current sentence, then switches.
	InterruptionDeferred
)
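
To make the difference between the three strategies concrete, here is a minimal sketch of how such a handler might behave. The types strategy and handler and the methods userSpeech and sentenceBoundary are hypothetical names for this example; the real InterruptionHandler may track more state and behave differently.

```go
package main

import "fmt"

type strategy int

const (
	ignore    strategy = iota // drop user speech during bot output
	immediate                 // interrupt as soon as the user speaks
	deferred                  // interrupt at the next sentence boundary
)

// handler is a hypothetical, simplified interruption handler.
type handler struct {
	strat       strategy
	botSpeaking bool
	pending     bool   // deferred interruption waiting for a boundary
	onInterrupt func() // optional callback
}

// userSpeech reports user speech and returns true if the bot
// should be interrupted right now.
func (h *handler) userSpeech() bool {
	if !h.botSpeaking {
		return false
	}
	switch h.strat {
	case immediate:
		h.fire()
		return true
	case deferred:
		h.pending = true
	}
	return false
}

// sentenceBoundary triggers a pending deferred interruption, if any.
func (h *handler) sentenceBoundary() {
	if h.strat == deferred && h.pending {
		h.pending = false
		h.fire()
	}
}

func (h *handler) fire() {
	if h.onInterrupt != nil {
		h.onInterrupt()
	}
}

func main() {
	h := &handler{strat: deferred, botSpeaking: true}
	h.onInterrupt = func() { fmt.Println("interrupted at sentence boundary") }
	h.userSpeech()       // only marks the interruption as pending
	h.sentenceBoundary() // now the callback runs
}
```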

func (InterruptionStrategy) String ¶

func (s InterruptionStrategy) String() string

String returns a human-readable representation of the interruption strategy.

type SilenceDetector ¶

type SilenceDetector struct {
	// Threshold is the silence duration required to trigger turn end.
	Threshold time.Duration
	// contains filtered or unexported fields
}

SilenceDetector detects turn boundaries based on silence duration. It triggers end-of-turn when silence exceeds a configurable threshold.

func NewSilenceDetector ¶

func NewSilenceDetector(threshold time.Duration) *SilenceDetector

NewSilenceDetector creates a SilenceDetector with the given threshold. threshold is the duration of silence required to trigger end-of-turn.

func (*SilenceDetector) GetAccumulatedAudio ¶

func (d *SilenceDetector) GetAccumulatedAudio() []byte

GetAccumulatedAudio returns audio accumulated so far.

func (*SilenceDetector) IsUserSpeaking ¶

func (d *SilenceDetector) IsUserSpeaking() bool

IsUserSpeaking returns true if user is currently speaking.

func (*SilenceDetector) Name ¶

func (d *SilenceDetector) Name() string

Name returns the detector identifier.

func (*SilenceDetector) OnTurnComplete ¶

func (d *SilenceDetector) OnTurnComplete(callback TurnCallback)

OnTurnComplete registers a callback for when a complete turn is detected.

func (*SilenceDetector) ProcessAudio ¶

func (d *SilenceDetector) ProcessAudio(ctx context.Context, audio []byte) (bool, error)

ProcessAudio processes an incoming audio chunk. This implementation delegates to ProcessVADState and expects VAD to be run separately. Returns true if end of turn is detected.

func (*SilenceDetector) ProcessVADState ¶

func (d *SilenceDetector) ProcessVADState(ctx context.Context, state VADState) (bool, error)

ProcessVADState processes a VAD state update and detects turn boundaries. Returns true if end of turn is detected.

func (*SilenceDetector) Reset ¶

func (d *SilenceDetector) Reset()

Reset clears state for a new conversation.

func (*SilenceDetector) SetTranscript ¶

func (d *SilenceDetector) SetTranscript(transcript string)

SetTranscript sets the transcript for the current turn.

type SimpleVAD ¶

type SimpleVAD struct {
	// contains filtered or unexported fields
}

SimpleVAD is a basic voice activity detector using RMS (Root Mean Square) analysis. It provides a lightweight VAD implementation without requiring external ML models. For more accurate detection, consider using SileroVAD.
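
As a rough illustration of RMS analysis, the sketch below computes the RMS level of a chunk of 16-bit PCM. The function name rmsPCM16 is hypothetical, and normalizing samples to the range [-1, 1] (so the result is comparable to a threshold such as MinVolume) is an assumption; the package's internal scaling may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// rmsPCM16 returns the RMS level of little-endian signed 16-bit PCM,
// with each sample normalized by 32768.
func rmsPCM16(audio []byte) float64 {
	n := len(audio) / 2
	if n == 0 {
		return 0
	}
	var sum float64
	for i := 0; i < n; i++ {
		s := float64(int16(binary.LittleEndian.Uint16(audio[2*i:]))) / 32768.0
		sum += s * s
	}
	return math.Sqrt(sum / float64(n))
}

func main() {
	// Two samples at half scale: +16384 and -16384.
	chunk := []byte{0x00, 0x40, 0x00, 0xC0}
	level := rmsPCM16(chunk)
	const minVolume = 0.01 // same value as DefaultVADMinVolume
	fmt.Println("above threshold:", level >= minVolume)
}
```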

func NewSimpleVAD ¶

func NewSimpleVAD(params VADParams) (*SimpleVAD, error)

NewSimpleVAD creates a SimpleVAD analyzer with the given parameters.

func (*SimpleVAD) Analyze ¶

func (v *SimpleVAD) Analyze(ctx context.Context, audio []byte) (float64, error)

Analyze processes audio and returns voice probability based on RMS volume.

func (*SimpleVAD) Name ¶

func (v *SimpleVAD) Name() string

Name returns the analyzer identifier.

func (*SimpleVAD) OnStateChange ¶

func (v *SimpleVAD) OnStateChange() <-chan VADEvent

OnStateChange returns a channel that receives state transitions.

func (*SimpleVAD) Reset ¶

func (v *SimpleVAD) Reset()

Reset clears accumulated state for a new conversation.

func (*SimpleVAD) State ¶

func (v *SimpleVAD) State() VADState

State returns the current VAD state.

type TurnCallback ¶

type TurnCallback func(audio []byte, transcript string)

TurnCallback is called when a complete user turn is detected. audio contains the accumulated audio for the turn. transcript contains any accumulated transcript (may be empty).

type TurnDetector ¶

type TurnDetector interface {
	// Name returns the detector identifier.
	Name() string

	// ProcessAudio processes an incoming audio chunk.
	// Returns true if end of turn is detected.
	ProcessAudio(ctx context.Context, audio []byte) (bool, error)

	// ProcessVADState processes a VAD state update.
	// Returns true if end of turn is detected based on VAD state.
	ProcessVADState(ctx context.Context, state VADState) (bool, error)

	// IsUserSpeaking returns true if user is currently speaking.
	IsUserSpeaking() bool

	// Reset clears state for a new conversation.
	Reset()
}

TurnDetector determines when a speaker has finished their turn. This is separate from VAD - VAD detects voice activity, turn detection determines conversation boundaries.

type VADAnalyzer ¶

type VADAnalyzer interface {
	// Name returns the analyzer identifier.
	Name() string

	// Analyze processes audio and returns voice probability (0.0-1.0).
	// audio should be raw PCM samples at the configured sample rate.
	Analyze(ctx context.Context, audio []byte) (float64, error)

	// State returns the current VAD state based on accumulated analysis.
	State() VADState

	// OnStateChange returns a channel that receives state transitions.
	// The channel is buffered and may drop events if not consumed.
	OnStateChange() <-chan VADEvent

	// Reset clears accumulated state for a new conversation.
	Reset()
}

VADAnalyzer analyzes audio for voice activity.
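
The note that the OnStateChange channel "is buffered and may drop events if not consumed" usually corresponds to a non-blocking send. The sketch below shows that pattern in isolation; notifier, event, and emit are hypothetical names, and the package's actual buffer size and drop policy are not documented here.

```go
package main

import "fmt"

// event is a hypothetical stand-in for VADEvent.
type event struct{ from, to string }

type notifier struct{ ch chan event }

func newNotifier(buf int) *notifier {
	return &notifier{ch: make(chan event, buf)}
}

// emit sends without blocking. It returns false when the buffer is
// full, in which case the event is dropped.
func (n *notifier) emit(e event) bool {
	select {
	case n.ch <- e:
		return true
	default:
		return false
	}
}

func main() {
	n := newNotifier(1)
	fmt.Println(n.emit(event{"quiet", "starting"}))    // buffered
	fmt.Println(n.emit(event{"starting", "speaking"})) // dropped: nobody is reading
}
```

A consumer that needs every transition should therefore read from the channel promptly, or poll State() after each Analyze call instead.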

type VADEvent ¶

type VADEvent struct {
	State      VADState
	PrevState  VADState
	Timestamp  time.Time
	Duration   time.Duration // How long in the previous state
	Confidence float64       // Voice confidence at transition
}

VADEvent represents a state transition in VAD.

type VADParams ¶

type VADParams struct {
	// Confidence threshold for voice detection (0.0-1.0, default: 0.5).
	// Higher values require more confidence before triggering.
	Confidence float64

	// StartSecs is seconds of speech required to trigger VADStateSpeaking (default: 0.2).
	// Prevents false starts from brief noise.
	StartSecs float64

	// StopSecs is seconds of silence required to trigger VADStateQuiet (default: 0.8).
	// Allows natural pauses without ending turn.
	StopSecs float64

	// MinVolume is the minimum RMS volume threshold (default: 0.01).
	// Audio below this is treated as silence.
	MinVolume float64

	// SampleRate is the audio sample rate in Hz (default: 16000).
	SampleRate int
}

VADParams configures voice activity detection behavior.

func DefaultVADParams ¶

func DefaultVADParams() VADParams

DefaultVADParams returns sensible defaults for voice activity detection.

func (VADParams) Validate ¶

func (p VADParams) Validate() error

Validate checks that VAD parameters are within acceptable ranges.
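
A caller can distinguish validation failures from other errors with a type assertion or errors.As. The sketch below mirrors the shape of VADParams and ValidationError with local types; only the 0.0-1.0 range for Confidence comes from the documentation, while the other checks and the error message format are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// validationError is a hypothetical mirror of ValidationError.
type validationError struct {
	Field   string
	Message string
}

func (e *validationError) Error() string { return e.Field + ": " + e.Message }

// params is a hypothetical mirror of VADParams.
type params struct {
	Confidence, StartSecs, StopSecs, MinVolume float64
	SampleRate                                 int
}

func (p params) validate() error {
	if p.Confidence < 0 || p.Confidence > 1 {
		return &validationError{"Confidence", "must be between 0.0 and 1.0"}
	}
	if p.StartSecs < 0 { // assumed rule
		return &validationError{"StartSecs", "must not be negative"}
	}
	if p.StopSecs < 0 { // assumed rule
		return &validationError{"StopSecs", "must not be negative"}
	}
	if p.SampleRate <= 0 { // assumed rule
		return &validationError{"SampleRate", "must be positive"}
	}
	return nil
}

func main() {
	err := params{Confidence: 1.5, SampleRate: 16000}.validate()
	var ve *validationError
	if errors.As(err, &ve) {
		fmt.Println("invalid field:", ve.Field)
	}
}
```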

type VADState ¶

type VADState int

VADState represents the current voice activity state.

const (
	// VADStateQuiet indicates no voice activity detected.
	VADStateQuiet VADState = iota
	// VADStateStarting indicates voice is starting (within start threshold).
	VADStateStarting
	// VADStateSpeaking indicates active speech.
	VADStateSpeaking
	// VADStateStopping indicates voice is stopping (within stop threshold).
	VADStateStopping
)

func (VADState) String ¶

func (s VADState) String() string

String returns a human-readable representation of the VAD state.

type ValidationError ¶

type ValidationError struct {
	Field   string
	Message string
}

ValidationError represents a parameter validation error.

func (*ValidationError) Error ¶

func (e *ValidationError) Error() string

Error implements the error interface.
