package turn

v0.2.0 | Published: May 16, 2026 | License: Apache-2.0 | Imports: 5 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/Voxray-AI/Voxray

Documentation

Overview

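Package turn provides end-of-turn detection for audio conversations: a base turn Analyzer interface and a silence-based smart-turn implementation. It also provides pluggable user-turn start/stop strategies and a UserTurnController.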

Index

Constants
type Analyzer
type EndOfTurnResult
type EndOfTurnState
- func (s EndOfTurnState) String() string
type Params
type SilenceTurnAnalyzer
- func NewSilenceTurnAnalyzer(params Params) *SilenceTurnAnalyzer
- func NewSilenceTurnAnalyzerWithSampleRate(params Params, sampleRate int) *SilenceTurnAnalyzer
type SilenceUserTurnStopStrategy
type UserTurnController
- func NewUserTurnController(start UserTurnStartStrategy, stop UserTurnStopStrategy, ...) *UserTurnController
type UserTurnStartStrategy
type UserTurnStopStrategy
type VADUserTurnStartStrategy

Constants

const (
	DefaultStopSecs        = 3
	DefaultPreSpeechMs     = 500
	DefaultMaxDurationSecs = 8
)

Default silence-based turn parameters.

Types

type Analyzer

type Analyzer interface {
	// AppendAudio adds audio and returns the current end-of-turn state. It must
	// be fast and non-blocking; callers may invoke this on every audio frame.
	AppendAudio(buffer []byte, isSpeech bool) EndOfTurnState
	// AnalyzeEndOfTurn returns the last state synchronously as a cheap snapshot.
	AnalyzeEndOfTurn(ctx context.Context) (EndOfTurnState, error)
	// AnalyzeEndOfTurnAsync runs analysis in a goroutine and returns a channel
	// that receives one EndOfTurnResult then closes. Callers should select on
	// ctx.Done() and the channel. This is intended for analyzers that do
	// heavier work (e.g. ML); the silence implementation runs the same logic
	// in a goroutine but still follows the same contract.
	AnalyzeEndOfTurnAsync(ctx context.Context) <-chan EndOfTurnResult
	// SpeechTriggered reports whether speech has been detected and analysis is active.
	SpeechTriggered() bool
	// SetSampleRate sets the sample rate for audio processing.
	SetSampleRate(rate int)
	// Clear resets the analyzer to initial state.
	Clear()
	// UpdateVADStartSecs updates the VAD start trigger time (for pre-speech padding).
	UpdateVADStartSecs(secs float64)
	// UpdateParams updates turn parameters (e.g. StopSecs for IVR mode). Used when receiving VADParamsUpdateFrame.
	UpdateParams(p Params)
}

Analyzer determines when a user has finished speaking (end of turn). It matches the Python BaseTurnAnalyzer interface.

Implementations should keep AppendAudio cheap and non-blocking; it is called for every incoming audio chunk to update lightweight internal state and bookkeeping. AnalyzeEndOfTurn returns a synchronous snapshot of the current end-of-turn state derived from that internal state.

AnalyzeEndOfTurnAsync exposes the same information via a goroutine-backed channel. Implementations may perform heavier work inside the goroutine (for example, ML-based models), but they must:

- Send exactly one EndOfTurnResult on the returned channel, then close it.
- Be safe to call repeatedly; if repeated async calls would otherwise be expensive, deduplicate or cache the work internally.
- Respect ctx.Done(): if the context is cancelled before a result is sent, send an Incomplete state with Err set to ctx.Err().

type EndOfTurnResult

type EndOfTurnResult struct {
	State EndOfTurnState
	Err   error
}

EndOfTurnResult is the result of an async end-of-turn analysis.

type EndOfTurnState

type EndOfTurnState int

EndOfTurnState represents whether the current turn is complete.

const (
	// Incomplete indicates the user is still speaking or may continue.
	Incomplete EndOfTurnState = iota
	// Complete indicates the user has finished their turn.
	Complete
)

func (EndOfTurnState) String

func (s EndOfTurnState) String() string

type Params

type Params struct {
	StopSecs        float64 // silence duration in seconds to end turn
	PreSpeechMs     float64 // milliseconds of audio before speech start to include
	MaxDurationSecs float64 // maximum segment duration in seconds
}

Params holds configuration for turn analysis, for example for the silence-based SilenceTurnAnalyzer.
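Params mixes units (seconds and milliseconds), while AppendAudio consumes raw byte buffers, so these durations have to be related to byte counts at some point. The sketch below assumes 16-bit mono PCM (2 bytes per sample), which the package does not document; msToBytes is a hypothetical helper.

```go
package main

import "fmt"

// bytesPerSample assumes 16-bit (2-byte) mono PCM; this is an assumption,
// not something the package documents.
const bytesPerSample = 2

// msToBytes converts a duration in milliseconds into a PCM byte count.
func msToBytes(ms float64, sampleRate int) int {
	samples := int(ms * float64(sampleRate) / 1000)
	return samples * bytesPerSample
}

func main() {
	const rate = 16000
	fmt.Println(msToBytes(500, rate))  // default PreSpeechMs: 16000 bytes
	fmt.Println(msToBytes(8000, rate)) // default MaxDurationSecs (8 s): 256000 bytes
}
```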

type SilenceTurnAnalyzer

type SilenceTurnAnalyzer struct {
	// contains filtered or unexported fields
}

SilenceTurnAnalyzer implements Analyzer using silence duration after speech. When silence exceeds StopSecs after speech has been detected, the turn is Complete.
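The rule above can be sketched as a small state machine. silenceTracker is hypothetical and simplified, not the package's implementation: it assumes 16-bit mono PCM, counts silence in samples to avoid floating-point drift, and ignores PreSpeechMs and MaxDurationSecs.

```go
package main

import "fmt"

type EndOfTurnState int

const (
	Incomplete EndOfTurnState = iota
	Complete
)

// silenceTracker is a hypothetical model of silence-based end-of-turn detection.
type silenceTracker struct {
	sampleRate      int     // samples per second, e.g. 16000
	stopSecs        float64 // silence needed after speech to end the turn
	speechTriggered bool
	silenceSamples  int
}

func (t *silenceTracker) appendAudio(buf []byte, isSpeech bool) EndOfTurnState {
	if isSpeech {
		t.speechTriggered = true
		t.silenceSamples = 0 // any speech resets the silence window
		return Incomplete
	}
	if !t.speechTriggered {
		return Incomplete // silence before any speech never ends a turn
	}
	t.silenceSamples += len(buf) / 2 // 2 bytes per 16-bit mono sample (assumed)
	threshold := int(t.stopSecs * float64(t.sampleRate))
	if t.silenceSamples >= threshold {
		return Complete
	}
	return Incomplete
}

func main() {
	t := &silenceTracker{sampleRate: 16000, stopSecs: 3}
	frame := make([]byte, 640) // 20 ms at 16 kHz, 16-bit mono
	t.appendAudio(frame, true)
	var st EndOfTurnState
	n := 0
	for st != Complete {
		st = t.appendAudio(frame, false)
		n++
	}
	fmt.Println(n) // 150 (150 silent 20 ms frames = 3 s)
}
```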

func NewSilenceTurnAnalyzer

func NewSilenceTurnAnalyzer(params Params) *SilenceTurnAnalyzer

NewSilenceTurnAnalyzer creates a silence-based turn analyzer with the given params. Any zero-valued field in params is replaced by its default: DefaultStopSecs, DefaultPreSpeechMs, or DefaultMaxDurationSecs.
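A sketch of this zero-value rule, using local copies of Params and the default constants; withDefaults is a hypothetical name, not a package function.

```go
package main

import "fmt"

const (
	DefaultStopSecs        = 3
	DefaultPreSpeechMs     = 500
	DefaultMaxDurationSecs = 8
)

type Params struct {
	StopSecs        float64
	PreSpeechMs     float64
	MaxDurationSecs float64
}

// withDefaults (hypothetical) replaces each zero-valued field with its default.
func withDefaults(p Params) Params {
	if p.StopSecs == 0 {
		p.StopSecs = DefaultStopSecs
	}
	if p.PreSpeechMs == 0 {
		p.PreSpeechMs = DefaultPreSpeechMs
	}
	if p.MaxDurationSecs == 0 {
		p.MaxDurationSecs = DefaultMaxDurationSecs
	}
	return p
}

func main() {
	p := withDefaults(Params{StopSecs: 0.8})
	fmt.Println(p.StopSecs, p.PreSpeechMs, p.MaxDurationSecs) // 0.8 500 8
}
```

One consequence of this design is that a caller cannot request a StopSecs of exactly 0, because zero means "use the default".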

func NewSilenceTurnAnalyzerWithSampleRate

func NewSilenceTurnAnalyzerWithSampleRate(params Params, sampleRate int) *SilenceTurnAnalyzer

NewSilenceTurnAnalyzerWithSampleRate is like NewSilenceTurnAnalyzer but sets an initial sample rate.

func (*SilenceTurnAnalyzer) AnalyzeEndOfTurn

func (s *SilenceTurnAnalyzer) AnalyzeEndOfTurn(ctx context.Context) (EndOfTurnState, error)

func (*SilenceTurnAnalyzer) AnalyzeEndOfTurnAsync

func (s *SilenceTurnAnalyzer) AnalyzeEndOfTurnAsync(ctx context.Context) <-chan EndOfTurnResult

AnalyzeEndOfTurnAsync runs the end-of-turn analysis in a separate goroutine and returns a channel that receives one EndOfTurnResult and then closes. Callers should select on both the channel and ctx.Done(). For the silence implementation the work is trivial; the async pattern exists to support future ML-based analyzers that do heavier work in the goroutine.

func (*SilenceTurnAnalyzer) AppendAudio

func (s *SilenceTurnAnalyzer) AppendAudio(buffer []byte, isSpeech bool) EndOfTurnState

func (*SilenceTurnAnalyzer) Clear

func (s *SilenceTurnAnalyzer) Clear()

func (*SilenceTurnAnalyzer) SetSampleRate

func (s *SilenceTurnAnalyzer) SetSampleRate(rate int)

func (*SilenceTurnAnalyzer) SpeechTriggered

func (s *SilenceTurnAnalyzer) SpeechTriggered() bool

func (*SilenceTurnAnalyzer) UpdateParams

func (s *SilenceTurnAnalyzer) UpdateParams(p Params)

UpdateParams updates turn parameters (e.g. from VADParamsUpdateFrame).

func (*SilenceTurnAnalyzer) UpdateVADStartSecs

func (s *SilenceTurnAnalyzer) UpdateVADStartSecs(secs float64)

type SilenceUserTurnStopStrategy

type SilenceUserTurnStopStrategy struct {
	// contains filtered or unexported fields
}

SilenceUserTurnStopStrategy stops a turn when VAD reports that the user has stopped speaking. More sophisticated behavior (e.g. using the turn analyzer state, additional silence windows) can be added later.

func (*SilenceUserTurnStopStrategy) OnUserStoppedSpeaking

func (s *SilenceUserTurnStopStrategy) OnUserStoppedSpeaking()

func (*SilenceUserTurnStopStrategy) Reset

func (s *SilenceUserTurnStopStrategy) Reset()

func (*SilenceUserTurnStopStrategy) ShouldStopTurn

func (s *SilenceUserTurnStopStrategy) ShouldStopTurn() bool

type UserTurnController

type UserTurnController struct {
	// contains filtered or unexported fields
}

UserTurnController manages high-level user turn state: when a user turn starts and stops, and when the user has been idle for a configured timeout. It is a lightweight analogue of the UserTurnController and UserIdleController pair, adapted to the existing Go pipeline.

func NewUserTurnController

func NewUserTurnController(
	start UserTurnStartStrategy,
	stop UserTurnStopStrategy,
	userTurnStopTimeout float64,
	userIdleTimeout float64,
	onPushFrame func(ctx context.Context, f frames.Frame) error,
) *UserTurnController

NewUserTurnController creates a new controller with the given strategies and timeouts.

func (*UserTurnController) NotifyBotStartedSpeaking

func (c *UserTurnController) NotifyBotStartedSpeaking()

NotifyBotStartedSpeaking cancels any pending idle detection.

func (*UserTurnController) NotifyBotStoppedSpeaking

func (c *UserTurnController) NotifyBotStoppedSpeaking(ctx context.Context)

NotifyBotStoppedSpeaking should be called when the bot finishes speaking, so that idle detection can begin.

func (*UserTurnController) ProcessVADUpdate

func (c *UserTurnController) ProcessVADUpdate(ctx context.Context, isSpeech bool) error

ProcessVADUpdate should be called by the owning processor when VAD indicates whether the user is currently speaking.

type UserTurnStartStrategy

type UserTurnStartStrategy interface {
	Reset()
	// OnUserStartedSpeaking should be called when a VAD- or externally-driven
	// event indicates that the user has started speaking.
	OnUserStartedSpeaking()
	// ShouldStartTurn returns true when this strategy believes a user turn
	// should be started given current internal state.
	ShouldStartTurn() bool
}

UserTurnStartStrategy decides when a user turn starts based on incoming frames.

type UserTurnStopStrategy

type UserTurnStopStrategy interface {
	Reset()
	// OnUserStoppedSpeaking should be called when VAD or other signals indicate
	// the user has stopped speaking.
	OnUserStoppedSpeaking()
	// ShouldStopTurn returns true when this strategy believes the user turn
	// should be stopped (e.g. after sufficient silence).
	ShouldStopTurn() bool
}

UserTurnStopStrategy decides when a user turn stops based on incoming frames.

type VADUserTurnStartStrategy

type VADUserTurnStartStrategy struct {
	// contains filtered or unexported fields
}

VADUserTurnStartStrategy starts a turn as soon as VAD reports speech. It is intentionally simple for now; more advanced heuristics (e.g. min duration, transcription-based triggers) can be layered in later.
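The two simple strategies can each be sketched with one boolean, and the VAD-driven flow with a small loop. vadStart, silenceStop, and turnLoop are hypothetical; turnLoop assumes that ProcessVADUpdate reacts only to changes in the VAD signal and resets a strategy once it fires, which the documentation does not state.

```go
package main

import "fmt"

// Local copies of the documented interfaces.
type UserTurnStartStrategy interface {
	Reset()
	OnUserStartedSpeaking()
	ShouldStartTurn() bool
}

type UserTurnStopStrategy interface {
	Reset()
	OnUserStoppedSpeaking()
	ShouldStopTurn() bool
}

// vadStart mirrors VADUserTurnStartStrategy: start as soon as VAD reports speech.
type vadStart struct{ started bool }

func (s *vadStart) Reset()                 { s.started = false }
func (s *vadStart) OnUserStartedSpeaking() { s.started = true }
func (s *vadStart) ShouldStartTurn() bool  { return s.started }

// silenceStop mirrors SilenceUserTurnStopStrategy: stop when VAD reports silence.
type silenceStop struct{ stopped bool }

func (s *silenceStop) Reset()                 { s.stopped = false }
func (s *silenceStop) OnUserStoppedSpeaking() { s.stopped = true }
func (s *silenceStop) ShouldStopTurn() bool   { return s.stopped }

// Compile-time checks that the sketches satisfy the interfaces.
var (
	_ UserTurnStartStrategy = (*vadStart)(nil)
	_ UserTurnStopStrategy  = (*silenceStop)(nil)
)

// turnLoop (hypothetical) feeds a VAD sequence through the strategies and
// returns the turn events it produces.
func turnLoop(start UserTurnStartStrategy, stop UserTurnStopStrategy, vad []bool) []string {
	var events []string
	wasSpeech, inTurn := false, false
	for _, isSpeech := range vad {
		if isSpeech == wasSpeech {
			continue // only VAD transitions matter
		}
		wasSpeech = isSpeech
		if isSpeech && !inTurn {
			start.OnUserStartedSpeaking()
			if start.ShouldStartTurn() {
				inTurn = true
				start.Reset()
				events = append(events, "start")
			}
		} else if !isSpeech && inTurn {
			stop.OnUserStoppedSpeaking()
			if stop.ShouldStopTurn() {
				inTurn = false
				stop.Reset()
				events = append(events, "stop")
			}
		}
	}
	return events
}

func main() {
	vad := []bool{false, true, true, false, false, true, false}
	fmt.Println(turnLoop(&vadStart{}, &silenceStop{}, vad)) // [start stop start stop]
}
```

Keeping the strategies behind interfaces means a heuristic such as a minimum speech duration could replace vadStart without changing the loop.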

func (*VADUserTurnStartStrategy) OnUserStartedSpeaking

func (s *VADUserTurnStartStrategy) OnUserStartedSpeaking()

func (*VADUserTurnStartStrategy) Reset

func (s *VADUserTurnStartStrategy) Reset()

func (*VADUserTurnStartStrategy) ShouldStartTurn

func (s *VADUserTurnStartStrategy) ShouldStartTurn() bool
