Overview ¶
Package turn provides end-of-turn detection for audio conversations. It defines a base turn analyzer interface (Analyzer) and a silence-based smart-turn implementation (SilenceTurnAnalyzer).
Index ¶
- Constants
- type Analyzer
- type EndOfTurnResult
- type EndOfTurnState
- type Params
- type SilenceTurnAnalyzer
- func (s *SilenceTurnAnalyzer) AnalyzeEndOfTurn(ctx context.Context) (EndOfTurnState, error)
- func (s *SilenceTurnAnalyzer) AnalyzeEndOfTurnAsync(ctx context.Context) <-chan EndOfTurnResult
- func (s *SilenceTurnAnalyzer) AppendAudio(buffer []byte, isSpeech bool) EndOfTurnState
- func (s *SilenceTurnAnalyzer) Clear()
- func (s *SilenceTurnAnalyzer) SetSampleRate(rate int)
- func (s *SilenceTurnAnalyzer) SpeechTriggered() bool
- func (s *SilenceTurnAnalyzer) UpdateParams(p Params)
- func (s *SilenceTurnAnalyzer) UpdateVADStartSecs(secs float64)
- type SilenceUserTurnStopStrategy
- type UserTurnController
- type UserTurnStartStrategy
- type UserTurnStopStrategy
- type VADUserTurnStartStrategy
Constants ¶
const (
DefaultStopSecs = 3
DefaultPreSpeechMs = 500
DefaultMaxDurationSecs = 8
)
Default silence-based turn parameters.
Types ¶
type Analyzer ¶
type Analyzer interface {
// AppendAudio adds audio and returns the current end-of-turn state. It must
// be fast and non-blocking; callers may invoke this on every audio frame.
AppendAudio(buffer []byte, isSpeech bool) EndOfTurnState
// AnalyzeEndOfTurn returns the last state synchronously as a cheap snapshot.
AnalyzeEndOfTurn(ctx context.Context) (EndOfTurnState, error)
// AnalyzeEndOfTurnAsync runs analysis in a goroutine and returns a channel
// that receives one EndOfTurnResult then closes. Callers should select on
// ctx.Done() and the channel. This is intended for analyzers that do
// heavier work (e.g. ML); the silence implementation runs the same logic
// in a goroutine but still follows the same contract.
AnalyzeEndOfTurnAsync(ctx context.Context) <-chan EndOfTurnResult
// SpeechTriggered reports whether speech has been detected and analysis is active.
SpeechTriggered() bool
// SetSampleRate sets the sample rate for audio processing.
SetSampleRate(rate int)
// Clear resets the analyzer to initial state.
Clear()
// UpdateVADStartSecs updates the VAD start trigger time (for pre-speech padding).
UpdateVADStartSecs(secs float64)
// UpdateParams updates turn parameters (e.g. StopSecs for IVR mode).
// It is used when a VADParamsUpdateFrame is received.
UpdateParams(p Params)
}
Analyzer determines when a user has finished speaking (end of turn). It matches the Python BaseTurnAnalyzer interface.
Implementations should keep AppendAudio cheap and non-blocking; it is called for every incoming audio chunk to update lightweight internal state and bookkeeping. AnalyzeEndOfTurn returns a synchronous snapshot of the current end-of-turn state derived from that internal state.
AnalyzeEndOfTurnAsync exposes the same information via a goroutine-backed channel. Implementations may perform heavier work inside the goroutine (for example, running an ML model), but they must:
- Send exactly one EndOfTurnResult on the returned channel, then close it.
- Be safe to call repeatedly, deduplicating or caching work internally if repeated async calls would otherwise be expensive.
- Respect ctx.Done(): if the context is cancelled before a result is sent, send a result whose State is Incomplete and whose Err is ctx.Err().
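The async contract above can be sketched as follows. This is a minimal illustration, not the package's implementation: the types are local stand-ins that mirror the documented ones, and analyzeAsync, waitForTurn, demoComplete, and demoCancelled are hypothetical names introduced only for this example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Local stand-ins that mirror the documented types so the sketch compiles on its own.
type EndOfTurnState int

const (
	Incomplete EndOfTurnState = iota
	Complete
)

type EndOfTurnResult struct {
	State EndOfTurnState
	Err   error
}

// analyzeAsync is a hypothetical helper (not part of package turn). It runs
// work in a goroutine, sends exactly one result, then closes the channel.
func analyzeAsync(ctx context.Context, work func() EndOfTurnState) <-chan EndOfTurnResult {
	out := make(chan EndOfTurnResult, 1) // buffered: the single send never blocks
	go func() {
		defer close(out)
		done := make(chan EndOfTurnState, 1)
		go func() { done <- work() }()
		select {
		case st := <-done:
			out <- EndOfTurnResult{State: st}
		case <-ctx.Done():
			// Cancelled before a result was ready: report Incomplete plus the cause.
			out <- EndOfTurnResult{State: Incomplete, Err: ctx.Err()}
		}
	}()
	return out
}

// waitForTurn is the caller side: select on both the channel and ctx.Done().
func waitForTurn(ctx context.Context, ch <-chan EndOfTurnResult) EndOfTurnResult {
	select {
	case res := <-ch:
		return res
	case <-ctx.Done():
		return EndOfTurnResult{State: Incomplete, Err: ctx.Err()}
	}
}

// demoComplete runs fast work under a live context.
func demoComplete() EndOfTurnResult {
	ctx := context.Background()
	return waitForTurn(ctx, analyzeAsync(ctx, func() EndOfTurnState { return Complete }))
}

// demoCancelled runs slow work under an already-cancelled context.
func demoCancelled() EndOfTurnResult {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	slow := func() EndOfTurnState {
		time.Sleep(50 * time.Millisecond)
		return Complete
	}
	return waitForTurn(ctx, analyzeAsync(ctx, slow))
}

func main() {
	fmt.Println(demoComplete().State == Complete)
	fmt.Println(demoCancelled().Err != nil)
}
```

The result channel is buffered with capacity one so the producing goroutine can always finish and close the channel, even if the caller has already given up after its own ctx.Done() fired.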
type EndOfTurnResult ¶
type EndOfTurnResult struct {
State EndOfTurnState
Err error
}
EndOfTurnResult is the result of an async end-of-turn analysis.
type EndOfTurnState ¶
type EndOfTurnState int
EndOfTurnState represents whether the current turn is complete.
const (
// Incomplete indicates the user is still speaking or may continue.
Incomplete EndOfTurnState = iota
// Complete indicates the user has finished their turn.
Complete
)
func (EndOfTurnState) String ¶
func (s EndOfTurnState) String() string
type Params ¶
type Params struct {
StopSecs float64 // silence duration in seconds to end turn
PreSpeechMs float64 // milliseconds of audio before speech start to include
MaxDurationSecs float64 // maximum segment duration in seconds
}
Params holds configuration for turn analysis (e.g. silence-based).
type SilenceTurnAnalyzer ¶
type SilenceTurnAnalyzer struct {
// contains filtered or unexported fields
}
SilenceTurnAnalyzer implements Analyzer using silence duration after speech. When silence exceeds StopSecs after speech has been detected, the turn is Complete.
func NewSilenceTurnAnalyzer ¶
func NewSilenceTurnAnalyzer(params Params) *SilenceTurnAnalyzer
NewSilenceTurnAnalyzer creates a silence-based turn analyzer with the given params. Zero-valued fields in params fall back to DefaultStopSecs, DefaultPreSpeechMs, and DefaultMaxDurationSecs respectively.
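The zero-value fallback can be sketched as below. The constants and Params mirror the documented declarations; withDefaults is a hypothetical helper written for this example, and the constructor's real internals may differ.

```go
package main

import "fmt"

// Mirrors of the documented constants and Params type.
const (
	DefaultStopSecs        = 3
	DefaultPreSpeechMs     = 500
	DefaultMaxDurationSecs = 8
)

type Params struct {
	StopSecs        float64
	PreSpeechMs     float64
	MaxDurationSecs float64
}

// withDefaults is a hypothetical helper: any field left at zero is replaced
// by the matching default, and every other field is kept as given.
func withDefaults(p Params) Params {
	if p.StopSecs == 0 {
		p.StopSecs = DefaultStopSecs
	}
	if p.PreSpeechMs == 0 {
		p.PreSpeechMs = DefaultPreSpeechMs
	}
	if p.MaxDurationSecs == 0 {
		p.MaxDurationSecs = DefaultMaxDurationSecs
	}
	return p
}

func main() {
	// Only StopSecs is set; the other two fields fall back to defaults.
	fmt.Printf("%+v\n", withDefaults(Params{StopSecs: 1.5}))
}
```

Under this reading, a caller cannot request a literal zero for any field, because zero is indistinguishable from "not set".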
func NewSilenceTurnAnalyzerWithSampleRate ¶
func NewSilenceTurnAnalyzerWithSampleRate(params Params, sampleRate int) *SilenceTurnAnalyzer
NewSilenceTurnAnalyzerWithSampleRate is like NewSilenceTurnAnalyzer but sets an initial sample rate.
func (*SilenceTurnAnalyzer) AnalyzeEndOfTurn ¶
func (s *SilenceTurnAnalyzer) AnalyzeEndOfTurn(ctx context.Context) (EndOfTurnState, error)
func (*SilenceTurnAnalyzer) AnalyzeEndOfTurnAsync ¶
func (s *SilenceTurnAnalyzer) AnalyzeEndOfTurnAsync(ctx context.Context) <-chan EndOfTurnResult
AnalyzeEndOfTurnAsync runs the end-of-turn analysis in a separate goroutine and returns a channel that receives one EndOfTurnResult and then closes. Callers should select on both the channel and ctx.Done(). For the silence implementation the work is trivial; the async pattern exists to support future ML-based analyzers that do heavier work in the goroutine.
func (*SilenceTurnAnalyzer) AppendAudio ¶
func (s *SilenceTurnAnalyzer) AppendAudio(buffer []byte, isSpeech bool) EndOfTurnState
func (*SilenceTurnAnalyzer) Clear ¶
func (s *SilenceTurnAnalyzer) Clear()
func (*SilenceTurnAnalyzer) SetSampleRate ¶
func (s *SilenceTurnAnalyzer) SetSampleRate(rate int)
func (*SilenceTurnAnalyzer) SpeechTriggered ¶
func (s *SilenceTurnAnalyzer) SpeechTriggered() bool
func (*SilenceTurnAnalyzer) UpdateParams ¶
func (s *SilenceTurnAnalyzer) UpdateParams(p Params)
UpdateParams updates turn parameters (e.g. from VADParamsUpdateFrame).
func (*SilenceTurnAnalyzer) UpdateVADStartSecs ¶
func (s *SilenceTurnAnalyzer) UpdateVADStartSecs(secs float64)
type SilenceUserTurnStopStrategy ¶
type SilenceUserTurnStopStrategy struct {
// contains filtered or unexported fields
}
SilenceUserTurnStopStrategy stops a turn when VAD reports that the user has stopped speaking. More sophisticated behavior (e.g. using the turn analyzer state, additional silence windows) can be added later.
func (*SilenceUserTurnStopStrategy) OnUserStoppedSpeaking ¶
func (s *SilenceUserTurnStopStrategy) OnUserStoppedSpeaking()
func (*SilenceUserTurnStopStrategy) Reset ¶
func (s *SilenceUserTurnStopStrategy) Reset()
func (*SilenceUserTurnStopStrategy) ShouldStopTurn ¶
func (s *SilenceUserTurnStopStrategy) ShouldStopTurn() bool
type UserTurnController ¶
type UserTurnController struct {
// contains filtered or unexported fields
}
UserTurnController manages high-level user turn state: when a user turn starts and stops, and when the user has been idle for a configured timeout. It is a lightweight analogue of UserTurnController + UserIdleController, adapted to the existing Go pipeline.
func NewUserTurnController ¶
func NewUserTurnController(
start UserTurnStartStrategy,
stop UserTurnStopStrategy,
userTurnStopTimeout float64,
userIdleTimeout float64,
onPushFrame func(ctx context.Context, f frames.Frame) error,
) *UserTurnController
NewUserTurnController creates a new controller with the given strategies and timeouts.
func (*UserTurnController) NotifyBotStartedSpeaking ¶
func (c *UserTurnController) NotifyBotStartedSpeaking()
NotifyBotStartedSpeaking cancels any pending idle detection.
func (*UserTurnController) NotifyBotStoppedSpeaking ¶
func (c *UserTurnController) NotifyBotStoppedSpeaking(ctx context.Context)
NotifyBotStoppedSpeaking should be called when the bot finishes speaking, so that idle detection can begin.
func (*UserTurnController) ProcessVADUpdate ¶
func (c *UserTurnController) ProcessVADUpdate(ctx context.Context, isSpeech bool) error
ProcessVADUpdate should be called by the owning processor when VAD indicates whether the user is currently speaking.
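One way to picture how per-frame VAD updates could drive the two strategies is edge detection on the isSpeech flag. The sketch below is an assumption about the general shape, not the controller's actual logic: turnDriver, flagStart, flagStop, and runVAD are hypothetical, the interfaces are local copies of the documented ones, and plain strings stand in for the frames a real controller would push through onPushFrame.

```go
package main

import "fmt"

// Local copies of the documented strategy interfaces.
type UserTurnStartStrategy interface {
	Reset()
	OnUserStartedSpeaking()
	ShouldStartTurn() bool
}

type UserTurnStopStrategy interface {
	Reset()
	OnUserStoppedSpeaking()
	ShouldStopTurn() bool
}

// flagStart and flagStop are hypothetical strategies that latch one flag each.
type flagStart struct{ started bool }

func (f *flagStart) Reset()                 { f.started = false }
func (f *flagStart) OnUserStartedSpeaking() { f.started = true }
func (f *flagStart) ShouldStartTurn() bool  { return f.started }

type flagStop struct{ stopped bool }

func (f *flagStop) Reset()                 { f.stopped = false }
func (f *flagStop) OnUserStoppedSpeaking() { f.stopped = true }
func (f *flagStop) ShouldStopTurn() bool   { return f.stopped }

// turnDriver is a hypothetical stand-in for the controller's VAD handling.
type turnDriver struct {
	start    UserTurnStartStrategy
	stop     UserTurnStopStrategy
	speaking bool     // last VAD value seen
	inTurn   bool     // whether a user turn is currently open
	events   []string // stands in for pushed frames
}

func (d *turnDriver) processVADUpdate(isSpeech bool) {
	// Notify the strategies only on transitions, not on every frame.
	if isSpeech && !d.speaking {
		d.start.OnUserStartedSpeaking()
	} else if !isSpeech && d.speaking {
		d.stop.OnUserStoppedSpeaking()
	}
	d.speaking = isSpeech

	// Ask the strategies whether the turn state should change.
	if !d.inTurn && d.start.ShouldStartTurn() {
		d.inTurn = true
		d.stop.Reset()
		d.events = append(d.events, "turn_started")
	} else if d.inTurn && d.stop.ShouldStopTurn() {
		d.inTurn = false
		d.start.Reset()
		d.stop.Reset()
		d.events = append(d.events, "turn_stopped")
	}
}

// runVAD feeds a sequence of VAD decisions and returns the recorded events.
func runVAD(samples []bool) []string {
	d := &turnDriver{start: &flagStart{}, stop: &flagStop{}}
	for _, s := range samples {
		d.processVADUpdate(s)
	}
	return d.events
}

func main() {
	fmt.Println(runVAD([]bool{false, true, true, false, false, true, false}))
}
```

Keeping the "what happened" callbacks separate from the "should the turn change" queries lets a strategy delay its decision, for example by waiting for extra silence, without the driver needing to know.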
type UserTurnStartStrategy ¶
type UserTurnStartStrategy interface {
Reset()
// OnUserStartedSpeaking should be called when a VAD- or externally-driven
// event indicates that the user has started speaking.
OnUserStartedSpeaking()
// ShouldStartTurn returns true when this strategy believes a user turn
// should be started given current internal state.
ShouldStartTurn() bool
}
UserTurnStartStrategy decides when a user turn starts based on incoming frames.
type UserTurnStopStrategy ¶
type UserTurnStopStrategy interface {
Reset()
// OnUserStoppedSpeaking should be called when VAD or other signals indicate
// the user has stopped speaking.
OnUserStoppedSpeaking()
// ShouldStopTurn returns true when this strategy believes the user turn
// should be stopped (e.g. after sufficient silence).
ShouldStopTurn() bool
}
UserTurnStopStrategy decides when a user turn stops based on incoming frames.
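A custom stop strategy only needs the three methods above. The following sketch shows one possible variant that waits for a grace period after the stop event before ending the turn. graceStopStrategy, newGraceStopStrategy, and fakeClock are hypothetical names for this example, and the injected clock is an assumption made so the behavior can be exercised without real waiting.

```go
package main

import (
	"fmt"
	"time"
)

// Local copy of the documented interface.
type UserTurnStopStrategy interface {
	Reset()
	OnUserStoppedSpeaking()
	ShouldStopTurn() bool
}

// graceStopStrategy is hypothetical: it reports a stop only once a grace
// period has elapsed since the user stopped speaking.
type graceStopStrategy struct {
	grace     time.Duration
	now       func() time.Time // injected clock
	stopped   bool
	stoppedAt time.Time
}

func newGraceStopStrategy(graceMs int, now func() time.Time) *graceStopStrategy {
	return &graceStopStrategy{grace: time.Duration(graceMs) * time.Millisecond, now: now}
}

func (g *graceStopStrategy) Reset() {
	g.stopped = false
	g.stoppedAt = time.Time{}
}

func (g *graceStopStrategy) OnUserStoppedSpeaking() {
	g.stopped = true
	g.stoppedAt = g.now()
}

func (g *graceStopStrategy) ShouldStopTurn() bool {
	return g.stopped && g.now().Sub(g.stoppedAt) >= g.grace
}

// Compile-time check that the sketch satisfies the interface.
var _ UserTurnStopStrategy = (*graceStopStrategy)(nil)

// fakeClock is a manually advanced clock used in place of time.Now.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time { return c.t }
func (c *fakeClock) Advance(ms int) { c.t = c.t.Add(time.Duration(ms) * time.Millisecond) }

func main() {
	clk := &fakeClock{}
	s := newGraceStopStrategy(300, clk.Now)
	s.OnUserStoppedSpeaking()
	fmt.Println(s.ShouldStopTurn()) // grace period not yet elapsed
	clk.Advance(300)
	fmt.Println(s.ShouldStopTurn()) // grace period elapsed
}
```

The interface has no "user resumed speaking" callback, so with a strategy like this the owner would call Reset when speech resumes during the grace period.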
type VADUserTurnStartStrategy ¶
type VADUserTurnStartStrategy struct {
// contains filtered or unexported fields
}
VADUserTurnStartStrategy starts a turn as soon as VAD reports speech. It is intentionally simple for now; more advanced heuristics (e.g. min duration, transcription-based triggers) can be layered in later.
func (*VADUserTurnStartStrategy) OnUserStartedSpeaking ¶
func (s *VADUserTurnStartStrategy) OnUserStartedSpeaking()
func (*VADUserTurnStartStrategy) Reset ¶
func (s *VADUserTurnStartStrategy) Reset()
func (*VADUserTurnStartStrategy) ShouldStartTurn ¶
func (s *VADUserTurnStartStrategy) ShouldStartTurn() bool