Documentation
Constants ¶
This section is empty.
Variables ¶
ErrSileroUnavailable is returned when Silero support cannot be used.
Functions ¶
This section is empty.
Types ¶
type Analyzer ¶
type Analyzer interface {
	// SetSampleRate configures the audio sample rate. Must be called before
	// Analyze; implementations may clamp/validate as needed.
	SetSampleRate(sampleRate int)
	// SetParams updates the VAD parameters. Zero-values pick sensible defaults.
	SetParams(params Params)
	// Params returns the current VAD parameters.
	Params() Params
	// Analyze consumes audio for this stream, updates internal state, and
	// returns the current state, last confidence, and last smoothed volume.
	//
	// Audio is expected to be 16-bit PCM mono, matching audio.Frame/Data.
	Analyze(buf []byte) (State, float64, float64, error)
}
Analyzer is the high-level VAD interface, similar to the Python VADAnalyzer.
Each Analyzer is intended to serve a single audio stream (the usual case in this project). Calls are internally serialised via a mutex, so concurrent calls on the same Analyzer are safe.
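The call order described above (SetSampleRate first, then Analyze per chunk) and the mutex serialisation can be sketched with a toy implementation. Everything below is a local stand-in rather than this package's code: `toyAnalyzer`, the `StateQuiet` name, the trimmed-down `Params`, the use of peak amplitude as confidence, and the little-endian byte order are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// State, Params and Analyzer are local stand-ins mirroring the documented
// shapes. StateQuiet is an assumed name; only StateSpeaking appears in the docs.
type State int

const (
	StateQuiet State = iota
	StateSpeaking
)

type Params struct {
	Confidence float64
}

type Analyzer interface {
	SetSampleRate(sampleRate int)
	SetParams(params Params)
	Params() Params
	Analyze(buf []byte) (State, float64, float64, error)
}

// toyAnalyzer is hypothetical: peak amplitude stands in for both confidence
// and volume, and there is no smoothing or start/stop timing.
type toyAnalyzer struct {
	mu         sync.Mutex
	sampleRate int
	params     Params
}

func (a *toyAnalyzer) SetSampleRate(sampleRate int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.sampleRate = sampleRate
}

func (a *toyAnalyzer) SetParams(params Params) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.params = params
}

func (a *toyAnalyzer) Params() Params {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.params
}

func (a *toyAnalyzer) Analyze(buf []byte) (State, float64, float64, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.sampleRate <= 0 {
		return StateQuiet, 0, 0, errors.New("sample rate not set")
	}
	peak := 0.0
	for i := 0; i+1 < len(buf); i += 2 {
		// Little-endian byte order is an assumption.
		s := float64(int16(uint16(buf[i])|uint16(buf[i+1])<<8)) / 32768.0
		if s < 0 {
			s = -s
		}
		if s > peak {
			peak = s
		}
	}
	state := StateQuiet
	if peak >= a.params.Confidence {
		state = StateSpeaking
	}
	return state, peak, peak, nil
}

func main() {
	var a Analyzer = &toyAnalyzer{}
	a.SetSampleRate(16000) // must come before Analyze
	a.SetParams(Params{Confidence: 0.5})
	for _, chunk := range [][]byte{{0x00, 0x00}, {0xFF, 0x7F}} {
		state, conf, vol, err := a.Analyze(chunk)
		fmt.Println(state, conf, vol, err)
	}
}
```

Holding the mutex for the whole of Analyze keeps the per-stream state consistent even if a caller changes parameters from another goroutine while audio is being processed.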
func NewEnergyAnalyzer ¶
NewEnergyAnalyzer returns an Analyzer that uses EnergyAnalyzerBackend.
type AnalyzerDetector ¶
type AnalyzerDetector struct {
	Analyzer Analyzer
}
AnalyzerDetector bridges a VAD Analyzer to the existing Detector interface.
func (*AnalyzerDetector) IsSpeech ¶
func (d *AnalyzerDetector) IsSpeech(f audio.Frame) (bool, error)
IsSpeech reports true when the underlying analyzer is in StateSpeaking.
func (*AnalyzerDetector) SetSampleRate ¶
func (d *AnalyzerDetector) SetSampleRate(sampleRate int)
SetSampleRate configures the analyzer for the given pipeline input rate.
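The bridging idea can be sketched as a thin adapter: feed the frame's bytes to the analyzer and report whether the resulting state is StateSpeaking. The types below are local stand-ins, not the package's definitions: `Frame` assumes only a `Data` field, `bridge` and `scripted` are hypothetical names, and the nil-analyzer error is an assumption about how such an adapter might behave.

```go
package main

import (
	"errors"
	"fmt"
)

type State int

const (
	StateQuiet State = iota // assumed name
	StateSpeaking
)

// Frame stands in for audio.Frame; only the Data field is assumed here.
type Frame struct{ Data []byte }

// analyzer is the subset of the Analyzer interface that the bridge needs.
type analyzer interface {
	SetSampleRate(sampleRate int)
	Analyze(buf []byte) (State, float64, float64, error)
}

// bridge is a hypothetical equivalent of AnalyzerDetector.
type bridge struct{ Analyzer analyzer }

func (d *bridge) SetSampleRate(sampleRate int) {
	if d.Analyzer != nil {
		d.Analyzer.SetSampleRate(sampleRate)
	}
}

func (d *bridge) IsSpeech(f Frame) (bool, error) {
	if d.Analyzer == nil {
		return false, errors.New("no analyzer configured")
	}
	state, _, _, err := d.Analyzer.Analyze(f.Data)
	if err != nil {
		return false, err
	}
	return state == StateSpeaking, nil
}

// scripted is a fake analyzer that reports speech for any non-empty buffer.
type scripted struct{ rate int }

func (s *scripted) SetSampleRate(sampleRate int) { s.rate = sampleRate }

func (s *scripted) Analyze(buf []byte) (State, float64, float64, error) {
	if len(buf) == 0 {
		return StateQuiet, 0, 0, nil
	}
	return StateSpeaking, 1, 1, nil
}

func main() {
	d := &bridge{Analyzer: &scripted{}}
	d.SetSampleRate(16000)
	speech, err := d.IsSpeech(Frame{Data: []byte{1, 2}})
	fmt.Println(speech, err)
}
```

Because the adapter discards confidence and volume, callers that need those values would talk to the Analyzer directly.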
type Detector ¶
Detector decides whether a given audio frame contains speech. Implementations should assume 16-bit PCM mono by default. SetSampleRate configures the detector for the given pipeline input rate (e.g. 16000).
type EnergyAnalyzerBackend ¶
type EnergyAnalyzerBackend struct {
	Threshold float64
}
EnergyAnalyzerBackend is a simple confidence backend based on RMS energy.
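For readers unfamiliar with RMS energy, here is a minimal sketch of how a level in 0..1 can be computed from 16-bit PCM mono bytes. It is not the backend's actual code: the little-endian byte order, the 0.02 fallback threshold, and the `confidence` mapping (level divided by twice the threshold, clamped to 1) are assumptions chosen for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// rms returns the root-mean-square level of 16-bit little-endian PCM mono
// audio, normalised to 0..1. A trailing odd byte is ignored.
func rms(buf []byte) float64 {
	n := len(buf) / 2
	if n == 0 {
		return 0
	}
	var sum float64
	for i := 0; i < n; i++ {
		s := float64(int16(binary.LittleEndian.Uint16(buf[2*i:]))) / 32768.0
		sum += s * s
	}
	return math.Sqrt(sum / float64(n))
}

// confidence is a hypothetical mapping from RMS level to 0..1: a level equal
// to the threshold yields 0.5, and twice the threshold or more yields 1.
func confidence(level, threshold float64) float64 {
	if threshold <= 0 {
		threshold = 0.02 // assumed default, taken from the "e.g. 0.02" hint
	}
	return math.Min(1, level/(2*threshold))
}

func main() {
	// Two samples: +16384 and -16384, i.e. +0.5 and -0.5 of full scale.
	buf := []byte{0x00, 0x40, 0x00, 0xC0}
	level := rms(buf)
	fmt.Println(level, confidence(level, 0.02))
}
```

RMS is sign-independent, so the positive and negative half-scale samples in the example contribute equally to the level.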
type EnergyDetector ¶
type EnergyDetector struct {
	Threshold float64
	// contains filtered or unexported fields
}
EnergyDetector is preserved for compatibility; it wraps an AnalyzerDetector using an internal EnergyAnalyzer.
func NewEnergyDetector ¶
func NewEnergyDetector() *EnergyDetector
NewEnergyDetector creates an EnergyDetector with a reasonable default threshold.
func NewEnergyDetectorWithParams ¶
func NewEnergyDetectorWithParams(p Params) *EnergyDetector
NewEnergyDetectorWithParams lets callers override Params; zero-valued fields fall back to defaults.
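The "zero means default" convention can be sketched as a small helper that fills in unset fields. The `Params` struct here is a local copy, `withDefaults` is a hypothetical name, and the default values are placeholders for illustration only; apart from the 0.02 threshold hinted at in the Params documentation, they are not taken from this package.

```go
package main

import "fmt"

// Params is a local copy of the documented struct.
type Params struct {
	Confidence float64
	StartSecs  float64
	StopSecs   float64
	MinVolume  float64
	Threshold  float64
}

// defaultParams holds placeholder values; the package's real defaults may differ.
var defaultParams = Params{
	Confidence: 0.7,
	StartSecs:  0.2,
	StopSecs:   0.8,
	MinVolume:  0.6,
	Threshold:  0.02,
}

// withDefaults returns p with every zero-valued field replaced by its default.
func withDefaults(p Params) Params {
	if p.Confidence == 0 {
		p.Confidence = defaultParams.Confidence
	}
	if p.StartSecs == 0 {
		p.StartSecs = defaultParams.StartSecs
	}
	if p.StopSecs == 0 {
		p.StopSecs = defaultParams.StopSecs
	}
	if p.MinVolume == 0 {
		p.MinVolume = defaultParams.MinVolume
	}
	if p.Threshold == 0 {
		p.Threshold = defaultParams.Threshold
	}
	return p
}

func main() {
	// Only StopSecs is overridden; the other fields fall back to defaults.
	fmt.Printf("%+v\n", withDefaults(Params{StopSecs: 1.5}))
}
```

One consequence of this convention is that a caller cannot request an explicit zero (for example, MinVolume of 0 to disable the volume gate), because zero is indistinguishable from "unset".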
func (*EnergyDetector) IsSpeech ¶
func (e *EnergyDetector) IsSpeech(f audio.Frame) (bool, error)
IsSpeech delegates to the internal AnalyzerDetector.
func (*EnergyDetector) SetSampleRate ¶
func (e *EnergyDetector) SetSampleRate(sampleRate int)
SetSampleRate sets the sample rate on the internal analyzer (e.g. 16000 for pipeline input).
type Params ¶
type Params struct {
	// Confidence is the minimum voice confidence (0..1) required to treat audio
	// as speech.
	Confidence float64
	// StartSecs is how long speech must be continuously detected before we move
	// from Quiet->Starting->Speaking.
	StartSecs float64
	// StopSecs is how long silence must be observed before we move from
	// Speaking->Stopping->Quiet.
	StopSecs float64
	// MinVolume is the minimum smoothed volume (0..1) required to treat audio
	// as speech. This is a second gate in addition to Confidence.
	MinVolume float64
	// Threshold is the RMS energy threshold for energy-based VAD (e.g. 0.02).
	// Used by EnergyAnalyzerBackend; zero means use default.
	Threshold float64
}
Params configures Voice Activity Detection behaviour.
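The interaction between the two gates (Confidence and MinVolume) and the two timers (StartSecs and StopSecs) can be sketched as a four-state machine. This is an illustration under assumptions, not the package's implementation: the `machine` type, the `gate` and `step` helpers, the state constant names other than StateSpeaking, and the choice to reset the timer on any interruption are all hypothetical.

```go
package main

import "fmt"

type State int

const (
	StateQuiet State = iota // assumed name
	StateStarting           // assumed name
	StateSpeaking
	StateStopping // assumed name
)

// Params is a local subset of the documented struct.
type Params struct {
	Confidence, StartSecs, StopSecs, MinVolume float64
}

// gate reports whether a chunk passes both the confidence and volume gates.
func gate(p Params, confidence, volume float64) bool {
	return confidence >= p.Confidence && volume >= p.MinVolume
}

type machine struct {
	p       Params
	state   State
	elapsed float64 // seconds spent in the current Starting or Stopping run
}

// step advances the machine by one chunk lasting dt seconds.
func (m *machine) step(speech bool, dt float64) State {
	switch m.state {
	case StateQuiet:
		if speech {
			m.state, m.elapsed = StateStarting, dt
		}
	case StateStarting:
		if speech {
			m.elapsed += dt
		} else {
			m.state, m.elapsed = StateQuiet, 0 // speech was not continuous
		}
	case StateSpeaking:
		if !speech {
			m.state, m.elapsed = StateStopping, dt
		}
	case StateStopping:
		if speech {
			m.state, m.elapsed = StateSpeaking, 0 // silence was interrupted
		} else {
			m.elapsed += dt
		}
	}
	// Promote or demote once the required duration has been observed.
	if m.state == StateStarting && m.elapsed >= m.p.StartSecs {
		m.state, m.elapsed = StateSpeaking, 0
	}
	if m.state == StateStopping && m.elapsed >= m.p.StopSecs {
		m.state, m.elapsed = StateQuiet, 0
	}
	return m.state
}

func main() {
	p := Params{Confidence: 0.7, MinVolume: 0.6, StartSecs: 0.25, StopSecs: 0.5}
	m := &machine{p: p}
	const dt = 0.125 // chunk duration in seconds
	confidences := []float64{0.9, 0.9, 0.1, 0.1, 0.1, 0.1}
	for _, c := range confidences {
		fmt.Println(m.step(gate(p, c, 0.8), dt))
	}
}
```

In a real analyzer the chunk duration would follow from the buffer length and sample rate: for 16-bit mono audio, `dt = len(buf) / 2 / sampleRate` seconds.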