vad

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 16, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ErrSileroUnavailable = errors.New("silero VAD is not available in this build")

ErrSileroUnavailable is returned when Silero support cannot be used.

Functions

This section is empty.

Types

type Analyzer

type Analyzer interface {
	// SetSampleRate configures the audio sample rate. Must be called before
	// Analyze; implementations may clamp/validate as needed.
	SetSampleRate(sampleRate int)
	// SetParams updates the VAD parameters. Zero-values pick sensible defaults.
	SetParams(params Params)
	// Params returns the current VAD parameters.
	Params() Params
	// Analyze consumes audio for this stream, updates internal state, and
	// returns the current state, last confidence, and last smoothed volume.
	//
	// Audio is expected to be 16-bit PCM mono, matching audio.Frame/Data.
	Analyze(buf []byte) (State, float64, float64, error)
}

Analyzer is the high-level VAD interface, similar to the Python VADAnalyzer.

Each Analyzer is intended to serve a single audio stream (the usual case in this project). Implementations serialise calls internally with a mutex, so concurrent calls on that stream's Analyzer are safe.
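The call order described above (SetSampleRate first, then Analyze once per chunk) can be sketched as follows. This is a minimal illustration, not the package's own code: State, Params, and Analyzer are local copies of the documented declarations so the program compiles on its own, and fakeAnalyzer and speakingChunks are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented declarations; real code would
// import the vad package instead.
type State int

const (
	StateQuiet State = iota + 1
	StateStarting
	StateSpeaking
	StateStopping
)

type Params struct {
	Confidence, StartSecs, StopSecs, MinVolume, Threshold float64
}

type Analyzer interface {
	SetSampleRate(sampleRate int)
	SetParams(params Params)
	Params() Params
	Analyze(buf []byte) (State, float64, float64, error)
}

// fakeAnalyzer is a hypothetical test double: every non-empty chunk is
// reported as speech, and Analyze fails if no sample rate was set.
type fakeAnalyzer struct {
	params Params
	rate   int
}

func (f *fakeAnalyzer) SetSampleRate(sampleRate int) { f.rate = sampleRate }
func (f *fakeAnalyzer) SetParams(p Params)           { f.params = p }
func (f *fakeAnalyzer) Params() Params               { return f.params }
func (f *fakeAnalyzer) Analyze(buf []byte) (State, float64, float64, error) {
	if f.rate == 0 {
		return StateQuiet, 0, 0, fmt.Errorf("sample rate not set")
	}
	if len(buf) == 0 {
		return StateQuiet, 0, 0, nil
	}
	return StateSpeaking, 1, 1, nil
}

// speakingChunks configures the analyzer, feeds it the chunks in order, and
// counts how many Analyze calls reported StateSpeaking.
func speakingChunks(a Analyzer, sampleRate int, chunks [][]byte) (int, error) {
	a.SetSampleRate(sampleRate) // must precede the first Analyze call
	n := 0
	for _, c := range chunks {
		state, _, _, err := a.Analyze(c)
		if err != nil {
			return n, err
		}
		if state == StateSpeaking {
			n++
		}
	}
	return n, nil
}

func main() {
	chunks := [][]byte{{}, {0x00, 0x40}, {0x00, 0x40}}
	n, err := speakingChunks(&fakeAnalyzer{}, 16000, chunks)
	fmt.Println(n, err) // prints "2 <nil>"
}
```

Because Analyze keeps internal state between calls, chunks should be passed in stream order, with one Analyzer per stream.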

func NewEnergyAnalyzer

func NewEnergyAnalyzer(p Params) Analyzer

NewEnergyAnalyzer returns an Analyzer that uses EnergyAnalyzerBackend.

func NewSileroAnalyzer

func NewSileroAnalyzer(p Params, sampleRate int) (Analyzer, error)

NewSileroAnalyzer returns (nil, ErrSileroUnavailable) when Silero support is not compiled in.
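A caller that prefers Silero but wants to keep working in builds without it can check for ErrSileroUnavailable with errors.Is and fall back to the energy analyzer. The sketch below shows only that pattern: the Analyzer interface is reduced to a hypothetical Name method, and newSileroAnalyzer, newEnergyAnalyzer, and chooseAnalyzer are stand-ins invented for this example rather than the package's constructors.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in for the documented sentinel error.
var ErrSileroUnavailable = errors.New("silero VAD is not available in this build")

// Analyzer is simplified to one hypothetical method so the example stays short.
type Analyzer interface{ Name() string }

type named string

func (n named) Name() string { return string(n) }

// Hypothetical constructors: the Silero one behaves like a build without
// Silero support.
func newSileroAnalyzer() (Analyzer, error) { return nil, ErrSileroUnavailable }
func newEnergyAnalyzer() Analyzer          { return named("energy") }

// chooseAnalyzer prefers Silero and falls back to the energy analyzer only
// when the failure is ErrSileroUnavailable; any other error is returned.
func chooseAnalyzer(newSilero func() (Analyzer, error), newEnergy func() Analyzer) (Analyzer, error) {
	a, err := newSilero()
	switch {
	case err == nil:
		return a, nil
	case errors.Is(err, ErrSileroUnavailable):
		return newEnergy(), nil
	default:
		return nil, fmt.Errorf("create silero analyzer: %w", err)
	}
}

func main() {
	a, err := chooseAnalyzer(newSileroAnalyzer, newEnergyAnalyzer)
	fmt.Println(a.Name(), err) // prints "energy <nil>"
}
```

Using errors.Is rather than == keeps the check working if the error is ever wrapped on its way to the caller.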

type AnalyzerDetector

type AnalyzerDetector struct {
	Analyzer Analyzer
}

AnalyzerDetector bridges a VAD Analyzer to the existing Detector interface.

func (*AnalyzerDetector) IsSpeech

func (d *AnalyzerDetector) IsSpeech(f audio.Frame) (bool, error)

IsSpeech reports true when the underlying analyzer is in StateSpeaking.
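The bridge can be pictured as a thin adapter that maps the four-valued State onto a boolean. The following is a sketch under assumptions, not the package's source: frame is a hypothetical stand-in for audio.Frame (its Data field is assumed), analyzer is a reduced copy of the Analyzer interface, and scripted is a test double that replays fixed states.

```go
package main

import "fmt"

type State int

const (
	StateQuiet State = iota + 1
	StateStarting
	StateSpeaking
	StateStopping
)

// frame is a hypothetical stand-in for audio.Frame; the Data field is assumed.
type frame struct{ Data []byte }

// analyzer is the subset of the documented Analyzer interface used here.
type analyzer interface {
	SetSampleRate(sampleRate int)
	Analyze(buf []byte) (State, float64, float64, error)
}

// detector adapts an analyzer to a boolean speech/no-speech answer.
type detector struct{ a analyzer }

// IsSpeech is true only for StateSpeaking; Starting and Stopping map to false.
func (d *detector) IsSpeech(f frame) (bool, error) {
	state, _, _, err := d.a.Analyze(f.Data)
	if err != nil {
		return false, err
	}
	return state == StateSpeaking, nil
}

func (d *detector) SetSampleRate(sampleRate int) { d.a.SetSampleRate(sampleRate) }

// scripted is a test double that replays a fixed sequence of states.
type scripted struct {
	states []State
	i      int
}

func (s *scripted) SetSampleRate(int) {}
func (s *scripted) Analyze([]byte) (State, float64, float64, error) {
	if s.i >= len(s.states) {
		return StateQuiet, 0, 0, fmt.Errorf("script exhausted")
	}
	st := s.states[s.i]
	s.i++
	return st, 0, 0, nil
}

func main() {
	d := &detector{a: &scripted{states: []State{StateQuiet, StateStarting, StateSpeaking, StateStopping}}}
	d.SetSampleRate(16000)
	var got []bool
	for i := 0; i < 4; i++ {
		ok, _ := d.IsSpeech(frame{})
		got = append(got, ok)
	}
	fmt.Println(got) // prints "[false false true false]"
}
```

One consequence of this mapping is that the transitional states count as "not speech", so a caller sees true only after the start delay has elapsed.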

func (*AnalyzerDetector) SetSampleRate

func (d *AnalyzerDetector) SetSampleRate(sampleRate int)

SetSampleRate configures the analyzer for the given pipeline input rate.

type Detector

type Detector interface {
	IsSpeech(f audio.Frame) (bool, error)
	SetSampleRate(sampleRate int)
}

Detector decides whether a given audio frame contains speech. Implementations should assume 16-bit PCM mono by default. SetSampleRate configures the detector for the given pipeline input rate (e.g. 16000).

type EnergyAnalyzerBackend

type EnergyAnalyzerBackend struct {
	Threshold float64
}

EnergyAnalyzerBackend is a simple confidence backend based on RMS energy.
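For readers unfamiliar with RMS energy, the computation can be sketched as below. This is an illustration of the general technique rather than the backend's actual code: it assumes little-endian 16-bit PCM mono and normalises samples by 32768 so the result lies in 0..1, and rmsEnergy and exceeds are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// rmsEnergy returns the root-mean-square level of 16-bit little-endian PCM
// mono audio, normalised to 0..1. The byte order and the normalisation are
// assumptions made for this sketch.
func rmsEnergy(buf []byte) float64 {
	n := len(buf) / 2 // a trailing odd byte is ignored
	if n == 0 {
		return 0
	}
	var sum float64
	for i := 0; i < n; i++ {
		s := int16(binary.LittleEndian.Uint16(buf[2*i:]))
		v := float64(s) / 32768.0
		sum += v * v
	}
	return math.Sqrt(sum / float64(n))
}

// exceeds reports whether the chunk's RMS level reaches the threshold,
// in the way a Threshold such as 0.02 might be applied.
func exceeds(buf []byte, threshold float64) bool {
	return rmsEnergy(buf) >= threshold
}

func main() {
	silence := make([]byte, 320)
	loud := make([]byte, 320)
	for i := 0; i < len(loud); i += 2 {
		binary.LittleEndian.PutUint16(loud[i:], 16384) // half of full scale
	}
	fmt.Println(rmsEnergy(silence), rmsEnergy(loud)) // prints "0 0.5"
	fmt.Println(exceeds(silence, 0.02), exceeds(loud, 0.02))
}
```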

type EnergyDetector

type EnergyDetector struct {
	Threshold float64
	// contains filtered or unexported fields
}

EnergyDetector is preserved for compatibility; it wraps an AnalyzerDetector using an internal EnergyAnalyzer.

func NewEnergyDetector

func NewEnergyDetector() *EnergyDetector

NewEnergyDetector creates an EnergyDetector with a reasonable default threshold.

func NewEnergyDetectorWithParams

func NewEnergyDetectorWithParams(p Params) *EnergyDetector

NewEnergyDetectorWithParams allows callers to override Params; zero-values pick defaults.

func (*EnergyDetector) IsSpeech

func (e *EnergyDetector) IsSpeech(f audio.Frame) (bool, error)

IsSpeech delegates to the internal AnalyzerDetector.

func (*EnergyDetector) SetSampleRate

func (e *EnergyDetector) SetSampleRate(sampleRate int)

SetSampleRate sets the sample rate on the internal analyzer (e.g. 16000 for pipeline input).

type Params

type Params struct {
	// Confidence is the minimum voice confidence (0..1) required to treat audio
	// as speech.
	Confidence float64
	// StartSecs is how long speech must be continuously detected before we move
	// from Quiet->Starting->Speaking.
	StartSecs float64
	// StopSecs is how long silence must be observed before we move from
	// Speaking->Stopping->Quiet.
	StopSecs float64
	// MinVolume is the minimum smoothed volume (0..1) required to treat audio
	// as speech. This is a second gate in addition to Confidence.
	MinVolume float64
	// Threshold is the RMS energy threshold for energy-based VAD (e.g. 0.02).
	// Used by EnergyAnalyzerBackend; zero means use default.
	Threshold float64
}

Params configures Voice Activity Detection behaviour.
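The StartSecs and StopSecs fields describe a hysteresis between the four states. One plausible reading of the comments above is sketched here; the package's real transition logic is not shown in this documentation, so tracker, newTracker, and step are hypothetical, and the per-chunk speech decision (Confidence and MinVolume gates) is reduced to a boolean input.

```go
package main

import "fmt"

type State int

const (
	StateQuiet State = iota + 1
	StateStarting
	StateSpeaking
	StateStopping
)

// tracker is a hypothetical helper that applies start/stop delays to a
// stream of per-chunk speech decisions.
type tracker struct {
	startSecs, stopSecs float64
	state               State
	elapsed             float64 // time spent in Starting or Stopping
}

func newTracker(startSecs, stopSecs float64) *tracker {
	return &tracker{startSecs: startSecs, stopSecs: stopSecs, state: StateQuiet}
}

// step consumes one chunk of chunkSecs seconds and returns the new state.
func (t *tracker) step(speech bool, chunkSecs float64) State {
	switch t.state {
	case StateQuiet:
		if speech {
			t.state, t.elapsed = StateStarting, chunkSecs
		}
	case StateStarting:
		if !speech {
			t.state = StateQuiet // speech was not continuous: reset
			break
		}
		t.elapsed += chunkSecs
	case StateSpeaking:
		if !speech {
			t.state, t.elapsed = StateStopping, chunkSecs
		}
	case StateStopping:
		if speech {
			t.state = StateSpeaking // speech resumed before StopSecs
			break
		}
		t.elapsed += chunkSecs
	}
	// Promote once the required duration has accumulated.
	if t.state == StateStarting && t.elapsed >= t.startSecs {
		t.state = StateSpeaking
	}
	if t.state == StateStopping && t.elapsed >= t.stopSecs {
		t.state = StateQuiet
	}
	return t.state
}

func main() {
	t := newTracker(0.5, 0.75)
	var states []State
	for _, speech := range []bool{true, true, true, false, false, false} {
		states = append(states, t.step(speech, 0.25))
	}
	fmt.Println(states) // prints "[2 3 3 4 4 1]"
}
```

With 0.25 s chunks, two consecutive speech chunks satisfy a StartSecs of 0.5 and three silent chunks satisfy a StopSecs of 0.75, which is why the printed sequence goes Starting, Speaking, Speaking, Stopping, Stopping, Quiet.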

type State

type State int

State represents the high-level VAD state.

const (
	StateQuiet State = iota + 1
	StateStarting
	StateSpeaking
	StateStopping
)
