vad

package
v0.40.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 26, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package vad is the voice-activity-detection layer. The Silero ONNX path (silero.go) is currently a no-op stub pending the sherpa-onnx VAD migration; the RMS-level fallback (level_vad.go) arms the dictation silence-cutoff watcher in the meantime. The upstream Silero implementation will return when the ONNX runtime ABI clean-up lands.

Audit 2026-05-24 maintainability sweep.

Index

Constants

View Source
const (
	SampleRate     = 16000
	FrameSize      = 512
	BytesPerSample = 2
	FrameBytes     = FrameSize * BytesPerSample
)

Variables

View Source
var ErrUnavailable = errors.New("vad: silero VAD temporarily unavailable on this build — tracked in beads (replace yalue/onnxruntime_go usage with sherpa-onnx VAD to avoid double-runtime conflict)")

ErrUnavailable is returned by NewSileroVAD until VAD is re-implemented on the unified sherpa-onnx runtime. See package-level note for rationale.

Functions

This section is empty.

Types

type Detector

type Detector interface {
	ProcessFrame([]int16) (float32, error)
	Reset()
}

Detector is the speech-probability contract consumed by dictation processors.

type LevelVAD added in v0.37.8

type LevelVAD struct {
	// contains filtered or unexported fields
}

LevelVAD is a tiny RMS-based voice activity detector that needs no ONNX runtime. It exists as a fallback for the dictation pipeline while SileroVAD is disabled due to the onnxruntime/sherpa-onnx ABI conflict (see silero.go for the full story).

Speech detection is intentionally crude: compute the RMS of the frame, normalise to [0, 1] against the int16 full scale, and apply a piecewise mapping that produces 0 for noise-floor levels, 1 for clearly voiced audio, and a linear ramp in between. Consumers see the same float32 probability the Silero model would have returned, so they can keep using the existing threshold (DictationSegmenter uses 0.5).

Trade-offs:

  • No frequency awareness — keyboard typing, fan noise, or paper rustle above the threshold counts as speech and resets the silence timer.
  • No hangover smoothing — short consonant gaps register as silence at the frame level. The DictationSegmenter's pause threshold (default 700 ms) absorbs this in practice.

The numbers are tuned against the desktop logs (`Overlay audio: raw=0.006` for room silence, `raw=0.012-0.026` for normal voice). The wider window here errs toward "speech" so the silence-cutoff timer waits for genuine quiet rather than tripping on a breath.

func NewLevelVAD added in v0.37.8

func NewLevelVAD() *LevelVAD

NewLevelVAD constructs a level-based detector with the production defaults. Tests can override the thresholds by mutating the struct directly — there is no setter API yet because the only caller is the dictation bootstrap.

func (*LevelVAD) ProcessFrame added in v0.37.8

func (v *LevelVAD) ProcessFrame(pcm []int16) (float32, error)

ProcessFrame implements Detector by returning a per-frame speech probability. The contract is the same as the Silero binding: 0 means silence, 1 means active speech, and the consumer compares against its own threshold.

func (*LevelVAD) Reset added in v0.37.8

func (v *LevelVAD) Reset()

Reset is a no-op — the detector keeps no per-session state.

type SileroVAD

type SileroVAD struct {
	// contains filtered or unexported fields
}

SileroVAD is the public type the package previously exported. Kept as an empty struct so callers compile; constructor returns ErrUnavailable.

func NewSileroVAD

func NewSileroVAD(_ string) (*SileroVAD, error)

NewSileroVAD always returns ErrUnavailable in this build. Adapters (cmd/speechkit/dictation_session.go) detect the error, log a warning and degrade to manual-release hotkey behaviour.

func (*SileroVAD) Close

func (v *SileroVAD) Close()

Close is a no-op for the stub.

func (*SileroVAD) ProcessFrame

func (v *SileroVAD) ProcessFrame(_ []int16) (float32, error)

ProcessFrame is a no-op for the stub. The real implementation will return the speech probability from the sherpa-onnx VAD model.

func (*SileroVAD) Reset

func (v *SileroVAD) Reset()

Reset is a no-op for the stub.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL