vad

package

v0.40.2 Latest Latest Go to latest Published: May 26, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/kombifyio/SpeechKit

Links

Open Source Insights

Documentation ¶

Overview ¶

Package vad is the voice-activity-detection layer. The Silero ONNX path (silero.go) is currently a no-op stub pending the sherpa-onnx VAD migration; the RMS-level fallback (level_vad.go) arms the dictation silence-cutoff watcher in the meantime. The upstream Silero implementation will return when the ONNX runtime ABI clean-up lands.

Audit 2026-05-24 maintainability sweep.

Index ¶

Constants
Variables
type Detector
type LevelVAD
- func NewLevelVAD() *LevelVAD
- func (v *LevelVAD) ProcessFrame(pcm []int16) (float32, error)
- func (v *LevelVAD) Reset()
type SileroVAD
- func NewSileroVAD(_ string) (*SileroVAD, error)

Constants ¶

View Source

const (
	SampleRate     = 16000
	FrameSize      = 512
	BytesPerSample = 2
	FrameBytes     = FrameSize * BytesPerSample
)

Variables ¶

View Source

var ErrUnavailable = errors.New("vad: silero VAD temporarily unavailable on this build — tracked in beads (replace yalue/onnxruntime_go usage with sherpa-onnx VAD to avoid double-runtime conflict)")

ErrUnavailable is returned by NewSileroVAD until VAD is re-implemented on the unified sherpa-onnx runtime. See package-level note for rationale.

Functions ¶

This section is empty.

Types ¶

type Detector ¶

type Detector interface {
	ProcessFrame([]int16) (float32, error)
	Reset()
}

Detector is the speech-probability contract consumed by dictation processors.

type LevelVAD ¶ added in v0.37.8

type LevelVAD struct {
	// contains filtered or unexported fields
}

LevelVAD is a tiny RMS-based voice activity detector that needs no ONNX runtime. It exists as a fallback for the dictation pipeline while SileroVAD is disabled due to the onnxruntime/sherpa-onnx ABI conflict (see silero.go for the full story).

Speech detection is intentionally crude: compute the RMS of the frame, normalise to [0, 1] against the int16 full scale, and apply a piecewise mapping that produces 0 for noise-floor levels, 1 for clearly voiced audio, and a linear ramp in between. Consumers see the same float32 probability the Silero model would have returned, so they can keep using the existing threshold (DictationSegmenter uses 0.5).

Trade-offs:

No frequency awareness — keyboard typing, fan noise, or paper rustle above the threshold counts as speech and resets the silence timer.
No hangover smoothing — short consonant gaps register as silence at the frame level. The DictationSegmenter's pause threshold (default 700 ms) absorbs this in practice.

The numbers are tuned against the desktop logs (`Overlay audio: raw=0.006` for room silence, `raw=0.012-0.026` for normal voice). The wider window here errs toward "speech" so the silence-cutoff timer waits for genuine quiet rather than tripping on a breath.

func NewLevelVAD ¶ added in v0.37.8

func NewLevelVAD() *LevelVAD

NewLevelVAD constructs a level-based detector with the production defaults. Tests can override the thresholds by mutating the struct directly — there is no setter API yet because the only caller is the dictation bootstrap.

func (*LevelVAD) ProcessFrame ¶ added in v0.37.8

func (v *LevelVAD) ProcessFrame(pcm []int16) (float32, error)

ProcessFrame implements Detector by returning a per-frame speech probability. The contract is the same as the Silero binding: 0 means silence, 1 means active speech, and the consumer compares against its own threshold.

func (*LevelVAD) Reset ¶ added in v0.37.8

func (v *LevelVAD) Reset()

Reset is a no-op — the detector keeps no per-session state.

type SileroVAD ¶

type SileroVAD struct {
	// contains filtered or unexported fields
}

SileroVAD is the public type the package previously exported. Kept as an empty struct so callers compile; constructor returns ErrUnavailable.

func NewSileroVAD ¶

func NewSileroVAD(_ string) (*SileroVAD, error)

NewSileroVAD always returns ErrUnavailable in this build. Adapters (cmd/speechkit/dictation_session.go) detect the error, log a warning and degrade to manual-release hotkey behaviour.

func (*SileroVAD) Close ¶

func (v *SileroVAD) Close()

Close is a no-op for the stub.

func (*SileroVAD) ProcessFrame ¶

func (v *SileroVAD) ProcessFrame(_ []int16) (float32, error)

ProcessFrame is a no-op for the stub. The real implementation will return the speech probability from the sherpa-onnx VAD model.

func (*SileroVAD) Reset ¶

func (v *SileroVAD) Reset()

Reset is a no-op for the stub.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL