audio

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 16, 2026 | License: Apache-2.0 | Imports: 7 | Imported by: 0

README

Audio

Package audio provides audio types, voice activity detection (VAD), silence-based turn detection, µ-law and A-law codecs, resampling, mixing, WAV read/write, and DTMF tone generation. It is used by the voice pipeline and by transports such as WebRTC (PCM/Opus).

Purpose

  • Types: Frame (PCM chunk with sample rate, channels, timestamp), Stream (Read/Write/Close).
  • VAD: vad.Detector (IsSpeech); EnergyAnalyzerBackend (RMS threshold); optional Silero backend.
  • Turn: turn.Analyzer (AppendAudio, AnalyzeEndOfTurn, AnalyzeEndOfTurnAsync); silence-based implementation; user strategies and controller for when to emit user-turn segments.
  • Codecs: µ-law, A-law encode/decode for telephony.
  • Resample: Sample rate conversion for STT/TTS (e.g. 48k → 16k).
  • Mix: Mix multiple PCM streams (e.g. user + bot for recording).
  • WAV: Read/write 16-bit PCM WAV.
  • DTMF: Generate DTMF tone WAV files.

Audio pipeline (VAD and turn)

flowchart LR
    PCM["PCM input\n16-bit mono"] --> VAD["vad.Detector\nIsSpeech"]
    VAD -->|"speech flag"| Turn["turn.Analyzer\nAppendAudio"]
    Turn -->|"EndOfTurnState"| Segment["User segment\nemit to STT"]
    Turn --> Silence["turn/silence\nsilence timeout"]
    Segment --> STT["STT pipeline"]
  • Incoming audio is passed to VAD; the pipeline (e.g. voice processor) appends audio to the turn analyzer. When the analyzer reports Complete (e.g. silence after speech), the buffered segment is sent to STT.

Exported symbols (root)

| Symbol | Type | Description |
|---|---|---|
| Frame | struct | Data, SampleRate, NumChannels, Timestamp |
| Stream | interface | Read, Write, Close |
| PCM16MonoNumFrames(bytes) | func | len(bytes)/2 |
| DefaultInSampleRate, DefaultOutSampleRate | const | 16000, 24000 |
| Resample* | funcs | Sample rate conversion (see resample.go) |
| Mix* | funcs | Mix PCM buffers (see mix.go) |
| EncodeULaw, DecodeULaw, EncodeALaw, DecodeALaw | funcs | µ-law/A-law codecs |
| WAV, DTMF helpers | funcs | See wav.go, dtmf_wav.go |

Subpackages

| Path | Description |
|---|---|
| vad | Detector, EnergyAnalyzerBackend; IsSpeech, SetSampleRate; optional Silero |
| turn | Analyzer interface; Params, EndOfTurnState, EndOfTurnResult; silence impl; user_controller, user_strategies |
| filters | Audio filters (see filters.go) |
| mixers | Mixer utilities (see mixers.go) |
| interruptions | Interruption/barge-in handling |

Concurrency

  • Frame/Stream: No internal state; safe for concurrent use where documented.
  • vad.Detector: Implementations may or may not be safe for concurrent use; typically one detector per pipeline.
  • turn.Analyzer: AppendAudio is called from one goroutine; AnalyzeEndOfTurn(Async) may be called from another; implementations use mutexes where needed.

Files (root)

| File | Description |
|---|---|
| audio.go | PCM16MonoNumFrames, DefaultInSampleRate, DefaultOutSampleRate |
| types.go | Frame, Stream |
| resample.go | Resample helpers |
| mix.go | Mix helpers |
| alaw.go, ulaw.go | A-law, µ-law codecs |
| wav.go | WAV read/write |
| dtmf_wav.go | DTMF tone WAV generation |


Documentation

Overview

Package audio provides A-law (G.711 PCMA) encode/decode for 16-bit mono PCM.

Package audio provides minimal audio processing utilities for the Voxray system, focusing on 16-bit PCM mono audio formats used by STT and TTS services.

Package audio provides mix and interleave helpers for 16-bit mono PCM.

Package audio provides optional sample rate conversion (resample) for 16-bit mono PCM.

Package audio provides μ-law (G.711 PCMU) encode/decode for 16-bit mono PCM.

Package audio provides WAV decoding utilities.

Index

Constants

const (
	// DefaultInSampleRate is the typical sample rate for STT input (16kHz).
	DefaultInSampleRate = 16000
	// DefaultOutSampleRate is the typical sample rate for TTS output (24kHz).
	DefaultOutSampleRate = 24000
)

Default audio configuration constants for the Voxray system.

Variables

This section is empty.

Functions

func DecodeALaw

func DecodeALaw(alaw []byte) []byte

DecodeALaw converts A-law (PCMA) bytes to 16-bit little-endian PCM. Output length is len(alaw)*2.

func DecodeULaw

func DecodeULaw(ulaw []byte) []byte

DecodeULaw converts μ-law (PCMU) bytes to 16-bit little-endian PCM. Output length is len(ulaw)*2.

func DecodeWAVToPCM

func DecodeWAVToPCM(wav []byte) (pcm []byte, sampleRate int, err error)

DecodeWAVToPCM extracts raw 16-bit little-endian PCM and sample rate from WAV bytes. Returns (pcm, sampleRate, nil) or (nil, 0, error) for invalid/unsupported WAV. Supports standard PCM WAV with "fmt " and "data" chunks.
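To make the "fmt " and "data" chunk handling concrete, here is a minimal sketch of a canonical 44-byte-header WAV encoder and a chunk-walking decoder for 16-bit PCM. encodeWAV and decodeWAV are hypothetical stand-ins written for illustration; the package's own parsing and error messages may differ.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeWAV wraps 16-bit mono PCM in a canonical 44-byte RIFF/WAVE header.
func encodeWAV(pcm []byte, sampleRate int) []byte {
	var b bytes.Buffer
	le := binary.LittleEndian
	b.WriteString("RIFF")
	binary.Write(&b, le, uint32(36+len(pcm))) // size of everything after this field
	b.WriteString("WAVEfmt ")
	binary.Write(&b, le, uint32(16))           // fmt chunk size
	binary.Write(&b, le, uint16(1))            // format tag 1 = PCM
	binary.Write(&b, le, uint16(1))            // channels: mono
	binary.Write(&b, le, uint32(sampleRate))   // sample rate
	binary.Write(&b, le, uint32(sampleRate*2)) // byte rate
	binary.Write(&b, le, uint16(2))            // block align
	binary.Write(&b, le, uint16(16))           // bits per sample
	b.WriteString("data")
	binary.Write(&b, le, uint32(len(pcm)))
	b.Write(pcm)
	return b.Bytes()
}

// decodeWAV walks the RIFF chunks and returns the data chunk and sample rate.
func decodeWAV(wav []byte) ([]byte, int, error) {
	if len(wav) < 12 || string(wav[0:4]) != "RIFF" || string(wav[8:12]) != "WAVE" {
		return nil, 0, errors.New("not a RIFF/WAVE file")
	}
	le := binary.LittleEndian
	rate := 0
	for off := 12; off+8 <= len(wav); {
		id := string(wav[off : off+4])
		size := int(le.Uint32(wav[off+4:]))
		body := off + 8
		if body+size > len(wav) {
			return nil, 0, errors.New("truncated chunk")
		}
		switch id {
		case "fmt ":
			if size < 16 || le.Uint16(wav[body:]) != 1 || le.Uint16(wav[body+14:]) != 16 {
				return nil, 0, errors.New("unsupported format: want 16-bit PCM")
			}
			rate = int(le.Uint32(wav[body+4:]))
		case "data":
			if rate == 0 {
				return nil, 0, errors.New("data chunk before fmt chunk")
			}
			return wav[body : body+size], rate, nil
		}
		off = body + size + size%2 // chunks are padded to an even length
	}
	return nil, 0, errors.New("missing data chunk")
}

func main() {
	wav := encodeWAV([]byte{1, 2, 3, 4}, 16000)
	pcm, rate, err := decodeWAV(wav)
	fmt.Println(len(wav), pcm, rate, err)
}
```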

func EncodeALaw

func EncodeALaw(pcm []byte) []byte

EncodeALaw converts 16-bit little-endian PCM to A-law (PCMA) bytes. len(pcm) must be even; output length is len(pcm)/2.

func EncodeULaw

func EncodeULaw(pcm []byte) []byte

EncodeULaw converts 16-bit little-endian PCM to μ-law (PCMU) bytes. len(pcm) must be even; output length is len(pcm)/2.
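As an illustration of the length relationships (two PCM bytes per µ-law byte), the following sketch implements the widely used G.711 µ-law companding scheme with a bias of 0x84 and a clip at 32635. The lowercase function names are hypothetical, and whether the package uses this exact formulation or a lookup table is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	ulawBias = 0x84  // bias added before finding the segment
	ulawClip = 32635 // largest magnitude that survives the bias
)

// linearToULaw compresses one 16-bit sample to one µ-law byte.
func linearToULaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s, sign = -s, 0x80
	}
	if s > ulawClip {
		s = ulawClip
	}
	s += ulawBias
	// The exponent is the position of the highest set bit, from bit 7 to bit 14.
	exponent := 7
	for mask := 0x4000; exponent > 0 && s&mask == 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F
	return ^byte(sign | exponent<<4 | mantissa) // µ-law bytes are stored inverted
}

// uLawToLinear expands one µ-law byte back to a 16-bit sample.
func uLawToLinear(u byte) int16 {
	u = ^u
	exponent := int(u>>4) & 0x07
	mantissa := int(u & 0x0F)
	s := (((mantissa << 3) + ulawBias) << exponent) - ulawBias
	if u&0x80 != 0 {
		s = -s
	}
	return int16(s)
}

// encodeULaw converts 16-bit little-endian PCM to µ-law; output is len(pcm)/2.
func encodeULaw(pcm []byte) []byte {
	out := make([]byte, len(pcm)/2)
	for i := range out {
		out[i] = linearToULaw(int16(binary.LittleEndian.Uint16(pcm[2*i:])))
	}
	return out
}

// decodeULaw converts µ-law to 16-bit little-endian PCM; output is len(ulaw)*2.
func decodeULaw(ulaw []byte) []byte {
	out := make([]byte, 2*len(ulaw))
	for i, b := range ulaw {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(uLawToLinear(b)))
	}
	return out
}

func main() {
	pcm := make([]byte, 4)
	binary.LittleEndian.PutUint16(pcm[0:], 1000)
	enc := encodeULaw(pcm)
	dec := decodeULaw(enc)
	fmt.Println(len(enc), len(dec), int16(binary.LittleEndian.Uint16(dec)))
}
```

The codec is lossy: a decoded sample matches the original only to within the quantization step of its segment.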

func GenerateDTMFPCM

func GenerateDTMFPCM(sampleRate int, key string, toneDuration, gapDuration float64) ([]byte, error)

GenerateDTMFPCM generates 16-bit little-endian mono PCM for a DTMF key. key: "0"-"9", "star", "pound". toneDuration and gapDuration are in seconds.
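A DTMF tone is the sum of one low-group and one high-group sine wave. The sketch below uses the standard DTMF frequency pairs and the key names listed above; dtmfPCM is a hypothetical stand-in, and the amplitude scaling and the choice to place the silent gap after the tone are assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// dtmfFreqs maps each key to its standard (low, high) frequency pair in Hz.
var dtmfFreqs = map[string][2]float64{
	"1": {697, 1209}, "2": {697, 1336}, "3": {697, 1477},
	"4": {770, 1209}, "5": {770, 1336}, "6": {770, 1477},
	"7": {852, 1209}, "8": {852, 1336}, "9": {852, 1477},
	"star": {941, 1209}, "0": {941, 1336}, "pound": {941, 1477},
}

// dtmfPCM returns 16-bit little-endian mono PCM: the tone, then a silent gap.
// Durations are in seconds.
func dtmfPCM(sampleRate int, key string, toneSec, gapSec float64) ([]byte, error) {
	f, ok := dtmfFreqs[key]
	if !ok {
		return nil, fmt.Errorf("unknown DTMF key %q", key)
	}
	nTone := int(math.Round(toneSec * float64(sampleRate)))
	nGap := int(math.Round(gapSec * float64(sampleRate)))
	out := make([]byte, 2*(nTone+nGap)) // gap samples stay zero
	for i := 0; i < nTone; i++ {
		t := float64(i) / float64(sampleRate)
		// Each sine gets half the range so the sum stays within [-1, 1].
		v := 0.5*math.Sin(2*math.Pi*f[0]*t) + 0.5*math.Sin(2*math.Pi*f[1]*t)
		binary.LittleEndian.PutUint16(out[2*i:], uint16(int16(v*0.8*32767)))
	}
	return out, nil
}

func main() {
	pcm, err := dtmfPCM(8000, "5", 0.25, 0.125)
	fmt.Println(len(pcm), err)
}
```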

func InterleaveStereo

func InterleaveStereo(left, right []byte) []byte

InterleaveStereo interleaves two 16-bit mono buffers as left, right, left, right, ... If lengths differ, the shorter channel is padded with zeros so both have the same length. Returns a new slice of length 4*max(samples in left, samples in right).
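The padding and length rule can be seen in a short sketch. interleaveStereo below is a hypothetical reimplementation written from the description above, not the package's source.

```go
package main

import "fmt"

// interleaveStereo copies 16-bit samples as L, R, L, R, ... and leaves zeros
// where the shorter channel has run out.
func interleaveStereo(left, right []byte) []byte {
	n := len(left) / 2
	if m := len(right) / 2; m > n {
		n = m
	}
	out := make([]byte, 4*n) // 2 channels x 2 bytes per sample
	for i := 0; i < n; i++ {
		if 2*i+1 < len(left) {
			copy(out[4*i:4*i+2], left[2*i:2*i+2])
		}
		if 2*i+1 < len(right) {
			copy(out[4*i+2:4*i+4], right[2*i:2*i+2])
		}
	}
	return out
}

func main() {
	left := []byte{1, 0, 2, 0} // samples 1, 2
	right := []byte{9, 0}      // sample 9
	fmt.Println(interleaveStereo(left, right))
}
```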

func MixMono

func MixMono(user, bot []byte) []byte

MixMono mixes two 16-bit little-endian mono PCM buffers by averaging samples. If lengths differ, the shorter buffer is padded with zeros to match the longer. Returns a new slice of length max(len(user), len(bot)).
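Mixing by averaging keeps the result within the 16-bit range without clipping. The following sketch reimplements the described behaviour under hypothetical names (mixMono, pcmOf, samplesOf); rounding of odd sums is an assumption, here truncation toward zero.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sampleAt reads the 16-bit sample starting at byte i, or 0 past the end.
func sampleAt(p []byte, i int) int32 {
	if i+1 < len(p) {
		return int32(int16(binary.LittleEndian.Uint16(p[i:])))
	}
	return 0
}

// mixMono averages two mono buffers; the shorter one is treated as
// zero-padded to the length of the longer one.
func mixMono(a, b []byte) []byte {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]byte, n)
	for i := 0; i+1 < n; i += 2 {
		avg := (sampleAt(a, i) + sampleAt(b, i)) / 2
		binary.LittleEndian.PutUint16(out[i:], uint16(int16(avg)))
	}
	return out
}

// pcmOf packs samples into 16-bit little-endian PCM.
func pcmOf(samples ...int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

// samplesOf unpacks 16-bit little-endian PCM into samples.
func samplesOf(pcm []byte) []int16 {
	out := make([]int16, len(pcm)/2)
	for i := range out {
		out[i] = int16(binary.LittleEndian.Uint16(pcm[2*i:]))
	}
	return out
}

func main() {
	user := pcmOf(1000, -2000)
	bot := pcmOf(3000)
	fmt.Println(samplesOf(mixMono(user, bot)))
}
```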

func PCM16MonoNumFrames

func PCM16MonoNumFrames(bytes []byte) int

PCM16MonoNumFrames calculates the number of audio frames in a PCM16 mono buffer. Since each frame is 16 bits (2 bytes), the number of frames is total bytes divided by 2.

func Resample16Mono

func Resample16Mono(in []byte, inRate, outRate int, out []byte) []byte

Resample16Mono converts 16-bit mono PCM from inRate to outRate using linear interpolation. in and out can be the same slice if inRate == outRate (no-op). Otherwise out must be large enough to hold the resampled audio: len(out) >= len(in) * outRate / inRate (rounded up).
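Linear interpolation and the rounded-up output length can be sketched as follows. For brevity this hypothetical resampleLinear works on []int16 samples rather than bytes and always allocates; how the package treats the final sample and rounding is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// resampleLinear converts samples from inRate to outRate by interpolating
// linearly between the two nearest input samples.
func resampleLinear(in []int16, inRate, outRate int) []int16 {
	if inRate == outRate || len(in) == 0 {
		return append([]int16(nil), in...)
	}
	// Output length is len(in) * outRate / inRate, rounded up.
	nOut := (len(in)*outRate + inRate - 1) / inRate
	out := make([]int16, nOut)
	for i := range out {
		pos := float64(i) * float64(inRate) / float64(outRate)
		j := int(pos)
		frac := pos - float64(j)
		if j >= len(in) { // guard against floating-point rounding
			j, frac = len(in)-1, 0
		}
		a := float64(in[j])
		b := a // past the last sample, hold its value
		if j+1 < len(in) {
			b = float64(in[j+1])
		}
		out[i] = int16(math.Round(a + frac*(b-a)))
	}
	return out
}

func main() {
	up := resampleLinear([]int16{0, 100, 200, 300}, 8000, 16000)
	down := resampleLinear([]int16{0, 10, 20, 30, 40, 50}, 48000, 16000)
	fmt.Println(up, down)
}
```

Linear interpolation is cheap but does not low-pass filter before downsampling, so content above half the output rate can alias.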

func Resample16MonoAlloc

func Resample16MonoAlloc(in []byte, inRate, outRate int) []byte

Resample16MonoAlloc returns a new slice with 16-bit mono PCM resampled from inRate to outRate. It is a convenience wrapper around Resample16Mono that handles allocation.

func WritePCM16MonoWAV

func WritePCM16MonoWAV(path string, pcm []byte, sampleRate int) error

WritePCM16MonoWAV writes 16-bit mono PCM to a WAV file.

Types

type Frame

type Frame struct {
	Data        []byte
	SampleRate  int
	NumChannels int
	Timestamp   time.Time
}

Frame represents a chunk of PCM audio in the Voxray system. It is a lightweight helper around raw bytes, separate from pipeline frames. Audio is expected to be 16-bit PCM little-endian.

type Stream

type Stream interface {
	Read() (Frame, error)
	Write(Frame) error
	Close() error
}

Stream represents a bidirectional audio source or sink. Implementations may wrap microphones, speakers, files, or network streams.

Directories

Path Synopsis
Package turn provides end-of-turn detection for audio conversations (base turn analyzer + silence-based smart turn).
