audio

package

Version: v0.2.0 (latest) | Published: May 16, 2026 | License: Apache-2.0 | Imports: 7 | Imported by: 0

Repository

github.com/Voxray-AI/Voxray

README

Audio

Package audio provides audio types, VAD (voice activity detection), turn detection (silence-based), codecs (µ-law, A-law), resampling, mixing, WAV, and DTMF WAV generation. Used by the voice pipeline and transport (e.g. WebRTC PCM/Opus).

Purpose

Types: Frame (PCM chunk with sample rate, channels, timestamp), Stream (Read/Write/Close).
VAD: vad.Detector (IsSpeech); EnergyAnalyzerBackend (RMS threshold); optional Silero backend.
Turn: turn.Analyzer (AppendAudio, AnalyzeEndOfTurn, AnalyzeEndOfTurnAsync); silence-based implementation; user strategies and controller for when to emit user-turn segments.
Codecs: µ-law, A-law encode/decode for telephony.
Resample: Sample rate conversion for STT/TTS (e.g. 48k → 16k).
Mix: Mix multiple PCM streams (e.g. user + bot for recording).
WAV: Read/write 16-bit PCM WAV.
DTMF: Generate DTMF tone WAV files.

Audio pipeline (VAD and turn)

flowchart LR
    PCM["PCM input\n16-bit mono"] --> VAD["vad.Detector\nIsSpeech"]
    VAD -->|"speech flag"| Turn["turn.Analyzer\nAppendAudio"]
    Turn -->|"EndOfTurnState"| Segment["User segment\nemit to STT"]
    Turn --> Silence["turn/silence\nsilence timeout"]
    Segment --> STT["STT pipeline"]

Incoming audio is passed to VAD; the pipeline (e.g. voice processor) appends audio to the turn analyzer. When the analyzer reports Complete (e.g. silence after speech), the buffered segment is sent to STT.
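The flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `rmsIsSpeech`, `silenceTurn`, and the threshold and silence values are hypothetical names and numbers, not the `vad` or `turn` API.

```go
package main

import (
	"encoding/binary"
	"math"
)

// rmsIsSpeech is a hypothetical energy check: it reports whether the RMS of
// 16-bit little-endian mono PCM, normalized to [0, 1], exceeds threshold.
func rmsIsSpeech(pcm []byte, threshold float64) bool {
	n := len(pcm) / 2
	if n == 0 {
		return false
	}
	var sum float64
	for i := 0; i < n; i++ {
		s := float64(int16(binary.LittleEndian.Uint16(pcm[2*i:]))) / 32768.0
		sum += s * s
	}
	return math.Sqrt(sum/float64(n)) > threshold
}

// silenceTurn is a hypothetical silence-based end-of-turn tracker.
type silenceTurn struct {
	stopSamples   int    // trailing silence (in samples) that ends a turn
	silentSamples int    // silence accumulated since the last speech chunk
	speechStarted bool   // true once any chunk was flagged as speech
	segment       []byte // audio buffered for the current user turn
}

// appendAudio buffers one chunk and reports whether the turn is complete,
// meaning speech was seen and was followed by at least stopSamples of silence.
func (t *silenceTurn) appendAudio(pcm []byte, isSpeech bool) bool {
	if isSpeech {
		t.speechStarted = true
		t.silentSamples = 0
	} else if t.speechStarted {
		t.silentSamples += len(pcm) / 2
	}
	if t.speechStarted {
		t.segment = append(t.segment, pcm...)
	}
	return t.speechStarted && t.silentSamples >= t.stopSamples
}
```

A caller would run `rmsIsSpeech` on each incoming chunk, pass the flag to `appendAudio`, and hand `segment` to STT once `appendAudio` returns true.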

Exported symbols (root)

Symbol	Type	Description
`Frame`	struct	Data, SampleRate, NumChannels, Timestamp
`Stream`	interface	Read, Write, Close
`PCM16MonoNumFrames(bytes)`	func	len(bytes)/2
`DefaultInSampleRate`, `DefaultOutSampleRate`	const	16000, 24000
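`Resample16Mono`, `Resample16MonoAlloc`	funcs	Sample rate conversion by linear interpolation (see resample.go)
`MixMono`, `InterleaveStereo`	funcs	Mix or interleave PCM buffers (see mix.go)
`EncodeULaw`, `DecodeULaw`, `EncodeALaw`, `DecodeALaw`	funcs	µ-law/A-law codecs
`DecodeWAVToPCM`, `WritePCM16MonoWAV`, `GenerateDTMFPCM`	funcs	WAV decode/write and DTMF tone generation (see wav.go, dtmf_wav.go)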

Subpackages

Path	Description
`vad`	Detector, EnergyAnalyzerBackend; IsSpeech, SetSampleRate; optional Silero
`turn`	Analyzer interface; Params, EndOfTurnState, EndOfTurnResult; silence impl; user_controller, user_strategies
`filters`	Audio filters (see filters.go)
`mixers`	Mixer utilities (see mixers.go)
`interruptions`	Interruption/barge-in handling

Concurrency

Frame/Stream: Frame is a plain struct with no internal state; Stream implementations are safe for concurrent use only where documented.
vad.Detector: Implementations may or may not be safe for concurrent use; typically one detector per pipeline.
turn.Analyzer: AppendAudio is called from one goroutine; AnalyzeEndOfTurn(Async) may be called from another; implementations use mutexes where needed.

Files (root)

File	Description
`audio.go`	PCM16MonoNumFrames, DefaultInSampleRate, DefaultOutSampleRate
`types.go`	Frame, Stream
`resample.go`	Resample helpers
`mix.go`	Mix helpers
`alaw.go`, `ulaw.go`	A-law, µ-law codecs
`wav.go`	WAV read/write
`dtmf_wav.go`	DTMF tone WAV generation

Documentation

Overview

Package audio provides A-law (G.711 PCMA) encode/decode for 16-bit mono PCM.

Package audio provides minimal audio processing utilities for the Voxray system, focusing on 16-bit PCM mono audio formats used by STT and TTS services.

Package audio provides mix and interleave helpers for 16-bit mono PCM.

Package audio provides optional sample rate conversion (resample) for 16-bit mono PCM.

Package audio provides μ-law (G.711 PCMU) encode/decode for 16-bit mono PCM.

Package audio provides WAV decoding utilities.

Index

Constants
func DecodeALaw(alaw []byte) []byte
func DecodeULaw(ulaw []byte) []byte
func DecodeWAVToPCM(wav []byte) (pcm []byte, sampleRate int, err error)
func EncodeALaw(pcm []byte) []byte
func EncodeULaw(pcm []byte) []byte
func GenerateDTMFPCM(sampleRate int, key string, toneDuration, gapDuration float64) ([]byte, error)
func InterleaveStereo(left, right []byte) []byte
func MixMono(user, bot []byte) []byte
func PCM16MonoNumFrames(bytes []byte) int
func Resample16Mono(in []byte, inRate, outRate int, out []byte) []byte
func Resample16MonoAlloc(in []byte, inRate, outRate int) []byte
func WritePCM16MonoWAV(path string, pcm []byte, sampleRate int) error
type Frame
type Stream

Constants

const (
	// DefaultInSampleRate is the typical sample rate for STT input (16kHz).
	DefaultInSampleRate = 16000
	// DefaultOutSampleRate is the typical sample rate for TTS output (24kHz).
	DefaultOutSampleRate = 24000
)

Default audio configuration constants for the Voxray system.

Functions

func DecodeALaw

func DecodeALaw(alaw []byte) []byte

DecodeALaw converts A-law (PCMA) bytes to 16-bit little-endian PCM. Output length is len(alaw)*2.

func DecodeULaw

func DecodeULaw(ulaw []byte) []byte

DecodeULaw converts μ-law (PCMU) bytes to 16-bit little-endian PCM. Output length is len(ulaw)*2.

func DecodeWAVToPCM

func DecodeWAVToPCM(wav []byte) (pcm []byte, sampleRate int, err error)

DecodeWAVToPCM extracts raw 16-bit little-endian PCM and sample rate from WAV bytes. Returns (pcm, sampleRate, nil) or (nil, 0, error) for invalid/unsupported WAV. Supports standard PCM WAV with "fmt " and "data" chunks.
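As an illustration of what walking the "fmt " and "data" chunks involves, here is a minimal sketch. `parseWAV` is a hypothetical stand-in written for this example; it is not the package's implementation, and its error messages and its restriction to 16-bit PCM are assumptions.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// parseWAV is a hypothetical RIFF chunk walker. It returns the bytes of the
// first "data" chunk and the sample rate read from the preceding "fmt " chunk.
func parseWAV(wav []byte) (pcm []byte, sampleRate int, err error) {
	if len(wav) < 12 || string(wav[0:4]) != "RIFF" || string(wav[8:12]) != "WAVE" {
		return nil, 0, errors.New("not a RIFF/WAVE file")
	}
	haveFmt := false
	for off := 12; off+8 <= len(wav); {
		id := string(wav[off : off+4])
		size := int(binary.LittleEndian.Uint32(wav[off+4 : off+8]))
		body := off + 8
		if size < 0 || body+size > len(wav) {
			return nil, 0, errors.New("truncated chunk")
		}
		switch id {
		case "fmt ":
			if size < 16 {
				return nil, 0, errors.New("short fmt chunk")
			}
			format := binary.LittleEndian.Uint16(wav[body:])
			bits := binary.LittleEndian.Uint16(wav[body+14:])
			if format != 1 || bits != 16 { // format tag 1 means linear PCM
				return nil, 0, errors.New("only 16-bit PCM is supported")
			}
			sampleRate = int(binary.LittleEndian.Uint32(wav[body+4:]))
			haveFmt = true
		case "data":
			if !haveFmt {
				return nil, 0, errors.New("data chunk before fmt chunk")
			}
			return wav[body : body+size], sampleRate, nil
		}
		off = body + size + size%2 // chunks are padded to an even length
	}
	return nil, 0, errors.New("no data chunk")
}
```

Unknown chunks (for example "LIST") are skipped by the loop, which is why the sketch walks chunks instead of assuming a fixed 44-byte header.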

func EncodeALaw

func EncodeALaw(pcm []byte) []byte

EncodeALaw converts 16-bit little-endian PCM to A-law (PCMA) bytes. len(pcm) must be even; output length is len(pcm)/2.

func EncodeULaw

func EncodeULaw(pcm []byte) []byte

EncodeULaw converts 16-bit little-endian PCM to μ-law (PCMU) bytes. len(pcm) must be even; output length is len(pcm)/2.
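For readers unfamiliar with G.711, the following sketch shows one common way to compand a 16-bit sample into a μ-law byte and back. The names `linearToULaw`, `ulawToLinear`, `encodeULawSketch`, and `decodeULawSketch` are hypothetical; the package may implement the codec differently (for example with lookup tables).

```go
package main

import "encoding/binary"

const (
	ulawBias = 0x84  // 132, added before locating the segment
	ulawClip = 32635 // largest magnitude that still fits after adding the bias
)

// linearToULaw compresses one 16-bit sample to a G.711 mu-law byte.
func linearToULaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > ulawClip {
		s = ulawClip
	}
	s += ulawBias
	// Find the segment: the position of the highest set bit among bits 14..8.
	exponent := 7
	for mask := 0x4000; s&mask == 0 && exponent > 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F
	return ^byte(sign | exponent<<4 | mantissa) // mu-law bytes are stored inverted
}

// ulawToLinear expands one mu-law byte back to a 16-bit sample.
func ulawToLinear(u byte) int16 {
	u = ^u
	exponent := int(u>>4) & 0x07
	mantissa := int(u & 0x0F)
	s := ((mantissa<<3)+ulawBias)<<exponent - ulawBias
	if u&0x80 != 0 {
		s = -s
	}
	return int16(s)
}

// encodeULawSketch maps little-endian PCM to mu-law; output is len(pcm)/2 bytes.
func encodeULawSketch(pcm []byte) []byte {
	out := make([]byte, len(pcm)/2)
	for i := range out {
		out[i] = linearToULaw(int16(binary.LittleEndian.Uint16(pcm[2*i:])))
	}
	return out
}

// decodeULawSketch maps mu-law to little-endian PCM; output is len(ulaw)*2 bytes.
func decodeULawSketch(ulaw []byte) []byte {
	out := make([]byte, len(ulaw)*2)
	for i, u := range ulaw {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(ulawToLinear(u)))
	}
	return out
}
```

The codec is lossy: a round trip returns a value near the original sample, not the exact sample, because only four mantissa bits are kept per segment.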

func GenerateDTMFPCM

func GenerateDTMFPCM(sampleRate int, key string, toneDuration, gapDuration float64) ([]byte, error)

GenerateDTMFPCM generates 16-bit little-endian mono PCM for a DTMF key. key: "0"-"9", "star", "pound". toneDuration and gapDuration are in seconds.
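A DTMF tone is the sum of one low-group and one high-group sine wave. The sketch below shows how such a buffer could be produced; `dtmfPCM`, the `dtmfFreqs` table layout, and the 0.25 amplitude per tone are assumptions made for illustration, not details taken from the package.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// dtmfFreqs maps key names to (low, high) frequencies in Hz from the
// standard DTMF keypad grid.
var dtmfFreqs = map[string][2]float64{
	"1": {697, 1209}, "2": {697, 1336}, "3": {697, 1477},
	"4": {770, 1209}, "5": {770, 1336}, "6": {770, 1477},
	"7": {852, 1209}, "8": {852, 1336}, "9": {852, 1477},
	"star": {941, 1209}, "0": {941, 1336}, "pound": {941, 1477},
}

// dtmfPCM is a hypothetical generator: toneSec seconds of dual-tone audio
// followed by gapSec seconds of silence, as 16-bit little-endian mono PCM.
func dtmfPCM(sampleRate int, key string, toneSec, gapSec float64) ([]byte, error) {
	f, ok := dtmfFreqs[key]
	if !ok {
		return nil, fmt.Errorf("unknown DTMF key %q", key)
	}
	if sampleRate <= 0 || toneSec < 0 || gapSec < 0 {
		return nil, fmt.Errorf("invalid sample rate or duration")
	}
	tone := int(math.Round(toneSec * float64(sampleRate)))
	gap := int(math.Round(gapSec * float64(sampleRate)))
	out := make([]byte, 2*(tone+gap)) // gap samples stay zero (silence)
	for i := 0; i < tone; i++ {
		t := float64(i) / float64(sampleRate)
		// Each sine gets a quarter of full scale, so the sum stays within +/-0.5.
		v := 0.25*math.Sin(2*math.Pi*f[0]*t) + 0.25*math.Sin(2*math.Pi*f[1]*t)
		binary.LittleEndian.PutUint16(out[2*i:], uint16(int16(v*32767)))
	}
	return out, nil
}
```

For example, `dtmfPCM(8000, "5", 0.1, 0.05)` yields 800 tone samples and 400 silent samples, 2400 bytes in total.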

func InterleaveStereo

func InterleaveStereo(left, right []byte) []byte

InterleaveStereo interleaves two 16-bit mono buffers as left, right, left, right, ... If lengths differ, the shorter channel is padded with zeros so both have the same length. Returns a new slice of length 4*max(samples in left, samples in right).
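The padding and length rule can be illustrated with a small sketch. `interleaveSketch` is a hypothetical re-implementation written from the description above, not the package source.

```go
package main

// interleaveSketch interleaves two 16-bit mono buffers as L, R, L, R, ...
// The shorter channel is treated as zero-padded to the longer one's length.
func interleaveSketch(left, right []byte) []byte {
	n := len(left) / 2 // samples per channel after padding
	if m := len(right) / 2; m > n {
		n = m
	}
	out := make([]byte, 4*n) // 2 channels * 2 bytes per sample
	for i := 0; i < n; i++ {
		if 2*i+1 < len(left) {
			copy(out[4*i:], left[2*i:2*i+2])
		}
		if 2*i+1 < len(right) {
			copy(out[4*i+2:], right[2*i:2*i+2])
		}
	}
	return out
}
```

With a two-sample left buffer and a one-sample right buffer, the result is 8 bytes long and the second right sample is zero.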

func MixMono

func MixMono(user, bot []byte) []byte

MixMono mixes two 16-bit little-endian mono PCM buffers by averaging samples. If lengths differ, the shorter buffer is padded with zeros to match the longer. Returns a new slice of length max(len(user), len(bot)).
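A minimal sketch of mixing by averaging, written from the description above. `mixSketch` and `sampleAt` are hypothetical names, and the sketch assumes both inputs have even length.

```go
package main

import "encoding/binary"

// sampleAt returns sample i of a 16-bit little-endian buffer, or 0 when i is
// past the end, which models zero padding of the shorter input.
func sampleAt(buf []byte, i int) int32 {
	if 2*i+1 >= len(buf) {
		return 0
	}
	return int32(int16(binary.LittleEndian.Uint16(buf[2*i:])))
}

// mixSketch averages two mono buffers sample by sample. The result has
// max(len(a), len(b)) bytes.
func mixSketch(a, b []byte) []byte {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]byte, n)
	for i := 0; i < n/2; i++ {
		avg := (sampleAt(a, i) + sampleAt(b, i)) / 2
		binary.LittleEndian.PutUint16(out[2*i:], uint16(int16(avg)))
	}
	return out
}
```

Averaging keeps the result inside the int16 range without clipping, at the cost of halving each source's level.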

func PCM16MonoNumFrames

func PCM16MonoNumFrames(bytes []byte) int

PCM16MonoNumFrames calculates the number of audio frames in a PCM16 mono buffer. Since each frame is 16 bits (2 bytes), the number of frames is total bytes divided by 2.

func Resample16Mono

func Resample16Mono(in []byte, inRate, outRate int, out []byte) []byte

Resample16Mono converts 16-bit mono PCM from inRate to outRate using linear interpolation. It performs a sample rate conversion to match the target service's requirements. in and out can be the same slice if inRate == outRate (no-op). Otherwise out must have capacity for the resampled length: len(out) >= len(in) * outRate / inRate (rounded up).
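To make the linear interpolation and the output-length formula concrete, here is a sketch. `resampleLinear` is a hypothetical, allocating re-implementation; its handling of the last sample (holding the final input value) is an assumption, and the package may treat edges differently.

```go
package main

import (
	"encoding/binary"
	"math"
)

// resampleLinear converts 16-bit mono PCM from inRate to outRate by linear
// interpolation between the two nearest input samples.
func resampleLinear(in []byte, inRate, outRate int) []byte {
	n := len(in) / 2
	if n == 0 || inRate <= 0 || outRate <= 0 {
		return nil
	}
	m := (n*outRate + inRate - 1) / inRate // n * outRate / inRate, rounded up
	out := make([]byte, 2*m)
	for i := 0; i < m; i++ {
		pos := float64(i) * float64(inRate) / float64(outRate) // position in input samples
		idx := int(pos)
		if idx > n-1 {
			idx = n - 1
		}
		next := idx + 1
		if next > n-1 {
			next = n - 1 // hold the last sample at the end of the buffer
		}
		frac := pos - float64(idx)
		s0 := float64(int16(binary.LittleEndian.Uint16(in[2*idx:])))
		s1 := float64(int16(binary.LittleEndian.Uint16(in[2*next:])))
		v := s0 + (s1-s0)*frac
		binary.LittleEndian.PutUint16(out[2*i:], uint16(int16(math.Round(v))))
	}
	return out
}
```

Linear interpolation is cheap but applies no low-pass filter, so downsampling content with energy above the new Nyquist frequency can alias.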

func Resample16MonoAlloc

func Resample16MonoAlloc(in []byte, inRate, outRate int) []byte

Resample16MonoAlloc returns a new slice with 16-bit mono PCM resampled from inRate to outRate. It is a convenience wrapper around Resample16Mono that handles allocation.

func WritePCM16MonoWAV

func WritePCM16MonoWAV(path string, pcm []byte, sampleRate int) error

WritePCM16MonoWAV writes 16-bit mono PCM to a WAV file.
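For orientation, a canonical 16-bit mono PCM WAV file is a 44-byte header followed by the samples. The sketch below builds that layout in memory; `buildWAV` is a hypothetical helper, and the package's actual writer may differ in details.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// buildWAV returns a complete WAV file image: a 44-byte RIFF header for
// 16-bit mono PCM followed by the samples themselves.
func buildWAV(pcm []byte, sampleRate int) []byte {
	var buf bytes.Buffer
	put16 := func(v uint16) {
		var b [2]byte
		binary.LittleEndian.PutUint16(b[:], v)
		buf.Write(b[:])
	}
	put32 := func(v uint32) {
		var b [4]byte
		binary.LittleEndian.PutUint32(b[:], v)
		buf.Write(b[:])
	}
	buf.WriteString("RIFF")
	put32(uint32(36 + len(pcm))) // file size minus the 8-byte RIFF preamble
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	put32(16)                     // fmt chunk size for plain PCM
	put16(1)                      // format tag 1 = linear PCM
	put16(1)                      // channels: mono
	put32(uint32(sampleRate))     // sample rate in Hz
	put32(uint32(sampleRate * 2)) // byte rate = rate * channels * 2 bytes
	put16(2)                      // block align = channels * 2 bytes
	put16(16)                     // bits per sample
	buf.WriteString("data")
	put32(uint32(len(pcm)))
	buf.Write(pcm)
	return buf.Bytes()
}
```

Writing the result to disk is then a single call such as `os.WriteFile(path, buildWAV(pcm, 16000), 0o644)`.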

Types

type Frame

type Frame struct {
	Data        []byte
	SampleRate  int
	NumChannels int
	Timestamp   time.Time
}

Frame represents a chunk of PCM audio in the Voxray system. It is a lightweight helper around raw bytes, separate from pipeline frames. Audio is expected to be 16-bit PCM little-endian.

type Stream

type Stream interface {
	Read() (Frame, error)
	Write(Frame) error
	Close() error
}

Stream represents a bidirectional audio source or sink. Implementations may wrap microphones, speakers, files, or network streams.

Directories

Path	Synopsis
filters
interruptions
mixers
turn	Package turn provides end-of-turn detection for audio conversations (base turn analyzer + silence-based smart turn).
vad