Documentation ¶
Overview ¶
Package audio is the platform-neutral audio I/O kernel. Pure-Go pieces (sample-rate conversion, ring buffers, WAV framing) live in this package directly; OS-specific capture/playback backends (malgo/WASAPI on Windows, oto/v3 on Linux, etc.) live in build-tag gated files alongside.
Audit 2026-05-24 maintainability sweep.
Audio playback via ebitengine/oto requires cgo only on Linux (ALSA/PulseAudio); the Windows and Darwin backends are pure Go via purego. The build tag lets plain-Go cross-compiles for Linux skip this file, which matters when developer machines lack a Linux C toolchain. Production server builds (Dockerfile.server) enable cgo, so this file is compiled in. The server target never plays audio locally, but transitive imports from internal/stt must stay safe.
Streaming audio player — see player.go for the rationale behind the build tag.
Index ¶
- Constants
- Variables
- func Get() []byte
- func PCMDurationSecs(pcm []byte) float64
- func PCMLevel(pcm []byte) float64
- func PCMToWAV(pcm []byte) []byte
- func Put(buf []byte)
- func RegisterBackend(name Backend, factory Factory) error
- type Backend
- type Capturer
- type Config
- type DeviceInfo
- type Event
- type EventType
- type Factory
- type FramePool
- type Player
- type PooledPCMHandler
- type Session
- type Stats
- type StreamPlayer
Constants ¶
const ( SampleRate = 16000 Channels = 1 BitsPerSample = 16 BytesPerSample = BitsPerSample / 8 )
const DefaultFrameCapacity = 4096
DefaultFrameCapacity is the per-buffer capacity returned by the package-level frame pool. It is sized for the worst-case malgo callback chunk observed in production: 16 kHz mono S16 at a 32 ms FrameSizeMs is 512 samples, or 1024 bytes. The constant is four times that figure, which leaves a generous safety margin.
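The 1024-byte figure follows directly from the package constants. A minimal sketch of the arithmetic, where `frameBytes` is a hypothetical helper and not part of the package API:

```go
package main

import "fmt"

// These values mirror the documented package constants.
const (
	sampleRate     = 16000
	channels       = 1
	bitsPerSample  = 16
	bytesPerSample = bitsPerSample / 8
)

// frameBytes returns the size in bytes of one PCM frame that lasts
// frameMs milliseconds. Hypothetical helper for illustration only.
func frameBytes(frameMs int) int {
	samples := sampleRate * frameMs / 1000
	return samples * channels * bytesPerSample
}

func main() {
	fmt.Println(frameBytes(32)) // 1024
}
```

A 4096-byte buffer therefore holds four such frames before an append has to grow it.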
Variables ¶
var ( ErrUnsupportedBackend = errors.New("unsupported audio backend") )
Functions ¶
func Get ¶ added in v0.40.1
func Get() []byte
Get returns a recyclable buffer from the default pool.
func PCMDurationSecs ¶
func PCMDurationSecs(pcm []byte) float64
PCMDurationSecs returns the duration of PCM audio in seconds.
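The package source is not shown here, so the following is only a sketch of how such a duration could be derived, assuming the input uses the package's fixed format (16 kHz, mono, 16-bit). The lowercase `pcmDurationSecs` is a hypothetical stand-in, not the real function:

```go
package main

import "fmt"

// These values mirror the documented package constants.
const (
	sampleRate     = 16000
	channels       = 1
	bytesPerSample = 2
)

// pcmDurationSecs divides the byte length by the number of bytes
// produced per second of audio. Assumption: the PCM is in the
// package's fixed 16 kHz mono S16 format.
func pcmDurationSecs(pcm []byte) float64 {
	bytesPerSecond := sampleRate * channels * bytesPerSample
	return float64(len(pcm)) / float64(bytesPerSecond)
}

func main() {
	oneSecond := make([]byte, 32000)
	fmt.Println(pcmDurationSecs(oneSecond)) // 1
}
```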
func RegisterBackend ¶
func RegisterBackend(name Backend, factory Factory) error
Types ¶
type Capturer ¶
type Capturer = Session
Capturer is kept as an alias while the app migrates to the session terminology.
func NewCapturer ¶
func NewCapturerWithConfig ¶
type DeviceInfo ¶
type DeviceInfo struct {
ID string `json:"deviceId"`
Name string `json:"label"`
IsDefault bool `json:"isDefault"`
}
DeviceInfo describes a capture device that can be presented to the user.
func ListCaptureDevices ¶
func ListCaptureDevices(cfg Config) ([]DeviceInfo, error)
ListCaptureDevices returns the available microphone devices for the selected backend.
func ListOutputDevices ¶ added in v0.22.1
func ListOutputDevices(cfg Config) ([]DeviceInfo, error)
ListOutputDevices returns the available speaker devices for the selected backend.
type FramePool ¶ added in v0.40.1
type FramePool struct {
// Capacity is the cap() of fresh buffers returned by Get. Zero
// falls back to DefaultFrameCapacity at first use.
Capacity int
// contains filtered or unexported fields
}
FramePool returns recyclable byte slices for short-lived PCM frame buffers. The hot path it addresses is the malgo capture callback (internal/audio/capturer_windows_cgo.go) which used to allocate a fresh slice per callback (~33×/sec per active capture, ~3300×/sec at 100 concurrent server sessions).
Contract ¶
- Get returns a []byte with len(buf) == 0 and cap(buf) >= pool.Capacity (DefaultFrameCapacity if unset). Callers append to it; appending past cap will reallocate the way the language normally would.
- Put returns the buffer to the pool. After Put the caller MUST NOT read from or write to the slice — another goroutine may pick it up immediately.
- Put on a nil slice is a no-op.
- Buffers whose grown capacity exceeds 4× the configured Capacity are dropped on Put rather than retained, so pathological growth does not pin large allocations in the pool.
Concurrency ¶
FramePool wraps sync.Pool — safe for concurrent Get / Put from many goroutines. The pool may discard entries at any GC cycle; callers MUST treat Get as may-allocate-fresh.
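The contract above can be sketched as a thin wrapper over sync.Pool. This is an illustration under stated assumptions, not the package's implementation: `framePool` and `frameCap` are hypothetical names, and dropping undersized buffers on Put is an extra assumption made so that the capacity guarantee of Get holds.

```go
package main

import (
	"fmt"
	"sync"
)

// defaultCapacity mirrors the documented DefaultFrameCapacity.
const defaultCapacity = 4096

// framePool is a hypothetical stand-in for FramePool; the real type's
// unexported fields are not shown in the documentation.
type framePool struct {
	capacity int
	pool     sync.Pool
}

// frameCap returns the configured capacity, or the default when unset.
func (p *framePool) frameCap() int {
	if p.capacity > 0 {
		return p.capacity
	}
	return defaultCapacity
}

// Get returns a zero-length buffer, allocating when the pool is empty.
func (p *framePool) Get() []byte {
	if v := p.pool.Get(); v != nil {
		return (*v.(*[]byte))[:0]
	}
	return make([]byte, 0, p.frameCap())
}

// Put recycles buf unless it is nil, too small (assumption), or has
// grown past four times the configured capacity.
func (p *framePool) Put(buf []byte) {
	c := cap(buf)
	if c < p.frameCap() || c > 4*p.frameCap() {
		return
	}
	buf = buf[:0]
	p.pool.Put(&buf)
}

func main() {
	var p framePool
	buf := p.Get()
	buf = append(buf, 1, 2, 3)
	fmt.Println(len(buf), cap(buf) >= defaultCapacity) // 3 true
	p.Put(buf) // buf must not be touched after this point
}
```

Storing a pointer to the slice rather than the slice itself is a common way to reduce boxing allocations when placing slices in a sync.Pool.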
Observability ¶
HitRatio() returns a coarse hit/miss ratio over the pool's lifetime. It is intended for occasional sanity checks and metric exports; the underlying counters use atomic adds so HitRatio is safe to call concurrently with Get and Put.
var DefaultFramePool FramePool
DefaultFramePool is the package-level pool. Most callers should use the package functions Get / Put rather than constructing their own pool. Tests that need isolation construct their own FramePool.
func (*FramePool) Get ¶ added in v0.40.1
Get returns a recyclable buffer with len 0 and capacity at least p.Capacity (or DefaultFrameCapacity when unset).
func (*FramePool) HitRatio ¶ added in v0.40.1
HitRatio returns the fraction of Get calls served from the cache (1.0 = always hit, 0.0 = always allocated fresh). Returns 0 when no operations have happened yet. The return is a snapshot — the counters may advance immediately after the call.
Concretely: HitRatio = (gets - misses) / gets, where misses is incremented inside sync.Pool.New (i.e. every fresh allocation counts as a miss).
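One way to keep such counters is sketched below. It assumes Go 1.19 or later for `atomic.Uint64`; `countingPool` and `newCountingPool` are hypothetical names and the real field layout is not documented.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countingPool is a hypothetical pool that tracks gets and misses.
type countingPool struct {
	gets   atomic.Uint64
	misses atomic.Uint64
	pool   sync.Pool
}

// newCountingPool wires sync.Pool.New so that every fresh allocation
// is counted as a miss.
func newCountingPool(capacity int) *countingPool {
	p := &countingPool{}
	p.pool.New = func() any {
		p.misses.Add(1)
		b := make([]byte, 0, capacity)
		return &b
	}
	return p
}

// Get counts the call, then takes a buffer from the pool.
func (p *countingPool) Get() []byte {
	p.gets.Add(1)
	return (*p.pool.Get().(*[]byte))[:0]
}

// HitRatio computes (gets - misses) / gets from a snapshot.
func (p *countingPool) HitRatio() float64 {
	gets := p.gets.Load()
	if gets == 0 {
		return 0
	}
	misses := p.misses.Load()
	if misses > gets {
		// The two loads are not atomic together; clamp so a racing
		// Get cannot push the ratio below zero.
		misses = gets
	}
	return float64(gets-misses) / float64(gets)
}

func main() {
	p := newCountingPool(4096)
	fmt.Println(p.HitRatio()) // 0: no operations yet
	_ = p.Get()
	fmt.Println(p.HitRatio()) // 0: the first Get always allocates
}
```

Later values depend on whether sync.Pool still holds a returned buffer, so they are not deterministic.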
type Player ¶
type Player struct {
// contains filtered or unexported fields
}
Player plays audio through the system's default output device.
func NewPlayer ¶
NewPlayer creates an audio player for TTS output. Call once at app startup; reuse for all playback.
func (*Player) Close ¶
func (p *Player) Close()
Close releases audio resources. Call on app shutdown.
func (*Player) OnFinished ¶
func (p *Player) OnFinished(fn func())
OnFinished sets a callback that fires when playback completes naturally (not when stopped via Stop()).
func (*Player) PlayMP3 ¶
PlayMP3 decodes and plays MP3 audio data. Blocks until playback completes or Stop() is called.
func (*Player) PlayPCM ¶
PlayPCM plays raw PCM audio (16-bit signed integer, little-endian, mono). IMPORTANT: the oto context is initialized at 24 kHz, so audio at any other sample rate plays at the wrong pitch and speed. Callers must resample to 24 kHz before calling this method, or use PlayMP3, which handles decoding.
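Because the package's capture SampleRate constant is 16000, captured audio would need resampling before it reaches PlayPCM. The sketch below uses plain linear interpolation as a stand-in; the package's own sample-rate conversion is not shown in this documentation and may differ. `resampleLinear` is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// resampleLinear converts 16-bit little-endian mono PCM from srcRate
// to dstRate by linear interpolation between neighbouring samples.
// It applies no filtering, so it is a rough sketch, not a
// production-quality resampler.
func resampleLinear(pcm []byte, srcRate, dstRate int) []byte {
	n := len(pcm) / 2
	if n == 0 || srcRate <= 0 || dstRate <= 0 {
		return nil
	}
	in := make([]int16, n)
	for i := range in {
		in[i] = int16(binary.LittleEndian.Uint16(pcm[2*i:]))
	}
	outLen := n * dstRate / srcRate
	out := make([]byte, 2*outLen)
	for j := 0; j < outLen; j++ {
		// Position of output sample j on the input time axis.
		pos := float64(j) * float64(srcRate) / float64(dstRate)
		i := int(pos)
		frac := pos - float64(i)
		a := float64(in[i])
		b := a // hold the last sample at the end of the input
		if i+1 < n {
			b = float64(in[i+1])
		}
		s := int16(a + (b-a)*frac)
		binary.LittleEndian.PutUint16(out[2*j:], uint16(s))
	}
	return out
}

func main() {
	frame := make([]byte, 640) // 20 ms at 16 kHz mono S16
	out := resampleLinear(frame, 16000, 24000)
	fmt.Println(len(out)) // 960
}
```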
type PooledPCMHandler ¶ added in v0.40.1
type PooledPCMHandler func(buf []byte, release func())
PooledPCMHandler receives one captured PCM frame with explicit buffer-ownership semantics. The release closure MUST be invoked exactly once when the handler is done with buf — either before returning, or asynchronously once any retained reference is released. The buffer MUST NOT be read or written after release.
See internal/audio.FramePool for the underlying lifecycle. The optimisation only matters for sustained capture (~33 callbacks/sec per session); short-lived recording paths can stay on the legacy SetPCMHandler API without ceremony.
type Session ¶
type Session interface {
Start() error
Stop() ([]byte, error)
IsRunning() bool
Events() <-chan Event
SetLevelHandler(func(float64))
SetPCMHandler(func([]byte))
// SetPooledPCMHandler installs the pool-aware variant of the PCM
// callback. When set (non-nil), the capture backend leases the
// per-frame buffer from internal/audio's package-level FramePool
// instead of allocating fresh, and invokes the handler with a
// release closure. The handler MUST call release exactly once:
// either before returning or, if it retains the slice, once it
// has finished with it. Forgetting to release leaks one pool slot
// per frame but does not corrupt data.
//
// When both SetPCMHandler and SetPooledPCMHandler are set, the
// pool-aware variant wins — the legacy handler is not invoked
// for that frame, so callers that adopt the pooled API should
// also unset the legacy one to avoid surprise.
//
// Backends not yet wired to honour the pool MAY no-op this
// setter; the legacy SetPCMHandler path remains the canonical
// contract for all existing callers.
SetPooledPCMHandler(PooledPCMHandler)
Close() error
}
Session records microphone PCM and exposes both level and live-audio callbacks.
type Stats ¶ added in v0.40.1
Stats are the raw lifetime counters. Useful for OTel meter callbacks that prefer integer reports over a derived ratio.
type StreamPlayer ¶ added in v0.18.0
type StreamPlayer struct {
// contains filtered or unexported fields
}
StreamPlayer plays a continuous stream of PCM audio chunks through a single playback backend. Unlike Player.PlayPCM which stops previous playback on each call, StreamPlayer buffers chunks and plays them sequentially. Designed for real-time voice agent audio output (Gemini Live, OpenAI Realtime).
func NewStreamPlayer ¶ added in v0.18.0
func NewStreamPlayer() (*StreamPlayer, error)
NewStreamPlayer creates a StreamPlayer using the system default output device.
func NewStreamPlayerWithOutputDevice ¶ added in v0.22.1
func NewStreamPlayerWithOutputDevice(outputDeviceID string) (*StreamPlayer, error)
NewStreamPlayerWithOutputDevice creates a StreamPlayer for the selected output device. An empty device ID uses the system default output device.
func (*StreamPlayer) Close ¶ added in v0.18.0
func (sp *StreamPlayer) Close()
func (*StreamPlayer) IsActive ¶ added in v0.18.0
func (sp *StreamPlayer) IsActive() bool
func (*StreamPlayer) Start ¶ added in v0.18.0
func (sp *StreamPlayer) Start(ctx context.Context)
func (*StreamPlayer) StopAndDrain ¶ added in v0.18.0
func (sp *StreamPlayer) StopAndDrain()
func (*StreamPlayer) WriteChunk ¶ added in v0.18.0
func (sp *StreamPlayer) WriteChunk(chunk []byte)