audio

package
v0.40.7 Latest
Warning

This package is not in the latest version of its module.

Published: May 28, 2026 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Overview

Package audio is the platform-neutral audio I/O kernel. Pure-Go pieces (sample-rate conversion, ring buffers, WAV framing) live in this package directly; OS-specific capture/playback backends (malgo/WASAPI on Windows, oto/v3 on Linux, etc.) live in build-tag gated files alongside.

Audit 2026-05-24 maintainability sweep.

Audio playback via ebitengine/oto requires cgo only on Linux (ALSA/PulseAudio); the Windows and Darwin backends are pure Go via purego. The build tag lets plain-Go cross-compiles for Linux skip this file, which matters when developer machines lack a Linux C toolchain. Production server builds (Dockerfile.server) enable cgo, so this file is compiled in there. Although the server target never plays audio locally, transitive imports from internal/stt must stay safe.

Streaming audio player — see player.go for the rationale behind the build tag.

Index

Constants

const (
	SampleRate     = 16000
	Channels       = 1
	BitsPerSample  = 16
	BytesPerSample = BitsPerSample / 8
)
const DefaultFrameCapacity = 4096

DefaultFrameCapacity is the per-buffer capacity returned by the package-level frame pool. It is sized for the worst-case malgo callback chunk we have observed in production: 16 kHz mono S16 at a 32 ms FrameSizeMs yields 1024 bytes, and 4096 bytes is four times that, a generous safety margin.
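The frame-size arithmetic can be checked with a few lines of Go. This is only a worked calculation; `frameBytes` is a hypothetical helper introduced for illustration and is not part of the package.

```go
package main

import "fmt"

// frameBytes is a hypothetical helper (not part of package audio).
// It returns the size in bytes of one PCM frame lasting frameMs
// milliseconds at the given sample format.
func frameBytes(sampleRate, channels, bitsPerSample, frameMs int) int {
	samples := sampleRate * frameMs / 1000 // samples per channel in one frame
	return samples * channels * (bitsPerSample / 8)
}

func main() {
	const defaultFrameCapacity = 4096 // mirrors DefaultFrameCapacity

	n := frameBytes(16000, 1, 16, 32)
	// Print the frame size and how many such frames fit in one pooled buffer.
	fmt.Println(n, defaultFrameCapacity/n) // prints "1024 4"
}
```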

Variables

var (
	ErrUnsupportedBackend = errors.New("unsupported audio backend")
	ErrBackendUnavailable = errors.New("audio backend unavailable in this build")
)

Functions

func Get added in v0.40.1

func Get() []byte

Get returns a recyclable buffer from the default pool.

func PCMDurationSecs

func PCMDurationSecs(pcm []byte) float64

PCMDurationSecs returns the duration of PCM audio in seconds.
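As a rough illustration, the duration of a PCM buffer follows from its length and the package constants. The sketch below assumes the input uses the package's fixed format (16 kHz, mono, 16-bit); `pcmDurationSecs` is a hypothetical stand-in, and the actual implementation may differ.

```go
package main

import "fmt"

const (
	sampleRate     = 16000 // mirrors SampleRate
	channels       = 1     // mirrors Channels
	bytesPerSample = 2     // mirrors BytesPerSample
)

// pcmDurationSecs is a hypothetical stand-in for PCMDurationSecs.
// It divides the byte count by the number of bytes per second of audio.
func pcmDurationSecs(pcm []byte) float64 {
	bytesPerSecond := sampleRate * channels * bytesPerSample
	return float64(len(pcm)) / float64(bytesPerSecond)
}

func main() {
	oneSecond := make([]byte, 32000) // 16000 samples * 2 bytes
	fmt.Println(pcmDurationSecs(oneSecond)) // prints "1"
}
```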

func PCMLevel

func PCMLevel(pcm []byte) float64

PCMLevel estimates a normalized RMS level from 16-bit PCM samples.
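One way to compute such a level is sketched below. It assumes little-endian signed samples and normalization by 32768 (full scale maps to roughly 1.0); both are assumptions, since the documentation does not state the byte order or the scaling that PCMLevel uses. `pcmLevel` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// pcmLevel is a hypothetical sketch of an RMS level estimate.
// Assumptions: samples are little-endian int16, and the result is
// normalized by dividing each sample by 32768.
func pcmLevel(pcm []byte) float64 {
	n := len(pcm) / 2 // number of complete 16-bit samples
	if n == 0 {
		return 0
	}
	var sumSquares float64
	for i := 0; i < n; i++ {
		s := int16(binary.LittleEndian.Uint16(pcm[2*i:]))
		v := float64(s) / 32768.0
		sumSquares += v * v
	}
	return math.Sqrt(sumSquares / float64(n))
}

func main() {
	// Four samples at half of full scale (16384 = 0x4000, little-endian).
	half := []byte{0x00, 0x40, 0x00, 0x40, 0x00, 0x40, 0x00, 0x40}
	fmt.Println(pcmLevel(half)) // prints "0.5"
}
```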

func PCMToWAV

func PCMToWAV(pcm []byte) []byte

PCMToWAV wraps raw 16kHz S16 Mono PCM data in a WAV header.
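For reference, a canonical 44-byte RIFF/WAVE header for this format can be built as follows. This is a minimal sketch of the standard layout, not the package's code; `pcmToWAV` is a hypothetical name, and the real function may emit additional chunks or handle edge cases differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcmToWAV is a hypothetical sketch: it prepends a canonical 44-byte
// RIFF/WAVE header describing 16 kHz, mono, 16-bit PCM.
func pcmToWAV(pcm []byte) []byte {
	const (
		sampleRate    = 16000
		channels      = 1
		bitsPerSample = 16
		headerSize    = 44
	)
	byteRate := sampleRate * channels * bitsPerSample / 8
	blockAlign := channels * bitsPerSample / 8

	out := make([]byte, headerSize, headerSize+len(pcm))
	le := binary.LittleEndian
	copy(out[0:4], "RIFF")
	le.PutUint32(out[4:8], uint32(36+len(pcm))) // file size minus 8 bytes
	copy(out[8:12], "WAVE")
	copy(out[12:16], "fmt ")
	le.PutUint32(out[16:20], 16) // size of the fmt chunk body
	le.PutUint16(out[20:22], 1)  // audio format 1 = linear PCM
	le.PutUint16(out[22:24], channels)
	le.PutUint32(out[24:28], sampleRate)
	le.PutUint32(out[28:32], uint32(byteRate))
	le.PutUint16(out[32:34], uint16(blockAlign))
	le.PutUint16(out[34:36], bitsPerSample)
	copy(out[36:40], "data")
	le.PutUint32(out[40:44], uint32(len(pcm)))
	return append(out, pcm...)
}

func main() {
	wav := pcmToWAV(make([]byte, 10))
	fmt.Println(len(wav), string(wav[0:4]), string(wav[8:12])) // prints "54 RIFF WAVE"
}
```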

func Put added in v0.40.1

func Put(buf []byte)

Put returns a buffer to the default pool.

func RegisterBackend

func RegisterBackend(name Backend, factory Factory) error

Types

type Backend

type Backend string
const (
	BackendAuto                Backend = "auto"
	BackendWindowsWASAPIMalgo  Backend = "windows-wasapi-malgo"
	BackendWindowsWASAPINative Backend = "windows-wasapi-native"
)

type Capturer

type Capturer = Session

Capturer is kept as an alias while the app migrates to the session terminology.

func NewCapturer

func NewCapturer() (Capturer, error)

func NewCapturerWithConfig

func NewCapturerWithConfig(cfg Config) (Capturer, error)

type Config

type Config struct {
	Backend     Backend
	DeviceID    string
	SampleRate  int
	Channels    int
	FrameSizeMs int
	LatencyHint string
}

type DeviceInfo

type DeviceInfo struct {
	ID        string `json:"deviceId"`
	Name      string `json:"label"`
	IsDefault bool   `json:"isDefault"`
}

DeviceInfo describes a capture device that can be presented to the user.
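The struct tags determine the JSON shape a UI would receive. The sketch below copies the struct definition locally so it runs on its own; the device values and the `encodeDevice` helper are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeviceInfo is copied from the package documentation so that this
// example is self-contained.
type DeviceInfo struct {
	ID        string `json:"deviceId"`
	Name      string `json:"label"`
	IsDefault bool   `json:"isDefault"`
}

// encodeDevice is a hypothetical helper that returns the JSON encoding.
func encodeDevice(d DeviceInfo) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// The ID and Name values here are invented sample data.
	s, err := encodeDevice(DeviceInfo{ID: "mic-0", Name: "Built-in Microphone", IsDefault: true})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
	// prints {"deviceId":"mic-0","label":"Built-in Microphone","isDefault":true}
}
```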

func ListCaptureDevices

func ListCaptureDevices(cfg Config) ([]DeviceInfo, error)

ListCaptureDevices returns the available microphone devices for the selected backend.

func ListOutputDevices added in v0.22.1

func ListOutputDevices(cfg Config) ([]DeviceInfo, error)

ListOutputDevices returns the available speaker devices for the selected backend.

type Event

type Event struct {
	Type    EventType
	Backend Backend
	Message string
	Err     error
}

type EventType

type EventType string
const (
	EventStarted EventType = "started"
	EventStopped EventType = "stopped"
	EventWarning EventType = "warning"
	EventError   EventType = "error"
)

type Factory

type Factory func(Config) (Session, error)

type FramePool added in v0.40.1

type FramePool struct {
	// Capacity is the cap() of fresh buffers returned by Get. Zero
	// falls back to DefaultFrameCapacity at first use.
	Capacity int
	// contains filtered or unexported fields
}

FramePool returns recyclable byte slices for short-lived PCM frame buffers. The hot path it addresses is the malgo capture callback (internal/audio/capturer_windows_cgo.go) which used to allocate a fresh slice per callback (~33×/sec per active capture, ~3300×/sec at 100 concurrent server sessions).

Contract

  • Get returns a []byte with len(buf) == 0 and cap(buf) >= pool.Capacity (DefaultFrameCapacity if unset). Callers append to it; appending past cap reallocates the way the language normally would.
  • Put returns the buffer to the pool. After Put the caller MUST NOT read from or write to the slice — another goroutine may pick it up immediately.
  • Put on a nil slice is a no-op.
  • Buffers whose grown capacity exceeds 4× the configured Capacity are dropped on Put rather than retained, so pathological growth does not pin large allocations in the pool.

Concurrency

FramePool wraps sync.Pool — safe for concurrent Get / Put from many goroutines. The pool may discard entries at any GC cycle; callers MUST treat Get as may-allocate-fresh.
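The contract above can be approximated with a small sync.Pool wrapper. This is a sketch under stated assumptions, not the package's implementation: `sketchPool` and its constructor are hypothetical names, and dropping undersized buffers on Put is an extra safeguard added here so that Get always honours its capacity guarantee.

```go
package main

import (
	"fmt"
	"sync"
)

const defaultCapacity = 4096 // mirrors DefaultFrameCapacity

// sketchPool is a hypothetical illustration of the FramePool contract.
type sketchPool struct {
	capacity int
	pool     sync.Pool
}

func newSketchPool(capacity int) *sketchPool {
	if capacity <= 0 {
		capacity = defaultCapacity
	}
	p := &sketchPool{capacity: capacity}
	p.pool.New = func() any {
		b := make([]byte, 0, capacity)
		return &b // store a pointer so Put does not allocate an interface box per slice
	}
	return p
}

// Get returns a buffer with len 0 and cap of at least p.capacity.
func (p *sketchPool) Get() []byte {
	bp := p.pool.Get().(*[]byte)
	return (*bp)[:0]
}

// Put recycles buf. Nil buffers and buffers that grew beyond 4x the
// configured capacity are dropped. Undersized buffers are also dropped
// (an assumption of this sketch).
func (p *sketchPool) Put(buf []byte) {
	if buf == nil || cap(buf) > 4*p.capacity || cap(buf) < p.capacity {
		return
	}
	buf = buf[:0]
	p.pool.Put(&buf)
}

func main() {
	p := newSketchPool(8)
	b := p.Get()
	fmt.Println(len(b), cap(b) >= 8) // prints "0 true"
	b = append(b, 1, 2, 3)
	p.Put(b) // b must not be used after this point
}
```

Because sync.Pool may discard entries at any time, the sketch never assumes that a buffer passed to Put comes back from a later Get.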

Observability

HitRatio() returns a coarse hit/miss ratio over the pool's lifetime. It is intended for occasional sanity checks and metric exports; the underlying counters use atomic adds so HitRatio is safe to call concurrently with Get and Put.

var DefaultFramePool FramePool

DefaultFramePool is the package-level pool. Most callers should use the package functions Get / Put rather than constructing their own pool. Tests that need isolation construct their own FramePool.

func (*FramePool) Get added in v0.40.1

func (p *FramePool) Get() []byte

Get returns a recyclable buffer with len 0 and capacity at least p.Capacity (or DefaultFrameCapacity when unset).

func (*FramePool) HitRatio added in v0.40.1

func (p *FramePool) HitRatio() float64

HitRatio returns the fraction of Get calls served from the cache (1.0 = always hit, 0.0 = always allocated fresh). Returns 0 when no operations have happened yet. The return is a snapshot — the counters may advance immediately after the call.

Concretely: HitRatio = (gets - misses) / gets, where misses is incremented inside sync.Pool.New (i.e. every fresh allocation counts as a miss).
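The formula translates directly into code. The sketch below works on a plain struct that mirrors the documented Stats fields; `hitRatio` is a hypothetical free function, and the clamp for misses exceeding gets is a defensive assumption for racy snapshots rather than documented behaviour.

```go
package main

import "fmt"

// stats mirrors the documented Stats struct.
type stats struct {
	Gets   uint64
	Misses uint64
}

// hitRatio is a hypothetical helper computing (gets - misses) / gets.
func hitRatio(s stats) float64 {
	if s.Gets == 0 {
		return 0 // no operations yet
	}
	if s.Misses > s.Gets {
		return 0 // defensive clamp; avoids unsigned underflow on an inconsistent snapshot
	}
	return float64(s.Gets-s.Misses) / float64(s.Gets)
}

func main() {
	fmt.Println(hitRatio(stats{Gets: 4, Misses: 1})) // prints "0.75"
	fmt.Println(hitRatio(stats{}))                   // prints "0"
}
```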

func (*FramePool) Put added in v0.40.1

func (p *FramePool) Put(buf []byte)

Put returns the buffer to the pool. nil buffers are silently dropped. Buffers whose capacity exceeds 4× the configured Capacity are dropped to prevent unbounded growth from pinning the pool.

func (*FramePool) Stats added in v0.40.1

func (p *FramePool) Stats() Stats

Stats returns the lifetime counters. Hits can be derived as stats.Gets - stats.Misses.

type Player

type Player struct {
	// contains filtered or unexported fields
}

Player plays audio through the system's default output device.

func NewPlayer

func NewPlayer() (*Player, error)

NewPlayer creates an audio player for TTS output. Call once at app startup; reuse for all playback.

func (*Player) Close

func (p *Player) Close()

Close releases audio resources. Call on app shutdown.

func (*Player) IsPlaying

func (p *Player) IsPlaying() bool

IsPlaying returns true if audio is currently being played.

func (*Player) OnFinished

func (p *Player) OnFinished(fn func())

OnFinished sets a callback that fires when playback completes naturally (not when stopped via Stop()).

func (*Player) PlayMP3

func (p *Player) PlayMP3(ctx context.Context, data []byte) error

PlayMP3 decodes and plays MP3 audio data. Blocks until playback completes or Stop() is called.

func (*Player) PlayPCM

func (p *Player) PlayPCM(ctx context.Context, data []byte, sampleRate int) error

PlayPCM plays raw PCM audio (16-bit signed int, little-endian, mono). IMPORTANT: The oto context is initialized at 24kHz. Audio with a different sample rate will play at the wrong pitch/speed. Callers must resample to 24kHz before calling this method, or use PlayMP3 which handles decoding.
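Callers holding 16 kHz audio therefore need a resampling step first. The following is a minimal linear-interpolation sketch, assuming samples have already been decoded to int16. `resampleLinear` is a hypothetical helper, and the package's own sample-rate conversion may use a different, higher-quality method.

```go
package main

import (
	"fmt"
	"math"
)

// resampleLinear is a hypothetical helper that converts mono int16
// samples from fromRate to toRate using linear interpolation. It is a
// rough sketch: it applies no filtering.
func resampleLinear(in []int16, fromRate, toRate int) []int16 {
	if len(in) == 0 || fromRate <= 0 || toRate <= 0 {
		return nil
	}
	outLen := len(in) * toRate / fromRate
	out := make([]int16, outLen)
	for i := range out {
		// Position of output sample i measured in input samples.
		pos := float64(i) * float64(fromRate) / float64(toRate)
		j := int(pos)
		if j >= len(in) {
			j = len(in) - 1 // guard against rounding at the very end
		}
		frac := pos - float64(j)
		a := float64(in[j])
		b := a // hold the last sample when there is no right neighbour
		if j+1 < len(in) {
			b = float64(in[j+1])
		}
		out[i] = int16(math.Round(a + (b-a)*frac))
	}
	return out
}

func main() {
	in := []int16{0, 300, 600, 900}
	out := resampleLinear(in, 16000, 24000)
	fmt.Println(len(out)) // prints "6": 4 samples * 24000 / 16000
}
```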

func (*Player) Stop

func (p *Player) Stop()

Stop immediately stops current playback (for barge-in support).

type PooledPCMHandler added in v0.40.1

type PooledPCMHandler func(buf []byte, release func())

PooledPCMHandler receives one captured PCM frame with explicit buffer-ownership semantics. The release closure MUST be invoked exactly once when the handler is done with buf — either before returning, or asynchronously once any retained reference is released. The buffer MUST NOT be read or written after release.

See internal/audio.FramePool for the underlying lifecycle. The optimisation only matters for sustained capture (~33 callbacks/sec per session); short-lived recording paths can stay on the legacy SetPCMHandler API without ceremony.
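The two permitted release patterns, synchronous and asynchronous, might look like the sketch below. The handler type is redeclared locally so the example runs on its own; the `frame` type, the constructors, and `demo` are hypothetical, and the release closure here is a simple counter rather than a real pool return.

```go
package main

import (
	"fmt"
	"sync"
)

// PooledPCMHandler is redeclared from the package documentation.
type PooledPCMHandler func(buf []byte, release func())

// frame pairs a leased buffer with its release closure (hypothetical).
type frame struct {
	buf     []byte
	release func()
}

// newSyncHandler finishes with buf before returning, so it can defer release.
func newSyncHandler(total *int) PooledPCMHandler {
	return func(buf []byte, release func()) {
		defer release()
		*total += len(buf)
	}
}

// newAsyncHandler hands ownership to a worker, which calls release later.
func newAsyncHandler(queue chan<- frame) PooledPCMHandler {
	return func(buf []byte, release func()) {
		queue <- frame{buf: buf, release: release}
	}
}

// demo runs one frame through each pattern and reports the number of
// bytes seen and the number of release calls.
func demo() (total, releases int) {
	release := func() { releases++ }

	newSyncHandler(&total)([]byte{1, 2, 3, 4}, release)

	queue := make(chan frame, 1)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for f := range queue {
			total += len(f.buf)
			f.release() // exactly once, after the last use of f.buf
		}
	}()
	newAsyncHandler(queue)([]byte{5, 6}, release)
	close(queue)
	wg.Wait()
	return total, releases
}

func main() {
	fmt.Println(demo()) // prints "6 2"
}
```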

type Session

type Session interface {
	Start() error
	Stop() ([]byte, error)
	IsRunning() bool
	Events() <-chan Event
	SetLevelHandler(func(float64))
	SetPCMHandler(func([]byte))
	// SetPooledPCMHandler installs the pool-aware variant of the PCM
	// callback. When set (non-nil), the capture backend leases the
	// per-frame buffer from internal/audio's package-level FramePool
	// instead of allocating fresh, and invokes the handler with a
	// release closure. The handler MUST call release exactly once:
	// either before returning, or later, once it no longer holds any
	// reference to the slice. Forgetting to release leaks one pool
	// slot per frame but does not corrupt data.
	//
	// When both SetPCMHandler and SetPooledPCMHandler are set, the
	// pool-aware variant wins — the legacy handler is not invoked
	// for that frame, so callers that adopt the pooled API should
	// also unset the legacy one to avoid surprise.
	//
	// Backends not yet wired to honour the pool MAY no-op this
	// setter; the legacy SetPCMHandler path remains the canonical
	// contract for all existing callers.
	SetPooledPCMHandler(PooledPCMHandler)
	Close() error
}

Session records microphone PCM and exposes both level and live-audio callbacks.

func Open

func Open(cfg Config) (Session, error)

type Stats added in v0.40.1

type Stats struct {
	Gets   uint64
	Misses uint64
}

Stats are the raw lifetime counters. Useful for OTel meter callbacks that prefer integer reports over a derived ratio.

type StreamPlayer added in v0.18.0

type StreamPlayer struct {
	// contains filtered or unexported fields
}

StreamPlayer plays a continuous stream of PCM audio chunks through a single playback backend. Unlike Player.PlayPCM, which stops previous playback on each call, StreamPlayer buffers chunks and plays them sequentially. It is designed for real-time voice agent audio output (Gemini Live, OpenAI Realtime).
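The buffer-then-play-in-order idea can be illustrated with a channel feeding a single writer goroutine. This is a conceptual sketch only: `chunkQueue`, `CloseAndWait`, and `playAll` are hypothetical, the sink is a bytes.Buffer instead of an audio device, and StreamPlayer's internals are not documented here and may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// chunkQueue is a hypothetical illustration of sequential chunk playback.
type chunkQueue struct {
	chunks chan []byte
	done   chan struct{}
}

// newChunkQueue starts one goroutine that writes queued chunks to sink
// in arrival order.
func newChunkQueue(sink io.Writer, depth int) *chunkQueue {
	q := &chunkQueue{
		chunks: make(chan []byte, depth),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(q.done)
		for c := range q.chunks {
			_, _ = sink.Write(c) // write errors are ignored in this sketch
		}
	}()
	return q
}

// WriteChunk copies the chunk and enqueues it, so the caller may reuse
// its own buffer right away.
func (q *chunkQueue) WriteChunk(chunk []byte) {
	q.chunks <- append([]byte(nil), chunk...)
}

// CloseAndWait stops accepting chunks and waits for queued ones to be written.
func (q *chunkQueue) CloseAndWait() {
	close(q.chunks)
	<-q.done
}

// playAll pushes the chunks through a queue and returns what the sink received.
func playAll(chunks ...[]byte) string {
	var sink bytes.Buffer
	q := newChunkQueue(&sink, 4)
	for _, c := range chunks {
		q.WriteChunk(c)
	}
	q.CloseAndWait()
	return sink.String()
}

func main() {
	fmt.Println(playAll([]byte("ab"), []byte("cd"))) // prints "abcd"
}
```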

func NewStreamPlayer added in v0.18.0

func NewStreamPlayer() (*StreamPlayer, error)

NewStreamPlayer creates a StreamPlayer using the system default output device.

func NewStreamPlayerWithOutputDevice added in v0.22.1

func NewStreamPlayerWithOutputDevice(outputDeviceID string) (*StreamPlayer, error)

NewStreamPlayerWithOutputDevice creates a StreamPlayer for the selected output device. An empty device ID uses the system default output device.

func (*StreamPlayer) Close added in v0.18.0

func (sp *StreamPlayer) Close()

func (*StreamPlayer) IsActive added in v0.18.0

func (sp *StreamPlayer) IsActive() bool

func (*StreamPlayer) Start added in v0.18.0

func (sp *StreamPlayer) Start(ctx context.Context)

func (*StreamPlayer) StopAndDrain added in v0.18.0

func (sp *StreamPlayer) StopAndDrain()

func (*StreamPlayer) WriteChunk added in v0.18.0

func (sp *StreamPlayer) WriteChunk(chunk []byte)
