Documentation
¶
Overview ¶
Package segmenter provides audio segmentation functionality.
The segmenter reads audio from any supported format (using FFmpeg) and outputs fixed-duration chunks of raw PCM samples. It supports:
- Fixed segment sizes (e.g., 30 seconds)
- Silence-based segmentation (break on silence boundaries)
- Sample rate conversion
- Mono output (single channel)
Use cases include:
- Speech-to-text preprocessing (splitting audio into segments)
- Audio analysis (processing chunks at a time)
- Streaming audio processing
Index ¶
Constants ¶
const ( DefaultSilenceThreshold = 0.01 // Default silence threshold (RMS) DefaultSilenceDuration = time.Millisecond * 500 // Default silence duration MinSegmentDuration = time.Millisecond * 100 // Minimum segment/silence duration )
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Opt ¶ added in v1.7.6
type Opt func(*opts) error
Opt is a function which applies options to a Segmenter
func WithDefaultSilence ¶ added in v1.8.0
func WithDefaultSilence() Opt
WithDefaultSilence enables silence detection with default threshold and duration. Uses threshold of 0.01 (1% RMS) and silence duration of 500ms.
func WithFFmpegOpt ¶ added in v1.8.0
WithFFmpegOpt wraps an ffmpeg.Opt as a segmenter.Opt for use with segmenter.NewFromReader
func WithSegmentSize ¶ added in v1.7.7
WithSegmentSize sets the target segment duration. Segments will be output when they reach approximately this duration. Minimum is 100ms.
func WithSilenceSize ¶ added in v1.7.7
WithSilenceSize sets the silence duration that triggers a segment boundary. Only takes effect when silence detection is enabled. Minimum is 100ms.
func WithSilenceThreshold ¶ added in v1.8.0
WithSilenceThreshold enables silence detection with a custom threshold. The threshold is the RMS energy level (0.0-1.0) below which audio is considered silence. A typical value is 0.01 (1%).
type SegmentFuncFloat32 ¶
SegmentFuncFloat32 is a callback function which is called for each segment of audio samples. The first argument is the timestamp of the segment start. Return nil to continue, io.EOF to stop early, or any other error to abort.
type SegmentFuncInt16 ¶
SegmentFuncInt16 is a callback function which is called for each segment of audio samples. The first argument is the timestamp of the segment start. Return nil to continue, io.EOF to stop early, or any other error to abort.
type Segmenter ¶
type Segmenter struct {
// contains filtered or unexported fields
}
Segmenter reads audio samples from a reader and segments them into fixed-size chunks. It can be used to process audio samples in chunks for speech recognition, audio analysis, etc.
func New ¶ added in v1.8.0
New creates a new segmenter from a file path or URL. The sampleRate is the output sample rate in Hz (e.g., 16000 for speech). The output is always mono (single channel).
Options:
- WithSegmentSize(duration) - output segments of approximately this duration
- WithDefaultSilence() - also break segments on silence boundaries
- WithSilenceThreshold(threshold) - custom silence threshold (0.0-1.0)
- WithSilenceSize(duration) - minimum silence duration to trigger a break
func NewFromReader ¶ added in v1.8.0
NewFromReader creates a new segmenter from an io.Reader. See New for parameter documentation.
func (*Segmenter) DecodeFloat32 ¶
func (s *Segmenter) DecodeFloat32(ctx context.Context, fn SegmentFuncFloat32) error
DecodeFloat32 decodes the audio stream into float32 samples and calls the callback function for each segment. Samples are in the range [-1.0, 1.0].
The "best" audio stream is automatically selected. Output is mono at the configured sample rate.
func (*Segmenter) DecodeInt16 ¶
func (s *Segmenter) DecodeInt16(ctx context.Context, fn SegmentFuncInt16) error
DecodeInt16 decodes the audio stream into int16 samples and calls the callback function for each segment. Samples are in the range [-32768, 32767].
The "best" audio stream is automatically selected. Output is mono at the configured sample rate.
func (*Segmenter) Duration ¶
Duration returns the total duration of the media stream. Returns zero if duration is unknown.
func (*Segmenter) SampleRate ¶ added in v1.8.0
SampleRate returns the configured output sample rate.