segmenter

package
v1.8.2
Published: Jan 7, 2026 License: Apache-2.0 Imports: 7 Imported by: 2

Documentation

Overview

Package segmenter provides audio segmentation functionality.

The segmenter reads audio from any supported format (using FFmpeg) and outputs fixed-duration chunks of raw PCM samples. It supports:

  • Fixed segment sizes (e.g., 30 seconds)
  • Silence-based segmentation (break on silence boundaries)
  • Sample rate conversion
  • Mono output (single channel)

Use cases include:

  • Speech-to-text preprocessing (splitting audio into segments)
  • Audio analysis (processing one chunk at a time)
  • Streaming audio processing

Constants

const (
	DefaultSilenceThreshold = 0.01                   // Default silence threshold (RMS)
	DefaultSilenceDuration  = time.Millisecond * 500 // Default silence duration
	MinSegmentDuration      = time.Millisecond * 100 // Minimum segment/silence duration
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Opt added in v1.7.6

type Opt func(*opts) error

Opt is a function that applies an option to a Segmenter.
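The Opt signature follows the functional-options pattern. As a minimal sketch of how such options could be applied, the following uses a hypothetical `settings` struct in place of the package's unexported `opts`, and a hypothetical `withSegmentSize` option. Rejecting values below the 100ms minimum with an error is an assumption; the package might clamp instead.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// settings is a hypothetical stand-in for the package's unexported opts struct.
type settings struct {
	segmentSize time.Duration
}

// option mirrors the shape of Opt: a function that mutates settings or fails.
type option func(*settings) error

// minDuration mirrors the documented MinSegmentDuration (100ms).
const minDuration = 100 * time.Millisecond

// withSegmentSize is a hypothetical option. Returning an error for values
// below the minimum is an assumption, not confirmed package behaviour.
func withSegmentSize(v time.Duration) option {
	return func(s *settings) error {
		if v < minDuration {
			return errors.New("segment size below minimum")
		}
		s.segmentSize = v
		return nil
	}
}

// apply runs each option in order and stops at the first error.
func apply(opts ...option) (*settings, error) {
	s := &settings{}
	for _, opt := range opts {
		if err := opt(s); err != nil {
			return nil, err
		}
	}
	return s, nil
}

func main() {
	s, err := apply(withSegmentSize(30 * time.Second))
	fmt.Println(s.segmentSize, err)
	_, err = apply(withSegmentSize(10 * time.Millisecond))
	fmt.Println(err)
}
```

Because each option returns an error, a constructor can validate its arguments up front and fail before any decoding starts.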

func WithDefaultSilence added in v1.8.0

func WithDefaultSilence() Opt

WithDefaultSilence enables silence detection with the default threshold and duration. It uses a threshold of 0.01 (1% RMS) and a silence duration of 500ms.

func WithFFmpegOpt added in v1.8.0

func WithFFmpegOpt(opt ffmpeg.Opt) Opt

WithFFmpegOpt wraps an ffmpeg.Opt as a segmenter.Opt for use with segmenter.NewFromReader.

func WithSegmentSize added in v1.7.7

func WithSegmentSize(v time.Duration) Opt

WithSegmentSize sets the target segment duration. Segments will be output when they reach approximately this duration. Minimum is 100ms.
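To make the relationship between segment duration and buffer size concrete, here is a small sketch of the arithmetic for mono output. The helper names `samplesPerSegment` and `timestampAt` are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"time"
)

// samplesPerSegment converts a target duration into a mono sample count.
func samplesPerSegment(d time.Duration, sampleRate int) int {
	return int(int64(d) * int64(sampleRate) / int64(time.Second))
}

// timestampAt converts a mono sample offset back into a stream timestamp.
func timestampAt(offset, sampleRate int) time.Duration {
	return time.Duration(int64(offset) * int64(time.Second) / int64(sampleRate))
}

func main() {
	n := samplesPerSegment(30*time.Second, 16000)
	fmt.Println(n)                     // 480000
	fmt.Println(timestampAt(n, 16000)) // 30s
}
```

A 30-second segment at 16000 Hz therefore holds 480000 samples, which is about 1.9 MB as float32 or 0.96 MB as int16.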

func WithSilenceSize added in v1.7.7

func WithSilenceSize(v time.Duration) Opt

WithSilenceSize sets the silence duration that triggers a segment boundary. Only takes effect when silence detection is enabled. Minimum is 100ms.
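One way to picture the silence duration is as a run-length check over consecutive analysis windows. The sketch below is an illustration only: the window size, the `firstBoundary` helper, and the choice to report the window that completes the run are all assumptions, since the package's internal detection logic is not documented here.

```go
package main

import (
	"fmt"
	"time"
)

// firstBoundary returns the index of the window that completes the first run
// of silence lasting at least minSilence, or -1 if there is none.
// silent[i] reports whether window i was classified as silence.
func firstBoundary(silent []bool, window, minSilence time.Duration) int {
	var run time.Duration
	for i, s := range silent {
		if !s {
			run = 0 // any non-silent window resets the run
			continue
		}
		run += window
		if run >= minSilence {
			return i
		}
	}
	return -1
}

func main() {
	// Two silent windows (too short), then five in a row (long enough).
	silent := []bool{false, true, true, false, true, true, true, true, true}
	fmt.Println(firstBoundary(silent, 100*time.Millisecond, 500*time.Millisecond))
}
```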

func WithSilenceThreshold added in v1.8.0

func WithSilenceThreshold(threshold float64) Opt

WithSilenceThreshold enables silence detection with a custom threshold. The threshold is the RMS energy level (0.0-1.0) below which audio is considered silence. A typical value is 0.01 (1%).
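For readers unfamiliar with RMS thresholds, the following sketch computes the root-mean-square level of a window of float32 samples and compares it with 0.01. It uses the standard RMS formula; whether the package computes it over the same window sizes or in the same way is an assumption. The `rms` and `isSilent` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// silenceThreshold mirrors the documented DefaultSilenceThreshold (0.01 RMS).
const silenceThreshold = 0.01

// rms returns the root-mean-square level of samples in [-1.0, 1.0].
// An empty slice is reported as 0.
func rms(samples []float32) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(samples)))
}

// isSilent reports whether a window falls below the threshold.
func isSilent(samples []float32, threshold float64) bool {
	return rms(samples) < threshold
}

func main() {
	quiet := []float32{0.001, -0.002, 0.001, -0.001}
	loud := []float32{0.5, -0.5, 0.5, -0.5}
	fmt.Println(isSilent(quiet, silenceThreshold)) // true
	fmt.Println(isSilent(loud, silenceThreshold))  // false
}
```

Raising the threshold treats more low-level noise as silence, which produces more segment boundaries on noisy recordings.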

type SegmentFuncFloat32

type SegmentFuncFloat32 func(timestamp time.Duration, samples []float32) error

SegmentFuncFloat32 is a callback function which is called for each segment of audio samples. The first argument is the timestamp of the segment start. Return nil to continue, io.EOF to stop early, or any other error to abort.

type SegmentFuncInt16

type SegmentFuncInt16 func(timestamp time.Duration, samples []int16) error

SegmentFuncInt16 is a callback function which is called for each segment of audio samples. The first argument is the timestamp of the segment start. Return nil to continue, io.EOF to stop early, or any other error to abort.
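The return-value contract shared by both callback types (nil to continue, io.EOF to stop early, any other error to abort) can be sketched from the caller's side. The `deliver` loop below is a hypothetical driver, not the package's implementation; in particular, treating io.EOF as a clean stop that yields a nil result is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// segmentFunc mirrors the documented SegmentFuncFloat32 signature.
type segmentFunc func(timestamp time.Duration, samples []float32) error

// errAbort is an example of a non-EOF error that aborts processing.
var errAbort = errors.New("abort")

// deliver hands each segment to fn and honours the documented contract.
// Mapping io.EOF to a nil result is an assumption made for this sketch.
func deliver(segments [][]float32, sampleRate int, fn segmentFunc) error {
	offset := 0
	for _, seg := range segments {
		ts := time.Duration(offset) * time.Second / time.Duration(sampleRate)
		err := fn(ts, seg)
		if errors.Is(err, io.EOF) {
			return nil // caller asked to stop early
		}
		if err != nil {
			return err // any other error aborts
		}
		offset += len(seg)
	}
	return nil
}

// stopAfter returns a callback that counts calls and stops after n segments.
func stopAfter(n int, calls *int) segmentFunc {
	return func(time.Duration, []float32) error {
		*calls++
		if *calls >= n {
			return io.EOF
		}
		return nil
	}
}

// abort is a callback that always fails with errAbort.
func abort(time.Duration, []float32) error { return errAbort }

func main() {
	segs := [][]float32{{0, 0}, {0, 0}, {0, 0}}
	calls := 0
	err := deliver(segs, 2, stopAfter(2, &calls))
	fmt.Println(err, calls)
	fmt.Println(deliver(segs, 2, abort))
}
```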

type Segmenter

type Segmenter struct {
	// contains filtered or unexported fields
}

Segmenter reads audio samples from a reader and segments them into fixed-size chunks. It can be used to process audio samples in chunks for speech recognition, audio analysis, etc.

func New added in v1.8.0

func New(path string, sampleRate int, opts ...Opt) (*Segmenter, error)

New creates a new segmenter from a file path or URL. The sampleRate is the output sample rate in Hz (e.g., 16000 for speech). The output is always mono (single channel).

Options:

  • WithSegmentSize(duration) - output segments of approximately this duration
  • WithDefaultSilence() - also break segments on silence boundaries
  • WithSilenceThreshold(threshold) - custom silence threshold (0.0-1.0)
  • WithSilenceSize(duration) - minimum silence duration to trigger a break

func NewFromReader added in v1.8.0

func NewFromReader(r io.Reader, sampleRate int, opts ...Opt) (*Segmenter, error)

NewFromReader creates a new segmenter from an io.Reader. See New for parameter documentation.

func (*Segmenter) Close

func (s *Segmenter) Close() error

Close releases all resources associated with the segmenter.

func (*Segmenter) DecodeFloat32

func (s *Segmenter) DecodeFloat32(ctx context.Context, fn SegmentFuncFloat32) error

DecodeFloat32 decodes the audio stream into float32 samples and calls the callback function for each segment. Samples are in the range [-1.0, 1.0].

The "best" audio stream is automatically selected. Output is mono at the configured sample rate.
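As an illustration of the consumer side, the sketch below defines a callback with the same parameter and result types as SegmentFuncFloat32 that records where each segment starts and how long it lasts. The `collector` and `segmentInfo` types are hypothetical, and `main` simulates two callback invocations by hand rather than decoding real audio.

```go
package main

import (
	"fmt"
	"time"
)

// segmentInfo records where a segment starts and how long it lasts.
type segmentInfo struct {
	Start    time.Duration
	Duration time.Duration
}

// collector accumulates one segmentInfo per callback invocation.
type collector struct {
	sampleRate int
	Segments   []segmentInfo
}

// OnSegment has the same parameter and result types as SegmentFuncFloat32.
// The duration is derived from the sample count because output is mono.
func (c *collector) OnSegment(timestamp time.Duration, samples []float32) error {
	d := time.Duration(len(samples)) * time.Second / time.Duration(c.sampleRate)
	c.Segments = append(c.Segments, segmentInfo{Start: timestamp, Duration: d})
	return nil // nil asks the decoder to continue
}

func main() {
	c := &collector{sampleRate: 16000}
	// Simulate two callbacks: a 1s segment at 0s and a 0.5s segment at 1s.
	_ = c.OnSegment(0, make([]float32, 16000))
	_ = c.OnSegment(time.Second, make([]float32, 8000))
	for _, s := range c.Segments {
		fmt.Println(s.Start, s.Duration)
	}
}
```

With the real package, a method value such as `c.OnSegment` could presumably be passed as the `fn` argument, since its type matches the documented callback signature.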

func (*Segmenter) DecodeInt16

func (s *Segmenter) DecodeInt16(ctx context.Context, fn SegmentFuncInt16) error

DecodeInt16 decodes the audio stream into int16 samples and calls the callback function for each segment. Samples are in the range [-32768, 32767].

The "best" audio stream is automatically selected. Output is mono at the configured sample rate.
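The two sample ranges are related by a simple scale factor. The sketch below converts int16 samples to float32 by dividing by 32768, a common convention; the exact scaling used by the package (or by FFmpeg's resampler) is not stated here, so treat this as an assumption. The `int16ToFloat32` helper is hypothetical.

```go
package main

import "fmt"

// int16ToFloat32 scales samples from [-32768, 32767] into [-1.0, 1.0).
// Dividing by 32768 is one common convention, assumed for this sketch.
func int16ToFloat32(in []int16) []float32 {
	out := make([]float32, len(in))
	for i, s := range in {
		out[i] = float32(s) / 32768
	}
	return out
}

func main() {
	fmt.Println(int16ToFloat32([]int16{-32768, 0, 16384}))
}
```

Under this convention the largest positive int16 value maps to slightly less than 1.0, so the float range is not perfectly symmetric.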

func (*Segmenter) Duration

func (s *Segmenter) Duration() time.Duration

Duration returns the total duration of the media stream. Returns zero if duration is unknown.
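Because a zero duration means "unknown", code that reports progress needs to handle that case explicitly. A minimal sketch, using a hypothetical `progress` helper that takes a segment timestamp and the value returned by Duration:

```go
package main

import (
	"fmt"
	"time"
)

// progress returns the fraction of the stream processed so far.
// The boolean is false when the total duration is unknown (zero).
func progress(timestamp, total time.Duration) (float64, bool) {
	if total <= 0 {
		return 0, false
	}
	p := float64(timestamp) / float64(total)
	if p > 1 {
		p = 1 // guard against timestamps past the reported duration
	}
	return p, true
}

func main() {
	fmt.Println(progress(15*time.Second, 60*time.Second))
	fmt.Println(progress(15*time.Second, 0))
}
```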

func (*Segmenter) SampleRate added in v1.8.0

func (s *Segmenter) SampleRate() int

SampleRate returns the configured output sample rate.
