segmenter

package

v0.0.19 (Latest) Published: Aug 10, 2024 License: Apache-2.0 Imports: 8 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/mutablelogic/go-whisper

Documentation ¶

Overview ¶

Package segmenter provides a segmenter for audio files and streams.

Index ¶

type SegmentFunc
type Segmenter
- func NewReader(r io.Reader, dur time.Duration, sample_rate int) (*Segmenter, error)
- func (s *Segmenter) Close() error
- func (s *Segmenter) Decode(ctx context.Context, fn SegmentFunc) error

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type SegmentFunc ¶

type SegmentFunc func(time.Duration, []float32) error

SegmentFunc is a callback function which is called when a segment is ready to be processed. The first argument is the timestamp of the segment; the second is the segment's audio samples.

type Segmenter ¶

type Segmenter struct {
	// contains filtered or unexported fields
}

A Segmenter reads audio samples from a reader and segments them into fixed-size chunks. The segmenter can be used to process audio samples.

func NewReader ¶

func NewReader(r io.Reader, dur time.Duration, sample_rate int) (*Segmenter, error)

NewReader creates a new segmenter from a reader r, which segments the raw audio into chunks of length dur. If dur is zero, no segmenting is performed and the whole audio file is read, which can cause memory issues with long inputs.

The sample rate is the number of samples per second.

At the moment the audio format is auto-detected; there should be a way to specify the audio format explicitly.

func (*Segmenter) Close ¶

func (s *Segmenter) Close() error

Close closes the segmenter.

func (*Segmenter) Decode ¶

func (s *Segmenter) Decode(ctx context.Context, fn SegmentFunc) error

Segments are output through the callback fn, which receives the samples and a timestamp.

TODO: We could do some basic silence and voice detection when segmenting, to ensure we don't overtax the CPU/GPU with silence and non-speech.

TODO: We should be able to select the audio stream to use. At the moment the "best" audio stream is used, based on an ffmpeg heuristic.
