dictation

package
v0.35.2 Latest
Warning: This package is not in the latest version of its module.
Published: May 18, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Overview

Package dictation implements pause-based segmentation for Dictation Mode: it consumes VAD speech-probability frames and emits one transcription request per natural pause.

The package is platform-neutral and reusable: Device-Target and Server-Target both call into it. Audio capture and the STT call itself live in sibling packages (audio, stt) — this package only owns the "where does an utterance end" decision.
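
To make the "where does an utterance end" decision concrete, here is a minimal sketch of pause-based segmentation over speech-probability frames. It is not the package's implementation: the `span` type, the `segmentByPause` function, the fixed frame duration, and the `speechProb` cutoff are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// span is a half-open range of frame indices [Start, End) holding one utterance.
// Hypothetical type, not part of the documented API.
type span struct{ Start, End int }

// segmentByPause splits per-frame speech probabilities into utterances.
// A frame counts as speech when its probability is at least speechProb
// (an assumed cutoff). An utterance closes once the silence that follows
// it has lasted at least pause.
func segmentByPause(probs []float64, frameDur, pause time.Duration, speechProb float64) []span {
	var out []span
	start, lastSpeech := -1, -1 // start < 0 means "not inside an utterance"
	var silence time.Duration
	for i, p := range probs {
		if p >= speechProb {
			if start < 0 {
				start = i
			}
			lastSpeech = i
			silence = 0
			continue
		}
		if start < 0 {
			continue // silence outside an utterance: nothing to close
		}
		silence += frameDur
		if silence >= pause {
			out = append(out, span{start, lastSpeech + 1})
			start = -1
		}
	}
	if start >= 0 { // trailing utterance that never saw a closing pause
		out = append(out, span{start, lastSpeech + 1})
	}
	return out
}

func main() {
	probs := []float64{0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.95, 0.9, 0.1}
	spans := segmentByPause(probs, 100*time.Millisecond, 300*time.Millisecond, 0.5)
	fmt.Println(spans) // [{1 3} {6 8}]
}
```

With 100 ms frames and a 300 ms pause, the three silent frames after index 2 close the first utterance. The second utterance is emitted at end of input, which is the role Flush plays in the documented API.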

Types

type Config

type Config struct {
	PauseThreshold time.Duration
	MinSegment     time.Duration
	Padding        time.Duration
	Overlap        time.Duration
}

Config controls pause-based dictation segmentation.
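
The page does not document the individual fields or their defaults, so the following is only a sketch of filling in a Config and relating a duration to a PCM byte count. The struct is mirrored locally so the example compiles on its own. The values, the 16 kHz sample rate, and the `pcmBytes` helper are assumptions, not package behavior.

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors the documented struct so this sketch is self-contained.
type Config struct {
	PauseThreshold time.Duration
	MinSegment     time.Duration
	Padding        time.Duration
	Overlap        time.Duration
}

// pcmBytes returns how many bytes of S16 mono PCM cover d at sampleRate Hz.
// Each sample occupies 2 bytes, so the result is always even.
// Hypothetical helper, not part of the package.
func pcmBytes(d time.Duration, sampleRate int) int {
	samples := int(int64(d) * int64(sampleRate) / int64(time.Second))
	return samples * 2
}

func main() {
	// Illustrative values only; these are not the package defaults.
	cfg := Config{
		PauseThreshold: 700 * time.Millisecond,
		MinSegment:     300 * time.Millisecond,
		Padding:        200 * time.Millisecond,
		Overlap:        100 * time.Millisecond,
	}
	const sampleRate = 16000 // assumed; the page does not state a sample rate
	fmt.Println(pcmBytes(cfg.PauseThreshold, sampleRate)) // 22400
}
```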

type Processor

type Processor struct {
	// contains filtered or unexported fields
}

Processor segments PCM audio into speech chunks using a VAD.

func NewProcessor

func NewProcessor(detector vad.Detector, cfg Config) *Processor

NewProcessor creates a dictation Processor that uses detector for voice activity detection and cfg for segmentation, with sane defaults.

func (*Processor) FeedPCM

func (p *Processor) FeedPCM(pcm []byte) ([]Segment, error)

FeedPCM ingests raw signed 16-bit (S16) mono PCM and returns any segments flushed while processing.
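
Callers that hold audio as int16 samples need to pack them into bytes before calling FeedPCM. The sketch below assumes little-endian byte order, which the page does not specify. `samplesToS16LE` is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// samplesToS16LE packs 16-bit samples into bytes, two per sample,
// least significant byte first. Little-endian order is an assumption.
func samplesToS16LE(samples []int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	fmt.Println(samplesToS16LE([]int16{1, -2, 256})) // [1 0 254 255 0 1]
}
```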

func (*Processor) Flush

func (p *Processor) Flush() ([]Segment, error)

Flush returns the trailing buffered segment, if any, and resets session state.
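
A typical session feeds chunks as they arrive and calls Flush once at the end to pick up the trailing segment. The sketch below shows that loop against a small interface carrying the two documented method signatures. The `segmenter` interface, the `collect` function, and `fakeSegmenter` are stand-ins invented for this example. The fake's behavior, including setting Final on flush, says nothing about how the real Processor behaves.

```go
package main

import (
	"fmt"
	"time"
)

// Segment mirrors the documented struct so this sketch is self-contained.
type Segment struct {
	PCM       []byte
	Duration  time.Duration
	Paragraph bool
	Final     bool
}

// segmenter lists the two documented Processor methods used below.
type segmenter interface {
	FeedPCM(pcm []byte) ([]Segment, error)
	Flush() ([]Segment, error)
}

// collect feeds every chunk, then flushes, and returns all segments in order.
func collect(s segmenter, chunks [][]byte) ([]Segment, error) {
	var all []Segment
	for _, c := range chunks {
		segs, err := s.FeedPCM(c)
		if err != nil {
			return all, fmt.Errorf("feed: %w", err)
		}
		all = append(all, segs...)
	}
	tail, err := s.Flush()
	if err != nil {
		return all, fmt.Errorf("flush: %w", err)
	}
	return append(all, tail...), nil
}

// fakeSegmenter is a toy stand-in: it emits one segment per 4 buffered bytes
// and returns whatever is left over on Flush, marked Final.
type fakeSegmenter struct{ buf []byte }

func (f *fakeSegmenter) FeedPCM(pcm []byte) ([]Segment, error) {
	f.buf = append(f.buf, pcm...)
	var out []Segment
	for len(f.buf) >= 4 {
		out = append(out, Segment{PCM: append([]byte(nil), f.buf[:4]...)})
		f.buf = f.buf[4:]
	}
	return out, nil
}

func (f *fakeSegmenter) Flush() ([]Segment, error) {
	if len(f.buf) == 0 {
		return nil, nil
	}
	seg := Segment{PCM: f.buf, Final: true}
	f.buf = nil
	return []Segment{seg}, nil
}

func main() {
	segs, err := collect(&fakeSegmenter{}, [][]byte{{1, 2, 3}, {4, 5, 6}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(segs), segs[len(segs)-1].Final) // 2 true
}
```

Depending on an interface rather than the concrete type keeps the loop testable without a real VAD detector.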

func (*Processor) Reset

func (p *Processor) Reset()

Reset clears the current dictation session and VAD state.

type Segment

type Segment struct {
	PCM       []byte
	Duration  time.Duration
	Paragraph bool
	Final     bool
}

Segment is a transcribable utterance extracted from a dictation session.
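
Since each segment maps to one transcription request, a consumer might assemble the final text as sketched below. The page does not define the Paragraph flag, so the sketch assumes it means "start a new paragraph before this segment". `transcribeFunc` and `buildText` are hypothetical names, and the STT call is replaced by a stub.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Segment mirrors the documented struct so this sketch is self-contained.
type Segment struct {
	PCM       []byte
	Duration  time.Duration
	Paragraph bool
	Final     bool
}

// transcribeFunc is a hypothetical stand-in for the STT call.
type transcribeFunc func(pcm []byte) (string, error)

// buildText issues one request per segment and joins the results.
// Assumption: Paragraph marks a paragraph break before the segment.
func buildText(segs []Segment, stt transcribeFunc) (string, error) {
	var b strings.Builder
	for i, s := range segs {
		text, err := stt(s.PCM)
		if err != nil {
			return "", fmt.Errorf("segment %d: %w", i, err)
		}
		switch {
		case i == 0:
			// first segment: no separator
		case s.Paragraph:
			b.WriteString("\n\n")
		default:
			b.WriteString(" ")
		}
		b.WriteString(text)
	}
	return b.String(), nil
}

func main() {
	segs := []Segment{
		{PCM: []byte("hello")},
		{PCM: []byte("world")},
		{PCM: []byte("next"), Paragraph: true},
	}
	// Stub STT: treats the bytes as the transcript itself.
	echo := func(pcm []byte) (string, error) { return string(pcm), nil }
	text, err := buildText(segs, echo)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // "hello world\n\nnext"
}
```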
