dictation

package

Version: v0.35.2 (latest) | Published: May 18, 2026 | License: Apache-2.0 | Imports: 5 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/kombifyio/SpeechKit

Documentation ¶

Overview ¶

Package dictation implements pause-based segmentation for Dictation Mode: it consumes VAD speech-probability frames and emits one transcription request per natural pause.

The package is platform-neutral and reusable: Device-Target and Server-Target both call into it. Audio capture and the STT call itself live in sibling packages (audio, stt); this package owns only the decision of where an utterance ends.

Index ¶

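type Config
type Processor
- func NewProcessor(detector vad.Detector, cfg Config) *Processor
- func (p *Processor) FeedPCM(pcm []byte) ([]Segment, error)
- func (p *Processor) Flush() ([]Segment, error)
- func (p *Processor) Reset()
type Segment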

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Config ¶

type Config struct {
	PauseThreshold time.Duration
	MinSegment     time.Duration
	Padding        time.Duration
	Overlap        time.Duration
}

Config controls pause-based dictation segmentation.

type Processor ¶

type Processor struct {
	// contains filtered or unexported fields
}

Processor segments PCM audio into speech chunks using a VAD.

func NewProcessor ¶

func NewProcessor(detector vad.Detector, cfg Config) *Processor

NewProcessor creates a dictation processor with sane defaults.

func (*Processor) FeedPCM ¶

func (p *Processor) FeedPCM(pcm []byte) ([]Segment, error)

FeedPCM ingests raw signed 16-bit (S16) mono PCM and returns any segments flushed while processing.

func (*Processor) Flush ¶

func (p *Processor) Flush() ([]Segment, error)

Flush returns the trailing buffered segment, if any, and resets session state.
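A typical session would call FeedPCM repeatedly as audio arrives and Flush once at the end. The sketch below shows that loop against a small interface carrying the two documented method signatures, so it runs without the real package. `segmenter`, `runSession`, and `fakeSegmenter` are hypothetical names, and the fake's behavior (one segment per 4 bytes) is only a stand-in for real VAD-driven segmentation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Segment mirrors the documented struct.
type Segment struct {
	PCM       []byte
	Duration  time.Duration
	Paragraph bool
	Final     bool
}

// segmenter lists the two documented Processor methods used by the loop.
type segmenter interface {
	FeedPCM(pcm []byte) ([]Segment, error)
	Flush() ([]Segment, error)
}

// runSession feeds pcm in chunkSize pieces, then flushes once at the end.
func runSession(s segmenter, pcm []byte, chunkSize int) ([]Segment, error) {
	if chunkSize <= 0 {
		return nil, errors.New("chunkSize must be positive")
	}
	var all []Segment
	for len(pcm) > 0 {
		n := chunkSize
		if n > len(pcm) {
			n = len(pcm)
		}
		segs, err := s.FeedPCM(pcm[:n])
		if err != nil {
			return all, err
		}
		all = append(all, segs...)
		pcm = pcm[n:]
	}
	tail, err := s.Flush()
	if err != nil {
		return all, err
	}
	return append(all, tail...), nil
}

// fakeSegmenter is a test double that emits one segment per 4 buffered bytes.
type fakeSegmenter struct{ buf []byte }

func (f *fakeSegmenter) FeedPCM(pcm []byte) ([]Segment, error) {
	f.buf = append(f.buf, pcm...)
	var out []Segment
	for len(f.buf) >= 4 {
		out = append(out, Segment{PCM: append([]byte(nil), f.buf[:4]...)})
		f.buf = f.buf[4:]
	}
	return out, nil
}

func (f *fakeSegmenter) Flush() ([]Segment, error) {
	if len(f.buf) == 0 {
		return nil, nil
	}
	seg := Segment{PCM: f.buf}
	f.buf = nil // reset session state
	return []Segment{seg}, nil
}

func main() {
	segs, err := runSession(&fakeSegmenter{}, make([]byte, 10), 3)
	fmt.Println(len(segs), err) // prints 3 <nil>
}
```

Skipping the final Flush in this sketch would silently lose the last two buffered bytes, which is why the loop always ends with it.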

func (*Processor) Reset ¶

func (p *Processor) Reset()

Reset clears the current dictation session and VAD state.

type Segment ¶

type Segment struct {
	PCM       []byte
	Duration  time.Duration
	Paragraph bool
	Final     bool
}

Segment is a transcribable utterance extracted from a dictation session.
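The flags on Segment are not described further here. As one possible way to consume them, the sketch below stitches per-segment transcripts into text, assuming (and this is only an assumption) that Paragraph marks a segment that starts a new paragraph. `joinTranscript` is a hypothetical helper, and the transcripts stand in for whatever the stt package would return.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Segment mirrors the documented struct.
type Segment struct {
	PCM       []byte
	Duration  time.Duration
	Paragraph bool
	Final     bool
}

// joinTranscript concatenates one transcript per segment.
// ASSUMPTION: a segment with Paragraph set begins a new paragraph, so a
// blank line is written before its text; other segments are joined with
// a single space. Segments without a matching transcript are ignored.
func joinTranscript(segs []Segment, texts []string) string {
	var b strings.Builder
	for i, seg := range segs {
		if i >= len(texts) {
			break
		}
		if b.Len() > 0 {
			if seg.Paragraph {
				b.WriteString("\n\n")
			} else {
				b.WriteString(" ")
			}
		}
		b.WriteString(texts[i])
	}
	return b.String()
}

func main() {
	segs := []Segment{{}, {}, {Paragraph: true}}
	texts := []string{"Hello.", "How are you?", "Next topic."}
	fmt.Printf("%q\n", joinTranscript(segs, texts))
	// prints "Hello. How are you?\n\nNext topic."
}
```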

Source Files ¶

processor.go