speechkit

package
v0.16.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 11, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package speechkit provides the public SDK for embedding SpeechKit voice capture and transcription into host applications.

The central type is Runtime, which manages shared state and event delivery. An Engine is the full voice pipeline; RecordingController and TranscriptionWorker can be composed independently for custom pipelines.


Constants

const (
	DefaultDictationMinSegment = 1200 * time.Millisecond
	DefaultDictationPadding    = 160 * time.Millisecond
	DefaultDictationOverlap    = 200 * time.Millisecond
)
const DefaultMinPCMBytes = 3200
const DefaultProcessingMessage = "Recording stopped · Transcribing"

Variables

var (
	ErrMissingRunner      = errors.New("speechkit: transcription worker requires a runner")
	ErrMissingTranscriber = errors.New("speechkit: transcription runner requires a transcriber")
	ErrWorkerClosed       = errors.New("speechkit: transcription worker is closed")
	ErrWorkerQueueFull    = errors.New("speechkit: transcription worker queue is full")
)
var ErrCommandHandlerUnavailable = errors.New("speechkit: no command handler configured")

ErrCommandHandlerUnavailable is returned by CommandBus.Dispatch when no command handler has been configured on the Runtime.

Functions

func FallbackDictationSegments

func FallbackDictationSegments(fullPCM []byte) []dictation.Segment

FallbackDictationSegments wraps all of fullPCM in a single segment. It is used when segmentation based on voice activity detection (VAD) is unavailable or produces no output.

Types

type AudioRecorder

type AudioRecorder interface {
	Start() error
	Stop() ([]byte, error)
	SetPCMHandler(func([]byte))
}

AudioRecorder is the hardware abstraction for microphone capture.
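Because AudioRecorder is a plain interface, a host can substitute an in-memory implementation in tests. The following is a minimal sketch, not the package's own recorder. The package import path is not shown on this page, so the interface is redeclared locally; fakeRecorder and its push method are hypothetical names, and the behaviour (Stop returns the PCM accumulated since Start, the handler sees each frame as it arrives) is an assumption inferred from the method names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// AudioRecorder is a local copy of the interface documented above,
// declared here only so that the sketch compiles on its own.
type AudioRecorder interface {
	Start() error
	Stop() ([]byte, error)
	SetPCMHandler(func([]byte))
}

// fakeRecorder is a hypothetical in-memory recorder for tests.
type fakeRecorder struct {
	mu        sync.Mutex
	recording bool
	buf       []byte
	handler   func([]byte)
}

// Compile-time check that fakeRecorder satisfies the interface.
var _ AudioRecorder = (*fakeRecorder)(nil)

func (r *fakeRecorder) Start() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.recording {
		return errors.New("fakeRecorder: already recording")
	}
	r.recording = true
	r.buf = nil
	return nil
}

// Stop returns everything captured since Start (assumed semantics).
func (r *fakeRecorder) Stop() ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.recording {
		return nil, errors.New("fakeRecorder: not recording")
	}
	r.recording = false
	out := r.buf
	r.buf = nil
	return out, nil
}

func (r *fakeRecorder) SetPCMHandler(h func([]byte)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.handler = h
}

// push simulates the device delivering one PCM frame.
func (r *fakeRecorder) push(frame []byte) {
	r.mu.Lock()
	if !r.recording {
		r.mu.Unlock()
		return
	}
	r.buf = append(r.buf, frame...)
	h := r.handler
	r.mu.Unlock()
	if h != nil {
		h(frame) // call the handler outside the lock
	}
}

func main() {
	rec := &fakeRecorder{}
	frames := 0
	rec.SetPCMHandler(func([]byte) { frames++ })

	if err := rec.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	rec.push([]byte{1, 2})
	rec.push([]byte{3, 4})

	pcm, err := rec.Stop()
	fmt.Println("bytes:", len(pcm), "frames:", frames, "err:", err)
}
```

The handler is invoked after the mutex is released so that a handler calling back into the recorder cannot deadlock.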

type Command

type Command struct {
	Type     CommandType
	Text     string
	NoteID   int64
	Target   string
	Metadata map[string]string
}

Command is a request dispatched through the CommandBus.

func (Command) Clone

func (c Command) Clone() Command

type CommandBus

type CommandBus interface {
	Dispatch(context.Context, Command) error
}

CommandBus delivers Command values to the registered handler.

type CommandType

type CommandType string

CommandType identifies the action a Command requests.

const (
	CommandShowDashboard           CommandType = "dashboard.show"
	CommandStartDictation          CommandType = "dictation.start"
	CommandStopDictation           CommandType = "dictation.stop"
	CommandSetActiveMode           CommandType = "mode.set_active"
	CommandOpenQuickNote           CommandType = "quicknote.open"
	CommandOpenQuickCapture        CommandType = "quicknote.capture.open"
	CommandCloseQuickCapture       CommandType = "quicknote.capture.close"
	CommandArmQuickNoteRecording   CommandType = "quicknote.record.arm"
	CommandCopyLastTranscription   CommandType = "transcription.copy_last"
	CommandInsertLastTranscription CommandType = "transcription.insert_last"
	CommandSummarizeSelection      CommandType = "selection.summarize"
)

type CommitObserver

type CommitObserver interface {
	OnCommit(completion Completion)
}

CommitObserver is notified after each successful TranscriptionRunner.Commit.

type Completion

type Completion struct {
	Transcript             Transcript
	QuickNoteCommitted     bool
	QuickNoteCreated       bool
	QuickNoteID            int64
	TranscriptionPersisted bool
}

Completion describes the outcome of a TranscriptionRunner.Commit call.

type DictationSegmenter

type DictationSegmenter struct {
	// contains filtered or unexported fields
}

DictationSegmenter implements SegmentCollector using VAD-based pause detection to split continuous speech into discrete segments.

func NewDictationSegmenter

func NewDictationSegmenter(detector vad.Detector, pauseThreshold time.Duration) *DictationSegmenter

func (*DictationSegmenter) CollectStopSegments

func (s *DictationSegmenter) CollectStopSegments(fullPCM []byte) ([]dictation.Segment, error)

func (*DictationSegmenter) FeedPCM

func (s *DictationSegmenter) FeedPCM(pcm []byte) error

type Engine

type Engine interface {
	Start(context.Context) error
	Stop(context.Context) error
	Events() <-chan Event
	Commands() CommandBus
	State() Snapshot
}

Engine is the interface implemented by a full SpeechKit voice pipeline.

type Event

type Event struct {
	Type      EventType
	Time      time.Time
	Message   string
	Text      string
	Provider  string
	QuickNote bool
	Err       error
	Shortcut  string
}

Event is a notification published to the event channel returned by Runtime.Events. Consumers should switch on Type and inspect the relevant fields.

type EventType

type EventType string

EventType identifies the kind of event published to the event channel.

const (
	EventStateChanged        EventType = "state.changed"
	EventRecordingStarted    EventType = "recording.started"
	EventProcessingStarted   EventType = "processing.started"
	EventTranscriptionReady  EventType = "transcription.ready"
	EventTranscriptCommitted EventType = "transcription.committed"
	EventQuickNoteModeArmed  EventType = "quicknote.mode_armed"
	EventQuickNoteUpdated    EventType = "quicknote.updated"
	EventWarningRaised       EventType = "warning.raised"
	EventErrorRaised         EventType = "error.raised"
	EventShortcutMatched     EventType = "shortcut.matched"
)

type Hooks

type Hooks struct {
	Start         func(context.Context) error
	Stop          func(context.Context) error
	HandleCommand func(context.Context, Command) error
}

Hooks are the lifecycle callbacks wired into a Runtime. Nil hooks are silently skipped.

type JobSubmitter

type JobSubmitter interface {
	Submit(TranscriptionJob) error
}

JobSubmitter accepts a TranscriptionJob for async processing.

type Persistence

type Persistence interface {
	QuickNoteStore
	TranscriptionStore
}

Persistence combines QuickNoteStore and TranscriptionStore.

type QuickNoteStore

type QuickNoteStore interface {
	SaveQuickNote(ctx context.Context, text, language, provider string, durationMs, latencyMs int64, audioData []byte) (int64, error)
	GetQuickNoteText(ctx context.Context, id int64) (string, error)
	UpdateQuickNote(ctx context.Context, id int64, text string) error
	UpdateQuickNoteCapture(ctx context.Context, id int64, text, provider string, durationMs, latencyMs int64, audioData []byte) error
}

QuickNoteStore persists and retrieves Quick Note records.

type RecordingController

type RecordingController struct {
	// contains filtered or unexported fields
}

RecordingController manages the start/stop lifecycle of a single recording session and hands audio segments to the submission queue.

func NewRecordingController

func NewRecordingController(recorder AudioRecorder, submitter JobSubmitter, observer RecordingObserver, segmenterFactory SegmentCollectorFactory) *RecordingController

func (*RecordingController) IsRecording

func (c *RecordingController) IsRecording() bool

func (*RecordingController) Start

func (*RecordingController) Stop

type RecordingObserver

type RecordingObserver interface {
	OnState(status, text string)
	OnLog(message, kind string)
}

type RecordingStartOptions

type RecordingStartOptions struct {
	Label       string
	Target      any
	Language    string
	QuickNote   bool
	QuickNoteID int64
}

type RecordingStopOptions

type RecordingStopOptions struct {
	Label string
}

type Runtime

type Runtime struct {
	// contains filtered or unexported fields
}

Runtime manages shared observable state and event delivery for a SpeechKit session. Create one with NewRuntime and wire it into the host application via Runtime.Events and Runtime.Commands.

func NewRuntime

func NewRuntime(initial Snapshot, hooks Hooks) *Runtime

func (*Runtime) Close

func (r *Runtime) Close()

func (*Runtime) Commands

func (r *Runtime) Commands() CommandBus

func (*Runtime) Events

func (r *Runtime) Events() <-chan Event

func (*Runtime) Publish

func (r *Runtime) Publish(event Event) bool

func (*Runtime) SetState

func (r *Runtime) SetState(snapshot Snapshot)

func (*Runtime) Start

func (r *Runtime) Start(ctx context.Context) error

func (*Runtime) State

func (r *Runtime) State() Snapshot

func (*Runtime) Stop

func (r *Runtime) Stop(ctx context.Context) error

func (*Runtime) UpdateState

func (r *Runtime) UpdateState(update func(*Snapshot)) Snapshot

type SegmentCollector

type SegmentCollector interface {
	FeedPCM([]byte) error
	CollectStopSegments(fullPCM []byte) ([]dictation.Segment, error)
}

SegmentCollector accumulates real-time PCM frames and splits them into dictation segments when recording stops.

type SegmentCollectorFactory

type SegmentCollectorFactory func() SegmentCollector

type Snapshot

type Snapshot struct {
	Status                string
	Text                  string
	Level                 float64
	Hotkey                string
	ActiveMode            string
	Providers             []string
	ActiveProfiles        map[string]string
	Transcriptions        int
	QuickNoteMode         bool
	QuickCaptureMode      bool
	LastTranscriptionText string
}

Snapshot is a point-in-time copy of the Runtime's observable state. All slice and map fields are safe to read without holding any lock.

func (Snapshot) Clone

func (s Snapshot) Clone() Snapshot
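A value copy of Snapshot still shares the backing storage of Providers and ActiveProfiles, so a lock-free copy needs those two fields duplicated. The sketch below shows what such a deep copy could look like; it assumes, without confirmation from this page, that Clone behaves this way. cloneSnapshot is a hypothetical free function and Snapshot is redeclared locally so the example compiles on its own.

```go
package main

import "fmt"

// Snapshot is a local copy of the documented struct.
type Snapshot struct {
	Status                string
	Text                  string
	Level                 float64
	Hotkey                string
	ActiveMode            string
	Providers             []string
	ActiveProfiles        map[string]string
	Transcriptions        int
	QuickNoteMode         bool
	QuickCaptureMode      bool
	LastTranscriptionText string
}

// cloneSnapshot copies the scalar fields by value and duplicates the
// slice and map so the result shares no mutable storage with s.
func cloneSnapshot(s Snapshot) Snapshot {
	out := s
	if s.Providers != nil {
		out.Providers = append([]string(nil), s.Providers...)
	}
	if s.ActiveProfiles != nil {
		out.ActiveProfiles = make(map[string]string, len(s.ActiveProfiles))
		for k, v := range s.ActiveProfiles {
			out.ActiveProfiles[k] = v
		}
	}
	return out
}

func main() {
	orig := Snapshot{
		Status:         "idle",
		Providers:      []string{"local"},
		ActiveProfiles: map[string]string{"dictation": "default"},
	}
	cp := cloneSnapshot(orig)

	// Mutating the copy leaves the original untouched.
	cp.Providers[0] = "cloud"
	cp.ActiveProfiles["dictation"] = "fast"

	fmt.Println(orig.Providers[0], orig.ActiveProfiles["dictation"])
	fmt.Println(cp.Providers[0], cp.ActiveProfiles["dictation"])
}
```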

type Submission

type Submission struct {
	PCM          []byte
	WAV          []byte
	DurationSecs float64
	Language     string
	Prefix       string
	QuickNote    bool
	QuickNoteID  int64
}

Submission carries a single audio segment and its metadata into the transcription pipeline.

type Transcriber

type Transcriber interface {
	Transcribe(ctx context.Context, audio []byte, durationSecs float64, language string) (Transcript, error)
}

Transcriber converts raw WAV audio into a Transcript.

type Transcript

type Transcript struct {
	Text       string
	Language   string
	Duration   time.Duration
	Provider   string
	Model      string
	Confidence float64
}

Transcript holds the result of a single transcription call.

type TranscriptInterceptor

type TranscriptInterceptor interface {
	Intercept(ctx context.Context, transcript Transcript, target any) (bool, error)
}

TranscriptInterceptor can handle a transcript before it reaches the normal output path. Return (true, nil) to signal that the transcript was consumed.

type TranscriptOutput

type TranscriptOutput interface {
	Deliver(ctx context.Context, transcript Transcript, target any) error
}

TranscriptOutput delivers a completed Transcript to the host application (e.g. clipboard injection or text-field paste).

type TranscriptionJob

type TranscriptionJob struct {
	Submission
	Target any
}

TranscriptionJob pairs a Submission with its delivery target.

func (TranscriptionJob) Clone

type TranscriptionObserver

type TranscriptionObserver interface {
	OnState(status, text string)
	OnLog(message, kind string)
	OnTranscriptCommitted(transcript Transcript, quickNote bool)
}

TranscriptionObserver receives real-time status and log updates from a TranscriptionWorker during processing.

type TranscriptionRunner

type TranscriptionRunner struct {
	// contains filtered or unexported fields
}

TranscriptionRunner transcribes audio submissions and persists results. Create one with NewTranscriptionRunner.

func NewTranscriptionRunner

func NewTranscriptionRunner(transcriber Transcriber, store Persistence) *TranscriptionRunner

NewTranscriptionRunner creates a TranscriptionRunner backed by the given transcriber and persistence store. Either argument may be nil.

func (*TranscriptionRunner) Commit

func (r *TranscriptionRunner) Commit(ctx context.Context, submission Submission, transcript Transcript) (Completion, error)

func (*TranscriptionRunner) WithObserver

func (r *TranscriptionRunner) WithObserver(observer CommitObserver) *TranscriptionRunner

type TranscriptionStore

type TranscriptionStore interface {
	SaveTranscription(ctx context.Context, text, language, provider, model string, durationMs, latencyMs int64, audioData []byte) error
}

TranscriptionStore persists completed dictation transcriptions.

type TranscriptionWorker

type TranscriptionWorker struct {
	// contains filtered or unexported fields
}

TranscriptionWorker processes TranscriptionJob values from an internal queue on a single goroutine. Start it with TranscriptionWorker.Start and submit work with TranscriptionWorker.Submit.

func (*TranscriptionWorker) Close

func (w *TranscriptionWorker) Close()

func (*TranscriptionWorker) Start

func (w *TranscriptionWorker) Start(ctx context.Context)

func (*TranscriptionWorker) Submit

func (*TranscriptionWorker) Wait

func (w *TranscriptionWorker) Wait()

type TranscriptionWorkerConfig

type TranscriptionWorkerConfig struct {
	Timeout     time.Duration
	QueueSize   int
	Runner      *TranscriptionRunner
	Output      TranscriptOutput
	Interceptor TranscriptInterceptor
	Observer    TranscriptionObserver
}

TranscriptionWorkerConfig configures a TranscriptionWorker. Runner is required; all other fields are optional.
