speechkit

package
v0.40.7 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 28, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package speechkit provides the public SDK for embedding SpeechKit voice capture, transcription, and assist/voice-agent pipelines into host applications.

Surface

The kernel exposes three strict modes:

  • Dictation — speech to text only, no AI rewriting.
  • Assist — speech (or text) to a one-shot result, with optional TTS.
  • Voice Agent — realtime audio-to-audio dialogue.

Each mode is constructed via a small subpackage so host apps depend only on what they use:

Central types in this package

Runtime owns shared state and the event channel that host apps read from. Engine is the full voice pipeline; RecordingController and TranscriptionWorker can be composed independently for custom pipelines. [Catalog] exposes the provider/model/mode metadata for setup UIs and readiness checks; RuntimePolicy lets the host pin profiles and gate fallbacks.

Stability

pkg/speechkit is the OSS public surface. Symbols here follow semver from v1.0 onward. Before v1.0 the surface may still evolve — see CHANGELOG.md and the release notes for breaking-change calls.

Package-level documentation lives in doc.go.

Index

Constants

View Source
const (
	AudioSampleRate     = 16000
	AudioChannels       = 1
	AudioBitsPerSample  = 16
	AudioBytesPerSample = AudioBitsPerSample / 8
)
View Source
const (
	DefaultDictationMinSegment = 1200 * time.Millisecond
	DefaultDictationPadding    = 160 * time.Millisecond
	DefaultDictationOverlap    = 200 * time.Millisecond
)
View Source
const DefaultLocalBuiltInLLMModel = "ggml-org/gemma-4-E4B-it-GGUF:Q4_K_M"
View Source
const DefaultMinPCMBytes = 3200
View Source
const DefaultProcessingMessage = "Recording stopped · Transcribing"
View Source
const ReadinessSchemaVersion = "provider-readiness.v1"

Variables

View Source
var (
	ErrMissingRunner      = errors.New("speechkit: transcription worker requires a runner")
	ErrMissingTranscriber = errors.New("speechkit: transcription runner requires a transcriber")
	ErrWorkerClosed       = errors.New("speechkit: transcription worker is closed")
	ErrWorkerQueueFull    = errors.New("speechkit: transcription worker queue is full")
)
View Source
var ErrCommandHandlerUnavailable = errors.New("speechkit: no command handler configured")

ErrCommandHandlerUnavailable is returned by [CommandBus.Dispatch] when no command handler has been configured on the Runtime.

Functions

func PCMDurationSecs added in v0.24.0

func PCMDurationSecs(pcm []byte) float64

PCMDurationSecs returns the duration of 16kHz S16 mono PCM audio in seconds.

func PCMToWAV added in v0.24.0

func PCMToWAV(pcm []byte) []byte

PCMToWAV wraps raw 16kHz S16 mono PCM data in a WAV header.

func ValidateDefaultCatalog added in v0.24.0

func ValidateDefaultCatalog() error

ValidateDefaultCatalog verifies the framework invariant that every strict mode exposes all four provider groups and every visible profile satisfies its mode contract. v0.37 added ModeTTS as a model-selection axis with the same four-provider-group invariant (Local Built-in via Piper, Local Provider via Kokoro/openedai-speech, Cloud Provider via Hugging Face Parler, Direct Provider via OpenAI + Google).

func ValidateModeSettingsForPolicy added in v0.24.0

func ValidateModeSettingsForPolicy(profiles []ProviderProfile, settings ModeSettings, policy RuntimePolicy) error

ValidateModeSettingsForPolicy checks mode selections against a RuntimePolicy.

func ValidateProfileForMode added in v0.24.0

func ValidateProfileForMode(profile ProviderProfile, mode Mode) error

ValidateProfileForMode checks the stable v23 mode capability contract.

func ValidateRuntimePolicy added in v0.24.0

func ValidateRuntimePolicy(profiles []ProviderProfile, policy RuntimePolicy) error

ValidateRuntimePolicy checks that a policy references existing profiles and does not require a profile that violates its mode contract.

Types

type AssistRequest added in v0.24.0

type AssistRequest struct {
	Text              string `json:"text"`
	Locale            string `json:"locale,omitempty"`
	Selection         string `json:"selection,omitempty"`
	Context           string `json:"context,omitempty"`
	EditableTarget    bool   `json:"editableTarget,omitempty"`
	ProviderProfileID string `json:"providerProfileId,omitempty"`
	SessionKey        string `json:"sessionKey,omitempty"`
}

AssistRequest is the mode-scoped input for Assist integrations.

type AssistResult added in v0.24.0

type AssistResult struct {
	Text       string                `json:"text"`
	SpeakText  string                `json:"speakText,omitempty"`
	Action     string                `json:"action,omitempty"`
	Kind       string                `json:"kind,omitempty"`
	Surface    AssistSurfaceDecision `json:"surface"`
	ShortcutID string                `json:"shortcutId,omitempty"`
	Locale     string                `json:"locale,omitempty"`
	Audio      *AudioData            `json:"audio,omitempty"`
	Format     string                `json:"format,omitempty"`
}

AssistResult is the public one-shot output contract for Assist Mode.

type AssistService added in v0.24.0

type AssistService interface {
	Process(context.Context, AssistRequest) (AssistResult, error)
}

AssistService is the mode-scoped SDK contract for one-shot utilities and work-product generation.

type AssistSetting added in v0.24.0

type AssistSetting struct {
	ModeSetting
	TTSEnabled      bool   `json:"ttsEnabled"`
	UtilityRegistry string `json:"utilityRegistry,omitempty"`
}

type AssistSurfaceDecision added in v0.24.0

type AssistSurfaceDecision string

AssistSurfaceDecision describes where an Assist result should be presented.

const (
	AssistSurfacePanel     AssistSurfaceDecision = "panel"
	AssistSurfaceInsert    AssistSurfaceDecision = "insert"
	AssistSurfaceReplace   AssistSurfaceDecision = "replace"
	AssistSurfaceActionAck AssistSurfaceDecision = "action_ack"
	AssistSurfaceSilent    AssistSurfaceDecision = "silent"
)

type AudioData added in v0.40.1

type AudioData []byte

AudioData carries optional synthesized audio without making AssistResult non-comparable for existing SDK consumers.

func NewAudioData added in v0.40.1

func NewAudioData(data []byte) *AudioData

func (*AudioData) Bytes added in v0.40.1

func (a *AudioData) Bytes() []byte

func (*AudioData) Len added in v0.40.1

func (a *AudioData) Len() int

type AudioRecorder

type AudioRecorder interface {
	Start() error
	Stop() ([]byte, error)
	SetPCMHandler(func([]byte))
}

AudioRecorder is the hardware abstraction for microphone capture.

type AudioSegment added in v0.24.0

type AudioSegment struct {
	PCM       []byte
	Duration  time.Duration
	Paragraph bool
	Final     bool
}

AudioSegment is a transcribable utterance extracted from a dictation recording. PCM is raw 16kHz S16 mono audio.

func FallbackDictationSegments

func FallbackDictationSegments(fullPCM []byte) []AudioSegment

FallbackDictationSegments wraps all of fullPCM in a single segment. Used when VAD-based segmentation is unavailable or produces no output.

type Capability added in v0.24.0

type Capability string

Capability is a mode capability declared by a provider profile.

const (
	CapabilityTranscription         Capability = "transcription"
	CapabilitySTT                   Capability = "stt"
	CapabilityAudioInput            Capability = "audio_input"
	CapabilityLLM                   Capability = "llm"
	CapabilityTTS                   Capability = "tts"
	CapabilityRealtimeAudio         Capability = "realtime_audio"
	CapabilityPipelineFallback      Capability = "pipeline_fallback"
	CapabilityToolCalling           Capability = "tool_calling"
	CapabilityDictionaryPrompt      Capability = "dictionary_prompt"
	CapabilityDictionaryNativeHints Capability = "dictionary_native_hints"
	CapabilitySessionSummary        Capability = "session_summary"
)

func RequiredCapabilities added in v0.24.0

func RequiredCapabilities(mode Mode, nativeRealtime bool) []Capability

RequiredCapabilities returns the minimum capability set for a profile to satisfy a mode contract.

type Command

type Command struct {
	Type     CommandType
	Text     string
	NoteID   int64
	Target   string
	Metadata map[string]string
}

Command is a request dispatched through the CommandBus.

func (Command) Clone

func (c Command) Clone() Command

type CommandBus

type CommandBus interface {
	Dispatch(context.Context, Command) error
}

CommandBus delivers Command values to the registered handler.

type CommandType

type CommandType string

CommandType identifies the action a Command requests.

const (
	CommandShowDashboard           CommandType = "dashboard.show"
	CommandStartDictation          CommandType = "dictation.start"
	CommandStopDictation           CommandType = "dictation.stop"
	CommandStartMode               CommandType = "mode.start"
	CommandStopMode                CommandType = "mode.stop"
	CommandSetActiveMode           CommandType = "mode.set_active"
	CommandOpenQuickNote           CommandType = "quicknote.open"
	CommandOpenQuickCapture        CommandType = "quicknote.capture.open"
	CommandCloseQuickCapture       CommandType = "quicknote.capture.close"
	CommandArmQuickNoteRecording   CommandType = "quicknote.record.arm"
	CommandCopyLastTranscription   CommandType = "transcription.copy_last"
	CommandInsertLastTranscription CommandType = "transcription.insert_last"
	CommandSummarizeSelection      CommandType = "selection.summarize"
)

type CommitObserver

type CommitObserver interface {
	OnCommit(completion Completion)
}

CommitObserver is notified after each successful TranscriptionRunner.Commit.

type Completion

type Completion struct {
	Transcript             Transcript
	QuickNoteCommitted     bool
	QuickNoteCreated       bool
	QuickNoteID            int64
	TranscriptionPersisted bool
}

Completion describes the outcome of a TranscriptionRunner.Commit call.

type DictationRun added in v0.24.0

type DictationRun struct {
	ID               string     `json:"id,omitempty"`
	Transcript       Transcript `json:"transcript"`
	StartedAt        time.Time  `json:"startedAt,omitempty"`
	CompletedAt      time.Time  `json:"completedAt,omitempty"`
	ProviderProfile  string     `json:"providerProfile,omitempty"`
	DictionaryTerms  []string   `json:"dictionaryTerms,omitempty"`
	AudioDurationMs  int64      `json:"audioDurationMs,omitempty"`
	ProcessingTimeMs int64      `json:"processingTimeMs,omitempty"`
}

DictationRun is the public record produced by a completed Dictation request. Hosts may persist it directly or map it into their own history model.

type DictationSegmenter

type DictationSegmenter struct {
	// contains filtered or unexported fields
}

DictationSegmenter implements SegmentCollector using VAD-based pause detection to split continuous speech into discrete segments.

func NewDictationSegmenter

func NewDictationSegmenter(detector VoiceActivityDetector, pauseThreshold time.Duration) *DictationSegmenter

func (*DictationSegmenter) CollectStopSegments

func (s *DictationSegmenter) CollectStopSegments(fullPCM []byte) ([]AudioSegment, error)

func (*DictationSegmenter) FeedPCM

func (s *DictationSegmenter) FeedPCM(pcm []byte) error

func (*DictationSegmenter) IdleSince added in v0.35.21

func (s *DictationSegmenter) IdleSince() time.Time

IdleSince returns the wall-clock time at which the segmenter most recently transitioned out of speech (or, for a fresh session that has not yet seen speech, the construction time). Returns the zero value when speech is currently being captured — the poller treats zero as "user is actively speaking, silence timer should reset."

Satisfies the IdleObserver contract consumed by RecordingController to drive silence-based auto-stop.

type DictationService added in v0.24.0

type DictationService interface {
	Start(context.Context) error
	Stop(context.Context) (DictationRun, error)
}

DictationService is the mode-scoped SDK contract for text-only dictation.

type DictationSetting added in v0.24.0

type DictationSetting struct {
	ModeSetting
	DictionaryEnabled bool `json:"dictionaryEnabled"`
}

type Engine

type Engine interface {
	Start(context.Context) error
	Stop(context.Context) error
	Events() <-chan Event
	Commands() CommandBus
	State() Snapshot
}

Engine is the interface implemented by a full SpeechKit voice pipeline.

type Event

type Event struct {
	Type      EventType
	Time      time.Time
	Message   string
	Text      string
	Provider  string
	Mode      string
	SessionID string
	QuickNote bool
	Err       error
	Shortcut  string
	Metadata  *Metadata
}

Event is a notification published to the event channel returned by Runtime.Events. Consumers should switch on Type and inspect the relevant fields.

func (Event) Clone added in v0.40.1

func (e Event) Clone() Event

type EventType

type EventType string

EventType identifies the kind of event published to the event channel.

const (
	EventStateChanged            EventType = "state.changed"
	EventRecordingStarted        EventType = "recording.started"
	EventProcessingStarted       EventType = "processing.started"
	EventTranscriptionReady      EventType = "transcription.ready"
	EventTranscriptCommitted     EventType = "transcription.committed"
	EventQuickNoteModeArmed      EventType = "quicknote.mode_armed"
	EventQuickNoteUpdated        EventType = "quicknote.updated"
	EventWarningRaised           EventType = "warning.raised"
	EventErrorRaised             EventType = "error.raised"
	EventShortcutMatched         EventType = "shortcut.matched"
	EventWakeFired               EventType = "wake.fired"
	EventSkillExecuted           EventType = "skill.executed"
	EventCompanionSessionStarted EventType = "companion.session.started"
	EventCompanionSessionEnded   EventType = "companion.session.ended"
	EventVoiceAgentTurnFinalized EventType = "voiceagent.turn.finalized"
	EventTTSStarted              EventType = "tts.started"
	EventTTSFinished             EventType = "tts.finished"
)

type ExecutionMode added in v0.24.0

type ExecutionMode string

ExecutionMode describes the technical runtime behind a provider profile.

const (
	ExecutionModeLocal          ExecutionMode = "local"
	ExecutionModeSelfHostedHTTP ExecutionMode = "self_hosted_http"
	ExecutionModeHFRouted       ExecutionMode = "hf_routed"
	ExecutionModeOpenAI         ExecutionMode = "openai_api"
	ExecutionModeGroq           ExecutionMode = "groq_api"
	ExecutionModeGoogle         ExecutionMode = "google_api"
	ExecutionModeOllama         ExecutionMode = "ollama_local"
	ExecutionModeOpenRouter     ExecutionMode = "openrouter_api"
)

type Hooks

type Hooks struct {
	Start         func(context.Context) error
	Stop          func(context.Context) error
	HandleCommand func(context.Context, Command) error
}

Hooks are the lifecycle callbacks wired into a Runtime. Nil hooks are silently skipped.

type IdleObserver added in v0.35.21

type IdleObserver interface {
	IdleSince() time.Time
}

IdleObserver is implemented by SegmentCollectors that want to drive silence-based auto-stop. Returning the zero value tells the watcher "user is actively speaking; reset the timer." Returning a non-zero time tells the watcher "user has been silent since T."

type IntelligenceKind added in v0.24.0

type IntelligenceKind string

IntelligenceKind names the mode-specific intelligence contract.

const (
	IntelligenceUser          IntelligenceKind = "user"
	IntelligenceUtility       IntelligenceKind = "utility"
	IntelligenceBrainstorming IntelligenceKind = "brainstorming"
	// IntelligenceVoiceOutput is the contract for the TTS mode: render
	// generated text to audio. No user intelligence, no utility tools,
	// no brainstorming — strictly text in, audio out.
	IntelligenceVoiceOutput IntelligenceKind = "voice_output"
)

type JobSubmitter

type JobSubmitter interface {
	Submit(TranscriptionJob) error
}

JobSubmitter accepts a TranscriptionJob for async processing.

type Metadata added in v0.40.1

type Metadata map[string]string

Metadata carries optional event key/value data without making Event non-comparable for existing SDK consumers.

func NewMetadata added in v0.40.1

func NewMetadata(values map[string]string) *Metadata

func (*Metadata) Clone added in v0.40.1

func (m *Metadata) Clone() *Metadata

func (*Metadata) Get added in v0.40.1

func (m *Metadata) Get(key string) string

func (*Metadata) Map added in v0.40.1

func (m *Metadata) Map() map[string]string

type Mode added in v0.24.0

type Mode string

Mode identifies one of SpeechKit's strict product modes.

const (
	ModeNone       Mode = "none"
	ModeDictation  Mode = "dictation"
	ModeAssist     Mode = "assist"
	ModeVoiceAgent Mode = "voice_agent"
	// ModeTTS exposes Text-to-Speech as a first-class model-selection axis
	// alongside the three product modes. The host activates an Assist or
	// Voice-Agent session and the TTS profile selected here drives which
	// provider speaks the response. v0.37 introduced this alongside the
	// Voice-Companion hands-free flow — Thalia + Companion Live need a
	// stable place to pin a TTS voice across deployments.
	ModeTTS Mode = "tts"
)

func NormalizeMode added in v0.24.0

func NormalizeMode(mode Mode) Mode

type ModeBehavior added in v0.24.0

type ModeBehavior string

ModeBehavior describes how much mode-specific intelligence a host enables.

const (
	// ModeBehaviorClean keeps a mode on its core contract, such as strict STT
	// for Dictation or deterministic utility handling for Assist.
	ModeBehaviorClean ModeBehavior = "clean"
	// ModeBehaviorIntelligence allows optional intelligence layers such as
	// LLM utility handling, TTS, summaries, or realtime tool use.
	ModeBehaviorIntelligence ModeBehavior = "intelligence"
)

type ModeContract added in v0.24.0

type ModeContract struct {
	Mode         Mode             `json:"mode"`
	Intelligence IntelligenceKind `json:"intelligence"`
	Input        string           `json:"input"`
	Output       string           `json:"output"`
	Allowed      []Capability     `json:"allowed"`
	Forbidden    []Capability     `json:"forbidden"`
}

ModeContract documents what a mode may and may not do. Hosts can use this to validate custom adapters before exposing them to users.

func DefaultModeContracts added in v0.24.0

func DefaultModeContracts() []ModeContract

type ModeSetting added in v0.24.0

type ModeSetting struct {
	Enabled           bool   `json:"enabled"`
	Hotkey            string `json:"hotkey,omitempty"`
	HotkeyBehavior    string `json:"hotkeyBehavior,omitempty"`
	PrimaryProfileID  string `json:"primaryProfileId,omitempty"`
	FallbackProfileID string `json:"fallbackProfileId,omitempty"`
	// ModeSource is "local" (default) or "server". When "server", this mode
	// runs against the speechkit-server pointed to by ServerConnection
	// instead of the in-process Framework kernel. Empty/missing is treated
	// as "local" for backwards compatibility with pre-0.26 hosts.
	ModeSource string `json:"modeSource,omitempty"`
}

ModeSetting is the public per-mode configuration shape used by the SDK and the versioned HTTP control plane.

type ModeSettings added in v0.24.0

type ModeSettings struct {
	Dictation        DictationSetting        `json:"dictation"`
	Assist           AssistSetting           `json:"assist"`
	VoiceAgent       VoiceAgentSetting       `json:"voiceAgent"`
	ServerConnection ServerConnectionSetting `json:"serverConnection"`
}

type ModelVariant added in v0.24.0

type ModelVariant struct {
	ID          string `json:"id"`
	Name        string `json:"name"`
	ModelID     string `json:"modelId"`
	Description string `json:"description,omitempty"`
	Recommended bool   `json:"recommended,omitempty"`
}

ModelVariant is a concrete model choice inside a provider profile group.

type Persistence

type Persistence interface {
	QuickNoteStore
	TranscriptionStore
}

Persistence combines QuickNoteStore and TranscriptionStore.

type ProviderKind added in v0.24.0

type ProviderKind string

ProviderKind is the product-facing provider group shown for every mode.

const (
	ProviderKindLocalBuiltIn   ProviderKind = "local_built_in"
	ProviderKindLocalProvider  ProviderKind = "local_provider"
	ProviderKindCloudProvider  ProviderKind = "cloud_provider"
	ProviderKindDirectProvider ProviderKind = "direct_provider"
)

func ProviderKindsForMode added in v0.24.0

func ProviderKindsForMode(mode Mode) []ProviderKind

type ProviderProfile added in v0.24.0

type ProviderProfile struct {
	ID             string         `json:"id"`
	Mode           Mode           `json:"mode"`
	Name           string         `json:"name"`
	ProviderKind   ProviderKind   `json:"providerKind"`
	ExecutionMode  ExecutionMode  `json:"executionMode,omitempty"`
	ModelID        string         `json:"modelId,omitempty"`
	Source         string         `json:"source,omitempty"`
	Description    string         `json:"description,omitempty"`
	License        string         `json:"license,omitempty"`
	Capabilities   []Capability   `json:"capabilities,omitempty"`
	AdapterKind    string         `json:"adapterKind,omitempty"`
	Variants       []ModelVariant `json:"variants,omitempty"`
	AllowInference bool           `json:"inferenceAllowed,omitempty"`
	Default        bool           `json:"default,omitempty"`
	Recommended    bool           `json:"recommended,omitempty"`
	Experimental   bool           `json:"experimental,omitempty"`
}

ProviderProfile is the public catalog entry host applications can present or activate. ProviderKind is the stable user-facing grouping; ExecutionMode is the technical adapter underneath it.

func DefaultProviderProfiles added in v0.24.0

func DefaultProviderProfiles() []ProviderProfile

DefaultProviderProfiles returns the built-in framework provider catalog for the three strict SpeechKit modes. The Windows desktop host adapts this public catalog into its internal runtime model; the catalog itself belongs to the reusable framework layer.

func FilterProviderProfiles added in v0.24.0

func FilterProviderProfiles(profiles []ProviderProfile, policy RuntimePolicy) []ProviderProfile

FilterProviderProfiles returns the profiles visible under policy.

func ProfilesForMode added in v0.24.0

func ProfilesForMode(mode Mode) []ProviderProfile

func (ProviderProfile) HasCapability added in v0.24.0

func (p ProviderProfile) HasCapability(capability Capability) bool

type QuickNoteStore

type QuickNoteStore interface {
	SaveQuickNote(ctx context.Context, text, language, provider string, durationMs, latencyMs int64, audioData []byte) (int64, error)
	GetQuickNoteText(ctx context.Context, id int64) (string, error)
	UpdateQuickNote(ctx context.Context, id int64, text string) error
	UpdateQuickNoteCapture(ctx context.Context, id int64, text, provider string, durationMs, latencyMs int64, audioData []byte) error
}

QuickNoteStore persists and retrieves Quick Note records.

type Readiness added in v0.24.0

type Readiness struct {
	SchemaVersion    string                 `json:"schemaVersion,omitempty"`
	ProfileID        string                 `json:"profileId"`
	Mode             Mode                   `json:"mode"`
	ProviderKind     ProviderKind           `json:"providerKind"`
	ExecutionMode    ExecutionMode          `json:"executionMode,omitempty"`
	ModelID          string                 `json:"modelId,omitempty"`
	Source           string                 `json:"source,omitempty"`
	Active           bool                   `json:"active"`
	Default          bool                   `json:"default"`
	Configured       bool                   `json:"configured"`
	CredentialsReady bool                   `json:"credentialsReady"`
	RuntimeReady     bool                   `json:"runtimeReady"`
	CapabilityReady  bool                   `json:"capabilityReady"`
	Ready            bool                   `json:"ready"`
	Missing          []string               `json:"missing,omitempty"`
	Requirements     []ReadinessRequirement `json:"requirements,omitempty"`
	Actions          []ReadinessAction      `json:"actions,omitempty"`
	Artifacts        []ReadinessArtifact    `json:"artifacts,omitempty"`
}

Readiness describes whether a provider profile can be used right now.

type ReadinessAction added in v0.24.0

type ReadinessAction struct {
	ID     string `json:"id"`
	Label  string `json:"label"`
	Kind   string `json:"kind"`
	Target string `json:"target,omitempty"`
}

ReadinessAction describes the next setup command a host can expose when a requirement is not ready.

type ReadinessArtifact added in v0.24.0

type ReadinessArtifact struct {
	ID             string `json:"id"`
	Name           string `json:"name"`
	Kind           string `json:"kind"`
	SizeLabel      string `json:"sizeLabel,omitempty"`
	SizeBytes      int64  `json:"sizeBytes,omitempty"`
	Available      bool   `json:"available"`
	Selected       bool   `json:"selected"`
	RuntimeReady   bool   `json:"runtimeReady,omitempty"`
	RuntimeProblem string `json:"runtimeProblem,omitempty"`
	Recommended    bool   `json:"recommended,omitempty"`
}

ReadinessArtifact describes downloadable or pullable model artifacts tied to a provider profile. Local Built-in profiles use this to expose concrete model choices through the same readiness API as credentials and runtime checks.

type ReadinessRequirement added in v0.24.0

type ReadinessRequirement struct {
	ID       string `json:"id"`
	Label    string `json:"label"`
	Category string `json:"category"`
	Required bool   `json:"required"`
	Ready    bool   `json:"ready"`
	Missing  string `json:"missing,omitempty"`
}

ReadinessRequirement is a machine-readable setup check for a provider profile. Hosts can render these checks directly instead of hard-coding provider-specific setup rules.

type RecordingController

type RecordingController struct {
	// contains filtered or unexported fields
}

RecordingController manages the start/stop lifecycle of a single recording session and hands audio segments to the submission queue.

func NewRecordingController

func NewRecordingController(recorder AudioRecorder, submitter JobSubmitter, observer RecordingObserver, segmenterFactory SegmentCollectorFactory) *RecordingController

func (*RecordingController) IsRecording

func (c *RecordingController) IsRecording() bool

func (*RecordingController) SetIdleWatchInterval added in v0.35.21

func (c *RecordingController) SetIdleWatchInterval(d time.Duration)

SetIdleWatchInterval overrides the polling interval used by the silence-based auto-stop watcher. Tests use this to keep the unit tests fast (e.g. 5ms polling). Production should never touch this.

func (*RecordingController) Start

func (*RecordingController) Stop

type RecordingObserver

type RecordingObserver interface {
	OnState(status, text string)
	OnLog(message, kind string)
}

type RecordingStartOptions

type RecordingStartOptions struct {
	Label       string
	Target      any
	Language    string
	QuickNote   bool
	QuickNoteID int64
	// IdleTimeout, when greater than zero AND the underlying collector
	// implements [IdleObserver], arms a watcher that calls
	// OnIdleTimeoutCallback once the user has been silent for this long.
	// Zero (default) disables the watcher — typical for hold-to-talk
	// hotkey sessions that already terminate on KeyUp.
	IdleTimeout time.Duration
	// OnIdleTimeoutCallback fires once if IdleTimeout elapses without
	// observed speech. Wired by the host to dispatch a Stop command so
	// the dictate session ends after a silence window. The watcher
	// guarantees at-most-one invocation per Start() call.
	OnIdleTimeoutCallback func()
}

type RecordingStopOptions

type RecordingStopOptions struct {
	Label string
}

type Runtime

type Runtime struct {
	// contains filtered or unexported fields
}

Runtime manages shared observable state and event delivery for a SpeechKit session. Create one with NewRuntime and wire it into the host application via Runtime.Events and Runtime.Commands.

func NewRuntime

func NewRuntime(initial Snapshot, hooks Hooks) *Runtime

func (*Runtime) Close

func (r *Runtime) Close()

func (*Runtime) Commands

func (r *Runtime) Commands() CommandBus

func (*Runtime) Events

func (r *Runtime) Events() <-chan Event

func (*Runtime) Publish

func (r *Runtime) Publish(event Event) bool

func (*Runtime) SetState

func (r *Runtime) SetState(snapshot Snapshot)

func (*Runtime) Start

func (r *Runtime) Start(ctx context.Context) error

func (*Runtime) State

func (r *Runtime) State() Snapshot

func (*Runtime) Stop

func (r *Runtime) Stop(ctx context.Context) error

func (*Runtime) UpdateState

func (r *Runtime) UpdateState(update func(*Snapshot)) Snapshot

type RuntimePolicy added in v0.24.0

type RuntimePolicy struct {
	EnabledModes    []Mode                `json:"enabledModes,omitempty"`
	AllowedProfiles []string              `json:"allowedProfiles,omitempty"`
	FixedProfiles   map[Mode]string       `json:"fixedProfiles,omitempty"`
	AllowFallbacks  bool                  `json:"allowFallbacks,omitempty"`
	ModeBehaviors   map[Mode]ModeBehavior `json:"modeBehaviors,omitempty"`
}

RuntimePolicy constrains which parts of the SpeechKit framework a host application exposes. Empty EnabledModes or AllowedProfiles mean "all".

type SegmentCollector

type SegmentCollector interface {
	FeedPCM([]byte) error
	CollectStopSegments(fullPCM []byte) ([]AudioSegment, error)
}

SegmentCollector accumulates real-time PCM frames and splits them into dictation segments when recording stops.

type SegmentCollectorFactory

type SegmentCollectorFactory func() SegmentCollector

type ServerConnectionSetting added in v0.26.0

type ServerConnectionSetting struct {
	Enabled           bool                     `json:"enabled"`
	ActiveTargetID    string                   `json:"activeTargetId,omitempty"`
	URL               string                   `json:"url"`
	BearerTokenEnv    string                   `json:"bearerTokenEnv,omitempty"`
	AuthMode          string                   `json:"authMode,omitempty"`
	BearerTokenSet    bool                     `json:"bearerTokenSet"`
	FallbackToLocal   bool                     `json:"fallbackToLocal"`
	RequestTimeoutSec int                      `json:"requestTimeoutSec"`
	Targets           []ServerConnectionTarget `json:"targets,omitempty"`
}

ServerConnectionSetting exposes the [server_connection] config section to the control-plane API + frontend. The bearer token is never sent across this boundary — only the env var name + connection metadata.

type ServerConnectionTarget added in v0.31.0

type ServerConnectionTarget struct {
	ID                string `json:"id"`
	Label             string `json:"label"`
	URL               string `json:"url"`
	AuthMode          string `json:"authMode"`
	BearerTokenEnv    string `json:"bearerTokenEnv,omitempty"`
	BearerTokenSet    bool   `json:"bearerTokenSet"`
	FallbackToLocal   bool   `json:"fallbackToLocal"`
	RequestTimeoutSec int    `json:"requestTimeoutSec"`
}

type Snapshot

type Snapshot struct {
	Status                string
	Text                  string
	Level                 float64
	Hotkey                string
	ActiveMode            string
	Providers             []string
	ActiveProfiles        map[string]string
	Transcriptions        int
	QuickNoteMode         bool
	QuickCaptureMode      bool
	LastTranscriptionText string
}

Snapshot is a point-in-time copy of the Runtime's observable state. All slice and map fields are safe to read without holding any lock.

func (Snapshot) Clone

func (s Snapshot) Clone() Snapshot

type Submission

type Submission struct {
	PCM          []byte
	WAV          []byte
	DurationSecs float64
	Language     string
	Prefix       string
	QuickNote    bool
	QuickNoteID  int64
}

Submission carries a single audio segment and its metadata into the transcription pipeline.

type Transcriber

type Transcriber interface {
	Transcribe(ctx context.Context, audio []byte, durationSecs float64, language string) (Transcript, error)
}

Transcriber converts raw WAV audio into a Transcript.

type Transcript

type Transcript struct {
	Text       string
	Language   string
	Duration   time.Duration
	Provider   string
	Model      string
	Confidence float64
}

Transcript holds the result of a single transcription call.

type TranscriptInterceptor

type TranscriptInterceptor interface {
	Intercept(ctx context.Context, transcript Transcript, target any) (bool, error)
}

TranscriptInterceptor can handle a transcript before it reaches the normal output path. Return (true, nil) to signal that the transcript was consumed.

type TranscriptOutput

type TranscriptOutput interface {
	Deliver(ctx context.Context, transcript Transcript, target any) error
}

TranscriptOutput delivers a completed Transcript to the host application (e.g. clipboard injection or text-field paste).

type TranscriptionJob

type TranscriptionJob struct {
	Submission
	Target any
}

TranscriptionJob pairs a Submission with its delivery target.

func (TranscriptionJob) Clone

type TranscriptionObserver

type TranscriptionObserver interface {
	OnState(status, text string)
	OnLog(message, kind string)
	OnTranscriptCommitted(transcript Transcript, quickNote bool)
}

TranscriptionObserver receives real-time status and log updates from a TranscriptionWorker during processing.

type TranscriptionRunner

type TranscriptionRunner struct {
	// contains filtered or unexported fields
}

TranscriptionRunner transcribes audio submissions and persists results. Create one with NewTranscriptionRunner.

func NewTranscriptionRunner

func NewTranscriptionRunner(transcriber Transcriber, store Persistence) *TranscriptionRunner

NewTranscriptionRunner creates a TranscriptionRunner backed by the given transcriber and persistence store. Either argument may be nil.

func (*TranscriptionRunner) Commit

func (r *TranscriptionRunner) Commit(ctx context.Context, submission Submission, transcript Transcript) (Completion, error)

func (*TranscriptionRunner) WithObserver

func (r *TranscriptionRunner) WithObserver(observer CommitObserver) *TranscriptionRunner

type TranscriptionStore

type TranscriptionStore interface {
	SaveTranscription(ctx context.Context, text, language, provider, model string, durationMs, latencyMs int64, audioData []byte) error
}

TranscriptionStore persists completed dictation transcriptions.

type TranscriptionWorker

type TranscriptionWorker struct {
	// contains filtered or unexported fields
}

TranscriptionWorker processes TranscriptionJob values from an internal queue on a single goroutine. Start it with TranscriptionWorker.Start and submit work with TranscriptionWorker.Submit.

func (*TranscriptionWorker) Close

func (w *TranscriptionWorker) Close()

func (*TranscriptionWorker) Start

func (w *TranscriptionWorker) Start(ctx context.Context)

func (*TranscriptionWorker) Submit

func (*TranscriptionWorker) Wait

func (w *TranscriptionWorker) Wait()

type TranscriptionWorkerConfig

type TranscriptionWorkerConfig struct {
	Timeout     time.Duration
	QueueSize   int
	Runner      *TranscriptionRunner
	Output      TranscriptOutput
	Interceptor TranscriptInterceptor
	Observer    TranscriptionObserver
}

TranscriptionWorkerConfig configures a TranscriptionWorker. Runner is required; all other fields are optional.

type VoiceActivityDetector added in v0.24.0

type VoiceActivityDetector interface {
	ProcessFrame([]int16) (float32, error)
	Reset()
}

VoiceActivityDetector is the public VAD contract consumed by DictationSegmenter. It intentionally matches SpeechKit's internal Silero detector shape without exposing internal packages.

type VoiceAgentService added in v0.24.0

type VoiceAgentService interface {
	Start(context.Context) error
	Stop(context.Context) (VoiceAgentSession, error)
	SendText(context.Context, string) error
	CurrentSession(context.Context) (VoiceAgentSession, error)
}

VoiceAgentService is the mode-scoped SDK contract for realtime dialogue.

type VoiceAgentSession added in v0.24.0

type VoiceAgentSession struct {
	ID                string                   `json:"id,omitempty"`
	StartedAt         time.Time                `json:"startedAt,omitempty"`
	EndedAt           time.Time                `json:"endedAt,omitempty"`
	Locale            string                   `json:"locale,omitempty"`
	ProviderProfileID string                   `json:"providerProfileId,omitempty"`
	RuntimeKind       string                   `json:"runtimeKind,omitempty"`
	Turns             []VoiceAgentTurn         `json:"turns,omitempty"`
	Summary           VoiceAgentSessionSummary `json:"summary"`
}

VoiceAgentSession is the public record for a live dialogue.

type VoiceAgentSessionSummary added in v0.24.0

type VoiceAgentSessionSummary struct {
	Title         string   `json:"title,omitempty"`
	Summary       string   `json:"summary"`
	Ideas         []string `json:"ideas,omitempty"`
	Decisions     []string `json:"decisions,omitempty"`
	OpenQuestions []string `json:"openQuestions,omitempty"`
	NextSteps     []string `json:"nextSteps,omitempty"`
	RawText       string   `json:"rawText,omitempty"`
}

VoiceAgentSessionSummary is the structured handoff produced when a Voice Agent session ends.

type VoiceAgentSetting added in v0.24.0

type VoiceAgentSetting struct {
	ModeSetting
	SessionSummary   bool   `json:"sessionSummary"`
	PipelineFallback bool   `json:"pipelineFallback"`
	CloseBehavior    string `json:"closeBehavior,omitempty"`
	AgentProfileID   string `json:"agentProfileId,omitempty"`
	AgentSequenceID  string `json:"agentSequenceId,omitempty"`
}

type VoiceAgentTurn added in v0.24.0

type VoiceAgentTurn struct {
	Role      string    `json:"role"`
	Text      string    `json:"text"`
	CreatedAt time.Time `json:"createdAt,omitempty"`
}

VoiceAgentTurn is one finalized turn in a realtime or fallback dialogue.

Directories

Path Synopsis
Package agentkit provides a small Go harness for building SpeechKit Voice Agent hosts.
Package agentkit provides a small Go harness for building SpeechKit Voice Agent hosts.
Package assist provides an embeddable Assist Mode service.
Package assist provides an embeddable Assist Mode service.
genkitadapter
Package genkitadapter keeps Genkit-specific Assist wiring out of the core public assist package.
Package genkitadapter keeps Genkit-specific Assist wiring out of the core public assist package.
Package client provides a typed HTTP client for talking to a remote SpeechKit Server (the `cmd/speechkit-server` Linux container or any compatible deployment).
Package client provides a typed HTTP client for talking to a remote SpeechKit Server (the `cmd/speechkit-server` Linux container or any compatible deployment).
Package companion provides small composers for hands-free SpeechKit hosts.
Package companion provides small composers for hands-free SpeechKit hosts.
Package dictation provides an embeddable strict Dictation runtime.
Package dictation provides an embeddable strict Dictation runtime.
Package lifecycle owns mode start/stop orchestration and refcounted shared dependencies for SpeechKit hosts.
Package lifecycle owns mode start/stop orchestration and refcounted shared dependencies for SpeechKit hosts.
Package tts exposes the embeddable SpeechKit text-to-speech surface.
Package tts exposes the embeddable SpeechKit text-to-speech surface.
Package voiceagent provides an embeddable Voice Agent service.
Package voiceagent provides an embeddable Voice Agent service.
live
Package live exposes the low-level Voice Agent realtime-protocol types.
Package live exposes the low-level Voice Agent realtime-protocol types.
Package wakeword exposes embeddable SpeechKit wake-word contracts.
Package wakeword exposes embeddable SpeechKit wake-word contracts.
sherpa
Package sherpa exposes the sherpa-onnx wake-word detector adapter.
Package sherpa exposes the sherpa-onnx wake-word detector adapter.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL