Documentation
Overview ¶
Package speechkit provides the public SDK for embedding SpeechKit voice capture, transcription, and assist/voice-agent pipelines into host applications.
Surface ¶
The kernel exposes three strict modes:
- Dictation — speech to text only, no AI rewriting.
- Assist — speech (or text) to a one-shot result, with optional TTS.
- Voice Agent — realtime audio-to-audio dialogue.
Each mode is constructed via a small subpackage so host apps depend only on what they use:
- github.com/kombifyio/SpeechKit/pkg/speechkit/dictation
- github.com/kombifyio/SpeechKit/pkg/speechkit/assist
- github.com/kombifyio/SpeechKit/pkg/speechkit/voiceagent
- github.com/kombifyio/SpeechKit/pkg/speechkit/agentkit (tool registry, session memory, lifecycle hooks for Voice Agent hosts)
- github.com/kombifyio/SpeechKit/pkg/speechkit/client (HTTP client for talking to a remote SpeechKit Server)
Central types in this package ¶
Runtime owns shared state and the event channel that host apps read from. Engine is the full voice pipeline; RecordingController and TranscriptionWorker can be composed independently for custom pipelines. [Catalog] exposes the provider/model/mode metadata for setup UIs and readiness checks; RuntimePolicy lets the host pin profiles and gate fallbacks.
Stability ¶
pkg/speechkit is the OSS public surface. Symbols here follow semver from v1.0 onward. Before v1.0 the surface may still evolve — see CHANGELOG.md and the release notes for breaking-change calls.
Package-level documentation lives in doc.go.
Index ¶
- Constants
- Variables
- func PCMDurationSecs(pcm []byte) float64
- func PCMToWAV(pcm []byte) []byte
- func ValidateDefaultCatalog() error
- func ValidateModeSettingsForPolicy(profiles []ProviderProfile, settings ModeSettings, policy RuntimePolicy) error
- func ValidateProfileForMode(profile ProviderProfile, mode Mode) error
- func ValidateRuntimePolicy(profiles []ProviderProfile, policy RuntimePolicy) error
- type AssistRequest
- type AssistResult
- type AssistService
- type AssistSetting
- type AssistSurfaceDecision
- type AudioData
- type AudioRecorder
- type AudioSegment
- type Capability
- type Command
- type CommandBus
- type CommandType
- type CommitObserver
- type Completion
- type DictationRun
- type DictationSegmenter
- type DictationService
- type DictationSetting
- type Engine
- type Event
- type EventType
- type ExecutionMode
- type Hooks
- type IdleObserver
- type IntelligenceKind
- type JobSubmitter
- type Metadata
- type Mode
- type ModeBehavior
- type ModeContract
- type ModeSetting
- type ModeSettings
- type ModelVariant
- type Persistence
- type ProviderKind
- type ProviderProfile
- type QuickNoteStore
- type Readiness
- type ReadinessAction
- type ReadinessArtifact
- type ReadinessRequirement
- type RecordingController
- type RecordingObserver
- type RecordingStartOptions
- type RecordingStopOptions
- type Runtime
- func (r *Runtime) Close()
- func (r *Runtime) Commands() CommandBus
- func (r *Runtime) Events() <-chan Event
- func (r *Runtime) Publish(event Event) bool
- func (r *Runtime) SetState(snapshot Snapshot)
- func (r *Runtime) Start(ctx context.Context) error
- func (r *Runtime) State() Snapshot
- func (r *Runtime) Stop(ctx context.Context) error
- func (r *Runtime) UpdateState(update func(*Snapshot)) Snapshot
- type RuntimePolicy
- type SegmentCollector
- type SegmentCollectorFactory
- type ServerConnectionSetting
- type ServerConnectionTarget
- type Snapshot
- type Submission
- type Transcriber
- type Transcript
- type TranscriptInterceptor
- type TranscriptOutput
- type TranscriptionJob
- type TranscriptionObserver
- type TranscriptionRunner
- type TranscriptionStore
- type TranscriptionWorker
- type TranscriptionWorkerConfig
- type VoiceActivityDetector
- type VoiceAgentService
- type VoiceAgentSession
- type VoiceAgentSessionSummary
- type VoiceAgentSetting
- type VoiceAgentTurn
Constants ¶
const (
AudioSampleRate = 16000
AudioChannels = 1
AudioBitsPerSample = 16
AudioBytesPerSample = AudioBitsPerSample / 8
)
const (
DefaultDictationMinSegment = 1200 * time.Millisecond
DefaultDictationPadding = 160 * time.Millisecond
DefaultDictationOverlap = 200 * time.Millisecond
)
const DefaultLocalBuiltInLLMModel = "ggml-org/gemma-4-E4B-it-GGUF:Q4_K_M"
const DefaultMinPCMBytes = 3200
const DefaultProcessingMessage = "Recording stopped · Transcribing"
const ReadinessSchemaVersion = "provider-readiness.v1"
Variables ¶
var (
ErrMissingRunner = errors.New("speechkit: transcription worker requires a runner")
ErrMissingTranscriber = errors.New("speechkit: transcription runner requires a transcriber")
ErrWorkerClosed = errors.New("speechkit: transcription worker is closed")
ErrWorkerQueueFull = errors.New("speechkit: transcription worker queue is full")
)
ErrCommandHandlerUnavailable is returned by [CommandBus.Dispatch] when no command handler has been configured on the Runtime.
Functions ¶
func PCMDurationSecs ¶ added in v0.24.0
func PCMDurationSecs(pcm []byte) float64
PCMDurationSecs returns the duration of 16kHz S16 mono PCM audio in seconds.
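The duration follows from the audio constants above: 16000 samples per second, one channel, two bytes per sample. The sketch below re-derives that arithmetic locally; `pcmDurationSecs` and the lowercase constants are hypothetical stand-ins written for illustration, not the package's source.

```go
package main

import "fmt"

// Local mirrors of the documented audio constants.
const (
	audioSampleRate     = 16000
	audioChannels       = 1
	audioBitsPerSample  = 16
	audioBytesPerSample = audioBitsPerSample / 8
)

// pcmDurationSecs is a hypothetical stand-in for PCMDurationSecs:
// it divides the byte count by the number of bytes per second.
func pcmDurationSecs(pcm []byte) float64 {
	bytesPerSecond := audioSampleRate * audioChannels * audioBytesPerSample
	return float64(len(pcm)) / float64(bytesPerSecond)
}

func main() {
	fmt.Println(pcmDurationSecs(make([]byte, 32000))) // 1
	fmt.Println(pcmDurationSecs(make([]byte, 3200)))  // 0.1
}
```

Under this arithmetic, the 3200 bytes named by DefaultMinPCMBytes correspond to 100 ms of audio.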
func ValidateDefaultCatalog ¶ added in v0.24.0
func ValidateDefaultCatalog() error
ValidateDefaultCatalog verifies the framework invariant that every strict mode exposes all four provider groups and every visible profile satisfies its mode contract. v0.37 added ModeTTS as a model-selection axis with the same four-provider-group invariant (Local Built-in via Piper, Local Provider via Kokoro/openedai-speech, Cloud Provider via Hugging Face Parler, Direct Provider via OpenAI + Google).
func ValidateModeSettingsForPolicy ¶ added in v0.24.0
func ValidateModeSettingsForPolicy(profiles []ProviderProfile, settings ModeSettings, policy RuntimePolicy) error
ValidateModeSettingsForPolicy checks mode selections against a RuntimePolicy.
func ValidateProfileForMode ¶ added in v0.24.0
func ValidateProfileForMode(profile ProviderProfile, mode Mode) error
ValidateProfileForMode checks the stable v23 mode capability contract.
func ValidateRuntimePolicy ¶ added in v0.24.0
func ValidateRuntimePolicy(profiles []ProviderProfile, policy RuntimePolicy) error
ValidateRuntimePolicy checks that a policy references existing profiles and does not require a profile that violates its mode contract.
Types ¶
type AssistRequest ¶ added in v0.24.0
type AssistRequest struct {
Text string `json:"text"`
Locale string `json:"locale,omitempty"`
Selection string `json:"selection,omitempty"`
Context string `json:"context,omitempty"`
EditableTarget bool `json:"editableTarget,omitempty"`
ProviderProfileID string `json:"providerProfileId,omitempty"`
SessionKey string `json:"sessionKey,omitempty"`
}
AssistRequest is the mode-scoped input for Assist integrations.
type AssistResult ¶ added in v0.24.0
type AssistResult struct {
Text string `json:"text"`
SpeakText string `json:"speakText,omitempty"`
Action string `json:"action,omitempty"`
Kind string `json:"kind,omitempty"`
Surface AssistSurfaceDecision `json:"surface"`
ShortcutID string `json:"shortcutId,omitempty"`
Locale string `json:"locale,omitempty"`
Audio *AudioData `json:"audio,omitempty"`
Format string `json:"format,omitempty"`
}
AssistResult is the public one-shot output contract for Assist Mode.
type AssistService ¶ added in v0.24.0
type AssistService interface {
Process(context.Context, AssistRequest) (AssistResult, error)
}
AssistService is the mode-scoped SDK contract for one-shot utilities and work-product generation.
type AssistSetting ¶ added in v0.24.0
type AssistSetting struct {
ModeSetting
TTSEnabled bool `json:"ttsEnabled"`
UtilityRegistry string `json:"utilityRegistry,omitempty"`
}
type AssistSurfaceDecision ¶ added in v0.24.0
type AssistSurfaceDecision string
AssistSurfaceDecision describes where an Assist result should be presented.
const (
AssistSurfacePanel AssistSurfaceDecision = "panel"
AssistSurfaceInsert AssistSurfaceDecision = "insert"
AssistSurfaceReplace AssistSurfaceDecision = "replace"
AssistSurfaceActionAck AssistSurfaceDecision = "action_ack"
AssistSurfaceSilent AssistSurfaceDecision = "silent"
)
type AudioData ¶ added in v0.40.1
type AudioData []byte
AudioData carries optional synthesized audio without making AssistResult non-comparable for existing SDK consumers.
func NewAudioData ¶ added in v0.40.1
type AudioRecorder ¶
AudioRecorder is the hardware abstraction for microphone capture.
type AudioSegment ¶ added in v0.24.0
AudioSegment is a transcribable utterance extracted from a dictation recording. PCM is raw 16kHz S16 mono audio.
func FallbackDictationSegments ¶
func FallbackDictationSegments(fullPCM []byte) []AudioSegment
FallbackDictationSegments wraps all of fullPCM in a single segment. Used when VAD-based segmentation is unavailable or produces no output.
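As a rough illustration of the fallback, the sketch below wraps a whole recording in one segment. The `audioSegment` struct with a single `PCM` field is an assumption (the real AudioSegment fields are not shown here), and returning nil for empty input is also an assumption rather than documented behavior.

```go
package main

import "fmt"

// audioSegment is a hypothetical stand-in for AudioSegment.
// Only a PCM field is assumed; the real struct may carry more.
type audioSegment struct {
	PCM []byte
}

// fallbackSegments returns the full recording as a single segment.
// Returning nil for empty input is an assumption of this sketch.
func fallbackSegments(fullPCM []byte) []audioSegment {
	if len(fullPCM) == 0 {
		return nil
	}
	return []audioSegment{{PCM: fullPCM}}
}

func main() {
	pcm := make([]byte, 32000) // one second of 16kHz S16 mono silence
	segments := fallbackSegments(pcm)
	fmt.Println(len(segments), len(segments[0].PCM)) // 1 32000
}
```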
type Capability ¶ added in v0.24.0
type Capability string
Capability is a mode capability declared by a provider profile.
const (
CapabilityTranscription Capability = "transcription"
CapabilitySTT Capability = "stt"
CapabilityAudioInput Capability = "audio_input"
CapabilityLLM Capability = "llm"
CapabilityTTS Capability = "tts"
CapabilityRealtimeAudio Capability = "realtime_audio"
CapabilityPipelineFallback Capability = "pipeline_fallback"
CapabilityToolCalling Capability = "tool_calling"
CapabilityDictionaryPrompt Capability = "dictionary_prompt"
CapabilityDictionaryNativeHints Capability = "dictionary_native_hints"
CapabilitySessionSummary Capability = "session_summary"
)
func RequiredCapabilities ¶ added in v0.24.0
func RequiredCapabilities(mode Mode, nativeRealtime bool) []Capability
RequiredCapabilities returns the minimum capability set for a profile to satisfy a mode contract.
type Command ¶
type Command struct {
Type CommandType
Text string
NoteID int64
Target string
Metadata map[string]string
}
Command is a request dispatched through the CommandBus.
type CommandBus ¶
CommandBus delivers Command values to the registered handler.
type CommandType ¶
type CommandType string
CommandType identifies the action a Command requests.
const (
CommandShowDashboard CommandType = "dashboard.show"
CommandStartDictation CommandType = "dictation.start"
CommandStopDictation CommandType = "dictation.stop"
CommandStartMode CommandType = "mode.start"
CommandStopMode CommandType = "mode.stop"
CommandSetActiveMode CommandType = "mode.set_active"
CommandOpenQuickNote CommandType = "quicknote.open"
CommandOpenQuickCapture CommandType = "quicknote.capture.open"
CommandCloseQuickCapture CommandType = "quicknote.capture.close"
CommandArmQuickNoteRecording CommandType = "quicknote.record.arm"
CommandCopyLastTranscription CommandType = "transcription.copy_last"
CommandInsertLastTranscription CommandType = "transcription.insert_last"
CommandSummarizeSelection CommandType = "selection.summarize"
)
type CommitObserver ¶
type CommitObserver interface {
OnCommit(completion Completion)
}
CommitObserver is notified after each successful TranscriptionRunner.Commit.
type Completion ¶
type Completion struct {
Transcript Transcript
QuickNoteCommitted bool
QuickNoteCreated bool
QuickNoteID int64
TranscriptionPersisted bool
}
Completion describes the outcome of a TranscriptionRunner.Commit call.
type DictationRun ¶ added in v0.24.0
type DictationRun struct {
ID string `json:"id,omitempty"`
Transcript Transcript `json:"transcript"`
StartedAt time.Time `json:"startedAt,omitempty"`
CompletedAt time.Time `json:"completedAt,omitempty"`
ProviderProfile string `json:"providerProfile,omitempty"`
DictionaryTerms []string `json:"dictionaryTerms,omitempty"`
AudioDurationMs int64 `json:"audioDurationMs,omitempty"`
ProcessingTimeMs int64 `json:"processingTimeMs,omitempty"`
}
DictationRun is the public record produced by a completed Dictation request. Hosts may persist it directly or map it into their own history model.
type DictationSegmenter ¶
type DictationSegmenter struct {
// contains filtered or unexported fields
}
DictationSegmenter implements SegmentCollector using VAD-based pause detection to split continuous speech into discrete segments.
func NewDictationSegmenter ¶
func NewDictationSegmenter(detector VoiceActivityDetector, pauseThreshold time.Duration) *DictationSegmenter
func (*DictationSegmenter) CollectStopSegments ¶
func (s *DictationSegmenter) CollectStopSegments(fullPCM []byte) ([]AudioSegment, error)
func (*DictationSegmenter) FeedPCM ¶
func (s *DictationSegmenter) FeedPCM(pcm []byte) error
func (*DictationSegmenter) IdleSince ¶ added in v0.35.21
func (s *DictationSegmenter) IdleSince() time.Time
IdleSince returns the wall-clock time at which the segmenter most recently transitioned out of speech (or, for a fresh session that has not yet seen speech, the construction time). Returns the zero value when speech is currently being captured — the poller treats zero as "user is actively speaking, silence timer should reset."
Satisfies the IdleObserver contract consumed by RecordingController to drive silence-based auto-stop.
type DictationService ¶ added in v0.24.0
type DictationService interface {
Start(context.Context) error
Stop(context.Context) (DictationRun, error)
}
DictationService is the mode-scoped SDK contract for text-only dictation.
type DictationSetting ¶ added in v0.24.0
type DictationSetting struct {
ModeSetting
DictionaryEnabled bool `json:"dictionaryEnabled"`
}
type Engine ¶
type Engine interface {
Start(context.Context) error
Stop(context.Context) error
Events() <-chan Event
Commands() CommandBus
State() Snapshot
}
Engine is the interface implemented by a full SpeechKit voice pipeline.
type Event ¶
type Event struct {
Type EventType
Time time.Time
Message string
Text string
Provider string
Mode string
SessionID string
QuickNote bool
Err error
Shortcut string
Metadata *Metadata
}
Event is a notification published to the event channel returned by Runtime.Events. Consumers should switch on Type and inspect the relevant fields.
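A consumer loop that switches on Type might look like the sketch below. The `event` and `eventType` declarations are trimmed local mirrors so the example compiles on its own, and which fields are populated for each event type is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Local mirrors of a few documented event types.
type eventType string

const (
	eventRecordingStarted   eventType = "recording.started"
	eventTranscriptionReady eventType = "transcription.ready"
	eventErrorRaised        eventType = "error.raised"
)

// event is a trimmed stand-in for Event with only the fields used here.
type event struct {
	Type    eventType
	Message string
	Text    string
	Err     error
}

// describe switches on Type and reads only the fields relevant to it.
func describe(ev event) string {
	switch ev.Type {
	case eventRecordingStarted:
		return "recording started"
	case eventTranscriptionReady:
		return "transcript: " + ev.Text
	case eventErrorRaised:
		if ev.Err != nil {
			return "error: " + ev.Err.Error()
		}
		return "error: " + ev.Message
	default:
		return "ignored: " + string(ev.Type)
	}
}

func main() {
	// A buffered channel stands in for the channel returned by Runtime.Events.
	events := make(chan event, 3)
	events <- event{Type: eventRecordingStarted}
	events <- event{Type: eventTranscriptionReady, Text: "hello world"}
	events <- event{Type: eventErrorRaised, Err: errors.New("microphone unavailable")}
	close(events)

	for ev := range events {
		fmt.Println(describe(ev))
	}
}
```

Keeping a default branch means event types added in later releases are ignored instead of breaking the host.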
type EventType ¶
type EventType string
EventType identifies the kind of event published to the event channel.
const (
EventStateChanged EventType = "state.changed"
EventRecordingStarted EventType = "recording.started"
EventProcessingStarted EventType = "processing.started"
EventTranscriptionReady EventType = "transcription.ready"
EventTranscriptCommitted EventType = "transcription.committed"
EventQuickNoteModeArmed EventType = "quicknote.mode_armed"
EventQuickNoteUpdated EventType = "quicknote.updated"
EventWarningRaised EventType = "warning.raised"
EventErrorRaised EventType = "error.raised"
EventShortcutMatched EventType = "shortcut.matched"
EventWakeFired EventType = "wake.fired"
EventSkillExecuted EventType = "skill.executed"
EventCompanionSessionStarted EventType = "companion.session.started"
EventCompanionSessionEnded EventType = "companion.session.ended"
EventVoiceAgentTurnFinalized EventType = "voiceagent.turn.finalized"
EventTTSStarted EventType = "tts.started"
EventTTSFinished EventType = "tts.finished"
)
type ExecutionMode ¶ added in v0.24.0
type ExecutionMode string
ExecutionMode describes the technical runtime behind a provider profile.
const (
ExecutionModeLocal ExecutionMode = "local"
ExecutionModeSelfHostedHTTP ExecutionMode = "self_hosted_http"
ExecutionModeHFRouted ExecutionMode = "hf_routed"
ExecutionModeOpenAI ExecutionMode = "openai_api"
ExecutionModeGroq ExecutionMode = "groq_api"
ExecutionModeGoogle ExecutionMode = "google_api"
ExecutionModeOllama ExecutionMode = "ollama_local"
ExecutionModeOpenRouter ExecutionMode = "openrouter_api"
)
type Hooks ¶
type Hooks struct {
Start func(context.Context) error
Stop func(context.Context) error
HandleCommand func(context.Context, Command) error
}
Hooks are the lifecycle callbacks wired into a Runtime. Nil hooks are silently skipped.
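The "nil hooks are skipped" rule can be pictured with a small guard like the one below. This is a sketch of the pattern only: `hooks` is a trimmed local mirror and `runHook` is a hypothetical helper, not how the Runtime is known to be implemented.

```go
package main

import (
	"context"
	"fmt"
)

// hooks is a trimmed local mirror of Hooks.
type hooks struct {
	Start func(context.Context) error
	Stop  func(context.Context) error
}

// runHook calls hook when it is set and treats a nil hook as a no-op.
func runHook(ctx context.Context, hook func(context.Context) error) error {
	if hook == nil {
		return nil
	}
	return hook(ctx)
}

func main() {
	started := false
	h := hooks{
		Start: func(context.Context) error {
			started = true
			return nil
		},
		// Stop is left nil on purpose.
	}

	ctx := context.Background()
	err := runHook(ctx, h.Start)
	fmt.Println(err, started)         // <nil> true
	fmt.Println(runHook(ctx, h.Stop)) // <nil>
}
```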
type IdleObserver ¶ added in v0.35.21
IdleObserver is implemented by SegmentCollectors that want to drive silence-based auto-stop. Returning the zero value tells the watcher "user is actively speaking; reset the timer." Returning a non-zero time tells the watcher "user has been silent since T."
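The zero versus non-zero convention can be illustrated with a small decision function. The interface below assumes IdleObserver has a single `IdleSince() time.Time` method, matching DictationSegmenter.IdleSince; `idleTimedOut`, `fixedIdle`, and `sessionStart` are hypothetical names, and the real watcher's polling logic may differ.

```go
package main

import (
	"fmt"
	"time"
)

// idleObserver assumes the contract is a single IdleSince method.
type idleObserver interface {
	IdleSince() time.Time
}

// fixedIdle is a test double that always reports the same idle time.
type fixedIdle struct{ since time.Time }

func (f fixedIdle) IdleSince() time.Time { return f.since }

// sessionStart is an arbitrary reference time for the example.
var sessionStart = time.Date(2024, time.January, 1, 12, 0, 0, 0, time.UTC)

// idleTimedOut reports whether the observer has been silent for at least timeout.
// A zero IdleSince means speech is in progress, so the timer is treated as reset.
func idleTimedOut(obs idleObserver, now time.Time, timeout time.Duration) bool {
	if timeout <= 0 {
		return false // watcher disabled
	}
	since := obs.IdleSince()
	if since.IsZero() {
		return false // user is actively speaking
	}
	return now.Sub(since) >= timeout
}

func main() {
	timeout := 3 * time.Second
	now := sessionStart.Add(5 * time.Second)

	fmt.Println(idleTimedOut(fixedIdle{}, now, timeout))                                         // false
	fmt.Println(idleTimedOut(fixedIdle{since: sessionStart.Add(4 * time.Second)}, now, timeout)) // false
	fmt.Println(idleTimedOut(fixedIdle{since: sessionStart}, now, timeout))                      // true
}
```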
type IntelligenceKind ¶ added in v0.24.0
type IntelligenceKind string
IntelligenceKind names the mode-specific intelligence contract.
const (
IntelligenceUser IntelligenceKind = "user"
IntelligenceUtility IntelligenceKind = "utility"
IntelligenceBrainstorming IntelligenceKind = "brainstorming"
// IntelligenceVoiceOutput is the contract for the TTS mode: render
// generated text to audio. No user intelligence, no utility tools,
// no brainstorming; strictly text in, audio out.
IntelligenceVoiceOutput IntelligenceKind = "voice_output"
)
type JobSubmitter ¶
type JobSubmitter interface {
Submit(TranscriptionJob) error
}
JobSubmitter accepts a TranscriptionJob for async processing.
type Metadata ¶ added in v0.40.1
Metadata carries optional event key/value data without making Event non-comparable for existing SDK consumers.
func NewMetadata ¶ added in v0.40.1
type Mode ¶ added in v0.24.0
type Mode string
Mode identifies one of SpeechKit's strict product modes.
const (
ModeNone Mode = "none"
ModeDictation Mode = "dictation"
ModeAssist Mode = "assist"
ModeVoiceAgent Mode = "voice_agent"
// ModeTTS exposes Text-to-Speech as a first-class model-selection axis
// alongside the three product modes. The host activates an Assist or
// Voice-Agent session and the TTS profile selected here drives which
// provider speaks the response. v0.37 introduced this alongside the
// Voice-Companion hands-free flow: Thalia + Companion Live need a
// stable place to pin a TTS voice across deployments.
ModeTTS Mode = "tts"
)
func NormalizeMode ¶ added in v0.24.0
type ModeBehavior ¶ added in v0.24.0
type ModeBehavior string
ModeBehavior describes how much mode-specific intelligence a host enables.
const (
// ModeBehaviorClean keeps a mode on its core contract, such as strict STT
// for Dictation or deterministic utility handling for Assist.
ModeBehaviorClean ModeBehavior = "clean"
// ModeBehaviorIntelligence allows optional intelligence layers such as
// LLM utility handling, TTS, summaries, or realtime tool use.
ModeBehaviorIntelligence ModeBehavior = "intelligence"
)
type ModeContract ¶ added in v0.24.0
type ModeContract struct {
Mode Mode `json:"mode"`
Intelligence IntelligenceKind `json:"intelligence"`
Input string `json:"input"`
Output string `json:"output"`
Allowed []Capability `json:"allowed"`
Forbidden []Capability `json:"forbidden"`
}
ModeContract documents what a mode may and may not do. Hosts can use this to validate custom adapters before exposing them to users.
func DefaultModeContracts ¶ added in v0.24.0
func DefaultModeContracts() []ModeContract
type ModeSetting ¶ added in v0.24.0
type ModeSetting struct {
Enabled bool `json:"enabled"`
Hotkey string `json:"hotkey,omitempty"`
HotkeyBehavior string `json:"hotkeyBehavior,omitempty"`
PrimaryProfileID string `json:"primaryProfileId,omitempty"`
FallbackProfileID string `json:"fallbackProfileId,omitempty"`
// ModeSource is "local" (default) or "server". When "server", this mode
// runs against the speechkit-server pointed to by ServerConnection
// instead of the in-process Framework kernel. Empty/missing is treated
// as "local" for backwards compatibility with pre-0.26 hosts.
ModeSource string `json:"modeSource,omitempty"`
}
ModeSetting is the public per-mode configuration shape used by the SDK and the versioned HTTP control plane.
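The backwards-compatibility rule for ModeSource (empty or missing means "local") could be applied on the host side roughly as follows. `modeSetting` is a trimmed local mirror and `effectiveSource` is a hypothetical helper; the SDK may normalize this value itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modeSetting is a trimmed local mirror of ModeSetting.
type modeSetting struct {
	Enabled    bool   `json:"enabled"`
	ModeSource string `json:"modeSource,omitempty"`
}

// effectiveSource treats an empty ModeSource as "local".
func effectiveSource(s modeSetting) string {
	if s.ModeSource == "" {
		return "local"
	}
	return s.ModeSource
}

func main() {
	// A payload from an older host that predates the modeSource field.
	var legacy modeSetting
	if err := json.Unmarshal([]byte(`{"enabled":true}`), &legacy); err != nil {
		panic(err)
	}
	fmt.Println(effectiveSource(legacy))                            // local
	fmt.Println(effectiveSource(modeSetting{ModeSource: "server"})) // server
}
```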
type ModeSettings ¶ added in v0.24.0
type ModeSettings struct {
Dictation DictationSetting `json:"dictation"`
Assist AssistSetting `json:"assist"`
VoiceAgent VoiceAgentSetting `json:"voiceAgent"`
ServerConnection ServerConnectionSetting `json:"serverConnection"`
}
type ModelVariant ¶ added in v0.24.0
type ModelVariant struct {
ID string `json:"id"`
Name string `json:"name"`
ModelID string `json:"modelId"`
Description string `json:"description,omitempty"`
Recommended bool `json:"recommended,omitempty"`
}
ModelVariant is a concrete model choice inside a provider profile group.
type Persistence ¶
type Persistence interface {
QuickNoteStore
TranscriptionStore
}
Persistence combines QuickNoteStore and TranscriptionStore.
type ProviderKind ¶ added in v0.24.0
type ProviderKind string
ProviderKind is the product-facing provider group shown for every mode.
const (
ProviderKindLocalBuiltIn ProviderKind = "local_built_in"
ProviderKindLocalProvider ProviderKind = "local_provider"
ProviderKindCloudProvider ProviderKind = "cloud_provider"
ProviderKindDirectProvider ProviderKind = "direct_provider"
)
func ProviderKindsForMode ¶ added in v0.24.0
func ProviderKindsForMode(mode Mode) []ProviderKind
type ProviderProfile ¶ added in v0.24.0
type ProviderProfile struct {
ID string `json:"id"`
Mode Mode `json:"mode"`
Name string `json:"name"`
ProviderKind ProviderKind `json:"providerKind"`
ExecutionMode ExecutionMode `json:"executionMode,omitempty"`
ModelID string `json:"modelId,omitempty"`
Source string `json:"source,omitempty"`
Description string `json:"description,omitempty"`
License string `json:"license,omitempty"`
Capabilities []Capability `json:"capabilities,omitempty"`
AdapterKind string `json:"adapterKind,omitempty"`
Variants []ModelVariant `json:"variants,omitempty"`
AllowInference bool `json:"inferenceAllowed,omitempty"`
Default bool `json:"default,omitempty"`
Recommended bool `json:"recommended,omitempty"`
Experimental bool `json:"experimental,omitempty"`
}
ProviderProfile is the public catalog entry host applications can present or activate. ProviderKind is the stable user-facing grouping; ExecutionMode is the technical adapter underneath it.
func DefaultProviderProfiles ¶ added in v0.24.0
func DefaultProviderProfiles() []ProviderProfile
DefaultProviderProfiles returns the built-in framework provider catalog for the three strict SpeechKit modes. The Windows desktop host adapts this public catalog into its internal runtime model; the catalog itself belongs to the reusable framework layer.
func FilterProviderProfiles ¶ added in v0.24.0
func FilterProviderProfiles(profiles []ProviderProfile, policy RuntimePolicy) []ProviderProfile
FilterProviderProfiles returns the profiles visible under policy.
func ProfilesForMode ¶ added in v0.24.0
func ProfilesForMode(mode Mode) []ProviderProfile
func (ProviderProfile) HasCapability ¶ added in v0.24.0
func (p ProviderProfile) HasCapability(capability Capability) bool
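A capability check of this shape is typically a linear scan over the declared list. The sketch below uses local mirror types, and `missingCapabilities` is a hypothetical helper showing how a host might compare a profile against a required set such as the one RequiredCapabilities returns. The profile ID is invented for the example.

```go
package main

import "fmt"

// Local mirrors of a few documented capabilities.
type capability string

const (
	capabilitySTT capability = "stt"
	capabilityLLM capability = "llm"
	capabilityTTS capability = "tts"
)

// providerProfile is a trimmed stand-in for ProviderProfile.
type providerProfile struct {
	ID           string
	Capabilities []capability
}

// hasCapability reports whether the profile declares c.
func (p providerProfile) hasCapability(c capability) bool {
	for _, have := range p.Capabilities {
		if have == c {
			return true
		}
	}
	return false
}

// missingCapabilities lists the required capabilities the profile lacks.
func missingCapabilities(p providerProfile, required []capability) []capability {
	var missing []capability
	for _, c := range required {
		if !p.hasCapability(c) {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	// "example.assist" is a made-up profile ID.
	p := providerProfile{ID: "example.assist", Capabilities: []capability{capabilitySTT, capabilityLLM}}
	fmt.Println(p.hasCapability(capabilityLLM))                                     // true
	fmt.Println(missingCapabilities(p, []capability{capabilityLLM, capabilityTTS})) // [tts]
}
```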
type QuickNoteStore ¶
type QuickNoteStore interface {
SaveQuickNote(ctx context.Context, text, language, provider string, durationMs, latencyMs int64, audioData []byte) (int64, error)
GetQuickNoteText(ctx context.Context, id int64) (string, error)
UpdateQuickNote(ctx context.Context, id int64, text string) error
UpdateQuickNoteCapture(ctx context.Context, id int64, text, provider string, durationMs, latencyMs int64, audioData []byte) error
}
QuickNoteStore persists and retrieves Quick Note records.
type Readiness ¶ added in v0.24.0
type Readiness struct {
SchemaVersion string `json:"schemaVersion,omitempty"`
ProfileID string `json:"profileId"`
Mode Mode `json:"mode"`
ProviderKind ProviderKind `json:"providerKind"`
ExecutionMode ExecutionMode `json:"executionMode,omitempty"`
ModelID string `json:"modelId,omitempty"`
Source string `json:"source,omitempty"`
Active bool `json:"active"`
Default bool `json:"default"`
Configured bool `json:"configured"`
CredentialsReady bool `json:"credentialsReady"`
RuntimeReady bool `json:"runtimeReady"`
CapabilityReady bool `json:"capabilityReady"`
Ready bool `json:"ready"`
Missing []string `json:"missing,omitempty"`
Requirements []ReadinessRequirement `json:"requirements,omitempty"`
Actions []ReadinessAction `json:"actions,omitempty"`
Artifacts []ReadinessArtifact `json:"artifacts,omitempty"`
}
Readiness describes whether a provider profile can be used right now.
type ReadinessAction ¶ added in v0.24.0
type ReadinessAction struct {
ID string `json:"id"`
Label string `json:"label"`
Kind string `json:"kind"`
Target string `json:"target,omitempty"`
}
ReadinessAction describes the next setup command a host can expose when a requirement is not ready.
type ReadinessArtifact ¶ added in v0.24.0
type ReadinessArtifact struct {
ID string `json:"id"`
Name string `json:"name"`
Kind string `json:"kind"`
SizeLabel string `json:"sizeLabel,omitempty"`
SizeBytes int64 `json:"sizeBytes,omitempty"`
Available bool `json:"available"`
Selected bool `json:"selected"`
RuntimeReady bool `json:"runtimeReady,omitempty"`
RuntimeProblem string `json:"runtimeProblem,omitempty"`
Recommended bool `json:"recommended,omitempty"`
}
ReadinessArtifact describes downloadable or pullable model artifacts tied to a provider profile. Local Built-in profiles use this to expose concrete model choices through the same readiness API as credentials and runtime checks.
type ReadinessRequirement ¶ added in v0.24.0
type ReadinessRequirement struct {
ID string `json:"id"`
Label string `json:"label"`
Category string `json:"category"`
Required bool `json:"required"`
Ready bool `json:"ready"`
Missing string `json:"missing,omitempty"`
}
ReadinessRequirement is a machine-readable setup check for a provider profile. Hosts can render these checks directly instead of hard-coding provider-specific setup rules.
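A host could render these checks generically, for example by listing every required check that is not yet ready. The sketch below assumes that a requirement blocks readiness when Required is true and Ready is false; the struct is a local mirror and the sample IDs and labels are invented.

```go
package main

import "fmt"

// readinessRequirement is a local mirror of ReadinessRequirement.
type readinessRequirement struct {
	ID       string
	Label    string
	Category string
	Required bool
	Ready    bool
	Missing  string
}

// blockingRequirements returns a display line for each required check
// that is not ready. Optional checks are skipped.
func blockingRequirements(reqs []readinessRequirement) []string {
	var lines []string
	for _, r := range reqs {
		if r.Required && !r.Ready {
			lines = append(lines, r.Label+": "+r.Missing)
		}
	}
	return lines
}

func main() {
	// Sample data; IDs, labels, and categories are hypothetical.
	reqs := []readinessRequirement{
		{ID: "api_key", Label: "API key", Category: "credentials", Required: true, Ready: false, Missing: "not configured"},
		{ID: "model", Label: "Model download", Category: "runtime", Required: true, Ready: true},
		{ID: "gpu", Label: "GPU acceleration", Category: "runtime", Required: false, Ready: false, Missing: "no GPU found"},
	}
	for _, line := range blockingRequirements(reqs) {
		fmt.Println(line) // API key: not configured
	}
}
```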
type RecordingController ¶
type RecordingController struct {
// contains filtered or unexported fields
}
RecordingController manages the start/stop lifecycle of a single recording session and hands audio segments to the submission queue.
func NewRecordingController ¶
func NewRecordingController(recorder AudioRecorder, submitter JobSubmitter, observer RecordingObserver, segmenterFactory SegmentCollectorFactory) *RecordingController
func (*RecordingController) IsRecording ¶
func (c *RecordingController) IsRecording() bool
func (*RecordingController) SetIdleWatchInterval ¶ added in v0.35.21
func (c *RecordingController) SetIdleWatchInterval(d time.Duration)
SetIdleWatchInterval overrides the polling interval used by the silence-based auto-stop watcher. It exists so unit tests can poll quickly (e.g. every 5ms); production code should not call it.
func (*RecordingController) Start ¶
func (c *RecordingController) Start(opts RecordingStartOptions) error
func (*RecordingController) Stop ¶
func (c *RecordingController) Stop(opts RecordingStopOptions) error
type RecordingObserver ¶
type RecordingStartOptions ¶
type RecordingStartOptions struct {
Label string
Target any
Language string
QuickNote bool
QuickNoteID int64
// IdleTimeout, when greater than zero AND the underlying collector
// implements [IdleObserver], arms a watcher that calls
// OnIdleTimeoutCallback once the user has been silent for this long.
// Zero (default) disables the watcher — typical for hold-to-talk
// hotkey sessions that already terminate on KeyUp.
IdleTimeout time.Duration
// OnIdleTimeoutCallback fires once if IdleTimeout elapses without
// observed speech. Wired by the host to dispatch a Stop command so
// the dictation session ends after a silence window. The watcher
// guarantees at-most-one invocation per Start() call.
OnIdleTimeoutCallback func()
}
type RecordingStopOptions ¶
type RecordingStopOptions struct {
Label string
}
type Runtime ¶
type Runtime struct {
// contains filtered or unexported fields
}
Runtime manages shared observable state and event delivery for a SpeechKit session. Create one with NewRuntime and wire it into the host application via Runtime.Events and Runtime.Commands.
func NewRuntime ¶
func (*Runtime) Commands ¶
func (r *Runtime) Commands() CommandBus
func (*Runtime) UpdateState ¶
func (r *Runtime) UpdateState(update func(*Snapshot)) Snapshot
type RuntimePolicy ¶ added in v0.24.0
type RuntimePolicy struct {
EnabledModes []Mode `json:"enabledModes,omitempty"`
AllowedProfiles []string `json:"allowedProfiles,omitempty"`
FixedProfiles map[Mode]string `json:"fixedProfiles,omitempty"`
AllowFallbacks bool `json:"allowFallbacks,omitempty"`
ModeBehaviors map[Mode]ModeBehavior `json:"modeBehaviors,omitempty"`
}
RuntimePolicy constrains which parts of the SpeechKit framework a host application exposes. Empty EnabledModes or AllowedProfiles mean "all".
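The "empty means all" rule can be sketched as a filter over a profile list. The types below are trimmed local mirrors, the profile IDs are invented, and the sketch deliberately ignores FixedProfiles, AllowFallbacks, and ModeBehaviors, so it should not be read as the behavior of FilterProviderProfiles.

```go
package main

import "fmt"

type mode string

// profile and policy are trimmed stand-ins for ProviderProfile and RuntimePolicy.
type profile struct {
	ID   string
	Mode mode
}

type policy struct {
	EnabledModes    []mode
	AllowedProfiles []string
}

// modeEnabled treats an empty EnabledModes list as "all modes".
func modeEnabled(p policy, m mode) bool {
	if len(p.EnabledModes) == 0 {
		return true
	}
	for _, enabled := range p.EnabledModes {
		if enabled == m {
			return true
		}
	}
	return false
}

// profileAllowed treats an empty AllowedProfiles list as "all profiles".
func profileAllowed(p policy, id string) bool {
	if len(p.AllowedProfiles) == 0 {
		return true
	}
	for _, allowed := range p.AllowedProfiles {
		if allowed == id {
			return true
		}
	}
	return false
}

// visibleProfiles keeps the profiles that pass both checks.
func visibleProfiles(profiles []profile, p policy) []profile {
	var out []profile
	for _, pr := range profiles {
		if modeEnabled(p, pr.Mode) && profileAllowed(p, pr.ID) {
			out = append(out, pr)
		}
	}
	return out
}

func main() {
	catalog := []profile{
		{ID: "dictation.example", Mode: "dictation"},
		{ID: "assist.example", Mode: "assist"},
	}
	fmt.Println(len(visibleProfiles(catalog, policy{})))                                  // 2
	fmt.Println(len(visibleProfiles(catalog, policy{EnabledModes: []mode{"dictation"}}))) // 1
}
```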
type SegmentCollector ¶
type SegmentCollector interface {
FeedPCM([]byte) error
CollectStopSegments(fullPCM []byte) ([]AudioSegment, error)
}
SegmentCollector accumulates real-time PCM frames and splits them into dictation segments when recording stops.
type SegmentCollectorFactory ¶
type SegmentCollectorFactory func() SegmentCollector
type ServerConnectionSetting ¶ added in v0.26.0
type ServerConnectionSetting struct {
Enabled bool `json:"enabled"`
ActiveTargetID string `json:"activeTargetId,omitempty"`
URL string `json:"url"`
BearerTokenEnv string `json:"bearerTokenEnv,omitempty"`
AuthMode string `json:"authMode,omitempty"`
BearerTokenSet bool `json:"bearerTokenSet"`
FallbackToLocal bool `json:"fallbackToLocal"`
RequestTimeoutSec int `json:"requestTimeoutSec"`
Targets []ServerConnectionTarget `json:"targets,omitempty"`
}
ServerConnectionSetting exposes the [server_connection] config section to the control-plane API and frontend. The bearer token is never sent across this boundary; only the env var name and connection metadata are.
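One way to honor that boundary is to resolve the token from the environment only at the point of use and surface just a boolean to the UI. This is a sketch under assumptions: `resolveToken` is a hypothetical helper, the struct is a trimmed mirror, and the URL and variable name are placeholders.

```go
package main

import (
	"fmt"
	"os"
)

// serverConnection is a trimmed stand-in for ServerConnectionSetting.
type serverConnection struct {
	URL            string
	BearerTokenEnv string
}

// resolveToken looks up the token through the named environment variable.
// The lookup function is injected so callers can substitute a fake in tests.
func resolveToken(conn serverConnection, lookup func(string) (string, bool)) (string, bool) {
	if conn.BearerTokenEnv == "" {
		return "", false
	}
	token, ok := lookup(conn.BearerTokenEnv)
	if !ok || token == "" {
		return "", false
	}
	return token, true
}

func main() {
	conn := serverConnection{
		URL:            "https://speechkit.example.invalid", // placeholder URL
		BearerTokenEnv: "SPEECHKIT_SERVER_TOKEN",            // hypothetical variable name
	}
	// Only the presence flag is printed; the token itself is never logged.
	_, set := resolveToken(conn, os.LookupEnv)
	fmt.Println("bearerTokenSet:", set)
}
```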
type ServerConnectionTarget ¶ added in v0.31.0
type ServerConnectionTarget struct {
ID string `json:"id"`
Label string `json:"label"`
URL string `json:"url"`
AuthMode string `json:"authMode"`
BearerTokenEnv string `json:"bearerTokenEnv,omitempty"`
BearerTokenSet bool `json:"bearerTokenSet"`
FallbackToLocal bool `json:"fallbackToLocal"`
RequestTimeoutSec int `json:"requestTimeoutSec"`
}
type Snapshot ¶
type Snapshot struct {
Status string
Text string
Level float64
Hotkey string
ActiveMode string
Providers []string
ActiveProfiles map[string]string
Transcriptions int
QuickNoteMode bool
QuickCaptureMode bool
LastTranscriptionText string
}
Snapshot is a point-in-time copy of the Runtime's observable state. All slice and map fields are safe to read without holding any lock.
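One common way to provide a lock-free-to-read guarantee like this is to deep-copy slices and maps before handing the value out. The sketch below shows that pattern with a trimmed `snapshot` and a hypothetical `stateStore`; it is not taken from the Runtime's source.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot is a trimmed stand-in for Snapshot.
type snapshot struct {
	Status         string
	Providers      []string
	ActiveProfiles map[string]string
}

// clone copies the slice and map so callers never share backing storage.
func (s snapshot) clone() snapshot {
	out := s
	out.Providers = append([]string(nil), s.Providers...)
	if s.ActiveProfiles != nil {
		out.ActiveProfiles = make(map[string]string, len(s.ActiveProfiles))
		for k, v := range s.ActiveProfiles {
			out.ActiveProfiles[k] = v
		}
	}
	return out
}

// stateStore guards the current snapshot with a mutex.
type stateStore struct {
	mu      sync.Mutex
	current snapshot
}

// update applies fn under the lock and returns a detached copy.
func (s *stateStore) update(fn func(*snapshot)) snapshot {
	s.mu.Lock()
	defer s.mu.Unlock()
	fn(&s.current)
	return s.current.clone()
}

// state returns a detached copy of the current snapshot.
func (s *stateStore) state() snapshot {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.current.clone()
}

func main() {
	store := &stateStore{}
	snap := store.update(func(s *snapshot) {
		s.Status = "recording"
		s.Providers = []string{"local"}
	})
	snap.Providers[0] = "mutated by caller" // does not affect the store
	fmt.Println(store.state().Providers[0]) // local
}
```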
type Submission ¶
type Submission struct {
PCM []byte
WAV []byte
DurationSecs float64
Language string
Prefix string
QuickNote bool
QuickNoteID int64
}
Submission carries a single audio segment and its metadata into the transcription pipeline.
type Transcriber ¶
type Transcriber interface {
Transcribe(ctx context.Context, audio []byte, durationSecs float64, language string) (Transcript, error)
}
Transcriber converts raw WAV audio into a Transcript.
type Transcript ¶
type Transcript struct {
Text string
Language string
Duration time.Duration
Provider string
Model string
Confidence float64
}
Transcript holds the result of a single transcription call.
type TranscriptInterceptor ¶
type TranscriptInterceptor interface {
Intercept(ctx context.Context, transcript Transcript, target any) (bool, error)
}
TranscriptInterceptor can handle a transcript before it reaches the normal output path. Return (true, nil) to signal that the transcript was consumed.
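As an illustration of the (true, nil) convention, the sketch below consumes transcripts that start with a spoken prefix and lets everything else fall through. The interface and `transcript` type are local mirrors, and `commandInterceptor` and its prefix are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// transcript is a trimmed stand-in for Transcript.
type transcript struct {
	Text string
}

// transcriptInterceptor mirrors the documented method shape.
type transcriptInterceptor interface {
	Intercept(ctx context.Context, t transcript, target any) (bool, error)
}

// commandInterceptor consumes transcripts that begin with prefix.
type commandInterceptor struct {
	prefix  string
	handled []string
}

func (c *commandInterceptor) Intercept(_ context.Context, t transcript, _ any) (bool, error) {
	text := strings.TrimSpace(t.Text)
	if !strings.HasPrefix(strings.ToLower(text), c.prefix) {
		return false, nil // not ours: continue on the normal output path
	}
	c.handled = append(c.handled, text)
	return true, nil // consumed: skip normal delivery
}

func main() {
	var ic transcriptInterceptor = &commandInterceptor{prefix: "computer,"}
	for _, text := range []string{"Computer, open notes", "plain dictation"} {
		consumed, err := ic.Intercept(context.Background(), transcript{Text: text}, nil)
		fmt.Println(consumed, err)
	}
}
```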
type TranscriptOutput ¶
type TranscriptOutput interface {
Deliver(ctx context.Context, transcript Transcript, target any) error
}
TranscriptOutput delivers a completed Transcript to the host application (e.g. clipboard injection or text-field paste).
type TranscriptionJob ¶
type TranscriptionJob struct {
Submission
Target any
}
TranscriptionJob pairs a Submission with its delivery target.
func (TranscriptionJob) Clone ¶
func (j TranscriptionJob) Clone() TranscriptionJob
type TranscriptionObserver ¶
type TranscriptionObserver interface {
OnState(status, text string)
OnLog(message, kind string)
OnTranscriptCommitted(transcript Transcript, quickNote bool)
}
TranscriptionObserver receives real-time status and log updates from a TranscriptionWorker during processing.
type TranscriptionRunner ¶
type TranscriptionRunner struct {
// contains filtered or unexported fields
}
TranscriptionRunner transcribes audio submissions and persists results. Create one with NewTranscriptionRunner.
func NewTranscriptionRunner ¶
func NewTranscriptionRunner(transcriber Transcriber, store Persistence) *TranscriptionRunner
NewTranscriptionRunner creates a TranscriptionRunner backed by the given transcriber and persistence store. Either argument may be nil.
func (*TranscriptionRunner) Commit ¶
func (r *TranscriptionRunner) Commit(ctx context.Context, submission Submission, transcript Transcript) (Completion, error)
func (*TranscriptionRunner) WithObserver ¶
func (r *TranscriptionRunner) WithObserver(observer CommitObserver) *TranscriptionRunner
type TranscriptionStore ¶
type TranscriptionStore interface {
SaveTranscription(ctx context.Context, text, language, provider, model string, durationMs, latencyMs int64, audioData []byte) error
}
TranscriptionStore persists completed dictation transcriptions.
type TranscriptionWorker ¶
type TranscriptionWorker struct {
// contains filtered or unexported fields
}
TranscriptionWorker processes TranscriptionJob values from an internal queue on a single goroutine. Start it with TranscriptionWorker.Start and submit work with TranscriptionWorker.Submit.
func NewTranscriptionWorker ¶
func NewTranscriptionWorker(cfg TranscriptionWorkerConfig) (*TranscriptionWorker, error)
func (*TranscriptionWorker) Close ¶
func (w *TranscriptionWorker) Close()
func (*TranscriptionWorker) Start ¶
func (w *TranscriptionWorker) Start(ctx context.Context)
func (*TranscriptionWorker) Submit ¶
func (w *TranscriptionWorker) Submit(job TranscriptionJob) error
func (*TranscriptionWorker) Wait ¶
func (w *TranscriptionWorker) Wait()
type TranscriptionWorkerConfig ¶
type TranscriptionWorkerConfig struct {
Timeout time.Duration
QueueSize int
Runner *TranscriptionRunner
Output TranscriptOutput
Interceptor TranscriptInterceptor
Observer TranscriptionObserver
}
TranscriptionWorkerConfig configures a TranscriptionWorker. Runner is required; all other fields are optional.
type VoiceActivityDetector ¶ added in v0.24.0
VoiceActivityDetector is the public VAD contract consumed by DictationSegmenter. It intentionally matches SpeechKit's internal Silero detector shape without exposing internal packages.
type VoiceAgentService ¶ added in v0.24.0
type VoiceAgentService interface {
Start(context.Context) error
Stop(context.Context) (VoiceAgentSession, error)
SendText(context.Context, string) error
CurrentSession(context.Context) (VoiceAgentSession, error)
}
VoiceAgentService is the mode-scoped SDK contract for realtime dialogue.
type VoiceAgentSession ¶ added in v0.24.0
type VoiceAgentSession struct {
ID string `json:"id,omitempty"`
StartedAt time.Time `json:"startedAt,omitempty"`
EndedAt time.Time `json:"endedAt,omitempty"`
Locale string `json:"locale,omitempty"`
ProviderProfileID string `json:"providerProfileId,omitempty"`
RuntimeKind string `json:"runtimeKind,omitempty"`
Turns []VoiceAgentTurn `json:"turns,omitempty"`
Summary VoiceAgentSessionSummary `json:"summary"`
}
VoiceAgentSession is the public record for a live dialogue.
type VoiceAgentSessionSummary ¶ added in v0.24.0
type VoiceAgentSessionSummary struct {
Title string `json:"title,omitempty"`
Summary string `json:"summary"`
Ideas []string `json:"ideas,omitempty"`
Decisions []string `json:"decisions,omitempty"`
OpenQuestions []string `json:"openQuestions,omitempty"`
NextSteps []string `json:"nextSteps,omitempty"`
RawText string `json:"rawText,omitempty"`
}
VoiceAgentSessionSummary is the structured handoff produced when a Voice Agent session ends.
type VoiceAgentSetting ¶ added in v0.24.0
type VoiceAgentSetting struct {
ModeSetting
SessionSummary bool `json:"sessionSummary"`
PipelineFallback bool `json:"pipelineFallback"`
CloseBehavior string `json:"closeBehavior,omitempty"`
AgentProfileID string `json:"agentProfileId,omitempty"`
AgentSequenceID string `json:"agentSequenceId,omitempty"`
}
Source Files ¶
Directories ¶
| Path | Synopsis |
|---|---|
| agentkit | Package agentkit provides a small Go harness for building SpeechKit Voice Agent hosts. |
| assist | Package assist provides an embeddable Assist Mode service. |
| genkitadapter | Package genkitadapter keeps Genkit-specific Assist wiring out of the core public assist package. |
| client | Package client provides a typed HTTP client for talking to a remote SpeechKit Server (the `cmd/speechkit-server` Linux container or any compatible deployment). |
| companion | Package companion provides small composers for hands-free SpeechKit hosts. |
| dictation | Package dictation provides an embeddable strict Dictation runtime. |
| lifecycle | Package lifecycle owns mode start/stop orchestration and refcounted shared dependencies for SpeechKit hosts. |
| tts | Package tts exposes the embeddable SpeechKit text-to-speech surface. |
| voiceagent | Package voiceagent provides an embeddable Voice Agent service. |
| live | Package live exposes the low-level Voice Agent realtime-protocol types. |
| wakeword | Package wakeword exposes embeddable SpeechKit wake-word contracts. |
| sherpa | Package sherpa exposes the sherpa-onnx wake-word detector adapter. |