Documentation ¶
Overview ¶
Package speechkit provides the public SDK for embedding SpeechKit voice capture, transcription, and assist/voice-agent pipelines into host applications.
Surface ¶
The kernel exposes three strict modes:
- Dictation — speech to text only, no AI rewriting.
- Assist — speech (or text) to a one-shot result, with optional TTS.
- Voice Agent — realtime audio-to-audio dialogue.
Each mode is constructed via a small subpackage so host apps depend only on what they use:
- github.com/kombifyio/SpeechKit/pkg/speechkit/dictation
- github.com/kombifyio/SpeechKit/pkg/speechkit/assist
- github.com/kombifyio/SpeechKit/pkg/speechkit/voiceagent
- github.com/kombifyio/SpeechKit/pkg/speechkit/agentkit (tool registry, session memory, lifecycle hooks for Voice Agent hosts)
- github.com/kombifyio/SpeechKit/pkg/speechkit/client (HTTP client for talking to a remote SpeechKit Server)
Central types in this package ¶
Runtime owns shared state and the event channel that host apps read from. Engine is the full voice pipeline; RecordingController and TranscriptionWorker can be composed independently for custom pipelines. [Catalog] exposes the provider/model/mode metadata for setup UIs and readiness checks; RuntimePolicy lets the host pin profiles and gate fallbacks.
Stability ¶
pkg/speechkit is the OSS public surface. Symbols here follow semver from v1.0 onward. Before v1.0 the surface may still evolve — see CHANGELOG.md and the release notes for breaking-change calls.
Package-level documentation lives in doc.go.
Index ¶
- Constants
- Variables
- func PCMDurationSecs(pcm []byte) float64
- func PCMToWAV(pcm []byte) []byte
- func ValidateDefaultCatalog() error
- func ValidateModeSettingsForPolicy(profiles []ProviderProfile, settings ModeSettings, policy RuntimePolicy) error
- func ValidateProfileForMode(profile ProviderProfile, mode Mode) error
- func ValidateRuntimePolicy(profiles []ProviderProfile, policy RuntimePolicy) error
- type AssistRequest
- type AssistResult
- type AssistService
- type AssistSetting
- type AssistSurfaceDecision
- type AudioRecorder
- type AudioSegment
- type Capability
- type Command
- type CommandBus
- type CommandType
- type CommitObserver
- type Completion
- type DictationRun
- type DictationSegmenter
- type DictationService
- type DictationSetting
- type Engine
- type Event
- type EventType
- type ExecutionMode
- type Hooks
- type IdleObserver
- type IntelligenceKind
- type JobSubmitter
- type Mode
- type ModeBehavior
- type ModeContract
- type ModeSetting
- type ModeSettings
- type ModelVariant
- type Persistence
- type ProviderKind
- type ProviderProfile
- type QuickNoteStore
- type Readiness
- type ReadinessAction
- type ReadinessArtifact
- type ReadinessRequirement
- type RecordingController
- type RecordingObserver
- type RecordingStartOptions
- type RecordingStopOptions
- type Runtime
- func (r *Runtime) Close()
- func (r *Runtime) Commands() CommandBus
- func (r *Runtime) Events() <-chan Event
- func (r *Runtime) Publish(event Event) bool
- func (r *Runtime) SetState(snapshot Snapshot)
- func (r *Runtime) Start(ctx context.Context) error
- func (r *Runtime) State() Snapshot
- func (r *Runtime) Stop(ctx context.Context) error
- func (r *Runtime) UpdateState(update func(*Snapshot)) Snapshot
- type RuntimePolicy
- type SegmentCollector
- type SegmentCollectorFactory
- type ServerConnectionSetting
- type ServerConnectionTarget
- type Snapshot
- type Submission
- type Transcriber
- type Transcript
- type TranscriptInterceptor
- type TranscriptOutput
- type TranscriptionJob
- type TranscriptionObserver
- type TranscriptionRunner
- type TranscriptionStore
- type TranscriptionWorker
- type TranscriptionWorkerConfig
- type VoiceActivityDetector
- type VoiceAgentService
- type VoiceAgentSession
- type VoiceAgentSessionSummary
- type VoiceAgentSetting
- type VoiceAgentTurn
Constants ¶
const (
AudioSampleRate = 16000
AudioChannels = 1
AudioBitsPerSample = 16
AudioBytesPerSample = AudioBitsPerSample / 8
)
const (
DefaultDictationMinSegment = 1200 * time.Millisecond
DefaultDictationPadding = 160 * time.Millisecond
DefaultDictationOverlap = 200 * time.Millisecond
)
const DefaultLocalBuiltInLLMModel = "ggml-org/gemma-4-E4B-it-GGUF:Q4_K_M"
const DefaultMinPCMBytes = 3200
const DefaultProcessingMessage = "Recording stopped · Transcribing"
const ReadinessSchemaVersion = "provider-readiness.v1"
Variables ¶
var (
ErrMissingRunner = errors.New("speechkit: transcription worker requires a runner")
ErrMissingTranscriber = errors.New("speechkit: transcription runner requires a transcriber")
ErrWorkerClosed = errors.New("speechkit: transcription worker is closed")
ErrWorkerQueueFull = errors.New("speechkit: transcription worker queue is full")
)
ErrCommandHandlerUnavailable is returned by [CommandBus.Dispatch] when no command handler has been configured on the Runtime.
Functions ¶
func PCMDurationSecs ¶ added in v0.24.0
func PCMDurationSecs(pcm []byte) float64
PCMDurationSecs returns the duration of 16kHz S16 mono PCM audio in seconds.
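As a rough illustration of the arithmetic, the sketch below derives a duration from the audio constants listed above. The lower-case names are local stand-ins, and the formula (byte length divided by bytes per second) is an assumption about what PCMDurationSecs computes, not a copy of its implementation.

```go
package main

import "fmt"

// Local copies of the audio constants documented above.
const (
	audioSampleRate     = 16000
	audioChannels       = 1
	audioBitsPerSample  = 16
	audioBytesPerSample = audioBitsPerSample / 8
)

// pcmDurationSecs is a hypothetical stand-in for PCMDurationSecs. It assumes
// the duration is the byte length divided by the bytes consumed per second.
func pcmDurationSecs(pcm []byte) float64 {
	bytesPerSecond := audioSampleRate * audioChannels * audioBytesPerSample
	return float64(len(pcm)) / float64(bytesPerSecond)
}

func main() {
	// 32000 bytes is one second of 16kHz S16 mono audio.
	fmt.Println(pcmDurationSecs(make([]byte, 32000)))
	// 3200 bytes is the value of DefaultMinPCMBytes.
	fmt.Println(pcmDurationSecs(make([]byte, 3200)))
}
```

Under this assumption, DefaultMinPCMBytes (3200) corresponds to 100 ms of audio.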
func ValidateDefaultCatalog ¶ added in v0.24.0
func ValidateDefaultCatalog() error
ValidateDefaultCatalog verifies the framework invariant that every strict mode exposes all four provider groups and every visible profile satisfies its mode contract.
func ValidateModeSettingsForPolicy ¶ added in v0.24.0
func ValidateModeSettingsForPolicy(profiles []ProviderProfile, settings ModeSettings, policy RuntimePolicy) error
ValidateModeSettingsForPolicy checks mode selections against a RuntimePolicy.
func ValidateProfileForMode ¶ added in v0.24.0
func ValidateProfileForMode(profile ProviderProfile, mode Mode) error
ValidateProfileForMode checks the stable v23 mode capability contract.
func ValidateRuntimePolicy ¶ added in v0.24.0
func ValidateRuntimePolicy(profiles []ProviderProfile, policy RuntimePolicy) error
ValidateRuntimePolicy checks that a policy references existing profiles and does not require a profile that violates its mode contract.
Types ¶
type AssistRequest ¶ added in v0.24.0
type AssistRequest struct {
Text string `json:"text"`
Locale string `json:"locale,omitempty"`
Selection string `json:"selection,omitempty"`
Context string `json:"context,omitempty"`
EditableTarget bool `json:"editableTarget,omitempty"`
ProviderProfileID string `json:"providerProfileId,omitempty"`
}
AssistRequest is the mode-scoped input for Assist integrations.
type AssistResult ¶ added in v0.24.0
type AssistResult struct {
Text string `json:"text"`
SpeakText string `json:"speakText,omitempty"`
Action string `json:"action,omitempty"`
Kind string `json:"kind,omitempty"`
Surface AssistSurfaceDecision `json:"surface"`
ShortcutID string `json:"shortcutId,omitempty"`
Locale string `json:"locale,omitempty"`
}
AssistResult is the public one-shot output contract for Assist Mode.
type AssistService ¶ added in v0.24.0
type AssistService interface {
Process(context.Context, AssistRequest) (AssistResult, error)
}
AssistService is the mode-scoped SDK contract for one-shot utilities and work-product generation.
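A host-side implementation of this contract might look like the sketch below. The types are trimmed local mirrors of AssistRequest, AssistResult, and AssistSurfaceDecision so the example compiles on its own; upperCaser and process are hypothetical, and the rule that picks the surface is an assumption made for illustration only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Trimmed local mirrors of the public types documented in this package.
type AssistSurfaceDecision string

const (
	AssistSurfacePanel   AssistSurfaceDecision = "panel"
	AssistSurfaceReplace AssistSurfaceDecision = "replace"
)

type AssistRequest struct {
	Text           string
	Selection      string
	EditableTarget bool
}

type AssistResult struct {
	Text    string
	Surface AssistSurfaceDecision
}

type AssistService interface {
	Process(context.Context, AssistRequest) (AssistResult, error)
}

// upperCaser is a hypothetical toy utility that upper-cases the selection.
type upperCaser struct{}

func (upperCaser) Process(ctx context.Context, req AssistRequest) (AssistResult, error) {
	if err := ctx.Err(); err != nil {
		return AssistResult{}, err // respect cancellation before doing work
	}
	if strings.TrimSpace(req.Selection) == "" {
		return AssistResult{}, errors.New("upperCaser: empty selection")
	}
	// Assumed rule: replace in place when the target is editable,
	// otherwise show the result in a panel.
	surface := AssistSurfacePanel
	if req.EditableTarget {
		surface = AssistSurfaceReplace
	}
	return AssistResult{Text: strings.ToUpper(req.Selection), Surface: surface}, nil
}

// process runs the toy service with a background context.
func process(req AssistRequest) (AssistResult, error) {
	var svc AssistService = upperCaser{}
	return svc.Process(context.Background(), req)
}

func main() {
	res, err := process(AssistRequest{Text: "make this uppercase", Selection: "hello", EditableTarget: true})
	fmt.Println(res.Text, res.Surface, err)
}
```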
type AssistSetting ¶ added in v0.24.0
type AssistSetting struct {
ModeSetting
TTSEnabled bool `json:"ttsEnabled"`
UtilityRegistry string `json:"utilityRegistry,omitempty"`
}
type AssistSurfaceDecision ¶ added in v0.24.0
type AssistSurfaceDecision string
AssistSurfaceDecision describes where an Assist result should be presented.
const (
AssistSurfacePanel AssistSurfaceDecision = "panel"
AssistSurfaceInsert AssistSurfaceDecision = "insert"
AssistSurfaceReplace AssistSurfaceDecision = "replace"
AssistSurfaceActionAck AssistSurfaceDecision = "action_ack"
AssistSurfaceSilent AssistSurfaceDecision = "silent"
)
type AudioRecorder ¶
AudioRecorder is the hardware abstraction for microphone capture.
type AudioSegment ¶ added in v0.24.0
AudioSegment is a transcribable utterance extracted from a dictation recording. PCM is raw 16kHz S16 mono audio.
func FallbackDictationSegments ¶
func FallbackDictationSegments(fullPCM []byte) []AudioSegment
FallbackDictationSegments wraps all of fullPCM in a single segment. Used when VAD-based segmentation is unavailable or produces no output.
type Capability ¶ added in v0.24.0
type Capability string
Capability is a mode capability declared by a provider profile.
const (
CapabilityTranscription Capability = "transcription"
CapabilitySTT Capability = "stt"
CapabilityAudioInput Capability = "audio_input"
CapabilityLLM Capability = "llm"
CapabilityTTS Capability = "tts"
CapabilityRealtimeAudio Capability = "realtime_audio"
CapabilityPipelineFallback Capability = "pipeline_fallback"
CapabilityToolCalling Capability = "tool_calling"
CapabilityDictionaryPrompt Capability = "dictionary_prompt"
CapabilityDictionaryNativeHints Capability = "dictionary_native_hints"
CapabilitySessionSummary Capability = "session_summary"
)
func RequiredCapabilities ¶ added in v0.24.0
func RequiredCapabilities(mode Mode, nativeRealtime bool) []Capability
RequiredCapabilities returns the minimum capability set for a profile to satisfy a mode contract.
type Command ¶
type Command struct {
Type CommandType
Text string
NoteID int64
Target string
Metadata map[string]string
}
Command is a request dispatched through the CommandBus.
type CommandBus ¶
CommandBus delivers Command values to the registered handler.
type CommandType ¶
type CommandType string
CommandType identifies the action a Command requests.
const (
CommandShowDashboard CommandType = "dashboard.show"
CommandStartDictation CommandType = "dictation.start"
CommandStopDictation CommandType = "dictation.stop"
CommandStartMode CommandType = "mode.start"
CommandStopMode CommandType = "mode.stop"
CommandSetActiveMode CommandType = "mode.set_active"
CommandOpenQuickNote CommandType = "quicknote.open"
CommandOpenQuickCapture CommandType = "quicknote.capture.open"
CommandCloseQuickCapture CommandType = "quicknote.capture.close"
CommandArmQuickNoteRecording CommandType = "quicknote.record.arm"
CommandCopyLastTranscription CommandType = "transcription.copy_last"
CommandInsertLastTranscription CommandType = "transcription.insert_last"
CommandSummarizeSelection CommandType = "selection.summarize"
)
type CommitObserver ¶
type CommitObserver interface {
OnCommit(completion Completion)
}
CommitObserver is notified after each successful TranscriptionRunner.Commit.
type Completion ¶
type Completion struct {
Transcript Transcript
QuickNoteCommitted bool
QuickNoteCreated bool
QuickNoteID int64
TranscriptionPersisted bool
}
Completion describes the outcome of a TranscriptionRunner.Commit call.
type DictationRun ¶ added in v0.24.0
type DictationRun struct {
ID string `json:"id,omitempty"`
Transcript Transcript `json:"transcript"`
StartedAt time.Time `json:"startedAt,omitempty"`
CompletedAt time.Time `json:"completedAt,omitempty"`
ProviderProfile string `json:"providerProfile,omitempty"`
DictionaryTerms []string `json:"dictionaryTerms,omitempty"`
AudioDurationMs int64 `json:"audioDurationMs,omitempty"`
ProcessingTimeMs int64 `json:"processingTimeMs,omitempty"`
}
DictationRun is the public record produced by a completed Dictation request. Hosts may persist it directly or map it into their own history model.
type DictationSegmenter ¶
type DictationSegmenter struct {
// contains filtered or unexported fields
}
DictationSegmenter implements SegmentCollector using VAD-based pause detection to split continuous speech into discrete segments.
func NewDictationSegmenter ¶
func NewDictationSegmenter(detector VoiceActivityDetector, pauseThreshold time.Duration) *DictationSegmenter
func (*DictationSegmenter) CollectStopSegments ¶
func (s *DictationSegmenter) CollectStopSegments(fullPCM []byte) ([]AudioSegment, error)
func (*DictationSegmenter) FeedPCM ¶
func (s *DictationSegmenter) FeedPCM(pcm []byte) error
func (*DictationSegmenter) IdleSince ¶ added in v0.35.21
func (s *DictationSegmenter) IdleSince() time.Time
IdleSince returns the wall-clock time at which the segmenter most recently transitioned out of speech (or, for a fresh session that has not yet seen speech, the construction time). Returns the zero value when speech is currently being captured — the poller treats zero as "user is actively speaking, silence timer should reset."
Satisfies the IdleObserver contract consumed by RecordingController to drive silence-based auto-stop.
type DictationService ¶ added in v0.24.0
type DictationService interface {
Start(context.Context) error
Stop(context.Context) (DictationRun, error)
}
DictationService is the mode-scoped SDK contract for text-only dictation.
type DictationSetting ¶ added in v0.24.0
type DictationSetting struct {
ModeSetting
DictionaryEnabled bool `json:"dictionaryEnabled"`
}
type Engine ¶
type Engine interface {
Start(context.Context) error
Stop(context.Context) error
Events() <-chan Event
Commands() CommandBus
State() Snapshot
}
Engine is the interface implemented by a full SpeechKit voice pipeline.
type Event ¶
type Event struct {
Type EventType
Time time.Time
Message string
Text string
Provider string
QuickNote bool
Err error
Shortcut string
}
Event is a notification published to the event channel returned by Runtime.Events. Consumers should switch on Type and inspect the relevant fields.
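A consumer loop for the event channel could be sketched as follows. Event and EventType are trimmed local mirrors; describe and drain are hypothetical helpers, and which fields matter for each event type is an assumption here. The example closes its own channel; whether the real channel is closed by Runtime.Close is not stated above.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed local mirrors of EventType and Event.
type EventType string

const (
	EventRecordingStarted   EventType = "recording.started"
	EventTranscriptionReady EventType = "transcription.ready"
	EventErrorRaised        EventType = "error.raised"
)

type Event struct {
	Type     EventType
	Message  string
	Text     string
	Provider string
	Err      error
}

// describe switches on Type and reads only the fields assumed to be
// relevant for that kind of event.
func describe(ev Event) string {
	switch ev.Type {
	case EventTranscriptionReady:
		return fmt.Sprintf("transcript from %s: %q", ev.Provider, ev.Text)
	case EventErrorRaised:
		return fmt.Sprintf("error: %v", ev.Err)
	default:
		return fmt.Sprintf("%s: %s", ev.Type, ev.Message)
	}
}

// drain reads events until the channel is closed.
func drain(events <-chan Event) []string {
	var lines []string
	for ev := range events {
		lines = append(lines, describe(ev))
	}
	return lines
}

func main() {
	ch := make(chan Event, 3)
	ch <- Event{Type: EventRecordingStarted, Message: "listening"}
	ch <- Event{Type: EventTranscriptionReady, Provider: "local", Text: "hello world"}
	ch <- Event{Type: EventErrorRaised, Err: errors.New("microphone unavailable")}
	close(ch)
	for _, line := range drain(ch) {
		fmt.Println(line)
	}
}
```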
type EventType ¶
type EventType string
EventType identifies the kind of event published to the event channel.
const (
EventStateChanged EventType = "state.changed"
EventRecordingStarted EventType = "recording.started"
EventProcessingStarted EventType = "processing.started"
EventTranscriptionReady EventType = "transcription.ready"
EventTranscriptCommitted EventType = "transcription.committed"
EventQuickNoteModeArmed EventType = "quicknote.mode_armed"
EventQuickNoteUpdated EventType = "quicknote.updated"
EventWarningRaised EventType = "warning.raised"
EventErrorRaised EventType = "error.raised"
EventShortcutMatched EventType = "shortcut.matched"
)
type ExecutionMode ¶ added in v0.24.0
type ExecutionMode string
ExecutionMode describes the technical runtime behind a provider profile.
const (
ExecutionModeLocal ExecutionMode = "local"
ExecutionModeSelfHostedHTTP ExecutionMode = "self_hosted_http"
ExecutionModeHFRouted ExecutionMode = "hf_routed"
ExecutionModeOpenAI ExecutionMode = "openai_api"
ExecutionModeGroq ExecutionMode = "groq_api"
ExecutionModeGoogle ExecutionMode = "google_api"
ExecutionModeOllama ExecutionMode = "ollama_local"
ExecutionModeOpenRouter ExecutionMode = "openrouter_api"
)
type Hooks ¶
type Hooks struct {
Start func(context.Context) error
Stop func(context.Context) error
HandleCommand func(context.Context, Command) error
}
Hooks are the lifecycle callbacks wired into a Runtime. Nil hooks are silently skipped.
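The "nil hooks are skipped" behaviour can be pictured with a small guard like the one below. Hooks here is a trimmed local mirror (HandleCommand is left out), and callHook and demo are hypothetical helpers; this is one plausible shape, not the Runtime's actual code.

```go
package main

import (
	"context"
	"fmt"
)

// Trimmed local mirror of Hooks; HandleCommand is omitted for brevity.
type Hooks struct {
	Start func(context.Context) error
	Stop  func(context.Context) error
}

// callHook invokes hook when it is set and treats a nil hook as a no-op.
func callHook(ctx context.Context, hook func(context.Context) error) error {
	if hook == nil {
		return nil
	}
	return hook(ctx)
}

// demo wires only Start and leaves Stop nil.
func demo() (calls int, err error) {
	h := Hooks{
		Start: func(context.Context) error { calls++; return nil },
	}
	ctx := context.Background()
	if err = callHook(ctx, h.Start); err != nil {
		return calls, err
	}
	err = callHook(ctx, h.Stop) // nil hook: skipped without error
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err)
}
```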
type IdleObserver ¶ added in v0.35.21
IdleObserver is implemented by SegmentCollectors that want to drive silence-based auto-stop. Returning the zero value tells the watcher "user is actively speaking; reset the timer." Returning a non-zero time tells the watcher "user has been silent since T."
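The zero/non-zero convention can be sketched as a single decision function. The idleObserver interface below is an assumed shape based on the IdleSince method documented on DictationSegmenter; shouldAutoStop, fixedIdle, and demoIdle are hypothetical names, and the real watcher may differ.

```go
package main

import (
	"fmt"
	"time"
)

// idleObserver is an assumed local shape for the IdleObserver contract.
type idleObserver interface {
	IdleSince() time.Time
}

// fixedIdle is a hypothetical observer that reports a fixed idle time.
type fixedIdle struct{ since time.Time }

func (f fixedIdle) IdleSince() time.Time { return f.since }

// shouldAutoStop reports whether the silence window has elapsed.
// A zero IdleSince means the user is speaking, so the timer is reset.
func shouldAutoStop(obs idleObserver, now time.Time, timeout time.Duration) bool {
	if timeout <= 0 {
		return false // watcher disabled
	}
	since := obs.IdleSince()
	if since.IsZero() {
		return false // actively speaking
	}
	return now.Sub(since) >= timeout
}

// demoIdle evaluates three situations against a two-second timeout.
func demoIdle() (speaking, shortPause, longPause bool) {
	now := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	timeout := 2 * time.Second
	speaking = shouldAutoStop(fixedIdle{}, now, timeout)
	shortPause = shouldAutoStop(fixedIdle{since: now.Add(-500 * time.Millisecond)}, now, timeout)
	longPause = shouldAutoStop(fixedIdle{since: now.Add(-3 * time.Second)}, now, timeout)
	return speaking, shortPause, longPause
}

func main() {
	speaking, shortPause, longPause := demoIdle()
	fmt.Println(speaking, shortPause, longPause)
}
```

A polling watcher would call a function like this on each tick and fire its callback once when it first returns true.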
type IntelligenceKind ¶ added in v0.24.0
type IntelligenceKind string
IntelligenceKind names the mode-specific intelligence contract.
const (
IntelligenceUser IntelligenceKind = "user"
IntelligenceUtility IntelligenceKind = "utility"
IntelligenceBrainstorming IntelligenceKind = "brainstorming"
)
type JobSubmitter ¶
type JobSubmitter interface {
Submit(TranscriptionJob) error
}
JobSubmitter accepts a TranscriptionJob for async processing.
type Mode ¶ added in v0.24.0
type Mode string
Mode identifies one of SpeechKit's strict product modes.
func NormalizeMode ¶ added in v0.24.0
type ModeBehavior ¶ added in v0.24.0
type ModeBehavior string
ModeBehavior describes how much mode-specific intelligence a host enables.
const (
// ModeBehaviorClean keeps a mode on its core contract, such as strict STT
// for Dictation or deterministic utility handling for Assist.
ModeBehaviorClean ModeBehavior = "clean"
// ModeBehaviorIntelligence allows optional intelligence layers such as
// LLM utility handling, TTS, summaries, or realtime tool use.
ModeBehaviorIntelligence ModeBehavior = "intelligence"
)
type ModeContract ¶ added in v0.24.0
type ModeContract struct {
Mode Mode `json:"mode"`
Intelligence IntelligenceKind `json:"intelligence"`
Input string `json:"input"`
Output string `json:"output"`
Allowed []Capability `json:"allowed"`
Forbidden []Capability `json:"forbidden"`
}
ModeContract documents what a mode may and may not do. Hosts can use this to validate custom adapters before exposing them to users.
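One way a host might use a contract to vet a custom adapter is to look for declared capabilities that the contract forbids. The sketch below uses trimmed local mirrors; forbiddenDeclared is a hypothetical helper, and the example contract values are illustrative, not the ones returned by DefaultModeContracts.

```go
package main

import "fmt"

// Trimmed local mirrors; Mode is a plain string here for brevity.
type Capability string

type ModeContract struct {
	Mode      string
	Allowed   []Capability
	Forbidden []Capability
}

// forbiddenDeclared returns the declared capabilities that the contract
// forbids, in the order they were declared.
func forbiddenDeclared(contract ModeContract, declared []Capability) []Capability {
	forbidden := make(map[Capability]bool, len(contract.Forbidden))
	for _, c := range contract.Forbidden {
		forbidden[c] = true
	}
	var hits []Capability
	for _, c := range declared {
		if forbidden[c] {
			hits = append(hits, c)
		}
	}
	return hits
}

// exampleContract is an illustrative contract, not the framework default.
func exampleContract() ModeContract {
	return ModeContract{
		Mode:      "dictation",
		Allowed:   []Capability{"stt", "transcription"},
		Forbidden: []Capability{"llm", "tts"},
	}
}

func main() {
	adapter := []Capability{"stt", "llm"}
	fmt.Println(forbiddenDeclared(exampleContract(), adapter))
}
```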
func DefaultModeContracts ¶ added in v0.24.0
func DefaultModeContracts() []ModeContract
type ModeSetting ¶ added in v0.24.0
type ModeSetting struct {
Enabled bool `json:"enabled"`
Hotkey string `json:"hotkey,omitempty"`
HotkeyBehavior string `json:"hotkeyBehavior,omitempty"`
PrimaryProfileID string `json:"primaryProfileId,omitempty"`
FallbackProfileID string `json:"fallbackProfileId,omitempty"`
// ModeSource is "local" (default) or "server". When "server", this mode
// runs against the speechkit-server pointed to by ServerConnection
// instead of the in-process Framework kernel. Empty/missing is treated
// as "local" for backwards compatibility with pre-0.26 hosts.
ModeSource string `json:"modeSource,omitempty"`
}
ModeSetting is the public per-mode configuration shape used by the SDK and the versioned HTTP control plane.
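The backwards-compatibility rule for ModeSource can be illustrated with a small decoding sketch. ModeSetting below is a trimmed local mirror with the same JSON tags; effectiveModeSource and decode are hypothetical helpers, and the profile ID in the payload is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed local mirror of ModeSetting with the documented JSON tags.
type ModeSetting struct {
	Enabled          bool   `json:"enabled"`
	PrimaryProfileID string `json:"primaryProfileId,omitempty"`
	ModeSource       string `json:"modeSource,omitempty"`
}

// effectiveModeSource treats an empty or missing ModeSource as "local",
// as described in the field comment above.
func effectiveModeSource(s ModeSetting) string {
	if s.ModeSource == "" {
		return "local"
	}
	return s.ModeSource
}

// decode parses a JSON payload into a ModeSetting.
func decode(raw string) (ModeSetting, error) {
	var s ModeSetting
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	// A payload written without a modeSource key.
	legacy, err := decode(`{"enabled":true,"primaryProfileId":"example-profile"}`)
	fmt.Println(effectiveModeSource(legacy), err)

	remote, err := decode(`{"enabled":true,"modeSource":"server"}`)
	fmt.Println(effectiveModeSource(remote), err)
}
```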
type ModeSettings ¶ added in v0.24.0
type ModeSettings struct {
Dictation DictationSetting `json:"dictation"`
Assist AssistSetting `json:"assist"`
VoiceAgent VoiceAgentSetting `json:"voiceAgent"`
ServerConnection ServerConnectionSetting `json:"serverConnection"`
}
type ModelVariant ¶ added in v0.24.0
type ModelVariant struct {
ID string `json:"id"`
Name string `json:"name"`
ModelID string `json:"modelId"`
Description string `json:"description,omitempty"`
Recommended bool `json:"recommended,omitempty"`
}
ModelVariant is a concrete model choice inside a provider profile group.
type Persistence ¶
type Persistence interface {
QuickNoteStore
TranscriptionStore
}
Persistence combines QuickNoteStore and TranscriptionStore.
type ProviderKind ¶ added in v0.24.0
type ProviderKind string
ProviderKind is the product-facing provider group shown for every mode.
const (
ProviderKindLocalBuiltIn ProviderKind = "local_built_in"
ProviderKindLocalProvider ProviderKind = "local_provider"
ProviderKindCloudProvider ProviderKind = "cloud_provider"
ProviderKindDirectProvider ProviderKind = "direct_provider"
)
func ProviderKindsForMode ¶ added in v0.24.0
func ProviderKindsForMode(mode Mode) []ProviderKind
type ProviderProfile ¶ added in v0.24.0
type ProviderProfile struct {
ID string `json:"id"`
Mode Mode `json:"mode"`
Name string `json:"name"`
ProviderKind ProviderKind `json:"providerKind"`
ExecutionMode ExecutionMode `json:"executionMode,omitempty"`
ModelID string `json:"modelId,omitempty"`
Source string `json:"source,omitempty"`
Description string `json:"description,omitempty"`
License string `json:"license,omitempty"`
Capabilities []Capability `json:"capabilities,omitempty"`
AdapterKind string `json:"adapterKind,omitempty"`
Variants []ModelVariant `json:"variants,omitempty"`
AllowInference bool `json:"inferenceAllowed,omitempty"`
Default bool `json:"default,omitempty"`
Recommended bool `json:"recommended,omitempty"`
Experimental bool `json:"experimental,omitempty"`
}
ProviderProfile is the public catalog entry host applications can present or activate. ProviderKind is the stable user-facing grouping; ExecutionMode is the technical adapter underneath it.
func DefaultProviderProfiles ¶ added in v0.24.0
func DefaultProviderProfiles() []ProviderProfile
DefaultProviderProfiles returns the built-in framework provider catalog for the three strict SpeechKit modes. The Windows desktop host adapts this public catalog into its internal runtime model; the catalog itself belongs to the reusable framework layer.
func FilterProviderProfiles ¶ added in v0.24.0
func FilterProviderProfiles(profiles []ProviderProfile, policy RuntimePolicy) []ProviderProfile
FilterProviderProfiles returns the profiles visible under policy.
func ProfilesForMode ¶ added in v0.24.0
func ProfilesForMode(mode Mode) []ProviderProfile
func (ProviderProfile) HasCapability ¶ added in v0.24.0
func (p ProviderProfile) HasCapability(capability Capability) bool
type QuickNoteStore ¶
type QuickNoteStore interface {
SaveQuickNote(ctx context.Context, text, language, provider string, durationMs, latencyMs int64, audioData []byte) (int64, error)
GetQuickNoteText(ctx context.Context, id int64) (string, error)
UpdateQuickNote(ctx context.Context, id int64, text string) error
UpdateQuickNoteCapture(ctx context.Context, id int64, text, provider string, durationMs, latencyMs int64, audioData []byte) error
}
QuickNoteStore persists and retrieves Quick Note records.
type Readiness ¶ added in v0.24.0
type Readiness struct {
SchemaVersion string `json:"schemaVersion,omitempty"`
ProfileID string `json:"profileId"`
Mode Mode `json:"mode"`
ProviderKind ProviderKind `json:"providerKind"`
ExecutionMode ExecutionMode `json:"executionMode,omitempty"`
ModelID string `json:"modelId,omitempty"`
Source string `json:"source,omitempty"`
Active bool `json:"active"`
Default bool `json:"default"`
Configured bool `json:"configured"`
CredentialsReady bool `json:"credentialsReady"`
RuntimeReady bool `json:"runtimeReady"`
CapabilityReady bool `json:"capabilityReady"`
Ready bool `json:"ready"`
Missing []string `json:"missing,omitempty"`
Requirements []ReadinessRequirement `json:"requirements,omitempty"`
Actions []ReadinessAction `json:"actions,omitempty"`
Artifacts []ReadinessArtifact `json:"artifacts,omitempty"`
}
Readiness describes whether a provider profile can be used right now.
type ReadinessAction ¶ added in v0.24.0
type ReadinessAction struct {
ID string `json:"id"`
Label string `json:"label"`
Kind string `json:"kind"`
Target string `json:"target,omitempty"`
}
ReadinessAction describes the next setup command a host can expose when a requirement is not ready.
type ReadinessArtifact ¶ added in v0.24.0
type ReadinessArtifact struct {
ID string `json:"id"`
Name string `json:"name"`
Kind string `json:"kind"`
SizeLabel string `json:"sizeLabel,omitempty"`
SizeBytes int64 `json:"sizeBytes,omitempty"`
Available bool `json:"available"`
Selected bool `json:"selected"`
RuntimeReady bool `json:"runtimeReady,omitempty"`
RuntimeProblem string `json:"runtimeProblem,omitempty"`
Recommended bool `json:"recommended,omitempty"`
}
ReadinessArtifact describes downloadable or pullable model artifacts tied to a provider profile. Local Built-in profiles use this to expose concrete model choices through the same readiness API as credentials and runtime checks.
type ReadinessRequirement ¶ added in v0.24.0
type ReadinessRequirement struct {
ID string `json:"id"`
Label string `json:"label"`
Category string `json:"category"`
Required bool `json:"required"`
Ready bool `json:"ready"`
Missing string `json:"missing,omitempty"`
}
ReadinessRequirement is a machine-readable setup check for a provider profile. Hosts can render these checks directly instead of hard-coding provider-specific setup rules.
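A host that renders these checks generically might reduce them to a list of blocking items. The struct below mirrors ReadinessRequirement; blockers is a hypothetical helper and sampleRequirements is an invented payload used only to exercise it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirror of ReadinessRequirement with the documented JSON tags.
type ReadinessRequirement struct {
	ID       string `json:"id"`
	Label    string `json:"label"`
	Category string `json:"category"`
	Required bool   `json:"required"`
	Ready    bool   `json:"ready"`
	Missing  string `json:"missing,omitempty"`
}

// sampleRequirements is an invented payload for illustration.
const sampleRequirements = `[
  {"id":"credentials","label":"API key","category":"credentials","required":true,"ready":false,"missing":"no key configured"},
  {"id":"runtime","label":"Runtime","category":"runtime","required":true,"ready":true},
  {"id":"dictionary","label":"Dictionary","category":"optional","required":false,"ready":false}
]`

// blockers returns one line per requirement that is required but not ready.
func blockers(raw string) ([]string, error) {
	var reqs []ReadinessRequirement
	if err := json.Unmarshal([]byte(raw), &reqs); err != nil {
		return nil, err
	}
	var out []string
	for _, r := range reqs {
		if r.Required && !r.Ready {
			out = append(out, r.Label+": "+r.Missing)
		}
	}
	return out, nil
}

func main() {
	lines, err := blockers(sampleRequirements)
	fmt.Println(lines, err)
}
```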
type RecordingController ¶
type RecordingController struct {
// contains filtered or unexported fields
}
RecordingController manages the start/stop lifecycle of a single recording session and hands audio segments to the submission queue.
func NewRecordingController ¶
func NewRecordingController(recorder AudioRecorder, submitter JobSubmitter, observer RecordingObserver, segmenterFactory SegmentCollectorFactory) *RecordingController
func (*RecordingController) IsRecording ¶
func (c *RecordingController) IsRecording() bool
func (*RecordingController) SetIdleWatchInterval ¶ added in v0.35.21
func (c *RecordingController) SetIdleWatchInterval(d time.Duration)
SetIdleWatchInterval overrides the polling interval used by the silence-based auto-stop watcher. It exists so tests can poll quickly (e.g. every 5ms); production code should not call it.
func (*RecordingController) Start ¶
func (c *RecordingController) Start(opts RecordingStartOptions) error
func (*RecordingController) Stop ¶
func (c *RecordingController) Stop(opts RecordingStopOptions) error
type RecordingObserver ¶
type RecordingStartOptions ¶
type RecordingStartOptions struct {
Label string
Target any
Language string
QuickNote bool
QuickNoteID int64
// IdleTimeout, when greater than zero AND the underlying collector
// implements [IdleObserver], arms a watcher that calls
// OnIdleTimeoutCallback once the user has been silent for this long.
// Zero (default) disables the watcher — typical for hold-to-talk
// hotkey sessions that already terminate on KeyUp.
IdleTimeout time.Duration
// OnIdleTimeoutCallback fires once if IdleTimeout elapses without
// observed speech. Wired by the host to dispatch a Stop command so
// the dictate session ends after a silence window. The watcher
// guarantees at-most-one invocation per Start() call.
OnIdleTimeoutCallback func()
}
type RecordingStopOptions ¶
type RecordingStopOptions struct {
Label string
}
type Runtime ¶
type Runtime struct {
// contains filtered or unexported fields
}
Runtime manages shared observable state and event delivery for a SpeechKit session. Create one with NewRuntime and wire it into the host application via Runtime.Events and Runtime.Commands.
func NewRuntime ¶
func (*Runtime) Commands ¶
func (r *Runtime) Commands() CommandBus
func (*Runtime) UpdateState ¶
func (r *Runtime) UpdateState(update func(*Snapshot)) Snapshot
type RuntimePolicy ¶ added in v0.24.0
type RuntimePolicy struct {
EnabledModes []Mode `json:"enabledModes,omitempty"`
AllowedProfiles []string `json:"allowedProfiles,omitempty"`
FixedProfiles map[Mode]string `json:"fixedProfiles,omitempty"`
AllowFallbacks bool `json:"allowFallbacks,omitempty"`
ModeBehaviors map[Mode]ModeBehavior `json:"modeBehaviors,omitempty"`
}
RuntimePolicy constrains which parts of the SpeechKit framework a host application exposes. Empty EnabledModes or AllowedProfiles mean "all".
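The "empty means all" rule can be sketched as a filter. The types are trimmed local mirrors, visibleProfiles is a hypothetical stand-in for the kind of filtering FilterProviderProfiles performs, and the mode strings and profile IDs are invented for the example.

```go
package main

import "fmt"

// Trimmed local mirrors of the policy and profile types.
type Mode string

type ProviderProfile struct {
	ID   string
	Mode Mode
}

type RuntimePolicy struct {
	EnabledModes    []Mode
	AllowedProfiles []string
}

// contains reports whether v is present in list.
func contains[T comparable](list []T, v T) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// visibleProfiles keeps profiles that pass both lists; an empty list
// places no restriction on that dimension.
func visibleProfiles(profiles []ProviderProfile, policy RuntimePolicy) []ProviderProfile {
	var out []ProviderProfile
	for _, p := range profiles {
		if len(policy.EnabledModes) > 0 && !contains(policy.EnabledModes, p.Mode) {
			continue
		}
		if len(policy.AllowedProfiles) > 0 && !contains(policy.AllowedProfiles, p.ID) {
			continue
		}
		out = append(out, p)
	}
	return out
}

// sampleProfiles returns invented catalog entries.
func sampleProfiles() []ProviderProfile {
	return []ProviderProfile{
		{ID: "dictation.a", Mode: "dictation"},
		{ID: "dictation.b", Mode: "dictation"},
		{ID: "assist.a", Mode: "assist"},
	}
}

func main() {
	fmt.Println(len(visibleProfiles(sampleProfiles(), RuntimePolicy{})))
	fmt.Println(visibleProfiles(sampleProfiles(), RuntimePolicy{
		EnabledModes:    []Mode{"dictation"},
		AllowedProfiles: []string{"dictation.b", "assist.a"},
	}))
}
```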
type SegmentCollector ¶
type SegmentCollector interface {
FeedPCM([]byte) error
CollectStopSegments(fullPCM []byte) ([]AudioSegment, error)
}
SegmentCollector accumulates real-time PCM frames and splits them into dictation segments when recording stops.
type SegmentCollectorFactory ¶
type SegmentCollectorFactory func() SegmentCollector
type ServerConnectionSetting ¶ added in v0.26.0
type ServerConnectionSetting struct {
Enabled bool `json:"enabled"`
ActiveTargetID string `json:"activeTargetId,omitempty"`
URL string `json:"url"`
BearerTokenEnv string `json:"bearerTokenEnv,omitempty"`
AuthMode string `json:"authMode,omitempty"`
BearerTokenSet bool `json:"bearerTokenSet"`
FallbackToLocal bool `json:"fallbackToLocal"`
RequestTimeoutSec int `json:"requestTimeoutSec"`
Targets []ServerConnectionTarget `json:"targets,omitempty"`
}
ServerConnectionSetting exposes the [server_connection] config section to the control-plane API + frontend. The bearer token is never sent across this boundary — only the env var name + connection metadata.
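Because only the environment variable name crosses this boundary, the process that talks to the server has to resolve the token itself. The sketch below shows one plausible way to do that; authHeader is a hypothetical helper, the struct is a trimmed local mirror, and SPEECHKIT_TOKEN and the URL are made-up values.

```go
package main

import (
	"fmt"
	"os"
)

// Trimmed local mirror of ServerConnectionSetting.
type ServerConnectionSetting struct {
	URL            string
	BearerTokenEnv string
}

// authHeader resolves the bearer token through getenv and returns an
// Authorization header value. It reports false when no variable name is
// configured or the variable is empty.
func authHeader(s ServerConnectionSetting, getenv func(string) string) (string, bool) {
	if s.BearerTokenEnv == "" {
		return "", false
	}
	token := getenv(s.BearerTokenEnv)
	if token == "" {
		return "", false
	}
	return "Bearer " + token, true
}

func main() {
	setting := ServerConnectionSetting{
		URL:            "https://speechkit.example.invalid",
		BearerTokenEnv: "SPEECHKIT_TOKEN", // hypothetical variable name
	}
	// Print only whether a token was found, never the token itself.
	_, ok := authHeader(setting, os.Getenv)
	fmt.Println("token available:", ok)
}
```

Passing getenv as a parameter keeps the lookup testable without touching the real environment.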
type ServerConnectionTarget ¶ added in v0.31.0
type ServerConnectionTarget struct {
ID string `json:"id"`
Label string `json:"label"`
URL string `json:"url"`
AuthMode string `json:"authMode"`
BearerTokenEnv string `json:"bearerTokenEnv,omitempty"`
BearerTokenSet bool `json:"bearerTokenSet"`
FallbackToLocal bool `json:"fallbackToLocal"`
RequestTimeoutSec int `json:"requestTimeoutSec"`
}
type Snapshot ¶
type Snapshot struct {
Status string
Text string
Level float64
Hotkey string
ActiveMode string
Providers []string
ActiveProfiles map[string]string
Transcriptions int
QuickNoteMode bool
QuickCaptureMode bool
LastTranscriptionText string
}
Snapshot is a point-in-time copy of the Runtime's observable state. All slice and map fields are safe to read without holding any lock.
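A lock-free read guarantee like this is usually achieved by copying slices and maps before handing the value out. The sketch below shows that pattern with a trimmed local Snapshot; store, cloneSnapshot, and its methods are hypothetical and only approximate how Runtime.State and Runtime.UpdateState might behave.

```go
package main

import (
	"fmt"
	"sync"
)

// Trimmed local mirror of Snapshot.
type Snapshot struct {
	Status         string
	Providers      []string
	ActiveProfiles map[string]string
}

// cloneSnapshot copies the slice and map so callers never share memory
// with the stored state.
func cloneSnapshot(s Snapshot) Snapshot {
	out := s
	out.Providers = append([]string(nil), s.Providers...)
	if s.ActiveProfiles != nil {
		out.ActiveProfiles = make(map[string]string, len(s.ActiveProfiles))
		for k, v := range s.ActiveProfiles {
			out.ActiveProfiles[k] = v
		}
	}
	return out
}

// store is a hypothetical holder of shared state.
type store struct {
	mu    sync.Mutex
	state Snapshot
}

// State returns a copy of the current state.
func (st *store) State() Snapshot {
	st.mu.Lock()
	defer st.mu.Unlock()
	return cloneSnapshot(st.state)
}

// UpdateState applies update to a private copy, stores it, and returns
// another copy to the caller.
func (st *store) UpdateState(update func(*Snapshot)) Snapshot {
	st.mu.Lock()
	defer st.mu.Unlock()
	next := cloneSnapshot(st.state)
	update(&next)
	st.state = next
	return cloneSnapshot(next)
}

func main() {
	st := &store{}
	snap := st.UpdateState(func(s *Snapshot) {
		s.Status = "recording"
		s.Providers = []string{"local"}
	})
	snap.Providers[0] = "mutated by caller"
	fmt.Println(st.State().Status, st.State().Providers)
}
```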
type Submission ¶
type Submission struct {
PCM []byte
WAV []byte
DurationSecs float64
Language string
Prefix string
QuickNote bool
QuickNoteID int64
}
Submission carries a single audio segment and its metadata into the transcription pipeline.
type Transcriber ¶
type Transcriber interface {
Transcribe(ctx context.Context, audio []byte, durationSecs float64, language string) (Transcript, error)
}
Transcriber converts raw WAV audio into a Transcript.
type Transcript ¶
type Transcript struct {
Text string
Language string
Duration time.Duration
Provider string
Model string
Confidence float64
}
Transcript holds the result of a single transcription call.
type TranscriptInterceptor ¶
type TranscriptInterceptor interface {
Intercept(ctx context.Context, transcript Transcript, target any) (bool, error)
}
TranscriptInterceptor can handle a transcript before it reaches the normal output path. Return (true, nil) to signal that the transcript was consumed.
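The interplay between an interceptor and the normal output path might look like the following sketch. The two interfaces mirror the declarations in this package and Transcript is trimmed; route, commandInterceptor, bufferOutput, and demoRoute are hypothetical, and the "computer," prefix rule is invented for the example. How TranscriptionWorker orders these calls internally is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

type Transcript struct{ Text string }

type TranscriptInterceptor interface {
	Intercept(ctx context.Context, transcript Transcript, target any) (bool, error)
}

type TranscriptOutput interface {
	Deliver(ctx context.Context, transcript Transcript, target any) error
}

// commandInterceptor consumes transcripts that start with "computer,".
type commandInterceptor struct{ handled []string }

func (c *commandInterceptor) Intercept(_ context.Context, t Transcript, _ any) (bool, error) {
	if !strings.HasPrefix(strings.ToLower(t.Text), "computer,") {
		return false, nil // not ours: let the normal output path run
	}
	c.handled = append(c.handled, t.Text)
	return true, nil
}

// bufferOutput records delivered transcripts instead of pasting them.
type bufferOutput struct{ delivered []string }

func (b *bufferOutput) Deliver(_ context.Context, t Transcript, _ any) error {
	b.delivered = append(b.delivered, t.Text)
	return nil
}

// route offers the transcript to the interceptor first and delivers it
// only when the interceptor did not consume it.
func route(ctx context.Context, t Transcript, target any, in TranscriptInterceptor, out TranscriptOutput) error {
	if in != nil {
		consumed, err := in.Intercept(ctx, t, target)
		if err != nil {
			return err
		}
		if consumed {
			return nil
		}
	}
	return out.Deliver(ctx, t, target)
}

// demoRoute routes each text and reports where it ended up.
func demoRoute(texts ...string) (handled, delivered []string) {
	in, out := &commandInterceptor{}, &bufferOutput{}
	for _, text := range texts {
		if err := route(context.Background(), Transcript{Text: text}, nil, in, out); err != nil {
			panic(err)
		}
	}
	return in.handled, out.delivered
}

func main() {
	handled, delivered := demoRoute("Computer, open notes", "Dear team")
	fmt.Println(handled, delivered)
}
```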
type TranscriptOutput ¶
type TranscriptOutput interface {
Deliver(ctx context.Context, transcript Transcript, target any) error
}
TranscriptOutput delivers a completed Transcript to the host application (e.g. clipboard injection or text-field paste).
type TranscriptionJob ¶
type TranscriptionJob struct {
Submission
Target any
}
TranscriptionJob pairs a Submission with its delivery target.
func (TranscriptionJob) Clone ¶
func (j TranscriptionJob) Clone() TranscriptionJob
type TranscriptionObserver ¶
type TranscriptionObserver interface {
OnState(status, text string)
OnLog(message, kind string)
OnTranscriptCommitted(transcript Transcript, quickNote bool)
}
TranscriptionObserver receives real-time status and log updates from a TranscriptionWorker during processing.
type TranscriptionRunner ¶
type TranscriptionRunner struct {
// contains filtered or unexported fields
}
TranscriptionRunner transcribes audio submissions and persists results. Create one with NewTranscriptionRunner.
func NewTranscriptionRunner ¶
func NewTranscriptionRunner(transcriber Transcriber, store Persistence) *TranscriptionRunner
NewTranscriptionRunner creates a TranscriptionRunner backed by the given transcriber and persistence store. Either argument may be nil.
func (*TranscriptionRunner) Commit ¶
func (r *TranscriptionRunner) Commit(ctx context.Context, submission Submission, transcript Transcript) (Completion, error)
func (*TranscriptionRunner) WithObserver ¶
func (r *TranscriptionRunner) WithObserver(observer CommitObserver) *TranscriptionRunner
type TranscriptionStore ¶
type TranscriptionStore interface {
SaveTranscription(ctx context.Context, text, language, provider, model string, durationMs, latencyMs int64, audioData []byte) error
}
TranscriptionStore persists completed dictation transcriptions.
type TranscriptionWorker ¶
type TranscriptionWorker struct {
// contains filtered or unexported fields
}
TranscriptionWorker processes TranscriptionJob values from an internal queue on a single goroutine. Start it with TranscriptionWorker.Start and submit work with TranscriptionWorker.Submit.
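The single-goroutine queue described here, together with the ErrWorkerQueueFull and ErrWorkerClosed errors listed under Variables, suggests a bounded, non-blocking submit. The sketch below shows that general pattern; worker, job, and demoWorker are hypothetical, and the real TranscriptionWorker may size, lock, or drain its queue differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Local stand-ins for the documented worker errors.
var (
	errWorkerClosed    = errors.New("speechkit: transcription worker is closed")
	errWorkerQueueFull = errors.New("speechkit: transcription worker queue is full")
)

type job struct{ label string }

type worker struct {
	mu        sync.Mutex
	closed    bool
	queue     chan job
	done      chan struct{}
	processed []string
}

func newWorker(queueSize int) *worker {
	return &worker{queue: make(chan job, queueSize), done: make(chan struct{})}
}

// Start drains the queue on a single goroutine until it is closed.
func (w *worker) Start() {
	go func() {
		defer close(w.done)
		for j := range w.queue {
			w.processed = append(w.processed, j.label)
		}
	}()
}

// Submit enqueues without blocking and reports a full or closed queue.
func (w *worker) Submit(j job) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.closed {
		return errWorkerClosed
	}
	select {
	case w.queue <- j:
		return nil
	default:
		return errWorkerQueueFull
	}
}

// Close stops accepting work; already queued jobs are still processed.
func (w *worker) Close() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if !w.closed {
		w.closed = true
		close(w.queue)
	}
}

// Wait blocks until the processing goroutine has exited.
func (w *worker) Wait() { <-w.done }

// demoWorker fills a one-slot queue before the worker starts.
func demoWorker() (first, second, afterClose error, processed []string) {
	w := newWorker(1)
	first = w.Submit(job{label: "a"})
	second = w.Submit(job{label: "b"}) // queue already holds "a"
	w.Start()
	w.Close()
	w.Wait()
	afterClose = w.Submit(job{label: "c"})
	return first, second, afterClose, w.processed
}

func main() {
	first, second, afterClose, processed := demoWorker()
	fmt.Println(first, second, afterClose, processed)
}
```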
func NewTranscriptionWorker ¶
func NewTranscriptionWorker(cfg TranscriptionWorkerConfig) (*TranscriptionWorker, error)
func (*TranscriptionWorker) Close ¶
func (w *TranscriptionWorker) Close()
func (*TranscriptionWorker) Start ¶
func (w *TranscriptionWorker) Start(ctx context.Context)
func (*TranscriptionWorker) Submit ¶
func (w *TranscriptionWorker) Submit(job TranscriptionJob) error
func (*TranscriptionWorker) Wait ¶
func (w *TranscriptionWorker) Wait()
type TranscriptionWorkerConfig ¶
type TranscriptionWorkerConfig struct {
Timeout time.Duration
QueueSize int
Runner *TranscriptionRunner
Output TranscriptOutput
Interceptor TranscriptInterceptor
Observer TranscriptionObserver
}
TranscriptionWorkerConfig configures a TranscriptionWorker. Runner is required; all other fields are optional.
type VoiceActivityDetector ¶ added in v0.24.0
VoiceActivityDetector is the public VAD contract consumed by DictationSegmenter. It intentionally matches SpeechKit's internal Silero detector shape without exposing internal packages.
type VoiceAgentService ¶ added in v0.24.0
type VoiceAgentService interface {
Start(context.Context) error
Stop(context.Context) (VoiceAgentSession, error)
SendText(context.Context, string) error
CurrentSession(context.Context) (VoiceAgentSession, error)
}
VoiceAgentService is the mode-scoped SDK contract for realtime dialogue.
type VoiceAgentSession ¶ added in v0.24.0
type VoiceAgentSession struct {
ID string `json:"id,omitempty"`
StartedAt time.Time `json:"startedAt,omitempty"`
EndedAt time.Time `json:"endedAt,omitempty"`
Locale string `json:"locale,omitempty"`
ProviderProfileID string `json:"providerProfileId,omitempty"`
RuntimeKind string `json:"runtimeKind,omitempty"`
Turns []VoiceAgentTurn `json:"turns,omitempty"`
Summary VoiceAgentSessionSummary `json:"summary"`
}
VoiceAgentSession is the public record for a live dialogue.
type VoiceAgentSessionSummary ¶ added in v0.24.0
type VoiceAgentSessionSummary struct {
Title string `json:"title,omitempty"`
Summary string `json:"summary"`
Ideas []string `json:"ideas,omitempty"`
Decisions []string `json:"decisions,omitempty"`
OpenQuestions []string `json:"openQuestions,omitempty"`
NextSteps []string `json:"nextSteps,omitempty"`
RawText string `json:"rawText,omitempty"`
}
VoiceAgentSessionSummary is the structured handoff produced when a Voice Agent session ends.
type VoiceAgentSetting ¶ added in v0.24.0
type VoiceAgentSetting struct {
ModeSetting
SessionSummary bool `json:"sessionSummary"`
PipelineFallback bool `json:"pipelineFallback"`
CloseBehavior string `json:"closeBehavior,omitempty"`
AgentProfileID string `json:"agentProfileId,omitempty"`
AgentSequenceID string `json:"agentSequenceId,omitempty"`
}
Directories ¶
| Path | Synopsis |
|---|---|
| agentkit | Package agentkit provides a small Go harness for building SpeechKit Voice Agent hosts. |
| assist | Package assist provides an embeddable Assist Mode service. |
| client | Package client provides a typed HTTP client for talking to a remote SpeechKit Server (the `cmd/speechkit-server` Linux container or any compatible deployment). |
| dictation | Package dictation provides an embeddable strict Dictation runtime. |
| voiceagent | Package voiceagent provides an embeddable Voice Agent service. |
| live | Package live exposes the low-level Voice Agent realtime-protocol types. |