SpeechKit

module

Version: v0.35.15. Published: May 21, 2026. License: Apache-2.0

Repository

github.com/kombifyio/SpeechKit

🚧 Beta. SpeechKit is in active beta. Public APIs, config keys, and defaults can still change between minor releases. Use it in production only with version pins. Pre-1.0 releases use the v0.MAJOR.MINOR scheme; breaking changes are called out in each release entry.

SpeechKit is a Windows-first voice framework for products that need dictation, voice commands, and realtime voice dialogue without coupling every use case to one desktop app or one hosted API.

The framework currently has three modules:

| Module | What it is | Use it when |
| --- | --- | --- |
| Local-first Go backend | Embeddable Go runtime in `pkg/speechkit` with mode contracts, provider profiles, routing policy, readiness metadata, and reusable Dictation, Assist, and Voice Agent services. | You want to integrate SpeechKit into your own Go product, internal tool, prototype, or automation host. |
| SpeechKit Server | Linux server runtime in `cmd/speechkit-server` that wraps the same backend behind HTTP and WebSocket APIs. | You need a durable server process for remote clients, teams, product backends, browsers, or centrally managed model/provider configuration. |
| Windows Client | Wails desktop client in `cmd/speechkit` for local use, provider testing, and server-connected workflows. | You want to use SpeechKit on a Windows machine, validate providers and models, or connect a workstation to a SpeechKit Server. |

All three modules share the same three strict modes:

| Mode | Purpose | Boundary |
| --- | --- | --- |
| Dictation | Turn speech into text. | STT only. No LLM rewriting, no utilities, no codewords. |
| Assist | Turn speech or text into one useful result. | Codeword, utility, or LLM output with optional TTS and explicit UI surface metadata. |
| Voice Agent | Run realtime audio-to-audio dialogue. | Live conversation for brainstorming, support, and fast follow-ups. |
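
The Boundary column can be read as an allow-list of processing stages per mode. The following is a minimal sketch of that idea, not SpeechKit's actual API: the `Mode`, `Stage`, and `CheckStage` names are hypothetical, and the Voice Agent entry is simplified to a single realtime stage.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode names one of the three strict modes. Hypothetical type for illustration.
type Mode string

const (
	Dictation  Mode = "dictation"
	Assist     Mode = "assist"
	VoiceAgent Mode = "voice_agent"
)

// Stage is a hypothetical label for one processing step.
type Stage string

const (
	StageSTT      Stage = "stt"
	StageCodeword Stage = "codeword"
	StageUtility  Stage = "utility"
	StageLLM      Stage = "llm"
	StageTTS      Stage = "tts"
	StageRealtime Stage = "realtime_audio"
)

// allowed encodes the Boundary column as data.
// Assumption: Voice Agent is modeled as one realtime stage for brevity.
var allowed = map[Mode]map[Stage]bool{
	Dictation:  {StageSTT: true},
	Assist:     {StageSTT: true, StageCodeword: true, StageUtility: true, StageLLM: true, StageTTS: true},
	VoiceAgent: {StageRealtime: true},
}

// ErrOutsideBoundary marks a stage that the mode does not permit.
var ErrOutsideBoundary = errors.New("stage is outside the mode boundary")

// CheckStage returns nil when the mode permits the stage.
func CheckStage(m Mode, s Stage) error {
	stages, ok := allowed[m]
	if !ok {
		return fmt.Errorf("unknown mode %q", m)
	}
	if !stages[s] {
		return fmt.Errorf("%s in %s: %w", s, m, ErrOutsideBoundary)
	}
	return nil
}

func main() {
	fmt.Println(CheckStage(Dictation, StageSTT)) // permitted
	fmt.Println(CheckStage(Dictation, StageLLM)) // rejected: Dictation is STT only
	fmt.Println(CheckStage(Assist, StageLLM))    // permitted
}
```

Keeping the boundary as data rather than scattered `if` statements makes it easy for a host to reject a misconfigured pipeline before any audio is processed.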

Why SpeechKit

Local-first Go backend

Use the backend when you want voice features inside another application without adopting the Windows client. The public pkg/speechkit boundary gives host apps stable mode contracts, service interfaces, provider catalogs, and readiness data they can turn into their own setup UI.

Key advantages:

One framework kernel for Dictation, Assist, and Voice Agent instead of three unrelated voice pipelines.
Local-first provider support with room for managed local runtimes, user-managed local services, cloud providers, and direct vendor APIs.
Host policy controls for enabled modes, fixed profiles, fallbacks, and clean vs intelligence behavior.
Machine-readable readiness checks for credentials, local runtimes, model artifacts, and mode capability.
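
As a rough illustration of what machine-readable readiness data could look like to a host app, the sketch below aggregates individual checks into a per-mode report and serializes it as JSON. The `Check` and `Report` types, their field names, and the check kinds are assumptions made for this example; the real types live in `pkg/speechkit` and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Check is a hypothetical readiness item. Kind mirrors the categories named
// above: credentials, local runtimes, model artifacts, mode capability.
type Check struct {
	Kind   string `json:"kind"`
	Name   string `json:"name"`
	Ready  bool   `json:"ready"`
	Detail string `json:"detail,omitempty"`
}

// Report is a hypothetical per-mode summary a setup UI could render.
type Report struct {
	Mode   string  `json:"mode"`
	Ready  bool    `json:"ready"`
	Checks []Check `json:"checks"`
}

// NewReport marks the mode ready only when every check is ready.
func NewReport(mode string, checks []Check) Report {
	ready := true
	for _, c := range checks {
		if !c.Ready {
			ready = false
		}
	}
	return Report{Mode: mode, Ready: ready, Checks: checks}
}

// Blocking returns the checks that still prevent the mode from running.
func Blocking(r Report) []Check {
	var out []Check
	for _, c := range r.Checks {
		if !c.Ready {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	r := NewReport("dictation", []Check{
		{Kind: "local_runtime", Name: "whisper.cpp", Ready: true},
		{Kind: "model_artifact", Name: "stt-model", Ready: false, Detail: "model file not downloaded"},
	})
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(r); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

A host setup screen would typically list the `Blocking` items and offer one action per item, such as entering a credential or downloading a model.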

Start with Framework API or the examples in examples/.

SpeechKit Server

Use the server when SpeechKit should run as a long-lived Linux service. It adapts the same framework kernel to a containerized API surface so other clients can call Dictation, Assist, and Voice Agent without embedding Go code.

Key advantages:

One server image, one URL, and one deployment contract for all three modes.
HTTP endpoints for Dictation and Assist plus WebSocket sessions for realtime Voice Agent.
Built-in health/readiness routes, bearer or edge-auth modes, CORS/origin controls, and OpenAPI contracts.
Centralized provider, model, and secret configuration for teams or hosted deployments.

Start with docs/server/README.md and the server OpenAPI file at docs/server/openapi.v1.yaml.
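
For orientation, here is a minimal sketch of building a bearer-authenticated request to the server's Dictation endpoint from Go. The `POST /v1/dictation/transcribe` path appears in the package list for this module; everything else is an assumption for illustration: the base URL, the `SPEECHKIT_TOKEN` variable name, and the WAV request body. The authoritative request and response shapes are in docs/server/openapi.v1.yaml.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// NewTranscribeRequest builds a POST to the Dictation endpoint.
// A non-empty token is sent as a standard bearer credential.
func NewTranscribeRequest(ctx context.Context, baseURL, token string, audio io.Reader) (*http.Request, error) {
	endpoint := strings.TrimRight(baseURL, "/") + "/v1/dictation/transcribe"
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, audio)
	if err != nil {
		return nil, err
	}
	// Assumption: the body is a raw WAV upload. Check the OpenAPI file for
	// the content type and fields the server actually expects.
	req.Header.Set("Content-Type", "audio/wav")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

// exampleRequest wires in placeholder values; the URL and payload are hypothetical.
func exampleRequest(token string) (*http.Request, error) {
	audio := bytes.NewReader([]byte("RIFF")) // placeholder bytes, not a valid WAV file
	return NewTranscribeRequest(context.Background(), "http://localhost:8080/", token, audio)
}

func main() {
	// SPEECHKIT_TOKEN is a hypothetical variable name for this sketch.
	req, err := exampleRequest(os.Getenv("SPEECHKIT_TOKEN"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(req.Method, req.URL)
	// Sending is left out on purpose: pass req to http.DefaultClient.Do and
	// decode the response according to the OpenAPI contract.
}
```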

Windows Client

Use the Windows client when you want a ready-to-run desktop experience or a reference host for testing providers, models, and server connections. The app can run local-first on the machine or delegate selected work to a SpeechKit Server.

Key advantages:

Global hotkeys for Dictation, Assist, and Voice Agent.
Local audio capture, VAD, overlays, settings, provider setup, and optional audio playback in one Wails app.
Provider/model test bench for local, cloud, and direct integrations.
Server connection support with configurable bearer-token environment variable, request timeout, and local fallback behavior.
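
The server-connection behavior in the last item (request timeout plus local fallback) can be sketched as a small wrapper. This is an illustration of the pattern under stated assumptions, not the client's implementation: the `Transcriber` type, `WithLocalFallback`, and the stub functions are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Transcriber is a hypothetical function type: PCM audio in, text out.
type Transcriber func(ctx context.Context, pcm []byte) (string, error)

// WithLocalFallback tries the remote transcriber under a timeout. If it fails
// and fallback is enabled, the local transcriber handles the request instead.
func WithLocalFallback(remote, local Transcriber, timeout time.Duration, fallback bool) Transcriber {
	return func(ctx context.Context, pcm []byte) (string, error) {
		rctx, cancel := context.WithTimeout(ctx, timeout)
		defer cancel()
		text, err := remote(rctx, pcm)
		if err == nil {
			return text, nil
		}
		// Do not fall back when the caller itself cancelled the request.
		if !fallback || ctx.Err() != nil {
			return "", fmt.Errorf("server: %w", err)
		}
		return local(ctx, pcm)
	}
}

// slowRemote simulates a server that answers too slowly.
func slowRemote(ctx context.Context, _ []byte) (string, error) {
	select {
	case <-time.After(time.Second):
		return "hello (server)", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// localStub simulates an in-process transcriber.
func localStub(_ context.Context, _ []byte) (string, error) {
	return "hello (local)", nil
}

// demo runs one request with a 50 ms server timeout.
func demo(fallback bool) (string, error) {
	t := WithLocalFallback(slowRemote, localStub, 50*time.Millisecond, fallback)
	return t(context.Background(), nil)
}

func main() {
	fmt.Println(demo(true)) // the local result is used after the timeout
	_, err := demo(false)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

Checking the parent context before falling back keeps a user-initiated cancel from silently triggering local work.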

Download public builds from GitHub Releases when available. For source builds:

powershell -ExecutionPolicy Bypass -File scripts/build.ps1 -SkipInstaller

Default hotkeys:

Dictation: Ctrl+Win
Assist: Win+Alt
Voice Agent: Ctrl+Shift

Quick Start

Embed the Go backend:

go get github.com/kombifyio/SpeechKit/pkg/speechkit

Build the Windows client locally:

powershell -ExecutionPolicy Bypass -File scripts/build.ps1 -SkipInstaller

Pull the server image:

docker pull ghcr.io/kombifyio/speechkit-server:latest

Documentation

This README is the short orientation page. Use the detailed docs in docs/ when you need contracts, deployment steps, or release rules.

Build

Canonical Windows app build:

powershell -ExecutionPolicy Bypass -File scripts/build.ps1 -SkipInstaller

For local verification before commit, PR, CI, or deploy work, use the repo-local mise contract documented in docs/LOCAL_TESTING.md. Package-manager and raw Go commands are implementation details behind those preflight gates.

mise run preflight:quick
mise run preflight:release
mise run preflight:deploy

Repository Layout

pkg/speechkit/          Local-first Go backend
cmd/speechkit-server/   SpeechKit Server entry point
cmd/speechkit/          Windows Client entry point
frontend/app/           Windows UI source
internal/               Product internals
docs/                   Detailed documentation
deploy/                 Docker and server config
installer/              Windows installer
scripts/                Build, release, export, and verification scripts

Trust

Public releases include checksums and an SBOM. While the no-cost unsigned release path is active, Windows builds are unsigned and each release carries a notice saying so. Download only from the official kombifyio/SpeechKit releases.

License

Apache-2.0. See LICENSE.

Directories

| Path | Synopsis |
| --- | --- |
| assets | |
| cmd | |
| cmd/sk-browser-smoke (command) | Command sk-browser-smoke drives the public smoke page (`/`) of a running speechkit-server instance from a real headless Chrome process and asserts that every mode tile reports OK. |
| cmd/sk-e2e (command) | Command sk-e2e is a thin end-to-end smoke client for a running speechkit-server instance. |
| cmd/sk-localprobe (command) | Command sk-localprobe verifies that the SpeechKit kernel libraries produce working Dictation, Assist, and Voice Agent results against LOCAL models only (Whisper.cpp + Gemma via llama-server). |
| cmd/speechkit (command) | |
| cmd/speechkit-cli (command) | |
| cmd/speechkit-mcp (command) | |
| cmd/speechkit-mcp/internal/util | |
| cmd/speechkit-openwakeword (command) | speechkit-openwakeword hosts the openWakeWord-compatible ONNX frontend in a sibling process. |
| cmd/speechkit-server (command) | Package main is the canonical kombify SpeechKit Linux container server. |
| cmd/speechkit-wakeword (command) | speechkit-wakeword is the sidecar binary that hosts SpeechKit's on-device keyword spotter outside the main desktop process. |
| cmd/speechkit/internal/profiles | Package profiles provides pure model-profile selection helpers: the part of the legacy cmd/speechkit model_selection_helpers.go that depends only on config and the model catalog (no *appState, no network, no Wails surface). |
| cmd/speechkit/internal/transcription | Package transcription provides STT model-selection helpers and the vocabulary-dictionary primitives that the desktop adapters consume. |
| docs | |
| examples | |
| examples/library (command) | Example: Using SpeechKit as a Go library for speech-to-text. |
| examples/provider-catalog (command) | Example: reading SpeechKit's public mode and provider catalog. |
| examples/voice-agent/game-instructor (command) | Example: 15-minute Voice-Agent game instructor. |
| internal | |
| internal/ai | Package ai wires the Genkit runtime and the SpeechKit model catalog into a single LLM/embedding/reranker surface used by Assist and the Voice Agent pipeline-fallback path. |
| internal/ai/flows | |
| internal/assist | Package assist implements the Assist Mode pipeline: STT transcript → Codeword check → LLM → TTS → Result with both text and audio. |
| internal/audio | Audio playback via ebitengine/oto only requires cgo on Linux (ALSA/PulseAudio); the Windows and Darwin backends are pure-Go via purego. |
| internal/auditlog | Package auditlog provides the dedicated audit-event stream for SpeechKit. |
| internal/auditlogtest | Package auditlogtest provides test-only helpers for resetting the audit log package state between test cases. |
| internal/auth | Package auth provides the authentication abstraction for SpeechKit. |
| internal/config | |
| internal/desktop/controlplane | |
| internal/desktop/runtime | |
| internal/desktop/settings | |
| internal/desktop/update | |
| internal/dictation | Package dictation implements pause-based segmentation for Dictation Mode: it consumes VAD speech-probability frames and emits one transcription request per natural pause. |
| internal/downloads | Package downloads manages model downloads for SpeechKit: HTTP file downloads and Ollama model pulls with progress tracking. |
| internal/features | Package features provides runtime feature detection for UI gating. |
| internal/frontendassets | |
| internal/hotkey | |
| internal/kombify | Package kombify is the build-tag seam between OSS and kombify builds. |
| internal/localllm | |
| internal/models | Package models defines the SpeechKit model catalog: provider IDs, model identifiers, modality (STT, TTS, Realtime Voice, Assist, Utility, Embedding, Reranker), execution mode (local/cloud/direct), and the readiness metadata that setup UIs and the readiness endpoint consume. |
| internal/netsec | Package netsec provides centralized network security primitives used by every HTTP-based provider in SpeechKit (STT, TTS, LLM, downloads). |
| internal/output | |
| internal/router | Package router implements the STT routing layer. |
| internal/runtimepath | |
| internal/scaffold | Package scaffold renders embedded starter templates into a target directory so callers can bootstrap a SpeechKit integration without hand-copying boilerplate. |
| internal/secrets | |
| internal/server/assist | Package assist implements the POST /v1/assist/process handler. |
| internal/server/audio | Package audio normalizes inbound audio payloads to the Framework kernel's canonical PCM format (16 kHz, signed 16-bit little-endian, mono) before they enter the STT router. |
| internal/server/catalog | |
| internal/server/cli | Package cli holds the small amount of CLI-level glue for the Linux SpeechKit Server entry point. |
| internal/server/configapi | |
| internal/server/core | Package core is the SpeechKit server bootstrap layer. |
| internal/server/dictation | Package dictation implements the POST /v1/dictation/transcribe handler. |
| internal/server/httpx | Package httpx contains tiny cross-handler helpers for JSON error envelopes and status mapping. |
| internal/server/middleware | Package middleware provides HTTP middleware primitives for the SpeechKit server adapter. |
| internal/server/onboarding | |
| internal/server/persona | Package persona provides the Voice Agent persona / role / sequence catalog for the Server-Target. |
| internal/server/storageauth | |
| internal/server/transcripts | |
| internal/server/ttsapi | |
| internal/server/vocabulary | |
| internal/server/voiceagent | Package voiceagent implements the Voice Agent WebSocket surface on the Server-Target. |
| internal/serverclient | Package serverclient is the client-side transport adapter that lets a device-target (cmd/speechkit) or a local-target binary delegate one or more modes (Dictation, Assist, Voice Agent) to a remote SpeechKit Server-Target instead of running the Framework kernel in-process. |
| internal/shortcuts | Package shortcuts implements pattern-matched intent shortcuts used by Assist Mode. |
| internal/store | |
| internal/stt | Package stt defines the SpeechKit speech-to-text provider interface and houses the concrete provider implementations: whisper.cpp (local built-in), HuggingFace, OpenAI, Groq, Google, an OpenAI-compatible adapter (covers Ollama and other compatible servers), and the self-hosted VPS adapter. |
| internal/testutil | |
| internal/textactions | |
| internal/tray | |
| internal/tts | Package tts implements the SpeechKit text-to-speech surface: a small provider interface plus concrete adapters for OpenAI, Google, and Hugging Face. |
| internal/vad | |
| internal/voiceagent | |
| internal/voiceagent/cascaded | Package cascaded implements a turn-based STT -> LLM -> TTS voice agent provider. |
| internal/voiceagentprofile | |
| internal/voicebehavior | Package voicebehavior contains the shared Voice Agent behavior catalog used by both the local desktop runtime and the Linux server target. |
| internal/voiceeval | Package voiceeval contains deterministic dialogue checks for Voice Agent workflow tests. |
| internal/wakeword | Package wakeword implements an always-on, on-device keyword spotter that any of the three SpeechKit modes (Dictation, Assist, Voice Agent) can opt into. |
| internal/winapi | Package winapi provides shared Windows DLL proc references used by multiple packages. |
| pkg | |
| pkg/speechkit | Package speechkit provides the public SDK for embedding SpeechKit voice capture, transcription, and assist/voice-agent pipelines into host applications. |
| pkg/speechkit/agentkit | Package agentkit provides a small Go harness for building SpeechKit Voice Agent hosts. |
| pkg/speechkit/assist | Package assist provides an embeddable Assist Mode service. |
| pkg/speechkit/client | Package client provides a typed HTTP client for talking to a remote SpeechKit Server (the `cmd/speechkit-server` Linux container or any compatible deployment). |
| pkg/speechkit/dictation | Package dictation provides an embeddable strict Dictation runtime. |
| pkg/speechkit/storage | |
| pkg/speechkit/voiceagent | Package voiceagent provides an embeddable Voice Agent service. |
| pkg/speechkit/voiceagent/live | Package live exposes the low-level Voice Agent realtime-protocol types. |
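
The server audio package listed above describes the kernel's canonical PCM format as 16 kHz, signed 16-bit little-endian, mono. For hosts that capture audio as floating-point samples, the sketch below shows one common way to produce bytes in that layout. It is an independent illustration, not code from SpeechKit: `FloatToPCM16LE` is a hypothetical helper, and resampling to 16 kHz and downmixing to mono are assumed to have happened already.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// SampleRate is the canonical rate named in the package synopsis.
const SampleRate = 16000

// FloatToPCM16LE converts mono samples in [-1, 1] to signed 16-bit
// little-endian PCM. Out-of-range values are clamped and NaN becomes silence.
func FloatToPCM16LE(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		v := float64(s)
		switch {
		case math.IsNaN(v):
			v = 0
		case v > 1:
			v = 1
		case v < -1:
			v = -1
		}
		n := int16(math.Round(v * 32767))
		binary.LittleEndian.PutUint16(out[2*i:], uint16(n))
	}
	return out
}

// Seconds reports how much audio a PCM buffer holds at the canonical rate.
func Seconds(pcm []byte) float64 {
	return float64(len(pcm)/2) / SampleRate
}

func main() {
	pcm := FloatToPCM16LE([]float32{0, 1, -1, 0.5})
	fmt.Printf("% x\n", pcm) // prints "00 00 ff 7f 01 80 00 40"
	fmt.Println(Seconds(make([]byte, 2*SampleRate)))
}
```

Scaling by 32767 rather than 32768 keeps the positive and negative peaks symmetric, which is a design choice of this sketch rather than a requirement of the format.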