SpeechKit

Module version v0.28.2 (latest). Published: May 2, 2026. License: Apache-2.0.

Repository

github.com/kombifyio/SpeechKit

SpeechKit is a Windows-first voice system built around three products:

Product	What it is	Start here
Go Voice Framework	Embeddable Go APIs for Dictation, Assist, Voice Agent, provider routing, and mode contracts.	Framework API
Local Windows Client	Wails desktop app with global hotkeys, local-first dictation, settings, overlays, and optional cloud providers.	Windows app
SpeechKit Server	Containerized HTTP/WebSocket service for remote Dictation, Assist, and realtime Voice Agent workloads.	Server docs

The shared rule is simple: Dictation only transcribes, Assist returns one-shot utility or LLM output, and Voice Agent is realtime dialogue.
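The shared rule above can be pictured as a small contract table. The following is a minimal, self-contained sketch and not the pkg/speechkit API: the `Mode` and `Contract` types, their fields, and the `contractFor` function are hypothetical names chosen only for illustration.

```go
package main

import "fmt"

// Mode names the three modes described in this README.
// The type is hypothetical and is not taken from pkg/speechkit.
type Mode string

const (
	Dictation  Mode = "dictation"
	Assist     Mode = "assist"
	VoiceAgent Mode = "voice-agent"
)

// Contract is a hypothetical summary of what a mode is allowed to do.
type Contract struct {
	TranscriptOnly bool // the mode returns a transcript and nothing else
	Realtime       bool // the mode is an ongoing realtime dialogue
}

// contractFor encodes the shared rule: Dictation only transcribes,
// Assist returns one-shot output, and Voice Agent is realtime dialogue.
func contractFor(m Mode) (Contract, error) {
	switch m {
	case Dictation:
		return Contract{TranscriptOnly: true, Realtime: false}, nil
	case Assist:
		return Contract{TranscriptOnly: false, Realtime: false}, nil
	case VoiceAgent:
		return Contract{TranscriptOnly: false, Realtime: true}, nil
	default:
		return Contract{}, fmt.Errorf("unknown mode %q", m)
	}
}

func main() {
	for _, m := range []Mode{Dictation, Assist, VoiceAgent} {
		c, err := contractFor(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-12s transcriptOnly=%-5v realtime=%v\n", m, c.TranscriptOnly, c.Realtime)
	}
}
```

For the real contracts, the entry points listed under Go Voice Framework (for example `speechkit.DefaultModeContracts()`) are the place to look; their signatures are not reproduced here.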

Products

Go Voice Framework

Use pkg/speechkit when you want SpeechKit inside another Go product.

go get github.com/kombifyio/SpeechKit/pkg/speechkit

Useful entry points:

speechkit.DefaultModeContracts()
speechkit.DefaultProviderProfiles()
speechkit.ProfilesForMode(mode)
speechkit.ValidateProfileForMode(profile, mode)

Examples live in examples/.

Local Windows Client

Download the latest Windows installer or portable bundle from GitHub Releases.

Default hotkeys:

Dictation: Win+Alt
Assist: Ctrl+Win
Voice Agent: Ctrl+Shift

Local development:

powershell -ExecutionPolicy Bypass -File .\start-dev.ps1

SpeechKit Server

Use the SpeechKit Server when SpeechKit should run behind an HTTP/WebSocket API.

docker pull ghcr.io/kombifyio/speechkit-server:latest

Read docs/server/README.md for endpoints, auth, deployment setup, and OpenAPI links.

Documentation

This README is intentionally short. Detailed documentation lives in docs/.

Build

Canonical Windows app build:

powershell -ExecutionPolicy Bypass -File scripts/build.ps1 -SkipInstaller

Common checks:

go test ./...
go vet ./...
npm --prefix frontend/app run test
npm --prefix frontend/app run build
npm --prefix Website run test
npm --prefix Website run build

Repository Layout

pkg/speechkit/          Go Voice Framework
cmd/speechkit/          Local Windows Client
cmd/speechkit-server/   SpeechKit Server entry point
frontend/app/           Windows UI source
Website/                Public website
internal/               Product internals
docs/                   Detailed documentation
deploy/                 Docker, Render, and server config
installer/              Windows installer
scripts/                Build, release, export, and verification scripts

Trust

Public releases include checksums and an SBOM. While the no-cost unsigned release path is active, they also include a notice that the Windows binaries are unsigned. Download only from the official kombifyio/SpeechKit releases.

License

Apache-2.0. See LICENSE.

Directories

Path	Synopsis
assets
cmd/sk-e2e (command)	Command sk-e2e is a thin end-to-end smoke client for a running speechkit-server instance.
cmd/speechkit (command)
cmd/speechkit-server (command)	Package main is the canonical kombify SpeechKit Linux container server.
examples/library (command)	Example: Using SpeechKit as a Go library for speech-to-text.
examples/provider-catalog (command)	Example: reading SpeechKit's public mode and provider catalog.
internal/ai
internal/ai/flows
internal/assist	Package assist implements the Assist Mode pipeline: STT transcript → Codeword check → LLM → TTS → Result with both text and audio.
internal/audio	Audio playback via ebitengine/oto only requires cgo on Linux (ALSA/PulseAudio); the Windows and Darwin backends are pure-Go via purego.
internal/auth	Package auth provides the authentication abstraction for SpeechKit.
internal/config
internal/dictation
internal/downloads	Package downloads manages model downloads for SpeechKit: HTTP file downloads and Ollama model pulls with progress tracking.
internal/features	Package features provides runtime feature detection for UI gating.
internal/frontendassets
internal/hotkey
internal/kombify	Package kombify is the build-tag seam between OSS and kombify builds.
internal/localllm
internal/models
internal/netsec	Package netsec provides centralized network security primitives used by every HTTP-based provider in SpeechKit (STT, TTS, LLM, downloads).
internal/output
internal/router
internal/runtimepath
internal/secrets
internal/server/assist	Package assist implements the POST /v1/assist/process handler.
internal/server/audio	Package audio normalizes inbound audio payloads to the Framework kernel's canonical PCM format (16 kHz, signed 16-bit little-endian, mono) before they enter the STT router.
internal/server/cli	Package cli holds the small amount of CLI-level glue for the Linux SpeechKit Server entry point.
internal/server/core	Package core is the SpeechKit server bootstrap layer.
internal/server/dictation	Package dictation implements the POST /v1/dictation/transcribe handler.
internal/server/httpx	Package httpx contains tiny cross-handler helpers for JSON error envelopes and status mapping.
internal/server/middleware	Package middleware provides HTTP middleware primitives for the SpeechKit server adapter.
internal/server/persona	Package persona provides the Voice Agent persona / role / sequence catalog for the Server-Target.
internal/server/voiceagent	Package voiceagent implements the Voice Agent WebSocket surface on the Server-Target.
internal/serverclient	Package serverclient is the client-side transport adapter that lets a device-target (cmd/speechkit) or a local-target binary delegate one or more modes (Dictation, Assist, Voice Agent) to a remote SpeechKit Server-Target instead of running the Framework kernel in-process.
internal/shortcuts
internal/store
internal/stt
internal/textactions
internal/tray
internal/tts
internal/vad
internal/voiceagent	Package voiceagent implements the Voice Agent Mode: a real-time, bidirectional voice conversation using native audio-to-audio models (Gemini Live API, OpenAI Realtime API) over WebSocket.
internal/voiceagentprofile
internal/voicebehavior	Package voicebehavior contains the shared Voice Agent behavior catalog used by both the local desktop runtime and the Linux server target.
internal/voiceeval	Package voiceeval contains deterministic dialogue checks for Voice Agent workflow tests.
internal/winapi	Package winapi provides shared Windows DLL proc references used by multiple packages.
pkg/speechkit	Package speechkit provides the public SDK for embedding SpeechKit voice capture and transcription into host applications.
pkg/speechkit/assist	Package assist provides an embeddable Assist service constructor.
pkg/speechkit/dictation	Package dictation provides an embeddable strict Dictation runtime.
pkg/speechkit/voiceagent	Package voiceagent provides an embeddable Voice Agent service constructor.
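
The server/audio synopsis names a canonical PCM format: 16 kHz, signed 16-bit little-endian, mono. As a rough illustration of what that byte layout means, the sketch below packs mono float samples into signed 16-bit little-endian bytes. It is not the server/audio implementation: `encodePCM16LE`, the float input range, and the clamping behavior are assumptions made for this example, and resampling to 16 kHz is out of scope.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodePCM16LE converts mono samples in the range [-1, 1] to signed
// 16-bit little-endian PCM. Out-of-range samples are clamped, which is
// an assumption of this sketch rather than documented SpeechKit behavior.
func encodePCM16LE(samples []float64) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		s = math.Max(-1, math.Min(1, s))
		v := int16(math.Round(s * 32767))
		// The low byte is written first, then the high byte.
		binary.LittleEndian.PutUint16(out[2*i:], uint16(v))
	}
	return out
}

func main() {
	pcm := encodePCM16LE([]float64{0, 1, -1})
	fmt.Printf("% x\n", pcm) // prints "00 00 ff 7f 01 80"
}
```

At 16 kHz mono with two bytes per sample, one second of audio in this layout occupies 32,000 bytes.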