SpeechKit

v0.28.2
Published: May 2, 2026 License: Apache-2.0

SpeechKit is a Windows-first voice system built around three products:

| Product | What it is | Start here |
| --- | --- | --- |
| Go Voice Framework | Embeddable Go APIs for Dictation, Assist, Voice Agent, provider routing, and mode contracts. | Framework API |
| Local Windows Client | Wails desktop app with global hotkeys, local-first dictation, settings, overlays, and optional cloud providers. | Windows app |
| SpeechKit Server | Containerized HTTP/WebSocket service for remote Dictation, Assist, and realtime Voice Agent workloads. | Server docs |

The shared rule is simple: Dictation only transcribes, Assist returns one-shot utility or LLM output, and Voice Agent is realtime dialogue.

Products

Go Voice Framework

Use pkg/speechkit when you want SpeechKit inside another Go product.

go get github.com/kombifyio/SpeechKit/pkg/speechkit

Useful entry points:

  • speechkit.DefaultModeContracts()
  • speechkit.DefaultProviderProfiles()
  • speechkit.ProfilesForMode(mode)
  • speechkit.ValidateProfileForMode(profile, mode)

Examples live in examples/.
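
The entry points above revolve around two ideas: each mode has a contract, and a provider profile is only valid for the modes it supports. The following is a minimal, self-contained sketch of that idea. It does not import pkg/speechkit, and every type and function in it (Mode, Contract, Profile, defaultContracts, profilesForMode, validateProfileForMode, and the profile names) is a hypothetical stand-in, since the real signatures are not shown in this README.

```go
package main

import "fmt"

// Mode is a hypothetical stand-in for a SpeechKit mode identifier.
type Mode string

const (
	ModeDictation  Mode = "dictation"
	ModeAssist     Mode = "assist"
	ModeVoiceAgent Mode = "voice_agent"
)

// Contract is a hypothetical summary of the shared rule from this README:
// Dictation only transcribes, Assist is one-shot, Voice Agent is realtime.
type Contract struct {
	TranscribeOnly bool
	Realtime       bool
}

// Profile is a hypothetical provider profile that lists the modes it serves.
type Profile struct {
	Name  string
	Modes []Mode
}

// defaultContracts returns one contract per mode (illustrative values only).
func defaultContracts() map[Mode]Contract {
	return map[Mode]Contract{
		ModeDictation:  {TranscribeOnly: true, Realtime: false},
		ModeAssist:     {TranscribeOnly: false, Realtime: false},
		ModeVoiceAgent: {TranscribeOnly: false, Realtime: true},
	}
}

// supports reports whether the profile lists the given mode.
func supports(p Profile, mode Mode) bool {
	for _, m := range p.Modes {
		if m == mode {
			return true
		}
	}
	return false
}

// profilesForMode filters profiles down to those that list the mode.
func profilesForMode(profiles []Profile, mode Mode) []Profile {
	var out []Profile
	for _, p := range profiles {
		if supports(p, mode) {
			out = append(out, p)
		}
	}
	return out
}

// validateProfileForMode returns an error when the profile does not list the mode.
func validateProfileForMode(p Profile, mode Mode) error {
	if !supports(p, mode) {
		return fmt.Errorf("profile %q does not support mode %q", p.Name, mode)
	}
	return nil
}

func main() {
	// Hypothetical profiles, not taken from the SpeechKit catalog.
	profiles := []Profile{
		{Name: "local-stt", Modes: []Mode{ModeDictation}},
		{Name: "cloud-realtime", Modes: []Mode{ModeVoiceAgent}},
	}
	for _, p := range profilesForMode(profiles, ModeDictation) {
		fmt.Println("dictation profile:", p.Name)
	}
	if err := validateProfileForMode(profiles[0], ModeVoiceAgent); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Validating a profile before routing keeps a transcription-only provider from being selected for a realtime dialogue. For the actual API, consult the package documentation and the programs in examples/.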

Local Windows Client

Download the latest Windows installer or portable bundle from GitHub Releases.

Default hotkeys:

  • Dictation: Win+Alt
  • Assist: Ctrl+Win
  • Voice Agent: Ctrl+Shift

Local development:

powershell -ExecutionPolicy Bypass -File .\start-dev.ps1

SpeechKit Server

Use the SpeechKit Server when SpeechKit should run behind an HTTP/WebSocket API.

docker pull ghcr.io/kombifyio/speechkit-server:latest

Read docs/server/README.md for endpoints, auth, deployment setup, and OpenAPI links.

Documentation

This README is intentionally short. See the detailed documentation under docs/ when needed.

Build

Canonical Windows app build:

powershell -ExecutionPolicy Bypass -File scripts/build.ps1 -SkipInstaller

Common checks:

go test ./...
go vet ./...
npm --prefix frontend/app run test
npm --prefix frontend/app run build
npm --prefix Website run test
npm --prefix Website run build

Repository Layout

pkg/speechkit/          Go Voice Framework
cmd/speechkit/          Local Windows Client
cmd/speechkit-server/   SpeechKit Server entry point
frontend/app/           Windows UI source
Website/                Public website
internal/               Product internals
docs/                   Detailed documentation
deploy/                 Docker, Render, and server config
installer/              Windows installer
scripts/                Build, release, export, and verification scripts

Trust

Public releases include checksums and an SBOM. While the no-cost unsigned release path is active, they also carry a notice that the Windows binaries are unsigned. Download only from the official kombifyio/SpeechKit releases.

License

Apache-2.0. See LICENSE.

Directories

Path                                  Synopsis
cmd/sk-e2e (command)                  Command sk-e2e is a thin end-to-end smoke client for a running speechkit-server instance.
cmd/speechkit (command)
cmd/speechkit-server (command)        Package main is the canonical kombify SpeechKit Linux container server.
examples/library (command)            Example: Using SpeechKit as a Go library for speech-to-text.
examples/provider-catalog (command)   Example: reading SpeechKit's public mode and provider catalog.
internal/ai
internal/assist                       Package assist implements the Assist Mode pipeline: STT transcript → Codeword check → LLM → TTS → Result with both text and audio.
internal/audio                        Audio playback via ebitengine/oto only requires cgo on Linux (ALSA/PulseAudio); the Windows and Darwin backends are pure-Go via purego.
internal/auth                         Package auth provides the authentication abstraction for SpeechKit.
internal/downloads                    Package downloads manages model downloads for SpeechKit: HTTP file downloads and Ollama model pulls with progress tracking.
internal/features                     Package features provides runtime feature detection for UI gating.
internal/kombify                      Package kombify is the build-tag seam between OSS and kombify builds.
internal/netsec                       Package netsec provides centralized network security primitives used by every HTTP-based provider in SpeechKit (STT, TTS, LLM, downloads).
internal/server/assist                Package assist implements the POST /v1/assist/process handler.
internal/server/audio                 Package audio normalizes inbound audio payloads to the Framework kernel's canonical PCM format (16 kHz, signed 16-bit little-endian, mono) before they enter the STT router.
internal/server/cli                   Package cli holds the small amount of CLI-level glue for the Linux SpeechKit Server entry point.
internal/server/core                  Package core is the SpeechKit server bootstrap layer.
internal/server/dictation             Package dictation implements the POST /v1/dictation/transcribe handler.
internal/server/httpx                 Package httpx contains tiny cross-handler helpers for JSON error envelopes and status mapping.
internal/server/middleware            Package middleware provides HTTP middleware primitives for the SpeechKit server adapter.
internal/server/persona               Package persona provides the Voice Agent persona / role / sequence catalog for the Server-Target.
internal/server/voiceagent            Package voiceagent implements the Voice Agent WebSocket surface on the Server-Target.
internal/serverclient                 Package serverclient is the client-side transport adapter that lets a device-target (cmd/speechkit) or a local-target binary delegate one or more modes (Dictation, Assist, Voice Agent) to a remote SpeechKit Server-Target instead of running the Framework kernel in-process.
internal/stt
internal/tts
internal/vad
internal/voiceagent                   Package voiceagent implements the Voice Agent Mode, a real-time, bidirectional voice conversation using native audio-to-audio models (Gemini Live API, OpenAI Realtime API) over WebSocket.
internal/voicebehavior                Package voicebehavior contains the shared Voice Agent behavior catalog used by both the local desktop runtime and the Linux server target.
internal/voiceeval                    Package voiceeval contains deterministic dialogue checks for Voice Agent workflow tests.
internal/winapi                       Package winapi provides shared Windows DLL proc references used by multiple packages.
pkg/speechkit                         Package speechkit provides the public SDK for embedding SpeechKit voice capture and transcription into host applications.
pkg/speechkit/assist                  Package assist provides an embeddable Assist service constructor.
pkg/speechkit/dictation               Package dictation provides an embeddable strict Dictation runtime.
pkg/speechkit/voiceagent              Package voiceagent provides an embeddable Voice Agent service constructor.
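
The server/audio synopsis above names a canonical PCM format: 16 kHz, signed 16-bit little-endian, mono. The sketch below shows what producing that byte layout can look like for samples that are already at 16 kHz. It is an illustration of the format only, not SpeechKit's normalization code. The function names are hypothetical, and resampling from other sample rates is deliberately left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// canonicalSampleRate is the rate named in the server/audio synopsis.
const canonicalSampleRate = 16000

// downmixStereo averages interleaved left/right samples into one mono channel.
// A trailing unpaired sample, if any, is ignored.
func downmixStereo(interleaved []float32) []float32 {
	mono := make([]float32, 0, len(interleaved)/2)
	for i := 0; i+1 < len(interleaved); i += 2 {
		mono = append(mono, (interleaved[i]+interleaved[i+1])/2)
	}
	return mono
}

// floatToS16LE converts mono samples in [-1, 1] to signed 16-bit
// little-endian PCM bytes, clamping out-of-range values.
func floatToS16LE(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		if s > 1 {
			s = 1
		} else if s < -1 {
			s = -1
		}
		v := int16(math.Round(float64(s) * 32767))
		binary.LittleEndian.PutUint16(out[2*i:], uint16(v))
	}
	return out
}

// durationSeconds reports how long a canonical PCM buffer plays for.
func durationSeconds(pcm []byte) float64 {
	return float64(len(pcm)/2) / canonicalSampleRate
}

func main() {
	stereo := []float32{0.5, -0.5, 1, 1} // two stereo frames
	pcm := floatToS16LE(downmixStereo(stereo))
	fmt.Printf("% x\n", pcm)
	fmt.Println(durationSeconds(make([]byte, 32000)), "seconds")
}
```

Scaling by 32767 rather than 32768 keeps the conversion symmetric, so +1.0 and -1.0 map to +32767 and -32767 without overflow.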