# go-whisper

A unified speech-to-text and translation service that provides a single API for multiple AI providers:

- Local Models: High-performance transcription using whisper.cpp with GPU acceleration
- Commercial Models: Cloud-based transcription using the OpenAI Whisper and ElevenLabs APIs, with advanced features such as speaker diarization

Whether you need privacy-focused local processing or cloud-based convenience, go-whisper provides a consistent interface for all your speech-processing needs.
## Features

### Multi-Provider Support

- Local Processing: Privacy-focused transcription using whisper.cpp models
- OpenAI Integration: Access to OpenAI's Whisper API for cloud processing
- ElevenLabs Integration: Advanced features such as speaker diarization and SRT subtitle generation

### Flexible Deployment

- Command Line Interface: Simple CLI for direct audio processing
- HTTP API Server: RESTful API for transcription and translation services
- Docker Support: Pre-built containers for easy deployment
- GPU Support: CUDA, Vulkan, and Metal (macOS) acceleration for local models
- Model Management: Download, cache, and manage models locally
- Efficient Processing: Optimized for both batch and real-time transcription
For detailed feature documentation, see the Features document.
## Quick Start

Get started quickly with Docker (recommended for most users):

```sh
# Set API keys for commercial providers (optional)
export OPENAI_API_KEY="your-key-here"
export ELEVENLABS_API_KEY="your-key-here"

# Start the server
docker volume create whisper
docker run -d --name whisper-server \
  --env OPENAI_API_KEY \
  --env ELEVENLABS_API_KEY \
  -v whisper:/data -p 8081:8081 \
  ghcr.io/mutablelogic/go-whisper:latest

# Set the server address for CLI commands
export GOWHISPER_ADDR="localhost:8081"

# Download a local model
gowhisper download ggml-medium-q5_0.bin

# Transcribe with the local model
gowhisper transcribe ggml-medium-q5_0 your-audio.wav

# Or use OpenAI (requires OPENAI_API_KEY)
gowhisper transcribe whisper-1 your-audio.wav
```

Note: Download the `gowhisper` CLI from GitHub Releases or build from source (see the Building section).
The following sections provide detailed information about deployment, CLI usage, and building from source. For HTTP API documentation, see the API Reference.
## Docker Deployment
For detailed Docker deployment instructions, including GPU support, environment configuration, and production setup, see the Docker Guide.
## CLI Usage Examples

The `gowhisper` CLI tool provides a unified interface for all providers.
| Command | Description | Example |
|---------|-------------|---------|
| `models` | List all available models | `gowhisper models` |
| `model` | Get information about a specific model | `gowhisper model ggml-medium-q5_0` |
| `download-model` | Download a model | `gowhisper download-model ggml-medium-q5_0.bin` |
| `delete-model` | Delete a local model | `gowhisper delete-model ggml-medium-q5_0` |
| `transcribe` | Transcribe audio with a local model | `gowhisper transcribe ggml-medium-q5_0 samples/jfk.wav` |
| `transcribe` | Transcribe with OpenAI (requires API key) | `gowhisper transcribe whisper-1 samples/jfk.wav` |
| `transcribe` | Transcribe with ElevenLabs diarization | `gowhisper transcribe scribe_v1 samples/meeting.wav --format srt --diarize` |
| `translate` | Translate to English with a local model | `gowhisper translate ggml-medium-q5_0 samples/de-podcast.wav` |
| `translate` | Translate with OpenAI | `gowhisper translate whisper-1 samples/de-podcast.wav` |
| `run` | Run the server | `gowhisper run --http.addr localhost:8081` |
Use `gowhisper --help` or `gowhisper <command> --help` for more options and detailed usage information.
## Development

### Project Structure

- `cmd` contains the command-line tool, which can also be run as an OpenAPI-compatible HTTP server
- `pkg` contains the whisper service and client:
  - `whisper/` - Core whisper.cpp bindings and local transcription
  - `openai/` - OpenAI Whisper API client integration
  - `elevenlabs/` - ElevenLabs API client integration
  - `httpclient/` - HTTP client utilities
  - `httphandler/` - HTTP server handlers and routing
  - `schema/` - API schema definitions and types
  - `manager.go` - Service orchestration and provider routing
- `sys` contains the whisper bindings to the whisper.cpp library
- `third_party` is a submodule for the whisper.cpp source and the ffmpeg bindings
## Building

### Docker Images

If you are building a Docker image, you just need `make` and Docker installed. Some examples:

- `GGML_CUDA=1 DOCKER_FILE=etc/Dockerfile.cuda DOCKER_REGISTRY=docker.io/user make docker` - builds a Docker container with the server binary for CUDA, tagged to a specific registry
- `GGML_VULKAN=1 make docker` - builds a Docker container with the server binary for Vulkan
- `OS=linux DOCKER_REGISTRY=docker.io/user make docker` - builds a Docker container for Linux with the server binary without CUDA, tagged to a specific registry
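For those who prefer Compose, the `docker run` command from the Quick Start could be expressed as the following `docker-compose.yml` sketch; the image tag, volume, port, and environment variables simply mirror that command, so adjust them for your own image or GPU setup as needed.

```yaml
# Hypothetical docker-compose.yml mirroring the Quick Start command.
services:
  whisper-server:
    image: ghcr.io/mutablelogic/go-whisper:latest
    environment:
      - OPENAI_API_KEY
      - ELEVENLABS_API_KEY
    volumes:
      - whisper:/data
    ports:
      - "8081:8081"
volumes:
  whisper:
```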
### From Source

It is recommended (especially on macOS) to build the whisper binary without Docker, so it can take advantage of GPU acceleration.
You can use the Makefile in the root directory once the following dependencies are met:

- A recent version of Go (i.e., 1.24+)
- A C++ compiler and `cmake`
- For CUDA, the CUDA toolkit installed, including the `nvcc` compiler
- For Vulkan, the Vulkan SDK installed
- For the Raspberry Pi, install the following additional packages first: `sudo apt install libvulkan-dev libvulkan1 mesa-vulkan-drivers glslc`
- For Metal, Xcode installed on macOS
- For audio and video codec support (e.g., x264, AAC) when extracting audio, install the appropriate codecs before building (see below)
The following Makefile targets can be used:

- `make` - creates the server binary and places it in the `build` directory. Should link to Metal on macOS
- `GGML_CUDA=1 make gowhisper` - creates the server binary linked to CUDA and places it in the `build` directory. Should work for amd64 and arm64 (Jetson) platforms
- `GGML_VULKAN=1 make gowhisper` - creates the server binary linked to Vulkan and places it in the `build` directory
See all the other targets and variations in the Makefile for more information.
## Contributing & License

This project is currently in development and subject to change. Please file feature requests and bugs in the GitHub issues.

The license is Apache 2.0, so feel free to redistribute. Redistributions in either source code or binary form must reproduce the copyright notice, and please link back to this repository for more information:

```
go-whisper
https://github.com/mutablelogic/go-whisper/
Copyright (c) David Thorpe, All rights reserved.

whisper.cpp
https://github.com/ggerganov/whisper.cpp
Copyright (c) The ggml authors

ffmpeg
https://ffmpeg.org/
Copyright (c) the FFmpeg developers
```

This software links to static libraries of whisper.cpp licensed under the MIT License. This software links to static libraries of ffmpeg licensed under the LGPL 2.1 License.