go-whisper


A unified speech-to-text and translation service that provides a single API for multiple providers:

  • Local Models: High-performance transcription using whisper.cpp with GPU acceleration
  • Commercial Models: Cloud-based transcription using OpenAI Whisper and ElevenLabs APIs with advanced features like speaker diarization

Features

  • Command Line Interface: Downloadable CLI for sending audio to the server for processing
  • Download and realtime transcription with JSON, SRT, VTT, and plain text output formats
  • HTTP API Server: RESTful API for the transcription and translation services
  • Docker Support: Pre-built GPU-enabled containers for easy deployment of the service
  • GPU Support: CUDA, Vulkan, and Metal (macOS) acceleration for local models
  • Model Management: Download, cache, and manage models locally

Quick Start

Get started quickly by running the server with Docker:

# Set API keys for commercial providers (optional)
export OPENAI_API_KEY="your-key-here"
export ELEVENLABS_API_KEY="your-key-here"

# Create a volume for model storage and start the CPU-only server
docker volume create whisper
docker run -d --name whisper-server \
  --env OPENAI_API_KEY --env ELEVENLABS_API_KEY \
  -v whisper:/data -p 8081:8081 ghcr.io/mutablelogic/go-whisper
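
Once the container is up, you can sanity-check it. The /v1/models path below is an assumption (an OpenAI-style route); consult the API Reference for the actual endpoints:

# Check the server logs
docker logs whisper-server

# List available models over HTTP (assumed path; verify against the API Reference)
curl http://localhost:8081/v1/models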

Download the gowhisper CLI from GitHub Releases or build from source:

# Set the server address for CLI commands
export GOWHISPER_ADDR="localhost:8081"

# Download a local model
gowhisper download ggml-medium-q5_0.bin

# Transcribe with local model to SRT
gowhisper transcribe ggml-medium-q5_0 your-audio.wav --format srt

# Or use OpenAI (requires OPENAI_API_KEY)
gowhisper transcribe whisper-1 your-audio.wav

The following sections provide detailed information about deployment, CLI usage, and building from source. For HTTP API documentation, see the API Reference.
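
As a rough illustration of the HTTP API, the request below assumes an OpenAI-compatible multipart endpoint at /v1/audio/transcriptions; the actual paths and field names are defined in the API Reference, so treat this as a sketch rather than a reference:

# Hypothetical transcription request; endpoint and field names are assumptions
curl http://localhost:8081/v1/audio/transcriptions \
  -F model=ggml-medium-q5_0 \
  -F file=@your-audio.wav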

Model Support

  • Transcription is the process of converting spoken language into written text, in any language supported by the model.
  • Translation is the process of converting spoken language into written text in English, regardless of the original language.
  • Diarization is the process of identifying and separating different speakers in an audio recording.
  • Realtime processing returns transcription or translation results while the audio stream is still being processed, rather than waiting for the entire file to finish before returning anything.

| Model(s) | Transcription | Translation to English | Diarization | Realtime |
|----------|---------------|------------------------|-------------|----------|
| GGML Whisper *-en.bin | | | | |
| GGML Whisper *.bin | | | | |
| GGML Whisper ggml-small.en-tdrz.bin ^1 | | | | |
| OpenAI whisper-1 ^2 | | | | |
| OpenAI gpt-4o-*-transcribe ^4,^5 | | | | |
| ElevenLabs scribe_v1, scribe_v2 ^3 | | | | |

Docker Deployment

For detailed Docker deployment instructions, including GPU support, environment configuration, and production setup, see the Docker Guide.

CLI Usage Examples

The gowhisper CLI tool provides a unified interface for all providers.

| Command | Description | Example |
|---------|-------------|---------|
| models | List all available models | gowhisper models |
| model | Get information about a specific model | gowhisper model ggml-medium-q5_0 |
| download-model | Download a model | gowhisper download-model ggml-medium-q5_0.bin |
| delete-model | Delete a local model | gowhisper delete-model ggml-medium-q5_0 |
| transcribe | Transcribe audio with a local model | gowhisper transcribe ggml-medium-q5_0 samples/jfk.wav |
| transcribe | Transcribe with OpenAI (requires API key) | gowhisper transcribe whisper-1 samples/jfk.wav |
| transcribe | Transcribe with ElevenLabs diarization | gowhisper transcribe scribe_v1 samples/meeting.wav --format srt --diarize |
| translate | Translate to English with a local model | gowhisper translate ggml-medium-q5_0 samples/de-podcast.wav |
| translate | Translate with OpenAI | gowhisper translate whisper-1 samples/de-podcast.wav |
| run | Run the server | gowhisper run --http.addr localhost:8081 |

Use gowhisper --help or gowhisper <command> --help for more options and detailed usage information.
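
For example, the commands above can be combined into a simple batch workflow. The recordings/ directory is illustrative, and the redirection assumes the transcript is written to stdout (check gowhisper transcribe --help):

# Download a local model once, then transcribe a directory of WAV files to SRT
gowhisper download-model ggml-medium-q5_0.bin
for f in recordings/*.wav; do
  gowhisper transcribe ggml-medium-q5_0 "$f" --format srt > "${f%.wav}.srt"
done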

Development

Project Structure
  • cmd contains the command-line tool, which can also be run as an OpenAPI-compatible HTTP server
  • pkg contains the whisper service and client:
    • whisper/ - Core whisper.cpp bindings and local transcription
    • openai/ - OpenAI Whisper API client integration
    • elevenlabs/ - ElevenLabs API client integration
    • httpclient/ - HTTP client utilities
    • httphandler/ - HTTP server handlers and routing
    • schema/ - API schema definitions and types
    • manager.go - Service orchestration and provider routing
  • sys contains the bindings to the whisper.cpp library
  • third_party is a submodule for the whisper.cpp source and ffmpeg bindings
Building
Docker Images

If you are building a Docker image, you only need make and Docker installed. Some examples, with a full build-and-run sketch after the list:

  • GGML_CUDA=1 DOCKER_FILE=etc/Dockerfile.cuda DOCKER_REGISTRY=docker.io/user make docker - builds a Docker container with the server binary for CUDA, tagged to a specific registry
  • GGML_VULKAN=1 make docker - builds a Docker container with the server binary for Vulkan
  • OS=linux DOCKER_REGISTRY=docker.io/user make docker - builds a Docker container for Linux, with the server binary without CUDA, tagged to a specific registry
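
Putting one of these together with docker run, a CUDA build might be deployed as follows. The image name and tag are illustrative; use whatever tag the make docker target prints for your registry, and note that --gpus all requires the NVIDIA container toolkit on the host:

# Build a CUDA-enabled image, then run it with GPU access
GGML_CUDA=1 DOCKER_FILE=etc/Dockerfile.cuda DOCKER_REGISTRY=docker.io/user make docker
docker run -d --name whisper-server --gpus all \
  -v whisper:/data -p 8081:8081 docker.io/user/go-whisper:latest
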
From Source

It's recommended (especially on macOS) to build the whisper binary without Docker, so that it can use GPU acceleration. You can use the Makefile in the root directory, with the following dependencies met:

  • A recent version of Go (i.e., 1.24 or later)
  • A C++ compiler and cmake
  • For CUDA, you'll need the CUDA toolkit installed, including the nvcc compiler
  • For Vulkan, you'll need the Vulkan SDK installed
    • For the Raspberry Pi, install the following additional packages first: sudo apt install libvulkan-dev libvulkan1 mesa-vulkan-drivers glslc
  • For Metal, you'll need Xcode installed on macOS
  • For audio and video codec support (e.g., x264, AAC) when extracting audio, you'll need to install the appropriate codec libraries before building (see below).
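
As an illustration for Debian or Ubuntu systems, the codec development packages might be installed as shown below; exact package names vary by distribution and ffmpeg version, so treat this as a starting point:

# Illustrative Debian/Ubuntu packages for ffmpeg-based audio extraction
sudo apt install libavcodec-dev libavformat-dev libavutil-dev libswresample-dev libswscale-dev libx264-dev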

The following Makefile targets can be used:

  • make - creates the server binary, and places it in the build directory. Should link to Metal on macOS
  • GGML_CUDA=1 make gowhisper - creates the server binary linked to CUDA, and places it in the build directory. Should work for amd64 and arm64 (Jetson) platforms
  • GGML_VULKAN=1 make gowhisper - creates the server binary linked to Vulkan, and places it in the build directory.

See all the other targets and variations in the Makefile for more information.
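
For instance, a plain Metal build and local run on macOS might look like the following; the build/gowhisper path is inferred from the Makefile description above, so adjust it if your build output differs:

# Build the server binary (links to Metal on macOS by default)
make

# Run the locally built server
./build/gowhisper run --http.addr localhost:8081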

Contributing & License

This project is currently in development and subject to change. Please file feature requests and bugs in GitHub issues. The license is Apache 2, so feel free to redistribute. Redistributions in either source code or binary form must reproduce the copyright notice; please also link back to this repository for more information:

go-whisper
https://github.com/mutablelogic/go-whisper/
Copyright (c) David Thorpe, All rights reserved.

whisper.cpp
https://github.com/ggerganov/whisper.cpp
Copyright (c) The ggml authors

go-media
https://github.com/mutablelogic/go-media/
Copyright (c) 2021-2026 David Thorpe, All rights reserved.

ffmpeg
https://ffmpeg.org/
Copyright (c) the FFmpeg developers

This software links to static libraries of whisper.cpp licensed under the MIT License. This software links to static libraries of ffmpeg licensed under the LGPL 2.1 License.
