Published: Jun 16, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

README

go-whisper


Speech-to-Text in golang using whisper.cpp.

Features

  • Transcription & Translation: Easily transcribe audio files and translate them to English
  • Providers: Use models from OpenAI, ElevenLabs, and HuggingFace
  • Command Line Interface: Simple CLI for transcription and managing models
  • HTTP API Server: OpenAPI-compatible server with streaming support
  • Model Management: Download, list, and delete models
  • GPU Acceleration: Support for CUDA, Vulkan, and Metal (macOS) acceleration
  • Docker Support: Pre-built images for amd64 and arm64 architectures

Project Structure

  • cmd contains the command-line tool, which can also be run as an OpenAPI-compatible HTTP server
  • pkg contains the whisper service and client
  • sys contains the whisper bindings to the whisper.cpp library
  • third_party is a submodule for the whisper.cpp source, and ffmpeg bindings

The following sections describe how to use whisper on the command-line, run it as a service, download a model, run the server, and build the project.

Using Docker

You can run whisper as a CLI command or in a Docker container. There are Docker images for arm64 and amd64 (Intel). There is support for CUDA and Vulkan, but some features are still under development.

To use an NVIDIA GPU, you'll need to install the NVIDIA Container Toolkit first.

A Docker volume called "whisper" can be used for storing the Whisper language models. You can see which models are available to download from the HuggingFace whisper.cpp repository.

The following command will run the server on port 8080 for an NVIDIA GPU:

docker run \
  --name whisper-server --rm \
  --runtime nvidia --gpus all \
  -v whisper:/data -p 8080:80 \
  ghcr.io/mutablelogic/go-whisper:latest-cuda

The API is then available at http://localhost:8080/api/v1 and it generally conforms to the OpenAI API spec.

API Examples

Here are some common usage examples:

Download a model

curl -X POST -H "Content-Type: application/json" \
  -d '{"path": "ggml-medium-q5_0.bin"}' \
  "localhost:8080/v1/models?stream=true"

List available models

curl -X GET localhost:8080/v1/models

Delete a model

curl -X DELETE localhost:8080/v1/models/ggml-medium-q5_0

Transcribe an audio file

curl -F model=ggml-medium-q5_0 \
  -F file=@samples/jfk.wav \
  "localhost:8080/v1/audio/transcriptions?stream=true"

Translate an audio file to English

curl -F model=ggml-medium-q5_0 \
  -F file=@samples/de-podcast.wav \
  -F language=en \
  "localhost:8080/v1/audio/translations?stream=true"

For more detailed API documentation, see the API Reference.

Building

Docker Images

If you are building a Docker image, you just need make and Docker installed:

  • GGML_CUDA=1 DOCKER_REGISTRY=docker.io/user make docker - builds a Docker container with the server binary for CUDA, tagged to a specific registry
  • OS=linux GGML_CUDA=0 DOCKER_REGISTRY=docker.io/user make docker - builds a Docker container for Linux, with the server binary without CUDA, tagged to a specific registry

From Source

If you want to build the server without Docker, you can use the Makefile in the root directory and have the following dependencies met:

  • Recent version of Go (i.e. 1.22 or later)
  • C++ compiler and cmake
  • For CUDA, you'll need the CUDA toolkit installed including the nvcc compiler
  • For Vulkan, you'll need the Vulkan SDK installed
    • For the Raspberry Pi, install the following additional packages first: sudo apt install libvulkan-dev libvulkan1 mesa-vulkan-drivers glslc
  • For Metal, you'll need Xcode installed on macOS

The following Makefile targets can be used:

  • make whisper - creates the server binary, and places it in the build directory. Should link to Metal on macOS
  • GGML_CUDA=1 make whisper - creates the server binary linked to CUDA, and places it in the build directory. Should work for amd64 and arm64 (Jetson) platforms
  • GGML_VULKAN=1 make whisper - creates the server binary linked to Vulkan, and places it in the build directory.

See all the other targets and variations in the Makefile for more information.

Command Line Usage

The whisper command-line tool can be built with make whisper and covers model management, transcription, translation, and running the server:

# List available models
whisper models

# Download a model
whisper download ggml-medium-q5_0.bin

# Delete a model
whisper delete ggml-medium-q5_0

# Transcribe an audio file
whisper transcribe ggml-medium-q5_0 samples/jfk.wav

# Translate an audio file to English
whisper translate ggml-medium-q5_0 samples/de-podcast.wav

# Run the whisper server
whisper server --listen localhost:8080

You can also access transcription and translation through OpenAI-compatible and ElevenLabs-compatible HTTP endpoints:

  • Set the OPENAI_API_KEY environment variable to your OpenAI API key to use the OpenAI-compatible endpoints.
  • Set the ELEVENLABS_API_KEY environment variable to your ElevenLabs API key to use the ElevenLabs-compatible endpoints.
  • Set the WHISPER_URL environment variable to the URL of the whisper server.

# List available remote models (including OpenAI and ElevenLabs models)
whisper models --remote

# Download a model
whisper download ggml-medium-q5_0.bin --remote

# Transcribe an audio file for subtitles (ElevenLabs)
whisper transcribe scribe_v1 samples/jfk.wav --format srt --diarize --remote

# Translate an audio file to English (OpenAI)
whisper translate whisper-1 samples/de-podcast.wav --remote

Development Status

This project is currently in development and subject to change. See this GitHub issue for remaining tasks to be completed.

Contributing & License

Please file feature requests and bugs in the GitHub issues. The license is Apache 2, so feel free to redistribute. Redistributions in either source code or binary form must reproduce the copyright notice; please also link back to this repository for more information:

go-whisper
https://github.com/mutablelogic/go-whisper/
Copyright (c) David Thorpe, All rights reserved.

whisper.cpp
https://github.com/ggerganov/whisper.cpp
Copyright (c) The ggml authors

ffmpeg
https://ffmpeg.org/
Copyright (c) the FFmpeg developers

This software links to static libraries of whisper.cpp, licensed under the MIT License, and to static libraries of ffmpeg, licensed under the LGPL 2.1 License.

Documentation

Index

Constants

View Source
const (

	// Sample Rate
	SampleRate = whisper.SampleRate
)

Variables

This section is empty.

Functions

This section is empty.

Types

type LogFn

type LogFn func(string)

type Opt

type Opt func(*opts) error

func OptDebug

func OptDebug() Opt

Set debugging

func OptLog

func OptLog(fn LogFn) Opt

Set logging function

func OptMaxConcurrent

func OptMaxConcurrent(v int) Opt

Set maximum number of concurrent tasks

func OptNoGPU

func OptNoGPU() Opt

Disable GPU acceleration

type Whisper

type Whisper struct {
	// contains filtered or unexported fields
}

Whisper represents a whisper service for running transcription and translation

func New

func New(path string, opt ...Opt) (*Whisper, error)

Create a new whisper service with the path to the models directory and optional parameters

func (*Whisper) Close

func (w *Whisper) Close() error

Release all resources

func (*Whisper) DeleteModelById

func (w *Whisper) DeleteModelById(id string) error

Delete a model by its id

func (*Whisper) DownloadModel

func (w *Whisper) DownloadModel(ctx context.Context, path string, fn func(curBytes, totalBytes uint64)) (*schema.Model, error)

Download a model by path, where the directory is the root of the model within the models directory. The model is returned immediately if it already exists in the store

func (*Whisper) GetModelById

func (w *Whisper) GetModelById(id string) *schema.Model

Get a model by its Id, returns nil if the model does not exist

func (*Whisper) ListModels

func (w *Whisper) ListModels() []*schema.Model

Return all models in the models directory

func (*Whisper) MarshalJSON

func (w *Whisper) MarshalJSON() ([]byte, error)

func (*Whisper) String

func (w *Whisper) String() string

func (*Whisper) WithModel

func (w *Whisper) WithModel(model *schema.Model, fn func(task *task.Context) error) error

Get a task for the specified model, which may load the model or return an existing one. The context can then be used to run the Transcribe function; when the callback returns, the context is released back to the pool.
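Taken together, the functions above suggest the following usage sketch. This is illustrative only: the import path for the task package is an assumption, the callback body is left empty (the exact Transcribe signature lives in that package), and it will not run without the module and a downloaded model.

```go
package main

import (
	"log"

	whisper "github.com/mutablelogic/go-whisper"
	"github.com/mutablelogic/go-whisper/pkg/task" // assumed import path
)

func main() {
	// Create the service over a models directory, with logging enabled
	service, err := whisper.New("/data/models", whisper.OptLog(func(s string) {
		log.Println(s)
	}))
	if err != nil {
		log.Fatal(err)
	}
	defer service.Close()

	// Look up a previously downloaded model by its id
	model := service.GetModelById("ggml-medium-q5_0")
	if model == nil {
		log.Fatal("model not found")
	}

	// Borrow a task context for the model; it is returned to the pool
	// when the callback completes
	if err := service.WithModel(model, func(t *task.Context) error {
		// Run transcription here using t; see the task package for
		// the Transcribe signature
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```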

Directories

Path Synopsis
cmd
whisper command
pkg
api
client/elevenlabs
https://elevenlabs.io/docs/overview
store
store implements a model store which allows downloading models from a remote server
wav
sys
pkg-config command
