Published: Jun 16, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

README

go-whisper


Speech-to-Text in golang using whisper.cpp.

Features

  • Transcription & Translation: Easily transcribe audio files and translate them to English
  • Providers: Use models from OpenAI, ElevenLabs, and HuggingFace
  • Command Line Interface: Simple CLI for transcription and managing models
  • HTTP API Server: OpenAPI-compatible server with streaming support
  • Model Management: Download, list, and delete models
  • GPU Acceleration: Support for CUDA, Vulkan, and Metal (macOS) acceleration
  • Docker Support: Pre-built images for amd64 and arm64 architectures

Project Structure

  • cmd contains the command-line tool, which can also be run as an OpenAPI-compatible HTTP server
  • pkg contains the whisper service and client
  • sys contains the whisper bindings to the whisper.cpp library
  • third_party is a submodule for the whisper.cpp source, and ffmpeg bindings

The following sections describe how to use whisper on the command-line, run it as a service, download a model, run the server, and build the project.

Using Docker

You can run whisper as a CLI command or in a Docker container. There are Docker images for arm64 and amd64 (Intel). There is support for CUDA and Vulkan, but some features are still under development.

To use an NVIDIA GPU, you'll need to install the NVIDIA Container Toolkit first.

A Docker volume called "whisper" can be used for storing the Whisper language models. You can see which models are available to download from the HuggingFace whisper.cpp repository.

The following command will run the server on port 8080 for an NVIDIA GPU:

docker run \
  --name whisper-server --rm \
  --runtime nvidia --gpus all \
  -v whisper:/data -p 8080:80 \
  ghcr.io/mutablelogic/go-whisper:latest-cuda

The API is then available at http://localhost:8080/api/v1 and it generally conforms to the OpenAI API spec.

API Examples

Here are some common usage examples:

Download a model

curl -X POST -H "Content-Type: application/json" \
  -d '{"path": "ggml-medium-q5_0.bin"}' \
  "localhost:8080/v1/models?stream=true"

List available models

curl -X GET localhost:8080/v1/models

Delete a model

curl -X DELETE localhost:8080/v1/models/ggml-medium-q5_0

Transcribe an audio file

curl -F model=ggml-medium-q5_0 \
  -F file=@samples/jfk.wav \
  "localhost:8080/v1/audio/transcriptions?stream=true"

Translate an audio file to English

curl -F model=ggml-medium-q5_0 \
  -F file=@samples/de-podcast.wav \
  -F language=en \
  "localhost:8080/v1/audio/translations?stream=true"

For more detailed API documentation, see the API Reference.

Building

Docker Images

If you are building a Docker image, you just need make and Docker installed:

  • GGML_CUDA=1 DOCKER_REGISTRY=docker.io/user make docker - builds a Docker container with the server binary for CUDA, tagged to a specific registry
  • OS=linux GGML_CUDA=0 DOCKER_REGISTRY=docker.io/user make docker - builds a Docker container for Linux, with the server binary without CUDA, tagged to a specific registry

From Source

If you want to build the server without Docker, you can use the Makefile in the root directory and have the following dependencies met:

  • Recent version of Go (i.e. 1.22 or later)
  • C++ compiler and cmake
  • For CUDA, you'll need the CUDA toolkit installed including the nvcc compiler
  • For Vulkan, you'll need the Vulkan SDK installed
    • For the Raspberry Pi, install the following additional packages first: sudo apt install libvulkan-dev libvulkan1 mesa-vulkan-drivers glslc
  • For Metal, you'll need Xcode installed on macOS

The following Makefile targets can be used:

  • make whisper - creates the server binary, and places it in the build directory. Should link to Metal on macOS
  • GGML_CUDA=1 make whisper - creates the server binary linked to CUDA, and places it in the build directory. Should work for amd64 and arm64 (Jetson) platforms
  • GGML_VULKAN=1 make whisper - creates the server binary linked to Vulkan, and places it in the build directory.

See all the other targets and variations in the Makefile for more information.

Command Line Usage

The whisper command-line tool can be built with make whisper and covers model management, transcription, translation, and running the server:

# List available models
whisper models

# Download a model
whisper download ggml-medium-q5_0.bin

# Delete a model
whisper delete ggml-medium-q5_0

# Transcribe an audio file
whisper transcribe ggml-medium-q5_0 samples/jfk.wav

# Translate an audio file to English
whisper translate ggml-medium-q5_0 samples/de-podcast.wav

# Run the whisper server
whisper server --listen localhost:8080

You can also access transcription and translation through OpenAI-compatible and ElevenLabs-compatible HTTP endpoints:

  • Set the OPENAI_API_KEY environment variable to your OpenAI API key to use the OpenAI-compatible endpoints.
  • Set the ELEVENLABS_API_KEY environment variable to your ElevenLabs API key to use the ElevenLabs-compatible endpoints.
  • Set the WHISPER_URL environment variable to the URL of the whisper server.

# List available remote models (including OpenAI and ElevenLabs models)
whisper models --remote

# Download a model
whisper download ggml-medium-q5_0.bin --remote

# Transcribe an audio file for subtitles (ElevenLabs)
whisper transcribe scribe_v1 samples/jfk.wav --format srt --diarize --remote

# Translate an audio file to English (OpenAI)
whisper translate whisper-1 samples/de-podcast.wav --remote

Development Status

This project is currently in development and subject to change. See this GitHub issue for remaining tasks to be completed.

Contributing & License

Please file feature requests and bugs in the GitHub issues. The license is Apache 2, so feel free to redistribute. Redistributions in either source code or binary form must reproduce the copyright notice; please also link back to this repository for more information:

go-whisper
https://github.com/mutablelogic/go-whisper/
Copyright (c) David Thorpe, All rights reserved.

whisper.cpp
https://github.com/ggerganov/whisper.cpp
Copyright (c) The ggml authors

ffmpeg
https://ffmpeg.org/
Copyright (c) the FFmpeg developers

This software links to static libraries of whisper.cpp, licensed under the MIT License, and to static libraries of ffmpeg, licensed under the LGPL 2.1 License.

Documentation

Index

Constants

View Source
const (

	// Sample Rate
	SampleRate = whisper.SampleRate
)

Variables

This section is empty.

Functions

This section is empty.

Types

type LogFn

type LogFn func(string)

type Opt

type Opt func(*opts) error

func OptDebug

func OptDebug() Opt

Set debugging

func OptLog

func OptLog(fn LogFn) Opt

Set logging function

func OptMaxConcurrent

func OptMaxConcurrent(v int) Opt

Set maximum number of concurrent tasks

func OptNoGPU

func OptNoGPU() Opt

Disable GPU acceleration

type Whisper

type Whisper struct {
	// contains filtered or unexported fields
}

Whisper represents a whisper service for running transcription and translation

func New

func New(path string, opt ...Opt) (*Whisper, error)

Create a new whisper service with the path to the models directory and optional parameters

func (*Whisper) Close

func (w *Whisper) Close() error

Release all resources

func (*Whisper) DeleteModelById

func (w *Whisper) DeleteModelById(id string) error

Delete a model by its id

func (*Whisper) DownloadModel

func (w *Whisper) DownloadModel(ctx context.Context, path string, fn func(curBytes, totalBytes uint64)) (*schema.Model, error)

Download a model by path, where the directory is the root of the model within the models directory. The model is returned immediately if it already exists in the store

func (*Whisper) GetModelById

func (w *Whisper) GetModelById(id string) *schema.Model

Get a model by its Id, returns nil if the model does not exist

func (*Whisper) ListModels

func (w *Whisper) ListModels() []*schema.Model

Return all models in the models directory

func (*Whisper) MarshalJSON

func (w *Whisper) MarshalJSON() ([]byte, error)

func (*Whisper) String

func (w *Whisper) String() string

func (*Whisper) WithModel

func (w *Whisper) WithModel(model *schema.Model, fn func(task *task.Context) error) error

Get a task for the specified model, which may load the model or return an existing one. The context can then be used to run the Transcribe function; when the callback returns, the context is released back to the pool.
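Taken together, the functions above suggest the following usage sketch. This is illustrative only: the import path for the task package is an assumption, the callback body is left empty (the exact Transcribe signature lives in that package), and it will not run without the module and a downloaded model.

```go
package main

import (
	"log"

	whisper "github.com/mutablelogic/go-whisper"
	"github.com/mutablelogic/go-whisper/pkg/task" // assumed import path
)

func main() {
	// Create the service over a models directory, with logging enabled
	service, err := whisper.New("/data/models", whisper.OptLog(func(s string) {
		log.Println(s)
	}))
	if err != nil {
		log.Fatal(err)
	}
	defer service.Close()

	// Look up a previously downloaded model by its id
	model := service.GetModelById("ggml-medium-q5_0")
	if model == nil {
		log.Fatal("model not found")
	}

	// Borrow a task context for the model; it is returned to the pool
	// when the callback completes
	if err := service.WithModel(model, func(t *task.Context) error {
		// Run transcription here using t; see the task package for
		// the Transcribe signature
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```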

Directories

Path Synopsis
cmd
whisper command
pkg
api
client/elevenlabs
https://elevenlabs.io/docs/overview
store
store implements a model store which allows downloading models from a remote server
wav
sys
pkg-config command
