go-whisper
Speech-to-text in Go. This is an early development version.
cmd contains an OpenAI-API compatible server
pkg contains the whisper service and http gateway
sys contains the whisper bindings to the whisper.cpp library
third_party is a submodule for the whisper.cpp source
Running
There are docker images for arm64 and amd64 (Intel). The arm64 image is built for
Jetson GPU support specifically, but it will also run on Raspberry Pis.
To use an NVIDIA GPU, you'll need to install the
NVIDIA Container Toolkit first.
A docker volume called "whisper" should be created for storing the Whisper language
models. You can see which models are available to download locally here. The following command will run the server on port 8080:
# Drop the "--runtime nvidia --gpus all" line if you are not using an NVIDIA GPU
docker run \
  --name whisper-server --rm \
  --runtime nvidia --gpus all \
  -v whisper:/models -p 8080:8080 -e WHISPER_DATA=/models \
  ghcr.io/mutablelogic/go-whisper:latest
If you include a -debug flag at the end, you'll get more verbose output. The API is then
available at http://localhost:8080/v1 and it generally conforms to the
OpenAI API spec.
Sample Usage
In order to download a model, you can use the following command (for example):
curl -X POST -H "Content-Type: application/json" -d '{"Path" : "ggml-tiny.en-q8_0.bin" }' localhost:8080/v1/models
To list the models available, you can use the following command:
curl -X GET localhost:8080/v1/models
To delete a model, you can use the following command:
curl -X DELETE localhost:8080/v1/models/ggml-tiny.en-q8_0.bin
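The three model commands above can also be issued from Go. The following is a minimal sketch, not part of this repository: the helper names (downloadRequest, listRequest, deleteRequest, send) are hypothetical, the base address assumes the docker example above, and the response body is printed raw because its schema is not described here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// downloadRequest builds the POST that asks the server to download a model.
// The JSON body mirrors the curl example: {"Path": "<model file>"}.
func downloadRequest(base, model string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"Path": model})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/models", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// listRequest builds the GET that lists the available models.
func listRequest(base string) (*http.Request, error) {
	return http.NewRequest(http.MethodGet, base+"/models", nil)
}

// deleteRequest builds the DELETE for a single model.
func deleteRequest(base, model string) (*http.Request, error) {
	return http.NewRequest(http.MethodDelete, base+"/models/"+url.PathEscape(model), nil)
}

// send performs the request and returns the status line and the raw body.
func send(req *http.Request) (string, []byte, error) {
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.Status, body, err
}

func main() {
	const base = "http://localhost:8080/v1" // assumption: server from the docker example
	const model = "ggml-tiny.en-q8_0.bin"

	download, err1 := downloadRequest(base, model)
	list, err2 := listRequest(base)
	remove, err3 := deleteRequest(base, model)
	for _, err := range []error{err1, err2, err3} {
		if err != nil {
			fmt.Println("build request:", err)
			return
		}
	}

	// Download, list, then delete, in the same order as the curl commands.
	for _, req := range []*http.Request{download, list, remove} {
		status, body, err := send(req)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("%s %s -> %s\n%s\n", req.Method, req.URL.Path, status, body)
	}
}
```

Building the request separately from sending it keeps the endpoint and payload easy to inspect without a running server.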
And to transcribe an audio file, you can use the following command:
curl -F "model=ggml-tiny.en-q8_0.bin" -F "file=@samples/jfk.wav" -F "language=en" localhost:8080/v1/audio/transcriptions
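The same upload can be sketched in Go using the standard mime/multipart package. This is an illustration under assumptions rather than a client shipped with this project: transcriptionRequest is a hypothetical helper, the base address assumes the docker example above, and the response is printed as raw text because its format is not specified in this README.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
)

// transcriptionRequest builds a multipart/form-data POST carrying the same
// three fields as the curl command: model, file and language.
func transcriptionRequest(base, model, language, filename string, audio []byte) (*http.Request, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	if err := w.WriteField("model", model); err != nil {
		return nil, err
	}
	if err := w.WriteField("language", language); err != nil {
		return nil, err
	}
	part, err := w.CreateFormFile("file", filepath.Base(filename))
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Close writes the terminating boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/audio/transcriptions", &buf)
	if err != nil {
		return nil, err
	}
	// The content type carries the boundary string chosen by the writer.
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	const path = "samples/jfk.wav"
	audio, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read audio:", err)
		return
	}
	// Assumption: the server from the docker example is listening locally.
	req, err := transcriptionRequest("http://localhost:8080/v1", "ggml-tiny.en-q8_0.bin", "en", path, audio)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read response:", err)
		return
	}
	fmt.Printf("%s\n%s\n", resp.Status, body)
}
```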
Right now, input files must be mono WAV files with a 16 kHz sample rate.
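It can save a round trip to check a file against this restriction before uploading it. The sketch below reads the channel count and sample rate from the "fmt " chunk of a RIFF/WAVE header. It is a hypothetical pre-flight check, not code from this repository, and the names wavFormat, parseWAVFormat and checkForWhisper are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

// wavFormat holds the two "fmt " chunk fields that matter for this check.
type wavFormat struct {
	Channels   int
	SampleRate int
}

// parseWAVFormat walks the RIFF chunks in data until it finds "fmt ".
func parseWAVFormat(data []byte) (wavFormat, error) {
	if len(data) < 12 || string(data[0:4]) != "RIFF" || string(data[8:12]) != "WAVE" {
		return wavFormat{}, errors.New("not a RIFF/WAVE file")
	}
	for off := 12; off+8 <= len(data); {
		id := string(data[off : off+4])
		size := int(binary.LittleEndian.Uint32(data[off+4 : off+8]))
		body := off + 8
		if id == "fmt " {
			if size < 16 || body+16 > len(data) {
				return wavFormat{}, errors.New("truncated fmt chunk")
			}
			// Layout: format tag (2 bytes), channels (2), sample rate (4), ...
			return wavFormat{
				Channels:   int(binary.LittleEndian.Uint16(data[body+2 : body+4])),
				SampleRate: int(binary.LittleEndian.Uint32(data[body+4 : body+8])),
			}, nil
		}
		// Chunks are padded to an even number of bytes.
		off = body + size + size%2
	}
	return wavFormat{}, errors.New("no fmt chunk found")
}

// checkForWhisper returns an error unless the format is mono at 16 kHz.
func checkForWhisper(f wavFormat) error {
	if f.Channels != 1 || f.SampleRate != 16000 {
		return fmt.Errorf("need mono 16000 Hz, got %d channel(s) at %d Hz", f.Channels, f.SampleRate)
	}
	return nil
}

func main() {
	data, err := os.ReadFile("samples/jfk.wav")
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	format, err := parseWAVFormat(data)
	if err != nil {
		fmt.Println("parse:", err)
		return
	}
	if err := checkForWhisper(format); err != nil {
		fmt.Println("unsupported:", err)
		return
	}
	fmt.Println("file is mono, 16 kHz")
}
```

The check only inspects the header; it does not verify the sample encoding or that the audio data itself is intact.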
There's more information on the API here.
Building
If you want to build the server yourself for your specific combination of hardware,
you can use the Makefile in the root directory. You'll need Go 1.22, make and
a C++ compiler to build this project. The following Makefile targets can be used:
make server - creates the server binary, and places it in the build directory
DOCKER_REGISTRY=docker.io/user make docker - builds a docker container with the server binary
See all the other targets in the Makefile for more information.
Status
Still in development. For example, it only accepts mono WAV files with a 16 kHz sample rate. It also
occasionally crashes, and the API is not fully implemented.
Contributing & Distribution
This module is currently in development and subject to change.
Please do file feature requests and bugs here.
The license is Apache 2 so feel free to redistribute. Redistributions in either source
code or binary form must reproduce the copyright notice, and please link back to this
repository for more information:
go-whisper
https://github.com/mutablelogic/go-whisper/
Copyright (c) 2023-2024 David Thorpe, All rights reserved.
whisper.cpp
https://github.com/ggerganov/whisper.cpp
Copyright (c) 2023-2024 The ggml authors
This software links to static libraries of whisper.cpp licensed under
the MIT License.