whisperAPI

command module

v0.0.0-...-a36e19c | Published: Dec 22, 2025 | License: BSD-3-Clause | Imports: 26 | Imported by: 0

Repository

github.com/VA7DBI/whisperAPI

README

Whisper API Service

A self-hosted speech-to-text transcription API service built on Whisper AI. It accepts multiple audio formats, including WAV, MP3, FLAC, OGG/Vorbis, and OGG/Opus.

Features

- Speech-to-text transcription using Whisper AI
- Support for multiple audio formats:
  - WAV (16-bit PCM)
  - MP3 (MPEG Layer-3)
  - FLAC (Free Lossless Audio Codec)
  - AAC (Advanced Audio Coding) - metadata parsing only
  - OGG/Vorbis
  - OGG/Opus
- Automatic format detection and conversion:
  - Sample rate conversion to 16 kHz
  - Mono channel conversion
  - Bit depth normalization
- Rich metadata for each transcription:
  - Word-level timing
  - Confidence scores
  - Audio format details
  - Performance metrics
- Prometheus monitoring with detailed metrics
- Swagger API documentation
- Authentication:
  - Bearer token authentication
  - Multi-layer token validation:
    - Redis cache (fast)
    - PostgreSQL database (persistent)
    - Static tokens (fallback)
  - Configurable token expiration
  - Optional authentication mode
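
To make the conversion steps in the feature list concrete, here is a rough sketch of mono downmixing, bit depth normalization, and resampling. It is not taken from the project's source; the function names `downmixToMono` and `resampleLinear` are hypothetical, and linear interpolation is used only because it is short (a production resampler would normally low-pass filter before downsampling).

```go
package main

// downmixToMono converts interleaved 16-bit PCM into mono float32 samples
// in the range [-1, 1] by averaging the channels of each frame.
func downmixToMono(pcm []int16, channels int) []float32 {
	if channels <= 0 {
		return nil
	}
	frames := len(pcm) / channels
	out := make([]float32, frames)
	for i := 0; i < frames; i++ {
		var sum float32
		for c := 0; c < channels; c++ {
			// Dividing by 32768 normalizes the 16-bit range to [-1, 1).
			sum += float32(pcm[i*channels+c]) / 32768.0
		}
		out[i] = sum / float32(channels)
	}
	return out
}

// resampleLinear resamples mono samples from srcRate to dstRate using
// linear interpolation between neighbouring input samples.
func resampleLinear(in []float32, srcRate, dstRate int) []float32 {
	if len(in) == 0 || srcRate <= 0 || dstRate <= 0 {
		return nil
	}
	if srcRate == dstRate {
		return append([]float32(nil), in...)
	}
	n := int(int64(len(in)) * int64(dstRate) / int64(srcRate))
	out := make([]float32, n)
	step := float64(srcRate) / float64(dstRate)
	for i := range out {
		pos := float64(i) * step
		j := int(pos)
		frac := float32(pos - float64(j))
		if j+1 < len(in) {
			out[i] = in[j]*(1-frac) + in[j+1]*frac
		} else {
			// Past the last pair of samples: hold the final value.
			out[i] = in[len(in)-1]
		}
	}
	return out
}
```

For example, `resampleLinear(downmixToMono(pcm, 2), 44100, 16000)` would turn 44.1 kHz stereo PCM into 16 kHz mono floats, the input shape Whisper models expect.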

Prerequisites

Core requirements:

Go 1.20 or later
Whisper model file (ggml-base.bin), downloaded from https://huggingface.co/ggerganov/whisper.cpp

For test fixtures generation:

Python 3.x
FFmpeg
gTTS (pip install gtts)
pydub (pip install pydub)

Additional requirements for authentication:

Redis (optional, for token caching)
PostgreSQL (optional, for token storage)

Quick Start

Clone the repository:

git clone https://github.com/VA7DBI/whisperAPI.git
cd whisperAPI

Download the Whisper model:

mkdir models
curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin -o models/ggml-base.bin

Install dependencies:
```
go mod download
```

Generate test fixtures (optional):

cd test_fixtures
python makewave.py "This is a test audio file"
cd ..

Run API tests (optional):

go test -v ./...

Note: if whisper.h is not in your default system include path, you may need to set the CGO_CFLAGS and CGO_LDFLAGS environment variables before running go test or go build, for example:

export CGO_CFLAGS="-I/usr/local/include"
export CGO_LDFLAGS="-L/usr/local/lib"

Build and run:
```
go build
./whisperAPI
```

Authentication Setup

Create the PostgreSQL token table:

CREATE TABLE api_tokens (
    token VARCHAR(255) PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    valid_until TIMESTAMP NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    scopes TEXT[]
);

-- Example token insertion
INSERT INTO api_tokens (token, user_id, valid_until, scopes) 
VALUES (
    'your-api-token-here',
    'user123',
    NOW() + INTERVAL '30 days',
    ARRAY['transcribe', 'read']
);
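
On the Go side, a row of this table could be represented roughly as follows. This is only a sketch: the type `APIToken`, its methods, and the helper `newToken` are hypothetical names, not the project's actual code. `ValidAt` mirrors an expiry check on the `valid_until` column.

```go
package main

import "time"

// APIToken mirrors one row of the api_tokens table (hypothetical type).
type APIToken struct {
	Token      string
	UserID     string
	ValidUntil time.Time
	CreatedAt  time.Time
	Scopes     []string
}

// newToken builds a token that expires ttl after the current time.
func newToken(token, userID string, ttl time.Duration, scopes ...string) APIToken {
	now := time.Now()
	return APIToken{
		Token:      token,
		UserID:     userID,
		ValidUntil: now.Add(ttl),
		CreatedAt:  now,
		Scopes:     scopes,
	}
}

// ValidAt reports whether the token is still unexpired at the given time,
// the same condition as `valid_until > NOW()` in SQL.
func (t APIToken) ValidAt(now time.Time) bool {
	return t.ValidUntil.After(now)
}

// HasScope reports whether the token carries the given scope.
func (t APIToken) HasScope(scope string) bool {
	for _, s := range t.Scopes {
		if s == scope {
			return true
		}
	}
	return false
}
```

With the example row above, `tok.HasScope("transcribe")` would be true and `tok.ValidAt(time.Now())` would stay true for 30 days.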

Configure authentication in config.yaml:

auth:
  enabled: true  # Enable/disable auth
  tokens:        # Static fallback tokens
    - "your-static-token"
  redis:
    enabled: true
    host: "localhost"
    port: 6379
    db: 0
    password: ""
    key_ttl: 3600  # Cache TTL in seconds
  postgres:
    enabled: true
    host: "localhost"
    port: 5432
    user: "postgres"
    password: "secret"
    dbname: "whisperapi"
    table: "api_tokens"
    query: "SELECT EXISTS(SELECT 1 FROM api_tokens WHERE token = $1 AND valid_until > NOW())"
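
The `auth` block maps naturally onto nested Go structs. The sketch below is an assumption about how such a mapping might look, not the project's actual `config` package: the type names, the `yaml` struct tags, and the helper methods are all hypothetical, and the YAML decoding step is omitted because it needs a third-party library.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// RedisConfig mirrors the auth.redis section (hypothetical type).
type RedisConfig struct {
	Enabled  bool   `yaml:"enabled"`
	Host     string `yaml:"host"`
	Port     int    `yaml:"port"`
	DB       int    `yaml:"db"`
	Password string `yaml:"password"`
	KeyTTL   int    `yaml:"key_ttl"` // cache TTL in seconds
}

// PostgresConfig mirrors the auth.postgres section (hypothetical type).
type PostgresConfig struct {
	Enabled  bool   `yaml:"enabled"`
	Host     string `yaml:"host"`
	Port     int    `yaml:"port"`
	User     string `yaml:"user"`
	Password string `yaml:"password"`
	DBName   string `yaml:"dbname"`
	Table    string `yaml:"table"`
	Query    string `yaml:"query"`
}

// AuthConfig mirrors the top-level auth section (hypothetical type).
type AuthConfig struct {
	Enabled  bool           `yaml:"enabled"`
	Tokens   []string       `yaml:"tokens"`
	Redis    RedisConfig    `yaml:"redis"`
	Postgres PostgresConfig `yaml:"postgres"`
}

// Addr returns the Redis address in host:port form.
func (r RedisConfig) Addr() string {
	return net.JoinHostPort(r.Host, strconv.Itoa(r.Port))
}

// TTL converts key_ttl from seconds into a time.Duration.
func (r RedisConfig) TTL() time.Duration {
	return time.Duration(r.KeyTTL) * time.Second
}

// DSN builds a libpq-style keyword/value connection string.
// Values containing spaces or quotes would need escaping, which is omitted here.
func (p PostgresConfig) DSN() string {
	return fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=%s",
		p.Host, p.Port, p.User, p.Password, p.DBName)
}
```

With the values shown above, `Addr()` yields `localhost:6379` and `TTL()` yields one hour.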

API Documentation

GET /swagger/

Swagger UI for API documentation.

GET /health

Health check endpoint.

Response:

{
  "status": "ok"
}
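
A client or deployment script might poll this endpoint before sending traffic. The following is a minimal sketch using only the standard library; `checkHealth` and `parseHealth` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// healthResponse matches the JSON body returned by GET /health.
type healthResponse struct {
	Status string `json:"status"`
}

// parseHealth reports whether body is a health response with status "ok".
func parseHealth(body []byte) (bool, error) {
	var h healthResponse
	if err := json.Unmarshal(body, &h); err != nil {
		return false, fmt.Errorf("decode health response: %w", err)
	}
	return h.Status == "ok", nil
}

// checkHealth calls GET {baseURL}/health and reports whether the service is healthy.
func checkHealth(baseURL string) (bool, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/health")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	// Cap the read so a misbehaving server cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return false, err
	}
	return parseHealth(body)
}
```

Calling `checkHealth("http://localhost:8080")` assumes the service listens on port 8080, as in the curl example elsewhere in this README.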

GET /metrics

Prometheus metrics endpoint providing:

Request counts by format and status
Processing durations (histogram)
Audio durations (histogram)
Memory usage (gauge)
CPU/GPU time (histogram)

POST /transcribe

Upload an audio file for transcription.

Request:

Method: POST
Content-Type: multipart/form-data
Form field: "audio" (file)
Supported formats: WAV, OGG/Vorbis, OGG/Opus

Response:

{
  "text": "Transcribed text content",
  "audio_info": {
    "format": "WAV",
    "codec": "PCM",
    "sample_rate": 16000,
    "channels": 1,
    "bit_depth": 16,
    "duration_seconds": 10.5,
    "original_size_bytes": 336000,
    "bitrate_kbps": 256
  },
  "compute_time": {
    "cpu_time_seconds": 1.23,
    "gpu_time_seconds": null
  },
  "timestamp": "2024-02-14T12:34:56Z",
  ...
}

See the Swagger documentation (GET /swagger/) for more details.
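
As an illustration, a Go client for this endpoint might look like the sketch below. It covers only the response fields shown above (the elided fields are not modelled), and the names `Transcription`, `decodeTranscription`, and `transcribe` are hypothetical. The optional bearer token corresponds to the Authentication section.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"time"
)

// AudioInfo, ComputeTime and Transcription model the documented response fields.
type AudioInfo struct {
	Format            string  `json:"format"`
	Codec             string  `json:"codec"`
	SampleRate        int     `json:"sample_rate"`
	Channels          int     `json:"channels"`
	BitDepth          int     `json:"bit_depth"`
	DurationSeconds   float64 `json:"duration_seconds"`
	OriginalSizeBytes int64   `json:"original_size_bytes"`
	BitrateKbps       float64 `json:"bitrate_kbps"`
}

type ComputeTime struct {
	CPUTimeSeconds float64  `json:"cpu_time_seconds"`
	GPUTimeSeconds *float64 `json:"gpu_time_seconds"` // nil when the JSON value is null
}

type Transcription struct {
	Text        string      `json:"text"`
	AudioInfo   AudioInfo   `json:"audio_info"`
	ComputeTime ComputeTime `json:"compute_time"`
	Timestamp   string      `json:"timestamp"`
}

// decodeTranscription parses a /transcribe response body.
func decodeTranscription(data []byte) (*Transcription, error) {
	var t Transcription
	if err := json.Unmarshal(data, &t); err != nil {
		return nil, fmt.Errorf("decode transcription: %w", err)
	}
	return &t, nil
}

// transcribe uploads audio in the "audio" form field and returns the parsed result.
// An empty token skips the Authorization header.
func transcribe(baseURL, token, filename string, audio io.Reader) (*Transcription, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	part, err := w.CreateFormFile("audio", filename)
	if err != nil {
		return nil, err
	}
	if _, err := io.Copy(part, audio); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/transcribe", &buf)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}

	client := &http.Client{Timeout: 5 * time.Minute}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("transcribe failed: HTTP %d: %s", resp.StatusCode, body)
	}
	return decodeTranscription(body)
}
```

The generous client timeout is a guess; transcription time depends on audio length and hardware.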

Authentication

All protected endpoints require a Bearer token:

curl -X POST http://localhost:8080/transcribe \
  -H "Authorization: Bearer your-token-here" \
  -F "audio=@sample.wav"

Token validation flow:

Check Redis cache for fast validation
If not in cache, check PostgreSQL database
If found in database, cache in Redis
If not found, check static tokens
If no match found, return 401 Unauthorized
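
The flow above can be sketched in Go as a chain of lookups. This is an illustration under stated assumptions, not the project's `auth` package: the interfaces `TokenStore` and `TokenCache`, the `Validator` type, and the in-memory `memStore` stand-in for Redis and PostgreSQL are all hypothetical. The sketch also assumes that a backend error is treated as a miss, so validation falls through to the next layer.

```go
package main

import "errors"

// ErrUnauthorized is returned when no layer recognizes the token;
// an HTTP handler would translate it into a 401 response.
var ErrUnauthorized = errors.New("invalid token")

// TokenStore is a hypothetical read-only lookup (for example PostgreSQL).
type TokenStore interface {
	Contains(token string) (bool, error)
}

// TokenCache is a hypothetical store that can also be populated (for example Redis).
type TokenCache interface {
	TokenStore
	Add(token string) error
}

// Validator checks tokens against cache, database, then static tokens.
type Validator struct {
	Cache  TokenCache          // optional, may be nil
	DB     TokenStore          // optional, may be nil
	Static map[string]struct{} // static fallback tokens
}

// Validate returns nil for a known token and ErrUnauthorized otherwise.
func (v *Validator) Validate(token string) error {
	// 1. Fast path: cache hit.
	if v.Cache != nil {
		if ok, err := v.Cache.Contains(token); err == nil && ok {
			return nil
		}
	}
	// 2. Database lookup; on success, populate the cache (best effort).
	if v.DB != nil {
		if ok, err := v.DB.Contains(token); err == nil && ok {
			if v.Cache != nil {
				_ = v.Cache.Add(token)
			}
			return nil
		}
	}
	// 3. Static fallback tokens.
	if _, ok := v.Static[token]; ok {
		return nil
	}
	// 4. No layer matched.
	return ErrUnauthorized
}

// memStore is an in-memory stand-in used to exercise the flow without real backends.
type memStore map[string]bool

func (m memStore) Contains(token string) (bool, error) { return m[token], nil }
func (m memStore) Add(token string) error              { m[token] = true; return nil }
```

Treating backend errors as misses keeps the static tokens usable during a Redis or PostgreSQL outage, at the cost of hiding those outages from callers; logging the errors would be a sensible addition.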

Testing

Run the test suite:

go test -v ./...

Generate test coverage:

go test -v -cover ./...

Error Handling

The API returns detailed error responses:

{
  "error": "Detailed error message"
}

Common error scenarios:

Invalid audio format
Unsupported codec
File read/write errors
Processing timeouts
Memory limits exceeded

Authentication errors:

{
  "error": "Authorization header required"
}

{
  "error": "Invalid token"
}
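
A client can turn these JSON bodies into Go errors. The sketch below assumes every error response has the `{"error": "..."}` shape shown above and falls back to the raw body otherwise; `APIError` and `parseAPIError` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIError carries the HTTP status and the message from an error response.
type APIError struct {
	StatusCode int
	Message    string
}

// Error implements the error interface.
func (e *APIError) Error() string {
	return fmt.Sprintf("whisperAPI: HTTP %d: %s", e.StatusCode, e.Message)
}

// parseAPIError extracts the "error" field from a response body.
// If the body is not the expected JSON, the raw body is used as the message.
func parseAPIError(status int, body []byte) *APIError {
	var payload struct {
		Error string `json:"error"`
	}
	if err := json.Unmarshal(body, &payload); err != nil || payload.Error == "" {
		return &APIError{StatusCode: status, Message: string(body)}
	}
	return &APIError{StatusCode: status, Message: payload.Error}
}
```

A caller could then branch on `StatusCode`, for example refreshing credentials on 401 while surfacing other failures to the user.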

Monitoring

Prometheus metrics available at /metrics:

whisperapi_transcription_requests_total{status="success|error",format="wav|ogg|opus"}
whisperapi_transcription_duration_seconds
whisperapi_audio_duration_seconds
whisperapi_memory_usage_bytes{type="allocated|system|heap"}
whisperapi_cpu_time_seconds{operation="user|system|total"}
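
For a quick look at these metrics without a Prometheus server, a small Go program can fetch the endpoint and keep only the service's own samples. This is a sketch, not a full parser for the Prometheus text format; `filterMetrics` and `fetchMetrics` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// filterMetrics returns the sample lines whose metric name starts with prefix,
// skipping blank lines and "# HELP" / "# TYPE" comment lines.
func filterMetrics(exposition, prefix string) []string {
	var out []string
	for _, line := range strings.Split(exposition, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, prefix) {
			out = append(out, line)
		}
	}
	return out
}

// fetchMetrics downloads {baseURL}/metrics and returns the whisperapi_ samples.
func fetchMetrics(baseURL string) ([]string, error) {
	resp, err := http.Get(baseURL + "/metrics")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return filterMetrics(string(body), "whisperapi_"), nil
}
```

Filtering by the `whisperapi_` prefix drops the Go runtime and process metrics that Prometheus client libraries commonly expose alongside application metrics.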

Contributing

Fork the repository
Create your feature branch
Run tests: go test -v ./...
Commit changes
Push to your branch
Create Pull Request

Running as a Service

FreeBSD RC Service

Create a FreeBSD service file at /usr/local/etc/rc.d/whisperapi:

#!/bin/sh
#
# PROVIDE: whisperapi
# REQUIRE: NETWORKING
# KEYWORD: shutdown

. /etc/rc.subr

name="whisperapi"
rcvar="whisperapi_enable"
whisperapi_user="www"
whisperapi_group="www"
pidfile="/var/run/${name}.pid"
command="/usr/local/bin/whisperAPI"
command_args="&"

load_rc_config $name
run_rc_command "$1"

Make it executable and enable the service:

chmod +x /usr/local/etc/rc.d/whisperapi
echo 'whisperapi_enable="YES"' >> /etc/rc.conf
service whisperapi start

Systemd Service (Linux)

Create a systemd service file at /etc/systemd/system/whisperapi.service:

[Unit]
Description=Whisper API Service
After=network.target
StartLimitIntervalSec=0

[Service]
Type=simple
User=www-data
Group=www-data
Restart=always
RestartSec=1
WorkingDirectory=/opt/whisperapi
ExecStart=/opt/whisperapi/whisperAPI

# Security settings
PrivateTmp=true
NoNewPrivileges=true
ProtectSystem=full
ProtectHome=true
CapabilityBoundingSet=
AmbientCapabilities=

# Resource limits
LimitNOFILE=65535
MemoryMax=2G
CPUQuota=80%

[Install]
WantedBy=multi-user.target

Install and start the service:

# Copy application to /opt/whisperapi
sudo mkdir -p /opt/whisperapi
sudo cp whisperAPI /opt/whisperapi/
sudo cp -r models /opt/whisperapi/

# Set permissions
sudo chown -R www-data:www-data /opt/whisperapi

# Enable and start service
sudo systemctl daemon-reload
sudo systemctl enable whisperapi
sudo systemctl start whisperapi

# View logs
sudo journalctl -u whisperapi -f

Monitor service status:

sudo systemctl status whisperapi

Common systemctl commands:

sudo systemctl stop whisperapi
sudo systemctl restart whisperapi
sudo systemctl disable whisperapi

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Directories

Path	Synopsis
audio
auth
config
docs	Package docs Code generated by swaggo/swag.
metrics
middleware
