stt

package
v1.1.6
Warning

This package is not in the latest version of its module.

Published: Dec 23, 2025 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package stt provides speech-to-text services for converting audio to text.

The package defines a common Service interface that abstracts STT providers, enabling voice AI applications to transcribe speech from users.

Architecture

The package provides:

  • Service interface for STT providers
  • TranscriptionConfig for audio format configuration
  • Provider implementations (currently OpenAI Whisper)

Usage

Basic usage with OpenAI Whisper:

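ctx := context.Background()

// audioData holds the raw PCM audio bytes to transcribe.
service := stt.NewOpenAI(os.Getenv("OPENAI_API_KEY"))
text, err := service.Transcribe(ctx, audioData, stt.TranscriptionConfig{
    Format:     stt.FormatPCM,
    SampleRate: stt.DefaultSampleRate,
    Channels:   stt.DefaultChannels,
    Language:   "en", // optional language hint
})
if err != nil {
    log.Fatal(err)
}
fmt.Println("User said:", text)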

Available Providers

The package includes implementations for:

  • OpenAI Whisper (whisper-1 model)
  • Additional providers can be added by implementing the Service interface

Constants

const (
	// Default audio settings.
	DefaultSampleRate = 16000
	DefaultChannels   = 1
	DefaultBitDepth   = 16

	// Common audio formats.
	FormatPCM = "pcm"
	FormatWAV = "wav"
	FormatMP3 = "mp3"
)
const (
	// ModelWhisper1 is the OpenAI Whisper model for transcription.
	ModelWhisper1 = "whisper-1"
)

Variables

var (
	// ErrEmptyAudio is returned when audio data is empty.
	ErrEmptyAudio = errors.New("audio data is empty")

	// ErrRateLimited is returned when the provider rate limits requests.
	ErrRateLimited = errors.New("rate limited by provider")

	// ErrInvalidFormat is returned when the audio format is not supported.
	ErrInvalidFormat = errors.New("unsupported audio format")

	// ErrAudioTooShort is returned when audio is too short to transcribe.
	ErrAudioTooShort = errors.New("audio too short to transcribe")
)

Common errors for STT services.

Functions

func WrapPCMAsWAV

func WrapPCMAsWAV(pcmData []byte, sampleRate, channels, bitsPerSample int) []byte

WrapPCMAsWAV wraps raw PCM audio data in a WAV header. This is necessary for APIs such as OpenAI Whisper that expect audio as a file upload rather than as raw samples.

Parameters:

  • pcmData: Raw PCM audio bytes (little-endian, signed)
  • sampleRate: Sample rate in Hz (e.g., 16000)
  • channels: Number of channels (1=mono, 2=stereo)
  • bitsPerSample: Bits per sample (typically 16)

Returns a byte slice containing WAV-formatted audio.

Types

type OpenAIOption

type OpenAIOption func(*OpenAIService)

OpenAIOption configures the OpenAI STT service.

func WithOpenAIBaseURL

func WithOpenAIBaseURL(url string) OpenAIOption

WithOpenAIBaseURL sets a custom base URL (for testing or proxies).

func WithOpenAIClient

func WithOpenAIClient(client *http.Client) OpenAIOption

WithOpenAIClient sets a custom HTTP client.

func WithOpenAIModel

func WithOpenAIModel(model string) OpenAIOption

WithOpenAIModel sets the STT model to use.

type OpenAIService

type OpenAIService struct {
	// contains filtered or unexported fields
}

OpenAIService implements STT using OpenAI's Whisper API.

func NewOpenAI

func NewOpenAI(apiKey string, opts ...OpenAIOption) *OpenAIService

NewOpenAI creates an OpenAI STT service using Whisper.

func (*OpenAIService) Name

func (s *OpenAIService) Name() string

Name returns the provider identifier.

func (*OpenAIService) SupportedFormats

func (s *OpenAIService) SupportedFormats() []string

SupportedFormats returns audio formats supported by OpenAI Whisper.

func (*OpenAIService) Transcribe

func (s *OpenAIService) Transcribe(
	ctx context.Context, audio []byte, config TranscriptionConfig,
) (string, error)

Transcribe converts audio to text using OpenAI's Whisper API.

type Service

type Service interface {
	// Name returns the provider identifier (for logging/debugging).
	Name() string

	// Transcribe converts audio to text.
	// Returns the transcribed text or an error if transcription fails.
	Transcribe(ctx context.Context, audio []byte, config TranscriptionConfig) (string, error)

	// SupportedFormats returns supported audio input formats.
	// Common values: "pcm", "wav", "mp3", "m4a", "webm"
	SupportedFormats() []string
}

Service transcribes audio to text. The interface abstracts over STT providers (OpenAI Whisper, Google, etc.), so voice AI applications can swap one provider for another without changing calling code.

type TranscriptionConfig

type TranscriptionConfig struct {
	// Format is the audio format ("pcm", "wav", "mp3").
	// Default: "pcm"
	Format string

	// SampleRate is the audio sample rate in Hz.
	// Default: 16000
	SampleRate int

	// Channels is the number of audio channels (1=mono, 2=stereo).
	// Default: 1
	Channels int

	// BitDepth is the bits per sample for PCM audio.
	// Default: 16
	BitDepth int

	// Language is a hint for the transcription language (e.g., "en", "es").
	// Optional; can improve accuracy when provided.
	Language string

	// Model is the STT model to use (provider-specific).
	// For OpenAI: "whisper-1"
	Model string

	// Prompt is a text prompt to guide transcription (provider-specific).
	// Can improve accuracy for domain-specific vocabulary.
	Prompt string
}

TranscriptionConfig configures speech-to-text transcription.

func DefaultTranscriptionConfig

func DefaultTranscriptionConfig() TranscriptionConfig

DefaultTranscriptionConfig returns sensible defaults for transcription.
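
This page does not say whether Transcribe fills in zero-valued fields itself, so a caller that builds a config by hand may want to apply the documented defaults explicitly. The sketch below assumes that need; withDefaults is a hypothetical helper, and the constants and struct are local copies of the documented declarations.

```go
package main

import "fmt"

// Local copies of the documented constants.
const (
	DefaultSampleRate = 16000
	DefaultChannels   = 1
	DefaultBitDepth   = 16
	FormatPCM         = "pcm"
)

// Trimmed local copy of the documented struct.
type TranscriptionConfig struct {
	Format     string
	SampleRate int
	Channels   int
	BitDepth   int
	Language   string
}

// withDefaults is a hypothetical helper that replaces zero-valued fields
// with the documented defaults and leaves explicit values untouched.
func withDefaults(c TranscriptionConfig) TranscriptionConfig {
	if c.Format == "" {
		c.Format = FormatPCM
	}
	if c.SampleRate == 0 {
		c.SampleRate = DefaultSampleRate
	}
	if c.Channels == 0 {
		c.Channels = DefaultChannels
	}
	if c.BitDepth == 0 {
		c.BitDepth = DefaultBitDepth
	}
	return c
}

func main() {
	cfg := withDefaults(TranscriptionConfig{SampleRate: 8000, Language: "es"})
	fmt.Printf("%+v\n", cfg)
}
```

Taking and returning the struct by value keeps the caller's original config unmodified.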

type TranscriptionError

type TranscriptionError struct {
	// Provider is the STT provider name.
	Provider string

	// Code is the provider-specific error code.
	Code string

	// Message is a human-readable error message.
	Message string

	// Cause is the underlying error, if any.
	Cause error

	// Retryable indicates whether the request can be retried.
	Retryable bool
}

TranscriptionError represents an error during transcription.

func NewTranscriptionError

func NewTranscriptionError(provider, code, message string, cause error, retryable bool) *TranscriptionError

NewTranscriptionError creates a new TranscriptionError.

func (*TranscriptionError) Error

func (e *TranscriptionError) Error() string

Error implements the error interface.

func (*TranscriptionError) Is

func (e *TranscriptionError) Is(target error) bool

Is implements error matching for errors.Is.

func (*TranscriptionError) Unwrap

func (e *TranscriptionError) Unwrap() error

Unwrap returns the underlying error.
