Documentation ¶
Overview ¶
Package stt provides speech-to-text services for converting audio to text.
The package defines a common Service interface that abstracts STT providers, enabling voice AI applications to transcribe speech from users.
Architecture ¶
The package provides:
- Service interface for STT providers
- TranscriptionConfig for audio format configuration
- Multiple provider implementations (OpenAI Whisper, etc.)
Usage ¶
Basic usage with OpenAI Whisper:
service := stt.NewOpenAI(os.Getenv("OPENAI_API_KEY"))
text, err := service.Transcribe(ctx, audioData, stt.TranscriptionConfig{
	Format:     "pcm",
	SampleRate: 16000,
	Channels:   1,
	Language:   "en",
})
if err != nil {
	log.Fatal(err)
}
fmt.Println("User said:", text)
Available Providers ¶
The package includes implementations for:
- OpenAI Whisper (whisper-1 model)
- Additional providers can be added by implementing the Service interface
Index ¶
Constants ¶
const (
	// Default audio settings.
	DefaultSampleRate = 16000
	DefaultChannels   = 1
	DefaultBitDepth   = 16

	// Common audio formats.
	FormatPCM = "pcm"
	FormatWAV = "wav"
	FormatMP3 = "mp3"
)
const (
// ModelWhisper1 is the OpenAI Whisper model for transcription.
ModelWhisper1 = "whisper-1"
)
Variables ¶
var (
	// ErrEmptyAudio is returned when audio data is empty.
	ErrEmptyAudio = errors.New("audio data is empty")

	// ErrRateLimited is returned when the provider rate limits requests.
	ErrRateLimited = errors.New("rate limited by provider")

	// ErrInvalidFormat is returned when the audio format is not supported.
	ErrInvalidFormat = errors.New("unsupported audio format")

	// ErrAudioTooShort is returned when audio is too short to transcribe.
	ErrAudioTooShort = errors.New("audio too short to transcribe")
)
Common errors for STT services.
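Callers can branch on these sentinels with errors.Is. A minimal sketch (the handling shown for each case is illustrative, not prescribed by the package):

text, err := service.Transcribe(ctx, audioData, stt.DefaultTranscriptionConfig())
switch {
case errors.Is(err, stt.ErrAudioTooShort):
	// Not enough audio yet; keep buffering and try again later.
case errors.Is(err, stt.ErrRateLimited):
	// Provider throttled the request; retry after a back-off.
case err != nil:
	log.Fatal(err)
default:
	fmt.Println("User said:", text)
}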
Functions ¶
func WrapPCMAsWAV ¶
func WrapPCMAsWAV(pcmData []byte, sampleRate, channels, bitsPerSample int) []byte
WrapPCMAsWAV wraps raw PCM audio data in a WAV header. This is necessary for APIs like OpenAI Whisper that expect file uploads.
Parameters:
- pcmData: Raw PCM audio bytes (little-endian, signed)
- sampleRate: Sample rate in Hz (e.g., 16000)
- channels: Number of channels (1=mono, 2=stereo)
- bitsPerSample: Bits per sample (typically 16)
Returns a byte slice containing WAV-formatted audio.
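For example, audio captured as raw PCM can be wrapped before transcription. A minimal sketch, assuming the declaration implied by the documented parameters and using the package defaults (16 kHz, mono, 16-bit); pcmData and service come from surrounding application code:

wavData := stt.WrapPCMAsWAV(pcmData, stt.DefaultSampleRate, stt.DefaultChannels, stt.DefaultBitDepth)
text, err := service.Transcribe(ctx, wavData, stt.TranscriptionConfig{
	Format:     stt.FormatWAV,
	SampleRate: stt.DefaultSampleRate,
	Channels:   stt.DefaultChannels,
})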
Types ¶
type OpenAIOption ¶
type OpenAIOption func(*OpenAIService)
OpenAIOption configures the OpenAI STT service.
func WithOpenAIBaseURL ¶
func WithOpenAIBaseURL(url string) OpenAIOption
WithOpenAIBaseURL sets a custom base URL (for testing or proxies).
func WithOpenAIClient ¶
func WithOpenAIClient(client *http.Client) OpenAIOption
WithOpenAIClient sets a custom HTTP client.
func WithOpenAIModel ¶
func WithOpenAIModel(model string) OpenAIOption
WithOpenAIModel sets the STT model to use.
type OpenAIService ¶
type OpenAIService struct {
// contains filtered or unexported fields
}
OpenAIService implements STT using OpenAI's Whisper API.
func NewOpenAI ¶
func NewOpenAI(apiKey string, opts ...OpenAIOption) *OpenAIService
NewOpenAI creates an OpenAI STT service using Whisper.
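Options can be combined at construction time, for example to pin the model, route through a proxy, or bound request time. A sketch; the base URL below is a placeholder, not a real endpoint:

service := stt.NewOpenAI(
	os.Getenv("OPENAI_API_KEY"),
	stt.WithOpenAIModel(stt.ModelWhisper1),
	stt.WithOpenAIBaseURL("https://llm-proxy.example.internal/v1"), // placeholder proxy URL
	stt.WithOpenAIClient(&http.Client{Timeout: 30 * time.Second}),
)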
func (*OpenAIService) Name ¶
func (s *OpenAIService) Name() string
Name returns the provider identifier.
func (*OpenAIService) SupportedFormats ¶
func (s *OpenAIService) SupportedFormats() []string
SupportedFormats returns audio formats supported by OpenAI Whisper.
func (*OpenAIService) Transcribe ¶
func (s *OpenAIService) Transcribe(ctx context.Context, audio []byte, config TranscriptionConfig) (string, error)
Transcribe converts audio to text using OpenAI's Whisper API.
type Service ¶
type Service interface {
// Name returns the provider identifier (for logging/debugging).
Name() string
// Transcribe converts audio to text.
// Returns the transcribed text or an error if transcription fails.
Transcribe(ctx context.Context, audio []byte, config TranscriptionConfig) (string, error)
// SupportedFormats returns supported audio input formats.
// Common values: "pcm", "wav", "mp3", "m4a", "webm"
SupportedFormats() []string
}
Service transcribes audio to text. This interface abstracts different STT providers (OpenAI Whisper, Google, etc.), enabling voice AI applications to use any provider interchangeably.
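Because callers depend only on Service, application code can be written once and handed any provider. A sketch; transcribeTurn is a hypothetical helper, not part of the package:

// transcribeTurn works with any stt.Service implementation.
func transcribeTurn(ctx context.Context, svc stt.Service, audio []byte) (string, error) {
	cfg := stt.DefaultTranscriptionConfig()
	cfg.Language = "en"
	text, err := svc.Transcribe(ctx, audio, cfg)
	if err != nil {
		return "", fmt.Errorf("%s transcription failed: %w", svc.Name(), err)
	}
	return text, nil
}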
type TranscriptionConfig ¶
type TranscriptionConfig struct {
// Format is the audio format ("pcm", "wav", "mp3").
// Default: "pcm"
Format string
// SampleRate is the audio sample rate in Hz.
// Default: 16000
SampleRate int
// Channels is the number of audio channels (1=mono, 2=stereo).
// Default: 1
Channels int
// BitDepth is the bits per sample for PCM audio.
// Default: 16
BitDepth int
// Language is a hint for the transcription language (e.g., "en", "es").
// Optional - improves accuracy if provided.
Language string
// Model is the STT model to use (provider-specific).
// For OpenAI: "whisper-1"
Model string
// Prompt is a text prompt to guide transcription (provider-specific).
// Can improve accuracy for domain-specific vocabulary.
Prompt string
}
TranscriptionConfig configures speech-to-text transcription.
func DefaultTranscriptionConfig ¶
func DefaultTranscriptionConfig() TranscriptionConfig
DefaultTranscriptionConfig returns sensible defaults for transcription.
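Typical usage starts from the defaults and overrides only what differs, assuming the function returns the per-field defaults documented above:

cfg := stt.DefaultTranscriptionConfig() // "pcm", 16 kHz, mono, 16-bit
cfg.Language = "es"
cfg.Prompt = "Vocabulary: Kubernetes, gRPC, WebRTC" // domain hint for the provider
text, err := service.Transcribe(ctx, audioData, cfg)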
type TranscriptionError ¶
type TranscriptionError struct {
// Provider is the STT provider name.
Provider string
// Code is the provider-specific error code.
Code string
// Message is a human-readable error message.
Message string
// Cause is the underlying error, if any.
Cause error
// Retryable indicates whether the request can be retried.
Retryable bool
}
TranscriptionError represents an error during transcription.
func NewTranscriptionError ¶
func NewTranscriptionError(provider, code, message string, cause error, retryable bool) *TranscriptionError
NewTranscriptionError creates a new TranscriptionError.
func (*TranscriptionError) Error ¶
func (e *TranscriptionError) Error() string
Error implements the error interface.
func (*TranscriptionError) Is ¶
func (e *TranscriptionError) Is(target error) bool
Is implements error matching for errors.Is.
func (*TranscriptionError) Unwrap ¶
func (e *TranscriptionError) Unwrap() error
Unwrap returns the underlying error.
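Callers that want provider detail or retry hints can unwrap to the concrete type with errors.As. A minimal sketch; the retry decision itself is left to the application:

text, err := service.Transcribe(ctx, audioData, stt.DefaultTranscriptionConfig())
var terr *stt.TranscriptionError
if errors.As(err, &terr) {
	log.Printf("provider=%s code=%s retryable=%t: %s",
		terr.Provider, terr.Code, terr.Retryable, terr.Message)
	if terr.Retryable {
		// Safe to retry, e.g. after a short back-off.
	}
}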