stt

package

v1.4.5 | Published: Apr 15, 2026 | License: Apache-2.0 | Imports: 12 | Imported by: 1

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/AltairaLabs/PromptKit

Documentation ¶

Overview ¶

Package stt provides speech-to-text services for converting audio to text.

The package defines a common Service interface that abstracts STT providers, enabling voice AI applications to transcribe speech from users.

Architecture ¶

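The package provides:

Service interface for STT providers
TranscriptionConfig for audio format configuration
Provider implementations (currently OpenAI Whisper)
A factory registry (RegisterFactory, CreateFromSpec) for building a Service from a ProviderSpec
TranscribeWithRetry for bounded retry of transient failures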

Usage ¶

Basic usage with OpenAI Whisper:

ctx := context.Background()
// audioData holds raw 16 kHz, mono, 16-bit little-endian PCM.
service := stt.NewOpenAI(os.Getenv("OPENAI_API_KEY"))
text, err := service.Transcribe(ctx, audioData, stt.TranscriptionConfig{
    Format:     stt.FormatPCM,
    SampleRate: 16000,
    Channels:   1,
    BitDepth:   16,
    Language:   "en",
})
if err != nil {
    log.Fatal(err)
}
fmt.Println("User said:", text)

Available Providers ¶

The package includes implementations for:

OpenAI Whisper (whisper-1 model)
Additional providers can be added by implementing the Service interface and registering a Factory with RegisterFactory

Index ¶

Constants
Variables
func APIKeyFromCredential(c credentials.Credential) string
func RegisterFactory(providerType string, factory Factory)
func ResolveCredential(ctx context.Context, providerType string, cfgDir string, ...) (credentials.Credential, error)
func TranscribeWithRetry(ctx context.Context, svc Service, audio []byte, config TranscriptionConfig, ...) (string, error)
func WrapPCMAsWAV(pcmData []byte, sampleRate, channels, bitsPerSample int) []byte
type Factory
type OpenAIOption
- func WithOpenAIBaseURL(url string) OpenAIOption
- func WithOpenAIClient(client *http.Client) OpenAIOption
- func WithOpenAIModel(model string) OpenAIOption
type OpenAIService
- func NewOpenAI(apiKey string, opts ...OpenAIOption) *OpenAIService
- func (s *OpenAIService) Name() string
- func (s *OpenAIService) SupportedFormats() []string
- func (s *OpenAIService) Transcribe(ctx context.Context, audio []byte, config TranscriptionConfig) (string, error)
type ProviderSpec
type RetryConfig
- func DefaultRetryConfig() RetryConfig
type Service
- func CreateFromSpec(spec ProviderSpec) (Service, error)
type TranscriptionConfig
- func DefaultTranscriptionConfig() TranscriptionConfig
type TranscriptionError
- func NewTranscriptionError(provider, code, message string, cause error, retryable bool) *TranscriptionError
- func (e *TranscriptionError) Error() string
- func (e *TranscriptionError) Is(target error) bool
- func (e *TranscriptionError) Unwrap() error

Constants ¶

const (
	// Default audio settings.
	DefaultSampleRate = 16000
	DefaultChannels   = 1
	DefaultBitDepth   = 16

	// Common audio formats.
	FormatPCM = "pcm"
	FormatWAV = "wav"
	FormatMP3 = "mp3"
)

const (

	// ModelWhisper1 is the OpenAI Whisper model for transcription.
	ModelWhisper1 = "whisper-1"
)

Variables ¶

var (
	// ErrEmptyAudio is returned when audio data is empty.
	ErrEmptyAudio = errors.New("audio data is empty")

	// ErrRateLimited is returned when the provider rate limits requests.
	ErrRateLimited = errors.New("rate limited by provider")

	// ErrInvalidFormat is returned when the audio format is not supported.
	ErrInvalidFormat = errors.New("unsupported audio format")

	// ErrAudioTooShort is returned when audio is too short to transcribe.
	ErrAudioTooShort = errors.New("audio too short to transcribe")
)

Common errors for STT services.

Functions ¶

func APIKeyFromCredential ¶ added in v1.4.5

func APIKeyFromCredential(c credentials.Credential) string

APIKeyFromCredential returns the raw API key from an APIKey credential, or "" for any other credential shape (or nil).

func RegisterFactory ¶ added in v1.4.5

func RegisterFactory(providerType string, factory Factory)

RegisterFactory registers a factory for the given provider type. It is typically called from a per-provider package's init() function.
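One common way to back a register/create pair like RegisterFactory and CreateFromSpec is a mutex-guarded map keyed by provider type and populated from init(). The sketch below follows that pattern with lowercase hypothetical names; it is an assumption about the shape of the mechanism, not the package's actual code, and ProviderSpec and Service are trimmed local copies.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ProviderSpec is a trimmed local copy of the documented struct.
type ProviderSpec struct {
	Type  string
	Model string
}

// Service is reduced to Name() to keep the sketch short.
type Service interface{ Name() string }

// Factory mirrors the documented signature.
type Factory func(spec ProviderSpec) (Service, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// registerFactory stores a factory under its provider type.
func registerFactory(providerType string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[providerType] = f
}

// createFromSpec looks up the factory for spec.Type and invokes it.
func createFromSpec(spec ProviderSpec) (Service, error) {
	mu.RLock()
	f, ok := factories[spec.Type]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("stt: unknown provider type %q", spec.Type)
	}
	return f(spec)
}

// fakeService is a hypothetical provider.
type fakeService struct{ model string }

func (s fakeService) Name() string { return "fake:" + s.model }

// init registers the provider, as a per-provider package would.
func init() {
	registerFactory("fake", func(spec ProviderSpec) (Service, error) {
		if spec.Model == "" {
			return nil, errors.New("model required")
		}
		return fakeService{model: spec.Model}, nil
	})
}

func main() {
	svc, err := createFromSpec(ProviderSpec{Type: "fake", Model: "m1"})
	fmt.Println(svc.Name(), err) // fake:m1 <nil>
}
```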

func ResolveCredential ¶ added in v1.4.5

func ResolveCredential(ctx context.Context, providerType string,
	cfgDir string, cred *credentials.CredentialConfig,
) (credentials.Credential, error)

ResolveCredential resolves an STT provider's credential block into a concrete Credential, applying the same fallback chain as chat providers.

func TranscribeWithRetry ¶ added in v1.4.2

func TranscribeWithRetry(
	ctx context.Context,
	svc Service,
	audio []byte,
	config TranscriptionConfig,
	retry RetryConfig,
) (string, error)

TranscribeWithRetry calls svc.Transcribe with bounded retry on transient errors. Only errors where TranscriptionError.Retryable is true are retried; all others are returned immediately. Uses full jitter backoff to avoid synchronized retries across concurrent callers.

func WrapPCMAsWAV ¶

func WrapPCMAsWAV(pcmData []byte, sampleRate, channels, bitsPerSample int) []byte

WrapPCMAsWAV wraps raw PCM audio data in a WAV header. This is necessary for APIs such as OpenAI Whisper that accept audio only as an uploaded file in a recognized container format, not as headerless PCM samples.

Parameters:

pcmData: Raw PCM audio bytes (little-endian, signed)
sampleRate: Sample rate in Hz (e.g., 16000)
channels: Number of channels (1=mono, 2=stereo)
bitsPerSample: Bits per sample (typically 16)

Returns a byte slice containing WAV-formatted audio.

Types ¶

type Factory ¶ added in v1.4.5

type Factory func(spec ProviderSpec) (Service, error)

Factory builds a Service from a spec.

type OpenAIOption ¶

type OpenAIOption func(*OpenAIService)

OpenAIOption configures the OpenAI STT service.

func WithOpenAIBaseURL ¶

func WithOpenAIBaseURL(url string) OpenAIOption

WithOpenAIBaseURL sets a custom base URL (for testing or proxies).

func WithOpenAIClient ¶

func WithOpenAIClient(client *http.Client) OpenAIOption

WithOpenAIClient sets a custom HTTP client.

func WithOpenAIModel ¶

func WithOpenAIModel(model string) OpenAIOption

WithOpenAIModel sets the STT model to use.

type OpenAIService ¶

type OpenAIService struct {
	// contains filtered or unexported fields
}

OpenAIService implements STT using OpenAI's Whisper API.

func NewOpenAI ¶

func NewOpenAI(apiKey string, opts ...OpenAIOption) *OpenAIService

NewOpenAI creates an OpenAI STT service using Whisper.

func (*OpenAIService) Name ¶

func (s *OpenAIService) Name() string

Name returns the provider identifier.

func (*OpenAIService) SupportedFormats ¶

func (s *OpenAIService) SupportedFormats() []string

SupportedFormats returns audio formats supported by OpenAI Whisper.

func (*OpenAIService) Transcribe ¶

func (s *OpenAIService) Transcribe(
	ctx context.Context, audio []byte, config TranscriptionConfig,
) (string, error)

Transcribe converts audio to text using OpenAI's Whisper API.

type ProviderSpec ¶ added in v1.4.5

type ProviderSpec struct {
	// ID is a stable identifier; informational only at this layer.
	ID string
	// Type selects the implementation; "openai" is the only one today.
	Type string
	// Model overrides the provider's default transcription model.
	Model string
	// BaseURL overrides the provider's default API endpoint.
	BaseURL string
	// Credential carries the resolved API key.
	Credential credentials.Credential
	// AdditionalConfig carries provider-specific extras. Unknown keys
	// are ignored.
	AdditionalConfig map[string]any
}

ProviderSpec is the runtime form of an STT-provider declaration, used by CreateFromSpec to construct a Service implementation. The SDK's runtime-config layer translates pkg/config.STTProviderConfig into this struct after resolving credentials.

type RetryConfig ¶ added in v1.4.2

type RetryConfig struct {
	// MaxAttempts is the total number of attempts including the initial
	// call. 3 means "initial + up to 2 retries". Values < 1 are
	// treated as 1 (no retry).
	MaxAttempts int
	// InitialDelay is the base backoff before the first retry.
	InitialDelay time.Duration
	// MaxDelay caps the per-attempt backoff.
	MaxDelay time.Duration
}

RetryConfig configures bounded retry for STT transcription calls. Unlike streaming retry, it is enabled by default: STT calls are one-shot and idempotent, so retrying carries no risk of duplicated content, and the alternative is silently dropped speech.

func DefaultRetryConfig ¶ added in v1.4.2

func DefaultRetryConfig() RetryConfig

DefaultRetryConfig returns sensible defaults for STT retry.

type Service ¶

type Service interface {
	// Name returns the provider identifier (for logging/debugging).
	Name() string

	// Transcribe converts audio to text.
	// Returns the transcribed text or an error if transcription fails.
	Transcribe(ctx context.Context, audio []byte, config TranscriptionConfig) (string, error)

	// SupportedFormats returns supported audio input formats.
	// Common values: "pcm", "wav", "mp3", "m4a", "webm"
	SupportedFormats() []string
}

Service transcribes audio to text. The interface abstracts over STT providers (OpenAI Whisper today; others such as Google could be added) so that voice AI applications can use any provider interchangeably.

func CreateFromSpec ¶ added in v1.4.5

func CreateFromSpec(spec ProviderSpec) (Service, error)

CreateFromSpec returns a Service implementation for the given spec.

type TranscriptionConfig ¶

type TranscriptionConfig struct {
	// Format is the audio format ("pcm", "wav", "mp3").
	// Default: "pcm"
	Format string

	// SampleRate is the audio sample rate in Hz.
	// Default: 16000
	SampleRate int

	// Channels is the number of audio channels (1=mono, 2=stereo).
	// Default: 1
	Channels int

	// BitDepth is the bits per sample for PCM audio.
	// Default: 16
	BitDepth int

	// Language is a hint for the transcription language (e.g., "en", "es").
	// Optional - improves accuracy if provided.
	Language string

	// Model is the STT model to use (provider-specific).
	// For OpenAI: "whisper-1"
	Model string

	// Prompt is a text prompt to guide transcription (provider-specific).
	// Can improve accuracy for domain-specific vocabulary.
	Prompt string
}

TranscriptionConfig configures speech-to-text transcription.

func DefaultTranscriptionConfig ¶

func DefaultTranscriptionConfig() TranscriptionConfig

DefaultTranscriptionConfig returns sensible defaults for transcription: PCM format, 16000 Hz sample rate, mono, 16-bit samples.

type TranscriptionError ¶

type TranscriptionError struct {
	// Provider is the STT provider name.
	Provider string

	// Code is the provider-specific error code.
	Code string

	// Message is a human-readable error message.
	Message string

	// Cause is the underlying error, if any.
	Cause error

	// Retryable indicates whether the request can be retried.
	Retryable bool
}

TranscriptionError represents an error during transcription.

func NewTranscriptionError ¶

func NewTranscriptionError(provider, code, message string, cause error, retryable bool) *TranscriptionError

NewTranscriptionError creates a new TranscriptionError.

func (*TranscriptionError) Error ¶

func (e *TranscriptionError) Error() string

Error implements the error interface.

func (*TranscriptionError) Is ¶

func (e *TranscriptionError) Is(target error) bool

Is implements error matching for errors.Is.

func (*TranscriptionError) Unwrap ¶

func (e *TranscriptionError) Unwrap() error

Unwrap returns the underlying error.
