transcript

package
v0.1.2
Warning

This package is not in the latest version of its module.

Published: Mar 19, 2026 License: GPL-3.0 Imports: 5 Imported by: 0

Documentation

Overview

Package transcript defines the transcript correction pipeline used by Glyphoxa to fix STT errors in domain-specific vocabulary.

Raw speech-to-text output is rarely perfect for fantasy proper nouns — NPC names, location names, faction names, and spell names are frequently misheard. The Pipeline applies a two-stage correction strategy:

  1. Phonetic matching (PhoneticMatcher): fast, dictionary-free alignment based on pronunciation similarity (e.g., Soundex, Metaphone, or edit distance on phoneme sequences). Runs in-process with no network calls.

  2. LLM-assisted correction: a language model resolves ambiguous or low-confidence phonetic candidates using session context and the full entity list. If this stage yields no correction, the pipeline falls back to the phonetic suggestion when its confidence is sufficient; otherwise the original word is left unchanged.

Each Correction records which method produced the substitution and its confidence, so callers can audit, display, or selectively roll back changes.

Implementations of both interfaces (Pipeline and PhoneticMatcher) must be safe for concurrent use.

Types

type CorrectedTranscript

type CorrectedTranscript struct {
	// Original is the raw [stt.Transcript] as received from the STT provider.
	Original stt.Transcript

	// Corrected is the full corrected transcript text with all substitutions
	// applied. Suitable for downstream processing (memory storage, LLM context).
	Corrected string

	// Corrections is the ordered list of word-level substitutions applied to
	// produce Corrected. An empty (non-nil) slice means no corrections were
	// necessary.
	Corrections []Correction
}

CorrectedTranscript is the output of a [Pipeline.Correct] call. It pairs the original stt.Transcript with the fully corrected text and an itemised record of every substitution that was applied.

type Correction

type Correction struct {
	// Original is the word as produced by the STT provider.
	Original string

	// Corrected is the replacement selected by the pipeline.
	Corrected string

	// Confidence is the pipeline's confidence in this substitution (0.0–1.0).
	// Values above 0.9 are considered high-confidence; values below 0.5
	// indicate the correction is speculative.
	Confidence float64

	// Method describes which correction stage produced this substitution.
	// Well-known values:
	//   "phonetic" — produced by a [PhoneticMatcher].
	//   "llm"      — produced by a language-model correction pass.
	Method string
}

Correction captures a single word-level substitution made by the pipeline.
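
As an illustration of how a caller might audit corrections, the sketch below sorts substitutions into the confidence bands described in the Confidence field comment. The Correction type is redeclared locally so the example compiles on its own. The bucketName helper, the "medium" band between 0.5 and 0.9, and the sample entity names are assumptions made for this example, not part of the package.

```go
package main

import "fmt"

// Correction mirrors the documented struct. It is redeclared locally so the
// sketch compiles without importing the real package.
type Correction struct {
	Original   string
	Corrected  string
	Confidence float64
	Method     string
}

// bucketName is a hypothetical helper that maps a confidence value onto the
// documented bands: above 0.9 is high-confidence, below 0.5 is speculative.
// The "medium" label for everything in between is an assumption.
func bucketName(c Correction) string {
	switch {
	case c.Confidence > 0.9:
		return "high"
	case c.Confidence < 0.5:
		return "speculative"
	default:
		return "medium"
	}
}

func main() {
	// Sample data invented for illustration.
	corrections := []Correction{
		{Original: "eldrinaks", Corrected: "Eldrinax", Confidence: 0.95, Method: "phonetic"},
		{Original: "wispers", Corrected: "Whispers", Confidence: 0.42, Method: "llm"},
	}
	for _, c := range corrections {
		fmt.Printf("%q -> %q via %s (%s)\n", c.Original, c.Corrected, c.Method, bucketName(c))
	}
}
```

A caller could use the same bands to decide which substitutions to display for review or roll back.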

type CorrectionPipeline

type CorrectionPipeline struct {
	// contains filtered or unexported fields
}

CorrectionPipeline is the two-stage transcript correction implementation of Pipeline. Stages are optional and are applied in order:

  1. PhoneticMatcher — fast, in-process phonetic entity alignment.
  2. llmcorrect.Corrector — LLM-assisted correction for low-confidence spans.

CorrectionPipeline is safe for concurrent use.

func NewPipeline

func NewPipeline(opts ...PipelineOption) *CorrectionPipeline

NewPipeline constructs a CorrectionPipeline with the supplied options. By default both stages are disabled (nil); use WithPhoneticMatcher and WithLLMCorrector to activate them.

func (*CorrectionPipeline) Correct

func (p *CorrectionPipeline) Correct(
	ctx context.Context,
	t stt.Transcript,
	entities []string,
) (*CorrectedTranscript, error)

Correct applies the configured correction stages to t and returns a CorrectedTranscript.

Pipeline flow:

  1. The transcript text is tokenised into whitespace-separated word tokens.
  2. When a PhoneticMatcher is configured, every single-word token is tested against the entity list. Additionally, n-gram windows (up to the maximum entity word count) are tested to match multi-word entities.
  3. Words that carry a stt.WordDetail confidence score below the LLM threshold AND were not corrected by the phonetic stage are collected as low-confidence spans.
  4. When an llmcorrect.Corrector is configured and at least one low-confidence span exists (or no per-word confidence data is available), the LLM corrector is invoked on the phonetic-corrected text.
  5. Phonetic and LLM corrections are merged into the final CorrectedTranscript.

Context cancellation is respected: if ctx is cancelled or its deadline expires before the LLM stage completes, an error is returned.
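
Steps 1 and 2 of the flow can be sketched as follows: split the text on whitespace, then build every contiguous window of up to as many tokens as the longest entity name has words. This is a minimal illustration of the idea, not the package's actual code. The function names maxEntityWords and ngramWindows and the sample entities are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxEntityWords returns the largest whitespace-separated word count among
// the entity names, with a floor of 1.
func maxEntityWords(entities []string) int {
	longest := 1
	for _, e := range entities {
		if n := len(strings.Fields(e)); n > longest {
			longest = n
		}
	}
	return longest
}

// ngramWindows returns every contiguous window of 1..maxN tokens, each joined
// with a single space. Windows are ordered by size, then by start position.
func ngramWindows(tokens []string, maxN int) []string {
	var out []string
	for n := 1; n <= maxN; n++ {
		for i := 0; i+n <= len(tokens); i++ {
			out = append(out, strings.Join(tokens[i:i+n], " "))
		}
	}
	return out
}

func main() {
	// Sample data invented for illustration.
	tokens := strings.Fields("we reach the tower of whispers")
	entities := []string{"Eldrinax", "Tower of Whispers"}

	// Each window would then be handed to the phonetic matcher.
	for _, w := range ngramWindows(tokens, maxEntityWords(entities)) {
		fmt.Println(w)
	}
}
```

Capping the window size at the longest entity's word count keeps the number of candidates small: a transcript of k tokens yields at most k windows per size.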

type PhoneticMatcher

type PhoneticMatcher interface {
	// Match attempts to find the entity name from entities that is most
	// phonetically similar to word.
	//
	// Return values:
	//   corrected  — the best-matching entity name from entities.
	//   confidence — similarity score in [0.0, 1.0] where 1.0 is a perfect match.
	//   matched    — true when a sufficiently similar entity was found.
	//
	// When matched is false, corrected must equal word unchanged and confidence
	// must be 0. Implementations define their own similarity threshold for
	// deciding when a match is "sufficient".
	Match(word string, entities []string) (corrected string, confidence float64, matched bool)
}

PhoneticMatcher resolves a single word to a known entity name based on pronunciation similarity. It is the first stage of the correction pipeline and is designed to be fast enough for real-time use — no network calls, no LLM round-trips.

Implementations must be safe for concurrent use.
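
To make the Match contract concrete, here is a toy implementation that scores candidates by normalised Levenshtein distance on lower-cased letters. It is only a sketch: it does not use phoneme sequences or any phonetic encoding, and it is not the matcher shipped in the phonetic subpackage. The editDistanceMatcher type, its threshold field, and the sample entities are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// PhoneticMatcher is redeclared locally so the sketch compiles on its own.
type PhoneticMatcher interface {
	Match(word string, entities []string) (corrected string, confidence float64, matched bool)
}

// editDistanceMatcher is a hypothetical matcher. It holds no mutable state,
// so a single value can be shared between goroutines.
type editDistanceMatcher struct {
	threshold float64 // minimum similarity for a match to count
}

var _ PhoneticMatcher = editDistanceMatcher{}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b using two rows.
func levenshtein(a, b []rune) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(b)]
}

// Match returns the most similar entity, or (word, 0, false) when no entity
// reaches the threshold, as the interface contract requires.
func (m editDistanceMatcher) Match(word string, entities []string) (string, float64, bool) {
	w := []rune(strings.ToLower(word))
	best, bestScore := "", 0.0
	for _, e := range entities {
		c := []rune(strings.ToLower(e))
		longest := len(w)
		if len(c) > longest {
			longest = len(c)
		}
		if longest == 0 {
			continue
		}
		score := 1 - float64(levenshtein(w, c))/float64(longest)
		if score > bestScore {
			best, bestScore = e, score
		}
	}
	if best == "" || bestScore < m.threshold {
		return word, 0, false
	}
	return best, bestScore, true
}

func main() {
	m := editDistanceMatcher{threshold: 0.7}
	entities := []string{"Eldrinax", "Tower"} // sample data
	fmt.Println(m.Match("eldrinaks", entities))
	fmt.Println(m.Match("dragon", entities))
}
```

Because the threshold is fixed at construction and Match allocates its own working rows, the value satisfies the concurrency requirement without locking.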

type Pipeline

type Pipeline interface {
	// Correct processes transcript using the provided entity name list and
	// returns a [CorrectedTranscript] containing the corrected text and an
	// itemised record of every substitution made.
	//
	// entities is the list of known entity names the pipeline should recognise
	// within the transcript text. It may include NPC names, location names,
	// item names, faction names, and other session-specific proper nouns.
	//
	// Returns a non-nil *CorrectedTranscript on success.
	// When no corrections are needed, Corrected equals transcript.Text and
	// Corrections is an empty (non-nil) slice.
	Correct(ctx context.Context, transcript stt.Transcript, entities []string) (*CorrectedTranscript, error)
}

Pipeline applies multi-stage corrections to a raw stt.Transcript, resolving STT errors for domain-specific vocabulary.

Implementations must be safe for concurrent use.
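
The return contract can be illustrated with a pass-through implementation that never substitutes anything. The types below are local stand-ins so the sketch compiles by itself: Transcript models only the Text field of stt.Transcript, and passThrough and demo are hypothetical names. Checking ctx up front is a choice made for this example; the documentation only promises cancellation handling around the LLM stage.

```go
package main

import (
	"context"
	"fmt"
)

// Transcript stands in for stt.Transcript. Only Text is modelled here;
// the real type has other fields that this sketch omits.
type Transcript struct{ Text string }

// Correction and CorrectedTranscript mirror the documented structs.
type Correction struct {
	Original   string
	Corrected  string
	Confidence float64
	Method     string
}

type CorrectedTranscript struct {
	Original    Transcript
	Corrected   string
	Corrections []Correction
}

// passThrough is a hypothetical implementation with the same method shape as
// Pipeline. It has no state, so it is safe for concurrent use.
type passThrough struct{}

func (passThrough) Correct(ctx context.Context, t Transcript, entities []string) (*CorrectedTranscript, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return &CorrectedTranscript{
		Original:    t,
		Corrected:   t.Text,         // no corrections: equals the input text
		Corrections: []Correction{}, // empty but non-nil, per the contract
	}, nil
}

// demo runs passThrough with either a live or an already-cancelled context.
func demo(text string, cancelled bool) (*CorrectedTranscript, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelled {
		cancel()
	}
	return passThrough{}.Correct(ctx, Transcript{Text: text}, nil)
}

func main() {
	out, err := demo("we reach the tower", false)
	fmt.Println(out.Corrected, len(out.Corrections), out.Corrections != nil, err)

	_, err = demo("we reach the tower", true)
	fmt.Println(err)
}
```

Returning an empty rather than nil slice lets callers range over Corrections or test its length without a separate nil check.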

type PipelineOption

type PipelineOption func(*CorrectionPipeline)

PipelineOption is a functional option for configuring a CorrectionPipeline.
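
For readers unfamiliar with the functional-options pattern, the sketch below shows its general shape using the two numeric defaults documented for WithLLMOnLowConfidence and WithMinWordsForLLM (0.5 and 8). CorrectionPipeline's fields are unexported and undocumented, so pipelineConfig, its field names, and the lower-case option constructors are stand-ins invented for this example.

```go
package main

import "fmt"

// pipelineConfig is a hypothetical stand-in for the unexported state of
// CorrectionPipeline. The real field names are not documented.
type pipelineConfig struct {
	llmThreshold   float64
	minWordsForLLM int
}

// option has the same shape as PipelineOption: a function that mutates the
// value under construction.
type option func(*pipelineConfig)

func withLLMOnLowConfidence(threshold float64) option {
	return func(c *pipelineConfig) { c.llmThreshold = threshold }
}

func withMinWordsForLLM(n int) option {
	return func(c *pipelineConfig) { c.minWordsForLLM = n }
}

// newConfig sets the documented defaults first, then applies each option in
// the order given, so later options override earlier ones.
func newConfig(opts ...option) *pipelineConfig {
	c := &pipelineConfig{llmThreshold: 0.5, minWordsForLLM: 8}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", *newConfig())
	fmt.Printf("%+v\n", *newConfig(withLLMOnLowConfidence(0.7), withMinWordsForLLM(4)))
}
```

The pattern keeps the constructor signature stable as new settings are added, which is consistent with WithMinWordsForLLM having been introduced in a later release.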

func WithLLMCorrector

func WithLLMCorrector(c *llmcorrect.Corrector) PipelineOption

WithLLMCorrector attaches an llmcorrect.Corrector as the second correction stage. When c is nil (the default), the LLM stage is skipped entirely.

func WithLLMOnLowConfidence

func WithLLMOnLowConfidence(threshold float64) PipelineOption

WithLLMOnLowConfidence sets the STT word-confidence threshold below which a word is flagged as a low-confidence span and passed to the LLM corrector (when one is configured). Default: 0.5.

Words with stt.WordDetail.Confidence below this value that were NOT already corrected by the phonetic stage are submitted to the LLM for review. Words without any confidence data (i.e., the transcript has no Words slice) are always submitted when the LLM corrector is configured.
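
The selection rule can be sketched as a simple filter: keep words whose confidence is below the threshold and that the phonetic stage did not already correct. The wordDetail type is a stand-in for stt.WordDetail. Only its Confidence field is named in this documentation, so the Word field, the index set, and the function name are assumptions. The "no confidence data" case is not modelled here.

```go
package main

import "fmt"

// wordDetail is a hypothetical stand-in for stt.WordDetail. Confidence is
// documented; the Word field is an assumption made for this sketch.
type wordDetail struct {
	Word       string
	Confidence float64
}

// lowConfidenceWords returns the words whose confidence is strictly below
// threshold and whose index is not marked as already corrected by the
// phonetic stage.
func lowConfidenceWords(words []wordDetail, phoneticallyCorrected map[int]bool, threshold float64) []string {
	out := []string{}
	for i, w := range words {
		if w.Confidence < threshold && !phoneticallyCorrected[i] {
			out = append(out, w.Word)
		}
	}
	return out
}

func main() {
	// Sample data invented for illustration.
	words := []wordDetail{
		{"we", 0.98},
		{"met", 0.95},
		{"eldrinaks", 0.31}, // low confidence, but fixed by the phonetic stage
		{"yesterday", 0.44}, // low confidence and untouched
	}
	corrected := map[int]bool{2: true}

	fmt.Println(lowConfidenceWords(words, corrected, 0.5))
}
```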

func WithMinWordsForLLM added in v0.0.5

func WithMinWordsForLLM(n int) PipelineOption

WithMinWordsForLLM sets the minimum number of words a transcript must contain before the LLM correction stage is attempted. Transcripts with fewer words than n are passed through without LLM correction because there is too little context for meaningful entity-name correction. Default: 8.
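
A minimal sketch of this gate, assuming words are counted the same way the pipeline tokenises text (whitespace-separated). The meetsMinWords name is hypothetical, and how the package actually counts words is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// meetsMinWords reports whether text has at least n whitespace-separated
// words. Transcripts with fewer than n words would skip the LLM stage.
func meetsMinWords(text string, n int) bool {
	return len(strings.Fields(text)) >= n
}

func main() {
	const minWords = 8 // the documented default

	short := "open the door"
	long := "we met the innkeeper near the old tower yesterday"

	fmt.Println(meetsMinWords(short, minWords))
	fmt.Println(meetsMinWords(long, minWords))
}
```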

func WithPhoneticMatcher

func WithPhoneticMatcher(m PhoneticMatcher) PipelineOption

WithPhoneticMatcher attaches a PhoneticMatcher as the first correction stage. When m is nil (the default), the phonetic stage is skipped entirely.

Directories

Path Synopsis
Package llmcorrect implements a language-model-based transcript correction stage that resolves entity misspellings not caught by the phonetic matcher.
Package phonetic implements the [transcript.PhoneticMatcher] interface using Double Metaphone phonetic encoding combined with Jaro-Winkler string similarity for ranked candidate selection.
