ttsscript

package

v0.8.1 Published: Feb 15, 2026 License: MIT Imports: 6 Imported by: 0

Repository

github.com/agentplexus/go-elevenlabs

Documentation ¶

Overview ¶

Package ttsscript provides a structured format for authoring multilingual TTS (Text-to-Speech) scripts that can be compiled to various output formats.

This package is engine-agnostic and can be used with any TTS provider including ElevenLabs, Google Cloud TTS, Amazon Polly, Azure TTS, and others.

Why Use ttsscript? ¶

Instead of storing raw SSML (which is engine-specific and hard to edit), store your scripts in a structured JSON format that:

Supports multiple languages in a single file
Handles pronunciations/acronyms separately from content
Can be compiled to any TTS engine format
Is easy to edit and version control

Basic Usage ¶

Create a script JSON file:

{
  "title": "My Course",
  "default_voices": {"en": "voice-id", "es": "voice-id-2"},
  "pronunciations": {
    "API": {"en": "A P I", "es": "A P I"},
    "SDK": {"en": "S D K", "es": "S D K"}
  },
  "slides": [
    {
      "title": "Introduction",
      "segments": [
        {
          "text": {"en": "Welcome to the API course", "es": "Bienvenidos al curso de API"},
          "pause_after": "500ms"
        }
      ]
    }
  ]
}
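
As a rough illustration of how this JSON maps onto Go values, the sketch below decodes a trimmed copy of the file with encoding/json. The lowercase types segment, slide, and script and the helper parseScript are hypothetical local stand-ins that mirror a few of the JSON field names; they are not the package's own types or its ParseScript function.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// segment, slide, and script are hypothetical local mirrors of a few
// documented fields; they are not the package's own types.
type segment struct {
	Text       map[string]string `json:"text"`
	PauseAfter string            `json:"pause_after,omitempty"`
}

type slide struct {
	Title    string    `json:"title,omitempty"`
	Segments []segment `json:"segments"`
}

type script struct {
	Title          string                       `json:"title,omitempty"`
	DefaultVoices  map[string]string            `json:"default_voices,omitempty"`
	Pronunciations map[string]map[string]string `json:"pronunciations,omitempty"`
	Slides         []slide                      `json:"slides"`
}

// parseScript decodes script JSON into the local mirror types.
func parseScript(data []byte) (*script, error) {
	var s script
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("parse script: %w", err)
	}
	return &s, nil
}

func main() {
	data := []byte(`{
	  "title": "My Course",
	  "default_voices": {"en": "voice-id"},
	  "slides": [{"title": "Introduction", "segments": [
	    {"text": {"en": "Welcome to the API course"}, "pause_after": "500ms"}
	  ]}]
	}`)
	s, err := parseScript(data)
	if err != nil {
		log.Fatal(err)
	}
	seg := s.Slides[0].Segments[0]
	fmt.Println(s.Title, "|", seg.Text["en"], "|", seg.PauseAfter)
}
```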

Load and compile for ElevenLabs:

script, err := ttsscript.LoadScript("script.json")
if err != nil {
    log.Fatal(err)
}
compiler := ttsscript.NewCompiler()
segments, err := compiler.Compile(script, "en")
if err != nil {
    log.Fatal(err)
}

formatter := ttsscript.NewElevenLabsFormatter()
jobs := formatter.Format(segments)

for _, job := range jobs {
    // Generate TTS for each segment (client and ctx are set up elsewhere).
    audio, err := client.TextToSpeech().Simple(ctx, job.VoiceID, job.Text)
    if err != nil {
        log.Fatal(err)
    }
    // Save audio with job.PauseBeforeMs and job.PauseAfterMs for post-processing.
    _ = audio
}

Compile to SSML for Google TTS:

formatter := ttsscript.NewSSMLFormatter()
ssml, err := formatter.FormatScript(script, "en")
if err != nil {
    log.Fatal(err)
}
// Use ssml with Google Cloud TTS API

Script Structure ¶

A Script contains:

Metadata (title, description, default language)
Default voices per language
Global pronunciations
Slides/sections containing segments

Each Segment contains:

Text in multiple languages
Voice overrides per language
Pause before/after
Prosody settings (rate, pitch, emphasis)
Segment-specific pronunciations

Compilation Process ¶

1. Load the script from JSON
2. Create a Compiler and optionally add additional pronunciations
3. Compile for a specific language to get CompiledSegments
4. Format the segments for your target TTS engine

Formatters ¶

SSMLFormatter: Outputs W3C SSML compatible with Google, Amazon, Azure
ElevenLabsFormatter: Outputs segments ready for ElevenLabs TTS API

Pronunciation Handling ¶

Pronunciations are applied at compile time with this priority:

1. Compiler-level (added via AddPronunciation)
2. Segment-level (in segment.pronunciations)
3. Script-level (in script.pronunciations)

This allows overrides at any level. Terms are matched case-insensitively with word boundaries.
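
The layering and matching rules above can be sketched with the standard regexp package. This is only an illustration under assumptions, not the package's implementation: mergeRules and applyRules are hypothetical helpers, the sketch assumes the list is ordered highest priority first (so compiler-level rules win), and it treats "word boundaries" as the ASCII \b boundary of Go regular expressions.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// mergeRules layers rule sets; later layers override earlier ones.
// Terms are lowercased so that "API" and "api" address the same rule.
func mergeRules(layers ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, layer := range layers {
		for term, repl := range layer {
			merged[strings.ToLower(term)] = repl
		}
	}
	return merged
}

// applyRules replaces whole-word, case-insensitive matches of each term.
// Longer terms are applied first so the result does not depend on map order.
// Rules run one after another, so a later rule can match text produced by
// an earlier replacement.
func applyRules(text string, rules map[string]string) string {
	terms := make([]string, 0, len(rules))
	for term := range rules {
		terms = append(terms, term)
	}
	sort.Slice(terms, func(i, j int) bool {
		if len(terms[i]) != len(terms[j]) {
			return len(terms[i]) > len(terms[j])
		}
		return terms[i] < terms[j]
	})
	for _, term := range terms {
		re := regexp.MustCompile(`(?i)\b` + regexp.QuoteMeta(term) + `\b`)
		text = re.ReplaceAllLiteralString(text, rules[term])
	}
	return text
}

func main() {
	scriptRules := map[string]string{"API": "A P I", "SDK": "S D K"}
	segmentRules := map[string]string{}
	compilerRules := map[string]string{"api": "ay pee eye"}

	// Lowest priority first, so the assumed highest level (compiler) wins.
	rules := mergeRules(scriptRules, segmentRules, compilerRules)
	fmt.Println(applyRules("The API and the sdk, but not APIs.", rules))
}
```

Because the match is anchored on word boundaries, "APIs" is left untouched while "API" and "sdk" are rewritten regardless of case.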

Index ¶

func CombineText(segments []CompiledSegment) string
func EscapeSSML(s string) string
func FormatDuration(ms int) string
func GroupBySlide(segments []CompiledSegment) map[int][]CompiledSegment
func GroupByVoice(segments []CompiledSegment) map[string][]CompiledSegment
func ParseDuration(s string) int
func SSMLBreak(duration string) string
func SSMLEmphasis(text, level string) string
func SSMLPhoneme(text, alphabet, ph string) string
func SSMLProsody(text, rate, pitch, volume string) string
func SSMLSayAs(text, interpretAs, format string) string
func SSMLSub(text, alias string) string
type BatchConfig
- func NewBatchConfig(outputDir string) *BatchConfig
- func (c *BatchConfig) GenerateFilename(seg ElevenLabsSegment, language string) string
type CompiledSegment
type Compiler
- func NewCompiler() *Compiler
- func (c *Compiler) AddPronunciation(term, language, replacement string)
- func (c *Compiler) AddPronunciations(language string, rules map[string]string)
- func (c *Compiler) Compile(script *Script, language string) ([]CompiledSegment, error)
type ElevenLabsFormatter
- func NewElevenLabsFormatter() *ElevenLabsFormatter
- func (f *ElevenLabsFormatter) CombineForSingleRequest(segments []ElevenLabsSegment) string
- func (f *ElevenLabsFormatter) Format(segments []CompiledSegment) []ElevenLabsSegment
- func (f *ElevenLabsFormatter) FormatScript(script *Script, language string) ([]ElevenLabsSegment, error)
- func (f *ElevenLabsFormatter) GroupByVoice(segments []ElevenLabsSegment) map[string][]ElevenLabsSegment
type ElevenLabsSegment
type ManifestEntry
- func GenerateManifest(segments []ElevenLabsSegment, config *BatchConfig, language string) []ManifestEntry
type SSMLFormatter
- func NewSSMLFormatter() *SSMLFormatter
- func (f *SSMLFormatter) Format(segments []CompiledSegment, language string) string
- func (f *SSMLFormatter) FormatScript(script *Script, language string) (string, error)
type Script
- func LoadScript(filePath string) (*Script, error)
- func ParseScript(data []byte) (*Script, error)
- func (s *Script) Languages() []string
- func (s *Script) Save(filePath string) error
- func (s *Script) SegmentCount() int
- func (s *Script) SlideCount() int
- func (s *Script) Validate() []string
type Segment
type Slide
- func (s *Slide) ShouldSpeakTitle() bool
type TTSRequest
- func GenerateTTSRequests(segments []ElevenLabsSegment, modelID, language string) []TTSRequest

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CombineText ¶

func CombineText(segments []CompiledSegment) string

CombineText combines all segment texts into a single string with pause markers.

func EscapeSSML ¶

func EscapeSSML(s string) string

EscapeSSML escapes special characters for SSML.
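
Since SSML is XML, escaping typically means replacing the five characters that XML reserves. The sketch below shows one way to do that with strings.NewReplacer; escapeSSML is a hypothetical stand-in, and the exact set of characters the package escapes is an assumption here.

```go
package main

import (
	"fmt"
	"strings"
)

// ssmlEscaper maps the five XML-reserved characters to predefined entities.
// The replacer scans the input once, so the "&" it emits is not re-escaped.
var ssmlEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
	`"`, "&quot;",
	"'", "&apos;",
)

// escapeSSML is a hypothetical stand-in, not the package's EscapeSSML.
func escapeSSML(s string) string {
	return ssmlEscaper.Replace(s)
}

func main() {
	fmt.Println(escapeSSML(`Q&A: is 3 < 5?`))
}
```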

func FormatDuration ¶

func FormatDuration(ms int) string

FormatDuration formats milliseconds as a duration string.

func GroupBySlide ¶

func GroupBySlide(segments []CompiledSegment) map[int][]CompiledSegment

GroupBySlide groups compiled segments by slide index.

func GroupByVoice ¶

func GroupByVoice(segments []CompiledSegment) map[string][]CompiledSegment

GroupByVoice groups compiled segments by voice ID. Useful for batch processing with the same voice.
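
Grouping by voice amounts to bucketing segments in a map keyed by voice ID. The following is a minimal sketch with a hypothetical two-field compiledSegment type standing in for CompiledSegment; it assumes that order within each group follows input order, which the package does not document.

```go
package main

import "fmt"

// compiledSegment is a hypothetical, trimmed stand-in for CompiledSegment.
type compiledSegment struct {
	VoiceID string
	Text    string
}

// groupByVoice buckets segments by voice ID, preserving input order
// inside each bucket.
func groupByVoice(segments []compiledSegment) map[string][]compiledSegment {
	groups := make(map[string][]compiledSegment)
	for _, seg := range segments {
		groups[seg.VoiceID] = append(groups[seg.VoiceID], seg)
	}
	return groups
}

func main() {
	groups := groupByVoice([]compiledSegment{
		{VoiceID: "narrator", Text: "Welcome."},
		{VoiceID: "guest", Text: "Thanks for having me."},
		{VoiceID: "narrator", Text: "Let's begin."},
	})
	fmt.Println(len(groups["narrator"]), len(groups["guest"]))
}
```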

func ParseDuration ¶

func ParseDuration(s string) int

ParseDuration parses a duration string like "500ms" or "1s" to milliseconds.
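
A conversion of this kind can be sketched on top of the standard library's time.ParseDuration, which already accepts strings such as "500ms" and "1s". The helper parseDurationMs below is hypothetical, and returning 0 for invalid or negative input is an assumption; the package does not document how ParseDuration treats such input.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseDurationMs converts a duration string to whole milliseconds.
// Invalid or negative input yields 0 (an assumption of this sketch).
func parseDurationMs(s string) int {
	d, err := time.ParseDuration(strings.TrimSpace(s))
	if err != nil || d < 0 {
		return 0
	}
	return int(d / time.Millisecond)
}

func main() {
	for _, in := range []string{"500ms", "1s", "1.5s", "soon"} {
		fmt.Printf("%q -> %d ms\n", in, parseDurationMs(in))
	}
}
```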

func SSMLBreak ¶

func SSMLBreak(duration string) string

SSMLBreak generates an SSML break element.

func SSMLEmphasis ¶

func SSMLEmphasis(text, level string) string

SSMLEmphasis wraps text in emphasis tags.

func SSMLPhoneme ¶

func SSMLPhoneme(text, alphabet, ph string) string

SSMLPhoneme wraps text with phonetic pronunciation.

func SSMLProsody ¶

func SSMLProsody(text, rate, pitch, volume string) string

SSMLProsody wraps text in prosody tags.

func SSMLSayAs ¶

func SSMLSayAs(text, interpretAs, format string) string

SSMLSayAs wraps text in say-as tags for specific interpretation.

func SSMLSub ¶

func SSMLSub(text, alias string) string

SSMLSub provides an alias for a word.
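
For orientation, the W3C SSML elements that helpers like these correspond to can be sketched as plain string builders. The functions breakTag, emphasisTag, and subTag are hypothetical; the package's exact output may differ, and the sketch assumes text and attribute values have already been escaped.

```go
package main

import "fmt"

// breakTag builds an SSML <break> element, e.g. for a "500ms" pause.
func breakTag(duration string) string {
	return fmt.Sprintf(`<break time="%s"/>`, duration)
}

// emphasisTag wraps text in an <emphasis> element with the given level.
func emphasisTag(text, level string) string {
	return fmt.Sprintf(`<emphasis level="%s">%s</emphasis>`, level, text)
}

// subTag wraps text in a <sub> element so that alias is spoken instead.
func subTag(text, alias string) string {
	return fmt.Sprintf(`<sub alias="%s">%s</sub>`, alias, text)
}

func main() {
	fmt.Println("Welcome to the " + subTag("API", "A P I") + " course." + breakTag("500ms"))
	fmt.Println(emphasisTag("Pay attention", "strong"))
}
```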

Types ¶

type BatchConfig ¶

type BatchConfig struct {
	// OutputDir is the directory for output files.
	OutputDir string

	// FilePrefix is added before each filename.
	FilePrefix string

	// FileSuffix is added after each filename (before extension).
	FileSuffix string

	// IncludeLanguageInFilename adds language code to filename.
	IncludeLanguageInFilename bool
}

BatchConfig contains configuration for batch TTS processing.

func NewBatchConfig ¶

func NewBatchConfig(outputDir string) *BatchConfig

NewBatchConfig creates a batch config with defaults.

func (*BatchConfig) GenerateFilename ¶

func (c *BatchConfig) GenerateFilename(seg ElevenLabsSegment, language string) string

GenerateFilename generates an output filename for a segment.

type CompiledSegment ¶

type CompiledSegment struct {
	// SlideIndex is the 0-based slide index.
	SlideIndex int

	// SegmentIndex is the 0-based segment index within the slide.
	// For title segments, this is -1.
	SegmentIndex int

	// SlideTitle is the slide title (if any).
	SlideTitle string

	// IsTitleSegment indicates this segment was generated from a slide title.
	IsTitleSegment bool

	// IsSectionHeader indicates this segment belongs to a section header slide.
	IsSectionHeader bool

	// Text is the processed text with pronunciations applied.
	Text string

	// OriginalText is the text before pronunciation substitutions.
	OriginalText string

	// VoiceID is the voice to use for this segment.
	VoiceID string

	// Language is the language code.
	Language string

	// PauseBeforeMs is the pause before in milliseconds.
	PauseBeforeMs int

	// PauseAfterMs is the pause after in milliseconds.
	PauseAfterMs int

	// Emphasis is the emphasis level.
	Emphasis string

	// Rate is the speaking rate.
	Rate string

	// Pitch is the pitch adjustment.
	Pitch string
}

CompiledSegment represents a compiled segment ready for TTS.

type Compiler ¶

type Compiler struct {
	// AdditionalPronunciations are extra pronunciations to apply.
	AdditionalPronunciations map[string]map[string]string

	// DefaultPauseAfterSlide is the pause after each slide if not specified.
	DefaultPauseAfterSlide string

	// DefaultPauseAfterSegment is the pause after each segment if not specified.
	DefaultPauseAfterSegment string
}

Compiler compiles scripts to various output formats.

func NewCompiler ¶

func NewCompiler() *Compiler

NewCompiler creates a new script compiler with default settings.

func (*Compiler) AddPronunciation ¶

func (c *Compiler) AddPronunciation(term, language, replacement string)

AddPronunciation adds a pronunciation rule.

func (*Compiler) AddPronunciations ¶

func (c *Compiler) AddPronunciations(language string, rules map[string]string)

AddPronunciations adds multiple pronunciation rules for a language.

func (*Compiler) Compile ¶

func (c *Compiler) Compile(script *Script, language string) ([]CompiledSegment, error)

Compile compiles the script for the specified language. Returns a slice of compiled segments ready for TTS processing.

type ElevenLabsFormatter ¶

type ElevenLabsFormatter struct {
	// UsePauseMarkers includes [pause:Xms] markers in text output.
	// When false, pauses are tracked separately for post-processing.
	UsePauseMarkers bool

	// PauseMarkerFormat is the format for pause markers (default: "[pause:%s]").
	PauseMarkerFormat string
}

ElevenLabsFormatter formats compiled segments for ElevenLabs TTS.
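
The pause-marker option can be pictured as a fmt.Sprintf call over the documented default format "[pause:%s]". The helper withPauseMarker below is hypothetical; that the marker is appended after the text, separated by a space, and that the %s verb receives a string such as "500ms" are assumptions of this sketch.

```go
package main

import "fmt"

// withPauseMarker appends a pause marker to text when pauseAfterMs > 0.
// markerFormat is expected to contain a single %s verb.
func withPauseMarker(text string, pauseAfterMs int, markerFormat string) string {
	if pauseAfterMs <= 0 {
		return text
	}
	marker := fmt.Sprintf(markerFormat, fmt.Sprintf("%dms", pauseAfterMs))
	return text + " " + marker
}

func main() {
	fmt.Println(withPauseMarker("Welcome to the course.", 500, "[pause:%s]"))
	fmt.Println(withPauseMarker("No pause here.", 0, "[pause:%s]"))
}
```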

func NewElevenLabsFormatter ¶

func NewElevenLabsFormatter() *ElevenLabsFormatter

NewElevenLabsFormatter creates a new ElevenLabs formatter.

func (*ElevenLabsFormatter) CombineForSingleRequest ¶

func (f *ElevenLabsFormatter) CombineForSingleRequest(segments []ElevenLabsSegment) string

CombineForSingleRequest combines segments into a single text block. Useful when you want to generate all audio in one API call. Note: This loses per-segment voice control.

func (*ElevenLabsFormatter) Format ¶

func (f *ElevenLabsFormatter) Format(segments []CompiledSegment) []ElevenLabsSegment

Format formats compiled segments for ElevenLabs.

func (*ElevenLabsFormatter) FormatScript ¶

func (f *ElevenLabsFormatter) FormatScript(script *Script, language string) ([]ElevenLabsSegment, error)

FormatScript compiles and formats a script for ElevenLabs.

func (*ElevenLabsFormatter) GroupByVoice ¶

func (f *ElevenLabsFormatter) GroupByVoice(segments []ElevenLabsSegment) map[string][]ElevenLabsSegment

GroupByVoice groups segments by voice ID for batch processing.

type ElevenLabsSegment ¶

type ElevenLabsSegment struct {
	// Text is the text to generate speech for.
	Text string

	// VoiceID is the ElevenLabs voice ID.
	VoiceID string

	// SlideIndex is the source slide index.
	SlideIndex int

	// SegmentIndex is the source segment index (-1 for title segments).
	SegmentIndex int

	// SlideTitle is the slide title for reference.
	SlideTitle string

	// IsTitleSegment indicates this segment was generated from a slide title.
	IsTitleSegment bool

	// IsSectionHeader indicates this segment belongs to a section header slide.
	IsSectionHeader bool

	// PauseBeforeMs is silence to add before this segment.
	PauseBeforeMs int

	// PauseAfterMs is silence to add after this segment.
	PauseAfterMs int

	// SuggestedFilename is a suggested output filename.
	SuggestedFilename string
}

ElevenLabsSegment represents a segment ready for ElevenLabs TTS.

type ManifestEntry ¶

type ManifestEntry struct {
	SlideIndex      int    `json:"slide_index"`
	SegmentIndex    int    `json:"segment_index"`
	SlideTitle      string `json:"slide_title,omitempty"`
	IsTitleSegment  bool   `json:"is_title_segment,omitempty"`
	IsSectionHeader bool   `json:"is_section_header,omitempty"`
	Text            string `json:"text"`
	VoiceID         string `json:"voice_id"`
	Language        string `json:"language"`
	OutputFile      string `json:"output_file"`
	PauseBeforeMs   int    `json:"pause_before_ms,omitempty"`
	PauseAfterMs    int    `json:"pause_after_ms,omitempty"`
}

ManifestEntry represents an entry in a generation manifest.
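
To show what the struct tags above imply for a serialized manifest, the sketch below marshals a trimmed, hypothetical mirror of ManifestEntry with encoding/json. The manifestEntry type and encodeManifest helper are illustrative only; how the package itself writes manifests is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// manifestEntry is a hypothetical, trimmed mirror of ManifestEntry that
// reuses the documented JSON field names.
type manifestEntry struct {
	SlideIndex   int    `json:"slide_index"`
	SegmentIndex int    `json:"segment_index"`
	Text         string `json:"text"`
	VoiceID      string `json:"voice_id"`
	Language     string `json:"language"`
	OutputFile   string `json:"output_file"`
	PauseAfterMs int    `json:"pause_after_ms,omitempty"`
}

// encodeManifest serializes entries as compact JSON.
func encodeManifest(entries []manifestEntry) (string, error) {
	out, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	manifest, err := encodeManifest([]manifestEntry{
		// SegmentIndex -1 marks a title segment, as documented above.
		{SlideIndex: 0, SegmentIndex: -1, Text: "Introduction", VoiceID: "voice-id",
			Language: "en", OutputFile: "intro.mp3", PauseAfterMs: 500},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(manifest)
}
```

Fields tagged omitempty, such as pause_after_ms, drop out of the JSON when they hold their zero value.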

func GenerateManifest ¶

func GenerateManifest(segments []ElevenLabsSegment, config *BatchConfig, language string) []ManifestEntry

GenerateManifest creates a manifest of all segments for tracking.

type SSMLFormatter ¶

type SSMLFormatter struct {
	// Version is the SSML version (default: "1.1").
	Version string

	// IncludeComments includes slide title comments in output.
	IncludeComments bool

	// IndentSpaces is the number of spaces for indentation.
	IndentSpaces int
}

SSMLFormatter formats compiled segments as SSML. Compatible with Google Cloud TTS, Amazon Polly, Azure TTS, and others.

func NewSSMLFormatter ¶

func NewSSMLFormatter() *SSMLFormatter

NewSSMLFormatter creates a new SSML formatter with default settings.

func (*SSMLFormatter) Format ¶

func (f *SSMLFormatter) Format(segments []CompiledSegment, language string) string

Format formats compiled segments as SSML.

func (*SSMLFormatter) FormatScript ¶

func (f *SSMLFormatter) FormatScript(script *Script, language string) (string, error)

FormatScript compiles and formats a script as SSML.

type Script ¶

type Script struct {
	// Title is the script title.
	Title string `json:"title,omitempty"`

	// Description is an optional description.
	Description string `json:"description,omitempty"`

	// DefaultLanguage is the primary language code (e.g., "en-US").
	DefaultLanguage string `json:"default_language,omitempty"`

	// DefaultVoices maps language codes to default voice IDs.
	DefaultVoices map[string]string `json:"default_voices,omitempty"`

	// Pronunciations maps terms to their pronunciation by language.
	// Example: {"ADK": {"en": "A D K", "es": "A D K"}}
	Pronunciations map[string]map[string]string `json:"pronunciations,omitempty"`

	// Slides contains the ordered list of slides/sections.
	Slides []Slide `json:"slides"`
}

Script represents a multilingual TTS script with slides/segments. This is the canonical format for authoring TTS content that can be compiled to SSML (Google TTS, Amazon Polly) or ElevenLabs-compatible text.

func LoadScript ¶

func LoadScript(filePath string) (*Script, error)

LoadScript loads a script from a JSON file.

func ParseScript ¶

func ParseScript(data []byte) (*Script, error)

ParseScript parses a script from JSON data.

func (*Script) Languages ¶

func (s *Script) Languages() []string

Languages returns all language codes used in the script.

func (*Script) Save ¶

func (s *Script) Save(filePath string) error

Save saves a script to a JSON file.

func (*Script) SegmentCount ¶

func (s *Script) SegmentCount() int

SegmentCount returns the total number of segments across all slides.

func (*Script) SlideCount ¶

func (s *Script) SlideCount() int

SlideCount returns the number of slides.

func (*Script) Validate ¶

func (s *Script) Validate() []string

Validate checks the script for common issues.

type Segment ¶

type Segment struct {
	// Text contains the text content by language code.
	// Example: {"en": "Hello world", "es": "Hola mundo"}
	Text map[string]string `json:"text"`

	// Voice overrides the default voice for this segment by language.
	// Example: {"en": "voice-id-1", "es": "voice-id-2"}
	Voice map[string]string `json:"voice,omitempty"`

	// PauseBefore is the pause duration before this segment (e.g., "500ms", "1s").
	PauseBefore string `json:"pause_before,omitempty"`

	// PauseAfter is the pause duration after this segment (e.g., "500ms", "1s").
	PauseAfter string `json:"pause_after,omitempty"`

	// Emphasis indicates the emphasis level ("strong", "moderate", "reduced").
	Emphasis string `json:"emphasis,omitempty"`

	// Rate is the speaking rate ("slow", "medium", "fast", or percentage like "80%").
	Rate string `json:"rate,omitempty"`

	// Pitch adjusts the pitch ("low", "medium", "high", or percentage like "+10%").
	Pitch string `json:"pitch,omitempty"`

	// Pronunciations are segment-specific pronunciation overrides.
	Pronunciations map[string]map[string]string `json:"pronunciations,omitempty"`
}

Segment represents a single audio segment within a slide.

type Slide ¶

type Slide struct {
	// Title is the slide title (optional).
	Title string `json:"title,omitempty"`

	// Notes are speaker notes or comments (not rendered to audio).
	Notes string `json:"notes,omitempty"`

	// IsSectionHeader marks this slide as the start of a new section.
	// Section headers can have their titles spoken and use longer transition pauses.
	IsSectionHeader bool `json:"is_section_header,omitempty"`

	// SpeakTitle causes the slide title to be spoken before the segments.
	// If true, the title is converted to a segment. Defaults to true for section headers.
	SpeakTitle *bool `json:"speak_title,omitempty"`

	// TitleVoice overrides the voice used for speaking the title, by language.
	// If not set, uses the segment voice or default voice.
	TitleVoice map[string]string `json:"title_voice,omitempty"`

	// TitlePauseAfter is the pause after the spoken title (e.g., "500ms").
	// Defaults to "500ms" for section headers, "300ms" for regular slides.
	TitlePauseAfter string `json:"title_pause_after,omitempty"`

	// Segments are the audio segments for this slide.
	Segments []Segment `json:"segments"`
}

Slide represents a slide or section of the script.

func (*Slide) ShouldSpeakTitle ¶

func (s *Slide) ShouldSpeakTitle() bool

ShouldSpeakTitle returns true if the slide title should be spoken. Returns true if SpeakTitle is explicitly true, or if the slide is a section header and SpeakTitle is not explicitly false.
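
The rule described above is a three-state decision over a *bool: unset, explicitly true, or explicitly false. A minimal sketch follows, using a hypothetical local slide type with only the two relevant fields; it restates the documented behavior and is not the package's source.

```go
package main

import "fmt"

// slide is a hypothetical stand-in carrying only the fields the rule needs.
type slide struct {
	IsSectionHeader bool
	SpeakTitle      *bool // nil means "not set"
}

// shouldSpeakTitle follows an explicit SpeakTitle when present and
// otherwise falls back to whether the slide is a section header.
func shouldSpeakTitle(s slide) bool {
	if s.SpeakTitle != nil {
		return *s.SpeakTitle
	}
	return s.IsSectionHeader
}

func main() {
	no := false
	fmt.Println(shouldSpeakTitle(slide{IsSectionHeader: true}))                  // unset on a section header
	fmt.Println(shouldSpeakTitle(slide{IsSectionHeader: true, SpeakTitle: &no})) // explicitly disabled
	fmt.Println(shouldSpeakTitle(slide{}))                                       // unset on a regular slide
}
```

Using a pointer rather than a plain bool is what lets the JSON distinguish "speak_title omitted" from "speak_title: false".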

type TTSRequest ¶

type TTSRequest struct {
	VoiceID  string
	Text     string
	ModelID  string
	Segment  ElevenLabsSegment
	Language string
}

TTSRequest represents a request to the ElevenLabs TTS API. This is a simplified version for use with ttsscript.

func GenerateTTSRequests ¶

func GenerateTTSRequests(segments []ElevenLabsSegment, modelID, language string) []TTSRequest

GenerateTTSRequests creates TTS requests from formatted segments.
