SSMD Editing

SSMD (Speech Synthesis Markdown) is an intermediate text format used by ttsforge between EPUB extraction and TTS audio generation. It allows fine-grained control over pronunciation, pacing, and emphasis in your audiobooks.

How SSMD Works

During conversion, ttsforge automatically generates .ssmd files for each chapter:

.{book_title}_chapters/
├── {book_title}_state.json
├── chapter_001_intro.ssmd      # Editable text with speech markup
├── chapter_001_intro.wav
├── chapter_002_chapter1.ssmd
├── chapter_002_chapter1.wav
└── ...

When you resume a conversion, ttsforge checks whether you have edited any SSMD files (by comparing MD5 hashes) and automatically regenerates the audio for the edited chapters.

Basic Workflow

  1. Start conversion:

    ttsforge convert book.epub
    
  2. Pause conversion (Ctrl+C when needed)

  3. Edit SSMD files to fix pronunciation or pacing:

    vim .book_chapters/chapter_001_intro.ssmd
    
  4. Resume conversion - automatically detects edits:

    ttsforge convert book.epub
    

SSMD Syntax

SSMD uses a simple markdown-like syntax for speech control.

Structural Breaks

Control pauses between text segments:

...p    # Paragraph break (0.5-1.0s pause)
...s    # Sentence break (0.1-0.3s pause)
...c    # Clause break (shorter pause)

Example:

This is the first paragraph. ...s
It has multiple sentences. ...p

This is a second paragraph. ...s
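
To make the marker syntax concrete, here is a minimal Python sketch that splits SSMD text into segments at the break markers. It is an illustration only, not how ttsforge or pykokoro parse SSMD; the function name `split_on_breaks` and the rule that a marker must be followed by whitespace or end of text are assumptions.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a break marker is "..." plus p, s, or c, followed by
# whitespace or the end of the text.
_BREAK = re.compile(r"\.\.\.([psc])(?=\s|$)")


def split_on_breaks(ssmd: str) -> List[Tuple[str, Optional[str]]]:
    """Return (text, break_kind) pairs; break_kind is 'p', 's', 'c', or None."""
    segments = []
    pos = 0
    for match in _BREAK.finditer(ssmd):
        text = ssmd[pos:match.start()].strip()
        segments.append((text, match.group(1)))
        pos = match.end()
    # Any text after the last marker has no explicit break.
    tail = ssmd[pos:].strip()
    if tail:
        segments.append((tail, None))
    return segments
```

Calling `split_on_breaks` on the example above yields one segment per sentence, each tagged with the kind of break that follows it.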

Emphasis

Add vocal emphasis to words or phrases:

*text*      # Moderate emphasis
**text**    # Strong emphasis

Example:

Harry was a *highly unusual* boy. ...s
He **hated** the summer holidays. ...s

Custom Phonemes

Override pronunciation using IPA phonemes:

[word]{ph="phoneme"}

Examples:

[Hermione]{ph="hɝmˈIni"} Granger was Harry's best friend. ...s
The [API]{ph="ˌeɪpiˈaɪ"} supports [JSON]{ph="dʒˈeɪsɑn"}. ...s
[Kubernetes]{ph="kubɚnˈɛtɪs"} is a container orchestrator. ...s

Language Switching (Planned)

Mark text as being in a different language. This syntax is a placeholder for a planned feature and is not yet implemented:

[Bonjour]{lang="fr"}      # French text
[Hola]{lang="es"}         # Spanish text

Complete Example

Here’s a complete SSMD file example:

Chapter One ...p

[Harry]{ph="hæɹi"} Potter was a *highly unusual* boy in many ways. ...s
For one thing, he **hated** the summer holidays more than any other
time of year. ...s For another, he really wanted to do his homework,
but was forced to do it in secret, in the dead of the night. ...p

And he also happened to be a wizard. ...p

The [Dursleys]{ph="dɝzliz"} had everything they wanted, but they
also had a secret. ...s And their greatest fear was that somebody
would discover it. ...p

Automatic Features

SSMD files are automatically enhanced with:

Phoneme Dictionary Injection

If you use --phoneme-dict, all phoneme substitutions are automatically injected into the SSMD:

ttsforge convert book.epub --phoneme-dict custom_phonemes.json

The generated SSMD will include:

[Hermione]{ph="hɝmˈIni"} loved reading books. ...s
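
The injection step can be pictured with a short Python sketch. This is not ttsforge's implementation: the function name `inject_phonemes` is hypothetical, and the dictionary is assumed to be a plain word-to-phoneme mapping, which may not match the actual layout of `custom_phonemes.json`.

```python
import re
from typing import Mapping


def inject_phonemes(text: str, phonemes: Mapping[str, str]) -> str:
    """Wrap whole-word dictionary matches in [word]{ph="..."} markup.

    Assumption: `phonemes` maps a word to its phoneme string.
    """
    if not phonemes:
        return text
    # Try longer entries first so they win over shorter entries they contain.
    keys = sorted(phonemes, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(
        lambda m: f'[{m.group(1)}]{{ph="{phonemes[m.group(1)]}"}}', text
    )
```

The sketch is case-sensitive and does not skip words that already carry markup, so it is meant for unannotated text only.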

HTML Emphasis Detection

Emphasis from the original EPUB HTML is automatically converted:

  • <em>text</em> → *text*

  • <strong>text</strong> → **text**
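
As a rough illustration of this mapping, the Python sketch below rewrites the two tags with regular expressions. ttsforge may well use a real HTML parser instead; the function name `html_emphasis_to_ssmd` is hypothetical, and nested or malformed tags are not handled.

```python
import re

# Match the tag with optional attributes, capturing the inner text.
_STRONG = re.compile(r"<strong\b[^>]*>(.*?)</strong>", re.IGNORECASE | re.DOTALL)
_EM = re.compile(r"<em\b[^>]*>(.*?)</em>", re.IGNORECASE | re.DOTALL)


def html_emphasis_to_ssmd(html: str) -> str:
    """Convert <strong> and <em> elements to SSMD emphasis markers."""
    text = _STRONG.sub(r"**\1**", html)  # strong emphasis
    text = _EM.sub(r"*\1*", text)        # moderate emphasis
    return text
```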

Structural Break Preservation

ttsforge preserves paragraph structure but does not insert explicit ...p or ...s markers. Sentence detection is handled internally by pykokoro at synthesis time. Use manual break markers only when you need precise control over pauses.

Use Cases

When to Edit SSMD

  1. Pronunciation issues: Character names, technical terms, foreign words

  2. Pacing problems: Adjust paragraph and sentence breaks for better flow

  3. Emphasis corrections: Add or remove emphasis on specific words

  4. Consistency: Ensure consistent pronunciation across chapters

Combining with Phoneme Dictionary

For best results, use both features together:

  1. Create a phoneme dictionary for common names/terms

  2. Let ttsforge auto-inject into SSMD

  3. Edit SSMD files for chapter-specific tweaks

# 1. Extract and review names
ttsforge extract-names book.epub
vim custom_phonemes.json

# 2. Start conversion (phonemes auto-injected into SSMD)
ttsforge convert book.epub --phoneme-dict custom_phonemes.json

# 3. Edit specific SSMD files as needed
vim .book_chapters/chapter_005.ssmd

# 4. Resume (regenerates edited chapters)
ttsforge convert book.epub

Tips and Best Practices

  1. Start with phoneme dictionary: Create a global dictionary first, then use SSMD for chapter-specific overrides

  2. Test edits incrementally: Edit one chapter, let it regenerate, listen to verify before editing more

  3. Use emphasis sparingly: Too much emphasis can sound unnatural

  4. Keep backups: ttsforge regenerates an SSMD file only when it is missing, so existing files with manual edits are preserved, but a deleted file loses its edits

  5. Consistent phonemes: Use the same IPA notation for a given word in every chapter

Technical Details

Hash-Based Change Detection

ttsforge tracks SSMD file changes using MD5 hashes (shortened to 12 characters) stored in the state file. When you resume:

  1. The current SSMD file is hashed

  2. The result is compared with the hash saved in the state file

  3. If the two differ, the chapter audio is regenerated

  4. The new hash is saved to the state file
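
The steps above can be sketched in Python as follows. This is a minimal sketch, not ttsforge's actual code: it assumes the hash is taken over the raw file bytes, that the 12 characters are the start of the hex digest, and that the state is a simple dictionary keyed by file name. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def short_md5(data: bytes, length: int = 12) -> str:
    """Return the first `length` hex characters of the MD5 digest."""
    return hashlib.md5(data).hexdigest()[:length]


def needs_regeneration(ssmd_path: Path, saved_hash: Optional[str]) -> bool:
    """True if the file's current hash differs from the saved hash."""
    return short_md5(ssmd_path.read_bytes()) != saved_hash


def check_and_update(ssmd_path: Path, state: Dict[str, str]) -> bool:
    """Compare with the saved hash, store the new one, report if audio is stale.

    Assumption: `state` maps an SSMD file name to its saved hash.
    """
    key = ssmd_path.name
    changed = needs_regeneration(ssmd_path, state.get(key))
    if changed:
        state[key] = short_md5(ssmd_path.read_bytes())
    return changed
```

A file with no saved hash counts as changed, so a chapter seen for the first time is always synthesized.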

File Format

SSMD files are plain text UTF-8 files with the .ssmd extension. They can be edited with any text editor.

Error Handling

If SSMD generation fails, ttsforge falls back to plain text conversion and logs a warning. The conversion continues without SSMD features.

Validation

SSMD is not automatically validated during conversion. For manual checks, use the validate_ssmd helper from ttsforge.ssmd_generator to get warnings about unbalanced markers before you synthesize.
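
For a sense of what an unbalanced-marker check involves, here is a standalone Python sketch. It does not call or reproduce `validate_ssmd`, whose signature and exact rules are not described here; the function name `find_unbalanced_markers` is hypothetical, and checking each line on its own is an assumption.

```python
import re
from typing import List

# Well-formed [text]{key="value"} annotations are removed before counting.
_ANNOTATION = re.compile(r"\[[^\]]*\]\{[^}]*\}")


def find_unbalanced_markers(ssmd: str) -> List[str]:
    """Return warnings for emphasis or bracket markers that look unbalanced."""
    warnings = []
    for lineno, line in enumerate(ssmd.splitlines(), start=1):
        rest = _ANNOTATION.sub("", line)
        if rest.count("**") % 2:
            warnings.append(f"line {lineno}: unbalanced ** marker")
        # Count single asterisks that are not part of a ** pair.
        if rest.replace("**", "").count("*") % 2:
            warnings.append(f"line {lineno}: unbalanced * marker")
        if rest.count("[") != rest.count("]"):
            warnings.append(f"line {lineno}: unbalanced square brackets")
        if rest.count("{") != rest.count("}"):
            warnings.append(f"line {lineno}: unbalanced curly braces")
    return warnings
```

An empty list means no problems were found by this simple check. Emphasis that deliberately spans a line break would be reported as a false positive.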

Limitations

  • Language switching is not yet implemented (planned feature)

  • Phoneme syntax must use valid IPA characters

  • Very long lines may be truncated in some editors

  • Hash detection only works with resumable conversions

See Also

For more SSMD examples and a quick reference, see SSMD_QUICKSTART.md in the repository root.