Customizing TTS Models

SSML Customization

Speech Synthesis Markup Language (SSML) lets you control how the synthesized voice renders text. The TTS NIM microservice supports a subset of SSML that lets you override the pronunciation of specific words.

The following SSML tag is supported:

  • <phoneme>: Overrides pronunciation for specific words.

SSML Support by Model

Model                      Phoneme    Custom Dictionary
Magpie TTS Multilingual
Magpie TTS Zeroshot
Magpie TTS Flow

Note

Every SSML input must be a valid XML document with <speak> as its root tag. Input that is not valid XML, or that is valid XML with a different root tag, is treated as raw text.
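Because malformed input is treated as raw text rather than rejected, a client-side check can catch mistakes before the request is sent. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name looks_like_ssml is hypothetical, and the check only approximates the rule stated above; it is not the service's own parser and may differ from it in edge cases (for example, a namespaced <speak> root is reported as not SSML here).

```python
import xml.etree.ElementTree as ET


def looks_like_ssml(text: str) -> bool:
    """Return True if text is well-formed XML whose root element is <speak>.

    Hypothetical helper: a client-side approximation of the rule in the
    note above, not the service's actual validation logic.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed XML, so the input would be read as raw text.
        return False
    # Well-formed XML with another root tag is also read as raw text.
    return root.tag == "speak"


if __name__ == "__main__":
    for candidate in ("<speak>Hello</speak>", "<p>Hello</p>", "<speak>Hello", "Hello"):
        print(repr(candidate), looks_like_ssml(candidate))
```

A check like this is most useful for spotting an unclosed tag or an unescaped & character, both of which would otherwise cause the markup to be read aloud as plain text.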

Example

Customize pronunciation with the <phoneme> tag:

python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
    --text "<speak>You say <phoneme alphabet='ipa' ph='təˈmeɪˌtoʊ'>tomato</phoneme>, I say <phoneme alphabet='ipa' ph='təˈmɑˌtoʊ'>tomato</phoneme>.</speak>" \
    --language-code en-US

The synthesized audio file output.wav contains the resulting speech with the phoneme overrides applied.
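When the text comes from a program rather than being typed by hand, assembling the SSML with proper XML escaping helps keep the input valid. The sketch below uses only the Python standard library. The helper names phoneme and speak are hypothetical and are not part of the client scripts; the attribute names mirror the command shown above.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word: str, ipa: str) -> str:
    """Wrap word in a <phoneme> tag carrying an IPA pronunciation (hypothetical helper)."""
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape protects characters such as & and < in the element text.
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def speak(*fragments: str) -> str:
    """Join fragments under the required <speak> root.

    Fragments are inserted as-is, so plain-text fragments should be
    passed through escape() first if they may contain & or <.
    """
    return "<speak>" + "".join(fragments) + "</speak>"


ssml = speak(
    "You say ", phoneme("tomato", "təˈmeɪˌtoʊ"),
    ", I say ", phoneme("tomato", "təˈmɑˌtoʊ"), ".",
)
print(ssml)
```

The resulting string uses double quotes for attribute values, so wrap it in single quotes if you paste it into a shell command as the --text argument.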

Custom Pronunciation Dictionary

The TTS NIM microservice supports custom pronunciation through a text-based dictionary that maps words (graphemes) to IPA phonetic representations (phonemes). Use the --custom-dictionary flag to pass the dictionary file to the client.

Dictionary format:

  • Each line contains a word followed by its pronunciation, separated by exactly two spaces.

  • Split multi-word entries so that each word has its own line.

  • Refer to Phoneme Support for the list of supported IPA phonemes.
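The format rules above can be applied mechanically when the mappings live in code. The following is a minimal sketch, assuming one entry per line with no header or comments. The function name format_dictionary is hypothetical, and rejecting a double space inside the pronunciation is an assumption made here to keep the separator unambiguous. The sketch does not check the pronunciation against the supported phoneme list.

```python
SEPARATOR = "  "  # exactly two spaces between word and pronunciation


def format_dictionary(entries: dict[str, str]) -> str:
    """Render word-to-IPA mappings as dictionary text, one entry per line.

    Hypothetical helper that illustrates the format rules only.
    """
    lines = []
    for word, ipa in entries.items():
        # Multi-word entries must be split, so a word may not contain whitespace.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"entry must be a single word: {word!r}")
        # Assumption: a pronunciation is non-empty and never contains the separator.
        if not ipa.strip() or SEPARATOR in ipa:
            raise ValueError(f"invalid pronunciation for {word!r}: {ipa!r}")
        lines.append(f"{word}{SEPARATOR}{ipa}")
    return "\n".join(lines) + "\n"


print(format_dictionary({"sunny": "ˈsʌnɪ", "tomato": "ˈtɑˌməʊ"}), end="")
```

The returned text can then be saved to a file and passed to the client with --custom-dictionary; UTF-8 is assumed here as the file encoding because the pronunciations contain IPA characters.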

Example

python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
    --text "Today is a sunny day, a great day to eat fresh tomato" \
    --language-code en-US \
    --custom-dictionary custom_dict.txt

The custom dictionary file custom_dict.txt contains word-to-phoneme mappings:

sunny  ˈsʌnɪ
tomato  ˈtɑˌməʊ

The synthesized audio file output.wav contains the resulting speech with the custom pronunciations applied.