Customizing TTS Models
SSML Customization
Speech Synthesis Markup Language (SSML) controls how the virtual speaker delivers the text. The TTS NIM microservice supports a subset of SSML for overriding the pronunciation of specific words.
The following SSML tag is supported:
<phoneme>: Overrides pronunciation for specific words.
SSML Support by Model
| Model | Phoneme | Custom Dictionary |
|---|---|---|
|  | ✅ | ✅ |
|  | ✅ | ✅ |
|  | ❌ | ❌ |
Note
All SSML inputs must be a valid XML document wrapped in a <speak> root tag. Input that is not valid XML, or valid XML with a different root tag, is treated as raw text.
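The rule in this note can be checked on the client side before sending a request. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The function name `looks_like_ssml` is hypothetical, and the check assumes the `<speak>` root carries no XML namespace. It mirrors the rule as stated here and is not the service's actual parser, which may differ in detail.

```python
import xml.etree.ElementTree as ET


def looks_like_ssml(text: str) -> bool:
    """Return True if text is well-formed XML whose root element is <speak>.

    Hypothetical helper: approximates the documented rule, not the
    service's own parsing logic. Assumes the root tag has no namespace.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not valid XML, so per the note it would be treated as raw text.
        return False
    return root.tag == "speak"
```

A common pitfall this catches is an unescaped `&` inside the text, which makes the input invalid XML, so the whole string, tags included, would be treated as raw text.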
Example
Customize pronunciation with the phoneme tag:
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
--text "<speak>You say <phoneme alphabet='ipa' ph='təˈmeɪˌtoʊ'>tomato</phoneme>, I say <phoneme alphabet='ipa' ph='təˈmɑˌtoʊ'>tomato</phoneme>.</speak>" \
--language-code en-US
The synthesized audio file output.wav contains the resulting speech with the phoneme overrides applied.
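When the SSML string is assembled programmatically, escaping the text and attribute values keeps the result valid XML. The sketch below uses only the Python standard library. The helper name `build_ssml` and its input shape (plain strings mixed with `(word, ipa)` tuples) are assumptions made for illustration, not part of the client.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(segments):
    """Build a <speak> document from plain text and (word, ipa) pairs.

    Hypothetical helper. Each segment is either a str (spoken as-is)
    or a (word, ipa) tuple that becomes a <phoneme> override.
    """
    parts = []
    for segment in segments:
        if isinstance(segment, tuple):
            word, ipa = segment
            # quoteattr adds the surrounding quotes and escapes the value.
            parts.append(
                f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
            )
        else:
            # Escape &, <, and > so plain text cannot break the XML.
            parts.append(escape(segment))
    return "<speak>" + "".join(parts) + "</speak>"


ssml = build_ssml([
    "You say ", ("tomato", "təˈmeɪˌtoʊ"),
    ", I say ", ("tomato", "təˈmɑˌtoʊ"), ".",
])
print(ssml)
```

The printed string can then be passed as the `--text` argument, quoted appropriately for your shell.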
Custom Pronunciation Dictionary
The TTS NIM microservice supports custom pronunciation through a text-based dictionary that maps words (graphemes) to IPA phonetic representations (phonemes). Use the --custom-dictionary flag to pass the dictionary file to the client.
Dictionary format:
Each line contains a word followed by its pronunciation, separated by exactly two spaces.
Split multi-word entries into individual lines.
Refer to Phoneme Support for the list of supported IPA phonemes.
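The format rules above can be sketched as a small generator and checker for dictionary lines. The function names `format_entries` and `parse_line` are hypothetical, and the sketch assumes that neither the word nor its pronunciation contains whitespace. It does not check the phonemes against the supported IPA list.

```python
from __future__ import annotations

import re

# One word, exactly two spaces, one pronunciation (assumed whitespace-free).
_ENTRY = re.compile(r"(\S+) {2}(\S+)")


def format_entries(pronunciations: dict[str, str]) -> str:
    """Render word-to-IPA mappings as dictionary text, one entry per line."""
    lines = []
    for word, ipa in pronunciations.items():
        if len(word.split()) != 1:
            raise ValueError(f"split multi-word entries into separate lines: {word!r}")
        lines.append(f"{word}  {ipa}")
    return "\n".join(lines) + "\n"


def parse_line(line: str) -> tuple[str, str]:
    """Return (word, ipa) for a well-formed line, or raise ValueError."""
    match = _ENTRY.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"expected 'word<two spaces>pronunciation': {line!r}")
    return match.group(1), match.group(2)
```

Running `parse_line` over each line of an existing file is a quick way to catch separators that an editor or copy-paste step collapsed to a single space. When saving the output of `format_entries`, UTF-8 is a reasonable encoding choice for the IPA characters, although the encoding the service expects is not stated here.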
Example
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
--text "Today is a sunny day, a great day to eat fresh tomato" \
--language-code en-US \
--custom-dictionary custom_dict.txt
The custom dictionary file custom_dict.txt contains word-to-phoneme mappings:
sunny  ˈsʌnɪ
tomato  ˈtɑˌməʊ
The synthesized audio file output.wav contains the resulting speech with the custom pronunciations applied.