Text-To-Speech (Latest)

Customization

This section shows how to customize Riva TTS output with the following SSML tags:

  • The prosody tag, whose rate, pitch, and volume attributes control the speaking rate, pitch, and volume of the generated audio.

  • The phoneme tag, which allows us to control the pronunciation of the generated audio.

  • The sub tag, which allows us to replace the pronunciation of the specified word or phrase with that of a different word or phrase.

Customizing rate, pitch, and volume with the prosody tag

python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 --text "<speak><prosody pitch='2.5'>Today is a sunny day</prosody>. <prosody rate='high' volume='+1dB'>But it might rain tomorrow.</prosody></speak>" --language-code en-US
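
The SSML passed to --text can also be generated rather than typed by hand. As a minimal sketch, assuming only Python's standard library, the hypothetical helpers prosody() and speak() below (neither is part of Riva or its client scripts) rebuild the string used in the command above:

```python
import xml.etree.ElementTree as ET


def prosody(text, **attrs):
    """Return a <prosody> element with the given attributes (hypothetical helper)."""
    element = ET.Element("prosody", {key: str(value) for key, value in attrs.items()})
    element.text = text
    return element


def speak(*parts):
    """Wrap elements and plain strings in a <speak> root and serialize to a string."""
    root = ET.Element("speak")
    last = None
    for part in parts:
        if isinstance(part, str):
            # Plain text goes after the previous child, or at the start of the root.
            if last is None:
                root.text = (root.text or "") + part
            else:
                last.tail = (last.tail or "") + part
        else:
            root.append(part)
            last = part
    return ET.tostring(root, encoding="unicode")


ssml = speak(
    prosody("Today is a sunny day", pitch="2.5"),
    ". ",
    prosody("But it might rain tomorrow.", rate="high", volume="+1dB"),
)
print(ssml)
```

ElementTree serializes attribute values with double quotes, so the printed string would need to be wrapped in single quotes if pasted into a shell command.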

Customizing pronunciation with the phoneme tag

python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 --text "<speak>You say <phoneme alphabet='ipa' ph='təˈmeɪˌtoʊ'>tomato</phoneme>, I say <phoneme alphabet='ipa' ph='təˈmɑˌtoʊ'>tomato</phoneme>.</speak>" --language-code en-US
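
Nested quotes are easy to get wrong when SSML attributes are passed on a command line. As a minimal sketch (not part of Riva), the following Python uses the standard shlex module to assemble the same talk.py invocation, so that the SSML reaches the script as a single, unmodified argument. The script path, server address, and flags are the ones used in this section; everything else is illustrative.

```python
import shlex

# SSML with double-quoted attributes, written as an ordinary Python string.
ssml = (
    '<speak>You say <phoneme alphabet="ipa" ph="təˈmeɪˌtoʊ">tomato</phoneme>, '
    'I say <phoneme alphabet="ipa" ph="təˈmɑˌtoʊ">tomato</phoneme>.</speak>'
)

command = [
    "python3", "python-clients/scripts/tts/talk.py",
    "--server", "0.0.0.0:50051",
    "--text", ssml,
    "--language-code", "en-US",
]

# shlex.join (Python 3.8+) quotes each argument so a POSIX shell passes it through unchanged.
command_line = shlex.join(command)
print(command_line)

# Round trip: splitting the quoted line recovers the original arguments.
assert shlex.split(command_line) == command
```

Passing the command list directly to subprocess.run() would avoid the shell, and therefore the quoting question, altogether.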

Replacing pronunciation with the sub tag

python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 --text "<speak><sub alias='World Wide Web'>WWW</sub> is known as the web.</speak>" --language-code en-US
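
Before sending a request, it can help to confirm that the SSML is well-formed. The sketch below is one possible local check using Python's standard XML parser; check_ssml() and KNOWN_TAGS are hypothetical names, and the tag set reflects only the tags covered in this section, not everything Riva accepts.

```python
import xml.etree.ElementTree as ET

# Tags covered in this section, plus the <speak> root. This is an assumption
# for the sketch, not an exhaustive list of what Riva accepts.
KNOWN_TAGS = {"speak", "prosody", "phoneme", "sub"}


def check_ssml(ssml):
    """Parse the SSML and return the set of tags it uses (hypothetical helper).

    Raises ET.ParseError if the markup is not well-formed XML, and ValueError
    if the root is not <speak> or a tag falls outside KNOWN_TAGS.
    """
    root = ET.fromstring(ssml)
    if root.tag != "speak":
        raise ValueError(f"root element must be <speak>, got <{root.tag}>")
    tags = {element.tag for element in root.iter()}
    unknown = tags - KNOWN_TAGS
    if unknown:
        raise ValueError(f"unexpected tags: {sorted(unknown)}")
    return tags


ssml = "<speak><sub alias='World Wide Web'>WWW</sub> is known as the web.</speak>"
tags = check_ssml(ssml)
print(sorted(tags))
```

A check like this only catches malformed markup, such as an unclosed tag or a missing quote. It says nothing about whether the server supports a particular attribute value.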

Each of the talk.py commands above writes the resulting speech, with the SSML attributes applied, to the audio file output.wav.

© Copyright 2024, NVIDIA Corporation. Last updated on Aug 6, 2024.