riva/proto/jarvis_tts.proto

service JarvisTTS
rpc SynthesizeSpeechResponse Synthesize(SynthesizeSpeechRequest)

Requests text-to-speech synthesis from the service. Submit a SynthesizeSpeechRequest containing the desired text and configuration, and receive the audio bytes in the requested format.

rpc stream SynthesizeSpeechResponse SynthesizeOnline(SynthesizeSpeechRequest)

Requests text-to-speech synthesis, with the audio returned as a stream as it becomes available. Submit a SynthesizeSpeechRequest with the desired text and configuration, and receive a stream of bytes in the requested format.
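
To illustrate how a client might handle the streamed variant, the sketch below joins the audio carried by each streamed response into one buffer. The `Response` dataclass and the `fake_stream` generator are hypothetical stand-ins for the generated message class and the server stream; they are not part of the Jarvis API. A real client would iterate over whatever the `SynthesizeOnline` call returns instead.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Response:
    """Hypothetical stand-in for SynthesizeSpeechResponse (only the `audio` field)."""
    audio: bytes


def fake_stream(payload: bytes, chunk_size: int) -> Iterator[Response]:
    """Hypothetical stand-in for a SynthesizeOnline stream: yields audio in chunks."""
    for start in range(0, len(payload), chunk_size):
        yield Response(audio=payload[start:start + chunk_size])


def collect_audio(responses: Iterable[Response]) -> bytes:
    """Join the `audio` bytes of each streamed response, in arrival order."""
    return b"".join(response.audio for response in responses)


# Ten placeholder bytes, delivered in chunks of four, reassemble to the original.
payload = bytes(range(10))
streamed = collect_audio(fake_stream(payload, chunk_size=4))
```

In practice a streaming client would more likely hand each chunk to an audio sink as it arrives rather than buffer the whole utterance; buffering is used here only to keep the sketch short.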

message SynthesizeSpeechRequest
string text
string language_code
nvidia.jarvis.AudioEncoding encoding

Audio encoding parameters.

int32 sample_rate_hz
string voice_name

Voice parameters.

message SynthesizeSpeechResponse
bytes audio
SynthesizeSpeechResponseMetadata meta
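
The `audio` field carries raw bytes, so a client usually has to wrap them in a container before other tools can play them. The following is a minimal sketch that assumes the request asked for 16-bit signed little-endian mono PCM; that is an assumption about the chosen encoding, not something the response states. `pcm_to_wav` is a hypothetical helper name, and the silent buffer stands in for a real response.

```python
import io
import wave


def pcm_to_wav(audio: bytes, sample_rate_hz: int) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    Assumes 16-bit signed little-endian mono samples; this is an assumption
    about the requested encoding, not something the response itself states.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumed)
        wav_file.setsampwidth(2)   # 2 bytes per sample, i.e. 16-bit (assumed)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(audio)
    return buffer.getvalue()


# One second of silence at 22050 Hz stands in for a real response's `audio` field.
silence = b"\x00\x00" * 22050
wav_bytes = pcm_to_wav(silence, sample_rate_hz=22050)
```

The sample rate passed to the helper should match the `sample_rate_hz` sent in the request; otherwise playback speed will be wrong.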

message SynthesizeSpeechResponseMetadata
string text

Experimental API addition that returns the input text after preprocessing has completed, along with the predicted duration of each token. Note: this message is subject to breaking changes and possible removal in the future.

string processed_text
repeated float predicted_durations
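
As one possible use of the per-token durations, the sketch below turns them into cumulative start and end offsets. This section does not say how `processed_text` is split into tokens or what unit the durations use, so the caller supplies the token list and the offsets stay in whatever unit the service reports. `token_offsets` is a hypothetical helper, and the sample values are invented for illustration.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def token_offsets(
    tokens: Sequence[str], durations: Sequence[float]
) -> List[Tuple[str, float, float]]:
    """Pair each token with its (start, end) offset, in the durations' own unit.

    The tokenization is supplied by the caller; it is not derived here.
    """
    if len(tokens) != len(durations):
        raise ValueError("tokens and durations must have the same length")
    ends = list(accumulate(durations))   # running total gives each token's end
    starts = [0.0] + ends[:-1]           # each token starts where the last ended
    return list(zip(tokens, starts, ends))


# Invented sample values; a real client would read them from the `meta` field.
timeline = token_offsets(["h", "i"], [0.5, 0.25])
```

Because the metadata message is marked experimental, code along these lines should tolerate the fields being empty or absent.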