riva/proto/riva_tts.proto

service RivaSpeechSynthesis

rpc SynthesizeSpeechResponse Synthesize(SynthesizeSpeechRequest): Used to request text-to-speech from the service. Submit a request containing the desired text and configuration, and receive audio bytes in the requested format.

rpc stream SynthesizeSpeechResponse SynthesizeOnline(SynthesizeSpeechRequest): Used to request text-to-speech that is returned as a stream as the audio becomes available. Submit a SynthesizeSpeechRequest with the desired text and configuration, and receive a stream of bytes in the requested format.
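The two RPCs differ only in how the audio arrives: Synthesize returns a single response, while SynthesizeOnline yields a sequence of responses. The following is a minimal sketch of how a caller might consume each shape. The functions `fake_synthesize` and `fake_synthesize_online` are hypothetical stand-ins that return placeholder bytes; they are not part of any Riva client, and whether the real service produces byte-identical audio across the two RPCs is not stated here.

```python
from typing import Iterator


def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for the unary Synthesize RPC: all audio at once."""
    # Two placeholder bytes per character stand in for real audio samples.
    return b"".join(bytes([ord(c) % 256, 0]) for c in text)


def fake_synthesize_online(text: str, chunk_size: int = 8) -> Iterator[bytes]:
    """Hypothetical stand-in for the streaming SynthesizeOnline RPC."""
    audio = fake_synthesize(text)
    # Yield the audio in fixed-size pieces, as a stream of responses would.
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def collect_stream(chunks: Iterator[bytes]) -> bytes:
    """Append each streamed chunk to a buffer as it arrives."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
    return bytes(buffer)


whole = fake_synthesize("hello world")
streamed = collect_stream(fake_synthesize_online("hello world"))
```

With a streaming call, playback or further processing can begin on the first chunk instead of waiting for the whole buffer, which is the main reason to prefer SynthesizeOnline for long inputs.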

message SynthesizeSpeechRequest

string text

string language_code

nvidia.riva.AudioEncoding encoding: audio encoding params

int32 sample_rate_hz

string voice_name: voice params
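As an illustration of how the request fields fit together, here is a small sketch that mirrors the message with a plain Python dataclass. The class name `SynthesizeSpeechRequestSketch` is hypothetical, and every value shown ("en-US", "LINEAR_PCM", 22050, "example-voice") is an assumed placeholder; the real `encoding` field is an `nvidia.riva.AudioEncoding` enum, represented here as a string for simplicity.

```python
from dataclasses import dataclass, asdict


@dataclass
class SynthesizeSpeechRequestSketch:
    """Hypothetical mirror of SynthesizeSpeechRequest, for illustration only."""
    text: str            # text to synthesize
    language_code: str   # assumed to be a tag such as "en-US"
    encoding: str        # stands in for the nvidia.riva.AudioEncoding enum
    sample_rate_hz: int  # requested output sample rate
    voice_name: str      # which voice to use


request = SynthesizeSpeechRequestSketch(
    text="Hello, world.",
    language_code="en-US",       # assumed value
    encoding="LINEAR_PCM",       # assumed value
    sample_rate_hz=22050,        # assumed value
    voice_name="example-voice",  # placeholder, not a real voice name
)

# A plain dict of the fields, in declaration order.
request_fields = asdict(request)
```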

message SynthesizeSpeechResponse

bytes audio

SynthesizeSpeechResponseMetadata meta
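The `audio` field carries raw bytes in the requested encoding, so it usually needs a container before it can be played. The sketch below wraps the bytes in a WAV header using Python's standard `wave` module. It assumes the audio is uncompressed 16-bit mono PCM, which is an assumption about the request configuration rather than something stated above; the helper name `pcm_to_wav_bytes` is hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(audio: bytes, sample_rate_hz: int,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    Assumes 16-bit mono PCM by default; adjust channels and sample_width
    if the response was requested in a different layout.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)   # bytes per sample
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(audio)
    return buffer.getvalue()


# 100 frames of silence stand in for SynthesizeSpeechResponse.audio.
wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 100, sample_rate_hz=22050)
```

The sample rate passed here should match the `sample_rate_hz` sent in the request, otherwise the audio will play back at the wrong speed.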

message SynthesizeSpeechResponseMetadata: Experimental API addition that returns the input text after preprocessing has been completed, as well as the predicted duration for each token. Note: this message is subject to future breaking changes and potential removal.

string text

string processed_text

repeated float predicted_durations
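One possible use of `predicted_durations` is to derive a start and end offset for each token by accumulating the values. The sketch below assumes the durations are listed in token order and share a single unit; the unit and the exact token alignment are not specified here, and the helper name `token_offsets` is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def token_offsets(predicted_durations: List[float]) -> List[Tuple[float, float]]:
    """Turn per-token durations into (start, end) offsets.

    Assumes the durations are in token order and share one unit.
    """
    ends = list(accumulate(predicted_durations))
    # Each token starts where the previous one ended; the first starts at 0.
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))


# Made-up durations, for illustration only.
offsets = token_offsets([0.5, 0.25, 1.0])
```

Because the message is marked experimental, code that depends on these values should tolerate the field being empty or absent.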