riva/proto/riva_tts.proto

service RivaSpeechSynthesis
rpc SynthesizeSpeechResponse Synthesize(SynthesizeSpeechRequest)

Used to request text-to-speech from the service. Submit a request containing the desired text and configuration, and receive audio bytes in the requested format.
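As a minimal sketch of handling the returned bytes, the example below wraps raw audio in a WAV container using only Python's standard library. It assumes the request asked for 16-bit mono linear PCM, which is an assumption for illustration; the actual layout depends on the `encoding` and `sample_rate_hz` sent in the request. The helper `pcm_to_wav` and the silent placeholder buffer are hypothetical; a real client would pass the `audio` field of the response instead.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate_hz: int,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)        # assumed mono
        wav.setsampwidth(sample_width)    # assumed 16-bit samples (2 bytes)
        wav.setframerate(sample_rate_hz)  # should match the requested sample_rate_hz
        wav.writeframes(pcm)
    return buf.getvalue()


# Placeholder standing in for SynthesizeSpeechResponse.audio:
# 2205 silent 16-bit samples (0.1 s at 22050 Hz).
fake_audio = b"\x00\x00" * 2205
wav_bytes = pcm_to_wav(fake_audio, sample_rate_hz=22050)
```

The resulting `wav_bytes` can be written to a `.wav` file or handed to any player that accepts WAV data.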

rpc stream SynthesizeSpeechResponse SynthesizeOnline(SynthesizeSpeechRequest)

Used to request text-to-speech that is returned as a stream as the audio becomes available. Submit a SynthesizeSpeechRequest containing the desired text and configuration, and receive a stream of audio bytes in the requested format.
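The streaming call yields a sequence of SynthesizeSpeechResponse messages, each carrying a chunk of audio. A minimal sketch of consuming such a stream is shown below. `FakeResponse` and `fake_stream` are hypothetical stand-ins so the example runs without a server; with a real gRPC stub, the iterator returned by the streaming call would take the place of `fake_stream()`.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeResponse:
    """Hypothetical stand-in for SynthesizeSpeechResponse (audio field only)."""
    audio: bytes


def collect_audio(responses: Iterable[FakeResponse]) -> bytes:
    """Append each chunk in arrival order and return the full audio buffer."""
    buf = bytearray()
    for response in responses:
        # A low-latency client could hand each chunk to an audio device here
        # instead of waiting for the stream to finish.
        buf.extend(response.audio)
    return bytes(buf)


def fake_stream() -> Iterator[FakeResponse]:
    """Simulates a server stream that yields three small chunks."""
    for chunk in (b"\x01\x02", b"\x03\x04", b"\x05\x06"):
        yield FakeResponse(audio=chunk)


streamed = collect_audio(fake_stream())
```

Buffering everything is the simplest approach; playing chunks as they arrive is what makes the streaming variant useful for interactive applications.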

rpc RivaSynthesisConfigResponse GetRivaSynthesisConfig(RivaSynthesisConfigRequest)

Enables clients to request the configuration of the current speech synthesis service, or of a specific model within the service.

message RivaSynthesisConfigRequest
string model_name

If a model name is specified, only the configuration for that model is returned; otherwise, the configurations of all models are returned.

message RivaSynthesisConfigResponse
RivaSynthesisConfigResponse.Config model_config (repeated)
message RivaSynthesisConfigResponse.Config
string model_name
RivaSynthesisConfigResponse.Config.ParametersEntry parameters (repeated)
message RivaSynthesisConfigResponse.Config.ParametersEntry
string key
string value
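The `ParametersEntry` message with `key` and `value` fields is how protobuf documentation renders a `map<string, string>` field, so each `Config` carries a model name plus a string-to-string parameter map. As a rough sketch, the response can be flattened into a nested dictionary keyed by model name. The dataclasses below are hypothetical stand-ins for the generated message classes, and the model names and parameter keys are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeConfig:
    """Hypothetical stand-in for RivaSynthesisConfigResponse.Config."""
    model_name: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class FakeConfigResponse:
    """Hypothetical stand-in for RivaSynthesisConfigResponse."""
    model_config: List[FakeConfig] = field(default_factory=list)


def configs_by_model(response: FakeConfigResponse) -> Dict[str, Dict[str, str]]:
    """Index the repeated model_config field by model_name."""
    return {cfg.model_name: dict(cfg.parameters) for cfg in response.model_config}


# Illustrative data only; real model names and parameter keys come from the server.
response = FakeConfigResponse(model_config=[
    FakeConfig("model_a", {"example_key": "example_value"}),
    FakeConfig("model_b", {}),
])
by_model = configs_by_model(response)
```

Because the parameter values are strings, any numeric setting would need to be parsed by the client before use.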
message SynthesizeSpeechRequest
string text
string language_code
nvidia.riva.AudioEncoding encoding

Audio encoding parameters.

int32 sample_rate_hz
string voice_name

Voice parameters.

message SynthesizeSpeechResponse
bytes audio
SynthesizeSpeechResponseMetadata meta
message SynthesizeSpeechResponseMetadata
string text

Experimental API addition that returns the input text after preprocessing has completed, along with the predicted duration of each token. Note: this message is subject to future breaking changes and possible removal.

string processed_text
float predicted_durations (repeated)
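Since the metadata is experimental, client code should probably treat it as optional. A minimal sketch of summarizing it follows, assuming only the three fields listed above. `FakeMetadata` is a hypothetical stand-in for the generated class, the sample values are invented, and the unit of `predicted_durations` is not stated in this reference, so the total is reported without a unit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for SynthesizeSpeechResponseMetadata."""
    text: str = ""
    processed_text: str = ""
    predicted_durations: List[float] = field(default_factory=list)


def summarize(meta: FakeMetadata) -> dict:
    """Return the token count and the summed predicted duration (unit unspecified)."""
    return {
        "processed_text": meta.processed_text,
        "num_tokens": len(meta.predicted_durations),
        "total_duration": sum(meta.predicted_durations),
    }


# Invented sample values, for illustration only.
meta = FakeMetadata(
    text="Hi there",
    processed_text="hi there",
    predicted_durations=[0.5, 0.25, 0.25],
)
summary = summarize(meta)
```

An empty `predicted_durations` list yields a token count of 0 and a total of 0, which covers responses that carry no metadata.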