riva/proto/riva_tts.proto


service RivaSpeechSynthesis

rpc SynthesizeSpeechResponse Synthesize(SynthesizeSpeechRequest): Requests text-to-speech synthesis from the service. Submit a request containing the desired text and configuration, and receive the audio bytes in the requested format.
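A minimal Python sketch of the unary call, under stated assumptions: `stub` is a client stub generated from this file by the gRPC Python plugin (which by convention would be named `RivaSpeechSynthesisStub`), and the request asks for linear PCM audio. The helper names `pcm_to_wav` and `synthesize_to_wav` are hypothetical, and the 16-bit mono sample layout is an assumption that this file does not state.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate_hz: int) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file as bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono output
        wav.setsampwidth(2)   # assumption: 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


def synthesize_to_wav(stub, request) -> bytes:
    """Call the unary Synthesize RPC and return the audio as a WAV file."""
    response = stub.Synthesize(request)  # SynthesizeSpeechResponse
    return pcm_to_wav(response.audio, request.sample_rate_hz)
```

Passing the stub and request in as arguments keeps the helper independent of where the generated modules live in a given installation.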

rpc stream SynthesizeSpeechResponse SynthesizeOnline(SynthesizeSpeechRequest): Requests text-to-speech synthesis with the audio returned as a stream as it becomes available. Submit a SynthesizeSpeechRequest with the desired text and configuration, and receive a stream of audio bytes in the requested format.
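A server-streaming RPC is typically consumed in Python by iterating over the call result. The sketch below assumes a generated stub whose `SynthesizeOnline` method returns an iterable of SynthesizeSpeechResponse messages; `iter_audio_chunks` and `collect_stream` are hypothetical helper names, not part of this API.

```python
from typing import Iterable, Iterator


def iter_audio_chunks(responses: Iterable) -> Iterator[bytes]:
    """Yield the audio payload of each streamed response, skipping empty ones."""
    for response in responses:
        if response.audio:
            yield response.audio


def collect_stream(stub, request) -> bytes:
    """Run SynthesizeOnline to completion and return all audio bytes joined."""
    return b"".join(iter_audio_chunks(stub.SynthesizeOnline(request)))
```

A latency-sensitive client would more likely hand each chunk from `iter_audio_chunks` to an audio device as it arrives; `collect_stream` only shows the simplest way to drain the stream.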

rpc RivaSynthesisConfigResponse GetRivaSynthesisConfig(RivaSynthesisConfigRequest): Enables clients to request the configuration of the current Synthesize service, or a specific model within the service.

message RivaSynthesisConfigRequest

string model_name: If a model name is specified, only the configuration for that model is returned; otherwise, the configurations of all models are returned.

message RivaSynthesisConfigResponse

RivaSynthesisConfigResponse.Config model_config (repeated)

message RivaSynthesisConfigResponse.Config

string model_name

RivaSynthesisConfigResponse.Config.ParametersEntry parameters (repeated)

message RivaSynthesisConfigResponse.Config.ParametersEntry

string key

string value
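The `ParametersEntry` message with `key` and `value` fields is the shape protobuf generates for a map field, so in Python `parameters` behaves like a mapping from string to string. As a sketch, a RivaSynthesisConfigResponse could be flattened into plain dictionaries like this; `configs_by_model` is a hypothetical helper name.

```python
def configs_by_model(response) -> dict:
    """Map each model_name to a plain dict copy of its parameters."""
    return {
        config.model_name: dict(config.parameters)
        for config in response.model_config
    }
```

If two entries shared a `model_name`, the later one would overwrite the earlier one in this sketch.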

message SynthesizeSpeechRequest

string text

string language_code

nvidia.riva.AudioEncoding encoding: Audio encoding parameters.

int32 sample_rate_hz

string voice_name: Voice parameters.
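Generated Python message classes accept their fields as keyword arguments, so a request can be assembled in one call. The sketch below takes the message class as a parameter (`request_cls`, expected to be the generated `SynthesizeSpeechRequest` class). The default values and the client-side checks are illustrative assumptions only; this file does not define defaults or validation rules.

```python
def make_request(request_cls, text, encoding,
                 language_code="en-US",   # assumed default, not from the proto
                 sample_rate_hz=22050,    # assumed default, not from the proto
                 voice_name=""):
    """Build a SynthesizeSpeechRequest after basic client-side sanity checks."""
    if not text:
        raise ValueError("text must be a non-empty string")
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return request_cls(
        text=text,
        language_code=language_code,
        encoding=encoding,            # a nvidia.riva.AudioEncoding enum value
        sample_rate_hz=sample_rate_hz,
        voice_name=voice_name,
    )
```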

message SynthesizeSpeechResponse

bytes audio

SynthesizeSpeechResponseMetadata meta

message SynthesizeSpeechResponseMetadata

Currently an experimental API addition that returns the input text after preprocessing has completed, as well as the predicted duration of each token. Note: this message is subject to future breaking changes and potential removal.

string text

string processed_text

float predicted_durations (repeated)