riva/proto/riva_tts.proto
-
service RivaSpeechSynthesis
- rpc SynthesizeSpeechResponse Synthesize(SynthesizeSpeechRequest)
Used to request text-to-speech from the service. Submit a request containing the desired text and configuration, and receive audio bytes in the requested format.
- rpc stream SynthesizeSpeechResponse SynthesizeOnline(SynthesizeSpeechRequest)
Used to request text-to-speech that is returned as a stream as audio becomes available. Submit a SynthesizeSpeechRequest with the desired text and configuration, and receive a stream of audio bytes in the requested format.
- rpc RivaSynthesisConfigResponse GetRivaSynthesisConfig(RivaSynthesisConfigRequest)
Enables clients to request the configuration of the current Synthesize service, or a specific model within the service.
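Since SynthesizeOnline returns a stream of SynthesizeSpeechResponse messages, a client typically needs to reassemble the audio itself. The following is a minimal sketch of that step only, not the actual client API: `FakeResponse` and `collect_audio` are hypothetical names, and `FakeResponse` stands in for the generated response class. It also assumes that appending each chunk's `audio` bytes in arrival order yields the full audio, which is plausible for raw PCM but is not stated by this reference.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeResponse:
    """Hypothetical stand-in for SynthesizeSpeechResponse (audio field only)."""
    audio: bytes


def collect_audio(responses: Iterable[FakeResponse]) -> bytes:
    """Concatenate the audio field of each streamed response, in arrival order."""
    return b"".join(r.audio for r in responses)


# Simulated stream of three chunks; a real stream would come from the RPC.
stream = iter([FakeResponse(b"\x01\x02"), FakeResponse(b"\x03"), FakeResponse(b"")])
full_audio = collect_audio(stream)
```

With the real service, the iterator would be whatever the streaming call returns; the joining logic would stay the same under the assumption above.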
-
message RivaSynthesisConfigRequest
-
string model_name
If a model name is specified, only the config for that model is returned; otherwise, all configs are returned.
-
message RivaSynthesisConfigResponse
- RivaSynthesisConfigResponse.Config model_config (repeated)
- message RivaSynthesisConfigResponse.Config
-
string model_name
- RivaSynthesisConfigResponse.Config.ParametersEntry parameters (repeated)
- message RivaSynthesisConfigResponse.Config.ParametersEntry
-
string key
-
string value
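The response carries one Config per model, each with a model name and key/value parameters. A minimal sketch of indexing such a response by model name follows. `FakeConfig` and `index_by_model` are hypothetical names, the parameter key and value shown are made up for illustration, and the sketch assumes the `parameters` entries can be treated as a string-to-string mapping (the `ParametersEntry` key/value shape suggests this, but it is an assumption here).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class FakeConfig:
    """Hypothetical stand-in for RivaSynthesisConfigResponse.Config."""
    model_name: str
    parameters: Dict[str, str] = field(default_factory=dict)


def index_by_model(configs: Iterable[FakeConfig]) -> Dict[str, Dict[str, str]]:
    """Build {model_name: {key: value}} from the repeated model_config field."""
    return {c.model_name: dict(c.parameters) for c in configs}


# Illustrative data only; real keys and values come from the service.
configs = [
    FakeConfig("model_a", {"example_key": "example_value"}),
    FakeConfig("model_b"),
]
by_model = index_by_model(configs)
```

Copying each mapping with `dict(...)` keeps the result independent of the response object it was read from.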
-
message SynthesizeSpeechRequest
-
string text
-
string language_code
- nvidia.riva.AudioEncoding encoding
audio encoding params
-
int32 sample_rate_hz
-
string voice_name
voice params
-
ZeroShotData zero_shot_data
Zero Shot model params
- nvidia.riva.RequestId id
The ID to be associated with the request. If provided, this will be returned in the corresponding response.
-
message SynthesizeSpeechResponse
-
bytes audio
- nvidia.riva.RequestId id
The ID associated with the request
-
message SynthesizeSpeechResponseMetadata
-
string text
Experimental API addition that returns the input text after preprocessing has been completed, along with the predicted duration of each token. Note: this message is subject to future breaking changes and possible removal.
-
string processed_text
-
float predicted_durations (repeated)
-
message ZeroShotData
Required for Zero Shot model
-
bytes audio_prompt
Short (up to 5 seconds) audio prompt for the Zero Shot model.
-
int32 sample_rate_hz
Sample rate of the input audio prompt. Currently defaults to 22050 Hz.
- nvidia.riva.AudioEncoding encoding
Encoding of the audio prompt. Defaults to PCM.
-
int32 quality
Number of times to pass the audio through the decoder. Valid range is 1 to 40. Defaults to 20.
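The limits stated for ZeroShotData (prompt of up to 5 seconds, quality between 1 and 40, default sample rate of 22050 and default quality of 20) can be checked on the client before sending a request. The following is a minimal sketch of such a check. `check_zero_shot` and the constants are hypothetical names, and the duration estimate assumes a 16-bit mono PCM prompt, which this reference does not specify. Whether the server enforces these limits itself is not stated here.

```python
from typing import List

DEFAULT_SAMPLE_RATE_HZ = 22050  # documented default for the prompt sample rate
DEFAULT_QUALITY = 20            # documented default for quality
MAX_PROMPT_SECONDS = 5.0        # documented upper bound on prompt length
BYTES_PER_SAMPLE = 2            # assumption: 16-bit mono PCM


def check_zero_shot(
    audio_prompt: bytes,
    sample_rate_hz: int = DEFAULT_SAMPLE_RATE_HZ,
    quality: int = DEFAULT_QUALITY,
) -> List[str]:
    """Return a list of problems found; an empty list means the values look valid."""
    problems: List[str] = []
    if not audio_prompt:
        problems.append("audio_prompt is empty")
    if sample_rate_hz <= 0:
        problems.append("sample_rate_hz must be positive")
    else:
        # Estimate duration from byte length under the PCM assumption above.
        seconds = len(audio_prompt) / (BYTES_PER_SAMPLE * sample_rate_hz)
        if seconds > MAX_PROMPT_SECONDS:
            problems.append(f"audio_prompt is about {seconds:.2f} s; limit is 5 s")
    if not 1 <= quality <= 40:
        problems.append("quality must be between 1 and 40")
    return problems


one_second = b"\x00" * (BYTES_PER_SAMPLE * DEFAULT_SAMPLE_RATE_HZ)
ok = check_zero_shot(one_second)
too_long = check_zero_shot(one_second * 6)
bad_quality = check_zero_shot(one_second, quality=41)
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.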