API Reference#

riva/proto/health.proto#

HealthCheckRequest#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| service | string | | |

HealthCheckResponse#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| status | HealthCheckResponse.ServingStatus | | |

HealthCheckResponse.ServingStatus#

| Name | Number | Description |
| --- | --- | --- |
| UNKNOWN | 0 | |
| SERVING | 1 | |
| NOT_SERVING | 2 | |

Health#

| Method Name | Request Type | Response Type | Description |
| --- | --- | --- | --- |
| Check | HealthCheckRequest | HealthCheckResponse | |
| Watch | HealthCheckRequest | HealthCheckResponse stream | |

riva/proto/riva_audio.proto#

AudioEncoding#

AudioEncoding specifies the encoding of the audio bytes in the encapsulating message.

| Name | Number | Description |
| --- | --- | --- |
| ENCODING_UNSPECIFIED | 0 | Not specified. |
| LINEAR_PCM | 1 | Uncompressed 16-bit signed little-endian samples (Linear PCM). |
| FLAC | 2 | FLAC (Free Lossless Audio Codec) is the recommended encoding because it is lossless, so recognition is not compromised, and it requires only about half the bandwidth of LINEAR_PCM. FLAC stream encoding supports 16-bit and 24-bit samples; however, not all fields in STREAMINFO are supported. |
| MULAW | 3 | 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law. |
| OGGOPUS | 4 | |
| ALAW | 20 | 8-bit samples that compand 13-bit audio samples using G.711 PCMA/A-law. |
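
To make the LINEAR_PCM description concrete, the following minimal sketch packs floating-point samples into 16-bit signed little-endian bytes using only the Python standard library. The helper name `floats_to_linear_pcm` and the choice to scale by 32767 are assumptions made for illustration; they are not part of the Riva API.

```python
import struct


def floats_to_linear_pcm(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes.

    Hypothetical helper for illustration only; not part of the Riva API.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip to the representable range
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = floats_to_linear_pcm([0.0, 1.0, -1.0])
print(pcm.hex())  # two bytes per sample
```

Each sample occupies exactly two bytes, so one second of mono audio at a given sample rate takes twice that many bytes.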

riva/proto/riva_common.proto#

RequestId#

Specifies the request ID of the request.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value | string | | |

riva/proto/riva_tts.proto#

RivaSynthesisConfigRequest#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| model_name | string | | If a model name is specified, return only that model's config; otherwise, return all configs. |

RivaSynthesisConfigResponse#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| model_config | RivaSynthesisConfigResponse.Config | repeated | |

RivaSynthesisConfigResponse.Config#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| model_name | string | | |
| parameters | RivaSynthesisConfigResponse.Config.ParametersEntry | repeated | |

RivaSynthesisConfigResponse.Config.ParametersEntry#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string | | |
| value | string | | |

SynthesizeSpeechRequest#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| text | string | | |
| language_code | string | | |
| encoding | nvidia.riva.AudioEncoding | | Audio encoding params. |
| sample_rate_hz | int32 | | |
| voice_name | string | | Voice params. |
| zero_shot_data | ZeroShotData | | Zero Shot model params. |
| custom_dictionary | string | | A string of comma-separated key-value pairs, each consisting of a grapheme and its corresponding phoneme separated by two spaces. |
| id | nvidia.riva.RequestId | | The ID to be associated with the request. If provided, this will be returned in the corresponding response. |
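
The `custom_dictionary` format described above can be sketched as a small builder. This is a minimal illustration that assumes the layout is exactly `grapheme`, two spaces, `phoneme`, with entries joined by commas, as the field description suggests. The function name `build_custom_dictionary` and the placeholder entries are hypothetical, and the phoneme notation the service expects is not specified here.

```python
def build_custom_dictionary(pronunciations):
    """Join grapheme/phoneme pairs into a single custom_dictionary string.

    Assumed layout (inferred from the field description, not verified):
        "<grapheme>  <phoneme>,<grapheme>  <phoneme>,..."
    """
    entries = []
    for grapheme, phoneme in pronunciations.items():
        for part in (grapheme, phoneme):
            # Commas and double spaces are delimiters, so reject them in values
            if "," in part or "  " in part:
                raise ValueError(f"delimiter found inside entry: {part!r}")
        entries.append(f"{grapheme}  {phoneme}")
    return ",".join(entries)


# Placeholder values; real phoneme strings depend on the deployed model.
print(build_custom_dictionary({"grapheme_a": "phoneme_a", "grapheme_b": "phoneme_b"}))
```

Rejecting delimiter characters inside entries keeps the resulting string unambiguous to parse.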

SynthesizeSpeechResponse#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| audio | bytes | | |
| meta | SynthesizeSpeechResponseMetadata | | |
| id | nvidia.riva.RequestId | | The ID associated with the request. |

SynthesizeSpeechResponseMetadata#

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| text | string | | Experimental API addition that returns the input text after preprocessing, as well as the predicted duration for each token. Note: this message is subject to future breaking changes and potential removal. |
| processed_text | string | | |
| predicted_durations | float | repeated | |

ZeroShotData#

Required for Zero Shot model

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| audio_prompt | bytes | | Audio prompt for the Zero Shot model. Duration should be between 3 and 10 seconds. |
| sample_rate_hz | int32 | | Sample rate of the input audio prompt. |
| encoding | nvidia.riva.AudioEncoding | | Encoding of the audio prompt. Supported encodings are LINEAR_PCM and OGGOPUS. |
| quality | int32 | | Number of times to pass the audio through the decoder. Ranges from 1 to 40; defaults to 20. |
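
The constraints listed for ZeroShotData can be checked on the client before sending a request. The sketch below is illustrative only: `ZeroShotParams` is a hypothetical stand-in for the generated protobuf message, and the duration calculation assumes a mono LINEAR_PCM prompt (2 bytes per sample). Whether the service itself rejects out-of-range values is not stated in this reference.

```python
from dataclasses import dataclass

MIN_PROMPT_SECONDS = 3.0
MAX_PROMPT_SECONDS = 10.0
MIN_QUALITY, MAX_QUALITY, DEFAULT_QUALITY = 1, 40, 20


@dataclass
class ZeroShotParams:
    """Hypothetical stand-in for ZeroShotData; not the generated class."""
    audio_prompt: bytes
    sample_rate_hz: int
    quality: int = DEFAULT_QUALITY


def validate_zero_shot(params, bytes_per_sample=2):
    """Return the prompt duration in seconds, or raise ValueError.

    Assumes a mono LINEAR_PCM prompt, i.e. 2 bytes per sample.
    """
    if params.sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    duration = len(params.audio_prompt) / (bytes_per_sample * params.sample_rate_hz)
    if not MIN_PROMPT_SECONDS <= duration <= MAX_PROMPT_SECONDS:
        raise ValueError(f"prompt is {duration:.2f} s; expected 3 to 10 s")
    if not MIN_QUALITY <= params.quality <= MAX_QUALITY:
        raise ValueError(f"quality is {params.quality}; expected 1 to 40")
    return duration


# Five seconds of silence at 16 kHz, using the default quality of 20
silence = bytes(2 * 16000 * 5)
print(validate_zero_shot(ZeroShotParams(silence, 16000)))
```

An OGGOPUS prompt would need to be decoded first, since its byte length does not map directly to duration.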

RivaSpeechSynthesis#

| Method Name | Request Type | Response Type | Description |
| --- | --- | --- | --- |
| Synthesize | SynthesizeSpeechRequest | SynthesizeSpeechResponse | Used to request text-to-speech from the service. Submit a request containing the desired text and configuration, and receive audio bytes in the requested format. |
| SynthesizeOnline | SynthesizeSpeechRequest | SynthesizeSpeechResponse stream | Used to request text-to-speech returned via a stream as it becomes available. Submit a SynthesizeSpeechRequest with the desired text and configuration, and receive a stream of bytes in the requested format. |
| GetRivaSynthesisConfig | RivaSynthesisConfigRequest | RivaSynthesisConfigResponse | Enables clients to request the configuration of the current Synthesize service, or of a specific model within the service. |
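
Because SynthesizeOnline returns a stream of SynthesizeSpeechResponse messages, a client typically consumes the `audio` field chunk by chunk. The sketch below shows one way to reassemble the chunks without any network code. `AudioChunk` and `fake_stream` are hypothetical stand-ins for the generated response class and the gRPC response iterator, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class AudioChunk:
    """Hypothetical stand-in for SynthesizeSpeechResponse (audio field only)."""
    audio: bytes


def collect_audio(responses: Iterable[AudioChunk]) -> bytes:
    """Concatenate the audio bytes of each streamed response, in arrival order."""
    return b"".join(response.audio for response in responses)


def fake_stream() -> Iterator[AudioChunk]:
    """Simulates a response stream; a real client would iterate the gRPC call."""
    for payload in (b"\x00\x01", b"\x02\x03", b"\x04"):
        yield AudioChunk(audio=payload)


print(collect_audio(fake_stream()))
```

A latency-sensitive client would more likely hand each chunk to an audio sink as it arrives rather than buffering the whole utterance.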

Scalar Value Types#

| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| double | | double | double | float | float64 | double | float | Float |
| float | | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |
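
The notes on int32 and sint32 can be illustrated by counting encoded bytes. The sketch below reimplements the standard protobuf varint and ZigZag rules in plain Python; the helper names are chosen for this example and are not part of any protobuf library.

```python
def varint_len(value):
    """Number of bytes needed to varint-encode a non-negative integer."""
    length = 1
    while value >= 0x80:  # each varint byte carries 7 payload bits
        value >>= 7
        length += 1
    return length


def int32_wire_len(n):
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return varint_len(n & 0xFFFFFFFFFFFFFFFF)


def zigzag32(n):
    """ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... for sint32."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def sint32_wire_len(n):
    """sint32: ZigZag first, then varint, so small magnitudes stay small."""
    return varint_len(zigzag32(n))


for n in (1, -1, 300, -300):
    print(n, int32_wire_len(n), sint32_wire_len(n))
```

A negative int32 always occupies ten bytes on the wire, while the same value as sint32 takes a number of bytes proportional to its magnitude.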