Custom Models#
Model Deployment#
Like all Riva models, deploying Riva TTS requires the following steps:

1. Create a .riva file for each model from either a .tao or a .nemo file, as outlined in the respective Model Development with TAO Toolkit and NeMo sections.
2. Create a .rmir file for each Riva skill (for example, ASR, NLP, and TTS) using riva-build.
3. Create model directories using riva-deploy.
4. Deploy the model directory using riva_server.

The following sections provide examples for specific steps as outlined above.
Creating Riva Files#
Riva files can be created from .nemo or .tao files. As mentioned in the respective TAO and NeMo sections, the generation of Riva files from .nemo or .tao files must be done on a Linux x86_64 workstation.
The following is an example of how a HiFi-GAN model can be converted from a .nemo file to a .riva file.

1. Download the .nemo file from NGC onto the host system.
2. Run the NeMo container, sharing the .nemo file with the container using the -v option.
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/tts_hifigan/versions/1.0.0rc1/zip -O tts_hifigan_1.0.0rc1.zip
unzip tts_hifigan_1.0.0rc1.zip
docker run --gpus all -it --rm \
-v $(pwd):/NeMo \
--shm-size=8g \
-p 8888:8888 \
-p 6006:6006 \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--device=/dev/snd \
nvcr.io/nvidia/nemo:1.4.0
After the container has launched, use nemo2riva to convert the .nemo file to a .riva file:
pip3 install nvidia-pyindex
VersionNum=2.4.0
ngc registry resource download-version "nvidia/riva/riva_quickstart:$VersionNum"
pip3 install "riva_quickstart_v$VersionNum/nemo2riva-$VersionNum-py3-none-any.whl"
nemo2riva --out /NeMo/hifigan.riva /NeMo/tts_hifigan.nemo
Repeat this process for each .nemo model to generate .riva files. It is suggested that you do so for FastPitch before continuing to the next step. Be sure that you are using the latest tts_hifigan.nemo checkpoint, the latest nvcr.io/nvidia/nemo container version, and the latest nemo2riva-{version}_beta-py3-none-any.whl version when performing the above steps.
Note
Tacotron2 is kept as a .nemo file and is not supported by the nemo2riva tool.
Note
WaveGlow models built with newer NeMo versions do not work with the nemo2riva tool or the riva-build tool.
Refer to the Known Issues section of the release notes (release-notes.html#riva-speech-skills-1-10-0-beta) for more information.
Custom Pronunciations#
Speech synthesis models deployed in Riva are configured with a language-specific pronunciation dictionary mapping a large vocabulary of words from their written form (graphemes) to a sequence of perceptually distinct sounds (phonemes). In cases where pronunciation is ambiguous, for example with heteronyms like bass (the fish) and bass (the musical instrument), the dictionary is ignored and the synthesis model uses context clues from the sentence to predict an appropriate pronunciation.
Modern speech synthesis models are remarkably good at predicting pronunciations of novel words. Sometimes, however, it is desirable or necessary to provide extra context to the model.
While custom pronunciations can be supplied at request time using SSML, request-time overrides are best suited for one-off adjustments. For domain-specific terms with fixed pronunciations, configure Riva with these pronunciations when deploying the server.
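As an illustration, a request-time SSML override might be composed as in the following sketch. Both the alphabet value "x-arpabet" and the {@PHONE} phone notation are assumptions here, not confirmed syntax; check the Riva SSML documentation for the exact form your release supports.

```python
# Build an SSML string that overrides the pronunciation of one word at
# request time. The alphabet value "x-arpabet" and the {@PHONE} notation
# are assumptions for illustration only.
word = "tomato"
phones = "{@T}{@AH0}{@M}{@AA1}{@T}{@OW2}"
ssml = (
    "<speak>I say "
    f'<phoneme alphabet="x-arpabet" ph="{phones}">{word}</phoneme>'
    "</speak>"
)
print(ssml)
```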
There are two key parameters that can be configured through riva-build or in the preprocessor configuration that affect the phoneme path:

--arpabet_file is the path to the pronunciation dictionary. English language models use phonemes defined in ARPABET, and CMUdict is the default pronunciation dictionary; a starting dictionary is available on NGC. Modify this dictionary with custom entries as needed.
--preprocessor.g2p_ignore_ambiguous, if True, means that words with more than one phonetic representation in the pronunciation dictionary, such as "read", are not converted to phonemes. Defaults to True.
To determine the appropriate phoneme sequence, use the SSML API to experiment with phone sequences and evaluate the quality. Once the mapping sounds correct, add the discovered mapping to a new line in the dictionary.
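For example, a discovered mapping can be appended as a new dictionary line. The word and ARPABET phone sequence below are made up for illustration; the filename matches the cmudict-0.7b-nv0.01 dictionary referenced elsewhere on this page.

```shell
# Append a hypothetical custom entry (word, then space-separated ARPABET
# phones with stress markers) to a local copy of the pronunciation dictionary.
echo "RIVA  R IY1 V AH0" >> cmudict-0.7b-nv0.01
```

Rebuilding with --arpabet_file pointing at the modified file picks up the new entry.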
Multispeaker Models#
Riva supports models with multiple speakers. Currently, this feature is limited to FastPitch and HiFi-GAN models.
To enable this feature, specify the following parameters before building the model.

--voice_name is the name of the model. Defaults to English-US-Female-1.
--subvoices is a comma-separated list of names for each subvoice, with the length equal to the number of subvoices in the FastPitch model. For example, for a model with a "male" subvoice in the zeroth speaker embedding and a "female" subvoice in the first embedding, include the option --subvoices=Male:0,Female:1. If not provided, the desired embedding can be requested by integer index.
The voice name and subvoices are maintained in the generated .rmir file and carried into the generated Triton repositories. During inference, select a subvoice by appending a period and a valid subvoice name to the voice name of the request, in the form <voice_name>.<subvoice>.
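The <voice_name>.<subvoice> form can be composed programmatically. A minimal sketch, assuming a pipeline built with the hypothetical values --voice_name=English-US-Multispeaker and --subvoices=Male:0,Female:1:

```python
# Compose the voice name for a multispeaker synthesis request.
# "English-US-Multispeaker" and the subvoice name are hypothetical values
# chosen at riva-build time via --voice_name and --subvoices.
voice_name = "English-US-Multispeaker"
subvoice = "Male"
request_voice = f"{voice_name}.{subvoice}"
print(request_voice)  # English-US-Multispeaker.Male
```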
Custom Voice#
Riva is voice agnostic and can be run with any English-US TTS voice. In order to train a custom voice model, data must first be collected; we recommend at least 30 minutes of high-quality data. For collecting the data, refer to the Riva custom voice recorder. After the data has been collected, the FastPitch and HiFi-GAN models need to be fine-tuned on this dataset. Refer to the NeMo fine-tuning notebook or the TAO fine-tuning notebook for how to train these models. A Riva pipeline using these models can be built according to the instructions on this page.
Pretrained Models#
Task | Architecture | Language | Dataset | Link
---|---|---|---|---
Mel Spectrogram Generation | FastPitch | English | English-US-Female-1 |
Vocoder | HiFi-GAN | English | English-US-Female-1 |
Mel Spectrogram Generation | FastPitch | English | English-US-Male-1 |
Vocoder | HiFi-GAN | English | English-US-Male-1 |
Mel Spectrogram Generation | FastPitch | English | LJSpeech |
Mel Spectrogram Generation | Tacotron2 | English | LJSpeech |
Vocoder | HiFi-GAN | English | LJSpeech |
Vocoder | WaveGlow | English | LJSpeech |
Pipeline Configuration#
FastPitch and HiFi-GAN#
Deploy a FastPitch and HiFi-GAN TTS pipeline as follows from within the ServiceMaker container:
riva-build speech_synthesis \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<fastpitch_riva_filename>:<encryption_key> \
/servicemaker-dev/<hifigan_riva_filename>:<encryption_key> \
--voice_name=<pipeline_name> \
--abbreviations_file=/servicemaker-dev/<abbr_file> \
--arpabet_file=/servicemaker-dev/<dictionary_file> \
--wfst_tokenizer_model=/servicemaker-dev/<tokenizer_far_file> \
--wfst_verbalizer_model=/servicemaker-dev/<verbalizer_far_file> \
where:

<rmir_filename> is the Riva RMIR file that is generated.
<encryption_key> is the key used to encrypt the files. The encryption key for the pretrained Riva models uploaded to NGC is tlt_encode.
<pipeline_name> is an optional user-defined name for the components in the model repository.
<fastpitch_riva_filename> is the name of the .riva file for FastPitch.
<hifigan_riva_filename> is the name of the .riva file for HiFi-GAN.
<abbr_file> is the name of the file containing abbreviations and their corresponding expansions.
<dictionary_file> is the name of the file containing the pronunciation dictionary mapping words to their phonetic representation in ARPABET.
<subvoices> is a comma-separated list of names for each subvoice. Defaults to naming by integer index. This is needed for, and only used by, multi-speaker models.
<wfst_tokenizer_model> is the location of the tokenize_and_classify.far file generated by running NeMo Text Processing's export_grammar.sh script.
<wfst_verbalizer_model> is the location of the verbalize.far file generated by running NeMo Text Processing's export_grammar.sh script.
Upon successful completion of this command, a file named <rmir_filename> is created in the /servicemaker-dev/ folder. If your .riva archives are encrypted, you need to include :<encryption_key> at the end of the RMIR and riva filenames; otherwise, it is unnecessary.
For embedded platforms, a batch size of 1 is recommended since it achieves the lowest memory footprint. To use a batch size of 1, refer to the Riva-build Optional Parameters section and set the various max_batch_size parameters to 1 when executing the riva-build command.
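As a sketch, the relevant flags (all listed under Riva-build Optional Parameters) can be collected once and appended to the riva-build command; which per-component prefixes you need depends on the components in your pipeline.

```shell
# Collect batch-size-1 flags for an embedded FastPitch + HiFi-GAN build.
# The component prefixes used here are the FastPitch/HiFi-GAN ones from
# the option listing on this page; adjust for your pipeline.
BATCH1_FLAGS="--max_batch_size=1 \
  --denoiser.max_batch_size=1 \
  --preprocessor.max_batch_size=1 \
  --encoderFastPitch.max_batch_size=1 \
  --chunkerFastPitch.max_batch_size=1 \
  --hifigan.max_batch_size=1"
echo "$BATCH1_FLAGS"
# Append $BATCH1_FLAGS to the riva-build speech_synthesis command.
```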
Tacotron2 and WaveGlow#
Warning
Tacotron2 and WaveGlow are deprecated and no longer recommended.
In the simplest use case, you can deploy a Tacotron2 or WaveGlow TTS model as follows:
riva-build speech_synthesis \
/servicemaker-dev/<rmir_filename>:<encryption_key> \
/servicemaker-dev/<tacotron_nemo_filename> \
/servicemaker-dev/<waveglow_riva_filename>:<encryption_key> \
--voice_name=<pipeline_name> \
--abbreviations_file=/servicemaker-dev/<abbr_file> \
--arpabet_file=/servicemaker-dev/<dictionary_file> \
--wfst_tokenizer_model=/servicemaker-dev/<tokenizer_far_file> \
--wfst_verbalizer_model=/servicemaker-dev/<verbalizer_far_file> \
where:

<rmir_filename> is the Riva RMIR file that is generated.
<encryption_key> is the encryption key used during the export of the .riva file.
<pipeline_name> is an optional user-defined name for the components in the model repository.
<tacotron_nemo_filename> is the name of the .nemo checkpoint file for Tacotron 2.
<waveglow_riva_filename> is the name of the .riva file for the universal WaveGlow model.
<abbr_file> is the name of the file containing abbreviations and their corresponding expansions.
<dictionary_file> is the name of the file containing the pronunciation dictionary mapping words to their phonetic representation in ARPABET.
<wfst_tokenizer_model> is the location of the tokenize_and_classify.far file generated by running NeMo Text Processing's export_grammar.sh script.
<wfst_verbalizer_model> is the location of the verbalize.far file generated by running NeMo Text Processing's export_grammar.sh script.
Upon successful completion of this command, a file named <rmir_filename> is created in the /servicemaker-dev/ folder. If your .riva archives are encrypted, you need to include :<encryption_key> at the end of the RMIR and riva filenames; otherwise, it is unnecessary.
Pretrained Quick Start Pipelines#
FastPitch + HiFi-GAN Female 1:

riva-build speech_synthesis \
    <rmir_filename>:<key> \
    <fastpitch_riva_filename>:<key> \
    <hifigan_riva_filename>:<key> \
    --sample_rate 44100 \
    --voice_name English-US-Female-1 \
    --arpabet_file=cmudict-0.7b-nv0.01 \
    --abbreviations_file=abbr.txt \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far

FastPitch + HiFi-GAN Male 1:

riva-build speech_synthesis \
    <rmir_filename>:<key> \
    <fastpitch_riva_filename>:<key> \
    <hifigan_riva_filename>:<key> \
    --sample_rate 44100 \
    --voice_name English-US-Male-1 \
    --arpabet_file=cmudict-0.7b-nv0.01 \
    --abbreviations_file=abbr.txt \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far

FastPitch + HiFi-GAN LJSpeech:

riva-build speech_synthesis \
    <rmir_filename>:<key> \
    <fastpitch_riva_filename>:<key> \
    <hifigan_riva_filename>:<key> \
    --voice_name ljspeech \
    --arpabet_file=cmudict-0.7b-nv0.01 \
    --abbreviations_file=abbr.txt \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far

Tacotron2 + WaveGlow:

riva-build speech_synthesis \
    <rmir_filename>:<key> \
    <tacotron_nemo_filename>:<key> \
    <waveglow_riva_filename>:<key> \
    --arpabet_file=cmudict-0.7b-nv0.01 \
    --abbreviations_file=abbr.txt \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far
All text normalization .far
files are in NGC on the Riva TTS English Normalization Grammar page. All other auxiliary files that are not .riva
files (such as pronunciation dictionaries) are in NGC on the Riva TTS English US Auxiliary Files page.
Riva-build Optional Parameters#
For details about the parameters passed to riva-build
to customize the TTS pipeline, issue:
riva-build speech_synthesis -h
The following list includes descriptions for all optional parameters currently recognized by riva-build
:
usage: riva-build speech_synthesis [-h] [-f] [-v]
[--language_code LANGUAGE_CODE]
[--max_batch_size MAX_BATCH_SIZE]
[--voice_name VOICE_NAME]
[--num_speakers NUM_SPEAKERS]
[--subvoices SUBVOICES]
[--sample_rate SAMPLE_RATE]
[--chunk_length CHUNK_LENGTH]
[--overlap_length OVERLAP_LENGTH]
[--num_mels NUM_MELS]
[--num_samples_per_frame NUM_SAMPLES_PER_FRAME]
[--abbreviations_file ABBREVIATIONS_FILE]
[--has_mapping_file HAS_MAPPING_FILE]
[--arpabet_file ARPABET_FILE]
[--wfst_tokenizer_model WFST_TOKENIZER_MODEL]
[--wfst_verbalizer_model WFST_VERBALIZER_MODEL]
[--denoiser.max_sequence_idle_microseconds DENOISER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--denoiser.max_batch_size DENOISER.MAX_BATCH_SIZE]
[--denoiser.min_batch_size DENOISER.MIN_BATCH_SIZE]
[--denoiser.opt_batch_size DENOISER.OPT_BATCH_SIZE]
[--denoiser.preferred_batch_size DENOISER.PREFERRED_BATCH_SIZE]
[--denoiser.batching_type DENOISER.BATCHING_TYPE]
[--denoiser.preserve_ordering DENOISER.PRESERVE_ORDERING]
[--denoiser.instance_group_count DENOISER.INSTANCE_GROUP_COUNT]
[--denoiser.max_queue_delay_microseconds DENOISER.MAX_QUEUE_DELAY_MICROSECONDS]
[--denoiser.fade_length DENOISER.FADE_LENGTH]
[--preprocessor.max_sequence_idle_microseconds PREPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--preprocessor.max_batch_size PREPROCESSOR.MAX_BATCH_SIZE]
[--preprocessor.min_batch_size PREPROCESSOR.MIN_BATCH_SIZE]
[--preprocessor.opt_batch_size PREPROCESSOR.OPT_BATCH_SIZE]
[--preprocessor.preferred_batch_size PREPROCESSOR.PREFERRED_BATCH_SIZE]
[--preprocessor.batching_type PREPROCESSOR.BATCHING_TYPE]
[--preprocessor.preserve_ordering PREPROCESSOR.PRESERVE_ORDERING]
[--preprocessor.instance_group_count PREPROCESSOR.INSTANCE_GROUP_COUNT]
[--preprocessor.max_queue_delay_microseconds PREPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS]
[--preprocessor.mapping_path PREPROCESSOR.MAPPING_PATH]
[--preprocessor.g2p_ignore_ambiguous PREPROCESSOR.G2P_IGNORE_AMBIGUOUS]
[--preprocessor.language PREPROCESSOR.LANGUAGE]
[--preprocessor.max_sequence_length PREPROCESSOR.MAX_SEQUENCE_LENGTH]
[--preprocessor.max_input_length PREPROCESSOR.MAX_INPUT_LENGTH]
[--preprocessor.mapping PREPROCESSOR.MAPPING]
[--preprocessor.tolower PREPROCESSOR.TOLOWER]
[--preprocessor.pad_with_space PREPROCESSOR.PAD_WITH_SPACE]
[--encoder.max_sequence_idle_microseconds ENCODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoder.max_batch_size ENCODER.MAX_BATCH_SIZE]
[--encoder.min_batch_size ENCODER.MIN_BATCH_SIZE]
[--encoder.opt_batch_size ENCODER.OPT_BATCH_SIZE]
[--encoder.preferred_batch_size ENCODER.PREFERRED_BATCH_SIZE]
[--encoder.batching_type ENCODER.BATCHING_TYPE]
[--encoder.preserve_ordering ENCODER.PRESERVE_ORDERING]
[--encoder.instance_group_count ENCODER.INSTANCE_GROUP_COUNT]
[--encoder.max_queue_delay_microseconds ENCODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoder.trt_max_workspace_size ENCODER.TRT_MAX_WORKSPACE_SIZE]
[--encoder.use_onnx_runtime]
[--encoder.use_trt_fp32]
[--encoder.fp16_needs_obey_precision_pass]
[--encoderFastPitch.max_sequence_idle_microseconds ENCODERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--encoderFastPitch.max_batch_size ENCODERFASTPITCH.MAX_BATCH_SIZE]
[--encoderFastPitch.min_batch_size ENCODERFASTPITCH.MIN_BATCH_SIZE]
[--encoderFastPitch.opt_batch_size ENCODERFASTPITCH.OPT_BATCH_SIZE]
[--encoderFastPitch.preferred_batch_size ENCODERFASTPITCH.PREFERRED_BATCH_SIZE]
[--encoderFastPitch.batching_type ENCODERFASTPITCH.BATCHING_TYPE]
[--encoderFastPitch.preserve_ordering ENCODERFASTPITCH.PRESERVE_ORDERING]
[--encoderFastPitch.instance_group_count ENCODERFASTPITCH.INSTANCE_GROUP_COUNT]
[--encoderFastPitch.max_queue_delay_microseconds ENCODERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS]
[--encoderFastPitch.trt_max_workspace_size ENCODERFASTPITCH.TRT_MAX_WORKSPACE_SIZE]
[--encoderFastPitch.use_onnx_runtime]
[--encoderFastPitch.use_trt_fp32]
[--encoderFastPitch.fp16_needs_obey_precision_pass]
[--decoder.max_sequence_idle_microseconds DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--decoder.max_batch_size DECODER.MAX_BATCH_SIZE]
[--decoder.min_batch_size DECODER.MIN_BATCH_SIZE]
[--decoder.opt_batch_size DECODER.OPT_BATCH_SIZE]
[--decoder.preferred_batch_size DECODER.PREFERRED_BATCH_SIZE]
[--decoder.batching_type DECODER.BATCHING_TYPE]
[--decoder.preserve_ordering DECODER.PRESERVE_ORDERING]
[--decoder.instance_group_count DECODER.INSTANCE_GROUP_COUNT]
[--decoder.max_queue_delay_microseconds DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
[--chunkerFastPitch.max_sequence_idle_microseconds CHUNKERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--chunkerFastPitch.max_batch_size CHUNKERFASTPITCH.MAX_BATCH_SIZE]
[--chunkerFastPitch.min_batch_size CHUNKERFASTPITCH.MIN_BATCH_SIZE]
[--chunkerFastPitch.opt_batch_size CHUNKERFASTPITCH.OPT_BATCH_SIZE]
[--chunkerFastPitch.preferred_batch_size CHUNKERFASTPITCH.PREFERRED_BATCH_SIZE]
[--chunkerFastPitch.batching_type CHUNKERFASTPITCH.BATCHING_TYPE]
[--chunkerFastPitch.preserve_ordering CHUNKERFASTPITCH.PRESERVE_ORDERING]
[--chunkerFastPitch.instance_group_count CHUNKERFASTPITCH.INSTANCE_GROUP_COUNT]
[--chunkerFastPitch.max_queue_delay_microseconds CHUNKERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS]
[--waveglow.max_sequence_idle_microseconds WAVEGLOW.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--waveglow.max_batch_size WAVEGLOW.MAX_BATCH_SIZE]
[--waveglow.min_batch_size WAVEGLOW.MIN_BATCH_SIZE]
[--waveglow.opt_batch_size WAVEGLOW.OPT_BATCH_SIZE]
[--waveglow.preferred_batch_size WAVEGLOW.PREFERRED_BATCH_SIZE]
[--waveglow.batching_type WAVEGLOW.BATCHING_TYPE]
[--waveglow.preserve_ordering WAVEGLOW.PRESERVE_ORDERING]
[--waveglow.instance_group_count WAVEGLOW.INSTANCE_GROUP_COUNT]
[--waveglow.max_queue_delay_microseconds WAVEGLOW.MAX_QUEUE_DELAY_MICROSECONDS]
[--waveglow.trt_max_workspace_size WAVEGLOW.TRT_MAX_WORKSPACE_SIZE]
[--waveglow.use_onnx_runtime]
[--waveglow.use_trt_fp32]
[--waveglow.fp16_needs_obey_precision_pass]
[--hifigan.max_sequence_idle_microseconds HIFIGAN.MAX_SEQUENCE_IDLE_MICROSECONDS]
[--hifigan.max_batch_size HIFIGAN.MAX_BATCH_SIZE]
[--hifigan.min_batch_size HIFIGAN.MIN_BATCH_SIZE]
[--hifigan.opt_batch_size HIFIGAN.OPT_BATCH_SIZE]
[--hifigan.preferred_batch_size HIFIGAN.PREFERRED_BATCH_SIZE]
[--hifigan.batching_type HIFIGAN.BATCHING_TYPE]
[--hifigan.preserve_ordering HIFIGAN.PRESERVE_ORDERING]
[--hifigan.instance_group_count HIFIGAN.INSTANCE_GROUP_COUNT]
[--hifigan.max_queue_delay_microseconds HIFIGAN.MAX_QUEUE_DELAY_MICROSECONDS]
[--hifigan.trt_max_workspace_size HIFIGAN.TRT_MAX_WORKSPACE_SIZE]
[--hifigan.use_onnx_runtime]
[--hifigan.use_trt_fp32]
[--hifigan.fp16_needs_obey_precision_pass]
output_path source_path [source_path ...]
Generate a Riva Model from a speech_synthesis model trained with NVIDIA NeMo.
positional arguments:
output_path Location to write compiled Riva pipeline
source_path Source file(s)
optional arguments:
-h, --help show this help message and exit
-f, --force Overwrite existing artifacts if they exist
-v, --verbose Verbose log outputs
--language_code LANGUAGE_CODE
Language of the model
--max_batch_size MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--voice_name VOICE_NAME
Set the voice name for speech synthesis
--num_speakers NUM_SPEAKERS
Number of unique speakers.
--subvoices SUBVOICES
Comma-separated list of subvoices (no whitespace).
--sample_rate SAMPLE_RATE
Sample rate of the output signal
--chunk_length CHUNK_LENGTH
Chunk length in mel frames to synthesize at one time
--overlap_length OVERLAP_LENGTH
Chunk length in mel frames to overlap neighboring
chunks
--num_mels NUM_MELS number of mels
--num_samples_per_frame NUM_SAMPLES_PER_FRAME
number of samples per frame
--abbreviations_file ABBREVIATIONS_FILE
Path to file with list of abbreviations and
corresponding expansions
--has_mapping_file HAS_MAPPING_FILE
--arpabet_file ARPABET_FILE
Path to pronunciation dictionary
--wfst_tokenizer_model WFST_TOKENIZER_MODEL
Sparrowhawk model to use for tokenization and
classification, must be in .far format
--wfst_verbalizer_model WFST_VERBALIZER_MODEL
Sparrowhawk model to use for verbalizer, must be in
.far format.
denoiser:
--denoiser.max_sequence_idle_microseconds DENOISER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--denoiser.max_batch_size DENOISER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--denoiser.min_batch_size DENOISER.MIN_BATCH_SIZE
--denoiser.opt_batch_size DENOISER.OPT_BATCH_SIZE
--denoiser.preferred_batch_size DENOISER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--denoiser.batching_type DENOISER.BATCHING_TYPE
--denoiser.preserve_ordering DENOISER.PRESERVE_ORDERING
Preserve ordering
--denoiser.instance_group_count DENOISER.INSTANCE_GROUP_COUNT
How many instances in a group
--denoiser.max_queue_delay_microseconds DENOISER.MAX_QUEUE_DELAY_MICROSECONDS
max queue delta in microseconds
--denoiser.fade_length DENOISER.FADE_LENGTH
fade length
preprocessor:
--preprocessor.max_sequence_idle_microseconds PREPROCESSOR.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--preprocessor.max_batch_size PREPROCESSOR.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--preprocessor.min_batch_size PREPROCESSOR.MIN_BATCH_SIZE
--preprocessor.opt_batch_size PREPROCESSOR.OPT_BATCH_SIZE
--preprocessor.preferred_batch_size PREPROCESSOR.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--preprocessor.batching_type PREPROCESSOR.BATCHING_TYPE
--preprocessor.preserve_ordering PREPROCESSOR.PRESERVE_ORDERING
Preserve ordering
--preprocessor.instance_group_count PREPROCESSOR.INSTANCE_GROUP_COUNT
How many instances in a group
--preprocessor.max_queue_delay_microseconds PREPROCESSOR.MAX_QUEUE_DELAY_MICROSECONDS
max queue delta in microseconds
--preprocessor.mapping_path PREPROCESSOR.MAPPING_PATH
--preprocessor.g2p_ignore_ambiguous PREPROCESSOR.G2P_IGNORE_AMBIGUOUS
--preprocessor.language PREPROCESSOR.LANGUAGE
--preprocessor.max_sequence_length PREPROCESSOR.MAX_SEQUENCE_LENGTH
maximum length of every emitted sequence
--preprocessor.max_input_length PREPROCESSOR.MAX_INPUT_LENGTH
maximum length of input string
--preprocessor.mapping PREPROCESSOR.MAPPING
--preprocessor.tolower PREPROCESSOR.TOLOWER
--preprocessor.pad_with_space PREPROCESSOR.PAD_WITH_SPACE
encoder:
--encoder.max_sequence_idle_microseconds ENCODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoder.max_batch_size ENCODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoder.min_batch_size ENCODER.MIN_BATCH_SIZE
--encoder.opt_batch_size ENCODER.OPT_BATCH_SIZE
--encoder.preferred_batch_size ENCODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoder.batching_type ENCODER.BATCHING_TYPE
--encoder.preserve_ordering ENCODER.PRESERVE_ORDERING
Preserve ordering
--encoder.instance_group_count ENCODER.INSTANCE_GROUP_COUNT
How many instances in a group
--encoder.max_queue_delay_microseconds ENCODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoder.trt_max_workspace_size ENCODER.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoder.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoder.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoder.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
encoderFastPitch:
--encoderFastPitch.max_sequence_idle_microseconds ENCODERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--encoderFastPitch.max_batch_size ENCODERFASTPITCH.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--encoderFastPitch.min_batch_size ENCODERFASTPITCH.MIN_BATCH_SIZE
--encoderFastPitch.opt_batch_size ENCODERFASTPITCH.OPT_BATCH_SIZE
--encoderFastPitch.preferred_batch_size ENCODERFASTPITCH.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--encoderFastPitch.batching_type ENCODERFASTPITCH.BATCHING_TYPE
--encoderFastPitch.preserve_ordering ENCODERFASTPITCH.PRESERVE_ORDERING
Preserve ordering
--encoderFastPitch.instance_group_count ENCODERFASTPITCH.INSTANCE_GROUP_COUNT
How many instances in a group
--encoderFastPitch.max_queue_delay_microseconds ENCODERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--encoderFastPitch.trt_max_workspace_size ENCODERFASTPITCH.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--encoderFastPitch.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--encoderFastPitch.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--encoderFastPitch.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
decoder:
--decoder.max_sequence_idle_microseconds DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--decoder.max_batch_size DECODER.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--decoder.min_batch_size DECODER.MIN_BATCH_SIZE
--decoder.opt_batch_size DECODER.OPT_BATCH_SIZE
--decoder.preferred_batch_size DECODER.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--decoder.batching_type DECODER.BATCHING_TYPE
--decoder.preserve_ordering DECODER.PRESERVE_ORDERING
Preserve ordering
--decoder.instance_group_count DECODER.INSTANCE_GROUP_COUNT
How many instances in a group
--decoder.max_queue_delay_microseconds DECODER.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
chunkerFastPitch:
--chunkerFastPitch.max_sequence_idle_microseconds CHUNKERFASTPITCH.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--chunkerFastPitch.max_batch_size CHUNKERFASTPITCH.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--chunkerFastPitch.min_batch_size CHUNKERFASTPITCH.MIN_BATCH_SIZE
--chunkerFastPitch.opt_batch_size CHUNKERFASTPITCH.OPT_BATCH_SIZE
--chunkerFastPitch.preferred_batch_size CHUNKERFASTPITCH.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--chunkerFastPitch.batching_type CHUNKERFASTPITCH.BATCHING_TYPE
--chunkerFastPitch.preserve_ordering CHUNKERFASTPITCH.PRESERVE_ORDERING
Preserve ordering
--chunkerFastPitch.instance_group_count CHUNKERFASTPITCH.INSTANCE_GROUP_COUNT
How many instances in a group
--chunkerFastPitch.max_queue_delay_microseconds CHUNKERFASTPITCH.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
waveglow:
--waveglow.max_sequence_idle_microseconds WAVEGLOW.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--waveglow.max_batch_size WAVEGLOW.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--waveglow.min_batch_size WAVEGLOW.MIN_BATCH_SIZE
--waveglow.opt_batch_size WAVEGLOW.OPT_BATCH_SIZE
--waveglow.preferred_batch_size WAVEGLOW.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--waveglow.batching_type WAVEGLOW.BATCHING_TYPE
--waveglow.preserve_ordering WAVEGLOW.PRESERVE_ORDERING
Preserve ordering
--waveglow.instance_group_count WAVEGLOW.INSTANCE_GROUP_COUNT
How many instances in a group
--waveglow.max_queue_delay_microseconds WAVEGLOW.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--waveglow.trt_max_workspace_size WAVEGLOW.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--waveglow.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--waveglow.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--waveglow.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network
hifigan:
--hifigan.max_sequence_idle_microseconds HIFIGAN.MAX_SEQUENCE_IDLE_MICROSECONDS
Global timeout, in ms
--hifigan.max_batch_size HIFIGAN.MAX_BATCH_SIZE
Default maximum parallel requests in a single forward
pass
--hifigan.min_batch_size HIFIGAN.MIN_BATCH_SIZE
--hifigan.opt_batch_size HIFIGAN.OPT_BATCH_SIZE
--hifigan.preferred_batch_size HIFIGAN.PREFERRED_BATCH_SIZE
Preferred batch size, must be smaller than Max batch
size
--hifigan.batching_type HIFIGAN.BATCHING_TYPE
--hifigan.preserve_ordering HIFIGAN.PRESERVE_ORDERING
Preserve ordering
--hifigan.instance_group_count HIFIGAN.INSTANCE_GROUP_COUNT
How many instances in a group
--hifigan.max_queue_delay_microseconds HIFIGAN.MAX_QUEUE_DELAY_MICROSECONDS
Maximum amount of time to allow requests to queue to
form a batch in microseconds
--hifigan.trt_max_workspace_size HIFIGAN.TRT_MAX_WORKSPACE_SIZE
Maximum workspace size (in Mb) to use for model export
to TensorRT
--hifigan.use_onnx_runtime
Use ONNX runtime instead of TensorRT
--hifigan.use_trt_fp32
Use TensorRT engine with FP32 instead of FP16
--hifigan.fp16_needs_obey_precision_pass
Flag to explicitly mark layers as float when parsing
the ONNX network