Pipeline Configuration

In the simplest use case, you can deploy an ASR pipeline to be used with the StreamingRecognize API call (refer to riva/proto/riva_asr.proto) without any language model as follows:

riva-build speech_recognition \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --wfst_tokenizer_model=<wfst_tokenizer_model> \
    --wfst_verbalizer_model=<wfst_verbalizer_model> \
    --decoder_type=greedy

where:

<rmir_filename> is the name of the Riva RMIR file that is generated as output
<riva_filename> is the name of the riva file to use as input
<encryption_key> is the key used to encrypt the files. The encryption key for the pre-trained Riva models uploaded on NGC is tlt_encode.
<pipeline_name>, <acoustic_model_name>, and <featurizer_name> are optional user-defined names for the components in the model repository.
<wfst_tokenizer_model> is the name of the WFST tokenizer model file to use for inverse text normalization of ASR transcripts. Refer to inverse-text-normalization for more details.
<wfst_verbalizer_model> is the name of the WFST verbalizer model file to use for inverse text normalization of ASR transcripts. Refer to inverse-text-normalization for more details.
decoder_type is the type of decoder to use. Valid values are flashlight, os2s, greedy and pass_through. We recommend using flashlight for all CTC models. Refer to Decoder Hyper-Parameters for more details.

Upon successful completion of this command, a file named <rmir_filename> is created in the /servicemaker-dev/ folder. Since no language model is specified, the Riva greedy decoder is used to predict the transcript based on the output of the acoustic model. If your .riva archives are encrypted, append :<encryption_key> to both the RMIR filename and the Riva filename; otherwise, omit it.
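
The optional key suffix can be sketched as a small shell helper. This is only an illustration: with_key is a hypothetical function, not part of Riva, and model.riva is a made-up filename. The key tlt_encode is the one stated above for pre-trained models on NGC.

```shell
#!/bin/sh
# Hypothetical helper (not part of Riva): append ":<key>" to a filename
# only when a non-empty encryption key is supplied.
with_key() {
    file=$1
    key=$2
    if [ -n "$key" ]; then
        printf '%s:%s\n' "$file" "$key"
    else
        printf '%s\n' "$file"
    fi
}

# Encrypted archive: the key is appended after a colon.
with_key /servicemaker-dev/model.riva tlt_encode
# Unencrypted archive: the filename is passed through unchanged.
with_key /servicemaker-dev/model.riva ""
```

The two results can then be passed as the first two positional arguments of riva-build speech_recognition.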

For embedded platforms, using a batch size of 1 is recommended since it achieves the lowest memory footprint. To use a batch size of 1, refer to the riva-build-optional-parameters section and set the various min_batch_size, max_batch_size, opt_batch_size, and max_execution_batch_size parameters to 1 while executing the riva-build command.
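
As a rough sketch of the batch-size-1 setup, the script below prints a riva-build command line rather than running it. It uses only the batch-size flags that appear in the Quick Start commands in this document. The min_batch_size and opt_batch_size parameters are left out on purpose because their exact flag names are listed in the riva-build-optional-parameters section, not here. batch_flags is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: this prints a command line, it does not invoke riva-build.
# batch_flags is a hypothetical helper. It emits the batch-size flags that
# appear elsewhere in this document, each set to the requested value.
batch_flags() {
    printf '%s\n' \
        "--max_batch_size=$1" \
        "--featurizer.max_batch_size=$1" \
        "--featurizer.max_execution_batch_size=$1"
}

# First line of the command, with the placeholders left unfilled.
printf '%s\n' 'riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> \'
# Indent each flag and add a line continuation to all but the last one.
batch_flags 1 | sed -e 's/^/    /' -e '$!s/$/ \\/'
```

Any remaining min, opt, and max_execution batch-size parameters for other components would need to be added in the same way once their names are confirmed in the optional-parameters reference.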

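The following summary lists the riva-build commands that the Quick Start scripts use to generate the RMIR files for different models and modes, along with their limitations. The --name value in each command identifies the language and the mode (streaming, streaming-throughput, or offline):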

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>
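
The en-US commands above differ between modes only in a handful of flags. The sketch below collects those differences in one place. mode_flags is a hypothetical helper, not a Riva tool, and the values are copied from the commands above rather than taken from any API.

```shell
#!/bin/sh
# Hypothetical helper, not part of riva-build: prints the mode-specific flags
# as they appear in the en-US Conformer commands in this document.
mode_flags() {
    case "$1" in
        streaming)
            printf '%s\n' \
                --return_separate_utterances=False \
                --chunk_size=0.16 \
                --left_padding_size=1.92 \
                --right_padding_size=1.92
            ;;
        streaming-throughput)
            printf '%s\n' \
                --return_separate_utterances=False \
                --chunk_size=0.8 \
                --left_padding_size=1.6 \
                --right_padding_size=1.6
            ;;
        offline)
            printf '%s\n' \
                --offline \
                --return_separate_utterances=True \
                --chunk_size=4.8 \
                --left_padding_size=1.6 \
                --right_padding_size=1.6 \
                --max_batch_size=16 \
                --featurizer.max_batch_size=512 \
                --featurizer.max_execution_batch_size=512
            ;;
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}

# Print the flags for the offline configuration, one per line.
mode_flags offline
```

All other flags in those commands (featurizer normalization, ms_per_timestep, endpointing, decoder, and WFST settings) are the same across the three modes.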

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-GB-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<vocab_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=en-GB \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-GB-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-GB \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-GB-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<vocab_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=en-GB \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-en-GB-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-GB \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-en-GB-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<vocab_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=en-GB \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-en-GB-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-GB \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=es-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=es-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-es-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=es-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-es-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-ES-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --language_code=es-ES \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-ES-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-ES \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-ES-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --language_code=es-ES \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-es-ES-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-ES \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-es-ES-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --language_code=es-ES \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-es-ES-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-ES \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-de-DE-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-de-DE-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-de-DE-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-de-DE-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-de-DE-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-de-DE-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ru-RU-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=ru-RU \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ru-RU-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ru-RU \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ru-RU-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=ru-RU \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ru-RU-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ru-RU \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ru-RU-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=ru-RU \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ru-RU-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ru-RU \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-zh-CN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.5 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-zh-CN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-zh-CN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.5 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-zh-CN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-zh-CN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.5 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-zh-CN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>
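
Across the commands above, the streaming, streaming-throughput, and offline variants of a model differ mainly in their chunking and batching flags. The following is a minimal sketch that collects those flags in one place, with values copied from the commands above. The helper name profile_flags is hypothetical and not part of Riva, and the final line only prints a command (a dry run) rather than calling riva-build.

```shell
#!/usr/bin/env bash
# Sketch only: profile_flags is a hypothetical helper, not a Riva tool.
# It prints, one per line, the chunking/batching flags that the commands
# above use for each deployment profile.
profile_flags() {
  case "$1" in
    streaming)
      printf '%s\n' --chunk_size=0.16 --left_padding_size=1.92 --right_padding_size=1.92 ;;
    streaming-throughput)
      printf '%s\n' --chunk_size=0.8 --left_padding_size=1.6 --right_padding_size=1.6 ;;
    offline)
      printf '%s\n' --offline --chunk_size=4.8 --left_padding_size=1.6 --right_padding_size=1.6 \
        --max_batch_size=16 --featurizer.max_batch_size=512 \
        --featurizer.max_execution_batch_size=512 ;;
    *)
      echo "unknown profile: $1" >&2
      return 1 ;;
  esac
}

# Dry run: print the assembled command instead of executing it.
# Word splitting of the substitution is intended (one flag per word).
echo riva-build speech_recognition '<rmir_filename>:<key>' '<riva_file>:<key>' $(profile_flags offline)
```

The remaining flags (--name, --return_separate_utterances, decoder and language settings) still have to be supplied per model, as in the full commands.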

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-hi-IN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=hi-IN

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-hi-IN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=hi-IN

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-hi-IN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=hi-IN

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-hi-IN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=hi-IN

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-hi-IN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=hi-IN

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-hi-IN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=hi-IN
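
To compare the chunk and padding settings used above, it can help to work out how much audio each inference call covers. The sketch below assumes that the window passed to the acoustic model is chunk_size + left_padding_size + right_padding_size, and that ms_per_timestep is the duration covered by one acoustic-model output step. Both are assumptions made for illustration, and window_info is a hypothetical helper, not a Riva command.

```shell
#!/usr/bin/env bash
# Sketch only: window_info is a hypothetical helper, not part of Riva.
# Assumption: window = chunk_size + left_padding_size + right_padding_size,
# and timesteps = window (in ms) / ms_per_timestep.
window_info() {
  local chunk="$1" left="$2" right="$3" ms_per_timestep="${4:-40}"
  awk -v c="$chunk" -v l="$left" -v r="$right" -v ms="$ms_per_timestep" \
    'BEGIN { w = c + l + r; printf "window=%.2fs timesteps=%.0f\n", w, (w * 1000) / ms }'
}

window_info 0.16 1.92 1.92   # streaming:            window=4.00s timesteps=100
window_info 0.8  1.6  1.6    # streaming-throughput: window=4.00s timesteps=100
window_info 4.8  1.6  1.6    # offline:              window=8.00s timesteps=200
```

Under these assumptions, the two streaming profiles cover the same 4-second window and differ only in how much of it is new audio per call, while the offline profile doubles the window.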

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ja-JP-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ja-JP-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ja-JP-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ja-JP-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ja-JP-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ja-JP-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ar-AR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=ar-AR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ar-AR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ar-AR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ar-AR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=ar-AR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ar-AR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ar-AR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ar-AR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=ar-AR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ar-AR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ar-AR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-it-IT-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.num_tokenization=1 \
  --language_code=it-IT

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-it-IT-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=it-IT

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-it-IT-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.num_tokenization=1 \
  --language_code=it-IT

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-it-IT-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=it-IT

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-it-IT-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.num_tokenization=1 \
  --language_code=it-IT

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-it-IT-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=it-IT

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ko-KR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --language_code=ko-KR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ko-KR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ko-KR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ko-KR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --language_code=ko-KR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ko-KR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ko-KR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ko-KR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.beam_size=32 \
  --language_code=ko-KR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ko-KR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ko-KR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-pt-BR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=pt-BR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-pt-BR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=pt-BR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-pt-BR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=pt-BR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-pt-BR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=pt-BR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-pt-BR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=pt-BR

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-pt-BR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=pt-BR
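
The pt-BR streaming, streaming-throughput, and offline commands above differ mainly in chunk_size and the two padding sizes. The sketch below works out what those values imply, under two assumptions that this section does not state: that the audio span passed to the acoustic model per call is chunk_size + left_padding_size + right_padding_size (in seconds), and that ms_per_timestep is the duration of one acoustic-model output step. The function names window_seconds and timesteps_per_chunk are hypothetical helpers, not part of riva-build.

```shell
#!/bin/sh
# window_seconds CHUNK LEFT RIGHT
# Assumption: span per inference call = chunk + left padding + right padding (seconds).
window_seconds() {
  LC_ALL=C awk -v c="$1" -v l="$2" -v r="$3" 'BEGIN { printf "%.2f\n", c + l + r }'
}

# timesteps_per_chunk CHUNK_SECONDS MS_PER_TIMESTEP
# Assumption: ms_per_timestep is the length of one acoustic-model output step.
timesteps_per_chunk() {
  LC_ALL=C awk -v c="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", c * 1000 / ms }'
}

# name chunk_size left_padding_size right_padding_size, copied from the commands above.
for variant in "streaming 0.16 1.92 1.92" \
               "streaming-throughput 0.8 1.6 1.6" \
               "offline 4.8 1.6 1.6"; do
  set -- $variant
  echo "$1: window $(window_seconds "$2" "$3" "$4") s," \
       "$(timesteps_per_chunk "$2" 40) timesteps per chunk"
done
```

Under those assumptions, both streaming variants cover the same 4.00 s span and differ only in how much of it is new audio per call (0.16 s versus 0.8 s), while the offline variant covers 8.00 s.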

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-fr-FR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=fr-FR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-fr-FR-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=fr-FR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-fr-FR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=fr-FR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-fr-FR-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=fr-FR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-fr-FR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=fr-FR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-fr-FR-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=fr-FR \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-NL-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --language_code=nl-NL \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-NL-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=nl-NL \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>
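
The two conformer-nl-NL-asr-streaming commands above are identical except for their decoder flags. One way to make that visible, sketched here as a minimal example, is to keep the shared flags in one place and select the decoder-specific flags by name. The function names are hypothetical, the script only prints the argument list rather than invoking riva-build, and it assumes flag order does not matter (the language and WFST flags are listed before the decoder flags here, unlike in the commands above). The angle-bracket placeholders are left as-is.

```shell
#!/bin/sh
# Flags shared by both conformer-nl-NL-asr-streaming commands above, one per line.
common_flags() {
  cat <<'EOF'
--name=conformer-nl-NL-asr-streaming
--return_separate_utterances=False
--featurizer.use_utterance_norm_params=False
--featurizer.precalc_norm_time_steps=0
--featurizer.precalc_norm_params=False
--ms_per_timestep=40
--endpointing.start_history=200
--nn.fp16_needs_obey_precision_pass
--endpointing.residue_blanks_at_start=-2
--chunk_size=0.16
--left_padding_size=1.92
--right_padding_size=1.92
--language_code=nl-NL
--wfst_tokenizer_model=<far_tokenizer_file>
--wfst_verbalizer_model=<far_verbalizer_file>
EOF
}

# Decoder-specific flags; the argument is "flashlight" or "greedy".
decoder_flags() {
  case "$1" in
    flashlight)
      cat <<'EOF'
--decoder_type=flashlight
--decoding_language_model_binary=<bin_file>
--decoding_vocab=<txt_file>
--flashlight_decoder.beam_size=32
--flashlight_decoder.beam_size_token=32
--flashlight_decoder.beam_threshold=30.
--flashlight_decoder.lm_weight=0.7
--flashlight_decoder.word_insertion_score=0.75
EOF
      ;;
    greedy)
      cat <<'EOF'
--decoder_type=greedy
--greedy_decoder.asr_model_delay=-1
EOF
      ;;
    *)
      echo "unknown decoder: $1" >&2
      return 1
      ;;
  esac
}

# Print the riva-build arguments (one per line) without running anything.
print_build_args() {
  flags=$(decoder_flags "$1") || return 1
  printf '%s\n' "speech_recognition" "<rmir_filename>:<key>" "<riva_file>:<key>"
  common_flags
  printf '%s\n' "$flags"
}

print_build_args greedy
```

Calling print_build_args flashlight instead lists the language-model variant; an unrecognized decoder name returns a non-zero status and prints nothing to standard output.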

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-NL-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --language_code=nl-NL \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-NL-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=nl-NL \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-nl-NL-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --language_code=nl-NL \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-nl-NL-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=nl-NL \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-BE-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --language_code=nl-BE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-BE-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=nl-BE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-BE-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --language_code=nl-BE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-nl-BE-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=nl-BE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-nl-BE-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.75 \
  --language_code=nl-BE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-nl-BE-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=nl-BE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-xl-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.use_trt_fp32 \
  --max_batch_size=4 \
  --nn.max_batch_size=4 \
  --nn.opt_batch_size=4 \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-xl-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.use_trt_fp32 \
  --max_batch_size=4 \
  --nn.max_batch_size=4 \
  --nn.opt_batch_size=4 \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-xl-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.use_trt_fp32 \
  --max_batch_size=4 \
  --nn.max_batch_size=4 \
  --nn.opt_batch_size=4 \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-xl-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.use_trt_fp32 \
  --max_batch_size=4 \
  --nn.max_batch_size=4 \
  --nn.opt_batch_size=4 \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-xl-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.use_trt_fp32 \
  --max_batch_size=4 \
  --nn.max_batch_size=4 \
  --nn.opt_batch_size=4 \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-xl-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.use_trt_fp32 \
  --max_batch_size=4 \
  --nn.max_batch_size=4 \
  --nn.opt_batch_size=4 \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-de-DE-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-de-DE-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-de-DE-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-de-DE-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-de-DE-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-de-DE-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=de-DE \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-zh-CN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=16 \
  --flashlight_decoder.beam_size_token=16 \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.5 \
  --flashlight_decoder.beam_threshold=10. \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --append_space_to_transcripts=False \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>
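
The --flashlight_decoder.sil_token value in the command above is a double-quoted string that spans two lines. In a POSIX shell, a backslash followed by a newline inside double quotes is removed, so the value that reaches the program consists only of the space before the backslash plus the leading indentation of the next line. The snippet below reproduces that quoting with a plain variable (sil_token here is just an illustrative shell variable) to show what the shell passes; this section does not state whether a single space was intended.

```shell
#!/bin/sh
# Same quoting as the sil_token lines above: one space, a backslash-newline
# (dropped by the shell inside double quotes), then two spaces of indentation.
sil_token=" \
  "

# Show the resulting length and the value between brackets.
printf 'length=%s value=[%s]\n' "${#sil_token}" "$sil_token"
```

With the two-space indentation used in this document, the result is a three-character string of spaces; changing the indentation of the continuation line changes the value.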

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-zh-CN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-zh-CN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=16 \
  --flashlight_decoder.beam_size_token=16 \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.5 \
  --flashlight_decoder.beam_threshold=10. \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --append_space_to_transcripts=False \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-zh-CN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-zh-CN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=16 \
  --flashlight_decoder.beam_size_token=16 \
  --flashlight_decoder.lm_weight=0.7 \
  --flashlight_decoder.word_insertion_score=0.5 \
  --flashlight_decoder.beam_threshold=10. \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --append_space_to_transcripts=False \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-zh-CN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>
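
Within one conformer-unified model family, the streaming, streaming-throughput, and offline commands above differ mainly in their chunking and batching flags (plus --name and --return_separate_utterances). The following is a minimal sketch, assuming a hypothetical helper named mode_flags that is not part of riva-build; it only prints the mode-specific flags copied from the zh-CN examples above so they can be spliced into a command line.

```shell
#!/bin/sh
# Hypothetical helper (not part of riva-build): print the mode-specific flags
# used by the conformer-unified examples above. It only echoes text.
mode_flags() {
  case "$1" in
    streaming)
      echo "--chunk_size=0.16 --left_padding_size=1.92 --right_padding_size=1.92" ;;
    streaming-throughput)
      echo "--chunk_size=0.8 --left_padding_size=1.6 --right_padding_size=1.6" ;;
    offline)
      echo "--offline --chunk_size=4.8 --left_padding_size=1.6 --right_padding_size=1.6 --max_batch_size=16 --featurizer.max_batch_size=512 --featurizer.max_execution_batch_size=512" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Dry run: show the flags for the throughput-oriented streaming mode.
mode_flags streaming-throughput
```

Other model families in this section use different values (for example, the parakeet examples use a chunk size of 0.96 for streaming-throughput), so treat the table inside the function as specific to the commands it was copied from.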

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ja-JP-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ja-JP-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ja-JP-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ja-JP-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-ja-JP-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-ja-JP-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-JP
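
Every command in this section contains placeholders such as <rmir_filename>, <key>, and <bin_file> that must be replaced before the command is run. The following is a minimal sketch of a guard for that, assuming a hypothetical function named no_placeholders; it inspects its arguments and does not call riva-build.

```shell
#!/bin/sh
# Hypothetical guard (not part of riva-build): succeed only if no argument
# still contains an unfilled <placeholder> such as <bin_file> or <key>.
no_placeholders() {
  for arg in "$@"; do
    case "$arg" in
      *"<"*">"*)
        echo "unfilled placeholder in: $arg" >&2
        return 1 ;;
    esac
  done
  return 0
}

# Dry run: the first call is rejected, the second one passes.
no_placeholders --decoder_type=flashlight "--decoding_language_model_binary=<bin_file>" \
  || echo "fix placeholders first"
no_placeholders --decoder_type=greedy --language_code=ja-JP \
  && echo "ok to run"
```

The check is deliberately crude: any argument containing a "<" followed later by a ">" is rejected, which is sufficient for the flag values shown in this section.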

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ml-cs-es-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ml-cs-es-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ml-cs-es-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-ml-cs-es-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ml-cs-es-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.2 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.beam_threshold=20. \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-ml-cs-es-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>
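
As noted at the top of this page, the :<key> suffix on the RMIR and Riva filenames is only needed when the archives are encrypted. The following is a minimal sketch, assuming a hypothetical helper named with_key and a hypothetical file name model.rmir, that appends the suffix only when a key is supplied.

```shell
#!/bin/sh
# Hypothetical helper (not part of riva-build): print "path:key" when a key
# is given, and just "path" when the archive is unencrypted.
with_key() {
  path=$1
  key=${2:-}
  if [ -n "$key" ]; then
    printf '%s:%s\n' "$path" "$key"
  else
    printf '%s\n' "$path"
  fi
}

# model.rmir is an illustrative file name; tlt_encode is the key that this
# page gives for pre-trained models from NGC.
with_key /servicemaker-dev/model.rmir tlt_encode
with_key /servicemaker-dev/model.rmir
```

The output of with_key can then be used in place of <rmir_filename>:<key> or <riva_file>:<key> in the commands above.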

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ml-cs-ja-en-JP-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-en-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ml-cs-ja-en-JP-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-en-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ml-cs-ja-en-JP-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-en-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=conformer-unified-ml-cs-ja-en-JP-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=0.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-en-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-ml-cs-ja-en-JP-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --append_space_to_transcripts=False \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --flashlight_decoder.use_lexicon_free_decoding=True \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.5 \
  --flashlight_decoder.word_insertion_score=0.2 \
  --flashlight_decoder.blank_token=_ \
  --flashlight_decoder.sil_token=" \
  " \
  --language_code=ja-en-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=conformer-unified-ml-cs-ja-en-JP-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=40 \
  --endpointing.start_history=200 \
  --nn.fp16_needs_obey_precision_pass \
  --endpointing.residue_blanks_at_start=-2 \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --max_batch_size=16 \
  --featurizer.max_batch_size=512 \
  --featurizer.max_execution_batch_size=512 \
  --append_space_to_transcripts=False \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=ja-en-JP

riva-build diarizer \
  <rmir_filename>:<key> \
  <riva_vad_file>:<key> \
  <riva_speaker_recognition_file>:<key> \
  --diarizer_backend.offline \
  --embedding_extractor_nn.max_batch_size=32 \
  --embedding_extractor_nn.use_onnx_runtime \
  --embedding_extractor_nn.optimization_graph_level=-1 \
  --clustering_backend.max_batch_size=0 \
  --chunk_size=300 \
  --audio_sec_limit=4001 \
  --diarizer_backend.language_code=generic

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --chunk_size=0.96 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --chunk_size=0.96 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>
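
The parakeet commands above set --ms_per_timestep=80, while the conformer commands earlier in this section use 40. The following is a small arithmetic sketch, assuming that ms_per_timestep is the duration of one acoustic-model output frame; under that assumption the number of timesteps covered by one chunk is chunk_size * 1000 / ms_per_timestep. The helper name timesteps_per_chunk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: timesteps per chunk = chunk_size (s) * 1000 / ms_per_timestep.
# Assumes ms_per_timestep is the duration of one acoustic-model output frame.
timesteps_per_chunk() {
  awk -v chunk="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", chunk * 1000 / ms }'
}

timesteps_per_chunk 0.16 80   # parakeet streaming            -> 2
timesteps_per_chunk 0.96 80   # parakeet streaming-throughput -> 12
timesteps_per_chunk 0.16 40   # conformer streaming           -> 4
```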

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_lexicon=<txt_decoding_lexicon_file> \
  --flashlight_decoder.lm_weight=0.1 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.96 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_lexicon=<txt_decoding_lexicon_file> \
  --flashlight_decoder.lm_weight=0.1 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.96 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-unified-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_lexicon=<txt_decoding_lexicon_file> \
  --flashlight_decoder.lm_weight=0.1 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-unified-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>
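
The chunk and padding values in the parakeet-0.6b-unified commands above can be compared by adding them up. The following is a sketch under the assumption that each inference sees the left padding, the chunk, and the right padding together; the helper name window_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total audio window in seconds, assuming the acoustic
# model input is left padding + chunk + right padding.
window_seconds() {
  awk -v c="$1" -v l="$2" -v r="$3" 'BEGIN { printf "%.2f\n", c + l + r }'
}

window_seconds 0.16 3.92 3.92   # unified streaming            -> 8.00
window_seconds 0.96 3.92 3.92   # unified streaming-throughput -> 8.80
window_seconds 4.8 1.6 1.6      # unified offline              -> 8.00
```

The arguments are, in order, the values of --chunk_size, --left_padding_size, and --right_padding_size from the command being examined.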

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-zh-CN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.4 \
  --flashlight_decoder.word_insertion_score=1.5 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-zh-CN-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=greedy \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-zh-CN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.4 \
  --flashlight_decoder.word_insertion_score=1.5 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-zh-CN-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.16 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=greedy \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-unified-zh-CN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=30. \
  --flashlight_decoder.lm_weight=0.4 \
  --flashlight_decoder.word_insertion_score=1.5 \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-unified-zh-CN-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --language_code=zh-CN \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.32 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_lexicon=<txt_decoding_lexicon_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.32 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.96 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_lexicon=<txt_decoding_lexicon_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.96 \
  --left_padding_size=3.92 \
  --right_padding_size=3.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_lexicon=<txt_decoding_lexicon_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-0.6b-unified-ml-cs-es-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-16 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=es-en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=0.96 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=0.96 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-1.1b-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-1.1b-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>
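
The three greedy-decoder commands above for parakeet-1.1b-en-US (streaming, streaming-throughput, and offline) share most of their flags and differ only in the pipeline name and in the chunking, endpointing, and batching options. As a rough sketch, those differences can be collected in a small shell function. The function name `mode_flags` and the dry-run approach are illustrative assumptions and not part of Riva; the flag values are copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the flags that differ between deployment modes
# for parakeet-1.1b-en-US with the greedy decoder (values copied from the
# commands above). It only prints flags; it does not invoke riva-build.
mode_flags() {
  case "$1" in
    streaming)
      printf '%s\n' \
        --name=parakeet-1.1b-en-US-asr-streaming \
        --return_separate_utterances=False \
        --endpointing.residue_blanks_at_start=-2 \
        --chunk_size=0.16 \
        --left_padding_size=1.92 \
        --right_padding_size=1.92
      ;;
    streaming-throughput)
      printf '%s\n' \
        --name=parakeet-1.1b-en-US-asr-streaming-throughput \
        --return_separate_utterances=False \
        --endpointing.residue_blanks_at_start=-2 \
        --chunk_size=0.96 \
        --left_padding_size=1.92 \
        --right_padding_size=1.92
      ;;
    offline)
      printf '%s\n' \
        --offline \
        --name=parakeet-1.1b-en-US-asr-offline \
        --return_separate_utterances=True \
        --chunk_size=4.8 \
        --left_padding_size=1.6 \
        --right_padding_size=1.6 \
        --featurizer.max_batch_size=256 \
        --featurizer.max_execution_batch_size=256
      ;;
    *)
      # Anything else is rejected so a typo cannot silently produce no flags.
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Dry run: list the mode-specific flags for the offline pipeline.
mode_flags offline
```

Printing the flags rather than running the build keeps the sketch usable on a machine without the Riva ServiceMaker container; the remaining shared flags would still need to be added before calling `riva-build`.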

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-en-US-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=0.16 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-en-US-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=0.96 \
  --left_padding_size=1.92 \
  --right_padding_size=1.92 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-1.1b-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=4.8 \
  --left_padding_size=1.6 \
  --right_padding_size=1.6 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=flashlight \
  --flashlight_decoder.asr_model_delay=-1 \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_file> \
  --flashlight_decoder.lm_weight=0.8 \
  --flashlight_decoder.word_insertion_score=1.0 \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.num_tokenization=1 \
  --profane_words_file=<txt_profane_words_file> \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-universal-multi-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.32 \
  --left_padding_size=4.64 \
  --right_padding_size=4.64 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-universal-multi-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=1.6 \
  --left_padding_size=4.0 \
  --right_padding_size=4.0 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-1.1b-unified-ml-cs-universal-multi-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=3.2 \
  --right_padding_size=3.2 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi
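
The three universal-multi commands above pair different `chunk_size` values (0.32, 1.6, and 4.8) with the same `ms_per_timestep=80`. Assuming `chunk_size` is given in seconds and `ms_per_timestep` is the duration in milliseconds of one acoustic-model output step, each chunk covers `chunk_size * 1000 / ms_per_timestep` output steps. The sketch below does only that arithmetic; the function name `steps_per_chunk` is hypothetical and the interpretation of the two flags is an assumption rather than something these commands state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of acoustic-model output steps covered by one
# chunk, assuming chunk_size is in seconds and ms_per_timestep in milliseconds.
steps_per_chunk() {
  # awk does the floating-point math; adding 0.5 rounds to the nearest integer.
  awk -v chunk="$1" -v ms="$2" 'BEGIN { printf "%d\n", chunk * 1000 / ms + 0.5 }'
}

steps_per_chunk 0.32 80   # streaming
steps_per_chunk 1.6 80    # streaming-throughput
steps_per_chunk 4.8 80    # offline
```

With the values from the three commands this prints 4, 20, and 60, which gives a quick sense of how much more audio each inference call covers as the configuration moves from low latency toward throughput.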

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-concat-multi-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.32 \
  --left_padding_size=4.64 \
  --right_padding_size=4.64 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-concat-multi-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=1.6 \
  --left_padding_size=4.0 \
  --right_padding_size=4.0 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-1.1b-unified-ml-cs-concat-multi-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=3.2 \
  --right_padding_size=3.2 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.32 \
  --left_padding_size=4.64 \
  --right_padding_size=4.64 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.lm_weight=0.33 \
  --flashlight_decoder.word_insertion_score=0.01 \
  --language_code=em-ea

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.32 \
  --left_padding_size=4.64 \
  --right_padding_size=4.64 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=em-ea

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=1.6 \
  --left_padding_size=4.0 \
  --right_padding_size=4.0 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.lm_weight=0.33 \
  --flashlight_decoder.word_insertion_score=0.01 \
  --language_code=em-ea

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-1.1b-unified-ml-cs-em-ea-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=1.6 \
  --left_padding_size=4.0 \
  --right_padding_size=4.0 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=em-ea

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-1.1b-unified-ml-cs-em-ea-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=3.2 \
  --right_padding_size=3.2 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=flashlight \
  --decoding_language_model_binary=<bin_file> \
  --decoding_vocab=<txt_decoding_vocab_file> \
  --flashlight_decoder.beam_size=32 \
  --flashlight_decoder.beam_size_token=32 \
  --flashlight_decoder.beam_threshold=20. \
  --flashlight_decoder.lm_weight=0.33 \
  --flashlight_decoder.word_insertion_score=0.01 \
  --language_code=em-ea

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-1.1b-unified-ml-cs-em-ea-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=4.8 \
  --left_padding_size=3.2 \
  --right_padding_size=3.2 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --decoder_type=greedy \
  --greedy_decoder.asr_model_delay=-1 \
  --language_code=em-ea

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-rnnt-1.1b-en-US-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --nn.fp16_needs_obey_precision_pass \
  --chunk_size=8.0 \
  --left_padding_size=0 \
  --right_padding_size=0 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --max_batch_size=128 \
  --decoder_type=nemo \
  --language_code=en-US \
  --wfst_tokenizer_model=<far_tokenizer_file> \
  --wfst_verbalizer_model=<far_verbalizer_file> \
  --speech_hints_model=<far_speech_hints_file>

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-streaming \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=0.32 \
  --left_padding_size=4.64 \
  --right_padding_size=4.64 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --max_batch_size=128 \
  --append_space_to_transcripts=False \
  --decoder_type=nemo

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-streaming-throughput \
  --return_separate_utterances=False \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --endpointing.residue_blanks_at_start=-2 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=1.6 \
  --left_padding_size=4.0 \
  --right_padding_size=4.0 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --max_batch_size=128 \
  --append_space_to_transcripts=False \
  --decoder_type=nemo

riva-build speech_recognition <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-offline \
  --return_separate_utterances=True \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --ms_per_timestep=80 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi \
  --nn.fp16_needs_obey_precision_pass \
  --unified_acoustic_model \
  --chunk_size=8.0 \
  --left_padding_size=0 \
  --right_padding_size=0 \
  --featurizer.max_batch_size=256 \
  --featurizer.max_execution_batch_size=256 \
  --max_batch_size=128 \
  --append_space_to_transcripts=False  \
  --decoder_type=nemo

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=whisper-large-v3-multi-asr-offline \
  --return_separate_utterances=True \
  --chunk_size 30 \
  --left_padding_size 0 \
  --right_padding_size 0 \
  --decoder_type trtllm \
  --unified_acoustic_model \
  --feature_extractor_type torch \
  --featurizer.norm_per_feature false \
  --max_batch_size 8 \
  --featurizer.precalc_norm_params False \
  --featurizer.max_batch_size=8 \
  --featurizer.max_execution_batch_size=8 \
  --language_code=en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi

riva-build speech_recognition <rmir_filename>:<key>  \
  <riva_file>:<key> \
  --offline  \
  --name=whisper-large-v3-turbo-multi-asr-offline \
  --return_separate_utterances=True \
  --unified_acoustic_model  \
  --chunk_size 30 \
  --left_padding_size 0 \
  --right_padding_size 0  \
  --decoder_type trtllm  \
  --feature_extractor_type torch \
  --torch_feature_type whisper \
  --featurizer.norm_per_feature false \
  --max_batch_size 8 \
  --featurizer.precalc_norm_params False  \
  --featurizer.max_batch_size=8 \
  --featurizer.max_execution_batch_size=8 \
  --language_code=en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=distil-whisper-large-v3-en-US-asr-offline \
  --return_separate_utterances=True \
  --chunk_size 30 \
  --left_padding_size 0 \
  --right_padding_size 0 \
  --decoder_type trtllm \
  --unified_acoustic_model \
  --feature_extractor_type torch \
  --featurizer.norm_per_feature false \
  --max_batch_size 8 \
  --featurizer.precalc_norm_params False \
  --featurizer.max_batch_size=8 \
  --featurizer.max_execution_batch_size=8 \
  --language_code=en-US

riva-build speech_recognition <rmir_filename>:<key>  \
  <riva_file>:<key> \
  --offline  \
  --name=kotoba-whisper-v2.2-ja-JP-asr-offline \
  --return_separate_utterances=True \
  --unified_acoustic_model  \
  --chunk_size 30 \
  --left_padding_size 0 \
  --right_padding_size 0  \
  --decoder_type trtllm  \
  --feature_extractor_type torch \
  --torch_feature_type whisper \
  --featurizer.norm_per_feature false \
  --max_batch_size 8 \
  --featurizer.precalc_norm_params False  \
  --featurizer.max_batch_size=8 \
  --featurizer.max_execution_batch_size=8 \
  --language_code=ja-JP

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=canary-1b-multi-asr-offline \
  --return_separate_utterances=True \
  --chunk_size 30 \
  --left_padding_size 0 \
  --right_padding_size 0 \
  --decoder_type nemo \
  --nemo_decoder.nemo_decoder_type canary \
  --feature_extractor_type torch \
  --torch_feature_type nemo \
  --featurizer.norm_per_feature false \
  --max_batch_size 8 \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_params False \
  --featurizer.max_batch_size=128 \
  --featurizer.max_execution_batch_size=128 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN

riva-build speech_recognition \
  <rmir_filename>:<key> \
  <riva_file>:<key> \
  --offline \
  --name=canary-0.6b-turbo-multi-asr-offline \
  --return_separate_utterances=True \
  --chunk_size 30 \
  --left_padding_size 0 \
  --right_padding_size 0 \
  --decoder_type nemo \
  --nemo_decoder.nemo_decoder_type canary \
  --feature_extractor_type torch \
  --torch_feature_type nemo \
  --featurizer.norm_per_feature false \
  --max_batch_size 8 \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_params False \
  --featurizer.max_batch_size=128 \
  --featurizer.max_execution_batch_size=128 \
  --language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN

For details about the parameters passed to riva-build to customize the ASR pipeline, run:

riva-build <pipeline> -h

Streaming/Offline Recognition#

The Riva ASR pipeline can be configured for both streaming and offline recognition use cases. When using the StreamingRecognize API call (refer to riva/proto/riva_asr.proto), we recommend the following riva-build parameters for low-latency streaming recognition with the Conformer acoustic model:

riva-build speech_recognition \
    /servicemaker-dev/<rmir_filename>:<encryption_key> \
    /servicemaker-dev/<riva_filename>:<encryption_key> \
    --name=<pipeline_name> \
    --wfst_tokenizer_model=<wfst_tokenizer_model> \
    --wfst_verbalizer_model=<wfst_verbalizer_model> \
    --decoder_type=greedy \
    --chunk_size=0.16 \
    --padding_size=1.92 \
    --ms_per_timestep=40 \
    --nn.fp16_needs_obey_precision_pass \
    --greedy_decoder.asr_model_delay=-1 \
    --endpointing.residue_blanks_at_start=-2 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False

For high throughput streaming recognition with the StreamingRecognize API call, chunk_size and padding_size can be set as follows:

    --chunk_size=0.8 \
    --padding_size=1.6

Finally, to configure the ASR pipeline for offline recognition with the Recognize API call (refer to riva/proto/riva_asr.proto), we recommend the following settings with the Conformer acoustic model:

     --offline \
     --chunk_size=4.8 \
     --padding_size=1.6
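
The three Conformer configurations above differ only in chunk_size, padding_size, and the --offline flag. As a minimal sketch, the Python helper below collects those documented values in one place and renders the matching flags. The names PRESETS and preset_flags are hypothetical and are not part of Riva; the remaining riva-build arguments still have to be supplied as shown earlier.

```python
# Values copied from the Conformer recommendations in this section.
PRESETS = {
    "low_latency": {"chunk_size": 0.16, "padding_size": 1.92, "offline": False},
    "high_throughput": {"chunk_size": 0.8, "padding_size": 1.6, "offline": False},
    "offline": {"chunk_size": 4.8, "padding_size": 1.6, "offline": True},
}


def preset_flags(mode):
    """Return the riva-build flags that distinguish the given mode."""
    preset = PRESETS[mode]
    flags = []
    if preset["offline"]:
        flags.append("--offline")
    flags.append(f"--chunk_size={preset['chunk_size']}")
    flags.append(f"--padding_size={preset['padding_size']}")
    return flags


if __name__ == "__main__":
    # Print one line of flags per mode so they can be pasted into a command.
    for mode in PRESETS:
        print(mode, " ".join(preset_flags(mode)))
```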

Note

When deploying the offline ASR models with riva-deploy, TensorRT warnings indicating that memory requirements of format conversion cannot be satisfied might appear in the logs. These warnings should not affect functionality and can be ignored.

Language Models#

Riva ASR supports decoding with an n-gram language model. The n-gram language model can be provided in either of the following formats:

A .arpa format file.
A KenLM binary format file.

For more information on building language models, refer to the training-language-models section.

ARPA Format Language Model#

To configure the Riva ASR pipeline to use an n-gram language model stored in ARPA format, replace:

    --decoder_type=greedy

with

    --decoder_type=flashlight \
    --decoding_language_model_arpa=<arpa_filename> \
    --decoding_vocab=<decoder_vocab_file>

KenLM Binary Language Model#

To generate the Riva RMIR file when using a KenLM binary file to specify the language model, replace:

    --decoder_type=greedy

with

    --decoder_type=flashlight \
    --decoding_language_model_binary=<KENLM_binary_filename> \
    --decoding_vocab=<decoder_vocab_file>
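
The ARPA and KenLM variants above differ only in which language model flag is passed. The sketch below shows one way a build script might choose between them. The function name language_model_flags is hypothetical, and the rule that a .arpa extension means an ARPA file while anything else is a KenLM binary is an assumption about file naming, not something riva-build enforces.

```python
from pathlib import Path


def language_model_flags(lm_path, vocab_path):
    """Build the Flashlight flags for an n-gram language model.

    Assumption: files ending in .arpa are ARPA text files and anything
    else is a KenLM binary. Adjust this check to match your own naming.
    """
    if Path(lm_path).suffix.lower() == ".arpa":
        lm_flag = f"--decoding_language_model_arpa={lm_path}"
    else:
        lm_flag = f"--decoding_language_model_binary={lm_path}"
    return ["--decoder_type=flashlight", lm_flag, f"--decoding_vocab={vocab_path}"]


if __name__ == "__main__":
    print(" ".join(language_model_flags("lm.arpa", "vocab.txt")))
    print(" ".join(language_model_flags("lm.binary", "vocab.txt")))
```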

Decoder Hyper-Parameters#

The decoder language model hyper-parameters can also be specified from the riva-build command.

You can specify the Flashlight decoder hyper-parameters beam_size, beam_size_token, beam_threshold, lm_weight, and word_insertion_score by specifying:

    --decoder_type=flashlight \
    --decoding_language_model_binary=<KENLM_binary_filename> \
    --decoding_vocab=<decoder_vocab_file> \
    --flashlight_decoder.beam_size=<beam_size> \
    --flashlight_decoder.beam_size_token=<beam_size_token> \
    --flashlight_decoder.beam_threshold=<beam_threshold> \
    --flashlight_decoder.lm_weight=<lm_weight> \
    --flashlight_decoder.word_insertion_score=<word_insertion_score>

Where:

beam_size is the maximum number of hypotheses the decoder holds at each step
beam_size_token is the maximum number of tokens the decoder considers at each step
beam_threshold is the threshold used to prune hypotheses
lm_weight is the weight of the language model used when scoring hypotheses
word_insertion_score is the word insertion score used when scoring hypotheses
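
To make the roles of these hyper-parameters more concrete, here is a toy illustration in Python. It is not the Flashlight or Riva implementation. It assumes the additive scoring convention commonly used by beam-search decoders (acoustic score plus weighted language model score plus a per-word bonus) and a simple pruning rule, and the function names are hypothetical. The beam_size_token parameter, which limits the tokens considered at each step, is not modeled here.

```python
def hypothesis_score(am_score, lm_score, num_words, lm_weight, word_insertion_score):
    """Combine acoustic and language model scores for one hypothesis.

    Assumed convention: all scores are log-domain and higher is better.
    """
    return am_score + lm_weight * lm_score + word_insertion_score * num_words


def prune(hypotheses, beam_size, beam_threshold):
    """Keep at most beam_size hypotheses scoring within beam_threshold of the best.

    hypotheses is a list of (text, score) pairs.
    """
    if not hypotheses:
        return []
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    best = ranked[0][1]
    kept = [h for h in ranked if best - h[1] <= beam_threshold]
    return kept[:beam_size]


if __name__ == "__main__":
    candidates = [("a", -1.0), ("b", -3.0), ("c", -30.0), ("d", -2.0)]
    print(prune(candidates, beam_size=2, beam_threshold=20.0))
```

In this sketch, a larger beam_size or beam_threshold keeps more hypotheses alive, which tends to trade speed for accuracy, while lm_weight and word_insertion_score shift the balance between the acoustic model and the language model.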

For advanced users, additional decoder hyper-parameters can also be specified. Refer to Riva-build Optional Parameters for a list of those parameters and their description.

Flashlight Decoder Lexicon#

The Flashlight decoder used in Riva is a lexicon-based decoder and only emits words that are present in the decoder vocabulary file passed to the riva-build command. The decoder vocabulary file used to generate the ASR pipelines in the Quick Start scripts includes words that cover a wide range of domains and should provide accurate transcripts for most applications.

It is also possible to build an ASR pipeline using your own decoder vocabulary file by using the parameter --decoding_vocab of the riva-build command. For example, you could start with the riva-build commands used to generate the ASR pipelines in our Quick Start scripts from section Pipeline Configuration and provide your own lexicon decoder vocabulary file. You will need to ensure that words of interest are in the decoder vocabulary file. The Riva ServiceMaker automatically tokenizes the words in the decoder vocabulary file. The number of tokenizations for each word in the decoder vocabulary file can be controlled with the --flashlight_decoder.num_tokenization parameter.

(Advanced) Manually Adding Additional Tokenizations of Words in Lexicon#

It is also possible to manually add additional tokenizations for the words in the decoder vocabulary by performing the following steps:

The riva-build and riva-deploy commands provided in the previous section store the lexicon in the /data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/lexicon.txt file of the Triton model repository.

To add additional tokenizations to the lexicon, copy the lexicon file:

cp /data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/lexicon.txt decoding_lexicon.txt

and add the SentencePiece tokenization for the word of interest. For example, you could add:

manu ▁ma n u
manu ▁man n n ew
manu ▁man n ew

to the decoding_lexicon.txt file so that the word manu is generated in the transcript if the acoustic model predicts those tokens. You will need to ensure that the new lines follow the same indentation and spacing pattern as the rest of the file and that the tokens used are part of the tokenizer model. After this is done, regenerate the model repository using the new decoding lexicon by passing --decoding_lexicon=decoding_lexicon.txt to riva-build instead of --decoding_vocab=decoding_vocab.txt.
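
The two checks described above (consistent spacing and tokens that exist in the tokenizer model) can be scripted. The following is a minimal sketch with hypothetical helper names. The tab separator is an assumption; copy whatever spacing the existing lines in your lexicon.txt use. The vocabulary set stands in for the token list of your tokenizer model.

```python
def format_lexicon_entry(word, tokens, separator="\t"):
    """Format one lexicon line: the word, a separator, then space-separated tokens.

    Assumption: the default separator (a tab) is a guess; match the
    spacing used by the existing lines of your lexicon file.
    """
    return f"{word}{separator}{' '.join(tokens)}"


def unknown_tokens(tokens, tokenizer_vocab):
    """Return the tokens that are missing from the tokenizer vocabulary."""
    return [t for t in tokens if t not in tokenizer_vocab]


if __name__ == "__main__":
    # Hypothetical subset of the tokenizer vocabulary, for illustration only.
    vocab = {"▁ma", "▁man", "n", "u", "ew"}
    tokens = ["▁ma", "n", "u"]
    missing = unknown_tokens(tokens, vocab)
    if missing:
        raise SystemExit(f"tokens not in tokenizer model: {missing}")
    print(format_lexicon_entry("manu", tokens))
```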

Flashlight Decoder Lexicon Free#

The Flashlight decoder can also be used without a lexicon. Lexicon-free decoding is performed with a character-based language model. Lexicon-free decoding with Flashlight can be enabled by adding --flashlight_decoder.use_lexicon_free_decoding=True to riva-build and specifying a character-based language model via --decoding_language_model_binary=<path/to/charlm>.

OpenSeq2Seq Decoder#

Riva uses the OpenSeq2Seq decoder for beam-search decoding with a language model. For example:

riva-build speech_recognition \
   <rmir_filename>:<key> <riva_filename>:<key> \
   --name=citrinet-1024-zh-CN-asr-streaming \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --endpointing.residue_blanks_at_start=-2 \
   --chunk_size=0.16 \
   --left_padding_size=1.92 \
   --right_padding_size=1.92 \
   --decoder_type=os2s \
   --os2s_decoder.language_model_alpha=0.5 \
   --os2s_decoder.language_model_beta=1.0 \
   --os2s_decoder.beam_search_width=128 \
   --language_code=zh-CN

Where:

--os2s_decoder.language_model_alpha is the weight given to the language model during the beam search.
--os2s_decoder.language_model_beta is the word insertion score.
--os2s_decoder.beam_search_width is the number of partial hypotheses to keep at each step of the beam search.

All of these parameters affect performance. Latency increases as these parameters increase in value. The suggested ranges are listed below.

Parameter	Minimum	Maximum
`--os2s_decoder.beam_search_width`	16	64
`--os2s_decoder.language_model_alpha`	0.5	1.5
`--os2s_decoder.language_model_beta`	1.0	3.0
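
As a small sanity check before building, the suggested ranges can be encoded and compared against the values you plan to pass. This is only a sketch; SUGGESTED_RANGES and out_of_range are hypothetical names, and values outside the ranges are still accepted by riva-build.

```python
# Suggested (minimum, maximum) ranges copied from the table above.
SUGGESTED_RANGES = {
    "beam_search_width": (16, 64),
    "language_model_alpha": (0.5, 1.5),
    "language_model_beta": (1.0, 3.0),
}


def out_of_range(settings):
    """Return the names of settings that fall outside the suggested ranges."""
    flagged = []
    for name, value in settings.items():
        low, high = SUGGESTED_RANGES[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Values taken from the example riva-build command in this section.
    example = {
        "beam_search_width": 128,
        "language_model_alpha": 0.5,
        "language_model_beta": 1.0,
    }
    print(out_of_range(example))
```

Note that the example command in this section uses a beam_search_width of 128, which is above the suggested maximum, so this check flags it.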

Beginning/End of Utterance Detection#

Riva ASR uses an algorithm that detects the beginning and end of utterances. This algorithm is used to reset the ASR decoder state and to trigger a call to the punctuator model. By default, the beginning of an utterance is flagged when 20% of the frames in a 300 ms window have nonblank characters. The end of an utterance is flagged when 98% of the frames in an 800 ms window are blank characters. You can tune those values for your particular use case by using the following riva-build parameters:

  --endpointing.start_history=300 \
  --endpointing.start_th=0.2 \
  --endpointing.stop_history=800 \
  --endpointing.stop_th=0.98

Additionally, it is possible to disable the beginning/end of utterance detection by passing --endpointing_type=none to riva-build.

Note that in this case, the decoder state resets after the full audio signal has been sent by the client. Similarly, the punctuator model is only called once.
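
The default thresholds described in this section can be sketched in Python as follows. This is an illustration of the rule as stated, not Riva's implementation. It assumes that the window length in frames is the history divided by ms_per_timestep (rounded down), that the comparisons are inclusive, and that 40 ms per timestep applies (the value used in the Conformer example). The function names are hypothetical.

```python
def window_frames(history_ms, ms_per_timestep):
    """Number of acoustic model frames covered by a history window."""
    return max(1, history_ms // ms_per_timestep)


def utterance_started(is_blank, start_history=300, start_th=0.2, ms_per_timestep=40):
    """True when enough of the most recent frames are non-blank.

    is_blank is a list of booleans, one per frame, oldest first.
    """
    n = window_frames(start_history, ms_per_timestep)
    window = is_blank[-n:]
    if len(window) < n:
        return False  # not enough history yet
    nonblank = sum(1 for blank in window if not blank)
    return nonblank / n >= start_th


def utterance_ended(is_blank, stop_history=800, stop_th=0.98, ms_per_timestep=40):
    """True when enough of the most recent frames are blank."""
    n = window_frames(stop_history, ms_per_timestep)
    window = is_blank[-n:]
    if len(window) < n:
        return False  # not enough history yet
    return sum(window) / n >= stop_th


if __name__ == "__main__":
    frames = [True] * 5 + [False] * 2
    print(utterance_started(frames), utterance_ended(frames))
```

Under these assumptions, the 300 ms start window spans 7 frames and the 800 ms stop window spans 20 frames, so a single non-blank frame in the stop window is enough to keep the utterance open.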

Streaming Speaker Diarization#

Riva currently supports speaker diarization in streaming mode via the Sortformer Diarizer model. For more details on Sortformer speaker diarization, refer to the Streaming Speaker Diarization section in the ASR Overview.

Sortformer#

To enable Sortformer speaker diarization in the ASR pipeline, pass the following additional parameters to riva-build when building a streaming ASR model:

<sortformer_diarizer_riva_filename>:<encryption_key>
--diarizer_type=sortformer

where:

<sortformer_diarizer_riva_filename> is the .riva Sortformer model to use. For example, you can use the Sortformer Diarizer Riva model available on NGC.
<encryption_key> is the key used to encrypt the file. The encryption key for the pre-trained Riva models uploaded on NGC is tlt_encode.

Note: Sortformer currently supports a maximum of 4 speakers.

Neural-Based Voice Activity Detection#

It is possible to use a neural-based Voice Activity Detection (VAD) algorithm in Riva ASR. This can help to filter out noise in the audio, and can help reduce spurious words from appearing in the ASR transcripts. To use the neural-based VAD algorithm in the ASR pipeline, pass the following additional parameters to riva-build:

Silero VAD#

<silero_vad_riva_filename>:<encryption_key>
--vad_type=silero
--neural_vad_nn.optimization_graph_level=-1
--neural_vad.filter_speech_first false
--neural_vad.onset=0.85
--neural_vad.offset=0.3
--neural_vad.min_duration_on=0.2
--neural_vad.min_duration_off=0.5
--neural_vad.pad_offset=0.08
--neural_vad.pad_onset=0.3
--neural_vad.features_mask_value=-16.635

where:

<silero_vad_riva_filename> is the .riva silero VAD model to use. For example, you can use the Silero VAD Riva model available on NGC.
<encryption_key> is the key used to encrypt the file. The encryption key for the pre-trained Riva models uploaded on NGC is tlt_encode.
--neural_vad.onset is the minimum probability threshold for detecting the start of a speech segment.
--neural_vad.offset is the minimum probability threshold for detecting the end of a speech segment.
--neural_vad.min_duration_on is the minimum duration (in seconds) that a detected speech segment must last to be kept as speech.
--neural_vad.min_duration_off is the minimum duration (in seconds) that a detected non-speech segment must last to be kept as non-speech.
--neural_vad.pad_onset is the duration of audio (in seconds) to pad the onset of a speech segment.
--neural_vad.pad_offset is the duration of audio (in seconds) to pad the offset of a speech segment.
--neural_vad.features_mask_value is the value to use to mask the features of a non-speech segment.
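
The interaction between the onset and offset thresholds can be pictured as hysteresis: a segment opens when the speech probability rises to the onset threshold and closes only when it falls below the lower offset threshold. The Python sketch below is a toy illustration based on the parameter descriptions above. It is not the Silero or Riva code, it works on frame indices rather than seconds, it omits padding, and its function names are hypothetical.

```python
def speech_segments(probs, onset=0.85, offset=0.3):
    """Return (start, end) frame index pairs, end exclusive, using hysteresis.

    A segment opens when the speech probability reaches onset and closes
    when it drops below offset.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if start is None and p >= onset:
            start = i
        elif start is not None and p < offset:
            segments.append((start, i))
            start = None
    if start is not None:
        # Audio ended while still inside a speech segment.
        segments.append((start, len(probs)))
    return segments


def drop_short(segments, min_frames):
    """Discard segments shorter than min_frames (the role of min_duration_on)."""
    return [(s, e) for s, e in segments if e - s >= min_frames]


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.5, 0.4, 0.2, 0.1, 0.95, 0.1]
    found = speech_segments(probs)
    print(found, drop_short(found, min_frames=2))
```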

MarbleNet VAD#

<marblenet_vad_riva_filename>:<encryption_key>
--vad_type=neural
--neural_vad_nn.optimization_graph_level=-1

where:

<marblenet_vad_riva_filename> is the .riva marblenet VAD model to use. For example, you can use the MarbleNet VAD Riva model available on NGC.
<encryption_key> is the key used to encrypt the file. The encryption key for the pre-trained Riva models uploaded on NGC is tlt_encode.

Note that using a neural VAD component in the ASR pipeline will have an impact on latency and throughput of the deployed Riva ASR server.

Generating Multiple Transcript Hypotheses#

By default, the Riva ASR pipeline is configured to only generate the best transcript hypothesis for each utterance. It is possible to generate multiple transcript hypotheses by passing the parameter --max_supported_transcripts=N to the riva-build command, where N is the maximum number of hypotheses to generate. With these changes, the client application can retrieve the multiple hypotheses by setting the max_alternatives field of RecognitionConfig to values greater than 1.

Impact of Chunk Size and Padding Size on Performance and Accuracy (Advanced)#

The chunk_size and padding_size parameters used to configure Riva ASR can have a significant impact on accuracy and performance. A brief description of those parameters can be found in section Riva-build Optional Parameters. Riva provides pre-configured ASR pipelines, with preset values of chunk_size and padding_size: a low-latency streaming configuration, a high throughput streaming configuration, and an offline configuration. Those configurations should suit most deployment scenarios. The chunk_size and padding_size values used for those configurations can be found in a table in section Pipeline Configuration.

The chunk_size parameter is the duration of the audio chunk in seconds processed by the Riva server for every streaming request. Hence, in streaming mode, Riva returns one response for every chunk_size seconds of audio. A lower value of chunk_size will therefore reduce the user-perceived latency as the transcript will get updated more frequently.

The padding_size parameter is the duration in seconds of the padding prepended and appended to each audio chunk. The Riva acoustic model processes an input tensor corresponding to an audio duration of 2*(padding_size) + chunk_size for every new chunk of audio it receives. Increasing padding_size or chunk_size typically helps to improve accuracy of the transcripts since the acoustic model has access to more context. However, increasing padding_size reduces the maximum number of concurrent streams supported by Riva ASR, since it will increase the size of the input tensor fed to the acoustic model for every new chunk.
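
The arithmetic above can be worked through for the three Conformer configurations given in the Streaming/Offline Recognition section. The Python sketch below uses hypothetical function names and simply evaluates 2*(padding_size) + chunk_size, together with the share of each model input that is new audio.

```python
def model_input_seconds(chunk_size, padding_size):
    """Audio duration fed to the acoustic model for each new chunk."""
    return 2 * padding_size + chunk_size


def new_audio_fraction(chunk_size, padding_size):
    """Share of each model input that is new audio rather than padding."""
    return chunk_size / model_input_seconds(chunk_size, padding_size)


# (chunk_size, padding_size) values from the Conformer recommendations.
CONFIGS = {
    "low-latency streaming": (0.16, 1.92),
    "high throughput streaming": (0.8, 1.6),
    "offline": (4.8, 1.6),
}

if __name__ == "__main__":
    for name, (chunk, pad) in CONFIGS.items():
        total = model_input_seconds(chunk, pad)
        share = new_audio_fraction(chunk, pad)
        print(f"{name}: input {total:.2f} s, new audio {share:.0%}")
```

With these values, both streaming configurations feed about 4 seconds of audio to the acoustic model per chunk, but only 4% of that input is new audio in the low-latency case compared with 20% in the high throughput case. The offline configuration processes 8 seconds per chunk, 60% of which is new audio. This is why the low-latency configuration costs more compute per second of speech.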

Riva-build Optional Parameters#

For details about the parameters passed to riva-build to customize the ASR pipeline, issue:

riva-build speech_recognition -h

The following list includes descriptions for all optional parameters currently recognized by riva-build:

usage: riva-build speech_recognition [-h] [-f] [-v]
                                     [--language_code LANGUAGE_CODE]
                                     [--instance_group_count INSTANCE_GROUP_COUNT]
                                     [--kind KIND]
                                     [--max_batch_size MAX_BATCH_SIZE]
                                     [--max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--batching_type BATCHING_TYPE]
                                     [--acoustic_model_name ACOUSTIC_MODEL_NAME]
                                     [--featurizer_name FEATURIZER_NAME]
                                     [--name NAME] [--streaming STREAMING]
                                     [--offline] [--vad_type VAD_TYPE]
                                     [--diarizer_type DIARIZER_TYPE]
                                     [--unified_acoustic_model]
                                     [--endpointing_type ENDPOINTING_TYPE]
                                     [--chunk_size CHUNK_SIZE]
                                     [--padding_factor PADDING_FACTOR]
                                     [--left_padding_size LEFT_PADDING_SIZE]
                                     [--right_padding_size RIGHT_PADDING_SIZE]
                                     [--padding_size PADDING_SIZE]
                                     [--max_supported_transcripts MAX_SUPPORTED_TRANSCRIPTS]
                                     [--ms_per_timestep MS_PER_TIMESTEP]
                                     [--force_decoder_reset_after_ms FORCE_DECODER_RESET_AFTER_MS]
                                     [--lattice_beam LATTICE_BEAM]
                                     [--decoding_language_model_arpa DECODING_LANGUAGE_MODEL_ARPA]
                                     [--decoding_language_model_binary DECODING_LANGUAGE_MODEL_BINARY]
                                     [--decoding_language_model_fst DECODING_LANGUAGE_MODEL_FST]
                                     [--decoding_language_model_words DECODING_LANGUAGE_MODEL_WORDS]
                                     [--rescoring_language_model_arpa RESCORING_LANGUAGE_MODEL_ARPA]
                                     [--decoding_language_model_carpa DECODING_LANGUAGE_MODEL_CARPA]
                                     [--rescoring_language_model_carpa RESCORING_LANGUAGE_MODEL_CARPA]
                                     [--decoding_lexicon DECODING_LEXICON]
                                     [--decoding_vocab DECODING_VOCAB]
                                     [--tokenizer_model TOKENIZER_MODEL]
                                     [--decoder_type DECODER_TYPE]
                                     [--stddev_floor STDDEV_FLOOR]
                                     [--wfst_tokenizer_model WFST_TOKENIZER_MODEL]
                                     [--wfst_verbalizer_model WFST_VERBALIZER_MODEL]
                                     [--wfst_pre_process_model WFST_PRE_PROCESS_MODEL]
                                     [--wfst_post_process_model WFST_POST_PROCESS_MODEL]
                                     [--speech_hints_model SPEECH_HINTS_MODEL]
                                     [--buffer_look_ahead BUFFER_LOOK_AHEAD]
                                     [--buffer_context_history BUFFER_CONTEXT_HISTORY]
                                     [--buffer_threshold BUFFER_THRESHOLD]
                                     [--buffer_max_timeout_frames BUFFER_MAX_TIMEOUT_FRAMES]
                                     [--profane_words_file PROFANE_WORDS_FILE]
                                     [--append_space_to_transcripts APPEND_SPACE_TO_TRANSCRIPTS]
                                     [--enable_vad_endpointing ENABLE_VAD_ENDPOINTING]
                                     [--return_separate_utterances RETURN_SEPARATE_UTTERANCES]
                                     [--mel_basis_file_path MEL_BASIS_FILE_PATH]
                                     [--diar_mel_basis_file_path DIAR_MEL_BASIS_FILE_PATH]
                                     [--feature_extractor_type FEATURE_EXTRACTOR_TYPE]
                                     [--torch_feature_type TORCH_FEATURE_TYPE]
                                     [--torch_feature_device TORCH_FEATURE_DEVICE]
                                     [--execution_environment_path EXECUTION_ENVIRONMENT_PATH]
                                     [--share_flags SHARE_FLAGS]
                                     [--att_context_size ATT_CONTEXT_SIZE]
                                     [--featurizer.max_sequence_idle_microseconds FEATURIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--featurizer.max_batch_size FEATURIZER.MAX_BATCH_SIZE]
                                     [--featurizer.min_batch_size FEATURIZER.MIN_BATCH_SIZE]
                                     [--featurizer.opt_batch_size FEATURIZER.OPT_BATCH_SIZE]
                                     [--featurizer.preferred_batch_size FEATURIZER.PREFERRED_BATCH_SIZE]
                                     [--featurizer.batching_type FEATURIZER.BATCHING_TYPE]
                                     [--featurizer.preserve_ordering FEATURIZER.PRESERVE_ORDERING]
                                     [--featurizer.instance_group_count FEATURIZER.INSTANCE_GROUP_COUNT]
                                     [--featurizer.max_queue_delay_microseconds FEATURIZER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--featurizer.optimization_graph_level FEATURIZER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--featurizer.max_execution_batch_size FEATURIZER.MAX_EXECUTION_BATCH_SIZE]
                                     [--featurizer.gain FEATURIZER.GAIN]
                                     [--featurizer.dither FEATURIZER.DITHER]
                                     [--featurizer.use_utterance_norm_params FEATURIZER.USE_UTTERANCE_NORM_PARAMS]
                                     [--featurizer.precalc_norm_time_steps FEATURIZER.PRECALC_NORM_TIME_STEPS]
                                     [--featurizer.precalc_norm_params FEATURIZER.PRECALC_NORM_PARAMS]
                                     [--featurizer.norm_per_feature FEATURIZER.NORM_PER_FEATURE]
                                     [--featurizer.mean FEATURIZER.MEAN]
                                     [--featurizer.stddev FEATURIZER.STDDEV]
                                     [--featurizer.transpose FEATURIZER.TRANSPOSE]
                                     [--featurizer.padding_size FEATURIZER.PADDING_SIZE]
                                     [--featurizer.int64_features_length FEATURIZER.INT64_FEATURES_LENGTH]
                                     [--nn.max_sequence_idle_microseconds NN.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--nn.max_batch_size NN.MAX_BATCH_SIZE]
                                     [--nn.min_batch_size NN.MIN_BATCH_SIZE]
                                     [--nn.opt_batch_size NN.OPT_BATCH_SIZE]
                                     [--nn.preferred_batch_size NN.PREFERRED_BATCH_SIZE]
                                     [--nn.batching_type NN.BATCHING_TYPE]
                                     [--nn.preserve_ordering NN.PRESERVE_ORDERING]
                                     [--nn.instance_group_count NN.INSTANCE_GROUP_COUNT]
                                     [--nn.max_queue_delay_microseconds NN.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--nn.optimization_graph_level NN.OPTIMIZATION_GRAPH_LEVEL]
                                     [--nn.trt_max_workspace_size NN.TRT_MAX_WORKSPACE_SIZE]
                                     [--nn.use_onnx_runtime]
                                     [--nn.use_torchscript]
                                     [--nn.use_trt_fp32]
                                     [--nn.fp16_needs_obey_precision_pass]
                                     [--nn.am_len_input_use_int64 NN.AM_LEN_INPUT_USE_INT64]
                                     [--nn.am_cache_len_input_use_int64 NN.AM_CACHE_LEN_INPUT_USE_INT64]
                                     [--nn.language_code NN.LANGUAGE_CODE]
                                     [--nn.engine_dir NN.ENGINE_DIR]
                                     [--nn.EXECUTION_ENV_PATH NN.EXECUTION_ENV_PATH]
                                     [--endpointing.max_sequence_idle_microseconds ENDPOINTING.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--endpointing.max_batch_size ENDPOINTING.MAX_BATCH_SIZE]
                                     [--endpointing.min_batch_size ENDPOINTING.MIN_BATCH_SIZE]
                                     [--endpointing.opt_batch_size ENDPOINTING.OPT_BATCH_SIZE]
                                     [--endpointing.preferred_batch_size ENDPOINTING.PREFERRED_BATCH_SIZE]
                                     [--endpointing.batching_type ENDPOINTING.BATCHING_TYPE]
                                     [--endpointing.preserve_ordering ENDPOINTING.PRESERVE_ORDERING]
                                     [--endpointing.instance_group_count ENDPOINTING.INSTANCE_GROUP_COUNT]
                                     [--endpointing.max_queue_delay_microseconds ENDPOINTING.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--endpointing.optimization_graph_level ENDPOINTING.OPTIMIZATION_GRAPH_LEVEL]
                                     [--endpointing.ms_per_timestep ENDPOINTING.MS_PER_TIMESTEP]
                                     [--endpointing.start_history ENDPOINTING.START_HISTORY]
                                     [--endpointing.stop_history ENDPOINTING.STOP_HISTORY]
                                     [--endpointing.stop_history_eou ENDPOINTING.STOP_HISTORY_EOU]
                                     [--endpointing.start_th ENDPOINTING.START_TH]
                                     [--endpointing.stop_th ENDPOINTING.STOP_TH]
                                     [--endpointing.stop_th_eou ENDPOINTING.STOP_TH_EOU]
                                     [--endpointing.residue_blanks_at_start ENDPOINTING.RESIDUE_BLANKS_AT_START]
                                     [--endpointing.residue_blanks_at_end ENDPOINTING.RESIDUE_BLANKS_AT_END]
                                     [--endpointing.vocab_file ENDPOINTING.VOCAB_FILE]
                                     [--neural_vad.max_sequence_idle_microseconds NEURAL_VAD.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--neural_vad.max_batch_size NEURAL_VAD.MAX_BATCH_SIZE]
                                     [--neural_vad.min_batch_size NEURAL_VAD.MIN_BATCH_SIZE]
                                     [--neural_vad.opt_batch_size NEURAL_VAD.OPT_BATCH_SIZE]
                                     [--neural_vad.preferred_batch_size NEURAL_VAD.PREFERRED_BATCH_SIZE]
                                     [--neural_vad.batching_type NEURAL_VAD.BATCHING_TYPE]
                                     [--neural_vad.preserve_ordering NEURAL_VAD.PRESERVE_ORDERING]
                                     [--neural_vad.instance_group_count NEURAL_VAD.INSTANCE_GROUP_COUNT]
                                     [--neural_vad.max_queue_delay_microseconds NEURAL_VAD.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--neural_vad.optimization_graph_level NEURAL_VAD.OPTIMIZATION_GRAPH_LEVEL]
                                     [--neural_vad.load_model NEURAL_VAD.LOAD_MODEL]
                                     [--neural_vad.batch_mode NEURAL_VAD.BATCH_MODE]
                                     [--neural_vad.decoupled_mode NEURAL_VAD.DECOUPLED_MODE]
                                     [--neural_vad.onset NEURAL_VAD.ONSET]
                                     [--neural_vad.offset NEURAL_VAD.OFFSET]
                                     [--neural_vad.pad_onset NEURAL_VAD.PAD_ONSET]
                                     [--neural_vad.pad_offset NEURAL_VAD.PAD_OFFSET]
                                     [--neural_vad.min_duration_on NEURAL_VAD.MIN_DURATION_ON]
                                     [--neural_vad.min_duration_off NEURAL_VAD.MIN_DURATION_OFF]
                                     [--neural_vad.filter_speech_first NEURAL_VAD.FILTER_SPEECH_FIRST]
                                     [--neural_vad.features_mask_value NEURAL_VAD.FEATURES_MASK_VALUE]
                                     [--neural_vad_nn.max_sequence_idle_microseconds NEURAL_VAD_NN.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--neural_vad_nn.max_batch_size NEURAL_VAD_NN.MAX_BATCH_SIZE]
                                     [--neural_vad_nn.min_batch_size NEURAL_VAD_NN.MIN_BATCH_SIZE]
                                     [--neural_vad_nn.opt_batch_size NEURAL_VAD_NN.OPT_BATCH_SIZE]
                                     [--neural_vad_nn.preferred_batch_size NEURAL_VAD_NN.PREFERRED_BATCH_SIZE]
                                     [--neural_vad_nn.batching_type NEURAL_VAD_NN.BATCHING_TYPE]
                                     [--neural_vad_nn.preserve_ordering NEURAL_VAD_NN.PRESERVE_ORDERING]
                                     [--neural_vad_nn.instance_group_count NEURAL_VAD_NN.INSTANCE_GROUP_COUNT]
                                     [--neural_vad_nn.max_queue_delay_microseconds NEURAL_VAD_NN.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--neural_vad_nn.optimization_graph_level NEURAL_VAD_NN.OPTIMIZATION_GRAPH_LEVEL]
                                     [--neural_vad_nn.trt_max_workspace_size NEURAL_VAD_NN.TRT_MAX_WORKSPACE_SIZE]
                                     [--neural_vad_nn.use_onnx_runtime]
                                     [--neural_vad_nn.use_torchscript]
                                     [--neural_vad_nn.use_trt_fp32]
                                     [--neural_vad_nn.fp16_needs_obey_precision_pass]
                                     [--neural_vad_nn.onnx_path NEURAL_VAD_NN.ONNX_PATH]
                                     [--neural_vad_nn.sample_rate NEURAL_VAD_NN.SAMPLE_RATE]
                                     [--neural_vad_nn.min_seq_len NEURAL_VAD_NN.MIN_SEQ_LEN]
                                     [--neural_vad_nn.opt_seq_len NEURAL_VAD_NN.OPT_SEQ_LEN]
                                     [--neural_vad_nn.max_seq_len NEURAL_VAD_NN.MAX_SEQ_LEN]
                                     [--streaming_diarizer.max_sequence_idle_microseconds STREAMING_DIARIZER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--streaming_diarizer.max_batch_size STREAMING_DIARIZER.MAX_BATCH_SIZE]
                                     [--streaming_diarizer.min_batch_size STREAMING_DIARIZER.MIN_BATCH_SIZE]
                                     [--streaming_diarizer.opt_batch_size STREAMING_DIARIZER.OPT_BATCH_SIZE]
                                     [--streaming_diarizer.preferred_batch_size STREAMING_DIARIZER.PREFERRED_BATCH_SIZE]
                                     [--streaming_diarizer.batching_type STREAMING_DIARIZER.BATCHING_TYPE]
                                     [--streaming_diarizer.preserve_ordering STREAMING_DIARIZER.PRESERVE_ORDERING]
                                     [--streaming_diarizer.instance_group_count STREAMING_DIARIZER.INSTANCE_GROUP_COUNT]
                                     [--streaming_diarizer.max_queue_delay_microseconds STREAMING_DIARIZER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--streaming_diarizer.optimization_graph_level STREAMING_DIARIZER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--streaming_diarizer.load_model STREAMING_DIARIZER.LOAD_MODEL]
                                     [--streaming_diarizer.batch_mode STREAMING_DIARIZER.BATCH_MODE]
                                     [--streaming_diarizer.decoupled_mode STREAMING_DIARIZER.DECOUPLED_MODE]
                                     [--streaming_diarizer.max_execution_batch_size STREAMING_DIARIZER.MAX_EXECUTION_BATCH_SIZE]
                                     [--streaming_diarizer.gain STREAMING_DIARIZER.GAIN]
                                     [--streaming_diarizer.dither STREAMING_DIARIZER.DITHER]
                                     [--streaming_diarizer.norm_per_feature STREAMING_DIARIZER.NORM_PER_FEATURE]
                                     [--streaming_diarizer.transpose STREAMING_DIARIZER.TRANSPOSE]
                                     [--streaming_diarizer.padding_size STREAMING_DIARIZER.PADDING_SIZE]
                                     [--streaming_diarizer.int64_features_length STREAMING_DIARIZER.INT64_FEATURES_LENGTH]
                                     [--streaming_diarizer_nn.max_sequence_idle_microseconds STREAMING_DIARIZER_NN.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--streaming_diarizer_nn.max_batch_size STREAMING_DIARIZER_NN.MAX_BATCH_SIZE]
                                     [--streaming_diarizer_nn.min_batch_size STREAMING_DIARIZER_NN.MIN_BATCH_SIZE]
                                     [--streaming_diarizer_nn.opt_batch_size STREAMING_DIARIZER_NN.OPT_BATCH_SIZE]
                                     [--streaming_diarizer_nn.preferred_batch_size STREAMING_DIARIZER_NN.PREFERRED_BATCH_SIZE]
                                     [--streaming_diarizer_nn.batching_type STREAMING_DIARIZER_NN.BATCHING_TYPE]
                                     [--streaming_diarizer_nn.preserve_ordering STREAMING_DIARIZER_NN.PRESERVE_ORDERING]
                                     [--streaming_diarizer_nn.instance_group_count STREAMING_DIARIZER_NN.INSTANCE_GROUP_COUNT]
                                     [--streaming_diarizer_nn.max_queue_delay_microseconds STREAMING_DIARIZER_NN.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--streaming_diarizer_nn.optimization_graph_level STREAMING_DIARIZER_NN.OPTIMIZATION_GRAPH_LEVEL]
                                     [--streaming_diarizer_nn.trt_max_workspace_size STREAMING_DIARIZER_NN.TRT_MAX_WORKSPACE_SIZE]
                                     [--streaming_diarizer_nn.use_onnx_runtime]
                                     [--streaming_diarizer_nn.use_torchscript]
                                     [--streaming_diarizer_nn.use_trt_fp32]
                                     [--streaming_diarizer_nn.fp16_needs_obey_precision_pass]
                                     [--streaming_diarizer_nn.model_path STREAMING_DIARIZER_NN.MODEL_PATH]
                                     [--flashlight_decoder.max_sequence_idle_microseconds FLASHLIGHT_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--flashlight_decoder.max_batch_size FLASHLIGHT_DECODER.MAX_BATCH_SIZE]
                                     [--flashlight_decoder.min_batch_size FLASHLIGHT_DECODER.MIN_BATCH_SIZE]
                                     [--flashlight_decoder.opt_batch_size FLASHLIGHT_DECODER.OPT_BATCH_SIZE]
                                     [--flashlight_decoder.preferred_batch_size FLASHLIGHT_DECODER.PREFERRED_BATCH_SIZE]
                                     [--flashlight_decoder.batching_type FLASHLIGHT_DECODER.BATCHING_TYPE]
                                     [--flashlight_decoder.preserve_ordering FLASHLIGHT_DECODER.PRESERVE_ORDERING]
                                     [--flashlight_decoder.instance_group_count FLASHLIGHT_DECODER.INSTANCE_GROUP_COUNT]
                                     [--flashlight_decoder.max_queue_delay_microseconds FLASHLIGHT_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--flashlight_decoder.optimization_graph_level FLASHLIGHT_DECODER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--flashlight_decoder.max_execution_batch_size FLASHLIGHT_DECODER.MAX_EXECUTION_BATCH_SIZE]
                                     [--flashlight_decoder.decoder_type FLASHLIGHT_DECODER.DECODER_TYPE]
                                     [--flashlight_decoder.padding_size FLASHLIGHT_DECODER.PADDING_SIZE]
                                     [--flashlight_decoder.max_supported_transcripts FLASHLIGHT_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
                                     [--flashlight_decoder.asr_model_delay FLASHLIGHT_DECODER.ASR_MODEL_DELAY]
                                     [--flashlight_decoder.ms_per_timestep FLASHLIGHT_DECODER.MS_PER_TIMESTEP]
                                     [--flashlight_decoder.vocab_file FLASHLIGHT_DECODER.VOCAB_FILE]
                                     [--flashlight_decoder.decoder_num_worker_threads FLASHLIGHT_DECODER.DECODER_NUM_WORKER_THREADS]
                                     [--flashlight_decoder.force_decoder_reset_after_ms FLASHLIGHT_DECODER.FORCE_DECODER_RESET_AFTER_MS]
                                     [--flashlight_decoder.language_model_file FLASHLIGHT_DECODER.LANGUAGE_MODEL_FILE]
                                     [--flashlight_decoder.lexicon_file FLASHLIGHT_DECODER.LEXICON_FILE]
                                     [--flashlight_decoder.use_lexicon_free_decoding FLASHLIGHT_DECODER.USE_LEXICON_FREE_DECODING]
                                     [--flashlight_decoder.beam_size FLASHLIGHT_DECODER.BEAM_SIZE]
                                     [--flashlight_decoder.beam_size_token FLASHLIGHT_DECODER.BEAM_SIZE_TOKEN]
                                     [--flashlight_decoder.beam_threshold FLASHLIGHT_DECODER.BEAM_THRESHOLD]
                                     [--flashlight_decoder.lm_weight FLASHLIGHT_DECODER.LM_WEIGHT]
                                     [--flashlight_decoder.blank_token FLASHLIGHT_DECODER.BLANK_TOKEN]
                                     [--flashlight_decoder.sil_token FLASHLIGHT_DECODER.SIL_TOKEN]
                                     [--flashlight_decoder.unk_token FLASHLIGHT_DECODER.UNK_TOKEN]
                                     [--flashlight_decoder.set_default_index_to_unk_token FLASHLIGHT_DECODER.SET_DEFAULT_INDEX_TO_UNK_TOKEN]
                                     [--flashlight_decoder.word_insertion_score FLASHLIGHT_DECODER.WORD_INSERTION_SCORE]
                                     [--flashlight_decoder.forerunner_beam_size FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE]
                                     [--flashlight_decoder.forerunner_beam_size_token FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE_TOKEN]
                                     [--flashlight_decoder.forerunner_beam_threshold FLASHLIGHT_DECODER.FORERUNNER_BEAM_THRESHOLD]
                                     [--flashlight_decoder.smearing_mode FLASHLIGHT_DECODER.SMEARING_MODE]
                                     [--flashlight_decoder.forerunner_use_lm FLASHLIGHT_DECODER.FORERUNNER_USE_LM]
                                     [--flashlight_decoder.num_tokenization FLASHLIGHT_DECODER.NUM_TOKENIZATION]
                                     [--flashlight_decoder.unk_score FLASHLIGHT_DECODER.UNK_SCORE]
                                     [--flashlight_decoder.log_add FLASHLIGHT_DECODER.LOG_ADD]
                                     [--pass_through_decoder.max_sequence_idle_microseconds PASS_THROUGH_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--pass_through_decoder.max_batch_size PASS_THROUGH_DECODER.MAX_BATCH_SIZE]
                                     [--pass_through_decoder.min_batch_size PASS_THROUGH_DECODER.MIN_BATCH_SIZE]
                                     [--pass_through_decoder.opt_batch_size PASS_THROUGH_DECODER.OPT_BATCH_SIZE]
                                     [--pass_through_decoder.preferred_batch_size PASS_THROUGH_DECODER.PREFERRED_BATCH_SIZE]
                                     [--pass_through_decoder.batching_type PASS_THROUGH_DECODER.BATCHING_TYPE]
                                     [--pass_through_decoder.preserve_ordering PASS_THROUGH_DECODER.PRESERVE_ORDERING]
                                     [--pass_through_decoder.instance_group_count PASS_THROUGH_DECODER.INSTANCE_GROUP_COUNT]
                                     [--pass_through_decoder.max_queue_delay_microseconds PASS_THROUGH_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--pass_through_decoder.optimization_graph_level PASS_THROUGH_DECODER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--pass_through_decoder.vocab_file PASS_THROUGH_DECODER.VOCAB_FILE]
                                     [--pass_through_decoder.asr_model_delay PASS_THROUGH_DECODER.ASR_MODEL_DELAY]
                                     [--nemo_decoder.max_sequence_idle_microseconds NEMO_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--nemo_decoder.max_batch_size NEMO_DECODER.MAX_BATCH_SIZE]
                                     [--nemo_decoder.min_batch_size NEMO_DECODER.MIN_BATCH_SIZE]
                                     [--nemo_decoder.opt_batch_size NEMO_DECODER.OPT_BATCH_SIZE]
                                     [--nemo_decoder.preferred_batch_size NEMO_DECODER.PREFERRED_BATCH_SIZE]
                                     [--nemo_decoder.batching_type NEMO_DECODER.BATCHING_TYPE]
                                     [--nemo_decoder.preserve_ordering NEMO_DECODER.PRESERVE_ORDERING]
                                     [--nemo_decoder.instance_group_count NEMO_DECODER.INSTANCE_GROUP_COUNT]
                                     [--nemo_decoder.max_queue_delay_microseconds NEMO_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--nemo_decoder.optimization_graph_level NEMO_DECODER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--nemo_decoder.vocab_file NEMO_DECODER.VOCAB_FILE]
                                     [--nemo_decoder.asr_model_delay NEMO_DECODER.ASR_MODEL_DELAY]
                                     [--nemo_decoder.compute_dtype]
                                     [--nemo_decoder.amp_dtype]
                                     [--nemo_decoder.nemo_decoder_type NEMO_DECODER.NEMO_DECODER_TYPE]
                                     [--nemo_decoder.use_stateful_decoding]
                                     [--nemo_decoder.use_amp]
                                     [--trtllm_decoder.max_sequence_idle_microseconds TRTLLM_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--trtllm_decoder.max_batch_size TRTLLM_DECODER.MAX_BATCH_SIZE]
                                     [--trtllm_decoder.min_batch_size TRTLLM_DECODER.MIN_BATCH_SIZE]
                                     [--trtllm_decoder.opt_batch_size TRTLLM_DECODER.OPT_BATCH_SIZE]
                                     [--trtllm_decoder.preferred_batch_size TRTLLM_DECODER.PREFERRED_BATCH_SIZE]
                                     [--trtllm_decoder.batching_type TRTLLM_DECODER.BATCHING_TYPE]
                                     [--trtllm_decoder.preserve_ordering TRTLLM_DECODER.PRESERVE_ORDERING]
                                     [--trtllm_decoder.instance_group_count TRTLLM_DECODER.INSTANCE_GROUP_COUNT]
                                     [--trtllm_decoder.max_queue_delay_microseconds TRTLLM_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--trtllm_decoder.optimization_graph_level TRTLLM_DECODER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--trtllm_decoder.world_size TRTLLM_DECODER.WORLD_SIZE]
                                     [--trtllm_decoder.quantize_dir TRTLLM_DECODER.QUANTIZE_DIR]
                                     [--trtllm_decoder.dtype TRTLLM_DECODER.DTYPE]
                                     [--trtllm_decoder.max_input_len TRTLLM_DECODER.MAX_INPUT_LEN]
                                     [--trtllm_decoder.max_output_len TRTLLM_DECODER.MAX_OUTPUT_LEN]
                                     [--trtllm_decoder.max_beam_width TRTLLM_DECODER.MAX_BEAM_WIDTH]
                                     [--trtllm_decoder.use_gpt_attention_plugin TRTLLM_DECODER.USE_GPT_ATTENTION_PLUGIN]
                                     [--trtllm_decoder.use_bert_attention_plugin TRTLLM_DECODER.USE_BERT_ATTENTION_PLUGIN]
                                     [--trtllm_decoder.use_gemm_plugin TRTLLM_DECODER.USE_GEMM_PLUGIN]
                                     [--trtllm_decoder.remove_input_padding TRTLLM_DECODER.REMOVE_INPUT_PADDING]
                                     [--trtllm_decoder.enable_context_fmha TRTLLM_DECODER.ENABLE_CONTEXT_FMHA]
                                     [--trtllm_decoder.use_weight_only TRTLLM_DECODER.USE_WEIGHT_ONLY]
                                     [--trtllm_decoder.weight_only_precision TRTLLM_DECODER.WEIGHT_ONLY_PRECISION]
                                     [--trtllm_decoder.int8_kv_cache TRTLLM_DECODER.INT8_KV_CACHE]
                                     [--trtllm_decoder.debug_mode TRTLLM_DECODER.DEBUG_MODE]
                                     [--trtllm_decoder.vocab_file TRTLLM_DECODER.VOCAB_FILE]
                                     [--trtllm_decoder.asr_model_delay TRTLLM_DECODER.ASR_MODEL_DELAY]
                                     [--greedy_decoder.max_sequence_idle_microseconds GREEDY_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--greedy_decoder.max_batch_size GREEDY_DECODER.MAX_BATCH_SIZE]
                                     [--greedy_decoder.min_batch_size GREEDY_DECODER.MIN_BATCH_SIZE]
                                     [--greedy_decoder.opt_batch_size GREEDY_DECODER.OPT_BATCH_SIZE]
                                     [--greedy_decoder.preferred_batch_size GREEDY_DECODER.PREFERRED_BATCH_SIZE]
                                     [--greedy_decoder.batching_type GREEDY_DECODER.BATCHING_TYPE]
                                     [--greedy_decoder.preserve_ordering GREEDY_DECODER.PRESERVE_ORDERING]
                                     [--greedy_decoder.instance_group_count GREEDY_DECODER.INSTANCE_GROUP_COUNT]
                                     [--greedy_decoder.max_queue_delay_microseconds GREEDY_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--greedy_decoder.optimization_graph_level GREEDY_DECODER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--greedy_decoder.max_execution_batch_size GREEDY_DECODER.MAX_EXECUTION_BATCH_SIZE]
                                     [--greedy_decoder.decoder_type GREEDY_DECODER.DECODER_TYPE]
                                     [--greedy_decoder.padding_size GREEDY_DECODER.PADDING_SIZE]
                                     [--greedy_decoder.max_supported_transcripts GREEDY_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
                                     [--greedy_decoder.asr_model_delay GREEDY_DECODER.ASR_MODEL_DELAY]
                                     [--greedy_decoder.ms_per_timestep GREEDY_DECODER.MS_PER_TIMESTEP]
                                     [--greedy_decoder.vocab_file GREEDY_DECODER.VOCAB_FILE]
                                     [--greedy_decoder.decoder_num_worker_threads GREEDY_DECODER.DECODER_NUM_WORKER_THREADS]
                                     [--greedy_decoder.force_decoder_reset_after_ms GREEDY_DECODER.FORCE_DECODER_RESET_AFTER_MS]
                                     [--os2s_decoder.max_sequence_idle_microseconds OS2S_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--os2s_decoder.max_batch_size OS2S_DECODER.MAX_BATCH_SIZE]
                                     [--os2s_decoder.min_batch_size OS2S_DECODER.MIN_BATCH_SIZE]
                                     [--os2s_decoder.opt_batch_size OS2S_DECODER.OPT_BATCH_SIZE]
                                     [--os2s_decoder.preferred_batch_size OS2S_DECODER.PREFERRED_BATCH_SIZE]
                                     [--os2s_decoder.batching_type OS2S_DECODER.BATCHING_TYPE]
                                     [--os2s_decoder.preserve_ordering OS2S_DECODER.PRESERVE_ORDERING]
                                     [--os2s_decoder.instance_group_count OS2S_DECODER.INSTANCE_GROUP_COUNT]
                                     [--os2s_decoder.max_queue_delay_microseconds OS2S_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--os2s_decoder.optimization_graph_level OS2S_DECODER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--os2s_decoder.max_execution_batch_size OS2S_DECODER.MAX_EXECUTION_BATCH_SIZE]
                                     [--os2s_decoder.decoder_type OS2S_DECODER.DECODER_TYPE]
                                     [--os2s_decoder.padding_size OS2S_DECODER.PADDING_SIZE]
                                     [--os2s_decoder.max_supported_transcripts OS2S_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
                                     [--os2s_decoder.asr_model_delay OS2S_DECODER.ASR_MODEL_DELAY]
                                     [--os2s_decoder.ms_per_timestep OS2S_DECODER.MS_PER_TIMESTEP]
                                     [--os2s_decoder.vocab_file OS2S_DECODER.VOCAB_FILE]
                                     [--os2s_decoder.decoder_num_worker_threads OS2S_DECODER.DECODER_NUM_WORKER_THREADS]
                                     [--os2s_decoder.force_decoder_reset_after_ms OS2S_DECODER.FORCE_DECODER_RESET_AFTER_MS]
                                     [--os2s_decoder.language_model_file OS2S_DECODER.LANGUAGE_MODEL_FILE]
                                     [--os2s_decoder.beam_search_width OS2S_DECODER.BEAM_SEARCH_WIDTH]
                                     [--os2s_decoder.language_model_alpha OS2S_DECODER.LANGUAGE_MODEL_ALPHA]
                                     [--os2s_decoder.language_model_beta OS2S_DECODER.LANGUAGE_MODEL_BETA]
                                     [--kaldi_decoder.max_sequence_idle_microseconds KALDI_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--kaldi_decoder.max_batch_size KALDI_DECODER.MAX_BATCH_SIZE]
                                     [--kaldi_decoder.min_batch_size KALDI_DECODER.MIN_BATCH_SIZE]
                                     [--kaldi_decoder.opt_batch_size KALDI_DECODER.OPT_BATCH_SIZE]
                                     [--kaldi_decoder.preferred_batch_size KALDI_DECODER.PREFERRED_BATCH_SIZE]
                                     [--kaldi_decoder.batching_type KALDI_DECODER.BATCHING_TYPE]
                                     [--kaldi_decoder.preserve_ordering KALDI_DECODER.PRESERVE_ORDERING]
                                     [--kaldi_decoder.instance_group_count KALDI_DECODER.INSTANCE_GROUP_COUNT]
                                     [--kaldi_decoder.max_queue_delay_microseconds KALDI_DECODER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--kaldi_decoder.optimization_graph_level KALDI_DECODER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--kaldi_decoder.max_execution_batch_size KALDI_DECODER.MAX_EXECUTION_BATCH_SIZE]
                                     [--kaldi_decoder.decoder_type KALDI_DECODER.DECODER_TYPE]
                                     [--kaldi_decoder.padding_size KALDI_DECODER.PADDING_SIZE]
                                     [--kaldi_decoder.max_supported_transcripts KALDI_DECODER.MAX_SUPPORTED_TRANSCRIPTS]
                                     [--kaldi_decoder.asr_model_delay KALDI_DECODER.ASR_MODEL_DELAY]
                                     [--kaldi_decoder.ms_per_timestep KALDI_DECODER.MS_PER_TIMESTEP]
                                     [--kaldi_decoder.vocab_file KALDI_DECODER.VOCAB_FILE]
                                     [--kaldi_decoder.decoder_num_worker_threads KALDI_DECODER.DECODER_NUM_WORKER_THREADS]
                                     [--kaldi_decoder.force_decoder_reset_after_ms KALDI_DECODER.FORCE_DECODER_RESET_AFTER_MS]
                                     [--kaldi_decoder.fst_filename KALDI_DECODER.FST_FILENAME]
                                     [--kaldi_decoder.word_syms_filename KALDI_DECODER.WORD_SYMS_FILENAME]
                                     [--kaldi_decoder.default_beam KALDI_DECODER.DEFAULT_BEAM]
                                     [--kaldi_decoder.max_active KALDI_DECODER.MAX_ACTIVE]
                                     [--kaldi_decoder.acoustic_scale KALDI_DECODER.ACOUSTIC_SCALE]
                                     [--kaldi_decoder.decoder_num_copy_threads KALDI_DECODER.DECODER_NUM_COPY_THREADS]
                                     [--kaldi_decoder.determinize_lattice KALDI_DECODER.DETERMINIZE_LATTICE]
                                     [--rescorer.max_sequence_idle_microseconds RESCORER.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--rescorer.max_batch_size RESCORER.MAX_BATCH_SIZE]
                                     [--rescorer.min_batch_size RESCORER.MIN_BATCH_SIZE]
                                     [--rescorer.opt_batch_size RESCORER.OPT_BATCH_SIZE]
                                     [--rescorer.preferred_batch_size RESCORER.PREFERRED_BATCH_SIZE]
                                     [--rescorer.batching_type RESCORER.BATCHING_TYPE]
                                     [--rescorer.preserve_ordering RESCORER.PRESERVE_ORDERING]
                                     [--rescorer.instance_group_count RESCORER.INSTANCE_GROUP_COUNT]
                                     [--rescorer.max_queue_delay_microseconds RESCORER.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--rescorer.optimization_graph_level RESCORER.OPTIMIZATION_GRAPH_LEVEL]
                                     [--rescorer.max_supported_transcripts RESCORER.MAX_SUPPORTED_TRANSCRIPTS]
                                     [--rescorer.score_lm_carpa_filename RESCORER.SCORE_LM_CARPA_FILENAME]
                                     [--rescorer.decode_lm_carpa_filename RESCORER.DECODE_LM_CARPA_FILENAME]
                                     [--rescorer.word_syms_filename RESCORER.WORD_SYMS_FILENAME]
                                     [--rescorer.word_insertion_penalty RESCORER.WORD_INSERTION_PENALTY]
                                     [--rescorer.num_worker_threads RESCORER.NUM_WORKER_THREADS]
                                     [--rescorer.ms_per_timestep RESCORER.MS_PER_TIMESTEP]
                                     [--rescorer.boundary_character_ids RESCORER.BOUNDARY_CHARACTER_IDS]
                                     [--rescorer.vocab_file RESCORER.VOCAB_FILE]
                                     [--lm_decoder_cpu.beam_search_width LM_DECODER_CPU.BEAM_SEARCH_WIDTH]
                                     [--lm_decoder_cpu.decoder_type LM_DECODER_CPU.DECODER_TYPE]
                                     [--lm_decoder_cpu.padding_size LM_DECODER_CPU.PADDING_SIZE]
                                     [--lm_decoder_cpu.language_model_file LM_DECODER_CPU.LANGUAGE_MODEL_FILE]
                                     [--lm_decoder_cpu.max_supported_transcripts LM_DECODER_CPU.MAX_SUPPORTED_TRANSCRIPTS]
                                     [--lm_decoder_cpu.asr_model_delay LM_DECODER_CPU.ASR_MODEL_DELAY]
                                     [--lm_decoder_cpu.language_model_alpha LM_DECODER_CPU.LANGUAGE_MODEL_ALPHA]
                                     [--lm_decoder_cpu.language_model_beta LM_DECODER_CPU.LANGUAGE_MODEL_BETA]
                                     [--lm_decoder_cpu.ms_per_timestep LM_DECODER_CPU.MS_PER_TIMESTEP]
                                     [--lm_decoder_cpu.vocab_file LM_DECODER_CPU.VOCAB_FILE]
                                     [--lm_decoder_cpu.lexicon_file LM_DECODER_CPU.LEXICON_FILE]
                                     [--lm_decoder_cpu.beam_size LM_DECODER_CPU.BEAM_SIZE]
                                     [--lm_decoder_cpu.beam_size_token LM_DECODER_CPU.BEAM_SIZE_TOKEN]
                                     [--lm_decoder_cpu.beam_threshold LM_DECODER_CPU.BEAM_THRESHOLD]
                                     [--lm_decoder_cpu.lm_weight LM_DECODER_CPU.LM_WEIGHT]
                                     [--lm_decoder_cpu.word_insertion_score LM_DECODER_CPU.WORD_INSERTION_SCORE]
                                     [--lm_decoder_cpu.forerunner_beam_size LM_DECODER_CPU.FORERUNNER_BEAM_SIZE]
                                     [--lm_decoder_cpu.forerunner_beam_size_token LM_DECODER_CPU.FORERUNNER_BEAM_SIZE_TOKEN]
                                     [--lm_decoder_cpu.forerunner_beam_threshold LM_DECODER_CPU.FORERUNNER_BEAM_THRESHOLD]
                                     [--lm_decoder_cpu.smearing_mode LM_DECODER_CPU.SMEARING_MODE]
                                     [--lm_decoder_cpu.forerunner_use_lm LM_DECODER_CPU.FORERUNNER_USE_LM]
                                     [--asr_ensemble_backend.max_sequence_idle_microseconds ASR_ENSEMBLE_BACKEND.MAX_SEQUENCE_IDLE_MICROSECONDS]
                                     [--asr_ensemble_backend.max_batch_size ASR_ENSEMBLE_BACKEND.MAX_BATCH_SIZE]
                                     [--asr_ensemble_backend.min_batch_size ASR_ENSEMBLE_BACKEND.MIN_BATCH_SIZE]
                                     [--asr_ensemble_backend.opt_batch_size ASR_ENSEMBLE_BACKEND.OPT_BATCH_SIZE]
                                     [--asr_ensemble_backend.preferred_batch_size ASR_ENSEMBLE_BACKEND.PREFERRED_BATCH_SIZE]
                                     [--asr_ensemble_backend.batching_type ASR_ENSEMBLE_BACKEND.BATCHING_TYPE]
                                     [--asr_ensemble_backend.preserve_ordering ASR_ENSEMBLE_BACKEND.PRESERVE_ORDERING]
                                     [--asr_ensemble_backend.instance_group_count ASR_ENSEMBLE_BACKEND.INSTANCE_GROUP_COUNT]
                                     [--asr_ensemble_backend.max_queue_delay_microseconds ASR_ENSEMBLE_BACKEND.MAX_QUEUE_DELAY_MICROSECONDS]
                                     [--asr_ensemble_backend.optimization_graph_level ASR_ENSEMBLE_BACKEND.OPTIMIZATION_GRAPH_LEVEL]
                                     [--asr_ensemble_backend.language_code ASR_ENSEMBLE_BACKEND.LANGUAGE_CODE]
                                     [--asr_ensemble_backend.streaming ASR_ENSEMBLE_BACKEND.STREAMING]
                                     [--asr_ensemble_backend.offline]
                                     [--asr_ensemble_backend.type]
                                     output_path source_path [source_path ...]

Generate a Riva Model from a speech_recognition model trained with NVIDIA
NeMo.

positional arguments:
  output_path           Location to write compiled Riva pipeline
  source_path           Source file(s)

options:
  -h, --help            show this help message and exit
  -f, --force           Overwrite existing artifacts if they exist
  -v, --verbose         Verbose log outputs
  --language_code LANGUAGE_CODE
                        Language of the model
  --instance_group_count INSTANCE_GROUP_COUNT
                        How many instances in a group
  --kind KIND           Backend runs on CPU or GPU
  --max_batch_size MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --max_queue_delay_microseconds MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --batching_type BATCHING_TYPE
  --acoustic_model_name ACOUSTIC_MODEL_NAME
                        name of the acoustic model
  --featurizer_name FEATURIZER_NAME
                        name of the feature extractor model
  --name NAME           name of the ASR pipeline, used to set the model names
                        in the Riva model repository
  --streaming STREAMING
                        Execute model in streaming mode
  --offline             Marking the model to be used with offline API in Riva
  --vad_type VAD_TYPE   Type of pre-acoustic model VAD algorithm to use. Valid
                        entries are none, neural
  --diarizer_type DIARIZER_TYPE
                        Type of diarizer model to use. Valid entries are none,
                        sortformer
  --unified_acoustic_model
                        Marking the model as Unified Model (ASR+PnC combined)
  --endpointing_type ENDPOINTING_TYPE
                        Type of post-acoustic model endpointing algorithm to
                        use. Valid entries are none, greedy_ctc
  --chunk_size CHUNK_SIZE
                        Size of audio chunks to use during inference. If not
                        specified, default will be selected based on
                        online/offline setting
  --padding_factor PADDING_FACTOR
                        Multiple on the chunk_size. Deprecated and will be
                        ignored
  --left_padding_size LEFT_PADDING_SIZE
                        The duration in seconds of the backward looking
                        padding to prepend to the audio chunk. The acoustic
                        model input corresponds to a duration of
                        (left_padding_size + chunk_size + right_padding_size)
                        seconds
  --right_padding_size RIGHT_PADDING_SIZE
                        The duration in seconds of the forward looking padding
                        to append to the audio chunk. The acoustic model input
                        corresponds to a duration of (left_padding_size +
                        chunk_size + right_padding_size) seconds
  --padding_size PADDING_SIZE
                        padding_size
  --max_supported_transcripts MAX_SUPPORTED_TRANSCRIPTS
                        The maximum number of hypothesized transcripts
                        generated per utterance
  --ms_per_timestep MS_PER_TIMESTEP
                        The duration in milliseconds of one timestep of the
                        acoustic model output
  --force_decoder_reset_after_ms FORCE_DECODER_RESET_AFTER_MS
                        force decoder reset after this number of milliseconds
  --lattice_beam LATTICE_BEAM
  --decoding_language_model_arpa DECODING_LANGUAGE_MODEL_ARPA
                        Language model .arpa used during decoding
  --decoding_language_model_binary DECODING_LANGUAGE_MODEL_BINARY
                        Language model .binary used during decoding
  --decoding_language_model_fst DECODING_LANGUAGE_MODEL_FST
                        Language model fst used during decoding
  --decoding_language_model_words DECODING_LANGUAGE_MODEL_WORDS
                        Language model words used during decoding
  --rescoring_language_model_arpa RESCORING_LANGUAGE_MODEL_ARPA
                        Language model .arpa used during lattice rescoring
  --decoding_language_model_carpa DECODING_LANGUAGE_MODEL_CARPA
                        Language model .carpa used during decoding
  --rescoring_language_model_carpa RESCORING_LANGUAGE_MODEL_CARPA
                        Language model .carpa used during lattice rescoring
  --decoding_lexicon DECODING_LEXICON
                        Lexicon to use when decoding
  --decoding_vocab DECODING_VOCAB
                        File of unique words separated by white space. Only
                        used if decoding_lexicon not provided.
  --tokenizer_model TOKENIZER_MODEL
                        Sentencepiece model to use for encoding. Only include
                        if generating lexicon from vocab.
  --decoder_type DECODER_TYPE
                        Type of decoder to use. Valid entries are greedy,
                        os2s, flashlight, kaldi, trtllm, nemo or pass_through.
  --stddev_floor STDDEV_FLOOR
                        Add this value to computed features standard
                        deviation. Higher values help reduce spurious
                        transcripts with low energy signals.
  --wfst_tokenizer_model WFST_TOKENIZER_MODEL
                        Sparrowhawk model to use for tokenization and
                        classification, must be in .far format
  --wfst_verbalizer_model WFST_VERBALIZER_MODEL
                        Sparrowhawk model to use for verbalizer, must be in
                        .far format.
  --wfst_pre_process_model WFST_PRE_PROCESS_MODEL
                        Sparrowhawk model to use for pre process, must be in
                        .far format.
  --wfst_post_process_model WFST_POST_PROCESS_MODEL
                        Sparrowhawk model to use for post process, must be in
                        .far format.
  --speech_hints_model SPEECH_HINTS_MODEL
                        Speechhints class far file used to enable speechhints
  --buffer_look_ahead BUFFER_LOOK_AHEAD
                        Last 'n' words of the final transcript to be treated
                        as look ahead
  --buffer_context_history BUFFER_CONTEXT_HISTORY
                        Number of words from the previous response to be
                        maintained for extra context
  --buffer_threshold BUFFER_THRESHOLD
                        Minimum number of words (including history and
                        look_ahead) in buffer required for applying PnC.
                        Buffering is disabled by default.
  --buffer_max_timeout_frames BUFFER_MAX_TIMEOUT_FRAMES
                        Number of time frames after which PnC will be applied
                        to the buffer
  --profane_words_file PROFANE_WORDS_FILE
                        File containing newline separated profane words to be
                        filtered out if requested by user
  --append_space_to_transcripts APPEND_SPACE_TO_TRANSCRIPTS
                        Boolean that controls if a space should be added to
                        transcripts after end of utterance detection
  --enable_vad_endpointing ENABLE_VAD_ENDPOINTING
                        Boolean that controls whether to use VAD or AM logits
                        for endpointing
  --return_separate_utterances RETURN_SEPARATE_UTTERANCES
                        Boolean flag to return each utterance separately
                        instead of returning concatenated utterances
  --mel_basis_file_path MEL_BASIS_FILE_PATH
                        Pre calculated Mel basis file for pytorch Feature
                        Extractor used by ASR
  --diar_mel_basis_file_path DIAR_MEL_BASIS_FILE_PATH
                        Pre calculated Mel basis file for pytorch Feature
                        Extractor used by Diarizer
  --feature_extractor_type FEATURE_EXTRACTOR_TYPE
                        Feature extractor type
  --torch_feature_type TORCH_FEATURE_TYPE
                        Torch feature type ['whisper', 'nemo']
  --torch_feature_device TORCH_FEATURE_DEVICE
                        Torch feature device ['cuda', 'cpu']
  --execution_environment_path EXECUTION_ENVIRONMENT_PATH
                        Path to conda environment for nemo runtime
  --share_flags SHARE_FLAGS
                        Share batched start end flags and corr_ids with AM
  --att_context_size ATT_CONTEXT_SIZE
                        Attention context size to use for cache aware models

featurizer:
  --featurizer.max_sequence_idle_microseconds FEATURIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --featurizer.max_batch_size FEATURIZER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --featurizer.min_batch_size FEATURIZER.MIN_BATCH_SIZE
  --featurizer.opt_batch_size FEATURIZER.OPT_BATCH_SIZE
  --featurizer.preferred_batch_size FEATURIZER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --featurizer.batching_type FEATURIZER.BATCHING_TYPE
  --featurizer.preserve_ordering FEATURIZER.PRESERVE_ORDERING
                        Preserve ordering
  --featurizer.instance_group_count FEATURIZER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --featurizer.max_queue_delay_microseconds FEATURIZER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --featurizer.optimization_graph_level FEATURIZER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --featurizer.max_execution_batch_size FEATURIZER.MAX_EXECUTION_BATCH_SIZE
                        Maximum Batch Size
  --featurizer.gain FEATURIZER.GAIN
                        Adjust input signal with this gain multiplier prior to
                        feature extraction
  --featurizer.dither FEATURIZER.DITHER
                        Augment signal with Gaussian noise with this gain to
                        prevent quantization artifacts
  --featurizer.use_utterance_norm_params FEATURIZER.USE_UTTERANCE_NORM_PARAMS
                        Apply normalization at utterance level
  --featurizer.precalc_norm_time_steps FEATURIZER.PRECALC_NORM_TIME_STEPS
                        Weight of the precomputed normalization parameters, in
                        timesteps. Setting to 0 will disable use of
                        precalculated normalization parameters.
  --featurizer.precalc_norm_params FEATURIZER.PRECALC_NORM_PARAMS
                        Boolean that controls if precalculated Normalization
                        Parameters should be used
  --featurizer.norm_per_feature FEATURIZER.NORM_PER_FEATURE
                        Normalize Per Feature
  --featurizer.mean FEATURIZER.MEAN
                        Pre-computed mean values
  --featurizer.stddev FEATURIZER.STDDEV
                        Pre-computed Std Dev Values
  --featurizer.transpose FEATURIZER.TRANSPOSE
                        Take transpose of output features
  --featurizer.padding_size FEATURIZER.PADDING_SIZE
                        padding_size
  --featurizer.int64_features_length FEATURIZER.INT64_FEATURES_LENGTH
                        Use int64 for features length

nn:
  --nn.max_sequence_idle_microseconds NN.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --nn.max_batch_size NN.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --nn.min_batch_size NN.MIN_BATCH_SIZE
  --nn.opt_batch_size NN.OPT_BATCH_SIZE
  --nn.preferred_batch_size NN.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --nn.batching_type NN.BATCHING_TYPE
  --nn.preserve_ordering NN.PRESERVE_ORDERING
                        Preserve ordering
  --nn.instance_group_count NN.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --nn.max_queue_delay_microseconds NN.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --nn.optimization_graph_level NN.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --nn.trt_max_workspace_size NN.TRT_MAX_WORKSPACE_SIZE
                        Maximum workspace size (in MB) to use for model export
                        to TensorRT
  --nn.use_onnx_runtime
                        Use ONNX runtime instead of TensorRT
  --nn.use_torchscript  Use TorchScript instead of TensorRT
  --nn.use_trt_fp32     Use TensorRT engine with FP32 instead of FP16
  --nn.fp16_needs_obey_precision_pass
                        Flag to explicitly mark layers as float when parsing
                        the ONNX network
  --nn.am_len_input_use_int64 NN.AM_LEN_INPUT_USE_INT64
                        Use int64 for features length
  --nn.am_cache_len_input_use_int64 NN.AM_CACHE_LEN_INPUT_USE_INT64
                        Use int64 for cache length
  --nn.language_code NN.LANGUAGE_CODE
                        Language code of the model
  --nn.engine_dir NN.ENGINE_DIR
                        Absolute model directory path
  --nn.EXECUTION_ENV_PATH NN.EXECUTION_ENV_PATH
                        Path to conda environment file for Python backend

endpointing:
  --endpointing.max_sequence_idle_microseconds ENDPOINTING.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --endpointing.max_batch_size ENDPOINTING.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --endpointing.min_batch_size ENDPOINTING.MIN_BATCH_SIZE
  --endpointing.opt_batch_size ENDPOINTING.OPT_BATCH_SIZE
  --endpointing.preferred_batch_size ENDPOINTING.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --endpointing.batching_type ENDPOINTING.BATCHING_TYPE
  --endpointing.preserve_ordering ENDPOINTING.PRESERVE_ORDERING
                        Preserve ordering
  --endpointing.instance_group_count ENDPOINTING.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --endpointing.max_queue_delay_microseconds ENDPOINTING.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --endpointing.optimization_graph_level ENDPOINTING.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --endpointing.ms_per_timestep ENDPOINTING.MS_PER_TIMESTEP
  --endpointing.start_history ENDPOINTING.START_HISTORY
                        Size of the window, in milliseconds, to use to detect
                        start of utterance. If (start_th) of (start_history)
                        ms of the acoustic model output have non-blank tokens,
                        start of utterance is detected.
  --endpointing.stop_history ENDPOINTING.STOP_HISTORY
                        Size of the window, in milliseconds, to use to detect
                        end of utterance. If (stop_th) of (stop_history) ms of
                        the acoustic model output have blank tokens, end
                        of utterance is detected and decoder will be reset.
  --endpointing.stop_history_eou ENDPOINTING.STOP_HISTORY_EOU
                        Size of the window, in milliseconds, to trigger end of
                        utterance first pass. If (stop_th_eou) of
                        (stop_history_eou) ms of the acoustic model output
                        have blank tokens, a partial transcript with high
                        stability will be generated.
  --endpointing.start_th ENDPOINTING.START_TH
                        Percentage threshold to use to detect start of
                        utterance. If (start_th) of (start_history) ms of the
                        acoustic model output have non-blank tokens, start of
                        utterance is detected.
  --endpointing.stop_th ENDPOINTING.STOP_TH
                        Percentage threshold to use to detect end of
                        utterance. If (stop_th) of (stop_history) ms of the
                        acoustic model output have blank tokens, end of
                        utterance is detected.
  --endpointing.stop_th_eou ENDPOINTING.STOP_TH_EOU
                        Percentage threshold to use to detect end of
                        utterance. If (stop_th_eou) of (stop_history_eou) ms
                        of the acoustic model output have blank tokens,
                        end of utterance for the first pass will be triggered.
  --endpointing.residue_blanks_at_start ENDPOINTING.RESIDUE_BLANKS_AT_START
                        (Advanced) Number of time steps to ignore at the
                        beginning of the acoustic model output when trying to
                        detect start/end of speech
  --endpointing.residue_blanks_at_end ENDPOINTING.RESIDUE_BLANKS_AT_END
                        (Advanced) Number of time steps to ignore at the end
                        of the acoustic model output when trying to detect
                        start/end of speech
  --endpointing.vocab_file ENDPOINTING.VOCAB_FILE
                        Vocab file to be used with decoder

neural_vad:
  --neural_vad.max_sequence_idle_microseconds NEURAL_VAD.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --neural_vad.max_batch_size NEURAL_VAD.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --neural_vad.min_batch_size NEURAL_VAD.MIN_BATCH_SIZE
  --neural_vad.opt_batch_size NEURAL_VAD.OPT_BATCH_SIZE
  --neural_vad.preferred_batch_size NEURAL_VAD.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --neural_vad.batching_type NEURAL_VAD.BATCHING_TYPE
  --neural_vad.preserve_ordering NEURAL_VAD.PRESERVE_ORDERING
                        Preserve ordering
  --neural_vad.instance_group_count NEURAL_VAD.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --neural_vad.max_queue_delay_microseconds NEURAL_VAD.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --neural_vad.optimization_graph_level NEURAL_VAD.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --neural_vad.load_model NEURAL_VAD.LOAD_MODEL
  --neural_vad.batch_mode NEURAL_VAD.BATCH_MODE
                        Flag to enable batch inference mode
  --neural_vad.decoupled_mode NEURAL_VAD.DECOUPLED_MODE
                        Flag to enable decoupled inference mode
  --neural_vad.onset NEURAL_VAD.ONSET
                        Onset threshold for detecting the beginning and end of
                        a speech.
  --neural_vad.offset NEURAL_VAD.OFFSET
                        Offset threshold for detecting the end of a speech.
  --neural_vad.pad_onset NEURAL_VAD.PAD_ONSET
                        Add durations before each speech segment.
  --neural_vad.pad_offset NEURAL_VAD.PAD_OFFSET
                        Add durations after each speech segment.
  --neural_vad.min_duration_on NEURAL_VAD.MIN_DURATION_ON
                        Threshold for small non_speech deletion.
  --neural_vad.min_duration_off NEURAL_VAD.MIN_DURATION_OFF
                        Threshold for short speech segment deletion.
  --neural_vad.filter_speech_first NEURAL_VAD.FILTER_SPEECH_FIRST
                        Enable short speech segment deletion first.
  --neural_vad.features_mask_value NEURAL_VAD.FEATURES_MASK_VALUE
                        Features value to use to mask the non-speech segments

neural_vad_nn:
  --neural_vad_nn.max_sequence_idle_microseconds NEURAL_VAD_NN.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --neural_vad_nn.max_batch_size NEURAL_VAD_NN.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --neural_vad_nn.min_batch_size NEURAL_VAD_NN.MIN_BATCH_SIZE
  --neural_vad_nn.opt_batch_size NEURAL_VAD_NN.OPT_BATCH_SIZE
  --neural_vad_nn.preferred_batch_size NEURAL_VAD_NN.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --neural_vad_nn.batching_type NEURAL_VAD_NN.BATCHING_TYPE
  --neural_vad_nn.preserve_ordering NEURAL_VAD_NN.PRESERVE_ORDERING
                        Preserve ordering
  --neural_vad_nn.instance_group_count NEURAL_VAD_NN.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --neural_vad_nn.max_queue_delay_microseconds NEURAL_VAD_NN.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --neural_vad_nn.optimization_graph_level NEURAL_VAD_NN.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --neural_vad_nn.trt_max_workspace_size NEURAL_VAD_NN.TRT_MAX_WORKSPACE_SIZE
                        Maximum workspace size (in MB) to use for model export
                        to TensorRT
  --neural_vad_nn.use_onnx_runtime
                        Use ONNX runtime instead of TensorRT
  --neural_vad_nn.use_torchscript
                        Use TorchScript instead of TensorRT
  --neural_vad_nn.use_trt_fp32
                        Use TensorRT engine with FP32 instead of FP16
  --neural_vad_nn.fp16_needs_obey_precision_pass
                        Flag to explicitly mark layers as float when parsing
                        the ONNX network
  --neural_vad_nn.onnx_path NEURAL_VAD_NN.ONNX_PATH
  --neural_vad_nn.sample_rate NEURAL_VAD_NN.SAMPLE_RATE
  --neural_vad_nn.min_seq_len NEURAL_VAD_NN.MIN_SEQ_LEN
  --neural_vad_nn.opt_seq_len NEURAL_VAD_NN.OPT_SEQ_LEN
  --neural_vad_nn.max_seq_len NEURAL_VAD_NN.MAX_SEQ_LEN

streaming_diarizer:
  --streaming_diarizer.max_sequence_idle_microseconds STREAMING_DIARIZER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --streaming_diarizer.max_batch_size STREAMING_DIARIZER.MAX_BATCH_SIZE
                        Maximum Batch Size
  --streaming_diarizer.min_batch_size STREAMING_DIARIZER.MIN_BATCH_SIZE
  --streaming_diarizer.opt_batch_size STREAMING_DIARIZER.OPT_BATCH_SIZE
  --streaming_diarizer.preferred_batch_size STREAMING_DIARIZER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --streaming_diarizer.batching_type STREAMING_DIARIZER.BATCHING_TYPE
  --streaming_diarizer.preserve_ordering STREAMING_DIARIZER.PRESERVE_ORDERING
                        Preserve ordering
  --streaming_diarizer.instance_group_count STREAMING_DIARIZER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --streaming_diarizer.max_queue_delay_microseconds STREAMING_DIARIZER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --streaming_diarizer.optimization_graph_level STREAMING_DIARIZER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --streaming_diarizer.load_model STREAMING_DIARIZER.LOAD_MODEL
  --streaming_diarizer.batch_mode STREAMING_DIARIZER.BATCH_MODE
                        Flag to enable batch inference mode
  --streaming_diarizer.decoupled_mode STREAMING_DIARIZER.DECOUPLED_MODE
                        Flag to enable decoupled inference mode
  --streaming_diarizer.max_execution_batch_size STREAMING_DIARIZER.MAX_EXECUTION_BATCH_SIZE
                        Maximum Execution Batch Size
  --streaming_diarizer.gain STREAMING_DIARIZER.GAIN
                        Adjust input signal with this gain multiplier prior to
                        feature extraction
  --streaming_diarizer.dither STREAMING_DIARIZER.DITHER
                        Augment signal with Gaussian noise with this gain to
                        prevent quantization artifacts
  --streaming_diarizer.norm_per_feature STREAMING_DIARIZER.NORM_PER_FEATURE
                        Normalize Per Feature
  --streaming_diarizer.transpose STREAMING_DIARIZER.TRANSPOSE
                        Take transpose of output features
  --streaming_diarizer.padding_size STREAMING_DIARIZER.PADDING_SIZE
                        padding_size
  --streaming_diarizer.int64_features_length STREAMING_DIARIZER.INT64_FEATURES_LENGTH
                        Use int64 for features length

streaming_diarizer_nn:
  --streaming_diarizer_nn.max_sequence_idle_microseconds STREAMING_DIARIZER_NN.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --streaming_diarizer_nn.max_batch_size STREAMING_DIARIZER_NN.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --streaming_diarizer_nn.min_batch_size STREAMING_DIARIZER_NN.MIN_BATCH_SIZE
  --streaming_diarizer_nn.opt_batch_size STREAMING_DIARIZER_NN.OPT_BATCH_SIZE
  --streaming_diarizer_nn.preferred_batch_size STREAMING_DIARIZER_NN.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --streaming_diarizer_nn.batching_type STREAMING_DIARIZER_NN.BATCHING_TYPE
  --streaming_diarizer_nn.preserve_ordering STREAMING_DIARIZER_NN.PRESERVE_ORDERING
                        Preserve ordering
  --streaming_diarizer_nn.instance_group_count STREAMING_DIARIZER_NN.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --streaming_diarizer_nn.max_queue_delay_microseconds STREAMING_DIARIZER_NN.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --streaming_diarizer_nn.optimization_graph_level STREAMING_DIARIZER_NN.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --streaming_diarizer_nn.trt_max_workspace_size STREAMING_DIARIZER_NN.TRT_MAX_WORKSPACE_SIZE
                        Maximum workspace size (in MB) to use for model export
                        to TensorRT
  --streaming_diarizer_nn.use_onnx_runtime
                        Use ONNX runtime instead of TensorRT
  --streaming_diarizer_nn.use_torchscript
                        Use TorchScript instead of TensorRT
  --streaming_diarizer_nn.use_trt_fp32
                        Use TensorRT engine with FP32 instead of FP16
  --streaming_diarizer_nn.fp16_needs_obey_precision_pass
                        Flag to explicitly mark layers as float when parsing
                        the ONNX network
  --streaming_diarizer_nn.model_path STREAMING_DIARIZER_NN.MODEL_PATH
                        Path to the diarizer model file

flashlight_decoder:
  --flashlight_decoder.max_sequence_idle_microseconds FLASHLIGHT_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --flashlight_decoder.max_batch_size FLASHLIGHT_DECODER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --flashlight_decoder.min_batch_size FLASHLIGHT_DECODER.MIN_BATCH_SIZE
  --flashlight_decoder.opt_batch_size FLASHLIGHT_DECODER.OPT_BATCH_SIZE
  --flashlight_decoder.preferred_batch_size FLASHLIGHT_DECODER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --flashlight_decoder.batching_type FLASHLIGHT_DECODER.BATCHING_TYPE
  --flashlight_decoder.preserve_ordering FLASHLIGHT_DECODER.PRESERVE_ORDERING
                        Preserve ordering
  --flashlight_decoder.instance_group_count FLASHLIGHT_DECODER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --flashlight_decoder.max_queue_delay_microseconds FLASHLIGHT_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --flashlight_decoder.optimization_graph_level FLASHLIGHT_DECODER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --flashlight_decoder.max_execution_batch_size FLASHLIGHT_DECODER.MAX_EXECUTION_BATCH_SIZE
  --flashlight_decoder.decoder_type FLASHLIGHT_DECODER.DECODER_TYPE
  --flashlight_decoder.padding_size FLASHLIGHT_DECODER.PADDING_SIZE
                        padding_size
  --flashlight_decoder.max_supported_transcripts FLASHLIGHT_DECODER.MAX_SUPPORTED_TRANSCRIPTS
  --flashlight_decoder.asr_model_delay FLASHLIGHT_DECODER.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. For Riva Conformer-Large models, one time
                        step corresponds to 40ms while for Citrinet-1024
                        models, one time step corresponds to 80ms. Decreasing
                        the asr_model_delay parameter by 1 will cause all
                        timestamps to be increased by 40ms for Conformer-Large
                        models, and 80ms for Citrinet-1024 models. This
                        parameter must be tuned since the CTC-based models are
                        not guaranteed to predict correct alignment.
  --flashlight_decoder.ms_per_timestep FLASHLIGHT_DECODER.MS_PER_TIMESTEP
  --flashlight_decoder.vocab_file FLASHLIGHT_DECODER.VOCAB_FILE
                        Vocab file to be used with decoder
  --flashlight_decoder.decoder_num_worker_threads FLASHLIGHT_DECODER.DECODER_NUM_WORKER_THREADS
                        Number of threads to use for CPU decoders. If < 1,
                        maximum hardware concurrency is used.
  --flashlight_decoder.force_decoder_reset_after_ms FLASHLIGHT_DECODER.FORCE_DECODER_RESET_AFTER_MS
                        force decoder reset after this number of milliseconds
  --flashlight_decoder.language_model_file FLASHLIGHT_DECODER.LANGUAGE_MODEL_FILE
                        Language model file in binary format to be used by
                        KenLM
  --flashlight_decoder.lexicon_file FLASHLIGHT_DECODER.LEXICON_FILE
                        Lexicon file to be used with decoder
  --flashlight_decoder.use_lexicon_free_decoding FLASHLIGHT_DECODER.USE_LEXICON_FREE_DECODING
                        Enables lexicon-free decoding
  --flashlight_decoder.beam_size FLASHLIGHT_DECODER.BEAM_SIZE
                        Maximum number of hypotheses the decoder holds after
                        each step
  --flashlight_decoder.beam_size_token FLASHLIGHT_DECODER.BEAM_SIZE_TOKEN
                        Maximum number of tokens the decoder considers at each
                        step
  --flashlight_decoder.beam_threshold FLASHLIGHT_DECODER.BEAM_THRESHOLD
                        Threshold to prune hypotheses
  --flashlight_decoder.lm_weight FLASHLIGHT_DECODER.LM_WEIGHT
                        Weight of language model
  --flashlight_decoder.blank_token FLASHLIGHT_DECODER.BLANK_TOKEN
                        Blank token
  --flashlight_decoder.sil_token FLASHLIGHT_DECODER.SIL_TOKEN
                        Silence token
  --flashlight_decoder.unk_token FLASHLIGHT_DECODER.UNK_TOKEN
                        Unknown token
  --flashlight_decoder.set_default_index_to_unk_token FLASHLIGHT_DECODER.SET_DEFAULT_INDEX_TO_UNK_TOKEN
                        Flag that controls if default index should be set to
                        the index of the unk_token or not. If not, error will
                        be thrown if an invalid token is encountered in the
                        lexicon.
  --flashlight_decoder.word_insertion_score FLASHLIGHT_DECODER.WORD_INSERTION_SCORE
                        Word insertion score
  --flashlight_decoder.forerunner_beam_size FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE
                        Maximum number of hypotheses the decoder holds after
                        each step, for forerunner transcript
  --flashlight_decoder.forerunner_beam_size_token FLASHLIGHT_DECODER.FORERUNNER_BEAM_SIZE_TOKEN
                        Maximum number of tokens the decoder considers at each
                        step, for forerunner transcript
  --flashlight_decoder.forerunner_beam_threshold FLASHLIGHT_DECODER.FORERUNNER_BEAM_THRESHOLD
                        Threshold to prune hypotheses, for forerunner
                        transcript
  --flashlight_decoder.smearing_mode FLASHLIGHT_DECODER.SMEARING_MODE
                        Decoder smearing mode. Can be logadd, max or none
  --flashlight_decoder.forerunner_use_lm FLASHLIGHT_DECODER.FORERUNNER_USE_LM
                        Bool that controls if the forerunner decoder should
                        use a language model
  --flashlight_decoder.num_tokenization FLASHLIGHT_DECODER.NUM_TOKENIZATION
                        Number of tokenizations to generate for each word in
                        the lexicon
  --flashlight_decoder.unk_score FLASHLIGHT_DECODER.UNK_SCORE
                        Coefficient for inserting unknown words in the
                        flashlight decoder. The higher it is, the more likely
                        it is to insert unknown words. See
                        https://github.com/flashlight/flashlight/blob/e16682fa32df30cbf675c8fe010f929c61e3b833/flashlight/lib/text/decoder/LexiconDecoder.h#L106
  --flashlight_decoder.log_add FLASHLIGHT_DECODER.LOG_ADD
                        If true, when the same state is reached by two
                        separate paths in the decoder, add the paths' scores
                        with addition in loglikelihood space. Otherwise, just
                        pick the maximum likelihood score. See https://github.
                        com/flashlight/flashlight/blob/e16682fa32df30cbf675c8f
                        e010f929c61e3b833/flashlight/lib/text/decoder/Utils.h#
                        L105
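
To make the two merge rules in the log_add description concrete, suppose two paths reach the same decoder state with log-likelihood scores s_1 and s_2. The sketch below shows the standard log-add operation next to the maximum rule. The symbols s_1 and s_2 are illustrative and are not names from the Riva or Flashlight APIs.

```latex
\[
s_{\text{logadd}} = \log\left(e^{s_1} + e^{s_2}\right)
                  = \max(s_1, s_2) + \log\left(1 + e^{-|s_1 - s_2|}\right),
\qquad
s_{\text{max}} = \max(s_1, s_2).
\]
```

For example, if both paths have probability 0.25 (s_1 = s_2 = log 0.25), log-add gives log 0.5, while the maximum rule keeps log 0.25.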

pass_through_decoder:
  --pass_through_decoder.max_sequence_idle_microseconds PASS_THROUGH_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --pass_through_decoder.max_batch_size PASS_THROUGH_DECODER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --pass_through_decoder.min_batch_size PASS_THROUGH_DECODER.MIN_BATCH_SIZE
  --pass_through_decoder.opt_batch_size PASS_THROUGH_DECODER.OPT_BATCH_SIZE
  --pass_through_decoder.preferred_batch_size PASS_THROUGH_DECODER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --pass_through_decoder.batching_type PASS_THROUGH_DECODER.BATCHING_TYPE
  --pass_through_decoder.preserve_ordering PASS_THROUGH_DECODER.PRESERVE_ORDERING
                        Preserve ordering
  --pass_through_decoder.instance_group_count PASS_THROUGH_DECODER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --pass_through_decoder.max_queue_delay_microseconds PASS_THROUGH_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --pass_through_decoder.optimization_graph_level PASS_THROUGH_DECODER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --pass_through_decoder.vocab_file PASS_THROUGH_DECODER.VOCAB_FILE
                        Vocab file to be used with decoder
  --pass_through_decoder.asr_model_delay PASS_THROUGH_DECODER.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. For Riva Conformer-Large models, one time
                        step corresponds to 40ms while for Citrinet-1024
                        models, one time step corresponds to 80ms. Decreasing
                        the asr_model_delay parameter by 1 will cause all
                        timestamps to be increased by 40ms for Conformer-Large
                        models, and 80ms for Citrinet-1024 models. This
                        parameter must be tuned since the CTC-based models are
                        not guaranteed to predict correct alignment.
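
The asr_model_delay description can be read as a simple linear relation between the change in delay and the resulting timestamp shift. The worked arithmetic below assumes the shift scales linearly beyond a single step, which the help text implies but does not state. The symbols \Delta d (change in asr_model_delay), \Delta t (timestamp shift), and T_step (duration of one time step) are introduced here for illustration only.

```latex
\[
\Delta t = -\,\Delta d \cdot T_{\text{step}},
\qquad
T_{\text{step}} = 40\,\text{ms (Conformer-Large)},\quad
T_{\text{step}} = 80\,\text{ms (Citrinet-1024)}.
\]
\[
\Delta d = -2:\quad
\Delta t = 2 \cdot 40\,\text{ms} = 80\,\text{ms (Conformer-Large)},\qquad
\Delta t = 2 \cdot 80\,\text{ms} = 160\,\text{ms (Citrinet-1024)}.
\]
```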

nemo_decoder:
  --nemo_decoder.max_sequence_idle_microseconds NEMO_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --nemo_decoder.max_batch_size NEMO_DECODER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --nemo_decoder.min_batch_size NEMO_DECODER.MIN_BATCH_SIZE
  --nemo_decoder.opt_batch_size NEMO_DECODER.OPT_BATCH_SIZE
  --nemo_decoder.preferred_batch_size NEMO_DECODER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --nemo_decoder.batching_type NEMO_DECODER.BATCHING_TYPE
  --nemo_decoder.preserve_ordering NEMO_DECODER.PRESERVE_ORDERING
                        Preserve ordering
  --nemo_decoder.instance_group_count NEMO_DECODER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --nemo_decoder.max_queue_delay_microseconds NEMO_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --nemo_decoder.optimization_graph_level NEMO_DECODER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --nemo_decoder.vocab_file NEMO_DECODER.VOCAB_FILE
                        Vocab file to be used with decoder
  --nemo_decoder.asr_model_delay NEMO_DECODER.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. For Riva Conformer-Large models, one time
                        step corresponds to 40ms while for Citrinet-1024
                        models, one time step corresponds to 80ms. Decreasing
                        the asr_model_delay parameter by 1 will cause all
                        timestamps to be increased by 40ms for Conformer-Large
                        models, and 80ms for Citrinet-1024 models. This
                        parameter must be tuned since the CTC-based models are
                        not guaranteed to predict correct alignment.
  --nemo_decoder.compute_dtype
                        Datatype to use for ASR model
  --nemo_decoder.amp_dtype
                        Datatype to use for AMP
  --nemo_decoder.nemo_decoder_type NEMO_DECODER.NEMO_DECODER_TYPE
                        decoder to use for decoding
  --nemo_decoder.use_stateful_decoding
                        Whether to pass states to next chunk
  --nemo_decoder.use_amp
                        Whether to use AMP for inference

trtllm_decoder:
  --trtllm_decoder.max_sequence_idle_microseconds TRTLLM_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --trtllm_decoder.max_batch_size TRTLLM_DECODER.MAX_BATCH_SIZE
                        Max batch size to use
  --trtllm_decoder.min_batch_size TRTLLM_DECODER.MIN_BATCH_SIZE
  --trtllm_decoder.opt_batch_size TRTLLM_DECODER.OPT_BATCH_SIZE
  --trtllm_decoder.preferred_batch_size TRTLLM_DECODER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --trtllm_decoder.batching_type TRTLLM_DECODER.BATCHING_TYPE
  --trtllm_decoder.preserve_ordering TRTLLM_DECODER.PRESERVE_ORDERING
                        Preserve ordering
  --trtllm_decoder.instance_group_count TRTLLM_DECODER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --trtllm_decoder.max_queue_delay_microseconds TRTLLM_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --trtllm_decoder.optimization_graph_level TRTLLM_DECODER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --trtllm_decoder.world_size TRTLLM_DECODER.WORLD_SIZE
  --trtllm_decoder.quantize_dir TRTLLM_DECODER.QUANTIZE_DIR
  --trtllm_decoder.dtype TRTLLM_DECODER.DTYPE
                        Valid choices are ["float16", "float32", "bfloat16"]
  --trtllm_decoder.max_input_len TRTLLM_DECODER.MAX_INPUT_LEN
                        Max number of tokens in prompts
  --trtllm_decoder.max_output_len TRTLLM_DECODER.MAX_OUTPUT_LEN
                        Max number of output tokens from the decoder
  --trtllm_decoder.max_beam_width TRTLLM_DECODER.MAX_BEAM_WIDTH
                        Max beam width
  --trtllm_decoder.use_gpt_attention_plugin TRTLLM_DECODER.USE_GPT_ATTENTION_PLUGIN
                        Activates attention plugin. You can specify the plugin
                        dtype or leave blank to use the model dtype. Choices
                        are ["float16", "float32", "bfloat16"]
  --trtllm_decoder.use_bert_attention_plugin TRTLLM_DECODER.USE_BERT_ATTENTION_PLUGIN
                        Activates BERT attention plugin. You can specify the
                        plugin dtype or leave blank to use the model
                        dtype. Choices are ["float16", "float32", "bfloat16"]
  --trtllm_decoder.use_gemm_plugin TRTLLM_DECODER.USE_GEMM_PLUGIN
                        Activates GEMM plugin. You can specify the plugin
                        dtype or leave blank to use the model dtype. Choices are
                        ["float16", "float32", "bfloat16"]
  --trtllm_decoder.remove_input_padding TRTLLM_DECODER.REMOVE_INPUT_PADDING
                        remove input padding
  --trtllm_decoder.enable_context_fmha TRTLLM_DECODER.ENABLE_CONTEXT_FMHA
  --trtllm_decoder.use_weight_only TRTLLM_DECODER.USE_WEIGHT_ONLY
                        Quantize weights for the various GEMMs to
                        INT4/INT8. See --weight_only_precision to set the
                        precision
  --trtllm_decoder.weight_only_precision TRTLLM_DECODER.WEIGHT_ONLY_PRECISION
                        Define the precision for the weights when using
                        weight-only quantization. You must also use
                        --use_weight_only for that argument to have an impact.
  --trtllm_decoder.int8_kv_cache TRTLLM_DECODER.INT8_KV_CACHE
                        By default, the KV cache uses dtype. int8_kv_cache
                        selects int8 quantization for the KV cache
  --trtllm_decoder.debug_mode TRTLLM_DECODER.DEBUG_MODE
  --trtllm_decoder.vocab_file TRTLLM_DECODER.VOCAB_FILE
                        Vocab file to be used with decoder
  --trtllm_decoder.asr_model_delay TRTLLM_DECODER.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. For Riva Conformer-Large models, one time
                        step corresponds to 40ms while for Citrinet-1024
                        models, one time step corresponds to 80ms. Decreasing
                        the asr_model_delay parameter by 1 will cause all
                        timestamps to be increased by 40ms for Conformer-Large
                        models, and 80ms for Citrinet-1024 models. This
                        parameter must be tuned since the CTC-based models are
                        not guaranteed to predict correct alignment.

greedy_decoder:
  --greedy_decoder.max_sequence_idle_microseconds GREEDY_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --greedy_decoder.max_batch_size GREEDY_DECODER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --greedy_decoder.min_batch_size GREEDY_DECODER.MIN_BATCH_SIZE
  --greedy_decoder.opt_batch_size GREEDY_DECODER.OPT_BATCH_SIZE
  --greedy_decoder.preferred_batch_size GREEDY_DECODER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --greedy_decoder.batching_type GREEDY_DECODER.BATCHING_TYPE
  --greedy_decoder.preserve_ordering GREEDY_DECODER.PRESERVE_ORDERING
                        Preserve ordering
  --greedy_decoder.instance_group_count GREEDY_DECODER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --greedy_decoder.max_queue_delay_microseconds GREEDY_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --greedy_decoder.optimization_graph_level GREEDY_DECODER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --greedy_decoder.max_execution_batch_size GREEDY_DECODER.MAX_EXECUTION_BATCH_SIZE
  --greedy_decoder.decoder_type GREEDY_DECODER.DECODER_TYPE
  --greedy_decoder.padding_size GREEDY_DECODER.PADDING_SIZE
                        padding_size
  --greedy_decoder.max_supported_transcripts GREEDY_DECODER.MAX_SUPPORTED_TRANSCRIPTS
  --greedy_decoder.asr_model_delay GREEDY_DECODER.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. For Riva Conformer-Large models, one time
                        step corresponds to 40ms while for Citrinet-1024
                        models, one time step corresponds to 80ms. Decreasing
                        the asr_model_delay parameter by 1 will cause all
                        timestamps to be increased by 40ms for Conformer-Large
                        models, and 80ms for Citrinet-1024 models. This
                        parameter must be tuned since the CTC-based models are
                        not guaranteed to predict correct alignment.
  --greedy_decoder.ms_per_timestep GREEDY_DECODER.MS_PER_TIMESTEP
  --greedy_decoder.vocab_file GREEDY_DECODER.VOCAB_FILE
                        Vocab file to be used with decoder
  --greedy_decoder.decoder_num_worker_threads GREEDY_DECODER.DECODER_NUM_WORKER_THREADS
                        Number of threads to use for CPU decoders. If < 1,
                        maximum hardware concurrency is used.
  --greedy_decoder.force_decoder_reset_after_ms GREEDY_DECODER.FORCE_DECODER_RESET_AFTER_MS
                        force decoder reset after this number of milliseconds
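
As an illustration of how the greedy_decoder batch-size options listed above could be combined for a batch size of 1, the following dry run assembles a riva-build command and prints it instead of executing it. The file paths and the pipeline name are hypothetical placeholders. Only the greedy_decoder group is shown, so other component groups may need the same settings.

```shell
#!/bin/sh
# Dry run: build the argument list and print it; nothing is executed.
# RMIR, RIVA, and the pipeline name are hypothetical placeholders.
RMIR="/servicemaker-dev/asr_embedded.rmir"
RIVA="/servicemaker-dev/asr_model.riva"

# Set every greedy_decoder batch-size option from the help text to 1.
set -- riva-build speech_recognition "$RMIR" "$RIVA" \
    --name=asr_embedded_example \
    --decoder_type=greedy \
    --greedy_decoder.min_batch_size=1 \
    --greedy_decoder.opt_batch_size=1 \
    --greedy_decoder.max_batch_size=1 \
    --greedy_decoder.max_execution_batch_size=1

# Join the arguments with spaces so the command can be reviewed.
cmd="$*"
printf '%s\n' "$cmd"
```

After reviewing the printed command, it can be run by replacing the placeholders with real file names and appending :<encryption_key> to each file if the archives are encrypted.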

os2s_decoder:
  --os2s_decoder.max_sequence_idle_microseconds OS2S_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --os2s_decoder.max_batch_size OS2S_DECODER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --os2s_decoder.min_batch_size OS2S_DECODER.MIN_BATCH_SIZE
  --os2s_decoder.opt_batch_size OS2S_DECODER.OPT_BATCH_SIZE
  --os2s_decoder.preferred_batch_size OS2S_DECODER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --os2s_decoder.batching_type OS2S_DECODER.BATCHING_TYPE
  --os2s_decoder.preserve_ordering OS2S_DECODER.PRESERVE_ORDERING
                        Preserve ordering
  --os2s_decoder.instance_group_count OS2S_DECODER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --os2s_decoder.max_queue_delay_microseconds OS2S_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --os2s_decoder.optimization_graph_level OS2S_DECODER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --os2s_decoder.max_execution_batch_size OS2S_DECODER.MAX_EXECUTION_BATCH_SIZE
  --os2s_decoder.decoder_type OS2S_DECODER.DECODER_TYPE
  --os2s_decoder.padding_size OS2S_DECODER.PADDING_SIZE
                        padding_size
  --os2s_decoder.max_supported_transcripts OS2S_DECODER.MAX_SUPPORTED_TRANSCRIPTS
  --os2s_decoder.asr_model_delay OS2S_DECODER.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. For Riva Conformer-Large models, one time
                        step corresponds to 40ms while for Citrinet-1024
                        models, one time step corresponds to 80ms. Decreasing
                        the asr_model_delay parameter by 1 will cause all
                        timestamps to be increased by 40ms for Conformer-Large
                        models, and 80ms for Citrinet-1024 models. This
                        parameter must be tuned since the CTC-based models are
                        not guaranteed to predict correct alignment.
  --os2s_decoder.ms_per_timestep OS2S_DECODER.MS_PER_TIMESTEP
  --os2s_decoder.vocab_file OS2S_DECODER.VOCAB_FILE
                        Vocab file to be used with decoder
  --os2s_decoder.decoder_num_worker_threads OS2S_DECODER.DECODER_NUM_WORKER_THREADS
                        Number of threads to use for CPU decoders. If < 1,
                        maximum hardware concurrency is used.
  --os2s_decoder.force_decoder_reset_after_ms OS2S_DECODER.FORCE_DECODER_RESET_AFTER_MS
                        force decoder reset after this number of milliseconds
  --os2s_decoder.language_model_file OS2S_DECODER.LANGUAGE_MODEL_FILE
                        Language model file in binary format to be used by
                        KenLM
  --os2s_decoder.beam_search_width OS2S_DECODER.BEAM_SEARCH_WIDTH
                        Number of partial hypotheses to keep at each
                        step of the beam search
  --os2s_decoder.language_model_alpha OS2S_DECODER.LANGUAGE_MODEL_ALPHA
                        Weight given to the language model during beam search
  --os2s_decoder.language_model_beta OS2S_DECODER.LANGUAGE_MODEL_BETA
                        Word insertion score
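
The roles of language_model_alpha and language_model_beta can be pictured with the scoring rule commonly used for CTC beam search with an n-gram language model. This is an assumption about how the two weights combine, not a formula confirmed by the help text. Here x is the audio, y a candidate transcript, and wc(y) its word count. All symbols are illustrative.

```latex
\[
\operatorname{score}(y) =
\log P_{\text{AM}}(y \mid x)
+ \alpha \,\log P_{\text{LM}}(y)
+ \beta \,\operatorname{wc}(y)
\]
```

Under this reading, a larger alpha makes the beam search trust the language model more, and a larger beta favors hypotheses with more words.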

kaldi_decoder:
  --kaldi_decoder.max_sequence_idle_microseconds KALDI_DECODER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --kaldi_decoder.max_batch_size KALDI_DECODER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --kaldi_decoder.min_batch_size KALDI_DECODER.MIN_BATCH_SIZE
  --kaldi_decoder.opt_batch_size KALDI_DECODER.OPT_BATCH_SIZE
  --kaldi_decoder.preferred_batch_size KALDI_DECODER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --kaldi_decoder.batching_type KALDI_DECODER.BATCHING_TYPE
  --kaldi_decoder.preserve_ordering KALDI_DECODER.PRESERVE_ORDERING
                        Preserve ordering
  --kaldi_decoder.instance_group_count KALDI_DECODER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --kaldi_decoder.max_queue_delay_microseconds KALDI_DECODER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --kaldi_decoder.optimization_graph_level KALDI_DECODER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --kaldi_decoder.max_execution_batch_size KALDI_DECODER.MAX_EXECUTION_BATCH_SIZE
  --kaldi_decoder.decoder_type KALDI_DECODER.DECODER_TYPE
  --kaldi_decoder.padding_size KALDI_DECODER.PADDING_SIZE
                        padding_size
  --kaldi_decoder.max_supported_transcripts KALDI_DECODER.MAX_SUPPORTED_TRANSCRIPTS
  --kaldi_decoder.asr_model_delay KALDI_DECODER.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. For Riva Conformer-Large models, one time
                        step corresponds to 40ms while for Citrinet-1024
                        models, one time step corresponds to 80ms. Decreasing
                        the asr_model_delay parameter by 1 will cause all
                        timestamps to be increased by 40ms for Conformer-Large
                        models, and 80ms for Citrinet-1024 models. This
                        parameter must be tuned since the CTC-based models are
                        not guaranteed to predict correct alignment.
  --kaldi_decoder.ms_per_timestep KALDI_DECODER.MS_PER_TIMESTEP
  --kaldi_decoder.vocab_file KALDI_DECODER.VOCAB_FILE
                        Vocab file to be used with decoder
  --kaldi_decoder.decoder_num_worker_threads KALDI_DECODER.DECODER_NUM_WORKER_THREADS
                        Number of threads to use for CPU decoders. If < 1,
                        maximum hardware concurrency is used.
  --kaldi_decoder.force_decoder_reset_after_ms KALDI_DECODER.FORCE_DECODER_RESET_AFTER_MS
                        force decoder reset after this number of milliseconds
  --kaldi_decoder.fst_filename KALDI_DECODER.FST_FILENAME
                        Fst file to use during decoding
  --kaldi_decoder.word_syms_filename KALDI_DECODER.WORD_SYMS_FILENAME
  --kaldi_decoder.default_beam KALDI_DECODER.DEFAULT_BEAM
  --kaldi_decoder.max_active KALDI_DECODER.MAX_ACTIVE
  --kaldi_decoder.acoustic_scale KALDI_DECODER.ACOUSTIC_SCALE
  --kaldi_decoder.decoder_num_copy_threads KALDI_DECODER.DECODER_NUM_COPY_THREADS
  --kaldi_decoder.determinize_lattice KALDI_DECODER.DETERMINIZE_LATTICE

rescorer:
  --rescorer.max_sequence_idle_microseconds RESCORER.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --rescorer.max_batch_size RESCORER.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --rescorer.min_batch_size RESCORER.MIN_BATCH_SIZE
  --rescorer.opt_batch_size RESCORER.OPT_BATCH_SIZE
  --rescorer.preferred_batch_size RESCORER.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --rescorer.batching_type RESCORER.BATCHING_TYPE
  --rescorer.preserve_ordering RESCORER.PRESERVE_ORDERING
                        Preserve ordering
  --rescorer.instance_group_count RESCORER.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --rescorer.max_queue_delay_microseconds RESCORER.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --rescorer.optimization_graph_level RESCORER.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --rescorer.max_supported_transcripts RESCORER.MAX_SUPPORTED_TRANSCRIPTS
  --rescorer.score_lm_carpa_filename RESCORER.SCORE_LM_CARPA_FILENAME
  --rescorer.decode_lm_carpa_filename RESCORER.DECODE_LM_CARPA_FILENAME
  --rescorer.word_syms_filename RESCORER.WORD_SYMS_FILENAME
  --rescorer.word_insertion_penalty RESCORER.WORD_INSERTION_PENALTY
  --rescorer.num_worker_threads RESCORER.NUM_WORKER_THREADS
  --rescorer.ms_per_timestep RESCORER.MS_PER_TIMESTEP
  --rescorer.boundary_character_ids RESCORER.BOUNDARY_CHARACTER_IDS
  --rescorer.vocab_file RESCORER.VOCAB_FILE
                        Vocab file to be used with decoder

lm_decoder_cpu:
  --lm_decoder_cpu.beam_search_width LM_DECODER_CPU.BEAM_SEARCH_WIDTH
  --lm_decoder_cpu.decoder_type LM_DECODER_CPU.DECODER_TYPE
  --lm_decoder_cpu.padding_size LM_DECODER_CPU.PADDING_SIZE
                        padding_size
  --lm_decoder_cpu.language_model_file LM_DECODER_CPU.LANGUAGE_MODEL_FILE
                        Language model file in binary format to be used by
                        KenLM
  --lm_decoder_cpu.max_supported_transcripts LM_DECODER_CPU.MAX_SUPPORTED_TRANSCRIPTS
  --lm_decoder_cpu.asr_model_delay LM_DECODER_CPU.ASR_MODEL_DELAY
                        (Advanced) Number of time steps by which the acoustic
                        model output should be shifted when computing
                        timestamps. This parameter must be tuned since the CTC
                        model is not guaranteed to predict correct alignment.
  --lm_decoder_cpu.language_model_alpha LM_DECODER_CPU.LANGUAGE_MODEL_ALPHA
  --lm_decoder_cpu.language_model_beta LM_DECODER_CPU.LANGUAGE_MODEL_BETA
  --lm_decoder_cpu.ms_per_timestep LM_DECODER_CPU.MS_PER_TIMESTEP
  --lm_decoder_cpu.vocab_file LM_DECODER_CPU.VOCAB_FILE
                        Vocab file to be used with decoder
  --lm_decoder_cpu.lexicon_file LM_DECODER_CPU.LEXICON_FILE
                        Lexicon file to be used with decoder
  --lm_decoder_cpu.beam_size LM_DECODER_CPU.BEAM_SIZE
                        Maximum number of hypotheses the decoder holds after
                        each step
  --lm_decoder_cpu.beam_size_token LM_DECODER_CPU.BEAM_SIZE_TOKEN
                        Maximum number of tokens the decoder considers at each
                        step
  --lm_decoder_cpu.beam_threshold LM_DECODER_CPU.BEAM_THRESHOLD
                        Threshold to prune hypotheses
  --lm_decoder_cpu.lm_weight LM_DECODER_CPU.LM_WEIGHT
                        Weight of language model
  --lm_decoder_cpu.word_insertion_score LM_DECODER_CPU.WORD_INSERTION_SCORE
                        Word insertion score
  --lm_decoder_cpu.forerunner_beam_size LM_DECODER_CPU.FORERUNNER_BEAM_SIZE
                        Maximum number of hypotheses the decoder holds after
                        each step, for forerunner transcript
  --lm_decoder_cpu.forerunner_beam_size_token LM_DECODER_CPU.FORERUNNER_BEAM_SIZE_TOKEN
                        Maximum number of tokens the decoder considers at each
                        step, for forerunner transcript
  --lm_decoder_cpu.forerunner_beam_threshold LM_DECODER_CPU.FORERUNNER_BEAM_THRESHOLD
                        Threshold to prune hypotheses, for forerunner
                        transcript
  --lm_decoder_cpu.smearing_mode LM_DECODER_CPU.SMEARING_MODE
                        Decoder smearing mode. Can be logadd, max or none
  --lm_decoder_cpu.forerunner_use_lm LM_DECODER_CPU.FORERUNNER_USE_LM
                        Bool that controls if the forerunner decoder should
                        use a language model

asr_ensemble_backend:
  --asr_ensemble_backend.max_sequence_idle_microseconds ASR_ENSEMBLE_BACKEND.MAX_SEQUENCE_IDLE_MICROSECONDS
                        Global timeout, in microseconds
  --asr_ensemble_backend.max_batch_size ASR_ENSEMBLE_BACKEND.MAX_BATCH_SIZE
                        Default maximum parallel requests in a single forward
                        pass
  --asr_ensemble_backend.min_batch_size ASR_ENSEMBLE_BACKEND.MIN_BATCH_SIZE
  --asr_ensemble_backend.opt_batch_size ASR_ENSEMBLE_BACKEND.OPT_BATCH_SIZE
  --asr_ensemble_backend.preferred_batch_size ASR_ENSEMBLE_BACKEND.PREFERRED_BATCH_SIZE
                        Preferred batch size, must be smaller than Max batch
                        size
  --asr_ensemble_backend.batching_type ASR_ENSEMBLE_BACKEND.BATCHING_TYPE
  --asr_ensemble_backend.preserve_ordering ASR_ENSEMBLE_BACKEND.PRESERVE_ORDERING
                        Preserve ordering
  --asr_ensemble_backend.instance_group_count ASR_ENSEMBLE_BACKEND.INSTANCE_GROUP_COUNT
                        How many instances in a group
  --asr_ensemble_backend.max_queue_delay_microseconds ASR_ENSEMBLE_BACKEND.MAX_QUEUE_DELAY_MICROSECONDS
                        Maximum amount of time to allow requests to queue to
                        form a batch in microseconds
  --asr_ensemble_backend.optimization_graph_level ASR_ENSEMBLE_BACKEND.OPTIMIZATION_GRAPH_LEVEL
                        The Graph optimization level to use in Triton model
                        configuration
  --asr_ensemble_backend.language_code ASR_ENSEMBLE_BACKEND.LANGUAGE_CODE
                        Language of the model
  --asr_ensemble_backend.streaming ASR_ENSEMBLE_BACKEND.STREAMING
                        Execute model in streaming mode
  --asr_ensemble_backend.offline
                        Marking the model to be used with offline API in Riva
  --asr_ensemble_backend.type
                        Marking the model to be used with offline API in Riva
