riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
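For reference, the streaming build above in greedy mode would look like the following sketch (placeholders unchanged; the flashlight and language-model flags are dropped, since greedy decoding does not use an external language model):

```shell
riva-build speech_recognition \
    <rmir_filename>:<key> \
    <riva_file>:<key> \
    --name=conformer-es-US-asr-streaming \
    --return_separate_utterances=False \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=40 \
    --endpointing.start_history=200 \
    --nn.fp16_needs_obey_precision_pass \
    --endpointing.residue_blanks_at_start=-2 \
    --chunk_size=0.16 \
    --left_padding_size=1.92 \
    --right_padding_size=1.92 \
    --decoder_type=greedy \
    --greedy_decoder.asr_model_delay=-1 \
    --language_code=es-US \
    --wfst_tokenizer_model=<far_tokenizer_file> \
    --wfst_verbalizer_model=<far_verbalizer_file>
```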
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=conformer-es-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=conformer-es-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--max_batch_size=16 \
--featurizer.max_batch_size=512 \
--featurizer.max_execution_batch_size=512 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_file> \
--flashlight_decoder.lm_weight=0.2 \
--flashlight_decoder.word_insertion_score=0.2 \
--flashlight_decoder.beam_threshold=20. \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.1 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.64 --streaming_diarizer.right_context_size=0.56.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build punctuation <rmir_filename>:<key> <riva_file>:<key> --language_code=en-US --name=riva-punctuation-en-US
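Combining the base streaming build with the speaker-diarization additions described above yields a command along these lines (a sketch; placeholders unchanged, with the Sortformer RIVA file and diarizer flags appended):

```shell
riva-build speech_recognition \
    <rmir_filename>:<key> \
    <riva_file>:<key> \
    <sortformer_riva_file>:<key> \
    --name=parakeet-0.6b-unified-en-US-asr-streaming \
    --return_separate_utterances=False \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=80 \
    --endpointing.residue_blanks_at_start=-2 \
    --nn.fp16_needs_obey_precision_pass \
    --unified_acoustic_model \
    --chunk_size=0.16 \
    --left_padding_size=3.92 \
    --right_padding_size=3.92 \
    --decoder_type=flashlight \
    --flashlight_decoder.asr_model_delay=-1 \
    --decoding_language_model_binary=<bin_file> \
    --decoding_lexicon=<txt_decoding_lexicon_file> \
    --flashlight_decoder.lm_weight=0.1 \
    --flashlight_decoder.word_insertion_score=1.0 \
    --flashlight_decoder.beam_size=32 \
    --flashlight_decoder.beam_threshold=20. \
    --flashlight_decoder.num_tokenization=1 \
    --profane_words_file=<txt_profane_words_file> \
    --language_code=en-US \
    --wfst_tokenizer_model=<far_tokenizer_file> \
    --wfst_verbalizer_model=<far_verbalizer_file> \
    --speech_hints_model=<far_speech_hints_file> \
    --diarizer_type=sortformer \
    --streaming_diarizer.center_chunk_size=0.64 \
    --streaming_diarizer.right_context_size=0.56
```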
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.1 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.96 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build punctuation <rmir_filename>:<key> <riva_file>:<key> --language_code=en-US --name=riva-punctuation-en-US
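Similarly, enabling VAD on the throughput-optimized streaming build above would give a command along these lines (a sketch; placeholders unchanged, with the Silero VAD RIVA file and VAD flags appended):

```shell
riva-build speech_recognition \
    <rmir_filename>:<key> \
    <riva_file>:<key> \
    <VAD_riva_file>:<key> \
    --name=parakeet-0.6b-unified-en-US-asr-streaming-throughput \
    --return_separate_utterances=False \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=80 \
    --endpointing.residue_blanks_at_start=-2 \
    --nn.fp16_needs_obey_precision_pass \
    --unified_acoustic_model \
    --chunk_size=0.96 \
    --left_padding_size=3.92 \
    --right_padding_size=3.92 \
    --decoder_type=flashlight \
    --flashlight_decoder.asr_model_delay=-1 \
    --decoding_language_model_binary=<bin_file> \
    --decoding_lexicon=<txt_decoding_lexicon_file> \
    --flashlight_decoder.lm_weight=0.1 \
    --flashlight_decoder.word_insertion_score=1.0 \
    --flashlight_decoder.beam_size=32 \
    --flashlight_decoder.beam_threshold=20. \
    --flashlight_decoder.num_tokenization=1 \
    --profane_words_file=<txt_profane_words_file> \
    --language_code=en-US \
    --wfst_tokenizer_model=<far_tokenizer_file> \
    --wfst_verbalizer_model=<far_verbalizer_file> \
    --speech_hints_model=<far_speech_hints_file> \
    --vad_type=silero \
    --neural_vad_nn.optimization_graph_level=-1 \
    --neural_vad.filter_speech_first=false \
    --neural_vad.min_duration_on=0.2 \
    --neural_vad.min_duration_off=0.5 \
    --neural_vad.onset=0.85 \
    --neural_vad.offset=0.3 \
    --neural_vad.pad_offset=0.08 \
    --neural_vad.pad_onset=0.3 \
    --enable_vad_endpointing=true
```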
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-unified-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.lm_weight=0.1 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=4.8 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build punctuation <rmir_filename>:<key> <riva_file>:<key> --language_code=en-US --name=riva-punctuation-en-US
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<txt_profane_words_file> \
--name=parakeet-0.6b-unified-ml-cs-es-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_chunk_size=0.96 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.64 --streaming_diarizer.right_context_size=0.56.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<txt_profane_words_file> \
--name=parakeet-0.6b-unified-ml-cs-es-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.96 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<txt_profane_words_file> \
--offline \
--name=parakeet-0.6b-unified-ml-cs-es-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-16 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=es-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=4.8 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-vi-VN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.use_trt_fp32 \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_chunk_size=0.96 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=1 \
--flashlight_decoder.asr_model_delay=-1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=vi-VN
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.64 --streaming_diarizer.right_context_size=0.56.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-vi-VN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.use_trt_fp32 \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=1 \
--flashlight_decoder.asr_model_delay=-1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=vi-VN
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.96 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-unified-vi-VN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.use_trt_fp32 \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_lexicon=<txt_decoding_lexicon_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.lm_weight=0.5 \
--flashlight_decoder.word_insertion_score=1 \
--flashlight_decoder.asr_model_delay=-1 \
--endpointing.residue_blanks_at_start=-2 \
--profane_words_file=<txt_profane_words_file> \
--language_code=vi-VN
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=4.8 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-zh-CN-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.16 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_chunk_size=0.96 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.4 \
--flashlight_decoder.word_insertion_score=1.5 \
--profane_words_file=<txt_profane_words_file> \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.64 --streaming_diarizer.right_context_size=0.56.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation: No separate model is required; the ASR model automatically generates punctuated text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-0.6b-unified-zh-CN-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.96 \
--left_padding_size=3.92 \
--right_padding_size=3.92 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.4 \
--flashlight_decoder.word_insertion_score=1.5 \
--profane_words_file=<txt_profane_words_file> \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.96 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation: No separate model is required; the ASR model automatically generates punctuated text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-0.6b-unified-zh-CN-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_size_token=32 \
--flashlight_decoder.beam_threshold=30. \
--flashlight_decoder.lm_weight=0.4 \
--flashlight_decoder.word_insertion_score=1.5 \
--profane_words_file=<txt_profane_words_file> \
--language_code=zh-CN \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=4.8 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation: No separate model is required; the ASR model automatically generates punctuated text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-en-US-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.64 --streaming_diarizer.right_context_size=0.56.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build punctuation <rmir_filename>:<key> <riva_file>:<key> --language_code=en-US --name=riva-punctuation-en-US
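The streaming latency and compute cost of a build follow directly from `--chunk_size`, the padding sizes, and `--ms_per_timestep`: the acoustic model sees `left_padding + chunk + right_padding` seconds of audio per step and emits one output every `ms_per_timestep` milliseconds. As a quick sanity check, this is plain arithmetic, not a Riva API; the helper below is purely illustrative:

```python
# Arithmetic implied by the riva-build flags above (not a Riva API):
# the acoustic model processes chunk_size plus left/right padding of audio
# per step, and emits one output every ms_per_timestep milliseconds.

def window_stats(chunk_size, left_padding, right_padding, ms_per_timestep):
    window_s = left_padding + chunk_size + right_padding
    return {
        "window_s": window_s,
        "timesteps_per_window": round(window_s * 1000 / ms_per_timestep),
        "timesteps_per_chunk": round(chunk_size * 1000 / ms_per_timestep),
    }

# Low-latency streaming build above: chunk_size=0.16, padding=1.92, 80 ms/timestep
# -> 4.0 s acoustic window, only 2 new timesteps per chunk.
low_latency = window_stats(0.16, 1.92, 1.92, 80)

# Throughput variant (chunk_size=0.96, same padding)
# -> 4.8 s window, 12 new timesteps per chunk, so far fewer passes per second of audio.
throughput = window_stats(0.96, 1.92, 1.92, 80)
```

This makes the latency/throughput trade-off concrete: the small chunk reruns the network every 160 ms for fast partial results, while the larger chunk amortizes each forward pass over six times as much new audio.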
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--name=parakeet-1.1b-en-US-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=0.96 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=0.96 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build punctuation <rmir_filename>:<key> <riva_file>:<key> --language_code=en-US --name=riva-punctuation-en-US
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=parakeet-1.1b-en-US-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=4.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=<bin_file> \
--decoding_vocab=<txt_decoding_vocab_file> \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--profane_words_file=<txt_profane_words_file> \
--language_code=en-US \
--wfst_tokenizer_model=<far_tokenizer_file> \
--wfst_verbalizer_model=<far_verbalizer_file> \
--speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, remove the --flashlight_decoder parameters from the above command, change --decoder_type=flashlight to --decoder_type=greedy, and add --greedy_decoder.asr_model_delay=-1.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=4.8 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <VAD_riva_file>:<key> and add the following parameters: --vad_type=silero --neural_vad_nn.optimization_graph_level=-1 --neural_vad.filter_speech_first=false --neural_vad.min_duration_on=0.2 --neural_vad.min_duration_off=0.5 --neural_vad.onset=0.85 --neural_vad.offset=0.3 --neural_vad.pad_offset=0.08 --neural_vad.pad_onset=0.3 --enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build punctuation <rmir_filename>:<key> <riva_file>:<key> --language_code=en-US --name=riva-punctuation-en-US
riva-build speech_recognition <rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<txt_profane_words_file> \
--name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-streaming \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=0.32 \
--left_padding_size=4.64 \
--right_padding_size=4.64 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--max_batch_size=32 \
--nn.max_batch_size=32 \
--nn.opt_batch_size=32 \
--endpointing_type=niva \
--endpointing.stop_history=800 \
--endpointing.stop_th=1.0 \
--endpointing.residue_blanks_at_end=0 \
--nemo_decoder.use_stateful_decoding \
--decoder_type=nemo
Note
GPU-based Language Model: To deploy with a GPU-LM, add the following parameters: --nemo_decoder.language_model_alpha=0.5 --nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=1.6 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: Not supported.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
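The `--endpointing.stop_history` (in milliseconds) and `--endpointing.stop_th` flags above control when the recognizer declares end of utterance: roughly, the utterance ends once the decoder has emitted blanks meeting the threshold for `stop_history` ms in a row. The sketch below is a simplified model of that idea, not Riva's implementation; the function name and the per-timestep blank-probability input are assumptions made for illustration:

```python
# Simplified illustration of blank-based endpointing, mirroring the
# --endpointing.stop_history (ms) and --endpointing.stop_th flags above.
# NOT Riva's implementation; just the idea: fire end-of-utterance once the
# trailing run of "blank" timesteps is long enough.

def endpoint_index(blank_probs, ms_per_timestep=80,
                   stop_history=800, stop_th=1.0):
    """Return the index of the timestep where end-of-utterance fires,
    or None if it never does."""
    needed = stop_history / ms_per_timestep  # consecutive blank timesteps required
    run = 0
    for i, p in enumerate(blank_probs):
        run = run + 1 if p >= stop_th else 0  # any non-blank resets the run
        if run >= needed:
            return i
    return None
```

With the values above (stop_history=800, 80 ms per timestep), ten consecutive blank timesteps are required, so brief pauses shorter than 800 ms do not end the utterance.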
riva-build speech_recognition <rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<txt_profane_words_file> \
--name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-streaming-throughput \
--return_separate_utterances=False \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--endpointing.residue_blanks_at_start=-2 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=1.6 \
--left_padding_size=4.0 \
--right_padding_size=4.0 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--max_batch_size=64 \
--nn.opt_batch_size=64 \
--endpointing_type=niva \
--endpointing.stop_history=800 \
--endpointing.stop_th=1.0 \
--endpointing.residue_blanks_at_end=0 \
--nemo_decoder.use_stateful_decoding \
--decoder_type=nemo
Note
GPU-based Language Model: To deploy with a GPU-LM, add the following parameters: --nemo_decoder.language_model_alpha=0.5 --nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=1.6 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: Not supported.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition <rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<txt_profane_words_file> \
--offline \
--name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=8.0 \
--left_padding_size=0 \
--right_padding_size=0 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--max_batch_size=128 \
--nn.opt_batch_size=128 \
--endpointing_type=niva \
--endpointing.stop_history=0 \
--decoder_type=nemo
Note
GPU-based Language Model: To deploy with a GPU-LM, add the following parameters: --nemo_decoder.language_model_alpha=0.5 --nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
Speaker Diarization: To enable speaker diarization, include the Sortformer RIVA file in the command: riva-build speech_recognition <rmir_filename>:<key> <riva_file>:<key> <sortformer_riva_file>:<key> and add the following parameters: --diarizer_type=sortformer --streaming_diarizer.center_chunk_size=1.6 --streaming_diarizer.right_context_size=0.
Voice Activity Detection: Not supported.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition <rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<txt_profane_words_file> \
--offline \
--name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-offline \
--return_separate_utterances=True \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=80 \
--language_code=en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi \
--nn.fp16_needs_obey_precision_pass \
--unified_acoustic_model \
--chunk_size=8.0 \
--left_padding_size=0 \
--right_padding_size=0 \
--featurizer.max_batch_size=256 \
--featurizer.max_execution_batch_size=256 \
--max_batch_size=128 \
--nn.opt_batch_size=128 \
--endpointing_type=niva \
--endpointing.stop_history=0 \
--decoder_type=nemo
Note
GPU-based Language Model: Not supported.
Speaker Diarization: Not supported.
Voice Activity Detection: Not supported.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--offline \
--name=whisper-large-v3-multi-asr-offline \
--return_separate_utterances=True \
--chunk_size 30 \
--left_padding_size 0 \
--right_padding_size 0 \
--decoder_type trtllm \
--unified_acoustic_model \
--feature_extractor_type torch \
--featurizer.norm_per_feature false \
--max_batch_size 8 \
--featurizer.precalc_norm_params False \
--featurizer.max_batch_size=8 \
--featurizer.max_execution_batch_size=8 \
--language_code=en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi
riva-build speech_recognition \
<rmir_filename>:<key> \
<riva_file>:<key> \
--profane_words_file=<profane_words_file> \
--offline \
--name=canary-1b-multi-asr-offline \
--return_separate_utterances=True \
--unified_acoustic_model \
--language_code=en-US,ar-AR,bg-BG,ca-ES,cs-CZ,da-DK,de-AT,de-CH,de-DE,el-GR,el-IL,et-EE,en-AM,en-AU,en-CA,en-EU,en-GB,en-IN,en-ME,en-MY,en-PH,en-SA,en-SG,en-UA,en-ZA,es-AR,es-CL,es-ES,es-LA,es-PY,es-UY,es-US,es-MX,fi-FI,fr-BE,fr-CA,fr-CH,fr-FR,he-IL,hi-IN,hu-HU,hr-HR,id-ID,it-IT,it-CH,lt-LT,lv-LV,ja-JP,km-KH,ko-KR,my-MM,nb-NO,nn-NO,nl-NL,nl-BE,nn-NB,pl-PL,pt-BR,pt-PT,ro-RO,ru-AM,ru-RU,ru-UA,sk-SK,sl-SI,sv-SE,th-TH,tr-TR,uk-UA,vi-VN,zh-CN,zh-TW \
--chunk_size 30 \
--left_padding_size 0 \
--right_padding_size 0 \
--feature_extractor_type torch \
--torch_feature_type nemo \
--max_batch_size 8 \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_params=False \
--featurizer.max_batch_size=128 \
--featurizer.max_execution_batch_size=128 \
--ms_per_timestep=80 \
--share_flags=True \
--featurizer.norm_per_feature false \
--decoder_type trtllm \
--trtllm_decoder.max_output_len 200 \
--trtllm_decoder.decoupled_mode true