riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-en-US-asr-streaming \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=0.16 \
left_padding_size=1.92 \
right_padding_size=1.92 \
decoder_chunk_size=0.96 \
decoder=flashlight \
flashlight_decoder.asr_model_delay=-1 \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_lexicon_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base output_path=<rmir_filename>:<key> 'source_path=[<riva_file>:<key>]' name=riva-punctuation language_code=<language_code>
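The Greedy Mode note describes an edit to the argument list. The following bash sketch shows one way to derive the greedy variant mechanically. `to_greedy`, `BASE_ARGS`, and `GREEDY_ARGS` are hypothetical names used only for illustration. The sketch assumes that "flashlight_decoder related parameters" means `decoder=flashlight` plus every override whose key starts with `flashlight_decoder.`. `BASE_ARGS` is an abbreviated excerpt of the en-US streaming command, not the full list.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy an override list into GREEDY_ARGS, dropping
# flashlight-only entries and appending the greedy decoder overrides.
# Assumption: only decoder=flashlight and flashlight_decoder.* are removed.
to_greedy() {
  GREEDY_ARGS=()
  local arg
  for arg in "$@"; do
    case "$arg" in
      decoder=flashlight|flashlight_decoder.*) ;;  # skip flashlight-specific overrides
      *) GREEDY_ARGS+=("$arg") ;;
    esac
  done
  GREEDY_ARGS+=("decoder=greedy" "greedy_decoder.asr_model_delay=-1")
}

# Abbreviated excerpt of the en-US streaming overrides (illustrative only).
BASE_ARGS=(
  "name=parakeet-0.6b-en-US-asr-streaming"
  "chunk_size=0.16"
  "decoder=flashlight"
  "flashlight_decoder.asr_model_delay=-1"
  "flashlight_decoder.lm_weight=0.8"
  "flashlight_decoder.beam_size=32"
  "language_code=en-US"
)

to_greedy "${BASE_ARGS[@]}"
printf '%s\n' "${GREEDY_ARGS[@]}"
```

The note does not say whether decoding_language_model_binary and decoding_vocab should also be dropped for greedy decoding. This sketch therefore leaves any such entries in place.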
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-en-US-asr-streaming-throughput \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=0.96 \
left_padding_size=1.92 \
right_padding_size=1.92 \
decoder=flashlight \
flashlight_decoder.asr_model_delay=-1 \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_lexicon_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=256 streaming_diarizer_nn.fifo_len=112.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base output_path=<rmir_filename>:<key> 'source_path=[<riva_file>:<key>]' name=riva-punctuation language_code=<language_code>
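The streaming and streaming-throughput commands differ mainly in chunk_size. The chunking values are given in seconds, while ms_per_timestep is in milliseconds. This bash sketch converts both presets to acoustic-model timesteps so they can be compared. It rests on one assumption that this section does not state: that each inference step sees left_padding_size + chunk_size + right_padding_size of audio. Treat the "window" figures as illustrative. `steps` and `describe_preset` are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

MS_PER_TIMESTEP=80   # ms_per_timestep=80 in both commands above

# Convert a duration in milliseconds to whole acoustic-model timesteps.
steps() {
  echo $(( $1 / MS_PER_TIMESTEP ))
}

# Print chunk size and the assumed total window (left + chunk + right padding).
describe_preset() {
  local name=$1 chunk_ms=$2 left_ms=$3 right_ms=$4
  local window_ms=$(( left_ms + chunk_ms + right_ms ))
  echo "$name: chunk ${chunk_ms} ms = $(steps "$chunk_ms") steps," \
       "window ${window_ms} ms = $(steps "$window_ms") steps"
}

# chunk_size, left_padding_size, right_padding_size converted from seconds to ms.
describe_preset streaming   160 1920 1920
describe_preset throughput  960 1920 1920
```

Run as written, it prints `streaming: chunk 160 ms = 2 steps, window 4000 ms = 50 steps` and `throughput: chunk 960 ms = 12 steps, window 4800 ms = 60 steps`. Integer arithmetic is exact here because every value in these commands is a multiple of 80 ms.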
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-en-US-asr-offline \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=4.8 \
left_padding_size=1.6 \
right_padding_size=1.6 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
decoder=flashlight \
flashlight_decoder.asr_model_delay=-1 \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_lexicon_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=4.8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=480 streaming_diarizer_nn.spkcache_len=332 streaming_diarizer_nn.fifo_len=120.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base output_path=<rmir_filename>:<key> 'source_path=[<riva_file>:<key>]' name=riva-punctuation language_code=<language_code>
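The FP8 note states a hardware requirement: compute capability 8.9 or higher. The following bash sketch checks that requirement before adding nn.use_trt_fp8=True. `supports_fp8` and `EXTRA_ARGS` are hypothetical names. The sketch assumes an nvidia-smi recent enough to accept the `compute_cap` query field. If nvidia-smi is missing or the query fails, the FP8 override is simply not added.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds (exit 0) only for a "major.minor" compute capability >= 8.9.
supports_fp8() {
  [[ $1 =~ ^([0-9]+)\.([0-9]+)$ ]] || return 1   # reject malformed values such as "[N/A]"
  local major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}
  (( 10#$major > 8 || (10#$major == 8 && 10#$minor >= 9) ))
}

# Query GPU 0 (assumption: driver supports compute_cap) and add FP8 if allowed.
EXTRA_ARGS=()
if command -v nvidia-smi >/dev/null 2>&1; then
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader -i 0 2>/dev/null) || cap=""
  cap=${cap//[[:space:]]/}
  if supports_fp8 "$cap"; then
    EXTRA_ARGS+=("nn.use_trt_fp8=True")
  fi
fi
printf 'extra overrides: %s\n' "${EXTRA_ARGS[*]:-none}"
```

The contents of `EXTRA_ARGS` would then be appended to the riva-build overrides. Checking the capability up front avoids building an FP8 engine on a GPU that the note says cannot run it.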
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-ml-cs-es-US-asr-streaming \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-16 \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=0.32 \
left_padding_size=3.92 \
right_padding_size=3.92 \
decoder_chunk_size=0.96 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=es-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-ml-cs-es-US-asr-streaming-throughput \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-16 \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=0.96 \
left_padding_size=3.92 \
right_padding_size=3.92 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=es-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=256 streaming_diarizer_nn.fifo_len=112.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-ml-cs-es-US-asr-offline \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-16 \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=4.8 \
left_padding_size=1.6 \
right_padding_size=1.6 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=es-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=4.8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=480 streaming_diarizer_nn.spkcache_len=332 streaming_diarizer_nn.fifo_len=120.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
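Enabling speaker diarization changes two things: source_path gains a second `<file>:<key>` entry, and several overrides are appended. The following bash sketch assembles both for the offline preset, using the values from the Speaker Diarization note above. `make_source_path`, `SRC`, and `DIAR_ARGS` are hypothetical names. The angle-bracket placeholders are kept verbatim and must be replaced with real paths and keys.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Join "<file>:<key>" entries with commas inside source_path=[...].
make_source_path() {
  local IFS=,
  echo "source_path=[$*]"
}

SRC=$(make_source_path "<riva_file>:<key>" "<sortformer_riva_file>:<key>")

# Offline diarization overrides, copied from the note above.
DIAR_ARGS=(
  "sortformer=enabled"
  "diarizer_type=sortformer"
  "streaming_diarizer.center_chunk_size=4.8"
  "streaming_diarizer.right_context_size=0"
  "streaming_diarizer_nn.chunk_len=480"
  "streaming_diarizer_nn.spkcache_len=332"
  "streaming_diarizer_nn.fifo_len=120"
)

# Print one argument per line; in a real build these follow the other overrides.
printf '%s\n' "$SRC" "${DIAR_ARGS[@]}"
```

Passing `"$SRC"` as a single double-quoted argument serves the same purpose as the single quotes around 'source_path=[...]' in the commands above. In both cases, the shell does not treat the brackets as a glob pattern.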
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-vi-VN-asr-streaming \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.use_trt_fp32=True \
unified_acoustic_model=true \
chunk_size=0.16 \
left_padding_size=1.92 \
right_padding_size=1.92 \
decoder_chunk_size=0.96 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_lexicon=<txt_decoding_lexicon_file> \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_size_token=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.lm_weight=0.5 \
flashlight_decoder.word_insertion_score=1 \
flashlight_decoder.asr_model_delay=-1 \
language_code=vi-VN
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-vi-VN-asr-streaming-throughput \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.use_trt_fp32=True \
unified_acoustic_model=true \
chunk_size=0.96 \
left_padding_size=1.92 \
right_padding_size=1.92 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_lexicon=<txt_decoding_lexicon_file> \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_size_token=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.lm_weight=0.5 \
flashlight_decoder.word_insertion_score=1 \
flashlight_decoder.asr_model_delay=-1 \
language_code=vi-VN
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=256 streaming_diarizer_nn.fifo_len=112.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-vi-VN-asr-offline \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
nn.use_trt_fp32=True \
unified_acoustic_model=true \
chunk_size=4.8 \
left_padding_size=1.6 \
right_padding_size=1.6 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_lexicon=<txt_decoding_lexicon_file> \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_size_token=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.lm_weight=0.5 \
flashlight_decoder.word_insertion_score=1 \
flashlight_decoder.asr_model_delay=-1 \
endpointing.residue_blanks_at_start=-2 \
language_code=vi-VN
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=4.8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=480 streaming_diarizer_nn.spkcache_len=332 streaming_diarizer_nn.fifo_len=120.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-zh-CN-asr-streaming \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=0.16 \
left_padding_size=3.92 \
right_padding_size=3.92 \
decoder_chunk_size=0.96 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_vocab_file> \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_size_token=32 \
flashlight_decoder.beam_threshold=30. \
flashlight_decoder.lm_weight=0.4 \
flashlight_decoder.word_insertion_score=1.5 \
language_code=zh-CN \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation: No separate model is required; the ASR model automatically generates punctuated text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-zh-CN-asr-streaming-throughput \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=0.96 \
left_padding_size=3.92 \
right_padding_size=3.92 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_vocab_file> \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_size_token=32 \
flashlight_decoder.beam_threshold=30. \
flashlight_decoder.lm_weight=0.4 \
flashlight_decoder.word_insertion_score=1.5 \
language_code=zh-CN \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=256 streaming_diarizer_nn.fifo_len=112.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation: No separate model is required; the ASR model automatically generates punctuated text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-0.6b-unified-zh-CN-asr-offline \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=4.8 \
left_padding_size=1.6 \
right_padding_size=1.6 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
decoder=flashlight \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_vocab_file> \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_size_token=32 \
flashlight_decoder.beam_threshold=30. \
flashlight_decoder.lm_weight=0.4 \
flashlight_decoder.word_insertion_score=1.5 \
language_code=zh-CN \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=4.8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=480 streaming_diarizer_nn.spkcache_len=332 streaming_diarizer_nn.fifo_len=120.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation: No separate model required, the ASR model automatically generates punctuated text.
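The Greedy Mode note swaps the Flashlight decoder settings for two greedy overrides. The following is a minimal POSIX sh sketch of that swap, applied to a space-separated list of overrides. It uses an abbreviated sample of the zh-CN offline command above and a hypothetical helper named to_greedy. The helper removes only decoder=flashlight and the flashlight_decoder.* entries, because those are the parameters the note mentions. The note does not say whether decoding_language_model_binary and decoding_vocab should also be removed in greedy mode, so the sketch leaves them untouched.

```shell
#!/bin/sh
# Abbreviated sample of the decoder overrides from the command above.
FLASHLIGHT_ARGS="language_code=zh-CN decoder=flashlight flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_size_token=32 flashlight_decoder.beam_threshold=30. \
flashlight_decoder.lm_weight=0.4 flashlight_decoder.word_insertion_score=1.5"

# to_greedy ARGS: drop decoder=flashlight and every flashlight_decoder.* override,
# keep everything else, then append the two greedy overrides from the note.
to_greedy() {
  out=""
  for arg in $1; do
    case "$arg" in
      decoder=flashlight|flashlight_decoder.*) ;;   # Flashlight-only setting: skip
      *) out="$out $arg" ;;
    esac
  done
  out="$out decoder=greedy greedy_decoder.asr_model_delay=-1"
  echo "${out# }"   # strip the leading space added by the loop
}

to_greedy "$FLASHLIGHT_ARGS"
```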
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-1.1b-en-US-asr-streaming \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=0.16 \
left_padding_size=1.92 \
right_padding_size=1.92 \
decoder_chunk_size=0.96 \
decoder=flashlight \
flashlight_decoder.asr_model_delay=-1 \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_vocab_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder.* parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]'.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]') and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base output_path=<rmir_filename>:<key> 'source_path=[<riva_file>:<key>]' name=riva-punctuation language_code=<language_code>
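The FP8 Quantization note depends on the GPU's compute capability. The following is a minimal POSIX sh sketch of that check. It assumes the capability is available as a MAJOR.MINOR string such as 8.9. The function name supports_fp8 is hypothetical, and the loop runs on example values rather than querying real hardware.

```shell
#!/bin/sh
# supports_fp8 CAP: succeed if compute capability CAP (for example "8.9")
# meets the 8.9 minimum stated in the FP8 Quantization note.
supports_fp8() {
  major=${1%%.*}   # text before the first dot
  minor=${1#*.}    # text after the first dot
  [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 9 ]; }
}

# Example capabilities only; no GPU is queried here.
for cap in 8.0 8.6 8.9 9.0; do
  if supports_fp8 "$cap"; then
    echo "$cap: add nn.use_trt_fp8=True"
  else
    echo "$cap: build without FP8"
  fi
done
```

On recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints one capability per GPU. Its output could be passed to supports_fp8 before you decide whether to add nn.use_trt_fp8=True.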
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-1.1b-en-US-asr-streaming-throughput \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=0.96 \
left_padding_size=1.92 \
right_padding_size=1.92 \
decoder=flashlight \
flashlight_decoder.asr_model_delay=-1 \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_vocab_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder.* parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]'.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=256 streaming_diarizer_nn.fifo_len=112.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]') and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base output_path=<rmir_filename>:<key> 'source_path=[<riva_file>:<key>]' name=riva-punctuation language_code=<language_code>
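The two parakeet-1.1b streaming builds differ mainly in chunk_size: 0.16 s for the latency build and 0.96 s for the throughput build. Assume ms_per_timestep=80 is the duration of one acoustic-model output step. Under that assumption, a larger chunk yields more output steps per inference call. This presumably trades a longer wait per chunk for better GPU utilization. The following sketch does that arithmetic in integer milliseconds, because POSIX shell arithmetic has no floating point. The helper name timesteps_per_chunk is hypothetical.

```shell
#!/bin/sh
# ms_per_timestep=80 in both parakeet-1.1b streaming commands above.
MS_PER_TIMESTEP=80

# timesteps_per_chunk CHUNK_MS: number of 80 ms output steps covered by one chunk.
timesteps_per_chunk() {
  echo $(( $1 / $MS_PER_TIMESTEP ))
}

echo "asr-streaming (chunk_size=0.16):            $(timesteps_per_chunk 160) timesteps per chunk"
echo "asr-streaming-throughput (chunk_size=0.96): $(timesteps_per_chunk 960) timesteps per chunk"
```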
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-1.1b-en-US-asr-offline \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=4.8 \
left_padding_size=1.6 \
right_padding_size=1.6 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
decoder=flashlight \
flashlight_decoder.asr_model_delay=-1 \
decoding_language_model_binary=<bin_file> \
decoding_vocab=<txt_decoding_vocab_file> \
flashlight_decoder.lm_weight=0.8 \
flashlight_decoder.word_insertion_score=1.0 \
flashlight_decoder.beam_size=32 \
flashlight_decoder.beam_threshold=20. \
flashlight_decoder.num_tokenization=1 \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder.* parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]'.
FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=4.8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=480 streaming_diarizer_nn.spkcache_len=332 streaming_diarizer_nn.fifo_len=120.
Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]') and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command: riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base output_path=<rmir_filename>:<key> 'source_path=[<riva_file>:<key>]' name=riva-punctuation language_code=<language_code>
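The nemo2riva note replaces the source_path entry when you start from a .nemo checkpoint instead of a .riva file. The following is a minimal POSIX sh sketch that picks the right form from the file extension. It uses the ONNX export options given for these parakeet CTC commands; the RNNT commands further down use {format:nemo} instead. The helper name source_path_for and the file names are hypothetical.

```shell
#!/bin/sh
# source_path_for FILE KEY: print the source_path override for a .riva file,
# or the nemo2riva form from the note for a .nemo checkpoint (KEY is unused then).
source_path_for() {
  case "$1" in
    *.nemo) echo "source_path=[{path: $1, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]" ;;
    *)      echo "source_path=[$1:$2]" ;;
  esac
}

# Keep the result quoted: it contains spaces and brackets that must reach riva-build intact.
SRC=$(source_path_for parakeet.nemo my_key)
echo riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline "$SRC"
```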
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-streaming-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
language_code=\'en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi\' \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=0.32 \
left_padding_size=4.64 \
right_padding_size=4.64 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=32 \
nn.max_batch_size=32 \
nn.opt_batch_size=32 \
endpointing_type=niva \
endpointing.stop_history=800 \
endpointing.stop_th=1.0 \
endpointing.residue_blanks_at_end=0 \
nemo_decoder.use_stateful_decoding=true \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-based language model (GPU-LM), add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]'.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=1.6 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=160 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove <VAD_riva_file>:<key> from source_path and remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
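Disabling VAD means deleting exactly the overrides listed in the note, and nothing else. The following is a minimal POSIX sh sketch that filters those overrides out of a space-separated list. It runs on an abbreviated, hypothetical subset of the command's overrides, and the helper name without_vad is hypothetical. endpointing.stop_history=800 survives the filter because the note does not list it. The <VAD_riva_file>:<key> entry still has to be removed from source_path by hand.

```shell
#!/bin/sh
# VAD overrides listed in the Voice Activity Detection note above.
VAD_ARGS="vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false \
enable_vad_endpointing=true"

# without_vad ARGS: print ARGS minus every override that appears in VAD_ARGS.
without_vad() {
  out=""
  for arg in $1; do
    case " $VAD_ARGS " in
      *" $arg "*) ;;            # listed in the note: drop it
      *) out="$out $arg" ;;
    esac
  done
  echo "${out# }"
}

# Hypothetical subset of the command's overrides.
ARGS="vad=enabled vad_type=silero endpointing.stop_history=800 enable_vad_endpointing=true decoder=nemo"
without_vad "$ARGS"
```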
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-streaming-throughput-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
language_code=\'en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi\' \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=1.6 \
left_padding_size=4.0 \
right_padding_size=4.0 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=64 \
nn.opt_batch_size=64 \
endpointing_type=niva \
endpointing.stop_history=800 \
endpointing.stop_th=1.0 \
endpointing.residue_blanks_at_end=0 \
nemo_decoder.use_stateful_decoding=true \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-based language model (GPU-LM), add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]'.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=272 streaming_diarizer_nn.fifo_len=96.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove <VAD_riva_file>:<key> from source_path and remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
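The GPU-LM note adds two overrides only when a trained language model is available. The following is a minimal POSIX sh sketch that emits those overrides conditionally. It uses a hypothetical GPU_LM_FILE variable and a hypothetical helper named gpu_lm_args. The 0.5 alpha is copied from the note and is not tuned here.

```shell
#!/bin/sh
# Hypothetical: path to a trained GPU language model (.nemo); empty means "no LM".
GPU_LM_FILE=""

# gpu_lm_args: print the two overrides from the GPU-based Language Model note,
# or nothing when GPU_LM_FILE is empty.
gpu_lm_args() {
  if [ -n "$GPU_LM_FILE" ]; then
    echo "nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=$GPU_LM_FILE"
  fi
}

echo "without LM: [$(gpu_lm_args)]"
GPU_LM_FILE=my_lm.nemo
echo "with LM:    [$(gpu_lm_args)]"
```

The output can be appended unquoted to the riva-build call, for example `riva-build ... $(gpu_lm_args)`. That only works while the model path contains no spaces.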
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-offline-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
language_code=\'en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi\' \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=8.0 \
left_padding_size=0 \
right_padding_size=0 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=128 \
nn.opt_batch_size=128 \
endpointing_type=niva \
endpointing.stop_history=0 \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-based language model (GPU-LM), add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]'.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=800 streaming_diarizer_nn.spkcache_len=312 streaming_diarizer_nn.fifo_len=100.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove <VAD_riva_file>:<key> from source_path and remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
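The multilingual commands above write `language_code=\'...\'` with backslash-escaped quotes. Outside any other quoting, `\'` produces a literal single quote, so riva-build receives the quotes as part of the argument. The --config-path/--config-name flags suggest riva-build follows Hydra's command-line conventions. If so, this quoting presumably keeps the comma-separated list as one string value. Without it, Hydra's override grammar would treat a,b as a sweep. The following sketch uses a shortened language list to show that the escaped form and a double-quoted form produce the same argument.

```shell
#!/bin/sh
# Escaped quotes, as written in the build commands above (shortened list).
ESCAPED=$(printf '%s' language_code=\'en-US,es-ES,multi\')

# Equivalent spelling: wrap the whole argument in double quotes instead.
QUOTED=$(printf '%s' "language_code='en-US,es-ES,multi'")

[ "$ESCAPED" = "$QUOTED" ] && echo "same argument: $ESCAPED"
```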
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-unified-ml-cs-universal-prompt-asr-streaming-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
language_code=\'en-US,en-GB,es-ES,es-US,fr-FR,de-DE,ar-AR,pt-BR,it-IT,ja-JP,ko-KR,ru-RU,hi-IN\' \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=0.32 \
left_padding_size=4.64 \
right_padding_size=4.64 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=32 \
nn.max_batch_size=32 \
nn.opt_batch_size=32 \
endpointing_type=niva \
endpointing.stop_history=800 \
endpointing.stop_th=1.0 \
endpointing.residue_blanks_at_end=0 \
nemo_decoder.use_stateful_decoding=true \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-based language model (GPU-LM), add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]'.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=1.6 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=160 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove <VAD_riva_file>:<key> from source_path and remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-unified-ml-cs-universal-prompt-asr-streaming-throughput-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
language_code=\'en-US,en-GB,es-ES,es-US,fr-FR,de-DE,ar-AR,pt-BR,it-IT,ja-JP,ko-KR,ru-RU,hi-IN\' \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=0.96 \
left_padding_size=4.32 \
right_padding_size=4.32 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=64 \
nn.opt_batch_size=64 \
endpointing_type=niva \
endpointing.stop_history=800 \
endpointing.stop_th=1.0 \
endpointing.residue_blanks_at_end=0 \
nemo_decoder.use_stateful_decoding=true \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-based language model (GPU-LM), add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]'.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=1.6 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=160 streaming_diarizer_nn.spkcache_len=264 streaming_diarizer_nn.fifo_len=100.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove <VAD_riva_file>:<key> from source_path and remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
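The RNNT streaming commands above pair each chunk_size with a different padding: 0.32 s with 4.64 s, 0.96 s with 4.32 s, and 1.6 s with 4.0 s. Assume the left and right padding add audio context on either side of each chunk. Under that assumption, the padding shrinks as the chunk grows, so the total window per inference stays at 9.6 s in every variant. The following sketch checks that arithmetic in integer milliseconds. The helper name window_ms is hypothetical.

```shell
#!/bin/sh
# window_ms CHUNK_MS PAD_MS: left padding + chunk + right padding, assuming equal
# left and right padding as in every RNNT streaming command above.
window_ms() {
  echo $(( $1 + 2 * $2 ))
}

# chunk_size:left_padding_size pairs (in ms) taken from the RNNT streaming commands.
for pair in 320:4640 1600:4000 960:4320; do
  chunk=${pair%%:*}
  pad=${pair#*:}
  echo "chunk ${chunk} ms + 2 x ${pad} ms padding = $(window_ms "$chunk" "$pad") ms"
done
```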
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-unified-ml-cs-universal-prompt-asr-offline-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
language_code=\'en-US,en-GB,es-ES,es-US,fr-FR,de-DE,ar-AR,pt-BR,it-IT,ja-JP,ko-KR,ru-RU,hi-IN\' \
nn.fp16_needs_obey_precision_pass=True \
unified_acoustic_model=true \
chunk_size=8.0 \
left_padding_size=0 \
right_padding_size=0 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=128 \
nn.opt_batch_size=128 \
endpointing_type=niva \
endpointing.stop_history=0 \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-based language model (GPU-LM), add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]'.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=800 streaming_diarizer_nn.spkcache_len=312 streaming_diarizer_nn.fifo_len=100.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove <VAD_riva_file>:<key> from source_path and remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
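The two universal RNNT models are built with different language_code lists. The multi variants list 25 locales plus multi, and the prompt variants list 13 locales. The following is a minimal POSIX sh sketch for checking whether a locale appears in each list. It assumes the build-time language_code list determines which codes a deployment accepts. The lists are copied from the commands above without their escaped quotes, and the helper name has_lang is hypothetical.

```shell
#!/bin/sh
# language_code lists copied from the multi and prompt build commands above.
MULTI_LANGS="en-US,en-GB,es-ES,ar-AR,es-US,pt-BR,fr-FR,de-DE,it-IT,ja-JP,ko-KR,ru-RU,hi-IN,he-IL,nb-NO,nl-NL,cs-CZ,da-DK,fr-CA,pl-PL,sv-SE,th-TH,tr-TR,pt-PT,nn-NO,multi"
PROMPT_LANGS="en-US,en-GB,es-ES,es-US,fr-FR,de-DE,ar-AR,pt-BR,it-IT,ja-JP,ko-KR,ru-RU,hi-IN"

# has_lang LIST CODE: succeed if CODE is one of the comma-separated entries in LIST.
# Wrapping both sides in commas prevents partial matches such as "hi" inside "hi-IN".
has_lang() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

for code in de-DE nl-NL hi-IN; do
  printf '%s: multi=%s prompt=%s\n' "$code" \
    "$(has_lang "$MULTI_LANGS" "$code" && echo yes || echo no)" \
    "$(has_lang "$PROMPT_LANGS" "$code" && echo yes || echo no)"
done
```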
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-indic-asr-streaming-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
language_code=\'bn-IN,en-US,hi-IN,ta-IN,indic\' \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=0.32 \
left_padding_size=4.64 \
right_padding_size=4.64 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=64 \
nn.opt_batch_size=64 \
endpointing_type=niva \
endpointing.stop_history=800 \
endpointing.stop_th=1.0 \
endpointing.residue_blanks_at_end=0 \
nemo_decoder.use_stateful_decoding=true \
unified_acoustic_model=true \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-based language model (GPU-LM), add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] in the command above with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]'.
Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the build command's source_path ('source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]') and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=1.6 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=160 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove <VAD_riva_file>:<key> from source_path and remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
endpointing.stop_history=800 \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-indic-asr-streaming-throughput-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
language_code=\'bn-IN,en-US,hi-IN,ta-IN,indic\' \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=0.96 \
left_padding_size=4.32 \
right_padding_size=4.32 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=64 \
nn.opt_batch_size=64 \
endpointing_type=niva \
endpointing.stop_history=800 \
endpointing.stop_th=1.0 \
endpointing.residue_blanks_at_end=0 \
nemo_decoder.use_stateful_decoding=true \
unified_acoustic_model=true \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-LM, add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.
Speaker Diarization: To enable speaker diarization, add the Sortformer Riva file to source_path in the build command, 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]', and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=272 streaming_diarizer_nn.fifo_len=96.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true. Also remove <VAD_riva_file>:<key> from source_path.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
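Disabling VAD amounts to deleting a fixed set of overrides plus one source_path entry. The following is a minimal bash sketch of that filtering, assuming the command is kept as a bash array. The array names and the is_vad_arg helper are hypothetical, and the argument list is abbreviated. The script only prints the resulting command and does not run riva-build.

```bash
#!/usr/bin/env bash
# Sketch: derive the VAD-disabled variant of a riva-build command by
# filtering out the VAD overrides listed in the note. Prints the result only.
set -eo pipefail

# Overrides to remove when disabling VAD (copied from the note).
vad_args=(
  vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1
  neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2
  neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3
  neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3
  neural_vad.mask_features=false enable_vad_endpointing=true
)
vad_entry=',<VAD_riva_file>:<key>'

# Abbreviated argument list for the streaming Indic command; the remaining
# overrides must be appended before real use.
build_args=(
  --config-path=pkg://servicemaker.configs.asr --config-name=streaming
  'output_path=<rmir_filename>:<key>'
  'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]'
  "${vad_args[@]}"
  name=parakeet-rnnt-1.1b-indic-asr-streaming-throughput-silero-vad
  decoder=nemo
)

# Hypothetical helper: succeeds if $1 is one of the VAD overrides.
is_vad_arg() {
  local v
  for v in "${vad_args[@]}"; do
    [[ $1 == "$v" ]] && return 0
  done
  return 1
}

no_vad_args=()
for arg in "${build_args[@]}"; do
  is_vad_arg "$arg" && continue
  # Also drop the Silero VAD model from source_path.
  [[ $arg == source_path=* ]] && arg=${arg/"$vad_entry"/}
  no_vad_args+=("$arg")
done

printf '%s\n' riva-build "${no_vad_args[@]}"
```

Keeping the overrides in an array avoids hand-editing the long backslash-continued command. Note that endpointing.stop_history is not in the note's removal list, so the sketch leaves it alone.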
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' \
vad=enabled \
vad_type=silero \
neural_vad_nn.optimization_graph_level=-1 \
neural_vad.filter_speech_first=false \
neural_vad.min_duration_on=0.2 \
neural_vad.min_duration_off=0.5 \
neural_vad.onset=0.85 \
neural_vad.offset=0.3 \
neural_vad.pad_offset=0.08 \
neural_vad.pad_onset=0.3 \
neural_vad.mask_features=false \
enable_vad_endpointing=true \
profane_words_file=<txt_profane_words_file> \
name=parakeet-rnnt-1.1b-indic-asr-offline-silero-vad \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
language_code=\'bn-IN,en-US,hi-IN,ta-IN,indic\' \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=8.0 \
left_padding_size=0 \
right_padding_size=0 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
max_batch_size=128 \
nn.opt_batch_size=128 \
endpointing_type=niva \
endpointing.stop_history=0 \
nemo_decoder.use_stateful_decoding=false \
unified_acoustic_model=true \
decoder=nemo
Note
GPU-based Language Model: To deploy with a GPU-LM, add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.
Speaker Diarization: To enable speaker diarization, add the Sortformer Riva file to source_path in the build command, 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]', and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=800 streaming_diarizer_nn.spkcache_len=312 streaming_diarizer_nn.fifo_len=100.
Voice Activity Detection: VAD is already enabled in the build command above and is recommended for better accuracy. To disable it, remove the following parameters from the command: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true. Also remove <VAD_riva_file>:<key> from source_path.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
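The GPU-LM note adds two nemo_decoder.* overrides. The following is a minimal bash sketch that appends them to an abbreviated argument array for the offline Indic command. It assumes a hypothetical LM path and a hypothetical add_gpu_lm helper. As a design choice of this sketch (not something stated in the note), the helper refuses to add the overrides unless the command selects decoder=nemo, because they only configure that decoder. The script prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: append the GPU-LM overrides from the note to a riva-build
# argument array. Prints the resulting command instead of running it.
set -eo pipefail

# Hypothetical path to a GPU-LM .nemo file trained per the tutorials.
gpu_lm_file=/models/indic_gpu_lm.nemo

# Abbreviated argument list for the offline Indic command above.
build_args=(
  --config-path=pkg://servicemaker.configs.asr --config-name=offline
  'output_path=<rmir_filename>:<key>'
  'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]'
  name=parakeet-rnnt-1.1b-indic-asr-offline-silero-vad
  decoder=nemo
)

# Hypothetical helper: only adds the nemo_decoder.* LM overrides when the
# command actually selects the NeMo decoder.
add_gpu_lm() {
  local lm_file=$1 arg
  for arg in "${build_args[@]}"; do
    if [[ $arg == decoder=nemo ]]; then
      build_args+=(
        nemo_decoder.language_model_alpha=0.5
        "nemo_decoder.language_model_file=${lm_file}"
      )
      return 0
    fi
  done
  echo "decoder=nemo not found; GPU-LM overrides not added" >&2
  return 1
}

add_gpu_lm "$gpu_lm_file"
printf '%s\n' riva-build "${build_args[@]}"
```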
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=cache-aware-parakeet-rnnt-en-US-asr-streaming \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
endpointing.residue_blanks_at_start=-2 \
nemo_decoder.use_stateful_decoding=true \
endpointing_type=niva \
endpointing.stop_history=800 \
endpointing.residue_blanks_at_end=0 \
unified_acoustic_model=true \
feature_extractor_type=torch \
torch_feature_type=nemo \
featurizer.use_streaming_torch_fe=true \
nn.fp16_needs_obey_precision_pass=True \
nn.am_cache_len_input_use_int64=true \
att_context_size='[70,1]' \
max_batch_size=32 \
nn.max_batch_size=32 \
nn.opt_batch_size=32 \
decoder=nemo \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
GPU-based Language Model: To deploy with a GPU-LM, add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.
Speaker Diarization: To enable speaker diarization, add the Sortformer Riva file to source_path in the build command, 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]', and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
Voice Activity Detection: To enable VAD for improved noise robustness, add the Silero VAD Riva file to source_path in the build command, 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]', and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
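Enabling diarization on the cache-aware command takes two steps: extend source_path with the Sortformer model, then append the listed overrides. The following bash sketch applies both to an abbreviated argument array, using the values from the note. The array names are hypothetical, the placeholders must still be replaced, and the script prints the command instead of running riva-build.

```bash
#!/usr/bin/env bash
# Sketch: enable Sortformer speaker diarization on the cache-aware streaming
# command by extending source_path and appending the overrides from the note.
# Prints the resulting command instead of running it.
set -eo pipefail

sortformer_entry='<sortformer_riva_file>:<key>'

# Abbreviated argument list for the cache-aware command above.
build_args=(
  --config-path=pkg://servicemaker.configs.asr --config-name=streaming
  'output_path=<rmir_filename>:<key>'
  'source_path=[<riva_file>:<key>]'
  name=cache-aware-parakeet-rnnt-en-US-asr-streaming
  decoder=nemo
)

diarization_args=(
  sortformer=enabled
  diarizer_type=sortformer
  streaming_diarizer.center_chunk_size=0.64
  streaming_diarizer.right_context_size=0.64
  streaming_diarizer_nn.chunk_len=128
  streaming_diarizer_nn.spkcache_len=160
  streaming_diarizer_nn.fifo_len=80
)

for i in "${!build_args[@]}"; do
  arg=${build_args[i]}
  if [[ $arg == source_path=* ]]; then
    # Insert the Sortformer model just before the closing bracket.
    build_args[i]="${arg%"]"},${sortformer_entry}]"
  fi
done
build_args+=("${diarization_args[@]}")

printf '%s\n' riva-build "${build_args[@]}"
```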
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-tdt-0.6b-en-US-asr-offline \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=16 \
left_padding_size=0.0 \
right_padding_size=0.0 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
featurizer.right_pad_features=true \
max_batch_size=64 \
nn.opt_batch_size=64 \
unified_acoustic_model=true \
endpointing_type=niva \
endpointing.stop_history=0 \
nemo_decoder.use_stateful_decoding=False \
decoder=nemo \
language_code=en-US \
wfst_tokenizer_model=<far_tokenizer_file> \
wfst_verbalizer_model=<far_verbalizer_file> \
speech_hints_model=<far_speech_hints_file>
Note
GPU-based Language Model: Not supported.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.
Speaker Diarization: Not supported.
Voice Activity Detection: Not supported.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
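The nemo2riva form of source_path contains spaces and braces, so it must reach riva-build as a single argument. The command listings do this with single quotes. The following bash sketch uses double quotes instead so that a checkpoint path stored in a variable can still expand. The path is hypothetical and the argument list is abbreviated. The script only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: swap the .riva source for a .nemo checkpoint, as described in the
# nemo2riva note. The checkpoint path is hypothetical.
set -eo pipefail

nemo_ckpt=/models/parakeet-tdt-0.6b-en-US.nemo

# Double quotes let $nemo_ckpt expand while keeping the whole value, spaces
# included, in one argument; unquoted, the shell would split it at spaces.
source_arg="source_path=[{path: ${nemo_ckpt}, nemo2riva: {format:nemo}}]"

# Abbreviated argument list for the offline parakeet-tdt command above.
build_args=(
  --config-path=pkg://servicemaker.configs.asr --config-name=offline
  'output_path=<rmir_filename>:<key>'
  "$source_arg"
  name=parakeet-tdt-0.6b-en-US-asr-offline
  decoder=nemo
)

printf '%s\n' riva-build "${build_args[@]}"
```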
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=parakeet-tdt-0.6b-multi-asr-offline \
featurizer.use_utterance_norm_params=False \
featurizer.precalc_norm_time_steps=0 \
featurizer.precalc_norm_params=False \
ms_per_timestep=80 \
language_code=\'bg-BG,hr-HR,cs-CZ,da-DK,nl-NL,en-GB,et-EE,fi-FI,fr-FR,de-DE,el-GR,hu-HU,it-IT,lv-LV,lt-LT,mt-MT,pl-PL,pt-PT,ro-RO,sk-SK,sl-SI,es-ES,sv-SE,ru-RU,uk-UA,multi\' \
nn.fp16_needs_obey_precision_pass=True \
chunk_size=16 \
left_padding_size=0.0 \
right_padding_size=0.0 \
featurizer.max_batch_size=256 \
featurizer.max_execution_batch_size=256 \
featurizer.right_pad_features=true \
max_batch_size=64 \
nn.opt_batch_size=64 \
unified_acoustic_model=true \
endpointing_type=niva \
endpointing.stop_history=0 \
nemo_decoder.use_stateful_decoding=False \
nn.use_trt_bf16=True \
nn.bf16_needs_obey_precision_pass=True \
decoder=nemo
Note
GPU-based Language Model: Not supported.
nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace source_path=[<riva_file>:<key>] with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.
Speaker Diarization: Not supported.
Voice Activity Detection: Not supported.
Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
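The language_code values in these commands are wrapped in backslash-escaped quotes (\'...\'). The shell removes the backslashes, so riva-build receives the override with literal single quotes around the comma-separated list. Presumably this is so the list is parsed as one string value; that reason is an assumption, not stated in the text above. The following bash sketch uses a shortened, hypothetical language list to show the exact argument the shell passes. It also shows that a double-quoted spelling produces the same argument.

```bash
#!/usr/bin/env bash
# Sketch: show what the shell passes for a backslash-escaped language_code
# override. Outside quotes, \' becomes a literal single quote.
set -eo pipefail

escaped=language_code=\'bg-BG,hr-HR,multi\'
# Equivalent spelling: double quotes around the whole override.
double_quoted="language_code='bg-BG,hr-HR,multi'"

printf '%s\n' "$escaped"        # prints language_code='bg-BG,hr-HR,multi'
[[ $escaped == "$double_quoted" ]] && echo "same argument"
```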
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<txt_profane_words_file> \
name=whisper-large-v3-multi-asr-offline \
unified_acoustic_model=true \
chunk_size=30 \
left_padding_size=0 \
right_padding_size=0 \
decoder=trtllm \
feature_extractor_type=torch \
torch_feature_type=whisper \
featurizer.norm_per_feature=false \
max_batch_size=8 \
featurizer.precalc_norm_params=false \
featurizer.max_batch_size=8 \
featurizer.max_execution_batch_size=8 \
language_code=\'en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi\'
riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
output_path=<rmir_filename>:<key> \
'source_path=[<riva_file>:<key>]' \
profane_words_file=<profane_words_file> \
name=canary-1b-multi-asr-offline \
unified_acoustic_model=true \
use_cpp_postprocessing=False \
language_code=\'en-US,ar-AR,bg-BG,ca-ES,cs-CZ,da-DK,de-AT,de-CH,de-DE,el-GR,el-IL,et-EE,en-AM,en-AU,en-CA,en-EU,en-GB,en-IN,en-ME,en-MY,en-PH,en-SA,en-SG,en-UA,en-ZA,es-AR,es-CL,es-ES,es-LA,es-PY,es-UY,es-US,es-MX,fi-FI,fr-BE,fr-CA,fr-CH,fr-FR,he-IL,hi-IN,hu-HU,hr-HR,id-ID,it-IT,it-CH,lt-LT,lv-LV,ja-JP,km-KH,ko-KR,my-MM,nb-NO,nn-NO,nl-NL,nl-BE,nn-NB,pl-PL,pt-BR,pt-PT,ro-RO,ru-AM,ru-RU,ru-UA,sk-SK,sl-SI,sv-SE,th-TH,tr-TR,uk-UA,vi-VN,zh-CN,zh-TW\' \
chunk_size=30 \
left_padding_size=0 \
right_padding_size=0 \
feature_extractor_type=torch \
torch_feature_type=nemo \
max_batch_size=8 \
featurizer.use_utterance_norm_params=false \
featurizer.precalc_norm_params=false \
featurizer.max_batch_size=128 \
featurizer.max_execution_batch_size=128 \
ms_per_timestep=80 \
share_flags=true \
featurizer.norm_per_feature=false \
decoder=trtllm \
trtllm_decoder.max_output_len=200 \
trtllm_decoder.decoupled_mode=true