
Parakeet-0.6B-CTC

Parakeet-1.1B-CTC

English (US)

Streaming, low-latency

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   profane_words_file=<txt_profane_words_file> \
   name=parakeet-1.1b-en-US-asr-streaming \
   featurizer.use_utterance_norm_params=False \
   featurizer.precalc_norm_time_steps=0 \
   featurizer.precalc_norm_params=False \
   ms_per_timestep=80 \
   endpointing.residue_blanks_at_start=-2 \
   nn.fp16_needs_obey_precision_pass=True \
   chunk_size=0.16 \
   left_padding_size=1.92 \
   right_padding_size=1.92 \
   decoder_chunk_size=0.96 \
   decoder=flashlight \
   flashlight_decoder.asr_model_delay=-1 \
   decoding_language_model_binary=<bin_file> \
   decoding_vocab=<txt_decoding_vocab_file> \
   flashlight_decoder.lm_weight=0.8 \
   flashlight_decoder.word_insertion_score=1.0 \
   flashlight_decoder.beam_size=32 \
   flashlight_decoder.beam_threshold=20. \
   flashlight_decoder.num_tokenization=1 \
   language_code=en-US \
   wfst_tokenizer_model=<far_tokenizer_file> \
   wfst_verbalizer_model=<far_verbalizer_file> \
   speech_hints_model=<far_speech_hints_file>

Note

Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder-related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.
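The greedy-mode substitution can be sketched as a small argument filter. This is a dry run that only prints the rewritten argument list and never calls riva-build; the helper name to_greedy is hypothetical. It drops only decoder=flashlight and the flashlight_decoder.* parameters. Whether decoding_language_model_binary and decoding_vocab should also be removed is not stated here, so the sketch leaves them untouched.

```shell
# Hypothetical helper: rewrite a riva-build argument list for greedy decoding.
# Prints one argument per line; it does not run riva-build.
to_greedy() {
    for arg in "$@"; do
        case "$arg" in
            # Drop the decoder selection and every flashlight_decoder.* setting.
            decoder=*|flashlight_decoder.*) ;;
            # Keep everything else (decoder_chunk_size=... is not matched above).
            *) printf '%s\n' "$arg" ;;
        esac
    done
    # Append the greedy decoder settings named in the note.
    printf '%s\n' 'decoder=greedy' 'greedy_decoder.asr_model_delay=-1'
}

to_greedy \
    chunk_size=0.16 \
    decoder_chunk_size=0.96 \
    decoder=flashlight \
    flashlight_decoder.asr_model_delay=-1 \
    flashlight_decoder.beam_size=32 \
    language_code=en-US
```

The printed list can be reviewed and then passed to riva-build in place of the original decoder arguments.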

nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace 'source_path=[<riva_file>:<key>]' with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.
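Because the nemo2riva form of source_path contains spaces and braces, it has to reach riva-build as a single shell word. A minimal sketch of building it from a variable follows; the checkpoint path is a hypothetical placeholder, and the snippet only prints the argument.

```shell
# Hypothetical checkpoint location, used only for illustration.
NEMO_CKPT="/models/parakeet-ctc.nemo"

# Keep the whole value inside one quoted string so the shell does not split it
# on the spaces or try to expand the braces.
SOURCE_ARG="source_path=[{path: ${NEMO_CKPT}, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]"

printf '%s\n' "$SOURCE_ARG"
```

When invoking the build, pass the variable quoted ("$SOURCE_ARG") in place of the 'source_path=[<riva_file>:<key>]' argument.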

FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.
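The compute capability requirement can be checked before adding nn.use_trt_fp8=True. The sketch below assumes a driver recent enough that nvidia-smi exposes the compute_cap query field; the helper name supports_fp8 is hypothetical, and only the first GPU is inspected.

```shell
# Hypothetical helper: succeed (exit 0) when the compute capability given as
# "major.minor" is 8.9 or higher.
supports_fp8() {
    awk -v cc="$1" 'BEGIN {
        split(cc, v, ".")
        major = v[1] + 0
        minor = v[2] + 0
        exit !((major > 8) || (major == 8 && minor >= 9))
    }'
}

# Query the first GPU only when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
    if supports_fp8 "$cc"; then
        echo "compute capability $cc: nn.use_trt_fp8=True can be added"
    else
        echo "compute capability $cc: FP8 is not supported on this GPU"
    fi
fi
```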

Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.
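The multi-file source_path used for diarization is a comma-separated list of <file>:<key> entries inside square brackets. A small sketch for assembling it follows; the helper name join_sources, the file names, and the key are hypothetical placeholders.

```shell
# Hypothetical helper: join "<file>:<key>" entries into one source_path argument.
join_sources() {
    joined=""
    for src in "$@"; do
        # Add a comma only between entries, never before the first one.
        joined="${joined:+${joined},}${src}"
    done
    printf 'source_path=[%s]\n' "$joined"
}

# Placeholder file names and key, for illustration only.
join_sources "asr.riva:mykey" "sortformer.riva:mykey"
```

The result is meant to be passed to riva-build as one quoted argument, alongside the sortformer and streaming_diarizer parameters listed in the note. The same shape applies when the second entry is the Silero VAD Riva file.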

Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.

Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command:

riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   name=riva-punctuation \
   language_code=<language_code>

Streaming, high-throughput

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   profane_words_file=<txt_profane_words_file> \
   name=parakeet-1.1b-en-US-asr-streaming-throughput \
   featurizer.use_utterance_norm_params=False \
   featurizer.precalc_norm_time_steps=0 \
   featurizer.precalc_norm_params=False \
   ms_per_timestep=80 \
   endpointing.residue_blanks_at_start=-2 \
   nn.fp16_needs_obey_precision_pass=True \
   chunk_size=0.96 \
   left_padding_size=1.92 \
   right_padding_size=1.92 \
   decoder=flashlight \
   flashlight_decoder.asr_model_delay=-1 \
   decoding_language_model_binary=<bin_file> \
   decoding_vocab=<txt_decoding_vocab_file> \
   flashlight_decoder.lm_weight=0.8 \
   flashlight_decoder.word_insertion_score=1.0 \
   flashlight_decoder.beam_size=32 \
   flashlight_decoder.beam_threshold=20. \
   flashlight_decoder.num_tokenization=1 \
   language_code=en-US \
   wfst_tokenizer_model=<far_tokenizer_file> \
   wfst_verbalizer_model=<far_verbalizer_file> \
   speech_hints_model=<far_speech_hints_file>

Note

Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder-related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.

nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace 'source_path=[<riva_file>:<key>]' with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.

FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.

Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.96 streaming_diarizer.right_context_size=0.32 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=256 streaming_diarizer_nn.fifo_len=112.

Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.

Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command:

riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   name=riva-punctuation \
   language_code=<language_code>
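The streaming configurations differ mainly in chunk_size (0.16 for low-latency, 0.96 for high-throughput), while ms_per_timestep stays at 80. Assuming chunk_size is expressed in seconds and that each chunk maps to chunk_size * 1000 / ms_per_timestep acoustic-model timesteps, the relationship can be sketched as follows; the helper name timesteps_per_chunk is hypothetical.

```shell
# Hypothetical helper: timesteps_per_chunk <chunk_size_seconds> <ms_per_timestep>
# Assumes chunk_size is in seconds; rounds to the nearest whole timestep.
timesteps_per_chunk() {
    awk -v s="$1" -v ms="$2" 'BEGIN { printf "%d\n", (s * 1000 / ms) + 0.5 }'
}

timesteps_per_chunk 0.16 80   # streaming, low-latency     -> 2
timesteps_per_chunk 0.96 80   # streaming, high-throughput -> 12
timesteps_per_chunk 4.8 80    # offline CTC configuration  -> 60
```

Under that assumption, a larger chunk processes more timesteps per inference call, which favors throughput, while a smaller chunk returns results sooner.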

Offline

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   profane_words_file=<txt_profane_words_file> \
   name=parakeet-1.1b-en-US-asr-offline \
   featurizer.use_utterance_norm_params=False \
   featurizer.precalc_norm_time_steps=0 \
   featurizer.precalc_norm_params=False \
   ms_per_timestep=80 \
   nn.fp16_needs_obey_precision_pass=True \
   chunk_size=4.8 \
   left_padding_size=1.6 \
   right_padding_size=1.6 \
   featurizer.max_batch_size=256 \
   featurizer.max_execution_batch_size=256 \
   decoder=flashlight \
   flashlight_decoder.asr_model_delay=-1 \
   decoding_language_model_binary=<bin_file> \
   decoding_vocab=<txt_decoding_vocab_file> \
   flashlight_decoder.lm_weight=0.8 \
   flashlight_decoder.word_insertion_score=1.0 \
   flashlight_decoder.beam_size=32 \
   flashlight_decoder.beam_threshold=20. \
   flashlight_decoder.num_tokenization=1 \
   language_code=en-US \
   wfst_tokenizer_model=<far_tokenizer_file> \
   wfst_verbalizer_model=<far_verbalizer_file> \
   speech_hints_model=<far_speech_hints_file>

Note

Greedy Mode: To deploy the model in greedy mode, replace decoder=flashlight and the flashlight_decoder-related parameters in the command above with decoder=greedy greedy_decoder.asr_model_delay=-1.

nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace 'source_path=[<riva_file>:<key>]' with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:onnx,onnx_opset:19,max_dim:1000}}]' in the command above.

FP8 Quantization: To deploy the model with FP8 precision, add nn.use_trt_fp8=True to the command above. FP8 is supported only on GPUs with compute capability 8.9 or higher.

Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=4.8 streaming_diarizer.right_context_size=0 streaming_diarizer_nn.chunk_len=480 streaming_diarizer_nn.spkcache_len=332 streaming_diarizer_nn.fifo_len=120.

Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.

Punctuation and Capitalization: PnC models require a separate RMIR file. Generate the RMIR file using the following command:

riva-build --config-path=pkg://servicemaker.configs.punctuation --config-name=base \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   name=riva-punctuation \
   language_code=<language_code>

Parakeet-1.1B-RNNT

Nemotron ASR Streaming

English (US)

Streaming, low-latency

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=streaming \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   profane_words_file=<txt_profane_words_file> \
   name=cache-aware-parakeet-rnnt-en-US-asr-streaming \
   featurizer.use_utterance_norm_params=False \
   featurizer.precalc_norm_time_steps=0 \
   featurizer.precalc_norm_params=False \
   ms_per_timestep=80 \
   endpointing.residue_blanks_at_start=-2 \
   nemo_decoder.use_stateful_decoding=true \
   endpointing_type=niva \
   endpointing.stop_history=800 \
   endpointing.residue_blanks_at_end=0 \
   unified_acoustic_model=true \
   feature_extractor_type=torch \
   torch_feature_type=nemo \
   featurizer.use_streaming_torch_fe=true \
   nn.fp16_needs_obey_precision_pass=True \
   nn.am_cache_len_input_use_int64=true \
   att_context_size='[70,1]' \
   max_batch_size=32 \
   nn.max_batch_size=32 \
   nn.opt_batch_size=32 \
   decoder=nemo \
   language_code=en-US \
   wfst_tokenizer_model=<far_tokenizer_file> \
   wfst_verbalizer_model=<far_verbalizer_file> \
   speech_hints_model=<far_speech_hints_file>

Note

GPU-based Language Model: To deploy with a GPU-LM, add the following parameters: nemo_decoder.language_model_alpha=0.5 nemo_decoder.language_model_file=<GPU_LM.nemo file>. For training instructions, see the nvidia-riva/tutorials repository.

nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace 'source_path=[<riva_file>:<key>]' with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.

Speaker Diarization: To enable speaker diarization, include the Sortformer Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<sortformer_riva_file>:<key>]' and add the following parameters: sortformer=enabled diarizer_type=sortformer streaming_diarizer.center_chunk_size=0.64 streaming_diarizer.right_context_size=0.64 streaming_diarizer_nn.chunk_len=128 streaming_diarizer_nn.spkcache_len=160 streaming_diarizer_nn.fifo_len=80.

Voice Activity Detection: To enable VAD for improved noise robustness, include the Silero VAD Riva file in the source_path in the build command: 'source_path=[<riva_file>:<key>,<VAD_riva_file>:<key>]' and add the following parameters: vad=enabled vad_type=silero neural_vad_nn.optimization_graph_level=-1 neural_vad.filter_speech_first=false neural_vad.min_duration_on=0.2 neural_vad.min_duration_off=0.5 neural_vad.onset=0.85 neural_vad.offset=0.3 neural_vad.pad_offset=0.08 neural_vad.pad_onset=0.3 neural_vad.mask_features=false enable_vad_endpointing=true.

Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.

Parakeet-0.6B-TDT

English (US)

Offline

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   profane_words_file=<txt_profane_words_file> \
   name=parakeet-tdt-0.6b-en-US-asr-offline \
   featurizer.use_utterance_norm_params=False \
   featurizer.precalc_norm_time_steps=0 \
   featurizer.precalc_norm_params=False \
   ms_per_timestep=80 \
   nn.fp16_needs_obey_precision_pass=True \
   chunk_size=16 \
   left_padding_size=0.0 \
   right_padding_size=0.0 \
   featurizer.max_batch_size=256 \
   featurizer.max_execution_batch_size=256 \
   featurizer.right_pad_features=true \
   max_batch_size=64 \
   nn.opt_batch_size=64 \
   unified_acoustic_model=true \
   endpointing_type=niva \
   endpointing.stop_history=0 \
   nemo_decoder.use_stateful_decoding=False \
   decoder=nemo \
   language_code=en-US \
   wfst_tokenizer_model=<far_tokenizer_file> \
   wfst_verbalizer_model=<far_verbalizer_file> \
   speech_hints_model=<far_speech_hints_file>

Note

GPU-based Language Model: Not supported.

nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace 'source_path=[<riva_file>:<key>]' with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.

Speaker Diarization: Not supported.

Voice Activity Detection: Not supported.

Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.

Multilingual

Offline

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
   output_path=<rmir_filename>:<key> \
   'source_path=[<riva_file>:<key>]' \
   profane_words_file=<txt_profane_words_file> \
   name=parakeet-tdt-0.6b-multi-asr-offline \
   featurizer.use_utterance_norm_params=False \
   featurizer.precalc_norm_time_steps=0 \
   featurizer.precalc_norm_params=False \
   ms_per_timestep=80 \
   language_code=\'bg-BG,hr-HR,cs-CZ,da-DK,nl-NL,en-GB,et-EE,fi-FI,fr-FR,de-DE,el-GR,hu-HU,it-IT,lv-LV,lt-LT,mt-MT,pl-PL,pt-PT,ro-RO,sk-SK,sl-SI,es-ES,sv-SE,ru-RU,uk-UA,multi\' \
   nn.fp16_needs_obey_precision_pass=True \
   chunk_size=16 \
   left_padding_size=0.0 \
   right_padding_size=0.0 \
   featurizer.max_batch_size=256 \
   featurizer.max_execution_batch_size=256 \
   featurizer.right_pad_features=true \
   max_batch_size=64 \
   nn.opt_batch_size=64 \
   unified_acoustic_model=true \
   endpointing_type=niva \
   endpointing.stop_history=0 \
   nemo_decoder.use_stateful_decoding=False \
   nn.use_trt_bf16=True \
   nn.bf16_needs_obey_precision_pass=True \
   decoder=nemo

Note

GPU-based Language Model: Not supported.

nemo2riva: To use a .nemo checkpoint instead of a .riva file, replace 'source_path=[<riva_file>:<key>]' with 'source_path=[{path: <path to .nemo checkpoint>, nemo2riva: {format:nemo}}]' in the command above.

Speaker Diarization: Not supported.

Voice Activity Detection: Not supported.

Punctuation and Capitalization: No separate model is required; the ASR model automatically generates punctuated and capitalized text.
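The multilingual build above registers a fixed, comma-separated language_code list. A client-side script might check a locale against that list before sending requests; the following is a minimal sketch, and the helper name lang_supported is hypothetical. The list is copied from the language_code argument of the build command.

```shell
# language_code list from the multilingual Parakeet-0.6B-TDT offline build above.
LANGS='bg-BG,hr-HR,cs-CZ,da-DK,nl-NL,en-GB,et-EE,fi-FI,fr-FR,de-DE,el-GR,hu-HU,it-IT,lv-LV,lt-LT,mt-MT,pl-PL,pt-PT,ro-RO,sk-SK,sl-SI,es-ES,sv-SE,ru-RU,uk-UA,multi'

# Hypothetical helper: succeed when code $1 is an exact entry of list $2.
lang_supported() {
    # Wrap both sides in commas so "en" cannot match inside "en-GB".
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

for code in de-DE en-GB en-US; do
    if lang_supported "$code" "$LANGS"; then
        echo "$code: listed"
    else
        echo "$code: not listed"
    fi
done
```

With this list, de-DE and en-GB are reported as listed and en-US is not, since the build command includes en-GB but not en-US.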

Whisper-Large

Multilingual

Offline

trtllm

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
  output_path=<rmir_filename>:<key> \
  'source_path=[<riva_file>:<key>]' \
  profane_words_file=<txt_profane_words_file> \
  name=whisper-large-v3-multi-asr-offline \
  unified_acoustic_model=true \
  chunk_size=30 \
  left_padding_size=0 \
  right_padding_size=0 \
  decoder=trtllm \
  feature_extractor_type=torch \
  torch_feature_type=whisper \
  featurizer.norm_per_feature=false \
  max_batch_size=8 \
  featurizer.precalc_norm_params=false \
  featurizer.max_batch_size=8 \
  featurizer.max_execution_batch_size=8 \
  language_code=\'en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi\'

Canary-1B

Multilingual

Offline

nemo

riva-build --config-path=pkg://servicemaker.configs.asr --config-name=offline \
  output_path=<rmir_filename>:<key> \
  'source_path=[<riva_file>:<key>]' \
  profane_words_file=<profane_words_file> \
  name=canary-1b-multi-asr-offline \
  unified_acoustic_model=true \
  use_cpp_postprocessing=False \
  language_code=\'en-US,ar-AR,bg-BG,ca-ES,cs-CZ,da-DK,de-AT,de-CH,de-DE,el-GR,el-IL,et-EE,en-AM,en-AU,en-CA,en-EU,en-GB,en-IN,en-ME,en-MY,en-PH,en-SA,en-SG,en-UA,en-ZA,es-AR,es-CL,es-ES,es-LA,es-PY,es-UY,es-US,es-MX,fi-FI,fr-BE,fr-CA,fr-CH,fr-FR,he-IL,hi-IN,hu-HU,hr-HR,id-ID,it-IT,it-CH,lt-LT,lv-LV,ja-JP,km-KH,ko-KR,my-MM,nb-NO,nn-NO,nl-NL,nl-BE,nn-NB,pl-PL,pt-BR,pt-PT,ro-RO,ru-AM,ru-RU,ru-UA,sk-SK,sl-SI,sv-SE,th-TH,tr-TR,uk-UA,vi-VN,zh-CN,zh-TW\' \
  chunk_size=30 \
  left_padding_size=0 \
  right_padding_size=0 \
  feature_extractor_type=torch \
  torch_feature_type=nemo \
  max_batch_size=8 \
  featurizer.use_utterance_norm_params=false \
  featurizer.precalc_norm_params=false \
  featurizer.max_batch_size=128 \
  featurizer.max_execution_batch_size=128 \
  ms_per_timestep=80 \
  share_flags=true \
  featurizer.norm_per_feature=false \
  decoder=trtllm \
  trtllm_decoder.max_output_len=200 \
  trtllm_decoder.decoupled_mode=true