nemo_curator.stages.audio.inference.asr_nemo
Module Contents
Classes
API
Dataclass
Bases: ProcessingStage[AudioTask, AudioTask]
Speech recognition inference using a NeMo ASR model.
Overrides process_batch for batched GPU inference.
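As a rough illustration of why a batched override helps, the sketch below gathers all audio paths from a batch and makes a single model call instead of one call per entry. It is not the stage's actual implementation: `FakeAsrModel`, `SketchAsrStage`, the plain-dict entries, and the default key values `"audio_filepath"` and `"pred_text"` are all assumptions made for this example, and no NeMo code is used.

```python
from dataclasses import dataclass, field
from typing import Any


class FakeAsrModel:
    """Hypothetical stand-in for a loaded ASR model (not the NeMo API)."""

    def __init__(self) -> None:
        self.calls = 0

    def transcribe(self, paths: list[str]) -> list[str]:
        # One call handles the whole batch, which is the point of batching.
        self.calls += 1
        return [f"transcript of {p}" for p in paths]


@dataclass
class SketchAsrStage:
    """Illustrative stage; field names mirror the documented parameters."""

    filepath_key: str = "audio_filepath"  # assumed default, for illustration
    pred_text_key: str = "pred_text"  # assumed default, for illustration
    asr_model: Any = field(default_factory=FakeAsrModel)

    def process_batch(self, entries: list[dict]) -> list[dict]:
        # Gather every audio path, run a single model call, then write
        # each prediction onto a copy of its entry.
        paths = [entry[self.filepath_key] for entry in entries]
        texts = self.asr_model.transcribe(paths)
        return [
            {**entry, self.pred_text_key: text}
            for entry, text in zip(entries, texts)
        ]


stage = SketchAsrStage()
batch = [{"audio_filepath": "a.wav"}, {"audio_filepath": "b.wav"}]
results = stage.process_batch(batch)
```

In this sketch each returned entry keeps its original fields and gains one extra key, named by `pred_text_key`, holding the transcription.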
Parameters:
model_name
Name of the pretrained NeMo ASR model. See the full list at https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html
cache_dir
Optional directory for model download cache. When set, NeMo stores/loads the pretrained checkpoint here instead of the default cache location.
filepath_key
Key in the entry dict whose value is the path to the audio file.
pred_text_key
Key in the entry dict under which the predicted transcription is stored.
asr_model
batch_size
cache_dir
filepath_key
model_name
name
pred_text_key
resources
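The `batch_size` attribute is listed above without a description. Assuming it caps how many entries are grouped into one inference call, the grouping could look like the sketch below. The helper `iter_batches` and the `"audio_filepath"` key are hypothetical names used only for illustration; they are not part of this module.

```python
from typing import Iterator


def iter_batches(entries: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(entries), batch_size):
        yield entries[start : start + batch_size]


# Five hypothetical entries grouped two at a time; the last group is smaller.
entries = [{"audio_filepath": f"clip_{i}.wav"} for i in range(5)]
batches = list(iter_batches(entries, batch_size=2))
batch_lengths = [len(b) for b in batches]
```

In general, a larger batch means fewer model calls at the cost of more GPU memory per call, so the right value depends on the model and the hardware.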