NeMo ASR Models

Use NeMo Framework’s automatic speech recognition models for transcription in your audio curation pipelines. This guide covers basic usage and configuration.

Model Selection

NeMo Framework provides pre-trained ASR models through the Hugging Face model hub. For the complete list of available models and their specifications, refer to the NeMo Framework ASR documentation.

Example Model Usage

# Example using a test-verified model
example_model = "nvidia/parakeet-tdt-0.6b-v2"

# For production use, select appropriate models from:
# https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html

Basic Usage

Simple ASR Inference

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Create ASR inference stage with a model from NeMo Framework
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",  # Select from NeMo Framework docs
    filepath_key="audio_filepath",
    pred_text_key="pred_text"
)

# Configure for GPU processing
asr_stage = asr_stage.with_(
    resources=Resources(gpus=1.0),
    batch_size=16
)
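Judging by the parameter names, the stage reads each audio path from the field named by filepath_key and stores the transcript in the field named by pred_text_key. The following is a minimal sketch of preparing matching input records, assuming they are plain dictionaries stored one per line in a JSONL manifest. The helpers build_entries and write_manifest and the clip paths are hypothetical and are not part of NeMo Curator.

```python
import json


def build_entries(audio_paths, filepath_key="audio_filepath"):
    """Return one record per audio file, keyed by the configured path field."""
    return [{filepath_key: str(p)} for p in audio_paths]


def write_manifest(entries, manifest_path):
    """Write records as JSON Lines (one JSON object per line)."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


# Hypothetical clip paths; the key must match the stage's filepath_key.
entries = build_entries(["clips/a.wav", "clips/b.wav"])
```

Pass a different filepath_key to build_entries when the stage is configured with custom field names.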

Custom Configuration

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Example with custom field names
custom_asr = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",
    filepath_key="custom_audio_path",
    pred_text_key="transcription"
).with_(
    batch_size=32,
    resources=Resources(cpus=4.0, gpus=1.0)
)

Model Caching

Models are automatically downloaded and cached when first loaded:

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage

# Models are cached automatically on first use
asr_stage = InferenceAsrNemoStage(model_name="your_chosen_model_name")

# The setup() method handles model downloading and caching
asr_stage.setup()
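If the model is fetched through the Hugging Face hub client (an assumption based on the model hub mentioned under Model Selection; checkpoints from other sources may be cached elsewhere), the download lands in the hub cache directory. The sketch below resolves that directory with the standard library only, following the hub client's HF_HUB_CACHE, HF_HOME, and XDG_CACHE_HOME environment variables. The function name expected_hf_cache_dir is hypothetical.

```python
import os
from pathlib import Path


def expected_hf_cache_dir(model_id, env=None):
    """Return the directory where the hub client would cache model_id."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # An explicit hub cache location takes precedence.
        hub_cache = Path(env["HF_HUB_CACHE"])
    else:
        if env.get("HF_HOME"):
            hf_home = Path(env["HF_HOME"])
        else:
            xdg = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
            hf_home = Path(xdg) / "huggingface"
        hub_cache = hf_home / "hub"
    # Cached repositories are stored as models--<org>--<name>.
    return hub_cache / ("models--" + model_id.replace("/", "--"))


cache_dir = expected_hf_cache_dir("nvidia/parakeet-tdt-0.6b-v2")
print(cache_dir, "(exists)" if cache_dir.exists() else "(not downloaded yet)")
```

Checking this path before a large run is one way to confirm that worker nodes will not each trigger a fresh download.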

Resource Configuration

Configure GPU and CPU resources based on your hardware:

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Single GPU configuration
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=4.0,
        gpu_memory_gb=8.0  # Adjust based on your model's requirements
    ),
    batch_size=16
)

# Multi-GPU configuration
multi_gpu_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=8.0,
        gpus=2.0  # Use 2 GPUs
    ),
    batch_size=32
)

Resource requirements vary by model. Test with your specific model to determine optimal settings.
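One way to run such a test is to time a transcription callable over a few candidate batch sizes and compare throughput. The sketch below is generic Python, not a NeMo Curator API: transcribe_batch stands in for whatever function runs your model on a list of file paths, sweep_batch_sizes is a hypothetical helper, and fake_transcribe is a placeholder workload.

```python
import time


def sweep_batch_sizes(transcribe_batch, files, batch_sizes):
    """Return {batch_size: files_per_second} for each candidate batch size."""
    results = {}
    for batch_size in batch_sizes:
        start = time.perf_counter()
        # Process the file list in consecutive chunks of batch_size.
        for i in range(0, len(files), batch_size):
            transcribe_batch(files[i:i + batch_size])
        elapsed = time.perf_counter() - start
        results[batch_size] = len(files) / elapsed if elapsed > 0 else float("inf")
    return results


def fake_transcribe(batch):
    # Placeholder workload; replace with a real call to your model.
    time.sleep(0.001)
    return ["" for _ in batch]


files = [f"clip_{i}.wav" for i in range(64)]
throughput = sweep_batch_sizes(fake_transcribe, files, [8, 16, 32])
best = max(throughput, key=throughput.get)
print(f"best batch size in this run: {best}")
```

With a real model, use a representative sample of your audio, since clip duration affects both memory use and throughput.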