
NeMo ASR Models


Use NeMo Framework’s automatic speech recognition (ASR) models to transcribe audio in your curation pipelines. This guide covers model selection, basic usage, and resource configuration.

Model Selection

NeMo Framework provides pre-trained ASR models through the Hugging Face model hub. For the complete list of available models and their specifications, refer to the NeMo Framework ASR documentation.

Example Model Usage

```python
# Example using a test-verified model
example_model = "nvidia/parakeet-tdt-0.6b-v2"

# For production use, select appropriate models from:
# https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html
```

Basic Usage

Simple ASR Inference

```python
from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Create ASR inference stage with a model from NeMo Framework
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",  # Select from NeMo Framework docs
    filepath_key="audio_filepath",
    pred_text_key="pred_text",
)

# Configure for GPU processing
asr_stage = asr_stage.with_(
    resources=Resources(gpus=1.0),
    batch_size=16,
)
```
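The `filepath_key` above names a field in the input records. NeMo-style audio manifests are JSON Lines files with one utterance per line; a minimal sketch of writing one (the paths and durations are placeholders, not real data):

```python
import json
from pathlib import Path

# Hypothetical utterances; replace with real audio paths and durations.
records = [
    {"audio_filepath": "/data/audio/utt1.wav", "duration": 3.2},
    {"audio_filepath": "/data/audio/utt2.wav", "duration": 5.7},
]

# Write a JSON Lines manifest: one JSON object per line.
manifest = Path("manifest.jsonl")
with manifest.open("w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

The `audio_filepath` field matches the stage's default `filepath_key`; if you rename the field, pass the same name to `filepath_key`.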

Custom Configuration

```python
# Example with custom field names
custom_asr = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",
    filepath_key="custom_audio_path",
    pred_text_key="transcription",
).with_(
    batch_size=32,
    resources=Resources(cpus=4.0, gpus=1.0),
)
```

Model Caching

Models are automatically downloaded and cached when first loaded:

```python
# Models are cached automatically on first use
asr_stage = InferenceAsrNemoStage(model_name="your_chosen_model_name")

# The setup() method handles model downloading and caching
asr_stage.setup()
```
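You can inspect the cache on disk to confirm what has been downloaded. The exact location depends on the NeMo version and how the checkpoint was fetched; the candidate directories below are common defaults and are assumptions about your environment, not guaranteed by the API:

```python
from pathlib import Path

# Candidate cache locations (assumptions; verify for your setup).
candidates = [
    Path.home() / ".cache" / "huggingface",   # Hugging Face Hub downloads
    Path.home() / ".cache" / "torch" / "NeMo",  # NeMo checkpoint cache
]

# List top-level entries of whichever cache directories exist.
for cache_dir in candidates:
    if cache_dir.is_dir():
        for entry in sorted(cache_dir.iterdir()):
            print(entry.name)
```

Clearing a cached model is then just a matter of deleting its directory and re-running `setup()`.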

Resource Configuration

Configure GPU and CPU resources based on your hardware:

```python
from nemo_curator.stages.resources import Resources

# Single GPU configuration
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=4.0,
        gpu_memory_gb=8.0,  # Adjust based on your model's requirements
    ),
    batch_size=16,
)

# Multi-GPU configuration
multi_gpu_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=8.0,
        gpus=2.0,  # Use 2 GPUs
    ),
    batch_size=32,
)
```

Resource requirements vary by model. Test with your specific model to determine optimal settings.
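One practical way to pick a starting batch size is to measure the model's resident memory and the activation cost per batched sample, then derive the batch size while leaving headroom. The sketch below illustrates the arithmetic only; every number in it is a placeholder you must replace with measurements for your model:

```python
def max_batch_size(gpu_memory_gb: float, model_footprint_gb: float,
                   per_sample_gb: float, headroom: float = 0.2) -> int:
    """Estimate the largest batch that fits in GPU memory.

    All figures are assumptions to be measured for your model:
    model_footprint_gb is the resident model size, per_sample_gb the
    activation cost per batched sample, headroom the fraction kept free.
    """
    usable = gpu_memory_gb * (1.0 - headroom) - model_footprint_gb
    return max(1, int(usable // per_sample_gb))

# Made-up numbers: 8 GB GPU, 2.5 GB model, 0.2 GB per sample.
print(max_batch_size(8.0, 2.5, 0.2))
```

Start below the estimate, confirm with your actual model and audio durations (long utterances cost more activation memory), and increase until throughput plateaus.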