ASR Inference
Perform automatic speech recognition (ASR) on audio files using NeMo Framework models. The ASR inference stage transcribes audio into text, enabling downstream quality assessment and text processing workflows.
How it Works
The InferenceAsrNemoStage processes AudioBatch objects in five steps:
- Input Validation: Verifies required attributes and data structure
- Model Loading: Downloads and initializes NeMo ASR models on GPU or CPU
- Batch Processing: Groups audio files for efficient inference
- Transcription: Generates text predictions for each audio file
- Output Creation: Returns an AudioBatch with the original data plus predicted transcriptions
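The five steps above can be sketched in plain Python. This is an illustrative stand-in, not the InferenceAsrNemoStage implementation: `AudioBatchSketch`, `AsrStageSketch`, `fake_transcribe`, and the `audio_filepath` / `pred_text` key names are assumptions made for the example, and the model is replaced by a stub so the snippet runs without NeMo installed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AudioBatchSketch:
    """Hypothetical stand-in for AudioBatch: one dict per audio file."""
    data: list = field(default_factory=list)


class AsrStageSketch:
    """Illustrative stage that mirrors the five steps described above."""

    def __init__(self, transcribe_fn: Callable, filepath_key: str = "audio_filepath",
                 pred_text_key: str = "pred_text"):
        self.transcribe_fn = transcribe_fn
        self.filepath_key = filepath_key    # assumed key name
        self.pred_text_key = pred_text_key  # assumed key name
        self.model = None

    def setup(self) -> None:
        # Model loading: a real stage would download and initialize a NeMo model here.
        self.model = self.transcribe_fn

    def validate_input(self, batch: AudioBatchSketch) -> None:
        # Input validation: every record needs a file path.
        missing = [i for i, row in enumerate(batch.data) if self.filepath_key not in row]
        if missing:
            raise ValueError(f"records missing '{self.filepath_key}': {missing}")

    def process(self, batch: AudioBatchSketch) -> AudioBatchSketch:
        self.validate_input(batch)
        # Batch processing: collect all paths so they are transcribed together.
        paths = [row[self.filepath_key] for row in batch.data]
        # Transcription: one text prediction per path.
        texts = self.model(paths)
        # Output creation: original fields plus the predicted transcription.
        rows = [{**row, self.pred_text_key: text} for row, text in zip(batch.data, texts)]
        return AudioBatchSketch(data=rows)


def fake_transcribe(paths):
    """Stub model used only for this example."""
    return [f"transcript of {p}" for p in paths]


stage = AsrStageSketch(fake_transcribe)
stage.setup()
result = stage.process(AudioBatchSketch([{"audio_filepath": "a.wav"},
                                         {"audio_filepath": "b.wav"}]))
```

Each output record keeps its original fields and gains one extra key holding the transcription, which is what downstream stages read.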
Basic Usage
Simple ASR Inference
Multilingual ASR
Configuration Options
Model Selection
NeMo Framework provides ready-to-use ASR models for several languages and domains:
Resource Configuration
Batch Processing
batch_size controls how many tasks the executor groups per call. The ASR stage does not define process_batch(), so grouping across tasks is handled by the executor rather than by the stage.
Within a single AudioBatch, process() transcribes the file paths together.
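The split between executor-level grouping and per-task processing can be illustrated with a small sketch. `group_tasks`, `process`, and `fake_transcribe` are hypothetical helpers written for this example; they only mimic the behavior described above and are not part of the library.

```python
from typing import Callable, Iterator, List, Sequence


def group_tasks(tasks: Sequence[List[str]], batch_size: int) -> Iterator[List[List[str]]]:
    """Mimic an executor that hands over up to batch_size tasks per call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tasks), batch_size):
        yield list(tasks[start:start + batch_size])


def process(task: List[str], transcribe_fn: Callable) -> List[str]:
    """Mimic a stage that transcribes all file paths of one task together."""
    return transcribe_fn(task)


calls = []  # records the paths passed to each transcription call


def fake_transcribe(paths: List[str]) -> List[str]:
    calls.append(list(paths))
    return [p.upper() for p in paths]


# Three tasks, each holding the file paths of one batch of audio.
tasks = [["a.wav", "b.wav"], ["c.wav"], ["d.wav", "e.wav", "f.wav"]]
groups = list(group_tasks(tasks, batch_size=2))

for group in groups:
    for task in group:
        process(task, fake_transcribe)
```

With `batch_size=2` the executor makes two calls (two tasks, then one), while the transcription function is still invoked once per task with that task's full list of paths.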
Input Requirements
AudioBatch Format
Data loading stages create input AudioBatch objects that must contain:
Audio File Requirements
- Supported Formats: Determined by the selected NeMo ASR model; refer to the NeMo ASR documentation.
- Sample Rates: Typically 16 kHz; refer to the model card for details.
- Channels: Mono or stereo; channel handling (for example, down-mixing) depends on the model.
- Duration: Long files can require manual chunking before inference.
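Before running inference, it can help to check these properties yourself. The sketch below uses only the Python standard library, so it handles WAV files only; `inspect_wav` and `chunk_spans` are hypothetical helpers, and the 30-second chunk length is an arbitrary value chosen for illustration, not a limit taken from any model card.

```python
import os
import tempfile
import wave


def inspect_wav(path: str) -> dict:
    """Read sample rate, channel count, and duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {"sample_rate": rate, "channels": channels, "duration_s": frames / rate}


def chunk_spans(duration_s: float, chunk_s: float) -> list:
    """Return (start, end) spans in seconds that cover a long file."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans


# Demo: write one second of 16 kHz mono silence, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)       # mono
        out.setsampwidth(2)       # 16-bit samples
        out.setframerate(16000)   # 16 kHz
        out.writeframes(b"\x00\x00" * 16000)
    info = inspect_wav(demo_path)

# Spans for a 70-second file split into 30-second chunks (assumed chunk length).
spans = chunk_spans(70.0, 30.0)
```

The spans only describe where to cut; writing the chunk files and stitching the per-chunk transcriptions back together is left to the caller.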
Output Structure
The ASR stage adds predicted transcriptions to each audio sample:
Error Handling
Model Loading Errors
Processing Errors
Processing behavior:
- Input structure validation: The stage uses validate_input() to check required attributes and columns, and raises ValueError if they are missing.
- Model loading failures: setup() raises RuntimeError if model download or initialization fails.
- No automatic retries or auto-tuning: The stage does not perform automatic batch size reduction or network retries.
- Missing files: AudioBatch.validate() can log file-existence warnings when code creates tasks; the stage does not auto-skip files.
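Because the stage neither skips missing files nor retries with a smaller batch, callers who want that behavior have to add it themselves. The following is a minimal caller-side sketch, assuming the failure surfaces as a RuntimeError; `existing_only`, `transcribe_with_backoff`, and `fake_transcribe` are hypothetical helpers, not library functions.

```python
import os
from typing import Callable, List, Tuple


def existing_only(paths: List[str]) -> Tuple[List[str], List[str]]:
    """Split paths into those present on disk and those missing."""
    kept, missing = [], []
    for path in paths:
        (kept if os.path.exists(path) else missing).append(path)
    return kept, missing


def transcribe_with_backoff(paths: List[str], transcribe_fn: Callable,
                            batch_size: int, min_batch_size: int = 1) -> List[str]:
    """Halve the batch size on RuntimeError; re-raise once it cannot shrink further."""
    results = []
    index = 0
    while index < len(paths):
        chunk = paths[index:index + batch_size]
        try:
            results.extend(transcribe_fn(chunk))
            index += len(chunk)
        except RuntimeError:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
    return results


def fake_transcribe(paths: List[str]) -> List[str]:
    """Stub model that fails on more than two files, to exercise the backoff."""
    if len(paths) > 2:
        raise RuntimeError("simulated out-of-memory error")
    return [f"text:{p}" for p in paths]


paths = [f"f{i}.wav" for i in range(5)]
results = transcribe_with_backoff(paths, fake_transcribe, batch_size=4)
kept, missing = existing_only(["definitely-missing-file.wav"])
```

In the demo the first call with four files fails, the batch size drops to two, and the remaining calls succeed in order. Filtering with `existing_only` before building tasks avoids sending paths that would fail at transcription time.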