
Copyright © 2026, NVIDIA Corporation.


NeMo ASR Models


Use NeMo Framework's automatic speech recognition (ASR) models for transcription in your audio curation pipelines. This guide covers model selection, basic usage, model caching, and resource configuration.

Model Selection

NeMo Framework provides pre-trained ASR models through the Hugging Face model hub. For the complete list of available models and their specifications, refer to the NeMo Framework ASR documentation.

Example Model Usage

# Example using a test-verified model
example_model = "nvidia/parakeet-tdt-0.6b-v2"

# For production use, select appropriate models from:
# https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html

Basic Usage

Simple ASR Inference

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Create ASR inference stage with a model from NeMo Framework
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",  # Select from NeMo Framework docs
    filepath_key="audio_filepath",
    pred_text_key="pred_text"
)

# Configure for GPU processing
asr_stage = asr_stage.with_(
    resources=Resources(gpus=1.0),
    batch_size=16
)
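
To make the roles of `filepath_key`, `pred_text_key`, and `batch_size` concrete, the following is a minimal standalone sketch, not the library's implementation. It assumes each input record is a dictionary holding an audio path, and that a stage reads the path under `filepath_key`, transcribes records in groups of `batch_size`, and writes the result under `pred_text_key`. The names `batched`, `annotate`, and `fake_transcribe` are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterator


def batched(items: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def annotate(
    records: list[dict],
    transcribe: Callable[[list[str]], list[str]],
    filepath_key: str = "audio_filepath",
    pred_text_key: str = "pred_text",
    batch_size: int = 16,
) -> list[dict]:
    """Return copies of the records with a transcription added to each."""
    annotated = []
    for batch in batched(records, batch_size):
        paths = [record[filepath_key] for record in batch]
        texts = transcribe(paths)  # one call per batch
        for record, text in zip(batch, texts):
            annotated.append({**record, pred_text_key: text})
    return annotated


def fake_transcribe(paths: list[str]) -> list[str]:
    # Hypothetical stand-in for a real ASR model call.
    return [f"transcript of {path}" for path in paths]


records = [{"audio_filepath": f"clip_{i}.wav"} for i in range(5)]
annotated = annotate(records, fake_transcribe, batch_size=2)
print(annotated[0])
```

With five records and `batch_size=2`, the stand-in model is called three times (batches of 2, 2, and 1), and every output record keeps its original fields alongside the new `pred_text` field.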

Custom Configuration

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Example with custom field names
custom_asr = InferenceAsrNemoStage(
    model_name="your_chosen_model_name",
    filepath_key="custom_audio_path",
    pred_text_key="transcription"
).with_(
    batch_size=32,
    resources=Resources(cpus=4.0, gpus=1.0)
)

Model Caching

Models are automatically downloaded and cached when first loaded:

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage

# Models are cached automatically on first use
asr_stage = InferenceAsrNemoStage(model_name="your_chosen_model_name")

# The setup() method handles model downloading and caching
asr_stage.setup()
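
The split between constructing a stage and calling `setup()` follows a common load-once pattern. The sketch below illustrates that general pattern only; it is not how `InferenceAsrNemoStage` is implemented. `LazyModelStage`, `fake_loader`, and `load_calls` are hypothetical names, and the loader stands in for whatever downloads or reads the model.

```python
from typing import Callable, Optional

load_calls: list[str] = []  # records every time the loader actually runs


def fake_loader(model_name: str) -> Callable[[str], str]:
    # Hypothetical stand-in for an expensive download or disk load.
    load_calls.append(model_name)
    return lambda path: f"[{model_name}] transcript of {path}"


class LazyModelStage:
    """Illustrative stage: construction is cheap, setup() loads the model once."""

    def __init__(self, model_name: str, loader: Callable[[str], Callable[[str], str]]):
        self.model_name = model_name
        self._loader = loader
        self._model: Optional[Callable[[str], str]] = None

    def setup(self) -> None:
        # Load only if nothing is held yet, so repeated calls are harmless.
        if self._model is None:
            self._model = self._loader(self.model_name)

    def process(self, path: str) -> str:
        if self._model is None:
            raise RuntimeError("call setup() before process()")
        return self._model(path)


stage = LazyModelStage("your_chosen_model_name", fake_loader)
stage.setup()
stage.setup()  # second call reuses the model already held in memory
print(stage.process("clip_0.wav"))
print(len(load_calls))
```

Keeping the expensive step out of the constructor means a stage object can be created and configured cheaply, with the model loaded only where it is actually needed.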

Resource Configuration

Configure GPU and CPU resources based on your hardware:

from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage
from nemo_curator.stages.resources import Resources

# Single GPU configuration
asr_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=4.0,
        gpu_memory_gb=8.0  # Adjust based on your model's requirements
    ),
    batch_size=16
)

# Multi-GPU configuration
multi_gpu_stage = InferenceAsrNemoStage(
    model_name="your_chosen_model_name"
).with_(
    resources=Resources(
        cpus=8.0,
        gpus=2.0  # Use 2 GPUs
    ),
    batch_size=32
)

Resource requirements vary by model. Test with your specific model to determine optimal settings.
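
One way to run such a test is a small batch-size sweep: time the same set of files at several candidate batch sizes and compare throughput. The sketch below is a hedged illustration using only the standard library. `measure_throughput`, `sweep`, and `fake_run_batch` are hypothetical helpers, and `fake_run_batch` should be replaced with a call into your own inference code.

```python
import time
from typing import Callable


def measure_throughput(
    run_batch: Callable[[list[str]], None],
    paths: list[str],
    batch_size: int,
) -> float:
    """Return files processed per second for one batch size."""
    start = time.perf_counter()
    for i in range(0, len(paths), batch_size):
        run_batch(paths[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(paths) / elapsed if elapsed > 0 else float("inf")


def sweep(
    run_batch: Callable[[list[str]], None],
    paths: list[str],
    candidates: list[int],
) -> dict[int, float]:
    """Measure throughput for every candidate batch size."""
    return {bs: measure_throughput(run_batch, paths, bs) for bs in candidates}


def fake_run_batch(batch: list[str]) -> None:
    # Hypothetical stand-in: a fixed per-batch overhead instead of real inference.
    time.sleep(0.002)


paths = [f"clip_{i}.wav" for i in range(32)]
results = sweep(fake_run_batch, paths, candidates=[4, 8, 16])
best = max(results, key=results.get)
for batch_size, files_per_second in results.items():
    print(f"batch_size={batch_size}: {files_per_second:.1f} files/s")
print(f"highest throughput at batch_size={best}")
```

The sweep measures throughput only. With a real model, larger batches also use more GPU memory, so the largest candidate that fits in memory is not always the fastest, and a candidate that runs out of memory needs to be dropped from the list.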