NVIDIA NeMo Speech Developer Docs

NVIDIA NeMo Speech is an open-source toolkit for speech, audio, and multimodal language model research, with a clear path from experimentation to production deployment.

What is NeMo?

NVIDIA NeMo is an open-source toolkit for building, customizing, and deploying speech, audio, and multimodal language models. It provides:

  • Pretrained models — production-ready checkpoints on NGC and HuggingFace Hub

  • Modular architecture — neural modules you can mix, match, and extend

  • Scalable training — multi-GPU/multi-node via PyTorch Lightning with mixed-precision support

  • Simple configuration — YAML-based experiment configs with Hydra

Get started in 30 seconds:

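# In a shell (the quotes keep shells such as zsh from expanding the brackets):
pip install "nemo_toolkit[asr,tts]"

# Then, in Python:
import nemo.collections.asr as nemo_asr

# Download the pretrained checkpoint and load it
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")

# Transcribe one file and print the recognized text
print(model.transcribe(["audio.wav"])[0].text)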
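
The quick-start transcribes a single file. For several files, a small wrapper can keep each transcript paired with its source path. The following is a minimal sketch, not part of NeMo: the helper name transcribe_files is hypothetical, and it assumes only what the snippet above shows, namely that model.transcribe accepts a list of paths and returns one result per input with a .text attribute.

```python
from pathlib import Path


def transcribe_files(model, paths):
    """Transcribe audio files and return a {path: text} mapping.

    Hypothetical helper (not a NeMo API). `model` is any object whose
    transcribe(list_of_paths) returns one result per input, each exposing
    a `.text` attribute, as assumed from the quick-start snippet.
    """
    paths = [str(p) for p in paths]

    # Fail early with a clear message instead of an error from inside the model.
    missing = [p for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Audio file(s) not found: {missing}")

    results = model.transcribe(paths)

    # Guard the pairing below: zip() would silently drop unmatched items.
    if len(results) != len(paths):
        raise ValueError("Expected one transcription result per input file")

    return {path: result.text for path, result in zip(paths, results)}
```

With the model loaded as in the quick-start, transcribe_files(model, ["audio.wav"]) would return a dictionary keyed by file path. Passing the model in as an argument also means the helper can be exercised with a stub object, without downloading a checkpoint.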

Model Checkpoints

APIs

Collections

Speech AI Tools