Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
NVIDIA NeMo Framework Developer Docs#
NVIDIA NeMo Framework is an end-to-end, cloud-native framework designed to build, customize, and deploy generative AI models anywhere.
NVIDIA NeMo Framework supports large-scale training features, including:
Mixed Precision Training
Parallelism
Distributed Optimizer
Fully Sharded Data Parallel (FSDP)
Flash Attention
Activation Recomputation
Positional Embeddings and Positional Interpolation
Post-Training Quantization (PTQ) and Quantization Aware Training (QAT) with TensorRT Model Optimizer
Knowledge Distillation-based training with TensorRT Model Optimizer
Sequence Packing
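One feature from the list above, Sequence Packing, can be illustrated with a minimal sketch: pack several variable-length sequences into fixed-capacity bins so that fewer tokens are wasted on padding. This is a conceptual illustration in plain Python (a greedy first-fit-decreasing packer), not NeMo's actual implementation.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing: group variable-length
    sequences into bins of capacity max_len to reduce padding waste.
    Conceptual illustration only; NeMo's sequence packing differs.

    Returns a list of bins, each a list of sequence indices."""
    bins = []  # each bin: [remaining_capacity, [sequence indices]]
    # Place longest sequences first so large items claim fresh bins early.
    for idx, n in sorted(enumerate(lengths), key=lambda p: -p[1]):
        for b in bins:
            if b[0] >= n:          # fits in an existing bin
                b[0] -= n
                b[1].append(idx)
                break
        else:                      # no bin had room; open a new one
            bins.append([max_len - n, [idx]])
    return [b[1] for b in bins]

# Five sequences totaling 980 tokens fit in two 512-token bins,
# instead of five rows each padded to the longest length (400).
packs = pack_sequences([120, 400, 60, 300, 100], max_len=512)
print(packs)  # → [[1, 4], [3, 0, 2]]
```

Packing like this keeps GPU batches dense, which is why it pairs well with the other throughput features listed above.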
NVIDIA NeMo Framework provides separate collections for different generative AI domains. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can be customized, extended, and composed to create new generative AI model architectures.
For quick guides and tutorials, see the “Getting started” section below.
For more information, browse the developer docs for your area of interest in the contents section below or on the left sidebar.