Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

NVIDIA NeMo Framework Developer Docs

NVIDIA NeMo Framework is an end-to-end, cloud-native framework designed to build, customize, and deploy generative AI models anywhere.

NVIDIA NeMo Framework supports large-scale training features, including:

  • Mixed Precision Training (see the sketch after this list)

  • Parallelism

  • Distributed Optimizer

  • Fully Sharded Data Parallel (FSDP)

  • Flash Attention

  • Activation Recomputation

  • Positional Embeddings and Positional Interpolation

  • Post-Training Quantization (PTQ) and Quantization Aware Training (QAT) with TensorRT Model Optimizer

  • Sequence Packing
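
The sketch below is a generic PyTorch illustration of the mixed-precision idea listed above, not NeMo's implementation: forward and backward compute run in bfloat16 under autocast while the optimizer updates float32 parameters. The model, tensor shapes, and learning rate are placeholder choices for illustration only.

    import torch
    from torch import nn

    # Toy model and optimizer; sizes and hyperparameters are placeholders.
    model = nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    inputs = torch.randn(32, 1024, device="cuda")
    targets = torch.randn(32, 1024, device="cuda")

    for step in range(10):
        optimizer.zero_grad()
        # Matmul-heavy ops run in bfloat16; autocast keeps precision-sensitive
        # ops in float32. bfloat16 does not require gradient scaling.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()   # gradients flow back into the float32 parameters
        optimizer.step()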

NVIDIA NeMo Framework has separate collections for Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Computer Vision (CV).

Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new generative AI model architectures.
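
As a minimal sketch of working with a collection's prebuilt modules, the snippet below loads a pretrained ASR module and runs inference. It assumes a NeMo 1.x-style collection API (nemo.collections.asr), that the checkpoint name "QuartzNet15x5Base-En" is downloadable in your environment, and that sample.wav is a local audio file; all three are placeholder assumptions, not a prescribed workflow.

    from nemo.collections.asr.models import EncDecCTCModel

    # Download a pretrained speech recognition module and transcribe a file.
    # The model name and audio path below are illustrative placeholders.
    asr_model = EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
    transcripts = asr_model.transcribe(["sample.wav"])
    print(transcripts)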

For quick guides and tutorials, see the “Getting started” section below.

For more information, browse the developer docs for your area of interest in the contents section below or in the left sidebar.

Model Checkpoints

APIs

Collections

Speech AI Tools