You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
NVIDIA NeMo Framework User Guide

Table of Contents

NeMo Framework

  • Overview
  • Install NeMo Framework
  • Performance
  • Why NeMo Framework?

Getting Started

  • Quickstart with NeMo-Run
  • Quickstart with NeMo 2.0 API
  • Tutorials

Developer Guides

  • Migration Guide
    • Pre-Training
    • SFT Training and Inference
    • PEFT Training and Inference
    • Trainer Configuration
    • Precision Configuration
    • Parallelisms
    • Experiment Manager
    • Checkpointing Configurations
    • Optimizer Configuration
    • Data Configuration
    • Nsys Profiling
    • Tokenizers
  • Feature Guide
    • The Bridge Between Lightning and Megatron Core
    • Logging and Checkpointing
    • Serialization
    • Parameter Efficient Fine-Tuning (PEFT)
    • Hugging Face Integration
    • Profiling
  • Best Practices

Training and Customization

  • Long Context Training
    • Context Parallelism
  • Optimal Configuration with Auto Configurator
  • Parameter-Efficient Fine-tuning (PEFT)
    • Supported PEFT Methods
    • A Comparison of Performant and Canonical LoRA Variants
  • Sequence Packing
  • Resiliency
  • Continual Training
  • Custom Datasets
    • Pre-Training Data Module
    • Fine-Tuning Data Module

NeMo AutoModel

  • Overview
  • Parameter-Efficient Fine-tuning (PEFT)
  • Supervised Fine-tuning (SFT)
  • Vision Language Models with NeMo AutoModel

Model Optimization

  • Quantization
  • Pruning
  • Distillation

Models

  • Large Language Models
    • Baichuan 2
    • ChatGLM 3
    • DeepSeek V2
    • DeepSeek V3
    • Gemma
    • Gemma 2
    • Hyena
    • Llama 3
    • Llama Nemotron
    • Mamba 2
    • Mixtral
    • Nemotron
    • Phi 3
    • Qwen2/2.5
    • Starcoder
    • Starcoder 2
    • T5
    • BERT
  • Vision Language Models
    • NeVA (LLaVA)
    • LLaVA-Next
    • Llama 3.2 Vision Models
    • Llama 4 Models
    • Qwen2-VL
    • Data Preparation to Use Megatron-Energon Dataloader
    • CLIP
  • Speech AI Models
  • Diffusion Models
    • Flux
    • Diffusion Training Framework
  • Embedding Models
    • SBERT
    • Llama Embedding
    • Exporting Llama Embedding To ONNX and TensorRT

Deploy Models

  • Overview
  • Large Language Models
    • Deploy NeMo Models Using NIM LLM Containers
    • Deploy NeMo Models by Exporting to Inference Optimized Libraries
      • Deploy NeMo Models by Exporting TensorRT-LLM
      • Deploy NeMo Models by Exporting vLLM
    • Deploy NeMo Models in the Framework
    • Send Queries to the NVIDIA Triton Server for NeMo LLMs
  • Multimodal Models

Library Documentation

  • Overview
  • NeMo
    • Introduction
    • NeMo Fundamentals
    • Tutorials
    • Mixed Precision Training
    • Parallelisms
    • Mixture of Experts
    • Optimizations
      • Attention Optimizations
      • Activation Recomputation
      • Communication Overlap
      • CPU Offloading
    • Checkpoints
      • NeMo Distributed Checkpoint User Guide
      • Converting from Megatron-LM
    • Evaluate NeMo 2.0 Checkpoints
    • NeMo APIs
      • NeMo Models
      • Neural Modules
      • Experiment Manager
      • Neural Types
      • Exporting NeMo Models
      • Adapters
        • Adapter Components
        • Adapters API
      • NeMo Core APIs
      • NeMo Common Collection API
        • Callbacks
        • Losses
        • Metrics
        • Tokenizers
        • Data
        • S3 Checkpointing
      • NeMo ASR API
      • NeMo TTS API
    • NeMo Collections
      • Large Language Models
        • GPT Model Training
        • Batching
        • Positional embeddings
        • Megatron Core Customization
        • Reset Learning Rate
        • Ramp Up Batch Size
      • Machine Translation Models
      • Automatic Speech Recognition (ASR)
        • Models
        • Datasets
        • ASR Language Modeling and Customization
        • Checkpoints
        • Scores
        • NeMo ASR Configuration Files
        • NeMo ASR API
        • All Checkpoints
        • Example With MCV
      • Speech Classification
        • Models
        • Datasets
        • Checkpoints
        • NeMo Speech Classification Configuration Files
        • Resource and Documentation Guide
      • Speaker Recognition (SR)
        • Models
        • NeMo Speaker Recognition Configuration Files
        • Datasets
        • Checkpoints
        • NeMo Speaker Recognition API
        • Resource and Documentation Guide
      • Speaker Diarization
        • Models
        • Datasets
        • Checkpoints
        • End-to-End Speaker Diarization Configuration Files
        • NeMo Speaker Diarization API
        • Resource and Documentation Guide
      • Speech Self-Supervised Learning
        • Models
        • Datasets
        • Checkpoints
        • NeMo SSL Configuration Files
        • NeMo SSL collection API
        • Resources and Documentation
      • Speech Intent Classification and Slot Filling
        • Models
        • Datasets
        • Checkpoints
        • NeMo Speech Intent Classification and Slot Filling Configuration Files
        • NeMo Speech Intent Classification and Slot Filling collection API
        • Resources and Documentation
      • Text-to-Speech (TTS)
        • Models
        • Data Preprocessing
        • Checkpoints
        • NeMo TTS Configuration Files
        • Grapheme-to-Phoneme Models
      • Speech and Audio Processing
        • Models
        • Datasets
        • Checkpoints
        • NeMo Audio Configuration Files
        • NeMo Audio API
    • Speech AI Tools
      • NeMo Forced Aligner (NFA)
      • Dataset Creation Tool Based on CTC-Segmentation
      • Speech Data Explorer
      • Comparison tool for ASR Models
      • ASR Evaluator
      • Speech Data Processor
      • (Inverse) Text Normalization
        • WFST-based (Inverse) Text Normalization
        • Neural Models for (Inverse) Text Normalization
  • NeMo Aligner
    • Obtain a Pretrained Model
    • Supervised Fine-Tuning (SFT) with Knowledge Distillation
    • Model Alignment by REINFORCE
    • Model Alignment by DPO, RPO, and IPO
    • Model Alignment by RLHF
    • Model Alignment by SteerLM Method
    • SteerLM 2.0: Iterative Training for Attribute-Conditioned Language Model Alignment
    • Model Alignment by Rejection Sampling
    • Model Alignment by Self-Play Fine-Tuning (SPIN)
    • Fine-Tuning Stable Diffusion with DRaFT+
    • Constitutional AI: Harmlessness from AI Feedback
  • NeMo Curator
    • Text Curation
      • Download and Extract Text
      • Working with DocumentDataset
      • CPU and GPU Modules with Dask
      • Classifier and Heuristic Quality Filtering
      • Language Identification and Unicode Fixing
      • Stop Words in Text Processing
      • GPU Accelerated Exact and Fuzzy Deduplication
      • Semantic Deduplication
      • Synthetic Data Generation
      • Downstream Task Decontamination/Deduplication
      • PII Identification and Removal
      • Distributed Data Classification
    • Image Curation
      • Get Started
      • Image-Text Pair Datasets
      • Aesthetic Classifier
      • NSFW Classifier
      • Semantic Deduplication
    • Reference
      • Running NeMo Curator on Kubernetes
      • Reading and writing datasets with NeMo Curator and Apache Spark
      • Best Practices
      • Next Steps
      • API Reference
        • Dask Cluster Functions
        • Datasets
        • Download and Extract
        • Filters
        • Classifiers
        • Modifiers
        • Deduplication
        • Task Decontamination
        • LLM Services
        • Synthetic Data
        • Image Curation
        • Miscellaneous
  • NeMo Run
    • API Reference
      • Configuration
      • Execution
      • Management
    • Guides
      • Configure NeMo-Run
      • Execute NeMo Run
      • Management
      • Why should I use NeMo-Run?
    • Frequently Asked Questions

Releases

  • Software Component Versions
  • Changelog
  • Known Issues

Text to Image Models

NeMo multimodal provides implementations of multiple text-to-image models, including Stable Diffusion, Imagen, DreamBooth, ControlNet, and InstructPix2Pix. Please refer to the NeMo Framework User Guide for Multimodal Models for detailed support information.

  • Datasets
  • Common Configuration Files
  • Checkpoints
  • Stable Diffusion
  • Imagen
  • DreamBooth
  • ControlNet
  • InstructPix2Pix
  • Stable Diffusion XL Int8 Quantization
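The diffusion models listed above (Stable Diffusion, Imagen, and their derivatives) all build on the same forward noising process. As illustrative background only — the function names and the 1000-step linear schedule below are generic DDPM conventions, not NeMo APIs — a minimal pure-Python sketch:

```python
import math

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances beta_0..beta_{T-1} (DDPM-style)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to step t (inclusive, 0-indexed)."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod

def q_sample(x0, t, betas, eps):
    """Sample x_t ~ q(x_t | x_0) = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alpha_bar(betas, t)
    return math.sqrt(abar) * x0 + math.sqrt(1.0 - abar) * eps

betas = linear_beta_schedule(1000)
# At t=0 almost no noise is added; by t=999 the signal is nearly destroyed.
print(round(alpha_bar(betas, 0), 4))   # ~0.9999
print(alpha_bar(betas, 999) < 0.01)    # True
```

Training then amounts to teaching a network to predict `eps` from the noised sample `x_t`; the framework's configuration files for each model control the schedule, the denoising backbone, and the text conditioning.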

Copyright © 2023-2025, NVIDIA Corporation.

Last updated on May 01, 2025.