NeMo AutoModel Documentation

PyTorch-native training that scales from 1 GPU to thousands with a single config change. Load any Hugging Face model, point at your data, and start training – no checkpoint conversion, no boilerplate. Quick links: 🤗 HF Compatible | 🚀 Performance | 📐 Scalability | 🎯 SFT & PEFT | 🎨 Diffusion | 👁️ VLM

About

Overview of NeMo AutoModel and its capabilities.

About NeMo AutoModel
Key Features

Supported workflows, parallelism, recipes, and benchmarks.

Key Features and Concepts
🤗 HF Integration

A transformers-compatible library with accelerated model implementations.

🤗 Transformers API Compatibility
Model Coverage

Built on transformers for day-0 model support and out-of-the-box compatibility.

Model Coverage Overview

Get Started

```bash
uv pip install nemo-automodel

automodel --nproc-per-node=2 llama3_2_1b_squad.yaml
```

See the installation guide for Docker, source builds, and multi-node setup, and the configuration guide for YAML recipes and CLI overrides. Launch on a local workstation or a SLURM cluster.
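The YAML recipe referenced above bundles the model, data, and training settings into one file. A minimal sketch of what such a recipe might contain follows; the key names here are illustrative assumptions, not the authoritative schema, so consult the configuration guide for the real options:

```yaml
# Illustrative recipe sketch only — key names are assumptions, not the
# real schema; see the configuration guide for actual field names.
model:
  pretrained_model_name_or_path: meta-llama/Llama-3.2-1B   # any HF model id
dataset:
  name: squad                                              # point at your data
training:
  learning_rate: 2.0e-5
  global_batch_size: 32
  max_steps: 1000
```

Per the configuration guide, any recipe field can also be overridden from the CLI at launch time instead of editing the file.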

Latest Model Support

New models are added regularly. Pick a model below to start fine-tuning, or see the full release log.

| Date | Modality | Model |
|---|---|---|
| 2026-04-07 | LLM | GLM-5.1 (recipe) |
| 2026-04-02 | VLM | Gemma 4 (recipe) |
| 2026-03-16 | VLM | Mistral Small 4 (recipe) |
| 2026-03-11 | LLM | Nemotron Super v3 (recipe) |
| 2026-03-03 | Diffusion | FLUX.1-dev (recipe) |

Recipes & Guides

Find the right guide for your task – fine-tuning, pretraining, distillation, diffusion, and more.

| I want to… | Choose this when… | Input Data | Model | Guide |
|---|---|---|---|---|
| SFT (full fine-tune) | You need maximum accuracy and have the GPU budget to update all weights | Instruction / chat dataset | LLM | Start fine-tuning |
| PEFT (LoRA) | You want to fine-tune on limited GPU memory; updates <1% of parameters | Instruction / chat dataset | LLM | Start LoRA |
| Tool / function calling | Your model needs to call APIs or tools with structured arguments | Function-calling dataset (queries + tool schemas) | LLM | Add tool calling |
| Fine-tune VLM | Your task involves both images and text (e.g., visual QA, captioning) | Image + text dataset | VLM | Fine-tune VLM |
| Fine-tune Gemma 4 | You want to fine-tune Gemma 4 for structured extraction from images (e.g., receipts) | Image + text dataset | VLM | Fine-tune Gemma 4 |
| Fine-tune dLLM | You want to fine-tune a diffusion language model (e.g., LLaDA) using masked denoising | Instruction / chat dataset | dLLM | Fine-tune dLLM |
| Fine-tune Diffusion | You want to fine-tune a diffusion model for image or video generation | Video / image dataset | Diffusion | Fine-tune Diffusion |
| Fine-tune VLM-MoE | You need large-scale vision-language training with sparse MoE efficiency | Image + text dataset | VLM (MoE) | Fine-tune VLM-MoE |
| Embedding fine-tune | You want to improve text similarity for search, retrieval, or RAG | Text pairs / retrieval corpus | LLM | Coming soon |
| Fine-tune a large MoE | You are adapting a large sparse MoE model (DeepSeek-V3, GLM-5, etc.) to your domain | Text dataset (e.g., HellaSwag) | LLM (MoE) | Fine-tune MoE |
| Sequence classification | You need to classify text into categories (sentiment, topic, NLI) | Text + labels (e.g., GLUE MRPC) | LLM | Train classifier |
| QAT fine-tune | You want a quantized model that keeps accuracy for efficient deployment | Text dataset | LLM | Enable QAT |
| Knowledge distillation | You want a smaller, faster model that retains most of the teacher's quality | Instruction dataset + teacher model | LLM | Distill a model |
| Pretrain an LLM | You are building a base model from scratch on your own corpus | Large unlabeled text corpus (e.g., FineWeb-Edu) | LLM | Start pretraining |
| Pretrain (NanoGPT) | You want quick pretraining experiments on a single node | FineWeb / text corpus | LLM | Try NanoGPT |
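The "<1% of parameters" figure quoted for LoRA follows directly from the adapter shapes: a rank-r adapter on a d_out × d_in weight matrix adds two small matrices totaling r·(d_in + d_out) trainable parameters. A back-of-the-envelope check, using illustrative dimensions roughly matching a 1B-class model's hidden size (not taken from any specific recipe):

```python
# Back-of-the-envelope LoRA parameter count (illustrative dimensions).
# A rank-r adapter on a (d_out x d_in) linear layer adds matrices
# A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)

d = 2048               # hidden size of a ~1B-parameter model (assumed)
r = 8                  # a commonly used LoRA rank
full = d * d           # params in one square projection matrix
lora = lora_params(d, d, r)

print(f"full: {full}, lora: {lora}, ratio: {lora / full:.2%}")
```

The same ratio holds layer by layer, which is why the trainable fraction of the whole model stays well under 1% at small ranks.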

Performance

Training throughput on NVIDIA GPUs with optimized kernels for Hugging Face models.

| Model | GPUs | TFLOPs/sec/GPU | Tokens/sec/GPU | Optimizations |
|---|---|---|---|---|
| DeepSeek V3 671B | 256 | 250 | 1,002 | TE + DeepEP |
| GPT-OSS 20B | 8 | 279 | 13,058 | TE + DeepEP + FlexAttn |
| Qwen3 MoE 30B | 8 | 277 | 12,040 | TE + DeepEP |

See the full benchmark results for configuration details and more models.
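The two throughput columns are linked: TFLOPs/sec/GPU divided by tokens/sec/GPU gives the implied training FLOPs per token, which can be sanity-checked against the common 6·N estimate (6 FLOPs per active parameter per token). For the DeepSeek V3 row, assuming roughly 37B active MoE parameters (an assumption about the model configuration, not a number from the table):

```python
# Sanity-check the DeepSeek V3 row: implied FLOPs per token vs. the
# common 6 * N_active training-FLOPs estimate.
# N_active ~= 37e9 (active MoE params) is an assumption, not from the table.

tflops_per_gpu = 250e12      # 250 TFLOPs/sec/GPU, from the table
tokens_per_gpu = 1_002       # tokens/sec/GPU, from the table

implied_flops_per_token = tflops_per_gpu / tokens_per_gpu   # ~2.5e11
estimate = 6 * 37e9                                         # ~2.2e11

print(f"implied: {implied_flops_per_token:.2e}, 6N estimate: {estimate:.2e}")
```

The two values agree to within about 15%; attention FLOPs, which the 6·N rule ignores, plausibly account for the remainder.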

Advanced Topics

Parallelism, precision, checkpointing strategies, and experiment tracking.

Pipeline Parallelism

Torch-native pipelining composable with FSDP2 and DTensor.

Pipeline Parallelism with AutoPipeline
FP8 Training

Mixed-precision FP8 training with torchao.

FP8 Training
Checkpointing

Distributed checkpoints with SafeTensors output.

Checkpointing
Gradient Checkpointing

Trade compute for memory with activation checkpointing.

Gradient (Activation) Checkpointing
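The compute-for-memory trade behind activation checkpointing can be quantified with a simple memory model: storing activations for all L layers costs O(L) memory, while checkpointing every k-th layer stores only L/k boundary activations plus at most k layers re-materialized during backward, a total of roughly L/k + k, minimized at k = √L. A small illustration with made-up sizes (not measured numbers):

```python
# Simple memory model for activation checkpointing (illustrative numbers).
# Without checkpointing: keep activations of all L layers.
# Checkpointing every k layers: keep L/k boundary activations, plus
# re-materialize at most k layers during backward => total ~ L/k + k,
# which is minimized at k = sqrt(L).
import math

L = 64          # transformer layers (illustrative)
act_gb = 0.5    # GB of activations per layer (made-up constant)

no_ckpt = L * act_gb
k = round(math.sqrt(L))               # optimal segment length, k = 8 here
with_ckpt = (L / k + k) * act_gb

print(f"no checkpointing: {no_ckpt:.0f} GB, "
      f"sqrt(L) checkpointing: {with_ckpt:.0f} GB")
```

The saved memory is paid for with one extra forward pass over each checkpointed segment during backward, which is the trade-off the guide above discusses.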
Quantization-Aware Training

Train with quantization for deployment-ready models.

Quantization-Aware Training (QAT)
Experiment Tracking

Track experiments and metrics with MLflow and Weights & Biases (wandb).

MLflow Logging

For Developers

Repo Internals

Components, recipes, and CLI architecture.

Repository Structure
API Reference

Auto-generated Python API documentation.

API Reference
Use as a Library

Drop-in accelerated backend for TRL, lm-eval-harness, OpenRLHF, or any code that loads Hugging Face models.

About NeMo AutoModel