NeMo AutoModel Documentation | NVIDIA NeMo AutoModel

PyTorch-native training that scales from 1 GPU to thousands with a single config change. Load any Hugging Face model, point at your data, and start training; no checkpoint conversion and no boilerplate.

Quick Links

🤗 HF Compatible 🚀 Performance 📐 Scalability 🎯 SFT & PEFT 🎨 Diffusion 👁️ VLM 🌐 Omni 🌊 dLLM 🔊 Audio ⚡ Speculative

About

Overview of NeMo AutoModel and its capabilities.

Key Features

Supported workflows, parallelism, recipes, and benchmarks.

🤗 HF Integration

A transformers-compatible library with accelerated model implementations.

Model Coverage

Built on transformers for day-0 model support and out-of-the-box compatibility.

Get Started

$ uv pip install nemo-automodel
$ automodel --nproc-per-node=2 llama3_2_1b_squad.yaml

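The first command installs the package. The second launches the llama3_2_1b_squad.yaml recipe with two processes (typically one per GPU) on the local machine. See the installation guide for Docker, source builds, and multi-node setup, and the configuration guide for YAML recipes and CLI overrides. Jobs can run on a local workstation or a SLURM cluster.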

Latest Model Support

New models are added regularly. Pick a model below to start fine-tuning, or see the full release log.

Date	Modality	Model
2026-05-19	LLM	Ling 2.0 (recipes)
2026-05-18	Audio	Qwen3-Omni ASR (recipe)
2026-05-17	LLM	ERNIE 4.5 (recipe)
2026-05-17	LLM	MiMo-V2-Flash (recipe)
2026-04-07	LLM	GLM-5.1 (recipe)
2026-04-02	VLM	Gemma 4 (recipe)
2026-03-16	VLM	Mistral Small 4 (recipe)
2026-03-11	LLM	Nemotron Super v3 (recipe)
2026-03-03	Diffusion	FLUX.1-dev (recipe)

Recipes and Guides

Find the right guide for your task: fine-tuning, pretraining, distillation, diffusion, and more.

I want to…	Choose this when…	Input Data	Model	Guide
SFT (full fine-tune)	You need maximum accuracy and have the GPU budget to update all weights	Instruction / chat dataset	LLM	Start fine-tuning
PEFT (LoRA)	You want to fine-tune on limited GPU memory; LoRA trains <1% of the parameters	Instruction / chat dataset	LLM	Start LoRA
Tool / function calling	Your model needs to call APIs or tools with structured arguments	Function-calling dataset (queries + tool schemas)	LLM	Add tool calling
Fine-tune VLM	Your task involves both images and text (e.g., visual QA, captioning)	Image + text dataset	VLM	Fine-tune VLM
Fine-tune Gemma 4	You want to fine-tune Gemma 4 for structured extraction from images (e.g., receipts)	Image + text dataset	VLM	Fine-tune Gemma 4
Fine-tune dLLM	You want to fine-tune a diffusion language model (e.g., LLaDA) using masked denoising	Instruction / chat dataset	dLLM	Fine-tune dLLM
Fine-tune Diffusion	You want to fine-tune a diffusion model for image or video generation	Video / Image dataset	Diffusion	Fine-tune Diffusion
Fine-tune VLM-MoE	You need large-scale vision-language training with sparse MoE efficiency	Image + text dataset	VLM (MoE)	Fine-tune VLM-MoE
Fine-tune agentic VLM-MoE	You need image/video context for agentic developer workflows	Image / video + text dataset	VLM (MoE)	Fine-tune Step-3.7-Flash
Fine-tune Audio ASR	You want to adapt Qwen3-Omni for speech recognition on Hugging Face audio datasets	Audio + transcript dataset	Qwen3-Omni	Fine-tune Qwen3-Omni ASR
Embedding fine-tune	You want to improve text similarity for search, retrieval, or RAG	Text pairs / retrieval corpus	LLM	Coming Soon
Fine-tune a large MoE	You are adapting a large sparse MoE model (DeepSeek-V3, GLM-5, etc.) to your domain	Text dataset (e.g., HellaSwag)	LLM (MoE)	Fine-tune MoE
Fine-tune DeepSeek V4 Flash	You want to fine-tune the DeepSeek V4 Flash hybrid-attention MoE (SWA / CSA / HCA + hash-routing)	Text dataset (e.g., HellaSwag)	LLM (MoE)	Fine-tune DeepSeek V4 Flash
Fine-tune Hy3-preview	You want to fine-tune Tencent’s 295B MoE with sigmoid routing and per-head QK RMSNorm	Text dataset (e.g., HellaSwag)	LLM (MoE)	Fine-tune Hy3-preview
Fine-tune Nemotron-3 Ultra	You want to fine-tune NVIDIA’s 550B-A55B hybrid Mamba-2 / LatentMoE model with MTP	Text dataset (e.g., HellaSwag)	LLM (MoE)	Fine-tune Nemotron-3 Ultra
Sequence classification	You need to classify text into categories (sentiment, topic, NLI)	Text + labels (e.g., GLUE MRPC)	LLM	Train classifier
QAT fine-tune	You want a quantized model for efficient deployment that stays close to full-precision accuracy	Text dataset	LLM	Enable QAT
Knowledge distillation	You want a smaller, faster model that retains most of the teacher’s quality	Instruction dataset + teacher model	LLM	Distill a model
Pretrain an LLM	You are building a base model from scratch on your own corpus	Large unlabeled text corpus (e.g., FineWeb-Edu)	LLM	Start pretraining
Pretrain (NanoGPT)	You want quick pretraining experiments on a single node	FineWeb / text corpus	LLM	Try NanoGPT
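The PEFT (LoRA) row says LoRA trains under 1% of the parameters. The arithmetic behind that is simple: LoRA freezes a d_out × d_in weight matrix and trains two low-rank factors, A (r × d_in) and B (d_out × r), which adds r·(d_in + d_out) trainable parameters per adapted matrix. The sketch below works through the count for a hypothetical 1B-parameter model with 16 layers and hidden size 2048, adapting only the query and value projections at rank 8. All sizes are illustrative assumptions, not the settings of any NeMo AutoModel recipe.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    # A has shape (rank, d_in); B has shape (d_out, rank).
    return rank * (d_in + d_out)


# Hypothetical model dimensions, chosen only for illustration.
TOTAL_PARAMS = 1_000_000_000  # frozen base-model parameters
NUM_LAYERS = 16
HIDDEN = 2048
KV_DIM = 512  # smaller value projection, as in grouped-query attention
RANK = 8

per_layer = (
    lora_params(HIDDEN, HIDDEN, RANK)    # query projection: 2048 -> 2048
    + lora_params(HIDDEN, KV_DIM, RANK)  # value projection: 2048 -> 512
)
trainable = per_layer * NUM_LAYERS
fraction = trainable / TOTAL_PARAMS

print(f"trainable LoRA params: {trainable:,}")    # 851,968
print(f"fraction of base model: {fraction:.3%}")  # 0.085%
```

Raising the rank or adapting more projections (for example, the MLP layers) grows the count linearly, so even aggressive LoRA settings usually stay a small fraction of the full model.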
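The dLLM row mentions fine-tuning with masked denoising. In masked diffusion language models such as LLaDA, the basic objective samples a noise level t in (0, 1], replaces each token with a mask token independently with probability t, and trains the model to recover the original tokens at the masked positions, weighting the loss by 1/t. The pure-Python sketch below shows only that masking step and loss weighting. MASK_ID, the function names, and the precomputed log-probabilities are hypothetical stand-ins; real training computes the log-probabilities with the model over batched tensors.

```python
import math
import random

MASK_ID = -1  # hypothetical id for the [MASK] token


def forward_mask(tokens, rng):
    """Sample t and mask each token independently with probability t."""
    t = rng.uniform(1e-3, 1.0)  # keep t away from 0 so that 1/t stays finite
    noisy, is_masked = [], []
    for tok in tokens:
        if rng.random() < t:
            noisy.append(MASK_ID)
            is_masked.append(True)
        else:
            noisy.append(tok)
            is_masked.append(False)
    return noisy, is_masked, t


def masked_denoising_loss(target_log_probs, is_masked, t):
    """Negative log-likelihood on masked positions only, weighted by 1/t.

    target_log_probs[i] is the model's log p(original token i | noisy sequence).
    """
    nll = -sum(lp for lp, m in zip(target_log_probs, is_masked) if m)
    return nll / t


rng = random.Random(0)
noisy, is_masked, t = forward_mask(list(range(100, 110)), rng)
print(t, noisy)

# Toy check: two masked positions, each predicted with probability 0.5, at t = 0.5.
print(masked_denoising_loss([math.log(0.5)] * 3, [True, False, True], 0.5))  # about 2.7726
```

Unmasked positions contribute nothing to the loss, which is why the 1/t factor is needed: at low noise levels few tokens are masked, and the weighting keeps those steps from being underrepresented.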
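Several MoE rows (for example, Hy3-preview) mention sigmoid routing. In one common formulation, used by DeepSeek-V3, each expert's affinity is the sigmoid of its router logit, the top-k experts are selected by affinity, and the selected affinities are renormalized to sum to 1 to form the gating weights. The sketch below implements that formulation for a single token in pure Python. It is a generic illustration, not the routing code of any model listed here, and it omits details such as load-balancing bias terms, grouped routing, and hash routing.

```python
import math


def sigmoid_topk_route(router_logits, k):
    """Return {expert_index: gate_weight} for the top-k experts of one token."""
    # Independent per-expert affinities in (0, 1), unlike a softmax over experts.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]
    # Pick the k experts with the highest affinity.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the selected affinities so the gate weights sum to 1.
    total = sum(scores[i] for i in top)
    return {i: scores[i] / total for i in top}


gates = sigmoid_topk_route([0.1, 2.0, -1.0, 1.5], k=2)
print(gates)  # experts 1 and 3, with weights of about 0.52 and 0.48
```

Because sigmoid scores are computed per expert rather than competing through a softmax, one expert's affinity does not directly suppress another's; the renormalization step is what turns the selected scores into a convex combination.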
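The QAT row refers to quantization-aware training. Its core mechanism is fake quantization: in the forward pass, weights (and often activations) are rounded to a low-precision grid and immediately mapped back to floating point, so the model learns to tolerate the rounding error; in the backward pass the rounding is usually treated as the identity (the straight-through estimator). The sketch below shows symmetric, per-tensor int8 fake quantization of a list of weights in pure Python. That scheme is an assumption chosen for clarity and is not necessarily the one used by NeMo AutoModel's QAT recipes.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to a symmetric signed integer grid, then dequantize back to float."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale for the whole tensor
    out = []
    for v in values:
        q = round(v / scale)          # nearest grid point (Python rounds ties to even)
        q = max(-qmax, min(qmax, q))  # clamp to the representable range
        out.append(q * scale)         # dequantize: the forward pass sees this value
    return out, scale


weights = [0.5, -1.0, 0.251, 0.0, 0.0123]
fq, scale = fake_quantize(weights)
for w, q in zip(weights, fq):
    print(f"{w:+.4f} -> {q:+.4f} (error {abs(w - q):.5f})")
```

Each dequantized value differs from the original by at most half a quantization step (scale / 2), which is the error the model is trained to absorb.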
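The knowledge-distillation row trains a smaller student to match a teacher. In the classic formulation of Hinton et al., the distillation term is the KL divergence between the teacher's and the student's temperature-softened output distributions, scaled by T² so that gradient magnitudes stay comparable across temperatures. The pure-Python sketch below computes that term for one position's logits. The temperature value and function names are illustrative assumptions; in practice this term is often combined with the ordinary cross-entropy loss and computed over batched tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # terms with zero teacher probability contribute nothing
    )
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
print(kd_loss([4.0, 1.0, 0.5], teacher))  # identical logits: 0.0
print(kd_loss([0.5, 1.0, 4.0], teacher))  # mismatched student: positive loss
```

A higher temperature flattens both distributions, exposing how the teacher ranks the non-top tokens; that "dark knowledge" is what the student learns beyond the hard labels.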