NeMo AutoModel Documentation


PyTorch-native training that scales from 1 GPU to thousands with a single config change. Load any Hugging Face model, point at your data, and start training; no checkpoint conversion and no boilerplate.

🤗 HF Compatible 🚀 Performance 📐 Scalability 🎯 SFT & PEFT 🎨 Diffusion 👁️ VLM 🌐 Omni 🌊 dLLM 🔊 Audio ⚡ Speculative

Get Started

$ uv pip install nemo-automodel
$ automodel --nproc-per-node=2 llama3_2_1b_squad.yaml

See the installation guide for Docker, source builds, and multi-node setup. See the configuration guide for YAML recipes and CLI overrides. Launch on a local workstation or SLURM cluster.

Latest Model Support

New models are added regularly. Pick a model below to start fine-tuning, or see the full release log.

| Date | Modality | Model |
| --- | --- | --- |
| 2026-05-19 | LLM | Ling 2.0 (recipes) |
| 2026-05-18 | Audio | Qwen3-Omni ASR (recipe) |
| 2026-05-17 | LLM | ERNIE 4.5 (recipe) |
| 2026-05-17 | LLM | MiMo-V2-Flash (recipe) |
| 2026-04-07 | LLM | GLM-5.1 (recipe) |
| 2026-04-02 | VLM | Gemma 4 (recipe) |
| 2026-03-16 | VLM | Mistral Small 4 (recipe) |
| 2026-03-11 | LLM | Nemotron Super v3 (recipe) |
| 2026-03-03 | Diffusion | FLUX.1-dev (recipe) |

Recipes and Guides

Find the right guide for your task: fine-tuning, pretraining, distillation, diffusion, and more.

| I want to… | Choose this when… | Input Data | Model | Guide |
| --- | --- | --- | --- | --- |
| SFT (full fine-tune) | You need maximum accuracy and have the GPU budget to update all weights | Instruction / chat dataset | LLM | Start fine-tuning |
| PEFT (LoRA) | You want to fine-tune on limited GPU memory; updates <1 % of parameters | Instruction / chat dataset | LLM | Start LoRA |
| Tool / function calling | Your model needs to call APIs or tools with structured arguments | Function-calling dataset (queries + tool schemas) | LLM | Add tool calling |
| Fine-tune VLM | Your task involves both images and text (e.g., visual QA, captioning) | Image + text dataset | VLM | Fine-tune VLM |
| Fine-tune Gemma 4 | You want to fine-tune Gemma 4 for structured extraction from images (e.g., receipts) | Image + text dataset | VLM | Fine-tune Gemma 4 |
| Fine-tune dLLM | You want to fine-tune a diffusion language model (e.g., LLaDA) using masked denoising | Instruction / chat dataset | dLLM | Fine-tune dLLM |
| Fine-tune Diffusion | You want to fine-tune a diffusion model for image or video generation | Video / Image dataset | Diffusion | Fine-tune Diffusion |
| Fine-tune VLM-MoE | You need large-scale vision-language training with sparse MoE efficiency | Image + text dataset | VLM (MoE) | Fine-tune VLM-MoE |
| Fine-tune agentic VLM-MoE | You need image/video context for agentic developer workflows | Image / video + text dataset | VLM (MoE) | Fine-tune Step-3.7-Flash |
| Fine-tune Audio ASR | Adapt Qwen3-Omni for speech recognition on HF audio datasets | Audio + transcript dataset | Qwen3-Omni | Fine-tune Qwen3-Omni ASR |
| Embedding fine-tune | You want to improve text similarity for search, retrieval, or RAG | Text pairs / retrieval corpus | LLM | Coming Soon |
| Fine-tune a large MoE | You are adapting a large sparse MoE model (DeepSeek-V3, GLM-5, etc.) to your domain | Text dataset (e.g., HellaSwag) | LLM (MoE) | Fine-tune MoE |
| Fine-tune DeepSeek V4 Flash | You want to fine-tune the DeepSeek V4 Flash hybrid-attention MoE (SWA / CSA / HCA + hash-routing) | Text dataset (e.g., HellaSwag) | LLM (MoE) | Fine-tune DeepSeek V4 Flash |
| Fine-tune Hy3-preview | You want to fine-tune Tencent’s 295B MoE with sigmoid routing and per-head QK RMSNorm | Text dataset (e.g., HellaSwag) | LLM (MoE) | Fine-tune Hy3-preview |
| Fine-tune Nemotron-3 Ultra | You want to fine-tune NVIDIA’s 550B-A55B hybrid Mamba-2 / LatentMoE model with MTP | Text dataset (e.g., HellaSwag) | LLM (MoE) | Fine-tune Nemotron-3 Ultra |
| Sequence classification | You need to classify text into categories (sentiment, topic, NLI) | Text + labels (e.g., GLUE MRPC) | LLM | Train classifier |
| QAT fine-tune | You want a quantized model that keeps accuracy for efficient deployment | Text dataset | LLM | Enable QAT |
| Knowledge distillation | You want a smaller, faster model that retains most of the teacher’s quality | Instruction dataset + teacher model | LLM | Distill a model |
| Pretrain an LLM | You are building a base model from scratch on your own corpus | Large unlabeled text corpus (e.g., FineWeb-Edu) | LLM | Start pretraining |
| Pretrain (NanoGPT) | You want quick pretraining experiments on a single node | FineWeb / text corpus | LLM | Try NanoGPT |
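
To give a feel for why the PEFT (LoRA) row can claim updates to under 1 % of parameters, here is a minimal sketch of the standard LoRA parameter count for a single linear layer. It is not NeMo AutoModel's API: the function names, the 2048-wide layer, and the rank of 8 are hypothetical values chosen for illustration, and the real fraction depends on which layers are adapted and on the rank set in your recipe.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA adapter.

    LoRA learns two low-rank factors next to a frozen weight of shape
    (d_out, d_in): A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def trainable_fraction(d_in, d_out, rank):
    """Adapter parameters as a fraction of the frozen weight they wrap."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


if __name__ == "__main__":
    # Hypothetical layer width and rank, for illustration only.
    width, rank = 2048, 8
    added = lora_param_count(width, width, rank)
    frac = trainable_fraction(width, width, rank)
    print(f"adapter params: {added:,} ({frac:.2%} of the frozen weight)")
```

Under these assumed sizes the adapter adds 32,768 parameters next to a 4,194,304-parameter weight, roughly 0.78 %. Raising the rank or adapting more layers increases that share proportionally.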

Performance

Training throughput on NVIDIA GPUs with optimized kernels for Hugging Face models.

| Model | GPUs | TFLOPs/sec/GPU | Tokens/sec/GPU | Optimizations |
| --- | --- | --- | --- | --- |
| DeepSeek V3 671B | 256 | 250 | 1,002 | TE + DeepEP |
| GPT-OSS 20B | 8 | 279 | 13,058 | TE + DeepEP + FlexAttn |
| Qwen3 MoE 30B | 8 | 277 | 12,040 | TE + DeepEP |

See the full benchmark results for configuration details and more models.
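
The table reports per-GPU rates, so planning a run usually means converting them to job-level numbers. The sketch below multiplies tokens/sec/GPU by the GPU count and estimates wall-clock hours for a token budget. The figures are copied from the table; the helper names and the 1B-token budget are hypothetical, and the estimate assumes the benchmarked rate holds for your data, sequence length, and cluster, which it may not.

```python
# (GPUs, TFLOPs/sec/GPU, tokens/sec/GPU), copied from the performance table.
BENCHMARKS = {
    "DeepSeek V3 671B": (256, 250, 1_002),
    "GPT-OSS 20B": (8, 279, 13_058),
    "Qwen3 MoE 30B": (8, 277, 12_040),
}


def aggregate_tokens_per_sec(gpus, tokens_per_sec_per_gpu):
    """Job-level throughput, assuming every GPU sustains the per-GPU rate."""
    return gpus * tokens_per_sec_per_gpu


def hours_to_train(total_tokens, gpus, tokens_per_sec_per_gpu):
    """Rough wall-clock hours to process `total_tokens` at the benchmarked rate."""
    seconds = total_tokens / aggregate_tokens_per_sec(gpus, tokens_per_sec_per_gpu)
    return seconds / 3600


def summarize(token_budget):
    """Map each model to (aggregate tokens/sec, estimated hours for the budget)."""
    return {
        name: (
            aggregate_tokens_per_sec(gpus, tps),
            hours_to_train(token_budget, gpus, tps),
        )
        for name, (gpus, _tflops, tps) in BENCHMARKS.items()
    }


if __name__ == "__main__":
    # Hypothetical budget of one billion tokens.
    for name, (total, hours) in summarize(1_000_000_000).items():
        print(f"{name}: {total:,} tokens/sec overall, ~{hours:.2f} h per 1B tokens")
```

For example, the DeepSeek V3 671B row works out to 256 × 1,002 = 256,512 tokens/sec across the whole job.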

Advanced Topics

Parallelism, precision, checkpointing strategies, and experiment tracking.

For Developers

