Large Language Models (LLMs)#

Introduction#

Large Language Models (LLMs) power a variety of tasks such as dialogue systems, text classification, summarization, and more. NeMo AutoModel provides a simple interface for loading and fine-tuning LLMs hosted on the Hugging Face Hub.

Run LLMs with NeMo AutoModel#

To run LLMs with NeMo AutoModel, make sure you’re using NeMo container version 25.11.00 or later. If the model you intend to fine-tune requires a newer version of Transformers, you may need to upgrade to the latest version of NeMo AutoModel by using:

   pip3 install --upgrade git+git@github.com:NVIDIA-NeMo/AutoModel.git

For other installation options (e.g., uv), please see our Installation Guide.

Supported Models#

NeMo AutoModel supports the AutoModelForCausalLM in the Text Generation category. During preprocessing, it uses transformers.AutoTokenizer, which is sufficient for most LLM cases. If your model requires custom text handling (for example, a custom chat template), override the tokenizer in your recipe YAML (for example, set dataset.tokenizer / validation_dataset.tokenizer) or provide a custom dataset _target_. See LLM datasets and dataset overview.

The table below lists the main architectures we test against (FSDP2 combined with SFT/PEFT) and includes a representative checkpoint for each.

Architecture	Models	Example HF Models
`AquilaForCausalLM`	Aquila, Aquila2	`BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc.
`BaiChuanForCausalLM`	Baichuan2, Baichuan	`baichuan-inc/Baichuan2-13B-Chat`, `baichuan-inc/Baichuan-7B`, etc. — example recipes: baichuan_2_7b_squad.yaml, baichuan_2_7b_squad_peft.yaml
`BambaForCausalLM`	Bamba	`ibm-ai-platform/Bamba-9B`
`ChatGLMModel` / `ChatGLMForConditionalGeneration`	ChatGLM	`THUDM/chatglm2-6b`, `THUDM/chatglm3-6b`, etc.
`CohereForCausalLM` / `Cohere2ForCausalLM`	Command‑R	`CohereForAI/c4ai-command-r-v01`, `CohereForAI/c4ai-command-r7b-12-2024`, etc. — example recipes: cohere_command_r_7b_squad.yaml, cohere_command_r_7b_squad_peft.yaml
`DeciLMForCausalLM`	DeciLM	`nvidia/Llama-3_3-Nemotron-Super-49B-v1`, etc.
`DeepseekForCausalLM`	DeepSeek	`deepseek-ai/deepseek-llm-7b-chat` etc.
`DeepseekV3ForCausalLM` / `DeepseekV32ForCausalLM`	DeepSeek V3, DeepSeek V3.2, Moonlight	`deepseek-ai/DeepSeek-V3`, `deepseek-ai/DeepSeek-V3.2`, `moonshotai/Moonlight-16B-A3B` — example recipes: deepseek_v32_hellaswag_pp.yaml, moonlight_16b_te.yaml, moonlight_16b_te_packed_sequence.yaml
`ExaoneForCausalLM`	EXAONE‑3	`LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
`FalconForCausalLM`	Falcon	`tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc. — example recipes: falcon3_7b_instruct_squad.yaml, falcon3_7b_instruct_squad_peft.yaml
`GemmaForCausalLM`	Gemma	`google/gemma-2b`, `google/gemma-1.1-2b-it`, etc.
`Gemma2ForCausalLM`	Gemma 2	`google/gemma-2-9b`, etc. — example recipes: gemma_2_9b_it_squad.yaml, gemma_2_9b_it_squad_peft.yaml
`Gemma3ForCausalLM`	Gemma 3	`google/gemma-3-1b-it` etc. — example recipes: gemma_3_270m_squad.yaml, gemma_3_270m_squad_peft.yaml
`GlmForCausalLM`	GLM‑4	`THUDM/glm-4-9b-chat-hf` etc. — example recipes: glm_4_9b_chat_hf_squad.yaml, glm_4_9b_chat_hf_hellaswag_fp8.yaml
`Glm4ForCausalLM`	GLM‑4‑0414	`THUDM/GLM-4-32B-0414` etc.
`Glm4MoeForCausalLM`	GLM‑4‑MoE	`zai-org/GLM-4.5-Air`, `zai-org/GLM-4.7` — example recipes: glm_4.5_air_te_deepep.yaml, glm_4.7_te_deepep.yaml
`Glm4MoeLiteForCausalLM`	GLM‑4‑MoE Lite	`zai-org/GLM-4.7-Flash` — example recipes: glm_4.7_flash_te_deepep.yaml, glm_4.7_flash_te_packed_sequence.yaml
`GPTBigCodeForCausalLM`	StarCoder, SantaCoder, WizardCoder	`bigcode/starcoder`, `bigcode/gpt_bigcode-santacoder`, `WizardLM/WizardCoder-15B-V1.0` etc.
`GPTJForCausalLM`	GPT‑J	`EleutherAI/gpt-j-6b`, `nomic-ai/gpt4all-j` etc.
`GPTNeoXForCausalLM`	GPT‑NeoX, Pythia, OpenAssistant, Dolly V2, StableLM	`EleutherAI/gpt-neox-20b`, `EleutherAI/pythia-12b`, `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, `databricks/dolly-v2-12b`, `stabilityai/stablelm-tuned-alpha-7b` etc.
`GptOssForCausalLM`	GPT-OSS	`openai/gpt-oss-20b`, `openai/gpt-oss-120b` — example recipes: gpt_oss_20b.yaml, gpt_oss_120b.yaml
`GraniteForCausalLM`	Granite 3.0, Granite 3.1, PowerLM	`ibm-granite/granite-3.0-2b-base`, `ibm-granite/granite-3.1-8b-instruct`, `ibm/PowerLM-3b` etc. — example recipes: granite_3_3_2b_instruct_squad.yaml, granite_3_3_2b_instruct_squad_peft.yaml
`GraniteMoeForCausalLM`	Granite 3.0 MoE, PowerMoE	`ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b` etc.
`GraniteMoeSharedForCausalLM`	Granite MoE Shared	`ibm-research/moe-7b-1b-active-shared-experts` (test model)
`GritLM`	GritLM	`parasail-ai/GritLM-7B-vllm`
`InternLMForCausalLM`	InternLM	`internlm/internlm-7b`, `internlm/internlm-chat-7b` etc.
`InternLM2ForCausalLM`	InternLM2	`internlm/internlm2-7b`, `internlm/internlm2-chat-7b` etc.
`InternLM3ForCausalLM`	InternLM3	`internlm/internlm3-8b-instruct` etc.
`JAISLMHeadModel`	Jais	`inceptionai/jais-13b`, `inceptionai/jais-13b-chat`, `inceptionai/jais-30b-v3`, `inceptionai/jais-30b-chat-v3` etc.
`LlamaForCausalLM`	Llama 3.1, Llama 3, Llama 2, LLaMA, Yi	`meta-llama/Meta-Llama-3.1-70B`, `meta-llama/Meta-Llama-3-70B-Instruct`, `meta-llama/Llama-2-70b-hf`, `01-ai/Yi-34B` etc. — example recipes: llama3_2_1b_squad.yaml, llama_3_3_70b_instruct_squad.yaml
`MiniMaxM2ForCausalLM`	MiniMax M2	`MiniMaxAI/MiniMax-M2.1`, `MiniMaxAI/MiniMax-M2.5` — example recipes: minimax_m2.1_hellaswag_pp.yaml, minimax_m2.5_hellaswag_pp.yaml
`MiniCPMForCausalLM`	MiniCPM	`openbmb/MiniCPM-2B-sft-bf16`, `openbmb/MiniCPM-2B-dpo-bf16` etc.
`MiniCPM3ForCausalLM`	MiniCPM3	`openbmb/MiniCPM3-4B` etc.
`MistralForCausalLM`	Mistral, Mistral‑Instruct, Mistral‑Nemo	`mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, `mistralai/Mistral-Nemo-Instruct-2407` etc. — example recipes: mistral_7b_squad.yaml, mistral_7b_squad_peft.yaml, mistral_nemo_2407_squad.yaml, mistral_nemo_2407_squad_peft.yaml
`MixtralForCausalLM`	Mixtral‑8x7B, Mixtral‑8x7B‑Instruct	`mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1` etc. — example recipes: mixtral-8x7b-v0-1_squad.yaml, mixtral-8x7b-v0-1_squad_peft.yaml
`NemotronForCausalLM`	Nemotron‑3, Nemotron‑4, Minitron	`nvidia/Minitron-8B-Base` etc.
`NemotronHForCausalLM`	Nemotron-Nano-{9B,12B}	`nvidia/NVIDIA-Nemotron-Nano-9B-v2`, `nvidia/NVIDIA-Nemotron-Nano-12B-v2` — example recipes: nemotron_nano_9b_squad.yaml, nemotron_nano_9b_squad_peft.yaml
`NemotronHForCausalLM`	Nemotron-3-Nano-30B-A3B-BF16	`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` — example recipes: nemotron_nano_v3_hellaswag.yaml, nemotron_nano_v3_hellaswag_peft.yaml
`OLMoForCausalLM`	OLMo	`allenai/OLMo-1B-hf`, `allenai/OLMo-7B-hf` etc.
`OLMo2ForCausalLM`	OLMo2	`allenai/OLMo2-7B-1124` etc. — example recipes: olmo_2_0425_1b_instruct_squad.yaml, olmo_2_0425_1b_instruct_squad_peft.yaml
`OLMoEForCausalLM`	OLMoE	`allenai/OLMoE-1B-7B-0924`, `allenai/OLMoE-1B-7B-0924-Instruct` etc.
`OrionForCausalLM`	Orion	`OrionStarAI/Orion-14B-Base`, `OrionStarAI/Orion-14B-Chat` etc.
`PhiForCausalLM`	Phi	`microsoft/phi-1_5`, `microsoft/phi-2` etc. — example recipes: phi_2_squad.yaml, phi_2_squad_peft.yaml
`Phi3ForCausalLM`	Phi‑4, Phi‑3	`microsoft/Phi-4-mini-instruct`, `microsoft/Phi-4`, `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct` etc. — example recipes: phi_4_squad.yaml, phi_4_squad_peft.yaml, phi_3_mini_it_squad.yaml, phi_3_mini_it_squad_peft.yaml
`Phi3SmallForCausalLM`	Phi‑3‑Small	`microsoft/Phi-3-small-8k-instruct`, `microsoft/Phi-3-small-128k-instruct` etc.
`Qwen2ForCausalLM`	QwQ, Qwen2	`Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B` etc. — example recipes: qwen2_5_7b_squad.yaml, qwq_32b_squad_peft.yaml
`Qwen2MoeForCausalLM`	Qwen2MoE	`Qwen/Qwen1.5-MoE-A2.7B`, `Qwen/Qwen1.5-MoE-A2.7B-Chat` etc. — example recipe: qwen1_5_moe_a2_7b_qlora.yaml
`Qwen3ForCausalLM`	Qwen3	`Qwen/Qwen3-8B` etc. — example recipes: qwen3_0p6b_hellaswag.yaml, qwen3_8b_squad_spark.yaml
`Qwen3MoeForCausalLM`	Qwen3MoE	`Qwen/Qwen3-30B-A3B` etc. — example recipes: qwen3_moe_30b_te_deepep.yaml, qwen3_moe_30b_lora.yaml
`Qwen3NextForCausalLM`	Qwen3‑Next	`Qwen/Qwen3-Next-80B-A3B-Instruct` — example recipe: qwen3_next_te_deepep.yaml
`Step3p5ForCausalLM`	Step‑3.5	`stepfun-ai/Step-3.5-Flash` — example recipe: step_3.5_flash_hellaswag_pp.yaml
`StableLmForCausalLM`	StableLM	`stabilityai/stablelm-3b-4e1t`, `stabilityai/stablelm-base-alpha-7b-v2` etc.
`Starcoder2ForCausalLM`	Starcoder2	`bigcode/starcoder2-3b`, `bigcode/starcoder2-7b`, `bigcode/starcoder2-15b` etc. — example recipes: starcoder_2_7b_squad.yaml, starcoder_2_7b_hellaswag_fp8.yaml
`SolarForCausalLM`	Solar Pro	`upstage/solar-pro-preview-instruct` etc.
`Mistral3ForConditionalGeneration`	Ministral3 3B, 8B, 14B	`mistralai/Ministral-3-8B-Instruct-2512`, `mistralai/Ministral-3-3B-Instruct-2512`, `mistralai/Ministral-3-14B-Instruct-2512`
`Mistral3ForConditionalGeneration`	Devstral-Small-2-24B	`mistralai/Devstral-Small-2-24B-Instruct-2512` — example recipes: devstral2_small_2512_squad.yaml, devstral2_small_2512_squad_peft.yaml
`NemotronFlashForCausalLM` ¹	Nemotron‑Flash	`nvidia/Nemotron-Flash-1B` — example recipes: nemotron_flash_1b_squad.yaml, nemotron_flash_1b_squad_peft.yaml
`Qwen2ForCausalLM` ²	Seed‑Coder	`ByteDance-Seed/Seed-Coder-8B-Instruct` — example recipes: seed_coder_8b_instruct_squad.yaml, seed_coder_8b_instruct_squad_peft.yaml
`Qwen2ForCausalLM` ²	Seed‑OSS	`ByteDance-Seed/Seed-OSS-36B-Instruct` — example recipes: seed_oss_36B_hellaswag.yaml, seed_oss_36B_hellaswag_peft.yaml

¹ Requires trust_remote_code: true. ² These models reuse an existing architecture; the architecture column shows the underlying class.

Fine-Tuning LLMs with NeMo AutoModel#

The models listed above can be fine-tuned using NeMo AutoModel to adapt them to specific tasks or domains. We support two primary fine-tuning approaches:

Parameter-Efficient Fine-Tuning (PEFT): Updates only a small subset of parameters (typically <1%) using techniques like Low-Rank Adaptation (LoRA). This is ideal for resource-constrained environments.
Supervised Fine-Tuning (SFT): Updates all or most model parameters for deeper adaptation, suitable for high-precision applications.

Please see our Fine-Tuning Guide to learn how you can apply both of these fine-tuning methods to your data.

Tip

In these guides, we use the SQuAD v1.1 dataset for demonstration purposes, but you can use your own data.

To do so, update the recipe YAML dataset / validation_dataset sections (for example dataset._target_, dataset_name/path_or_dataset, and split). See LLM datasets and dataset overview.