Large Language Models (LLMs)

Introduction

Large Language Models (LLMs) power a variety of tasks such as dialogue systems, text classification, summarization, and more. NeMo AutoModel provides a simple interface for loading and fine-tuning LLMs hosted on the Hugging Face Hub.

Run LLMs with NeMo AutoModel

To run LLMs with NeMo AutoModel, make sure you’re using NeMo container version 25.11.00 or later. If the model you intend to fine-tune requires a newer version of Transformers, you may need to upgrade NeMo AutoModel to the latest version:

pip3 install --upgrade git+https://github.com/NVIDIA-NeMo/AutoModel.git

For other installation options (e.g., uv), please see our Installation Guide.

Supported Models

NeMo AutoModel supports models in Hugging Face’s Text Generation category through the AutoModelForCausalLM interface. During preprocessing, it uses transformers.AutoTokenizer, which covers most LLM cases. If your model requires custom text handling, override the tokenizer in your recipe YAML or provide a custom dataset _target_. See LLM datasets and dataset overview.
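For example, a tokenizer or dataset override in a recipe YAML might look like the following sketch (the keys and the custom builder are illustrative; match them to your recipe’s actual schema):

```yaml
# Illustrative recipe fragment -- exact keys depend on your recipe schema.
tokenizer:
  _target_: transformers.AutoTokenizer.from_pretrained
  pretrained_model_name_or_path: meta-llama/Llama-3.2-1B

dataset:
  _target_: my_project.data.build_dataset  # hypothetical custom dataset builder
```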

| Owner | Model Family | Architectures |
| --- | --- | --- |
| Meta | Llama | LlamaForCausalLM |
| Google | Gemma | GemmaForCausalLM, Gemma2ForCausalLM, Gemma3ForCausalLM |
| Qwen / Alibaba Cloud | Qwen2 | Qwen2ForCausalLM |
| Qwen / Alibaba Cloud | Qwen2 MoE | Qwen2MoeForCausalLM |
| Qwen / Alibaba Cloud | Qwen3 | Qwen3ForCausalLM |
| Qwen / Alibaba Cloud | Qwen3 MoE | Qwen3MoeForCausalLM |
| Qwen / Alibaba Cloud | Qwen3-Next | Qwen3NextForCausalLM |
| DeepSeek | DeepSeek | DeepseekForCausalLM |
| DeepSeek | DeepSeek-V3 | DeepseekV3ForCausalLM, DeepseekV32ForCausalLM |
| Mistral AI | Mistral | MistralForCausalLM |
| Mistral AI | Mixtral | MixtralForCausalLM |
| Mistral AI | Ministral3 / Devstral | Mistral3ForConditionalGeneration |
| Microsoft | Phi | PhiForCausalLM |
| Microsoft | Phi-3 / Phi-4 | Phi3ForCausalLM |
| Microsoft | Phi-3-Small | Phi3SmallForCausalLM |
| NVIDIA | Nemotron | NemotronForCausalLM |
| NVIDIA | Nemotron-H | NemotronHForCausalLM |
| NVIDIA | Nemotron-Flash | NemotronFlashForCausalLM |
| NVIDIA | Nemotron-Super | DeciLMForCausalLM |
| THUDM / Zhipu AI | ChatGLM | ChatGLMModel |
| THUDM / Zhipu AI | GLM-4 | GlmForCausalLM, Glm4ForCausalLM |
| THUDM / ZAI | GLM-4 MoE | Glm4MoeForCausalLM, Glm4MoeLiteForCausalLM |
| THUDM / ZAI | GLM-5 / GLM-5.1 | GlmMoeDsaForCausalLM |
| IBM | Granite | GraniteForCausalLM |
| IBM | Granite MoE | GraniteMoeForCausalLM, GraniteMoeSharedForCausalLM |
| IBM | Bamba | BambaForCausalLM |
| Allen AI | OLMo | OLMoForCausalLM |
| Allen AI | OLMo2 | OLMo2ForCausalLM |
| Allen AI | OLMoE | OLMoEForCausalLM |
| OpenAI | GPT-OSS | GptOssForCausalLM |
| EleutherAI | GPT-J | GPTJForCausalLM |
| EleutherAI | GPT-NeoX / Pythia | GPTNeoXForCausalLM |
| BigCode | StarCoder | GPTBigCodeForCausalLM |
| BigCode | StarCoder2 | Starcoder2ForCausalLM |
| BAAI | Aquila | AquilaForCausalLM |
| Baichuan Inc | Baichuan | BaiChuanForCausalLM |
| Cohere | Command-R | CohereForCausalLM, Cohere2ForCausalLM |
| TII | Falcon | FalconForCausalLM |
| LG AI Research | EXAONE | ExaoneForCausalLM |
| InternLM | InternLM | InternLMForCausalLM, InternLM2ForCausalLM, InternLM3ForCausalLM |
| Inception AI | Jais | JAISLMHeadModel |
| MiniMax | MiniMax-M2 | MiniMaxM2ForCausalLM |
| OpenBMB | MiniCPM | MiniCPMForCausalLM, MiniCPM3ForCausalLM |
| Moonshot AI | Moonlight | DeepseekV3ForCausalLM |
| ByteDance Seed | Seed | Qwen2ForCausalLM |
| Upstage | Solar | SolarForCausalLM |
| OrionStar | Orion | OrionForCausalLM |
| Stability AI | StableLM | StableLmForCausalLM |
| Stepfun AI | Step-3.5 | Step3p5ForCausalLM |
| Parasail AI | GritLM | GritLM |

Fine-Tuning LLMs with NeMo AutoModel

The models listed above can be fine-tuned using NeMo AutoModel. We support two primary fine-tuning approaches:

  1. Parameter-Efficient Fine-Tuning (PEFT): Updates only a small subset of parameters (typically <1%) using techniques like Low-Rank Adaptation (LoRA).

  2. Supervised Fine-Tuning (SFT): Updates all or most model parameters for deeper adaptation.
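To see why PEFT touches so few parameters, consider LoRA: a frozen weight matrix of shape d_out × d_in is augmented with a trainable low-rank product B·A of rank r, adding only r·(d_in + d_out) trainable parameters per adapted layer. A back-of-the-envelope calculation (pure Python with illustrative numbers, not the NeMo AutoModel API):

```python
# Back-of-the-envelope LoRA parameter count (illustrative sketch).
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of a d_out x d_in weight's parameters that LoRA trains."""
    base_params = d_out * d_in            # frozen pretrained weight W
    lora_params = rank * (d_in + d_out)   # trainable A (r x d_in) and B (d_out x r)
    return lora_params / base_params

# A 4096 x 4096 projection with rank-16 adapters:
frac = lora_trainable_fraction(4096, 4096, 16)
print(f"{frac:.2%}")  # → 0.78%, well under 1% of the layer's parameters
```

In practice the overall trainable fraction depends on which layers receive adapters and on the chosen rank, which is why PEFT recipes typically report figures below 1%.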

Please see our Fine-Tuning Guide to learn how to apply both methods to your data.

Tip

In these guides, we use the SQuAD v1.1 dataset for demonstration purposes, but you can use your own data. Update the recipe YAML dataset / validation_dataset sections accordingly. See LLM datasets and dataset overview.
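Swapping in your own data typically means editing entries along these lines (the keys, builder targets, and paths are illustrative; match them to your recipe’s actual schema):

```yaml
# Illustrative override -- replace the SQuAD defaults with your own data.
dataset:
  _target_: my_project.data.build_train_dataset   # hypothetical builder
  path: data/train.jsonl
validation_dataset:
  _target_: my_project.data.build_val_dataset     # hypothetical builder
  path: data/val.jsonl
```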