Large Language Models (LLMs)

Introduction

Large Language Models (LLMs) power a variety of tasks such as dialogue systems, text classification, summarization, and more. NeMo AutoModel provides a simple interface for loading and fine-tuning LLMs hosted on the Hugging Face Hub.

Run LLMs with NeMo AutoModel

To run LLMs with NeMo AutoModel, make sure you’re using NeMo container version 25.11.00 or later. If the model you intend to fine-tune requires a newer version of Transformers, you may need to upgrade NeMo AutoModel to the latest version:

pip3 install --upgrade git+https://github.com/NVIDIA-NeMo/AutoModel.git

For other installation options (e.g., uv), please see our Installation Guide.

Supported Models

NeMo AutoModel supports models in Hugging Face’s Text Generation category through the AutoModelForCausalLM interface. During preprocessing, it uses transformers.AutoTokenizer, which covers most LLM cases. If your model requires custom text handling, override the tokenizer in your recipe YAML or provide a custom dataset _target_. See LLM datasets and dataset overview.
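For example, a tokenizer or dataset override in a recipe YAML might look like the following sketch (the keys and the custom builder are illustrative; match them to your recipe’s actual schema):

```yaml
# Illustrative recipe fragment -- exact keys depend on your recipe schema.
tokenizer:
  _target_: transformers.AutoTokenizer.from_pretrained
  pretrained_model_name_or_path: meta-llama/Llama-3.2-1B

dataset:
  _target_: my_project.data.build_dataset  # hypothetical custom dataset builder
```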

| Owner | Model Family | Architectures |
| --- | --- | --- |
| Meta | Llama | LlamaForCausalLM |
| Google | Gemma | GemmaForCausalLM, Gemma2ForCausalLM, Gemma3ForCausalLM |
| Qwen / Alibaba Cloud | Qwen2 | Qwen2ForCausalLM |
| Qwen / Alibaba Cloud | Qwen2 MoE | Qwen2MoeForCausalLM |
| Qwen / Alibaba Cloud | Qwen3 | Qwen3ForCausalLM |
| Qwen / Alibaba Cloud | Qwen3 MoE | Qwen3MoeForCausalLM |
| Qwen / Alibaba Cloud | Qwen3-Next | Qwen3NextForCausalLM |
| DeepSeek | DeepSeek | DeepseekForCausalLM |
| DeepSeek | DeepSeek-V3 | DeepseekV3ForCausalLM, DeepseekV32ForCausalLM |
| Mistral AI | Mistral | MistralForCausalLM |
| Mistral AI | Mixtral | MixtralForCausalLM |
| Mistral AI | Ministral3 / Devstral | Mistral3ForConditionalGeneration |
| Microsoft | Phi | PhiForCausalLM |
| Microsoft | Phi-3 / Phi-4 | Phi3ForCausalLM |
| Microsoft | Phi-3-Small | Phi3SmallForCausalLM |
| NVIDIA | Nemotron | NemotronForCausalLM |
| NVIDIA | Nemotron-H | NemotronHForCausalLM |
| NVIDIA | Nemotron-Flash | NemotronFlashForCausalLM |
| NVIDIA | Nemotron-Super | DeciLMForCausalLM |
| THUDM / Zhipu AI | ChatGLM | ChatGLMModel |
| THUDM / Zhipu AI | GLM-4 | GlmForCausalLM, Glm4ForCausalLM |
| THUDM / ZAI | GLM-4 MoE | Glm4MoeForCausalLM, Glm4MoeLiteForCausalLM |
| THUDM / ZAI | GLM-5 / GLM-5.1 | GlmMoeDsaForCausalLM |
| IBM | Granite | GraniteForCausalLM |
| IBM | Granite MoE | GraniteMoeForCausalLM, GraniteMoeSharedForCausalLM |
| IBM | Bamba | BambaForCausalLM |
| Allen AI | OLMo | OLMoForCausalLM |
| Allen AI | OLMo2 | OLMo2ForCausalLM |
| Allen AI | OLMoE | OLMoEForCausalLM |
| OpenAI | GPT-OSS | GptOssForCausalLM |
| EleutherAI | GPT-J | GPTJForCausalLM |
| EleutherAI | GPT-NeoX / Pythia | GPTNeoXForCausalLM |
| BigCode | StarCoder | GPTBigCodeForCausalLM |
| BigCode | StarCoder2 | Starcoder2ForCausalLM |
| BAAI | Aquila | AquilaForCausalLM |
| Baichuan Inc | Baichuan | BaiChuanForCausalLM |
| Cohere | Command-R | CohereForCausalLM, Cohere2ForCausalLM |
| TII | Falcon | FalconForCausalLM |
| LG AI Research | EXAONE | ExaoneForCausalLM |
| InternLM | InternLM | InternLMForCausalLM, InternLM2ForCausalLM, InternLM3ForCausalLM |
| Inception AI | Jais | JAISLMHeadModel |
| MiniMax | MiniMax-M2 | MiniMaxM2ForCausalLM |
| OpenBMB | MiniCPM | MiniCPMForCausalLM, MiniCPM3ForCausalLM |
| Moonshot AI | Moonlight | DeepseekV3ForCausalLM |
| ByteDance Seed | Seed | Qwen2ForCausalLM |
| Upstage | Solar | SolarForCausalLM |
| OrionStar | Orion | OrionForCausalLM |
| Stability AI | StableLM | StableLmForCausalLM |
| Stepfun AI | Step-3.5 | Step3p5ForCausalLM |
| Parasail AI | GritLM | GritLM |

Fine-Tuning LLMs with NeMo AutoModel

The models listed above can be fine-tuned using NeMo AutoModel. We support two primary fine-tuning approaches:

  1. Parameter-Efficient Fine-Tuning (PEFT): Updates only a small subset of parameters (typically <1%) using techniques like Low-Rank Adaptation (LoRA).

  2. Supervised Fine-Tuning (SFT): Updates all or most model parameters for deeper adaptation.
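To see why PEFT touches so few parameters, consider LoRA: a frozen weight matrix of shape d_out × d_in is augmented with a trainable low-rank product B·A of rank r, adding only r·(d_in + d_out) trainable parameters per adapted layer. A back-of-the-envelope calculation (pure Python with illustrative numbers, not the NeMo AutoModel API):

```python
# Back-of-the-envelope LoRA parameter count (illustrative sketch).
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of a d_out x d_in weight's parameters that LoRA trains."""
    base_params = d_out * d_in            # frozen pretrained weight W
    lora_params = rank * (d_in + d_out)   # trainable A (r x d_in) and B (d_out x r)
    return lora_params / base_params

# A 4096 x 4096 projection with rank-16 adapters:
frac = lora_trainable_fraction(4096, 4096, 16)
print(f"{frac:.2%}")  # → 0.78%, well under 1% of the layer's parameters
```

In practice the overall trainable fraction depends on which layers receive adapters and on the chosen rank, which is why PEFT recipes typically report figures below 1%.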

Please see our Fine-Tuning Guide to learn how to apply both methods to your data.

Tip

In these guides, we use the SQuAD v1.1 dataset for demonstration purposes, but you can use your own data. Update the recipe YAML dataset / validation_dataset sections accordingly. See LLM datasets and dataset overview.
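Swapping in your own data typically means editing entries along these lines (the keys, builder targets, and paths are illustrative; match them to your recipe’s actual schema):

```yaml
# Illustrative override -- replace the SQuAD defaults with your own data.
dataset:
  _target_: my_project.data.build_train_dataset   # hypothetical builder
  path: data/train.jsonl
validation_dataset:
  _target_: my_project.data.build_val_dataset     # hypothetical builder
  path: data/val.jsonl
```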