Large Language Models (LLMs)


Introduction

Large Language Models (LLMs) power a variety of tasks such as dialogue systems, text classification, summarization, and more. NeMo AutoModel provides a simple interface for loading and fine-tuning LLMs hosted on the Hugging Face Hub.

Run LLMs with NeMo AutoModel

To run LLMs with NeMo AutoModel, make sure you’re using NeMo container version 26.04.00 or later. If the model you intend to fine-tune requires a newer version of Transformers, you may need to upgrade to the latest version of NeMo AutoModel by using:

$ pip3 install --upgrade git+https://github.com/NVIDIA-NeMo/AutoModel.git

For other installation options (e.g., uv), see the NeMo AutoModel Installation Guide.
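To decide whether an upgrade is needed, it can help to compare the installed Transformers version against the one a model requires. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical and not part of NeMo AutoModel, and the required version must come from the model card of the model you intend to fine-tune.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric parts of a version string, e.g. '4.57.0.dev0' -> (4, 57, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0' or '0rc1'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad so '4.56' compares like '4.56.0'
    return tuple(parts)


def is_at_least(installed: str, required: str) -> bool:
    """True if the installed version is the required version or newer."""
    return version_tuple(installed) >= version_tuple(required)


def transformers_is_recent_enough(required: str) -> bool:
    """Check the installed `transformers` package; False if it is not installed."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return is_at_least(installed, required)
```

If `transformers_is_recent_enough(...)` returns `False` for the version your model needs, upgrading as described above is the likely next step. The comparison ignores pre-release suffixes, so treat it as a rough check only.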

Supported Models

NeMo AutoModel supports models that load through AutoModelForCausalLM in the Text Generation category. During preprocessing, it uses transformers.AutoTokenizer, which is sufficient for most LLM use cases. If your model requires custom text handling, override the tokenizer in your recipe YAML or provide a custom dataset _target_. See the LLM datasets and dataset overview pages.

| Owner | Model Family | Architectures |
|---|---|---|
| Meta | Llama | LlamaForCausalLM |
| Google | Gemma | GemmaForCausalLM, Gemma2ForCausalLM, Gemma3ForCausalLM |
| Qwen / Alibaba Cloud | Qwen2 | Qwen2ForCausalLM |
| Qwen / Alibaba Cloud | Qwen2 MoE | Qwen2MoeForCausalLM |
| Qwen / Alibaba Cloud | Qwen3 | Qwen3ForCausalLM |
| Qwen / Alibaba Cloud | Qwen3 MoE | Qwen3MoeForCausalLM |
| Qwen / Alibaba Cloud | Qwen3-Next | Qwen3NextForCausalLM |
| DeepSeek | DeepSeek | DeepseekForCausalLM |
| DeepSeek | DeepSeek-V3 | DeepseekV3ForCausalLM, DeepseekV32ForCausalLM |
| DeepSeek | DeepSeek V4 Flash | DeepseekV4ForCausalLM |
| Mistral AI | Mistral | MistralForCausalLM |
| Mistral AI | Mixtral | MixtralForCausalLM |
| Mistral AI | Ministral3 / Devstral | Mistral3ForConditionalGeneration |
| Microsoft | Phi | PhiForCausalLM |
| Microsoft | Phi-3 / Phi-4 | Phi3ForCausalLM |
| Microsoft | Phi-3-Small | Phi3SmallForCausalLM |
| NVIDIA | Nemotron / Minitron | NemotronForCausalLM |
| NVIDIA | Nemotron-H | NemotronHForCausalLM |
| NVIDIA | Nemotron-Flash | NemotronFlashForCausalLM |
| NVIDIA | Nemotron-Super | DeciLMForCausalLM |
| THUDM / Zhipu AI | ChatGLM | ChatGLMModel |
| THUDM / Zhipu AI | GLM-4 | GlmForCausalLM, Glm4ForCausalLM |
| THUDM / ZAI | GLM-4 MoE | Glm4MoeForCausalLM, Glm4MoeLiteForCausalLM |
| THUDM / ZAI | GLM-5 / GLM-5.1 | GlmMoeDsaForCausalLM |
| IBM | Granite | GraniteForCausalLM |
| IBM | Granite MoE | GraniteMoeForCausalLM, GraniteMoeSharedForCausalLM |
| IBM | Bamba | BambaForCausalLM |
| Allen AI | OLMo | OLMoForCausalLM |
| Allen AI | OLMo2 | OLMo2ForCausalLM |
| Allen AI | OLMoE | OLMoEForCausalLM |
| OpenAI | GPT-OSS | GptOssForCausalLM |
| OpenAI | GPT-2 | GPT2LMHeadModel |
| EleutherAI | GPT-J | GPTJForCausalLM |
| EleutherAI | GPT-NeoX / Pythia | GPTNeoXForCausalLM |
| BigCode | StarCoder | GPTBigCodeForCausalLM |
| BigCode | StarCoder2 | Starcoder2ForCausalLM |
| BAAI | Aquila / Aquila2 | AquilaForCausalLM |
| Baichuan Inc | Baichuan / Baichuan2 | BaiChuanForCausalLM |
| Cohere | Command-R | CohereForCausalLM, Cohere2ForCausalLM |
| TII | Falcon | FalconForCausalLM |
| LG AI Research | EXAONE | ExaoneForCausalLM |
| InternLM | InternLM | InternLMForCausalLM, InternLM2ForCausalLM, InternLM3ForCausalLM |
| Inception AI | Jais | JAISLMHeadModel |
| MiniMax | MiniMax-M2 | MiniMaxM2ForCausalLM |
| OpenBMB | MiniCPM | MiniCPMForCausalLM, MiniCPM3ForCausalLM |
| Moonshot AI | Moonlight | DeepseekV3ForCausalLM |
| ByteDance Seed | Seed (ByteDance) | Qwen2ForCausalLM |
| Upstage | Solar Pro | SolarForCausalLM |
| OrionStar | Orion | OrionForCausalLM |
| Stability AI | StableLM | StableLmForCausalLM |
| Stepfun AI | Step-3.5 | Step3p5ForCausalLM |
| Parasail AI | GritLM | GritLM |
| Tencent | Hy3-preview | HYV3ForCausalLM |
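A Hugging Face checkpoint usually declares its architecture class in the `architectures` field of its `config.json`, so one quick way to see whether a model falls under the table above is to compare that field against the listed class names. The sketch below assumes this approach. The `is_listed` helper and the `SUPPORTED_ARCHITECTURES` set are hypothetical (not part of NeMo AutoModel), and the set holds only a few entries copied from the table.

```python
import json

# A handful of architecture names copied from the table above (not the full list).
SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM",
    "Qwen3ForCausalLM",
    "MistralForCausalLM",
    "Phi3ForCausalLM",
    "GptOssForCausalLM",
}


def is_listed(config_json: str) -> bool:
    """True if config.json declares at least one architecture and all of them are in the set."""
    config = json.loads(config_json)
    declared = config.get("architectures") or []
    return bool(declared) and all(name in SUPPORTED_ARCHITECTURES for name in declared)


# Hand-written stand-in for the contents of a checkpoint's config.json.
example_config = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'
print(is_listed(example_config))
```

A config that omits `architectures` is treated as not listed, which errs on the side of checking the model manually.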

Fine-Tuning LLMs with NeMo AutoModel

The models listed above can be fine-tuned using NeMo AutoModel. We support two primary fine-tuning approaches:

  1. Parameter-Efficient Fine-Tuning (PEFT): Updates only a small subset of parameters (typically <1%) using techniques like Low-Rank Adaptation (LoRA).
  2. Supervised Fine-Tuning (SFT): Updates all or most model parameters for deeper adaptation.
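To make the "typically <1%" figure concrete, the following sketch counts parameters for a single linear layer with a LoRA adapter. It assumes the standard LoRA formulation, in which a frozen weight of shape d_out x d_in is augmented by two trainable low-rank factors of shapes d_out x r and r x d_in. The layer size (4096 x 4096) and rank (8) are illustrative assumptions, not values taken from a NeMo AutoModel recipe.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of a layer's parameters that are trainable after adding a LoRA adapter."""
    frozen = d_in * d_out            # original weight, kept frozen under PEFT
    adapter = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return adapter / (frozen + adapter)


# Illustrative sizes only: a 4096 x 4096 projection with rank-8 adapters.
fraction = lora_trainable_fraction(d_in=4096, d_out=4096, rank=8)
print(f"{fraction:.2%}")  # → 0.39%
```

With SFT, by contrast, the whole 4096 x 4096 weight would be updated, which is why SFT needs more memory for gradients and optimizer state.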

See the Fine-Tuning Guide to learn how to apply both methods to your data.

In these guides, we use the SQuAD v1.1 dataset for demonstration purposes, but you can use your own data. To do so, update the dataset and validation_dataset sections of the recipe YAML accordingly. See the LLM datasets and dataset overview pages.
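When adapting your own data, it can be useful to see what a SQuAD-style record looks like once it is turned into text for a causal LM. The sketch below assumes the Hugging Face `datasets` layout of SQuAD v1.1 (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists). The function name, the prompt template, and the sample record are all made up for illustration and are not the formatting NeMo AutoModel applies.

```python
def squad_to_prompt_and_answer(record: dict) -> tuple:
    """Turn one SQuAD v1.1-style record into a (prompt, answer) pair."""
    prompt = (
        f"Context: {record['context']}\n"
        f"Question: {record['question']}\n"
        "Answer:"
    )
    # Use the first reference answer; the leading space separates it from "Answer:".
    answer = " " + record["answers"]["text"][0]
    return prompt, answer


# Hand-written sample in the SQuAD v1.1 field layout (not a real dataset entry).
sample = {
    "context": "NeMo AutoModel loads models hosted on the Hugging Face Hub.",
    "question": "Where are the models hosted?",
    "answers": {"text": ["the Hugging Face Hub"], "answer_start": [38]},
}

prompt, answer = squad_to_prompt_and_answer(sample)
print(prompt + answer)
```

A dataset with different field names would need its own mapping of this kind, typically inside the custom dataset function that the recipe's _target_ points to.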