Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

NeMo AutoModel

NeMo AutoModel enables training and fine-tuning of models that can be loaded through the Hugging Face Transformers AutoModel classes. Specifically, it supports classes such as:

AutoModelForCausalLM
AutoModelForImageTextToText
AutoModelForSpeechSeq2Seq

These classes cover Large Language Models (LLMs), Vision Language Models (VLMs), and Automatic Speech Recognition (ASR) models, respectively.
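As a rough illustration of how the three model families line up with the classes listed above, the following minimal sketch maps a domain label to the name of a Hugging Face Transformers class. The helper names (`DOMAIN_TO_AUTO_CLASS`, `auto_class_name`, `load_auto_class`) and the short domain labels are hypothetical and are not part of the NeMo API. The pairing follows the order of the two lists above. The `transformers` package is imported lazily, so the lookup itself runs without it installed.

```python
import importlib

# Hypothetical mapping for illustration only; the pairing is inferred from
# the order of the class list and the domain list in this page.
DOMAIN_TO_AUTO_CLASS = {
    "llm": "AutoModelForCausalLM",
    "vlm": "AutoModelForImageTextToText",
    "asr": "AutoModelForSpeechSeq2Seq",
}


def auto_class_name(domain: str) -> str:
    """Return the Transformers AutoModel class name for a domain label."""
    key = domain.strip().lower()
    if key not in DOMAIN_TO_AUTO_CLASS:
        supported = ", ".join(sorted(DOMAIN_TO_AUTO_CLASS))
        raise ValueError(f"Unknown domain {domain!r}; expected one of: {supported}")
    return DOMAIN_TO_AUTO_CLASS[key]


def load_auto_class(domain: str):
    """Import `transformers` lazily and return the matching AutoModel class.

    Requires the `transformers` package (a recent version for the
    image-text-to-text class); nothing is downloaded by this call.
    """
    transformers = importlib.import_module("transformers")
    return getattr(transformers, auto_class_name(domain))


if __name__ == "__main__":
    for label in DOMAIN_TO_AUTO_CLASS:
        print(label, "->", auto_class_name(label))
```

With `transformers` installed, `load_auto_class("llm").from_pretrained(...)` would load a causal language model checkpoint in the usual Hugging Face way. How NeMo AutoModel wraps these classes for training is covered in the developer documentation linked below.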

For distributed training, NeMo AutoModel integrates with Distributed Data Parallel (DDP) and Fully Sharded Data Parallel 2 (FSDP2), so that training can scale efficiently across multiple GPUs and nodes.

To access tutorials about NeMo AutoModel, see the “Getting Started” section below.

For more information, browse the developer documentation for your area of interest in the contents section below or on the left sidebar.

AutoModel Code Documentation

AutoModel Data Documentation

HFDatasetDataModule

AutoModel Callbacks Documentation

JitTransform Class