
NeMo AutoModel

NeMo AutoModel enables training and fine-tuning of models that are accessible through the Hugging Face Transformers AutoModel classes. Specifically, it supports models loaded through classes such as:

AutoModelForCausalLM
AutoModelForImageTextToText
AutoModelForSpeechSeq2Seq

These classes cover Large Language Models (LLMs), Vision Language Models (VLMs), and Automatic Speech Recognition (ASR) models, respectively.
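As a rough illustration of how the three class names above line up with the LLM, VLM, and ASR families, the sketch below resolves a family to its Hugging Face Transformers Auto class and loads a checkpoint with `from_pretrained`. It uses only the Hugging Face Transformers API, not NeMo AutoModel's own interface; the names `AUTO_CLASS_BY_FAMILY`, `resolve_auto_class_name`, and `load_pretrained` are hypothetical helpers introduced for this example, and it assumes a `transformers` release recent enough to ship all three classes.

```python
# Hypothetical mapping (not part of NeMo AutoModel): model family -> name of
# the Hugging Face Transformers Auto class listed on this page.
AUTO_CLASS_BY_FAMILY = {
    "llm": "AutoModelForCausalLM",
    "vlm": "AutoModelForImageTextToText",
    "asr": "AutoModelForSpeechSeq2Seq",
}


def resolve_auto_class_name(family: str) -> str:
    """Return the Auto class name for a model family ('llm', 'vlm', 'asr')."""
    try:
        return AUTO_CLASS_BY_FAMILY[family.lower()]
    except KeyError:
        known = ", ".join(sorted(AUTO_CLASS_BY_FAMILY))
        raise ValueError(
            f"Unknown model family {family!r}; expected one of: {known}"
        ) from None


def load_pretrained(family: str, model_id: str):
    """Load a checkpoint with the Auto class that matches the family.

    Assumes the `transformers` package is installed. The import is deferred
    so the mapping above can be used without it.
    """
    import transformers

    auto_cls = getattr(transformers, resolve_auto_class_name(family))
    # from_pretrained accepts a Hub model ID or a local checkpoint directory.
    return auto_cls.from_pretrained(model_id)
```

Calling `load_pretrained("llm", model_id)` with a causal language model ID from the Hugging Face Hub (or a local checkpoint path) would return the loaded model; the same call with `"vlm"` or `"asr"` selects the corresponding class.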

For distributed training, NeMo AutoModel integrates with Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP2), so training can scale across multiple GPUs and nodes.

For NeMo AutoModel tutorials, see the “Getting Started” section below.

For more information, browse the developer documentation for your area of interest in the contents section below or on the left sidebar.

AutoModel Code Documentation

AutoModel Data Documentation

HFDatasetDataModule

AutoModel Callbacks Documentation

JitTransform Class