Omni Models
Omni models go beyond image-text understanding to support additional modalities such as audio and video. Some combine all four modalities (text, image, audio, and video) in a single unified model.
Run Omni Models with NeMo AutoModel
To run omni models with NeMo AutoModel, use NeMo container version 26.06.00 or later. If the model you want to fine-tune requires a newer version of Transformers than the one shipped in the container, you may need to upgrade Transformers inside the container.
For other installation options, see our NeMo AutoModel Installation Guide.
Supported Models
Fine-Tune Omni Models
All supported omni models can be fine-tuned with either full supervised fine-tuning (SFT) or parameter-efficient fine-tuning (PEFT) using LoRA. See the VLM Fine-Tuning Guide for general setup instructions.
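To make the difference between full SFT and LoRA concrete, the sketch below counts trainable parameters for a single weight matrix. It follows the standard LoRA formulation, in which a frozen weight W of shape d_out × d_in is augmented with a low-rank update B·A, where B is d_out × r and A is r × d_in, so only A and B are trained. The matrix size and rank are hypothetical illustration values, not settings taken from any NeMo AutoModel recipe, and the function names are illustrative only.

```python
def full_sft_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix W is updated (bias ignored)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter B @ A added to a frozen W.

    B has shape (d_out, rank) and A has shape (rank, d_in); bias terms are ignored.
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical projection size and rank, chosen only for illustration.
    d_out, d_in, rank = 4096, 4096, 16
    full = full_sft_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(f"full SFT: {full:,} trainable parameters")
    print(f"LoRA r={rank}: {lora:,} trainable parameters ({lora / full:.2%} of full)")
```

With these values, full SFT updates 16,777,216 parameters for this one matrix, while the LoRA adapter trains 131,072 (about 0.78%). This reduction in trainable parameters (and the associated optimizer state) is the main reason PEFT is often chosen when GPU memory is limited.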