
# Omni Models

Omni models go beyond image-text understanding to support additional modalities such as audio and video. Some handle text, image, audio, and video together in a single unified model.

## Run Omni Models with NeMo AutoModel

To run omni models with NeMo AutoModel, use NeMo container version [`26.06.00`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel?version=26.06.00) or later. If the model you want to fine-tune requires a newer version of Transformers than the container provides, you may need to upgrade NeMo AutoModel from source:

```bash
pip3 install --upgrade git+https://github.com/NVIDIA-NeMo/AutoModel.git
```
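
Before upgrading, it can help to check which Transformers version is currently installed. The sketch below uses only the Python standard library. The `installed_version` helper is a hypothetical name introduced for illustration and is not part of NeMo AutoModel.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "transformers" is the PyPI distribution name of Hugging Face Transformers.
    found = installed_version("transformers")
    print(f"transformers: {found or 'not installed'}")
```

Compare the printed version against the requirement stated on the model card of the checkpoint you plan to fine-tune.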

For other installation options, see our [NeMo AutoModel Installation Guide](/get-started/installation).

## Supported Models

| Owner                | Model                                                      | Modalities                   | Architecture                          |
| -------------------- | ---------------------------------------------------------- | ---------------------------- | ------------------------------------- |
| Qwen / Alibaba Cloud | [Qwen3-Omni](/model-coverage/omni/qwen3-omni)              | Text · Image · Audio · Video | `Qwen3OmniForConditionalGeneration`   |
| Qwen / Alibaba Cloud | [Qwen2.5-Omni](/model-coverage/omni/qwen2-5-omni)          | Text · Image · Audio · Video | `Qwen2_5OmniForConditionalGeneration` |
| Microsoft            | [Phi-4-multimodal](/model-coverage/omni/phi-4-multimodal)  | Text · Image · Audio         | `Phi4MultimodalForCausalLM`           |
| NVIDIA               | [Nemotron-3-Nano-Omni](/model-coverage/omni/nemotron-omni) | Text · Image · Audio         | `NemotronH_Nano_Omni_Reasoning_V3`    |
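
A Hugging Face checkpoint's `config.json` typically lists its model class under the `architectures` key, so the table above can serve as a quick lookup. The following is a minimal sketch, not part of the NeMo AutoModel API: `SUPPORTED_OMNI` and `modalities_for` are hypothetical names, the mapping is copied from the table, and the example config is made up for illustration.

```python
from typing import Dict, Optional, Tuple

# Architecture class name -> supported modalities, copied from the table above.
SUPPORTED_OMNI: Dict[str, Tuple[str, ...]] = {
    "Qwen3OmniForConditionalGeneration": ("text", "image", "audio", "video"),
    "Qwen2_5OmniForConditionalGeneration": ("text", "image", "audio", "video"),
    "Phi4MultimodalForCausalLM": ("text", "image", "audio"),
    "NemotronH_Nano_Omni_Reasoning_V3": ("text", "image", "audio"),
}


def modalities_for(config: dict) -> Optional[Tuple[str, ...]]:
    """Return the modalities of the first listed architecture found in the table, or None."""
    for arch in config.get("architectures", []):
        if arch in SUPPORTED_OMNI:
            return SUPPORTED_OMNI[arch]
    return None


if __name__ == "__main__":
    # Illustrative config fragment; a real one would be read with json.load().
    example = {"architectures": ["Phi4MultimodalForCausalLM"]}
    print(modalities_for(example))
```

A `None` result only means the architecture is not in this table. It does not by itself say whether NeMo AutoModel can load the checkpoint.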

## Fine-Tune Omni Models

All supported omni models can be fine-tuned with either full supervised fine-tuning (SFT) or parameter-efficient fine-tuning (PEFT) using LoRA. See the [VLM Fine-Tuning Guide](/recipes-e2e-examples/gemma-3-3n) for general setup instructions.