# Multimodal Models

## Introduction

The multimodal models in this section combine understanding and generation across text and visual modalities. These model families may require custom training recipes, packed multimodal datasets, or task-specific model wrappers that go beyond the standard image-text-to-text fine-tuning path.

## Supported Models

| Owner          | Model                                     | Architectures                                                |
| -------------- | ----------------------------------------- | ------------------------------------------------------------ |
| ByteDance Seed | [BAGEL](/model-coverage/multimodal/bagel) | `BagelForUnifiedMultimodal`, `BagelForConditionalGeneration` |