# BAGEL

[BAGEL-7B-MoT](https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT) is a unified multimodal model from ByteDance Seed. It combines a Qwen2 language backbone, a SigLIP-NaViT vision encoder, and Mixture-of-Transformers (MoT) layers, so a single model can be trained jointly on multimodal understanding and visual generation.

|                  |                                                              |
| ---------------- | ------------------------------------------------------------ |
| **Task**         | Multimodal Input/Output                                      |
| **Architecture** | `BagelForUnifiedMultimodal`, `BagelForConditionalGeneration` |
| **Parameters**   | 14B (two 7B towers)                                          |
| **HF Org**       | [ByteDance-Seed](https://huggingface.co/ByteDance-Seed)      |

## Available Models

* **BAGEL-7B-MoT**

## Architecture

* `BagelForUnifiedMultimodal`
* `BagelForConditionalGeneration`

## Example HF Models

| Model        | HF ID                                                                               |
| ------------ | ----------------------------------------------------------------------------------- |
| BAGEL-7B-MoT | [`ByteDance-Seed/BAGEL-7B-MoT`](https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT) |

## Example Recipes

| Recipe                                                                                                                            | Dataset                            | Description                                               |
| --------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------------------------------------- |
| [bagel\_pretrain.yaml](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/multimodal_pretrain/bagel/bagel_pretrain.yaml) | BAGEL-style packed multimodal data | Joint text-understanding and image-generation pretraining |
| [bagel\_sft.yaml](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/multimodal_finetune/bagel/bagel_sft.yaml)           | BAGEL-style packed multimodal data | Joint understanding and image-generation fine-tuning      |

## Try with NeMo AutoModel

**1. Install** ([full instructions](/get-started/installation)):

```bash
pip install nemo-automodel
```

**2. Clone the repo** to get the example recipes:

```bash
git clone https://github.com/NVIDIA-NeMo/Automodel.git
cd Automodel
```

**3. Run the recipe** from inside the repo:

```bash
automodel --nproc-per-node=8 examples/multimodal_pretrain/bagel/bagel_pretrain.yaml
```
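The fine-tuning recipe listed under Example Recipes can be launched the same way. The sketch below reuses the launcher and flag from the command above and only swaps in the SFT recipe path. The `RECIPE` and `NPROC` variables and the file-existence guard are illustrative additions, not part of the `automodel` CLI, and the value `8` assumes one process per GPU on an 8-GPU node.

```bash
# Fine-tuning variant: same launcher, SFT recipe path from the Example Recipes table.
RECIPE=examples/multimodal_finetune/bagel/bagel_sft.yaml
NPROC=8  # assumption: one process per GPU on an 8-GPU node; adjust to your hardware

# Illustrative guard: report a clear message when not run from the repo root.
if [ -f "$RECIPE" ]; then
  automodel --nproc-per-node="$NPROC" "$RECIPE"
else
  echo "Recipe not found: $RECIPE (run this from the Automodel repo root)" >&2
fi
```

Keeping the recipe path in a variable makes it easy to switch between `bagel_pretrain.yaml` and `bagel_sft.yaml` without retyping the launcher command.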

## Hugging Face Model Cards

* [ByteDance-Seed/BAGEL-7B-MoT](https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT)