# nemo_automodel.components.models.bagel.configuration

Configuration for the BAGEL mixed-modal LLM.

The config wraps a text config and a vision config. The flags
`visual_und` / `visual_gen` gate which paths are built at init time:

* Stage 1 (understanding-only): `visual_und=True`, `visual_gen=False`.
  Only the ViT + connector + LM path is active.
* Stage 2 (joint): `visual_gen=True`. Activates the MoT `*_moe_gen`
  parameter siblings, the VAE encode path, and the flow-matching head.
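
The two stages above can be summarized as a small lookup. This is an illustrative sketch only: `StageFlags` and `flags_for_stage` are hypothetical names, not part of `nemo_automodel`, and the sketch assumes Stage 2 keeps `visual_und` at its default of `True`, which the description above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageFlags:
    """Hypothetical container for the two gating flags."""

    visual_und: bool
    visual_gen: bool


def flags_for_stage(stage: int) -> StageFlags:
    """Return the flag combination described for each training stage."""
    if stage == 1:
        # Understanding-only: only the ViT + connector + LM path is built.
        return StageFlags(visual_und=True, visual_gen=False)
    if stage == 2:
        # Joint: also builds the MoT `*_moe_gen` siblings, the VAE encode
        # path, and the flow-matching head.
        # Assumption: visual_und stays at its default of True.
        return StageFlags(visual_und=True, visual_gen=True)
    raise ValueError(f"unknown stage: {stage!r}")


print(flags_for_stage(1))
print(flags_for_stage(2))
```

How the real `stage` argument of `BagelConfig` interacts with the two flags is not documented here, so treat the mapping as a reading aid rather than the library's behavior.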

The checkpoint config names the nested configs `llm_config` / `vit_config`.
AutoModel (AM) prefers `text_config` / `vision_config` to match the rest of the
VLM tree. Both sets of keys are accepted on input, and both attributes are
exposed on the instance, so that:

* `BagelConfig.from_pretrained("ByteDance-Seed/BAGEL-7B-MoT")` works
  against the checkpoint `config.json`.
* AM-native YAML (`_target_: nemo_automodel...BagelConfig`) can use the
  AM-flavored key names without surprises.
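
Accepting both key sets on input can be pictured as a fallback lookup. The sketch below is not the library's code: `pick_alias` is a hypothetical helper, the `hidden_size` values are toy data, and the choice that the AM-style key wins when both are present is an assumption.

```python
from typing import Any, Dict, Optional


def pick_alias(kwargs: Dict[str, Any], preferred: str, legacy: str) -> Optional[Any]:
    """Return the value under `preferred`, falling back to `legacy`."""
    value = kwargs.get(preferred)
    if value is None:
        # Assumption: the checkpoint-style key is only used as a fallback.
        value = kwargs.get(legacy)
    return value


# Checkpoint-style keys, as in the upstream `config.json` layout (toy values).
checkpoint_style = {"llm_config": {"hidden_size": 8}, "vit_config": {"hidden_size": 4}}
# AM-style keys carrying the same toy values.
am_style = {"text_config": {"hidden_size": 8}, "vision_config": {"hidden_size": 4}}

for raw in (checkpoint_style, am_style):
    text = pick_alias(raw, "text_config", "llm_config")
    vision = pick_alias(raw, "vision_config", "vit_config")
    print(text, vision)
```

Both inputs resolve to the same pair of sub-config dicts, which is the property the two bullets above rely on.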

## Module Contents

### Classes

| Name                                                                               | Description             |
| ---------------------------------------------------------------------------------- | ----------------------- |
| [`BagelConfig`](#nemo_automodel-components-models-bagel-configuration-BagelConfig) | Top-level BAGEL config. |

### Functions

| Name                                                                                                   | Description                                                          |
| ------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------- |
| [`_coerce_text_config`](#nemo_automodel-components-models-bagel-configuration-_coerce_text_config)     | Coerce `cfg` into a `Qwen2Config` with BAGEL's extra attributes set. |
| [`_coerce_vision_config`](#nemo_automodel-components-models-bagel-configuration-_coerce_vision_config) | Coerce `cfg` into a `SiglipVisionConfig` (our `rope`-flag variant).  |

### API

```python
class nemo_automodel.components.models.bagel.configuration.BagelConfig(
    vision_config: typing.Union[typing.Dict[str, typing.Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
    text_config: typing.Union[typing.Dict[str, typing.Any], transformers.Qwen2Config, None] = None,
    visual_und: bool = True,
    visual_gen: bool = False,
    stage: typing.Union[int, str, None] = None,
    llm_path: str = '',
    vit_path: str = '',
    vae_path: str = '',
    max_latent_size: int = 64,
    latent_patch_size: int = 2,
    vit_patch_size: int = 14,
    vit_max_num_patch_per_side: int = 70,
    connector_act: str = 'gelu_pytorch_tanh',
    interpolate_pos: bool = False,
    vit_select_layer: int = -2,
    vit_rope: bool = False,
    text_cond_dropout_prob: float = 0.1,
    vae_cond_dropout_prob: float = 0.3,
    vit_cond_dropout_prob: float = 0.3,
    timestep_shift: float = 1.0,
    pad_token_id: int = 151643,
    llm_config: typing.Union[typing.Dict[str, typing.Any], transformers.Qwen2Config, None] = None,
    vit_config: typing.Union[typing.Dict[str, typing.Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
    vae_config: typing.Union[typing.Dict[str, typing.Any], None] = None,
    **kwargs: typing.Any
)
```

**Bases:** `PretrainedConfig`

Top-level BAGEL config.

The text and vision sub-configs are nested `PretrainedConfig`
instances (not bare dicts) so callers can introspect them the same way
they would with any other HF config.

Attribute aliases:

* `text_config` \<-> `llm_config` (both point at the same object)
* `vision_config` \<-> `vit_config` (ditto)
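
One common way to make two attribute names point at the same object is a pair of properties. The class below is a toy stand-in written for illustration, assuming a property-based approach; `AliasedConfig` is a hypothetical name and this is not necessarily how `BagelConfig` implements its aliases.

```python
class AliasedConfig:
    """Toy stand-in exposing two names per sub-config."""

    def __init__(self, text_config, vision_config):
        self.text_config = text_config
        self.vision_config = vision_config

    @property
    def llm_config(self):
        # Reads go through to the canonical attribute.
        return self.text_config

    @llm_config.setter
    def llm_config(self, value):
        # Writes update the canonical attribute, so both names stay in sync.
        self.text_config = value

    @property
    def vit_config(self):
        return self.vision_config

    @vit_config.setter
    def vit_config(self, value):
        self.vision_config = value


cfg = AliasedConfig(text_config={"hidden_size": 8}, vision_config={"hidden_size": 4})
assert cfg.llm_config is cfg.text_config  # same object, not a copy
cfg.vit_config = {"hidden_size": 16}
print(cfg.vision_config)
```

Because only one attribute is stored per sub-config, an update through either name is visible through the other.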

```python
nemo_automodel.components.models.bagel.configuration.BagelConfig.to_dict() -> typing.Dict[str, typing.Any]
```

```python
nemo_automodel.components.models.bagel.configuration._coerce_text_config(
    cfg: typing.Union[typing.Dict[str, typing.Any], transformers.Qwen2Config, None]
) -> transformers.Qwen2Config
```

Coerce `cfg` into a `Qwen2Config` with BAGEL's extra attributes set.

BAGEL adds three attributes to Qwen2Config that aren't part of stock
transformers:

* `qk_norm` (bool, default True for BAGEL-7B-MoT)
* `layer_module` (`"Qwen2DecoderLayer"` or `"Qwen2MoTDecoderLayer"`)
* `freeze_und` (bool, default False)

We also ensure `pad_token_id` is populated. Some checkpoint configs omit
it, and transformers 5.x raises `AttributeError` on missing config attrs.
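
The defaulting described above can be sketched without importing `transformers`. Everything here is a simplified assumption: `coerce_text_sketch` is a hypothetical helper, `SimpleNamespace` stands in for `Qwen2Config`, the `pad_token_id` fallback reuses the `BagelConfig` signature default of `151643`, and `layer_module` is left as supplied because no default is documented for it.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

# Assumed defaults for this sketch: `qk_norm` and `freeze_und` follow the list
# above; `pad_token_id` borrows the top-level `BagelConfig` default.
_EXTRA_DEFAULTS = {"qk_norm": True, "freeze_und": False, "pad_token_id": 151643}


def coerce_text_sketch(cfg: Optional[Dict[str, Any]]) -> SimpleNamespace:
    """Fill in BAGEL's extra text-config attributes when they are missing."""
    values = dict(cfg or {})  # copy so the caller's dict is not mutated
    for name, default in _EXTRA_DEFAULTS.items():
        # Only fill gaps; explicit values such as qk_norm=False are kept.
        if values.get(name) is None:
            values[name] = default
    return SimpleNamespace(**values)


text_cfg = coerce_text_sketch({"layer_module": "Qwen2MoTDecoderLayer"})
print(text_cfg.qk_norm, text_cfg.freeze_und, text_cfg.pad_token_id)
```

Populating `pad_token_id` up front means later attribute access cannot fail on a checkpoint config that omitted it.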

```python
nemo_automodel.components.models.bagel.configuration._coerce_vision_config(
    cfg: typing.Union[typing.Dict[str, typing.Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None]
) -> typing.Optional[nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig]
```

Coerce `cfg` into a `SiglipVisionConfig` (our `rope`-flag variant).