`nemo_automodel.components.models.bagel.configuration`#

Configuration for the BAGEL mixed-modal LLM.

The config wraps a text config and a vision config. The flags visual_und / visual_gen gate which paths are built at init time:

Stage 1 (understanding-only): visual_und=True, visual_gen=False. Only the ViT + connector + LM path is active.
Stage 2 (joint): visual_gen=True. Activates the MoT *_moe_gen parameter siblings, the VAE encode path, and the flow-matching head.
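
The gating described above can be sketched as a small helper. This is an illustration only, not the BAGEL implementation: `active_components` and the component labels are hypothetical names, and the sketch assumes the language model is always built regardless of the flags.

```python
from typing import List


def active_components(visual_und: bool = True, visual_gen: bool = False) -> List[str]:
    """Return illustrative labels for the paths built under the given flags."""
    # Assumption: the LM backbone is present in every stage.
    components = ["lm"]
    if visual_und:
        # Understanding path: ViT encoder plus the connector into the LM.
        components += ["vit", "connector"]
    if visual_gen:
        # Generation path: MoT generation-side parameter siblings,
        # the VAE encode path, and the flow-matching head.
        components += ["moe_gen_params", "vae_encode", "flow_matching_head"]
    return components


stage1 = active_components(visual_und=True, visual_gen=False)
stage2 = active_components(visual_und=True, visual_gen=True)
print(stage1)
print(stage2)
```

With the defaults from the `BagelConfig` signature (visual_und=True, visual_gen=False), the helper yields the Stage 1 set.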

The checkpoint config names the nested configs llm_config / vit_config. AutoModel (AM) prefers text_config / vision_config to match the rest of the VLM tree. Both sets of keys are accepted on input, and both attributes are exposed on the instance, so that:

BagelConfig.from_pretrained("ByteDance-Seed/BAGEL-7B-MoT") works against the checkpoint config.json.
AM-native YAML (_target_: nemo_automodel...BagelConfig) can use the AM-flavored key names without surprises.
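
One way to accept both key spellings on input is sketched below. The helper `pick_sub_config` is hypothetical, the dictionary values are placeholders, and the precedence (AM-style key first, checkpoint-style key as fallback) is an assumption; the source does not say which key wins if both are supplied.

```python
from typing import Any, Dict, Optional


def pick_sub_config(raw: Dict[str, Any], am_key: str, checkpoint_key: str) -> Optional[Any]:
    """Return the nested config stored under either key name, or None if neither is set."""
    value = raw.get(am_key)
    if value is None:
        # Assumed fallback order: checkpoint-style key is used only when
        # the AM-style key is absent.
        value = raw.get(checkpoint_key)
    return value


# Placeholder values, not taken from any real checkpoint.
checkpoint_style = {"llm_config": {"hidden_size": 8}, "vit_config": {"patch_size": 2}}
am_style = {"text_config": {"hidden_size": 8}, "vision_config": {"patch_size": 2}}

text_from_ckpt = pick_sub_config(checkpoint_style, "text_config", "llm_config")
text_from_am = pick_sub_config(am_style, "text_config", "llm_config")
print(text_from_ckpt == text_from_am)
```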

Module Contents#

Classes#

`BagelConfig`	Top-level BAGEL config.

Functions#

`_coerce_text_config`	Coerce `cfg` into a `Qwen2Config` with BAGEL’s extra attributes set.
`_coerce_vision_config`	Coerce `cfg` into a `SiglipVisionConfig` (our `rope`-flag variant).

API#

nemo_automodel.components.models.bagel.configuration._coerce_text_config( cfg: Union[Dict[str, Any], transformers.Qwen2Config, None], ) → transformers.Qwen2Config#

Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.

BAGEL adds three attributes to Qwen2Config that aren’t part of stock transformers:

qk_norm (bool, default True for BAGEL-7B-MoT)
layer_module ("Qwen2DecoderLayer" or "Qwen2MoTDecoderLayer")
freeze_und (bool, default False)

We also ensure pad_token_id is populated. Some checkpoint configs omit it, and transformers 5.x raises AttributeError on missing config attrs.
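
The defaulting behaviour described above might look roughly like the following. This is a sketch under stated assumptions, not the module's code: it uses a `SimpleNamespace` as a stand-in for `Qwen2Config`, handles only dict or None input, borrows the pad_token_id fallback from the `BagelConfig` signature default, and picks "Qwen2MoTDecoderLayer" as the layer_module default purely for illustration.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

ALLOWED_LAYER_MODULES = ("Qwen2DecoderLayer", "Qwen2MoTDecoderLayer")
ASSUMED_PAD_TOKEN_ID = 151643  # taken from the BagelConfig signature default


def coerce_text_config_sketch(cfg: Optional[Dict[str, Any]] = None) -> SimpleNamespace:
    """Fill in BAGEL's extra text-config attributes without overriding explicit values."""
    values = dict(cfg or {})
    values.setdefault("qk_norm", True)
    # Assumption: the default layer_module is not stated in the source.
    values.setdefault("layer_module", "Qwen2MoTDecoderLayer")
    values.setdefault("freeze_und", False)
    # Populate pad_token_id when it is missing or explicitly None, so later
    # attribute access does not fail.
    if values.get("pad_token_id") is None:
        values["pad_token_id"] = ASSUMED_PAD_TOKEN_ID
    if values["layer_module"] not in ALLOWED_LAYER_MODULES:
        raise ValueError(f"unsupported layer_module: {values['layer_module']!r}")
    return SimpleNamespace(**values)


text_cfg = coerce_text_config_sketch({"hidden_size": 8, "freeze_und": True})
print(text_cfg.qk_norm, text_cfg.freeze_und, text_cfg.pad_token_id)
```

Using `setdefault` keeps any value the caller supplied, which matches the idea of coercing a config rather than overwriting it.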

nemo_automodel.components.models.bagel.configuration._coerce_vision_config( cfg: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None], ) → Optional[nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig]#

Coerce cfg into a SiglipVisionConfig (our rope-flag variant).

class nemo_automodel.components.models.bagel.configuration.BagelConfig(
    vision_config: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
    text_config: Union[Dict[str, Any], transformers.Qwen2Config, None] = None,
    *,
    visual_und: bool = True,
    visual_gen: bool = False,
    stage: Union[int, str, None] = None,
    llm_path: str = '',
    vit_path: str = '',
    vae_path: str = '',
    max_latent_size: int = 64,
    latent_patch_size: int = 2,
    vit_patch_size: int = 14,
    vit_max_num_patch_per_side: int = 70,
    connector_act: str = 'gelu_pytorch_tanh',
    interpolate_pos: bool = False,
    vit_select_layer: int = -2,
    vit_rope: bool = False,
    text_cond_dropout_prob: float = 0.1,
    vae_cond_dropout_prob: float = 0.3,
    vit_cond_dropout_prob: float = 0.3,
    timestep_shift: float = 1.0,
    pad_token_id: int = 151643,
    llm_config: Union[Dict[str, Any], transformers.Qwen2Config, None] = None,
    vit_config: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
    vae_config: Union[Dict[str, Any], None] = None,
    **kwargs: Any,
)#

Bases: transformers.configuration_utils.PretrainedConfig

Top-level BAGEL config.

The text and vision sub-configs are nested PretrainedConfig instances (not bare dicts) so callers can introspect them the same way they would with any other HF config.

Attribute aliases:

text_config <-> llm_config (both point at the same object)
vision_config <-> vit_config (ditto)
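
The alias pairs listed above can be illustrated with read-only properties that return the same underlying object. `AliasedConfigSketch` is a hypothetical stand-in, not `BagelConfig` itself; it assumes text_config / vision_config are the stored attributes and llm_config / vit_config are properties over them, and it uses a plain dict as a placeholder sub-config.

```python
from typing import Any, Optional


class AliasedConfigSketch:
    """Minimal stand-in showing two names bound to one sub-config object."""

    def __init__(self, text_config: Optional[Any] = None, vision_config: Optional[Any] = None):
        self.text_config = text_config
        self.vision_config = vision_config

    @property
    def llm_config(self) -> Optional[Any]:
        # Checkpoint-style name; returns the same object as text_config.
        return self.text_config

    @property
    def vit_config(self) -> Optional[Any]:
        # Checkpoint-style name; returns the same object as vision_config.
        return self.vision_config


cfg = AliasedConfigSketch(text_config={"hidden_size": 8})
cfg.text_config["hidden_size"] = 16  # a change made through one name...
print(cfg.llm_config["hidden_size"])  # ...is visible through the other
```

Because both names resolve to one object, there is no second copy that could drift out of sync.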

Initialization

model_type#: 'bagel'

property llm_config: transformers.Qwen2Config#

property vit_config: Optional[nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig]#

to_dict() → Dict[str, Any]#
