nemo_automodel.components.models.bagel.configuration

Configuration for the BAGEL mixed-modal LLM.

The config wraps a text config and a vision config. The flags visual_und / visual_gen gate which paths are built at init time:

  • Stage 1 (understanding-only): visual_und=True, visual_gen=False. Only the ViT + connector + LM path is active.

  • Stage 2 (joint): visual_gen=True. This additionally activates the MoT *_moe_gen sibling parameters, the VAE encode path, and the flow-matching head.
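A minimal sketch of how the two documented stages map onto the flags. The helper flags_for_stage is hypothetical and not part of this module. Keeping visual_und=True in Stage 2 is an assumption based on the word "joint", and how the real stage argument of BagelConfig is parsed is not documented here.

```python
from typing import Dict, Union


def flags_for_stage(stage: Union[int, str]) -> Dict[str, bool]:
    """Hypothetical helper: map a BAGEL training stage to its path flags."""
    # Accepting both 1 and "1" is an assumption about the stage argument.
    s = int(stage)
    if s == 1:
        # Understanding-only: ViT + connector + LM path.
        return {"visual_und": True, "visual_gen": False}
    if s == 2:
        # Joint: generation path on; understanding assumed to stay on.
        return {"visual_und": True, "visual_gen": True}
    raise ValueError(f"unknown BAGEL stage: {stage!r}")


print(flags_for_stage(1))  # {'visual_und': True, 'visual_gen': False}
```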

The checkpoint config names the nested configs llm_config / vit_config. NeMo AutoModel (AM) prefers text_config / vision_config to match the rest of its VLM tree. We accept both sets of keys on input and expose both attributes on the instance so that:

  • BagelConfig.from_pretrained("ByteDance-Seed/BAGEL-7B-MoT") works against the checkpoint config.json.

  • AM-native YAML (_target_: nemo_automodel...BagelConfig) can use the AM-flavored key names without surprises.
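A minimal sketch of accepting both key spellings on input by renaming the checkpoint names to the AM names before construction. The function normalize_subconfig_keys and its behavior on conflicting duplicates (raising ValueError) are hypothetical; the real BagelConfig may resolve such conflicts differently.

```python
from typing import Any, Dict

# Checkpoint key name -> AM-flavored key name (pairs taken from the text above).
_KEY_ALIASES = {"llm_config": "text_config", "vit_config": "vision_config"}


def normalize_subconfig_keys(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: return a copy of raw with checkpoint names renamed to AM names."""
    out = dict(raw)
    for ckpt_key, am_key in _KEY_ALIASES.items():
        if ckpt_key in out:
            # Rejecting a non-empty value under both names is an assumption.
            if out.get(am_key) is not None:
                raise ValueError(f"both {ckpt_key!r} and {am_key!r} given")
            out[am_key] = out.pop(ckpt_key)
    return out


checkpoint_style = {"llm_config": {"qk_norm": True}, "visual_gen": True}
print(normalize_subconfig_keys(checkpoint_style))
# {'visual_gen': True, 'text_config': {'qk_norm': True}}
```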

Module Contents

Classes

BagelConfig

Top-level BAGEL config.

Functions

_coerce_text_config

Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.

_coerce_vision_config

Coerce cfg into a SiglipVisionConfig (our rope-flag variant).

API

nemo_automodel.components.models.bagel.configuration._coerce_text_config(
cfg: Union[Dict[str, Any], transformers.Qwen2Config, None],
) → transformers.Qwen2Config

Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.

BAGEL adds three attributes to Qwen2Config that aren’t part of stock transformers:

  • qk_norm (bool, default True for BAGEL-7B-MoT)

  • layer_module ("Qwen2DecoderLayer" or "Qwen2MoTDecoderLayer")

  • freeze_und (bool, default False)

We also ensure that pad_token_id is populated, because some checkpoint configs omit it and transformers 5.x raises AttributeError when code reads a missing config attribute.
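A minimal sketch of the coercion contract described above, written against a stand-in so it runs without transformers: types.SimpleNamespace plays the role of Qwen2Config. The name coerce_text_config is hypothetical. The layer_module fallback ("Qwen2DecoderLayer") is an assumption, since the docstring lists the allowed values but not the default, and reusing BagelConfig's pad_token_id default (151643) here is also an assumption.

```python
from types import SimpleNamespace
from typing import Any, Dict, Union

# Stand-in for transformers.Qwen2Config so the sketch is self-contained.
TextConfig = SimpleNamespace

# BAGEL's extra attributes; the layer_module fallback is an assumption.
_BAGEL_TEXT_DEFAULTS = {
    "qk_norm": True,
    "layer_module": "Qwen2DecoderLayer",
    "freeze_und": False,
}


def coerce_text_config(
    cfg: Union[Dict[str, Any], TextConfig, None],
    pad_token_id: int = 151643,  # assumed: BagelConfig's default pad_token_id
) -> TextConfig:
    """Hypothetical sketch: build or patch a text config with BAGEL extras."""
    if cfg is None:
        cfg = {}
    if isinstance(cfg, dict):
        # Dicts (config.json / YAML) become a fresh config object.
        cfg = TextConfig(**cfg)
    # Existing config objects are patched in place.
    for name, default in _BAGEL_TEXT_DEFAULTS.items():
        if not hasattr(cfg, name):
            setattr(cfg, name, default)
    # Populate pad_token_id so later attribute reads cannot fail.
    if getattr(cfg, "pad_token_id", None) is None:
        cfg.pad_token_id = pad_token_id
    return cfg


cfg = coerce_text_config({"layer_module": "Qwen2MoTDecoderLayer"})
print(cfg.qk_norm, cfg.layer_module, cfg.pad_token_id)  # True Qwen2MoTDecoderLayer 151643
```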

nemo_automodel.components.models.bagel.configuration._coerce_vision_config(
cfg: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None],
) → Optional[nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig]

Coerce cfg into a SiglipVisionConfig (our rope-flag variant).
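The Optional return type suggests three input cases: None stays None, an existing config passes through, and a dict is converted. A minimal sketch of that shape, where VisionConfigStub is a hypothetical stand-in for the rope-flag SiglipVisionConfig and its field names are assumptions. The stub drops unknown keys for simplicity; a real PretrainedConfig keeps leftover kwargs as extra attributes.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Union


@dataclass
class VisionConfigStub:
    """Hypothetical stand-in for the rope-flag SiglipVisionConfig variant."""
    patch_size: int = 14
    rope: bool = False


def coerce_vision_config(
    cfg: Union[Dict[str, Any], VisionConfigStub, None],
) -> Optional[VisionConfigStub]:
    # No vision config given: return None instead of building a default.
    if cfg is None:
        return None
    # Already a config object: pass it through unchanged.
    if isinstance(cfg, VisionConfigStub):
        return cfg
    # Plain dict: keep only the fields the stub knows about.
    known = {f.name for f in fields(VisionConfigStub)}
    return VisionConfigStub(**{k: v for k, v in cfg.items() if k in known})
```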

class nemo_automodel.components.models.bagel.configuration.BagelConfig(
vision_config: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
text_config: Union[Dict[str, Any], transformers.Qwen2Config, None] = None,
*,
visual_und: bool = True,
visual_gen: bool = False,
stage: Union[int, str, None] = None,
llm_path: str = '',
vit_path: str = '',
vae_path: str = '',
max_latent_size: int = 64,
latent_patch_size: int = 2,
vit_patch_size: int = 14,
vit_max_num_patch_per_side: int = 70,
connector_act: str = 'gelu_pytorch_tanh',
interpolate_pos: bool = False,
vit_select_layer: int = -2,
vit_rope: bool = False,
text_cond_dropout_prob: float = 0.1,
vae_cond_dropout_prob: float = 0.3,
vit_cond_dropout_prob: float = 0.3,
timestep_shift: float = 1.0,
pad_token_id: int = 151643,
llm_config: Union[Dict[str, Any], transformers.Qwen2Config, None] = None,
vit_config: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
vae_config: Union[Dict[str, Any], None] = None,
**kwargs: Any,
)

Bases: transformers.configuration_utils.PretrainedConfig

Top-level BAGEL config.

The text and vision sub-configs are nested PretrainedConfig instances (not bare dicts) so callers can introspect them the same way they would with any other HF config.

Attribute aliases:

  • text_config <-> llm_config (both point at the same object)

  • vision_config <-> vit_config (ditto)
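A minimal sketch of how such aliases can be exposed so that both names resolve to the same object: store the AM-flavored attributes and add the checkpoint-flavored names as properties, consistent with llm_config and vit_config being listed as properties below. AliasedConfigSketch is hypothetical, and whether the real properties also define setters is not shown in this reference.

```python
from types import SimpleNamespace
from typing import Any, Optional


class AliasedConfigSketch:
    """Hypothetical sketch of the text/llm and vision/vit attribute aliasing."""

    def __init__(self, text_config: Any, vision_config: Optional[Any] = None):
        # Only the AM-flavored names hold the sub-config objects.
        self.text_config = text_config
        self.vision_config = vision_config

    # The checkpoint-flavored names are read-only views of the same objects.
    @property
    def llm_config(self) -> Any:
        return self.text_config

    @property
    def vit_config(self) -> Optional[Any]:
        return self.vision_config


cfg = AliasedConfigSketch(SimpleNamespace(qk_norm=True))
assert cfg.llm_config is cfg.text_config  # same object, not a copy
print(cfg.vit_config)  # None
```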


model_type

‘bagel’

property llm_config: transformers.Qwen2Config

property vit_config: Optional[nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig]

to_dict() → Dict[str, Any]