nemo_automodel.components.models.bagel.configuration

Configuration for the BAGEL mixed-modal LLM.

The config wraps a text config and a vision config. The flags visual_und / visual_gen gate which paths are built at init time:

  • Stage 1 (understanding-only): visual_und=True, visual_gen=False. Only the ViT + connector + LM path is active.
  • Stage 2 (joint): visual_gen=True. Activates the MoT *_moe_gen parameter siblings, the VAE encode path, and the flow-matching head.
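
The gating described above can be pictured with a small, self-contained sketch. The function `active_paths` and the string labels it returns are hypothetical names chosen for illustration; they are not part of `nemo_automodel`. The sketch also assumes that the language model is built in every stage and that Stage 2 keeps `visual_und=True`.

```python
from typing import List


def active_paths(visual_und: bool = True, visual_gen: bool = False) -> List[str]:
    """Return illustrative labels for the paths each flag would enable.

    Hypothetical helper: the labels only mirror the prose description,
    they are not real module or attribute names.
    """
    paths = ["lm"]  # assumption: the LM is present regardless of the flags
    if visual_und:
        # Understanding path: ViT encoder plus the connector into the LM.
        paths += ["vit", "connector"]
    if visual_gen:
        # Generation path: MoT *_moe_gen siblings, VAE encode, flow-matching head.
        paths += ["moe_gen_params", "vae_encode", "flow_matching_head"]
    return paths


# Stage 1 (understanding-only) versus Stage 2 (joint).
stage1 = active_paths(visual_und=True, visual_gen=False)
stage2 = active_paths(visual_und=True, visual_gen=True)
print(stage1)
print(stage2)
```

Because the flags act at init time, switching stages in this picture means constructing a new config (and model) rather than toggling a flag on an existing one.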

The checkpoint's config.json names the nested configs llm_config / vit_config, whereas AutoModel (AM) prefers text_config / vision_config to match the rest of the VLM tree. Both sets of keys are accepted on input, and both attributes are exposed on the instance, so that:

  • BagelConfig.from_pretrained("ByteDance-Seed/BAGEL-7B-MoT") works against the checkpoint config.json.
  • AM-native YAML (_target_: nemo_automodel...BagelConfig) can use the AM-flavored key names without surprises.
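
A minimal sketch of how accepting both key sets might look, using plain dicts. `resolve_aliases` and `ALIASES` are hypothetical names, and the precedence rule (the AM-style key wins when both are supplied) is an assumption; the text above does not say how BagelConfig resolves a conflict.

```python
from typing import Any, Dict

# (AM-style key, checkpoint-style key) pairs named in the text above.
ALIASES = (("text_config", "llm_config"), ("vision_config", "vit_config"))


def resolve_aliases(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` in which both spellings of each key are present.

    Hypothetical helper. Assumption: the AM-style key takes precedence
    when both spellings carry a value.
    """
    out = dict(raw)
    for am_key, ckpt_key in ALIASES:
        am_val = raw.get(am_key)
        value = am_val if am_val is not None else raw.get(ckpt_key)
        # Both names end up bound to the very same object.
        out[am_key] = value
        out[ckpt_key] = value
    return out


# Toy values only; these are not the real BAGEL hyperparameters.
from_checkpoint = resolve_aliases({"llm_config": {"toy": 1}, "vit_config": {"toy": 2}})
from_yaml = resolve_aliases({"text_config": {"toy": 3}})
print(from_checkpoint["text_config"], from_yaml["llm_config"])
```

Either input style yields a dict where `text_config` and `llm_config` refer to one object, which is the property the two bullet points above rely on.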

Module Contents

Classes

Name                     Description
BagelConfig              Top-level BAGEL config.

Functions

Name                     Description
_coerce_text_config      Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.
_coerce_vision_config    Coerce cfg into a SiglipVisionConfig (our rope-flag variant).

API

class nemo_automodel.components.models.bagel.configuration.BagelConfig(
vision_config: typing.Union[typing.Dict[str, typing.Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
text_config: typing.Union[typing.Dict[str, typing.Any], transformers.Qwen2Config, None] = None,
visual_und: bool = True,
visual_gen: bool = False,
stage: typing.Union[int, str, None] = None,
llm_path: str = '',
vit_path: str = '',
vae_path: str = '',
max_latent_size: int = 64,
latent_patch_size: int = 2,
vit_patch_size: int = 14,
vit_max_num_patch_per_side: int = 70,
connector_act: str = 'gelu_pytorch_tanh',
interpolate_pos: bool = False,
vit_select_layer: int = -2,
vit_rope: bool = False,
text_cond_dropout_prob: float = 0.1,
vae_cond_dropout_prob: float = 0.3,
vit_cond_dropout_prob: float = 0.3,
timestep_shift: float = 1.0,
pad_token_id: int = 151643,
llm_config: typing.Union[typing.Dict[str, typing.Any], transformers.Qwen2Config, None] = None,
vit_config: typing.Union[typing.Dict[str, typing.Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
vae_config: typing.Union[typing.Dict[str, typing.Any], None] = None,
**kwargs: typing.Any
)

Bases: PretrainedConfig

Top-level BAGEL config.

The text and vision sub-configs are nested PretrainedConfig instances (not bare dicts), so callers can introspect them the same way they would any other HF config.

Attribute aliases:

  • text_config <-> llm_config (both point at the same object)
  • vision_config <-> vit_config (ditto)
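
One common way to make two attribute names point at the same object is a pair of properties. The class below is a hypothetical stand-in, not BagelConfig itself; whether BagelConfig uses properties or simply assigns the same object to both names is not stated here.

```python
class AliasedConfig:
    """Hypothetical stand-in showing the alias pattern with properties."""

    def __init__(self, text_config=None, vision_config=None):
        # The AM-style names hold the actual objects.
        self.text_config = text_config
        self.vision_config = vision_config

    @property
    def llm_config(self):
        # Checkpoint-style name reads through to text_config.
        return self.text_config

    @llm_config.setter
    def llm_config(self, value):
        self.text_config = value

    @property
    def vit_config(self):
        # Checkpoint-style name reads through to vision_config.
        return self.vision_config

    @vit_config.setter
    def vit_config(self, value):
        self.vision_config = value


cfg = AliasedConfig(text_config={"toy": 1})
same_object = cfg.llm_config is cfg.text_config
cfg.vit_config = {"toy": 2}  # writing through the alias updates vision_config
print(same_object, cfg.vision_config)
```

With this pattern the two names can never drift apart, since only one of them stores a value.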

architectures = ['BagelForUnifiedMultimodal']
llm_config: Qwen2Config
model_type = 'bagel'
text_config = _coerce_text_config(text_config)
vae_config = vae_config or {}
vision_config = _coerce_vision_config(vision_config)
vit_config: Optional[SiglipVisionConfig]

nemo_automodel.components.models.bagel.configuration.BagelConfig.to_dict() -> typing.Dict[str, typing.Any]

nemo_automodel.components.models.bagel.configuration._coerce_text_config(
cfg: typing.Union[typing.Dict[str, typing.Any], transformers.Qwen2Config, None]
) -> transformers.Qwen2Config

Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.

BAGEL adds three attributes to Qwen2Config that aren’t part of stock transformers:

  • qk_norm (bool, default True for BAGEL-7B-MoT)
  • layer_module ("Qwen2DecoderLayer" or "Qwen2MoTDecoderLayer")
  • freeze_und (bool, default False)

We also ensure pad_token_id is populated. Some checkpoint configs omit it, and transformers 5.x raises AttributeError on missing config attrs.
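
The default-filling step can be sketched on a plain dict, without importing transformers. `fill_bagel_text_defaults` is a hypothetical helper, not the real `_coerce_text_config`. Two choices are assumptions: the fallback `layer_module` value, and the use of 151643 (the `pad_token_id` default from the BagelConfig signature) as the fallback pad token.

```python
from typing import Any, Dict, Optional

# The two layer_module values listed above.
ALLOWED_LAYER_MODULES = ("Qwen2DecoderLayer", "Qwen2MoTDecoderLayer")


def fill_bagel_text_defaults(
    cfg: Optional[Dict[str, Any]],
    pad_token_id: int = 151643,  # assumption: reuse BagelConfig's default
) -> Dict[str, Any]:
    """Return a copy of `cfg` with BAGEL's extra text attributes populated."""
    out = dict(cfg or {})
    out.setdefault("qk_norm", True)
    out.setdefault("freeze_und", False)
    # Assumption: fall back to the plain decoder layer when unspecified.
    out.setdefault("layer_module", "Qwen2DecoderLayer")
    if out["layer_module"] not in ALLOWED_LAYER_MODULES:
        raise ValueError(f"unknown layer_module: {out['layer_module']!r}")
    # Some checkpoint configs omit pad_token_id, so fill it explicitly.
    if out.get("pad_token_id") is None:
        out["pad_token_id"] = pad_token_id
    return out


filled = fill_bagel_text_defaults({"layer_module": "Qwen2MoTDecoderLayer"})
print(filled)
```

Filling `pad_token_id` up front means later attribute reads never hit a missing attribute, which is the failure mode the paragraph above attributes to transformers 5.x.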

nemo_automodel.components.models.bagel.configuration._coerce_vision_config(
cfg: typing.Union[typing.Dict[str, typing.Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None]
) -> typing.Optional[nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig]

Coerce cfg into a SiglipVisionConfig (our rope-flag variant).