nemo_automodel.components.models.bagel.configuration
Configuration for the BAGEL mixed-modal LLM.
The config wraps a text config and a vision config. The flags
visual_und / visual_gen gate which paths are built at init time:
- Stage 1 (understanding-only): visual_und=True, visual_gen=False. Only the ViT + connector + LM path is active.
- Stage 2 (joint): visual_gen=True. Activates the MoT *_moe_gen parameter siblings, the VAE encode path, and the flow-matching head.
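As a rough illustration of the gating described above, the sketch below maps the two flags to the set of components that would be built. The function name and the component labels are hypothetical and chosen only for this example; they are not the module's actual attribute names.

```python
from typing import List


def active_paths(visual_und: bool, visual_gen: bool) -> List[str]:
    """Hypothetical helper: list the components a flag combination enables."""
    paths = ["lm"]  # the language model is part of every stage
    if visual_und:
        # Understanding path: ViT encoder plus the connector into the LM.
        paths += ["vit", "connector"]
    if visual_gen:
        # Generation path: MoT *_moe_gen siblings, VAE encode, flow-matching head.
        paths += ["mot_moe_gen_params", "vae_encode", "flow_matching_head"]
    return paths


stage1 = active_paths(visual_und=True, visual_gen=False)
stage2 = active_paths(visual_und=True, visual_gen=True)
```

Stage 2 is shown here with visual_und=True as well, on the assumption that "joint" means both paths are active.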
The checkpoint config names the nested configs llm_config / vit_config.
AM prefers text_config / vision_config to match the rest of the VLM
tree. We accept both sets of keys on input and expose both attributes on the
instance so that:
- BagelConfig.from_pretrained("ByteDance-Seed/BAGEL-7B-MoT") works against the checkpoint config.json.
- AM-native YAML (_target_: nemo_automodel...BagelConfig) can use the AM-flavored key names without surprises.
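One way to accept both key spellings is to normalize the incoming dict before building the sub-configs. The following is a minimal sketch of that idea, not the module's actual code: normalize_bagel_keys is a hypothetical helper, and the rule that the AM-style key wins when both spellings are present is an assumption.

```python
def normalize_bagel_keys(raw: dict) -> dict:
    """Hypothetical helper: map checkpoint-style keys onto AM-style keys."""
    aliases = {"llm_config": "text_config", "vit_config": "vision_config"}
    out = dict(raw)  # shallow copy so the caller's dict is left untouched
    for ckpt_key, am_key in aliases.items():
        if ckpt_key in out:
            value = out.pop(ckpt_key)
            # Assumption: if both spellings are given, the AM-style key wins.
            out.setdefault(am_key, value)
    return out


ckpt_style = {"llm_config": {"hidden_size": 8}, "vit_config": {"hidden_size": 4}}
am_style = {"text_config": {"hidden_size": 8}, "vision_config": {"hidden_size": 4}}
same = normalize_bagel_keys(ckpt_style) == normalize_bagel_keys(am_style)
```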
Module Contents
Classes
Functions
API
Bases: PretrainedConfig
Top-level BAGEL config.
The text and vision sub-configs are nested PretrainedConfig
instances (not bare dicts) so callers can introspect them the same way
they would with any other HF config.
Attribute aliases:
- text_config <-> llm_config (both point at the same object)
- vision_config <-> vit_config (ditto)
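Aliases of this kind can be expressed with properties that forward to a single stored attribute, so both names always resolve to the same object. The class below is a hypothetical stand-in written for illustration; it does not claim to reproduce how BagelConfig implements its aliases.

```python
def _alias(target: str) -> property:
    """Build a property that reads and writes another attribute by name."""

    def getter(self):
        return getattr(self, target)

    def setter(self, value):
        setattr(self, target, value)

    return property(getter, setter)


class AliasedConfig:
    """Hypothetical stand-in showing the aliasing pattern only."""

    llm_config = _alias("text_config")
    vit_config = _alias("vision_config")

    def __init__(self, text_config, vision_config):
        self.text_config = text_config
        self.vision_config = vision_config


cfg = AliasedConfig(text_config={"hidden_size": 8}, vision_config={"hidden_size": 4})
assert cfg.llm_config is cfg.text_config  # same object, not a copy
```

Because only one attribute is stored per pair, assigning through either name keeps the two views in sync.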
Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.
BAGEL adds three attributes to Qwen2Config that aren’t part of stock transformers:
- qk_norm (bool, default True for BAGEL-7B-MoT)
- layer_module ("Qwen2DecoderLayer" or "Qwen2MoTDecoderLayer")
- freeze_und (bool, default False)
We also ensure pad_token_id is populated. Some checkpoint configs omit
it, and transformers 5.x raises AttributeError on missing config attrs.
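The coercion step can be pictured as "fill in what is missing, then validate". The sketch below uses types.SimpleNamespace as a stand-in for Qwen2Config so that it runs without transformers installed. The helper name, the fallback value for layer_module, and the choice of None for a missing pad_token_id are all assumptions made for this example.

```python
from types import SimpleNamespace

ALLOWED_LAYER_MODULES = ("Qwen2DecoderLayer", "Qwen2MoTDecoderLayer")

# qk_norm and freeze_und defaults follow the text above.
# The layer_module fallback is an assumption made for this sketch.
BAGEL_TEXT_DEFAULTS = {
    "qk_norm": True,
    "layer_module": "Qwen2DecoderLayer",
    "freeze_und": False,
}


def coerce_text_config(cfg=None):
    """Hypothetical helper: return a config object with BAGEL's extras set."""
    if cfg is None:
        cfg = {}
    if isinstance(cfg, dict):
        cfg = SimpleNamespace(**cfg)  # stand-in for Qwen2Config(**cfg)
    for name, default in BAGEL_TEXT_DEFAULTS.items():
        if not hasattr(cfg, name):
            setattr(cfg, name, default)
    if cfg.layer_module not in ALLOWED_LAYER_MODULES:
        raise ValueError(f"unsupported layer_module: {cfg.layer_module!r}")
    if not hasattr(cfg, "pad_token_id"):
        # Assumption: None is enough to make the attribute exist for later reads.
        cfg.pad_token_id = None
    return cfg


text_cfg = coerce_text_config({"hidden_size": 8})
```

Values already present on the input are left untouched; only missing attributes receive defaults.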
Coerce cfg into a SiglipVisionConfig (our rope-flag variant).