nemo_automodel.components.models.bagel.configuration
Configuration for the BAGEL mixed-modal LLM.
The config wraps a text config and a vision config. The flags
visual_und / visual_gen gate which paths are built at init time:
- Stage 1 (understanding-only): visual_und=True, visual_gen=False. Only the ViT + connector + LM path is active.
- Stage 2 (joint): visual_gen=True. Activates the MoT *_moe_gen parameter siblings, the VAE encode path, and the flow-matching head.
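The gating described above can be sketched in a few lines. This is an illustration only, not the library's code: `active_paths` and the component labels are hypothetical names, and the sketch assumes the LM is always built and that Stage 2 keeps visual_und=True.

```python
from typing import List


def active_paths(visual_und: bool, visual_gen: bool) -> List[str]:
    """Hypothetical helper: list the components each flag would enable."""
    paths = ["lm"]  # assumption: the language model is always present
    if visual_und:
        # Understanding path: ViT encoder plus the connector into the LM.
        paths += ["vit", "connector"]
    if visual_gen:
        # Generation path: MoT *_moe_gen siblings, VAE encode, flow-matching head.
        paths += ["moe_gen_params", "vae_encode", "flow_matching_head"]
    return paths


# Stage 1 (understanding-only) versus Stage 2 (joint, assumed to keep visual_und on).
stage1 = active_paths(visual_und=True, visual_gen=False)
stage2 = active_paths(visual_und=True, visual_gen=True)
print(stage1)
print(stage2)
```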
The checkpoint config names the nested configs llm_config / vit_config.
AM prefers text_config / vision_config to match the rest of the VLM
tree. We accept both sets of keys on input and expose both attributes on the
instance so that:
- BagelConfig.from_pretrained("ByteDance-Seed/BAGEL-7B-MoT") works against the checkpoint config.json.
- AM-native YAML (_target_: nemo_automodel...BagelConfig) can use the AM-flavored key names without surprises.
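One way to accept both key sets is to normalize them before building the nested configs. The sketch below is a guess at the shape of that logic, not the actual implementation: `pick_alias` and `normalize_keys` are hypothetical helpers, the dict values are placeholders, and raising on conflicting values is an assumption (the real class may instead prefer one spelling).

```python
from typing import Any, Dict, Optional


def pick_alias(primary: Optional[Any], legacy: Optional[Any], name: str) -> Optional[Any]:
    """Return whichever of two alias values was supplied."""
    if primary is not None and legacy is not None and primary != legacy:
        # Assumption: conflicting values are an error rather than silently resolved.
        raise ValueError(f"conflicting values for {name}")
    return primary if primary is not None else legacy


def normalize_keys(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map checkpoint-style keys (llm_config / vit_config) onto the AM-style names."""
    out = dict(raw)  # copy so the caller's dict is left untouched
    text = pick_alias(out.pop("text_config", None), out.pop("llm_config", None), "text_config/llm_config")
    vision = pick_alias(out.pop("vision_config", None), out.pop("vit_config", None), "vision_config/vit_config")
    out["text_config"] = text
    out["vision_config"] = vision
    return out


# Placeholder payloads: both spellings normalize to the same dict.
checkpoint_style = normalize_keys({"llm_config": {"hidden_size": 8}, "vit_config": {"patch_size": 14}})
am_style = normalize_keys({"text_config": {"hidden_size": 8}, "vision_config": {"patch_size": 14}})
```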
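Module Contents
Classes
BagelConfig | Top-level BAGEL config.
Functions
_coerce_text_config | Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.
_coerce_vision_config | Coerce cfg into a SiglipVisionConfig (our rope-flag variant).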
API
nemo_automodel.components.models.bagel.configuration._coerce_text_config(
    cfg: Union[Dict[str, Any], transformers.Qwen2Config, None],
)
Coerce cfg into a Qwen2Config with BAGEL’s extra attributes set.
BAGEL adds three attributes to Qwen2Config that aren’t part of stock transformers:
- qk_norm (bool, default True for BAGEL-7B-MoT)
- layer_module ("Qwen2DecoderLayer" or "Qwen2MoTDecoderLayer")
- freeze_und (bool, default False)
We also ensure pad_token_id is populated. Some checkpoint configs omit it, and transformers 5.x raises AttributeError on missing config attrs.
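The coercion described above amounts to "accept None, a dict, or a config object, then fill in whatever is missing". A minimal sketch follows, using SimpleNamespace as a stand-in for Qwen2Config so it runs without transformers installed. `coerce_text_config_sketch` is a hypothetical name, the layer_module default is an assumption (the source lists the allowed values but no default), and falling back to 151643 for pad_token_id is assumed from the BagelConfig signature default.

```python
from types import SimpleNamespace
from typing import Any, Dict, Union

# Assumption: reuse the pad_token_id default shown in the BagelConfig signature.
BAGEL_PAD_TOKEN_ID = 151643


def coerce_text_config_sketch(cfg: Union[Dict[str, Any], SimpleNamespace, None]) -> SimpleNamespace:
    """Stand-in for the real coercion; SimpleNamespace plays the role of Qwen2Config."""
    if cfg is None:
        cfg = SimpleNamespace()
    elif isinstance(cfg, dict):
        cfg = SimpleNamespace(**cfg)

    defaults = {
        "qk_norm": True,                         # documented default for BAGEL-7B-MoT
        "layer_module": "Qwen2MoTDecoderLayer",  # assumption: default not stated in the source
        "freeze_und": False,                     # documented default
    }
    for name, value in defaults.items():
        if not hasattr(cfg, name):
            setattr(cfg, name, value)  # only fill gaps, never override caller values

    # Guard against configs that omit pad_token_id entirely or set it to None.
    if getattr(cfg, "pad_token_id", None) is None:
        cfg.pad_token_id = BAGEL_PAD_TOKEN_ID
    return cfg


# Caller-supplied values survive; missing attributes are filled in.
text_cfg = coerce_text_config_sketch({"hidden_size": 8, "freeze_und": True})
```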
nemo_automodel.components.models.bagel.configuration._coerce_vision_config(
    cfg: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None],
)
Coerce cfg into a SiglipVisionConfig (our rope-flag variant).
class nemo_automodel.components.models.bagel.configuration.BagelConfig(
    vision_config: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
    text_config: Union[Dict[str, Any], transformers.Qwen2Config, None] = None,
    *,
    visual_und: bool = True,
    visual_gen: bool = False,
    stage: Union[int, str, None] = None,
    llm_path: str = '',
    vit_path: str = '',
    vae_path: str = '',
    max_latent_size: int = 64,
    latent_patch_size: int = 2,
    vit_patch_size: int = 14,
    vit_max_num_patch_per_side: int = 70,
    connector_act: str = 'gelu_pytorch_tanh',
    interpolate_pos: bool = False,
    vit_select_layer: int = -2,
    vit_rope: bool = False,
    text_cond_dropout_prob: float = 0.1,
    vae_cond_dropout_prob: float = 0.3,
    vit_cond_dropout_prob: float = 0.3,
    timestep_shift: float = 1.0,
    pad_token_id: int = 151643,
    llm_config: Union[Dict[str, Any], transformers.Qwen2Config, None] = None,
    vit_config: Union[Dict[str, Any], nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig, None] = None,
    vae_config: Union[Dict[str, Any], None] = None,
    **kwargs: Any,
)
Bases: transformers.configuration_utils.PretrainedConfig
Top-level BAGEL config.
The text and vision sub-configs are nested PretrainedConfig instances (not bare dicts) so callers can introspect them the same way they would with any other HF config.
Attribute aliases:
- text_config <-> llm_config (both point at the same object)
- vision_config <-> vit_config (ditto)
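The aliasing above is the kind of thing read-only properties handle well: one attribute stores the object and the other name simply returns it. The sketch below illustrates that pattern under stated assumptions. `AliasedConfigSketch` is a hypothetical class, and plain dicts stand in for the nested PretrainedConfig instances so the example stays dependency-free.

```python
from typing import Any, Dict, Optional


class AliasedConfigSketch:
    """Hypothetical stand-in showing two names bound to one nested object."""

    def __init__(
        self,
        text_config: Optional[Dict[str, Any]] = None,
        vision_config: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Dicts are placeholders here; the real class nests PretrainedConfig instances.
        self.text_config = text_config if text_config is not None else {}
        self.vision_config = vision_config  # may stay None, matching the Optional vit_config

    @property
    def llm_config(self) -> Dict[str, Any]:
        # Checkpoint-style name; returns the same object as text_config.
        return self.text_config

    @property
    def vit_config(self) -> Optional[Dict[str, Any]]:
        # Checkpoint-style name; returns the same object as vision_config.
        return self.vision_config


cfg = AliasedConfigSketch(text_config={"qk_norm": True}, vision_config={"patch_size": 14})
cfg.llm_config["qk_norm"] = False  # a mutation through one name is visible through the other
```

Because both names resolve to the same object, code written against either the checkpoint naming or the AM naming sees the same state.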
Initialization
model_type = 'bagel'
property llm_config: transformers.Qwen2Config
property vit_config: Optional[nemo_automodel.components.models.bagel.modeling_siglip_navit.SiglipVisionConfig]
to_dict() -> Dict[str, Any]