BAGEL
BAGEL-7B-MoT is a unified multimodal model from ByteDance Seed. It combines a Qwen2 language backbone, a SigLIP-NaViT vision encoder, and Mixture-of-Transformer-Experts (MoT) layers to support joint training on multimodal understanding and visual generation.
Available Models
- BAGEL-7B-MoT
Architecture
- BagelForUnifiedMultimodal
- BagelForConditionalGeneration
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo: