Multimodal Models

Introduction

The multimodal models in this section combine understanding and generation across text and visual modalities. These model families may rely on custom training recipes, packed multimodal datasets, or task-specific model wrappers that go beyond the standard image-text-to-text fine-tuning path.

Supported Models

Owner	Model	Architectures
ByteDance Seed	BAGEL	`BagelForUnifiedMultimodal`, `BagelForConditionalGeneration`