Multimodal Models

Introduction

Multimodal models in this section combine understanding and generation capabilities across text and visual modalities. These model families may use custom training recipes, packed multimodal datasets, or task-specific model wrappers beyond the standard image-text-to-text fine-tuning path.

Supported Models

| Owner | Model | Architectures |
| --- | --- | --- |
| ByteDance Seed | BAGEL | BagelForUnifiedMultimodal, BagelForConditionalGeneration |