Multimodal Models

Introduction

The multimodal models in this section both understand and generate content across text and visual modalities. These model families may use custom training recipes, packed multimodal datasets, or task-specific model wrappers that go beyond the standard image-text-to-text fine-tuning path.

Supported Models

| Owner | Model | Architectures |
| --- | --- | --- |
| ByteDance Seed | BAGEL | BagelForUnifiedMultimodal, BagelForConditionalGeneration |