Multimodal Models
Introduction
Multimodal models in this section combine understanding and generation capabilities across text and visual modalities. These model families may use custom training recipes, packed multimodal datasets, or task-specific model wrappers beyond the standard image-text-to-text fine-tuning path.
Supported Models
| Owner | Model | Architectures |
|---|---|---|
| ByteDance Seed | |