Imagen

Imagen is a multi-stage text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Given a text prompt, Imagen first generates a 64x64 image with a base diffusion model and then upsamples it to 256x256 and then to 1024x1024 with two super-resolution diffusion models.
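The three-stage cascade can be sketched as follows. This is only an illustration of the control flow and image sizes, not the NeMo implementation: the functions `base_stage`, `super_resolution_stage` and `generate` are hypothetical, a uniform gray image stands in for the base model's sample, and nearest-neighbor upsampling stands in for each super-resolution diffusion model.

```python
from typing import List

# Toy stand-in for an image: a square grid of grayscale values in [0, 1].
# Real Imagen stages operate on RGB tensors.
Image = List[List[float]]


def base_stage(prompt: str, size: int = 64) -> Image:
    """Hypothetical placeholder for the 64x64 text-to-image base diffusion model.

    A real implementation would run a diffusion sampler conditioned on the
    prompt's text embedding; this stub just returns a uniform gray image.
    """
    return [[0.5] * size for _ in range(size)]


def upsample_nearest(image: Image, factor: int) -> Image:
    """Nearest-neighbor upsampling: repeat every pixel `factor` times along each axis."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def super_resolution_stage(prompt: str, low_res: Image, factor: int = 4) -> Image:
    """Hypothetical placeholder for a super-resolution diffusion model.

    A real stage denoises a high-resolution image conditioned on both the text
    and the low-resolution input; here only the upsampling step is kept.
    """
    return upsample_nearest(low_res, factor)


def generate(prompt: str) -> List[Image]:
    """Run the three-stage cascade and return every stage's output."""
    outputs = [base_stage(prompt)]            # stage 1: 64x64
    for _ in range(2):                        # stages 2 and 3: 64 -> 256 -> 1024
        outputs.append(super_resolution_stage(prompt, outputs[-1], factor=4))
    return outputs


if __name__ == "__main__":
    stages = generate("a photo of a corgi riding a bicycle")
    print([f"{len(img)}x{len(img[0])}" for img in stages])
    # ['64x64', '256x256', '1024x1024']
```

Each super-resolution stage upsamples by a factor of 4, which is how 64x64 becomes 256x256 and then 1024x1024.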

Figure: Imagen architecture (imagen_arch.png)


Feature                  | Training               | Inference
-------------------------|------------------------|------------------------------
Data parallelism         | Yes                    | N/A
Tensor parallelism       | Yes                    | Yes
Pipeline parallelism     | No                     | No
Sequence parallelism     | No                     | No
Activation checkpointing | Yes (Uniform or Block) | No
FP32/TF32                | Yes                    | Yes (FP16 enabled by default)
AMP/FP16                 | No                     | Yes
AMP/BF16                 | Yes                    | No
BF16 O2                  | Yes                    | No
TransformerEngine/FP8    | No                     | No
Multi-GPU                | Yes                    | Yes
Multi-Node               | Yes                    | Yes
Inference deployment     | N/A                    | NVIDIA Triton supported
SW stack support         | Slurm DeepOps/         | Slurm DeepOps/
                         | Base Command Manager/  | Base Command Manager/
                         | Base Command Platform  | Base Command Platform
nvFuser                  | No                     | N/A
Distributed Optimizer    | No                     | N/A
TorchInductor            | No                     | N/A
Flash Attention          | Yes                    | N/A
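Because training supports data and tensor parallelism but not pipeline parallelism, the number of data-parallel replicas follows directly from the GPU count and the tensor-parallel size. The helper below is a hypothetical sketch of that arithmetic, assuming a Megatron-style layout in which the world size equals tensor-parallel size times pipeline-parallel size times data-parallel size, with the pipeline-parallel size fixed at 1. It is not a NeMo function.

```python
def data_parallel_size(num_nodes: int, gpus_per_node: int, tensor_parallel_size: int) -> int:
    """Return the number of data-parallel model replicas for a training job.

    Assumes (hypothetically) a Megatron-style layout in which
        world_size = tensor_parallel_size * pipeline_parallel_size * data_parallel_size
    with pipeline_parallel_size fixed at 1, since pipeline parallelism is not supported.
    """
    world_size = num_nodes * gpus_per_node
    # Every tensor-parallel group must be filled completely.
    if tensor_parallel_size < 1 or world_size % tensor_parallel_size != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by "
            f"tensor parallel size {tensor_parallel_size}"
        )
    return world_size // tensor_parallel_size


if __name__ == "__main__":
    # Example: 2 nodes with 8 GPUs each and tensor parallelism of 2.
    print(data_parallel_size(num_nodes=2, gpus_per_node=8, tensor_parallel_size=2))  # 8
```

Data parallelism is listed as N/A for inference, so this calculation applies to training jobs only.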
© Copyright 2023-2024, NVIDIA. Last updated on Apr 25, 2024.