Imagen is a multi-stage text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Given a text prompt, Imagen first generates a 64x64 image, then upsamples it to 256x256 and finally to 1024x1024. Every stage, including both upsampling steps, is a diffusion model.
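
The cascade described above can be sketched as a small data structure. This is a minimal illustration, not Imagen's actual implementation: only the three resolutions come from the description, while the `Stage` class, the stage names, and the helper functions are hypothetical. It checks that each super-resolution stage consumes the output of the stage before it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One diffusion model in the cascade (hypothetical structure)."""
    name: str
    input_res: Optional[int]  # None for the text-conditioned base model
    output_res: int


# Resolutions come from the description above; the stage names are made up.
CASCADE: List[Stage] = [
    Stage("base", None, 64),
    Stage("sr_256", 64, 256),
    Stage("sr_1024", 256, 1024),
]


def check_cascade(stages: List[Stage]) -> None:
    """Raise ValueError unless each stage consumes the previous stage's output."""
    if not stages or stages[0].input_res is not None:
        raise ValueError("first stage must be the base model")
    for prev, cur in zip(stages, stages[1:]):
        if cur.input_res != prev.output_res:
            raise ValueError(
                f"{cur.name} expects {cur.input_res}, got {prev.output_res}"
            )


def upsampling_factors(stages: List[Stage]) -> List[int]:
    """Per-side scale factor of each super-resolution stage."""
    return [s.output_res // s.input_res for s in stages[1:]]


check_cascade(CASCADE)
print(upsampling_factors(CASCADE))  # [4, 4]
```

Each super-resolution stage enlarges the image by a factor of 4 per side, so the base model only has to generate a small 64x64 image from the text prompt.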
Feature | Training | Inference |
---|---|---|
Data parallelism | Yes | N/A |
Tensor parallelism | Yes | Yes |
Pipeline parallelism | No | No |
Sequence parallelism | No | No |
Activation checkpointing | Yes (Uniform or Block) | No |
FP32/TF32 | Yes | Yes (FP16 enabled by default) |
AMP/FP16 | No | Yes |
AMP/BF16 | Yes | No |
BF16 O2 | Yes | No |
TransformerEngine/FP8 | No | No |
Multi-GPU | Yes | Yes |
Multi-Node | Yes | Yes |
Inference deployment | N/A | NVIDIA Triton supported |
SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
NVfuser | No | N/A |
Distributed Optimizer | No | N/A |
TorchInductor | No | N/A |
Flash Attention | Yes | N/A |