Imagen

Imagen is a multi-stage text-to-image diffusion model that combines an unprecedented degree of photorealism with a deep level of language understanding. Given a text prompt, Imagen first generates an image at 64x64 resolution, then upsamples it to 256x256 and finally to 1024x1024. Every stage is a diffusion model.
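The cascade described above can be sketched as a chain of stages, each consuming the previous stage's output. This is a minimal illustration, not the framework's API: the `Stage` class, the stage names, and the `run_stage` and `generate` helpers are all hypothetical, and the placeholder tracks only resolutions instead of running any denoising. Only the three resolutions come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One diffusion model in the cascade (hypothetical container)."""
    name: str      # illustrative label, not a real config key
    out_res: int   # side length of the square output image, in pixels


# Resolutions taken from the description above; names are assumptions.
CASCADE = [
    Stage("base", 64),                  # text -> 64x64 image
    Stage("super_resolution_1", 256),   # 64x64 -> 256x256
    Stage("super_resolution_2", 1024),  # 256x256 -> 1024x1024
]


def run_stage(stage: Stage, prompt: str, low_res: Optional[int]) -> int:
    """Placeholder for one diffusion model.

    A real stage would run iterative denoising conditioned on the text
    prompt and, for the upsamplers, on the lower-resolution image.
    Here we only check and return the output resolution.
    """
    if low_res is not None and low_res >= stage.out_res:
        raise ValueError(f"{stage.name} must increase resolution")
    return stage.out_res


def generate(prompt: str) -> List[int]:
    """Run the cascade and return the resolution produced by each stage."""
    resolutions: List[int] = []
    current: Optional[int] = None  # the base stage has no image input
    for stage in CASCADE:
        current = run_stage(stage, prompt, current)
        resolutions.append(current)
    return resolutions


if __name__ == "__main__":
    print(generate("a photo of a corgi riding a bicycle"))
```

Because each stage depends only on the prompt and the previous output, the stages can in principle be trained and configured separately, which is one reason a cascade layout is convenient.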


| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | Yes | Yes |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | Yes (Uniform or Block) | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | No | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | Yes | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | No | N/A |
| Flash Attention | Yes | N/A |