InstructPix2Pix is a method for editing images from human-written instructions. Given an input image and a text instruction that describes the desired change, the model produces an edited version of the image that follows the instruction.
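At sampling time, the InstructPix2Pix paper applies classifier-free guidance over two conditions at once, the input image (c_I) and the instruction (c_T), each with its own scale (s_I and s_T). The minimal sketch below combines the three noise predictions the way the paper describes: uncond + s_I * (image_only - uncond) + s_T * (image_and_text - image_only). The function name, the default scale values, and the use of flat Python lists in place of tensors are illustrative assumptions, not the NeMo API.

```python
from typing import List, Sequence


def dual_guidance(
    eps_uncond: Sequence[float],  # prediction with no image and no text condition
    eps_image: Sequence[float],   # prediction conditioned on the input image only
    eps_full: Sequence[float],    # prediction conditioned on image and instruction
    image_scale: float = 1.5,     # s_I (illustrative default, an assumption)
    text_scale: float = 7.5,      # s_T (illustrative default, an assumption)
) -> List[float]:
    """Combine noise predictions with separate image and text guidance scales.

    Applied elementwise:
        uncond + s_I * (image - uncond) + s_T * (full - image)
    """
    if not (len(eps_uncond) == len(eps_image) == len(eps_full)):
        raise ValueError("noise predictions must have the same length")
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# With both scales set to 1.0 the terms telescope to the fully conditioned prediction.
print(dual_guidance([0.0], [1.0], [3.0], image_scale=1.0, text_scale=1.0))
```

Raising s_I keeps the output closer to the input image, while raising s_T pushes it harder toward the instruction; setting both to zero falls back to the unconditional prediction.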

NeMo Multimodal provides a training pipeline for this conditional diffusion model on an image-editing dataset, in which each example pairs an input image and an edit instruction with the corresponding edited image. It also provides an inference tool that edits images according to user-written instructions.
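For one model to serve both conditional and unconditional predictions at inference, the InstructPix2Pix paper trains with conditioning dropout: for a small fraction of training triplets the image condition, the text condition, or both are replaced by a null condition. The sketch below shows that sampling step in isolation. The `EditExample` and `drop_conditions` names are hypothetical, `None` stands in for a null condition, and the 5% default per case reflects our reading of the paper; treat all of these as assumptions rather than NeMo's implementation.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditExample:
    """One training triplet (hypothetical field names)."""
    input_image: Optional[str]  # image condition c_I; None marks a dropped condition
    instruction: Optional[str]  # text condition c_T; None marks a dropped condition
    edited_image: str           # target the model learns to produce


def drop_conditions(
    ex: EditExample,
    rng: random.Random,
    p_image_only: float = 0.05,  # drop only the image condition
    p_text_only: float = 0.05,   # drop only the text condition
    p_both: float = 0.05,        # drop both conditions
) -> EditExample:
    """Randomly null out the image, the text, or both conditions.

    A single uniform draw in [0, 1) is split into bands, so the three
    drop cases are mutually exclusive; the target image is never dropped.
    """
    r = rng.random()
    if r < p_image_only:
        return EditExample(None, ex.instruction, ex.edited_image)
    if r < p_image_only + p_text_only:
        return EditExample(ex.input_image, None, ex.edited_image)
    if r < p_image_only + p_text_only + p_both:
        return EditExample(None, None, ex.edited_image)
    return ex


rng = random.Random(0)
example = EditExample("input.png", "make it snowy", "edited.png")
print(drop_conditions(example, rng))
```

In a real pipeline the null conditions would be concrete tensors (for example, a zeroed image latent and the embedding of an empty prompt); that substitution is likewise an assumption here.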


| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | No | No |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | No | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | Yes | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | No | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | Yes | N/A |
| Flash Attention | Yes | N/A |

