InstructPix2Pix

InstructPix2Pix is a method for editing images from human-written instructions. Given an input image and a text instruction, the model modifies the image to follow that instruction.

NeMo Multimodal offers a training pipeline for conditional diffusion models using the edit dataset, along with an inference tool that generates edited images from user-written instructions.

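========================  ========================================================  ========================================================
Feature                   Training                                                  Inference
========================  ========================================================  ========================================================
Data parallelism          Yes                                                       N/A
Tensor parallelism        No                                                        No
Pipeline parallelism      No                                                        No
Sequence parallelism      No                                                        No
Activation checkpointing  No                                                        No
FP32/TF32                 Yes                                                       Yes (FP16 enabled by default)
AMP/FP16                  Yes                                                       Yes
AMP/BF16                  Yes                                                       No
BF16 O2                   No                                                        No
TransformerEngine/FP8     No                                                        No
Multi-GPU                 Yes                                                       Yes
Multi-Node                Yes                                                       Yes
Inference deployment      N/A                                                       NVIDIA Triton supported
SW stack support          Slurm DeepOps/Base Command Manager/Base Command Platform  Slurm DeepOps/Base Command Manager/Base Command Platform
NVfuser                   No                                                        N/A
Distributed Optimizer     No                                                        N/A
TorchInductor             Yes                                                       N/A
Flash Attention           Yes                                                       N/A
========================  ========================================================  ========================================================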
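
The feature support listed above can be checked in a script before a job is launched. The following is a minimal sketch, not part of NeMo: the names `SUPPORT`, `is_supported`, and `unsupported_features` are hypothetical helpers introduced here for illustration, and the dictionary simply transcribes the Yes/No rows of the table.

```python
from typing import Dict, List, Tuple

# Yes/No rows transcribed from the support table: feature -> (training, inference).
# "Inference deployment" and "SW stack support" are omitted because their cells
# hold descriptive text rather than Yes/No values.
# All names in this block are hypothetical; they are not NeMo APIs.
SUPPORT: Dict[str, Tuple[str, str]] = {
    "Data parallelism": ("Yes", "N/A"),
    "Tensor parallelism": ("No", "No"),
    "Pipeline parallelism": ("No", "No"),
    "Sequence parallelism": ("No", "No"),
    "Activation checkpointing": ("No", "No"),
    "FP32/TF32": ("Yes", "Yes (FP16 enabled by default)"),
    "AMP/FP16": ("Yes", "Yes"),
    "AMP/BF16": ("Yes", "No"),
    "BF16 O2": ("No", "No"),
    "TransformerEngine/FP8": ("No", "No"),
    "Multi-GPU": ("Yes", "Yes"),
    "Multi-Node": ("Yes", "Yes"),
    "NVfuser": ("No", "N/A"),
    "Distributed Optimizer": ("No", "N/A"),
    "TorchInductor": ("Yes", "N/A"),
    "Flash Attention": ("Yes", "N/A"),
}

# Column index of each phase in the tuples above.
_COLUMN = {"training": 0, "inference": 1}


def is_supported(feature: str, phase: str) -> bool:
    """Return True if the table marks `feature` as "Yes" for `phase`."""
    cell = SUPPORT[feature][_COLUMN[phase]]
    return cell.startswith("Yes")


def unsupported_features(requested: List[str], phase: str) -> List[str]:
    """Return the requested features that the table does not mark as supported."""
    return [name for name in requested if not is_supported(name, phase)]


if __name__ == "__main__":
    # Example: a training plan that asks for tensor parallelism.
    plan = ["Data parallelism", "AMP/BF16", "Tensor parallelism"]
    print(unsupported_features(plan, "training"))
```

Treating "N/A" the same as "No" is a design choice of this sketch; a stricter check could report the two cases separately.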