Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

InstructPix2Pix

InstructPix2Pix is a method for editing images based on human-written instructions. Given an input image and a text instruction, the model modifies the image to follow that instruction.

NeMo Multimodal provides a training pipeline for conditional diffusion models using the edit dataset. It also provides an inference tool that generates modified images from user-written instructions.
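For intuition about how an instruction-conditioned diffusion model weighs the input image against the text instruction at inference time, the InstructPix2Pix paper combines three noise predictions using two guidance scales. The sketch below shows that combination on plain Python lists. The function name `combine_guidance`, its parameter names, and the example values are illustrative assumptions and are not part of the NeMo API.

```python
from typing import List, Sequence


def combine_guidance(
    eps_uncond: Sequence[float],
    eps_image: Sequence[float],
    eps_full: Sequence[float],
    image_scale: float,
    text_scale: float,
) -> List[float]:
    """Combine three noise predictions with image and text guidance scales.

    eps_uncond: prediction with neither image nor text conditioning
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on both the image and the instruction

    Hypothetical helper for illustration; it mirrors the dual
    classifier-free guidance formula from the InstructPix2Pix paper.
    """
    if not (len(eps_uncond) == len(eps_image) == len(eps_full)):
        raise ValueError("all noise predictions must have the same length")
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# With both scales set to 1.0, the result equals the fully conditioned prediction.
guided = combine_guidance([0.0, 0.5], [1.0, 0.5], [3.0, 2.5], 1.0, 1.0)
```

Raising the text scale pushes the result further toward the instruction, while raising the image scale keeps it closer to the input image. Suitable values depend on the model and the edit, so treat any specific numbers as starting points to tune.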

Feature	Training	Inference
Data parallelism	Yes	N/A
Tensor parallelism	No	No
Pipeline parallelism	No	No
Sequence parallelism	No	No
Activation checkpointing	No	No
FP32/TF32	Yes	Yes (FP16 enabled by default)
AMP/FP16	Yes	Yes
AMP/BF16	Yes	No
BF16 O2	No	No
TransformerEngine/FP8	No	No
Multi-GPU	Yes	Yes
Multi-Node	Yes	Yes
Inference deployment	N/A	NVIDIA Triton supported
SW stack support	Slurm DeepOps/Base Command Manager/Base Command Platform	Slurm DeepOps/Base Command Manager/Base Command Platform
NVfuser	No	N/A
Distributed Optimizer	No	N/A
TorchInductor	Yes	N/A
Flash Attention	Yes	N/A