Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
InstructPix2Pix
InstructPix2Pix is a method for editing images from human-written instructions. Given an input image and a text instruction, the model modifies the image to follow that instruction.
NeMo Multimodal offers a training pipeline for conditional diffusion models using the edit dataset. It also provides an inference tool that generates edited images from user-written instructions.
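To make the shape of the training data concrete, the sketch below models one sample of an instruction-based edit dataset as a triplet: the image before the edit, the instruction, and the image after the edit. This is an illustration only. The `EditSample` class, its field names, the `validate` helper, and the file paths are hypothetical and are not part of the NeMo API or its on-disk dataset format.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class EditSample:
    """One hypothetical training example: an image pair plus the instruction linking them."""

    source_image: Path  # image before the edit (conditioning input)
    instruction: str    # human-written edit instruction (conditioning input)
    edited_image: Path  # image after the edit (training target)


def validate(sample: EditSample) -> None:
    """Reject samples whose instruction is empty or whitespace-only."""
    if not sample.instruction.strip():
        raise ValueError("instruction must be non-empty")


# Hypothetical paths, used only to show how a sample would be built.
sample = EditSample(
    source_image=Path("images/0001_before.jpg"),
    instruction="make it look like a watercolor painting",
    edited_image=Path("images/0001_after.jpg"),
)
validate(sample)
```

At inference time only the first two fields would be supplied, and the model would produce the edited image.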
| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | No | No |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | No | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | Yes | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | No | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | Yes | N/A |
| Flash Attention | Yes | N/A |