InstructPix2Pix
InstructPix2Pix introduces a method for editing images based on human-written instructions. Given an input image and a written instruction, the model edits the image to follow that instruction.
NeMo Multimodal offers a training pipeline for conditional diffusion models using the edit dataset. We also provide an inference tool that generates edited images from user-written instructions.
Feature | Training | Inference |
---|---|---|
Data parallelism | Yes | N/A |
Tensor parallelism | No | No |
Pipeline parallelism | No | No |
Sequence parallelism | No | No |
Activation checkpointing | No | No |
FP32/TF32 | Yes | Yes (FP16 enabled by default) |
AMP/FP16 | Yes | Yes |
AMP/BF16 | Yes | No |
BF16 O2 | No | No |
TransformerEngine/FP8 | No | No |
Multi-GPU | Yes | Yes |
Multi-Node | Yes | Yes |
Inference deployment | N/A | |
SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
NVfuser | No | N/A |
Distributed Optimizer | No | N/A |
TorchInductor | Yes | N/A |
Flash Attention | Yes | N/A |
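
The support matrix above can be encoded as a small lookup to sanity-check a planned configuration before launching a job. The sketch below is illustrative only: the `SUPPORT` dictionary and the `is_supported` helper are hypothetical names, not part of NeMo, and the values are copied from the table (the non-boolean "SW stack support" row is left out).

```python
# Hypothetical helper, not a NeMo API: values mirror the feature table above.
# Each entry maps a feature to its (training, inference) cell.
SUPPORT = {
    "Data parallelism": ("Yes", "N/A"),
    "Tensor parallelism": ("No", "No"),
    "Pipeline parallelism": ("No", "No"),
    "Sequence parallelism": ("No", "No"),
    "Activation checkpointing": ("No", "No"),
    "FP32/TF32": ("Yes", "Yes (FP16 enabled by default)"),
    "AMP/FP16": ("Yes", "Yes"),
    "AMP/BF16": ("Yes", "No"),
    "BF16 O2": ("No", "No"),
    "TransformerEngine/FP8": ("No", "No"),
    "Multi-GPU": ("Yes", "Yes"),
    "Multi-Node": ("Yes", "Yes"),
    "NVfuser": ("No", "N/A"),
    "Distributed Optimizer": ("No", "N/A"),
    "TorchInductor": ("Yes", "N/A"),
    "Flash Attention": ("Yes", "N/A"),
}


def is_supported(feature: str, stage: str) -> bool:
    """Return True only if the table marks `feature` as "Yes" for `stage`."""
    if stage not in ("training", "inference"):
        raise ValueError("stage must be 'training' or 'inference'")
    training, inference = SUPPORT[feature]  # KeyError for unknown features
    cell = training if stage == "training" else inference
    # "N/A" and "No" both count as unsupported; "Yes (...)" counts as supported.
    return cell.startswith("Yes")
```

For example, `is_supported("AMP/BF16", "training")` is true while `is_supported("AMP/BF16", "inference")` is false, matching the AMP/BF16 row.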