InstructPix2Pix introduces a method for editing images based on human-written instructions. Given an input image and a textual instruction, the model modifies the image accordingly.
NeMo Multimodal offers a training pipeline for this conditional diffusion model on the edit dataset, along with a tool that generates edited images from user-written instructions at inference time.
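At sampling time, InstructPix2Pix combines two classifier-free guidance scales, one over the input image conditioning and one over the text instruction, following the formulation in the original paper. Below is a minimal sketch of that combination as a pure function over noise predictions; the function name and the toy guidance scales are illustrative, not part of the NeMo API:

```python
import numpy as np

def dual_cfg(e_uncond, e_img, e_full, s_img=1.5, s_txt=7.5):
    """Combine noise predictions with InstructPix2Pix-style dual guidance.

    e_uncond: prediction with neither image nor text conditioning
    e_img:    prediction with image conditioning only
    e_full:   prediction with both image and text conditioning
    s_img:    image guidance scale (how closely to follow the input image)
    s_txt:    text guidance scale (how strongly to follow the instruction)
    """
    return (e_uncond
            + s_img * (e_img - e_uncond)
            + s_txt * (e_full - e_img))

# Toy example on scalar "predictions" to show the arithmetic:
out = dual_cfg(np.array(0.0), np.array(1.0), np.array(2.0))
# 0 + 1.5 * (1 - 0) + 7.5 * (2 - 1) = 9.0
```

In practice the three predictions come from one batched U-Net forward pass per denoising step, and raising `s_img` keeps the output closer to the source image while raising `s_txt` makes the edit follow the instruction more aggressively.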
Feature | Training | Inference
---|---|---
Data parallelism | Yes | N/A |
Tensor parallelism | No | No |
Pipeline parallelism | No | No |
Sequence parallelism | No | No |
Activation checkpointing | No | No |
FP32/TF32 | Yes | Yes (FP16 enabled by default) |
AMP/FP16 | Yes | Yes |
AMP/BF16 | Yes | No |
BF16 O2 | No | No |
TransformerEngine/FP8 | No | No |
Multi-GPU | Yes | Yes |
Multi-Node | Yes | Yes |
Inference deployment | N/A | NVIDIA Triton supported |
SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
NVfuser | No | N/A |
Distributed Optimizer | No | N/A |
TorchInductor | Yes | N/A |
Flash Attention | Yes | N/A |