[ControlNet](https://github.com/lllyasviel/ControlNet) is a neural network structure that controls diffusion models by adding extra conditions. It copies the weights of the neural network blocks into a “locked” copy and a “trainable” copy: the trainable copy learns the new condition, while the locked copy preserves the original model. In this way, ControlNet can reuse the Stable Diffusion (SD) encoder as a deep, strong, robust, and powerful backbone for learning diverse controls.
NeMo Multimodal provides a training pipeline and an example implementation for generating images from segmentation maps. Users have the flexibility to explore other implementations using their own control input datasets and recipes.
| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | No | No |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | No | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | Yes | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | No | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | Yes | N/A |
| Flash Attention | Yes | N/A |