User Guide (Latest Version)

ControlNet is a neural network structure that controls diffusion models by adding extra conditions. It copies the weights of the neural network blocks into a “locked” copy and a “trainable” copy. The trainable copy learns your condition, while the locked copy preserves your model. In this way, ControlNet can reuse the Stable Diffusion (SD) encoder as a deep, robust backbone for learning diverse controls.
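The locked/trainable split can be illustrated with a small, framework-free sketch. It is not the NeMo implementation: `ToyBlock`, `ToyControlledBlock`, and `sgd_step` are hypothetical names, the "block" is a single scalar linear function, and the zero-initialized `gate` is an assumption borrowed from the general ControlNet design (zero-initialized connection layers) rather than from this guide.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyBlock:
    """Hypothetical stand-in for one network block: y = weight * x + bias."""
    weight: float
    bias: float
    trainable: bool = True

    def forward(self, x: float) -> float:
        return self.weight * x + self.bias

    def sgd_step(self, grad_weight: float, grad_bias: float, lr: float = 0.1) -> None:
        # A locked block ignores updates, so the original model is preserved.
        if not self.trainable:
            return
        self.weight -= lr * grad_weight
        self.bias -= lr * grad_bias


@dataclass
class ToyControlledBlock:
    """Pairs a frozen copy of a block with a trainable copy that sees the condition."""
    locked: ToyBlock
    control: ToyBlock
    # Assumption: the connection starts at zero, so the wrapped block initially
    # reproduces the pretrained block's output exactly.
    gate: float = 0.0

    @classmethod
    def from_pretrained(cls, block: ToyBlock) -> "ToyControlledBlock":
        locked = copy.deepcopy(block)
        locked.trainable = False   # "locked" copy: preserves the model
        control = copy.deepcopy(block)
        control.trainable = True   # "trainable" copy: learns the condition
        return cls(locked=locked, control=control)

    def forward(self, x: float, condition: float) -> float:
        # The trainable copy receives the input plus the extra condition;
        # its output is added to the locked copy's output through the gate.
        return self.locked.forward(x) + self.gate * self.control.forward(x + condition)


pretrained = ToyBlock(weight=2.0, bias=1.0)
wrapped = ToyControlledBlock.from_pretrained(pretrained)

# With the gate at zero, the condition has no effect yet.
baseline = wrapped.forward(3.0, condition=5.0)

# Gradient updates reach only the trainable copy.
wrapped.locked.sgd_step(grad_weight=1.0, grad_bias=1.0)
wrapped.control.sgd_step(grad_weight=1.0, grad_bias=1.0)
```

After these two update calls, the locked copy still holds the pretrained weights while the trainable copy has moved, which mirrors the "preserve the model, learn the condition" split described above. In a real diffusion model the blocks would be encoder layers and the gate would itself be a learned layer.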

NeMo Multimodal provides a training pipeline and example implementation for generating images based on segmentation maps. Users have the flexibility to explore other implementations using their own control input dataset and recipe.




| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | No | No |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | No | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | Yes | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | No | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | Yes | N/A |
| Flash Attention | Yes | N/A |
Last updated on May 30, 2024.