ControlNet

[ControlNet](https://github.com/lllyasviel/ControlNet) is a neural network structure that controls diffusion models by adding extra conditions. It copies the weights of neural network blocks into a “locked” copy and a “trainable” copy. The “trainable” copy learns your condition, while the “locked” copy preserves the original model. In this way, ControlNet can reuse the Stable Diffusion (SD) encoder as a deep, robust backbone for learning diverse controls.
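
The locked/trainable split can be illustrated with a minimal, framework-free sketch. Everything here is hypothetical and for illustration only: `ToyBlock`, `ToyControlledBlock`, and `sgd_step` are not NeMo or ControlNet APIs, and a scalar block stands in for a real encoder block. The zero-initialized gain mirrors the zero-initialized connection layers described in the original ControlNet design; it is included here as an assumption about how the two copies are combined, not as a description of the NeMo implementation.

```python
import copy


class ToyBlock:
    """Hypothetical stand-in for one network block: y = w * x + b."""

    def __init__(self, w, b):
        self.w = w
        self.b = b

    def __call__(self, x):
        return self.w * x + self.b


class ToyControlledBlock:
    """Pairs a frozen ("locked") copy of a block with a trainable copy."""

    def __init__(self, pretrained):
        # Both copies start from the same pretrained weights.
        self.locked = copy.deepcopy(pretrained)     # never updated
        self.trainable = copy.deepcopy(pretrained)  # learns the condition
        # Assumption: a zero-initialized gain joins the two branches, so the
        # combined block initially reproduces the pretrained output exactly.
        self.zero_gain = 0.0

    def __call__(self, x, condition):
        base = self.locked(x)
        control = self.trainable(x + condition)
        return base + self.zero_gain * control

    def sgd_step(self, x, condition, target, lr=0.001):
        """One gradient step on squared error; only the trainable side moves."""
        control = self.trainable(x + condition)
        out = self.locked(x) + self.zero_gain * control
        err = out - target
        # Gradients of (out - target)**2 with respect to trainable parameters.
        grad_gain = 2.0 * err * control
        grad_w = 2.0 * err * self.zero_gain * (x + condition)
        grad_b = 2.0 * err * self.zero_gain
        self.zero_gain -= lr * grad_gain
        self.trainable.w -= lr * grad_w
        self.trainable.b -= lr * grad_b
        # self.locked is intentionally left untouched.


pretrained = ToyBlock(w=2.0, b=1.0)
block = ToyControlledBlock(pretrained)

# Before training, the condition has no effect on the output.
output_before = block(3.0, condition=5.0)

block.sgd_step(x=3.0, condition=5.0, target=10.0)
output_after = block(3.0, condition=5.0)
```

In this sketch, `output_before` equals the pretrained block's output because the gain starts at zero, and the locked weights are unchanged after the update. Only the gain and the trainable copy absorb the training signal, which is the sense in which the locked copy "preserves" the model.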

NeMo Multimodal provides a training pipeline and an example implementation that generates images from segmentation maps. Users can explore other kinds of control by supplying their own control input dataset and recipe.

| Feature | Training | Inference |
| --- | --- | --- |
| Data parallelism | Yes | N/A |
| Tensor parallelism | No | No |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | No | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | Yes | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | No | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | Yes | N/A |
| Flash Attention | Yes | N/A |