Important

NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Performance

Training Quality Results

Here we show some interesting results as an example of the dreambooth script.

Prompt: A ‘sks’ dog in a bucket.

../../../_images/Dreambooth_dog_in_a_bucket.png

Prompt: A ‘sks’ dog in Acropolis.

../../../_images/Dreambooth_dog_at_Acropolis.png

Prompt: A ‘sks’ dog in front of Eiffel tower.

../../../_images/Dreambooth_Eiffel_towel.png

Prompt: A ‘sks’ dog mecha robot.

../../../_images/Dreambooth_mecha_robot.png

The original source of images used for the above results are from link and is subject to the following license.

Inference Performance Results

Latency times are started directly before the text encoding (CLIP) and stopped directly after the output image decoding (VAE). For framework we use the Torch Automated Mixed Precision (AMP) for FP16 computation. For TRT, we export the various models with the FP16 acceleration. We use the optimized TRT engine setup present in the deployment directory to get the numbers in the same environment as the framework.

GPU: NVIDIA DGX A100 (1x A100 80 GB) Batch Size: Synonymous with num_images_per_prompt

Model

Batch Size

Sampler

Inference Steps

TRT FP 16 Latency (s)

FW FP 16 (AMP) Latency (s)

TRT vs FW Speedup (x)

DreamBooth (Res=256)

1

DDIM

100

2.0

5.6

2.8

DreamBooth (Res=256)

2

DDIM

100

3.1

9.0

2.9

DreamBooth (Res=256)

4

DDIM

100

5.7

16.0

2.8