
Performance#

Training Performance Results#

We measured the end-to-end training time of DreamFusion models on RTX A6000 Ada and H100 GPUs, using the following parameters:

  • Automatic Mixed Precision (AMP) for FP16 computation (see the sketch after this list).

  • The DreamFusion model was trained for 10,000 iterations: 2,000 iterations in latent space and 8,000 iterations in RGB space.

  • DreamFusion-DMTet was fine-tuned for 5,000 iterations.
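
For reference, the snippet below is a minimal sketch of how FP16 training with AMP is typically wired up in PyTorch. The model and data are toy stand-ins, not the actual DreamFusion training loop.

```python
import torch

# Minimal AMP FP16 sketch with a toy model; this illustrates the general
# AMP pattern only, not the NeMo DreamFusion training loop.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

for step in range(10):
    x = torch.randn(1, 512, device="cuda")  # batch size 1, matching the table below
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()  # forward pass runs in FP16 where safe
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then takes the optimizer step
    scaler.update()                # adapts the loss scale for the next iteration
```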

Note that the code provides multiple backends for NeRF, Stable Diffusion, and rendering that are not covered in this table.
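
For illustration only, a backend selection might be expressed as in the plain-Python sketch below; the key names are hypothetical, and the actual NeMo DreamFusion configuration is YAML/Hydra based, so its schema may differ.

```python
# Hypothetical backend selection, written as a plain dict for illustration.
# These key names are made up; consult the actual DreamFusion config files
# for the real schema.
dreamfusion_backends = {
    "nerf": "torchngp",      # NeRF backend (the combination benchmarked below)
    "renderer": "torchngp",  # rendering backend
    "guidance": "nemo",      # Stable Diffusion guidance backend
}
```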

| Model             | GPU Model | Num GPUs | Batch Size per GPU | NeRF Backend | Rendering Backend | Stable Diffusion Backend | Train Time [sec] |
|-------------------|-----------|----------|--------------------|--------------|-------------------|--------------------------|------------------|
| DreamFusion       | H100      | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 1327 (*)         |
| DreamFusion       | RTX A6000 | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 990              |
| DreamFusion-DMTet | H100      | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 699 (*)          |
| DreamFusion-DMTet | RTX A6000 | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 503              |

Note

(*) A performance bug in the UNet attention layers is currently affecting H100 performance. This issue will be resolved in an upcoming release.