Performance

We measured the end-to-end training time of the DreamFusion models on RTX A6000 Ada and H100 GPUs, using the following parameters:

  • Automatic Mixed Precision (AMP) for FP16 computation (a minimal AMP sketch is shown after this list).

  • The DreamFusion model was trained for 10,000 iterations: 2,000 iterations in the latent space and 8,000 iterations in the RGB space.

  • DreamFusion-DMTet was fine-tuned for 5,000 iterations.

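The AMP setting above refers to PyTorch automatic mixed precision. The sketch below is not the NeMo DreamFusion training loop; it uses a hypothetical toy model and random data purely to illustrate the FP16 autocast plus gradient-scaling pattern that AMP enables during training (it assumes a CUDA-capable GPU is available).

```python
import torch
from torch import nn

# Hypothetical stand-ins for the actual DreamFusion networks and data;
# only the AMP (autocast + GradScaler) pattern is the point here.
model = nn.Linear(128, 3).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    inputs = torch.randn(1, 128, device="cuda")   # batch size 1, as in the table below
    target = torch.randn(1, 3, device="cuda")

    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in FP16 where it is safe, FP32 elsewhere.
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = nn.functional.mse_loss(model(inputs), target)

    # Scale the loss to avoid FP16 gradient underflow, then step and update the scale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```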
Please note that the code provides multiple backends for NeRF, Stable Diffusion, and rendering that are not covered in this table.
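The train times in the table below are end-to-end wall-clock measurements. One simple way to take such a measurement around a training entry point is sketched here; `train_fn` is a hypothetical stand-in for whatever launches training (for example, a `trainer.fit(model)` call inside a NeMo training script).

```python
import time
import torch

def timed_training_run(train_fn) -> float:
    """Run a training callable and return wall-clock time in seconds,
    synchronizing CUDA so queued GPU work is included in the measurement."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    train_fn()  # hypothetical: e.g. trainer.fit(model) in a NeMo training script
    torch.cuda.synchronize()
    return time.perf_counter() - start
```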

| Model             | GPU Model | Num GPUs | Batch Size per GPU | NeRF Backend | Rendering Backend | Stable Diffusion Backend | Train Time [sec] |
|-------------------|-----------|----------|--------------------|--------------|-------------------|--------------------------|------------------|
| DreamFusion       | H100      | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 1327 (*)         |
| DreamFusion       | RTX A6000 | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 990              |
| DreamFusion-DMTet | H100      | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 699 (*)          |
| DreamFusion-DMTet | RTX A6000 | 1        | 1                  | TorchNGP     | TorchNGP          | NeMo                     | 503              |
Note

(*) A performance bug in the UNet attention layers currently affects H100 performance. This issue will be resolved in an upcoming release.
