Performance#
Training Performance Results#
We measured the end-to-end training time of DreamFusion models on RTX A6000 Ada and H100 GPUs, using the following parameters:
Automatic Mixed Precision (AMP) for FP16 computation.
The DreamFusion model was trained for 10,000 iterations: 2,000 iterations in latent space and 8,000 iterations in RGB space.
DreamFusion-DMTet was fine-tuned for 5,000 iterations.
Please note that the code provides multiple backends for NeRF, Stable Diffusion, and rendering that are not covered in this table.
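As a rough illustration of these settings only (this is not the actual NeMo DreamFusion launch script or its Hydra config keys, which may differ), the single-GPU, FP16, 10,000-step budget could be expressed with a PyTorch Lightning trainer along these lines; `DreamFusionModel` is a placeholder name:

```python
# Illustrative sketch only: the model class and config handling are placeholders,
# not the NeMo DreamFusion example or its configuration names.
import pytorch_lightning as pl

trainer = pl.Trainer(
    devices=1,            # single GPU, matching the table below (Num GPUs = 1)
    accelerator="gpu",
    precision=16,         # Automatic Mixed Precision (FP16)
    max_steps=10_000,     # 2,000 latent-space + 8,000 RGB-space iterations
)

# model = DreamFusionModel(cfg)   # placeholder: the actual NeMo model/config differ
# trainer.fit(model)
```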
| Model | GPU Model | Num GPUs | Batch Size Per GPU | NeRF Backend | Rendering Backend | Stable Diffusion Backend | Train Time [sec] |
|---|---|---|---|---|---|---|---|
| DreamFusion | H100 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 1327 (*) |
| DreamFusion | RTX A6000 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 990 |
| DreamFusion-DMTet | H100 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 699 (*) |
| DreamFusion-DMTet | RTX A6000 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 503 |
Note
There is a performance bug in the UNet attention layers that affects H100 performance; results marked with (*) in the table above reflect this. The issue will be fixed in an upcoming release.