Performance#
Training Performance Results#
We measured the end-to-end training time of DreamFusion models on RTX A6000 Ada and H100 GPUs, using the following parameters:
Automatic Mixed Precision (AMP) for FP16 computation.
The DreamFusion model was trained for 10,000 iterations: 2,000 iterations in latent space and 8,000 iterations in RGB space.
DreamFusion-DMTet was fine-tuned for 5,000 iterations.
Please note that the code provides multiple backends for NeRF, Stable Diffusion, and rendering that are not covered in this table.
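As a rough illustration of these settings only (this is not the actual NeMo DreamFusion launch script or its Hydra config keys, which may differ), the single-GPU, FP16, 10,000-step budget could be expressed with a PyTorch Lightning trainer along these lines; `DreamFusionModel` is a placeholder name:

```python
# Illustrative sketch only: the model class and config handling are placeholders,
# not the NeMo DreamFusion example or its configuration names.
import pytorch_lightning as pl

trainer = pl.Trainer(
    devices=1,            # single GPU, matching the table below (Num GPUs = 1)
    accelerator="gpu",
    precision=16,         # Automatic Mixed Precision (FP16)
    max_steps=10_000,     # 2,000 latent-space + 8,000 RGB-space iterations
)

# model = DreamFusionModel(cfg)   # placeholder: the actual NeMo model/config differ
# trainer.fit(model)
```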
| Model | GPU Model | Num GPUs | Batch Size Per GPU | NeRF Backend | Rendering Backend | Stable Diffusion Backend | Train Time [sec] |
|---|---|---|---|---|---|---|---|
| DreamFusion | H100 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 1327 (*) |
| DreamFusion | RTX A6000 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 990 |
| DreamFusion-DMTet | H100 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 699 (*) |
| DreamFusion-DMTet | RTX A6000 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 503 |
Note
There is a performance bug in the UNet attention layers that affects H100 performance; results marked with (*) in the table above reflect this. The issue will be fixed in an upcoming release.