Performance
Training Performance Results
We measured the end-to-end training time of DreamFusion models on RTX A6000 Ada and H100 GPUs, using the following parameters:
* Automatic Mixed Precision (AMP) with FP16 computation.
* The DreamFusion model was trained for 10,000 iterations in total: 2,000 iterations in latent space and 8,000 iterations in RGB space.
* DreamFusion-DMTet was fine-tuned for 5,000 iterations.
Please note that the code provides multiple backends for NeRF, stable diffusion, and rendering that are not covered in this table.
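As a quick sanity check on the wall-clock numbers below, total iteration counts can be divided by train time to get a rough average throughput per run. This is a back-of-the-envelope sketch using only the figures reported in this section, not an official benchmark:

```python
# Iteration counts and train times [sec] taken from this section's table.
RUNS = {
    # run name: (total_iterations, train_time_sec)
    "DreamFusion (H100)": (10_000, 1327),
    "DreamFusion (RTX A6000)": (10_000, 990),
    "DreamFusion-DMTet (H100)": (5_000, 699),
    "DreamFusion-DMTet (RTX A6000)": (5_000, 503),
}

def iterations_per_second(total_iters: int, train_time_sec: float) -> float:
    """Average throughput over the whole run (includes both latent and RGB phases)."""
    return total_iters / train_time_sec

for name, (iters, secs) in RUNS.items():
    print(f"{name}: {iterations_per_second(iters, secs):.1f} it/s")
```

Note that this averages over both training phases, so it hides any per-phase throughput differences between latent-space and RGB-space iterations.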
| Model | GPU Model | Num GPUs | Batch Size Per GPU | NeRF backend | Rendering backend | Stable Diffusion backend | Train time [sec] |
|---|---|---|---|---|---|---|---|
| DreamFusion | H100 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 1327 (*) |
| DreamFusion | RTX A6000 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 990 |
| DreamFusion-DMTet | H100 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 699 (*) |
| DreamFusion-DMTet | RTX A6000 | 1 | 1 | TorchNGP | TorchNGP | NeMo | 503 |
Note
(*) There is a known performance bug in the UNet attention layers that affects H100 performance. This issue will be fixed in an upcoming release.