Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
DreamFusion
DreamFusion uses a pretrained text-to-image diffusion model to perform text-to-3D synthesis. The model employs a loss based on probability density distillation, enabling the use of a 2D diffusion model as a prior for optimizing a parametric image generator.
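The DreamFusion paper calls this loss Score Distillation Sampling (SDS). Its gradient with respect to the generator parameters θ, as stated in the paper, is shown below. Here x = g(θ) is a rendering, y is the text prompt, t is a sampled diffusion timestep, ε̂_φ is the frozen diffusion model's noise prediction, and w(t) is a timestep-dependent weighting. The paper drops the diffusion model's Jacobian from this gradient, so no backpropagation through the diffusion model is needed. The exact weighting and timestep range used in this NeMo implementation are not given here and may differ.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi, \mathbf{x} = g(\theta)\bigr)
  \triangleq \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(\mathbf{z}_t; y, t) - \epsilon\bigr)\,\frac{\partial \mathbf{x}}{\partial \theta} \right],
\qquad \mathbf{z}_t = \alpha_t \mathbf{x} + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```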
Using this loss in a DeepDream-like procedure, the model optimizes a randomly initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. This approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
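The optimization loop can be illustrated with a toy, pure-Python sketch of SDS-style updates. This is not the NeMo implementation, and several pieces are assumptions. The "renderer" `render` is a hypothetical identity map over a few parameters. The frozen diffusion model is replaced by a hypothetical `toy_noise_predictor` whose data distribution is a single target point. The cosine noise schedule and the weighting w(t) = σ_t² are assumed choices. Random camera angles are omitted. The sketch only shows the key property: each update uses the residual (ε̂ − ε) directly, without backpropagating through the diffusion model.

```python
import math
import random


def toy_noise_predictor(z_t, alpha, sigma, target):
    """Hypothetical stand-in for a frozen text-conditioned diffusion model.

    Assumes the data distribution for the prompt is the single point `target`,
    for which the ideal noise prediction is (z_t - alpha * target) / sigma.
    """
    return [(z - alpha * m) / sigma for z, m in zip(z_t, target)]


def render(theta):
    """Hypothetical parametric generator x = g(theta); identity, so dx/dtheta = I."""
    return list(theta)


def sds_step(theta, target, rng, lr=0.5):
    """One Score Distillation Sampling update of the generator parameters."""
    x = render(theta)
    # Sample a timestep and Gaussian noise, then forward-diffuse the rendering.
    t = rng.uniform(0.02, 0.98)
    alpha = math.cos(0.5 * math.pi * t)  # assumed variance-preserving schedule
    sigma = math.sin(0.5 * math.pi * t)  # alpha**2 + sigma**2 == 1
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    z_t = [alpha * xi + sigma * e for xi, e in zip(x, eps)]
    # SDS uses the residual (eps_hat - eps) as the gradient w.r.t. x,
    # skipping backpropagation through the noise predictor.
    eps_hat = toy_noise_predictor(z_t, alpha, sigma, target)
    w = sigma ** 2  # assumed weighting w(t)
    grad_x = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    # Chain rule through g: dx/dtheta is the identity for this toy generator.
    return [p - lr * g for p, g in zip(theta, grad_x)]


def optimize(target, steps=500, seed=0):
    """Run SDS updates from randomly initialized parameters; return the final rendering."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in target]  # random initialization
    for _ in range(steps):
        theta = sds_step(theta, target, rng)
    return render(theta)


if __name__ == "__main__":
    print(optimize([1.0, -2.0, 0.5]))
```

For this toy predictor the residual simplifies to α_t(x − target)/σ_t, so the sampled noise cancels and every step pulls the rendering toward the target. With a real diffusion model the residual is noisy and prompt-dependent, which is why many renderings from random views are needed.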
Remarks
Notable differences from the paper:
We use Stable Diffusion for the guidance model, while the paper uses Imagen.
The NeRF model is trained in latent space for the first 20,000 iterations, then in RGB space for the remainder of the training run.
The NeRF and renderer implementations differ from those in the paper; we provide multiple backends for each.
The training schedule, learning rates, optimizer, and other hyperparameters also differ from the paper.
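The latent-then-RGB schedule from the remarks above amounts to a step-based switch. The helpers below are hypothetical (they are not NeMo APIs) and assume 0-indexed iteration counts; only the 20,000-iteration boundary comes from the remarks.

```python
LATENT_ITERS = 20_000  # latent-space training for the first 20,000 iterations


def guidance_space(step: int, latent_iters: int = LATENT_ITERS) -> str:
    """Hypothetical helper: which space the NeRF is trained in at a 0-indexed step."""
    return "latent" if step < latent_iters else "rgb"


def phase_lengths(total_steps: int, latent_iters: int = LATENT_ITERS) -> dict:
    """Hypothetical helper: number of iterations spent in each phase of a run."""
    latent = min(total_steps, latent_iters)
    return {"latent": latent, "rgb": total_steps - latent}
```

For example, a hypothetical 25,000-iteration run would spend 20,000 iterations in latent space and the remaining 5,000 in RGB space.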
This model is based on a number of research papers and open-source projects, including:
| Feature | Training |
|---|---|
| Data parallelism | Yes |
| Tensor parallelism | No |
| Sequence parallelism | No |
| Activation checkpointing | Yes |
| FP32/TF32 | Yes |
| AMP/BF16 | Yes |
| BF16 O2 | No |
| TransformerEngine/FP8 | No |
| Multi-GPU | Yes |
| Multi-Node | No |
| Inference deployment | N/A |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No |
| Distributed Optimizer | No |
| TorchInductor | Yes |
| Flash Attention | Yes |
| TorchNGP renderer | Yes |
| NerfAcc renderer | Yes |
| TCNN NeRF backend | Yes |
| Hugging Face Stable Diffusion backend | Yes |
| NeMo Stable Diffusion backend | Yes |
| NeMo-TRT Stable Diffusion backend | Yes |
| GUI | No |