Important

NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.

Framework Inference

For text-to-image models, the inference script generates images from text prompts defined in the config file.

To enable the inference stage with Imagen, update the following configuration files:

  1. In the defaults section of conf/config.yaml, update the fw_inference field to point to the desired Imagen inference configuration file. For example, to use the imagen/text2img.yaml configuration, set the fw_inference field to imagen/text2img.

    defaults:
       - fw_inference: imagen/text2img
       ...
    
  2. In the stages field of conf/config.yaml, make sure the fw_inference stage is included. For example:

    stages:
       - fw_inference
       ...
    
  3. Configure the infer.texts and infer.num_images_per_prompt fields of conf/fw_inference/imagen/text2img.yaml. Set model.customized_model.base_ckpt, model.customized_model.sr256_ckpt, and model.customized_model.sr1024_ckpt to the .nemo checkpoints you want to generate images with. Set infer.target_resolution to the desired resolution.
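
As a rough illustration of step 3, the fields named above might be laid out as follows in conf/fw_inference/imagen/text2img.yaml. This is only a sketch: the nesting is inferred from the dotted field names, and the prompt, image count, resolution, and checkpoint paths are placeholder values rather than defaults shipped with the container. Check the actual file for the exact layout and for any other fields it requires.

    infer:
      texts:                      # text prompts to generate images from
        - "a photograph of an astronaut riding a horse"   # placeholder prompt
      num_images_per_prompt: 2    # placeholder value
      target_resolution: 256      # placeholder value; set to the desired output resolution
    model:
      customized_model:
        base_ckpt: /path/to/base.nemo       # placeholder path to the base model checkpoint
        sr256_ckpt: /path/to/sr256.nemo     # placeholder path to the 256 super-resolution checkpoint
        sr1024_ckpt: /path/to/sr1024.nemo   # placeholder path to the 1024 super-resolution checkpoint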

Remarks:

We provide both DDPM and EDM samplers. For models trained with EDM, use at least 30 inference steps; for models trained with DDPM, use at least 250 inference steps.