Important

NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Framework Inference

For InstructPix2Pix models, our inference script takes an original image and an edit prompt, applies the requested edit to the image, and saves the result as a new file.
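
For intuition, the edit operation resembles the following minimal sketch, written against the public Hugging Face diffusers implementation of InstructPix2Pix. This is for illustration only and is not the NeMo inference script; the cfg_text and cfg_image settings described below correspond to guidance_scale and image_guidance_scale here.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInstructPix2PixPipeline

    # Illustration only: the open-source diffusers pipeline, not the NeMo script.
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
    ).to("cuda")

    original = Image.open("path/to/input/picture.jpg").convert("RGB")
    edited = pipe(
        "make it look like a watercolor painting",  # edit prompt
        image=original,
        num_inference_steps=100,   # steps
        guidance_scale=7.5,        # cfg_text
        image_guidance_scale=1.2,  # cfg_image
    ).images[0]
    edited.save("edited.jpg")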

To enable the inference stage for an InstructPix2Pix model, make the following changes to the configuration files:

  1. In the defaults section of conf/config.yaml, update the fw_inference field to point to the desired InstructPix2Pix configuration file. For example, if you want to use the instruct_pix2pix/edit_cli configuration, set the fw_inference field to instruct_pix2pix/edit_cli.

    defaults:
      - fw_inference: instruct_pix2pix/edit_cli
      ...
    
  2. In the stages field of conf/config.yaml, make sure the fw_inference stage is included. For example:

    stages:
      - fw_inference
      ...
    
  3. Configure the edit section in conf/fw_inference/instruct_pix2pix/edit_cli.yaml. Most importantly, set the input field to the path of the original image for inference and provide an edit prompt in the prompt field. The script generates num_images_per_prompt images at once for the given prompt; cfg_text controls how strongly the edit follows the prompt, while cfg_image controls how closely the result preserves the original image. The combine_images option tiles the generated images into a [row, column] grid (a sketch of this tiling follows the configuration block below).

    edit:
      resolution: 512
      steps: 100
      input: ??? # path/to/input/picture
      outpath: ${fw_inference.run.results_dir}
      prompt: ""
      cfg_text: 7.5
      cfg_image: 1.2
      num_images_per_prompt: 8
      combine_images: [2, 4] # [row, column]; set to null if you don't want to combine images
      seed: 1234
    
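     Below is a minimal sketch of the [row, column] tiling that combine_images describes, assuming equally sized PIL images; the helper shown here is hypothetical and not the script's internal function.

    from PIL import Image

    def combine_images(images, rows, cols):
        # Hypothetical helper: tile equally sized images into a rows x cols grid.
        width, height = images[0].size
        grid = Image.new("RGB", (cols * width, rows * height))
        for idx, img in enumerate(images):
            grid.paste(img, ((idx % cols) * width, (idx // cols) * height))
        return grid

    # e.g., combine the 8 generated images into a 2 x 4 grid, as configured above:
    # grid = combine_images(edited_images, rows=2, cols=4)
    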
  4. Execute the launcher pipeline: python3 main.py.
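
     Because the launcher is Hydra-based, the settings from the previous steps can typically also be passed as command-line overrides instead of being edited into the YAML files. A sketch, assuming the configuration layout shown above:

    python3 main.py stages=[fw_inference] \
      fw_inference=instruct_pix2pix/edit_cli \
      fw_inference.edit.input=/path/to/input/picture.jpg \
      fw_inference.edit.prompt="make it look like a painting"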

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/fw_inference/instruct_pix2pix/edit_cli.yaml to the path of the pretrained checkpoint in .nemo format. By default, this field points to the .nemo format checkpoint located in the fine-tuning checkpoints folder.

  2. We highly recommend using the same precision (i.e., trainer.precision) for inference as was used during training.

  3. For tips on getting better-quality results, see https://github.com/timothybrooks/instruct-pix2pix#tips.