Framework Inference

For InstructPix2Pix models, our inference script processes an original image based on a provided edit prompt, modifies the image accordingly, and saves the edited image as a new file.

To enable the inference stage with an InstructPix2Pix model, update the configuration files as follows:

  1. In the defaults section of conf/config.yaml, update the fw_inference field to point to the desired InstructPix2Pix configuration file. For example, if you want to use the instruct_pix2pix/edit_cli configuration, change the fw_inference field to instruct_pix2pix/edit_cli.

    defaults:
      - fw_inference: instruct_pix2pix/edit_cli
      ...
    
  2. In the stages field of conf/config.yaml, make sure the fw_inference stage is included. For example,

    stages:
      - fw_inference
      ...
    
  3. Configure the edit section in conf/fw_inference/instruct_pix2pix/edit_cli.yaml. Most importantly, set the input field to the path of the original image for inference, and provide an edit prompt in the prompt field. The script will generate num_images_per_prompt images at once based on the provided prompt.

    edit:
      resolution: 512
      steps: 100
      input: ??? # path/to/input/picture
      outpath: ${fw_inference.run.results_dir}
      prompt: ""
      cfg_text: 7.5
      cfg_image: 1.2
      num_images_per_prompt: 8
      combine_images: [2, 4] # [row, column], set to null if you don't want to combine images
      seed: 1234
    
  4. Execute the launcher pipeline: python3 main.py.
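
The edit fields configured in step 3 can also be supplied at launch time. The following is a minimal sketch, assuming the launcher's standard Hydra-style command-line overrides; the fw_inference.edit.* key paths follow from the config layout shown in step 3, and the image path and prompt shown here are placeholders rather than shipped defaults:

    # Hypothetical invocation: override edit fields from the command line
    # instead of editing edit_cli.yaml directly.
    python3 main.py \
      fw_inference.edit.input=/path/to/original.jpg \
      fw_inference.edit.prompt="turn the sky into a sunset" \
      fw_inference.edit.num_images_per_prompt=4

Any field not overridden on the command line keeps its value from conf/fw_inference/instruct_pix2pix/edit_cli.yaml.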

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/fw_inference/instruct_pix2pix/edit_cli.yaml to the path of the pretrained checkpoint in .nemo format (see the sketch after these remarks). By default, this field links to the .nemo format checkpoint located in the fine-tuning checkpoints folder.

  2. We highly recommend using the same precision (i.e., trainer.precision) for inference as was used during training.

  3. For tips on getting better quality results, see the timothybrooks/instruct-pix2pix repository.
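
The following is a minimal sketch of how remarks 1 and 2 might look in conf/fw_inference/instruct_pix2pix/edit_cli.yaml; the checkpoint path is a placeholder, and the presence of a trainer section with a precision field in this file is an assumption based on typical NeMo inference configs:

    model:
      restore_from_path: /path/to/checkpoints/instruct_pix2pix.nemo # placeholder path to a .nemo checkpoint
    trainer:
      precision: bf16 # assumption: set this to the precision used during training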