Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Framework Inference
For InstructPix2Pix models, our inference script processes an original image based on a provided edit prompt, modifies the image accordingly, and saves the edited image as a new file.
To enable the inference stage with an InstructPix2Pix model, update the configuration files as follows:
In the defaults section of conf/config.yaml, update the fw_inference field to point to the desired InstructPix2Pix configuration file. For example, if you want to use the instruct_pix2pix/edit_cli configuration, change the fw_inference field to instruct_pix2pix/edit_cli.

defaults:
  - fw_inference: instruct_pix2pix/edit_cli
  ...

In the stages field of conf/config.yaml, make sure the fw_inference stage is included. For example:

stages:
  - fw_inference
  ...

Configure the edit section in conf/fw_inference/instruct_pix2pix/edit_cli.yaml. Most importantly, set the input field to the path of the original image for inference, and provide an edit prompt in the prompt field. The script will generate num_images_per_prompt images at once based on the provided prompt.

edit:
  resolution: 512
  steps: 100
  input: ??? # path/to/input/picture
  outpath: ${fw_inference.run.results_dir}
  prompt: ""
  cfg_text: 7.5
  cfg_image: 1.2
  num_images_per_prompt: 8
  combine_images: [2, 4] # [row, column], set to null if you don't want to combine the images
  seed: 1234

Execute the launcher pipeline: python3 main.py.
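The configuration steps above can also be sketched as command-line overrides instead of file edits. The following is a minimal sketch, assuming main.py accepts Hydra-style overrides (the ??? and ${...} syntax in the YAML suggests an OmegaConf/Hydra setup, but this page does not confirm it). The helper name build_launcher_command is hypothetical and not part of the launcher.

```python
def build_launcher_command(input_image, prompt, launcher_script="main.py"):
    """Return an argv list that runs the launcher with Hydra-style overrides.

    Assumption: the launcher accepts `key=value` overrides on the command line.
    This helper is illustrative only and is not part of the launcher.
    """
    if not input_image:
        raise ValueError("edit.input must point to the original image")
    if not prompt:
        raise ValueError("edit.prompt must contain the edit instruction")
    if '"' in prompt:
        raise ValueError("this sketch does not escape double quotes in the prompt")
    return [
        "python3",
        launcher_script,
        # Step 2: include the fw_inference stage.
        "stages=[fw_inference]",
        # Step 1: select the InstructPix2Pix inference configuration.
        "fw_inference=instruct_pix2pix/edit_cli",
        # Step 3: set the input image and the edit prompt.
        f"fw_inference.edit.input={input_image}",
        f'fw_inference.edit.prompt="{prompt}"',
    ]


# Example values below are placeholders, not paths from the documentation.
example = build_launcher_command("path/to/input/picture", "turn the sky into a sunset")
print(" ".join(example))
```

To actually launch, the returned list could be passed to subprocess.run(cmd, check=True) from the launcher directory. Editing the YAML files as described above remains the documented route.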
Remarks:
To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/fw_inference/vit/imagenet1k.yaml to the path of the pretrained checkpoint in .nemo format. By default, this field links to the .nemo format checkpoint located in the ImageNet 1K fine-tuning checkpoints folder.
We strongly recommend using the same precision (i.e., trainer.precision) for inference as was used during training.
Tips for getting better quality results: timothybrooks/instruct-pix2pix
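The cfg_text and cfg_image values in the edit section are guidance scales. As a rough illustration of how two such scales interact, here is a minimal sketch of the classifier-free guidance rule described in the InstructPix2Pix paper. It assumes that cfg_text and cfg_image correspond to the paper's text and image guidance scales; this page does not confirm how the NeMo script applies them. Plain lists of floats stand in for the model's noise-prediction tensors, and the function name combine_guidance is hypothetical.

```python
def combine_guidance(eps_uncond, eps_image, eps_full, cfg_text=7.5, cfg_image=1.2):
    """Blend three noise predictions using text and image guidance scales.

    eps_uncond: prediction with neither image nor text conditioning
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on the input image and the edit prompt

    Assumption: cfg_text and cfg_image play the role of the paper's
    text and image guidance scales.
    """
    return [
        u + cfg_image * (i - u) + cfg_text * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# With both scales set to 1.0 the result is the fully conditioned prediction.
print(combine_guidance([0.0], [1.0], [3.0], cfg_text=1.0, cfg_image=1.0))
```

Under this rule, a larger cfg_image pulls the result toward the input image, while a larger cfg_text makes the edit follow the prompt more strongly.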