Performance

InstructPix2Pix is an image editing model that transforms an input image according to a natural-language instruction. For example, given a photo of Toy Jensen, the model can change the background, move the scene to a new setting, or restyle the image as instructed.

Here are some examples generated using our NeMo Stable Diffusion 1.2 model, fine-tuned with NeMo InstructPix2Pix. For each instruction, we showcase 8 distinct images generated from different seeds:

  • Original image

    tjjingle-1280x712.png

  • Instruction: Add fireworks to the background

    add_fireworks_to_the_background_7.5_1.2_1234_combine.jpg

  • Instruction: Make it on a beach

    make_it_in_on_a_beach_7.5_1.2_1234_combine.jpg

  • Instruction: Make it Van Gogh style

    make_it_Van_Gogh_style_7.5_1.2_1234_combine.jpg

Latency timing starts directly before text encoding (CLIP) and stops directly after output image decoding (VAE). For the framework (FW) numbers, we use PyTorch Automatic Mixed Precision (AMP) for FP16 computation. For TensorRT (TRT), we export the individual models with FP16 acceleration. We use the optimized TRT engine setup in the deployment directory so that the TRT numbers are collected in the same environment as the framework numbers.
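The timing window described above can be sketched as a small harness. This is a minimal illustration, not the script used to produce the reported numbers: `measure_latency`, the `pipeline` callable (standing in for everything from CLIP text encoding through VAE decoding), the `sync` hook, and the warm-up and averaging choices are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


def measure_latency(
    pipeline: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Return the mean wall-clock time in seconds of one pipeline call.

    `pipeline` is a hypothetical callable covering text encoding through
    image decoding. `sync` is an optional hook called before reading the
    clock, so that asynchronous device work is finished when timing
    starts and stops.
    """
    # Warm-up calls are excluded from the measurement (an assumption here).
    for _ in range(warmup):
        pipeline()

    timings = []
    for _ in range(runs):
        if sync is not None:
            sync()  # make sure no earlier work leaks into the window
        start = time.perf_counter()  # clock starts right before the pipeline
        pipeline()
        if sync is not None:
            sync()  # wait for the output to be complete
        timings.append(time.perf_counter() - start)  # clock stops after it
    return sum(timings) / len(timings)


if __name__ == "__main__":
    # Stand-in workload; a real run would call the editing pipeline instead.
    latency = measure_latency(lambda: time.sleep(0.01))
    print(f"mean latency: {latency:.3f} s")
```

When the pipeline runs on a GPU with PyTorch, passing `torch.cuda.synchronize` as `sync` is one way to keep asynchronous kernel launches from distorting the measured interval.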

GPU: NVIDIA DGX A100 (1x A100 80 GB)

Batch Size: Synonymous with num_images_per_prompt

Model                      Batch Size  Sampler  Inference Steps  TRT FP16 Latency (s)  FW FP16 (AMP) Latency (s)  TRT vs FW Speedup (x)
InstructPix2Pix (Res=256)  1           N/A      100              1.0                   3.6                        3.6
InstructPix2Pix (Res=256)  2           N/A      100              1.3                   3.7                        2.8
InstructPix2Pix (Res=256)  4           N/A      100              2.2                   4.9                        2.2
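
The speedup figures appear to be the framework latency divided by the TRT latency, rounded to one decimal place. The short check below reproduces them from the reported latencies; the `speedup` helper and the `rows` layout are illustrative names, not part of NeMo.

```python
# (batch size, TRT FP16 latency in s, FW FP16 (AMP) latency in s),
# copied from the reported results.
rows = [
    (1, 1.0, 3.6),
    (2, 1.3, 3.7),
    (4, 2.2, 4.9),
]


def speedup(trt_latency: float, fw_latency: float) -> float:
    """Speedup of TRT over the framework, rounded to one decimal place."""
    return round(fw_latency / trt_latency, 1)


speedups = [speedup(trt, fw) for _, trt, fw in rows]

for (batch_size, trt, fw), ratio in zip(rows, speedups):
    print(f"batch size {batch_size}: {fw} s / {trt} s = {ratio}x")
```

The computed ratios are 3.6, 2.8, and 2.2, so the relative gain from TRT shrinks as the batch size grows, even though TRT remains faster in absolute terms at every batch size listed.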
© Copyright 2023-2024, NVIDIA. Last updated on Apr 25, 2024.