Profile Image Edit (Image-to-Image) Models with AIPerf
Overview
This guide shows how to benchmark image-to-image (I2I) APIs using a Docker-based server and AIPerf. You’ll learn how to:
- Set up the server (FLUX.2-Klein-4B on SGLang)
- Run the benchmark with synthetic reference images or your own input file
- View the results and extract the edited images
The endpoint follows the OpenAI Image Edit shape: prompt + reference image are POSTed to /v1/images/edits as multipart/form-data. AIPerf auto-defaults request_content_type to multipart for image_edit, so you don’t need to pass --request-content-type explicitly.
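For reference, a request of this shape looks roughly like the curl call below; the reference image path and prompt are placeholders, and support for optional fields such as size depends on the server:

```bash
# Illustrative request shape only; the image path and prompt are placeholders.
curl -s http://localhost:30000/v1/images/edits \
  -F "image=@reference.png" \
  -F "prompt=Make the sky look like a golden-hour sunset" \
  -F "size=1024x1024"
```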
References
For the most up-to-date information, please refer to the following resources:
- OpenAI Image Edit API
- SGLang Multimodal Gen — /v1/images/edits route
- FLUX.2-Klein-4B on Hugging Face
Setting up the server
Log in to Hugging Face and accept the terms of use for FLUX.2-Klein-4B.
Export your Hugging Face token as an environment variable:
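For example (HF_TOKEN is the variable name the Hugging Face tooling reads; substitute your own token):

```bash
export HF_TOKEN=<your_huggingface_token>
```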
Start the Docker container:
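A minimal sketch of the container launch, assuming the lmsysorg/sglang:dev image and the port 30000 used later in this guide; adjust GPU, memory, and mount flags to your environment:

```bash
# Sketch only: adjust GPU selection, shared memory, and mounts to your setup.
docker run --gpus all -it --rm \
  --shm-size 32g \
  -p 30000:30000 \
  -e HF_TOKEN=$HF_TOKEN \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  lmsysorg/sglang:dev \
  /bin/bash
```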
The following steps are to be performed inside the Docker container. lmsysorg/sglang:dev ships the diffusion stack ready to run — no extra pip install step is needed for FLUX.2-Klein-4B.
Set the server arguments:
These arguments set up FLUX.2-Klein-4B on a single GPU at port 30000. Adjust the model path, GPU count, or port to match your environment. The flags below come from upstream SGLang multimodal_gen and may change over time — treat the SGLang Multimodal Gen CLI as the source of truth if any flag here is rejected.
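For illustration only, such an argument set might look like the sketch below. Every flag name and the model path here are assumptions, so substitute the exact values from the SGLang Multimodal Gen documentation:

```bash
# Assumed flag names and repo id; verify against the SGLang Multimodal Gen CLI.
export MODEL_PATH=black-forest-labs/FLUX.2-Klein-4B   # assumed Hugging Face repo id
export SERVER_ARGS="--model-path $MODEL_PATH --port 30000 --tp-size 1"
```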
Start the server:
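Again as a sketch, assuming sglang.launch_server is the entry point; use the exact command from the SGLang Multimodal Gen docs if yours differs:

```bash
# Assumed entry point; replace with the launch command from the SGLang docs.
python3 -m sglang.launch_server $SERVER_ARGS
```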
Wait until the server is ready (watch the server logs for the message indicating startup is complete).
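The exact readiness line depends on the SGLang version. As an alternative, you can poll the port until it answers, keeping in mind that the port may open before the model finishes loading, so the log message remains the authoritative signal:

```bash
# Poll until the server answers HTTP requests on port 30000.
until curl -s -o /dev/null http://localhost:30000; do
  sleep 5
done
echo "Server is listening on port 30000"
```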
Running the benchmark (basic usage)
Image Edit Using Synthetic Reference Images
The simplest path: AIPerf generates a synthetic reference image for every request and pairs it with a synthetic prompt. The mock image bytes are uploaded as the multipart image field — the server processes the request end-to-end just like a real one.
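A minimal command sketch is shown below; the endpoint type and the diffusion extras come from this guide, while the remaining flag names are assumptions about the AIPerf CLI, so verify them against your installed version's help output:

```bash
# Flag names are assumptions about the AIPerf CLI; verify with its help output.
aiperf profile \
  --model FLUX.2-Klein-4B \
  --url http://localhost:30000 \
  --endpoint-type image_edit \
  --request-count 50 \
  --concurrency 2 \
  --extra-inputs size:1024x1024 \
  --extra-inputs num_inference_steps:28 \
  --extra-inputs guidance_scale:3.5
```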
Done! This sends 50 requests to http://localhost:30000/v1/images/edits with multipart-encoded prompt + reference image, plus diffusion-specific extras (size, num_inference_steps, guidance_scale).
Sample Output (shape only — exact numbers will depend on your hardware):
Image Edit Using an Input File
For deterministic prompt + reference image sequences, use a JSONL input file. Each line must include both the prompt (text) and the reference image (image, a local path or URL) — the image_edit endpoint rejects turns without a reference image, and the single_turn loader does not synthesize one.
Create an input file (replace the paths/URLs with real reference images you want to edit):
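Each JSONL line pairs a text field with an image field as described above; the prompts and paths below are placeholders, and the file name inputs.jsonl is just a choice:

```jsonl
{"text": "Turn the daytime street scene into night with warm lamp light", "image": "/data/reference_images/street.png"}
{"text": "Replace the background with a snowy mountain range", "image": "https://example.com/images/portrait.jpg"}
```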
Run the benchmark:
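As with the synthetic run, this is a sketch: --input-file points at the JSONL created above, and the other flag names are the same assumptions to verify against your AIPerf version:

```bash
# Same assumed flags as above; --input-file points at the JSONL created earlier.
aiperf profile \
  --model FLUX.2-Klein-4B \
  --url http://localhost:30000 \
  --endpoint-type image_edit \
  --input-file inputs.jsonl \
  --request-count 50
```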
Understanding the Metrics
Image edit shares its metric set with image generation; both endpoints report image-level throughput/latency on top of the standard request-level metrics. There are no token-streaming metrics (TTFT, ITL) because the edited image is returned as a single response.
The first request typically pays a torch.compile cold-start cost (multiple seconds). Use --warmup-request-count to exclude warmup requests from the reported metrics.
Running the benchmark (advanced usage)
Use --export-level raw to capture the raw input/output payloads, which lets you extract the edited images afterwards.
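Combining the flags discussed above (--export-level and --warmup-request-count come from this guide; the rest are the same assumed flags as in the basic run):

```bash
# Raw export keeps per-request payloads; warmup requests are excluded from metrics.
aiperf profile \
  --model FLUX.2-Klein-4B \
  --url http://localhost:30000 \
  --endpoint-type image_edit \
  --request-count 50 \
  --warmup-request-count 3 \
  --export-level raw
```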
Viewing the edited images
The edited images come back as base64 strings inside each response. You can reuse the same extraction script from the Image Generation tutorial — the response shape is identical. Point it at the image_edit artifacts directory:
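If you don't have that script handy, a rough stand-in using jq is sketched below; the artifacts path and the JSON layout (an OpenAI-style data[].b64_json array under each recorded response) are assumptions to adjust to your raw export:

```bash
# Rough sketch: adjust the export path and jq filter to your raw export layout.
mkdir -p edited_images
n=0
jq -r '.response.data[]?.b64_json // empty' artifacts/image_edit/*raw*.jsonl |
while IFS= read -r b64; do
  printf '%s' "$b64" | base64 -d > "edited_images/edited_${n}.png"
  n=$((n+1))
done
```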
Conclusion
You’ve set up an image-to-image diffusion server, benchmarked it with both synthetic and file-driven prompts, and seen the metric set AIPerf reports for image_edit. From here you can sweep over num_inference_steps, guidance_scale, resolution, or concurrency to map the perf trade-offs of your model and hardware.