Profile Image Edit (Image-to-Image) Models with AIPerf
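This guide shows how to benchmark image-to-image (TI2I) APIs using a Docker-based server and AIPerf. You’ll learn how to set up the server, run benchmarks with synthetic and file-based inputs, and read the metrics AIPerf reports.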
The endpoint follows the OpenAI Image Edit shape: prompt + reference image are POSTed to /v1/images/edits as multipart/form-data. AIPerf auto-defaults request_content_type to multipart for image_edit, so you don’t need to pass --request-content-type explicitly.
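To make the request shape concrete, here is a minimal sketch of how a prompt and a reference image can be packed into one multipart/form-data body using only the Python standard library. The helper name `build_multipart`, the placeholder image bytes, and the `size` value are illustrative assumptions; AIPerf builds the real request for you, and this is not its internal implementation.

```python
import uuid


def build_multipart(fields, files):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: dict mapping field name -> string value
    files:  dict mapping field name -> (filename, raw bytes, content type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    for name, (filename, data, ctype) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # The closing delimiter marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Placeholder values: a real request would carry actual image bytes.
body, content_type = build_multipart(
    fields={"prompt": "make the sky purple", "size": "512x512"},
    files={"image": ("reference.png", b"\x89PNG placeholder bytes", "image/png")},
)
```

The returned `content_type` value would go in the `Content-Type` header of a POST to /v1/images/edits, with `body` as the payload.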
For the most up-to-date information, refer to the upstream SGLang documentation for the Multimodal Gen CLI and the /v1/images/edits route.
Log in to Hugging Face and accept the terms of use for FLUX.2-Klein-4B.
Export your Hugging Face token as an environment variable:
Start the Docker container:
The following steps are to be performed inside the Docker container. lmsysorg/sglang:dev ships the diffusion stack ready to run — no extra pip install step is needed for FLUX.2-Klein-4B.
Set the server arguments:
These arguments set up FLUX.2-Klein-4B on a single GPU at port 30000. Adjust the model path, GPU count, or port to match your environment. The flags below come from upstream SGLang multimodal_gen and may change over time — treat the SGLang Multimodal Gen CLI as the source of truth if any flag here is rejected.
Start the server:
Wait until the server is ready (watch the logs for the following message):
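If you prefer to script the wait rather than watch the logs, one option is to poll the server port. The sketch below is an assumption-laden convenience, not part of AIPerf or SGLang: `wait_for_port` is a hypothetical helper, and an open TCP port only shows that the process is listening, not that the model has finished loading, so the log message remains the authoritative signal.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=600.0, interval_s=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False


# Example call for the port used in this guide:
#   wait_for_port("localhost", 30000)
```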
The simplest path: AIPerf generates a synthetic reference image for every request and pairs it with a synthetic prompt. The mock image bytes are uploaded as the multipart image field — the server processes the request end-to-end just like a real one.
Done! This sends 50 requests to http://localhost:30000/v1/images/edits with multipart-encoded prompt + reference image, plus diffusion-specific extras (size, num_inference_steps, guidance_scale).
Sample Output (shape only — exact numbers will depend on your hardware):
For deterministic prompt + reference image sequences, use a JSONL input file. Each line must include both the prompt (text) and the reference image (image, a local path or URL) — the image_edit endpoint rejects turns without a reference image, and the single_turn loader does not synthesize one.
Create an input file (replace the paths/URLs with real reference images you want to edit):
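As one way to produce such a file, the sketch below writes a JSONL file in which every line carries both a `text` and an `image` field, as described above. The helper name `write_edit_inputs`, the output filename, the prompts, and the image paths are all placeholders chosen for illustration; substitute your own reference images.

```python
import json
from pathlib import Path


def write_edit_inputs(path, rows):
    """Write one JSON object per line; every row needs a prompt and a reference image."""
    lines = []
    for i, row in enumerate(rows):
        if not row.get("text") or not row.get("image"):
            raise ValueError(f"row {i} must have both 'text' and 'image'")
        lines.append(json.dumps({"text": row["text"], "image": row["image"]}))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Placeholder prompts and image locations (a local path and a URL).
rows = [
    {"text": "Turn the sky into a sunset", "image": "/path/to/reference_1.png"},
    {"text": "Add snow to the rooftops", "image": "https://example.com/reference_2.png"},
]
write_edit_inputs("image_edit_inputs.jsonl", rows)
```

Validating rows up front catches a missing reference image before the benchmark starts, rather than as a rejected request mid-run.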
Run the benchmark:
Image edit shares its metric set with image generation; both endpoints report image-level throughput/latency on top of the standard request-level metrics. There are no token-streaming metrics (TTFT, ITL) because the edited image is returned as a single response.
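To illustrate what image-level throughput and latency mean alongside request-level latency, here is a small sketch over made-up timing records. The record layout `(start_s, end_s, num_images)` and the metric names are assumptions for illustration only; AIPerf's own definitions and field names may differ.

```python
from statistics import mean


def image_metrics(records):
    """records: list of (start_s, end_s, num_images), one per completed request."""
    latencies = [end - start for start, end, _ in records]
    total_images = sum(n for _, _, n in records)
    # Wall-clock span from the first request start to the last request end.
    wall_time = max(end for _, end, _ in records) - min(start for start, _, _ in records)
    return {
        "request_latency_avg_s": mean(latencies),
        "image_latency_avg_s": mean((end - start) / n for start, end, n in records),
        "image_throughput_per_s": total_images / wall_time,
    }


# Three back-to-back requests, each returning one image after 2 seconds.
records = [(0.0, 2.0, 1), (2.0, 4.0, 1), (4.0, 6.0, 1)]
metrics = image_metrics(records)
```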
The first request typically pays a torch.compile cold-start cost (multiple seconds). Use --warmup-request-count to exclude warmup requests from the reported metrics.
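The effect of a cold start on averages can be seen with a few invented latencies. This sketch only illustrates the arithmetic; the numbers are made up, and `drop_warmup` is a hypothetical helper rather than how AIPerf implements warmup internally.

```python
from statistics import mean


def drop_warmup(latencies_s, warmup_count):
    """Return the latencies that remain after skipping the first warmup_count requests."""
    return latencies_s[warmup_count:]


# Invented latencies: the first request includes a compile cold start.
latencies = [9.5, 2.1, 2.0, 2.2, 1.9]
with_cold_start = mean(latencies)
steady_state = mean(drop_warmup(latencies, 1))
```

In this made-up run, a single slow first request pulls the mean well above the steady-state value, which is why excluding warmup requests gives a more representative number.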
Use --export-level raw to capture the raw input/output payloads, which lets you extract the edited images afterwards.
The edited images come back as base64 strings inside each response. You can reuse the same extraction script from the Image Generation tutorial — the response shape is identical. Point it at the image_edit artifacts directory:
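For reference, the decoding step can be sketched as follows, assuming each response follows the OpenAI images shape with a `data` list whose items hold a `b64_json` string. The helper name `save_images`, the `.png` extension, and the way you obtain each response object from the exported artifacts are assumptions; the extraction script from the Image Generation tutorial remains the reference.

```python
import base64
from pathlib import Path


def save_images(response, out_dir, prefix="edited"):
    """Decode each base64 image in an OpenAI-style images response and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, item in enumerate(response.get("data", [])):
        b64 = item.get("b64_json")
        if not b64:
            continue  # Skip items without inline image bytes.
        # Assumption: the server returns PNG data; adjust the suffix if not.
        path = out / f"{prefix}_{i}.png"
        path.write_bytes(base64.b64decode(b64))
        written.append(path)
    return written
```

Call `save_images(response, "extracted_images")` once per parsed response, varying `prefix` per request so files are not overwritten.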
You’ve set up an image-to-image diffusion server, benchmarked it with both synthetic and file-driven prompts, and seen the metric set AIPerf reports for image_edit. From here you can sweep over num_inference_steps, guidance_scale, resolution, or concurrency to map the perf trade-offs of your model and hardware.
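A sweep like this can be planned by enumerating every combination of the parameters you want to vary. The sketch below only generates the configurations; the helper name `sweep_grid` and the specific values are placeholders, and how each configuration is passed to AIPerf is left to you.

```python
from itertools import product


def sweep_grid(steps, guidance_scales, sizes, concurrencies):
    """Return one dict per combination of the swept parameters."""
    keys = ("num_inference_steps", "guidance_scale", "size", "concurrency")
    combos = product(steps, guidance_scales, sizes, concurrencies)
    return [dict(zip(keys, combo)) for combo in combos]


# Placeholder values: pick ranges that make sense for your model and GPU.
grid = sweep_grid(
    steps=[4, 8],
    guidance_scales=[1.0, 4.0],
    sizes=["512x512", "1024x1024"],
    concurrencies=[1, 4],
)
for cfg in grid:
    print(cfg)
```

Running one benchmark per entry in `grid` and recording the reported metrics next to each configuration gives a table you can use to compare trade-offs.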