This guide shows how to benchmark text-to-video generation APIs using SGLang and AIPerf. You’ll learn how to set up the SGLang video generation server, create input prompts, run benchmarks, and analyze the results.
Video generation follows an asynchronous job pattern:
1. Submit a request to /v1/videos with your prompt and receive a job ID
2. Poll /v1/videos/{id} until status is completed or failed
3. Fetch /v1/videos/{id}/content to retrieve the generated video
AIPerf handles this polling workflow automatically.
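To make the job pattern concrete, here is a minimal Python sketch of the polling step. It is an illustration, not how AIPerf implements it. The function `poll_until_done` and its parameters are hypothetical, and it assumes the job is returned as a JSON object with a `status` field. The terminal values `completed` and `failed` come from the description above; any other status value is treated as "still running".

```python
import time
from typing import Callable, Dict

# Terminal states named in this guide; everything else means "keep polling".
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch_job: Callable[[], Dict],
    interval_s: float = 1.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_job() until the job reaches a terminal status or time runs out.

    fetch_job is a hypothetical callable that returns the parsed job JSON,
    for example by requesting /v1/videos/{id}. The "status" key is an assumption.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} seconds")
        sleep(interval_s)
```

Passing the fetch function, sleep, and clock as arguments keeps the loop independent of any HTTP client, so it can be exercised without a running server.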
For the most up-to-date information, please refer to the following resources:
AIPerf supports any SGLang-compatible text-to-video model, including:
Export your Hugging Face token as an environment variable:
Start the SGLang Docker container:
Install the diffusion dependencies:
Set the server arguments:
The following arguments set up the SGLang server to use Wan2.1-T2V-1.3B on port 30010.
Adjust --num-gpus, --ulysses-degree, and --ring-degree based on your GPU configuration.
Single GPU setup:
Multi-GPU setup (4 GPUs with sequence parallelism):
Start the SGLang server:
Wait until the server is ready (watch the logs for the following message):
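As an alternative to watching the logs by hand, a readiness check can be scripted. The sketch below polls an HTTP endpoint until it answers with a 2xx status. The helper names `http_ok` and `wait_for_server` are hypothetical. The `/health` path is taken from the troubleshooting section of this guide; whether a successful response also means the model has finished loading is an assumption you should verify against the server logs.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET to url succeeds with a 2xx status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP errors all count as "not ready".
        return False


def wait_for_server(
    url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 60,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe url up to `attempts` times, pausing delay_s between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


# Example call (assumes the port used in this guide):
# wait_for_server("http://localhost:30010/health")
```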
Install SGLang with diffusion support:
Start the server:
Perform the following steps on your local machine (outside the SGLang Docker container).
Create an input file with video prompts:
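If you prefer to generate the prompt file from a script, the following Python sketch writes one JSON object per line. The JSON Lines layout and the `text` field name are assumptions about the expected input format, so check them against the AIPerf documentation before use. The prompts, the file name in the comment, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable


def format_prompts(prompts: Iterable[str], field: str = "text") -> str:
    """Serialize prompts as JSON Lines: one {"<field>": "<prompt>"} object per line.

    The default field name "text" is an assumption, not a confirmed schema.
    """
    return "".join(json.dumps({field: prompt}) + "\n" for prompt in prompts)


def write_prompts(path: str, prompts: Iterable[str], field: str = "text") -> None:
    """Write the serialized prompts to path using UTF-8."""
    Path(path).write_text(format_prompts(prompts, field), encoding="utf-8")


# Hypothetical example prompts; three entries to match a three-request run.
EXAMPLE_PROMPTS = [
    "A red fox running through fresh snow",
    "Waves breaking on a rocky shore at sunset",
    "A paper airplane gliding across a classroom",
]

# Example call (hypothetical file name):
# write_prompts("video_prompts.jsonl", EXAMPLE_PROMPTS)
```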
Run the benchmark:
Done! This sends 3 requests to http://localhost:30010/v1/videos and polls until each video is complete.
Sample Output (Successful Run):
Generate videos using synthetic prompts with configurable token lengths:
Control video generation through --extra-inputs:
Video Download Option:
Use --download-video-content to include video content download in the benchmark timing. When enabled, request latency includes the time to download the generated video from the server. By default, only generation time is measured.
Example with advanced parameters:
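As a hedged illustration, the flags named in this guide can be assembled programmatically before being appended to your benchmark command. The sketch below only builds the argument list; it does not run anything. The helper names are hypothetical, and repeating `--extra-inputs` once per key is an assumption about how multiple parameters are passed. The `size:720x480` value format follows the example used elsewhere in this guide.

```python
import shlex
from typing import Dict, List, Optional


def build_video_args(
    extra_inputs: Dict[str, object],
    request_timeout_s: Optional[int] = None,
    download_video_content: bool = False,
) -> List[str]:
    """Build the video-related flags for a benchmark command line."""
    args = ["--endpoint-type", "video_generation"]
    # Assumption: one --extra-inputs flag per key, formatted as key:value.
    for key, value in extra_inputs.items():
        args += ["--extra-inputs", f"{key}:{value}"]
    if request_timeout_s is not None:
        args += ["--request-timeout-seconds", str(request_timeout_s)]
    if download_video_content:
        # Includes the video download time in the measured request latency.
        args.append("--download-video-content")
    return args


def as_shell(args: List[str]) -> str:
    """Render the argument list as a shell-safe string for copy and paste."""
    return shlex.join(args)
```

Keeping the flags in a list rather than a single string avoids quoting mistakes when the result is passed to a process launcher.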
AIPerf automatically handles polling for video generation. Configure polling behavior:
Example with custom timeout and polling interval:
To extract and save the generated videos, use --export-level raw to capture the full response payloads.
Run the benchmark with raw export:
Download the generated videos:
The response contains a URL to download the video. Copy the following script to download_videos.py:
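A minimal sketch of such a script is shown here, under several assumptions: the raw export is a JSON Lines file, the download link is stored under a key named `url` somewhere in each record (possibly inside a JSON-encoded string), the link is an absolute http(s) URL, and the videos are MP4 files. The export file path is passed as the single command-line argument, and the `videos` output directory is a hypothetical default. Adjust the key name and file extension to match your actual export.

```python
import json
import sys
import urllib.request
from pathlib import Path
from typing import Iterable, Iterator, List


def find_urls(node, key=None) -> Iterator[str]:
    """Yield http(s) strings stored under a "url" key anywhere in a JSON value."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from find_urls(v, k)
    elif isinstance(node, list):
        for item in node:
            yield from find_urls(item, key)
    elif isinstance(node, str):
        if key == "url" and node.startswith(("http://", "https://")):
            yield node
        elif node.lstrip().startswith(("{", "[")):
            # The response body may be stored as a JSON-encoded string.
            try:
                nested = json.loads(node)
            except ValueError:
                return
            yield from find_urls(nested)


def urls_from_jsonl(lines: Iterable[str]) -> List[str]:
    """Collect unique URLs from JSON Lines text, preserving first-seen order."""
    urls: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for url in find_urls(json.loads(line)):
            if url not in urls:
                urls.append(url)
    return urls


def download_all(urls: Iterable[str], out_dir: str = "videos") -> None:
    """Download each URL to out_dir/video_<n>.mp4 (the extension is an assumption)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(urls):
        target = out / f"video_{index}.mp4"
        urllib.request.urlretrieve(url, str(target))
        print(f"Saved {url} to {target}")


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as handle:
        download_all(urls_from_jsonl(handle))
```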
Run the script:
Output:
Test maximum throughput with multiple concurrent requests:
Test single-request latency for different video sizes:
Compare generation quality at different inference step counts:
If you see Connection refused errors:
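- Check that the server is running: curl http://localhost:30010/health
- Check that the container publishes the port (-p 30010:30010)
If requests time out during generation:
- Increase the request timeout: --request-timeout-seconds 1200
If the server crashes with OOM errors:
- Reduce the resolution: --extra-inputs "size:720x480"
- Offload the text encoder to the CPU: --text-encoder-cpu-offload
- Lower the concurrency: --concurrency 1
If you see model loading errors: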
The video generation API returns the following fields:
You’ve successfully set up SGLang for video generation, run benchmarks with AIPerf, and learned how to download the generated videos. You can now experiment with different models, prompts, resolutions, and generation parameters to optimize your text-to-video workloads.
Key takeaways:
- Use --endpoint-type video_generation to benchmark video generation endpoints
- Pass generation parameters through --extra-inputs
- Use --export-level raw to capture full responses for video extraction
Now go forth and generate!