SGLang Video Generation
Overview
This guide shows how to benchmark text-to-video generation APIs using SGLang and AIPerf. You’ll learn how to set up the SGLang video generation server, create input prompts, run benchmarks, and analyze the results.
Video generation follows an asynchronous job pattern:
- Submit - POST to `/v1/videos` with your prompt, receive a job ID
- Poll - GET `/v1/videos/{id}` until status is `completed` or `failed`
- Download - GET `/v1/videos/{id}/content` to retrieve the generated video
AIPerf handles this polling workflow automatically.
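The three steps above can be sketched manually with `curl`. The endpoint paths follow the pattern just described; the request body field is an assumption based on the OpenAI Videos API:

```shell
# Submit a generation job ("prompt" field name assumed from the OpenAI Videos API)
curl -s -X POST http://localhost:30010/v1/videos \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A sailboat gliding across a calm lake at dawn"}'
# returns a JSON job object containing an "id"

# Poll the job until its status is "completed" or "failed"
curl -s http://localhost:30010/v1/videos/<job_id>

# Download the finished video
curl -s -o output.mp4 http://localhost:30010/v1/videos/<job_id>/content
```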
References
For the most up-to-date information, please refer to the following resources:
- SGLang Video Generation API
- SGLang Diffusion Installation Guide
- SGLang CLI Reference
- OpenAI Videos API
Supported Models
AIPerf supports any SGLang-compatible text-to-video model, including:
Setting Up the Server
Option 1: Docker (Recommended)
Export your Hugging Face token as an environment variable:
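For example (replace the placeholder with your actual token):

```shell
export HF_TOKEN=<your_huggingface_token>
```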
Start the SGLang Docker container:
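One way to launch it; the image tag and cache mount are assumptions, so adjust to your environment:

```shell
docker run --gpus all -it --rm \
  --ipc=host \
  -p 30010:30010 \
  -e HF_TOKEN=$HF_TOKEN \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  lmsysorg/sglang:latest bash
```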
The following steps are to be performed inside the SGLang Docker container.
Install the diffusion dependencies:
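A sketch, assuming the `diffusion` extras name (check the SGLang Diffusion Installation Guide for the exact package spec):

```shell
pip install "sglang[diffusion]"
```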
Set the server arguments:
The following arguments set up the SGLang server to use Wan2.1-T2V-1.3B on port 30010.
Adjust `--num-gpus`, `--ulysses-degree`, and `--ring-degree` based on your GPU configuration.
Single GPU setup:
Multi-GPU setup (4 GPUs with sequence parallelism):
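As a sketch, the two configurations might look like this. Only the flags named above are used; the Hugging Face model path is an assumption:

```shell
# Single GPU
SERVER_ARGS="--model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers --port 30010 --num-gpus 1"

# Multi-GPU: 4 GPUs with sequence parallelism
# (ulysses-degree x ring-degree = num-gpus is a common convention;
#  verify against the SGLang CLI reference)
SERVER_ARGS="--model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers --port 30010 \
  --num-gpus 4 --ulysses-degree 2 --ring-degree 2"
```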
Start the SGLang server:
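A sketch of the launch, assuming the standard `sglang.launch_server` entry point (check the SGLang CLI reference for your version); the model path is an assumption:

```shell
python -m sglang.launch_server \
  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --port 30010 \
  --num-gpus 1
```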
Wait until the server is ready (watch the logs for the following message):
Option 2: Native Installation
Install SGLang with diffusion support:
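A sketch, assuming the `diffusion` extras name:

```shell
pip install "sglang[diffusion]"
```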
Start the server:
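A sketch mirroring the Docker setup; the entry point and model path are assumptions:

```shell
python -m sglang.launch_server \
  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --port 30010 \
  --num-gpus 1
```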
Running the Benchmark
The following steps are to be performed on your local machine (outside the SGLang Docker container).
Basic Usage: Text-to-Video with Input File
Create an input file with video prompts:
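For example, a three-prompt JSONL file. The single-field `"text"` schema is an assumption; check the AIPerf input-file documentation for the exact format:

```shell
cat > inputs.jsonl <<'EOF'
{"text": "A golden retriever running through a field of sunflowers at sunset"}
{"text": "Waves crashing against a rocky coastline in slow motion"}
{"text": "A time-lapse of storm clouds rolling over a mountain range"}
EOF
```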
Run the benchmark:
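A sketch of the invocation. `--endpoint-type video_generation` comes from this guide; the other flag names are assumptions based on common AIPerf usage:

```shell
aiperf profile \
  --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --url http://localhost:30010 \
  --endpoint-type video_generation \
  --input-file inputs.jsonl \
  --request-count 3
```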
Done! This sends 3 requests to http://localhost:30010/v1/videos and polls until each video is complete.
Sample Output (Successful Run):
Basic Usage: Text-to-Video with Synthetic Prompts
Generate videos using synthetic prompts with configurable token lengths:
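If you omit the input file, AIPerf can synthesize prompts instead; a sketch, where the synthetic-token flag names are assumptions:

```shell
aiperf profile \
  --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --url http://localhost:30010 \
  --endpoint-type video_generation \
  --synthetic-input-tokens-mean 64 \
  --synthetic-input-tokens-stddev 16 \
  --request-count 3
```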
Generation Parameters
Control video generation through --extra-inputs:
Video Download Option:
Use `--download-video-content` to include video content download in the benchmark timing. When enabled, request latency includes the time to download the generated video from the server. By default, only generation time is measured.
Example with advanced parameters:
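A sketch: the `size` format appears elsewhere in this guide, while `num_inference_steps` is an assumed parameter name, so verify both against the SGLang Video Generation API:

```shell
aiperf profile \
  --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --url http://localhost:30010 \
  --endpoint-type video_generation \
  --input-file inputs.jsonl \
  --extra-inputs "size:1280x720" \
  --extra-inputs "num_inference_steps:30" \
  --download-video-content
```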
Polling Configuration
AIPerf automatically handles polling for video generation. Configure polling behavior:
Example with custom timeout and polling interval:
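A sketch: `--request-timeout-seconds` appears in the troubleshooting section of this guide, but the polling-interval flag name here is hypothetical:

```shell
aiperf profile \
  --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --url http://localhost:30010 \
  --endpoint-type video_generation \
  --input-file inputs.jsonl \
  --request-timeout-seconds 1200 \
  --poll-interval-seconds 5   # hypothetical flag name; check `aiperf profile --help`
```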
Advanced Usage: Extracting Generated Videos
To extract and save the generated videos, use `--export-level raw` to capture the full response payloads.
Run the benchmark with raw export:
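A sketch; the artifact-directory flag name is an assumption:

```shell
aiperf profile \
  --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --url http://localhost:30010 \
  --endpoint-type video_generation \
  --input-file inputs.jsonl \
  --export-level raw \
  --artifact-dir artifacts   # output directory flag name is an assumption
```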
Download the generated videos:
The response contains a URL to download the video. Copy the following script to download_videos.py:
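A minimal sketch of such a script. The export file location and the record schema (a top-level `id` on each raw response) are assumptions about AIPerf's raw export format; adjust them to match what you find in your artifact directory:

```python
import json
import sys
import urllib.request
from pathlib import Path


def extract_video_urls(records, base_url="http://localhost:30010"):
    # Each raw record is assumed to carry the final job object with an "id";
    # the content endpoint shape comes from the API pattern in this guide.
    urls = []
    for rec in records:
        video_id = rec.get("id")
        if video_id:
            urls.append(f"{base_url}/v1/videos/{video_id}/content")
    return urls


def main(export_file="artifacts/profile_export.jsonl", out_dir="videos"):
    # The default export filename is an assumption -- point this at the
    # raw export produced by --export-level raw.
    lines = Path(export_file).read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    Path(out_dir).mkdir(exist_ok=True)
    for i, url in enumerate(extract_video_urls(records)):
        dest = Path(out_dir) / f"video_{i}.mp4"
        urllib.request.urlretrieve(url, dest)
        print(f"Saved {dest}")


if __name__ == "__main__":
    main(*sys.argv[1:])
```

Run it with `python download_videos.py <export_file>` once the benchmark has finished.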
Run the script:
Output:
Benchmark Scenarios
Scenario 1: Throughput Testing
Test maximum throughput with multiple concurrent requests:
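One possible shape: `--concurrency` appears in the troubleshooting section of this guide, while the request-count flag is an assumption:

```shell
aiperf profile \
  --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  --url http://localhost:30010 \
  --endpoint-type video_generation \
  --input-file inputs.jsonl \
  --concurrency 4 \
  --request-count 16
```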
Scenario 2: Latency Testing
Test single-request latency for different video sizes:
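A sketch that sweeps two resolutions at concurrency 1; the `size` format follows this guide, and the model path is an assumption:

```shell
for size in 720x480 1280x720; do
  aiperf profile \
    --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --url http://localhost:30010 \
    --endpoint-type video_generation \
    --input-file inputs.jsonl \
    --concurrency 1 \
    --extra-inputs "size:${size}"
done
```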
Scenario 3: Quality vs Speed Trade-off
Compare generation quality at different inference step counts:
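A sketch that sweeps step counts; `num_inference_steps` is an assumed parameter name, so verify it against the SGLang Video Generation API:

```shell
for steps in 20 30 50; do
  aiperf profile \
    --model Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --url http://localhost:30010 \
    --endpoint-type video_generation \
    --input-file inputs.jsonl \
    --concurrency 1 \
    --extra-inputs "num_inference_steps:${steps}"   # parameter name is an assumption
done
```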
Troubleshooting
Connection Refused
If you see `Connection refused` errors:
- Verify the SGLang server is running: `curl http://localhost:30010/health`
- Check the port matches your server configuration
- If using Docker, ensure port mapping is correct (`-p 30010:30010`)
Timeout Errors
If requests time out during generation:
- Increase the request timeout: `--request-timeout-seconds 1200`
- Check server logs for errors
- Reduce video resolution or duration for faster generation
Out of Memory
If the server crashes with OOM errors:
- Use a smaller model (e.g., Wan2.1-T2V-1.3B instead of 14B)
- Reduce video resolution: `--extra-inputs "size:720x480"`
- Enable CPU offloading: `--text-encoder-cpu-offload`
- Reduce concurrency: `--concurrency 1`
Model Not Found
If you see model loading errors:
- Verify your Hugging Face token has access to the model
- Check the model path is correct
- Ensure sufficient disk space for model download
Response Fields
The video generation API returns the following fields:
Conclusion
You’ve successfully set up SGLang for video generation, run benchmarks with AIPerf, and learned how to download the generated videos. You can now experiment with different models, prompts, resolutions, and generation parameters to optimize your text-to-video workloads.
Key takeaways:
- Use `--endpoint-type video_generation`
- Control video parameters via `--extra-inputs`
- The transport handles polling automatically
- Use `--export-level raw` to capture full responses for video extraction
Now go forth and generate!