AIPerf supports synthetic video generation for benchmarking multimodal models that process video inputs. This feature allows you to generate videos with different patterns, resolutions, frame rates, and durations to simulate various video understanding workloads.
Video generation requires FFmpeg to be installed on your system.
Ubuntu/Debian:
macOS (with Homebrew):
Fedora/RHEL/CentOS:
Windows (with Chocolatey):
The synthetic video feature provides:
Generate videos at 640x480 with default temporal settings (4 fps, 5 seconds):
Note: Video generation is disabled by default (width and height are unset). You must specify both width and height to enable video generation.
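The enablement rule and the default temporal settings above can be sketched in a few lines of Python. This is an illustration only: the helper names `video_flags` and `frame_count` are hypothetical and not part of AIPerf, and only the flag names and defaults are taken from this guide. Rejecting a lone width or height is an assumption about how a missing dimension should be treated, and the frame count assumes that frames equal fps times duration.

```python
from typing import List, Optional


def video_flags(width: Optional[int] = None, height: Optional[int] = None,
                fps: int = 4, duration: float = 5.0) -> List[str]:
    """Return video-related flags, or [] when video generation stays disabled.

    Hypothetical helper: it only assembles flag names documented in this guide.
    """
    if width is None and height is None:
        return []  # both unset: video generation is disabled by default
    if width is None or height is None:
        # Assumption: a single dimension is treated as a configuration error.
        raise ValueError("specify both --video-width and --video-height")
    return [
        "--video-width", str(width),
        "--video-height", str(height),
        "--video-fps", str(fps),
        "--video-duration", str(duration),
    ]


def frame_count(fps: int = 4, duration: float = 5.0) -> int:
    """Frames per clip, assuming frames = fps * duration."""
    return round(fps * duration)


print(video_flags(640, 480))
print(frame_count())  # 20 frames with the defaults (4 fps, 5 seconds)
```

These flags would be appended to whatever benchmark command you already run; the rest of that command is outside the scope of this sketch.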
Sample Output (Successful Run):
Control the resolution of generated videos:
Adjust temporal properties:
Parameters:
--video-fps: Frames per second (default: 4, recommended for models like Cosmos)
--video-duration: Clip duration in seconds (default: 5.0)

AIPerf supports three built-in video patterns:
Generates videos with animated geometric shapes moving across the screen:
Features:
Generates videos with a grid pattern and clock-like animation:
Features:
Generates videos with random noise pixels in each frame:
Features:
Choose encoding codec based on your hardware and requirements:
Available CPU Codecs:
libvpx-vp9: VP9 encoding, BSD-licensed (default, WebM format)
libx264: H.264 encoding, GPL-licensed, widely compatible (MP4 format)
libx265: H.265 encoding, GPL-licensed, smaller file sizes, slower encoding (MP4 format)

For faster encoding with NVIDIA GPUs:
Available NVIDIA GPU Codecs:
h264_nvenc: H.264 GPU encoding
hevc_nvenc: H.265 GPU encoding, smaller files

Control the number of videos per request:
AIPerf can embed a synthetic audio track into generated videos for benchmarking multimodal models that process video+audio inputs together. When enabled, a Gaussian noise audio signal matching the video duration is muxed into each video file via FFmpeg.
Audio embedding is disabled by default to maintain backward compatibility and minimize file size for video-only workloads.
Set --video-audio-num-channels to 1 (mono) or 2 (stereo) to embed an audio track:
This generates videos with a mono, 44.1 kHz audio track using an auto-selected codec (libvorbis for WebM, aac for MP4).
When --video-audio-codec is not specified, the codec is automatically selected based on the video format:
You can override the auto-selection with an explicit codec:
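As an illustration of the selection rule rather than a command to run, the behavior described above amounts to a small lookup with an optional override. The sketch below is a minimal Python rendering of that rule; `resolve_audio_codec` and `DEFAULT_AUDIO_CODEC` are hypothetical names, not AIPerf internals, and raising on an unknown format is an assumption.

```python
from typing import Optional

# Defaults as stated in this guide: libvorbis for WebM, aac for MP4.
DEFAULT_AUDIO_CODEC = {"webm": "libvorbis", "mp4": "aac"}


def resolve_audio_codec(video_format: str, audio_codec: Optional[str] = None) -> str:
    """Pick the audio codec: an explicit codec wins, otherwise use the format default."""
    if audio_codec is not None:
        return audio_codec  # corresponds to passing --video-audio-codec
    key = video_format.lower()
    if key not in DEFAULT_AUDIO_CODEC:
        # Assumption: formats outside WebM/MP4 are rejected.
        raise ValueError(f"no default audio codec for format {video_format!r}")
    return DEFAULT_AUDIO_CODEC[key]


print(resolve_audio_codec("webm"))  # libvorbis
print(resolve_audio_codec("mp4"))   # aac
```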
-shortest to ensure duration alignment

The audio generation uses a deterministic RNG seed (dataset.video.audio), so videos with audio are reproducible across runs when using --random-seed.
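To make the reproducibility claim concrete, here is a minimal sketch of seeded Gaussian noise audio using only the Python standard library. It is not AIPerf's implementation: the function name, the 16-bit PCM sample format, the noise amplitude (`sigma`), and the way the seed is passed in are all assumptions. Only the Gaussian noise signal and the 44.1 kHz sample rate come from this guide.

```python
import random
import struct


def gaussian_noise_pcm(duration_s: float, sample_rate: int = 44100,
                       channels: int = 1, seed: int = 0,
                       sigma: float = 0.1) -> bytes:
    """Return interleaved 16-bit little-endian PCM filled with Gaussian noise.

    Hypothetical helper: the same seed always yields the same bytes.
    """
    rng = random.Random(seed)  # private RNG, independent of global state
    n_samples = round(duration_s * sample_rate) * channels
    samples = []
    for _ in range(n_samples):
        value = rng.gauss(0.0, sigma)
        value = max(-1.0, min(1.0, value))  # clip to the valid range
        samples.append(int(value * 32767))  # scale to signed 16-bit
    return struct.pack(f"<{n_samples}h", *samples)


first = gaussian_noise_pcm(0.01, seed=42)
second = gaussian_noise_pcm(0.01, seed=42)
print(first == second)  # True: identical seed, identical audio
print(len(first))       # 882 bytes: 441 mono samples at 2 bytes each
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the audio stream reproducible even if other code draws random numbers in between.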
Factors affecting audio contribution to file size:
For most benchmarking scenarios, the audio track adds minimal overhead compared to the video stream.
Benchmark with small, low-framerate videos:
Use case: Testing lightweight video processing or mobile-optimized models.
Test with high-resolution, longer videos:
Use case: Stress testing with high-quality video inputs.
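To get a feel for how quickly the work grows with resolution and duration, the sketch below estimates the uncompressed frame data behind a clip. It assumes 8-bit RGB frames (3 bytes per pixel) and frames = fps times duration; the function name and the example values are hypothetical. Encoded files are much smaller than this figure, so treat it as a rough indicator of generation cost, not of payload size.

```python
def raw_rgb_bytes(width: int, height: int, fps: int, duration: float) -> int:
    """Uncompressed size of all frames, assuming 3 bytes per pixel (8-bit RGB)."""
    frames = round(fps * duration)
    return width * height * 3 * frames


small = raw_rgb_bytes(640, 480, fps=4, duration=5.0)
large = raw_rgb_bytes(1920, 1080, fps=30, duration=10.0)  # hypothetical stress setting
print(f"{small / 1e6:.1f} MB raw for the small clip")
print(f"{large / 1e6:.1f} MB raw for the large clip")
```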
Combine video with text prompts for multimodal testing:
Use case: Simulating video question-answering or video captioning workloads.
Benchmark models that process both video and audio streams:
Use case: Testing video+audio understanding models (e.g., video QA with spoken audio, meeting transcription with video context).
Test with MP4 format and stereo audio for maximum compatibility:
Use case: Simulating real-world video files with stereo audio tracks for production-like multimodal workloads.
Test with many short video clips:
Use case: Testing throughput with brief video clips.
AIPerf supports both WebM (default) and MP4 formats:
WebM format (default):
MP4 format:
Generated videos are automatically:
This allows seamless integration with vision-language model APIs that accept base64-encoded video content.
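For readers who want to see what base64-encoded video content looks like on the wire, here is a small sketch that encodes raw bytes and wraps them in a data URL. The data URL wrapper, the helper name `to_data_url`, and the `MIME_TYPES` table are assumptions for illustration; this guide only states that videos are base64-encoded, not the exact payload shape AIPerf sends.

```python
import base64

# Standard MIME types for the two formats covered in this guide.
MIME_TYPES = {"webm": "video/webm", "mp4": "video/mp4"}


def to_data_url(video_bytes: bytes, video_format: str = "webm") -> str:
    """Base64-encode video bytes and wrap them in a data URL (assumed payload shape)."""
    mime = MIME_TYPES[video_format]
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Placeholder bytes stand in for a real encoded video file.
print(to_data_url(b"abc", "mp4"))  # data:video/mp4;base64,YWJj
```

Base64 inflates the payload by roughly one third, which is worth keeping in mind when sizing requests.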
CPU codecs (libvpx-vp9, libx264, libx265): Slower but universally available
GPU codecs (h264_nvenc, hevc_nvenc): Much faster, require an NVIDIA GPU

Factors affecting video file size:
If you see an error about FFmpeg not being installed:
Follow the installation instructions in the Prerequisites section.
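Before reinstalling, it can help to confirm whether FFmpeg is actually visible on your PATH. The following is a minimal check using only the Python standard library; the function name `ffmpeg_version` is hypothetical, and the sketch assumes the executable is named `ffmpeg`.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True,
                            text=True, check=False)
    lines = result.stdout.splitlines()
    # Fall back to the resolved path if the binary printed nothing.
    return lines[0] if lines else path


version = ffmpeg_version()
print(version or "FFmpeg not found on PATH; see the Prerequisites section")
```

If this reports that FFmpeg is missing even though you installed it, the usual cause is that the install location is not on the PATH of the shell or environment running the benchmark.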
If NVIDIA GPU codecs fail:
Solutions:
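Check that the NVIDIA driver and GPU are visible: nvidia-smi
Check that your FFmpeg build includes NVENC encoders: ffmpeg -encoders | grep nvenc
Fall back to a CPU codec: --video-codec libvpx-vp9 --video-format webm or --video-codec libx264 --video-format mp4

For high-resolution or long-duration videos: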
Lower --video-width and --video-height
Lower --video-duration
Lower --concurrency

All video-related parameters at a glance:
The synthetic video generation feature enables comprehensive benchmarking of video understanding models with:
Use synthetic videos to test your model’s performance across different video characteristics without requiring large video datasets.