Synthetic Video Generation
AIPerf supports synthetic video generation for benchmarking multimodal models that process video inputs. This feature allows you to generate videos with different patterns, resolutions, frame rates, and durations to simulate various video understanding workloads.
Prerequisites
Video generation requires FFmpeg to be installed on your system.
Installing FFmpeg
Ubuntu/Debian:
macOS (with Homebrew):
Fedora/RHEL/CentOS:
Windows (with Chocolatey):
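The usual package-manager commands are shown below; note that on Fedora/RHEL, FFmpeg typically requires an extra repository such as RPM Fusion or EPEL to be enabled first.

```shell
# Ubuntu/Debian
sudo apt update && sudo apt install -y ffmpeg

# macOS (with Homebrew)
brew install ffmpeg

# Fedora/RHEL/CentOS (may require the RPM Fusion/EPEL repositories)
sudo dnf install -y ffmpeg

# Windows (with Chocolatey)
choco install ffmpeg
```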
Overview
The synthetic video feature provides:
- Multiple synthesis types (moving shapes, grid clock, noise patterns)
- Configurable resolution, frame rate, and duration
- Hardware-accelerated encoding options (CPU and GPU codecs)
- Embedded synthetic audio tracks for video+audio multimodal benchmarking
- Base64-encoded video output for API requests
- MP4 and WebM format support
Basic Usage
Example: Basic Video Generation
Generate videos at 640x480 with default temporal settings (4 fps, 5 seconds):
Note: Video generation is disabled by default (width and height are unset). You must specify both width and height to enable video generation.
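A minimal command might look like the following sketch; the profile subcommand and the --model/--url values are illustrative placeholders, while the --video-* flags are the ones documented on this page:

```shell
# Enable video generation by setting both dimensions (placeholder model/URL)
aiperf profile \
  --model my-video-model \
  --url http://localhost:8000 \
  --video-width 640 \
  --video-height 480
```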
Sample Output (Successful Run):
Configuration Options
Video Dimensions
Control the resolution of generated videos:
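For example, to generate 720p videos (model and URL are illustrative placeholders):

```shell
# 1280x720 output; both dimensions must be set
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 1280 --video-height 720
```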
Frame Rate and Duration
Adjust temporal properties:
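A sketch with slower, longer clips (model and URL are placeholders):

```shell
# 2 fps for 8 seconds = 16 frames per clip
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-fps 2 --video-duration 8.0
```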
Parameters:
- --video-fps: Frames per second (default: 4, recommended for models like Cosmos)
- --video-duration: Clip duration in seconds (default: 5.0)
Synthesis Types
AIPerf supports three built-in video patterns:
1. Moving Shapes (Default)
Generates videos with animated geometric shapes moving across the screen:
Features:
- Multiple colored shapes (circles and rectangles)
- Smooth motion patterns
- Wrapping at screen edges
- Black background
2. Grid Clock
Generates videos with a grid pattern and clock-like animation:
Features:
- Grid overlay
- Animated clock hands (hour and minute)
- Dark gray background
- Frame number overlay
3. Noise
Generates videos with random noise pixels in each frame:
Features:
- Random RGB pixel values per frame
- Deterministic output via seeded RNG
- Maximum entropy content for codec stress testing
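Selecting one of these patterns is presumably done with a dedicated CLI flag; the --video-synthesis-type name used below is a hypothetical placeholder, so check the CLI Reference section for the exact spelling:

```shell
# Hypothetical flag name for choosing the noise pattern
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-synthesis-type noise
```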
Advanced Configuration
Video Codec Selection
Choose encoding codec based on your hardware and requirements:
CPU Encoding (Default)
Available CPU Codecs:
- libvpx-vp9: VP9 encoding, BSD-licensed (default, WebM format)
- libx264: H.264 encoding, GPL-licensed, widely compatible (MP4 format)
- libx265: H.265 encoding, GPL-licensed, smaller file sizes, slower encoding (MP4 format)
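For instance, to use H.264 in an MP4 container (placeholder model/URL):

```shell
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-codec libx264 --video-format mp4
```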
GPU Encoding (NVIDIA)
For faster encoding with NVIDIA GPUs:
Available NVIDIA GPU Codecs:
- h264_nvenc: H.264 GPU encoding
- hevc_nvenc: H.265 GPU encoding, smaller files
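A sketch using NVENC (placeholder model/URL; requires an NVIDIA GPU and an NVENC-enabled FFmpeg build):

```shell
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 1280 --video-height 720 \
  --video-codec h264_nvenc --video-format mp4
```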
Batch Size
Control the number of videos per request:
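The flag name below, --batch-size, is an assumption for illustration; confirm it against the CLI Reference:

```shell
# Hypothetical flag: request two videos per payload
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --batch-size 2
```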
Embedded Audio Track
AIPerf can embed a synthetic audio track into generated videos for benchmarking multimodal models that process video+audio inputs together. When enabled, a Gaussian noise audio signal matching the video duration is muxed into each video file via FFmpeg.
Audio embedding is disabled by default to maintain backward compatibility and minimize file size for video-only workloads.
Enabling Audio
Set --video-audio-num-channels to 1 (mono) or 2 (stereo) to embed an audio track:
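For example (placeholder model/URL):

```shell
# Mono audio track muxed into each generated video
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-audio-num-channels 1
```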
This generates videos with a mono, 44100 Hz audio track using an auto-selected codec (libvorbis for WebM, aac for MP4).
Audio Parameters
Audio Codec Selection
When --video-audio-codec is not specified, the codec is automatically selected based on the video format:
You can override the auto-selection with an explicit codec:
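For example, forcing Opus audio in WebM output (placeholder model/URL; libopus is FFmpeg's standard Opus encoder):

```shell
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-audio-num-channels 1 \
  --video-audio-codec libopus
```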
Stereo Audio with Custom Sample Rate
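A sketch for this scenario; the sample-rate flag name --video-audio-sample-rate is an assumption, so verify it against the CLI Reference:

```shell
# Stereo audio at 48 kHz (sample-rate flag name assumed)
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-audio-num-channels 2 \
  --video-audio-sample-rate 48000
```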
How It Works
- A Gaussian noise audio signal is generated matching the video duration
- The audio is encoded as 16-bit PCM WAV
- FFmpeg muxes the video and audio streams together using -shortest to ensure duration alignment
- The audio codec converts the WAV data to the target format (AAC, Vorbis, or Opus)
- The resulting video+audio file is base64-encoded for API requests
The audio generation uses a deterministic RNG seed (dataset.video.audio), so videos with audio are reproducible across runs when using --random-seed.
Audio Size Impact
Factors affecting audio contribution to file size:
- Sample rate: 48000 Hz produces ~9% more data than 44100 Hz
- Channels: Stereo (2) doubles audio data compared to mono (1)
- Codec: Vorbis and Opus provide better compression than AAC at lower bitrates
- Duration: Audio size scales linearly with video duration
For most benchmarking scenarios, the audio track adds minimal overhead compared to the video stream.
Example Workflows
Example 1: Low-Resolution Video Understanding
Benchmark with small, low-framerate videos:
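For example (placeholder model/URL):

```shell
# Small frames at 2 fps for 3 seconds
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 320 --video-height 240 \
  --video-fps 2 --video-duration 3.0
```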
Use case: Testing lightweight video processing or mobile-optimized models.
Example 2: HD Video Benchmarking
Test with high-resolution, longer videos:
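A sketch (placeholder model/URL; GPU encoding keeps generation fast at this resolution):

```shell
# 1080p clips at 8 fps for 10 seconds
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 1920 --video-height 1080 \
  --video-fps 8 --video-duration 10.0 \
  --video-codec h264_nvenc --video-format mp4
```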
Use case: Stress testing with high-quality video inputs.
Example 3: Mixed Text and Video
Combine video with text prompts for multimodal testing:
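A sketch (placeholder model/URL; the --synthetic-input-tokens-mean flag used for text prompt length is an assumption, so verify it against your CLI reference):

```shell
# Video inputs plus synthetic text prompts (text flag name assumed)
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --synthetic-input-tokens-mean 128
```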
Use case: Simulating video question-answering or video captioning workloads.
Example 4: Video + Audio Multimodal
Benchmark models that process both video and audio streams:
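For example (placeholder model/URL):

```shell
# Default WebM video with an embedded mono audio track
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-audio-num-channels 1
```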
Use case: Testing video+audio understanding models (e.g., video QA with spoken audio, meeting transcription with video context).
Example 5: Video + Audio with MP4 and Stereo
Test with MP4 format and stereo audio for maximum compatibility:
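For example (placeholder model/URL):

```shell
# MP4 + H.264 video with stereo AAC audio
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 \
  --video-format mp4 --video-codec libx264 \
  --video-audio-num-channels 2 --video-audio-codec aac
```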
Use case: Simulating real-world video files with stereo audio tracks for production-like multimodal workloads.
Example 6: Rapid Short Clips
Test with many short video clips:
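For example (placeholder model/URL):

```shell
# 1-second clips at low resolution for high request throughput
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 320 --video-height 240 \
  --video-duration 1.0
```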
Use case: Testing throughput with brief video clips.
Format and Output
Video Format
AIPerf supports both WebM (default) and MP4 formats:
WebM format (default):
MP4 format:
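For example (placeholder model/URL):

```shell
# WebM (default): pairs naturally with libvpx-vp9
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 --video-format webm

# MP4: pairs with libx264/libx265 or the NVENC codecs
aiperf profile --model my-video-model --url http://localhost:8000 \
  --video-width 640 --video-height 480 --video-format mp4
```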
Data Encoding
Generated videos are automatically:
- Encoded using the specified codec
- Converted to base64 strings
- Embedded in API request payloads
This allows seamless integration with vision-language model APIs that accept base64-encoded video content.
Performance Considerations
Encoding Performance
- CPU codecs (libvpx-vp9, libx264, libx265): Slower but universally available
- GPU codecs (h264_nvenc, hevc_nvenc): Much faster, requires NVIDIA GPU
- Higher resolution and frame rates increase encoding time
Video Size Impact
Factors affecting video file size:
- Resolution: Higher dimensions = larger files
- Duration: Longer videos = larger files
- Frame rate: More frames = larger files
- Codec: H.265/HEVC produces smaller files than H.264
Recommendations
- For high-throughput testing: Use lower resolutions (320x240 or 640x480) and GPU encoding
- For quality testing: Use higher resolutions (1920x1080) with appropriate concurrency limits
- For API payload testing: Match your production video specifications
- For development: Start with small dimensions and short durations
Troubleshooting
FFmpeg Not Found
If you see an error about FFmpeg not being installed:
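You can confirm whether FFmpeg is installed and on your PATH with:

```shell
ffmpeg -version
```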
Follow the installation instructions in the Prerequisites section.
GPU Codec Not Available
If NVIDIA GPU codecs fail:
Solutions:
- Verify an NVIDIA GPU is available: nvidia-smi
- Check that FFmpeg was compiled with NVENC support: ffmpeg -encoders | grep nvenc
- Fall back to a CPU codec: --video-codec libvpx-vp9 --video-format webm or --video-codec libx264 --video-format mp4
Out of Memory
For high-resolution or long-duration videos:
- Reduce --video-width and --video-height
- Decrease --video-duration
- Lower --concurrency
CLI Reference
All video-related parameters at a glance:
Video Parameters
Audio Parameters
Summary
The synthetic video generation feature enables comprehensive benchmarking of video understanding models with:
- Flexible video parameters (resolution, frame rate, duration)
- Multiple synthesis patterns for variety
- Hardware-accelerated encoding options
- Optional embedded audio tracks for video+audio multimodal workloads
- Easy integration with multimodal APIs
Use synthetic videos to test your model’s performance across different video characteristics without requiring large video datasets.