Overview

The NVIDIA Synthetic Video Detector NIM is a GPU-accelerated microservice that analyzes video content and estimates the likelihood that a video is AI-generated (“synthetic”) rather than real. The service takes an input video and produces per-frame detection scores along with an aggregated video-level probability, so results can be assessed both frame by frame and for the video as a whole.

The service employs a forensic-oriented detection strategy that focuses on identifying intrinsic artifacts introduced by generative video pipelines. Rather than relying on semantic content, the detector analyzes low-level statistical and frequency-domain traces that are characteristic of synthetic generation.

These signals are resilient to common video transformations, including compression and re-encoding, and generalize across different generative architectures.
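To make the idea of a "frequency-domain trace" concrete, the following minimal sketch measures how much of a signal's spectral energy sits in the upper half of its frequency range. It is an illustration only and is not the detector's method: the functions `dft_magnitudes` and `high_frequency_ratio`, the cutoff, and the toy signals are all assumptions introduced here, and the example works on a single row of values rather than on video frames.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| for k = 0..N-1 using a naive O(N^2) discrete Fourier transform."""
    n = len(signal)
    mags = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def high_frequency_ratio(signal, cutoff_fraction=0.5):
    """Fraction of non-DC spectral energy in bins above cutoff_fraction * Nyquist.

    Hypothetical helper for illustration; the cutoff is an arbitrary choice.
    """
    n = len(signal)
    mags = dft_magnitudes(signal)
    nyquist = n // 2
    cutoff = int(nyquist * cutoff_fraction)
    # Keep bins 1..nyquist: skip the DC bin and the mirrored negative frequencies.
    energy = [m * m for m in mags[1:nyquist + 1]]
    total = sum(energy)
    if total == 0:
        return 0.0
    # energy[i] holds bin i + 1, so this slice covers bins strictly above the cutoff.
    return sum(energy[cutoff:]) / total


# A smooth row of values: one slow sine cycle across 32 samples.
smooth = [math.sin(2 * math.pi * t / 32) for t in range(32)]
# The same row with a faint alternating pattern added at the highest frequency.
patterned = [s + 0.5 * (-1) ** t for t, s in enumerate(smooth)]

print(round(high_frequency_ratio(smooth), 3))     # 0.0
print(round(high_frequency_ratio(patterned), 3))  # 0.5
```

The two rows look similar sample by sample, yet the added pattern is obvious in the spectrum. That is the general sense in which a low-level trace can be detected independently of what the content depicts.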

Designed for high-throughput and low-latency deployment, the service supports use cases such as media authenticity verification, digital forensics, content moderation, and integrity monitoring in real-world environments.

Inputs and Outputs

  • Input

    • H.264-encoded MP4 video. The service decodes the video into RGB frames internally using GPU-accelerated video decoding.

  • Per-frame or per-segment output (optional)

    • Synthetic likelihood scores associated with individual frames or temporal segments, which are useful for analysis and visualization.

  • Final aggregated output

    • A single probability in [0, 1] that summarizes how likely it is that the entire video is synthetic.

    • Optional summary artifacts (for example, CSV or JSON) for offline analysis.
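The relationship between the per-frame scores and the final aggregated output can be sketched as follows. This section does not state how the service aggregates scores, so the arithmetic mean used here is an assumption, and the `FrameScore` type, the `aggregate_video_score` and `build_summary` functions, and the JSON field names are hypothetical names chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from statistics import fmean


@dataclass
class FrameScore:
    """Hypothetical container for one per-frame result."""
    frame_index: int
    synthetic_score: float  # assumed to lie in [0, 1]


def aggregate_video_score(frames):
    """Assumed aggregation: arithmetic mean of per-frame scores, clamped to [0, 1]."""
    if not frames:
        raise ValueError("at least one frame score is required")
    mean = fmean(f.synthetic_score for f in frames)
    return min(1.0, max(0.0, mean))


def build_summary(frames):
    """Build a JSON-serializable summary; the field names are illustrative only."""
    return {
        "video_synthetic_probability": aggregate_video_score(frames),
        "frames": [asdict(f) for f in frames],
    }


frames = [FrameScore(0, 0.91), FrameScore(1, 0.87), FrameScore(2, 0.95)]
summary = build_summary(frames)
print(json.dumps(summary, indent=2))
```

A mean is only one plausible choice. A real aggregator might weight frames, use a median, or apply a learned combination, so the service's own video-level probability should be treated as authoritative.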

High-Level Service Architecture

At a high level, the service consists of the following:

  • gRPC front-end: A bi-directional streaming gRPC API that accepts video bytes and streams detection results back to the client.

  • Video decoding layer: A GPU-accelerated decoding pipeline that converts H.264 video into normalized RGB frames suitable for model inference.

  • Inference engine: A batched inference layer that runs frames through a DINOv2 + DINOv3 model ensemble using TensorRT-optimized engines.

  • Results and reporting: A reporting stage that returns the video-level classification together with the per-frame analysis.
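Because the front-end accepts video bytes over a stream, a client typically sends the file in chunks rather than as one message. The following minimal sketch shows only that chunking step, using the Python standard library. The chunk size and the `iter_video_chunks` helper are assumptions made here; the actual request message and RPC method names are defined by the service's proto files, which are not shown in this section.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; not specified in this section


def iter_video_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive byte chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for an opened MP4 file: 150,000 bytes of placeholder data.
fake_video = io.BytesIO(b"\x00" * 150_000)
chunks = list(iter_video_chunks(fake_video))
print([len(c) for c in chunks])  # [65536, 65536, 18928]
```

In a Python gRPC client, a generator like this is usually wrapped so that each chunk becomes a request message, and the resulting iterator is passed to the streaming stub method, while responses are read from the iterator that the call returns.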

Try It Out

Try the NVIDIA Synthetic Video Detector NIM at build.nvidia.com/nvidia/synthetic-video-detector.

To try the NVIDIA Synthetic Video Detector NIM API without hosting your own servers, use the Try API feature, which runs on the NVIDIA Cloud Function backend.