1. Introduction

NVIDIA GPUs and NVIDIA Jetson platforms contain a hardware-based decoder (referred to as NVDEC in this document) that provides fully accelerated hardware-based video decoding for several popular codecs. With complete decoding offloaded to NVDEC, the graphics engine and CPU are free for other operations. NVDEC decodes much faster than real time, which makes it suitable for transcoding scenarios in addition to video playback.

The hardware capabilities available in NVDEC are exposed through APIs referred to as NVDECODE APIs in this document. This document provides information about the capabilities of the NVDEC engine and the features exposed through NVDECODE APIs.

2. NVDEC Capabilities

At a high level, the table NVDEC Hardware Capabilities summarizes the capabilities of the NVDEC engine exposed through NVDECODE APIs.

NVDEC Hardware Capabilities

| Hardware Features | 1st Gen Maxwell GPUs | 2nd Gen Maxwell GPUs | Pascal GPUs | Volta GPUs | Turing/ GA100/ Hopper GPUs | GA10x [3] and Ada GPUs | Blackwell GPUs/ Jetson Thor |
|---|---|---|---|---|---|---|---|
| VC1 Simple, Main & Advanced profiles | Y | Y | Y | Y | Y | Y | Y |
| MPEG4 Simple and Advanced Simple profiles | Y | Y | Y | Y | Y | Y | Y |
| MPEG2 Simple & Main profiles | Y | Y | Y | Y | Y | Y | Y |
| H.264 Baseline, Main, High profiles | Y | Y | Y | Y | Y | Y | Y |
| H.264 8192x8192 decoding support | N | N | N | N | N | N | Y |
| H.264 High10/High422 profiles | N | N | N | N | N | N | Y |
| VP8 | N | Y | Y [1] | Y | Y | Y | Y |
| HEVC Main and Main 10 profile [1] | N | Y [1] | Y | Y | Y | Y | Y |
| HEVC 444 decoding | N | N | N | N | Y | Y | Y |
| HEVC Main 422 10/12 profiles | N | N | N | N | N | N | Y |
| HEVC 8192x8192 decoding support | N | N | Y [1] | Y | Y | Y | Y |
| VP9 8192x8192 decoding support | N | N | Y [1] | Y | Y | Y | Y |
| VP9 Profile 0 [1] | N | Y [1] | Y | Y | Y | Y | Y |
| Multiple NVDECs [2] | N | N | N | N | Y | Y | Y |
| AV1 Main profile decoding | N | N | N | N | N | Y | Y |

Y : Supported, N : Unsupported

1 : Present in select GPUs

2 : Present in select GPUs

3 : GA10x GPUs include all GPUs based on the Ampere architecture except GA100
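For planning purposes, the support matrix above can be treated as data. The following is a minimal sketch, not part of the SDK: it encodes a handful of rows from the table (only rows without a row-level footnote) and looks up support for a feature on a given hardware generation. The names `GENERATIONS`, `CAPABILITIES`, `support`, and `first_generation_with` are hypothetical and chosen for illustration only.

```python
# Column order follows the NVDEC Hardware Capabilities table.
# Generation labels are illustrative shorthand, not official names.
GENERATIONS = [
    "Maxwell Gen1", "Maxwell Gen2", "Pascal", "Volta",
    "Turing/GA100/Hopper", "GA10x/Ada", "Blackwell/Jetson Thor",
]

# "Y" = supported, "N" = unsupported, "Y1" = present in select GPUs only.
# Only a subset of the table rows is encoded here.
CAPABILITIES = {
    "H.264 Baseline/Main/High": ["Y", "Y", "Y", "Y", "Y", "Y", "Y"],
    "VP8":                      ["N", "Y", "Y1", "Y", "Y", "Y", "Y"],
    "HEVC 444":                 ["N", "N", "N", "N", "Y", "Y", "Y"],
    "HEVC 8192x8192":           ["N", "N", "Y1", "Y", "Y", "Y", "Y"],
    "AV1 Main":                 ["N", "N", "N", "N", "N", "Y", "Y"],
}


def support(feature, generation):
    """Return 'yes', 'select GPUs', or 'no' for a feature on a generation."""
    flag = CAPABILITIES[feature][GENERATIONS.index(generation)]
    return {"Y": "yes", "Y1": "select GPUs", "N": "no"}[flag]


def first_generation_with(feature):
    """Return the earliest generation in the table that lists the feature."""
    for generation, flag in zip(GENERATIONS, CAPABILITIES[feature]):
        if flag != "N":
            return generation
    return None


print(support("AV1 Main", "Pascal"))
print(support("VP8", "Pascal"))
print(first_generation_with("HEVC 444"))
```

A static table like this is only a planning aid. Because several entries apply to select GPUs only, an application should confirm support on the actual device at run time through the capability query of the NVDECODE API (`cuvidGetDecoderCaps` in the Video Codec SDK).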

3. NVDEC Performance

NVDEC natively supports multiple hardware decoding contexts. The hardware and software maintain the context for each decoding session, so many decoding sessions can run in parallel with negligible context-switching penalty. As a result, subject to the hardware performance limit and available memory, an application can decode multiple videos simultaneously.

The table NVDEC Decoding Performance provides indicative data on the decoding performance of NVDEC in GPUs based on the Pascal, Turing, Ampere, Ada and Blackwell architectures, as well as Jetson platforms based on Thor, for AV1, HEVC, VP9, and H.264 encoded bitstreams. Performance varies across GPU classes (e.g. Quadro, Tesla) and scales almost linearly with the video clock speed of each device.

NVDEC Decoding Performance (Indicative)

| GPU Architecture/ Jetson Platform | Codec | Performance in frames/second |
|---|---|---|
| Pascal | H.264 | 694 |
| Pascal | VP9 | 846 |
| Pascal | HEVC | 810 |
| Pascal | HEVC Main10 | 789 |
| Turing | H.264 | 771 |
| Turing | VP9 | 932 |
| Turing | VP9 10 bit | 925 |
| Turing | HEVC | 1316 |
| Turing | HEVC Main10 | 1158 |
| Ampere | H.264 | 748 |
| Ampere | VP9 | 1075 |
| Ampere | VP9 10 bit | 1120 |
| Ampere | HEVC | 1415 |
| Ampere | HEVC Main10 | 1299 |
| Ampere | AV1 | 790 |
| Ada | H.264 | 903 |
| Ada | VP9 | 1290 |
| Ada | VP9 10 bit | 1342 |
| Ada | HEVC | 1641 |
| Ada | HEVC Main10 | 1520 |
| Ada | AV1 | 1018 |
| Blackwell | H.264 | 2172 |
| Blackwell | VP9 | 1445 |
| Blackwell | VP9 10 bit | 1498 |
| Blackwell | HEVC | 1872 |
| Blackwell | HEVC Main10 | 1818 |
| Blackwell | AV1 | 1119 |
| Jetson Thor | H.264 | 1434 |
| Jetson Thor | VP9 | 1019 |
| Jetson Thor | VP9 10 bit | 1016 |
| Jetson Thor | HEVC | 1293 |
| Jetson Thor | HEVC Main10 | 1130 |
| Jetson Thor | AV1 | 794 |

All measurements were taken at the highest video clocks reported by nvidia-smi (1544 MHz, 1860 MHz, 1665 MHz, 2160 MHz, and 2362 MHz for Pascal, Turing, Ampere, Ada, and Blackwell GPUs respectively, and 1691 MHz for the Jetson Thor platform). Performance should scale with the video clock reported by nvidia-smi on the target GPU. Information on nvidia-smi can be found at https://developer.nvidia.com/nvidia-system-management-interface.

Resolution/Input format: 1920x1080/YUV 4:2:0

Software: Windows 11 for GPUs and Jetson Linux 38.2 for the Jetson platform; Video Codec SDK v13.0
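The clock-scaling rule stated with the performance table can be sketched as a small calculation. This is an estimate under the assumption of exactly linear scaling (the document says "almost" linear), using the reference clocks given in the measurement note. The function name `scaled_fps` and the 1530 MHz target clock are hypothetical, chosen only for illustration.

```python
# Video clocks (MHz) at which the table figures were measured,
# taken from the measurement note for the performance table.
REFERENCE_CLOCK_MHZ = {
    "Pascal": 1544,
    "Turing": 1860,
    "Ampere": 1665,
    "Ada": 2160,
    "Blackwell": 2362,
    "Jetson Thor": 1691,
}


def scaled_fps(table_fps, architecture, target_clock_mhz):
    """Estimate frames/second at target_clock_mhz.

    Assumes throughput scales linearly with the video clock, which the
    document describes as only approximately true.
    """
    return table_fps * target_clock_mhz / REFERENCE_CLOCK_MHZ[architecture]


# Turing H.264 is listed at 771 fps at 1860 MHz. On a hypothetical
# Turing GPU whose maximum video clock is 1530 MHz:
estimate = scaled_fps(771, "Turing", 1530)
print(round(estimate))  # 634
```

The target clock should be the maximum video clock that nvidia-smi reports for the GPU in question. The result is an indicative figure for 1920x1080 YUV 4:2:0 input, not a guarantee.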

Hopper and GA100 GPUs contain an NVDEC with the same architecture as Turing. As a result, the decoding performance on Hopper and GA100 GPUs is the same as that of Turing GPUs, scaled by the clock speed. To view the clocks available on your GPU, use the nvidia-smi tool included with the NVIDIA driver.

While first-generation Maxwell GPUs had one NVDEC engine per chip, certain variants of the second-generation Maxwell, Pascal, Volta and Ada GPUs have two or three NVDEC engines per chip. This increases the aggregate decoder performance of the GPU. The NVIDIA driver takes care of load balancing among the NVDEC engines on the chip, so applications need no special code to take advantage of multiple decoders and automatically benefit from the higher decoder capacity of higher-end GPU hardware.

The decode performance listed in NVDEC Decoding Performance is given per NVDEC engine. Thus, if a Quadro or Tesla GPU has more than one NVDEC, multiply the corresponding number in NVDEC Decoding Performance by the number of NVDECs per chip to get the aggregate maximum performance (applicable only when running multiple simultaneous decode sessions). Note that the performance of a single decoding session cannot exceed the performance of one NVDEC, regardless of the number of NVDECs present on the GPU. All GeForce products contain a single NVDEC.
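The per-engine arithmetic described above can be illustrated with a short sketch. It assumes ideal load balancing by the driver and ignores memory limits, so the results are upper bounds rather than measured figures. The function names and the two-engine Ada configuration in the example are hypothetical.

```python
import math


def aggregate_fps(per_engine_fps, num_engines, num_sessions):
    """Upper bound on total decode throughput in frames/second.

    A single session cannot exceed the throughput of one NVDEC, so at
    most min(num_sessions, num_engines) engines can contribute.
    """
    if num_engines < 1 or num_sessions < 1:
        raise ValueError("num_engines and num_sessions must be at least 1")
    return per_engine_fps * min(num_sessions, num_engines)


def max_realtime_streams(per_engine_fps, num_engines, stream_fps):
    """Number of streams at stream_fps that fit in the aggregate throughput."""
    return math.floor(per_engine_fps * num_engines / stream_fps)


# Ada H.264 is listed at 903 fps per NVDEC. For a hypothetical Ada GPU
# with two NVDEC engines:
print(aggregate_fps(903, 2, 1))           # 903: one session uses one engine
print(aggregate_fps(903, 2, 8))           # 1806: both engines are busy
print(max_realtime_streams(903, 2, 30))   # 60 streams of 1080p at 30 fps
```

The stream count assumes every stream matches the measurement conditions of the performance table (1920x1080, YUV 4:2:0) and that the GPU runs at the reference video clock.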

4. Programming NVDEC

The NVDECODE API provides access to the video decoding features of NVDEC described in the previous chapters. The API also includes implementations of commonly used post-processing operations, such as scaling, cropping, aspect ratio conversion, deinterlacing and color space conversion, that can be applied to the decoded output. Refer to the SDK release notes for information regarding the required driver version.

The capabilities of NVDEC are exposed to application software through NVIDIA's proprietary application programming interface (the NVDECODE APIs). Refer to the Video Decoder Programming guide for details on using these APIs. For a complete list of GPUs supporting hardware-accelerated decoding, refer to https://developer.nvidia.com/nvidia-video-codec-sdk.