

# NVIDIA VIDEO CODEC SDK - ENCODER

**Application Note** 

### Table of Contents

- Chapter 1. NVIDIA Hardware Video Encoder
  - 1.1. Introduction
  - 1.2. NVENC Capabilities
  - 1.3. NVENC Licensing Policy
  - 1.4. NVENC Performance
  - 1.5. Programming NVENC
  - 1.6. FFmpeg Support

## Chapter 1. NVIDIA Hardware Video Encoder

### 1.1. Introduction

NVIDIA GPUs, beginning with the Kepler generation, contain a hardware-based encoder (referred to as NVENC in this document) that provides fully accelerated hardware-based video encoding and is independent of the graphics/CUDA cores. With end-to-end encoding offloaded to NVENC, the graphics/CUDA cores and the CPU cores are free for other operations. For example, in a game recording scenario, offloading the encoding to NVENC keeps the graphics engine fully available for game rendering. In the video transcoding use case, video encoding and decoding can run on NVENC and NVDEC in parallel with video pre- and post-processing on the CUDA cores.

The hardware capabilities of NVENC are exposed through APIs referred to as NVENCODE APIs in this document. This document describes the capabilities of the hardware encoder and the features exposed through the NVENCODE APIs.

### 1.2. NVENC Capabilities

NVENC can perform end-to-end encoding for H.264, HEVC 8-bit, HEVC 10-bit, AV1 8-bit, and AV1 10-bit. This includes motion estimation and mode decision, motion compensation and residual coding, and entropy coding. NVENC can also generate motion vectors between two frames, which are useful for applications such as depth estimation, frame interpolation, encoding with codecs not supported by NVENC, or hybrid encoding, in which NVENC performs motion estimation and the rest of the encoding is handled elsewhere in the system. These operations are hardware accelerated by a dedicated block on the GPU silicon die. The NVENCODE APIs provide the necessary controls to use these hardware encoding capabilities.

Table 1 summarizes the capabilities of the NVENC hardware exposed through the NVENCODE APIs.

| Feature                                          | Description                                                                            | Kepler<br>GPUs | 1st Gen<br>Maxwell<br>GPUs | 2nd Gen<br>Maxwell<br>GPUs | Pascal<br>GPUs | Volta and<br>TU117<br>GPUs | Ampere<br>and Turing<br>GPUs<br>(except<br>TU117) | Ada GPUs |
|--------------------------------------------------|----------------------------------------------------------------------------------------|----------------|----------------------------|----------------------------|----------------|----------------------------|---------------------------------------------------|----------|
| H.264 baseline,<br>main and high<br>profiles     | Capability to encode<br>YUV 4:2:0 sequence<br>and generate an<br>H.264 bit stream.     | Y              | Y                          | Y                          | Y              | Y                          | Y                                                 | Y        |
| H.264 4:4:4<br>encoding (only<br>CAVLC)          | Capability to encode<br>YUV 4:4:4 sequence<br>and generate an<br>H.264 bit stream.     | N              | Y                          | Y                          | Y              | Y                          | Y                                                 | Y        |
| H.264 lossless<br>encoding                       | Lossless encoding.                                                                     | N              | Y                          | Y                          | Y              | Y                          | Y                                                 | Y        |
| H.264 motion<br>estimation (ME)<br>only mode     | Capability to provide<br>macroblock-level<br>motion vectors and<br>intra/inter modes.  | N              | Y                          | Y                          | Y              | Y                          | Y                                                 | Y        |
| H.264 field<br>encoding                          | Capability to encode field content.                                                    | Y              | Y                          | Y                          | Y              | Y                          | N                                                 | N        |
| H.264/HEVC<br>weighted<br>prediction             | Support for weighted prediction.                                                       | N              | N                          | N                          | Y              | Y                          | Y                                                 | Y        |
| Encoding<br>support for<br>H.264 ARGB<br>content | Capability to encode<br>RGB input.                                                     | Y              | Y                          | Y                          | Y              | Y                          | Y                                                 | Y        |
| Multiple<br>reference<br>frames for H.264        | Capability to use<br>multiple reference<br>frames.                                     | N              | N                          | N                          | N              | N                          | Y                                                 | Y        |
| HEVC main<br>profile                             | Capability to encode<br>YUV 4:2:0 sequence<br>and generate an<br>HEVC bit stream.      | N              | N                          | Y                          | Y              | Y                          | Y                                                 | Y        |
| HEVC lossless<br>encoding                        | Lossless encoding.                                                                     | N              | N                          | N                          | Y              | Y                          | Y                                                 | Y        |
| HEVC main10<br>profile                           | Capability to encode<br>10-bit content and<br>generate an HEVC<br>bit stream.          | N              | N                          | N                          | Y              | Y                          | Y                                                 | Y        |
| HEVC 4:4:4<br>encoding                           | Capability to encode<br>YUV 4:4:4 sequence<br>and generate an<br>HEVC bit stream.      | N              | N                          | N                          | Y              | Y                          | Y                                                 | Y        |
| HEVC motion<br>estimation (ME)<br>only mode      | Capability to provide<br>CTB-level motion<br>vectors and intra/<br>inter modes.        | N              | N                          | N                          | Y              | Y                          | Y                                                 | Y        |
| HEVC 8K<br>encoding                              | Support for<br>encoding 8192 ×<br>8192 content.                                        | N              | N                          | N                          | Y*             | Y                          | Y                                                 | Y        |
| HEVC sample<br>adaptive offset<br>(SAO)          | Improves encoded video quality.                                                        | N              | N                          | N                          | Y              | Y                          | Y                                                 | Y        |
| HEVC B frame                                     | Improves encoded video quality.                                                        | N              | N                          | N                          | N              | N                          | Y                                                 | Y        |
| Multiple<br>reference<br>frames for HEVC         | Capability to use<br>multiple reference<br>frames.                                     | N              | N                          | N                          | N              | N                          | Y                                                 | Y        |
| AV1 main profile                                 | Capability to encode<br>YUV 4:2:0 8-bit<br>and 10-bit content<br>up to 8192 × 8192<br>resolution and<br>generate an AV1 bit<br>stream. | N | N                  | N                          | N              | N                          | N                                                 | Y        |

- **Y**: Supported, **N**: Not supported
- **Y\***: Supported in select Pascal-generation GPUs
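To illustrate how an application might consult Table 1 before configuring an encoder, here is a minimal Python sketch that encodes a subset of the table as data. The feature keys, generation labels, and the `support` helper are hypothetical names chosen for this example, not NVENCODE API identifiers; a real application should query the capabilities of the installed GPU at run time through the NVENCODE APIs rather than rely on a static table.

```python
# A minimal sketch: a subset of Table 1 encoded as data. All names here are
# hypothetical labels for illustration, not NVENCODE API values.
GENERATIONS = (
    "kepler", "maxwell1", "maxwell2", "pascal",
    "volta_tu117", "turing_ampere", "ada",
)

# One character per generation, in GENERATIONS order:
# "Y" = supported, "N" = not supported, "*" = supported on select GPUs only.
FEATURES = {
    "h264_420":       "YYYYYYY",
    "h264_444_cavlc": "NYYYYYY",
    "h264_lossless":  "NYYYYYY",
    "h264_field":     "YYYYYNN",
    "weighted_pred":  "NNNYYYY",
    "h264_multi_ref": "NNNNNYY",
    "hevc_main":      "NNYYYYY",
    "hevc_main10":    "NNNYYYY",
    "hevc_444":       "NNNYYYY",
    "hevc_8k":        "NNN*YYY",
    "hevc_b_frames":  "NNNNNYY",
    "hevc_multi_ref": "NNNNNYY",
    "av1_main":       "NNNNNNY",
}


def support(feature: str, generation: str) -> str:
    """Return "yes", "no", or "select GPUs" for a feature on a GPU generation."""
    flag = FEATURES[feature][GENERATIONS.index(generation)]
    return {"Y": "yes", "N": "no", "*": "select GPUs"}[flag]


if __name__ == "__main__":
    print(support("hevc_b_frames", "pascal"))  # no
    print(support("hevc_8k", "pascal"))        # select GPUs
    print(support("av1_main", "ada"))          # yes
```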

### 1.3. NVENC Licensing Policy

For NVENC hardware encoding, NVIDIA GPUs are classified into two categories: "qualified" and "non-qualified". On qualified GPUs, the number of concurrent encode sessions is limited only by available system resources (encoder capacity, system memory, video memory, etc.). On non-qualified GPUs, the number of concurrent encode sessions is limited to 8 per system. This limit of 8 concurrent sessions per system applies to the combined number of encoding sessions executed on all non-qualified cards present in the system.

For a complete list of qualified and non-qualified GPUs, refer to https://developer.nvidia.com/nvidia-video-codec-sdk.

For example, on a system with one Quadro RTX 4000 card (a qualified GPU) and three GeForce cards (non-qualified GPUs), the application can run N simultaneous encode sessions on the Quadro RTX 4000 (where N is determined by its encoder, memory, and hardware limitations) and 8 sessions on all the GeForce cards combined. Thus, the limit on the number of simultaneous encode sessions for such a system is N + 8.
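The session-limit rule can be expressed as a small calculation. The following Python sketch is illustrative only: the `Gpu` class, its `resource_limit` field (standing in for the resource-bound "N" above), and the value N = 20 in the example are hypothetical and would have to be measured for a real workload.

```python
from dataclasses import dataclass

# Combined cap shared by all non-qualified GPUs in one system (Section 1.3).
NON_QUALIFIED_SYSTEM_LIMIT = 8


@dataclass
class Gpu:
    name: str
    qualified: bool
    # HYPOTHETICAL input: how many concurrent sessions this GPU's encoder,
    # memory, and hardware resources can sustain for the target workload
    # (the "N" in the text). It must be measured; it is not an API value.
    resource_limit: int


def max_concurrent_sessions(gpus: list[Gpu]) -> int:
    """Upper bound on simultaneous encode sessions for a whole system."""
    # Qualified GPUs: limited only by their own resources.
    qualified = sum(g.resource_limit for g in gpus if g.qualified)
    # Non-qualified GPUs: one shared pool of 8 sessions, further bounded
    # by what those GPUs can actually sustain.
    non_qualified = sum(g.resource_limit for g in gpus if not g.qualified)
    return qualified + min(NON_QUALIFIED_SYSTEM_LIMIT, non_qualified)


if __name__ == "__main__":
    # The example from the text, with N = 20 chosen arbitrarily for illustration.
    system = [Gpu("Quadro RTX 4000", qualified=True, resource_limit=20)]
    system += [Gpu(f"GeForce #{i}", qualified=False, resource_limit=10) for i in range(3)]
    print(max_concurrent_sessions(system))  # 28 (N + 8)
```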

### 1.4. NVENC Performance

With every generation of NVIDIA GPUs (Maxwell 1st/2nd gen, Pascal, Volta, Turing, Ampere, and Ada), NVENC performance has increased steadily. Table 2 provides *indicative*<sup>1</sup> NVENC performance on Pascal, Turing, Ampere, and Ada GPUs for different presets and rate control modes (these two factors play a major role in determining performance and quality). Note that the performance numbers in Table 2 are measured on GeForce hardware under the assumptions listed below the table. Performance varies across GPU classes (e.g., Quadro, Tesla) and scales almost linearly with the video clock speed of each GPU.

<sup>1</sup> Encoder performance depends on many factors, including but not limited to: encoder settings, GPU clocks, GPU type, and video content type.

While first-generation Maxwell GPUs have one NVENC engine per chip, certain variants of second-generation Maxwell, Pascal, Volta, and Ada GPUs have two or three NVENC engines per chip. This increases the aggregate encoder performance of the GPU. The NVIDIA driver load-balances work among the NVENC engines on the chip, so applications need no special code to use multiple encoders and automatically benefit from the higher encoder capacity of higher-end GPUs. The encode performance listed in Table 2 is given *per NVENC engine*. Thus, if the GPU has two NVENCs (e.g., GP104, AD104), multiply the corresponding number in Table 2 by the number of NVENCs per chip to get the aggregate maximum performance (applicable only when running multiple simultaneous encode sessions).

Note that unless Split Frame Encoding is enabled, the performance of a single encoding session cannot exceed the performance of one NVENC, regardless of the number of NVENCs present on the GPU. Multi-NVENC Split Frame Encoding is a feature introduced in SDK 12.0 on Ada GPUs for HEVC and AV1. Refer to the NVENC Video Encoder API Programming Guide for more details on this feature.
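The per-engine arithmetic described above can be sketched in Python. This is an estimate under stated assumptions, not a performance model: the function names are hypothetical, and the Split Frame Encoding branch assumes ideal linear scaling across engines, which real hardware will not reach exactly. The example uses the Table 2 H.264 figure for Ada at preset p1 with CBR and low-latency tuning (805 fps per engine) on a hypothetical two-NVENC GPU.

```python
# A minimal sketch of the throughput arithmetic in Section 1.4.
# Figures are indicative only; real performance depends on settings, clocks,
# GPU type, and content.

def achievable_fps(per_engine_fps: float, num_engines: int,
                   num_sessions: int, split_frame: bool = False) -> float:
    """Upper bound on total encode throughput in frames per second.

    Without Split Frame Encoding, each session runs on one NVENC engine, so
    at most min(num_sessions, num_engines) engines can be kept busy.
    With Split Frame Encoding, this sketch ASSUMES one session can use every
    engine with ideal linear scaling, which is optimistic.
    """
    if num_sessions <= 0:
        return 0.0
    engines_used = num_engines if split_frame else min(num_sessions, num_engines)
    return per_engine_fps * engines_used


def realtime_streams(total_fps: float, stream_fps: float) -> int:
    """How many streams at stream_fps fit into a total fps budget."""
    return int(total_fps // stream_fps)


if __name__ == "__main__":
    per_engine = 805.0  # Table 2: H.264, Ada, p1, CBR, LL
    one = achievable_fps(per_engine, num_engines=2, num_sessions=1)
    many = achievable_fps(per_engine, num_engines=2, num_sessions=4)
    print(one, many)                     # 805.0 1610.0
    print(realtime_streams(many, 30.0))  # 53
```

The single-session case stays at the per-engine figure, matching the note that one session cannot exceed one NVENC without Split Frame Encoding; the stream count treats the aggregate figure as a budget shared by concurrent real-time sessions.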

NVENC hardware natively supports multiple hardware encoding contexts with a negligible context-switching penalty. As a result, subject to the hardware performance limit and available memory, an application can encode multiple videos simultaneously. The NVENCODE API exposes several presets, rate control modes, and other parameters for programming the hardware. Combinations of these parameters enable video encoding at varying quality and performance levels; in general, performance can be traded for quality and vice versa.

| Preset | RC Mode | Tuning Info | H.264 Pascal | H.264 Turing | H.264 Ampere | H.264 Ada | HEVC Pascal | HEVC Turing | HEVC Ampere | HEVC Ada | AV1 Ada |
|--------|---------|-------------|--------------|--------------|--------------|-----------|-------------|-------------|-------------|----------|---------|
| p1     | CBR     | LL          | 681          | 787          | 810          | 805       | 537         | 834         | 875         | 903      | 930     |
| p1     | VBR     | HQ          | 700          | 761          | 788          | 779       | 503         | 824         | 865         | 883      | 696     |
| p3     | CBR     | LL          | 663          | 550          | 570          | 595       | 440         | 412         | 433         | 458      | 697     |
| p3     | VBR     | HQ          | 396          | 547          | 568          | 589       | 438         | 494         | 514         | 639      | 532     |
| p5     | CBR     | LL          | 364          | 243          | 253          | 278       | 367         | 271         | 282         | 326      | 475     |
| p5     | VBR     | HQ          | 325          | 236          | 246          | 272       | 366         | 298         | 308         | 389      | 419     |
| p7     | CBR     | LL          | 322          | 204          | 213          | 238       | 341         | 271         | 282         | 325      | 338     |
| p7     | VBR     | HQ          | 244          | 186          | 194          | 206       | 262         | 149         | 156         | 178      | 312     |

### Table 2. NVENC encoding performance in frames per second (fps)

- Resolution/input format/bit depth: 1920 × 1080 / YUV 4:2:0 / 8-bit

The above measurements were made using the following GPUs: GTX 1060 for Pascal, RTX 8000 for Turing, RTX 3090 for Ampere, and RTX 4090 for Ada. All measurements were made at the highest video clocks as reported by nvidia-smi (i.e., 1708 MHz, 1950 MHz, 1950 MHz, and 2415 MHz for GTX 1060, RTX 8000, RTX 3090, and RTX 4090, respectively). For other GPUs in each family, performance should scale according to their video clocks as reported by nvidia-smi. Information on nvidia-smi can be found at https://developer.nvidia.com/nvidia-system-management-interface.

- H.264 and HEVC encoding fps for Volta GPUs can be estimated by multiplying the Pascal fps in the above table by the ratio of the Volta GPU's video clock to the Pascal GPU's video clock (1708 MHz), as reported by nvidia-smi.
- Software: Windows 11, Video Codec SDK v12.2, NVIDIA display driver: 551.76
- CBR: constant bitrate rate control mode; VBR: variable bitrate rate control mode; LL: low-latency tuning info; HQ: high-quality tuning info
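The clock-based scaling described above can be written as a one-line ratio. This Python sketch assumes performance is exactly proportional to the video clock, which the text describes as only approximately true; the `scale_fps` helper is hypothetical, and the 1500 MHz Volta clock in the example is a placeholder, not a measured value. Use the video clock that nvidia-smi reports for the actual GPU.

```python
# A minimal sketch of clock-based scaling for Table 2 figures, ASSUMING
# performance scales linearly with the video clock (approximate in practice).

# Highest video clocks (MHz) of the GPUs used to measure Table 2.
REFERENCE_VIDEO_CLOCK_MHZ = {
    "pascal": 1708,  # GTX 1060
    "turing": 1950,  # RTX 8000
    "ampere": 1950,  # RTX 3090
    "ada": 2415,     # RTX 4090
}


def scale_fps(table_fps: float, family: str, target_video_clock_mhz: float) -> float:
    """Estimate fps on another GPU from a Table 2 figure and its video clock."""
    return table_fps * target_video_clock_mhz / REFERENCE_VIDEO_CLOCK_MHZ[family]


if __name__ == "__main__":
    # Volta estimate from the Pascal column (H.264, p1, CBR, LL = 681 fps).
    # HYPOTHETICAL clock: replace with the value nvidia-smi reports.
    volta_clock_mhz = 1500.0
    print(round(scale_fps(681.0, "pascal", volta_clock_mhz)))  # 598
```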

### 1.5. Programming NVENC

Refer to the SDK release notes for information regarding the required driver version.

Refer to the documents and the sample applications included in the SDK package for details on how to program NVENC.

### 1.6. FFmpeg Support

FFmpeg is a popular multimedia framework used extensively for video and audio transcoding.

The video hardware accelerators in NVIDIA GPUs can be used with FFmpeg to significantly speed up video decoding, encoding, and end-to-end transcoding. For more information on how to use NVENC or NVDEC with FFmpeg, refer to the FFmpeg guide in the Video Codec SDK.

Note that FFmpeg is an open-source project, and its usage is governed by its own licenses and terms and conditions.
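As a rough illustration of a GPU transcode with FFmpeg, the following Python sketch builds an argument list that decodes with NVDEC and encodes with NVENC, mapping the preset (p1 to p7), tuning (LL/HQ), and rate control (CBR/VBR) settings from Table 2 onto FFmpeg encoder options. It assumes an FFmpeg build with NVIDIA hardware acceleration enabled and an input codec that NVDEC can decode; the `nvenc_transcode_cmd` helper is hypothetical, and the option names reflect recent FFmpeg releases, so check them against `ffmpeg -h encoder=h264_nvenc` for the installed build. The FFmpeg guide in the Video Codec SDK remains the authoritative reference.

```python
import shlex

# NVENC encoder names in FFmpeg for the codecs described in Section 1.2.
NVENC_ENCODERS = {"h264": "h264_nvenc", "hevc": "hevc_nvenc", "av1": "av1_nvenc"}


def nvenc_transcode_cmd(src: str, dst: str, codec: str = "h264",
                        preset: str = "p4", tune: str = "hq",
                        rc: str = "vbr", bitrate: str = "5M") -> list[str]:
    """Build an FFmpeg argv that decodes with NVDEC and encodes with NVENC."""
    if codec not in NVENC_ENCODERS:
        raise ValueError(f"unsupported codec: {codec}")
    if preset not in {f"p{i}" for i in range(1, 8)}:
        raise ValueError("preset must be one of p1 (fastest) to p7 (highest quality)")
    return [
        "ffmpeg", "-y",
        # Decode on the GPU and keep decoded frames in video memory, so they
        # reach NVENC without a round trip through system memory.
        "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
        "-i", src,
        "-c:v", NVENC_ENCODERS[codec],
        "-preset", preset,  # p1..p7, as in Table 2
        "-tune", tune,      # e.g. "ll" (low latency) or "hq" (high quality)
        "-rc", rc,          # e.g. "cbr" or "vbr"
        "-b:v", bitrate,
        dst,
    ]


if __name__ == "__main__":
    cmd = nvenc_transcode_cmd("input.mp4", "output.mp4", codec="hevc",
                              preset="p1", tune="ll", rc="cbr", bitrate="8M")
    # Print a shell-ready command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell-quoting problems when the command is later passed to `subprocess.run`.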

### Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation ("NVIDIA") makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.

Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgment, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer ("Terms of Sale"). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer's own risk.

NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer's sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer's product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.

#### Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, CUDA Toolkit, cuDNN, DALI, DIGITS, DGX, DGX-1, DGX-2, DGX Station, DLProf, GPU, Jetson, Kepler, Maxwell, NCCL, Nsight Compute, Nsight Systems, NVCaffe, NVIDIA Deep Learning SDK, NVIDIA Developer Program, NVIDIA GPU Cloud, NVLink, NVSHMEM, PerfWorks, Pascal, SDK Manager, Tegra, TensorRT, TensorRT Inference Server, Tesla, TF-TRT, Triton Inference Server, Turing, and Volta are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

### Copyright

© 2010-2024 NVIDIA Corporation. All rights reserved.

