The sample applications provided in the package are for demonstration purposes only and may not be fully tuned for quality and performance. Users are therefore advised to evaluate quality and performance independently.

1.2. Sample Applications

Video Codec SDK contains various sample applications that demonstrate how to use NVENC and NVDEC APIs. These applications are primarily developed as reference code for developers to understand API usage, quickly run experiments, and compare the control/data flow with their applications. These samples are built on top of reusable classes NvEncoder and NvDecoder which encapsulate the main functionality and provide a simple high-level programming interface for quick development. A few other applications are also included as samples, which demonstrate how to use NVIDIA codec APIs optimized for scalability along with a few advanced features for quality/performance tradeoff.

Folder Structure:

Samples
|-- AppDecode     This folder contains all the decoder sample applications.
|-- AppEncode     This folder contains all the encoder sample applications.
|-- AppTranscode  This folder contains all the transcode sample applications.
|-- External      This folder contains the external dependencies required to build samples.
|-- NvCodec       This folder contains classes that implement high-level interface for NVENC and NVDEC APIs.
|-- Utils         This folder contains the utility code used by samples.

A help section is available for all applications. Run any application with the command-line parameter -h or --help to get basic usage information, or with -A or --advanced-options to get detailed usage information. The help section lists the supported parameters and indicates which are mandatory and which are optional. For optional parameters, it also states the default value used when the user does not pass one.

Here is a short description of what each application does:

Decoder Applications

AppDec

This sample application illustrates the demuxing and decoding of a media file followed by resize and crop of the output frames. The application supports 4:2:0, 4:2:2 and 4:4:4 chroma formats (planar and semi-planar), e.g., YUV420P/YUV420P16, YUV422P/YUV422P16, YUV444P/YUV444P16, as well as NV12 and P016 output formats.

Sample command line:

AppDec.exe -i input.h264 -o output.yuv -gpu 0
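When inspecting the raw output of a decode run, it helps to know how many bytes one frame occupies for each chroma format listed above. The following is a minimal sketch under stated assumptions: the names Chroma and frameBytes are hypothetical and not part of the SDK, frames are assumed to be tightly packed with even dimensions, and 16-bit formats are assumed to use two bytes per sample. The sizes are the same for planar and semi-planar layouts, since only the arrangement of the chroma samples differs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper types; not part of the Video Codec SDK.
enum class Chroma { Yuv420, Yuv422, Yuv444 };

// Returns the size in bytes of one tightly packed frame.
// bytesPerSample is 1 for 8-bit formats (e.g. NV12) and 2 for 16-bit formats (e.g. P016).
std::size_t frameBytes(int width, int height, Chroma chroma, int bytesPerSample) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    assert(bytesPerSample == 1 || bytesPerSample == 2);
    const std::size_t luma = static_cast<std::size_t>(width) * height;
    std::size_t chromaSamples = 0;  // Cb and Cr samples together
    switch (chroma) {
        case Chroma::Yuv420: chromaSamples = luma / 2; break;  // subsampled in both directions
        case Chroma::Yuv422: chromaSamples = luma; break;      // subsampled horizontally only
        case Chroma::Yuv444: chromaSamples = luma * 2; break;  // no subsampling
    }
    return (luma + chromaSamples) * bytesPerSample;
}
```

Dividing the size of a dumped output file by this value gives the number of frames written, which is a quick sanity check on a decode run.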

AppDecD3D

This sample application illustrates the decoding of a media file and the display of decoded frames in a window, using CUDA interop with D3D (both D3D9 and D3D11).

Sample command line:

AppDecD3D.exe -i input.h264 -d3d 11

AppDecGL

This sample application illustrates the decoding of a media file and the display of decoded frames in a window, using CUDA interop with OpenGL. Synchronization between the rendering and decode threads is achieved using a ConcurrentQueue implementation.

Sample command line:

AppDecGL.exe -i input.h264
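The hand-off between a decode thread and a rendering thread can be pictured with a small blocking queue. This is a minimal sketch and not the SDK's ConcurrentQueue class: BoundedQueue, its capacity of 4, and runDecodeAndRender are hypothetical names, and plain integers stand in for decoded frames.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue; not the SDK's ConcurrentQueue implementation.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full, so the producer cannot run far ahead of the consumer.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        notEmpty_.notify_one();
    }

    // Blocks while the queue is empty, then returns the oldest item.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        notFull_.notify_one();
        return item;
    }

private:
    std::size_t capacity_;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
};

// A producer thread pushes frame indices (the "decoder"); the caller pops them (the "renderer").
std::vector<int> runDecodeAndRender(int frameCount) {
    BoundedQueue<int> queue(4);
    std::vector<int> rendered;
    std::thread decoder([&queue, frameCount] {
        for (int i = 0; i < frameCount; ++i) queue.push(i);
    });
    for (int i = 0; i < frameCount; ++i) rendered.push_back(queue.pop());
    decoder.join();
    return rendered;
}
```

Bounding the queue is a design choice in this sketch: it keeps the number of frames in flight small, which matters when each entry refers to a limited pool of decode surfaces.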

AppDecImageProvider

This sample application illustrates decoding a media file and producing the output in a desired color format. The application supports native YUV and various RGB (BGRA, BGRP, BGRA64) output formats.

Output formats supported: NV12/P016 (native YUV), BGRA/BGRP/BGRA64 (RGB).

Sample command line:

AppDecImageProvider.exe -i input.h264 -o output.bgra
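Producing an RGB output format from decoded YUV data comes down to a per-pixel color conversion. The sketch below converts a single 8-bit sample and is only an illustration: it assumes limited-range BT.601 coefficients, which may differ from the matrix the sample actually selects for a given stream, and the names clampToByte and yuvToBgra are hypothetical. In the SDK sample this kind of work would run on the GPU over whole frames rather than per pixel on the host.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Clamps a floating-point channel value to [0, 255] and rounds to the nearest integer.
static std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(std::lround(std::min(255.0, std::max(0.0, v))));
}

// Converts one limited-range BT.601 YUV sample (assumed color space) to B, G, R, A byte order.
std::array<std::uint8_t, 4> yuvToBgra(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const double c = 1.164 * (static_cast<int>(y) - 16);  // luma scaled from [16, 235] to full range
    const double d = static_cast<int>(u) - 128;           // Cb offset
    const double e = static_cast<int>(v) - 128;           // Cr offset
    return { clampToByte(c + 2.017 * d),
             clampToByte(c - 0.392 * d - 0.813 * e),
             clampToByte(c + 1.596 * e),
             255 };                                       // opaque alpha
}
```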

AppDecLowLatency

This sample application demonstrates low-latency decoding. The application sets CUVIDPARSERPARAMS::ulMaxDisplayDelay = 0 while creating the parser object so that an output frame is returned as soon as it is available for display, without any delay.

There is a command-line option “force_zero_latency” that allows capturing decoded frames immediately after decode. The feature will work for streams having I and P frames only (same display and decode order), as the application captures decoded frames just after decode in the decode callback function.

Sample command line:

AppDecLowLatency.exe -i input.h264 -force_zero_latency

AppDecMem

This sample application is similar to AppDec. It illustrates how to demux and decode media content from a memory buffer. It allocates an AVIOContext explicitly and also defines a method to read data packets from the input file. For simplicity, this application reads the input stream and stores it in a buffer before invoking the demuxer.

Sample command line:

AppDecMem.exe -i input.h264
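Demuxing from memory relies on a read callback that hands the demuxer successive chunks of the buffered stream. The sketch below shows the idea without linking against FFmpeg: readPacket has the same parameter shape as an AVIOContext read callback (opaque pointer, destination buffer, buffer size), while MemoryReader and kEndOfStream are hypothetical names. With FFmpeg, the end-of-stream return value would be AVERROR_EOF rather than the placeholder used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical end-of-stream code; an FFmpeg callback would return AVERROR_EOF instead.
constexpr int kEndOfStream = -1;

// Read cursor over an in-memory copy of the input stream.
struct MemoryReader {
    const std::uint8_t* data;
    std::size_t size;
    std::size_t pos;
};

// Copies up to bufSize bytes from the current position and returns the number of bytes copied,
// or kEndOfStream once the whole buffer has been consumed.
int readPacket(void* opaque, std::uint8_t* buf, int bufSize) {
    assert(opaque != nullptr && buf != nullptr && bufSize >= 0);
    MemoryReader* reader = static_cast<MemoryReader*>(opaque);
    const std::size_t remaining = reader->size - reader->pos;
    if (remaining == 0) return kEndOfStream;
    const std::size_t n = std::min(remaining, static_cast<std::size_t>(bufSize));
    std::memcpy(buf, reader->data + reader->pos, n);
    reader->pos += n;
    return static_cast<int>(n);
}
```

The opaque pointer is what lets a plain function carry state: the demuxer passes back whatever pointer was registered with the callback, here the MemoryReader cursor.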

AppDecMultiFiles

This sample application illustrates the decoding of multiple files with or without the decoder Reconfigure API. It also displays the time taken for decoder creation and destruction. Multiple files are specified using the -filelist command-line option. The application decodes the files sequentially.

Sample command line:

AppDecMultiFiles.exe -filelist files.txt

AppDecMultiInput

This sample application demonstrates how to decode multiple raw video files and post-process them with CUDA kernels on different CUDA streams. As its post-processing step, the sample applies a ripple effect, which consists of ripples expanding across the surface of the decoded frames.

Sample command line:

AppDecMultiInput.exe -i input1.yuv -s 1920x1080 -i input2.yuv -s 1280x720

AppDecPerf

This sample application measures decoding performance in FPS. The application creates multiple host threads and runs a different decoding session on each thread. The number of threads can be controlled by the CLI option -thread. By default, the application creates 2 host threads, each with a separate decode session. The application can measure decode performance alone (keeping decoded frames in device memory) or decode performance including the transfer of frames to host memory.

Sample command line:

AppDecPerf.exe -i input.h264 -thread 2
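The measurement pattern described above (several host threads, one session per thread, aggregate FPS over wall-clock time) can be sketched as follows. This is an illustration only: decodeOneFrame is a hypothetical stand-in that sleeps instead of calling the decoder, and measureFps is not a function from the sample.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical per-frame workload standing in for one decode call.
void decodeOneFrame() {
    std::this_thread::sleep_for(std::chrono::microseconds(200));
}

// Runs threadCount sessions of framesPerThread frames each and returns the aggregate FPS,
// i.e. total frames across all threads divided by the wall-clock time of the whole run.
double measureFps(int threadCount, int framesPerThread) {
    assert(threadCount > 0 && framesPerThread > 0);
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> sessions;
    for (int t = 0; t < threadCount; ++t) {
        sessions.emplace_back([framesPerThread] {
            for (int f = 0; f < framesPerThread; ++f) decodeOneFrame();
        });
    }
    for (std::thread& s : sessions) s.join();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return threadCount * framesPerThread / elapsed.count();
}
```

Timing the whole run rather than each thread separately is deliberate: it reports the throughput the system sustains with all sessions active at once.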

Encoder Applications

AppEncCuda

This sample application illustrates encoding of frames in CUDA device buffers. The application reads the image data from file and loads it into CUDA input buffers obtained from the encoder using NvEncoder::GetNextInputFrame(). The encoder subsequently maps the CUDA buffers for the encoder using NvEncodeAPI and submits them to the NVENC hardware for encoding as part of the EncodeFrame() function. In this case, the NVENC hardware output is written to system memory.

This sample application also illustrates the use of a video memory buffer allocated by the application to receive the NVENC hardware output. This feature can be used for H.264 ME-only mode, H.264 encode and HEVC encode. The application copies the NVENC output from the video memory buffer to a host memory buffer in order to dump it to a file, but this copy is not needed if the application consumes the output in some other way.

Since encoding may involve CUDA pre-processing on the input and post-processing on the output, the sample also illustrates the use of CUDA streams to pipeline the pre-processing and post-processing tasks when the output is in video memory.

CUDA streams can be used for H.264 ME-only, HEVC ME-only, H.264 encode, HEVC encode and AV1 encode.

Input formats supported: IYUV/YV12, NV12, P010, NV16, P210, YUV444, YUV444P16, BGRA, BGRA10, AYUV, ABGR, ABGR10.

Sample command line:

AppEncCuda.exe -i input.yuv -s 1920x1080 -if iyuv -o out.h264

AppEncD3D9

This sample application illustrates encoding of frames in IDirect3DSurface9 surfaces. The application demonstrates 2 modes of operation. In the default mode, the application reads RGB data from file and copies it to D3D9 surfaces obtained from the encoder using NvEncoder::GetNextInputFrame(), and the RGB surface is submitted to NVENC for encoding. In the second mode (-nv12 option), the application performs a color space conversion from RGB to NV12 using DXVA’s VideoProcessBlt API call, and the NV12 surface is submitted for encoding.

Input formats supported: RGB; NV12 via -nv12 parameter.

Sample command line:

AppEncD3D9.exe -i input.bgra -s 1920x1080 -nv12 -o out.h264

AppEncD3D11

This sample application illustrates encoding of frames in ID3D11Texture2D textures. The application demonstrates 2 modes of operation. In the default mode, the application reads RGB data from file and copies it to D3D11 textures obtained from the encoder using NvEncoder::GetNextInputFrame(), and the RGB texture is submitted to NVENC for encoding. In the second mode (-nv12 option), the application converts RGB textures to NV12 textures using DXVA’s VideoProcessBlt API call, and the NV12 texture is submitted for encoding.

This sample application also illustrates the use of a video memory buffer allocated by the application to receive the NVENC hardware output. This feature can be used for H.264 ME-only mode, H.264 encode, HEVC encode and AV1 encode.

Input formats supported: RGB; NV12 via -nv12 parameter.

Sample command line:

AppEncD3D11.exe -i input.bgra -s 1920x1080 -nv12 -o out.h264

AppEncD3D12

This sample application illustrates encoding of frames in ID3D12Resource resources. This feature can be used for H.264 encode, HEVC encode and AV1 encode.

Input formats supported: RGB.

Sample command line:

AppEncD3D12.exe -i input.bgra -s 1920x1080 -o out.h264

AppEncDec

This sample application illustrates the encoding and streaming of a video with one thread while another thread receives and decodes the video. HDR video streaming is also demonstrated in this application.

Input formats supported: IYUV, NV12, NV16, P010, P210, BGRA, BGRA64.

Sample command line:

AppEncDec.exe -i input.h264 -o out.h264

AppEncGL

This sample application illustrates encoding of frames stored in OpenGL textures. The application reads frames from the input file and uploads them to the textures obtained from the encoder using NvEncoder::GetNextInputFrame(). The encoder subsequently maps the textures for the encoder using NvEncodeAPI and submits them to the NVENC hardware for encoding as part of NvEncoder::EncodeFrame().

The X server must be running and the DISPLAY environment variable must be set when attempting to run this application.

Input formats supported: IYUV, NV12.

Sample command line:

AppEncGL.exe -i input.yuv -s 1920x1080 -if iyuv -o out.h264

AppEncLowLatency

This sample application demonstrates low-latency encoding features and other QoS features such as bitrate change and resolution change. The application uses the CUDA interface to demonstrate these features, but they can also be used with the D3D or OpenGL interfaces. The application demonstrates 2 cases of operation, selected by the CLI option -case. The first case demonstrates bitrate change at runtime without resetting the encoder session: the application reduces the bitrate by half and then restores it to the original value after 100 frames. The second case demonstrates the dynamic resolution change feature, with which an application can reduce the resolution depending on bandwidth requirements: the encode dimensions are reduced by half and restored to the original dimensions after 100 frames.

Input formats supported: IYUV, NV12, NV16, P210.

Sample command line:

AppEncLowLatency.exe -i input.yuv -s 1920x1080 -if iyuv -case 1 -o out.h264
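The runtime bitrate change in the first case follows a simple schedule: half the original bitrate for a 100-frame window, then the original value again. The sketch below expresses only that schedule. The function name targetBitrate and the dropFrame parameter (the frame at which the reduction starts) are hypothetical, since the text does not say where the reduction begins; in the sample, the new value would be applied through the encoder's reconfigure mechanism rather than computed per frame.

```cpp
#include <cassert>
#include <cstdint>

// Number of frames spent at the reduced bitrate, as stated in the description.
constexpr int kReducedFrames = 100;

// Returns the target bitrate for a given frame.
// dropFrame is a hypothetical parameter: the index of the first frame encoded at half bitrate.
std::uint32_t targetBitrate(int frameIndex, int dropFrame, std::uint32_t originalBitrate) {
    assert(frameIndex >= 0 && dropFrame >= 0);
    const bool reduced = frameIndex >= dropFrame && frameIndex < dropFrame + kReducedFrames;
    return reduced ? originalBitrate / 2 : originalBitrate;
}
```

The same window logic applies to the second case if the returned value is read as a frame dimension instead of a bitrate.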

AppEncME

This sample application illustrates the use of NVENC hardware to calculate motion vectors. The application uses the CUDA device type and associated buffers when demonstrating the usage of the ME-only mode but can be used with other device types like D3D and OpenGL.

Input formats supported: IYUV, NV12, YV12, YUV444, P010, YUV444P16, BGRA, ARGB10, AYUV, ABGR, ABGR10.

Sample command line:

AppEncME.exe -i input.yuv -s 1920x1080 -if iyuv

AppEncMultiInstance

This sample application was created to accelerate file compression in storage applications. It does this by splitting the input video into N separate and independent video portions, i.e., independent GOPs (Split GOP). Each portion is encoded independently, and the compressed portions are then written to file in their original order, producing a single output bitstream. More than one encoding session thread can be used to encode the independent portions, which should yield speedups on NVIDIA GPUs with more than one NVENC. The number of portions into which the input video is partitioned is controlled by the CLI option -nf, and the number of encoding session threads by -thread. Note that on systems with GeForce GPUs, the number of simultaneous encode sessions allowed on the system is restricted to 5 sessions.

There are separate threads for:

1. reading the RAW input frames from disk;
2. copying the RAW frames from RAM to VRAM, encoding, and copying the compressed data from VRAM to RAM;
3. writing the compressed data to the output file.

The main thread is used only for initialization and to create work queues for these threads.

Input formats supported: IYUV, NV12, YV12, YUV444, P010, YUV444P16, BGRA, ARGB10, AYUV, ABGR, ABGR10.

Sample command line:

AppEncMultiInstance.exe -i input.yuv -s 1920x1080 -if iyuv -nf 4 -thread 2 -o out.h264
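The partitioning step of the Split GOP approach can be illustrated with a small helper that divides a frame count into contiguous portions in the original order. This is a sketch under assumptions: Portion and splitIntoPortions are hypothetical names, and the even split with the remainder given to the earliest portions is one reasonable policy, not necessarily the one the sample uses.

```cpp
#include <cassert>
#include <vector>

// Half-open frame range [first, last) covered by one independently encoded portion.
struct Portion {
    int first;
    int last;
};

// Splits frameCount frames into portionCount contiguous portions, kept in original order.
// The earliest portions absorb the remainder, so portion sizes differ by at most one frame.
std::vector<Portion> splitIntoPortions(int frameCount, int portionCount) {
    assert(frameCount >= 0 && portionCount > 0);
    std::vector<Portion> portions;
    const int base = frameCount / portionCount;
    const int extra = frameCount % portionCount;
    int first = 0;
    for (int i = 0; i < portionCount; ++i) {
        const int length = base + (i < extra ? 1 : 0);
        portions.push_back({first, first + length});
        first += length;
    }
    return portions;
}
```

Because each portion is a contiguous range, writing the compressed portions back in index order is enough to preserve the original frame order in the output bitstream.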

AppEncPerf

This sample application measures encoding performance in FPS. The application creates multiple host threads and runs a different encoding session on each thread. The number of threads can be controlled by the CLI option -thread. By default, the application creates 2 host threads, each with a separate encode session. Note that on systems with GeForce GPUs, the number of simultaneous encode sessions allowed on the system is restricted to 3 sessions.

Input formats supported: IYUV, NV12, YV12, NV16, YUV444, P010, P210, YUV444P16, BGRA, ARGB10, AYUV, ABGR, ABGR10.

Sample command line:

AppEncPerf.exe -i input.yuv -s 1920x1080 -if iyuv -thread 2 -frame 2000

AppEncQual

This sample application demonstrates an Iterative Encoder implementation. It implements a constant quality mode in which the user specifies a minimum and maximum PSNR-Y as well as a maximum number of iterations per frame. The Iterative Encoder will:

1. Interrupt the encoder state after each encoded frame.
2. Check the reconstructed frame’s PSNR-Y (Reconstructed Frame Output API).
3. Compare it against the user-defined range of desired PSNRs.
4. Adjust the QP/CQ parameter for the next iteration (Reconfigure API).
5. Advance the encoder state and encode the next frame once the desired PSNR range or the maximum number of iterations is reached.

This sample is compatible with the Constant QP and VBR Constant Quality rate-control modes. The QP/CQ parameter is adjusted based on the qpDelta input parameter (default: 1).

Input formats supported: IYUV, NV12, YV12, NV16, P210, YUV444.

Sample command line:

AppEncQual.exe -i input.yuv -s 1920x1080 -if iyuv -maxiter 3 -minpsnr 35 -maxpsnr 40
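Two pieces of the iterative loop described above lend themselves to a short sketch: measuring PSNR-Y between a source and a reconstructed luma plane, and choosing the next QP from the measured value. The code below is illustrative only. The names psnrY and nextQp are hypothetical, samples are assumed to be 8-bit, and the clamp to the range 0 to 51 assumes H.264/HEVC-style QP limits; the sample itself obtains the reconstructed frame and applies the new QP through the encoder APIs named in the steps.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR of the luma plane for 8-bit samples; returns +infinity when the planes are identical.
double psnrY(const std::vector<std::uint8_t>& source, const std::vector<std::uint8_t>& recon) {
    assert(!source.empty() && source.size() == recon.size());
    double squaredError = 0.0;
    for (std::size_t i = 0; i < source.size(); ++i) {
        const double diff = static_cast<double>(source[i]) - static_cast<double>(recon[i]);
        squaredError += diff * diff;
    }
    if (squaredError == 0.0) return std::numeric_limits<double>::infinity();
    const double mse = squaredError / static_cast<double>(source.size());
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// One adjustment step: lower QP when quality falls short of the range (a lower QP means
// higher quality), raise it when quality exceeds the range, and leave it unchanged otherwise.
// The clamp to [0, 51] is an assumption based on the H.264/HEVC QP range.
int nextQp(int qp, double psnr, double minPsnr, double maxPsnr, int qpDelta) {
    if (psnr < minPsnr) qp -= qpDelta;
    else if (psnr > maxPsnr) qp += qpDelta;
    return qp < 0 ? 0 : (qp > 51 ? 51 : qp);
}
```

A caller would repeat the encode, measure, adjust cycle for the same frame until nextQp returns the QP unchanged or the iteration limit is reached, then move on to the next frame.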

AppEncExternalMEHint

This sample application demonstrates passing external ME hints to NVENC. The application uses the CUDA interface to demonstrate the above feature but can also be used with the D3D or OpenGL interfaces. When external ME hints are enabled, the application expects configuration files that specify hint counts per block and paths to input hint files for each frame, allowing users to provide custom motion estimation guidance to the encoder.

Input formats supported: IYUV, NV12.

Sample command line:

AppEncExternalMEHint.exe -i input.yuv -s 1920x1080 -if nv12 -externalMEHintConfigFile hints.cfg

AppMotionEstimationVkCuda

This sample application demonstrates feeding of CUarrays to EncodeAPI for the purposes of motion estimation between pairs of frames, using the H.264 motion estimation-only mode. The CUarrays registered with EncodeAPI have not been created by the application but have been obtained through the interop of CUDA with the Vulkan graphics API.

Transcode Applications

AppTranscode

This sample application demonstrates transcoding of an input video stream. If requested by the user, the bit-depth of the decoded content will be converted to the target bit-depth before encoding. The only supported conversions are from 8-bit to 10-bit (per component) and vice versa.

Sample command line:

AppTranscode.exe -i input.h264 -o out.h264
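The bit-depth conversion between decode and encode can be pictured at the level of individual samples. The sketch below assumes a P010-style layout, in which a 10-bit sample occupies the upper bits of a 16-bit word, so widening an 8-bit sample is a left shift by 8 and narrowing keeps the top 8 bits. The function names are hypothetical, and the sample application would perform this work on the GPU rather than on host vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Widens an 8-bit sample into a 16-bit word with the value in the upper bits (assumed P010-style).
std::uint16_t widenSample(std::uint8_t v) {
    return static_cast<std::uint16_t>(v << 8);
}

// Narrows a 16-bit word to 8 bits by keeping its top 8 bits and dropping the extra precision.
std::uint8_t narrowSample(std::uint16_t v) {
    return static_cast<std::uint8_t>(v >> 8);
}

// Converts a whole plane of 8-bit samples to 16-bit words.
std::vector<std::uint16_t> widenPlane(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint16_t> out;
    out.reserve(in.size());
    for (std::uint8_t v : in) out.push_back(widenSample(v));
    return out;
}

// Converts a whole plane of 16-bit words back to 8-bit samples.
std::vector<std::uint8_t> narrowPlane(const std::vector<std::uint16_t>& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::uint16_t v : in) out.push_back(narrowSample(v));
    return out;
}
```

Widening followed by narrowing returns the original 8-bit value, while narrowing a true 10-bit sample discards its two least significant bits.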

AppTransOneToN

This sample application demonstrates 1:N transcoding of a single input stream. Decoding of frames from the input stream takes place on the main thread and new threads are spawned for each output stream. A different resolution can be specified for each output stream and the decoded frames will be scaled as required. If no output resolutions are specified, this application will generate two streams: one of 1280x720 and the other of 800x480.

Sample command line:

AppTransOneToN.exe -i input.h264 -o out -r 1280x720 800x480

AppTransPerf

This sample application measures transcoding performance in FPS. This sample application takes a single input stream and spawns N pairs of threads. In each pair, one thread is responsible for decoding the input stream and making the decoded frames available to the other thread for encoding.

Sample command line: