Overview

The video_decode sample application demonstrates how to use the buffer allocated by the libv4l2 component to decode H.264, H.265, VP8, VP9, MPEG4, or MPEG2 video streams.

The application reads an H.264, H.265, VP8, VP9, MPEG4, or MPEG2 elementary video file, decodes it, and passes it to the EGL renderer to show the decoded images without any extra memory copy. The basic support to display with Vulkan swapchain is also added, it imports the dma_buf FD created for decoded frame into Vulkan driver and binds it to a vulkan image. This will be further enhanced in future releases.

Note: The application supports slice-level decoding for H.265 on Xavier.

Supported video formats are:

H.264
H.265
VP8
VP9
MPEG4
MPEG2
AV1
MJPEG
Note
Decoding of MJPEG format using nvv4l2decoder is not supported in Jetson Thor.

CUDA Integration

The application includes comprehensive CUDA support for GPU-accelerated decoding:

CUDA Memory Types: Supports CUDA Device and CUDA Pinned memory allocation
GPU Detection: Automatically detects GPU capabilities using isGPUEnabled()
Memory Management: Intelligent memory type selection based on GPU availability
Performance Optimization: GPU-accelerated processing paths for enhanced performance

Thor Platform Support

On Thor platforms, the application utilizes GPU-accelerated decoding for:

Enhanced GPU resource management and allocation
Optimized memory bandwidth utilization between CPU and GPU
Improved multi-stream decoding capabilities with GPU coordination
Better integration with CUDA-based system-wide resource scheduling

Building and Running

Prerequisites

You have followed steps 1-3 in Building and Running.
If you are building from your host Linux PC (x86), you have followed step 4 in Building and Running.
For CUDA functionality, ensure CUDA runtime is properly installed and configured.

To build

Enter:

 $ cd /usr/src/jetson_multimedia_api/samples/00_video_decode
 $ apt install libvulkan-dev
 $ make

To run

Enter:

 $ ./video_decode <in-format> [options] <in-file>

CUDA-specific Options

The application supports several CUDA-related command-line options:

-alloc_type_cplane <num>: Allocation memory type for output plane buffer
- 0: Default (system-managed allocation)
- 1: CUDA Pinned (host memory accessible by GPU)
- 2: CUDA Device (GPU memory)

To view supported options

Enter:

   $ ./video_decode --help

Example

  $ ./video_decode H264 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

CUDA-enabled Example

   $ ./video_decode H264 -alloc_type_cplane 2 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

This example uses CUDA Device memory for optimal GPU performance.

Vulkan rendering

$ ./video_decode H264 –vk-rendering -ww 1920 -wh 1080 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

Example for slice-level decoding

  $ ./video_decode H265 --input-nalu --enable-sld h265_ulld_1080p_30fps_10sec.265

Flow

The following diagram shows the flow through this sample.

The Output Plane receives input in bitstream format and delivers it to the Decoder for decoding.
The Capture Plane transfer decoded frames to the application in YUV format.
For the Output Plane the application supports MMAP and USRPTR memory types. For the Capture Plane it supports MMAP and DMABUF memory types.
The application can also dump files from the Capture Plane.
Note
Only MMAP is supported in output plane for Jetson Thor.

CUDA Processing Flow

When CUDA is enabled, the processing flow includes additional GPU-accelerated paths:

Memory Allocation: CUDA-aware memory allocation using NvBufSurf APIs
GPU Detection: Runtime detection of GPU capabilities via isGPUEnabled()
Memory Type Configuration: Automatic configuration of CUDA memory types using setCudaMemType()
Buffer Management: Intelligent buffer management for CUDA Device/Pinned memory
Performance Optimization: GPU-accelerated decoding with optimized memory transfers

Thor Platform Flow

On Thor platforms:

Resource Discovery: GPU-based resource enumeration and capability detection
Memory Management: Enhanced memory allocation through GPU-aware APIs
Stream Management: Multi-stream decoding coordination via GPU scheduler
Performance Monitoring: Real-time performance metrics and adaptive GPU optimization

Key Structure and Classes

The sample uses the following key structures and classes.

Element	Description
NvVideoDecoder	Contains all video decoding-related elements and functions.
NvEglRenderer	Contains all EGL display rendering-related functions.
dec_capture_loop	A pointer to the thread handler for the decoding capture loop.

The NvVideoDecoder class packages all video decoding-related elements and functions. Key members used in the sample are:

Member	Description
output_plane	Specifies the V4L2 output plane.
capture_plane	Specifies the V4L2 capture plane.
createVideoDecoder	Static function to create video decode object.
subscribeEvent	Subscribe event.
setOutputPlaneFormat	Set output plane format.
setCapturePlaneFormat	Set capture plane format.
dqEvent	Dqueue the event which reports by the V4L2 device.
isInError	Checks if under error state.

Class NvVideoDecoder contains two key elements: output_plane and capture_plane. These objects are derived from class type NvV4l2ElementPlane. The sample uses the following key members:

Element	Description
setupPlane	Sets up the plane of V4L2 element.
deinitPlane	Destroys the plane of the V4L2 element.
setStreamStatus	Starts/stops the stream.
setDQThreadCallback	Sets the callback function of the dqueue buffer thread.
startDQThread	Starts the thread of the dqueue buffer.
stopDQThread	Stops the thread of the dqueue buffer.
qBuffer	Queues the V4L2 buffer.
dqBuffer	Dequeues the V4L2 buffer.
getNumBuffers	Gets the number of V4L2 buffers.
getNumQueuedBuffers	Gets the number of buffers currently queued on the plane.

CUDA Support

CUDA-Specific APIs

The sample includes additional CUDA-specific functionality:

API Function	Description
isGPUEnabled	Checks if GPU configuration is enabled for CUDA processing.
setCudaMemType	Sets CUDA memory type (Device/Pinned) when GPU is enabled.

Memory Management APIs

API	Description
`NvBufSurf::NvAllocate`	Allocates buffer surfaces with CUDA memory type support.
`NVBUF_MEM_CUDA_DEVICE`	Enum for CUDA device memory allocation.
`NVBUF_MEM_CUDA_PINNED`	Enum for CUDA pinned (host) memory allocation.
`V4L2_CUDA_MEM_TYPE_*`	V4L2 controls for CUDA memory type configuration.

GPU Integration on Thor

The GPU acceleration provides enhanced capabilities:

Unified Memory Management: Coordinated allocation across CPU/GPU memory domains
Performance Optimization: Real-time performance monitoring and adaptive GPU tuning
Multi-Stream Support: Efficient GPU resource sharing for concurrent decoding streams
System Integration: Deep integration with Thor platform GPU capabilities

Platform-Specific Considerations

Platform Feature	Thor (T264)	Other Platforms
CUDA Support	GPU-accelerated	Direct CUDA API
Memory Types	GPU-managed	Standard allocation
Multi-stream	GPU-scheduled	Application managed
Resource limits	Dynamic GPU allocation	Static configuration

Note: Features marked as "Not supported for T264" may have equivalent functionality through GPU-accelerated paths or platform-specific alternatives.

Jetson Linux API Reference

38.2 Release

Overview

CUDA Integration

Thor Platform Support

Building and Running

Prerequisites

To build

To run

CUDA-specific Options

To view supported options

Example

CUDA-enabled Example

Vulkan rendering

Example for slice-level decoding

Flow

CUDA Processing Flow

Thor Platform Flow

Key Structure and Classes

CUDA Support

CUDA-Specific APIs

Memory Management APIs

GPU Integration on Thor

Platform-Specific Considerations