Jetson Linux API Reference

38.2 Release
00_video_decode (video decode)

Overview

The video_decode sample application demonstrates how to use the buffer allocated by the libv4l2 component to decode H.264, H.265, VP8, VP9, MPEG4, or MPEG2 video streams.

The application reads an H.264, H.265, VP8, VP9, MPEG4, or MPEG2 elementary video file, decodes it, and passes it to the EGL renderer to show the decoded images without any extra memory copy. The basic support to display with Vulkan swapchain is also added, it imports the dma_buf FD created for decoded frame into Vulkan driver and binds it to a vulkan image. This will be further enhanced in future releases.

Note
The application supports slice-level decoding for H.265 on Xavier.

Supported video formats are:

  • H.264
  • H.265
  • VP8
  • VP9
  • MPEG4
  • MPEG2
  • AV1
  • MJPEG
    Note
    Decoding of MJPEG format using nvv4l2decoder is not supported in Jetson Thor.

CUDA Integration

The application includes comprehensive CUDA support for GPU-accelerated decoding:

  • CUDA Memory Types: Supports CUDA Device and CUDA Pinned memory allocation
  • GPU Detection: Automatically detects GPU capabilities using isGPUEnabled()
  • Memory Management: Intelligent memory type selection based on GPU availability
  • Performance Optimization: GPU-accelerated processing paths for enhanced performance

Thor Platform Support

On Thor platforms, the application utilizes GPU-accelerated decoding for:

  • Enhanced GPU resource management and allocation
  • Optimized memory bandwidth utilization between CPU and GPU
  • Improved multi-stream decoding capabilities with GPU coordination
  • Better integration with CUDA-based system-wide resource scheduling


Building and Running

Prerequisites

  • You have followed steps 1-3 in Building and Running.
  • If you are building from your host Linux PC (x86), you have followed step 4 in Building and Running.
  • For CUDA functionality, ensure CUDA runtime is properly installed and configured.

To build

  • Enter:
     $ cd /usr/src/jetson_multimedia_api/samples/00_video_decode
     $ apt install libvulkan-dev
     $ make
    

To run

  • Enter:
     $ ./video_decode <in-format> [options] <in-file>
    

CUDA-specific Options

The application supports several CUDA-related command-line options:

  • -alloc_type_cplane <num>: Allocation memory type for output plane buffer
    • 0: Default (system-managed allocation)
    • 1: CUDA Pinned (host memory accessible by GPU)
    • 2: CUDA Device (GPU memory)

To view supported options

Enter:

   $ ./video_decode --help

Example

  $ ./video_decode H264 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

CUDA-enabled Example

   $ ./video_decode H264 -alloc_type_cplane 2 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

This example uses CUDA Device memory for optimal GPU performance.

Vulkan rendering

$ ./video_decode H264 –vk-rendering -ww 1920 -wh 1080 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

Example for slice-level decoding

  $ ./video_decode H265 --input-nalu --enable-sld h265_ulld_1080p_30fps_10sec.265


Flow

The following diagram shows the flow through this sample.

  • The Output Plane receives input in bitstream format and delivers it to the Decoder for decoding.
  • The Capture Plane transfer decoded frames to the application in YUV format.
  • For the Output Plane the application supports MMAP and USRPTR memory types. For the Capture Plane it supports MMAP and DMABUF memory types.
  • The application can also dump files from the Capture Plane.
    Note
    Only MMAP is supported in output plane for Jetson Thor.

CUDA Processing Flow

When CUDA is enabled, the processing flow includes additional GPU-accelerated paths:

  1. Memory Allocation: CUDA-aware memory allocation using NvBufSurf APIs
  2. GPU Detection: Runtime detection of GPU capabilities via isGPUEnabled()
  3. Memory Type Configuration: Automatic configuration of CUDA memory types using setCudaMemType()
  4. Buffer Management: Intelligent buffer management for CUDA Device/Pinned memory
  5. Performance Optimization: GPU-accelerated decoding with optimized memory transfers

Thor Platform Flow

On Thor platforms:

  1. Resource Discovery: GPU-based resource enumeration and capability detection
  2. Memory Management: Enhanced memory allocation through GPU-aware APIs
  3. Stream Management: Multi-stream decoding coordination via GPU scheduler
  4. Performance Monitoring: Real-time performance metrics and adaptive GPU optimization


Key Structure and Classes

The sample uses the following key structures and classes.

Element Description
NvVideoDecoder Contains all video decoding-related elements and functions.
NvEglRenderer Contains all EGL display rendering-related functions.
dec_capture_loop A pointer to the thread handler for the decoding capture loop.

The NvVideoDecoder class packages all video decoding-related elements and functions. Key members used in the sample are:

Member Description
output_plane Specifies the V4L2 output plane.
capture_plane Specifies the V4L2 capture plane.
createVideoDecoder Static function to create video decode object.
subscribeEvent Subscribe event.
setOutputPlaneFormat Set output plane format.
setCapturePlaneFormat Set capture plane format.
dqEvent Dqueue the event which reports by the V4L2 device.
isInError Checks if under error state.

Class NvVideoDecoder contains two key elements: output_plane and capture_plane. These objects are derived from class type NvV4l2ElementPlane. The sample uses the following key members:

Element Description
setupPlane Sets up the plane of V4L2 element.
deinitPlane Destroys the plane of the V4L2 element.
setStreamStatus Starts/stops the stream.
setDQThreadCallback Sets the callback function of the dqueue buffer thread.
startDQThread Starts the thread of the dqueue buffer.
stopDQThread Stops the thread of the dqueue buffer.
qBuffer Queues the V4L2 buffer.
dqBuffer Dequeues the V4L2 buffer.
getNumBuffers Gets the number of V4L2 buffers.
getNumQueuedBuffers Gets the number of buffers currently queued on the plane.


CUDA Support

CUDA-Specific APIs

The sample includes additional CUDA-specific functionality:

API Function Description
isGPUEnabled Checks if GPU configuration is enabled for CUDA processing.
setCudaMemType Sets CUDA memory type (Device/Pinned) when GPU is enabled.

Memory Management APIs

API Description
NvBufSurf::NvAllocate Allocates buffer surfaces with CUDA memory type support.
NVBUF_MEM_CUDA_DEVICE Enum for CUDA device memory allocation.
NVBUF_MEM_CUDA_PINNED Enum for CUDA pinned (host) memory allocation.
V4L2_CUDA_MEM_TYPE_* V4L2 controls for CUDA memory type configuration.

GPU Integration on Thor

The GPU acceleration provides enhanced capabilities:

  • Unified Memory Management: Coordinated allocation across CPU/GPU memory domains
  • Performance Optimization: Real-time performance monitoring and adaptive GPU tuning
  • Multi-Stream Support: Efficient GPU resource sharing for concurrent decoding streams
  • System Integration: Deep integration with Thor platform GPU capabilities

Platform-Specific Considerations

Platform Feature Thor (T264) Other Platforms
CUDA Support GPU-accelerated Direct CUDA API
Memory Types GPU-managed Standard allocation
Multi-stream GPU-scheduled Application managed
Resource limits Dynamic GPU allocation Static configuration
Note
Features marked as "Not supported for T264" may have equivalent functionality through GPU-accelerated paths or platform-specific alternatives.