L4T Multimedia API Reference

27.1 Release

4 Ways to Video Decode with TensorRT Processing Sample

Overview

This use case application showcases a typical appliance performing intelligent video analytics. Application areas include public safety, smart cities, and autonomous machines. The example demonstrates four (4) concurrent video streams going through a decoding process using the on-chip decoders, video scaling using the on-chip scaler, and GPU compute. For simplicity of demonstration, only one of the channels uses NVIDIA® TensorRT to perform object identification and to generate bounding boxes around the identified objects. The sample also uses video converter functions for various format conversions and uses EGLImage to demonstrate buffer sharing and image display capability.

In this sample, object detection is limited to identifying cars in video streams of 960 x 540 resolution, running at up to 14 FPS. The network is based on GoogleNet. Inference is performed on a frame-by-frame basis; no object tracking is involved. Note that this network is intended purely as an example to showcase how TensorRT can be used to build the compute pipeline quickly. The trained GoogleNet model is provided with the source and was trained using NVIDIA DIGITS with roughly 3000 frames taken from an elevation of 5-10 feet. Varying levels of detection accuracy are expected depending on the video samples fed in. Because this sample is locked to perform at half-HD resolution under 10 FPS, video feeds with a higher FPS will show stuttering during playback when used for inference.

This sample does not require a camera.

The image below shows a sample block diagram.

The images below show data flow details for the channel that uses TensorRT.

NvEGLImageFromFd is an NVIDIA-defined API that returns an EGLImage pointer from the file descriptor of a buffer allocated through the Tegra buffer mechanism. The EGLImage buffer is then used by the TensorRT channel to render the bounding box onto the image.

Prerequisites

Before running the sample, you must have the following:

  • CUDA Toolkit
  • TensorRT (previously known as GPU Inference Engine (GIE))
  • OpenCV4Tegra
  • The README file included with the sample, which provides details on the environment requirements for building and running the sample

Key Structure and Classes

The context_t structure (backend/v4l2_backend_test.h) manages all resources in the sample application.

Element Description
NvVideoDecoder Contains all video decoding-related elements and functions.
NvVideoConverter Contains elements and functions for video format conversion.
NvEglRenderer Contains all EGL display rendering-related functions.
EGLImageKHR The EGLImage used for CUDA processing. This type is from the open source EGL graphics library.

NvVideoDecoder

The NvVideoDecoder class creates a new V4L2 Video Decoder. The following table describes the key NvVideoDecoder members that this sample uses.

Member Description
NvV4l2Element::output_plane Holds the V4L2 output plane.
NvV4l2Element::capture_plane Holds the V4L2 capture plane.
NvVideoDecoder::createVideoDecoder Static function that creates a video decoder object.
NvV4l2Element::subscribeEvent Subscribes to an event.
NvVideoDecoder::setExtControls Sets external controls on the V4L2 device.
NvVideoDecoder::setOutputPlaneFormat Sets the output plane format.
NvVideoDecoder::setCapturePlaneFormat Sets the capture plane format.
NvV4l2Element::getControl Gets the value of a control setting.
NvV4l2Element::dqEvent Dequeues an event reported by the V4L2 device.
NvV4l2Element::isInError Checks whether the element is in an error state.
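
The following is a minimal sketch of how a decoder for one channel might be created and configured with these members. The stream parameters (H.264 input, buffer counts) and the simplified error handling are assumptions for illustration, not the sample's exact code.

    #include "NvVideoDecoder.h"
    #include <linux/videodev2.h>

    // Minimal decoder setup sketch (assumed parameters, simplified error handling).
    static NvVideoDecoder *setup_decoder()
    {
        // Create the V4L2 video decoder element.
        NvVideoDecoder *dec = NvVideoDecoder::createVideoDecoder("dec0");
        if (!dec || dec->isInError())
            return NULL;

        // Subscribe to the resolution-change event so the capture plane can be
        // configured once the stream header has been parsed.
        dec->subscribeEvent(V4L2_EVENT_RESOLUTION_CHANGE, 0, 0);

        // The output plane receives the compressed bitstream (H.264 assumed here).
        dec->setOutputPlaneFormat(V4L2_PIX_FMT_H264, 2 * 1024 * 1024);

        // Request buffers on the output plane and start streaming.
        dec->output_plane.setupPlane(V4L2_MEMORY_MMAP, 10, true, false);
        dec->output_plane.setStreamStatus(true);

        return dec;
    }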

NvVideoConverter

The NvVideoConverter class packages all video conversion-related elements and functions. It performs color space conversion, scaling, and conversion between hardware buffer memory and software buffer memory. The following table describes the key NvVideoConverter members that this sample uses.

Member Description
NvV4l2Element::output_plane Holds the output plane.
NvV4l2Element::capture_plane Holds the capture plane.
NvVideoConverter::waitForIdle Waits until all the buffers queued on the output plane are converted and dequeued from the capture plane. This is a blocking method.
NvVideoConverter::setCapturePlaneFormat Sets the format on the converter capture plane.
NvVideoConverter::setOutputPlaneFormat Sets the format on the converter output plane.
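
As an illustration only, a converter can be configured to scale decoded frames down to the 960 x 540 resolution used for inference. The pixel formats, dimensions, and buffer layouts below are assumptions for the sketch, not the sample's exact settings.

    #include "NvVideoConverter.h"
    #include <linux/videodev2.h>

    // Sketch: configure a converter to scale/convert decoded frames (assumed formats).
    static NvVideoConverter *setup_converter(uint32_t in_width, uint32_t in_height)
    {
        NvVideoConverter *conv = NvVideoConverter::createVideoConverter("conv0");
        if (!conv || conv->isInError())
            return NULL;

        // The output plane accepts the decoder's block-linear NV12 frames as input.
        conv->setOutputPlaneFormat(V4L2_PIX_FMT_NV12M, in_width, in_height,
                                   V4L2_NV_BUFFER_LAYOUT_BLOCKLINEAR);

        // The capture plane produces pitch-linear frames at the inference resolution.
        conv->setCapturePlaneFormat(V4L2_PIX_FMT_NV12M, 960, 540,
                                    V4L2_NV_BUFFER_LAYOUT_PITCH);

        return conv;
    }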

NvVideoDecoder and NvVideoConverter contain two key elements: output_plane and capture_plane. These objects are instantiated from the NvV4l2ElementPlane class type.

NvV4l2ElementPlane

NvV4l2ElementPlane creates an NvV4l2Element plane. The following table describes the key NvV4l2ElementPlane members used in this sample. Note that v4l2_buf is a local variable inside the NvV4l2ElementPlane::dqThreadCallback function, so its scope is limited to that callback. If other functions of the sample must access this buffer, it must be copied inside the callback function (see the sketch after the table).

Member Description
NvV4l2ElementPlane::setupPlane Sets up the plane of the V4L2 element.
NvV4l2ElementPlane::deinitPlane Destroys the plane of the V4L2 element.
NvV4l2ElementPlane::setStreamStatus Starts or stops the stream.
NvV4l2ElementPlane::setDQThreadCallback Sets the callback function of the dequeue buffer thread.
NvV4l2ElementPlane::startDQThread Starts the dequeue buffer thread.
NvV4l2ElementPlane::stopDQThread Stops the dequeue buffer thread.
NvV4l2ElementPlane::qBuffer Queues a V4L2 buffer on the plane.
NvV4l2ElementPlane::dqBuffer Dequeues a V4L2 buffer from the plane.
NvV4l2ElementPlane::getNumBuffers Gets the number of V4L2 buffers on the plane.
NvV4l2ElementPlane::getNumQueuedBuffers Gets the number of V4L2 buffers currently queued on the plane.
NvV4l2ElementPlane::getNthBuffer Gets the NvBuffer object at index N.
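
The sketch below, using assumed context and callback names, shows the dequeue-thread callback pattern and the copy of v4l2_buf described above; it is not the sample's exact callback.

    #include "NvVideoConverter.h"
    #include <linux/videodev2.h>

    // Assumed per-channel context for the sketch.
    struct my_context
    {
        NvVideoConverter *conv;
        struct v4l2_buffer last_buf;
    };

    // Capture-plane dequeue-thread callback (simplified).
    static bool
    conv_capture_dqbuf_callback(struct v4l2_buffer *v4l2_buf, NvBuffer *buffer,
                                NvBuffer *shared_buffer, void *arg)
    {
        my_context *ctx = (my_context *) arg;

        if (!v4l2_buf)
            return false;   // Returning false stops the DQ thread.

        // v4l2_buf is valid only inside this callback, so copy it if other
        // parts of the application need the buffer information later.
        ctx->last_buf = *v4l2_buf;

        // ... process `buffer` here, e.g. hand the frame to the TensorRT channel ...

        // Re-queue the buffer on the capture plane so conversion can continue.
        ctx->conv->capture_plane.qBuffer(*v4l2_buf, NULL);
        return true;
    }

    // Registration and thread start, done once during setup:
    //   ctx.conv->capture_plane.setDQThreadCallback(conv_capture_dqbuf_callback);
    //   ctx.conv->capture_plane.startDQThread(&ctx);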

GIE_Context

GIE_Context provides a series of interfaces to load a Caffe model and perform inference. The following table describes the key GIE_Context members used in this sample.

Member Description
GIE_Context::destroyGieContext Destroys the GIE_Context.
GIE_Context::getNumGieInstances Gets the number of GIE_Context instances.
GIE_Context::doInference Performs inference after the TensorRT model is loaded.

Functions to Create/Destroy EGLImage

The sample uses two global functions to create and destroy an EGLImage from a dmabuf file descriptor. These functions are defined in nvbuf_utils.h.

Global Function Description
NvEGLImageFromFd() Creates EGLImage from dmabuf fd.
NvDestroyEGLImage() Destroys the EGLImage.
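
As a rough sketch (display handling and the CUDA processing step are simplified assumptions), the TensorRT channel wraps a decoded dmabuf in an EGLImage as follows:

    #include "nvbuf_utils.h"
    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    // Sketch: wrap a hardware (dmabuf) buffer in an EGLImage for CUDA/TensorRT
    // processing, then release it.
    static void process_frame(EGLDisplay egl_display, int dmabuf_fd)
    {
        EGLImageKHR egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);
        if (egl_image == NULL)
            return;

        // ... map the EGLImage into CUDA, run inference, and draw the
        //     bounding boxes into the frame ...

        NvDestroyEGLImage(egl_display, egl_image);
    }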

Command Line Options

To run the sample, execute:

backend <channel-num> <in-file1> <in-file2>... <in-format> [options]
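
For example, to decode two H.264 clips concurrently (the file names here are placeholders, not files shipped with the sample):

backend 2 clip1.h264 clip2.h264 H264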

The following video formats are supported for use with command line options:

  • H.264
  • H.265

Options

Use the -h option to view the currently supported options.

For X11 technical details, see:

http://www.x.org/docs/X11/xlib.pdf