Frequently Asked Questions

How do I uninstall DeepStream?
For dGPU:
To remove all previous DeepStream 3.0 or prior installations, enter the command:
$ sudo rm -rf /usr/local/deepstream /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstnv* /usr/bin/deepstream* /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libnvdsgst* /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream* /opt/nvidia/deepstream/deepstream*
$ sudo rm -rf /usr/lib/x86_64-linux-gnu/libv4l/plugins/libcuvidv4l2_plugin.so
To remove DeepStream 4.0 or later installations:
1. Open the uninstall.sh file in /opt/nvidia/deepstream/deepstream/
2. Set PREV_DS_VER as 4.0
3. Run the script as sudo: ./uninstall.sh
For Jetson: Flash the target device with the latest release of JetPack.
What types of input streams does DeepStream 5.0 support?
It supports H.264, H.265, JPEG, and MJPEG streams.
What’s the throughput of H.264 and H.265 decode on dGPU (Tesla)?
How can I run the DeepStream sample application in debug mode?
Enter this command:
$ deepstream-app -c <config> --gst-debug=<debug#>
Where:
<config> is the pathname of the configuration file
<debug#> is a number specifying the amount of detail in the debugging output
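For example, to run the 30-stream sample configuration with GStreamer debug output at level 3 (the configuration file name is from the samples configuration directory; the debug level is illustrative):
$ deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt --gst-debug=3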
For information about debugging tools, see:
Where can I find the DeepStream sample applications?
The DeepStream sample applications are located at:
<DeepStream installation dir>/sources/apps/sample_apps/
The configuration files for the sample applications are located at:
<DeepStream installation dir>/samples/configs/deepstream-app
For more information, see the NVIDIA DeepStream SDK Development Guide.
How can I verify that CUDA was installed correctly?
Check the CUDA version:
$ nvcc --version
How can I interpret frames per second (FPS) display information on console?
The FPS number shown on the console when deepstream-app runs is an average over the most recent five seconds. The number in brackets is average FPS over the entire run. The numbers are displayed per stream. The performance measurement interval is set by the perf-measurement-interval-sec setting in the configuration file.
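For example, a sketch of the relevant keys in the deepstream-app configuration file (the group and key names are assumed to match the sample configurations):
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5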
My DeepStream performance is lower than expected. How can I determine the reason?
See the Troubleshooting chapter of DeepStream 5.0 Plugin Manual.
How can I specify RTSP streaming of DeepStream output?
You can enable remote display by adding an RTSP sink in the application configuration file. The sample configuration file source30_1080p_dec_infer-resnet_tiled_display_int8.txt has an example of this in the [sink2] section. You must set the enable flag to 1.
Once you enable remote display, the application prints the RTSP URL, which you can open in any media player like VLC.
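As a minimal sketch of the [sink2] group for RTSP output, type=4 selects RTSP streaming (see the sink settings in the Development Guide); take the remaining keys (codec, bitrate, rtsp-port, and so on) from the sample configuration file:
[sink2]
enable=1
type=4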
What is the official DeepStream Docker image and where do I get it?
You can download the official DeepStream Docker image from NVIDIA GPU Cloud (NGC).
What is the recipe for creating my own Docker image?
Use the DeepStream container as the base image and add your own custom layers on top of it using standard Docker techniques.
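A minimal Dockerfile sketch, assuming you pull the DeepStream container from NGC; the image tag and the application path are placeholders:
FROM nvcr.io/nvidia/deepstream:<tag>
COPY my-ds-app /opt/my-ds-app
WORKDIR /opt/my-ds-app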
How can I display graphical output remotely over VNC? How can I determine whether X11 is running?
If the host machine is running X, starting VNC is trivial. Otherwise you must start X, then start VNC.
To determine whether X is running, check the DISPLAY environment variable.
If X is not running you must start it first, then run DeepStream with GUI, or set type to 1 or 3 under sink groups to select fakesink or save to a file. If you are using an NVIDIA® Tesla® V100 or P100 GPU Accelerator (both compute-only cards without a display), you must set type to 4 for DeepStream output RTSP streaming. See the NVIDIA DeepStream SDK Development Guide for sink settings.
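For example, assuming a typical Linux setup (commands are illustrative):
$ echo $DISPLAY
If the output is empty, X is not running. Start X (for example, with startx or your display manager), then point DISPLAY at it before launching DeepStream with a GUI sink:
$ export DISPLAY=:0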
Why does the deepstream-nvof-test application show the error message “Device Does NOT support Optical Flow Functionality” if run with NVIDIA Tesla P4 or NVIDIA Jetson Nano, Jetson TX2, or Jetson TX1?
Optical flow functionality is supported only on NVIDIA® Jetson AGX Xavier™ and on GPUs with Turing architecture (NVIDIA® T4, NVIDIA® GeForce® RTX 2080 etc.).
Why is a Gst-nvstreammux plugin required in DeepStream 4.0+?
Multiple source components like decoder, camera, etc. are connected to the Gst-nvstreammux plugin to form a batch.
This plugin is responsible for creating batch metadata, which is stored in the structure NvDsBatchMeta. This is the primary form of metadata in DeepStream 4.0.1.
All plugins downstream from Gst-nvstreammux work on NvDsBatchMeta to access metadata and fill in the metadata they generate.
Why is a Gst-nvegltransform plugin required on a Jetson platform upstream from Gst-nveglglessink?
On a Jetson platform, Gst-nveglglessink works on EGLImage structures. Gst-nvegltransform is required to convert incoming data (wrapped in an NVMM structure) to an EGLImage instance. On a dGPU platform, Gst-nveglglessink works directly on data wrapped in an NVMM structure.
How do I profile the DeepStream pipeline?
You can use NVIDIA® Nsight™ Systems, a system-wide performance analysis tool. See https://developer.nvidia.com/nsight-systems for more details.
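For example, a minimal capture with the Nsight Systems command-line interface (assuming nsys is installed; the report name is illustrative):
$ nsys profile -o ds_report deepstream-app -c <config>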
How can I check GPU and memory utilization on a dGPU system?
Enter nvidia-smi or nvidia-settings on the console.
What is the approximate memory utilization for 1080p streams on dGPU?
Use the table below as a guide to memory utilization in this case.
Note:
Width and height in Gst-nvstreammux are set to the input stream resolution specified in the configuration file.
The pipeline is: decoder→ nvstreammux→ nvinfer→ fakesink.
 
Batch size (Number of streams)   Decode memory   Gst-nvinfer memory   Gst-nvstreammux memory
1                                32 MB           333 MB               0 MB
2                                64 MB           341 MB               0 MB
4                                128 MB          359 MB               0 MB
8                                256 MB          391 MB               0 MB
16                               512 MB          457 MB               0 MB
If input stream resolution and Gst-nvstreammux resolution (set in the configuration file) are the same, no additional GPU memory is allocated in Gst-nvstreammux.
If the input stream resolution is not the same as the Gst-nvstreammux resolution, Gst-nvstreammux allocates memory of size:
buffers * (1.5 * width * height) * mismatches bytes
Where:
buffers is the number of Gst-nvstreammux output buffers (set to 4).
width and height are the mux output width and height.
mismatches is the number of sources with resolution mismatch.
The following examples illustrate this:
Example 1: 16 sources at 1920×1080 resolution, with Gst-nvstreammux width × height set to 1280×720. Gst-nvstreammux GPU memory size: 4 * (1.5 * 1280 * 720) * 16 ≈ 84 MB (all 16 sources mismatch the mux resolution).
Example 2: 15 sources at 1280×720 resolution and one source at 1920×1080 resolution, with Gst-nvstreammux width × height set to 1280×720. Gst-nvstreammux GPU memory size: 4 * (1.5 * 1280 * 720) * 1 ≈ 5.2 MB (only the 1080p source mismatches).
What trackers are included in DeepStream and which one should I choose for my application?
DeepStream ships with three trackers: KLT, IOU, and NvDCF. The trackers range from high performance to high accuracy. The trade-off summary below can help you choose the best tracker for your application. For more information about the trackers, read the “Gst-nvtracker” chapter in the DeepStream 5.0 Plugin Manual.
IOU
Computational load: GPU: X, CPU: Very Low
Pros: Lightweight.
Cons: No visual features for matching, so prone to frequent tracker ID switches and failures. Not suitable for fast-moving scenes.
Best use cases: Objects are sparsely located, with distinct sizes; the detector is expected to run every frame or very frequently (e.g., every alternate frame).
KLT
Computational load: GPU: X, CPU: High
Pros: Works reasonably well for simple scenes.
Cons: High CPU utilization. Susceptible to changes in visual appearance due to noise and perturbations such as shadow, non-rigid deformation, out-of-plane rotation, and partial occlusion. Cannot work on objects with low textures.
Best use cases: Objects with strong textures and a simpler background; ideal when high CPU resources are available.
NvDCF
Computational load: GPU: Medium, CPU: Low
Pros: Highly robust against partial occlusions, shadow, and other transient visual changes. Less frequent ID switches.
Cons: Slower than KLT and IOU due to increased computational complexity. Reduces the total number of streams processed.
Best use cases: Multi-object, complex scenes with partial occlusion.
When deepstream-app is run in loop on Jetson AGX Xavier using “while true; do deepstream-app -c <config_file>; done;”, after a few iterations I see low FPS for certain iterations.
This may happen when you are running thirty 1080p streams at 30 frames/second. The issue is caused by initial load. I/O operations bog down the CPU, and with qos=1 as a default property of the [sink0] group, decodebin starts dropping frames. To avoid this, set qos=0 in the [sink0] group in the configuration file.
Why do I get the error “Makefile:13: *** "CUDA_VER is not set". Stop” when I compile DeepStream sample applications?
Export this environment variable:
For dGPU: CUDA_VER=10.2
For Jetson: CUDA_VER=10.2
Then compile again.
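For example, from the sample application's directory:
$ export CUDA_VER=10.2
$ make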
How can I construct the DeepStream GStreamer pipeline?
Here are a few examples of how to construct the pipeline. To run these example pipelines as-is, run them from the samples directory:
V4l2 decoder→ nvinfer→ nvtracker→ nvinfer (secondary)→ nvmultistreamtiler→ nvdsosd→ nveglglessink
For multistream (4×1080p) operation on dGPU:
$ gst-launch-1.0 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=4 width=1920 height=1080 ! nvinfer config-file-path= configs/deepstream-app/config_infer_primary.txt batch-size=4 unique-id=1 ! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so ! nvinfer config-file-path= configs/deepstream-app/config_infer_secondary_carcolor.txt batch-size=16 unique-id=2 infer-on-gie-id=1 infer-on-class-ids=0 ! nvmultistreamtiler rows=2 columns=2 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! nveglglessink filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_1 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_2 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_3
For multistream (4×1080p) operation on Jetson:
$ gst-launch-1.0 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=4 width=1920 height=1080 ! nvinfer config-file-path= configs/deepstream-app/config_infer_primary.txt batch-size=4 unique-id=1 ! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so ! nvinfer config-file-path= configs/deepstream-app/config_infer_secondary_carcolor.txt batch-size=16 unique-id=2 infer-on-gie-id=1 infer-on-class-ids=0 ! nvmultistreamtiler rows=2 columns=2 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_1 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_2 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_3
For single stream (1080p) operation on dGPU:
$ gst-launch-1.0 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path= configs/deepstream-app/config_infer_primary.txt batch-size=1 unique-id=1 ! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so ! nvinfer config-file-path= configs/deepstream-app/config_infer_secondary_carcolor.txt batch-size=16 unique-id=2 infer-on-gie-id=1 infer-on-class-ids=0 ! nvmultistreamtiler rows=1 columns=1 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! nveglglessink
For single stream (1080p) operation on Jetson:
$ gst-launch-1.0 filesrc location= streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path= configs/deepstream-app/config_infer_primary.txt batch-size=1 unique-id=1 ! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so ! nvinfer config-file-path= configs/deepstream-app/config_infer_secondary_carcolor.txt batch-size=16 unique-id=2 infer-on-gie-id=1 infer-on-class-ids=0 ! nvmultistreamtiler rows=1 columns=1 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink
JPEG decode
Using nvv4l2decoder on Jetson:
$ gst-launch-1.0 filesrc location= ./streams/sample_720p.jpg ! jpegparse ! nvv4l2decoder ! nvegltransform ! nveglglessink
Using nvv4l2decoder on dGPU:
$ gst-launch-1.0 filesrc location= ./streams/sample_720p.jpg ! jpegparse ! nvv4l2decoder ! nveglglessink
Using nvjpegdec on Jetson:
$ gst-launch-1.0 filesrc location= ./streams/sample_720p.jpg ! nvjpegdec ! nvegltransform ! nveglglessink
Using nvjpegdec on dGPU:
$ gst-launch-1.0 filesrc location= ./streams/sample_720p.jpg ! nvjpegdec ! nveglglessink
Dewarper
On dGPU:
$ gst-launch-1.0 uridecodebin uri= file://`pwd`/../../../../samples/streams/sample_cam6.mp4 ! nvvideoconvert ! nvdewarper source-id=6 num-output-buffers=4 config-file=config_dewarper.txt ! m.sink_0 nvstreammux name=m width=1280 height=720 batch-size=4 batched-push-timeout=100000 num-surfaces-per-frame=4 ! nvmultistreamtiler rows=1 columns=1 width=720 height=576 ! nvvideoconvert ! nveglglessink
On Jetson:
$ gst-launch-1.0 uridecodebin uri= file://`pwd`/../../../../samples/streams/sample_cam6.mp4 ! nvvideoconvert ! nvdewarper source-id=6 num-output-buffers=4 config-file=config_dewarper.txt ! m.sink_0 nvstreammux name=m width=1280 height=720 batch-size=4 batched-push-timeout=100000 num-surfaces-per-frame=4 ! nvmultistreamtiler rows=1 columns=1 width=720 height=576 ! nvvideoconvert ! nvegltransform ! nveglglessink
Note:
This Gst pipeline must be run from the dewarper test application directory, sources/apps/sample_apps/deepstream-dewarper-test.
This pipeline runs only for four surfaces. To run for one, two, or three surfaces, use the dewarper test application.
Dsexample
On dGPU:
$ gst-launch-1.0 filesrc location = ./streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1280 height=720 batch-size=1 ! nvinfer config-file-path= ./configs/deepstream-app/config_infer_primary.txt ! dsexample full-frame=1 ! nvvideoconvert ! nvdsosd ! nveglglessink sync=0
On Jetson:
$ gst-launch-1.0 filesrc location = ./streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1280 height=720 batch-size=1 ! nvinfer config-file-path= ./configs/deepstream-app/config_infer_primary.txt ! dsexample full-frame=1 ! nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink sync=0
Why am I getting “ImportError: No module named google.protobuf.internal” when running convert_to_uff.py on Jetson AGX Xavier?
If you set up Tensorflow using https://elinux.org/Jetson_Zoo#TensorFlow, please use Python 3 for running convert_to_uff.py:
$ python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py
Does DeepStream Support 10 Bit Video streams?
Support for 10-bit decode (P010_10LE) is present. However, most of the components work on 8-bit input, so it is suggested to use nvvideoconvert to convert the stream from 10-bit to 8-bit and then add the relevant components.
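For example, a sketch of a pipeline that decodes a 10-bit H.265 file and converts it to 8-bit NV12 before any further DeepStream components are added (the file name and the downstream sink are illustrative):
$ gst-launch-1.0 filesrc location= <10bit_h265_file.mp4> ! qtdemux ! h265parse ! nvv4l2decoder ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=NV12' ! fakesink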
Sometimes, RTSP output from DeepStream is not observed remotely
Try running the following pipeline on the remote machine to check whether there is an issue in the network. With this pipeline you should see the output:
$ gst-launch-1.0 uridecodebin uri=rtsp://<rtsp link> ! nveglglessink sync=0
What is the difference between batch-size of nvstreammux and nvinfer? What are the recommended values for nvstreammux' batch-size?
nvstreammux's batch-size is the number of buffers (frames) it batches together into one muxed buffer. nvinfer's batch-size is the number of frames (primary mode) or objects (secondary mode) it infers together.
We recommend setting nvstreammux's batch-size to either the number of sources linked to it or the primary nvinfer's batch-size.
Why do some caffemodels fail to build after upgrading to DeepStream 5.0?
DeepStream 5.0 uses explicit batch dimension for caffemodels. Some caffemodels use TensorRT plugins/layers which have not been updated for explicit batch dimensions. Add "force-implicit-batch-dim=1" in the nvinfer config file for such models to build the models using implicit batch dimension networks.
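For example, in the nvinfer configuration file of such a model (a sketch; only the added key is shown):
[property]
force-implicit-batch-dim=1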
How do I configure the pipeline to get NTP timestamps?
To get NTP timestamps, set attach-sys-ts property to FALSE on nvstreammux component.
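For example, in a gst-launch pipeline (a sketch; only the nvstreammux element is shown):
... nvstreammux name=m batch-size=1 width=1920 height=1080 attach-sys-ts=FALSE ! ...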
Why is the NTP timestamp value 0?
If NTP timestamp is 0, it suggests that you are not receiving NTP timestamp from RTCP sender report. You can verify this using a tool like Wireshark.
How to handle operations not supported by Triton Inference Server?
For details on handling unsupported operations, see:
The custom library mentioned in the document can be loaded in the DeepStream application by one of the following methods:
Running the application as
LD_PRELOAD=./libcustomOp.so deepstream-app -c <app-config>
Add the custom-lib path in "nvinferserver" config file as
infer_config {
...
custom_lib { path: "./libcustomOp.so" }
}
Why do I see the confidence value as -0.1?
If the “Group Rectangles” mode of clustering is chosen, the confidence value is set to -0.1 because the algorithm does not preserve confidence values.
Also, for objects that are being tracked by the tracker but not detected by the inference component, the confidence value is set to -0.1.
Why do I see the tracker_confidence value as -0.1?
If you are using the IOU or KLT tracker, tracker_confidence is set to -0.1 to indicate that the underlying tracker algorithm does not generate a tracking confidence.
The tracker_confidence value is set only for the NvDCF tracker.
Why do I see the error below while processing an H.265 RTSP stream?
Error: gstrtph265depay.c:1196:gst_rtp_h265_finish_fragmentation_unit: assertion failed:(outsize >= 4)
This issue is observed in the h265depay GStreamer plugin when the size of the RTP payload is less than 4; the component throws an assertion. The invalid packet size could be caused by packet corruption. To overcome this issue, you need to ignore the assertion and handle such errors.
The required code modification is described at https://forums.developer.nvidia.com/t/deepstream-sdk-faq/80236. Compile the code and place the library at the appropriate location.
Why do I observe “A lot of buffers are being dropped” when running the deepstream-nvdsanalytics-test application on Jetson Nano?
By default, the nvdsanalytics test application runs the primary detector on every frame, along with the NvDCF tracker and nveglglessink. All of these components consume GPU resources, so the application does not execute in real time. To overcome this, you can do the following:
1. Change the tracker from NvDCF to KLT in deepstream_nvdsanalytics_test.cpp.
2. Change the video renderer from nveglglessink to nvoverlaysink, and also change the nvegltransform element to "identity" while doing so.
3. If the above two steps don’t help, increase the “interval” property to a value greater than 0 in nvdsanalytics_pgie_config.txt and check again.
Why do I observe “A lot of buffers are being dropped” when running live camera streams, even for a few streams or a single stream, and why does the output look jittery?
For live streams, the nvstreammux element’s ‘live-source’ property should be set to 1. Along with this, the sink/renderer element’s ‘sync’ and ‘qos’ properties should be set to 0 (FALSE).
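For example, in the deepstream-app configuration file (a sketch; only the relevant keys are shown):
[streammux]
live-source=1

[sink0]
sync=0
qos=0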
Why does an RTSP source used in a gst-launch pipeline through uridecodebin sometimes show a blank screen with the error WARNING: from element /GstPipeline:pipeline0/GstNvStreamMux:m: No Sources found at the input of muxer. Waiting for sources?
At times the requested muxer pad gets deleted before linking happens, because the stream may contain both video and audio. If a queue element is added between uridecodebin and nvstreammux, the pipeline works, because uridecodebin links to the queue pad rather than to the nvstreammux pad. This problem is not observed programmatically, because the linking takes place in the decoder’s new-pad callback on the video stream.
What if I do not get the expected 30 FPS from a camera using the v4l2src plugin in the pipeline, but instead get 15 FPS or less?
This may be caused by the exposure or lighting conditions around the camera. It can be fixed by changing the camera settings; the reference commands below change the exposure settings:
v4l2-ctl -d /dev/video0 --list-ctrls
v4l2-ctl --set-ctrl=exposure_auto=1
v4l2-ctl --set-ctrl=exposure_absolute=300
On the Jetson platform, I get the same output when multiple JPEG images are fed to nvv4l2decoder using the multifilesrc plugin, e.g.:
multifilesrc location = frame%d.jpeg ! jpegparse ! nvv4l2decoder ! nvegltransform ! nveglglessink
On Jetson platforms, nvv4l2decoder needs the property mjpeg=1 set in order to work with multifilesrc.
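For example, the pipeline above with the property set:
$ gst-launch-1.0 multifilesrc location = frame%d.jpeg ! jpegparse ! nvv4l2decoder mjpeg=1 ! nvegltransform ! nveglglessink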
How do I obtain individual sources after batched inferencing/processing? What are the sample pipelines for nvstreamdemux?
Some sample nvstreamdemux pipelines:
gst-launch-1.0 filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_0 \
filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_1 \
filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_2 \
filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_3 \
nvstreammux name=m width=1920 height=1080 batch-size=4 batched-push-timeout=40000 ! \
queue ! nvinfer config-file-path=<config> batch-size=4 ! \
queue ! nvtracker ll-lib-file=<lib-file> ! \
nvstreamdemux name=d \
d.src_0 ! queue ! nvvideoconvert ! nvdsosd ! nveglglessink \
d.src_1 ! queue ! nvvideoconvert ! nvdsosd ! nveglglessink \
d.src_2 ! queue ! nvvideoconvert ! nvdsosd ! nveglglessink \
d.src_3 ! queue ! nvvideoconvert ! nvdsosd ! nveglglessink
NOTE: queue element should be inserted after every nvstreamdemux src pad.
It is not required to demux all sources or create all nvstreamdemux src pads. Also, the downstream pipeline for every source may be different. Sample pipeline:
gst-launch-1.0 filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_0 \
filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_1 \
filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_2 \
filesrc location = sample_1080p_h264.mp4 ! decodebin ! m.sink_3 \
nvstreammux name=m width=1920 height=1080 batch-size=4 batched-push-timeout=40000 ! \
queue ! nvinfer config-file-path=<config> batch-size=4 ! \
queue ! nvtracker ll-lib-file=<lib-file> ! \
nvstreamdemux name=d \
d.src_1 ! queue ! nvvideoconvert ! nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=out.mp4 \
d.src_2 ! queue ! nvvideoconvert ! nvdsosd ! nveglglessink
Why do I encounter the error “memory type configured and i/p buffer mismatch ip_surf 0 muxer 3” while running a DeepStream pipeline?
This error is observed on dGPU when nvstreammux is configured for memory type 3 (NVBUF_MEM_CUDA_UNIFIED) and the input surface to nvstreammux has memory type 0 (NVBUF_MEM_CUDA_DEFAULT, which is CUDA device memory on dGPU), while the resolution of the input surface is the same as the configured nvstreammux resolution. In this scenario nvstreammux tries to send the original buffer from its sink pad downstream, muxed with buffers from other sources, but it cannot do so because its configured memory type is different. To get around this, ensure that all the sources connected to nvstreammux generate the same type of memory and configure the nvstreammux memory to the same type. Alternatively, if scaling is involved in nvstreammux, this error is not encountered.
How does the secondary GIE crop and resize objects?
The SGIE crops the object from the NvStreamMux buffer using the object’s bounding box detected by the primary GIE. The crop is then scaled/converted to the network resolution/color format. For example, if the NvStreamMux resolution is 1920×1080, the SGIE crops using the object bbox coordinates (e.g., x=1000, y=20, w=400, h=500) from the 1920×1080 image and then scales it to the SGIE network resolution (say 224×224). In practice, the object crop, scaling, and color conversion happen in one go.
How to save frames from GstBuffer?
To save frames from a GstBuffer, follow this procedure:
1. Map the GstBuffer using the gst_buffer_map() API. Below is pseudo code:
GstMapInfo in_map_info;
NvBufSurface *surface = NULL;

/* Map the incoming buffer for read access */
memset (&in_map_info, 0, sizeof (in_map_info));
if (!gst_buffer_map (inbuf, &in_map_info, GST_MAP_READ)) {
  g_print ("Error: Failed to map gst buffer\n");
}
/* The mapped data is the NvBufSurface describing the batched frames */
surface = (NvBufSurface *) in_map_info.data;
2. Now that you have access to the NvBufSurface structure, you can access the actual frame memory and save it.
3. At the end, unmap the GstBuffer using gst_buffer_unmap (inbuf, &in_map_info).
For more details, see gst_dsexample_transform_ip() in the gst-dsexample plugin source code.
What are the different memory types supported on Jetson and dGPU?
NVBUF_MEM_DEFAULT
Jetson: Memory of type surface array (2D pitched), allocated by default; used by all hardware accelerators on the platform; accessible by the CPU using NvBufSurfaceMap() and NvBufSurfaceSyncForCpu() / NvBufSurfaceSyncForDevice() based on read/write usage; GPU access using the EGLImage create and map APIs.
dGPU/x86_64: Memory of type CUDA device, allocated by default; accessible only by the GPU. You might need custom CUDA kernels to access or modify the memory, or NvBufSurfaceCopy to copy the content into CPU-accessible memory.
NVBUF_MEM_CUDA_PINNED
Jetson: Page-locked memory allocated using cudaMallocHost(); accessible by CPU and GPU.
dGPU/x86_64: Page-locked memory allocated using cudaMallocHost(); accessible by CPU and GPU.
NVBUF_MEM_CUDA_DEVICE
Jetson: Memory of type CUDA device; accessible only by the GPU. You might need custom CUDA kernels to access or modify the memory. NvBufSurfaceCopy is not supported for CUDA memory on Jetson.
dGPU/x86_64: Memory of type CUDA device; accessible only by the GPU. You might need custom CUDA kernels to access or modify the memory, or NvBufSurfaceCopy to copy the content into CPU-accessible memory.
NVBUF_MEM_CUDA_UNIFIED
Jetson: Unsupported.
dGPU/x86_64: Unified virtual memory allocated using cudaMallocManaged(); accessible by the CPU and multiple GPUs.
NVBUF_MEM_SURFACE_ARRAY
Jetson: Memory of type surface array (2D pitched), allocated by default; used by all hardware accelerators on the platform; accessible by the CPU using NvBufSurfaceMap() and NvBufSurfaceSyncForCpu() / NvBufSurfaceSyncForDevice() based on read/write usage; GPU access using the EGLImage create and map APIs.
dGPU/x86_64: Unsupported.
NVBUF_MEM_HANDLE
Jetson: Used internally for Jetson.
dGPU/x86_64: Unsupported.
NVBUF_MEM_SYSTEM
Jetson: Allocated using malloc().
dGPU/x86_64: Allocated using malloc().
What are the different memory transformations supported on Jetson and dGPU?
dGPU: You can use NvBufSurfaceCopy() to copy from one memory type to another. If transformation is required, the nvvideoconvert plugin supports the nvbuf-memory-type property to allow different types of memory. NvBufSurfTransform() can also be used to transform between the various CUDA memory types. CUDA to NVBUF_MEM_SYSTEM transformation is not supported by NvBufSurfTransform directly; you can use NvBufSurfaceCopy() to copy into CUDA memory and perform the transformation on that memory.
Jetson: You can use NvBufSurfaceCopy() to copy from one memory type to another, although CUDA memory copies are not supported directly. You can use NvBufSurfTransform() to transform from NVBUF_MEM_SURFACE_ARRAY/NVBUF_MEM_DEFAULT to CUDA memory, but you need to use the GPU as the compute device for the transformation, because the VIC does not support transformation to CUDA memory or NVBUF_MEM_SYSTEM. Refer to the NvBufSurfTransform APIs for details.
Why does my image look distorted if I wrap my cudaMalloc'ed memory into NvBufSurface and provide to NvBufSurfTransform?
If you are not using NvBufSurfaceCreate for allocation, ensure the pitch of the allocated memory is multiple of 32. Also ensure that the starting address of each plane of the input is 128-byte aligned.
Why am I getting the following warning when running a deepstream app for the first time?
"GStreamer-WARNING: Failed to load plugin '...libnvdsgst_inferserver.so': libtrtserver.so: cannot open shared object file: No such file or directory"
This is a harmless warning indicating that DeepStream's nvinferserver plugin cannot be used because the Triton Inference Server is not installed.
If required, use DeepStream's Triton docker image on dGPU, or install the Triton Inference Server manually on Jetson.

Smart Record

Does smart record module work with local video streams?
Yes. The smart record module expects encoded frames, which can come from either a local video or an RTSP stream. However, deepstream-test5-app only supports RTSP sources for smart record.
Are multiple parallel records on same source supported?
No. Only a single recording at a time is supported on the same source. You need to stop the ongoing recording before starting a new one.
What if I forgot to stop the recording?
There is a default duration setting; if the recording is not stopped by a stop event, it is stopped automatically after the default duration.
I started a recording with a duration. Can I stop it before that duration ends?
Yes. A running recording instance can be stopped at any time.
What if I don’t set default duration for smart record?
Default value of record duration is 10 seconds.
What if I don’t set video cache size for smart record?
Default value of video cache size is 30 seconds.
What is maximum duration of data I can cache as history for smart record?
There is no hard limit on the cache size; it is limited only by available system memory.
Can I record the video with bounding boxes and other information overlaid?
For better performance and optimization, smart record avoids transcoding and caches only encoded frames, so recording the video with overlaid bounding boxes is not possible directly. But you can implement that use case in two ways:
1. Run the inference pipeline on recorded video and save the output in a file using sink (type = filesink)
Or
2. Add encoding components in the pipeline after OSD and then add smart record module.

Triton

Which Triton version is supported in the DeepStream 5.0 GA release?
Gst-nvinferserver is based on Triton release 1.12.0, corresponding to NGC container 20.03, using the native C API.
For dGPU, the DeepStream Triton docker image is based on nvcr.io/nvidia/tritonserver:20.03-py3.
For Jetson, the Triton 1.12.0 library has been integrated into the DeepStream package in the release.
If you want to customize something in Triton or need to upgrade the Triton library, make sure the new version is compatible with Triton 1.12.0 (container 20.03).
Can Jetson platform support the same features as dGPU for Triton plugin?
Not exactly. dGPU can support most types of models, such as TensorRT, TensorFlow (and TF-TRT), ONNX (with TRT optimization), and PyTorch.
Jetson can support TensorRT and TensorFlow (and TF-TRT). Support for other models is coming in future releases.
For more details, see GStreamer Plugin Details.
Can Gst-nvinferserver (the DeepStream Triton plugin) run on the Nano platform?
Yes. But due to Nano’s memory limitation, performance of certain models is slow, and they may even run into OOM (out-of-memory) issues, specifically heavy TensorFlow models. There is an option to run a CPU instance for certain models on Nano.
For more details, see samples/configs/deepstream-app-trtis/README
How to enable TensorRT optimization for TensorFlow and ONNX models?
For details of the TensorRT optimization settings in Triton models, see:
TF-TRT is supported on both dGPU and Jetson platforms.
1. Open the model’s Triton config.pbtxt file.
2. Make sure a GPU instance is enabled.
3. Append the tensorrt accelerator (e.g., trtis_model_repo/ssd_mobilenet_v1_coco_2018_01_28/config.pbtxt):
optimization { execution_accelerators {
gpu_execution_accelerator : [ {
name : "tensorrt"
parameters { key: "precision_mode" value: "FP16" }
}] }}
For more on TF-TRT parameters, see the TF-TRT API in TensorFlow 1.x. is_dynamic_op is set to True natively in Triton.
Additionally, you can generate offline TF-TRT models with your own script running in a TensorFlow environment.
Read the TF-TRT User Guide to generate offline (static) models. Once the original online model is replaced by the offline model, remove the ‘optimization’ block so that TF-TRT online does not run again and overwrite the offline TF-TRT caches.
ONNX is supported on dGPU only. TensorRT optimization can be enabled by
optimization { execution_accelerators {
gpu_execution_accelerator : [ { name : "tensorrt" } ]
}}
The TensorRT engine caches are generated at run time, during initialization or when the first frame arrives. This can take from several seconds to minutes, depending on how heavy the model is and how powerful the platform is.
How to tune GPU memory for Tensorflow models?
When running TensorFlow models using Triton Inference Server, the GPU device memory may fall short. The allowed GPU device memory allocation for TensorFlow models can be tuned using the 'tf_gpu_memory_fraction' parameter in the nvdsinferserver's config files (config_infer_*). For more details, see
samples/configs/deepstream-app-trtis/README
This parameter is the same as TensorFlow's per_process_gpu_memory_fraction config option. For more details, see:
Can Gst-nvinferserver support models across processes or containers?
No. The plugin is based on the Triton Server C API rather than the client APIs, so it does not support a client/server architecture. However, a single process can run a standalone Triton model repo, no matter how many models are running together.
Can users set different model repos when running multiple Triton models in single process?
No. All config files for the same deepstream-app process must have the same model_repo. Otherwise Gst-nvinferserver may report an error or use a random model_repo config.
infer_config { trtis { model_repo {
root: "path/to/model_repo"
strict_model_config: true
tf_gpu_memory_fraction: 0.35
} } }
What is the difference between DeepStream classification and Triton classification?
The Gst-nvinferserver plugin supports two classification methods:
1. Use the DeepStream plugin to parse the classification output and select labels. Configure this plugin’s postprocess block with labelfile_path and classification options:
infer_config { postprocess {
labelfile_path: "path/to/classification_labels.txt"
classification { threshold: 0.5 }
} }
Example: samples/configs/deepstream-app-trtis/config_infer_primary_classifier_inception_graphdef_postprocessInDS.txt
2. Use the Triton native classification method. The label file is configured in the Triton model’s config.pbtxt (e.g., samples/trtis_model_repo/inception_graphdef/config.pbtxt):
output [ {
name: "InceptionV3/Predictions/Softmax"
data_type: TYPE_FP32
dims: [ 1001 ]
label_filename: "inception_labels.txt"
} ]
To enable it, update Gst-nvinferserver’s config file with:
infer_config { postprocess {
trtis_classification { topk:1 threshold: 0.5 }
} }
Example: samples/configs/deepstream-app-trtis/config_infer_primary_classifier_inception_graphdef_postprocessInTrtis.txt
Why is reshape present in certain Triton config files?
The Gst-nvinferserver plugin expects all models to configure input/output tensor shapes and a separate batch size. Certain models may define tensor shapes with the batch size included, or with empty shapes. In these cases, you can update the Triton model’s config.pbtxt file to reshape the input and output tensor dims.
For input tensors, this usually comes along with ‘max_batch_size: 0’ when the model is defined with full dimensions, specifically for ONNX models. In that case, we may only support batch-size 1, so that reshape can adapt the Gst-nvinferserver plugin input to the model input.
For more details, see:
How to support Triton ensemble model?
See details in GStreamer Plugin Details.
Does Gst-nvinferserver support Triton multiple instance groups?
Yes. You can configure the Triton model config.pbtxt with multiple instances on a single GPU or on the CPU to make them run in parallel. If multiple instances are configured with different settings (e.g., one instance on the GPU and another on the CPU), warming up the instances is recommended, in case a specific instance takes too long to initialize on the first frame, which may cause a timeout or worse in live streaming.
To enable multiple instances, see:
To enable warmup, see:
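As an illustration, a config.pbtxt instance_group block that runs two instances of a model on GPU 0 (the count and GPU index are illustrative):
instance_group {
count: 2
gpus: [0]
kind: KIND_GPU
}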
Can Gst-nvinferserver support inference on multiple GPUs?
No, not yet. When running on a multi-GPU platform, you need to specify a single gpu-id for GPU instances. If no gpu-id is specified, all GPU instances run together by default, which can cause unexpected behavior. Update config.pbtxt and specify a single gpu-id explicitly:
instance_group {
count: 1
gpus: [0]
kind: KIND_GPU
}
Or specify a single GPU on the docker command line:
docker run -it --rm --gpus device=0 …
What are the batch-size differences for a single model in different config files (the gie group in source…, config_inferserver.., and the Triton model’s config.pbtxt)?
Take TensorRT Primary_Detector for example:
1. Gst-nvinferserver Plugin’s config file configs/deepstream-app-trtis/config_infer_plan_engine_primary.txt, defines
infer_config { max_batch_size: 30 }
This indicates the plugin can run with batch-size <= 30.
2. Deepstream-app’s config file source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt, defines
[primary-gie]
batch-size=4
config-file=config_infer_plan_engine_primary.txt
This config file has the highest priority and overrides max_batch_size from config_infer_plan_engine_primary.txt to 4 at run time.
3. Triton models has its own model config file trtis_model_repo/Primary_Detector/config.pbtxt, which defines
max_batch_size: 30
This indicates that the plan engine model resnet10.caffemodel_b30_gpu0_int8.engine in the Triton backend can support batch-size <= 30.
You need to make sure that the batch-size in the Gst-nvinferserver config file and in the deepstream-app config file is less than or equal to the Triton backend’s max_batch_size.