Deploying to DeepStream for FasterRCNN#

The deep learning and computer vision models that you’ve trained can be deployed on edge devices, such as a Jetson Xavier or Jetson Nano, a discrete GPU, or in the cloud with NVIDIA GPUs. TAO has been designed to integrate with DeepStream SDK, so models trained with TAO will work out of the box with DeepStream SDK.

DeepStream SDK is a streaming analytics toolkit that accelerates building AI-based video analytics applications. This section describes how to deploy your trained model to DeepStream SDK.

To deploy a model trained by TAO to DeepStream, you have three options:

Option 1: Integrate the .etlt model directly in the DeepStream app. The model file is generated by export.

Option 2: Generate a device-specific optimized TensorRT engine using TAO Deploy. The generated TensorRT engine file can also be ingested by DeepStream.

Option 3 (Deprecated for x86 devices): Generate a device-specific optimized TensorRT engine using TAO Converter.

Machine-specific optimizations are done as part of the engine creation process, so a distinct engine should be generated for each environment and hardware configuration. If the TensorRT or CUDA libraries of the inference environment are updated (including minor version updates), or if a new model is generated, new engines need to be generated. Running an engine that was generated with a different version of TensorRT or CUDA is not supported: it can cause unknown behavior that affects inference speed, accuracy, and stability, or the engine may fail to run altogether.
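The regeneration rule above can be sketched as a small shell check. This is a minimal illustration, not part of TAO or DeepStream: the function names and the stamp-file convention are hypothetical, and the caller is assumed to supply the TensorRT version, CUDA version, and GPU name as plain strings.

```shell
# Hypothetical helper: build a one-line "stamp" describing the environment
# an engine was generated in. The caller supplies all three values.
make_engine_stamp() {
    # $1 = TensorRT version, $2 = CUDA version, $3 = GPU name
    printf 'trt=%s;cuda=%s;gpu=%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: succeed (exit status 0) when the engine should be
# regenerated, that is, when no stamp was saved or the saved stamp differs
# from the current environment in any field (including minor versions).
engine_needs_rebuild() {
    # $1 = path to the stamp file saved next to the engine
    # $2 = stamp describing the current environment
    if [ ! -f "$1" ]; then
        return 0
    fi
    [ "$(cat "$1")" != "$2" ]
}
```

A deployment script could save the output of make_engine_stamp next to each engine file when it is generated, then call engine_needs_rebuild before loading the engine and regenerate it whenever the check succeeds.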

Option 1 is very straightforward. The .etlt file and calibration cache are used directly by DeepStream. DeepStream automatically generates the TensorRT engine file and then runs inference. TensorRT engine generation can take some time, depending on the size of the model and the type of hardware.
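As a rough sketch of Option 1, the shell function below prints the model-related part of a Gst-nvinfer [property] group that points DeepStream at the .etlt file and calibration cache. The function name is hypothetical and all three arguments are placeholders. The fragment is deliberately incomplete: a working FasterRCNN configuration needs further settings that are not covered in this section.

```shell
# Hypothetical helper: print the model-related keys of an nvinfer config.
# This is only a fragment, not a complete FasterRCNN configuration.
write_nvinfer_model_fragment() {
    # $1 = path to the exported .etlt file (placeholder)
    # $2 = key that was used when exporting the model (placeholder)
    # $3 = path to the INT8 calibration cache (placeholder)
    cat <<EOF
[property]
tlt-encoded-model=$1
tlt-model-key=$2
int8-calib-file=$3
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=1
EOF
}
```

For example, write_nvinfer_model_fragment model.etlt "$KEY" cal.bin > model_fragment.txt writes the fragment to a file, where model.etlt, KEY, and cal.bin stand in for your own values.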

Engine generation can be done ahead of time with Option 2: TAO Deploy converts the .etlt file to a TensorRT engine, and this engine file is then provided directly to DeepStream. The TAO Deploy workflow is similar to that of TAO Converter, which is deprecated for x86 devices as of TAO version 4.0.x but is still required for deployment to Jetson devices.

See the Exporting the Model section for more details on how to export a TAO model.

TensorRT Open Source Software (OSS)#

Important

As of 5.0.0, tao model converter is deprecated and may not be available in future releases. This section applies only if you still use tao model converter for legacy deployments. For tao deploy, skip to Integrating FasterRCNN Model.

A TensorRT OSS build is required for FasterRCNN models because several TensorRT plugins that these models need are available only in the TensorRT open source repository, not in the general TensorRT release. Specifically, FasterRCNN needs the cropAndResizePlugin and proposalPlugin.

If the deployment platform is x86 with an NVIDIA GPU, follow the instructions for x86. If your deployment is on an NVIDIA Jetson platform, follow the instructions for Jetson.

TensorRT OSS on x86#

Building TensorRT OSS on x86:

1. Install Cmake (>=3.13).

   Note

   TensorRT OSS requires cmake >= v3.13, so install cmake 3.13 if your cmake version is lower than 3.13.

       sudo apt remove --purge --auto-remove cmake
       wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz
       tar xvf cmake-3.13.5.tar.gz
       cd cmake-3.13.5/
       ./configure
       make -j$(nproc)
       sudo make install
       sudo ln -s /usr/local/bin/cmake /usr/bin/cmake

2. Get GPU architecture. The GPU_ARCHS value can be retrieved by the deviceQuery CUDA sample:

       cd /usr/local/cuda/samples/1_Utilities/deviceQuery
       sudo make
       ./deviceQuery

   If /usr/local/cuda/samples doesn't exist on your system, you can download deviceQuery.cpp from this GitHub repo. Compile and run deviceQuery:

       nvcc deviceQuery.cpp -o deviceQuery
       ./deviceQuery

   This command will output something like this, which indicates the GPU_ARCHS is 75 based on the CUDA Capability major/minor version.

       Detected 2 CUDA Capable device(s)

       Device 0: "Tesla T4"
         CUDA Driver Version / Runtime Version          10.2 / 10.2
         CUDA Capability Major/Minor version number:    7.5

3. Build TensorRT OSS:

       git clone -b 21.08 https://github.com/nvidia/TensorRT
       cd TensorRT/
       git submodule update --init --recursive
       export TRT_SOURCE=`pwd`
       cd $TRT_SOURCE
       mkdir -p build && cd build

   Note

   Make sure your GPU_ARCHS from step 2 is in TensorRT OSS CMakeLists.txt. If GPU_ARCHS is not in TensorRT OSS CMakeLists.txt, add -DGPU_ARCHS=<VER> as below, where <VER> represents GPU_ARCHS from step 2. In the command below, xy stands for <VER>.

       /usr/local/bin/cmake .. -DGPU_ARCHS=xy -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
       make nvinfer_plugin -j$(nproc)

   After the build finishes successfully, libnvinfer_plugin.so* is generated under `pwd`/out/.

4. Replace the original libnvinfer_plugin.so*. Run these commands from the build directory:

       # back up the original libnvinfer_plugin.so.x.y
       sudo mv /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.x.y ${HOME}/libnvinfer_plugin.so.8.x.y.bak
       sudo cp `pwd`/out/libnvinfer_plugin.so.8.m.n /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.x.y
       sudo ldconfig

TensorRT OSS on Jetson (ARM64)#

1. Install Cmake (>=3.13).

   Note

   TensorRT OSS requires cmake >= v3.13, while the default cmake on Jetson/Ubuntu 18.04 is cmake 3.10.2.

   Upgrade cmake using:

       sudo apt remove --purge --auto-remove cmake
       wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz
       tar xvf cmake-3.13.5.tar.gz
       cd cmake-3.13.5/
       ./configure
       make -j$(nproc)
       sudo make install
       sudo ln -s /usr/local/bin/cmake /usr/bin/cmake

2. Get GPU architecture based on your platform. The GPU_ARCHS values for the different Jetson platforms are given in the following table.

       Jetson Platform          GPU_ARCHS
       Nano/Tx1                 53
       Tx2                      62
       AGX Xavier/Xavier NX     72

3. Build TensorRT OSS:

       git clone -b 21.03 https://github.com/nvidia/TensorRT
       cd TensorRT/
       git submodule update --init --recursive
       export TRT_SOURCE=`pwd`
       cd $TRT_SOURCE
       mkdir -p build && cd build

   Note

   The -DGPU_ARCHS=72 below is for Xavier or NX. For other Jetson platforms, change 72 to the GPU_ARCHS value from step 2.

       /usr/local/bin/cmake .. -DGPU_ARCHS=72 -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
       make nvinfer_plugin -j$(nproc)

   After the build finishes successfully, libnvinfer_plugin.so* is generated under `pwd`/out/.

4. Replace the original libnvinfer_plugin.so* with the newly generated one. Run these commands from the build directory:

       # back up the original libnvinfer_plugin.so.x.y
       sudo mv /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.x.y ${HOME}/libnvinfer_plugin.so.8.x.y.bak
       sudo cp `pwd`/out/libnvinfer_plugin.so.8.m.n /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.x.y
       sudo ldconfig
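The GPU_ARCHS values used in the x86 and Jetson build steps above can also be derived in a script. The sketch below is only an illustration: both function names are hypothetical, the first turns a CUDA Capability string as printed by deviceQuery into a GPU_ARCHS value, and the second encodes the Jetson table from this section.

```shell
# Hypothetical helper: convert a CUDA Capability string such as "7.5"
# into the GPU_ARCHS form "75" by dropping the dot (and any spaces).
capability_to_gpu_archs() {
    printf '%s\n' "$1" | tr -d '. '
}

# Hypothetical helper: look up GPU_ARCHS for a Jetson platform, using only
# the values listed in the table in this section.
jetson_gpu_archs() {
    case "$1" in
        Nano|Tx1|TX1)             echo 53 ;;
        Tx2|TX2)                  echo 62 ;;
        "AGX Xavier"|"Xavier NX") echo 72 ;;
        *)
            echo "unknown Jetson platform: $1" >&2
            return 1
            ;;
    esac
}
```

The result can then be passed to cmake, for example -DGPU_ARCHS="$(jetson_gpu_archs "Xavier NX")". Platforms that are not in the table make the lookup fail instead of guessing a value.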