Deploying nvOCDR to DeepStream#

The nvOCDR library wraps the entire inference pipeline for optical character detection and recognition (OCR). It consumes OCDNet and OCRNet models trained with TAO and can be used in video-analytics applications such as surveillance or traffic monitoring. This guide walks through the steps for integrating the nvOCDR library into DeepStream. Refer to the nvOCDR documentation for more information about nvOCDR.

Get Trained OCDNet and OCRNet Models#

To deploy nvOCDR in DeepStream, you first need trained OCDNet and OCRNet models. You can either start from the NVIDIA TAO pretrained models (PTM) or train your own models with TAO. Refer to the training documentation for OCDNet and OCRNet to learn how to train your own model.

Download TAO PTM from NGC#

Note: Refer to NGC to set up your environment to run ngc commands.

You can download the pretrained OCDNet and OCRNet models with the following commands:

    mkdir -p pretrained_models
    ngc registry model download-version nvidia/tao/ocdnet:deployable_v1.0 --dest ./pretrained_models
    ngc registry model download-version nvidia/tao/ocrnet:deployable_v1.0 --dest ./pretrained_models

A character_list.txt file is included with the pretrained OCRNet ONNX model. This file is the vocabulary of the trained OCRNet model and is consumed by the nvOCDR library. Refer to the Preparing the Dataset section of the OCRNet documentation for more information about character_list.txt.
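The later steps need three files: the OCDNet ONNX model, the OCRNet ONNX model, and character_list.txt. A minimal sketch of a pre-flight check is shown below. The function name missing_files is hypothetical and not part of nvOCDR or the NGC CLI, and the file paths you pass depend on where the download placed the files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every argument that is not a readable regular
# file, and return non-zero if at least one is missing.
missing_files() {
    local status=0 path
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            printf 'missing: %s\n' "$path"
            status=1
        fi
    done
    return "$status"
}
```

For example, calling missing_files with your three paths before building the environment reports any path that does not exist yet.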

Set Up the Software Environment#

Once you have the OCDNet and OCRNet models, you can set up the software environment either with the one-click script or by following the step-by-step instructions.

One-Click Script#

You can find the script under the deepstream directory of the nvOCDR repository.

    ./build_docker.sh <path_to_ocdnet_onnx> <path_to_ocrnet_onnx> <path_to_ocr_character_list> \
        <ocdnet_height> <ocdnet_width> \
        <ocdnet_max_batch_size> <gpu_id>

Step by Step#

To set up the DeepStream development environment step by step, proceed as follows.

On x86 platforms, you can start from the following container:

    docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/deepstream:6.2-triton bash
    # install opencv
    apt update && apt install -y libopencv-dev

On Jetson platforms, you can start from the L4T container:

    docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/deepstream-l4t:6.2-triton bash
    # install opencv
    apt update && apt install -y libopencv-dev

On Jetson platforms, you can also install JetPack version 5.1 or greater and run the following command to install OpenCV:

    # install opencv
    apt update && apt install -y libopencv-dev

Note: If you are using TensorRT 8.6 or above, you can skip the following steps for compiling the TensorRT OSS plugin.

Next, you need to compile the TensorRT OSS plugin because OCDNet requires modulatedDeformConvPlugin.

Get the TensorRT repository:

    git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
    cd TensorRT
    git submodule update --init --recursive

Compile the TensorRT libnvinfer_plugin.so file:

    mkdir build && cd build
    # On x86 platforms
    cmake ..
    # On Jetson platforms
    # cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
    make nvinfer_plugin -j4

Copy the library to the system library path. The copy replaces the plugin library that is already installed, so the destination keeps the installed file name (version suffix 8.5.2 here):

    cp libnvinfer_plugin.so.8.6.0 /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2
    # On Jetson platforms:
    # cp libnvinfer_plugin.so.8.6.0 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2
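The note above says the plugin build can be skipped for TensorRT 8.6 and above. The following sketch shows one way to make that decision in a script. The function name needs_oss_plugin is hypothetical, the version string is passed in by the caller, and the comparison relies on sort -V from GNU coreutils, which is an assumption about the container's tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) when the given TensorRT
# version is older than 8.6, meaning the OSS plugin build is still needed.
needs_oss_plugin() {
    local installed="$1"
    local threshold="8.6"
    local lowest
    # sort -V orders version strings; the first line is the older version.
    lowest="$(printf '%s\n%s\n' "$installed" "$threshold" | sort -V | head -n 1)"
    # Older than the threshold only if it sorts first and is not equal to it.
    [ "$lowest" = "$installed" ] && [ "$installed" != "$threshold" ]
}
```

For example, needs_oss_plugin "8.5.2" succeeds while needs_oss_plugin "8.6.0" fails. On Debian-based images, the installed version can typically be read from the package list (for instance with dpkg -l and a filter on nvinfer); the exact package name is not stated in this guide.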

Generate a TensorRT Engine for OCDNet and OCRNet#

After setting up the environment, you need to generate TensorRT engines for the OCDNet and OCRNet models. These engines are used for running the models on the GPU. Use the following commands to generate TRT engines for OCDNet and OCRNet with a dynamic batch size and a fixed height and width.

Generate the OCDNet TRT engine with trtexec:

    /usr/src/tensorrt/bin/trtexec --onnx=<path_to_pretrained ocdnet.onnx> \
        --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:4x3x736x1280 \
        --fp16 --saveEngine=<work_path>/ocdnet.fp16.engine

Generate the OCRNet TRT engine with trtexec:

    /usr/src/tensorrt/bin/trtexec --onnx=<path_to_pretrained ocrnet.onnx> \
        --minShapes=input:1x1x32x100 --optShapes=input:32x1x32x100 --maxShapes=input:32x1x32x100 \
        --fp16 --saveEngine=<work_path>/ocrnet.fp16.engine
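In the trtexec commands above, each shape is written as batch x channels x height x width, and only the batch dimension differs between the minimum, optimum, and maximum shapes. The sketch below derives the three flags from a "C,H,W" string and a batch size. The function names trt_shape and trt_shape_flags are hypothetical, and the tensor name input is taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build one trtexec shape value.
# $1: batch size, $2: shape written as "C,H,W"
trt_shape() {
    local batch="$1" chw="$2"
    # Replace the commas with "x" and prepend the batch dimension.
    printf 'input:%sx%s\n' "$batch" "${chw//,/x}"
}

# Hypothetical helper: print the three shape flags for a dynamic-batch engine.
# $1: "C,H,W" shape, $2: maximum batch size, $3: optimum batch size
# (defaults to the maximum). The minimum batch size is fixed at 1.
trt_shape_flags() {
    local chw="$1" max_batch="$2" opt_batch="${3:-$2}"
    printf -- '--minShapes=%s --optShapes=%s --maxShapes=%s\n' \
        "$(trt_shape 1 "$chw")" \
        "$(trt_shape "$opt_batch" "$chw")" \
        "$(trt_shape "$max_batch" "$chw")"
}
```

Calling trt_shape_flags 3,736,1280 4 1 reproduces the OCDNet shape flags used above, and trt_shape_flags 1,32,100 32 reproduces the OCRNet ones.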

Build the nvOCDR Library and DeepStream Intermedia Library#

You must build the nvOCDR library and the DeepStream intermedia library. These libraries integrate the trained models into the DeepStream pipeline.

Get the nvOCDR repository:

    git clone https://github.com/NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution.git

Compile the libnvocdr.so nvOCDR library:

    cd NVIDIA-Optical-Character-Detection-and-Recognition-Solution
    make
    export LD_LIBRARY_PATH=$(pwd)

Compile the libnvocdr_impl.so nvOCDR intermedia library for DeepStream:

    cd deepstream
    make

Run the nvOCDR DeepStream Sample#

Finally, you can run the nvOCDR DeepStream sample to test the integration of the trained models into the DeepStream pipeline. You can build the DeepStream OCR pipeline with gst-launch-1.0 or run it with the C++ sample on GitHub.

Running the Pipeline with gst-launch-1.0#

The following command runs a JPEG-image input pipeline with input batch-size=1. The output image is saved to output.jpg:

    gst-launch-1.0 filesrc location=<path_to_test_img> ! jpegparse ! nvv4l2decoder ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=1080 ! \
        nvdsvideotemplate customlib-name=<path to libnvocdr_impl.so> \
        customlib-props="ocdnet-engine-path:<path to ocdnet.fp16.engine>" \
        customlib-props="ocdnet-input-shape:3,736,1280" \
        customlib-props="ocdnet-binarize-threshold:0.1" \
        customlib-props="ocdnet-polygon-threshold:0.3" \
        customlib-props="ocdnet-max-candidate:200" \
        customlib-props="ocrnet-engine-path:<path to ocrnet.fp16.engine>" \
        customlib-props="ocrnet-dict-path:<path to character_list.txt>" \
        customlib-props="ocrnet-input-shape:1,32,100" ! \
        nvmultistreamtiler rows=1 columns=1 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! \
        nvvideoconvert ! 'video/x-raw,format=I420' ! jpegenc ! jpegparse ! filesink location=output.jpg

The following command runs a JPEG-image input pipeline with input batch-size=2. Each input image is linked to its own nvstreammux sink pad (m.sink_0 and m.sink_1):

    gst-launch-1.0 filesrc location=<path_to_test_img> ! jpegparse ! nvv4l2decoder ! \
        m.sink_0 nvstreammux name=m batch-size=2 width=1280 height=1080 ! \
        nvdsvideotemplate customlib-name=<path to libnvocdr_impl.so> \
        customlib-props="ocdnet-engine-path:<path to ocdnet.fp16.engine>" \
        customlib-props="ocdnet-input-shape:3,736,1280" \
        customlib-props="ocdnet-binarize-threshold:0.1" \
        customlib-props="ocdnet-polygon-threshold:0.3" \
        customlib-props="ocdnet-max-candidate:200" \
        customlib-props="ocrnet-engine-path:<path to ocrnet.fp16.engine>" \
        customlib-props="ocrnet-dict-path:<path to character_list.txt>" \
        customlib-props="ocrnet-input-shape:1,32,100" ! \
        nvmultistreamtiler rows=1 columns=2 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! \
        nvvideoconvert ! 'video/x-raw,format=I420' ! jpegenc ! jpegparse ! filesink location=output.jpg \
        filesrc location=<path to test image> ! jpegparse ! nvv4l2decoder ! m.sink_1

Note: If you run into JPEG decoding issues on Jetson devices, try replacing the hardware decoder with the software decoder:

    jpegparse ! jpegdec ! nvvideoconvert ! "video/x-raw(memory:NVMM), format=NV12"

The following command runs an MP4-video input pipeline with batch-size=1. The output video is saved to output.mp4:

    gst-launch-1.0 filesrc location=<path to test.mp4> ! qtdemux ! h264parse ! nvv4l2decoder ! \
        m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=1080 ! \
        nvdsvideotemplate customlib-name=<path to libnvocdr_impl.so> \
        customlib-props="ocdnet-engine-path:<path to ocdnet.fp16.engine>" \
        customlib-props="ocdnet-input-shape:3,736,1280" \
        customlib-props="ocdnet-binarize-threshold:0.1" \
        customlib-props="ocdnet-polygon-threshold:0.3" \
        customlib-props="ocdnet-max-candidate:200" \
        customlib-props="ocrnet-engine-path:<path to ocrnet.fp16.engine>" \
        customlib-props="ocrnet-dict-path:<path to character_list.txt>" \
        customlib-props="ocrnet-input-shape:1,32,100" ! \
        nvmultistreamtiler rows=1 columns=1 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! \
        nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! nvv4l2h264enc ! h264parse ! \
        mux.video_0 qtmux name=mux ! filesink location=output.mp4

You can download a test video from this link.

Configuring the nvOCDR Library#

You can configure the nvOCDR library parameters using the customlib-props arguments of nvdsvideotemplate. This is the template for setting the parameters:

    nvdsvideotemplate customlib-name=libnvocdr_impl.so customlib-props="<nvOCDR attribute>:<nvOCDR attr value>"

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| ocdnet-engine-path | String | – | The absolute path to the OCDNet TensorRT engine | – |
| ocdnet-input-shape | String | – | The input shape (in CHW format) of the OCDNet TensorRT engine. Channel, height, and width are separated with commas. | |
| ocdnet-binarize-threshold | Float | – | The threshold value used to binarize the OCDNet output | >0 |
| ocdnet-unclip-ratio | Float | 1.5 | The unclip ratio of the detected text region, which determines the output size | >0 |
| ocdnet-polygon-threshold | Float | – | The threshold value used to filter the polygons generated by the OCDNet postprocess, based on the confidence score of each polygon | [0, 1] |
| ocdnet-max-candidate | Unsigned int | – | The maximum number of output polygons from OCDNet | >0 |
| rectifier-upsidedown | Unsigned int | 0 | The flag to enable upside-down processing in the Rectifier module. Set it to 1 so that nvOCDR can recognize characters that are fully upside down. | 0, 1 |
| ocrnet-engine-path | String | – | The absolute path to the OCRNet TensorRT engine | – |
| ocrnet-dict-path | String | – | The absolute path to the OCRNet vocabulary file | – |
| ocrnet-input-shape | String | – | The input shape (in CHW format) of the OCRNet TensorRT engine. Channel, height, and width are separated with commas. | |
| is_high_resolution | Unsigned int | 0 | The flag to enable crop-based inference for high-resolution input | 0, 1 |
| overlap-ratio | Float | 0.5 | The overlap ratio of cropped patches for crop-based inference | [0, 1] |
| ocrnet-decode | String | CTC | The decode mode of OCRNet | CTC, Attention |
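Because every parameter listed above is passed as a separate customlib-props argument in the form attribute:value, the element description can be assembled from a list of pairs. The following is a minimal sketch under that assumption. The function names nvocdr_element and in_unit_interval are hypothetical and not part of nvOCDR or DeepStream, and the range check only covers the parameters whose supported values are given as [0, 1].

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an nvdsvideotemplate element description.
# $1: path to libnvocdr_impl.so; remaining arguments: "attribute:value" pairs.
# Values are emitted unquoted, so paths must not contain whitespace.
nvocdr_element() {
    local lib="$1"; shift
    local out="nvdsvideotemplate customlib-name=$lib"
    local prop
    for prop in "$@"; do
        out+=" customlib-props=$prop"
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: succeed when $1 is a plain decimal number in [0, 1],
# the range listed for ocdnet-polygon-threshold and overlap-ratio.
in_unit_interval() {
    awk -v v="$1" 'BEGIN {
        ok = (v ~ /^[0-9]*\.?[0-9]+$/) && (v + 0 >= 0) && (v + 0 <= 1)
        exit !ok
    }'
}
```

For instance, a script could call in_unit_interval on the polygon threshold before passing "ocdnet-polygon-threshold:0.3" to nvocdr_element, then print the result and paste it into a gst-launch-1.0 command in place of the hand-written element.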