NVIDIA TAO Toolkit v4.0.1

Deploying nvOCDR to DeepStream

The nvOCDR library wraps the entire inference pipeline for optical character detection and recognition (OCR). The library consumes OCDNet and OCRNet models that are trained with TAO Toolkit, and it can be used in video-analytics applications such as surveillance or traffic monitoring. This guide walks through the steps for integrating the nvOCDR library into DeepStream. Refer to the nvOCDR documentation for more information about nvOCDR.

To deploy nvOCDR in DeepStream, you first need trained OCDNet and OCRNet models. You can either start from the NVIDIA TAO Toolkit pretrained models (PTM) or train your own models with TAO Toolkit. Refer to the training documentation for OCDNet and OCRNet to learn how to train your own models.

Download TAO Toolkit PTM from NGC

Note

Refer to NGC to set up your environment to run ngc commands.

You can download the pretrained OCDNet and OCRNet models with the following commands:

    mkdir -p pretrained_models
    ngc registry model download-version nvidia/tao/ocdnet:deployable_v1.0 --dest ./pretrained_models
    ngc registry model download-version nvidia/tao/ocrnet:deployable_v1.0 --dest ./pretrained_models

A character_list.txt file is included with the pretrained OCRNet ONNX model. This is the vocabulary of the trained OCRNet model and is consumed by the nvOCDR library. Refer to the Preparing the Dataset section of the OCRNet documentation for more information about the character_list.txt.
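As a quick sanity check on a vocabulary file, the sketch below counts its entries and fails if any entry is repeated. It assumes the file holds one character per line; `check_charlist` is a hypothetical helper, not part of nvOCDR, and the demo file is synthetic rather than the real OCRNet vocabulary.

```shell
#!/bin/sh
# Hypothetical helper: report the number of entries in a character list and
# return non-zero if any entry is repeated. Assumes one character per line.
check_charlist() {
    file=$1
    total=$(grep -c '' "$file")
    unique=$(LC_ALL=C sort -u "$file" | grep -c '')
    echo "entries=$total unique=$unique"
    [ "$total" -eq "$unique" ]
}

# Demo on a synthetic three-entry list (not the real OCRNet vocabulary).
demo=$(mktemp)
printf 'a\nb\nc\n' > "$demo"
check_charlist "$demo"
```

To check a downloaded model, pass the path of its character_list.txt to the function instead of the demo file.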

Once you have the trained models, you need to set up the DeepStream development environment.

  • On x86 platforms, you can start from the following container:

        docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/deepstream:6.2-triton bash
        # install opencv
        apt update && apt install -y libopencv-dev

  • On Jetson platforms, you can start from the L4T container:

        docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/deepstream-l4t:6.2-triton bash
        # install opencv
        apt update && apt install -y libopencv-dev

    On Jetson platforms, you can also install JetPack version 5.1 or greater and run the following command to install OpenCV:

        # install opencv
        apt update && apt install -y libopencv-dev

Next, you need to compile the TensorRT OSS plugin since OCDNet requires modulatedDeformConvPlugin:

  1. Get the TensorRT repository:

        git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
        cd TensorRT
        git submodule update --init --recursive

  2. Compile the TensorRT libnvinfer_plugin.so file:

        mkdir build && cd build
        # On x86 platforms
        cmake ..
        # On Jetson platforms
        # cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
        make nvinfer_plugin -j4

  3. Copy the library to the system library path:

        # On x86 platforms:
        cp libnvinfer_plugin.so.8.6.0 /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2
        # On Jetson platforms:
        # cp libnvinfer_plugin.so.8.6.0 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2
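Because the copy step overwrites the plugin library that ships with the system, you may want to keep the original so the change can be undone. The following is a minimal sketch of that idea, not part of the official procedure: `replace_with_backup` is a hypothetical helper, and the demo uses stand-in files in a temporary directory instead of the real library paths.

```shell
#!/bin/sh
# Hypothetical helper: copy a newly built library over an installed one,
# keeping a one-time backup of the original in a separate directory.
# Usage: replace_with_backup <new_lib> <installed_lib> <backup_dir>
replace_with_backup() {
    src=$1
    dst=$2
    backup_dir=$3
    backup="$backup_dir/$(basename "$dst")"
    mkdir -p "$backup_dir"
    # Back up only once, so a second run does not overwrite the original.
    if [ -f "$dst" ] && [ ! -e "$backup" ]; then
        cp -p "$dst" "$backup"
    fi
    cp "$src" "$dst"
}

# Demo with stand-in files in a temporary directory instead of /usr/lib.
tmp=$(mktemp -d)
mkdir "$tmp/lib"
echo "system 8.5.2" > "$tmp/lib/libnvinfer_plugin.so.8.5.2"
echo "rebuilt 8.6.0" > "$tmp/libnvinfer_plugin.so.8.6.0"
replace_with_backup "$tmp/libnvinfer_plugin.so.8.6.0" \
    "$tmp/lib/libnvinfer_plugin.so.8.5.2" "$tmp/backup"
cat "$tmp/lib/libnvinfer_plugin.so.8.5.2" "$tmp/backup/libnvinfer_plugin.so.8.5.2"
```

The backup is written outside the library directory so that tools scanning that directory do not mistake it for another copy of the library.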

After setting up the environment, you need to generate the TensorRT engines for the OCDNet and OCRNet models. These engines are used to run the models on the GPU. Use the following commands to generate TRT engines for OCDNet and OCRNet with a dynamic batch size and a fixed input height and width:

  • Generate the OCDNet TRT engine with trtexec:

        /usr/src/tensorrt/bin/trtexec --onnx=<path_to_pretrained ocdnet.onnx> \
            --minShapes=input:1x3x736x1280 \
            --optShapes=input:1x3x736x1280 \
            --maxShapes=input:4x3x736x1280 \
            --fp16 \
            --saveEngine=<work_path>/ocdnet.fp16.engine

  • Generate the OCRNet TRT engine with trtexec:

        /usr/src/tensorrt/bin/trtexec --onnx=<path_to_pretrained ocrnet.onnx> \
            --minShapes=input:1x1x32x100 \
            --optShapes=input:32x1x32x100 \
            --maxShapes=input:32x1x32x100 \
            --fp16 \
            --saveEngine=<work_path>/ocrnet.fp16.engine
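In both commands the three shape flags differ only in the batch dimension, while the channel count, height, and width stay fixed. The sketch below makes that pattern explicit by printing the flags from a set of batch sizes and a CHW shape. `shape_flags` is a hypothetical helper written for illustration; it only prints text and does not call trtexec.

```shell
#!/bin/sh
# Hypothetical helper: print the three trtexec shape flags for a dynamic-batch
# engine with a fixed channel count, height, and width.
# Usage: shape_flags <tensor> <min_batch> <opt_batch> <max_batch> <C> <H> <W>
shape_flags() {
    tensor=$1
    min=$2
    opt=$3
    max=$4
    chw="${5}x${6}x${7}"
    # An optimization profile needs min <= opt <= max in every dimension.
    if [ "$min" -gt "$opt" ] || [ "$opt" -gt "$max" ]; then
        echo "batch sizes must satisfy min <= opt <= max" >&2
        return 1
    fi
    printf '%s\n' "--minShapes=${tensor}:${min}x${chw} --optShapes=${tensor}:${opt}x${chw} --maxShapes=${tensor}:${max}x${chw}"
}

shape_flags input 1 1 4 3 736 1280    # OCDNet values from the command above
shape_flags input 1 32 32 1 32 100    # OCRNet values from the command above
```

If you change the height or width here, the ocdnet-input-shape and ocrnet-input-shape values passed to the pipeline later need to change to match.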

Next, you need to build the nvOCDR library and the DeepStream intermediate library. These libraries are used for integrating the trained models into the DeepStream pipeline.

  • Get the nvOCDR repository:

        git clone https://github.com/NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution.git

  • Compile the libnvocdr.so nvOCDR library:

        cd NVIDIA-Optical-Character-Detection-and-Recognition-Solution
        make
        export LD_LIBRARY_PATH=$(pwd)

  • Compile the libnvocdr_impl.so nvOCDR intermediate library for DeepStream:

        cd deepstream
        make
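Before moving on, it can help to confirm that both builds produced their libraries. The sketch below is a generic file check, offered under assumptions: `require_files` is a hypothetical helper, and the locations used in the demo (libnvocdr.so in the repository root, libnvocdr_impl.so in the deepstream directory) are inferred from the build steps above rather than confirmed against the repository.

```shell
#!/bin/sh
# Hypothetical helper: print each missing path and return non-zero if any
# of the given files is absent.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Demo with stand-in files; the paths mirror the assumed build outputs.
root=$(mktemp -d)
mkdir "$root/deepstream"
touch "$root/libnvocdr.so" "$root/deepstream/libnvocdr_impl.so"
require_files "$root/libnvocdr.so" "$root/deepstream/libnvocdr_impl.so" \
    && echo "both libraries found"
```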

Finally, you can run the nvOCDR DeepStream sample to test the integration of the trained models into the DeepStream pipeline. You can build the DeepStream OCR pipeline with gst-launch-1.0 or run it with the C++ sample on GitHub.

Running the Pipeline with gst-launch-1.0

  • The following command runs a JPEG-image input pipeline with input batch-size=1. The output image will be saved to output.jpg:

        gst-launch-1.0 filesrc location=<path_to_test_img> ! jpegparse ! nvv4l2decoder ! \
            m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=1080 ! \
            nvdsvideotemplate customlib-name=<path to libnvocdr_impl.so> \
            customlib-props="ocdnet-engine-path:<path to ocdnet.fp16.engine>" \
            customlib-props="ocdnet-input-shape:3,736,1280" \
            customlib-props="ocdnet-binarize-threshold:0.1" \
            customlib-props="ocdnet-polygon-threshold:0.3" \
            customlib-props="ocdnet-max-candidate:200" \
            customlib-props="ocrnet-engine-path:<path to ocrnet.fp16.engine>" \
            customlib-props="ocrnet-dict-path:<path to character_list.txt>" \
            customlib-props="ocrnet-input-shape:1,32,100" ! \
            nvmultistreamtiler rows=1 columns=1 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! \
            nvvideoconvert ! 'video/x-raw,format=I420' ! jpegenc ! jpegparse ! filesink location=output.jpg

  • The following command runs a JPEG-image input pipeline with input batch-size=2:

        gst-launch-1.0 filesrc location=<path_to_test_img> ! jpegparse ! nvv4l2decoder ! \
            m.sink_0 nvstreammux name=m batch-size=2 width=1280 height=1080 ! \
            nvdsvideotemplate customlib-name=<path to libnvocdr_impl.so> \
            customlib-props="ocdnet-engine-path:<path to ocdnet.fp16.engine>" \
            customlib-props="ocdnet-input-shape:3,736,1280" \
            customlib-props="ocdnet-binarize-threshold:0.1" \
            customlib-props="ocdnet-polygon-threshold:0.3" \
            customlib-props="ocdnet-max-candidate:200" \
            customlib-props="ocrnet-engine-path:<path to ocrnet.fp16.engine>" \
            customlib-props="ocrnet-dict-path:<path to character_list.txt>" \
            customlib-props="ocrnet-input-shape:1,32,100" ! \
            nvmultistreamtiler rows=1 columns=2 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! \
            nvvideoconvert ! 'video/x-raw,format=I420' ! jpegenc ! jpegparse ! filesink location=output.jpg \
            filesrc location=<path to test image> ! jpegparse ! nvv4l2decoder ! m.sink_1

  • The following command runs an MP4-video input pipeline with batch-size=1. The output video will be saved to output.mp4:

        gst-launch-1.0 filesrc location=<path to test.mp4> ! qtdemux ! h264parse ! nvv4l2decoder ! \
            m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=1080 ! \
            nvdsvideotemplate customlib-name=<path to libnvocdr_impl.so> \
            customlib-props="ocdnet-engine-path:<path to ocdnet.fp16.engine>" \
            customlib-props="ocdnet-input-shape:3,736,1280" \
            customlib-props="ocdnet-binarize-threshold:0.1" \
            customlib-props="ocdnet-polygon-threshold:0.3" \
            customlib-props="ocdnet-max-candidate:200" \
            customlib-props="ocrnet-engine-path:<path to ocrnet.fp16.engine>" \
            customlib-props="ocrnet-dict-path:<path to character_list.txt>" \
            customlib-props="ocrnet-input-shape:1,32,100" ! \
            nvmultistreamtiler rows=1 columns=1 width=1280 height=720 ! nvvideoconvert ! nvdsosd ! \
            nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! nvv4l2h264enc ! h264parse ! \
            mux.video_0 qtmux name=mux ! filesink location=output.mp4

You can download a test video from this link.

Configuring the nvOCDR library

You can configure the nvOCDR library parameters using the customlib-props arguments of nvdsvideotemplate. This is the template for setting the parameters:

    nvdsvideotemplate customlib-name=libnvocdr_impl.so customlib-props="<nvOCDR attribute>:<nvOCDR attr value>"

| Parameter | Data Type | Default | Description | Supported |
|---|---|---|---|---|
| ocdnet-engine-path | String | | The absolute path to the OCDNet TensorRT engine | |
| ocdnet-input-shape | String | | The input shape (in CHW format) of the OCDNet TensorRT engine. Channel, height, and width are separated with commas. | |
| ocdnet-binarize-threshold | Float | | The threshold value to binarize the OCDNet output | >0 |
| ocdnet-unclip-ratio | Float | 1.5 | The unclip ratio of the detected text region, which determines the output size | >0 |
| ocdnet-polygon-threshold | Float | | The threshold value to filter the polygons generated from the OCDNet postprocess based on the confidence score of polygons | [0, 1] |
| ocdnet-max-candidate | Unsigned int | | The maximum output polygons from OCDNet | >0 |
| rectifier-upsidedown | Unsigned int | 0 | The flag to enable upside-down processing in the Rectifier module. Set this option to 1 to enable nvOCDR to recognize characters that are completely upside down. The default value is 0. | 0, 1 |
| ocrnet-engine-path | String | | The absolute path to the OCRNet TensorRT engine | |
| ocrnet-dict-path | String | | The absolute path to the OCRNet vocabulary file | |
| ocrnet-input-shape | String | | The input shape (in CHW format) of the OCRNet TensorRT engine. Channel, height, and width are separated with commas. | |
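Each parameter is passed as its own customlib-props argument, so a pipeline that sets several parameters repeats the argument once per parameter. The sketch below shows one way to generate those arguments from name=value pairs. `ocdr_props` is a hypothetical helper written for illustration; it only prints text for you to paste into a gst-launch-1.0 command.

```shell
#!/bin/sh
# Hypothetical helper: turn "name=value" pairs into repeated
# customlib-props="name:value" arguments for nvdsvideotemplate.
ocdr_props() {
    out=""
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first '='
        value=${pair#*=}     # text after the first '='
        out="$out customlib-props=\"${name}:${value}\""
    done
    printf '%s\n' "${out# }"
}

ocdr_props ocdnet-unclip-ratio=1.5 rectifier-upsidedown=1
```

The example call prints two arguments, one for ocdnet-unclip-ratio and one for rectifier-upsidedown, in the same form used by the pipelines in the previous section.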

Deploying nvOCDR to Triton

Triton Inference Server is open-source inference serving software that streamlines AI inferencing. Triton enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more.

This guide walks through the steps for integrating the nvOCDR library into Triton. Refer to the nvOCDR documentation for more information about nvOCDR.

  • Step 1: Get the nvOCDR repository:

        git clone https://github.com/NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution.git

  • Step 2: Download the TAO Toolkit PTM from NGC:

    Note

    Refer to NGC to set up your environment to run ngc commands.

    You can download the pretrained OCDNet and OCRNet models with the following commands:

        mkdir -p pretrained_models
        ngc registry model download-version nvidia/tao/ocdnet:deployable_v1.0 --dest ./pretrained_models
        ngc registry model download-version nvidia/tao/ocrnet:deployable_v1.0 --dest ./pretrained_models

    A character_list.txt file is included with the pretrained OCRNet ONNX model. This is the vocabulary of the trained OCRNet model and is consumed by the nvOCDR library. Refer to the Preparing the Dataset section of the OCRNet documentation for more information about the character_list.txt.

  • Step 3: Build the Triton server Docker image:

        cd NVIDIA-Optical-Character-Detection-and-Recognition-Solution/triton
        bash setup_triton_server.sh [OCD input height] [OCD input width] [OCD input max batchsize] [DEVICE] [ocd onnx path] [ocr onnx path] [ocr character list path]
        # For example
        bash setup_triton_server.sh 736 1280 4 0 model/ocd.onnx model/ocr.onnx model/ocr_character_list

  • Step 4: Build the Triton client Docker image:

        cd NVIDIA-Optical-Character-Detection-and-Recognition-Solution/triton
        bash setup_triton_client.sh

  • Step 5: Run the nvOCDR Triton server:

        docker run -it --net=host --gpus all --shm-size 8g nvcr.io/nvidian/tao/nvocdr_triton_server:v1.0 bash
        CUDA_VISIBLE_DEVICES=<gpu idx> tritonserver --model-repository /opt/nvocdr/ocdr/triton/models/

    • Inference for high-resolution images

      The nvOCDR Triton server supports high-resolution input images, such as 4000x4000. To enable high-resolution inference, change the spec file /opt/nvocdr/ocdr/triton/models/nvOCDR/spec.json in the Triton server container:

          # to support high resolution images
          is_high_resolution_input: true

      Note

      High-resolution image inference only supports a batch size of 1.

  • Step 6: Run the nvOCDR Triton client:

    Open a new terminal and run the following commands:

        docker run -it --rm -v <path to images dir>:<path to images dir> --net=host nvcr.io/nvidian/tao/nvocdr_triton_client:v1.0 bash
        python3 client.py -d <path to images dir> -bs 1

    client.py arguments:
    • -d: The path to the images folder. The supported image formats are ‘.jpg’, ‘.jpeg’, and ‘.png’.

    • -bs: The batch size for inference. Only a batch size of 1 is supported when running high-resolution inference.
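Since the client only accepts the three listed extensions, it can be useful to count the matching files in a folder before starting a run. The following is a minimal sketch with a hypothetical helper, `count_supported_images`. It matches lower-case extensions in the top level of the folder only; whether client.py also accepts upper-case extensions or reads subfolders is not stated here, so treat both as assumptions.

```shell
#!/bin/sh
# Hypothetical pre-flight check: count the files directly inside a directory
# whose extension is one the client accepts (.jpg, .jpeg, .png).
count_supported_images() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \) | grep -c ''
}

# Demo on a temporary directory with two supported files and one unsupported.
dir=$(mktemp -d)
touch "$dir/a.jpg" "$dir/b.png" "$dir/c.bmp"
count_supported_images "$dir"
```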

To change the Triton server configuration, open a shell in the Triton server container, stop the tritonserver process, and then modify the spec file /opt/nvocdr/ocdr/triton/models/nvOCDR/spec.json. After modifying the file, relaunch tritonserver and run inference from the client container.
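The edit to the spec file can also be scripted. The sketch below switches the high-resolution flag and writes the result to a new file, leaving the original untouched. It rests on assumptions: `set_high_res` is a hypothetical helper, and the key is assumed to appear once in the JSON as "is_high_resolution_input" followed by true or false. The demo runs on a stand-in file, not the real spec.json.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a spec file with
# is_high_resolution_input set to the given value (true or false).
# Assumes the key appears once, as "is_high_resolution_input": <bool>.
# Usage: set_high_res <spec_in> <spec_out> <true|false>
set_high_res() {
    src=$1
    dst=$2
    value=$3
    case $value in
        true|false) ;;
        *) echo "value must be true or false" >&2; return 1 ;;
    esac
    sed -E "s/(\"is_high_resolution_input\"[[:space:]]*:[[:space:]]*)(true|false)/\1${value}/" \
        "$src" > "$dst"
}

# Demo on a stand-in file; the real spec.json holds more keys than this.
tmp=$(mktemp -d)
printf '{\n  "is_high_resolution_input": false\n}\n' > "$tmp/spec.json"
set_high_res "$tmp/spec.json" "$tmp/spec.new.json" true
cat "$tmp/spec.new.json"
```

Writing to a separate file lets you compare the two versions before replacing the spec file that the server reads.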

The following table describes the parameters in the spec file:

| Parameter | Data Type | Default | Description | Supported |
|---|---|---|---|---|
| ocdnet_trt_engine_path | String | | The absolute path to the OCDNet TensorRT engine | |
| ocdnet_infer_input_shape | List | | The input shape (in CHW format) of the OCDNet TensorRT engine. Channel, height, and width are separated with commas. | |
| ocdnet-binarize-threshold | Float | | The threshold value to binarize the OCDNet output | >0 |
| ocdnet-unclip-ratio | Float | 1.5 | The unclip ratio of the detected text region, which determines the output size | >0 |
| ocdnet-polygon-threshold | Float | | The threshold value to filter the polygons generated from the OCDNet postprocess based on the confidence score of polygons | [0, 1] |
| ocdnet-max-candidate | Unsigned int | | The maximum output polygons from OCDNet | >0 |
| upsidedown | bool | True | The flag to enable upside-down processing in the Rectifier module. Set this option to 1 to enable nvOCDR to recognize characters totally upside down. The default value is 0. | 0, 1 |
| ocrnet_trt_engine_path | String | | The absolute path to the OCRNet TensorRT engine | |
| ocrnet_dict_file | String | | The absolute path to the OCRNet vocabulary file | |
| ocrnet_infer_input_shape | List | | The input shape (in CHW format) of the OCRNet TensorRT engine. Channel, height, and width are separated with commas. | |

© Copyright 2023, NVIDIA. Last updated on Jul 27, 2023.