NVIDIA TAO Toolkit v4.0.1

OCDNet

OCDNet is an optical-character detection model that is included in the TAO Toolkit. It supports the following tasks:

  • train

  • evaluate

  • inference

  • prune

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:

```shell
tao model ocdnet <sub_task> <args_per_subtask>
```

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

The dataset for OCDNet contains images and the corresponding label files.

The training dataset and the test dataset must follow the same structure, organized as shown below: images go in a directory named img and label files go in a directory named gt. By default, each label file is named after its corresponding image file, with a gt_ prefix and a .txt extension (for example, the label file for img_0.jpg is gt_img_0.txt).

The exact directory names train and test are not required but are preferred by convention.

```text
/train
  /img
    img_0.jpg
    img_1.jpg
    ...
  /gt
    gt_img_0.txt
    gt_img_1.txt
    ...
/test
  /img
    img_0.jpg
    img_1.jpg
    ...
  /gt
    gt_img_0.txt
    gt_img_1.txt
    ...
```
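Before training, it can help to confirm that every image has a matching label file. The bash sketch below assumes the layout and naming convention described above (label name = prefix + image name without its extension + .txt). The helper check_labels is hypothetical, not part of TAO; passing truth_ as the prefix for UberDataset assumes that dataset also uses .txt label files, which this page does not state.

```bash
# Hypothetical helper (not part of TAO): report images that have no matching label file.
# Usage: check_labels <split_dir> [label_prefix]
#   split_dir     directory containing img/ and gt/ (for example, /data/ocdnet/train)
#   label_prefix  defaults to gt_ (the ICDAR2015Dataset convention)
check_labels() {
  local split_dir=$1
  local prefix=${2:-gt_}
  local missing=0
  local img_path name
  for img_path in "${split_dir}"/img/*; do
    [[ -f "${img_path}" ]] || continue   # skip if the glob matched nothing
    name=$(basename "${img_path}")
    # img_0.jpg -> gt/gt_img_0.txt
    if [[ ! -f "${split_dir}/gt/${prefix}${name%.*}.txt" ]]; then
      echo "missing label for ${name}"
      missing=$((missing + 1))
    fi
  done
  echo "${missing} image(s) without labels"
  [[ ${missing} -eq 0 ]]   # exit status 0 only if every image has a label
}
```

For example, check_labels /data/ocdnet/train && check_labels /data/ocdnet/test checks both splits and fails if any label is missing.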

Below is an example label file from the public ICDAR2015 dataset:

```text
$ cat ICDAR2015/test/gt/gt_img_14.txt
268,82,335,93,332,164,267,164,the
344,94,433,112,427,159,336,163,Future
208,191,374,184,371,213,208,241,Communications
370,176,420,176,416,204,373,213,###
1,57,261,76,261,187,0,190,venting
1,208,203,200,203,241,3,294,ntelligence.
```

Note

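Each line of a label file describes one text instance: the x,y coordinates of all the points of the text region (four points for ICDAR2015), followed by the transcription as the last field. If the transcription is ### and ignore_tags in the training spec file includes '###' (as the example spec's ['*', '###'] does), that line is ignored during training.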
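Because a transcription can itself contain commas, it is safest to treat everything after the eighth comma as the text. The following hypothetical helpers (not part of TAO) sketch this for four-point labels and count the lines that would be ignored when ignore_tags includes '###':

```bash
# Hypothetical helpers for inspecting four-point, ICDAR2015-style label files.
# Line format: x1,y1,x2,y2,x3,y3,x4,y4,text (the text may contain commas).

# Print the transcription of every line (fields 9 onward), stripping any
# Windows (CRLF) line endings so comparisons work.
label_texts() {
  cut -d, -f9- "$1" | tr -d '\r'
}

# Count lines whose transcription is exactly ###.
count_ignored() {
  label_texts "$1" | grep -cx '###'
}
```

For the gt_img_14.txt example above, count_ignored should print 1.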

The spec file for OCDNet includes model, train, dataset, and evaluate, as well as other global parameters. Below is an example spec file for training an OCDNet model with a deformable_resnet18 backbone on the ICDAR2015 dataset.

```yaml
num_gpus: 1
model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  pretrained_model_path: '/data/ocdnet/ocdnet_deformable_resnet18.pth'
  backbone: deformable_resnet18

train:
  results_dir: /results/train
  num_epochs: 30
  #resume_training_checkpoint_path: '/results/train/resume.pth'
  checkpoint_interval: 1
  validation_interval: 1
  trainer:
    clip_grad_norm: 5.0

  optimizer:
    type: Adam
    args:
      lr: 0.001

  lr_scheduler:
    type: WarmupPolyLR
    args:
      warmup_epoch: 3

  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5

  metric:
    type: QuadMetric
    args:
      is_output_polygon: false

dataset:
  train_dataset:
    data_name: ICDAR2015Dataset
    data_path: ['/data/ocdnet/train']
    args:
      pre_processes:
        - type: IaaAugment
          args:
            - {'type':Fliplr, 'args':{'p':0.5}}
            - {'type': Affine, 'args':{'rotate':[-10,10]}}
            - {'type':Resize,'args':{'size':[0.5,3]}}
        - type: EastRandomCropData
          args:
            size: [640,640]
            max_tries: 50
            keep_ratio: true
        - type: MakeBorderMap
          args:
            shrink_ratio: 0.4
            thresh_min: 0.3
            thresh_max: 0.7
        - type: MakeShrinkMap
          args:
            shrink_ratio: 0.4
            min_text_size: 8
      img_mode: BGR
      filter_keys: [img_path,img_name,text_polys,texts,ignore_tags,shape]
      ignore_tags: ['*', '###']
    loader:
      batch_size: 4
      pin_memory: true
      num_workers: 4

  validate_dataset:
    data_name: ICDAR2015Dataset
    data_path: ['/data/ocdnet/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      pin_memory: false
      num_workers: 4
```

The top-level parameters of the spec file are described in the table below.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs | >0 |

Model

The model parameter provides the list of parameters for the model.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| load_pruned_graph | bool | false | A flag specifying whether to load the pruned graph. Set to true if train/evaluate/export/inference is being performed against a pruned model. | true/false |
| pruned_graph_path | string | | The path to the pruned graph model (if load_pruned_graph is true) | Unix path |
| pretrained_model_path | string | | The path to the pretrained model | Unix path |
| backbone | string | deformable_resnet18 | The backbone of the model | deformable_resnet18, deformable_resnet50 |

Train

The train parameter provides the parameters for training.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| results_dir | string | | The directory for saving training results | Unix path |
| num_epochs | unsigned int | 50 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The interval (in epochs) at which to save a checkpoint file | >0 |
| validation_interval | unsigned int | 1 | The interval (in epochs) at which to run validation | >0 |
| optimizer | dict config | | The configuration for the optimizer | |
| lr_scheduler | dict config | | The configuration for the learning rate scheduler | |
| post_processing | dict config | | The configuration for post-processing | |
| metric | dict config | | The configuration for metric computation. QuadMetric is supported. If is_output_polygon is true, polygons are generated; if it is false, bounding boxes (BBoxes) are generated. | |
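The hmean value referred to throughout this page is, in ICDAR-style text detection evaluation, conventionally the harmonic mean of precision and recall. A minimal arithmetic sketch, assuming precision and recall are already known (the helper name hmean is hypothetical):

```bash
# Hypothetical helper: harmonic mean (F-score) of precision and recall,
# printed with four decimal places.
hmean() {
  awk -v p="$1" -v r="$2" 'BEGIN {
    if (p + r == 0) { print "0.0000"; exit }   # avoid division by zero
    printf "%.4f\n", 2 * p * r / (p + r)
  }'
}
```

For example, hmean 0.8 0.9 prints 0.8471. Because it is a harmonic mean, the result stays close to the lower of the two values, so a model cannot score well by trading recall for precision (or vice versa).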

optimizer

```yaml
optimizer:
  type: Adam
  args:
    lr: 0.001
```

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | Adam | The optimizer type | Adam |
| lr | float | | The initial learning rate | >=0.0 |

lr_scheduler

```yaml
lr_scheduler:
  type: WarmupPolyLR
  args:
    warmup_epoch: 3
```

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | WarmupPolyLR | The learning rate scheduler. WarmupPolyLR increases the learning rate to its initial value during the warmup stage, then decays it from the initial value to zero with a polynomial function during the rest of training. | WarmupPolyLR |
| warmup_epoch | unsigned int | 3 | The number of warmup epochs, during which the learning rate increases to its initial value (optimizer.args.lr). This value must not be equal to num_epochs. | >=0 |
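The schedule described above can be sketched per epoch as follows. This is an illustration under stated assumptions, not TAO's implementation: it assumes a linear warmup starting from zero and a decay power of 0.9 (a common choice in DBNet-style training code), and real schedulers typically update per iteration rather than per epoch. The helper name warmup_poly_lr is hypothetical.

```bash
# Hypothetical helper: learning rate at a given (possibly fractional) epoch.
# Usage: warmup_poly_lr <base_lr> <epoch> <warmup_epoch> <num_epochs> [power]
warmup_poly_lr() {
  local base_lr=$1 epoch=$2 warmup=$3 total=$4 power=${5:-0.9}
  awk -v lr="$base_lr" -v e="$epoch" -v w="$warmup" -v t="$total" -v p="$power" 'BEGIN {
    if (e < w) {
      v = lr * e / w                          # assumed linear warmup from 0 to lr
    } else {
      v = lr * (1 - (e - w) / (t - w)) ^ p    # polynomial decay from lr to 0
    }
    printf "%.6f\n", v
  }'
}
```

With the spec values (lr 0.001, warmup_epoch 3, num_epochs 30), warmup_poly_lr 0.001 3 3 30 prints 0.001000 and warmup_poly_lr 0.001 30 3 30 prints 0.000000. If warmup_epoch equals num_epochs, t - w is zero and the decay step divides by zero, which is consistent with the restriction on warmup_epoch in the table above.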

post_processing

```yaml
post_processing:
  type: SegDetectorRepresenter
  args:
    thresh: 0.3
    box_thresh: 0.55
    max_candidates: 1000
    unclip_ratio: 1.5
```

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | SegDetectorRepresenter | The post-processing method, which converts the model output into bounding boxes (BBoxes) or polygons | SegDetectorRepresenter |
| thresh | float | 0.3 | The threshold for binarization, which is used to generate an approximate binary map | 0.0 ~ 1.0 |
| box_thresh | float | 0.7 | The box threshold. If the score of a candidate box over its area is lower than this threshold, the prediction is ignored, which means no text is detected there. | 0.0 ~ 1.0 |
| max_candidates | unsigned int | 1000 | The maximum number of candidate boxes. Increase this value if text is detected in some areas of the image but is clearly missed in others. | >1 |
| unclip_ratio | float | 1.5 | The ratio used to expand (unclip) the detected text regions from the probability map with the Vatti clipping algorithm. A larger ratio produces larger boxes. | >0.0 |
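In the original DB (Differentiable Binarization) formulation, each detected region is expanded outward by an offset D = A * unclip_ratio / L, where A is the region's area and L is its perimeter. Assuming SegDetectorRepresenter follows that formula (this page does not confirm it), the effect of unclip_ratio on an axis-aligned w x h region can be sketched as follows; the helper name unclip_offset is hypothetical.

```bash
# Hypothetical helper: DB-style unclip offset (in pixels) for an axis-aligned w x h region.
# offset = area * unclip_ratio / perimeter
unclip_offset() {
  awk -v w="$1" -v h="$2" -v r="$3" 'BEGIN {
    area = w * h
    perimeter = 2 * (w + h)
    printf "%.2f\n", area * r / perimeter
  }'
}
```

With the spec's unclip_ratio of 1.5, unclip_offset 100 20 1.5 prints 12.50; raising the ratio to 2.0 gives 16.67, matching the table's note that a larger ratio yields larger boxes.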

Dataset

The dataset is defined by two sections: train_dataset and validate_dataset.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset | dict config | | The configuration for the training dataset | |
| validate_dataset | dict config | | The configuration for the validation dataset | |

The parameters for train_dataset are provided below.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| data_name | string | ICDAR2015Dataset | The dataset name. For ICDAR2015Dataset, the label file is expected to use gt_ as a prefix. For UberDataset, the label file is expected to use truth_ as a prefix. | ICDAR2015Dataset, UberDataset |
| data_path | string list | | The list of paths that contain the data used for training, for example ['path_1'] or ['path_1', 'path_2', ...] | |
| pre_processes | dict | | The pre-processing configuration (see train_preprocess for more details) | |
| img_mode | string | BGR | The image mode | BGR, RGB, GRAY |
| filter_keys | string list | ['img_path', 'img_name', 'text_polys', 'texts', 'ignore_tags', 'shape'] | The keys to ignore | |
| ignore_tags | string list | ['*', '###'] | The labels (transcriptions) whose text instances are not used for training | |
| batch_size | unsigned int | False | The batch size. Set to a lower value if you encounter out-of-memory errors. | >0 |
| pin_memory | bool | False | A flag specifying whether to enable pinned memory | true/false |
| num_workers | unsigned int | 1 | The number of worker processes used to load data | >=0 |

train_preprocess

```yaml
pre_processes:
  - type: IaaAugment
    args:
      - {'type':Fliplr, 'args':{'p':0.5}}
      - {'type': Affine, 'args':{'rotate':[-10,10]}}
      - {'type':Resize,'args':{'size':[0.5,3]}}
  - type: EastRandomCropData
    args:
      size: [640,640]
      max_tries: 50
      keep_ratio: true
  - type: MakeBorderMap
    args:
      shrink_ratio: 0.4
      thresh_min: 0.3
      thresh_max: 0.7
  - type: MakeShrinkMap
    args:
      shrink_ratio: 0.4
      min_text_size: 8
```

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| IaaAugment | dict list | {'type':Fliplr, 'args':{'p':0.5}}, {'type': Affine, 'args':{'rotate':[-10,10]}}, {'type':Resize,'args':{'size':[0.5,3]}} | Uses imgaug to perform augmentation. Fliplr, Affine, and Resize are used by default. p defines the probability of each image being flipped. rotate defines the range of degrees by which images are randomly rotated. size defines the range of scale factors for resizing each image relative to its original size. | p: 0.0 ~ 1.0, rotate: -180 ~ 180, size: >0.0 ~ >0.0 |
| EastRandomCropData | dict config | | The random crop applied after augmentation. size defines the cropped target size (width, height); the width and height should be multiples of 32. max_tries defines the maximum number of crop attempts, since the cropped area may be too small or cropping may fail. keep_ratio specifies whether to keep the aspect ratio. | size: [>0, >0], max_tries: >0, keep_ratio: true/false |
| MakeBorderMap | dict config | | Defines the parameters used to generate the threshold map. shrink_ratio is used to calculate the distance between the expanded/shrunk polygons and the original text polygon. thresh_min and thresh_max set the threshold range of the generated threshold map. | 0.0 ~ 1.0 |
| MakeShrinkMap | dict config | | Defines the parameters used to generate the probability map. shrink_ratio is used to generate shrunk polygons. min_text_size specifies that text whose height or width is smaller than this value is ignored. | 0.0 ~ 1.0 |
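In the original DB (Differentiable Binarization) formulation, each ground-truth polygon is shrunk (and, for the threshold map, expanded) by an offset D = A * (1 - r^2) / L, where A is the polygon's area, L its perimeter, and r the shrink ratio. Assuming MakeShrinkMap and MakeBorderMap follow that formula (not confirmed by this page), the offset for an axis-aligned w x h text box can be sketched as follows; the helper name shrink_offset is hypothetical.

```bash
# Hypothetical helper: DB-style shrink offset (in pixels) for an axis-aligned w x h box.
# offset = area * (1 - shrink_ratio^2) / perimeter
shrink_offset() {
  awk -v w="$1" -v h="$2" -v r="$3" 'BEGIN {
    area = w * h
    perimeter = 2 * (w + h)
    printf "%.2f\n", area * (1 - r * r) / perimeter
  }'
}
```

With the spec's shrink_ratio of 0.4, shrink_offset 100 20 0.4 prints 7.00, so a 100 x 20 text box is shrunk by 7 pixels on each side, leaving roughly an 86 x 6 core region.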

The parameters for validate_dataset are the same as those for train_dataset, except for the pre-processing configuration, which is described in validation_preprocess.

validation_preprocess

```yaml
pre_processes:
  - type: Resize2D
    args:
      short_size:
        - 1280
        - 736
      resize_text_polys: true
```

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| type | string | Resize2D | Resizes the images and labels before evaluation | Resize2D |
| short_size | list | | The target size (width x height) to resize the image to | >0, >0, and multiples of 32 |
| resize_text_polys | bool | | A flag specifying whether to resize the text coordinates accordingly | true/false |
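Because short_size (like the EastRandomCropData crop size) must consist of positive multiples of 32, a quick pre-flight check can catch invalid values before a job is launched. A minimal sketch; the helper name check_size is hypothetical:

```bash
# Hypothetical helper: verify that a width/height pair is positive and divisible by 32.
check_size() {
  local w=$1 h=$2
  if (( w > 0 && h > 0 && w % 32 == 0 && h % 32 == 0 )); then
    echo "ok: ${w}x${h}"
  else
    echo "invalid: ${w}x${h} (both must be >0 and multiples of 32)"
    return 1
  fi
}
```

For example, check_size 1280 736 passes, while a height of 720 fails; the nearest valid heights are 704 and 736.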

Evaluate

The following is an example spec file for evaluation on the ICDAR2015 dataset.

```yaml
model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18

evaluate:
  results_dir: /results/evaluate
  checkpoint: /results/train/model_best.pth
  gpu_id: 0
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5

  metric:
    type: QuadMetric
    args:
      is_output_polygon: false

dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      shuffle: false
      pin_memory: false
      num_workers: 4
```


Inference

The following is an example spec file for running inference:

```yaml
model:
  load_pruned_graph: false
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18

inference:
  checkpoint: '/results/train/model_best.pth'
  input_folder: /data/ocdnet/test/img
  width: 1280
  height: 736
  img_mode: BGR
  polygon: false
  results_dir: /results/inference
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5
```

The inference parameter defines the hyperparameters of the inference process. Inference draws the detected bounding boxes or polygons on the input images for visualization.

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the .pth model | Unix path |
| input_folder | string | | The path to the input folder for inference | Unix path |
| width | unsigned int | | The input width | >=1 |
| height | unsigned int | | The input height | >=1 |
| img_mode | string | | The image mode | BGR/RGB/GRAY |
| polygon | bool | | A true value specifies polygon output, while a false value specifies bounding-box (BBox) output. | true, false |

Use the following command to run OCDNet training:

```shell
tao model ocdnet train -e <experiment_spec_file> \
  -r <results_dir> \
  [model.pretrained_model_path=<path_to_pretrained_model_file>] \
  [train.resume_training_checkpoint_path=<path_to_resume_training_checkpoint>] \
  [num_gpus=<num_gpus>]
```

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

Optional Arguments

  • -r, --results_dir: The path to the folder where the experiment outputs should be written

  • num_gpus: The number of GPUs to be used in the training in a multi-GPU scenario. The default value is 1.

Here’s an example of running train with a pretrained model:

```shell
tao model ocdnet train \
  -e $SPECS_DIR/train.yaml \
  -r $RESULTS_DIR/train \
  model.pretrained_model_path=$DATA_DIR/ocdnet_deformable_resnet18.pth
```

Here’s an example of resuming training:

```shell
tao model ocdnet train \
  -e $SPECS_DIR/train.yaml \
  -r $RESULTS_DIR/train \
  train.resume_training_checkpoint_path=$RESULTS_DIR/train/resume.pth
```

Here’s an example of running training with multiple GPUs:

```shell
tao model ocdnet train \
  -e $SPECS_DIR/train.yaml \
  -r $RESULTS_DIR/train \
  model.pretrained_model_path=$DATA_DIR/ocdnet_deformable_resnet18.pth \
  num_gpus=2
```

Note

By default, training uses the DDP (distributed data parallel) strategy. When training with multiple GPUs, the hmean reported during training matches the hmean from running tao model ocdnet evaluate only if the number of evaluation images is a multiple of num_gpus * batch_size, where batch_size is the evaluation (validation) batch size.
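Before launching a multi-GPU run, you can check whether the evaluation image count satisfies the condition in the note above. A minimal sketch; check_eval_split is a hypothetical helper, and it assumes the evaluation images sit directly in a single img folder.

```bash
# Hypothetical helper: check that the number of evaluation images is a multiple
# of num_gpus * batch_size.
# Usage: check_eval_split <img_dir> <num_gpus> <batch_size>
check_eval_split() {
  local img_dir=$1 num_gpus=$2 batch_size=$3
  local n chunk
  # Count regular files directly inside img_dir; $(( )) strips wc's padding.
  n=$(( $(find "${img_dir}" -maxdepth 1 -type f | wc -l) ))
  chunk=$(( num_gpus * batch_size ))
  if (( n % chunk == 0 )); then
    echo "ok: ${n} images is a multiple of ${chunk}"
  else
    echo "warning: ${n} images is not a multiple of ${chunk}; training-time hmean may differ from evaluate"
    return 1
  fi
}
```

For example, check_eval_split /data/ocdnet/test/img 2 1 checks a two-GPU run with a validation batch size of 1.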


Use the following command to run OCDNet evaluation:

```shell
tao model ocdnet evaluate -e <experiment_spec_file> [evaluate.checkpoint=<path_to_checkpoint>]
```

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment

Optional Arguments

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

  • -h, --help: Show this help message and exit.

Here’s an example of using the OCDNet evaluation command:

```shell
tao model ocdnet evaluate \
  -e $SPECS_DIR/evaluate.yaml \
  evaluate.checkpoint=$RESULTS_DIR/train/model_best.pth
```


Use the following command to run OCDNet inference:

```shell
tao model ocdnet inference -e <experiment_spec_file> -r <results_dir>
```

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the inference experiment.

Optional Arguments

  • -r, --results_dir: The path to a folder where the experiment outputs should be written

Here’s an example of using the OCDNet inference command:

```shell
tao model ocdnet inference \
  -e $SPECS_DIR/inference.yaml \
  inference.checkpoint=$RESULTS_DIR/train/model_best.pth \
  inference.input_folder=$DATA_DIR/test/img \
  inference.results_dir=$RESULTS_DIR/infer
```

Note

Currently, inference expects label files to exist in the gt folder. If there are no label files, generate dummy labels under the gt folder. Use the following script as a reference:

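```bash
#!/bin/bash
# Generate a dummy label file for every image so that inference can run without real labels.
folder_path=/workspace/datasets/ICDAR2015/datasets/test
mkdir -p "${folder_path}/gt"
for filepath in "${folder_path}"/img/*; do
  filename=$(basename "${filepath}")
  # img_0.jpg -> gt/gt_img_0.txt, containing one placeholder line whose text is ###
  echo "10,10,10,20,20,10,20,20,###" > "${folder_path}/gt/gt_${filename%.*}.txt"
done
```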


Model pruning reduces model parameters to improve inference frames per second (FPS) while maintaining nearly the same hmean.

Pruning is applied to an already trained OCDNet model and produces a pruned graph model, which is a new model with fewer parameters. You then need to retrain this pruned graph model on the same dataset to recover the hmean. During retraining, set load_pruned_graph to true and set pruned_graph_path to the path of the pruned graph model.

The prune parameter defines the hyperparameters of the pruning process.

```yaml
prune:
  checkpoint: /results/train/model_best.pth
  pruning_thresh: 0.1
  results_dir: /results/prune

dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      shuffle: false
      pin_memory: false
      num_workers: 1
```

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to prune | Unix path |
| pruning_thresh | float | 0.1 | The pruning threshold | 0.0 ~ 1.0 |
| results_dir | string | | The path to the results directory | Unix path |

Use the following command to run pruning on the OCDNet model.

```shell
tao model ocdnet prune \
  -e $SPECS_DIR/prune.yaml \
  prune.checkpoint=$RESULTS_DIR/train/model_best.pth \
  prune.pruning_thresh=0.1 \
  prune.results_dir=$RESULTS_DIR/prune
```

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up the pruning experiment.

Optional Arguments

  • prune.pruning_thresh: The pruning threshold, which should be a float number between 0.0 and 1.0. The default value is 0.1.

After pruning, the pruned model can be used for retraining (that is, fine-tuning). To start retraining, set the load_pruned_graph parameter to true and set the pruned_graph_path parameter to the model generated by pruning.

Note

When retraining, evaluating, running inference on, or exporting a model that has a pruned structure, you need to set load_pruned_graph to true so that the pruned model structure is imported. See the following examples for details.
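The prune-then-retrain sequence can be wrapped in a small script. This sketch uses only the commands and overrides shown on this page, assumes SPECS_DIR and RESULTS_DIR are set as in the other examples, and assumes the pruned graph is written as pruned_<thresh>.pth (the name used in the examples on this page). The run and prune_and_retrain helpers and the DRY_RUN switch are hypothetical.

```bash
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of executing it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: prune_and_retrain <pruning_thresh>
prune_and_retrain() {
  local thresh=$1
  # 1. Prune the trained model (pruned graph assumed at prune/pruned_<thresh>.pth)
  run tao model ocdnet prune -e "${SPECS_DIR}/prune.yaml" \
    prune.checkpoint="${RESULTS_DIR}/train/model_best.pth" \
    prune.pruning_thresh="${thresh}" \
    prune.results_dir="${RESULTS_DIR}/prune"
  # 2. Retrain (fine-tune) the pruned graph to recover hmean
  run tao model ocdnet train -e "${SPECS_DIR}/train.yaml" \
    -r "${RESULTS_DIR}/retrain" \
    model.load_pruned_graph=true \
    model.pruned_graph_path="${RESULTS_DIR}/prune/pruned_${thresh}.pth"
}
```

Running DRY_RUN=1 prune_and_retrain 0.1 first shows the exact commands without starting any jobs.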

Here’s an example of running training with a pruned model:

```shell
tao model ocdnet train \
  -e $SPECS_DIR/train.yaml \
  -r $RESULTS_DIR/retrain \
  model.load_pruned_graph=true \
  model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth
```

Here’s an example of resuming training against a pruned model:

```shell
tao model ocdnet train \
  -e $SPECS_DIR/train.yaml \
  -r $RESULTS_DIR/retrain \
  model.load_pruned_graph=true \
  model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
  train.resume_training_checkpoint_path=$RESULTS_DIR/retrain/resume.pth
```

Here’s an example of running evaluation against a pruned model:

```shell
tao model ocdnet evaluate \
  -e $SPECS_DIR/evaluate.yaml \
  -r $RESULTS_DIR/evaluate \
  model.load_pruned_graph=true \
  model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
  evaluate.checkpoint=$RESULTS_DIR/train/model_best.pth
```

Here’s an example of running inference against a pruned model:

```shell
tao model ocdnet inference \
  -e $SPECS_DIR/inference.yaml \
  model.load_pruned_graph=true \
  model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
  inference.checkpoint=$RESULTS_DIR/train/model_best.pth \
  inference.input_folder=$DATA_DIR/test/img \
  inference.results_dir=$RESULTS_DIR/infer
```

Here’s an example of running export against a pruned model:

```shell
tao model ocdnet export \
  -e $SPECS_DIR/export.yaml \
  model.load_pruned_graph=true \
  model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
  export.checkpoint=$RESULTS_DIR/train/model_best.pth \
  export.onnx_file=$RESULTS_DIR/export/model_best.onnx
```


The export parameter defines the hyperparameters of the export process.

```yaml
model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18

export:
  results_dir: /results/export
  checkpoint: '/results/train/model_best.pth'
  onnx_file: '/results/export/model_best.onnx'
  width: 1280
  height: 736

dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']
```

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | string | | The path to the PyTorch model to export | Unix path |
| onnx_file | string | | The path to the ONNX file | Unix path |
| opset_version | unsigned int | 11 | The opset version of the exported ONNX model | >0 |
| input_width | unsigned int | 1280 | The input width | >0 |
| input_height | unsigned int | 736 | The input height | >0 |

```shell
tao model ocdnet export -e $SPECS_DIR/export.yaml \
  export.checkpoint=<path_to_pth_file> \
  export.onnx_file=<path_to_onnx_file>
```

Required Arguments

  • -e, --experiment_spec_file: The experiment spec file to set up export

  • export.checkpoint: The path to the PyTorch (.pth) model to export

  • export.onnx_file: The path to save the exported ONNX model to

Here’s an example for using the OCDNet export command:

```shell
tao model ocdnet export \
  -e $SPECS_DIR/export.yaml \
  export.checkpoint=$RESULTS_DIR/train/model_best.pth \
  export.onnx_file=$RESULTS_DIR/export/model_best.onnx
```


For deployment, please refer to the TAO Deploy documentation.

Note

If you run the OCDNet TensorRT engine without TAO Deploy, make sure your TensorRT plugin library contains ModulatedDeformableConvPlugin, which OCDNet requires. To check, run nm -gDC /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin on an x86 platform, or nm -gDC /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin on a Jetson platform. If the command produces no output, you need to compile the TensorRT OSS plugin and replace the default plugin library as follows:
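The architecture-dependent check from the note above can be scripted. This is a sketch: the helper names are hypothetical, and it assumes the default library locations given in the note (uname -m reports aarch64 on Jetson).

```bash
# Hypothetical helper: map the machine architecture to the default TensorRT library directory.
trt_lib_dir() {
  case "$1" in
    x86_64)  echo /usr/lib/x86_64-linux-gnu ;;
    aarch64) echo /usr/lib/aarch64-linux-gnu ;;   # Jetson
    *)       return 1 ;;
  esac
}

# Hypothetical helper: succeed if the installed plugin library exports
# ModulatedDeformableConvPlugin (same nm | grep check as in the note).
has_deform_conv_plugin() {
  local dir
  dir=$(trt_lib_dir "$(uname -m)") || { echo "unsupported architecture"; return 2; }
  nm -gDC "${dir}/libnvinfer_plugin.so" 2> /dev/null | grep -q ModulatedDeformableConvPlugin
}

# Example:
#   has_deform_conv_plugin && echo "plugin found" || echo "rebuild the TensorRT OSS plugin"
```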

  1. Get the TensorRT repository:

    ```shell
    git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
    cd TensorRT
    git submodule update --init --recursive
    ```

  2. Compile the TensorRT libnvinfer_plugin.so file:

    ```shell
    mkdir build && cd build
    # On x86 platform
    cmake ..
    # On Jetson platform
    cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
    make nvinfer_plugin -j12
    ```

    The libnvinfer_plugin.so.8.6.x file is generated under the build folder, where x depends on the exact TensorRT 8.6 release you built.

  3. Replace the default plugin library. Note that the exact plugin name will depend on the TensorRT version installed in your system.

    ```shell
    # On x86 platform, for example, if the default plugin is
    # /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
    cp libnvinfer_plugin.so.8.6.x /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2
    # On Jetson platform, for example, if the default plugin is
    # /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
    cp libnvinfer_plugin.so.8.6.x /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2
    ```
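Because the file to overwrite in step 3 depends on the installed TensorRT version, one way to find it is to resolve the libnvinfer_plugin.so symlink. This assumes the usual packaging layout, in which libnvinfer_plugin.so is a (possibly chained) symlink to the versioned library. The helper default_plugin_file is hypothetical, and backing up the original file is a suggestion rather than a documented step.

```bash
# Hypothetical helper: print the versioned file behind the libnvinfer_plugin.so symlink.
# Usage: default_plugin_file <lib_dir>
default_plugin_file() {
  local lib_dir=$1
  readlink -f "${lib_dir}/libnvinfer_plugin.so"
}

# Example on x86, run from the build folder:
#   target=$(default_plugin_file /usr/lib/x86_64-linux-gnu)
#   cp "${target}" "${target}.bak"            # keep a backup of the original
#   cp libnvinfer_plugin.so.8.6.x "${target}"
```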

Refer to the nvOCDR page for more information about deploying an OCDNet model to DeepStream. You can run nvOCDR with the DeepStream sample or with Triton Inference Server. In particular, nvOCDR Triton supports inference on high-resolution images: it resizes the image while keeping the aspect ratio, tiles the image into small patches, runs OCDNet on each patch, and then merges the results. This can improve hmean when a model trained at a smaller resolution is used to run inference on higher-resolution images. For images that are not high resolution, you can instead set resize_keep_aspect_ratio: true; this can also improve hmean because the images are resized without distortion.
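The two ideas above, resizing without distortion and tiling a high-resolution image, come down to simple arithmetic. The sketch below is illustrative only: it assumes non-overlapping tiles and round-to-nearest scaling, and nvOCDR's actual logic (overlap, padding, merging) may differ. Both helper names are hypothetical.

```bash
# Hypothetical helper: scale a W x H image to fit inside w x h without distortion.
# Usage: fit_keep_ratio <W> <H> <w> <h>   (prints "new_width new_height")
fit_keep_ratio() {
  awk -v W="$1" -v H="$2" -v w="$3" -v h="$4" 'BEGIN {
    s = w / W
    if (h / H < s) s = h / H          # the tighter dimension decides the scale
    printf "%d %d\n", int(W * s + 0.5), int(H * s + 0.5)
  }'
}

# Hypothetical helper: number of non-overlapping w x h tiles needed to cover W x H.
# Usage: tile_count <W> <H> <w> <h>
tile_count() {
  local W=$1 H=$2 w=$3 h=$4
  echo $(( ((W + w - 1) / w) * ((H + h - 1) / h) ))   # ceil(W/w) * ceil(H/h)
}
```

For example, fit_keep_ratio 1920 1080 1280 736 prints 1280 720, and tile_count 4000 3000 1280 736 prints 20.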

© Copyright 2023, NVIDIA. Last updated on Jul 27, 2023.