OCDNet
OCDNet is an optical-character detection model that is included in the TAO Toolkit. It supports the following tasks:
train
evaluate
inference
prune
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao model ocdnet <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
The dataset for OCDNet contains images and the corresponding label files.
Both the training dataset and test dataset must follow the same structure.
The directory structure should be organized as follows, where the directory name for images is img and the directory name for label files is gt. By default, each label file is expected to be named with the gt_ prefix followed by the name of the corresponding image file (for example, gt_img_0.txt for img_0.jpg). The exact directory names train and test are not required but are preferred by convention.
/train
    /img
        img_0.jpg
        img_1.jpg
        ...
    /gt
        gt_img_0.txt
        gt_img_1.txt
        ...
/test
    /img
        img_0.jpg
        img_1.jpg
        ...
    /gt
        gt_img_0.txt
        gt_img_1.txt
        ...
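Before training, it can help to confirm that every image has a matching label file. The following is a minimal sketch that assumes the default gt_ naming convention described above. check_ocdnet_split is a hypothetical helper, not a TAO command.

```shell
# Hypothetical helper (not part of TAO): report images in <split>/img that have
# no label file named gt_<image stem>.txt in <split>/gt.
check_ocdnet_split() {
    local split_dir="$1"
    local missing=0
    local img_path name
    for img_path in "${split_dir}"/img/*; do
        [ -e "${img_path}" ] || continue      # img/ is empty: the glob stays literal
        name=$(basename "${img_path}")
        if [ ! -f "${split_dir}/gt/gt_${name%.*}.txt" ]; then
            echo "missing label for ${name}" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero if any label file is missing.
    [ "${missing}" -eq 0 ]
}

# Usage (paths from the example spec):
# check_ocdnet_split /data/ocdnet/train && check_ocdnet_split /data/ocdnet/test
```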
Below is an example label file from the public ICDAR2015 dataset:
$ cat ICDAR2015/test/gt/gt_img_14.txt
268,82,335,93,332,164,267,164,the
344,94,433,112,427,159,336,163,Future
208,191,374,184,371,213,208,241,Communications
370,176,420,176,416,204,373,213,###
1,57,261,76,261,187,0,190,venting
1,208,203,200,203,241,3,294,ntelligence.
Each line of the label file contains the coordinates of the four corner points of a text region (x1,y1,x2,y2,x3,y3,x4,y4), followed by the text. If the text is ### and the training spec file sets ignore_tags to ['###'], then those lines are ignored during training.
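The following sketch shows how the ignore rule applies to a single label file by counting used and ignored lines. It makes two assumptions. First, the first eight comma-separated fields are the coordinates and everything after them is the text, so transcriptions that contain commas stay intact. Second, the ignore tags mirror ignore_tags: ['*', '###'] from the example spec. summarize_labels is a hypothetical helper.

```shell
# Hypothetical helper: count label lines used for training vs. ignored.
# Assumes fields 1-8 are x1,y1,...,x4,y4 and the rest of the line is the text.
summarize_labels() {
    awk -F',' '
        { sub(/\r$/, "") }                     # tolerate Windows line endings
        NF >= 9 {
            text = $9                          # re-join text that contains commas
            for (i = 10; i <= NF; i++) text = text "," $i
            if (text == "###" || text == "*") ignored++
            else used++
        }
        END { printf "used=%d ignored=%d\n", used, ignored }
    ' "$1"
}

# Usage: summarize_labels ICDAR2015/test/gt/gt_img_14.txt
```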
The spec file for OCDNet includes the model, train, dataset, and evaluate parameters, as well as other global parameters. Below is an example spec file for training an OCDNet model with a deformable_resnet18 backbone on an ICDAR2015 dataset.
num_gpus: 1
model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  pretrained_model_path: '/data/ocdnet/ocdnet_deformable_resnet18.pth'
  backbone: deformable_resnet18
train:
  results_dir: /results/train
  num_epochs: 30
  #resume_training_checkpoint_path: '/results/train/resume.pth'
  checkpoint_interval: 1
  validation_interval: 1
  trainer:
    clip_grad_norm: 5.0
  optimizer:
    type: Adam
    args:
      lr: 0.001
  lr_scheduler:
    type: WarmupPolyLR
    args:
      warmup_epoch: 3
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5
  metric:
    type: QuadMetric
    args:
      is_output_polygon: false
dataset:
  train_dataset:
    data_name: ICDAR2015Dataset
    data_path: ['/data/ocdnet/train']
    args:
      pre_processes:
        - type: IaaAugment
          args:
            - {'type':Fliplr, 'args':{'p':0.5}}
            - {'type': Affine, 'args':{'rotate':[-10,10]}}
            - {'type':Resize,'args':{'size':[0.5,3]}}
        - type: EastRandomCropData
          args:
            size: [640,640]
            max_tries: 50
            keep_ratio: true
        - type: MakeBorderMap
          args:
            shrink_ratio: 0.4
            thresh_min: 0.3
            thresh_max: 0.7
        - type: MakeShrinkMap
          args:
            shrink_ratio: 0.4
            min_text_size: 8
      img_mode: BGR
      filter_keys: [img_path,img_name,text_polys,texts,ignore_tags,shape]
      ignore_tags: ['*', '###']
    loader:
      batch_size: 4
      pin_memory: true
      num_workers: 4
  validate_dataset:
    data_name: ICDAR2015Dataset
    data_path: ['/data/ocdnet/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      pin_memory: false
      num_workers: 4
The top-level parameters of the spec file are described in the table below.
Parameter | Data Type | Default | Description | Supported Values
num_gpus | unsigned int | 1 | The number of GPUs | >0
Model
The model parameter provides the list of parameters for the model.
Parameter | Data Type | Default | Description | Supported Values
load_pruned_graph | bool | – | A flag specifying whether to load the pruned graph. Set to True if train/evaluate/export/inference is performed against a pruned model. | true/false
pruned_graph_path | string | – | The path to the pruned graph model (if load_pruned_graph is set to True) | unix path
pretrained_model_path | string | – | The path to the pretrained model | unix path
backbone | string | deformable_resnet18 | The backbone of the model | deformable_resnet18
Train
The train parameter provides the parameters for training.
Parameter | Data Type | Default | Description | Supported Values
results_dir | string | – | The directory for saving training results | unix path
num_epochs | unsigned int | 50 | The total number of epochs to run the experiment | >0
checkpoint_interval | unsigned int | 1 | The interval at which to save the checkpoint file | >0
validation_interval | unsigned int | 1 | The interval at which to run validation | >0
optimizer | dict config | – | The configuration for the optimizer | –
lr_scheduler | dict config | – | The configuration for the learning-rate scheduler | –
post_processing | dict config | – | The configuration for post-processing | –
metric | dict config | – | The configuration for metric computation. QuadMetric is supported. If … | –
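The hmean reported during training and by the evaluate task is, by its usual ICDAR definition, the harmonic mean of precision and recall. The following minimal sketch shows that arithmetic. ocdnet_hmean is a hypothetical helper, and the sketch assumes QuadMetric uses this standard definition.

```shell
# Hypothetical helper: harmonic mean of precision and recall, 4 decimals.
ocdnet_hmean() {
    awk -v p="$1" -v r="$2" 'BEGIN {
        if (p + r == 0) { print "0.0000"; exit }   # avoid division by zero
        printf "%.4f\n", 2 * p * r / (p + r)
    }'
}

ocdnet_hmean 0.8 0.6   # precision 0.8, recall 0.6
```

Because hmean is a harmonic mean, it stays closer to the weaker of precision and recall than an arithmetic average would.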
optimizer
optimizer:
  type: Adam
  args:
    lr: 0.001
Parameter | Data Type | Default | Description | Supported Values
type | string | Adam | The optimizer type | Adam
lr | float | – | The initial learning rate | >=0.0
lr_scheduler
lr_scheduler:
  type: WarmupPolyLR
  args:
    warmup_epoch: 3
Parameter | Data Type | Default | Description | Supported Values
type | string | WarmupPolyLR | Decays the learning rate with a polynomial function. The learning rate increases to the initial value during the warmup stage and then decreases from the initial value to zero during the rest of training. | WarmupPolyLR
warmup_epoch | unsigned int | 3 | The number of warmup epochs, during which the learning rate increases to the initial value (i.e. optimizer lr) | >=0
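To see the shape of this schedule for the example spec (warmup_epoch: 3, num_epochs: 30, lr: 0.001), the following sketch evaluates it once per epoch. It rests on assumptions that are not taken from the TAO source: a linear warmup starting from zero and a polynomial power of 0.9. The actual WarmupPolyLR may step per iteration and use different constants. ocdnet_lr is a hypothetical helper.

```shell
# Hypothetical helper: learning rate at a given epoch.
# Assumed: linear warmup from 0, then (1 - progress)^0.9 decay to 0 at num_epochs.
ocdnet_lr() {
    local epoch="$1" warmup="$2" total="$3" base_lr="$4"
    awk -v e="$epoch" -v w="$warmup" -v n="$total" -v lr="$base_lr" 'BEGIN {
        power = 0.9
        if (e < w)
            v = lr * e / w                                # warmup: ramp up to lr
        else
            v = lr * (1 - (e - w) / (n - w)) ^ power      # decay: lr down to 0
        printf "%.6f\n", v
    }'
}

for epoch in 0 1 2 3 10 20 30; do
    echo "epoch ${epoch}: $(ocdnet_lr "${epoch}" 3 30 0.001)"
done
```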
post_processing
post_processing:
  type: SegDetectorRepresenter
  args:
    thresh: 0.3
    box_thresh: 0.55
    max_candidates: 1000
    unclip_ratio: 1.5
Parameter | Data Type | Default | Description | Supported Values
type | string | SegDetectorRepresenter | The name of the post-processing method, which generates bounding boxes or polygons | SegDetectorRepresenter
thresh | float | 0.3 | The threshold for binarization, which is used to generate an approximate binary map | 0.0 ~ 1.0
box_thresh | float | 0.7 | The bounding-box threshold. If the effective area is lower than this threshold, the prediction is ignored, which means no text is detected. | 0.0 ~ 1.0
max_candidates | unsigned int | 1000 | The maximum number of candidate outputs. Increase this value if text is detected in some areas of the image but is clearly missed in others. | > 1
unclip_ratio | float | 1.5 | The unclip ratio used with the Vatti clipping algorithm on the probability map. A larger ratio produces larger bounding boxes. | >0.0
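To make unclip_ratio concrete, the following sketch applies the DBNet-style expansion offset D = A × r / L, where A is the area and L the perimeter of a detected region. It assumes that OCDNet follows the DBNet formulation. It also uses only an axis-aligned box, whereas the real post-processing offsets arbitrary polygons with Vatti clipping. unclip_offset is a hypothetical helper.

```shell
# Hypothetical helper: DBNet-style unclip offset D = area * ratio / perimeter
# for a width x height box (the real step offsets general polygons).
unclip_offset() {
    local width="$1" height="$2" ratio="$3"
    awk -v w="$width" -v h="$height" -v r="$ratio" 'BEGIN {
        area = w * h
        perimeter = 2 * (w + h)
        printf "%.2f\n", area * r / perimeter
    }'
}

unclip_offset 100 20 1.5   # 2000 * 1.5 / 240 = 12.50 px outward offset
```

Raising unclip_ratio from 1.5 to 2.0 moves the same region's edges out by about 16.67 px instead of 12.50 px, which is why larger ratios produce larger boxes.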
Dataset
The dataset is defined by two sections: train_dataset and validate_dataset.
Parameter | Data Type | Default | Description | Supported Values
train_dataset | dict config | – | The configuration for the training dataset | –
validate_dataset | dict config | – | The configuration for the validation dataset | –
The parameters for train_dataset are provided below.
Parameter | Data Type | Default | Description | Supported Values
data_name | string | ICDAR2015Dataset | The dataset name. For "ICDAR2015Dataset", the label file is … | ICDAR2015Dataset
data_path | string list | – | The list of paths that contain images used for training, for example, ['/data/ocdnet/train'] | –
pre_processes | dict | – | The pre-processing configuration (see train_preprocess for more details) | –
img_mode | string | BGR | The image mode | BGR, RGB, GRAY
filter_keys | string list | – | The keys to ignore | –
ignore_tags | string list | – | The labels that are not used for training | –
batch_size | unsigned int | – | The batch size. Set to a lower value if you encounter out-of-memory errors. | >0
pin_memory | bool | False | A flag specifying whether to enable pinned memory | true/false
num_workers | unsigned int | 1 | The number of threads used to load data | >=0
train_preprocess
pre_processes:
  - type: IaaAugment
    args:
      - {'type':Fliplr, 'args':{'p':0.5}}
      - {'type': Affine, 'args':{'rotate':[-10,10]}}
      - {'type':Resize,'args':{'size':[0.5,3]}}
  - type: EastRandomCropData
    args:
      size: [640,640]
      max_tries: 50
      keep_ratio: true
  - type: MakeBorderMap
    args:
      shrink_ratio: 0.4
      thresh_min: 0.3
      thresh_max: 0.7
  - type: MakeShrinkMap
    args:
      shrink_ratio: 0.4
      min_text_size: 8
Parameter | Data Type | Default | Description | Supported Values
IaaAugment | dict list | – | Uses imgaug to perform augmentation. "Fliplr", "Affine", and "Resize" are used by default. | –
EastRandomCropData | dict config | – | The random crop applied after augmentation | –
MakeBorderMap | dict config | – | Defines the parameters for generating the threshold map | 0.0 ~ 1.0
MakeShrinkMap | dict config | – | Defines the parameters for generating the probability map | 0.0 ~ 1.0
The parameters for validate_dataset are similar to those for train_dataset, except for the pre-processing, which is described in validation_preprocess.
validation_preprocess
pre_processes:
  - type: Resize2D
    args:
      short_size:
        - 1280
        - 736
      resize_text_polys: true
Parameter | Data Type | Default | Description | Supported Values
type | string | Resize2D | Resizes the images and labels before evaluation | Resize2D
short_size | list | – | Resizes the image to (width x height) | >0, >0, and multiples of 32
resize_text_polys | bool | – | A flag specifying whether to resize the text coordinates | true/false
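Because short_size values must be multiples of 32, the following sketch checks a width/height pair and suggests the next valid size when a value does not fit. round_up_32 and check_input_size are hypothetical helpers.

```shell
# Hypothetical helpers: enforce the "multiples of 32" rule for short_size.
round_up_32() {
    echo $(( ($1 + 31) / 32 * 32 ))           # smallest multiple of 32 >= $1
}

check_input_size() {
    local w="$1" h="$2"
    if [ $((w % 32)) -eq 0 ] && [ $((h % 32)) -eq 0 ]; then
        echo "ok: ${w}x${h}"
    else
        echo "invalid: ${w}x${h}, try $(round_up_32 "$w")x$(round_up_32 "$h")"
        return 1
    fi
}

check_input_size 1280 736            # the example spec values
check_input_size 1280 720 || true    # 720 is not a multiple of 32
```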
Evaluate
The following is an example spec file for evaluating on the ICDAR2015 dataset.
model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18
evaluate:
  results_dir: /results/evaluate
  checkpoint: /results/train/model_best.pth
  gpu_id: 0
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5
  metric:
    type: QuadMetric
    args:
      is_output_polygon: false
dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      shuffle: false
      pin_memory: false
      num_workers: 4
Inference
The following is an example spec file for running inference:
model:
  load_pruned_graph: false
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18
inference:
  checkpoint: '/results/train/model_best.pth'
  input_folder: /data/ocdnet/test/img
  width: 1280
  height: 736
  img_mode: BGR
  polygon: false
  results_dir: /results/inference
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
      max_candidates: 1000
      unclip_ratio: 1.5
The inference parameter defines the hyperparameters of the inference process. Inference draws bounding boxes or polygons and visualizes them on the images.
Parameter | Data Type | Default | Description | Supported Values
checkpoint | string | – | The path to the .pth model | Unix path
input_folder | string | – | The path to the input folder for inference | Unix path
width | unsigned int | – | The input width | >=1
height | unsigned int | – | The input height | >=1
img_mode | string | – | The image mode | BGR/RGB/GRAY
polygon | bool | – | A True value specifies BBox, while a False value specifies polygon. | true, false
Use the following command to run OCDNet training:
tao model ocdnet train -e <experiment_spec_file>
-r <results_dir>
[model.pretrained_model_path=<path_to_pretrained_model_file>]
[train.resume_training_checkpoint_path=<path_to_resume_training_checkpoint>]
[num_gpus=<num_gpus>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file
Optional Arguments
-r, --results_dir
: The path to the folder where the experiment outputs should be written
num_gpus
: The number of GPUs to use for training in a multi-GPU scenario. The default value is 1.
Here’s an example of running training with a pretrained model:
tao model ocdnet train \
-e $SPECS_DIR/train.yaml \
-r $RESULTS_DIR/train \
model.pretrained_model_path=$DATA_DIR/ocdnet_deformable_resnet18.pth
Here’s an example of resuming training:
tao model ocdnet train \
-e $SPECS_DIR/train.yaml \
-r $RESULTS_DIR/train \
train.resume_training_checkpoint_path=$RESULTS_DIR/train/resume.pth
Here’s an example of running training with multiple GPUs:
tao model ocdnet train \
-e $SPECS_DIR/train.yaml \
-r $RESULTS_DIR/train \
model.pretrained_model_path=$DATA_DIR/ocdnet_deformable_resnet18.pth \
num_gpus=2
By default, training uses the DDP (distributed data parallel) strategy. When training with multiple GPUs, the hmean reported during training matches the hmean from running tao model ocdnet evaluate only if the number of evaluation images is a multiple of num_gpus * evaluate_batch_size.
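The following sketch checks this divisibility rule before launching a multi-GPU run. count_images and hmean_matches_evaluate are hypothetical helpers. The evaluation batch size is taken to be the validate_dataset loader batch_size (1 in the example specs).

```shell
# Hypothetical helpers: count evaluation images and test the rule
# num_images % (num_gpus * batch_size) == 0.
count_images() {
    find "$1" -maxdepth 1 -type f | wc -l
}

hmean_matches_evaluate() {
    local num_images="$1" num_gpus="$2" batch_size="$3"
    [ $(( num_images % (num_gpus * batch_size) )) -eq 0 ]
}

# Usage (paths from the example specs):
# n=$(count_images /data/ocdnet/test/img)
# hmean_matches_evaluate "$n" 2 1 && echo "training hmean should match evaluate"
```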
Use the following command to run OCDNet evaluation:
tao model ocdnet evaluate -e <experiment_spec_file>
[evaluate.checkpoint=<path_to_checkpoint>]
Required Arguments
-e, --experiment_spec_file
: The experiment spec file to set up the evaluation experiment
Optional Arguments
-r, --results_dir
: The path to a folder where the experiment outputs should be written
-h, --help
: Show this help message and exit.
Here’s an example of using the OCDNet evaluation command:
tao model ocdnet evaluate \
-e $SPECS_DIR/evaluate.yaml \
evaluate.checkpoint=$RESULTS_DIR/train/model_best.pth
Use the following command to run OCDNet inference:
tao model ocdnet inference -e <experiment_spec_file>
-r <results_dir>
Required Arguments
-e, --experiment_spec_file
: The experiment spec file to set up the inference experiment.
Optional Arguments
-r, --results_dir
: The path to a folder where the experiment outputs should be written
Here’s an example of using the OCDNet inference command:
tao model ocdnet inference \
-e $SPECS_DIR/inference.yaml \
inference.checkpoint=$RESULTS_DIR/train/model_best.pth \
inference.input_folder=$DATA_DIR/test/img \
inference.results_dir=$RESULTS_DIR/infer
Currently, inference expects existing label files in the gt folder. If there are no label files, generate dummy labels under the gt folder. Use the following script for reference:
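#!/bin/bash
# Create a dummy label file gt/gt_<image stem>.txt for every image in img/.
folder_path=/workspace/datasets/ICDAR2015/datasets/test
mkdir -p "${folder_path}/gt"
for filepath in "${folder_path}"/img/*; do
    filename=$(basename "${filepath}")
    # One placeholder box (clockwise corners) tagged "###" so it is treated as an ignore tag.
    echo "10,10,20,10,20,20,10,20,###" > "${folder_path}/gt/gt_${filename%.*}.txt"
done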
Model pruning reduces the number of model parameters to improve inference frames per second (FPS) while maintaining nearly the same hmean.
Pruning is applied to an already trained OCDNet model and generates a pruned graph model, which is a new model with fewer parameters. After pruning, you need to retrain this pruned graph model on the same dataset to bring the hmean back up. During retraining, enable loading of the pruned graph model and set the path to it.
The prune parameter defines the hyperparameters of the pruning process.
prune:
  checkpoint: /results/train/model_best.pth
  pruning_thresh: 0.1
  results_dir: /results/prune
dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']
    args:
      pre_processes:
        - type: Resize2D
          args:
            short_size:
              - 1280
              - 736
            resize_text_polys: true
      img_mode: BGR
      filter_keys: []
      ignore_tags: ['*', '###']
    loader:
      batch_size: 1
      shuffle: false
      pin_memory: false
      num_workers: 1
Parameter | Data Type | Default | Description | Supported Values
checkpoint | string | – | The path to the PyTorch model to prune | Unix path
pruning_thresh | float | 0.1 | The pruning threshold | 0.0 ~ 1.0
results_dir | string | – | The path to the results directory | Unix path
Use the following command to run pruning on the OCDNet model.
tao model ocdnet prune -e $SPECS_DIR/prune.yaml \
prune.checkpoint=$RESULTS_DIR/train/model_best.pth \
prune.pruning_thresh=0.1 \
prune.results_dir=$RESULTS_DIR/prune
Required Arguments
-e, --experiment_spec_file
: The experiment spec file to set up the pruning experiment.
Optional Arguments
prune.pruning_thresh
: The pruning threshold, which should be a float number between 0.0 and 1.0. The default value is 0.1.
After pruning, the pruned model can be used for retraining (i.e., fine-tuning). To start retraining, set the load_pruned_graph parameter to true and set the pruned_graph_path parameter to point to the model generated by pruning.
When retraining, evaluating, running inference on, or exporting a model that has a pruned structure, you need to set load_pruned_graph to true so that the newly pruned model structure is imported. See the following examples for more details.
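Every command that runs against a pruned model needs the same two overrides, so it can help to build them once. The following sketch assumes the pruned graph is saved as pruned_<pruning_thresh>.pth under prune.results_dir, as in the examples in this section. pruned_overrides is a hypothetical helper.

```shell
# Hypothetical helper: print the two overrides needed for a pruned model.
pruned_overrides() {
    local prune_dir="$1" thresh="$2"
    echo "model.load_pruned_graph=true model.pruned_graph_path=${prune_dir}/pruned_${thresh}.pth"
}

# Example (not executed here); the unquoted $(...) expands to two arguments:
# tao model ocdnet train -e $SPECS_DIR/train.yaml -r $RESULTS_DIR/retrain \
#     $(pruned_overrides "$RESULTS_DIR/prune" 0.1)
pruned_overrides /results/prune 0.1
```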
Here’s an example of running training with a pruned model:
tao model ocdnet train -e $SPECS_DIR/train.yaml \
-r $RESULTS_DIR/retrain \
model.load_pruned_graph=true \
model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth
Here’s an example of resuming training against a pruned model:
tao model ocdnet train \
-e $SPECS_DIR/train.yaml \
-r $RESULTS_DIR/retrain \
model.load_pruned_graph=true \
model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
train.resume_training_checkpoint_path=$RESULTS_DIR/retrain/resume.pth
Here’s an example of running evaluation against a pruned model:
tao model ocdnet evaluate \
-e $SPECS_DIR/evaluate.yaml \
-r $RESULTS_DIR/evaluate \
model.load_pruned_graph=true \
model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
evaluate.checkpoint=$RESULTS_DIR/train/model_best.pth
Here’s an example of running inference against a pruned model:
tao model ocdnet inference \
-e $SPECS_DIR/inference.yaml \
model.load_pruned_graph=true \
model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
inference.checkpoint=$RESULTS_DIR/train/model_best.pth \
inference.input_folder=$DATA_DIR/test/img \
inference.results_dir=$RESULTS_DIR/infer
Here’s an example of running export against a pruned model:
tao model ocdnet export \
-e $SPECS_DIR/export.yaml \
model.load_pruned_graph=true \
model.pruned_graph_path=$RESULTS_DIR/prune/pruned_$thresh.pth \
export.checkpoint=$RESULTS_DIR/train/model_best.pth \
export.onnx_file=$RESULTS_DIR/export/model_best.onnx
The export parameter defines the hyperparameters of the export process.
model:
  load_pruned_graph: False
  pruned_graph_path: '/results/prune/pruned_0.1.pth'
  backbone: deformable_resnet18
export:
  results_dir: /results/export
  checkpoint: '/results/train/model_best.pth'
  onnx_file: '/results/export/model_best.onnx'
  width: 1280
  height: 736
dataset:
  validate_dataset:
    data_path: ['/data/ocdnet/test']
Parameter | Data Type | Default | Description | Supported Values
checkpoint | string | – | The path to the PyTorch model to export | Unix path
onnx_file | string | – | The path to the ONNX file | Unix path
opset_version | unsigned int | 11 | The opset version of the exported ONNX model | >0
width | unsigned int | 1280 | The input width | >0
height | unsigned int | 736 | The input height | >0
Use the following command to export an OCDNet model:
tao model ocdnet export -e <experiment_spec_file>
export.checkpoint=<path_to_pth_file>
export.onnx_file=<path_to_onnx_file>
Required Arguments
-e, --experiment_spec_file
: The experiment spec file to set up export
export.checkpoint
: The path to the .pth model to export
export.onnx_file
: The path to save the exported ONNX model to
Optional Arguments
-h, --help
: Show this help message and exit.
Here’s an example of using the OCDNet export command:
tao model ocdnet export \
-e $SPECS_DIR/export.yaml \
export.checkpoint=$RESULTS_DIR/train/model_best.pth \
export.onnx_file=$RESULTS_DIR/export/model_best.onnx
For deployment, refer to the TAO Deploy documentation.
If you are running the OCDNet TensorRT engine outside of tao deploy, check whether your TensorRT plugin library already contains the required plugin. On an x86 platform, run
nm -gDC /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin
On a Jetson platform, run
nm -gDC /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so | grep ModulatedDeformableConvPlugin
If the command produces no output, you need to compile and replace the TensorRT OSS plugin, because OCDNet requires the modulatedDeformConvPlugin.
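The following sketch wraps the nm | grep check so that the library path is chosen from the machine architecture. It assumes the default library locations shown above. plugin_lib_path and has_mdc_plugin are hypothetical helpers.

```shell
# Hypothetical helpers: locate the default plugin library and check whether
# it exports ModulatedDeformableConvPlugin (same nm | grep test as above).
plugin_lib_path() {
    case "$1" in
        x86_64)  echo /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so ;;
        aarch64) echo /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so ;;
        *)       return 1 ;;
    esac
}

has_mdc_plugin() {
    local lib="$1"
    [ -e "$lib" ] || return 2                  # library not found at all
    nm -gDC "$lib" 2>/dev/null | grep -q ModulatedDeformableConvPlugin
}

if lib=$(plugin_lib_path "$(uname -m)"); then
    has_mdc_plugin "$lib"
    case $? in
        0) echo "$lib already provides ModulatedDeformableConvPlugin" ;;
        2) echo "$lib not found" ;;
        *) echo "$lib lacks ModulatedDeformableConvPlugin; rebuild it from TensorRT OSS" ;;
    esac
else
    echo "unsupported architecture: $(uname -m)"
fi
```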
Get the TensorRT repository:
git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git submodule update --init --recursive
Compile the TensorRT libnvinfer_plugin.so file:
mkdir build && cd build
# On X86 platform
cmake ..
# On Jetson platform
cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
make nvinfer_plugin -j12
The libnvinfer_plugin.so.8.6.x file is generated under the build folder. Note that x depends on the exact TensorRT version.
Replace the default plugin library. Note that the exact plugin name depends on the TensorRT version installed on your system.
# On X86 platform, for example, if the default plugin is /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
cp libnvinfer_plugin.so.8.6.x /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.2
# On Jetson platform, for example, if the default plugin is /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2, then
cp libnvinfer_plugin.so.8.6.x /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2
Refer to the nvOCDR page for more information about deploying an OCDNet model to DeepStream.
You can run nvOCDR with the DeepStream sample or with Triton Inference Server. In particular, nvOCDR Triton supports inference on high-resolution images: it resizes the image while keeping the aspect ratio, tiles the image into small patches, runs OCDNet on each patch, and then merges the results. This helps improve hmean when a model is trained at a smaller resolution but runs inference on higher-resolution images. For images that are not high resolution, you can also set resize_keep_aspect_ratio:true, which can improve hmean because the images are resized without distortion.
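The following sketch illustrates the resize-then-tile idea by computing how many OCDNet-sized patches cover a resized image. Everything here is an assumption for illustration: non-overlapping tiles, a 1280x736 patch matching the example input size, and a target height chosen by the caller. nvOCDR's actual tiling, overlap, padding, and merge logic may differ. tile_plan is a hypothetical helper.

```shell
# Hypothetical helper: resize to a target height keeping aspect ratio, then
# count non-overlapping patch_w x patch_h tiles needed to cover the result.
tile_plan() {
    local img_w="$1" img_h="$2" target_h="$3" patch_w="$4" patch_h="$5"
    awk -v iw="$img_w" -v ih="$img_h" -v th="$target_h" \
        -v pw="$patch_w" -v ph="$patch_h" 'BEGIN {
        rw = int(iw * th / ih + 0.5)          # resized width, aspect ratio kept
        rh = th
        cols = int((rw + pw - 1) / pw)        # ceil(rw / pw)
        rows = int((rh + ph - 1) / ph)        # ceil(rh / ph)
        printf "%dx%d -> %d tiles (%d cols x %d rows)\n", rw, rh, cols * rows, cols, rows
    }'
}

tile_plan 4000 3000 1472 1280 736   # a 4000x3000 image resized to height 1472
```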