OCRNet#
OCRNet is a model to recognize characters in an image. It supports the following tasks:
dataset_convert
train
evaluate
prune
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command line:
tao model ocrnet <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
Preparing the Dataset#
The train and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format.
The original dataset should be organized in the following structure:
/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt
The gt_list.txt file contains all the ground truth text for the images; each image and its corresponding label are specified on one line of text:
0000.jpg abc
0001.jpg defg
0002.jpg zxv
...
There is a characters_list.txt file that contains all the characters found in the dataset. Each character occupies one line.
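If you already have gt_list.txt and need to build characters_list.txt from it, a minimal shell sketch like the following works, assuming the labels are separated from the file names by a single space as shown above (the paths are placeholders for your own layout):
# Extract the label column, split it into single characters,
# and keep one unique character per line (assumes GNU cut/grep/sort).
cut -d' ' -f2- /Dataset/gt_list.txt | grep -o . | sort -u > /Dataset/characters_list.txt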
Creating an Experiment Spec File#
The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:
results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/gen_trt_engine"
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| prune | dict config | – | The configuration for the pruning task | |
| export | dict config | – | The configuration of the export task | |
| dataset_convert | dict config | – | The configuration for the dataset conversion task | |
model#
The model parameter provides options to change the architecture of OCRNet.
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| TPS | Boolean | False | A flag that enables Thin-plate spline interpolation for the OCRNet input | True/False |
| num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
| backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X |
| feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
| sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
| hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
| prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn |
| input_width | Unsigned int | 100 | The input image width | >4 |
| input_height | Unsigned int | 32 | The input image height | >=32 |
| input_channel | Unsigned int | 1 | The input image channel | 1, 3 |
| quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |
dataset#
The dataset parameter provides options to set the dataset consumed in training and evaluation.
dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String |
| val_dataset_dir | String | None | The absolute path to the evaluation dataset | Dataset absolute path |
| character_list_file | String | None | The absolute path to the character list file | Absolute file path |
| max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0 |
| batch_size | Unsigned int | 32 | The batch size for training | >0 |
| workers | Unsigned int | 4 | The number of workers to preprocess the training data in parallel | >=0 |
| augmentation | Dict config | – | The augmentation config | – |
augmentation#
The augmentation parameter provides options to set the augmentation pipeline during training.
augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| keep_aspect_ratio | Bool | False | A flag to keep the aspect ratio when resizing the image to the model input size | False/True |
| aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1] |
| reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1] |
| rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1] |
| max_rotation_degree | Float | 5 | The maximum degree by which the image can be rotated | >=0 |
| blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1] |
| gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii to use when applying Gaussian blur to the image | – |
train#
The train parameter provides options to set training hyperparameters.
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | None | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optim | Dict config | – | The configuration for the optimizer | – |
| clip_grad_norm | Float | 5.0 | The threshold magnitude of the gradient L2 norm at which gradients are clipped | >=0 |
| distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp |
| pretrained_model_path | String | None | The absolute path to pretrained weights | – |
| quantize_model_path | String | None | The absolute path to a pretrained model for quantization-aware training | – |
| model_ema | Bool | False | A flag that enables model exponential moving average during training | False/True |
optim#
The optim parameter provides options to set the optimizer for the training.
optim:
  name: "adadelta"
  lr: 1.0
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | String | adadelta | The optimizer type | adadelta, adam |
| lr | Float | 1.0 | The initial learning rate for the training | >0.0 |
evaluate#
The evaluate parameter provides options to set evaluation hyperparameters.
evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for evaluation | – |
| results_dir | String | /results/evaluate | The directory to save evaluation results | |
| num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | – |
| batch_size | Unsigned int | 1 | The evaluation batch size | >0 |
prune#
The prune parameter provides options to set pruning hyperparameters.
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for pruning | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| results_dir | String | – | The absolute path to the pruning log | – |
| pruned_file | String | – | The absolute path for storing the pruned model checkpoint | – |
| prune_setting | Dict config | – | The pruning hyperparameters | – |
prune_setting#
The prune_setting parameter contains options for the pruning algorithms:
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| mode | String | amount | The pruning mode. In amount mode, a specified fraction of the prunable channels is removed; in threshold mode, channels whose importance score falls below the threshold are removed; experimental_hybrid combines the two. | amount, threshold, experimental_hybrid |
| amount | Float | – | The amount value for amount and experimental_hybrid modes | [0, 1] |
| threshold | Float | – | The threshold value for threshold mode | >=0 |
| granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
| raw_prune_score | String | L1 | The method for computing the importance of weights | L1, L2 |
inference#
The inference parameter provides options for inference.
inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for inference | – |
| results_dir | String | /results/inference | The directory to save inference results | |
| num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| inference_dataset_dir | String | – | The absolute path to the inference images directory | – |
| batch_size | Unsigned int | 1 | The inference batch size | >0 |
export#
The export parameter provides export options.
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for export | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| onnx_file | String | – | The absolute path to the exported ONNX file | – |
| results_dir | String | – | The absolute path to the export output | – |
dataset_convert#
The dataset_convert parameter provides options for dataset conversion.
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_img_dir | String | – | The absolute path to the images directory | – |
| gt_file | String | – | The absolute path to the ground truth file | – |
| results_dir | String | – | The absolute path to the directory where the converted LMDB dataset is saved | – |
Converting the Dataset#
Use the following command to convert the raw dataset to LMDB format:
tao model ocrnet dataset_convert -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[dataset_convert.<dataset_convert_option>=<dataset_convert_value>]
Required Arguments#
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments#
You can set the optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.
dataset_convert.<dataset_convert_option>: The dataset_convert options.
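For example, the following call converts a raw training set to LMDB; the spec path and all data paths below are placeholders to adjust to your own layout:
tao model ocrnet dataset_convert -e $SPECS_DIR/experiment.yaml \
    dataset_convert.input_img_dir=/data/train/images \
    dataset_convert.gt_file=/data/train/gt_list.txt \
    dataset_convert.results_dir=/data/train/lmdb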
Training the Model#
Use the following command to start OCRNet training:
tao model ocrnet train -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.optim.<optim_option>=<optim_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec: The experiment specification file to set up the training experiment.
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
Note
For training, evaluation, and inference, we expose two variables for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent (for example, num_gpus = 1 with gpu_ids = [0, 1]), they are modified to follow the setting with more GPUs (for example, num_gpus = 1 -> num_gpus = 2).
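For example, the following sketch launches a two-GPU training run and overrides the epoch count on the command line; the spec path and override values are placeholders:
tao model ocrnet train -e $SPECS_DIR/experiment.yaml \
    results_dir=/results \
    train.num_epochs=20 \
    train.num_gpus=2 \
    train.gpu_ids=[0,1]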
Checkpointing and Resuming Training#
At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. These are saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint will also be saved as ocr_model_latest.pth. Training will automatically resume from ocr_model_latest.pth if it exists in train.results_dir. This will be superseded by train.resume_training_checkpoint_path if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
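To resume explicitly from a specific intermediate checkpoint instead of the latest one, a call like the following works (the checkpoint path is a placeholder):
tao model ocrnet train -e $SPECS_DIR/experiment.yaml \
    train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth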
Evaluating the Model#
Use the following command to start OCRNet evaluation:
tao model ocrnet evaluate -e <experiment_spec_file>
evaluate.checkpoint=<model to be evaluated>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Required Arguments#
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
evaluate.<evaluate_option>: The evaluate options.
Multi-GPU evaluation is currently not supported for OCRNet.
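For example, the following evaluates the latest training checkpoint against a test LMDB dataset; both paths are placeholders:
tao model ocrnet evaluate -e $SPECS_DIR/experiment.yaml \
    evaluate.checkpoint=/results/train/ocr_model_latest.pth \
    evaluate.test_dataset_dir=/data/test/lmdb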
Pruning the Model#
Use the following command to start OCRNet pruning:
tao model ocrnet prune -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[prune.<prune_option>=<prune_option_value>]
[prune.prune_setting.<prune_setting_option>=<prune_setting_value>]
Required Arguments#
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The pruning results will be saved in results_dir/prune.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
prune.<prune_option>: The prune options.
prune.prune_setting.<prune_setting_option>: The prune setting options.
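For example, the following prunes a trained checkpoint with an overridden pruning amount; the paths are placeholders:
tao model ocrnet prune -e $SPECS_DIR/experiment.yaml \
    prune.checkpoint=/results/train/ocr_model_latest.pth \
    prune.pruned_file=/results/prune/pruned_model.pth \
    prune.prune_setting.amount=0.4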
Note
If running training, evaluation, or inference on a pruned graph, you must provide the model.pruned_graph_path parameter when running the respective task. It should be the same as the value provided for prune.pruned_file in the prune task.
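For example, a retraining run on the pruned graph could look like the following, with the pruned model path matching the prune.pruned_file value used above (all paths are placeholders):
tao model ocrnet train -e $SPECS_DIR/experiment.yaml \
    results_dir=/results/retrain \
    model.pruned_graph_path=/results/prune/pruned_model.pth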
Inference with the Model#
Use the following command to start OCRNet inference:
tao model ocrnet inference -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[inference.<inference_option>=<inference_option_value>]
Multi-GPU inference is currently not supported for OCRNet.
Required Arguments#
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The inference results will be saved in results_dir/inference.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
inference.<inference_option>: The inference options.
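For example, the following runs inference on a directory of images with the latest training checkpoint; the paths are placeholders:
tao model ocrnet inference -e $SPECS_DIR/experiment.yaml \
    inference.checkpoint=/results/train/ocr_model_latest.pth \
    inference.inference_dataset_dir=/data/test/images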
Exporting the Model#
Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:
tao model ocrnet export -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[export.<export_option>=<export_option_value>]
Required Arguments#
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments#
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The export results will be saved in results_dir/export.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
export.<export_option>: The export options.
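For example, the following exports a trained checkpoint to ONNX; the paths are placeholders:
tao model ocrnet export -e $SPECS_DIR/experiment.yaml \
    export.checkpoint=/results/train/ocr_model_latest.pth \
    export.onnx_file=/results/export/ocrnet.onnx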
TensorRT Engine Generation and Validation#
For deployment, see the TAO Deploy documentation.
Deploying to DeepStream#
For DeepStream integration, see Deploy nvOCDR to DeepStream.