OCRNet
OCRNet is a model to recognize characters in an image. It supports the following tasks:
dataset_convert
train
evaluate
prune
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:
tao model ocrnet <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
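The convention above can be sketched as a small Python helper. It assembles the argument list for one subtask: the -e spec path first, then key=value overrides. build_ocrnet_command and the paths in the usage lines are hypothetical. The helper only builds the command. It does not check that the override keys exist in your spec.

```python
import shlex
from typing import Dict, List, Optional

# Subtasks listed at the top of this page.
OCRNET_TASKS = {"dataset_convert", "train", "evaluate", "prune", "inference", "export"}


def build_ocrnet_command(sub_task: str, spec_file: str,
                         overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the argv for `tao model ocrnet <sub_task> -e <spec> key=value ...`.

    Hypothetical helper for illustration; it is not part of TAO.
    """
    if sub_task not in OCRNET_TASKS:
        raise ValueError(f"unsupported OCRNet subtask: {sub_task}")
    argv = ["tao", "model", "ocrnet", sub_task, "-e", spec_file]
    # Each override becomes one `dotted.key=value` token.
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    return argv


if __name__ == "__main__":
    cmd = build_ocrnet_command(
        "evaluate",
        "/specs/experiment.yaml",  # hypothetical spec path
        {"results_dir": "/results", "evaluate.checkpoint": "/results/train/model.pth"},
    )
    print(shlex.join(cmd))  # printable, shell-quoted form of the command
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list instead of a single string avoids shell-quoting problems when the list is passed to subprocess.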
The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format.
The original dataset should be organized in the following structure:
/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt
The gt_list.txt file contains the ground-truth text for all the images. Each line specifies one image and its corresponding label:
0000.jpg abc
0001.jpg defg
0002.jpg zxv
...
The characters_list.txt file contains all the characters found in the dataset, one character per line.
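The following is a minimal Python sketch for checking the label files before running dataset_convert. It assumes that each gt_list.txt line splits into image name and label at the first space. It also sorts the derived character list alphabetically; that ordering is an arbitrary choice. The length check uses max_label_length (25 in the example spec later on this page). The function names are hypothetical, and TAO may apply other checks of its own.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def parse_gt_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse `<image_name> <label>` lines, splitting at the first space (assumption)."""
    pairs = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, sep, label = line.partition(" ")
        if not sep or not label:
            raise ValueError(f"line {line_no}: expected '<image> <label>', got {line!r}")
        pairs.append((name, label))
    return pairs


def build_character_list(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Collect every distinct character used in the labels, sorted for stability."""
    chars: Set[str] = set()
    for _, label in pairs:
        chars.update(label)
    return sorted(chars)


def find_problems(pairs: Iterable[Tuple[str, str]], characters: Sequence[str],
                  max_label_length: int = 25) -> Dict[str, str]:
    """Map image name -> reason, for labels that training would likely reject."""
    allowed = set(characters)
    problems = {}
    for name, label in pairs:
        unknown = sorted(set(label) - allowed)
        if unknown:
            problems[name] = f"unknown characters {unknown}"
        elif len(label) > max_label_length:
            problems[name] = f"label longer than {max_label_length}"
    return problems


if __name__ == "__main__":
    gt = ["0000.jpg abc\n", "0001.jpg defg\n", "0002.jpg zxv\n"]
    pairs = parse_gt_lines(gt)
    chars = build_character_list(pairs)
    print("\n".join(chars))  # contents for characters_list.txt, one per line
    print(find_problems(pairs, chars[:-1]))  # drop 'z' to show a problem report
```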
The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:
results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  input_width: 100
  input_height: 32
  input_channel: 1
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"
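In the example spec, "??" marks values you still have to supply, either in the file or as command-line overrides. The Python sketch below walks a spec that has already been loaded into nested dictionaries (for example, with a YAML loader, not shown). It lists the dotted keys whose value is still "??". Treating "??" as the only placeholder marker is an assumption based on this example, and find_placeholders is a hypothetical name.

```python
from typing import Any, Dict, List

PLACEHOLDER = "??"  # marker used in the example spec for values you must fill in


def find_placeholders(spec: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted keys (e.g. 'evaluate.checkpoint') whose value is still '??'."""
    missing = []
    for key, value in spec.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            missing.extend(find_placeholders(value, dotted))  # recurse into sections
        elif value == PLACEHOLDER:
            missing.append(dotted)
    return missing


if __name__ == "__main__":
    spec = {
        "results_dir": "/results",
        "evaluate": {"gpu_id": 0, "checkpoint": "??", "test_dataset_dir": "??"},
        "prune": {"checkpoint": "??", "prune_setting": {"mode": "experimental_hybrid"}},
    }
    for key in find_placeholders(spec):
        print(f"set {key}=<value> on the command line or in the spec")
```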
| Parameter | Data Type | Default | Description |
|---|---|---|---|
| results_dir | String | – | The global results directory |
| encryption_key | String | – | The key to encode or decode the checkpoint |
| model | Dict config | – | The configuration for the model architecture |
| dataset | Dict config | – | The configuration for the dataset |
| train | Dict config | – | The configuration for the training process |
| evaluate | Dict config | – | The configuration for the evaluation |
| prune | Dict config | – | The configuration for the pruning |
| inference | Dict config | – | The configuration for the inference |
| export | Dict config | – | The configuration of the export |
| dataset_convert | Dict config | – | The configuration for the dataset convert |
model
The model parameter provides options to change the architecture of OCRNet.
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| TPS | Boolean | False | A flag that enables thin-plate spline (TPS) interpolation for the OCRNet input | True/False |
| | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
| backbone | String | ResNet | The backbone of the OCRNet model | ResNet |
| feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
| sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
| hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
| prediction | String | CTC | The method for encoding and decoding the output feature | CTC |
| quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |
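With prediction: CTC, a CTC output layer emits one score vector per position of the feature sequence, and the text is recovered by decoding that sequence. The sketch below shows plain greedy (best-path) CTC decoding. It takes the best class at each step, merges consecutive repeats, and then drops blanks. It assumes that class 0 is the CTC blank and that class i (i ≥ 1) maps to the i-th entry of characters_list.txt. This page does not specify the index mapping, and TAO's own decoder may differ.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(scores: Sequence[Sequence[float]], characters: Sequence[str]) -> str:
    """Best-path CTC decoding of a [time_steps x num_classes] score matrix."""
    # 1. Pick the highest-scoring class at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    decoded: List[str] = []
    previous = None
    for idx in best:
        # 2. Merge repeated indices and 3. skip blanks.
        if idx != previous and idx != BLANK:
            decoded.append(characters[idx - 1])  # shift past the blank slot
        previous = idx
    return "".join(decoded)


if __name__ == "__main__":
    chars = ["a", "b", "c"]
    path = [1, 1, 0, 1, 2, 2, 0, 3]  # a a _ a b b _ c
    scores = [[1.0 if c == i else 0.0 for c in range(len(chars) + 1)] for i in path]
    print(ctc_greedy_decode(scores, chars))  # aabc
```

The blank between the two runs of "a" is what allows a doubled letter to survive the repeat-merging step.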
dataset
The dataset parameter provides options to set the dataset consumed in training and evaluation.
dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  input_width: 100
  input_height: 32
  input_channel: 1
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String |
| val_dataset_dir | String | None | The absolute path to the evaluation dataset | dataset absolute path |
| character_list_file | String | None | The absolute path to the character list file | absolute file path |
| input_width | Unsigned int | 100 | The input image width | >4 |
| input_height | Unsigned int | 32 | The input image height | >=32 |
| input_channel | Unsigned int | 1 | The number of input image channels | 1,3 |
| max_label_length | Unsigned int | 25 | The maximum length of a ground-truth label | >0 |
| batch_size | Unsigned int | 32 | The batch size for training | >0 |
| workers | Unsigned int | 4 | The number of workers used to preprocess the training data in parallel | >=0 |
| augmentation | Dict config | – | The augmentation config. Currently, only the keep_aspect_ratio option is supported. | – |
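This page does not describe exactly how keep_aspect_ratio changes preprocessing. The Python sketch below shows one common convention, assumed here for illustration only:

- When the flag is off, every image is stretched to input_width x input_height.
- When the flag is on, the image is scaled to input_height while keeping its width/height ratio. The resulting width is capped at input_width, and the remainder is padded on the right.

target_size is a hypothetical helper, and TAO's own resizing may differ.

```python
import math
from typing import Tuple


def target_size(width: int, height: int, input_width: int = 100, input_height: int = 32,
                keep_aspect_ratio: bool = False) -> Tuple[int, int, int]:
    """Return (resized_width, resized_height, right_padding) for one image.

    Assumed resize/pad convention for illustration; not TAO's confirmed behavior.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if not keep_aspect_ratio:
        # Stretch to the fixed network input size; no padding needed.
        return input_width, input_height, 0
    # Scale to the network height, preserve the aspect ratio, never exceed input_width.
    resized_width = min(input_width, max(1, math.ceil(input_height * width / height)))
    return resized_width, input_height, input_width - resized_width


if __name__ == "__main__":
    for w, h in [(200, 100), (400, 40)]:
        print((w, h), target_size(w, h), target_size(w, h, keep_aspect_ratio=True))
```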
train
The train parameter provides options to set training hyperparameters.
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| seed | Unsigned int | 1111 | The random seed for random, numpy, and torch | >0 |
| results_dir | String | – | The absolute path to the train results and output (logs, checkpoints) | – |
| gpu_ids | List of Unsigned int | [0] | A list of GPU device indices for training | List of GPU indices |
| optim | Dict config | – | The configuration for the optimizer | – |
| clip_grad_norm | Float | 5.0 | The threshold on the L2 norm of the gradients; gradients whose norm exceeds it are clipped | >4 |
| num_epochs | Unsigned int | 10 | The number of training epochs | >0 |
| checkpoint_interval | Unsigned int | 2 | The interval for saving the checkpoint during training | >0 |
| validation_interval | Unsigned int | 25 | The interval for performing validation during training | >0 |
| | String | ddp | The distributed strategy for multi-GPU training | ddp |
| | String | None | The absolute path to a checkpoint for resuming training | – |
| | String | None | The absolute path to pretrained weights | – |
| | String | None | The absolute path to pretrained models for quantization-aware training | – |
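clip_grad_norm bounds the size of each update. The Python sketch below applies the standard global-norm clipping rule. It computes one L2 norm over all gradient values. If that norm exceeds the threshold, it scales every gradient by threshold / norm, so the clipped gradients have a norm exactly equal to the threshold. Plain lists stand in for gradient tensors. Whether TAO applies this rule through a particular PyTorch utility is an assumption not stated on this page.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float = 5.0) -> List[List[float]]:
    """Scale all gradients so that their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total_norm <= max_norm:
        return [list(tensor) for tensor in grads]  # already within the bound
    scale = max_norm / total_norm
    return [[g * scale for g in tensor] for tensor in grads]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.0, 12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
    print(clip_by_global_norm(grads, max_norm=5.0))
```

Scaling all gradients by the same factor preserves the update direction and only shortens it.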
optim
The optim parameter provides options to set the optimizer for training.
optim:
  name: "adadelta"
  lr: 1.0
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | String | adadelta | The optimizer type | adadelta, adam |
| lr | Float | 1.0 | The initial learning rate for the training | >0.0 |
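With name: "adadelta", lr acts as a multiplier on an update that Adadelta already scales per parameter. It uses running averages of squared gradients and squared updates, which is why 1.0 is also PyTorch's default learning rate for Adadelta. The scalar sketch below follows the update rule used by torch.optim.Adadelta, with that optimizer's defaults of rho=0.9 and eps=1e-6. The rho and eps values TAO actually uses are not stated on this page and are assumed here.

```python
import math
from dataclasses import dataclass


@dataclass
class AdadeltaState:
    square_avg: float = 0.0  # running average of squared gradients
    acc_delta: float = 0.0   # running average of squared updates


def adadelta_step(param: float, grad: float, state: AdadeltaState,
                  lr: float = 1.0, rho: float = 0.9, eps: float = 1e-6) -> float:
    """Apply one Adadelta update to a scalar parameter and return its new value."""
    state.square_avg = rho * state.square_avg + (1 - rho) * grad * grad
    std = math.sqrt(state.square_avg + eps)
    # Step size is the RMS of past updates divided by the RMS of gradients.
    delta = math.sqrt(state.acc_delta + eps) / std * grad
    state.acc_delta = rho * state.acc_delta + (1 - rho) * delta * delta
    return param - lr * delta


if __name__ == "__main__":
    # Minimize f(x) = x**2, whose gradient is 2x.
    state, x = AdadeltaState(), 1.0
    for _ in range(100):
        x = adadelta_step(x, 2 * x, state)
    print(x)
```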
evaluate
The evaluate parameter provides options to set evaluation hyperparameters.
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for evaluation | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | – |
| results_dir | String | – | The absolute path to the evaluation output | – |
| | Unsigned int | 1 | The evaluation batch size | >0 |
prune
The prune parameter provides options to set prune hyperparameters.
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for pruning | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| results_dir | String | – | The absolute path to the pruning log | – |
| pruned_file | String | – | The absolute path for storing the pruned model checkpoint | – |
| prune_setting | Dict config | – | The pruning hyperparameters | – |
prune_setting
The prune_setting parameter contains options for the pruning algorithms:
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| mode | String | amount | The pruning mode | amount, threshold, experimental_hybrid |
| amount | Float | – | The amount value for the amount and experimental_hybrid modes | [0, 1] |
| | Float | – | The threshold value for threshold mode | >=0 |
| granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
| raw_prune_score | String | L1 | The method for computing the importance of weights | L1, L2 |
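The Python sketch below illustrates how amount, a threshold, granularity, and raw_prune_score could interact for a single layer. Each output channel is scored by the L1 or L2 norm of its weights, and the highest-scoring channels are kept. The kept count is then rounded up to a multiple of granularity. The rounding direction and the selection rule are assumptions made for illustration. The experimental_hybrid mode is not modeled, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def channel_scores(weights: Sequence[Sequence[float]], norm: str = "L1") -> List[float]:
    """Score each output channel (one row of weights) by its L1 or L2 norm."""
    if norm == "L1":
        return [sum(abs(w) for w in row) for row in weights]
    if norm == "L2":
        return [math.sqrt(sum(w * w for w in row)) for row in weights]
    raise ValueError(f"unsupported norm: {norm}")


def channels_to_keep(weights: Sequence[Sequence[float]], mode: str = "amount",
                     amount: float = 0.4, threshold: float = 0.0,
                     granularity: int = 8, norm: str = "L1") -> List[int]:
    """Return sorted indices of the output channels that survive pruning."""
    if granularity <= 0:
        raise ValueError("granularity must be > 0")
    scores = channel_scores(weights, norm)
    n = len(scores)
    if mode == "amount":
        keep = n - int(n * amount)  # prune roughly `amount` of the channels
    elif mode == "threshold":
        keep = sum(1 for s in scores if s >= threshold)
    else:
        raise ValueError(f"mode not modeled in this sketch: {mode}")
    # Round up to a multiple of granularity (assumption); keep at least one group, at most n.
    keep = min(n, max(granularity, math.ceil(keep / granularity) * granularity))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    layer = [[float(i), float(-i)] for i in range(16)]  # channel i has L1 score 2*i
    print(channels_to_keep(layer, amount=0.5, granularity=8))
```

In a layer with 512 output channels, amount 0.4 and granularity 8 would keep 312 channels under this rounding rule, not 308.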
inference
The inference parameter provides options for inference.
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for inference | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| inference_dataset_dir | String | – | The absolute path to the inference images directory | – |
| results_dir | String | – | The absolute path to the inference output | – |
| | Unsigned int | 1 | The inference batch size | >0 |
export
The export parameter provides export options.
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for export | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| onnx_file | String | – | The absolute path to the exported ONNX file | – |
| results_dir | String | – | The absolute path to the export output | – |
dataset_convert
The dataset_convert parameter provides options for dataset conversion.
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_img_dir | String | – | The absolute path to the images directory | – |
| gt_file | String | – | The absolute path to the ground truth file | – |
| results_dir | String | – | The absolute path to the conversion output (the generated LMDB dataset) | – |
Use the following command to convert the raw dataset to LMDB format:
tao model ocrnet dataset_convert -e <experiment_spec_file>
results_dir=<global_results_dir>
[dataset_convert.<dataset_convert_option>=<dataset_convert_value>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file.
results_dir
: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.
Optional Arguments
You can set the optional arguments to override the option values in the experiment spec file.
dataset_convert.<dataset_convert_option>
: The dataset_convert options.
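An override such as dataset_convert.gt_file=/data/gt_list.txt replaces the value at that dotted path in the spec. The simplified Python sketch below models that behavior on a spec loaded into nested dictionaries. In TAO's PyTorch tasks, the real parsing is handled by Hydra, which also does type conversion, list syntax, and interpolation; this sketch does none of that and keeps every value as a string. apply_overrides is a hypothetical name. Like a strict config, it rejects keys that are not already in the spec.

```python
import copy
from typing import Any, Dict, Iterable


def apply_overrides(spec: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of spec with each `a.b.c=value` override applied (values stay strings)."""
    result = copy.deepcopy(spec)
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep or not dotted_key:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk into nested sections; unknown sections are rejected.
            if not isinstance(node.get(part), dict):
                raise KeyError(f"no section {part!r} in {dotted_key!r}")
            node = node[part]
        if leaf not in node:
            raise KeyError(f"unknown option {dotted_key!r}")
        node[leaf] = value
    return result


if __name__ == "__main__":
    spec = {"results_dir": "/results", "dataset_convert": {"input_img_dir": "??", "gt_file": "??"}}
    print(apply_overrides(spec, ["dataset_convert.gt_file=/data/gt_list.txt"]))
```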
Here’s an example of using the OCRNet dataset_convert command:
tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
dataset_convert.gt_file=$TRAIN_GT \
dataset_convert.results_dir=$TRAIN_LMDB_PATH
Use the following command to start OCRNet training:
tao model ocrnet train -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.optim.<optim_option>=<optim_option_value>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file.
results_dir
: The global results directory. The train results will be saved in results_dir/train.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>
: The model options.
dataset.<dataset_option>
: The dataset options.
train.<train_option>
: The train options.
train.optim.<optim_option>
: The optimizer options.
Here’s an example of using the OCRNet train command:
tao model ocrnet train -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
dataset.train_dataset_dir=[$TRAIN_LMDB_PATH] \
dataset.val_dataset_dir=$VAL_LMDB_PATH \
dataset.character_list_file=$CHARACTER_LIST
Use the following command to start OCRNet evaluation:
tao model ocrnet evaluate -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[evaluate.<evaluate_option>=<evaluate_option_value>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file.
results_dir
: The global results directory. The evaluation results will be saved in results_dir/evaluate.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>
: The model options.
dataset.<dataset_option>
: The dataset options.
evaluate.<evaluate_option>
: The evaluate options.
Here’s an example of using the OCRNet evaluate command:
tao model ocrnet evaluate -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
evaluate.checkpoint=$TRAINED_TAO_MODEL \
evaluate.test_dataset_dir=$VAL_LMDB_PATH \
dataset.character_list_file=$CHARACTER_LIST
Use the following command to start OCRNet pruning:
tao model ocrnet prune -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[prune.<prune_option>=<prune_option_value>]
[prune.prune_setting.<prune_setting_option>=<prune_setting_value>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file.
results_dir
: The global results directory. The pruning results will be saved in results_dir/prune.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>
: The model options.
dataset.<dataset_option>
: The dataset options.
prune.<prune_option>
: The prune options.
prune.prune_setting.<prune_setting_option>
: The prune setting options.
Here’s an example of using the OCRNet prune command:
tao model ocrnet prune -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
prune.checkpoint=$TRAINED_TAO_MODEL \
prune.pruned_file=$PRUNED_TAO_MODEL
Use the following command to start OCRNet inference:
tao model ocrnet inference -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[inference.<inference_option>=<inference_option_value>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file.
results_dir
: The global results directory. The inference results will be saved in results_dir/inference.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>
: The model options.
dataset.<dataset_option>
: The dataset options.
inference.<inference_option>
: The inference options.
Here’s an example of using the OCRNet inference command:
tao model ocrnet inference -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
inference.checkpoint=$TRAINED_TAO_MODEL \
inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR
Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:
tao model ocrnet export -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[export.<export_option>=<export_option_value>]
Required Arguments
-e, --experiment_spec_file
: The path to the experiment spec file.
results_dir
: The global results directory. The export results will be saved in results_dir/export.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>
: The model options.
dataset.<dataset_option>
: The dataset options.
export.<export_option>
: The export options.
Here’s an example of using the OCRNet export command:
tao model ocrnet export -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
export.checkpoint=$TRAINED_TAO_MODEL \
export.onnx_file=$EXPORTED_ONNX_MODEL_PATH
For deployment, refer to the TAO Deploy documentation.
For DeepStream integration, refer to Deploy nvOCDR to DeepStream.