OCRNet
OCRNet is a model to recognize characters in an image. It supports the following tasks:
dataset_convert
train
evaluate
prune
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao model ocrnet <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
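The calling convention above can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named build_tao_command that is not part of TAO and a placeholder spec path; it only builds the argument list and does not run anything.

```python
import shlex


def build_tao_command(subtask, spec_file, overrides=None):
    """Build the argument list for `tao model ocrnet <subtask>`.

    build_tao_command is a hypothetical helper, not part of TAO.
    `overrides` maps dotted spec keys to values, for example
    {"results_dir": "/results"}.
    """
    args = ["tao", "model", "ocrnet", subtask, "-e", spec_file]
    for key, value in (overrides or {}).items():
        # Each override is passed as a single key=value token.
        args.append(f"{key}={value}")
    return args


cmd = build_tao_command(
    "train",
    "/specs/experiment.yaml",  # placeholder path, not from the documentation
    {"results_dir": "/results", "train.num_epochs": 10},
)
print(shlex.join(cmd))  # shell-quoted string that can be pasted into a terminal
```

Keeping the command as a list avoids quoting problems if it is later passed to subprocess.run.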
The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format.
The original dataset should be organized in the following structure:
/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt
The gt_list.txt file contains the ground truth text for all the images. Each image and its corresponding label are specified on one line of text:
0000.jpg abc
0001.jpg defg
0002.jpg zxv
...
The characters_list.txt file contains all the characters found in the dataset. Each character occupies one line.
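The relationship between gt_list.txt and characters_list.txt can be sketched as follows. This is a minimal illustration, assuming the image name and the label are separated by the first run of whitespace; parse_gt_line and collect_characters are hypothetical helper names, and whether TAO expects a particular character order is not stated here.

```python
def parse_gt_line(line):
    """Split one gt_list.txt line into (image_name, label).

    Assumes the first run of whitespace separates the image name from
    the label; a line without a label raises ValueError.
    """
    name, label = line.rstrip("\n").split(maxsplit=1)
    return name, label


def collect_characters(gt_lines):
    """Return the sorted list of distinct characters used in all labels."""
    chars = set()
    for line in gt_lines:
        if not line.strip():
            continue  # skip blank lines
        _, label = parse_gt_line(line)
        chars.update(label)
    return sorted(chars)


gt_lines = ["0000.jpg abc\n", "0001.jpg defg\n", "0002.jpg zxv\n"]
characters = collect_characters(gt_lines)
# One character per line, matching the layout of characters_list.txt.
print("\n".join(characters))
```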
The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export).
Here is an example spec file used in the OCRNet get_started notebook:
results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"
Parameter | Datatype | Default | Description |
---|---|---|---|
results_dir | String | – | The global results directory |
encryption_key | String | – | The key to encode or decode the checkpoint |
model | Dict config | – | The configuration for the model architecture |
dataset | Dict config | – | The configuration for the dataset |
train | Dict config | – | The configuration for the training process |
evaluate | Dict config | – | The configuration for the evaluation |
prune | Dict config | – | The configuration for the pruning |
inference | Dict config | – | The configuration for the inference |
export | Dict config | – | The configuration for the export |
dataset_convert | Dict config | – | The configuration for the dataset conversion |
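In the example spec, per-task values such as "${results_dir}/evaluate" refer back to the top-level results_dir key. The sketch below illustrates that substitution using only the standard library. resolve_interpolations is a hypothetical helper, and it is an assumption that the loader used by TAO resolves these references the same way; nested or more complex references are not handled here.

```python
import re

# Matches ${name} where name is a simple top-level key.
_REF = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def resolve_interpolations(value, root):
    """Replace ${key} references in a string with top-level values from root."""
    def substitute(match):
        return str(root[match.group(1)])
    return _REF.sub(substitute, value)


spec = {
    "results_dir": "/results",
    "evaluate": {"results_dir": "${results_dir}/evaluate"},
}
resolved = resolve_interpolations(spec["evaluate"]["results_dir"], spec)
print(resolved)
```

Under this reading, overriding results_dir on the command line changes every per-task output directory that references it.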
model
The model parameter provides options to change the architecture of OCRNet.
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
TPS | Boolean | False | A flag that enables thin-plate spline (TPS) interpolation for the OCRNet input | True/False |
num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X |
feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn |
input_width | Unsigned int | 100 | The input image width | >4 |
input_height | Unsigned int | 32 | The input image height | >=32 |
input_channel | Unsigned int | 1 | The number of input image channels | 1, 3 |
quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |
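The supported values in the model table can be checked before a run is launched. The following is a minimal sketch of such a pre-flight check; check_model_config is a hypothetical helper that encodes only a subset of the constraints listed in the table, not TAO's own validation.

```python
SUPPORTED_BACKBONES = {"ResNet", "ResNet2X", "FAN_tiny_2X"}
SUPPORTED_PREDICTIONS = {"CTC", "Attn"}


def check_model_config(model):
    """Return a list of problems found in a `model` config dict.

    Missing keys fall back to the defaults listed in the table.
    """
    problems = []
    if model.get("backbone", "ResNet") not in SUPPORTED_BACKBONES:
        problems.append("backbone must be one of: " + ", ".join(sorted(SUPPORTED_BACKBONES)))
    if model.get("sequence", "BiLSTM") != "BiLSTM":
        problems.append("sequence must be BiLSTM")
    if model.get("prediction", "CTC") not in SUPPORTED_PREDICTIONS:
        problems.append("prediction must be CTC or Attn")
    if model.get("input_channel", 1) not in (1, 3):
        problems.append("input_channel must be 1 or 3")
    for key, default in (("feature_channel", 512), ("hidden_size", 256)):
        if model.get(key, default) <= 0:
            problems.append(f"{key} must be greater than 0")
    return problems


example = {"backbone": "ResNet", "prediction": "CTC", "input_channel": 1}
for problem in check_model_config(example):
    print(problem)
```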
dataset
The dataset parameter provides options to set the dataset consumed in training and evaluation.
dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String |
val_dataset_dir | String | None | The absolute path to the evaluation dataset | dataset absolute path |
character_list_file | String | None | The absolute path to the character list file | absolute file path |
max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0 |
batch_size | Unsigned int | 32 | The batch size for training | >0 |
workers | Unsigned int | 4 | The number of workers used to preprocess the training data in parallel | >=0 |
augmentation | Dict config | – | The augmentation configuration | – |
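Because max_label_length and character_list_file both constrain the labels, it can help to scan the ground truth before converting or training. The sketch below is a hypothetical pre-check (find_invalid_labels is not a TAO function); how TAO itself treats such samples is not stated here.

```python
def find_invalid_labels(samples, characters, max_label_length=25):
    """Return (image_name, reason) pairs for labels that look problematic.

    `samples` is an iterable of (image_name, label) pairs and
    `characters` is an iterable of the characters in the character list.
    """
    allowed = set(characters)
    invalid = []
    for name, label in samples:
        if len(label) > max_label_length:
            invalid.append((name, "label longer than max_label_length"))
        elif not set(label) <= allowed:
            invalid.append((name, "label has characters missing from the character list"))
    return invalid


samples = [("0000.jpg", "abc"), ("0001.jpg", "a" * 30), ("0002.jpg", "ab?")]
issues = find_invalid_labels(samples, characters="abc", max_label_length=25)
for name, reason in issues:
    print(name, reason)
```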
augmentation
The augmentation parameter provides options to configure the augmentation pipeline used during training.
augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
keep_aspect_ratio | Bool | False | A flag that keeps the aspect ratio when resizing the image to the model input size | False/True |
aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1] |
reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1] |
rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1] |
max_rotation_degree | Float | 0.5 | The maximum number of degrees the image will be rotated | >=0 |
blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1] |
gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii to use when applying Gaussian blur to the image | – |
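One way to read the probabilities in the table is as a two-stage draw: aug_prob decides whether an image is augmented at all, and each individual probability then decides whether that augmentation is applied. The sketch below illustrates that reading only. The two-stage semantics and the helper name pick_augmentations are assumptions, not a description of TAO's implementation.

```python
import random


def pick_augmentations(aug, rng):
    """Return the names of the augmentations selected for one image.

    Assumed semantics: aug_prob gates the whole pipeline, then each
    augmentation is drawn independently with its own probability.
    """
    if rng.random() >= aug.get("aug_prob", 0.0):
        return []  # this image is left unaugmented
    selected = []
    if rng.random() < aug.get("reverse_color_prob", 0.5):
        selected.append("reverse_color")
    if rng.random() < aug.get("rotate_prob", 0.5):
        selected.append("rotate")
    if rng.random() < aug.get("blur_prob", 0.5):
        selected.append("blur")
    return selected


rng = random.Random(1111)
config = {"aug_prob": 0.3, "reverse_color_prob": 0.5, "rotate_prob": 0.5, "blur_prob": 0.5}
counts = sum(bool(pick_augmentations(config, rng)) for _ in range(1000))
print(f"{counts} of 1000 images received at least one augmentation")
```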
train
The train parameter provides options to set training hyperparameters.
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
seed | Unsigned int | 1111 | The random seed for random, numpy, and torch | >0 |
results_dir | String | – | The absolute path to the train results and output (log, checkpoints) | – |
gpu_ids | List of Unsigned int | [0] | A list of GPU device indices for training | List of GPU indices |
| | Unsigned int | 1 | The number of GPUs to be used for training | – |
optim | Dict config | – | The configuration for the optimizer | – |
clip_grad_norm | Float | 5.0 | The threshold on the gradient L2 norm above which gradients are clipped | >4 |
num_epochs | Unsigned int | 10 | The number of training epochs | >0 |
checkpoint_interval | Unsigned int | 2 | The interval for saving the checkpoint during training | >0 |
validation_interval | Unsigned int | 25 | The interval for performing validation during training | >0 |
distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp |
resume_training_checkpoint_path | String | None | The absolute path to a checkpoint for resuming training | – |
pretrained_model_path | String | None | The absolute path to pretrained weights | – |
quantize_model_path | String | None | The absolute path to pretrained models for quantization-aware training | – |
model_ema | Bool | False | A flag that enables the model exponential moving average during training | False/True |
optim
The optim parameter provides options to set the optimizer for training.
optim:
  name: "adadelta"
  lr: 1.0
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
name | String | adadelta | The optimizer type | adadelta, adam |
lr | Float | 1.0 | The initial learning rate for the training | >0.0 |
evaluate
The evaluate parameter provides options to set evaluation hyperparameters.
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for evaluation | – |
gpu_id | Unsigned int | 0 | The GPU device index | A valid gpu index |
test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | – |
results_dir | String | – | The absolute path to the evaluation output | – |
batch_size | Unsigned int | 1 | The evaluation batch size | >0 |
prune
The prune parameter provides options to set pruning hyperparameters.
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for pruning | – |
gpu_id | Unsigned int | 0 | The GPU device index | A valid gpu index |
results_dir | String | – | The absolute path to the pruning log | – |
pruned_file | String | – | The absolute path for storing the pruned model checkpoint | – |
prune_setting | Dict config | – | The pruning hyperparameters | – |
prune_setting
The prune_setting parameter contains options for the pruning algorithms:
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
mode | String | amount | The pruning mode | amount, threshold, experimental_hybrid |
amount | Float | – | The amount value for the amount and experimental_hybrid modes | [0, 1] |
threshold | Float | – | The threshold value for the threshold mode | >=0 |
granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
raw_prune_score | Dict config | L1 | The method for computing the importance of weights | L1, L2 |
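The effect of granularity on a layer's channel count can be illustrated with a small calculation. This is a sketch under stated assumptions, not TAO's pruning algorithm: it treats amount as the fraction of output channels removed and rounds the remainder up to the next multiple of granularity. channels_after_pruning is a hypothetical helper.

```python
import math


def channels_after_pruning(num_channels, amount, granularity=8):
    """Estimate how many output channels remain after pruning.

    Assumptions (not confirmed by the table above): `amount` is the
    fraction of channels removed, and the remainder is rounded up to
    the next multiple of `granularity`. The result is kept between one
    granularity step and the original channel count.
    """
    target = num_channels * (1.0 - amount)
    kept = math.ceil(target / granularity) * granularity
    return max(granularity, min(kept, num_channels))


# Values taken from the example spec: amount 0.4, granularity 8,
# applied to a 512-channel layer (the default feature_channel).
remaining = channels_after_pruning(512, amount=0.4, granularity=8)
print(remaining)
```

Keeping channel counts at a multiple of 8 is a common choice because such shapes tend to map well onto GPU kernels, although the table itself does not give a rationale.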
inference
The inference parameter provides options for inference.
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for inference | – |
gpu_id | Unsigned int | 0 | The GPU device index | Valid gpu index |
inference_dataset_dir | String | – | The absolute path to the inference images directory | – |
results_dir | String | – | The absolute path to the inference output | – |
batch_size | Unsigned int | 1 | The inference batch size | >0 |
export
The export parameter provides export options.
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for export | – |
gpu_id | Unsigned int | 0 | The GPU device index | Valid gpu index |
onnx_file | String | – | The absolute path to the exported ONNX file | – |
results_dir | String | – | The absolute path to the export output | – |
dataset_convert
The dataset_convert parameter provides options for dataset conversion.
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
input_img_dir | String | – | The absolute path to the images directory | – |
gt_file | String | – | The absolute path to the ground truth file | – |
results_dir | String | – | The absolute path to the dataset_convert output (i.e., the LMDB dataset and log) | – |
Use the following command to convert the raw dataset to LMDB format:
tao model ocrnet dataset_convert -e <experiment_spec_file>
results_dir=<global_results_dir>
[dataset_convert.<dataset_convert_option>=<dataset_convert_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.
Optional Arguments
You can set the optional arguments to override the option values in the experiment spec file.
dataset_convert.<dataset_convert_option>: The dataset_convert options.
Here’s an example of using the OCRNet dataset_convert command:
tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
dataset_convert.gt_file=$TRAIN_GT \
dataset_convert.results_dir=$TRAIN_LMDB_PATH
Use the following command to start OCRNet training:
tao model ocrnet train -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.optim.<optim_option>=<optim_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The train results will be saved in results_dir/train.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
Here’s an example of using the OCRNet train command:
tao model ocrnet train -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
dataset.train_dataset_dir=$TRAIN_LMDB_PATH \
dataset.val_dataset_dir=$VAL_LMDB_PATH \
dataset.character_list_file=$CHARACTER_LIST
Use the following command to start OCRNet evaluation:
tao model ocrnet evaluate -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[evaluate.<evaluate_option>=<evaluate_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The evaluation results will be saved in results_dir/evaluate.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
evaluate.<evaluate_option>: The evaluate options.
Here’s an example of using the OCRNet evaluate command:
tao model ocrnet evaluate -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
evaluate.checkpoint=$TRAINED_TAO_MODEL \
evaluate.test_dataset_dir=$VAL_LMDB_PATH \
dataset.character_list_file=$CHARACTER_LIST
Use the following command to start OCRNet pruning:
tao model ocrnet prune -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[prune.<prune_option>=<prune_option_value>]
[prune.prune_setting.<prune_setting_option>=<prune_setting_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The pruning results will be saved in results_dir/prune.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
prune.<prune_option>: The prune options.
prune.prune_setting.<prune_setting_option>: The prune setting options.
Here’s an example of using the OCRNet prune command:
tao model ocrnet prune -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
prune.checkpoint=$TRAINED_TAO_MODEL \
prune.pruned_file=$PRUNED_TAO_MODEL
Use the following command to start OCRNet inference:
tao model ocrnet inference -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[inference.<inference_option>=<inference_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The inference results will be saved in results_dir/inference.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
inference.<inference_option>: The inference options.
Here’s an example of using the OCRNet inference command:
tao model ocrnet inference -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
inference.checkpoint=$TRAINED_TAO_MODEL \
inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR
Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:
tao model ocrnet export -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[export.<export_option>=<export_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The export results will be saved in results_dir/export.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
export.<export_option>: The export options.
Here’s an example of using the OCRNet export command:
tao model ocrnet export -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
export.checkpoint=$TRAINED_TAO_MODEL \
export.onnx_file=$EXPORTED_ONNX_MODEL_PATH
For deployment, refer to the TAO Deploy documentation.
For DeepStream integration, refer to Deploy nvOCDR to DeepStream.