OCRNet
OCRNet is a model to recognize characters in an image. It supports the following tasks:
dataset_convert
train
evaluate
prune
inference
export
These tasks can be invoked from the TAO Launcher using the following convention on the command-line:
tao model ocrnet <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format.
The original dataset should be organized in the following structure:
/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt
The gt_list.txt file contains the ground truth text for all the images. Each image and its corresponding label are specified on one line of text:
0000.jpg abc
0001.jpg defg
0002.jpg zxv
...
The characters_list.txt file contains all the characters found in the dataset. Each character occupies one line.
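As a rough illustration of how the two files relate, the following Python sketch derives a character list from ground truth lines. The helper names (parse_gt_line, collect_characters, write_characters_list) are hypothetical, and the sketch assumes that the image name and the label are separated by the first run of whitespace, as in the "0000.jpg abc" example; it is not part of the TAO tooling.

```python
from pathlib import Path


def parse_gt_line(line: str) -> tuple[str, str]:
    """Split one gt_list.txt line into (image_name, label).

    Assumption: the image name and the label are separated by the first
    run of whitespace, as in the "0000.jpg abc" example.
    """
    image_name, label = line.rstrip("\n").split(maxsplit=1)
    return image_name, label


def collect_characters(gt_lines: list[str]) -> list[str]:
    """Return the sorted set of characters used by all labels."""
    characters: set[str] = set()
    for line in gt_lines:
        if not line.strip():
            continue  # skip blank lines
        _, label = parse_gt_line(line)
        characters.update(label)
    return sorted(characters)


def write_characters_list(gt_path: Path, out_path: Path) -> None:
    """Write one character per line, derived from the labels in gt_path."""
    gt_lines = gt_path.read_text(encoding="utf-8").splitlines()
    body = "\n".join(collect_characters(gt_lines)) + "\n"
    out_path.write_text(body, encoding="utf-8")


# The three sample lines shown above.
sample = ["0000.jpg abc", "0001.jpg defg", "0002.jpg zxv"]
characters = collect_characters(sample)
```

A line that contains only an image name raises a ValueError in this sketch, which can be a convenient way to spot malformed entries before running dataset_convert.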
The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export).
Here is an example spec file used in the OCRNet get_started notebook:
results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"
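The spec above uses ${results_dir} references and "??" placeholders. The following Python sketch mimics, with the standard library only, how such references could expand and how the values that still need to be supplied could be listed. It is an illustration under assumptions, not the loader that TAO uses: the resolve and unfilled helpers are hypothetical, only top-level keys are resolved, and treating "??" as "must be filled in" is inferred from the example rather than stated in this document.

```python
import re

# A hand-copied fragment of the example spec, as a plain Python dict.
spec = {
    "results_dir": "/results",
    "evaluate": {
        "checkpoint": "??",
        "test_dataset_dir": "??",
        "results_dir": "${results_dir}/evaluate",
    },
    "export": {"checkpoint": "??", "results_dir": "${results_dir}/export"},
}

_REFERENCE = re.compile(r"\$\{(\w+)\}")


def resolve(node, root):
    """Recursively replace ${key} with the top-level value of that key."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, str):
        return _REFERENCE.sub(lambda match: str(root[match.group(1)]), node)
    return node


def unfilled(node, prefix=""):
    """Return dotted paths of every value still set to the "??" placeholder."""
    paths = []
    for key, value in node.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths.extend(unfilled(value, path + "."))
        elif value == "??":
            paths.append(path)
    return paths


resolved = resolve(spec, spec)
missing = unfilled(spec)
```

The dotted paths returned by unfilled have the same shape as the command-line overrides used later in this document, for example evaluate.checkpoint=<path>.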
Parameter | Data Type | Default | Description | Supported Values |
---|---|---|---|---|
model | dict config | – | The configuration of the model architecture | |
dataset | dict config | – | The configuration of the dataset | |
train | dict config | – | The configuration of the training task | |
evaluate | dict config | – | The configuration of the evaluation task | |
inference | dict config | – | The configuration of the inference task | |
encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
results_dir | string | /results | The directory where experiment results are saved | |
prune | dict config | – | The configuration of the pruning task | |
export | dict config | – | The configuration of the export task | |
dataset_convert | dict config | – | The configuration of the dataset conversion task | |
model
The model parameter provides options to change the architecture of OCRNet.
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
TPS | Boolean | False | A flag that enables thin-plate spline (TPS) interpolation for the OCRNet input | True/False |
num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X |
feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn |
input_width | Unsigned int | 100 | The input image width | >4 |
input_height | Unsigned int | 32 | The input image height | >=32 |
input_channel | Unsigned int | 1 | The number of input image channels | 1,3 |
quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |
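The supported values in the table can be checked before launching a task. The sketch below is a minimal pre-flight check written for illustration; validate_model_config and the SUPPORTED mapping are hypothetical names, they cover only the enumerated and strictly positive options listed above, and they are not the validation that TAO itself performs.

```python
# Enumerated options copied from the "Supported Values" column above.
SUPPORTED = {
    "backbone": {"ResNet", "ResNet2X", "FAN_tiny_2X"},
    "sequence": {"BiLSTM"},
    "prediction": {"CTC", "Attn"},
    "input_channel": {1, 3},
}

# Options documented as "> 0".
POSITIVE = ("feature_channel", "hidden_size")


def validate_model_config(model: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    errors = []
    for key, allowed in SUPPORTED.items():
        if key in model and model[key] not in allowed:
            choices = ", ".join(sorted(str(item) for item in allowed))
            errors.append(f"{key}={model[key]!r} is not one of: {choices}")
    for key in POSITIVE:
        if key in model and model[key] <= 0:
            errors.append(f"{key} must be > 0")
    return errors


example = {
    "TPS": True,
    "backbone": "ResNet",
    "feature_channel": 512,
    "sequence": "BiLSTM",
    "hidden_size": 256,
    "prediction": "CTC",
    "input_channel": 1,
}
problems = validate_model_config(example)
```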
dataset
The dataset parameter provides options to set the dataset consumed in training and evaluation.
dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String |
val_dataset_dir | String | None | The absolute path to the evaluation dataset | dataset absolute path |
character_list_file | String | None | The absolute path to the character list file | absolute file path |
max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0 |
batch_size | Unsigned int | 32 | The batch size for training | >0 |
workers | Unsigned int | 4 | The number of workers used to preprocess the training data in parallel | >=0 |
augmentation | Dict config | – | The augmentation config | – |
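Before training, it can help to compare the ground truth labels against max_label_length and the character list. The following sketch only reports mismatches; check_labels is a hypothetical helper, and how OCRNet treats over-length labels or unknown characters is not described here, so the sketch makes no claim about that behavior.

```python
def check_labels(labels, characters, max_label_length=25):
    """Return (too_long, unknown_chars) for a list of ground truth labels.

    too_long: labels whose length exceeds max_label_length.
    unknown_chars: characters used by the labels but absent from the
    character list, in sorted order.
    """
    allowed = set(characters)
    too_long = [label for label in labels if len(label) > max_label_length]
    used = {ch for label in labels for ch in label}
    unknown_chars = sorted(used - allowed)
    return too_long, unknown_chars


labels = ["abc", "defg", "zxv", "a" * 30]
characters = list("abcdefg")  # a deliberately incomplete character list
too_long, unknown_chars = check_labels(labels, characters)
```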
augmentation
The augmentation parameter provides options to configure the augmentation pipeline used during training.
augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
keep_aspect_ratio | Bool | False | A flag that keeps the aspect ratio when resizing the image to the model input size | False/True |
aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1] |
reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1] |
rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1] |
max_rotation_degree | Float | 0.5 | The maximum degree by which the image will be rotated | >=0 |
blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1] |
gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii to use when applying Gaussian blur to the image | – |
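One way to read the probabilities above is that aug_prob decides whether an image is augmented at all, and the individual probabilities then decide which augmentations are applied. The sketch below encodes that reading only to make it concrete. The gating order, the symmetric rotation range, and the uniform choice of blur radius are all assumptions, and sample_augmentations is a hypothetical helper rather than OCRNet code.

```python
import random


def sample_augmentations(cfg, rng):
    """Pick which augmentations to apply to one image.

    Assumption: aug_prob gates the whole pipeline, and each augmentation
    is then drawn independently with its own probability.
    """
    if rng.random() >= cfg["aug_prob"]:
        return []  # pipeline skipped for this image
    chosen = []
    if rng.random() < cfg["reverse_color_prob"]:
        chosen.append(("reverse_color", None))
    if rng.random() < cfg["rotate_prob"]:
        # Assumption: the angle is drawn uniformly from [-max, +max] degrees.
        max_degree = cfg["max_rotation_degree"]
        chosen.append(("rotate", rng.uniform(-max_degree, max_degree)))
    if rng.random() < cfg["blur_prob"]:
        # Assumption: one radius is picked uniformly from the list.
        chosen.append(("blur", rng.choice(cfg["gaussian_radius_list"])))
    return chosen


cfg = {
    "aug_prob": 0.3,
    "reverse_color_prob": 0.5,
    "rotate_prob": 0.5,
    "max_rotation_degree": 5,
    "blur_prob": 0.5,
    "gaussian_radius_list": [1, 2, 3, 4],
}
plan = sample_augmentations(cfg, random.Random(1111))
```

Under this reading, the default aug_prob of 0.0 disables augmentation entirely, regardless of the other probabilities.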
train
The train parameter provides options to set training hyperparameters.
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
results_dir | string | /results/train | The directory to save training results | |
optim | Dict config | – | The configuration for the optimizer | – |
clip_grad_norm | Float | 5.0 | The threshold on the gradient L2 norm above which gradients are clipped | >4 |
distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp |
pretrained_model_path | String | None | The absolute path to pretrained weights | – |
quantize_model_path | String | None | The absolute path to pretrained models for quantization-aware training | – |
model_ema | Bool | False | A flag that enables the model exponential moving average (EMA) during training | False/True |
optim
The optim parameter provides options to set the optimizer used for training.
optim:
  name: "adadelta"
  lr: 1.0
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
name | String | adadelta | The optimizer type | adadelta, adam |
lr | Float | 1.0 | The initial learning rate for the training | >0.0 |
evaluate
The evaluate parameter provides options to set evaluation hyperparameters.
evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for evaluation | – |
results_dir | String | /results/evaluate | The directory to save evaluation results | |
num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | – |
batch_size | Unsigned int | 1 | The evaluation batch size | >0 |
prune
The prune parameter provides options to set pruning hyperparameters.
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for pruning | – |
gpu_id | Unsigned int | 0 | The GPU device index | A valid gpu index |
results_dir | String | – | The absolute path to the pruning log | – |
pruned_file | String | – | The absolute path for storing the pruned model checkpoint | – |
prune_setting | Dict config | – | The pruning hyperparameters | – |
prune_setting
The prune_setting parameter contains options for the pruning algorithms:
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
mode | String | amount | The pruning mode | amount, threshold, experimental_hybrid |
amount | Float | – | The amount value for amount and experimental_hybrid mode | [0, 1] |
threshold | Float | – | The threshold value for threshold mode | >=0 |
granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
raw_prune_score | String | L1 | The method for computing the importance of weights | L1, L2 |
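The granularity constraint can be illustrated with a small arithmetic sketch. It assumes that amount is the fraction of channels removed and that the retained channel count is rounded up to the next multiple of granularity; neither detail is stated in the table, and retained_channels is a hypothetical helper, so treat the numbers as an illustration of the "multiple of the granularity" rule rather than the exact pruning result.

```python
import math


def retained_channels(num_channels: int, amount: float, granularity: int = 8) -> int:
    """Channels kept after pruning, as a multiple of granularity.

    Assumptions (not stated in the table): amount is the fraction of
    channels removed, and the kept count is rounded up, capped at the
    original width and never below one granule.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be in [0, 1]")
    if granularity <= 0:
        raise ValueError("granularity must be > 0")
    target = num_channels * (1.0 - amount)
    rounded = math.ceil(target / granularity) * granularity
    return min(max(rounded, granularity), num_channels)


# 512 channels with amount 0.4 targets 307.2 channels, which is not a
# multiple of 8, so the count is moved to the next multiple.
kept = retained_channels(512, amount=0.4, granularity=8)
```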
inference
The inference parameter provides options for inference.
inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for inference | – |
results_dir | String | /results/inference | The directory to save inference results | |
num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
inference_dataset_dir | String | – | The absolute path to the inference images directory | – |
batch_size | Unsigned int | 1 | The inference batch size | >0 |
export
The export parameter provides export options.
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
checkpoint | String | – | The absolute path to the model checkpoint for export | – |
gpu_id | Unsigned int | 0 | The GPU device index | Valid gpu index |
onnx_file | String | – | The absolute path to the exported ONNX file | – |
results_dir | String | – | The absolute path to the export output | – |
dataset_convert
The dataset_convert parameter provides options for dataset conversion.
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
Parameter | Datatype | Default | Description | Supported Values |
---|---|---|---|---|
input_img_dir | String | – | The absolute path to the images directory | – |
gt_file | String | – | The absolute path to the ground truth file | – |
results_dir | String | – | The absolute path to the dataset_convert output (i.e., the LMDB dataset and log) | – |
Use the following command to convert the raw dataset to LMDB format:
tao model ocrnet dataset_convert -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[dataset_convert.<dataset_convert_option>=<dataset_convert_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set the optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.
dataset_convert.<dataset_convert_option>: The dataset_convert options.
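The command convention above (a subtask, the -e spec file, then key=value overrides) can be assembled programmatically. The sketch below builds the argument list for dataset_convert; build_tao_command is a hypothetical helper, and the spec path and dataset paths are placeholders chosen for illustration.

```python
import shlex


def build_tao_command(subtask, spec_file, overrides=None):
    """Assemble `tao model ocrnet <subtask> -e <spec> key=value ...` as a list."""
    command = ["tao", "model", "ocrnet", subtask, "-e", spec_file]
    for key, value in (overrides or {}).items():
        command.append(f"{key}={value}")
    return command


command = build_tao_command(
    "dataset_convert",
    "/specs/experiment.yaml",  # hypothetical spec file path
    {
        "results_dir": "/results",
        "dataset_convert.input_img_dir": "/Dataset/images",
        "dataset_convert.gt_file": "/Dataset/gt_list.txt",
    },
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

On a machine where the TAO Launcher is installed, the list could be passed to subprocess.run(command, check=True). The same helper applies to the other subtasks, since they share the same override convention.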
Use the following command to start OCRNet training:
tao model ocrnet train -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.optim.<optim_option>=<optim_option_value>]
[train.gpu_ids=<gpu indices>]
[train.num_gpus=<number of gpus>]
Required Arguments
-e, --experiment_spec: The experiment specification file to set up the training experiment.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.
For training, evaluation, and inference, two variables are exposed for each respective task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, for example num_gpus = 1 -> num_gpus = 2.
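The reconciliation rule can be sketched as a small function. The case where gpu_ids lists more devices than num_gpus follows the example in the text. The opposite case, where num_gpus is larger, is filled in here with the assumption that device indices 0 to num_gpus - 1 are used; the text does not specify this. reconcile_gpus is a hypothetical helper.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair, favoring more GPUs."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if num_gpus == len(gpu_ids):
        return num_gpus, gpu_ids
    if len(gpu_ids) > num_gpus:
        # Case from the text: num_gpus = 1, gpu_ids = [0, 1] -> num_gpus = 2.
        return len(gpu_ids), gpu_ids
    # Assumption: when num_gpus is larger, use device indices 0..num_gpus-1.
    return num_gpus, list(range(num_gpus))


from_text = reconcile_gpus(num_gpus=1, gpu_ids=[0, 1])
defaults = reconcile_gpus()
```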
Checkpointing and Resuming Training
Every train.checkpoint_interval epochs, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint will also be saved as ocr_model_latest.pth.
Training will automatically resume from ocr_model_latest.pth if it exists in train.results_dir.
This will be superseded by train.resume_training_checkpoint_path if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, you must do one of the following:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
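The resume logic described above can be summarized in a few lines of Python. This is a sketch of the documented precedence only (explicit path first, then ocr_model_latest.pth, otherwise a fresh run); checkpoint_to_resume is a hypothetical helper, and the explicit checkpoint path is a placeholder.

```python
import tempfile
from pathlib import Path


def checkpoint_to_resume(results_dir, resume_training_checkpoint_path=None):
    """Return the checkpoint training would resume from, or None for a fresh run."""
    if resume_training_checkpoint_path:
        # An explicit path supersedes the automatic lookup.
        return Path(resume_training_checkpoint_path)
    latest = Path(results_dir) / "ocr_model_latest.pth"
    return latest if latest.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    # A new, empty results directory: training starts from scratch.
    fresh = checkpoint_to_resume(tmp)
    # Once the latest checkpoint exists, it is picked up automatically.
    (Path(tmp) / "ocr_model_latest.pth").touch()
    resumed = checkpoint_to_resume(tmp)
    # An explicit (hypothetical) path takes precedence over the latest one.
    explicit = checkpoint_to_resume(tmp, "/ckpt/model_epoch_004.pth")
```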
Use the following command to start OCRNet evaluation:
tao model ocrnet evaluate -e <experiment_spec_file>
evaluate.checkpoint=<model to be evaluated>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[evaluate.<evaluate_option>=<evaluate_option_value>]
[evaluate.gpu_ids=<gpu indices>]
[evaluate.num_gpus=<number of gpus>]
Multi-GPU evaluation is currently not supported for OCRNet.
Required Arguments
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
evaluate.checkpoint: The .pth model to be evaluated.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
evaluate.<evaluate_option>: The evaluate options.
Use the following command to start OCRNet pruning:
tao model ocrnet prune -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[prune.<prune_option>=<prune_option_value>]
[prune.prune_setting.<prune_setting_option>=<prune_setting_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The pruning results will be saved in results_dir/prune.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
prune.<prune_option>: The prune options.
prune.prune_setting.<prune_setting_option>: The prune setting options.
If running training, evaluation, or inference on a pruned graph, you must provide the model.pruned_graph_path parameter when running the respective task. It should be the same as the value provided for prune.pruned_file in the prune task.
Use the following command to start OCRNet inference:
tao model ocrnet inference -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[inference.<inference_option>=<inference_option_value>]
Multi-GPU inference is currently not supported for OCRNet.
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The inference results will be saved in results_dir/inference.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
inference.<inference_option>: The inference options.
Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:
tao model ocrnet export -e <experiment_spec_file>
[results_dir=<global_results_dir>]
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[export.<export_option>=<export_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The export results will be saved in results_dir/export.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
export.<export_option>: The export options.
For deployment, see TAO Deploy documentation.
For DeepStream integration, see Deploy nvOCDR to DeepStream.