OCRNet
OCRNet is a model to recognize characters in an image. It supports the following tasks:
dataset_convert
train
evaluate
prune
inference
export
These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:
tao model ocrnet <sub_task> <args_per_subtask>
Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format.
The original dataset should be organized in the following structure:
/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt
The gt_list.txt file contains all the ground truth text for the images. Each image and its corresponding label are specified on one line of text:
0000.jpg abc
0001.jpg defg
0002.jpg zxv
...
The characters_list.txt file contains all the characters found in the dataset. Each character occupies one line.
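The relationship between the two files can be sketched in Python. This is only an illustration, not part of the TAO tooling: the helper names parse_gt_line and build_character_list are hypothetical, and the sketch assumes that the image name and the label are separated by whitespace, as in the sample lines above.

```python
def parse_gt_line(line):
    """Split one gt_list.txt line into (image_name, label)."""
    # Assumption: the label is everything after the first run of whitespace.
    name, label = line.rstrip("\n").split(maxsplit=1)
    return name, label


def build_character_list(gt_lines):
    """Return the sorted unique characters used across all labels."""
    chars = set()
    for line in gt_lines:
        if not line.strip():
            continue  # skip blank lines
        _, label = parse_gt_line(line)
        chars.update(label)
    return sorted(chars)


# The three sample lines from the gt_list.txt example above.
sample = ["0000.jpg abc\n", "0001.jpg defg\n", "0002.jpg zxv\n"]
characters = build_character_list(sample)

# One character per line, matching the characters_list.txt layout.
characters_file_text = "\n".join(characters) + "\n"
```

Writing characters_file_text to disk would give a character list that covers every label in the sample ground truth file.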
The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:
results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"
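Several fields in the example spec are set to "??", which appears to mark values that are expected to be supplied before a task runs (for example, through a command-line override). The following Python sketch lists those fields for a spec that has already been loaded into a nested dictionary. The helper name find_placeholders and the checkpoint path are hypothetical, and the sketch does not reproduce how TAO itself loads or validates the spec.

```python
def find_placeholders(spec, prefix=""):
    """Return dotted paths of every value still set to the "??" placeholder."""
    missing = []
    for key, value in spec.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as evaluate or export.
            missing.extend(find_placeholders(value, path))
        elif value == "??":
            missing.append(path)
    return missing


# A trimmed-down spec; the export checkpoint path is a made-up example.
spec = {
    "results_dir": "/results",
    "evaluate": {"gpu_id": 0, "checkpoint": "??", "test_dataset_dir": "??"},
    "export": {"gpu_id": 0, "checkpoint": "/results/train/model.pth"},
}
unset = find_placeholders(spec)
```

The dotted paths in unset use the same <section>.<option> form as the command-line overrides described later on this page.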
| Parameter | Data Type | Default | Description |
|---|---|---|---|
| results_dir | String | – | The global results directory |
| encryption_key | String | – | The key to encode or decode the checkpoint |
| model | Dict config | – | The configuration for the model architecture |
| dataset | Dict config | – | The configuration for the dataset |
| train | Dict config | – | The configuration for the training process |
| evaluate | Dict config | – | The configuration for the evaluation |
| prune | Dict config | – | The configuration for the pruning |
| inference | Dict config | – | The configuration for the inference |
| export | Dict config | – | The configuration for the export |
| dataset_convert | Dict config | – | The configuration for the dataset conversion |
model
The model parameter provides options to change the architecture of OCRNet.
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| TPS | Boolean | False | A flag that enables Thin-plate spline interpolation for the OCRNet input | True/False |
| num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
| backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X |
| feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
| sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
| hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
| prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn |
| input_width | Unsigned int | 100 | The input image width | >4 |
| input_height | Unsigned int | 32 | The input image height | >32 |
| input_channel | Unsigned int | 1 | The input image channel | 1,3 |
| quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |
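The enumerated values in the table above lend themselves to a quick sanity check before launching a job. The sketch below is a hypothetical helper, not part of TAO: it only covers the options whose supported values are an explicit set (backbone, sequence, prediction, input_channel) and makes no claim about how TAO itself validates the model section.

```python
# Supported values copied from the model parameter table.
SUPPORTED = {
    "backbone": {"ResNet", "ResNet2X", "FAN_tiny_2X"},
    "sequence": {"BiLSTM"},
    "prediction": {"CTC", "Attn"},
    "input_channel": {1, 3},
}


def check_model_config(model_cfg):
    """Return a list of messages for options outside their supported set."""
    errors = []
    for key, allowed in SUPPORTED.items():
        if key in model_cfg and model_cfg[key] not in allowed:
            choices = ", ".join(sorted(str(v) for v in allowed))
            errors.append(
                f"model.{key}={model_cfg[key]!r} is not one of: {choices}"
            )
    return errors


# The model section from the example spec passes the check.
example = {
    "TPS": True,
    "backbone": "ResNet",
    "sequence": "BiLSTM",
    "prediction": "CTC",
    "input_channel": 1,
}
problems = check_model_config(example)
```

An unsupported value, such as a backbone name that is not in the table, produces one message per offending option.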
dataset
The dataset parameter provides options to set the dataset consumed in training and evaluation.
dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String |
| val_dataset_dir | String | None | The absolute path to the evaluation dataset | dataset absolute path |
| character_list_file | String | None | The absolute path to the character list file | absolute file path |
| max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0 |
| batch_size | Unsigned int | 32 | The batch size for training | >0 |
| workers | Unsigned int | 4 | The number of workers that preprocess the training data in parallel | >=0 |
| augmentation | Dict config | – | The augmentation config | – |
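Because max_label_length and character_list_file both constrain the ground truth, it can be useful to screen labels before converting a dataset. The following Python sketch shows one way to do that. The helper name check_label is hypothetical, and the sketch only reports problems: how OCRNet treats over-long labels or characters missing from the list is not described here, so no behavior is assumed.

```python
def check_label(label, characters, max_label_length=25):
    """Return a list of problems found in one ground truth label."""
    problems = []
    if len(label) > max_label_length:
        problems.append(
            f"label has {len(label)} characters, limit is {max_label_length}"
        )
    # Characters that do not appear in the character list.
    unknown = sorted(set(label) - set(characters))
    if unknown:
        problems.append(f"characters not in the character list: {unknown}")
    return problems


# A small made-up character list for illustration.
characters = list("abcdefg")
ok = check_label("abc", characters)
too_long = check_label("a" * 26, characters)
unknown_char = check_label("abz", characters)
```

Running the check over every line of gt_list.txt gives a quick report of labels that may need attention before training.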
augmentation
The augmentation parameter provides options to set the augmentation pipeline used during training.
augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| keep_aspect_ratio | Bool | False | A flag that keeps the aspect ratio when resizing the image to the model input size | False/True |
| aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1] |
| reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1] |
| rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1] |
| max_rotation_degree | Float | 0.5 | The maximum degree the image will be rotated | >=0 |
| blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1] |
| gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii to choose from when applying Gaussian blur to the image | – |
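One plausible reading of these probabilities is that aug_prob decides whether an image is augmented at all, and the remaining probabilities then decide independently which steps run. The Python sketch below illustrates that reading only. It is an assumption, not the OCRNet implementation: the function name sample_augmentations is hypothetical, the order of the steps is a guess, and the sketch picks parameters without touching any image data.

```python
import random


def sample_augmentations(cfg, rng):
    """Return the (name, parameter) steps chosen for one image."""
    steps = []
    # Assumption: aug_prob gates the whole pipeline.
    if rng.random() >= cfg["aug_prob"]:
        return steps
    if rng.random() < cfg["reverse_color_prob"]:
        steps.append(("reverse_color", None))
    if rng.random() < cfg["rotate_prob"]:
        # Assumption: the angle is drawn symmetrically around zero.
        limit = cfg["max_rotation_degree"]
        steps.append(("rotate", rng.uniform(-limit, limit)))
    if rng.random() < cfg["blur_prob"]:
        steps.append(("blur", rng.choice(cfg["gaussian_radius_list"])))
    return steps


cfg = {
    "aug_prob": 0.3,
    "reverse_color_prob": 0.5,
    "rotate_prob": 0.5,
    "max_rotation_degree": 5,
    "blur_prob": 0.5,
    "gaussian_radius_list": [1, 2, 3, 4],
}
rng = random.Random(1111)  # seeded so repeated runs pick the same steps
chosen = [sample_augmentations(cfg, rng) for _ in range(5)]
```

Under this reading, setting aug_prob to 0 disables all augmentations regardless of the other probabilities.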
train
The train parameter provides options to set training hyperparameters.
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| seed | Unsigned int | 1111 | The random seed for random, numpy, and torch | >0 |
| results_dir | String | – | The absolute path to the train results and output (log, checkpoints) | – |
| gpu_ids | List of Unsigned int | [0] | A list of GPU device indices for training | list of GPU index |
|  | Unsigned int | 1 | The number of GPUs to be used for training | – |
| optim | Dict config | – | The configuration for the optimizer | – |
| clip_grad_norm | Float | 5.0 | The threshold for the magnitude of the gradient L2 norm, above which gradients are clipped | >4 |
| num_epochs | Unsigned int | 10 | The number of training epochs | >32 |
| checkpoint_interval | Unsigned int | 2 | The interval for saving the checkpoint during training | >0 |
| validation_interval | Unsigned int | 25 | The interval for performing validation during training | >0 |
| distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp |
| resume_training_checkpoint_path | String | None | The absolute path to a checkpoint for resuming training | – |
| pretrained_model_path | String | None | The absolute path to pretrained weights | – |
| quantize_model_path | String | None | The absolute path to pretrained models for quantization-aware training | – |
| model_ema | Bool | False | A flag that enables the model exponential moving average during training | False/True |
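To see how checkpoint_interval and validation_interval interact with num_epochs, the sketch below lists the epochs at which each action would run. It assumes that the intervals are measured in epochs, that epochs are counted from 1, and that an action runs after every N-th epoch. None of this is confirmed here, and the helper name epochs_at_interval is hypothetical.

```python
def epochs_at_interval(num_epochs, interval):
    """Epochs (counted from 1) that fall on a multiple of the interval."""
    if interval <= 0:
        raise ValueError("interval must be greater than 0")
    return [epoch for epoch in range(1, num_epochs + 1) if epoch % interval == 0]


# Values from the example train section above.
checkpoint_epochs = epochs_at_interval(num_epochs=10, interval=2)
validation_epochs = epochs_at_interval(num_epochs=10, interval=1)
```

With the example values, this reading gives five checkpoints and a validation pass after every epoch.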
optim
The optim parameter provides options to set the optimizer for training.
optim:
  name: "adadelta"
  lr: 1.0
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | String | adadelta | The optimizer type | adadelta, adam |
| lr | Float | 1.0 | The initial learning rate for the training | >0.0 |
evaluate
The evaluate parameter provides options to set evaluation hyperparameters.
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for evaluation | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | – |
| results_dir | String | – | The absolute path to the evaluation output | – |
| batch_size | Unsigned int | 1 | The evaluation batch size | >0 |
prune
The prune parameter provides options to set pruning hyperparameters.
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for pruning | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| results_dir | String | – | The absolute path to the pruning log | – |
| pruned_file | String | – | The absolute path for storing the pruned model checkpoint | – |
| prune_setting | Dict config | – | The pruning hyperparameters | – |
prune_setting
The prune_setting parameter contains options for the pruning algorithms:
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| mode | String | amount | The pruning mode | amount, threshold, experimental_hybrid |
| amount | Float | – | The amount value for amount and experimental_hybrid mode | [0, 1] |
| threshold | Float | – | The threshold value for threshold mode | >=0 |
| granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
| raw_prune_score | String | L1 | The method for computing the importance of weights | L1, L2 |
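The effect of granularity can be illustrated with a small calculation. The Python sketch below assumes that amount is the fraction of output channels to remove and that the remaining count is rounded down to a multiple of granularity. Both points are assumptions made for illustration, the helper name channels_to_keep is hypothetical, and the actual pruning algorithm may select channels differently.

```python
def channels_to_keep(num_channels, amount, granularity=8):
    """Channels left after pruning, rounded down to a multiple of granularity."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be in [0, 1]")
    # Assumption: `amount` is the fraction of channels removed.
    target = int(num_channels * (1.0 - amount))
    kept = (target // granularity) * granularity
    # Never drop below one group of channels (or the layer's own width).
    return max(kept, min(granularity, num_channels))


# A 512-channel layer with the prune_setting values from the example spec.
remaining = channels_to_keep(512, amount=0.4, granularity=8)
```

For a 512-channel layer, removing 40% leaves a target of 307 channels, which this sketch rounds down to 304 so that the result stays a multiple of 8.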
inference
The inference parameter provides options for inference.
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for inference | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| inference_dataset_dir | String | – | The absolute path to the inference images directory | – |
| results_dir | String | – | The absolute path to the inference output | – |
| batch_size | Unsigned int | 1 | The inference batch size | >0 |
export
The export parameter provides export options.
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for export | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| onnx_file | String | – | The absolute path to the exported ONNX file | – |
| results_dir | String | – | The absolute path to the export output | – |
dataset_convert
The dataset_convert parameter provides options for dataset conversion.
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_img_dir | String | – | The absolute path to the images directory | – |
| gt_file | String | – | The absolute path to the ground truth file | – |
| results_dir | String | – | The absolute path to the dataset_convert output (i.e. the LMDB dataset and log) | – |
Use the following command to convert the raw dataset to LMDB format:
tao model ocrnet dataset_convert -e <experiment_spec_file>
results_dir=<global_results_dir>
[dataset_convert.<dataset_convert_option>=<dataset_convert_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.
Optional Arguments
You can set the optional arguments to override the option values in the experiment spec file.
dataset_convert.<dataset_convert_option>: The dataset_convert options.
Here’s an example of using the OCRNet dataset_convert command:
tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
    dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
    dataset_convert.gt_file=$TRAIN_GT \
    dataset_convert.results_dir=$TRAIN_LMDB_PATH
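The optional arguments follow a dotted <section>.<option>=<value> pattern that mirrors the nesting of the spec file. When jobs are launched from a script, that pattern can be generated from a nested dictionary, as in the Python sketch below. The helper names flatten_overrides and build_command, the spec file name, and the dataset paths are all hypothetical. The sketch only assembles the argument list and does not invoke the launcher.

```python
import shlex


def flatten_overrides(overrides, prefix=""):
    """Turn a nested dict into dotted key=value tokens."""
    tokens = []
    for key, value in overrides.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            tokens.extend(flatten_overrides(value, path))
        else:
            tokens.append(f"{path}={value}")
    return tokens


def build_command(sub_task, spec_file, overrides):
    """Assemble: tao model ocrnet <sub_task> -e <spec_file> <overrides...>."""
    base = ["tao", "model", "ocrnet", sub_task, "-e", spec_file]
    return base + flatten_overrides(overrides)


# Made-up paths standing in for $TRAIN_IMG_DIR and $TRAIN_GT.
cmd = build_command(
    "dataset_convert",
    "spec.yaml",
    {
        "dataset_convert": {
            "input_img_dir": "/data/train/images",
            "gt_file": "/data/train/gt_list.txt",
        }
    },
)
print(shlex.join(cmd))  # shell-safe rendering of the argument list
```

The resulting list can be passed to subprocess.run, which avoids quoting problems when paths contain spaces.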
Use the following command to start OCRNet training:
tao model ocrnet train -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[train.<train_option>=<train_option_value>]
[train.optim.<optim_option>=<optim_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The train results will be saved in results_dir/train.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options
Here’s an example of using the OCRNet train command:
tao model ocrnet train -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
dataset.train_dataset_dir=$TRAIN_LMDB_PATH \
dataset.val_dataset_dir=$VAL_LMDB_PATH \
dataset.character_list_file=$CHARACTER_LIST
Use the following command to start OCRNet evaluation:
tao model ocrnet evaluate -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[evaluate.<evaluate_option>=<evaluate_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The evaluation results will be saved in results_dir/evaluate.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
evaluate.<evaluate_option>: The evaluate options.
Here’s an example of using the OCRNet evaluate command:
tao model ocrnet evaluate -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
evaluate.checkpoint=$TRAINED_TAO_MODEL \
evaluate.test_dataset_dir=$VAL_LMDB_PATH \
dataset.character_list_file=$CHARACTER_LIST
Use the following command to start OCRNet pruning:
tao model ocrnet prune -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[prune.<prune_option>=<prune_option_value>]
[prune.prune_setting.<prune_setting_option>=<prune_setting_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The pruning results will be saved in results_dir/prune.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
prune.<prune_option>: The prune options.
prune.prune_setting.<prune_setting_option>: The prune setting options.
Here’s an example of using the OCRNet prune command:
tao model ocrnet prune -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
prune.checkpoint=$TRAINED_TAO_MODEL \
prune.pruned_file=$PRUNED_TAO_MODEL
Use the following command to start OCRNet inference:
tao model ocrnet inference -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[inference.<inference_option>=<inference_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The inference results will be saved in results_dir/inference.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
inference.<inference_option>: The inference options.
Here’s an example of using the OCRNet inference command:
tao model ocrnet inference -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
inference.checkpoint=$TRAINED_TAO_MODEL \
inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR
Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:
tao model ocrnet export -e <experiment_spec_file>
results_dir=<global_results_dir>
[model.<model_option>=<model_option_value>]
[dataset.<dataset_option>=<dataset_option_value>]
[export.<export_option>=<export_option_value>]
Required Arguments
-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The export results will be saved in results_dir/export.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
export.<export_option>: The export options.
Here’s an example of using the OCRNet export command:
tao model ocrnet export -e $DEFAULT_SPEC \
results_dir=$RESULTS_DIR \
export.checkpoint=$TRAINED_TAO_MODEL \
export.onnx_file=$EXPORTED_ONNX_MODEL_PATH
For deployment, please refer to TAO Deploy documentation.
For DeepStream integration, please refer to Deploy nvOCDR to DeepStream.