
OCRNet

OCRNet is a model that recognizes characters in an image. It supports the following tasks:

  • dataset_convert

  • train

  • evaluate

  • prune

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:


tao model ocrnet <sub_task> <args_per_subtask>

where args_per_subtask represents the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format. The original dataset should be organized in the following structure:

/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt

The gt_list.txt file contains the ground truth text for all the images; each line specifies one image and its corresponding label:

0000.jpg abc
0001.jpg defg
0002.jpg zxv
...

The characters_list.txt file contains all the characters found in the dataset, with each character on its own line.
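For example, a dataset whose ground truth contains only the labels shown above would have a characters_list.txt similar to the following (illustrative; list every character that occurs in your labels, one per line):

a
b
c
d
e
f
g
v
x
z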

The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:

results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter | Datatype | Default | Description
results_dir | String | – | The global results directory
encryption_key | String | – | The key to encode or decode the checkpoint
model | Dict config | – | The configuration for the model architecture
dataset | Dict config | – | The configuration for the dataset
train | Dict config | – | The configuration for the training process
evaluate | Dict config | – | The configuration for the evaluation
prune | Dict config | – | The configuration for the pruning
inference | Dict config | – | The configuration for the inference
export | Dict config | – | The configuration for the export
dataset_convert | Dict config | – | The configuration for the dataset conversion

model

The model parameter provides options to change the architecture of OCRNet.

model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False

Parameter | Datatype | Default | Description | Supported Values
TPS | Boolean | False | A flag that enables Thin-plate spline interpolation for the OCRNet input | True/False
num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4
backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X
feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0
sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM
hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0
prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn
input_width | Unsigned int | 100 | The input image width | >4
input_height | Unsigned int | 32 | The input image height | >=32
input_channel | Unsigned int | 1 | The number of input image channels | 1, 3
quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False
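As an illustrative variation built only from the options in the table above (not a tuned configuration), the following model config swaps the CTC prediction head for attention-based decoding:

model:
  TPS: True
  num_fiducial: 20
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: Attn
  quantize: False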

dataset

The dataset parameter provides options to set the dataset consumed in training and evaluation.

dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]

Parameter | Datatype | Default | Description | Supported Values
train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported | List of String
val_dataset_dir | String | None | The absolute path to the evaluation dataset | Dataset absolute path
character_list_file | String | None | The absolute path to the character list file | Absolute file path
max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0
batch_size | Unsigned int | 32 | The batch size for training | >0
workers | Unsigned int | 4 | The number of workers for parallel preprocessing of the training data | >=0
augmentation | Dict config | – | The augmentation config | –

augmentation

The augmentation parameter provides options to configure the augmentation pipeline during training.

augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]

Parameter | Datatype | Default | Description | Supported Values
keep_aspect_ratio | Bool | False | A flag to keep the aspect ratio when resizing the image to the model input size | False/True
aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1]
reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1]
rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1]
max_rotation_degree | Float | 5 | The maximum degree by which the image is rotated | >=0
blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1]
gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii used when applying Gaussian blur to the image | –

train

The train parameter provides options to set training hyperparameters.

train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

Parameter | Datatype | Default | Description | Supported Values
seed | Unsigned int | 1111 | The random seed for random, numpy, and torch | >0
results_dir | String | – | The absolute path to the train results and output (log, checkpoints) | –
gpu_ids | List of Unsigned int | [0] | A list of GPU device indices for training | List of GPU indices
num_gpus | Unsigned int | 1 | The number of GPUs to be used for training. When num_gpus is set greater than 1 to enable multi-GPU training, gpu_ids does not take effect | –
optim | Dict config | – | The configuration for the optimizer | –
clip_grad_norm | Float | 5.0 | The threshold on the magnitude of the gradient L2 norm; gradients above it are clipped | >=0.0
num_epochs | Unsigned int | 10 | The number of training epochs | >0
checkpoint_interval | Unsigned int | 2 | The interval for saving the checkpoint during training | >0
validation_interval | Unsigned int | 1 | The interval for performing validation during training | >0
distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp
resume_training_checkpoint_path | String | None | The absolute path to a checkpoint for resuming training | –
pretrained_model_path | String | None | The absolute path to pretrained weights | –
quantize_model_path | String | None | The absolute path to a pretrained model for quantization-aware training | –
model_ema | Bool | False | A flag that enables a model exponential moving average during training | False/True
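For example, to resume an interrupted run, you can override resume_training_checkpoint_path from the command line; $RESUME_CHECKPOINT below is a placeholder for the path to a checkpoint saved by an earlier run:

tao model ocrnet train -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 train.resume_training_checkpoint_path=$RESUME_CHECKPOINT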

optim

The optim parameter provides options to set the optimizer for training.

optim:
  name: "adadelta"
  lr: 1.0

Parameter | Datatype | Default | Description | Supported Values
name | String | adadelta | The optimizer type | adadelta, adam
lr | Float | 1.0 | The initial learning rate for training | >0.0
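If you switch to the adam optimizer, you will typically also lower the initial learning rate. The value below is an illustrative starting point, not a validated default:

optim:
  name: "adam"
  lr: 0.001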

evaluate

The evaluate parameter provides options to set evaluation hyperparameters.

evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | – | The absolute path to the model checkpoint for evaluation | –
gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index
test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | –
results_dir | String | – | The absolute path to the evaluation output | –
batch_size | Unsigned int | 1 | The evaluation batch size | >0

prune

The prune parameter provides options to set pruning hyperparameters.

prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | – | The absolute path to the model checkpoint for pruning | –
gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index
results_dir | String | – | The absolute path to the pruning log | –
pruned_file | String | – | The absolute path for storing the pruned model checkpoint | –
prune_setting | Dict config | – | The pruning hyperparameters | –

prune_setting

The prune_setting parameter contains options for the pruning algorithms:

Parameter | Datatype | Default | Description | Supported Values
mode | String | amount | The pruning mode: amount prunes the given ratio of weights according to their importance; threshold prunes weights whose importance is smaller than the threshold value; experimental_hybrid prunes weights using a hybrid of threshold and amount | amount, threshold, experimental_hybrid
amount | Float | – | The amount value for amount and experimental_hybrid modes | [0, 1]
threshold | Float | – | The threshold value for threshold mode | >=0
granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity | >0
raw_prune_score | String | L1 | The method for computing the importance of weights | L1, L2
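As a sketch of the alternative threshold mode (the threshold value below is illustrative and should be tuned for your model):

prune_setting:
  mode: threshold
  threshold: 0.1
  granularity: 8
  raw_prune_score: L1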

inference

The inference parameter provides options for inference.

inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | – | The absolute path to the model checkpoint for inference | –
gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index
inference_dataset_dir | String | – | The absolute path to the inference images directory | –
results_dir | String | – | The absolute path to the inference output | –
batch_size | Unsigned int | 1 | The inference batch size | >0

export

The export parameter provides export options.

export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | – | The absolute path to the model checkpoint for export | –
gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index
onnx_file | String | – | The absolute path to the exported ONNX file | –
results_dir | String | – | The absolute path to the export output | –

dataset_convert

The dataset_convert parameter provides options to configure the dataset conversion.

dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter | Datatype | Default | Description | Supported Values
input_img_dir | String | – | The absolute path to the images directory | –
gt_file | String | – | The absolute path to the ground truth file | –
results_dir | String | – | The absolute path to the dataset_convert output (i.e., the LMDB dataset and log) | –

Use the following command to convert the raw dataset to LMDB format:


tao model ocrnet dataset_convert -e <experiment_spec_file> results_dir=<global_results_dir> [dataset_convert.<dataset_convert_option>=<dataset_convert_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.

Optional Arguments

You can set the optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet dataset_convert command:

tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
                 dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
                 dataset_convert.gt_file=$TRAIN_GT \
                 dataset_convert.results_dir=$TRAIN_LMDB_PATH
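The same command can be repeated for the evaluation split. The variables below ($VAL_IMG_DIR, $VAL_GT, $VAL_LMDB_PATH) are placeholders for your validation images, ground truth file, and output LMDB path:

tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
                 dataset_convert.input_img_dir=$VAL_IMG_DIR \
                 dataset_convert.gt_file=$VAL_GT \
                 dataset_convert.results_dir=$VAL_LMDB_PATH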

Use the following command to start OCRNet training:


tao model ocrnet train -e <experiment_spec_file> results_dir=<global_results_dir> [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [train.<train_option>=<train_option_value>] [train.optim.<optim_option>=<optim_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The train results will be saved in results_dir/train.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet train command:

tao model ocrnet train -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 dataset.train_dataset_dir=$TRAIN_LMDB_PATH \
                 dataset.val_dataset_dir=$VAL_LMDB_PATH \
                 dataset.character_list_file=$CHARACTER_LIST
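For multi-GPU training, a minimal variation of the same command adds a train.num_gpus override; per the train parameter table above, gpu_ids does not take effect when num_gpus is greater than 1:

tao model ocrnet train -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 dataset.train_dataset_dir=$TRAIN_LMDB_PATH \
                 dataset.val_dataset_dir=$VAL_LMDB_PATH \
                 dataset.character_list_file=$CHARACTER_LIST \
                 train.num_gpus=2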

Use the following command to start OCRNet evaluation:


tao model ocrnet evaluate -e <experiment_spec_file> results_dir=<global_results_dir> [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [evaluate.<evaluate_option>=<evaluate_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The evaluation results will be saved in results_dir/evaluate.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet evaluate command:

tao model ocrnet evaluate -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 evaluate.checkpoint=$TRAINED_TAO_MODEL \
                 evaluate.test_dataset_dir=$VAL_LMDB_PATH \
                 dataset.character_list_file=$CHARACTER_LIST

Use the following command to start OCRNet pruning:


tao model ocrnet prune -e <experiment_spec_file> results_dir=<global_results_dir> [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [prune.<prune_option>=<prune_option_value>] [prune.prune_setting.<prune_setting_option>=<prune_setting_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The pruning results will be saved in results_dir/prune.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet prune command:

tao model ocrnet prune -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 prune.checkpoint=$TRAINED_TAO_MODEL \
                 prune.pruned_file=$PRUNED_TAO_MODEL
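Pruning usually costs some accuracy, so the pruned model is typically fine-tuned afterward. One way to sketch this, using the pretrained_model_path option from the train parameter table, is to feed the pruned checkpoint back into training ($RETRAIN_RESULTS_DIR is a placeholder):

tao model ocrnet train -e $DEFAULT_SPEC \
                 results_dir=$RETRAIN_RESULTS_DIR \
                 train.pretrained_model_path=$PRUNED_TAO_MODEL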

Use the following command to start OCRNet inference:


tao model ocrnet inference -e <experiment_spec_file> results_dir=<global_results_dir> [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [inference.<inference_option>=<inference_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The inference results will be saved in results_dir/inference.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet inference command:

tao model ocrnet inference -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 inference.checkpoint=$TRAINED_TAO_MODEL \
                 inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR

Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:


tao model ocrnet export -e <experiment_spec_file> results_dir=<global_results_dir> [model.<model_option>=<model_option_value>] [dataset.<dataset_option>=<dataset_option_value>] [export.<export_option>=<export_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The export results will be saved in results_dir/export.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet export command:

tao model ocrnet export -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 export.checkpoint=$TRAINED_TAO_MODEL \
                 export.onnx_file=$EXPORTED_ONNX_MODEL_PATH

For DeepStream integration, please refer to Deploy nvOCDR to DeepStream.
