NVIDIA TAO Toolkit v5.3.0
OCRNet

OCRNet is a model that recognizes characters in an image. It supports the following tasks:

  • dataset_convert

  • train

  • evaluate

  • prune

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line:

tao model ocrnet <sub_task> <args_per_subtask>

Here, args_per_subtask stands for the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Preparing the Dataset

The training and evaluation datasets for OCRNet must be in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format. The original dataset should be organized in the following structure:

/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt

The gt_list.txt file contains all the ground truth text for the images, and each image and its corresponding label is specified with one line of text:

0000.jpg abc
0001.jpg defg
0002.jpg zxv
...

The characters_list.txt file contains all the characters found in the dataset, with one character per line.
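The character list can be derived from the ground truth file. The following is a minimal sketch, not part of the TAO Toolkit, that assumes each gt_list.txt line holds an image name without whitespace followed by the label; the helper names parse_gt_line, collect_characters, and write_characters_list are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def parse_gt_line(line: str) -> Tuple[str, str]:
    """Split one gt_list.txt line into (image_name, label).

    Assumption: the image name contains no whitespace and the label is
    everything after the first run of whitespace.
    """
    image_name, label = line.strip().split(maxsplit=1)
    return image_name, label


def collect_characters(gt_path: Path) -> List[str]:
    """Return the sorted set of characters used by all labels in gt_path."""
    characters = set()
    with gt_path.open(encoding="utf-8") as gt_file:
        for line in gt_file:
            if not line.strip():
                continue  # skip blank lines
            _, label = parse_gt_line(line)
            characters.update(label)
    return sorted(characters)


def write_characters_list(gt_path: Path, out_path: Path) -> None:
    """Write one character per line, matching the layout described above."""
    lines = collect_characters(gt_path)
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A line that has an image name but no label raises a ValueError in this sketch, which makes malformed entries easy to spot before conversion.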

Creating an Experiment Spec File

The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:

results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter | Data Type | Default | Description
results_dir | String | | The global results directory
encryption_key | String | | The key to encode or decode the checkpoint
model | Dict config | | The configuration for the model architecture
dataset | Dict config | | The configuration for the dataset
train | Dict config | | The configuration for the training process
evaluate | Dict config | | The configuration for the evaluation
prune | Dict config | | The configuration for the pruning
inference | Dict config | | The configuration for the inference
export | Dict config | | The configuration of the export
dataset_convert | Dict config | | The configuration for the dataset convert
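Several values in the example spec are set to "??". Assuming these are placeholders that you fill in, either in the file or through key=value overrides on the command line, the following sketch lists the dotted keys that still need a value. It is an illustration only: find_placeholders is a hypothetical helper, and spec_fragment is a hand-copied part of the example spec rather than a parsed file.

```python
from typing import Any, Dict, List

PLACEHOLDER = "??"  # placeholder string used in the example spec


def find_placeholders(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted keys whose value is still the "??" placeholder."""
    missing: List[str] = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as evaluate or export.
            missing.extend(find_placeholders(value, dotted))
        elif value == PLACEHOLDER:
            missing.append(dotted)
    return missing


# Hand-copied fragment of the example spec, for illustration only.
spec_fragment = {
    "results_dir": "/results",
    "evaluate": {"gpu_id": 0, "checkpoint": "??", "test_dataset_dir": "??"},
    "export": {"gpu_id": 0, "checkpoint": "??"},
}

for dotted_key in find_placeholders(spec_fragment):
    print(f"{dotted_key}=<value>")
```

The printed names, such as evaluate.checkpoint, have the same dotted form as the overrides used in the command examples later in this document.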

model

The model parameter provides options to change the architecture of OCRNet.

model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False

Parameter | Datatype | Default | Description | Supported Values
TPS | Boolean | False | A flag that enables Thin-plate spline interpolation for the OCRNet input | True/False
num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4
backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X
feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0
sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM
hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0
prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn
input_width | Unsigned int | 100 | The input image width | >4
input_height | Unsigned int | 32 | The input image height | >32
input_channel | Unsigned int | 1 | The input image channel | 1,3
quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False
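To give a sense of what the CTC option for prediction involves at decoding time, here is a generic sketch of CTC best-path (greedy) decoding. It is not OCRNet's implementation. It assumes that class id 0 is the CTC blank and that id k maps to the k-th entry of the character list; both are assumptions made for this example.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class id 0 is the CTC blank symbol


def best_path(scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring class id for every time step."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in scores]


def ctc_greedy_decode(frame_ids: Sequence[int], characters: Sequence[str]) -> str:
    """Collapse repeated ids, then drop blanks.

    Assumption: id k (k >= 1) maps to characters[k - 1].
    """
    decoded: List[str] = []
    previous = BLANK
    for current in frame_ids:
        if current != BLANK and current != previous:
            decoded.append(characters[current - 1])
        previous = current
    return "".join(decoded)


# Two "a" runs separated by a blank stay distinct; the repeated "b" collapses.
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0], ["a", "b"]))
```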

dataset

The dataset parameter provides options to set the dataset consumed in training and evaluation.

dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]

Parameter | Datatype | Default | Description | Supported Values
train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String
val_dataset_dir | String | None | The absolute path to the evaluation dataset | dataset absolute path
character_list_file | String | None | The absolute path to the character list file | absolute file path
max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0
batch_size | Unsigned int | 32 | The batch size for training | >0
workers | Unsigned int | 4 | The number of workers used to preprocess the training data in parallel | >=0
augmentation | Dict config | | The augmentation config. |
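Before converting or training, it can help to check the labels against max_label_length and the character list. The sketch below is a hypothetical pre-flight check, not a TAO Toolkit API; how OCRNet itself treats over-long labels or unlisted characters is not covered here.

```python
from typing import Iterable, List, Set, Tuple


def check_labels(
    samples: Iterable[Tuple[str, str]],
    characters: Set[str],
    max_label_length: int = 25,
) -> List[Tuple[str, str]]:
    """Return (image_name, reason) for every label that looks problematic."""
    problems: List[Tuple[str, str]] = []
    for image_name, label in samples:
        if len(label) > max_label_length:
            problems.append((image_name, f"label longer than {max_label_length}"))
        unknown = sorted(set(label) - characters)
        if unknown:
            problems.append((image_name, f"characters not in list: {unknown}"))
    return problems


samples = [("0000.jpg", "abc"), ("0001.jpg", "a" * 26), ("0002.jpg", "ab!")]
for image_name, reason in check_labels(samples, characters=set("abc")):
    print(image_name, reason)
```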

augmentation

The augmentation parameter provides options to configure the augmentation pipeline used during training.

augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]

Parameter | Datatype | Default | Description | Supported Values
keep_aspect_ratio | Bool | False | A flag that keeps the aspect ratio when resizing the image to the model input size | False/True
aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1]
reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1]
rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1]
max_rotation_degree | Float | 0.5 | The maximum angle, in degrees, by which the image is rotated | >=0
blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1]
gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii used when applying Gaussian blur to the image |
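One plausible reading of these options is that aug_prob gates the whole augmentation stage and that each augmentation then fires with its own probability. The sketch below illustrates that reading only. The ordering, the symmetric rotation range, and the choose_augmentations helper are assumptions made for this example and may differ from the actual pipeline.

```python
import random
from typing import Any, Dict, List, Optional


def choose_augmentations(cfg: Dict[str, Any], rng: Optional[random.Random] = None) -> List[str]:
    """Decide which augmentations to apply to one image (names only)."""
    rng = rng or random.Random()
    chosen: List[str] = []
    # Assumption: aug_prob decides whether any augmentation is applied at all.
    if rng.random() >= cfg["aug_prob"]:
        return chosen
    if rng.random() < cfg["reverse_color_prob"]:
        chosen.append("reverse_color")
    if rng.random() < cfg["rotate_prob"]:
        # Assumption: the angle is drawn uniformly from [-max, +max] degrees.
        limit = cfg["max_rotation_degree"]
        chosen.append(f"rotate({rng.uniform(-limit, limit):.1f} deg)")
    if rng.random() < cfg["blur_prob"]:
        chosen.append(f"blur(radius={rng.choice(cfg['gaussian_radius_list'])})")
    return chosen


example_cfg = {
    "aug_prob": 0.3,
    "reverse_color_prob": 0.5,
    "rotate_prob": 0.5,
    "max_rotation_degree": 5,
    "blur_prob": 0.5,
    "gaussian_radius_list": [1, 2, 3, 4],
}
print(choose_augmentations(example_cfg, random.Random(0)))
```

Under this reading, an image is rotated with probability aug_prob * rotate_prob, which is 0.15 for the values in the example configuration.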

train

The train parameter provides options to set training hyperparameters.

train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

Parameter | Datatype | Default | Description | Supported Values
seed | Unsigned int | 1111 | The random seed for random, numpy, and torch | >0
results_dir | String | | The absolute path to the train results and output (log, checkpoints) |
gpu_ids | List of Unsigned int | [0] | A list of GPU device indices for training | list of GPU index
num_gpus | Unsigned int | 1 | The number of GPUs to be used for training. When num_gpus is set to enable multi-GPU (>1) training, gpu_ids does not take effect. |
optim | Dict config | | The configuration for the optimizer |
clip_grad_norm | Float | 5.0 | The threshold value of magnitude of the gradient L2 norm to be clipped | >4
num_epochs | Unsigned int | 10 | The number of training epochs | >32
checkpoint_interval | Unsigned int | 2 | The interval for saving the checkpoint during training | >0
validation_interval | Unsigned int | 25 | The interval for performing validation during training | >0
distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp
resume_training_checkpoint_path | String | None | The absolute path to a checkpoint for resuming training |
pretrained_model_path | String | None | The absolute path to pretrained weights |
quantize_model_path | String | None | The absolute path to pretrained models for quantize-aware-training |
model_ema | Bool | False | Enable model exponential moving average in the training | False/True
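As a rough illustration of how checkpoint_interval and validation_interval interact with num_epochs, the sketch below lists the epochs at which each action would fire. It assumes that both intervals are measured in epochs and that epochs are counted from 1; neither detail is stated in the table above, so treat the result as an approximation.

```python
from typing import List


def epochs_with_action(num_epochs: int, interval: int) -> List[int]:
    """Epochs (counted from 1) at which an action with the given interval fires."""
    return [epoch for epoch in range(1, num_epochs + 1) if epoch % interval == 0]


# Values taken from the example train section above.
checkpoint_epochs = epochs_with_action(num_epochs=10, interval=2)
validation_epochs = epochs_with_action(num_epochs=10, interval=1)
print("checkpoints saved after epochs:", checkpoint_epochs)
print("validation run after epochs:", validation_epochs)
```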

optim

The optim parameter provides options to set the optimizer for training.

optim:
  name: "adadelta"
  lr: 1.0

Parameter | Datatype | Default | Description | Supported Values
name | String | adadelta | The optimizer type | adadelta, adam
lr | Float | 1.0 | The initial learning rate for the training | >0.0

evaluate

The evaluate parameter provides options to set evaluation hyperparameters.

evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | | The absolute path to the model checkpoint for evaluation |
gpu_id | Unsigned int | 0 | The GPU device index | A valid gpu index
test_dataset_dir | String | | The absolute path to the evaluation LMDB dataset |
results_dir | String | | The absolute path to the evaluation output |
batch_size | Unsigned int | 1 | The evaluation batch size | >0

prune

The prune parameter provides options to set pruning hyperparameters.

prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | | The absolute path to the model checkpoint for pruning |
gpu_id | Unsigned int | 0 | The GPU device index | A valid gpu index
results_dir | String | | The absolute path to the pruning log |
pruned_file | String | | The absolute path for storing the pruned model checkpoint |
prune_setting | Dict config | | The pruning hyperparameters |

prune_setting

The prune_setting parameter contains options for the pruning algorithms:

Parameter | Datatype | Default | Description | Supported Values
mode | String | amount | The pruning mode. amount: prune the amount ratio of weights according to the importance; threshold: prune weights with importance smaller than the threshold value; experimental_hybrid: prune weights using a hybrid of threshold and amount | amount, threshold, experimental_hybrid
amount | Float | | The amount value for amount and experimental_hybrid mode | [0, 1]
threshold | Float | | The threshold value for threshold mode | >=0
granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0
raw_prune_score | Dict config | L1 | The method for computing the importance of weights | L1, L2
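The interplay of amount, granularity, and raw_prune_score can be illustrated with a simplified, single-layer sketch. It scores each output channel by the L1 or L2 norm of its weights and keeps the highest-scoring channels, with the kept count rounded up to a multiple of the granularity. The rounding direction and the helper names are assumptions made for this example; the actual pruner also has to handle dependencies between layers, which this sketch ignores.

```python
import math
from typing import List, Sequence


def channel_scores(weights: Sequence[Sequence[float]], norm: str = "L1") -> List[float]:
    """Importance of each output channel as the L1 or L2 norm of its weights."""
    if norm == "L1":
        return [sum(abs(w) for w in channel) for channel in weights]
    if norm == "L2":
        return [math.sqrt(sum(w * w for w in channel)) for channel in weights]
    raise ValueError(f"unsupported norm: {norm}")


def channels_to_keep(scores: Sequence[float], amount: float, granularity: int = 8) -> List[int]:
    """Indices of the channels kept after pruning roughly `amount` of them.

    Assumption: the kept count is rounded up to a multiple of `granularity`
    and never exceeds the original channel count.
    """
    target = len(scores) * (1.0 - amount)
    kept = math.ceil(target / granularity) * granularity
    kept = min(max(kept, granularity), len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:kept])


# 32 channels with increasing importance, pruned with amount=0.4:
# 32 * 0.6 = 19.2 channels would remain, rounded up to 24 (a multiple of 8).
kept = channels_to_keep([float(i) for i in range(32)], amount=0.4, granularity=8)
print(len(kept), kept[0], kept[-1])
```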

inference

The inference parameter provides options for inference.

inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | | The absolute path to the model checkpoint for inference |
gpu_id | Unsigned int | 0 | The GPU device index | Valid gpu index
inference_dataset_dir | String | | The absolute path to the inference images directory |
results_dir | String | | The absolute path to the inference output |
batch_size | Unsigned int | 1 | The inference batch size | >0

export

The export parameter provides export options.

export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"

Parameter | Datatype | Default | Description | Supported Values
checkpoint | String | | The absolute path to the model checkpoint for export |
gpu_id | Unsigned int | 0 | The GPU device index | Valid gpu index
onnx_file | String | | The absolute path to export ONNX file |
results_dir | String | | The absolute path to the export output |

dataset_convert

The dataset_convert parameter provides options for dataset conversion.

dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter | Datatype | Default | Description | Supported Values
input_img_dir | String | | The absolute path to the images directory |
gt_file | String | | The absolute path to the ground truth file |
results_dir | String | | The absolute path to the dataset_convert output (i.e., the LMDB dataset and log) |

Converting the Dataset

Use the following command to convert the raw dataset to LMDB format:

tao model ocrnet dataset_convert -e <experiment_spec_file>
                           results_dir=<global_results_dir>
                           [dataset_convert.<dataset_convert_option>=<dataset_convert_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.

Optional Arguments

You can set the optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet dataset_convert command:

tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
                           dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
                           dataset_convert.gt_file=$TRAIN_GT \
                           dataset_convert.results_dir=$TRAIN_LMDB_PATH


Training the Model

Use the following command to start OCRNet training:

tao model ocrnet train -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [train.<train_option>=<train_option_value>]
                 [train.optim.<optim_option>=<optim_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The train results will be saved in results_dir/train.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet train command:

tao model ocrnet train -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 dataset.train_dataset_dir=$TRAIN_LMDB_PATH \
                 dataset.val_dataset_dir=$VAL_LMDB_PATH \
                 dataset.character_list_file=$CHARACTER_LIST
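When launching tasks from a script, the same convention can be assembled programmatically. The sketch below only builds the argument list for the convention shown above and does not run anything; build_tao_command is a hypothetical helper, and the spec file path and override values are placeholders chosen for this example.

```python
import shlex
from typing import Dict, List


def build_tao_command(sub_task: str, spec_file: str, overrides: Dict[str, str]) -> List[str]:
    """Compose `tao model ocrnet <sub_task> -e <spec> key=value ...` as an argv list."""
    command = ["tao", "model", "ocrnet", sub_task, "-e", spec_file]
    command += [f"{key}={value}" for key, value in overrides.items()]
    return command


argv = build_tao_command(
    "train",
    "/specs/experiment.yaml",  # hypothetical spec file path
    {
        "results_dir": "/results",
        "dataset.train_dataset_dir": "/data/train/lmdb",
        "dataset.val_dataset_dir": "/data/test/lmdb",
        "dataset.character_list_file": "/data/character_list",
    },
)
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a machine where the TAO Toolkit Launcher is installed.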


Evaluating the Model

Use the following command to start OCRNet evaluation:

tao model ocrnet evaluate -e <experiment_spec_file>
                    results_dir=<global_results_dir>
                    [model.<model_option>=<model_option_value>]
                    [dataset.<dataset_option>=<dataset_option_value>]
                    [evaluate.<evaluate_option>=<evaluate_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The evaluation results will be saved in results_dir/evaluate.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet evaluate command:

tao model ocrnet evaluate -e $DEFAULT_SPEC \
                    results_dir=$RESULTS_DIR \
                    evaluate.checkpoint=$TRAINED_TAO_MODEL \
                    evaluate.test_dataset_dir=$VAL_LMDB_PATH \
                    dataset.character_list_file=$CHARACTER_LIST


Pruning the Model

Use the following command to start OCRNet pruning:

tao model ocrnet prune -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [prune.<prune_option>=<prune_option_value>]
                 [prune.prune_setting.<prune_setting_option>=<prune_setting_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The pruning results will be saved in results_dir/prune.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet prune command:

tao model ocrnet prune -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 prune.checkpoint=$TRAINED_TAO_MODEL \
                 prune.pruned_file=$PRUNED_TAO_MODEL


Inference with the Model

Use the following command to start OCRNet inference:

tao model ocrnet inference -e <experiment_spec_file>
                     results_dir=<global_results_dir>
                     [model.<model_option>=<model_option_value>]
                     [dataset.<dataset_option>=<dataset_option_value>]
                     [inference.<inference_option>=<inference_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The inference results will be saved in results_dir/inference.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet inference command:

tao model ocrnet inference -e $DEFAULT_SPEC \
                     results_dir=$RESULTS_DIR \
                     inference.checkpoint=$TRAINED_TAO_MODEL \
                     inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR


Exporting the Model

Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:

tao model ocrnet export -e <experiment_spec_file>
                  results_dir=<global_results_dir>
                  [model.<model_option>=<model_option_value>]
                  [dataset.<dataset_option>=<dataset_option_value>]
                  [export.<export_option>=<export_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The export results will be saved in results_dir/export.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet export command:

tao model ocrnet export -e $DEFAULT_SPEC \
                  results_dir=$RESULTS_DIR \
                  export.checkpoint=$TRAINED_TAO_MODEL \
                  export.onnx_file=$EXPORTED_ONNX_MODEL_PATH


TensorRT Engine Generation and Validation

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream

For DeepStream integration, please refer to Deploy nvOCDR to DeepStream.
