OCRNet#

OCRNet is a model that recognizes characters in an image. It supports the following tasks:

  • dataset_convert

  • train

  • evaluate

  • prune

  • inference

  • export

These tasks can be invoked from the TAO Launcher using the following convention on the command-line:

tao model ocrnet <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

Preparing the Dataset#

The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format. The original dataset should be organized in the following structure:

/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt

The gt_list.txt file contains all the ground truth text for the images; each image and its corresponding label are specified on a single line:

0000.jpg abc
0001.jpg defg
0002.jpg zxv
...

The characters_list.txt file contains all the characters found in the dataset, with one character per line.
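
For example, given the labels above, the characters_list.txt file would contain the following lines (the ordering here is illustrative):

a
b
c
d
e
f
g
v
x
z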

Creating an Experiment Spec File#

The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:

results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"

| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | - | The configuration of the model architecture | - |
| dataset | dict config | - | The configuration of the dataset | - |
| train | dict config | - | The configuration of the training task | - |
| evaluate | dict config | - | The configuration of the evaluation task | - |
| inference | dict config | - | The configuration of the inference task | - |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | - |
| results_dir | string | /results | The directory where experiment results are saved | - |
| prune | dict config | - | The configuration of the pruning task | - |
| export | dict config | - | The configuration of the export task | - |
| dataset_convert | dict config | - | The configuration of the dataset conversion task | - |

model#

The model parameter provides options to change the architecture of OCRNet.

model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| TPS | Boolean | False | A flag that enables thin-plate spline interpolation for the OCRNet input | True/False |
| num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
| backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X |
| feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
| sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
| hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
| prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn |
| input_width | Unsigned int | 100 | The input image width | >4 |
| input_height | Unsigned int | 32 | The input image height | >=32 |
| input_channel | Unsigned int | 1 | The number of input image channels | 1, 3 |
| quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |

dataset#

The dataset parameter provides options to configure the datasets used for training and evaluation.

dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of absolute paths |
| val_dataset_dir | String | None | The absolute path to the evaluation dataset | Absolute path |
| character_list_file | String | None | The absolute path to the character list file | Absolute file path |
| max_label_length | Unsigned int | 25 | The maximum length of the ground truth labels | >0 |
| batch_size | Unsigned int | 32 | The batch size for training | >0 |
| workers | Unsigned int | 4 | The number of workers for parallel preprocessing of the training data | >=0 |
| augmentation | Dict config | - | The augmentation config | - |

augmentation#

The augmentation parameter provides options to configure the augmentation pipeline used during training.

augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| keep_aspect_ratio | Bool | False | A flag to keep the aspect ratio when resizing the image to the model input size | True/False |
| aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1] |
| reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1] |
| rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1] |
| max_rotation_degree | Float | 0.5 | The maximum degree by which the image can be rotated | >=0 |
| blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1] |
| gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii to choose from when applying Gaussian blur to the image | - |

train#

The train parameter provides options to set training hyperparameters.

train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | - |
| seed | Unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | Unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | Unsigned int | 1 | The epoch interval at which checkpoints are saved | >0 |
| validation_interval | Unsigned int | 1 | The epoch interval at which validation is run | >0 |
| resume_training_checkpoint_path | String | - | The intermediate PyTorch Lightning checkpoint to resume training from | - |
| results_dir | String | /results/train | The directory to save training results | - |
| optim | Dict config | - | The configuration for the optimizer | - |
| clip_grad_norm | Float | 5.0 | The threshold on the magnitude of the gradient L2 norm; gradients above it are clipped | >=0 |
| distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp |
| pretrained_model_path | String | None | The absolute path to pretrained weights | - |
| quantize_model_path | String | None | The absolute path to a pretrained model for quantization-aware training | - |
| model_ema | Bool | False | A flag that enables model exponential moving average during training | True/False |

optim#

The optim parameter provides options to configure the optimizer for training.

optim:
  name: "adadelta"
  lr: 1.0

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | String | adadelta | The optimizer type | adadelta, adam |
| lr | Float | 1.0 | The initial learning rate for training | >0.0 |

evaluate#

The evaluate parameter provides options to set evaluation hyperparameters.

evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | - | The absolute path to the model checkpoint for evaluation | - |
| results_dir | String | /results/evaluate | The directory to save evaluation results | - |
| num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | - |
| test_dataset_dir | String | - | The absolute path to the evaluation LMDB dataset | - |
| batch_size | Unsigned int | 1 | The evaluation batch size | >0 |

prune#

The prune parameter provides options to set pruning hyperparameters.

prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | - | The absolute path to the model checkpoint for pruning | - |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| results_dir | String | - | The absolute path to the pruning log | - |
| pruned_file | String | - | The absolute path for storing the pruned model checkpoint | - |
| prune_setting | Dict config | - | The pruning hyperparameters | - |

prune_setting#

The prune_setting parameter contains options for the pruning algorithms:

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| mode | String | amount | The pruning mode. amount: prune the specified ratio of weights according to their importance. threshold: prune weights with importance smaller than the threshold value. experimental_hybrid: prune weights using a hybrid of threshold and amount. | amount, threshold, experimental_hybrid |
| amount | Float | - | The amount value for amount and experimental_hybrid modes | [0, 1] |
| threshold | Float | - | The threshold value for threshold mode | >=0 |
| granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
| raw_prune_score | String | L1 | The method for computing the importance of weights | L1, L2 |

inference#

The inference parameter provides options for inference.

inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | - | The absolute path to the model checkpoint for inference | - |
| results_dir | String | /results/inference | The directory to save inference results | - |
| num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | - |
| inference_dataset_dir | String | - | The absolute path to the directory of inference images | - |
| batch_size | Unsigned int | 1 | The inference batch size | >0 |

export#

The export parameter provides export options.

export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | - | The absolute path to the model checkpoint for export | - |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| onnx_file | String | - | The absolute path to the exported ONNX file | - |
| results_dir | String | - | The absolute path to the export output | - |

dataset_convert#

The dataset_convert parameter provides options to configure the dataset conversion.

dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_img_dir | String | - | The absolute path to the images directory | - |
| gt_file | String | - | The absolute path to the ground truth file | - |
| results_dir | String | - | The absolute path to the dataset_convert output (the LMDB dataset and log) | - |

Converting the Dataset#

Use the following command to convert the raw dataset to LMDB format:

tao model ocrnet dataset_convert -e <experiment_spec_file>
                 [results_dir=<global_results_dir>]
                 [dataset_convert.<dataset_convert_option>=<dataset_convert_value>]

Required Arguments#

  • -e, --experiment_spec_file: The path to the experiment spec file.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

  • results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.

  • dataset_convert.<dataset_convert_option>: The dataset_convert options.
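
For example, the following is a possible invocation that converts the example dataset structure shown earlier (the spec file path is illustrative):

tao model ocrnet dataset_convert -e /specs/experiment.yaml dataset_convert.input_img_dir=/Dataset/images dataset_convert.gt_file=/Dataset/gt_list.txt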

Training the Model#

Use the following command to start OCRNet training:

tao model ocrnet train -e <experiment_spec_file>
                 [results_dir=<global_results_dir>]
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [train.<train_option>=<train_option_value>]
                 [train.optim.<optim_option>=<optim_option_value>]
                 [train.gpu_ids=<gpu indices>]
                 [train.num_gpus=<number of gpus>]

Required Arguments#

  • -e, --experiment_spec: The experiment specification file to set up the training experiment

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.
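
For example, the following is a possible invocation that trains for 20 epochs on two GPUs, overriding the values in the spec file (the spec file path is illustrative):

tao model ocrnet train -e /specs/experiment.yaml train.num_gpus=2 train.num_epochs=20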

Note

For training, evaluation, and inference, each task exposes two variables: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 with gpu_ids = [0, 1], then they are modified to follow the setting with more GPUs, for example num_gpus = 1 -> num_gpus = 2.

Checkpointing and Resuming Training#

At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved in train.results_dir, like so:

$ ls /results/train

'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'

The latest checkpoint will also be saved as ocr_model_latest.pth. Training will automatically resume from ocr_model_latest.pth if it exists in train.results_dir. This will be superseded by train.resume_training_checkpoint_path if it is provided.
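
For example, the following is a possible invocation that resumes training from a specific intermediate checkpoint rather than the latest one (paths are illustrative):

tao model ocrnet train -e /specs/experiment.yaml train.resume_training_checkpoint_path=/results/train/model_epoch_004.pth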

The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:

  • Specify a new, empty results directory (Recommended)

  • Remove the latest checkpoint from the results directory

Evaluating the Model#

Use the following command to start OCRNet evaluation:

tao model ocrnet evaluate -e <experiment_spec_file>
                 evaluate.checkpoint=<model to be evaluated>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [evaluate.<evaluate_option>=<evaluate_option_value>]
                 [evaluate.gpu_ids=<gpu indices>]
                 [evaluate.num_gpus=<number of gpus>]

Multi-GPU evaluation is currently not supported for OCRNet.

Required Arguments#

  • -e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.

  • evaluate.checkpoint: The .pth model.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.
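
For example, the following is a possible invocation that evaluates the latest checkpoint on the test LMDB dataset from the example spec file (paths are illustrative):

tao model ocrnet evaluate -e /specs/experiment.yaml evaluate.checkpoint=/results/train/ocr_model_latest.pth evaluate.test_dataset_dir=/data/test/lmdb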

Pruning the Model#

Use the following command to start OCRNet pruning:

tao model ocrnet prune -e <experiment_spec_file>
                 [results_dir=<global_results_dir>]
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [prune.<prune_option>=<prune_option_value>]
                 [prune.prune_setting.<prune_setting_option>=<prune_setting_value>]

Required Arguments#

  • -e, --experiment_spec_file: The path to the experiment spec file.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.
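
For example, the following is a possible invocation that prunes a trained checkpoint and stores the pruned model (paths are illustrative):

tao model ocrnet prune -e /specs/experiment.yaml prune.checkpoint=/results/train/ocr_model_latest.pth prune.pruned_file=/results/prune/pruned_model.pth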

Note

If running training, evaluation, or inference on a pruned graph, you must provide the model.pruned_graph_path parameter when running the respective task. It should be the same as the value provided for prune.pruned_file in the prune task.

Inference with the Model#

Use the following command to start OCRNet inference:

tao model ocrnet inference -e <experiment_spec_file>
                 [results_dir=<global_results_dir>]
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [inference.<inference_option>=<inference_option_value>]

Multi-GPU inference is currently not supported for OCRNet.

Required Arguments#

  • -e, --experiment_spec_file: The path to the experiment spec file.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

  • results_dir: The global results directory. The inference results will be saved in results_dir/inference.

  • model.<model_option>: The model options.

  • dataset.<dataset_option>: The dataset options.

  • inference.<inference_option>: The inference options.
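
For example, the following is a possible invocation that runs inference on a directory of images (paths are illustrative):

tao model ocrnet inference -e /specs/experiment.yaml inference.checkpoint=/results/train/ocr_model_latest.pth inference.inference_dataset_dir=/data/test_images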

Exporting the Model#

Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:

tao model ocrnet export -e <experiment_spec_file>
                 [results_dir=<global_results_dir>]
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [export.<export_option>=<export_option_value>]

Required Arguments#

  • -e, --experiment_spec_file: The path to the experiment spec file.

Optional Arguments#

You can set optional arguments to override the option values in the experiment spec file.

  • results_dir: The global results directory. The export results will be saved in results_dir/export.

  • model.<model_option>: The model options.

  • dataset.<dataset_option>: The dataset options.

  • export.<export_option>: The export options.
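
For example, the following is a possible invocation that exports a trained checkpoint to an ONNX file (paths are illustrative):

tao model ocrnet export -e /specs/experiment.yaml export.checkpoint=/results/train/ocr_model_latest.pth export.onnx_file=/results/export/ocrnet.onnx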

TensorRT Engine Generation and Validation#

For deployment, see TAO Deploy documentation.

Deploying to DeepStream#

For DeepStream integration, see Deploy nvOCDR to DeepStream.