NVIDIA TAO Toolkit v4.0.1

OCRNet

OCRNet is a model to recognize characters in an image. It supports the following tasks:

  • dataset_convert

  • train

  • evaluate

  • prune

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:


tao model ocrnet <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format. The original dataset should be organized in the following structure:


/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt

The gt_list.txt file contains the ground truth text for all the images; each line of text pairs an image name with its label:


0000.jpg abc
0001.jpg defg
0002.jpg zxv
...

There is a characters_list.txt file that contains all the characters found in the dataset. Each character occupies one line.
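
For example, a characters_list.txt for a lowercase alphanumeric dataset would look like the following (the exact contents are illustrative; list whichever characters actually appear in your dataset):

0
1
2
...
9
a
b
c
...
z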

The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:


results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  input_width: 100
  input_height: 32
  input_channel: 1
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| results_dir | String | | The global results directory |
| encryption_key | String | | The key to encode or decode the checkpoint |
| model | Dict config | | The configuration for the model architecture |
| dataset | Dict config | | The configuration for the dataset |
| train | Dict config | | The configuration for the training process |
| evaluate | Dict config | | The configuration for the evaluation |
| prune | Dict config | | The configuration for the pruning |
| inference | Dict config | | The configuration for the inference |
| export | Dict config | | The configuration for the export |
| dataset_convert | Dict config | | The configuration for the dataset conversion |

model

The model parameter provides options to change the architecture of OCRNet.


model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| TPS | Boolean | False | A flag that enables Thin-plate spline interpolation for the OCRNet input | True/False |
| num_fiducial | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
| backbone | String | ResNet | The backbone of the OCRNet model | ResNet |
| feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
| sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
| hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
| prediction | String | CTC | The method for encoding and decoding the output feature | CTC |
| quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |

dataset

The dataset parameter provides options to set the dataset consumed in training and evaluation.


dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  input_width: 100
  input_height: 32
  input_channel: 1
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | List of Strings | None | A list of absolute paths to the training datasets. Currently, only a list of length 1 is supported. | List of Strings |
| val_dataset_dir | String | None | The absolute path to the evaluation dataset | Absolute dataset path |
| character_list_file | String | None | The absolute path to the character list file | Absolute file path |
| input_width | Unsigned int | 100 | The input image width | >4 |
| input_height | Unsigned int | 32 | The input image height | >=32 |
| input_channel | Unsigned int | 1 | The number of input image channels | 1, 3 |
| max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0 |
| batch_size | Unsigned int | 32 | The batch size for training | >0 |
| workers | Unsigned int | 4 | The number of workers for parallel preprocessing of the training data | >=0 |
| augmentation | Dict config | | The augmentation configuration. Currently, only the keep_aspect_ratio parameter is supported. | |

train

The train parameter provides options to set training hyperparameters.


train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| seed | Unsigned int | 1111 | The random seed for random, numpy, and torch | >0 |
| results_dir | String | | The absolute path to the training results and outputs (logs, checkpoints) | |
| gpu_ids | List of Unsigned ints | [0] | A list of GPU device indices for training | List of GPU indices |
| optim | Dict config | | The configuration for the optimizer | |
| clip_grad_norm | Float | 5.0 | The threshold on the gradient L2 norm; gradients with a larger norm are clipped to this value | >=0.0 |
| num_epochs | Unsigned int | 10 | The number of training epochs | >0 |
| checkpoint_interval | Unsigned int | 2 | The interval (in epochs) for saving checkpoints during training | >0 |
| validation_interval | Unsigned int | 1 | The interval (in epochs) for performing validation during training | >0 |
| distributed_strategy | String | ddp | The distributed strategy for multi-GPU training | ddp |
| resume_training_checkpoint_path | String | None | The absolute path to a checkpoint for resuming training | |
| pretrained_model_path | String | None | The absolute path to pretrained weights | |
| quantize_model_path | String | None | The absolute path to a pretrained model for quantization-aware training | |
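
The quantize flag in the model config and the quantize_model_path option work together for quantization-aware training. As a sketch (assuming $TRAINED_TAO_MODEL points to a checkpoint from a regular, non-quantized training run), you can enable both from the command line:

tao model ocrnet train -e $DEFAULT_SPEC \
                       results_dir=$RESULTS_DIR \
                       model.quantize=True \
                       train.quantize_model_path=$TRAINED_TAO_MODEL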

optim

The optim parameter provides options to set the optimizer for training.


optim:
  name: "adadelta"
  lr: 1.0

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | String | adadelta | The optimizer type | adadelta, adam |
| lr | Float | 1.0 | The initial learning rate for training | >0.0 |
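
For example, to switch from the default adadelta optimizer to adam, you can override the optim options on the command line (the 0.001 learning rate is an illustrative value, not a documented default):

tao model ocrnet train -e $DEFAULT_SPEC \
                       results_dir=$RESULTS_DIR \
                       train.optim.name=adam \
                       train.optim.lr=0.001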

evaluate

The evaluate parameter provides options to set evaluation hyperparameters.


evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | | The absolute path to the model checkpoint for evaluation | |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| test_dataset_dir | String | | The absolute path to the evaluation LMDB dataset | |
| results_dir | String | | The absolute path to the evaluation output | |
| batch_size | Unsigned int | 1 | The evaluation batch size | >0 |

prune

The prune parameter provides options to set prune hyperparameters.


prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | | The absolute path to the model checkpoint for pruning | |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| results_dir | String | | The absolute path to the pruning log | |
| pruned_file | String | | The absolute path for storing the pruned model checkpoint | |
| prune_setting | Dict config | | The pruning hyperparameters | |

prune_setting

The prune_setting parameter contains options for the pruning algorithms:

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| mode | String | amount | The pruning mode: amount prunes the given ratio of weights according to their importance; threshold prunes weights with importance smaller than the threshold value; experimental_hybrid prunes weights using a hybrid of threshold and amount | amount, threshold, experimental_hybrid |
| amount | Float | | The pruning ratio for the amount and experimental_hybrid modes | [0, 1] |
| threshold | Float | | The threshold value for the threshold mode | >=0 |
| granularity | Unsigned int | 8 | The granularity of the pruned layer: the number of pruned-layer output channels will be a multiple of this value | >0 |
| raw_prune_score | String | L1 | The method for computing the importance of weights | L1, L2 |

inference

The inference parameter provides options for inference.


inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | | The absolute path to the model checkpoint for inference | |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| inference_dataset_dir | String | | The absolute path to the inference images directory | |
| results_dir | String | | The absolute path to the inference output | |
| batch_size | Unsigned int | 1 | The inference batch size | >0 |

export

The export parameter provides export options.


export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | | The absolute path to the model checkpoint for export | |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| onnx_file | String | | The absolute path to the exported ONNX file | |
| results_dir | String | | The absolute path to the export output | |

dataset_convert

The dataset_convert parameter provides options for the dataset conversion.


dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_img_dir | String | | The absolute path to the images directory | |
| gt_file | String | | The absolute path to the ground truth file | |
| results_dir | String | | The absolute path to the dataset_convert output (i.e., the LMDB dataset and log) | |

Use the following command to convert the raw dataset to LMDB format:


tao model ocrnet dataset_convert -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [dataset_convert.<dataset_convert_option>=<dataset_convert_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.

Optional Arguments

You can set the optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet dataset_convert command:


tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
                 dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
                 dataset_convert.gt_file=$TRAIN_GT \
                 dataset_convert.results_dir=$TRAIN_LMDB_PATH
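
After conversion, you can spot-check the generated LMDB from Python. The following is a minimal sketch that assumes the widely used OCR LMDB layout with num-samples, image-%09d, and label-%09d keys; the key names and the dataset path are assumptions, not documented TAO internals:

import lmdb

# Open the converted dataset read-only; replace the path with your own output.
env = lmdb.open("/results/convert_dataset/lmdb", readonly=True, lock=False)
with env.begin() as txn:
    # Assumed key layout: a total count plus per-sample image/label entries.
    num_samples = int(txn.get("num-samples".encode()))
    label = txn.get("label-000000001".encode()).decode()
    image_bytes = txn.get("image-000000001".encode())
print(f"{num_samples} samples; first label {label!r}; first image {len(image_bytes)} bytes")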


Use the following command to start OCRNet training:


tao model ocrnet train -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [train.<train_option>=<train_option_value>]
                 [train.optim.<optim_option>=<optim_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The train results will be saved in results_dir/train.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet train command:


tao model ocrnet train -e $DEFAULT_SPEC \
                       results_dir=$RESULTS_DIR \
                       dataset.train_dataset_dir=[$TRAIN_LMDB_PATH] \
                       dataset.val_dataset_dir=$VAL_LMDB_PATH \
                       dataset.character_list_file=$CHARACTER_LIST
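
If a training run is interrupted, you can resume it from the last saved checkpoint with the resume_training_checkpoint_path option. A sketch (with $LAST_CHECKPOINT standing in for a checkpoint saved by the earlier run):

tao model ocrnet train -e $DEFAULT_SPEC \
                       results_dir=$RESULTS_DIR \
                       train.resume_training_checkpoint_path=$LAST_CHECKPOINT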


Use the following command to start OCRNet evaluation:


tao model ocrnet evaluate -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [evaluate.<evaluate_option>=<evaluate_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The evaluation results will be saved in results_dir/evaluate.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet evaluate command:


tao model ocrnet evaluate -e $DEFAULT_SPEC \
                          results_dir=$RESULTS_DIR \
                          evaluate.checkpoint=$TRAINED_TAO_MODEL \
                          evaluate.test_dataset_dir=$VAL_LMDB_PATH \
                          dataset.character_list_file=$CHARACTER_LIST


Use the following command to start OCRNet pruning:


tao model ocrnet prune -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [prune.<prune_option>=<prune_option_value>]
                 [prune.prune_setting.<prune_setting_option>=<prune_setting_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The pruning results will be saved in results_dir/prune.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet prune command:


tao model ocrnet prune -e $DEFAULT_SPEC \
                       results_dir=$RESULTS_DIR \
                       prune.checkpoint=$TRAINED_TAO_MODEL \
                       prune.pruned_file=$PRUNED_TAO_MODEL
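
The prune_setting options can be overridden on the command line as well. For example, the following sketch prunes 30% of the weights using the plain amount mode instead of experimental_hybrid (the 0.3 ratio is an illustrative value):

tao model ocrnet prune -e $DEFAULT_SPEC \
                       results_dir=$RESULTS_DIR \
                       prune.checkpoint=$TRAINED_TAO_MODEL \
                       prune.pruned_file=$PRUNED_TAO_MODEL \
                       prune.prune_setting.mode=amount \
                       prune.prune_setting.amount=0.3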


Use the following command to start OCRNet inference:


tao model ocrnet inference -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [inference.<inference_option>=<inference_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The inference results will be saved in results_dir/inference.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet inference command:


tao model ocrnet inference -e $DEFAULT_SPEC \
                           results_dir=$RESULTS_DIR \
                           inference.checkpoint=$TRAINED_TAO_MODEL \
                           inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR


Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:


tao model ocrnet export -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [export.<export_option>=<export_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The export results will be saved in results_dir/export.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet export command:


tao model ocrnet export -e $DEFAULT_SPEC \
                        results_dir=$RESULTS_DIR \
                        export.checkpoint=$TRAINED_TAO_MODEL \
                        export.onnx_file=$EXPORTED_ONNX_MODEL_PATH
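
After export, it is worth sanity-checking the ONNX file before deployment. The following is a minimal Python sketch using the onnx package (installed separately); the model path is a placeholder for your exported file:

import onnx

# Load the exported model and run the structural checker; replace the path with your own.
model = onnx.load("/results/export/ocrnet.onnx")
onnx.checker.check_model(model)
print("inputs:", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])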


For DeepStream integration, please refer to Deploy nvOCDR to DeepStream.
