NVIDIA TAO Toolkit v4.0.1
OCRNet

OCRNet is a model that recognizes characters in an image. It supports the following tasks:

  • dataset_convert

  • train

  • evaluate

  • prune

  • inference

  • export

These tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command line:


tao model ocrnet <sub_task> <args_per_subtask>

where <args_per_subtask> are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format. The original dataset should be organized in the following structure:


/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt

The gt_list.txt file contains the ground truth text for all the images; each image and its corresponding label are specified on one line of text:


0000.jpg abc
0001.jpg defg
0002.jpg zxv
...

The characters_list.txt file contains all the unique characters found in the dataset, with each character on its own line.
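
For the small example above, a characters_list.txt matching the ground truth labels (abc, defg, zxv) would contain:

a
b
c
d
e
f
g
v
x
z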

The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook. Values set to "??" are mandatory and must be filled in (or overridden on the command line) before the corresponding task is run:


results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  input_width: 100
  input_height: 32
  input_channel: 1
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter        Data Type    Default  Description
---------------  -----------  -------  --------------------------------------------
results_dir      String       –        The global results directory
encryption_key   String       –        The key to encode or decode the checkpoint
model            Dict config  –        The configuration for the model architecture
dataset          Dict config  –        The configuration for the dataset
train            Dict config  –        The configuration for the training process
evaluate         Dict config  –        The configuration for the evaluation
prune            Dict config  –        The configuration for the pruning
inference        Dict config  –        The configuration for the inference
export           Dict config  –        The configuration for the export
dataset_convert  Dict config  –        The configuration for the dataset conversion

model

The model parameter provides options to change the architecture of OCRNet.


model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False

Parameter        Data Type     Default  Description                                                                Supported Values
---------------  ------------  -------  -------------------------------------------------------------------------  ----------------
TPS              Boolean       False    A flag that enables thin-plate-spline interpolation for the OCRNet input    True/False
num_fiducial     Unsigned int  20       The number of fiducial points for TPS                                       >4
backbone         String        ResNet   The backbone of the OCRNet model                                            ResNet
feature_channel  Unsigned int  512      The number of channels for the backbone output feature                      >0
sequence         String        BiLSTM   The sequence module of the OCRNet model                                     BiLSTM
hidden_size      Unsigned int  256      The number of channels for the BiLSTM hidden layer                          >0
prediction       String        CTC      The method for encoding and decoding the output feature                     CTC
quantize         Boolean       False    A flag that enables quantize and dequantize nodes in the OCRNet backbone    True/False
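
For intuition, the options above chain together in the standard four-stage text-recognition design: an optional TPS transformation, a CNN backbone, a BiLSTM sequence module, and a CTC prediction head. The following is a minimal PyTorch sketch of that data flow; the module internals are illustrative placeholders, not the TAO implementation (the TPS stage is omitted for brevity):

import torch
import torch.nn as nn

class OCRNetSketch(nn.Module):
    """Illustrative pipeline: backbone -> sequence -> prediction."""
    def __init__(self, feature_channel=512, hidden_size=256, num_classes=37):
        super().__init__()
        # Backbone placeholder: maps (B, 1, 32, 100) to (B, feature_channel, 1, W)
        self.backbone = nn.Sequential(
            nn.Conv2d(1, feature_channel, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d((1, 25)),
        )
        # Sequence module: BiLSTM over the width (time) dimension
        self.sequence = nn.LSTM(feature_channel, hidden_size,
                                bidirectional=True, batch_first=True)
        # Prediction head: per-timestep class logits, decoded with CTC
        # (num_classes = characters + 1 CTC blank; 37 is illustrative)
        self.prediction = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        feat = self.backbone(x)                  # (B, C, 1, W)
        feat = feat.squeeze(2).permute(0, 2, 1)  # (B, W, C)
        seq, _ = self.sequence(feat)             # (B, W, 2 * hidden_size)
        return self.prediction(seq)              # (B, W, num_classes)

logits = OCRNetSketch()(torch.randn(2, 1, 32, 100))
print(logits.shape)  # torch.Size([2, 25, 37])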

dataset

The dataset parameter provides options to configure the datasets consumed in training and evaluation.


dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  input_width: 100
  input_height: 32
  input_channel: 1
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False

Parameter            Data Type       Default  Description                                                      Supported Values
-------------------  --------------  -------  ---------------------------------------------------------------  ---------------------
train_dataset_dir    List of String  None     A list of absolute paths to the training datasets. Currently,    List of String
                                              only a list length of 1 is supported.
val_dataset_dir      String          None     The absolute path to the evaluation dataset                      Dataset absolute path
character_list_file  String          None     The absolute path to the character list file                     Absolute file path
input_width          Unsigned int    100      The input image width                                            >4
input_height         Unsigned int    32       The input image height                                           >0
input_channel        Unsigned int    1        The input image channel                                          1, 3
max_label_length     Unsigned int    25       The maximum length of the ground truth                           >0
batch_size           Unsigned int    32       The batch size for training                                      >0
workers              Unsigned int    4        The number of workers used to preprocess the training data in    >=0
                                              parallel
augmentation         Dict config     –        The augmentation config. Currently, only the keep_aspect_ratio   –
                                              parameter is supported.

train

The train parameter provides options to set training hyperparameters.


train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

Parameter                        Data Type             Default  Description                                                            Supported Values
-------------------------------  --------------------  -------  ---------------------------------------------------------------------  -------------------
seed                             Unsigned int          1111     The random seed for random, numpy, and torch                            >0
results_dir                      String                –        The absolute path to the train results and output (log, checkpoints)   –
gpu_ids                          List of Unsigned int  [0]      A list of GPU device indices for training                               List of GPU indices
optim                            Dict config           –        The configuration for the optimizer                                     –
clip_grad_norm                   Float                 5.0      The threshold magnitude of the gradient L2 norm to be clipped           >=0
num_epochs                       Unsigned int          10       The number of training epochs                                           >0
checkpoint_interval              Unsigned int          2        The interval for saving the checkpoint during training                  >0
validation_interval              Unsigned int          1        The interval for performing validation during training                  >0
distributed_strategy             String                ddp      The distributed strategy for multi-GPU training                         ddp
resume_training_checkpoint_path  String                None     The absolute path to a checkpoint for resuming training                 –
pretrained_model_path            String                None     The absolute path to pretrained weights                                 –
quantize_model_path              String                None     The absolute path to a pretrained model for quantization-aware          –
                                                                training

optim

The optim parameter provides options to configure the optimizer for training.


optim:
  name: "adadelta"
  lr: 1.0

Parameter  Data Type  Default   Description                             Supported Values
---------  ---------  --------  --------------------------------------  ----------------
name       String     adadelta  The optimizer type                      adadelta, adam
lr         Float      1.0       The initial learning rate for training  >0.0

evaluate

The evaluate parameter provides options to set evaluation hyperparameters.


evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"

Parameter         Data Type     Default  Description                                                Supported Values
----------------  ------------  -------  ---------------------------------------------------------  -----------------
checkpoint        String        –        The absolute path to the model checkpoint for evaluation   –
gpu_id            Unsigned int  0        The GPU device index                                       A valid GPU index
test_dataset_dir  String        –        The absolute path to the evaluation LMDB dataset           –
results_dir       String        –        The absolute path to the evaluation output                 –
batch_size        Unsigned int  1        The evaluation batch size                                  >0

prune

The prune parameter provides options to set pruning hyperparameters.


prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1

Parameter      Data Type     Default  Description                                                 Supported Values
-------------  ------------  -------  ----------------------------------------------------------  -----------------
checkpoint     String        –        The absolute path to the model checkpoint for pruning       –
gpu_id         Unsigned int  0        The GPU device index                                        A valid GPU index
results_dir    String        –        The absolute path to the pruning log                        –
pruned_file    String        –        The absolute path for storing the pruned model checkpoint   –
prune_setting  Dict config   –        The pruning hyperparameters                                 –

prune_setting

The prune_setting parameter contains options for the pruning algorithms:

Parameter        Data Type     Default  Description                                                    Supported Values
---------------  ------------  -------  -------------------------------------------------------------  --------------------------------------
mode             String        amount   The pruning mode (see the modes listed below)                  amount, threshold, experimental_hybrid
amount           Float         –        The pruning ratio for the amount and experimental_hybrid       [0, 1]
                                        modes
threshold        Float         –        The threshold value for threshold mode                         >=0
granularity      Unsigned int  8        The granularity of the pruned layer. The number of             >0
                                        pruned-layer output channels will be a multiple of the
                                        granularity.
raw_prune_score  String        L1       The method for computing the importance of weights             L1, L2

The supported pruning modes are as follows:

  • amount: Prune the given ratio (amount) of weights according to their importance

  • threshold: Prune the weights whose importance is smaller than the threshold value

  • experimental_hybrid: Prune weights using a hybrid of threshold and amount
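
As an illustration of how amount and granularity interact, the sketch below rounds the kept channels down to a multiple of the granularity. The rounding rule is this author's reading of the description above, not TAO's exact behavior:

def pruned_channels(total, amount, granularity=8):
    # Keep (1 - amount) of the channels, rounded down to a multiple
    # of the granularity (hypothetical rounding, for illustration only).
    keep = total * (1 - amount)
    return max(granularity, int(keep // granularity) * granularity)

print(pruned_channels(512, 0.4))  # 304 (512 * 0.6 = 307.2 -> 304)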

inference

The inference parameter provides options for inference.


inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"

Parameter              Data Type     Default  Description                                               Supported Values
---------------------  ------------  -------  --------------------------------------------------------  -----------------
checkpoint             String        –        The absolute path to the model checkpoint for inference   –
gpu_id                 Unsigned int  0        The GPU device index                                      A valid GPU index
inference_dataset_dir  String        –        The absolute path to the inference images directory       –
results_dir            String        –        The absolute path to the inference output                 –
batch_size             Unsigned int  1        The inference batch size                                  >0

export

The export parameter provides export options.


export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"

Parameter    Data Type     Default  Description                                            Supported Values
-----------  ------------  -------  -----------------------------------------------------  -----------------
checkpoint   String        –        The absolute path to the model checkpoint for export   –
gpu_id       Unsigned int  0        The GPU device index                                   A valid GPU index
onnx_file    String        –        The absolute path to the exported ONNX file            –
results_dir  String        –        The absolute path to the export output                 –

dataset_convert

The dataset_convert parameter provides options for dataset conversion.


dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter      Data Type  Default  Description                                                  Supported Values
-------------  ---------  -------  -----------------------------------------------------------  ----------------
input_img_dir  String     –        The absolute path to the images directory                    –
gt_file        String     –        The absolute path to the ground truth file                   –
results_dir    String     –        The absolute path to the dataset_convert output (i.e., the   –
                                   LMDB dataset and log)

Use the following command to convert the raw dataset to LMDB format:


tao model ocrnet dataset_convert -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [dataset_convert.<dataset_convert_option>=<dataset_convert_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet dataset_convert command:


tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
                 dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
                 dataset_convert.gt_file=$TRAIN_GT \
                 dataset_convert.results_dir=$TRAIN_LMDB_PATH
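
After conversion, you can sanity-check the generated LMDB with a few lines of Python. This sketch assumes the key layout commonly used by OCR training datasets (num-samples, label-%09d, and image-%09d with 1-based indices); the path is a placeholder, and you should verify the keys against your own output:

import lmdb

# Open the converted dataset read-only (path is a placeholder).
env = lmdb.open("/path/to/train/lmdb", readonly=True, lock=False)
with env.begin() as txn:
    num_samples = int(txn.get(b"num-samples"))
    print("samples:", num_samples)
    # Print the first few labels and encoded-image sizes.
    for idx in range(1, min(num_samples, 3) + 1):
        label = txn.get(b"label-%09d" % idx).decode()
        image = txn.get(b"image-%09d" % idx)
        print(idx, repr(label), len(image), "bytes")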


Use the following command to start OCRNet training:


tao model ocrnet train -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [train.<train_option>=<train_option_value>]
                 [train.optim.<optim_option>=<optim_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The train results will be saved in results_dir/train.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet train command:


tao model ocrnet train -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 dataset.train_dataset_dir=$TRAIN_LMDB_PATH \
                 dataset.val_dataset_dir=$VAL_LMDB_PATH \
                 dataset.character_list_file=$CHARACTER_LIST


Use the following command to start OCRNet evaluation:


tao model ocrnet evaluate -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [evaluate.<evaluate_option>=<evaluate_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The evaluation results will be saved in results_dir/evaluate.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet evaluate command:


tao model ocrnet evaluate -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 evaluate.checkpoint=$TRAINED_TAO_MODEL \
                 evaluate.test_dataset_dir=$VAL_LMDB_PATH \
                 dataset.character_list_file=$CHARACTER_LIST


Use the following command to start OCRNet pruning:


tao model ocrnet prune -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [prune.<prune_option>=<prune_option_value>]
                 [prune.prune_setting.<prune_setting_option>=<prune_setting_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The pruning results will be saved in results_dir/prune.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet prune command:


tao model ocrnet prune -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 prune.checkpoint=$TRAINED_TAO_MODEL \
                 prune.pruned_file=$PRUNED_TAO_MODEL


Use the following command to start OCRNet inference:


tao model ocrnet inference -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [inference.<inference_option>=<inference_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The inference results will be saved in results_dir/inference.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet inference command:


tao model ocrnet inference -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 inference.checkpoint=$TRAINED_TAO_MODEL \
                 inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR


Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:


tao model ocrnet export -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [export.<export_option>=<export_option_value>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file.

  • results_dir: The global results directory. The export results will be saved in results_dir/export.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

Here’s an example of using the OCRNet export command:


tao model ocrnet export -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 export.checkpoint=$TRAINED_TAO_MODEL \
                 export.onnx_file=$EXPORTED_ONNX_MODEL_PATH
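
Once exported, a quick way to confirm the ONNX model loads and runs is onnxruntime. The file name below is a placeholder; the input name and shape are read from the model itself rather than assumed, and a float32 input is assumed for the dummy run:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("ocrnet.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
print("input:", inp.name, inp.shape)

# Replace any dynamic (non-integer) dimensions with 1 for a dummy run.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})
print("output shapes:", [o.shape for o in outputs])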


For DeepStream integration, please refer to Deploy nvOCDR to DeepStream.

© Copyright 2023, NVIDIA. Last updated on Jul 27, 2023.