OCRNet

OCRNet is a model to recognize characters in an image. It supports the following tasks:

dataset_convert
train
evaluate
prune
inference
export

These tasks can be invoked from the TAO Toolkit Launcher using the following command-line convention:

tao model ocrnet <sub_task> <args_per_subtask>

Where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.
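
As a minimal sketch of this convention, the Python helper below assembles the command string from a subtask name, a spec file path, and a dictionary of overrides. The helper name `build_tao_command` and the example paths are hypothetical and are not part of the TAO Toolkit; the `key=value` override form follows the command examples shown in the task sections of this page.

```python
import shlex


def build_tao_command(sub_task, spec_file, overrides=None):
    """Assemble a `tao model ocrnet <sub_task>` command line as one string."""
    argv = ["tao", "model", "ocrnet", sub_task, "-e", spec_file]
    # Each override is appended as a dotted key=value pair after the spec file.
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    # shlex.join quotes any argument that would otherwise be split by the shell.
    return shlex.join(argv)


# Hypothetical paths, used only for illustration.
cmd = build_tao_command(
    "train",
    "/specs/experiment.yaml",
    {"results_dir": "/results", "train.num_epochs": 20},
)
print(cmd)
```

Building the argument list first and quoting it once at the end keeps paths that contain spaces intact.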

Preparing the Dataset

The training and evaluation datasets for OCRNet are in LMDB format. You can use dataset_convert to convert the original images and labels to LMDB format. The original dataset should be organized in the following structure:

/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt

The gt_list.txt file contains all the ground truth text for the images, and each image and its corresponding label is specified with one line of text:

0000.jpg abc
0001.jpg defg
0002.jpg zxv
...

There is a characters_list.txt file that contains all the characters found in the dataset. Each character occupies one line.
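
The two files described above can be related with a short script. The sketch below parses ground truth lines and derives a character list from the labels. The function names are hypothetical, and it assumes that the image name and the label are separated by the first run of whitespace on each line; it is an illustration, not the toolkit's own parser.

```python
def parse_gt_lines(lines):
    """Split each 'image label' line on the first run of whitespace."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        image_name, label = line.split(maxsplit=1)
        pairs.append((image_name, label))
    return pairs


def collect_characters(pairs):
    """Return the sorted set of characters that occur in any label."""
    return sorted({ch for _, label in pairs for ch in label})


# Sample lines taken from the gt_list.txt example above.
gt_lines = ["0000.jpg abc\n", "0001.jpg defg\n", "0002.jpg zxv\n"]
pairs = parse_gt_lines(gt_lines)
characters = collect_characters(pairs)

# One character per line, matching the characters_list.txt layout.
characters_list_text = "\n".join(characters) + "\n"
print(characters_list_text)
```

In practice the lines would be read from gt_list.txt and the resulting text written to characters_list.txt.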

Creating an Experiment Spec File

The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export). Here is an example spec file used in the OCRNet get_started notebook:

results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter	Datatype	Default	Description
`results_dir`	String	–	The global results directory
`encryption_key`	String	–	The key to encode or decode the checkpoint
`model`	Dict config	–	The configuration for the model architecture
`dataset`	Dict config	–	The configuration for the dataset
`train`	Dict config	–	The configuration for the training process
`evaluate`	Dict config	–	The configuration for the evaluation
`prune`	Dict config	–	The configuration for the pruning
`inference`	Dict config	–	The configuration for the inference
`export`	Dict config	–	The configuration of the export
`dataset_convert`	Dict config	–	The configuration for the dataset convert
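
In the example spec file, per-task values such as `"${results_dir}/evaluate"` refer back to the global `results_dir`. The syntax appears to follow OmegaConf-style variable interpolation. The sketch below mimics that substitution with the standard library only, so it is an approximation for illustration and not the resolver the toolkit uses; `resolve_interpolations` is a hypothetical name.

```python
import re

# Matches ${name}, where name is a top-level key of the spec.
_INTERPOLATION = re.compile(r"\$\{(\w+)\}")


def resolve_interpolations(config, root=None):
    """Recursively replace ${key} with the top-level value of `key`."""
    root = config if root is None else root
    if isinstance(config, dict):
        return {k: resolve_interpolations(v, root) for k, v in config.items()}
    if isinstance(config, str):
        return _INTERPOLATION.sub(lambda m: str(root[m.group(1)]), config)
    return config  # numbers, booleans, and other values pass through unchanged


spec = {
    "results_dir": "/results",
    "evaluate": {"gpu_id": 0, "results_dir": "${results_dir}/evaluate"},
    "prune": {"gpu_id": 0, "results_dir": "${results_dir}/prune"},
}
resolved = resolve_interpolations(spec)
print(resolved["evaluate"]["results_dir"])  # /results/evaluate
```

Overriding `results_dir` on the command line therefore moves every per-task output directory that is defined relative to it.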

model

The model parameter provides options to change the architecture of OCRNet.

model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False

Parameter	Datatype	Default	Description	Supported Values
`TPS`	Boolean	False	A flag that enables Thin-plate spline interpolation for the OCRNet input	True/False
`num_fiducial`	Unsigned int	20	The number of fiducial points for TPS	>4
`backbone`	String	ResNet	The backbone of the OCRNet model	ResNet, ResNet2X, FAN_tiny_2X
`feature_channel`	Unsigned int	512	The number of channels for the backbone output feature	>0
`sequence`	String	BiLSTM	The sequence module of the OCRNet model	BiLSTM
`hidden_size`	Unsigned int	256	The number of channels for the BiLSTM hidden layer	>0
`prediction`	String	CTC	The method for encoding and decoding the output feature	CTC, Attn
`input_width`	Unsigned int	100	The input image width	>4
`input_height`	Unsigned int	32	The input image height	>=32
`input_channel`	Unsigned int	1	The input image channel	1,3
`quantize`	Boolean	False	A flag that enables quantize and dequantize nodes in the OCRNet backbone	True/False
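
For the `CTC` value of `prediction`, decoding typically means taking the most likely class per time step, collapsing consecutive repeats, and dropping the blank symbol. The sketch below shows that greedy decoding on plain Python lists. It assumes that index 0 is reserved for the blank and that character `i` of the character list maps to index `i + 1`; neither detail is confirmed by this page, and the function name is hypothetical.

```python
BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_indices, characters):
    """Collapse repeated indices, then drop blanks, to recover the text."""
    decoded = []
    previous = None
    for index in frame_indices:
        # Emit a character only when the index changes and is not the blank.
        if index != previous and index != BLANK:
            decoded.append(characters[index - 1])
        previous = index
    return "".join(decoded)


characters = ["a", "b", "c"]
# Per-frame argmax indices; the blank between the two 1s keeps both "a"s.
frames = [0, 1, 1, 0, 1, 2, 2, 0, 3]
print(ctc_greedy_decode(frames, characters))  # aabc
```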

dataset

The dataset parameter provides options to set the dataset consumed in training and evaluation.

dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]

Parameter	Datatype	Default	Description	Supported Values
`train_dataset_dir`	List of String	None	A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported.	List of String
`val_dataset_dir`	String	None	The absolute path to the evaluation dataset	dataset absolute path
`character_list_file`	String	None	The absolute path to character list file	absolute file path
`max_label_length`	Unsigned int	25	The maximum length of the ground truth	>0
`batch_size`	Unsigned int	32	The batch size for training	>0
`workers`	Unsigned int	4	The number of workers that preprocess the training data in parallel	>=0
`augmentation`	Dict config	–	The augmentation config.	–
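
Because `max_label_length` bounds the ground truth length, it can be useful to check a dataset against it before conversion. The sketch below lists labels that exceed the limit. How the toolkit itself treats such samples is not stated here, so this is only a pre-check one might run; `find_overlong_labels` is a hypothetical helper.

```python
def find_overlong_labels(pairs, max_label_length=25):
    """Return the (image, label) pairs whose label exceeds max_label_length."""
    return [(name, label) for name, label in pairs if len(label) > max_label_length]


# Hypothetical samples: the second label has 30 characters.
samples = [
    ("0000.jpg", "abc"),
    ("0001.jpg", "x" * 30),
    ("0002.jpg", "y" * 25),
]
overlong = find_overlong_labels(samples, max_label_length=25)
print([name for name, _ in overlong])  # ['0001.jpg']
```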

augmentation

The augmentation parameter provides options to set the augmentation pipeline used during training.

augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]

Parameter	Datatype	Default	Description	Supported Values
`keep_aspect_ratio`	Bool	False	A flag that keeps the aspect ratio when resizing the image to the model input size	False/True
`aug_prob`	Float	0.0	The probability to apply the following augmentation on the input image	[0, 1]
`reverse_color_prob`	Float	0.5	The probability to reverse the color of the input image	[0, 1]
`rotate_prob`	Float	0.5	The probability of randomly rotating the input image	[0, 1]
`max_rotation_degree`	Float	0.5	The maximum degree the image will be rotated	>=0
`blur_prob`	Float	0.5	The probability to blur the input image	[0, 1]
`gaussian_radius_list`	List of integer	[1, 2, 3, 4]	The list of radii to choose from when applying Gaussian blur to the image	–
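
To illustrate what `keep_aspect_ratio` implies for the resize step, the sketch below computes the width an image would have after scaling it to the model input height while preserving its aspect ratio, capped at the model input width. Scaling to the full height and padding the remaining columns is an assumption borrowed from common text-recognition pipelines, not a description of the toolkit's actual implementation.

```python
import math


def resized_width(image_width, image_height, input_width=100, input_height=32):
    """Width after scaling to input_height with the aspect ratio preserved."""
    scaled = math.ceil(input_height * image_width / image_height)
    # Never exceed the model input width; wider images get squeezed to fit.
    return min(input_width, scaled)


# A 200x80 crop scales to 80x32, leaving 20 columns to pad.
narrow = resized_width(200, 80)
# A 640x64 crop would scale to 320x32, so it is capped at the input width.
wide = resized_width(640, 64)
print(narrow, 100 - narrow, wide)  # 80 20 100
```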

train

The train parameter provides options to set training hyperparameters.

train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

Parameter	Datatype	Default	Description	Supported Values
`seed`	Unsigned int	1111	The random seed for random, numpy, and torch	>0
`results_dir`	String	–	The absolute path to the train results and output (log, checkpoints)	–
`gpu_ids`	List of Unsigned int	[0]	A list of GPU device indices for training	list of GPU index
`num_gpus`	Unsigned int	1	The number of GPUs to use for training. When `num_gpus` is set to a value greater than 1 to enable multi-GPU training, `gpu_ids` does not take effect	–
`optim`	Dict config	–	The configuration for the optimizer	–
`clip_grad_norm`	Float	5.0	The threshold value of magnitude of the gradient L2 norm to be clipped	>4
`num_epochs`	Unsigned int	10	The number of training epochs	>0
`checkpoint_interval`	Unsigned int	2	The interval for saving the checkpoint during training	>0
`validation_interval`	Unsigned int	25	The interval for performing validation during training	>0
`distributed_strategy`	String	ddp	The distributed strategy for multi-GPU training	ddp
`resume_training_checkpoint_path`	String	None	The absolute path to a checkpoint for resuming training	–
`pretrained_model_path`	String	None	The absolute path to pretrained weights	–
`quantize_model_path`	String	None	The absolute path to pretrained models for quantize-aware-training	–
`model_ema`	Bool	False	Enable model exponential moving average in the training	False/True
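
The `clip_grad_norm` threshold can be illustrated with the usual gradient-norm clipping rule: when the global L2 norm of the gradients exceeds the threshold, every gradient is scaled by the same factor so that the norm equals the threshold. The sketch below applies that rule to a flat list of numbers. It is a simplified illustration under that assumption, not the training code used by the toolkit, and it omits the small epsilon that framework implementations usually add.

```python
import math


def clip_by_global_norm(gradients, max_norm=5.0):
    """Scale gradients so that their global L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if total_norm <= max_norm:
        return list(gradients), total_norm  # already within the threshold
    scale = max_norm / total_norm
    return [g * scale for g in gradients], total_norm


# The norm of [6, 8] is 10, so both values are halved to reach a norm of 5.
clipped, norm = clip_by_global_norm([6.0, 8.0], max_norm=5.0)
print(norm, clipped)  # 10.0 [3.0, 4.0]
```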

optim

The optim parameter provides options to set the optimizer for training.

optim:
  name: "adadelta"
  lr: 1.0

Parameter	Datatype	Default	Description	Supported Values
`name`	String	adadelta	The optimizer type	adadelta, adam
`lr`	Float	1.0	The initial learning rate for the training	>0.0

evaluate

The evaluate parameter provides options to set evaluation hyperparameters.

evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	String	–	The absolute path to the model checkpoint for evaluation	–
`gpu_id`	Unsigned int	0	The GPU device index	A valid gpu index
`test_dataset_dir`	String	–	The absolute path to the evaluation LMDB dataset	–
`results_dir`	String	–	The absolute path to the evaluation output	–
`batch_size`	Unsigned int	1	The evaluation batch size	>0

prune

The prune parameter provides options to set pruning hyperparameters.

prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	String	–	The absolute path to the model checkpoint for pruning	–
`gpu_id`	Unsigned int	0	The GPU device index	A valid gpu index
`results_dir`	String	–	The absolute path to the pruning log	–
`pruned_file`	String	–	The absolute path for storing the pruned model checkpoint	–
`prune_setting`	Dict config	–	The pruning hyperparameters	–

prune_setting

The prune_setting parameter contains options for the pruning algorithms:

Parameter	Datatype	Default	Description	Supported Values
`mode`	String	amount	The pruning mode. `amount`: prune the `amount` ratio of weights according to their importance; `threshold`: prune weights whose importance is smaller than the threshold value; `experimental_hybrid`: prune weights using a hybrid of `threshold` and `amount`	amount, threshold, experimental_hybrid
`amount`	Float	–	The amount value for `amount` and `experimental_hybrid` mode	[0, 1]
`threshold`	Float	–	The threshold value for threshold mode	>=0
`granularity`	Unsigned int	8	The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity.	>0
`raw_prune_score`	String	L1	The method for computing the importance of weights	L1, L2
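
The interaction between `amount`, `granularity`, and an L1 `raw_prune_score` can be sketched as follows: score each output channel by the L1 norm of its weights, drop the lowest-scoring fraction, and adjust the number of remaining channels to a multiple of the granularity. Rounding the kept count up is an assumption made for this illustration, and the function name is hypothetical; the toolkit's pruning algorithm may differ.

```python
import math


def channels_to_keep(channel_weights, amount=0.4, granularity=8):
    """Select the output channels to keep, ranked by the L1 norm of their weights."""
    scores = [sum(abs(w) for w in weights) for weights in channel_weights]
    total = len(scores)
    target = total - int(total * amount)
    # Round the kept count up to a multiple of granularity, capped at total.
    keep = min(total, math.ceil(target / granularity) * granularity)
    ranked = sorted(range(total), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# 32 toy channels whose L1 score grows with the channel index.
weights = [[float(i + 1)] * 4 for i in range(32)]
kept = channels_to_keep(weights, amount=0.4, granularity=8)
print(len(kept), kept[0])  # 24 8
```

With 32 channels and `amount: 0.4`, the target of 20 kept channels is rounded up to 24, so the eight lowest-scoring channels are removed.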

inference

The inference parameter provides options for inference.

inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	String	–	The absolute path to the model checkpoint for inference	–
`gpu_id`	Unsigned int	0	The GPU device index	Valid gpu index
`inference_dataset_dir`	String	–	The absolute path to the inference images directory	–
`results_dir`	String	–	The absolute path to the inference output	–
`batch_size`	Unsigned int	1	The inference batch size	>0

export

The export parameter provides export options.

export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"

Parameter	Datatype	Default	Description	Supported Values
`checkpoint`	String	–	The absolute path to the model checkpoint for export	–
`gpu_id`	Unsigned int	0	The GPU device index	Valid gpu index
`onnx_file`	String	–	The absolute path to export ONNX file	–
`results_dir`	String	–	The absolute path to the export output	–

dataset_convert

The dataset_convert parameter provides options for dataset conversion.

dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"

Parameter	Datatype	Default	Description	Supported Values
`input_img_dir`	String	–	The absolute path to images directory	–
`gt_file`	String	–	The absolute path to the ground truth file	–
`results_dir`	String	–	The absolute path to the `dataset_convert` output (i.e. the LMDB dataset and log)	–

Converting the Dataset

Use the following command to convert the raw dataset to LMDB format:

tao model ocrnet dataset_convert -e <experiment_spec_file>
                           results_dir=<global_results_dir>
                           [dataset_convert.<dataset_convert_option>=<dataset_convert_value>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.

Optional Arguments

You can set the optional arguments to override the option values in the experiment spec file.

dataset_convert.<dataset_convert_option>: The dataset_convert options.

Here’s an example of using the OCRNet dataset_convert command:

tao model ocrnet dataset_convert -e $DEFAULT_SPEC \
                           dataset_convert.input_img_dir=$TRAIN_IMG_DIR \
                           dataset_convert.gt_file=$TRAIN_GT \
                           dataset_convert.results_dir=$TRAIN_LMDB_PATH

Training the Model

Use the following command to start OCRNet training:

tao model ocrnet train -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [train.<train_option>=<train_option_value>]
                 [train.optim.<optim_option>=<optim_option_value>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The train results will be saved in results_dir/train.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options.

Here’s an example of using the OCRNet train command:

tao model ocrnet train -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 dataset.train_dataset_dir=$TRAIN_LMDB_PATH \
                 dataset.val_dataset_dir=$VAL_LMDB_PATH \
                 dataset.character_list_file=$CHARACTER_LIST

Evaluating the Model

Use the following command to start OCRNet evaluation:

tao model ocrnet evaluate -e <experiment_spec_file>
                    results_dir=<global_results_dir>
                    [model.<model_option>=<model_option_value>]
                    [dataset.<dataset_option>=<dataset_option_value>]
                    [evaluate.<evaluate_option>=<evaluate_option_value>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The evaluation results will be saved in results_dir/evaluate.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
evaluate.<evaluate_option>: The evaluate options.

Here’s an example of using the OCRNet evaluate command:

tao model ocrnet evaluate -e $DEFAULT_SPEC \
                    results_dir=$RESULTS_DIR \
                    evaluate.checkpoint=$TRAINED_TAO_MODEL \
                    evaluate.test_dataset_dir=$VAL_LMDB_PATH \
                    dataset.character_list_file=$CHARACTER_LIST

Pruning the Model

Use the following command to start OCRNet pruning:

tao model ocrnet prune -e <experiment_spec_file>
                 results_dir=<global_results_dir>
                 [model.<model_option>=<model_option_value>]
                 [dataset.<dataset_option>=<dataset_option_value>]
                 [prune.<prune_option>=<prune_option_value>]
                 [prune.prune_setting.<prune_setting_option>=<prune_setting_value>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The pruning results will be saved in results_dir/prune.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
prune.<prune_option>: The prune options.
prune.prune_setting.<prune_setting_option>: The prune setting options.

Here’s an example of using the OCRNet prune command:

tao model ocrnet prune -e $DEFAULT_SPEC \
                 results_dir=$RESULTS_DIR \
                 prune.checkpoint=$TRAINED_TAO_MODEL \
                 prune.pruned_file=$PRUNED_TAO_MODEL

Inference with the Model

Use the following command to start OCRNet inference:

tao model ocrnet inference -e <experiment_spec_file>
                     results_dir=<global_results_dir>
                     [model.<model_option>=<model_option_value>]
                     [dataset.<dataset_option>=<dataset_option_value>]
                     [inference.<inference_option>=<inference_option_value>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The inference results will be saved in results_dir/inference.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
inference.<inference_option>: The inference options.

Here’s an example of using the OCRNet inference command:

tao model ocrnet inference -e $DEFAULT_SPEC \
                     results_dir=$RESULTS_DIR \
                     inference.checkpoint=$TRAINED_TAO_MODEL \
                     inference.inference_dataset_dir=$SAMPLE_IMAGES_DIR

Exporting the Model

Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:

tao model ocrnet export -e <experiment_spec_file>
                  results_dir=<global_results_dir>
                  [model.<model_option>=<model_option_value>]
                  [dataset.<dataset_option>=<dataset_option_value>]
                  [export.<export_option>=<export_option_value>]

Required Arguments

-e, --experiment_spec_file: The path to the experiment spec file.
results_dir: The global results directory. The export results will be saved in results_dir/export.

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
export.<export_option>: The export options.

Here’s an example of using the OCRNet export command:

tao model ocrnet export -e $DEFAULT_SPEC \
                  results_dir=$RESULTS_DIR \
                  export.checkpoint=$TRAINED_TAO_MODEL \
                  export.onnx_file=$EXPORTED_ONNX_MODEL_PATH

TensorRT Engine Generation and Validation

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream

For DeepStream integration, refer to Deploy nvOCDR to DeepStream.