OCRNet#
OCRNet is a model to recognize characters in an image. It supports the following tasks:
dataset_convert
train
evaluate
prune
inference
export
quantize
Each task is explained in detail in the following sections.
Preparing the Dataset#
The training and evaluation datasets for OCRNet are in LMDB format.
You can use dataset_convert to convert the original images and labels to LMDB format.
The original dataset should be organized in the following structure:
/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt
The gt_list.txt file contains all the ground truth text for the images, and each image
and its corresponding label is specified with one line of text:
0000.jpg abc
0001.jpg defg
0002.jpg zxv
...
There is a characters_list.txt file that contains all the
characters found in the dataset. Each character occupies one line.
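The two text files described above can be derived from one another. The following is a minimal sketch, not part of the OCRNet tooling, that reads a gt_list.txt file and writes a characters_list.txt file with one character per line. The helper names (parse_gt_line, build_character_list) are hypothetical, and the sketch assumes that the first run of whitespace on each line separates the file name from the label.

```python
from pathlib import Path
import tempfile


def parse_gt_line(line: str) -> tuple[str, str]:
    """Split one gt_list.txt line into (image file name, label text)."""
    # Assumption: the first run of whitespace separates file name and label.
    parts = line.strip().split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"expected '<image> <label>', got: {line!r}")
    return parts[0], parts[1]


def build_character_list(gt_path: Path, out_path: Path) -> list[str]:
    """Collect every distinct label character and write one per line."""
    chars: set[str] = set()
    with gt_path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            _, label = parse_gt_line(line)
            chars.update(label)
    ordered = sorted(chars)
    out_path.write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return ordered


if __name__ == "__main__":
    # Small demo using the sample labels shown above.
    with tempfile.TemporaryDirectory() as tmp:
        gt = Path(tmp) / "gt_list.txt"
        gt.write_text("0000.jpg abc\n0001.jpg defg\n0002.jpg zxv\n", encoding="utf-8")
        print(build_character_list(gt, Path(tmp) / "characters_list.txt"))
```

Sorting the characters only makes the output deterministic; whether the order of the list matters to training is not stated here.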
Creating an Experiment Specification File#
The experiment specification file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export).
Here is an example specification file used in the OCRNet get_started notebook:
| Parameter | Data Type | Default | Description | Supported Values |
|---|---|---|---|---|
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| prune | dict config | – | The configuration for the pruning | |
| export | dict config | – | The configuration of the export | |
| dataset_convert | dict config | – | The configuration for the dataset convert | |
model#
The model parameter provides options to change the architecture of OCRNet.
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| TPS | Boolean | False | A flag that enables thin-plate spline (TPS) interpolation for the OCRNet input | True/False |
| | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
| backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X |
| feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
| sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
| hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
| prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn |
| | Unsigned int | 100 | The input image width | >4 |
| | Unsigned int | 32 | The input image height | >32 |
| | Unsigned int | 1 | The input image channel | 1,3 |
| quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |
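To illustrate what the CTC option of the prediction parameter implies at decoding time, the following is a minimal sketch of greedy CTC decoding: repeated indices are collapsed and blank symbols are dropped. It is a generic illustration, not the OCRNet implementation. The function name ctc_greedy_decode is hypothetical, and the sketch assumes that index 0 is the blank symbol and that index i maps to the (i - 1)-th entry of the character list.

```python
def ctc_greedy_decode(indices: list[int], characters: list[str], blank: int = 0) -> str:
    """Collapse repeated indices, then drop blanks (greedy CTC decoding)."""
    decoded = []
    previous = None
    for index in indices:
        # Emit a character only when the index changes and is not the blank.
        if index != previous and index != blank:
            # Assumption: index 0 is blank, so characters are shifted by one.
            decoded.append(characters[index - 1])
        previous = index
    return "".join(decoded)


if __name__ == "__main__":
    # Per-timestep argmax indices for a hypothetical 3-character alphabet.
    print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 3], ["a", "b", "c"]))
```

The blank between the two runs of index 1 is what allows a doubled character to survive the collapse step.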
dataset#
The dataset parameter provides options to set the dataset consumed in training and evaluation.
dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String |
| val_dataset_dir | String | None | The absolute path to the evaluation dataset | dataset absolute path |
| character_list_file | String | None | The absolute path to the character list file | absolute file path |
| max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0 |
| batch_size | Unsigned int | 32 | The batch size for training | >0 |
| workers | Unsigned int | 4 | The number of workers that preprocess the training data in parallel | >=0 |
| augmentation | Dict config | – | The augmentation config | – |
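Because max_label_length and character_list_file both constrain which labels are usable, it can help to check a dataset before converting it. The following is a minimal sketch of such a pre-check. The function name filter_samples is hypothetical, and how OCRNet itself treats over-long labels or unknown characters is not stated here, so this only flags such samples rather than reproducing any built-in behavior.

```python
def filter_samples(
    samples: list[tuple[str, str]],
    characters: list[str],
    max_label_length: int = 25,
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split (image, label) pairs into those that fit the config and those that do not."""
    allowed = set(characters)
    kept, flagged = [], []
    for name, label in samples:
        fits_length = 0 < len(label) <= max_label_length
        known_chars = set(label) <= allowed  # every character is in the list
        (kept if fits_length and known_chars else flagged).append((name, label))
    return kept, flagged


if __name__ == "__main__":
    samples = [("0000.jpg", "abc"), ("0001.jpg", "ab!"), ("0002.jpg", "a" * 26)]
    kept, flagged = filter_samples(samples, list("abc"))
    print("kept:", kept)
    print("flagged:", [name for name, _ in flagged])
```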
augmentation#
The augmentation parameter provides options to configure the augmentation pipeline used during training.
augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| keep_aspect_ratio | Bool | False | A flag that keeps the aspect ratio when resizing the image to the model input size | False/True |
| aug_prob | Float | 0.0 | The probability of applying the following augmentations to the input image | [0, 1] |
| reverse_color_prob | Float | 0.5 | The probability of reversing the color of the input image | [0, 1] |
| rotate_prob | Float | 0.5 | The probability of randomly rotating the input image | [0, 1] |
| max_rotation_degree | Float | 0.5 | The maximum angle, in degrees, by which the image is rotated | >=0 |
| blur_prob | Float | 0.5 | The probability of blurring the input image | [0, 1] |
| gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii to use when applying Gaussian blur to the image | – |
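One way to read these probabilities is that aug_prob gates the whole pipeline and each remaining probability is then sampled independently. The following is a minimal sketch of that reading, which is an assumption and not a description of the OCRNet implementation. The function name plan_augmentations, the symmetric rotation range, and the uniform choice of blur radius are all hypothetical.

```python
import random


def plan_augmentations(cfg: dict, rng: random.Random) -> list[tuple[str, object]]:
    """Decide which augmentations to apply to one image (no image I/O)."""
    plan: list[tuple[str, object]] = []
    # Assumption: aug_prob gates every augmentation that follows.
    if rng.random() >= cfg["aug_prob"]:
        return plan
    if rng.random() < cfg["reverse_color_prob"]:
        plan.append(("reverse_color", None))
    if rng.random() < cfg["rotate_prob"]:
        limit = cfg["max_rotation_degree"]
        # Assumption: the angle is drawn from [-limit, +limit].
        plan.append(("rotate", rng.uniform(-limit, limit)))
    if rng.random() < cfg["blur_prob"]:
        # Assumption: one radius is picked uniformly from the list.
        plan.append(("gaussian_blur", rng.choice(cfg["gaussian_radius_list"])))
    return plan


if __name__ == "__main__":
    cfg = {
        "aug_prob": 0.3,
        "reverse_color_prob": 0.5,
        "rotate_prob": 0.5,
        "max_rotation_degree": 5,
        "blur_prob": 0.5,
        "gaussian_radius_list": [1, 2, 3, 4],
    }
    rng = random.Random(1111)
    for _ in range(5):
        print(plan_augmentations(cfg, rng))
```

Under this reading, an aug_prob of 0.0 disables augmentation entirely, regardless of the other probabilities.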
train#
The train parameter provides options to set training hyperparameters.
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optim | Dict config | – | The configuration for the optimizer | – |
| clip_grad_norm | Float | 5.0 | The threshold value of magnitude of the gradient L2 norm to be clipped | >4 |
| | String | ddp | The distributed strategy for multi-GPU training | ddp |
| | String | None | The absolute path to pretrained weights | – |
| | String | None | The absolute path to pretrained models for quantize-aware-training | – |
| | Bool | False | Enable model exponential moving average in the training | False/True |
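The clip_grad_norm threshold refers to clipping by the L2 norm of the gradient. The following is a minimal, framework-free sketch of that rule: when the norm exceeds the threshold, every gradient component is scaled by the same factor so that the norm equals the threshold. The function name clip_by_l2_norm is hypothetical, and the exact numerical details of the training framework (for example, a small epsilon in the denominator) are not reproduced.

```python
import math


def clip_by_l2_norm(grads: list[float], max_norm: float = 5.0) -> tuple[list[float], float]:
    """Return (clipped gradients, original L2 norm)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm  # already within the threshold
    scale = max_norm / total_norm  # same factor for every component
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    clipped, norm = clip_by_l2_norm([6.0, 8.0], max_norm=5.0)
    print("norm before clipping:", norm)
    print("clipped gradients:", clipped)
```

Scaling all components by one factor preserves the direction of the gradient and changes only its magnitude.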
optim#
The optim parameter provides options to set the optimizer for training.
optim:
  name: "adadelta"
  lr: 1.0
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| name | String | adadelta | The optimizer type | adadelta, adam |
| lr | Float | 1.0 | The initial learning rate for the training | >0.0 |
evaluate#
The evaluate parameter provides options to set evaluation hyperparameters.
evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for evaluation | – |
| results_dir | String | /results/evaluate | The directory to save evaluation results | |
| | Unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | – |
| | Unsigned int | 1 | The evaluation batch size | >0 |
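The value "${results_dir}/evaluate" refers back to the top-level results_dir, which is why the default in the table is /results/evaluate when results_dir is /results. The following is a minimal sketch of that substitution using Python's string.Template, whose ${name} syntax happens to match. It only illustrates the substitution; the specification loader does the real resolution, and the helper name resolve is hypothetical.

```python
from string import Template


def resolve(value: str, root: dict) -> str:
    """Substitute ${name} placeholders with values from the top-level spec."""
    return Template(value).substitute(root)


if __name__ == "__main__":
    root = {"results_dir": "/results"}
    for task in ("evaluate", "inference", "export"):
        print(task, "->", resolve("${results_dir}/" + task, root))
```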
prune#
The prune parameter provides options to set pruning hyperparameters.
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for pruning | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid gpu index |
| | String | – | The absolute path to the pruning log | – |
| | String | – | The absolute path for storing the pruned model checkpoint | – |
| prune_setting | Dict config | – | The pruning hyperparameters | – |
prune_setting#
The prune_setting parameter contains options for the pruning algorithms:
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| mode | String | amount | The pruning mode | amount, threshold, experimental_hybrid |
| amount | Float | – | The amount value for | [0, 1] |
| threshold | Float | – | The threshold value for threshold mode | >=0 |
| granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
| raw_prune_score | Dict config | L1 | The method for computing the importance of weights | L1, L2 |
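To make the interaction between amount, granularity, and the L1 score concrete, the following is a minimal sketch of channel selection for a single layer: each output channel is scored by the L1 norm of its weights, the lowest-scoring fraction is dropped, and the number of kept channels is rounded down to a multiple of the granularity. This is an illustration under stated assumptions, not the OCRNet pruning algorithm. The function names are hypothetical, and the rounding direction is an assumption.

```python
def l1_score(filter_weights: list[float]) -> float:
    """Importance of one output channel: the L1 norm of its weights."""
    return sum(abs(w) for w in filter_weights)


def channels_to_keep(scores: list[float], amount: float, granularity: int = 8) -> list[int]:
    """Return the sorted indices of the channels that survive pruning."""
    total = len(scores)
    target = int(total * (1.0 - amount))  # channels left after pruning
    # Assumption: round down to a multiple of the granularity, keep at least one group.
    keep = max(granularity, (target // granularity) * granularity)
    keep = min(keep, total)
    ranked = sorted(range(total), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # 16 hypothetical channels whose importance grows with the index.
    filters = [[0.1 * (i + 1), -0.1 * (i + 1)] for i in range(16)]
    scores = [l1_score(f) for f in filters]
    print(channels_to_keep(scores, amount=0.5, granularity=8))
```

With 512 channels, an amount of 0.4, and a granularity of 8, this sketch keeps 304 channels, the largest multiple of 8 that does not exceed 307.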
inference#
The inference parameter provides options for inference.
inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for inference | – |
| results_dir | String | /results/inference | The directory to save inference results | |
| | Unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| inference_dataset_dir | String | – | The absolute path to the inference images directory | – |
| | Unsigned int | 1 | The inference batch size | >0 |
export#
The export parameter provides export options.
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| checkpoint | String | – | The absolute path to the model checkpoint for export | – |
| gpu_id | Unsigned int | 0 | The GPU device index | Valid gpu index |
| | String | – | The absolute path to export ONNX file | – |
| results_dir | String | – | The absolute path to the export output | – |
dataset_convert#
The dataset_convert parameter provides options for dataset conversion.
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| input_img_dir | String | – | The absolute path to images directory | – |
| gt_file | String | – | The absolute path to the ground truth file | – |
| results_dir | String | – | The absolute path to | – |
Evaluating the Model#
Multi-GPU evaluation is currently not supported for OCRNet.
Inference with the Model#
Multi-GPU inference is currently not supported for OCRNet.
Quantization#
OCRNet supports PTQ via TAO Quant using either the torchao (weight-only) or modelopt (static PTQ) backends.
Add a quantize section to your experiment specification (see TAO Quant documentation for schema and backend options).
Use the quantized checkpoint by setting evaluate.is_quantized: true or inference.is_quantized: true and pointing to the artifact saved under results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt artifacts, the model weights are stored under model_state_dict.
Notes#
For modelopt static PTQ, ensure that your dataset configuration provides a representative calibration loader.
For torchao, activation settings in the configuration are ignored.
Calibration Dataset (ModelOpt)#
When you use the modelopt backend (static PTQ), provide a calibration dataset via dataset.quant_calibration_dataset.
Minimal example:
quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
dataset:
  quant_calibration_dataset:
    images_dir: "/path/to/calib/images"
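To show why static PTQ needs representative calibration data, the following is a minimal sketch of generic min-max calibration for symmetric 8-bit quantization: the largest absolute value seen during calibration fixes the scale, and values outside that range are clamped afterwards. It illustrates the general technique only and is not the ModelOpt implementation. All function names are hypothetical.

```python
def minmax_int8_scale(calibration_values: list[float]) -> float:
    """Derive a symmetric int8 scale from the observed min and max."""
    amax = max(abs(min(calibration_values)), abs(max(calibration_values)))
    return amax / 127.0 if amax > 0 else 1.0


def quantize(value: float, scale: float) -> int:
    """Map a float to an int8 level, clamping to the calibrated range."""
    return max(-127, min(127, round(value / scale)))


def dequantize(level: int, scale: float) -> float:
    """Map an int8 level back to an approximate float."""
    return level * scale


if __name__ == "__main__":
    # Hypothetical activation values collected from calibration images.
    scale = minmax_int8_scale([-1.0, 0.2, 0.8])
    for x in (0.5, -1.0, 3.0):
        q = quantize(x, scale)
        print(x, "->", q, "->", dequantize(q, scale))
```

If the calibration images do not cover the value range seen at inference time, larger values are clamped, which is the failure mode that a representative calibration dataset is meant to avoid.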
See also: TAO Quant overview and its Configuration and backend pages.
Deploying to DeepStream#
For DeepStream integration, see Deploy nvOCDR to DeepStream.