OCRNet#
OCRNet is a model to recognize characters in an image. It supports the following tasks:
dataset_convert
train
evaluate
prune
inference
export
Each task is explained in detail in the following sections.
Note
Throughout this documentation are references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.
For instructions on creating a dataset using the remote client, refer to the Creating a dataset section in the Remote Client documentation.
For instructions on creating an experiment using the remote client, refer to the Creating an experiment section in the Remote Client documentation.
The spec format is YAML for TAO Launcher, and JSON for FTMS Client.
File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher, not for FTMS Client.
Preparing the Dataset#
The training and evaluation datasets for OCRNet are in LMDB format.
You can use dataset_convert to convert the original images and labels to LMDB format.
The original dataset should be organized in the following structure:
/Dataset
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
    gt_list.txt
    characters_list.txt
The gt_list.txt file contains the ground truth text for all the images. Each line specifies one image and its corresponding label:
0000.jpg abc
0001.jpg defg
0002.jpg zxv
...
The characters_list.txt file contains all the characters found in the dataset, one character per line.
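The two text files described above can be checked or generated with a few lines of Python before running dataset_convert. This is a minimal sketch, not part of TAO: the helper names are hypothetical, and it assumes that the first run of whitespace on each gt_list.txt line separates the file name from the label, and that the order of entries in characters_list.txt does not matter.

```python
from pathlib import Path


def parse_gt_line(line: str) -> tuple[str, str]:
    """Split one gt_list.txt line into (image_name, label)."""
    # Assumption: the first run of whitespace separates name and label.
    name, label = line.rstrip("\n").split(maxsplit=1)
    return name, label


def load_gt_list(path: Path) -> list[tuple[str, str]]:
    """Read every non-empty line of a gt_list.txt file."""
    with path.open(encoding="utf-8") as handle:
        return [parse_gt_line(line) for line in handle if line.strip()]


def build_character_list(pairs: list[tuple[str, str]]) -> list[str]:
    """Collect every distinct character used in the labels (sorted)."""
    return sorted({char for _, label in pairs for char in label})


def write_character_list(characters: list[str], path: Path) -> None:
    """Write one character per line, as characters_list.txt expects."""
    path.write_text("\n".join(characters) + "\n", encoding="utf-8")
```

For example, the labels abc, defg, and zxv shown above would produce a character list containing the ten distinct letters a, b, c, d, e, f, g, v, x, and z.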
Creating an Experiment Spec File#
The experiment spec file includes arguments for all the tasks supported by OCRNet (train/evaluate/inference/prune/export).
Here is an example spec file used in the OCRNet get_started notebook:
BASE_EXPERIMENT_ID=$(tao ocrnet list-base-experiments | jq -r '.[0].id')
SPECS=$(tao ocrnet get-job-schema --action train --base-experiment-id $BASE_EXPERIMENT_ID | jq -r '.default')
results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 100
  input_height: 32
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"
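The per-task results_dir values in the spec reference the global results_dir through the ${results_dir} placeholder. The snippet below is only an illustration of how such a placeholder expands, using Python's standard string.Template; it is not the launcher's own configuration loader, and the resolve helper is a hypothetical name.

```python
from string import Template


def resolve(value: str, spec: dict) -> str:
    """Expand ${key} placeholders using the top-level string values of spec."""
    # Illustration only: mimics the substitution with the standard library.
    top_level_strings = {k: v for k, v in spec.items() if isinstance(v, str)}
    return Template(value).substitute(top_level_strings)


spec = {"results_dir": "/results"}
evaluate_results_dir = resolve("${results_dir}/evaluate", spec)
print(evaluate_results_dir)  # /results/evaluate
```

Overriding the global results_dir therefore moves every task directory that is defined relative to it.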
| Parameter | Data Type | Default | Description | Supported Values |
| model | dict config | – | The configuration of the model architecture | |
| dataset | dict config | – | The configuration of the dataset | |
| train | dict config | – | The configuration of the training task | |
| evaluate | dict config | – | The configuration of the evaluation task | |
| inference | dict config | – | The configuration of the inference task | |
| encryption_key | string | None | The encryption key to encrypt and decrypt model files | |
| results_dir | string | /results | The directory where experiment results are saved | |
| prune | dict config | – | The configuration of the pruning task | |
| export | dict config | – | The configuration of the export task | |
| dataset_convert | dict config | – | The configuration of the dataset conversion task | |
model#
The model parameter provides options to change the architecture of OCRNet.
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
| Parameter | Datatype | Default | Description | Supported Values |
| TPS | Boolean | False | A flag that enables thin-plate spline (TPS) interpolation for the OCRNet input | True/False |
| | Unsigned int | 20 | The number of fiducial points for TPS | >4 |
| backbone | String | ResNet | The backbone of the OCRNet model | ResNet, ResNet2X, FAN_tiny_2X |
| feature_channel | Unsigned int | 512 | The number of channels for the backbone output feature | >0 |
| sequence | String | BiLSTM | The sequence module of the OCRNet model | BiLSTM |
| hidden_size | Unsigned int | 256 | The number of channels for the BiLSTM hidden layer | >0 |
| prediction | String | CTC | The method for encoding and decoding the output feature | CTC, Attn |
| input_width | Unsigned int | 100 | The input image width | >4 |
| input_height | Unsigned int | 32 | The input image height | >32 |
| input_channel | Unsigned int | 1 | The input image channel | 1,3 |
| quantize | Boolean | False | A flag that enables quantize and dequantize nodes in the OCRNet backbone | True/False |
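To make the CTC option of prediction more concrete, the following is a generic greedy CTC decoding sketch: consecutive repeated indices are collapsed and blank indices are dropped. It is a textbook illustration, not OCRNet's decoder. The function name, the choice of index 0 as the blank symbol, and the mapping of index i to entry i - 1 of the character list are all assumptions made for this example.

```python
def ctc_greedy_decode(indices: list[int], characters: list[str], blank: int = 0) -> str:
    """Collapse repeats and remove blanks from a per-timestep argmax sequence."""
    decoded = []
    previous = blank
    for index in indices:
        # Emit a character only when it is not blank and differs from the
        # previous timestep (a blank between repeats allows double letters).
        if index != blank and index != previous:
            decoded.append(characters[index - 1])  # assumption: index 0 is blank
        previous = index
    return "".join(decoded)


characters = ["a", "b", "c"]
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 3], characters))  # aabc
```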
dataset#
The dataset parameter provides options to set the dataset consumed in training and evaluation.
dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.3
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 5
    blur_prob: 0.5
    gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
| train_dataset_dir | List of String | None | A list of absolute paths to the training datasets. Currently, only a list length of 1 is supported. | List of String |
| val_dataset_dir | String | None | The absolute path to the evaluation dataset | dataset absolute path |
| character_list_file | String | None | The absolute path to the character list file | absolute file path |
| max_label_length | Unsigned int | 25 | The maximum length of the ground truth | >0 |
| batch_size | Unsigned int | 32 | The batch size for training | >0 |
| workers | Unsigned int | 4 | The number of workers that preprocess the training data in parallel | >=0 |
| augmentation | Dict config | – | The augmentation config | – |
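Because max_label_length and character_list_file constrain which labels the model can represent, it can be useful to scan the ground truth before training. The sketch below is a hypothetical pre-check, not a TAO utility; this document does not state how OCRNet itself treats labels that are too long or that contain characters missing from the character list.

```python
def find_invalid_labels(
    pairs: list[tuple[str, str]],
    characters: list[str],
    max_label_length: int = 25,
) -> list[tuple[str, str]]:
    """Return (image_name, reason) for labels that violate the dataset settings."""
    allowed = set(characters)
    problems = []
    for name, label in pairs:
        if len(label) > max_label_length:
            problems.append((name, "label longer than max_label_length"))
        elif not set(label) <= allowed:
            problems.append((name, "label uses a character not in the character list"))
    return problems


pairs = [("0000.jpg", "abc"), ("0001.jpg", "ab!"), ("0002.jpg", "a" * 30)]
for name, reason in find_invalid_labels(pairs, ["a", "b", "c"]):
    print(f"{name}: {reason}")
```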
augmentation#
The augmentation parameter provides options to set the augmentation pipeline used during training.
augmentation:
  keep_aspect_ratio: False
  aug_prob: 0.3
  reverse_color_prob: 0.5
  rotate_prob: 0.5
  max_rotation_degree: 5
  blur_prob: 0.5
  gaussian_radius_list: [1, 2, 3, 4]
| Parameter | Datatype | Default | Description | Supported Values |
| keep_aspect_ratio | Bool | False | A flag that keeps the aspect ratio when resizing the image to the model input size | False/True |
| aug_prob | Float | 0.0 | The probability to apply the following augmentation on the input image | [0, 1] |
| reverse_color_prob | Float | 0.5 | The probability to reverse the color of the input image | [0, 1] |
| rotate_prob | Float | 0.5 | The probability to randomly rotate the input image | [0, 1] |
| max_rotation_degree | Float | 0.5 | The maximum degree the image will be rotated | >=0 |
| blur_prob | Float | 0.5 | The probability to blur the input image | [0, 1] |
| gaussian_radius_list | List of integer | [1, 2, 3, 4] | The list of radii used when applying Gaussian blur to the image | – |
train#
The train parameter provides options to set training hyperparameters.
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 1.0
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
| Parameter | Datatype | Default | Description | Supported Values |
| num_gpus | unsigned int | 1 | The number of GPUs to use for distributed training | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed training | |
| seed | unsigned int | 1234 | The random seed for random, numpy, and torch | >0 |
| num_epochs | unsigned int | 10 | The total number of epochs to run the experiment | >0 |
| checkpoint_interval | unsigned int | 1 | The epoch interval at which the checkpoints are saved | >0 |
| validation_interval | unsigned int | 1 | The epoch interval at which the validation is run | >0 |
| resume_training_checkpoint_path | string | | The intermediate PyTorch Lightning checkpoint to resume training from | |
| results_dir | string | /results/train | The directory to save training results | |
| optim | Dict config | – | The configuration for the optimizer | – |
| clip_grad_norm | Float | 5.0 | The threshold value of magnitude of the gradient L2 norm to be clipped | >4 |
| | String | ddp | The distributed strategy for multi-GPU training | ddp |
| | String | None | The absolute path to pretrained weights | – |
| | String | None | The absolute path to pretrained models for quantize-aware-training | – |
| | Bool | False | Enable model exponential moving average in the training | False/True |
optim#
The optim parameter provides options to set the optimizer for training.
optim:
  name: "adadelta"
  lr: 1.0
| Parameter | Datatype | Default | Description | Supported Values |
| name | String | adadelta | The optimizer type | adadelta, adam |
| lr | Float | 1.0 | The initial learning rate for the training | >0.0 |
evaluate#
The evaluate parameter provides options to set evaluation hyperparameters.
evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | String | – | The absolute path to the model checkpoint for evaluation | – |
| results_dir | String | /results/evaluate | The directory to save evaluation results | |
| num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed evaluation | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed evaluation | |
| test_dataset_dir | String | – | The absolute path to the evaluation LMDB dataset | – |
| | Unsigned int | 1 | The evaluation batch size | >0 |
prune#
The prune parameter provides options to set pruning hyperparameters.
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: experimental_hybrid
    amount: 0.4
    granularity: 8
    raw_prune_score: L1
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | String | – | The absolute path to the model checkpoint for pruning | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| | String | – | The absolute path to the pruning log | – |
| | String | – | The absolute path for storing the pruned model checkpoint | – |
| prune_setting | Dict config | – | The pruning hyperparameters | – |
prune_setting#
The prune_setting parameter contains options for the pruning algorithms:
| Parameter | Datatype | Default | Description | Supported Values |
| mode | String | amount | The pruning mode | amount, threshold, experimental_hybrid |
| amount | Float | – | The amount value for | [0, 1] |
| | Float | – | The threshold value for threshold mode | >=0 |
| granularity | Unsigned int | 8 | The granularity of the pruned layer. The number of pruned-layer output channels will be a multiple of the granularity. | >0 |
| raw_prune_score | Dict config | L1 | The method for computing the importance of weights | L1, L2 |
inference#
The inference parameter provides options for inference.
inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
  results_dir: "${results_dir}/inference"
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | String | – | The absolute path to the model checkpoint for inference | – |
| results_dir | String | /results/inference | The directory to save inference results | |
| num_gpus | Unsigned int | 1 | The number of GPUs to use for distributed inference | >0 |
| gpu_ids | List[int] | [0] | The indices of the GPUs to use for distributed inference | |
| inference_dataset_dir | String | – | The absolute path to the inference images directory | – |
| | Unsigned int | 1 | The inference batch size | >0 |
export#
The export parameter provides export options.
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
| Parameter | Datatype | Default | Description | Supported Values |
| checkpoint | String | – | The absolute path to the model checkpoint for export | – |
| gpu_id | Unsigned int | 0 | The GPU device index | A valid GPU index |
| | String | – | The absolute path to export ONNX file | – |
| | String | – | The absolute path to the export output | – |
dataset_convert#
The dataset_convert parameter provides options for dataset conversion.
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
| Parameter | Datatype | Default | Description | Supported Values |
| input_img_dir | String | – | The absolute path to images directory | – |
| gt_file | String | – | The absolute path to the ground truth file | – |
| results_dir | String | – | The absolute path to | – |
Converting dataset#
Use the following command to convert the raw dataset:
DATASET_CONVERT_JOB_ID=$(tao ocrnet create-job \
    --kind dataset \
    --dataset-id $DATASET_ID \
    --action dataset_convert \
    --specs "$SCHEMA" | jq -r '.id')

tao model ocrnet dataset_convert -e <experiment_spec_file>
    [results_dir=<global_results_dir>]
    [dataset_convert.<dataset_convert_option>=<dataset_convert_value>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set the optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The dataset_convert results will be saved in results_dir/dataset_convert.
dataset_convert.<dataset_convert_option>: The dataset_convert options.
Training the Model#
Use the following command to start OCRNet training:
TRAIN_JOB_ID=$(tao ocrnet create-job \
    --kind experiment \
    --name "ocrnet_train" \
    --action train \
    --workspace-id $WORKSPACE_ID \
    --specs "$TRAIN_SPECS" \
    --train-datasets '["'$DATASET_ID'"]' \
    --eval-dataset "$DATASET_ID" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')

tao model ocrnet train -e <experiment_spec_file>
    [results_dir=<global_results_dir>]
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [train.<train_option>=<train_option_value>]
    [train.optim.<optim_option>=<optim_option_value>]
    [train.gpu_ids=<gpu indices>]
    [train.num_gpus=<number of gpus>]
Required Arguments
The following arguments are required.
-e, --experiment_spec: The experiment specification file to set up the training experiment
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.optim.<optim_option>: The optimizer options
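When launching runs from a script, the key=value overrides listed above can be assembled programmatically. The sketch below only builds the argument list in the form shown in the command synopsis; build_ocrnet_command is a hypothetical helper, and the spec path and override values are placeholders chosen for illustration.

```python
import shlex
from typing import Optional


def build_ocrnet_command(task: str, spec_file: str, overrides: Optional[dict] = None) -> list[str]:
    """Build a `tao model ocrnet <task>` argument list with key=value overrides."""
    command = ["tao", "model", "ocrnet", task, "-e", spec_file]
    for key, value in (overrides or {}).items():
        # Each override replaces the matching option from the spec file.
        command.append(f"{key}={value}")
    return command


cmd = build_ocrnet_command(
    "train",
    "/specs/train.yaml",  # placeholder path
    {"train.num_epochs": 20, "train.optim.lr": 0.5},
)
print(shlex.join(cmd))
# tao model ocrnet train -e /specs/train.yaml train.num_epochs=20 train.optim.lr=0.5
```

The resulting list can be passed to subprocess.run; building a list rather than a single string avoids shell-quoting problems when a value contains spaces or brackets.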
Note
For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], they are modified to follow the setting that implies more GPUs; in the same example, num_gpus is modified from 1 to 2.
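The reconciliation rule can be sketched as follows. This is an illustration of the behavior described in this note, not the TAO implementation. The case where num_gpus is larger than the length of gpu_ids is not spelled out here, so expanding gpu_ids to the first num_gpus device indices is an assumption, as is the function name.

```python
from typing import Optional


def reconcile_gpus(num_gpus: int = 1, gpu_ids: Optional[list[int]] = None) -> tuple[int, list[int]]:
    """Make num_gpus and gpu_ids agree, following whichever implies more GPUs."""
    gpu_ids = list(gpu_ids) if gpu_ids is not None else [0]
    if num_gpus > len(gpu_ids):
        # Assumption: use the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    else:
        # Documented case: num_gpus follows the length of gpu_ids.
        num_gpus = len(gpu_ids)
    return num_gpus, gpu_ids


print(reconcile_gpus(1, [0, 1]))  # (2, [0, 1])
```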
In some cases, multi-GPU training may result in a segmentation fault. You can circumvent this by setting the environment variable OMP_NUM_THREADS to 1. Depending on your mode of execution, you may use the following methods to set this variable:
CLI Launcher:
You may set the environment variable by adding the following fields to the Envs field of your ~/.tao_mounts.json file, as mentioned in bullet 3 of the section Running the launcher.
{
    "Envs": [
        {
            "variable": "OMP_NUM_THREADS",
            "value": "1"
        }
    ]
}
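If you prefer to script this change rather than edit the file by hand, a small helper along the following lines could add or update the entry. It is a sketch under assumptions: set_launcher_env is a hypothetical name, the file is assumed to be plain JSON, and only the Envs layout shown above (a list of variable/value objects) is relied upon. Any other keys in the file are left untouched.

```python
import json
from pathlib import Path


def set_launcher_env(mounts_file: Path, variable: str, value: str) -> dict:
    """Add or replace one entry in the Envs list of a launcher mounts file."""
    config = json.loads(mounts_file.read_text()) if mounts_file.exists() else {}
    # Drop any existing entry for the same variable, then append the new one.
    envs = [entry for entry in config.get("Envs", []) if entry.get("variable") != variable]
    envs.append({"variable": variable, "value": value})
    config["Envs"] = envs
    mounts_file.write_text(json.dumps(config, indent=4))
    return config


# Example (commented out so that nothing is modified on import):
# set_launcher_env(Path.home() / ".tao_mounts.json", "OMP_NUM_THREADS", "1")
```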
Docker:
You may set environment variables in Docker by setting the -e flag in the Docker command line.
docker run -it --rm --gpus all \
    -e OMP_NUM_THREADS=1 \
    -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e
Checkpointing and Resuming Training
At every train.checkpoint_interval, a PyTorch Lightning checkpoint named model_epoch_<epoch_num>.pth is saved.
Checkpoints are saved in train.results_dir, like this:
$ ls /results/train
'model_epoch_000.pth'
'model_epoch_001.pth'
'model_epoch_002.pth'
'model_epoch_003.pth'
'model_epoch_004.pth'
The latest checkpoint will also be saved as ocr_model_latest.pth.
Training will automatically resume from ocr_model_latest.pth if it exists in train.results_dir.
This will be superseded by train.resume_training_checkpoint_path if it is provided.
The major implication of this logic is that, if you wish to trigger fresh training from scratch, either:
Specify a new, empty results directory (Recommended)
Remove the latest checkpoint from the results directory
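The resume precedence described above can be summarized in a few lines. This is a sketch of the documented logic, not TAO source code; select_resume_checkpoint is a hypothetical name, and returning None stands for starting a fresh training run.

```python
from pathlib import Path
from typing import Optional


def select_resume_checkpoint(
    results_dir: str,
    resume_training_checkpoint_path: Optional[str] = None,
) -> Optional[str]:
    """Return the checkpoint to resume from, or None to train from scratch."""
    # An explicitly provided checkpoint supersedes the automatic resume.
    if resume_training_checkpoint_path:
        return resume_training_checkpoint_path
    latest = Path(results_dir) / "ocr_model_latest.pth"
    if latest.exists():
        return str(latest)
    return None


print(select_resume_checkpoint("/results/train", "/results/train/model_epoch_002.pth"))
# /results/train/model_epoch_002.pth
```

Pointing results_dir at a new, empty directory makes the function return None, which matches the recommended way to start fresh training.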
Evaluating the Model#
Use the following command to start OCRNet evaluation:
EVALUATE_JOB_ID=$(tao ocrnet create-job \
    --kind experiment \
    --name "ocrnet_evaluate" \
    --action evaluate \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --eval-dataset "$DATASET_ID" \
    --specs "$EVALUATE_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')

tao model ocrnet evaluate -e <experiment_spec_file>
    evaluate.checkpoint=<model to be evaluated>
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [evaluate.<evaluate_option>=<evaluate_option_value>]
    [evaluate.gpu_ids=<gpu indices>]
    [evaluate.num_gpus=<number of gpus>]
Multi-GPU evaluation is currently not supported for OCRNet.
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
evaluate.checkpoint: The .pth model.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
evaluate.<evaluate_option>: The evaluate options.
Pruning the Model#
Use the following command to start OCRNet pruning:
PRUNE_JOB_ID=$(tao ocrnet create-job \
    --kind experiment \
    --name "ocrnet_prune" \
    --action prune \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --specs "$PRUNE_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')

tao model ocrnet prune -e <experiment_spec_file>
    [results_dir=<global_results_dir>]
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [prune.<prune_option>=<prune_option_value>]
    [prune.prune_setting.<prune_setting_option>=<prune_setting_value>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The pruning results will be saved in results_dir/prune.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
prune.<prune_option>: The prune options.
prune.prune_setting.<prune_setting_option>: The prune setting options.
Note
If running training, evaluation, or inference on a pruned graph, you must provide the model.pruned_graph_path parameter when running the respective task. It should be the same as the value provided for prune.pruned_file in the prune task.
Inference with the Model#
Use the following command to start OCRNet inference:
INFERENCE_JOB_ID=$(tao ocrnet create-job \
    --kind experiment \
    --name "ocrnet_inference" \
    --action inference \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --inference-dataset "$DATASET_ID" \
    --specs "$INFERENCE_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')

tao model ocrnet inference -e <experiment_spec_file>
    [results_dir=<global_results_dir>]
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [inference.<inference_option>=<inference_option_value>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The inference results will be saved in results_dir/inference.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
inference.<inference_option>: The inference options.
Multi-GPU inference is currently not supported for OCRNet.
Exporting the Model#
Use the following command to export an OCRNet PyTorch checkpoint to an ONNX model:
EXPORT_JOB_ID=$(tao ocrnet create-job \
    --kind experiment \
    --name "ocrnet_export" \
    --action export \
    --workspace-id $WORKSPACE_ID \
    --parent-job-id $TRAIN_JOB_ID \
    --specs "$EXPORT_SPECS" \
    --base-experiment-ids '["'$BASE_EXPERIMENT_ID'"]' \
    --encryption-key "nvidia_tlt" | jq -r '.id')

tao model ocrnet export -e <experiment_spec_file>
    [results_dir=<global_results_dir>]
    [model.<model_option>=<model_option_value>]
    [dataset.<dataset_option>=<dataset_option_value>]
    [export.<export_option>=<export_option_value>]
Required Arguments
The following arguments are required.
-e, --experiment_spec_file: The path to the experiment spec file.
Optional Arguments
You can set optional arguments to override the option values in the experiment spec file.
results_dir: The global results directory. The export results will be saved in results_dir/export.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
export.<export_option>: The export options.
TensorRT Engine Generation and Validation#
For deployment, see TAO Deploy documentation.
Deploying to DeepStream#
For DeepStream integration, see Deploy nvOCDR to DeepStream.