NVIDIA TAO v5.5.0

LPRNet

LPRNet (License Plate Recognition Net) takes images as network input and predicts sequences of license plate characters.

The dataset for LPRNet contains cropped license plate images and corresponding label files.

The data structure must be in the following format:

/Dataset_01
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
        N.jpg
    /labels
        0000.txt
        0001.txt
        0002.txt
        ...
        N.txt
    /characters_list.txt

Each cropped license plate image has a corresponding label text file that contains a single line with the characters of that license plate. The characters_list.txt file lists all the characters found in the license plate dataset, with each character on its own line.
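
For example, for a hypothetical cropped plate image 0000.jpg showing the plate 3TC177 (the plate text and character set below are illustrative), labels/0000.txt would contain the single line:

3TC177

and characters_list.txt for a dataset of digits and uppercase letters would list one character per line:

0
1
2
...
9
A
B
...
Z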

The spec file for LPRNet includes the random_seed, lpr_config, training_config, eval_config, augmentation_config, and dataset_config parameters. Here is an example for training on the NVIDIA license plate dataset:

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 10
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 100
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-4
      soft_start: 0.001
      annealing: 0.7
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
  output_width: 96
  output_height: 48
  output_channel: 3
  max_rotate_degree: 5
  rotate_prob: 0.5
  gaussian_kernel_size: 5
  gaussian_kernel_size: 7
  gaussian_kernel_size: 15
  blur_prob: 0.5
  reverse_color_prob: 0.5
  keep_original_prob: 0.3
}
dataset_config {
  data_sources: {
    label_directory_path: "/path/to/train/labels"
    image_directory_path: "/path/to/train/images"
  }
  characters_list_file: "/path/to/lp_characters"
  validation_data_sources: {
    label_directory_path: "/path/to/test/labels"
    image_directory_path: "/path/to/test/images"
  }
}

Parameter Datatype Default Description
random_seed Unsigned int 42 The random seed for the experiment
lpr_config proto message – The configuration of the model architecture
training_config proto message – The configuration of the training process
eval_config proto message – The configuration of the evaluation process
augmentation_config proto message – The configuration for data augmentation
dataset_config proto message – The configuration for the dataset

lpr_config

The lpr_config parameter provides options to change the LPRNet architecture.

lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 10
}

Parameter Datatype Default Description Supported Values

hidden_units Unsigned int 512 The number of hidden units in the LSTM layers of LPRNet
max_label_length Unsigned int 8 The maximum length of license plates in the dataset
arch String baseline The architecture of LPRNet baseline
nlayers Unsigned int 10 The number of convolution layers in LPRNet 10, 18

training_config

The training_config parameter defines the hyperparameters of the training process.

training_config {
  checkpoint_interval: 5
  batch_size_per_gpu: 32
  num_epochs: 100
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-4
      soft_start: 0.001
      annealing: 0.7
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}

Parameter Datatype Default Description Supported Values
batch_size_per_gpu int 32 The number of images per batch per GPU >1
num_epochs int 120 The total number of epochs used to run in the experiment
checkpoint_interval int 5 The interval at which the checkpoints are saved >0

learning_rate learning rate scheduler proto soft_start_annealing_schedule The learning rate schedule for the trainer. Currently, LPRNet only supports the soft-start annealing learning rate schedule. It may be configured using the following parameters:

* soft_start (float): The time to ramp up the learning rate from the minimum learning rate to the maximum learning rate
* annealing (float): The time to cool down the learning rate from the maximum learning rate to the minimum learning rate
* min_learning_rate (float): The minimum learning rate in the learning rate schedule
* max_learning_rate (float): The maximum learning rate in the learning rate schedule

Supported values: soft_start: 0.0 - 1.0; annealing: 0.0 - 1.0, and greater than soft_start. (A sketch of this schedule appears after this table.)

regularizer regularizer proto config This parameter configures the type and the weight of the regularizer to be used during training. The config contains two parameters:

* type: The type of regularizer being used
* weight: The floating-point weight of the regularizer

The supported values for type are:

* NO_REG
* L1
* L2

visualizer Message type Training visualization config
early_stopping Message type Early stopping config
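
As referenced in the learning_rate description above, the following is a minimal Python sketch of how a soft-start annealing schedule of this kind behaves, assuming an exponential ramp-up and cool-down between min_learning_rate and max_learning_rate (the soft_start and annealing values are illustrative; this is not TAO's exact implementation):

def soft_start_annealing_lr(progress, min_lr=1e-6, max_lr=1e-4,
                            soft_start=0.3, annealing=0.7):
    """Illustrative soft-start annealing curve (not TAO's exact implementation).

    progress: fraction of total training completed, in [0, 1].
    """
    if progress < soft_start:
        t = progress / soft_start                  # ramp up: 0 -> 1
    elif progress < annealing:
        return max_lr                              # plateau at max_learning_rate
    else:
        t = 1.0 - (progress - annealing) / (1.0 - annealing)  # cool down: 1 -> 0
    return min_lr * (max_lr / min_lr) ** t         # exponential interpolation

# Learning rate at 10%, 50%, and 90% of training
for p in (0.1, 0.5, 0.9):
    print(p, soft_start_annealing_lr(p))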

visualizer

Visualization during training is configured using the visualizer parameter. The parameters are described in the table below.

Parameter Datatype Default Description Supported Values
enabled boolean false A Boolean flag to enable or disable this feature
num_images int 3 The maximum number of images to be visualized in TensorBoard. >0

If visualization is enabled, a TensorBoard log is produced during training, including graphs of the learning rate, training loss, and validation accuracy, as well as the augmented images.
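
Based on the parameters above, a visualizer block inside training_config would look like the following (the values shown are illustrative):

training_config {
  ...
  visualizer {
    enabled: true
    num_images: 3
  }
}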

early_stopping

The parameters for early stopping are described in the table below.

Parameter Datatype Default Description Supported Value
monitor string The metric to monitor in order to enable early stopping loss
patience int 0 The number of checks of the monitored value before stopping the training
min_delta float 0.0 The minimum change in the monitored value; a change smaller than min_delta is regarded as no improvement
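
Putting these together, an early_stopping block inside training_config might look like the following (the patience and min_delta values are illustrative):

training_config {
  ...
  early_stopping {
    monitor: "loss"
    patience: 5
    min_delta: 0.01
  }
}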

eval_config

The eval_config parameter defines the hyperparameters of the evaluation process. The evaluation metric is license plate recognition accuracy: a recognition is regarded as correct only if all the characters in a license plate are classified correctly.

eval_config {
  validation_period_during_training: 5
  batch_size: 1
}

Parameter Datatype Default/Suggested value Description Supported Values
validation_period_during_training int 5 The interval (in epochs) at which evaluation is run during training 1 - total number of epochs
batch_size int 1 The number of samples to do a single inference on >0
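
Concretely, the accuracy metric described above amounts to exact string matching per plate, as in this minimal Python sketch (the plate strings are illustrative):

def lpr_accuracy(predictions, labels):
    """Exact-match accuracy: a plate counts as correct only if every character matches."""
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)

# "3TC177" matches exactly; "8ABC123" vs "8ABC128" differs in the last character
print(lpr_accuracy(["3TC177", "8ABC123"], ["3TC177", "8ABC128"]))  # 0.5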

augmentation_config

The augmentation_config parameter contains the hyperparameters for augmentation during training. It also defines the spatial size of the network input.

augmentation_config {
  output_width: 96
  output_height: 48
  output_channel: 3
  max_rotate_degree: 5
  rotate_prob: 0.5
  gaussian_kernel_size: 5
  gaussian_kernel_size: 7
  gaussian_kernel_size: 15
  blur_prob: 0.5
  reverse_color_prob: 0.5
  keep_original_prob: 0.3
}

Parameter Datatype Default/Suggested value Description Supported Values
output_width unsigned int 96 The width of the preprocessed images (the width of the network input) >0
output_height unsigned int 48 The height of the preprocessed images (the height of the network input) >0
output_channel unsigned int 3 The number of channels in the preprocessed images 1, 3
keep_original_prob float 0.3 The probability of keeping the original image; only resizing is applied to an image with this probability 0.0 ~ 1.0
max_rotate_degree unsigned int 5 The maximum rotation angle for augmentation 0 ~ 90
rotate_prob float 0.5 The probability for rotating the image 0.0 ~ 1.0
gaussian_kernel_size unsigned int 5, 7, 15 The kernel size of the Gaussian blur >0
blur_prob float 0.5 The probability for blurring the image 0.0 ~ 1.0
reverse_color_prob float 0.5 The probability for reversing the color of the image 0.0 ~ 1.0
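
To make the interaction of these probabilities concrete, here is a minimal sketch of such a pipeline using Pillow (an assumption for illustration; TAO's actual augmentation implementation and the exact gating of keep_original_prob may differ):

import random
from PIL import Image, ImageFilter, ImageOps

def augment(img: Image.Image) -> Image.Image:
    # Every image is first resized to the network input size
    img = img.resize((96, 48))                          # output_width x output_height
    if random.random() < 0.3:                           # keep_original_prob
        return img                                      # resizing only
    if random.random() < 0.5:                           # rotate_prob
        img = img.rotate(random.uniform(-5, 5))         # max_rotate_degree
    if random.random() < 0.5:                           # blur_prob
        k = random.choice([5, 7, 15])                   # gaussian_kernel_size values
        img = img.filter(ImageFilter.GaussianBlur(radius=k // 2))  # radius stands in for kernel size
    if random.random() < 0.5:                           # reverse_color_prob
        img = ImageOps.invert(img)
    return img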

dataset_config

The dataset_config parameter defines the paths to the training dataset, the validation dataset, and the characters list file.

dataset_config {
  data_sources: {
    label_directory_path: "/path/to/train/labels"
    image_directory_path: "/path/to/train/images"
  }
  characters_list_file: "/path/to/lp_characters"
  validation_data_sources: {
    label_directory_path: "/path/to/test/labels"
    image_directory_path: "/path/to/test/images"
  }
}

Parameter Datatype Default/Suggested value Description Supported Values
data_sources dataset proto The path to the training dataset images and labels:

  • label_directory_path: The path to the label directory
  • image_directory_path: The path to the image directory

validation_data_sources dataset proto The path to the validation dataset images and labels
characters_list_file string The path to the characters list file. The characters in this file should be in Unicode format.
Note

data_sources and validation_data_sources are both repeated fields, so multiple datasets can be added to either, as shown in the example below.
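
For example, two training datasets can be combined by repeating the data_sources block (the paths are illustrative):

dataset_config {
  data_sources: {
    label_directory_path: "/path/to/train_set_A/labels"
    image_directory_path: "/path/to/train_set_A/images"
  }
  data_sources: {
    label_directory_path: "/path/to/train_set_B/labels"
    image_directory_path: "/path/to/train_set_B/images"
  }
  characters_list_file: "/path/to/lp_characters"
  validation_data_sources: {
    label_directory_path: "/path/to/test/labels"
    image_directory_path: "/path/to/test/images"
  }
}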

Use the following command to run LPRNet training:

tao model lprnet train -e <experiment_spec_file>
                       -r <results_dir>
                       -k <key>
                       [--gpus <num_gpus>]
                       [--gpu_index <gpu_index>]
                       [--use_amp]
                       [--log_file <log_file>]
                       [-m <resume_model_path>]
                       [--initial_epoch <initial_epoch>]

Required Arguments

  • -e, --experiment_spec_file: The path to the experiment spec file

  • -r, --results_dir: The path to a folder where the experiment outputs should be written.

  • -k, --key: The user-specific encoding key to save or load a .tlt model.

Optional Arguments

  • --gpus: The number of GPUs to be used in the training in a multi-GPU scenario (default: 1).

  • --gpu_index: The GPU indices used to run the training. You can specify the GPU indices to use for training when the machine has multiple GPUs installed.

  • --use_amp: A flag to enable AMP training.

  • --log_file: The path to the log file. Defaults to stdout.

  • -m, --resume_model_weights: The path to a pretrained model, or a model from which to continue training.

  • --initial_epoch: The epoch from which to start training.

Here’s an example of using the LPRNet training command:


tao model lprnet train --gpu_index=0 -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY

Note

To resume training from a .tlt model, set -m to the model and set --initial_epoch to the starting epoch number.
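
For example, to resume from an epoch-50 checkpoint (the checkpoint path and epoch number are illustrative):

tao model lprnet train --gpu_index=0 -e $DEFAULT_SPEC -r $RESULTS_DIR -k $YOUR_KEY -m $RESULTS_DIR/weights/lprnet_epoch_050.tlt --initial_epoch 50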

The evaluation metric of LPRNet is recognition accuracy: a recognition is regarded as accurate only if all the characters in the license plate are correct.

Use the following command to run LPRNet evaluation:

tao model lprnet evaluate -m <model>
                          -e <experiment_spec_file>
                          [-k <key>]
                          [--gpu_index <gpu_index>]
                          [--log_file <log_file>]
                          [--trt]

Required Arguments

  • -m, --model: The .tlt model or TensorRT engine to be evaluated.

  • -e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as a training spec file.

Optional Arguments

  • -h, --help: Show this help message and exit.

  • -k, --key: The encoding key for the .tlt model.

  • --gpu_index: The GPU index used to run the evaluation. You can specify the GPU index used to run evaluation when the machine has multiple GPUs installed. Note that evaluation can only run on a single GPU.

  • --log_file: The path to the log file. Defaults to stdout.

  • --trt: Evaluate the TRT engine.

Here’s an example of using the LPRNet evaluation command:


tao model lprnet evaluate --gpu_index=0 -m $TRAINED_TLT_MODEL -e $DEFAULT_SPEC -k $YOUR_KEY

Use the following command to run inference on LPRNet with .tlt model or TensorRT engine:

tao model lprnet inference -m <model>
                           -i <in_image_path>
                           -e <experiment_spec>
                           [-k <key>]
                           [--gpu_index <gpu_index>]
                           [--log_file <log_file>]
                           [--trt]

Required Arguments

  • -m, --model: The .tlt model or TensorRT engine to do inference with

  • -i, --in_image_path: The path to the license plate images to do inference with.

  • -e, --experiment_spec: The experiment spec file to set up inference. This can be the same as the training spec.

Optional Arguments

  • -h, --help: Show this help message and exit.

  • -k, --key: The encoding key for the .tlt model.

  • --gpu_index: The GPU index used to run the inference. You can specify the GPU index to use for inference when the machine has multiple GPUs installed. Note that inference can only run on a single GPU.

  • --log_file: Path to the log file. Defaults to stdout.

  • --trt: Run inference with the TRT engine.

Here’s an example of using the LPRNet inference command:


tao model lprnet inference --gpu_index=0 -m $SAVED_TRT_ENGINE -i $PATH_TO_TEST_IMAGES -e $DEFAULT_SPEC --trt

Use the following command to export LPRNet to .etlt format for deployment:

tao model lprnet export -m <model>
                        -k <key>
                        -e <experiment_spec>
                        [--gpu_index <gpu_index>]
                        [--log_file <log_file>]
                        [-o <output_file>]
                        [--data_type {fp32,fp16}]
                        [--max_workspace_size <max_workspace_size>]
                        [--max_batch_size <max_batch_size>]
                        [--engine_file <engine_file>]
                        [-v]

Required Arguments

  • -m, --model: The .tlt model to be exported.

  • -k, --key: The encoding key of the .tlt model.

  • -e, --experiment_spec: Experiment spec file to set up export. Can be the same as the training spec.

Optional Arguments

  • --gpu_index: The index of the (discrete) GPU used for exporting the model. You can specify the GPU index to run export if the machine has multiple GPUs installed. Note that export can only run on a single GPU.

  • --log_file: The path to the log file. Defaults to stdout.

  • -o, --output_file: The path to save the exported model to. The default is ./<input_file>.etlt.

  • --data_type: The desired engine data type. The options are fp32 or fp16. The default value is fp32.

You can use the following optional arguments to save the TRT engine that is generated to verify export:

  • --max_batch_size: The maximum batch size of the TensorRT engine. The default value is 16.

  • --max_workspace_size: The maximum workspace size of the TensorRT engine. The default value is 1073741824 (1 << 30).

  • --engine_file: The path to the serialized TensorRT engine file. This file is hardware specific and cannot be generalized across GPUs, so it is useful for quickly testing model accuracy with TensorRT on the host, but it cannot be used for deployment unless the deployment GPU is identical to the training GPU.

Here’s an example for using the LPRNet export command:


tao model lprnet export --gpu_index=0 -m $TRAINED_TAO_MODEL -e $DEFAULT_SPEC -k $YOUR_KEY

The deep learning and computer vision models that you trained can be deployed on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla, or in the cloud with NVIDIA GPUs.

The DeepStream SDK is a streaming analytics toolkit that accelerates building AI-based video analytics applications. TAO is integrated with the DeepStream SDK, so models trained with TAO work out of the box with DeepStream.

Note

The LPRNet .etlt model cannot be parsed by DeepStream directly. Use tao-converter to convert the .etlt model into an optimized TensorRT engine, and then integrate the engine into the DeepStream pipeline.

Using tao-converter

The tao-converter is a tool provided with TAO to facilitate the deployment of TAO-trained models on TensorRT and/or DeepStream. For deployment platforms with an x86-based CPU and discrete GPUs, the tao-converter is distributed within the TAO docker, so it is suggested that you use the docker to generate the engine. However, this requires that you adhere to the same minor version of TensorRT as distributed with the docker. The TAO docker includes TensorRT version 7.1. To use the engine with a different minor version of TensorRT, copy the converter from /opt/nvidia/tools/tao-converter to the target machine and follow the instructions for x86 to run it and generate a TensorRT engine.

For the aarch64 platform, the tao-converter is available to download in the dev zone.

Here is a sample command to generate an LPRNet engine with tao-converter:

tao-converter <etlt_model>
              -k <key_to_etlt_model>
              -p image_input,1x3x48x96,4x3x48x96,16x3x48x96
              -e <path_to_generated_trt_engine>

This command generates an optimized TensorRT engine with a dynamic input shape: min_shape=1x3x48x96, opt_shape=4x3x48x96, max_shape=16x3x48x96, in NCHW format.

Deploying the LPRNet in the DeepStream sample

Once you have the TensorRT engine of LPRNet, you can deploy it in DeepStream's LPDR sample, which is a complete solution including car detection, license plate detection, and license plate recognition. A configurable CTC decoder is provided in this sample.
