Gesture Recognition
GestureNet is an NVIDIA-developed gesture classification model that is included in the Transfer Learning Toolkit (TLT). GestureNet supports the following tasks: dataset_convert, train, evaluate, inference, and export.
These tasks can be invoked from the TLT launcher using the following convention on the command line:
tlt gesturenet <sub_task> <args_per_subtask>
where args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail below.
Pre-processing the Dataset
The GestureNet app requires the images and labels to be in a specific format. Once the data is in that format, use the dataset_convert tool included in the Transfer Learning Toolkit to prepare it for model training.
See the Data Annotation Format page for more information about the data format for gesture recognition.
Dataset Extraction Config
The dataset_config spec specifies the parameters needed to crop the hand bounding box and prepare the dataset.
Here’s a sample spec:
{
    "org_dataset": "data",
    "mount_dir_path": "/workspace/tlt-experiments/gesturenet/",
    "org_data_dir": "original",
    "post_data_dir": "extracted",
    "kpi_users": ["uid_1", "uid_2"],
    "sampling_rate": 1,
    "convert_bbox_square": true,
    "bbox_enlarge_ratio": 1.1,
    "annotation_path": "annotation"
}
The following table describes the parameters:
Field | Description | Data Type and Constraints | Recommended/Typical Value |
org_dataset | Name of dataset. | String | |
mount_dir_path | Path to the root directory relative to which the data is stored. | String | |
org_data_dir | Path to original images directory relative to | String | |
post_data_dir | Path to directory relative to | String | |
kpi_users | List of user IDs set aside for the test set. | List of String | |
sampling_rate | Rate at which to select frames for labeling. If the data is not video, set this to 1. | Integer | 1 if dataset is not from video |
convert_bbox_square | Boolean variable to indicate if the labelled bounding box should be converted to a square. | Boolean | |
bbox_enlarge_ratio | Scale factor used to enlarge the bounding box. | Float | [1.0,1.2] |
annotation_config | The nested annotation dictionary that contains path to folder with labels (relative to | Dictionary | |
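To illustrate how convert_bbox_square and bbox_enlarge_ratio might interact, here is a minimal Python sketch. It is not the TLT implementation: the function name adjust_bbox, the choice of the longer edge as the square side, and keeping the box centered are all assumptions made for illustration.

```python
def adjust_bbox(x1, y1, x2, y2, convert_bbox_square=True, bbox_enlarge_ratio=1.1):
    """Return an adjusted (x1, y1, x2, y2) box that keeps the original center.

    Hypothetical helper: the real dataset_convert tool may square and
    enlarge boxes differently (for example, it may clip to the image).
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = float(x2 - x1), float(y2 - y1)
    if convert_bbox_square:
        # Assumption: the square side is the longer of the two edges.
        w = h = max(w, h)
    # Scale both sides by the enlarge ratio around the same center.
    w *= bbox_enlarge_ratio
    h *= bbox_enlarge_ratio
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


# A 20x40 box squared with no enlargement becomes a 40x40 box.
squared = adjust_bbox(10, 20, 30, 60, convert_bbox_square=True, bbox_enlarge_ratio=1.0)
```

The sketch does not clip the result to the image bounds, so an enlarged box near an edge can have negative coordinates.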
Dataset Experiment Config
The dataset_experiment_config spec specifies the parameters needed to combine different datasets. It lets you provide user IDs that are set aside for the validation or test set. It also supports different sampling strategies based on metadata and class counts.
Here’s a sample spec:
"mount_dir_path" :"/workspace/tlt-experiments/gesturenet",
"org_data_dir" : "original",
"post_data_dir" : "extracted",
"set_list": {
"train_val": [
"kpi": [
"uid_list": {
"uid_name": "user_id",
"predefined_val_users": false,
"val_fraction": 0.25,
"validation": [
"kpi": [
"image_feature_filter": {
"sampling": {
"sampling_mode": "average",
"use_class_weights": false,
"class_weights": {
"thumbs_up": 0.5,
"v": 0.5
Field | Description | Data Type and Constraints | Recommended/Typical Value |
mount_dir_path | Path to the root directory relative to which the data is stored. | String | |
org_data_dir | Path to original images directory relative to | String | |
post_data_dir | Path to directory relative to | String | |
set_list | Nested configuration for parameters related to datasets. | Dictionary | |
uid_list | Nested configuration for parameters related to user IDs. | Dictionary | |
image_feature_filter | Nested configuration for parameters related to filtering images based on metadata. | Dictionary | |
sampling | Nested configuration for parameters related to class weights and sampling strategy. | Dictionary | |
The following table describes the set_list parameters:
Field | Description | Data Type and Constraints | Recommended/Typical Value |
train_val | List of datasets from which to select users for training and validation. | List of String | |
kpi | List of datasets from which to select users for the test set. | List of String | |
The following table describes the uid_list parameters:
Field | Description | Data Type and Constraints | Recommended/Typical Value |
uid_name | Name of the field that represents the unique identifier of each subject. | String | |
predefined_val_users | Flag to indicate whether the train-validation split is specified by the config. | Boolean | |
val_fraction | Fraction of non-kpi users used for validation set. Only used if | Float | |
validation | List of uid used in validation set. Only used if | List of String | |
kpi | List of uid assigned to the test set. | List of String | |
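The uid_list fields describe a user-level split, which can be sketched in Python as follows. This is only an illustration: split_users and its seed argument are hypothetical, and it assumes that val_fraction applies when predefined_val_users is false and that the validation list applies when it is true. The rounding and shuffling used by TLT are not documented here and may differ.

```python
import random


def split_users(user_ids, kpi, predefined_val_users, val_fraction, validation, seed=0):
    """Split user IDs into train, validation, and kpi (test) groups.

    Hypothetical sketch of a user-level split, not the TLT implementation.
    """
    all_users = set(user_ids)
    kpi_set = set(kpi) & all_users
    non_kpi = sorted(all_users - kpi_set)
    if predefined_val_users:
        # Assumption: use exactly the users listed under "validation".
        wanted = set(validation)
        val = [u for u in non_kpi if u in wanted]
    else:
        # Assumption: draw a random fraction of the non-kpi users.
        shuffled = non_kpi[:]
        random.Random(seed).shuffle(shuffled)
        n_val = int(round(val_fraction * len(shuffled)))
        val = sorted(shuffled[:n_val])
    val_set = set(val)
    train = [u for u in non_kpi if u not in val_set]
    return {"train": train, "validation": val, "kpi": sorted(kpi_set)}


users = ["uid_%d" % i for i in range(1, 9)]
split = split_users(users, kpi=["uid_1", "uid_2"], predefined_val_users=False,
                    val_fraction=0.25, validation=[])
```

Splitting by user rather than by image keeps all images of one subject in the same set, which is the reason the spec works with user IDs.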
The following table describes the image_feature_filter parameters:
Field | Description | Data Type and Constraints | Recommended/Typical Value |
train_val | Metadata fields that are used to discard images in the training and validation sets. | Dictionary | |
kpi | Metadata fields that are used to discard images in the test set. | Dictionary | |
The following table describes the sampling parameters:
Field | Description | Data Type and Constraints | Recommended/Typical Value |
sampling_mode | Sampling methodology when using class_weights. | String | “average” |
use_class_weights | Boolean variable to indicate if sampling should be based on class weights. | Boolean | True / False |
class_weights | Dictionary mapping gesture classes of interest to their class weights. | Dictionary | |
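As a rough sketch, the nested structure described by the tables above can be assembled and sanity-checked in Python before it is written to a JSON file. The list and dictionary contents marked as hypothetical below (dataset names, user IDs, empty filters) are placeholders, and check_experiment_config is an illustrative helper, not part of TLT.

```python
import json

experiment_config = {
    "mount_dir_path": "/workspace/tlt-experiments/gesturenet",
    "org_data_dir": "original",
    "post_data_dir": "extracted",
    # Hypothetical dataset names; use the names of your own datasets.
    "set_list": {"train_val": ["data"], "kpi": ["data"]},
    "uid_list": {
        "uid_name": "user_id",
        "predefined_val_users": False,
        "val_fraction": 0.25,
        "validation": [],              # hypothetical: no predefined users
        "kpi": ["uid_1", "uid_2"],     # hypothetical test-set users
    },
    # Hypothetical: empty filters, so no images are discarded.
    "image_feature_filter": {"train_val": {}, "kpi": {}},
    "sampling": {
        "sampling_mode": "average",
        "use_class_weights": False,
        "class_weights": {"thumbs_up": 0.5, "v": 0.5},
    },
}


def check_experiment_config(cfg):
    """Return a list of human-readable problems found in the config."""
    problems = []
    for key in ("set_list", "uid_list", "image_feature_filter", "sampling"):
        if not isinstance(cfg.get(key), dict):
            problems.append("%s must be a dictionary" % key)
    uid = cfg.get("uid_list", {})
    if not 0.0 <= uid.get("val_fraction", 0.0) <= 1.0:
        problems.append("val_fraction must be between 0 and 1")
    if set(uid.get("validation", [])) & set(uid.get("kpi", [])):
        problems.append("a user cannot be in both validation and kpi")
    return problems


problems = check_experiment_config(experiment_config)
config_text = json.dumps(experiment_config, indent=4)
```

Generating the file from a dictionary makes it harder to produce unbalanced brackets or a stray trailing comma than editing the JSON by hand.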
Sample Usage of the Dataset Converter Tool
TLT has a built-in command to prepare the dataset for the GestureNet model, shown below.
tlt gesturenet dataset_convert --dataset_spec <dataset_spec_path>
--experiment_spec <experiment_spec_path>
--k_folds <num_folds>
--output_filename <output_filename>
--experiment_name <experiment_name>
Required Arguments
--dataset_spec: The path to the dataset spec.
--experiment_spec: The path to the dataset experiment spec.
--k_folds: Number of folds.
--output_filename: Output JSON file that is ingested by the GestureNet training pipeline.
--experiment_name: Name of the experiment.
Sample Usage
Here is an example using a GestureNet model.
tlt gesturenet dataset_convert --dataset_spec $SPECS_DIR/dataset_config.json \
--k_folds 0 \
--experiment_spec $SPECS_DIR/dataset_experiment_config.json \
--output_filename $USER_EXPERIMENT_DIR/data.json \
--experiment_name v1
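If the conversion is driven from a script, the same command line can be assembled as an argument list in Python. This is a sketch only: dataset_convert_command is a hypothetical helper, the file paths are placeholders, and the flags are limited to those shown above. The list can be passed to subprocess.run in an environment where the tlt launcher is installed.

```python
import shlex


def dataset_convert_command(dataset_spec, experiment_spec, output_filename,
                            experiment_name, k_folds=0):
    """Build the argument list for `tlt gesturenet dataset_convert`."""
    return [
        "tlt", "gesturenet", "dataset_convert",
        "--dataset_spec", dataset_spec,
        "--k_folds", str(k_folds),
        "--experiment_spec", experiment_spec,
        "--output_filename", output_filename,
        "--experiment_name", experiment_name,
    ]


# Placeholder paths for illustration only.
cmd = dataset_convert_command(
    dataset_spec="specs/dataset_config.json",
    experiment_spec="specs/dataset_experiment_config.json",
    output_filename="experiments/data.json",
    experiment_name="v1",
)
# shlex.join quotes each argument so the line is safe to paste into a shell.
command_line = shlex.join(cmd)
```

Passing a list instead of a single string avoids shell quoting problems when a path contains spaces.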
Creating a Configuration File
To do training, evaluation, and inference for GestureNet, several components need to be configured, each with their own parameters. The gesturenet train, gesturenet evaluate, and gesturenet inference commands for a GestureNet experiment share the same configuration file.
The main components of the configuration file are given below.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
random_seed | The random seed for the experiment. | Unsigned Int | 108 |
batch_size | Batch size used for the experiment. | Unsigned Int | 64 |
output_experiments_fld | Directory where experiments will be saved. | String | |
save_weights_path | Folder in output_experiments_fld that the model will be saved to. | String | |
trainer | Trainer configuration. | | |
model | Model configuration. | | |
evaluator | Evaluator configuration. | | |
Trainer Config
The trainer configuration allows you to configure how you want to train your model. The two main components are top_training and finetuning. num_workers allows you to specify how many workers to use to train your model. The top_training configuration is explained in the next section.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
top_training | Top training configuration. | | |
finetuning | Fine tuning configuration. | | |
num_workers | Number of workers to train the model. | Unsigned Int | 1 |
Top Training Config
The top training configuration allows you to customize how your model trains. There are 5 main components to the top_training configuration and they are as follows:
Field | Description | Data Type and Constraints | Recommended/Typical Value |
stage_order | | | 1 |
loss_fn | Loss function to use for top training the model. Currently only categorical cross entropy is supported. | String | categorical_crossentropy |
train_epochs | The number of epochs to perform top training. | Unsigned Int | 5 |
num_layers_unfreeze | The number of layers whose weights are updated during training. For example, if 3 layers are unfrozen, all layers starting from the input are frozen except the last 3 layers of the model. | Unsigned Int | 3 |
optimizer | The optimizer to use for top training. Currently support | String | rmsprop |
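The effect of num_layers_unfreeze can be pictured with a small framework-independent Python sketch. The helper trainable_flags is hypothetical and simply marks which layers, counted from the input, would have their weights updated; how TLT counts layers internally is not specified here.

```python
def trainable_flags(num_layers, num_layers_unfreeze):
    """Return one boolean per layer, ordered from input to output.

    True means the layer's weights are updated during training.
    Only the last `num_layers_unfreeze` layers are left unfrozen.
    """
    # Clamp so that asking for more layers than exist unfreezes everything.
    n = max(0, min(num_layers_unfreeze, num_layers))
    first_trainable = num_layers - n
    return [index >= first_trainable for index in range(num_layers)]


# With 5 layers and num_layers_unfreeze=3, the first 2 layers stay frozen.
flags = trainable_flags(5, 3)
```

In a Keras-style model, the same flags could be assigned to each layer's trainable attribute before compiling, though that step is outside this sketch.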
Fine Tuning Config
The fine tuning configuration has 9 options, which are listed below.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
stage_order | | | |
train_epochs | The number of epochs to perform fine tuning. The fine tuning option will allow you to obtain the best results when switching datasets. Usually more layers are frozen and a lower learning rate is used to achieve the best results. | Unsigned Int | 50 |
loss_fn | The loss function to use for fine tuning. Currently only categorical cross entropy is supported. | String | categorical_crossentropy |
initial_lrate | The initial learning rate to be used for fine tuning. Fine tuning uses a step learning rate annealing schedule according to the progress of the current experiment. The training progress is defined as the ratio of the current iteration to the maximum iterations. The scheduler adjusts the learning rate of the experiment in steps at regular intervals. | Float | 3e-04 |
decay_step_size | Decay step size for the learning rate, used by the step learning rate annealing schedule described under initial_lrate. | Float | 33 |
lr_drop_rate | Drop rate for the learning rate, used by the step learning rate annealing schedule described under initial_lrate. | Float | 0.5 |
enable_checkpointing | Flag to determine whether to enable checkpoints. | Bool | True |
num_layers_unfreeze | The number of layers unfrozen (whose weights are updated) during training. It is advised to unfreeze most layers for the fine tuning step. | Unsigned Int | 100 |
optimizer | Optimizer to use for fine tuning. “sgd”, “adam” and “rmsprop” are supported. | String | sgd |
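A step annealing schedule of the kind described for initial_lrate, decay_step_size, and lr_drop_rate can be sketched as below. This is one plausible reading, not the documented TLT formula: it assumes that progress is measured in percent, that decay_step_size is the width of each step in percent of progress, and that the learning rate is multiplied by lr_drop_rate at every step boundary.

```python
import math


def step_learning_rate(iteration, max_iterations, initial_lrate=3e-4,
                       decay_step_size=33.0, lr_drop_rate=0.5):
    """Learning rate under an assumed step annealing schedule.

    progress   = 100 * iteration / max_iterations   (percent, assumption)
    num_drops  = floor(progress / decay_step_size)
    learning   = initial_lrate * lr_drop_rate ** num_drops
    """
    progress = 100.0 * iteration / max_iterations
    num_drops = math.floor(progress / decay_step_size)
    return initial_lrate * lr_drop_rate ** num_drops


# Learning rate at the start, after one third, and near the end of training.
schedule = [step_learning_rate(i, 100) for i in (0, 33, 66, 99)]
```

Under these assumptions and the typical values from the table, the learning rate is halved roughly every third of the run.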
Model Config
The model configuration file allows you to customize the architecture you want to use and the hyperparameters. The key options available are given below.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
base_model | The base model to use. The public version uses a vanilla resnet but the release version uses an optimized model that obtains better results. | String | resnet_vanilla |
num_layers | Number of layers to use in the model. The currently supported values are 6, 10, 12, 18, 26, 34, 50, 101, and 152. | Unsigned Int | 18 |
weights_init | The path to the saved weights that the model loads. | String | |
gray_scale_input | Image input type. It is best to use RGB images, but grayscale inputs work as well. If the images are RGB, set this flag to false. | Bool | False |
data_format | The image format to use. This must align with the model provided. The current options are channels_first (NCHW) and channels_last (NHWC). At the moment, NCHW is the preferred format. | String | channels_first |
image_height | Image height of the model input. | Unsigned Int | 160 |
image_width | Image width of the model input. | Unsigned Int | 160 |
use_batch_norm | Flag to determine whether to use batch normalization. Set to True to use batch normalization. | Bool | False |
kernel_regularizer_type | The regularization to use for the convolutional layers. If you want to prune the model, l1 (lasso) regularization is recommended, because it helps generate sparse weights that can later be removed by minimum-weight pruning. The current options are l1 and l2 regularization. | String | l2 |
kernel_regularizer_factor | The value to use for the regularization. | Float | 0.001 |
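To make the relationship between gray_scale_input, data_format, image_height, and image_width concrete, the following Python sketch computes the per-image input shape they imply. The helper model_input_shape is hypothetical; it assumes 1 channel for grayscale input and 3 channels for RGB.

```python
def model_input_shape(image_height=160, image_width=160,
                      gray_scale_input=False, data_format="channels_first"):
    """Return the shape of one input image, without the batch dimension."""
    # Assumption: grayscale images have 1 channel, RGB images have 3.
    channels = 1 if gray_scale_input else 3
    if data_format == "channels_first":   # NCHW without N
        return (channels, image_height, image_width)
    if data_format == "channels_last":    # NHWC without N
        return (image_height, image_width, channels)
    raise ValueError("data_format must be channels_first or channels_last")


# Typical values from the table: RGB, channels_first, 160x160.
default_shape = model_input_shape()
```

With a batch_size of 64, the full input tensor for the default settings would then have shape (64, 3, 160, 160).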
Evaluator Config
The evaluator configuration contains the options for evaluating your GestureNet model.
Field | Description | Data Type and Constraints | Recommended/Typical Value |
evaluation_exp_name | Name of the experiment. | String | |
data_path | Path to the evaluation JSON file. | String | |
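Putting the configuration tables together, a complete spec could look roughly like the Python dictionary below, which is then serialized to JSON. Treat it as a sketch under assumptions: the nesting of top_training and finetuning under trainer follows the tables, the numeric values are the typical values listed there, and every path and name marked hypothetical is a placeholder. Fields with no documented typical value (the fine tuning stage_order and weights_init) are left out rather than guessed.

```python
import json

train_spec = {
    "random_seed": 108,
    "batch_size": 64,
    "output_experiments_fld": "/workspace/tlt-experiments/gesturenet",  # hypothetical path
    "save_weights_path": "model",                                       # hypothetical folder
    "trainer": {
        "top_training": {
            "stage_order": 1,
            "loss_fn": "categorical_crossentropy",
            "train_epochs": 5,
            "num_layers_unfreeze": 3,
            "optimizer": "rmsprop",
        },
        "finetuning": {
            # stage_order is omitted: no typical value is documented.
            "train_epochs": 50,
            "loss_fn": "categorical_crossentropy",
            "initial_lrate": 3e-4,
            "decay_step_size": 33,
            "lr_drop_rate": 0.5,
            "enable_checkpointing": True,
            "num_layers_unfreeze": 100,
            "optimizer": "sgd",
        },
        "num_workers": 1,
    },
    "model": {
        # weights_init is omitted: it depends on where your weights are stored.
        "base_model": "resnet_vanilla",
        "num_layers": 18,
        "gray_scale_input": False,
        "data_format": "channels_first",
        "image_height": 160,
        "image_width": 160,
        "use_batch_norm": False,
        "kernel_regularizer_type": "l2",
        "kernel_regularizer_factor": 0.001,
    },
    "evaluator": {
        "evaluation_exp_name": "v1",                                     # hypothetical name
        "data_path": "/workspace/tlt-experiments/gesturenet/data.json",  # hypothetical path
    },
}

# Serialize to JSON text; write this to the spec file passed with -e.
spec_text = json.dumps(train_spec, indent=4)
```

Check the result against a spec file shipped with your TLT release before training, since required fields may differ between versions.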
Training the Model
TLT has a built-in command to train a GestureNet model, shown below.
tlt gesturenet train -e <spec_file>
-k <key>
Required Arguments
-e, --experiment_spec_filename: Path to the spec file.
-k, --key: User-specific encoding key to save or load a .tlt model.
Sample Usage
Here’s an example of using the train command on GestureNet:
tlt gesturenet train -e $SPECS_DIR/train_spec.json \
-k $KEY
Evaluating the Model
TLT has a built-in command to evaluate a GestureNet model, shown below.
tlt gesturenet evaluate -e <spec_file>
-m <model_file>
-k <key>
Required Arguments
-e, --experiment_spec_filename: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
-m, --model: Path to the model file to use for evaluation.
-k, --key: The encryption key to decrypt the model. This argument is required only with a .tlt model file.
Sample Usage
Here’s an example of using the evaluation command on a GestureNet model.
tlt gesturenet evaluate -e $USER_EXPERIMENT_DIR/model/train_spec.json \
-m $USER_EXPERIMENT_DIR/model/model.tlt \
-k $KEY
Running Inference on the Model
TLT has a built-in command to run inference on a GestureNet model, shown below.
tlt gesturenet inference -e <spec_file>
-m <model_full_path>
-k <key>
--image_root_path <root_path>
--data_json <json_path>
--data_type <data_type>
--results_dir <results_dir>
Required Arguments
-e, --experiment_spec_filename: Experiment spec file to set up the inference experiment. The model parameters should be the same as in the training spec file.
-m, --model: Path to the model file to use for inference.
-k, --key: The encryption key to decrypt the model. This argument is required only with a .tlt model file.
--image_root_path: The root directory that the dataset is mounted at.
--data_json: The JSON spec with image paths and hand bounding boxes.
--data_type: The dataset type within data_json that inference is to be run on.
--results_dir: Directory where the results are saved.
Sample Usage
Here is an example of running inference using a GestureNet model.
tlt gesturenet inference -e $USER_EXPERIMENT_DIR/model/train_spec.json \
-m $USER_EXPERIMENT_DIR/model/model.tlt \
-k $KEY \
--image_root_path /workspace/tlt-experiments/gesturenet \
--data_json /workspace/tlt-experiments/gesturenet/data.json \
--data_type kpi_set \
--results_dir $USER_EXPERIMENT_DIR/model
Exporting the Model
The command to export the GestureNet model to a TensorRT plan can be found below. It only supports FP16 at the moment.
tlt gesturenet export -m <model_full_path>
-k <key>
-o <out_file>
-t <export_type>
-ll <log_level>
Required Arguments
-m, --model_filename: The full path to the model to export.
-k, --key: Encryption key used to train the model.
-o, --out_file: Place to save the exported model.
Optional Arguments
-t, --export_type: Type of export to use. Options are onnx or tfonnx.
-ll, --log_level: The log level to use.
Sample Usage
Here is an example of exporting a GestureNet model.
tlt gesturenet export -m $USER_EXPERIMENT_DIR/model/model.tlt \
-k $KEY \
-o $USER_EXPERIMENT_DIR/model/model.etlt \
-t 'tfonnx'
Deploying to the TLT CV Inference Pipeline
The pretrained model for gesture classification provided through NGC is available by default inside the TLT CV Inference Pipeline. You can also deploy a model trained through the TLT workflow to the TLT CV Inference Pipeline. Refer to the TLT CV Quick Start Scripts section for instructions on both options.