Gaze Estimation

GazeNet is an NVIDIA developed gaze estimation model which is included in the TAO Toolkit as one of the models supported. With GazeNet the following tasks are supported:

  • dataset_convert

  • train

  • evaluate

  • inference

  • export

These tasks may be invoked from the TAO Toolkit Launcher by following the below mentioned convention from command line:

Copy
Copied!
            

tao gazenet <sub_task> <args_per_subtask>

where args_per_subtask are the command line arguments required for a given subtask. Each of these sub-tasks are explained in detail below.

As described in the Data Annotation Format section, the GazeNet app requires a defined JSON format data to be converted to TFRecords. This can be done using the dataset_convert subtask under GazeNet.

The dataset_convert tool takes in a defined json format data and convert it to the TFRecords that the GazeNet model ingests. See the following sections for the sample usage examples.

Sample Usage of the Dataset Converter Tool

The labeling json data format is the accepted dataset format for GazeNet. The labeling json data format must be converted to the TFRecord file format for ingestion. The sampe usage for the dataset_convert tool is as mentioned below.

Copy
Copied!
            

tao gazenet dataset_convert [-h] -folder-suffix TFRECORDS_FOLDER_SUFFIX -norm_folder_name NORM_DATA_FOLDER_NAME -sets DATASET_NAME -data_root_path DATASET_ROOT_PATH

You can use these optional arguments:

  • -h, --help: Show this help message and exit.

  • -folder-suffix, --ground_truth_experiment_folder_suffix: suffix of folder including generated .tfrecords files.

  • -norm_folder_name --norm_folder_name: Folder to generate normalized.

  • -data_root_path, –-data_root_path: root path to the dataset.

  • -sets --set_ids: name of the dataset.

Here’s an example of using the command with the dataset:

Copy
Copied!
            

tao gazenet dataset_convert -folder-suffix <tfrecord_folder_suffix> -norm_folder_name <norm data folder name>> \ -sets sample-set <dataset name> -data_root_path <root path to the dataset>

Output log from executing tao gazenet dataset_convert:

Copy
Copied!
            

Using TensorFlow backend. Test ['p01-1'] Validation ['p01-0'] Train ['p01-4', 'p01-3', 'p01-2'] Test ['p01-1'] Validation ['p01-0'] Train ['p01-4', 'p01-3', 'p01-2']


To do training, evaluation, and inference for GazeNet, several components need to be configured, each with their own parameters. The train, evaluate, and inference tasks for a GazeNet experiment share the same configuration file.

The specification file for GazeNet training configures the following components of training pipeline:

  • Trainer/Evaluator

  • Model

  • Loss

  • Optimizer

  • Dataloader

  • Augmentation

Trainer/Ealuator

GazeNet trainer and evaluator share the same configurations.

Here’s a sample example to config GazeNet trainer.

Copy
Copied!
            

__class_name__: GazeNetTrainer checkpoint_dir: null checkpoint_n_epoch: 1 dataloader: ... evaluation_metric: rmse network_inputs: face_and_eyes_and_fgrid infrequent_summary_every_n_steps: 0 log_every_n_secs: 10 model_selection_metric: logcosh num_epoch: 2 random_seed: 42 hooks: null enable_visualization: false loss: ... model: ... optimizer: ... visualize_bins_2d: 5 visualize_bins_3d: 100 visualize_num_images: 3

The following table describes the parameters used to config the trainer:

Parameter

Datatype

Default

Description

Supported Values

__class_name__ string GazeNetTrainer Name for the trainer specification GazeNetTrainer
checkpoint_dir string null Path to the checkpoint. If not specified, will save all checkpoints in the output folder NA
checkpoint_n_epoch int 1 Save checkpoint per n number of epochs 1 to num_epoch
dataloader structure NA Dataloader specification 1 to num_epoch
evaluation_metric string rmse Metric used during KPI testing rmse
network_inputs string face_and_eyes_and_fgrid Input type (only ‘face_and_eyes_and_fgrid’ is supported) face_and_eyes_and_fgrid
infrequent_summary_every_n_steps int 0 Infrequent summary every n epoch 0 to num_epoch
log_every_n_secs int 10 Log the training output for every n secs NA
model_selection_metric string logcosh Metric used to select final model logcosh
num_epoch int 40 Number of epochs NA
random_seed int 42 Random seed used during the experiments NA
enable_visualization boolean false Toggle to enable visualization false/true
visualize_bins_2d int 5 Resolution for 2D data distribution visualization NA
visualize_bins_3d int 100 Resolution for 3D data distribution visualization NA
visualize_num_images int 3 Number of data images to show on Tensorboard NA

Model

GazeNet can be configured using the model option in the spec file.

Here’s a sample model config to instantiate a GazeNet model with pretrained weights and the number of freeze blocks.

Copy
Copied!
            

model: __class_name__: GazeNetBaseModel model_parameters: dropout_rate: 0.25 frozen_blocks: 5 num_outputs: 5 pretrained_model_path: /workspace/tao-experiments/gazenet/pretrain_models/model.tlt regularizer_type: l2 regularizer_weight: 0.002 type: GazeNet_public use_batch_norm: true

The following table describes the model parameters:

Parameter

Datatype

Default

Description

Supported Values

__class_name__ string GazeNetBaseModel Name for the model config GazeNetBaseModel
dropout_rate float 0.3 Probability for drop out 0.0-1.0
frozen_blocks int 2 This parameter defines how many blocks that will be frozen during training. If the value for this variable is set to be larger than 0, provide a pretrain model. 0,1,2,3,4,5,6
num_outputs int 5 Number of outputs (x, y, z point of regards amd theta, phi gaze vector) 5
pretrained_model_path string null Path to the pretrain model NA
regularization_type string l2 Type of the regularization l1/l2/None
regularizer_weight float 0.002 Factor of the regularization 0.0-1.0
type string GazeNet_public Type of supported GazeNet model. Only “GazeNet_public” is currently supported. GazeNet_public
use_batch_norm boolean true Boolean variable to use batch normalization layers or not true/false

Loss

This section helps you configure the parameters for loss, optimizer, and learning rate scheduler for optimizer.

Copy
Copied!
            

loss: __class_name__: GazeLoss loss_type: logcosh

The following table describes the loss parameters:

Parameter

Datatype

Default

Description

Supported Values

__class_name__ string GazeLoss Name of the loss config NA
loss_type string logcosh Type of the loss function l1/rmse/cosine/l1_cosine_joint/l2_cosine_joint/logcosh l1: l1 loss rmse: root mean square error l1_cosine_joint: l1 loss for x, y, z point of regards cosine loss for theta and phi l2_cosine_joint: l2 loss for x, y, z point of regards cosine loss for theta and phi logcosh: log-cosh loss

Optimizer

Copy
Copied!
            

optimizer: __class_name__: AdamOptimizer beta1: 0.9 beta2: 0.999 epsilon: 1.0e-08 learning_rate_schedule: __class_name__: SoftstartAnnealingLearningRateSchedule annealing: 0.8 base_learning_rate: 0.003 last_step: 263000 min_learning_rate: 5.0e-07 soft_start: 0.2

The following table describes the optimizer parameters:

Parameter

Datatype

Default

Description

Supported Values

__class_name__ string AdamOptimizer Name of the optimizer config AdamOptimizer/AdadeltaOptimizer/GradientDescentOptimizer
beta1 float 0.9 The exponential decay rate for the 1st moment estimates 0-1
beta2 float 0.999 The exponential decay rate for the 2nd moment estimates 0-1
epsilon float 1.0e-08 A small constant for numerical stability NA
learning_rate_schedule structure SoftstartAnnealingLearningRateSchedule Type of learning rate schedule SoftstartAnnealingLearningRateSchedule ConstantLearningRateSchedule ExponentialDecayLearningRateSchedule

The following table describes the learning_rate_schedule parameters:

Parameter

Datatype

Default

Description

Supported Values

__class_name__ string SoftstartAnnealingLearningRateSchedule Name of the learning rate schedule config SoftstartAnnealingLearningRateSchedule - This scheduling has soft starting and ending learning rate value ConstantLearningRateSchedule - This scheduling has constant learning rate value ExponentialDecayLearningRateSchedule - This scheduling has learning rate that are decay exponentially
soft_start float 0.2 Indicating the fraction of last_step that will be taken before reaching the base_learning rate 0-1
annealing float 0.8 Indicating the fraction of last_step after which the learning rate ramps down from base_learning rate 0-1
base_learning_rate float 0.0002 Learning rate 0-1
min_learning_rate float 2.0e-07 Minimum value the learning rate will be set to 0-1
last_step int 953801 Last step the schedule is made for NA

GazeNet currently supports the soft-start annealing learning rate schedule. The learning rate when plotted as a function of the training progress (0.0, 1.0) results in the following curve.

learning_rate.png

In this experiment, the soft start was set as 0.2 and annealing as 0.8 with minimum learning rate as 2.0e-07 and a maximum learning rate or base_lr as 0.0002.

Dataloader

The dataloader module provides parameters used for dataset pre-processing, some basic pre-processing, data and dataloader when training. Here is a sample dataloader specification element:

Copy
Copied!
            

dataloader: __class_name__: GazeNetDataloaderAugV2 augmentation_info: ... batch_size: 128 dataset_info: ... eye_scale_factor: 1.8 face_scale_factor: 1.3 filter_phases: - training - testing - validation - kpi_testing filter_info: ... image_info: facegrid: channel: 1 height: 25 width: 25 image_face: channel: 1 height: 224 width: 224 image_frame: channel: 1 height: 480 width: 640 image_left: channel: 1 height: 224 width: 224 image_right: channel: 1 height: 224 width: 224 input_normalization_type: zero-one kpiset_info: ... learn_delta: false use_head_norm: false num_outputs: 5 theta_phi_degrees: false use_narrow_eye: true add_test_to: null

The following table describes the dataloader specification parameters:

Parameter

Datatype

Default

Description

Supported Values

__class_name__ string GazeNetDataloaderAugV2 Name of the dataloader specification GazeNetDataloaderAugV2
augmentation_info structure NA Augmentation specification NA
batch_size int 128 Batch size 0-1
dataset_info structure NA dataset specification 0-1
eye_scale_factor float 1.8 Scaling factor for eyes (if value is larger than 1, then eye crop is enlarged) NA
face_scale_factor float 1.3 Scaling factor for the face (if value is larger than 1, then face crop is enlarged) NA
filter_phases structure training, testing, validation, kpi_testing Phase to apply the filter training/testing/validation/kpi_testing
filter_info structure NA Data filter variables and criteria NA
image_info structure NA Input image information facegrid, image_face, image_frame, image_left, image_right
channel int NA Input normalization type zero-one
input_normalization_type string NA Input normalization type zero-one
kpiset_info structure NA KPI set information NA
learn_delta boolean false Boolean values to enable/disable learning of variable difference false
use_head_norm boolean false Data filter variable and criteria false
num_outputs int 5 Number of outputs (x, y, z point of regards amd theta, phi gaze vector) 5
theta_phi_degrees boolean false Boolean values to enable/disable theta phi learning false
use_narrow_eye boolean true Boolean values to enable/disable tight eye input true/false
add_test_to string null Testing dataset from dataio can be added to training or validation. By default, will keep testing dataset for KPI usage null/training/validation
Copy
Copied!
            

dataset_info: ground_truth_folder_name: - Ground_Truth_DataFactory_pipeline image_extension: png root_path: null test_file_name: test.tfrecords tfrecord_folder_name: - TfRecords_joint_combined tfrecords_directory_path: - /workspace/tao-experiments/gazenet/data/MPIIFaceGaze/sample-dataset tfrecords_set_id: - p01-day03 train_file_name: train.tfrecords validate_file_name: validate.tfrecords

The following table describes the dataset_info parameters:

Parameter

Datatype

Default

Description

Supported Values

ground_truth_folder_name string NA Ground truth folder name NA
image_extension string NA Image extension NA
root_path string NA Root path null
test_file_name string NA File name for test tfrecords NA
tfrecord_folder_name string NA Tfrecords folder name NA
tfrecords_directory_path string NA Path to Tfrecords directory NA
tfrecords_set_id string NA Set ID NA
train_file_name string NA File name for train tfrecords NA
validate_file_name string NA File name for validate tfrecords NA
Copy
Copied!
            

filter_info: - desired_val_max: 400.0 desired_val_min: -400.0 feature_names: - label/gaze_cam_x - desired_val_max: 400.0 desired_val_min: -400.0 feature_names: - label/gaze_cam_y - desired_val_max: 300.0 desired_val_min: -300.0 feature_names: - label/gaze_cam_z

The following table describes the filter_info parameters:

Parameter

Datatype

Default

Description

Supported Values

feature_names string NA Feature name label/gaze_cam_x, label/gaze_cam_y, label/gaze_cam_z
desired_val_max float NA Maximum value for the feature NA
desired_val_min float NA Minimum value for the feature NA
Copy
Copied!
            

kpiset_info: ground_truth_folder_name_kpi: - Ground_Truth_DataFactory_pipeline kpi_file_name: test.tfrecords kpi_root_path: null kpi_tfrecords_directory_path: - /workspace/tao-experiments/gazenet/data/MPIIFaceGaze/sample-dataset tfrecord_folder_name_kpi: - TfRecords_joint_combined tfrecords_set_id_kpi: - p01-day03

The following table describes the dataset_info parameters:

Parameter

Datatype

Default

Description

Supported Values

ground_truth_folder_name_kpi string NA Ground truth folder name for KPI dataset NA
kpi_file_name string NA File name for KPI tfrecords NA
kpi_root_path string null KPI root path Reserved value, currently only null is supported
kpi_tfrecords_directory_path string NA Path to KPI Tfrecords directory NA
tfrecord_folder_name_kpi string NA KPI tfrecords folder name NA
tfrecords_set_id_kpi string NA KPI tfrecords set ID NA

Augmentation

The augmentation module provides some basic pre-processing and augmentation when training. Here is a sample augmentation element:

Copy
Copied!
            

augmentation_info: blur_augmentation: blur_probability: 0.0 kernel_sizes: - 1 - 3 - 5 - 7 - 9 enable_online_augmentation: true gamma_augmentation: gamma_max: 1.1 gamma_min: 0.9 gamma_probability: 0.1 gamma_type: uniform modulus_color_augmentation: contrast_center: 127.5 contrast_scale_max: 0.0 hue_rotation_max: 0.0 saturation_shift_max: 0.0 modulus_spatial_augmentation: hflip_probability: 0.5 zoom_max: 1.0 zoom_min: 1.0 random_shift_bbx_augmentation: shift_percent_max: 0.16 shift_probability: 0.9

The following table describes the augmentation parameters:

Parameter

Datatype

Default

Description

Supported Values

__class_name__ string GazeNetDataloaderAugV2 Name of the dataloader specification GazeNetDataloaderAugV2
augmentation_info structure NA Augmentation specification NA
blur_augmentation structure NA Blur augmentation specification NA
blur_probability float 0.0 Probability of imgages to apply blur augmentation 0.0 - 1.0
kernel_sizes int 1,3,5,7,9 Kernel size for the blur operation 1, 3, 5, 7, 9
enable_online_augmentation boolean true Boolean values to enable/disable augmentation true/false
gamma_augmentation structure NA Gamma augmentation specification NA
gamma_max float 1.1 Maximum value of gamma variable 1.0 - 1.4
gamma_min float 0.9 Minimum value of gamma variable 0.7 - 1.0
gamma_probability float 0.1 Probability of data to apply gamma augmentation uniform
gamma_type string uniform Type of gamma augmentation true/false
modulus_color_augmentation structure NA Color argumentation specification NA
contrast_center float 127.5 Contrast center for color argumentation 0 - 255
contrast_scale_max float 0.0 Maximum scale of contrast change NA
hue_rotation_max float 0.0 Maximum hue rotation change NA
saturation_shift_max float 0.0 Maximum saturation shift change NA
modulus_spatial_augmentation structure NA Spatial augmentation specification NA
hflip_probability float 0.5 Probability of data to apply horizontal flip 0.0 - 1.0
zoom_max float 1.0 Maximum zoom scale NA
zoom_min float 1.0 Minimum zoom scale NA
random_shift_bbx_augmentation structure NA Bounding box random ship augmentation NA
shift_percent_max float 0.16 Maximum percent shift of the bounding box NA
shift_probability float 0.9 Probability of data to apply random shift augmentation 0.0 - 1.0

After following the steps to Pre-processing the Dataset to create TFRecords ingestible by the TAO training, and setting up a spec file. You are now ready to start training a gaze estimation network.

GazeNet training command:

Copy
Copied!
            

tao gazenet train [-h] -e <spec_file> -r <result directory> -k <key>

Required Arguments

  • -r, --results_dir: Path to a folder where experiment outputs should be written.

  • -k, –key: User specific encoding key to save or load a .tlt model.

  • -e, --experiment_spec_file: Path to spec file. Absolute path or relative to working directory.

Optional Arguments

-h, --help: To print help message.

Sample Usage

Here is an example of command for gazenet training:

Copy
Copied!
            

tao gazenet train -e <path_to_spec_file> -r <path_to_experiment_output> -k <key_to_load_the_model>

Note

The tao gazenet train tool can support training on images of different resolutions. Face, left eye, and right eye crop is obtained online through dataloader. However, it requires all input images to have the same resolution.


Execute evaluate on a GazeNet model.

Copy
Copied!
            

tao gazenet evaluate [-h] -type <testing dataset type> -m <model_file> -e <experiment_spec> -k <key>

Required Arguments

  • -e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as training spec file.

  • -m, --model: Path to the model file to use for evaluation. This could be a .tlt model file or a tensorrt engine generated using the export tool.

  • -k, -–key: Provide the encryption key to decrypt the model. This is a required argument only with a .tlt model file.

Optional Arguments

  • -h, --help: show this help message and exit.

If you have followed the example in Training the Model, you may now evaluate the model using the following command:

Copy
Copied!
            

tao gazenet evaluate -type <testing data type> -e <path to training spec file> -m <path to the model> -k <key to load the model>

Note

This command runs evaluation on the testing/KPI dataset.

Use these steps to evaluate on a new test set with ground truth labeled:

  1. Create tfrecords for this test set by following the steps listed in Pre-processing the Dataset section.

  2. Update the dataloader configuration part of the training experiment spec file to update kpiset_info with newly generated tfrecords for the test set. For more information on the dataset config, refer to Creating an Experiment Specification File. The evaluate tool iterates through all the folds in the kpiset_info.

Copy
Copied!
            

kpiset_info: ground_truth_folder_name_kpi: - Ground_Truth_Folder_Dataset1 - Ground_Truth_Folder_Dataset2 kpi_file_name: test.tfrecords kpi_root_path: null kpi_tfrecords_directory_path: - /path_to_kpi_dataset1 - /path_to_kpi_dataset2 tfrecord_folder_name_kpi: - TfRecords_joint_combined - TfRecords_joint_combined tfrecords_set_id_kpi: - kpi_dataset1 - kpi_dataset2

The rest of the experiment spec file remains the same as the training spec file.

The inference task for gazenet may be used to visualize gaze vector. An example of the command for this task is shown below:

Copy
Copied!
            

tao gazenet inference [-h] -e </path/to/inference/spec/file> \ -i </path/to/inference/input> \ -m <model_file> \ -o </path/to/inference/output> \ -k <model key>

Required Parameters

  • -e, --inference_spec: Path to an inference spec file.

  • -i, --inference_input: The directory of input images or a single image for inference.

  • -o, --inference_output: The directory to the output images and labels.

  • -k, --enc_key: Key to load model.

Sample usage for the inference sub-task

Here’s a sample command to run inference for a testing dataset.

Copy
Copied!
            

tao gazenet inference -e $SPECS_DIR/gazenet_tlt_pretrain.yaml \ -i $DATA_DOWNLOAD_DIR/inference-set \ -m $USER_EXPERIMENT_DIR/experiment_result/exp1/model.tlt \ -o $USER_EXPERIMENT_DIR/experiment_result/exp1 \ -k $KEY


Exporting the GazeNet Model

Here’s an example of the command line arguments of the export command:

Copy
Copied!
            

tao gazenet export [-h] -m <path to the .tlt model file generated by tao train> -o <path to output file> -t tfonnx -k <key>

Required Arguments

  • -m, --model_filename: Path to the .tlt model file to be exported using export.

  • -k, --output_filename: Key used to save the .tlt model file.

  • -o, --key: Key used to save the .tlt model file.

  • -t, --export_type: Model type to export to. Only ‘tfonnx’ is support in TAO Toolkit 3.0-21.08.

Sample usage for the export sub-task

Here’s a sample command to export a GazeNet model.

Copy
Copied!
            

tao gazenet export -m $USER_EXPERIMENT_DIR/experiment_result/exp1/model.tlt -o $USER_EXPERIMENT_DIR/experiment_dir_final/gazenet_onnx.etlt -t tfonnx -k $KEY


Deploying to DeepStream 6.0

The pretrained model for GazeNet provided through NGC is available by default with DeepStream 6.0.

For more details, refer to DeepStream TAO Integration for GazeNet.

© Copyright 2023, NVIDIA.. Last updated on Sep 5, 2023.