Facial Landmarks Estimation

Facial Landmarks

The facial landmarks estimator network predicts the (x, y) locations of landmarks (keypoints) for a given input face image. FPENet (Fiducial Points Estimator Network) is generally used in conjunction with a face detector, and its output is commonly used for face alignment, head pose estimation, emotion detection, eye blink detection, and gaze estimation, among other tasks.

Dataset Preparation

The FPENet app requires the data to be in a specific JSON format so that it can be converted to TFRecords. The conversion tool takes a configuration file as input. Configuration file details and sample usage examples are included in the following sections.

The ground truth dataset is created by having human labelers annotate facial keypoints. If you want to re-train with your own dataset, follow the guidelines below.

  • Label the keypoints in the correct order as accurately as possible. The human labeler should be able to zoom in on a face region to localize each keypoint correctly.

  • For keypoints that are not easily distinguishable such as chin or nose, the best estimate should be made by the human labeler. Some keypoints are easily distinguishable such as mouth corners or eye corners.

  • Label a keypoint as “occluded” if the keypoint is not visible due to an external object or due to extreme head pose angles. A keypoint is considered occluded when the keypoint is in the image but not visible.

  • To reduce discrepancies in labeling between multiple human labelers, use the same keypoint ordering and instructions across labelers. An independent human labeler may be used to check the quality of the annotated landmarks and to make potential corrections.

The Sloth and Label Studio tools may be used for labeling.

The dataset format is described in the Labeling Data Format section.

Configuration File for Dataset Converter

A sample dataset configuration file is shown below.

sets: [dataset1, dataset2]
gt_path: 'GT'
save_path: 'models/tfrecords'
gt_root_path: '/workspace/tlt-experiments/data/'
save_root_path: '/workspace/tlt-experiments/'
image_root_path: '/workspace/tlt-experiments/'
tfrecord_folder: 'FpeTfRecords'
tfrecord_name: 'data.tfrecords'
num_keypoints: 80
bbox_enlarge_ratio: 1.0

| Parameter | Datatype | Description | Default | Supported Values |
|---|---|---|---|---|
| sets | list | Set IDs to extract as a list. Example: [set1, set2, set3]. | | |
| gt_path | string | Ground truth JSON path. | | |
| save_path | string | Save path for TFRecords. | | |
| gt_root_path | string | Root path for ground truth JSON files (if any). This path is prepended to gt_path when reading the JSON files. | | |
| save_root_path | string | Root path for saving TFRecords data (if any). This path is prepended to save_path for each set. | | |
| image_root_path | string | Root path for the images (if any). This path is prepended to the image paths in the JSON files. | | |
| tfrecord_folder | string | TFRecord folder name to generate. This folder is created if it does not exist. | | |
| tfrecord_name | string | TFRecord file name to generate. | | |
| num_keypoints | int | Number of facial keypoints. | | 68, 80, 104 |
| bbox_enlarge_ratio | float | Scale to enlarge face bounding box with. | | |
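The way the root paths combine with the per-set paths can be illustrated with a short Python sketch. This is not the converter's code: the function name resolve_dataset_paths is hypothetical, and the directory layout (one sub-folder per set ID under both the ground truth root and the save path) is an assumption based on the parameter descriptions above.

```python
import posixpath

SUPPORTED_KEYPOINTS = (68, 80, 104)  # supported values listed for num_keypoints

# Values copied from the sample dataset configuration file.
SAMPLE_CONFIG = {
    "sets": ["dataset1", "dataset2"],
    "gt_path": "GT",
    "save_path": "models/tfrecords",
    "gt_root_path": "/workspace/tlt-experiments/data/",
    "save_root_path": "/workspace/tlt-experiments/",
    "tfrecord_folder": "FpeTfRecords",
    "tfrecord_name": "data.tfrecords",
    "num_keypoints": 80,
}


def resolve_dataset_paths(cfg, set_id):
    """Return (ground truth directory, TFRecord file path) for one set.

    ASSUMPTION: each set ID is a sub-folder under gt_root_path and under
    save_root_path/save_path. The real converter may use another layout.
    """
    if cfg["num_keypoints"] not in SUPPORTED_KEYPOINTS:
        raise ValueError("num_keypoints must be one of %s" % (SUPPORTED_KEYPOINTS,))
    if set_id not in cfg["sets"]:
        raise ValueError("unknown set id: %s" % set_id)
    gt_dir = posixpath.join(cfg["gt_root_path"], set_id, cfg["gt_path"])
    tfrecord_file = posixpath.join(
        cfg["save_root_path"], cfg["save_path"], set_id,
        cfg["tfrecord_folder"], cfg["tfrecord_name"],
    )
    return gt_dir, tfrecord_file
```

Under these assumptions, the TFRecord file for dataset1 ends up below /workspace/tlt-experiments/models/tfrecords, which matches the tfrecords_directory_path used in the sample dataloader config.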

Sample Usage of the Dataset Converter Tool

tlt fpenet dataset_convert -e dataset_config.yaml

Creating an Experiment Spec file

To train, evaluate, and run inference with FPENet, several components need to be configured, each with its own parameters. The commands for an FPENet experiment share the same configuration file.

The specification file configures these components:

  • Trainer

  • Model

  • Loss

  • Dataloader

  • Optimizer

Trainer Config

The Trainer config consists of some common args for running the FPENet app and it also encompasses the other configs: model, loss, dataloader, and optimizer.

__class_name__: FpeNetTrainer
checkpoint_dir: /workspace/tlt-experiments/fpenet/
checkpoint_n_epoch: 1
enable_visualization: true
log_every_n_secs: 10
num_epoch: 20
num_keypoints: 80
random_seed: 35
visualize_num_images: 3
model:
  ...
loss:
  ...
optimizer:
  ...
dataloader:
  ...

| Argument | Datatype | Description | Default | Supported Values |
|---|---|---|---|---|
| checkpoint_dir | string | The directory to save/load model checkpoints. | None | |
| checkpoint_n_epoch | int | Number of epoch at which checkpoint is saved. | 1 | 1 to num_epoch |
| enable_visualization | boolean | Enable visualization in tensorboard. | True | True/False |
| log_every_n_secs | int | Logging frequency in seconds. | 60 | |
| num_epoch | int | Total number of epochs to train. | 40 | |
| num_keypoints | int | Number of facial keypoints. | 80 | 68, 80, 104 |
| random_seed | int | Random seed for initialization. | 42 | |
| visualize_num_images | int | Number of images to visualize per epoch. | 3 | |
| model | | Model config. | | |
| loss | | Loss config. | | |
| optimizer | | Optimizer config. | | |
| dataloader | | Dataloader config. | | |
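As an illustration of how the documented defaults interact with values set in a spec file, the following Python sketch merges user overrides onto the defaults listed above and checks the documented value ranges. The names TRAINER_DEFAULTS and build_trainer_config are hypothetical, and this is not how the FPENet app itself parses the spec.

```python
# Defaults and supported values copied from the Trainer argument table.
TRAINER_DEFAULTS = {
    "checkpoint_dir": None,
    "checkpoint_n_epoch": 1,
    "enable_visualization": True,
    "log_every_n_secs": 60,
    "num_epoch": 40,
    "num_keypoints": 80,
    "random_seed": 42,
    "visualize_num_images": 3,
}
SUPPORTED_KEYPOINTS = (68, 80, 104)


def build_trainer_config(overrides):
    """Overlay user-provided values on the defaults and validate them."""
    cfg = {**TRAINER_DEFAULTS, **overrides}
    if cfg["num_keypoints"] not in SUPPORTED_KEYPOINTS:
        raise ValueError("num_keypoints must be one of %s" % (SUPPORTED_KEYPOINTS,))
    # Documented range for checkpoint_n_epoch is 1 to num_epoch.
    if not 1 <= cfg["checkpoint_n_epoch"] <= cfg["num_epoch"]:
        raise ValueError("checkpoint_n_epoch must be between 1 and num_epoch")
    return cfg
```

For example, build_trainer_config({"num_epoch": 20, "random_seed": 35}) keeps log_every_n_secs at its default of 60 while overriding the two given values.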

Model Config

Configuration section to provide model related parameters.

Sample model config is shown below.

model:
  __class_name__: FpeNetBaseModel
  model_parameters:
    beta: 0.01
    pretrained_model_path: /workspace/tlt-experiments/pretrained_models/public/model.tlt
    regularizer_type: l2
    regularizer_weight: 1.0e-05
    type: FpeNet_public

| Parameter | Datatype | Description | Default | Supported Values |
|---|---|---|---|---|
| pretrained_model_path | string | Path to pre-trained model to load weights from. | None | |
| regularizer_type | string | Type of weights regularizer. | | “l1”, “l2” |
| regularizer_weight | float | Weight for regularizer. | | |
| type | string | Model type. | | “FpeNet_public”, “FpeNet_release” |
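To make the effect of regularizer_type and regularizer_weight concrete, here is a minimal NumPy sketch of the usual L1 and L2 weight penalties. The function name is hypothetical, and the exact form used inside FPENet is not stated in this document; the sketch assumes the common definitions (sum of absolute values for “l1”, sum of squares for “l2”).

```python
import numpy as np


def regularization_penalty(weights, regularizer_type="l2", regularizer_weight=1.0e-05):
    """Return the penalty added to the loss for one weight tensor.

    ASSUMPTION: standard definitions of the L1 and L2 penalties.
    """
    w = np.asarray(weights, dtype=np.float64)
    if regularizer_type == "l1":
        return regularizer_weight * np.abs(w).sum()
    if regularizer_type == "l2":
        return regularizer_weight * np.square(w).sum()
    raise ValueError("regularizer_type must be 'l1' or 'l2'")
```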

Loss Config

Configuration section to provide loss related parameters.

Sample loss config is shown below.

loss:
  __class_name__: FpeLoss
  kpts_coeff: 0.01
  loss_type: square_euclidean
  mask_occ: true
  weights_dict: null
  elt_loss_info:
    elt_alpha: 0.5
    enable_elt_loss: true
    modulus_spatial_augmentation:
      hflip_probability: 0.0
      rotate_rad_max: 0.35
      translate_max_x: 10
      translate_max_y: 10
      zoom_max: 1.2
      zoom_min: 0.8

| Parameter | Datatype | Description | Default | Supported Values |
|---|---|---|---|---|
| kpts_coeff | float | Coefficient the loss is multiplied with. | 0.01 | |
| loss_type | string | Type of loss to use. | “l1” | “l1”, “square_euclidean”, “wing_loss” |
| mask_occ | boolean | If True, will mask all occluded points. | False | |
| weights_dict | dictionary | Contains the weights for the ‘eyes’, the ‘mouth’, and the rest of the ‘face’. These dict keys must be present, and the elements must sum up to 1. | None | |
| elt_loss_info | elt loss config | Dictionary about ELT loss. | | |
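The following NumPy sketch shows one plausible reading of how kpts_coeff, loss_type, and mask_occ combine, together with a check for the weights_dict constraint. It is a sketch under stated assumptions, not the FpeLoss implementation: the function names are hypothetical, only the “l1” and “square_euclidean” types are covered, and averaging over visible keypoints when masking is an assumption.

```python
import math

import numpy as np


def keypoint_loss(pred, gt, visible, loss_type="l1", kpts_coeff=0.01, mask_occ=False):
    """pred, gt: (B, N, 2) keypoints; visible: (B, N), 1 = visible, 0 = occluded."""
    diff = pred - gt
    if loss_type == "l1":
        per_kpt = np.abs(diff).sum(axis=-1)
    elif loss_type == "square_euclidean":
        per_kpt = np.square(diff).sum(axis=-1)
    else:
        raise ValueError("this sketch covers only 'l1' and 'square_euclidean'")
    if mask_occ:
        # ASSUMPTION: occluded keypoints contribute nothing and the loss
        # is averaged over the visible keypoints only.
        return kpts_coeff * (per_kpt * visible).sum() / max(visible.sum(), 1.0)
    return kpts_coeff * per_kpt.mean()


def check_weights_dict(weights_dict):
    """Enforce the documented constraints on weights_dict."""
    required = {"eyes", "mouth", "face"}
    missing = required - set(weights_dict)
    if missing:
        raise ValueError("missing keys: %s" % sorted(missing))
    if not math.isclose(sum(weights_dict.values()), 1.0, abs_tol=1e-6):
        raise ValueError("weights must sum up to 1")
```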

ELT Loss configuration used by FpeNet.

Defined in “Improving Landmark Localization with Semi-Supervised Learning”, CVPR 2018.

| Parameter | Datatype | Description | Default | Supported Values |
|---|---|---|---|---|
| elt_alpha | float | Weight for ELT loss. | None | |
| enable_elt_loss | boolean | Flag to enable ELT loss. | None | True/False |
| modulus_spatial_augmentation | dictionary | Spatial augmentation configuration parameters. hflip_probability: Probability for horizontal flipping. rotate_rad_max: Maximum rotation in radians. translate_max_x: Maximum pixel translate in x direction. translate_max_y: Maximum pixel translate in y direction. zoom_max: Zoom ratio maximum. zoom_min: Zoom ratio minimum. | hflip_probability: 0.0; rotate_rad_max: 0.0; translate_max_x: 0.0; translate_max_y: 0.0; zoom_max: 1.0; zoom_min: 1.0 | hflip_probability: 0.0 - 1.0; rotate_rad_max: -; translate_max_x: 0 - image dims; translate_max_y: 0 - image dims; zoom_max: -; zoom_min: - |
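The idea behind the ELT (equivariant landmark transformation) loss in the cited paper is that landmarks predicted on a spatially transformed image should match the transformed landmarks predicted on the original image. The NumPy sketch below illustrates that idea only. The function names are hypothetical, the transform is built around the image origin rather than the image center, and the way FPENet samples and applies the transform is not specified in this document.

```python
import numpy as np


def affine_matrix(rotate_rad, zoom, translate_x, translate_y):
    """Build a 2x3 affine matrix from rotation, zoom, and translation.

    ASSUMPTION: rotation and zoom are applied about the origin.
    """
    c = np.cos(rotate_rad) * zoom
    s = np.sin(rotate_rad) * zoom
    return np.array([[c, -s, translate_x],
                     [s, c, translate_y]])


def transform_points(points, matrix):
    """Apply a 2x3 affine matrix to (N, 2) keypoints."""
    return points @ matrix[:, :2].T + matrix[:, 2]


def elt_loss(pred_original, pred_transformed, matrix, elt_alpha=0.5):
    """Weighted mean squared distance between transformed predictions on the
    original image and predictions on the transformed image."""
    expected = transform_points(pred_original, matrix)
    sq_dist = np.square(expected - pred_transformed).sum(axis=-1)
    return elt_alpha * sq_dist.mean()
```

The loss is zero when the predictions are perfectly equivariant to the transform, and no ground truth labels are needed to compute it.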

Dataloader Config

Configuration section to provide data related parameters.

Sample dataloader config is shown below.

dataloader:
  __class_name__: FpeNetDataloader
  augmentation_info:
    augmentation_resize_probability: 0.5
    augmentation_resize_scale: 1.6
    enable_occlusion_augmentation: true
    enable_online_augmentation: true
    enable_resize_augmentation: true
    gamma_augmentation:
      gamma_max: 1.6
      gamma_min: 0.6
      gamma_probability: 0.1
      gamma_type: uniform
    modulus_spatial_augmentation:
      hflip_probability: 0.25
      rotate_rad_max: 0.35
      translate_max_x: 10
      translate_max_y: 10
      zoom_max: 1.2
      zoom_min: 0.8
    patch_probability: 0.5
    size_to_image_ratio: 0.5
    mask_augmentation_patch: true
  batch_size: 64
  dataset_info:
    image_extension: png
    no_occlusion_masking_sets: s578-usercalibration-incar-0 s578-usercalibration-incar-1
    root_path: /workspace/tlt-experiments/
    tfrecord_folder_name: FpeTfRecords
    tfrecords_directory_path: /workspace/tlt-experiments/models/tfrecords
    tfrecords_set_id_train: s578-usercalibration-incar-0
    tfrecords_set_id_val: s578-usercalibration-incar-0
    tfrecord_file_name: data.tfrecords
  image_info:
    image:
      channel: 1
      height: 80
      width: 80
  kpiset_info:
    tfrecords_set_id_kpi: s578-usercalibration-incar-1
  num_keypoints: 80

Parameter

Datatype

Description

Default

batch_size

int

Batch size for training/evaluation.

dataset_info

dataset proto config

Information on input dataset.

  • image_extension (string): Image extension. Currently, FPENet only supports “png” extension.

  • no_occlusion_masking_sets (string): Space separated names of datasets for which occlusion masking is not to be used.

  • root_path (string): Root path to append to image paths.

  • tfrecord_folder_name (string): Folder name for tfrecords inside each dataset.

  • tfrecords_directory_path (string): Path for tfrecords for each dataset.

  • tfrecords_set_id_train (string): Space separated names of dataset to use in training.

  • tfrecords_set_id_val (string): Space separated names of dataset to use in validation.

  • tfrecord_file_name (string): Filename for tfrecord file.

image_info

image_info proto config

Information on input image.

  • channel (int): Number of channels. Options- 1 (grayscale image), 3 (RGB image).

  • height (int): Image height in pixels.

  • width (int): Image width in pixels.

kpiset_info

kpiset_info proto config

Information for KPI evaluation.

  • tfrecords_set_id_kpi (string): Space separated names of datasets.

num_keypoints

int

Number of facial keypoints. Options: 68, 80, 104.

augmentation_info

augmentation proto config

Information on augmentation config.

  • enable_resize_augmentation (boolean): Flag to enable resize augmentation.

  • augmentation_resize_probability (float): Probability for applying image resize augmentation.

  • augmentation_resize_scale (float): Maximum scale to resize image for resize augmentation. Image is upscaled by this scale and then downscaled back to original image size.

  • enable_occlusion_augmentation (boolean): Flag to enable occlusion augmentation.

  • enable_online_augmentation (boolean): Flag to enable augmentation. If False, all augmentations are turned off.

  • gamma_augmentation: Gamma augmentation parameters.

    • gamma_max (float): Maximum value for gamma uniform distribution.

    • gamma_min (float): Minimum value for gamma uniform distribution.

    • gamma_probability (float): Probability that a gamma correction will occur.

    • gamma_type (string): Describes type of random sampling for gamma [‘normal’, ‘uniform’].

  • modulus_spatial_augmentation

    • hflip_probability (float): Probability for horizontal flipping.

    • rotate_rad_max (float): Maximum rotation in radians.

    • translate_max_x (int): Maximum pixel translate in x direction.

    • translate_max_y (int): Maximum pixel translate in y direction.

    • zoom_max (float): Zoom ratio maximum.

    • zoom_min (float): Zoom ratio minimum.

  • patch_probability (float): Probability to add occlusion augmentation.

  • size_to_image_ratio (float): Maximum scale of occlusion.

  • mask_augmentation_patch (boolean): Flag to enable keypoint masking of occlusion patch.
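As an illustration of the gamma_augmentation parameters, the sketch below applies a randomly sampled gamma correction to an image with the configured probability. It is a minimal NumPy sketch that covers only the “uniform” gamma_type and assumes pixel values normalized to [0, 1]; the function name is hypothetical and the dataloader's own implementation may differ.

```python
import numpy as np


def gamma_augment(image, rng, gamma_min=0.6, gamma_max=1.6, gamma_probability=0.1):
    """Randomly apply gamma correction to an image with values in [0, 1].

    image: float array, for example of shape (80, 80, 1).
    rng: a numpy.random.Generator, so that results are reproducible.
    """
    # Skip the augmentation with probability 1 - gamma_probability.
    if rng.random() >= gamma_probability:
        return image
    # ASSUMPTION: gamma_type 'uniform' samples gamma from [gamma_min, gamma_max].
    gamma = rng.uniform(gamma_min, gamma_max)
    return np.power(image, gamma)
```

A typical call would be gamma_augment(image, np.random.default_rng(35)), where the seed value is only an example.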

Optimizer Config

Configuration section to provide optimizer-related parameters. The optimizer can be configured under the optimizer section of the config.

Sample optimizer config is shown below.

optimizer:
  __class_name__: AdamOptimizer
  beta1: 0.9
  beta2: 0.999
  epsilon: 1.0e-08
  learning_rate_schedule:
    __class_name__: SoftstartAnnealingLearningRateSchedule
    annealing: 0.5
    base_learning_rate: 0.0005
    last_step: 1000000
    min_learning_rate: 1.0e-07
    soft_start: 0.3

Parameter

Datatype

Description

Default

Supported Values

optimizer

optimizer proto config

This parameter defines which optimizer to use for training, and the parameters to configure it, namely:

  • epsilon (float): A very small number that prevents division by zero in the implementation.

  • beta1 (float)

  • beta2 (float)

epsilon: NA; beta1: 0.0 - 1.0; beta2: 0.0 - 1.0

learning rate

learning rate scheduler proto

This parameter configures the learning rate schedule for the trainer. Currently, FPENet supports only the soft-start annealing learning rate schedule, which may be configured using the following parameters:

  • soft_start (float): Defines the time to ramp up the learning rate from the minimum learning rate to the maximum learning rate.

  • annealing (float): Defines the time to cool down the learning rate from the maximum learning rate to the minimum learning rate.

  • minimum_learning_rate (float): Minimum learning rate in the learning rate schedule (min_learning_rate in the sample config).

  • maximum_learning_rate (float): Maximum learning rate in the learning rate schedule (base_learning_rate in the sample config).

soft_start_annealing_schedule

soft_start: 0.0 - 1.0; annealing: 0.0 - 1.0; minimum_learning_rate: 0.0 - 1.0; maximum_learning_rate: 0.0 - 1.0

The soft-start annealing learning rate schedule, when plotted as a function of the training progress (0.0 to 1.0), results in the following curve.

[Figure: learning rate as a function of training progress for the soft-start annealing schedule]

In the figure above, soft start was set to 0.3 and annealing to 0.7, with a minimum learning rate of 5e-6 and a maximum learning rate (base_lr) of 5e-4.
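The shape of such a schedule can be sketched in Python as follows. This is an approximation under stated assumptions rather than the trainer's implementation: the function name is hypothetical, progress is assumed to be the fraction of training completed, and the ramp-up and cool-down are assumed to interpolate between the two learning rates in log space (a linear ramp would be an equally plausible reading of the description).

```python
import math


def soft_start_annealing_lr(progress, soft_start, annealing, base_lr, min_lr):
    """Learning rate at a given training progress in [0.0, 1.0]."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0.0, 1.0]")
    if soft_start > 0.0 and progress < soft_start:
        # Ramp up: t goes from 0 (min_lr) to 1 (base_lr).
        t = progress / soft_start
    elif annealing < 1.0 and progress > annealing:
        # Cool down: t goes from 1 (base_lr) back to 0 (min_lr).
        t = (1.0 - progress) / (1.0 - annealing)
    else:
        # Plateau between soft_start and annealing.
        return base_lr
    # ASSUMPTION: interpolation is done in log space.
    log_lr = math.log(min_lr) + t * (math.log(base_lr) - math.log(min_lr))
    return math.exp(log_lr)
```

With soft_start=0.3, annealing=0.7, base_lr=5e-4, and min_lr=5e-6, the sketch starts at the minimum learning rate, holds the maximum between 30% and 70% of training, and returns to the minimum at the end.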

Complete Sample Experiment Spec File

__class_name__: FpeNetTrainer
checkpoint_dir: /workspace/tlt-experiments/fpenet/
checkpoint_n_epoch: 1
dataloader:
  __class_name__: FpeNetDataloader
  augmentation_info:
    augmentation_resize_probability: 0.5
    augmentation_resize_scale: 1.6
    enable_occlusion_augmentation: true
    enable_online_augmentation: true
    enable_resize_augmentation: true
    gamma_augmentation:
      gamma_max: 1.6
      gamma_min: 0.6
      gamma_probability: 0.1
      gamma_type: uniform
    modulus_spatial_augmentation:
      hflip_probability: 0.25
      rotate_rad_max: 0.35
      translate_max_x: 10
      translate_max_y: 10
      zoom_max: 1.2
      zoom_min: 0.8
    patch_probability: 0.5
    size_to_image_ratio: 0.5
    mask_augmentation_patch: true
  batch_size: 64
  dataset_info:
    image_extension: png
    no_occlusion_masking_sets: s578-usercalibration-incar-0 s578-usercalibration-incar-1
    root_path: /workspace/tlt-experiments/
    test_file_name: data.tfrecords
    tfrecord_folder_name: FpeTfRecords
    tfrecords_directory_path: /workspace/tlt-experiments/models/tfrecords
    tfrecords_set_id_train: s578-usercalibration-incar-0
    tfrecords_set_id_val: s578-usercalibration-incar-0
    tfrecord_file_name: data.tfrecords
    use_extra_dataset: false
  image_info:
    image:
      channel: 1
      height: 80
      width: 80
  kpiset_info:
    tfrecords_set_id_kpi: s578-usercalibration-incar-1
  num_keypoints: 80
enable_visualization: true
hooks: null
infrequent_summary_every_n_steps: 0
log_every_n_secs: 10
loss:
  __class_name__: FpeLoss
  kpts_coeff: 0.01
  loss_type: square_euclidean
  mask_occ: true
  weights_dict: null
  elt_loss_info:
    elt_alpha: 0.5
    enable_elt_loss: true
    modulus_spatial_augmentation:
      hflip_probability: 0.0
      rotate_rad_max: 0.35
      translate_max_x: 10
      translate_max_y: 10
      zoom_max: 1.2
      zoom_min: 0.8
model:
  __class_name__: FpeNetBaseModel
  model_parameters:
    beta: 0.01
    dropout_rate: 0.5
    freeze_Convlayer: null
    pretrained_model_path: /workspace/tlt-experiments/pretrained_models/public/model.tlt
    regularizer_type: l2
    regularizer_weight: 1.0e-05
    train_fpe_model: true
    type: FpeNet_public
    use_less_face_layers: false
    use_upsampling_layer: false
  visualization_parameters: null
num_epoch: 20
num_keypoints: 80
optimizer:
  __class_name__: AdamOptimizer
  beta1: 0.9
  beta2: 0.999
  epsilon: 1.0e-08
  learning_rate_schedule:
    __class_name__: SoftstartAnnealingLearningRateSchedule
    annealing: 0.5
    base_learning_rate: 0.0005
    last_step: 1000000
    min_learning_rate: 1.0e-07
    soft_start: 0.3
random_seed: 35
visualize_num_images: 3

Training the model

A utility to train a model with the specified parameters.

Input: Images of (80, 80, 1)

Output: (N, 2) keypoint locations. (N, 1) keypoint confidence. N is the number of keypoints. It can have a value of 68, 80, or 104.

Sample Usage of the Train tool

tlt fpenet train -e <Experiment_Spec_File.yaml> -r <Results Folder> -k <Encode Key>
  • -e: Path to experiment spec file.

  • -r: Results folder directory to save models.

  • -k: Encryption key for model saving/loading.

Evaluating the model

A utility to evaluate a trained model on test data and generate KPI information.

The metric is the region keypoint pixel error: the mean Euclidean error in pixel location prediction compared to the ground truth. The error is bucketized and averaged per face region (eyes, mouth, chin, etc.).
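A minimal NumPy sketch of this metric is shown below. The function name and the mapping from region names to keypoint indices are hypothetical; the actual index ranges depend on the keypoint ordering of the dataset and are not given in this document.

```python
import numpy as np


def region_pixel_error(pred, gt, regions):
    """Mean Euclidean pixel error per face region.

    pred, gt: (B, N, 2) arrays of predicted and ground truth keypoints.
    regions: dict mapping a region name to a list of keypoint indices
             (hypothetical mapping, supplied by the caller).
    """
    # Per-keypoint Euclidean distance, shape (B, N).
    dist = np.linalg.norm(pred - gt, axis=-1)
    # Average over all images and all keypoints that belong to each region.
    return {name: float(dist[:, idx].mean()) for name, idx in regions.items()}
```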

Sample Usage of the Evaluate tool

tlt fpenet evaluate -m <Results Folder> -k <Encode Key>
  • -m: Path to trained model folder.

  • -e: Experiment spec filename (if different from “experiment_spec.yaml”).

  • -k: Encryption key for model loading.

Inference of the model

A utility to run inference on sample images using a trained model. The utility takes images with ground truth face bounding box information and generates a list of predictions for each image. A sample input JSON file is shown below.

[
    {
        "filename": "image1.png",
        "annotations": [
            {
                "face_tight_bboxx": 415.10368330073106,
                "face_tight_bboxy": 243.97163120567382,
                "tool-version": "1.0",
                "face_tight_bboxwidth": 320.35730960707053,
                "face_tight_bboxheight": 329.25550579091134,
                "class": "FaceBbox"
            }
        ],
        "class": "image"
    },
    {
        "filename": "image2.png",
        "annotations": [
            {
                "face_tight_bboxx": 414.44551830055445,
                "face_tight_bboxy": 243.935820979011,
                "tool-version": "1.0",
                "face_tight_bboxwidth": 321.0993074943171,
                "face_tight_bboxheight": 340.87266938197325,
                "class": "FaceBbox"
            }
        ],
        "class": "image"
    }
]
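When preparing or checking such a file, the bounding boxes can be read back with a few lines of Python. The sketch below relies only on the keys visible in the sample above; the function name load_face_boxes is hypothetical and the code is not part of the inference tool.

```python
import json


def load_face_boxes(json_text):
    """Map each filename to its face box as (x, y, width, height)."""
    boxes = {}
    for entry in json.loads(json_text):
        for annotation in entry.get("annotations", []):
            # Only annotations of class "FaceBbox" carry the box fields.
            if annotation.get("class") != "FaceBbox":
                continue
            boxes[entry["filename"]] = (
                annotation["face_tight_bboxx"],
                annotation["face_tight_bboxy"],
                annotation["face_tight_bboxwidth"],
                annotation["face_tight_bboxheight"],
            )
    return boxes
```

If an image has several FaceBbox annotations, this sketch keeps only the last one; whether the inference tool accepts more than one face per image is not stated here.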

Sample Usage of the Inference tool

tlt fpenet inference -e <Experiment Spec File> -i <Json File With Images> -m <Trained TLT Model Path> -k <Encode Key> -o <Output Folder> -r <Images Root Directory>
  • -e: Path to experiment spec file.

  • -i: Path to json file with inference image paths and face bounding box information.

  • -m: Path to the trained model path to infer images with.

  • -k: Encryption key for model loading.

  • -o: The directory to save the output images and predictions.

  • -r: Parent directory (if any) for the image paths in inference jsons.

Exporting the model

A utility for exporting a trained model to an encrypted onnx format.

Sample Usage of the Export tool

tlt fpenet export -m <Trained TLT Model Path> -k <Encode Key> -o <Output file .etlt>
  • -m: Path to trained model to be exported.

  • -k: Encryption key for model loading.

  • -o: Path to the output .etlt file (.etlt appended to model path otherwise).

  • -t: Target opset value for onnx conversion (default 10).

Deploying to the TLT CV Inference Pipeline

The pretrained model for fiducial points estimation provided through NGC is available by default inside the TLT CV Inference Pipeline. You can also deploy a model trained through the TLT workflow to the TLT CV Inference Pipeline. Refer to the TLT CV Quick Start Scripts section for instructions on both options.