Gesture Recognition

Model Overview

The model described in this card is a classification network that classifies hand-crop images into six classes, five gesture types plus a random class:

thumbs up
fist
stop
ok
two
random

GestureNet is generally cascaded with BodyPoseNet for gesture-based applications.

Model Architecture

This is a classification model with a ResNet18 backbone.

Training Algorithm

The training algorithm optimizes the network to minimize the categorical cross-entropy loss over the gesture classes. This model was trained using the Gesture Recognition training app in TLT v3.0.
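As a rough illustration of the loss named above, the following sketch computes categorical cross-entropy for a single sample in plain Python. The function names and the example logits are hypothetical and are not taken from the TLT training app; the actual training code may differ in reduction, label smoothing, or weighting.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, true_index, eps=1e-12):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(max(probs[true_index], eps))

# Hypothetical logits for one hand crop over six classes (illustrative only).
logits = [2.0, 0.5, 0.1, -1.0, 0.3, 0.0]
probs = softmax(logits)
loss = categorical_cross_entropy(probs, true_index=0)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, which is what the optimizer pushes the network toward.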

Reference

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

Intended Use

GestureNet is cascaded with a hand detector or a body pose network. For example, BodyPoseNet detects human body joints, which are used to create hand crops, and GestureNet then classifies the gesture shown in each hand crop.
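The step from body joints to a hand crop can be sketched as follows. This is only one possible heuristic, assuming the pose network provides wrist and elbow keypoints in pixel coordinates; the function name `hand_crop_box`, the 0.25 offset, and the box sizing rule are all hypothetical and do not describe how NVIDIA's pipeline builds its crops.

```python
def hand_crop_box(wrist, elbow, image_w, image_h, scale=1.0):
    """Return (x0, y0, x1, y1) of a square box around the estimated hand.

    Assumptions (not from the source): the hand centre lies slightly past
    the wrist along the elbow-to-wrist direction, and the box side is
    proportional to the forearm length.
    """
    dx = wrist[0] - elbow[0]
    dy = wrist[1] - elbow[1]
    forearm = (dx * dx + dy * dy) ** 0.5

    # Move a quarter of the forearm length past the wrist (heuristic).
    cx = wrist[0] + 0.25 * dx
    cy = wrist[1] + 0.25 * dy
    half = 0.5 * scale * forearm

    # Clip the box to the image bounds.
    x0 = max(0, int(round(cx - half)))
    y0 = max(0, int(round(cy - half)))
    x1 = min(image_w, int(round(cx + half)))
    y1 = min(image_h, int(round(cy + half)))
    return x0, y0, x1, y1

# Example: a raised forearm in a 640 x 480 frame (coordinates are made up).
box = hand_crop_box(wrist=(100, 100), elbow=(100, 200), image_w=640, image_h=480)
```

The resulting box would then be cropped from the frame and resized to the classifier's input size before being passed to GestureNet.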

Input

RGB images of size 160 × 160 × 3

Output

Gesture category labels
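A minimal sketch of turning the classifier's per-class scores into a gesture label is shown below. The label order in `GESTURE_LABELS` is an assumption based on the order in which the classes are listed in this document; the real index-to-class mapping should be taken from the model card or the training spec.

```python
# Assumed label order (hypothetical); verify against the model card.
GESTURE_LABELS = ["thumbs_up", "fist", "stop", "ok", "two", "random"]

# Expected input shape of one hand crop, as stated in the Input section.
INPUT_SHAPE = (160, 160, 3)

def decode_prediction(scores, labels=GESTURE_LABELS):
    """Return (label, score) for the highest-scoring class."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]

# Made-up scores for a single hand crop.
label, score = decode_prediction([0.05, 0.10, 0.70, 0.05, 0.05, 0.05])
```

In an application, a low top score or a `random` prediction would typically be treated as "no gesture", though the appropriate threshold depends on the deployment.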

The datasheet for the model is captured in its model card hosted on NGC.