The model described in this card is a classification network that classifies hand-crop images into five gesture types.
GestureNet is generally cascaded with BodyPoseNet for gesture-based applications.
This is a classification model with a ResNet18 backbone.
The training algorithm optimizes the network to minimize the categorical cross-entropy loss over the gesture classes. This model was trained using the Gesture Recognition training app in TLT v3.0.
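As an illustration of the loss named above, the following is a minimal NumPy sketch of categorical cross-entropy for a five-class classifier. It is not the TLT training code; the function names, the example logits, and the `eps` stabilizer are assumptions made for this sketch.

```python
import numpy as np

NUM_CLASSES = 5  # the card states five gesture types


def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    # Subtracting the row maximum avoids overflow in exp().
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(logits, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class.

    logits: float array of shape (batch, NUM_CLASSES)
    labels: int array of shape (batch,) holding class indices
    eps:    small constant (an assumption here) to keep log() finite
    """
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs + eps).mean())


# Illustrative values only: two samples whose true classes are 0 and 2.
logits = np.array([[4.0, 0.5, 0.1, 0.2, 0.3],
                   [0.2, 0.1, 3.0, 0.4, 0.1]])
labels = np.array([0, 2])
loss = categorical_cross_entropy(logits, labels)
```

The loss falls as the network assigns more probability to the correct gesture class; a classifier that outputs a uniform distribution over five classes scores ln(5).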
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
GestureNet is cascaded with a hand-detection or body-pose network. For example, BodyPoseNet detects human body joints, which are used to create hand crops, and GestureNet acts as a classifier that determines the gesture of the hand.
RGB images of 160 x 160 x 3
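The cascade described above can be sketched as a crop-and-resize step that turns a detected joint into a classifier input. The card does not specify how hand crops are derived from joints, so the wrist-centred square, the `half_size` value, and the nearest-neighbour resize below are all assumptions for illustration. Only the 160 x 160 x 3 input size comes from the card.

```python
import numpy as np

INPUT_SIZE = 160  # the card states RGB inputs of 160 x 160 x 3


def crop_square(image, center_xy, half_size):
    """Cut a square around a keypoint, clamped to the image bounds.

    center_xy is a hypothetical (x, y) pixel location, e.g. a wrist joint
    reported by an upstream pose network.
    """
    height, width = image.shape[:2]
    cx, cy = center_xy
    x0, x1 = max(0, int(cx - half_size)), min(width, int(cx + half_size))
    y0, y1 = max(0, int(cy - half_size)), min(height, int(cy + half_size))
    crop = image[y0:y1, x0:x1]
    if crop.size == 0:
        raise ValueError("keypoint lies outside the image")
    return crop


def resize_nearest(image, size):
    """Nearest-neighbour resize to size x size, keeping the channel axis."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows[:, None], cols]


# Placeholder frame and keypoint; a real pipeline would take these from
# a camera and from the upstream detector.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
wrist_xy = (600, 50)
hand_crop = crop_square(frame, wrist_xy, half_size=64)
hand_input = resize_nearest(hand_crop, INPUT_SIZE)
```

Clamping at the image border yields a non-square crop, which the resize then stretches. Padding the crop instead would preserve the aspect ratio, and which behaviour matches the training data is not stated in the card.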
Gesture category labels
The datasheet for the model is captured in its model card hosted on NGC.