Gesture Recognition

NVIDIA TAO Release 30.2202

The model described in this card is a classification network that classifies hand-crop images into one of the following six classes:

  • thumbs up

  • fist

  • stop

  • ok

  • two

  • random

GestureNet is generally cascaded with BodyPoseNet for gesture-based applications.

This is a classification model with a ResNet18 backbone.

The training algorithm optimizes the network to minimize the categorical cross-entropy loss over the gesture classes. This model was trained using the Gesture Recognition training app in TAO Toolkit v3.0.
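
As an illustration of the loss, the following is a minimal Python sketch of categorical cross-entropy for a single sample with a one-hot target. The function name and the example probabilities are assumptions made for illustration; they are not taken from the TAO Toolkit training app.

```python
import math


def categorical_cross_entropy(probs, true_index, eps=1e-12):
    """Return -log(p_true) for one sample with a one-hot target.

    probs: predicted class probabilities (expected to sum to 1).
    true_index: index of the correct class.
    eps: small floor that avoids taking log(0).
    """
    return -math.log(max(probs[true_index], eps))


# Hypothetical prediction over six classes; the true class is index 2.
probs = [0.05, 0.05, 0.70, 0.10, 0.05, 0.05]
loss = categorical_cross_entropy(probs, true_index=2)
print(round(loss, 4))
```

The loss approaches zero as the probability assigned to the correct class approaches 1, and grows as that probability shrinks. During training this value would be averaged over a batch.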

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

GestureNet is cascaded with a hand-detection network or a body-pose network. For example, BodyPoseNet detects human body joints, which are used to create hand crops, and GestureNet then acts as a classifier that determines the gesture of each cropped hand.
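
The cascade described above can be sketched as follows. This is a hypothetical heuristic, not the method used by BodyPoseNet or GestureNet: it assumes that wrist and elbow keypoints are available in pixel coordinates and that the hand centre lies a fixed fraction of the forearm length beyond the wrist. The function name `hand_crop_box` and the `scale` parameter are illustrative only.

```python
def hand_crop_box(wrist, elbow, image_size, scale=1.0):
    """Estimate a square hand crop from two arm keypoints (hypothetical heuristic).

    wrist, elbow: (x, y) pixel coordinates from a body-pose model.
    image_size: (width, height) of the source frame.
    scale: crop side length as a multiple of the forearm length.
    Returns (left, top, right, bottom), clamped to the image bounds.
    """
    wx, wy = wrist
    ex, ey = elbow
    forearm = ((wx - ex) ** 2 + (wy - ey) ** 2) ** 0.5
    # Assumption: the hand centre sits about 30% of the forearm length
    # past the wrist, along the elbow-to-wrist direction.
    cx = wx + 0.3 * (wx - ex)
    cy = wy + 0.3 * (wy - ey)
    half = 0.5 * scale * forearm
    width, height = image_size
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(width, int(round(cx + half)))
    bottom = min(height, int(round(cy + half)))
    return left, top, right, bottom


# Forearm pointing straight up in a 640 x 480 frame.
box = hand_crop_box(wrist=(300, 200), elbow=(300, 300), image_size=(640, 480))
print(box)
```

The resulting region would then be cropped from the frame and resized to the classifier's input resolution before being passed to GestureNet.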


  • Input: RGB images of 160 x 160 x 3


  • Output: gesture category labels
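
To show how a category label might be derived from the classifier's scores, here is a minimal Python sketch. The class order in `LABELS` and the example scores are assumptions; the actual index-to-label mapping should be taken from the model card on NGC.

```python
# Assumed class order; verify against the model card before relying on it.
LABELS = ["thumbs up", "fist", "stop", "ok", "two", "random"]


def decode_prediction(scores, labels=LABELS):
    """Return (label, score) for the highest-scoring class."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]


# Hypothetical per-class scores for one 160 x 160 x 3 hand crop.
label, score = decode_prediction([0.02, 0.03, 0.85, 0.04, 0.03, 0.03])
print(label, score)
```

An application could also compare the returned score against a threshold and ignore low-confidence predictions.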

The datasheet for the model is captured in its model card hosted on NGC.

© Copyright 2022, NVIDIA. Last updated on Jun 6, 2022.