Gesture Recognition

Model Overview

The model described in this card is a classification network that assigns hand-crop images to one of six classes (five gestures plus a random class):

  • thumbs up

  • fist

  • stop

  • ok

  • two

  • random

GestureNet is generally cascaded with BodyPoseNet for gesture-based applications.


Figure: GestureNet use case

Model Architecture

This is a classification model with a ResNet18 backbone.

Training Algorithm

The training algorithm optimizes the network to minimize the categorical cross-entropy loss over the gesture classes. This model was trained using the Gesture Recognition training app in TAO Toolkit v3.0.
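
As a rough illustration of the loss named above, the sketch below computes categorical cross-entropy for a single sample in plain Python. It is not the TAO Toolkit implementation. The function names, the example scores, and the class ordering are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(logits, target_index):
    """Loss for one sample: negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Six raw scores, one per class listed in this card (ordering is assumed).
logits = [2.0, 0.5, 0.1, -1.0, 0.3, 0.0]

# The loss is small when the true class has the highest score...
loss_when_right = categorical_cross_entropy(logits, 0)
# ...and large when the true class has the lowest score.
loss_when_wrong = categorical_cross_entropy(logits, 3)
```

Training averages this quantity over a batch and adjusts the weights to reduce it, which pushes probability mass toward the labeled class.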


Reference

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).

Intended Use

GestureNet is cascaded with a hand detector or a body pose network. For example, BodyPoseNet detects human body joints, which are used to create hand crops, and GestureNet then classifies the gesture in each hand crop.
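
The card does not say how hand crops are derived from body joints. The sketch below shows one possible heuristic: place a square box past the wrist along the forearm direction and size it by the forearm length. The function `hand_crop_box`, its `extension` and `scale` parameters, and the example coordinates are all hypothetical, not part of BodyPoseNet or GestureNet.

```python
import math

def hand_crop_box(wrist, elbow, image_width, image_height, extension=0.5, scale=1.0):
    """Estimate a square hand crop (left, top, right, bottom) from two arm keypoints.

    Heuristic (assumed): the hand center lies beyond the wrist along the
    elbow-to-wrist direction, and the box side scales with forearm length.
    """
    wx, wy = wrist
    ex, ey = elbow
    forearm = math.hypot(wx - ex, wy - ey)

    # Move from the wrist further along the forearm direction.
    cx = wx + extension * (wx - ex)
    cy = wy + extension * (wy - ey)
    half = 0.5 * scale * forearm

    # Clamp the box to the image bounds and return integer pixel coordinates.
    left = int(round(max(0.0, cx - half)))
    top = int(round(max(0.0, cy - half)))
    right = int(round(min(float(image_width), cx + half)))
    bottom = int(round(min(float(image_height), cy + half)))
    return left, top, right, bottom

# Arm pointing straight up in a 640 x 480 image (y grows downward).
box = hand_crop_box(wrist=(200, 100), elbow=(200, 200), image_width=640, image_height=480)
```

The resulting box would be cropped from the frame and passed to the classifier. A dedicated hand detector could replace this heuristic entirely.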


Input

  • RGB images of 160 X 160 X 3
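
To make the input shape concrete, the sketch below resizes an arbitrary hand crop to 160 x 160 x 3 using nearest-neighbor sampling on nested lists. It is only an illustration: the card does not specify the interpolation method or any pixel normalization, so none is applied here, and the helper names are hypothetical.

```python
INPUT_SIZE = 160  # height and width stated in this card
CHANNELS = 3      # RGB

def resize_nearest(image, size=INPUT_SIZE):
    """Resize an H x W x 3 nested-list image to size x size with nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[(r * src_h) // size][(c * src_w) // size] for c in range(size)]
        for r in range(size)
    ]

def shape_of(image):
    """Return (height, width, channels) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))

# A synthetic 64 x 48 RGB crop standing in for a real hand crop.
crop = [[(r % 256, c % 256, 0) for c in range(48)] for r in range(64)]
resized = resize_nearest(crop)
```

In practice an image library would handle the resize. The point is only that every crop, whatever its original size, reaches the network as a 160 x 160 x 3 array.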


Output

  • Gesture category labels
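
A minimal sketch of turning per-class scores into a gesture label follows. The label strings and, in particular, their index order are assumptions. The actual class-to-index mapping should be taken from the deployed model's label file.

```python
# Class names from this card; the index order below is assumed, not confirmed.
GESTURE_LABELS = ["thumbs_up", "fist", "stop", "ok", "two", "random"]

def decode_prediction(scores, labels=GESTURE_LABELS):
    """Return the (label, score) pair with the highest score."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]

# Example scores for one hand crop (made-up values).
label, confidence = decode_prediction([0.05, 0.10, 0.70, 0.05, 0.05, 0.05])
```

An application might additionally ignore predictions whose score falls below a threshold, or treat the random class as "no gesture".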

The datasheet for the model is captured in its model card hosted on NGC.