NVIDIA Maxine Augmented Reality (AR) SDK User Guide
NVIDIA® Augmented Reality SDK (AR SDK) enables real-time modeling and tracking of human faces from video. The SDK is powered by NVIDIA graphics processing units (GPUs) with Tensor Cores. As a result, the algorithm throughput is greatly accelerated, and latency is reduced.
The AR SDK has the following features:
Face detection and tracking detects, localizes, and tracks human faces in images or videos by using bounding boxes.
Facial landmark detection and tracking predicts and tracks the pixel locations of human facial landmark points and head poses in images or videos. It can predict either 68 or 126 landmark points. The 68 detected facial landmarks follow the Multi-PIE 68-point mark-up described in Facial point annotations. The 126-point landmark detector predicts additional points on the cheeks, around the eyes, and on the laugh lines.
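The 68-point layout groups landmarks by facial region. The sketch below encodes the widely used Multi-PIE/iBUG grouping with 0-based index ranges; it describes the annotation convention itself, not identifiers from the SDK headers, so treat the names as illustrative.

```python
# Region boundaries for the standard 68-point facial landmark mark-up
# (Multi-PIE / iBUG convention, 0-based indices). Illustrative only;
# the AR SDK's own headers may expose these differently.
FACE_68_REGIONS = {
    "jaw":           range(0, 17),   # 17 points along the jawline
    "right_eyebrow": range(17, 22),
    "left_eyebrow":  range(22, 27),
    "nose":          range(27, 36),  # bridge plus nostril line
    "right_eye":     range(36, 42),
    "left_eye":      range(42, 48),
    "outer_lip":     range(48, 60),
    "inner_lip":     range(60, 68),
}

def region_of(index: int) -> str:
    """Map a landmark index (0-67) to its facial region."""
    for name, idxs in FACE_68_REGIONS.items():
        if index in idxs:
            return name
    raise ValueError(f"index {index} outside the 68-point range")
```

The ranges are contiguous and cover all 68 points, which makes it easy to slice a returned landmark array per region, for example to isolate the eye points for blink detection.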
Face 3D mesh and tracking reconstructs and tracks a 3D human face and its head pose from the provided facial landmarks.
3D Body Pose tracking predicts and tracks the 3D human pose from images or videos. It predicts 34 body-pose keypoints in 2D and 3D, along with joint angles, and supports multi-person tracking. Both full-body and upper-body images or videos are supported. See Appendix B for the list of keypoints.
Eye Contact estimates the gaze angles of a person in an image or video and redirects the gaze to make it frontal. It operates in two modes: in the first, head pose and gaze angles are estimated in camera coordinates without any redirection; in the second, in addition to estimation, the person's eyes are redirected to make eye contact with the camera within a permissible range.
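Gaze in camera coordinates is commonly reported as a pitch/yaw angle pair. As a minimal sketch of what such an estimate means geometrically, the function below converts a pitch/yaw pair to a 3D unit direction vector under one common axis convention; the SDK's exact convention may differ, so this is illustrative rather than a reproduction of its output format.

```python
import math

def gaze_to_vector(pitch: float, yaw: float) -> tuple:
    """Convert gaze angles (radians) to a unit direction vector.

    Convention assumed here (illustrative, not from the SDK):
    pitch rotates up/down, yaw rotates left/right, and (0, 0)
    looks straight down the camera's +z axis, i.e. frontal gaze.
    """
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)
```

Under this convention, frontal gaze corresponds to `gaze_to_vector(0.0, 0.0) == (0.0, 0.0, 1.0)`; conceptually, redirection nudges the estimated angles toward that frontal direction within the permissible range.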
Facial Expression Estimation estimates face expression coefficients from the provided facial landmarks.
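Expression coefficients of this kind are typically consumed as blendshape weights: an animated mesh is the neutral mesh plus a coefficient-weighted sum of per-expression vertex offsets. Below is a minimal sketch of that linear model; the vertex data and expression set are invented for illustration, and the SDK defines its own blendshape basis.

```python
def apply_expression(neutral, deltas, coeffs):
    """Blend a neutral mesh with expression offsets (linear blendshapes).

    neutral: list of (x, y, z) vertices for the neutral face.
    deltas:  one offset mesh per expression, same vertex layout as neutral.
    coeffs:  one weight per expression, e.g. as estimated from landmarks.
    """
    assert len(deltas) == len(coeffs), "one coefficient per expression"
    out = []
    for vi, (x, y, z) in enumerate(neutral):
        # Accumulate each expression's weighted offset for this vertex.
        for delta, c in zip(deltas, coeffs):
            dx, dy, dz = delta[vi]
            x, y, z = x + c * dx, y + c * dy, z + c * dz
        out.append((x, y, z))
    return out
```

Because the model is linear, coefficients can be re-applied to any mesh that shares the same blendshape basis, which is what makes expression transfer to an avatar possible.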
Video Live Portrait animates a person’s portrait photo with a driving video, matching the head movement and facial expressions of that video.
Speech Live Portrait animates a person’s portrait photo from an audio input, driving the lip motion to match the audio.
LipSync modifies a person’s video using an audio input, driving the lip motion to match the audio.
The AR SDK can be used in a wide variety of applications, such as augmented reality effects, beautification, 3D face animation, and modeling. The SDK provides sample applications that demonstrate the features listed above in real time, using a webcam or offline videos.