Overview

NVIDIA Eye Contact NIM uses AI models to redirect a user’s gaze toward the camera in video, simulating natural eye contact and improving engagement in remote digital interactions. NVIDIA Eye Contact NIM models are built on the NVIDIA software platform, incorporating CUDA, TensorRT, and Triton to offer out-of-the-box GPU acceleration.

Architecture

NVIDIA Eye Contact operates on a region of interest around the eyes, also known as the eye patch. The eye patch is extracted from a video frame using the NVIDIA face tracking pipeline, which computes the 2D facial landmarks and the 6DOF head pose from the video frame. This head pose is then fed into the eye contact network.
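As an illustration of what a region of interest around the eyes means in practice, the following minimal sketch computes a padded bounding box from a handful of 2D eye landmarks. The function name `eye_patch_box`, the `margin` parameter, and the landmark coordinates are hypothetical and chosen for this example; the actual face tracking pipeline also uses the head pose and is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def eye_patch_box(
    landmarks: List[Point], margin: float, width: int, height: int
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a padded box around eye landmarks.

    `margin` is a fraction of the box size added on every side. The box is
    clamped to the frame so it can be used directly for slicing.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    left = max(0, int(min(xs) - pad_x))
    top = max(0, int(min(ys) - pad_y))
    right = min(width, int(max(xs) + pad_x) + 1)
    bottom = min(height, int(max(ys) + pad_y) + 1)
    return left, top, right, bottom


# Hypothetical landmark coordinates for both eyes in a 640x480 frame.
eye_landmarks = [(250.0, 200.0), (290.0, 195.0), (350.0, 195.0), (390.0, 200.0)]
box = eye_patch_box(eye_landmarks, margin=0.25, width=640, height=480)
```

An axis-aligned box is the simplest possible choice; a pose-aware pipeline would more likely warp the region so that the eyes appear in a canonical orientation.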

The eye contact network has a disentangled encoder-decoder architecture. The encoder estimates the gaze angle from the input eye patch, along with a set of features, also known as embeddings. Based on these embeddings, the decoder redirects the gaze in the input patch so that the eyes appear to look forward, toward the camera.

The final stage of the pipeline blends the eye patch back into the original video frame using an inverse transformation. For more details about the model, read "Improve Human Connection in Video Conferences with NVIDIA Maxine Eye Contact" on the NVIDIA Developer Technical Blog.
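The blending stage can be pictured with a small sketch. It assumes the forward transformation was a plain crop at offset `(left, top)`, so the inverse is simply adding that offset back, and it uses a feathered alpha mask to hide the seam. Both choices, and the name `blend_patch`, are assumptions made for illustration; the transformation and blending used by the NIM are not documented here.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def blend_patch(frame: Image, patch: Image, left: int, top: int, feather: int) -> Image:
    """Paste `patch` into a copy of `frame` at (left, top) with feathered edges.

    The blend weight ramps from 0 at the patch border to 1 at `feather`
    pixels inside it, so the edited region fades into the untouched frame.
    """
    out = [row[:] for row in frame]
    h, w = len(patch), len(patch[0])
    for py in range(h):
        for px in range(w):
            # Distance in pixels to the nearest patch edge.
            edge = min(px, py, w - 1 - px, h - 1 - py)
            alpha = min(1.0, edge / feather) if feather > 0 else 1.0
            # Inverse of the crop: patch (px, py) maps to frame (left + px, top + py).
            fy, fx = top + py, left + px
            out[fy][fx] = alpha * patch[py][px] + (1.0 - alpha) * frame[fy][fx]
    return out


# A dark 8x6 frame and a bright 5x5 patch placed at column 2, row 1.
frame = [[0.0] * 8 for _ in range(6)]
patch = [[1.0] * 5 for _ in range(5)]
blended = blend_patch(frame, patch, left=2, top=1, feather=2)
```

In this example the patch center is fully replaced, the pixels one step in from the border are mixed half and half, and the border pixels keep the original frame values.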

Input Processing Modes

NVIDIA Eye Contact NIM supports two distinct input processing modes:

  • Transactional Mode: The entire input video file is streamed to the NIM and processed as a complete unit before any results are returned to the client.

  • Streaming Mode: A specialized processing mode for streamable MP4 files, in which the input file is streamed to the NIM in chunks. As the chunks arrive, the NIM runs inference incrementally and streams the output back to the client in chunks, even before it has received the whole input file. This mode is preferred for streamable MP4 inputs and large files.

For detailed information about when to use each mode and how to convert between file formats, refer to the Input Modes section of Basic Inference.
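The difference between the two modes can be sketched with plain Python generators. This is a conceptual model only: `process`, `upload`, `transactional`, and `streaming` are hypothetical stand-ins, not part of the NIM API, and real chunks would be video data rather than short byte strings.

```python
from typing import Iterable, Iterator, List


def process(data: bytes) -> bytes:
    """Stand-in for inference; a real service would return processed video."""
    return data.upper()


def transactional(chunks: Iterable[bytes]) -> bytes:
    """Collect the whole input first, then process it as one unit."""
    whole = b"".join(chunks)
    return process(whole)


def streaming(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Process each chunk as it arrives and yield its output immediately."""
    for chunk in chunks:
        yield process(chunk)


def upload(parts: List[bytes], log: List[str]) -> Iterator[bytes]:
    """Hypothetical client-side sender that records when each chunk is sent."""
    for i, part in enumerate(parts):
        log.append(f"sent {i}")
        yield part


parts = [b"ab", b"cd", b"ef"]

# Transactional: every chunk is sent before any result comes back.
log_t: List[str] = []
result = transactional(upload(parts, log_t))
log_t.append("received result")

# Streaming: output chunks interleave with input chunks.
log_s: List[str] = []
for i, out in enumerate(streaming(upload(parts, log_s))):
    log_s.append(f"received {i}")
```

Comparing `log_t` and `log_s` shows the ordering difference: in the transactional log all sends precede the single result, while in the streaming log each send is followed by its output before the next chunk is sent.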

Try It Out

Try the NVIDIA Eye Contact NIM at build.nvidia.com/nvidia/eyecontact.

To experience the NVIDIA Eye Contact NIM API without having to host your own servers, use the Try API feature, which uses the NVIDIA Cloud Function backend.