Overview

The Transfer Learning Toolkit (TLT) Computer Vision (CV) Inference Pipeline is a C++ based SDK that provides APIs to build applications on top of inference results from purpose-built, pre-trained AI models. The underlying framework provides a foundation for multimodal applications. For example, the Gaze Estimation sample application combines Face Detection with Facial Landmarks (Fiducial Keypoints) Estimation.

The TLT Computer Vision Inference Pipeline is made up of three key components: the purpose-built pre-trained AI models, the NVIDIA Triton Inference Server that serves them, and the TLT CV API used to build client applications.

The purpose-built AI models that are supported by this Inference Pipeline are as follows:

Model Deployment

| Model        | Model output format | Prunable | INT8 | Compatible with TLT CV Inference Pipeline |
|--------------|---------------------|----------|------|-------------------------------------------|
| BodyPoseNet  | Encrypted ONNX      | Yes      | Yes  | Yes                                       |
| EmotionNet   | Encrypted ONNX      | No       | No   | Yes                                       |
| FaceNet      | Encrypted UFF       | Yes      | Yes  | Yes                                       |
| FPENet       | Encrypted ONNX      | No       | No   | Yes                                       |
| GazeNet      | Encrypted ONNX      | No       | No   | Yes                                       |
| GestureNet   | Encrypted ONNX      | No       | No   | Yes                                       |
| HeartRateNet | Encrypted ONNX      | No       | No   | Yes                                       |

Users can retrain supported TLT networks, drop the optimized TensorRT models into the NVIDIA Triton Inference Server, and build their own AI applications and use cases with the TLT CV API.

Deployment of the Inference Pipeline is managed by the TLT CV Quick Start Scripts: bash scripts that pull and start the relevant containers, compile the TLT models, and start the Triton Server.