You can deploy trained deep learning and computer-vision models on edge devices, such as a Jetson Xavier, Jetson Nano, or Tesla, or in the cloud with NVIDIA GPUs. The exported *.etlt model can be used in the TAO Toolkit Triton Apps.

The TAO Toolkit Triton Apps provide an inference sample for Pose Classification. It consumes a TensorRT engine and supports running with either (1) a NumPy array of skeleton series or (2) output JSON metadata from the deepstream-bodypose-3d app.

To use this sample, you need to generate the TensorRT engine from an *.etlt model using tao-converter.

Generating TensorRT Engine Using tao-converter

The tao-converter tool is provided with the TAO Toolkit to facilitate the deployment of TAO-trained models on TensorRT and/or DeepStream. This section explains how to generate a TensorRT engine using tao-converter.

For deployment platforms with an x86-based CPU and discrete GPUs, the tao-converter is distributed within the TAO docker. Therefore, we suggest using the docker to generate the engine. However, this requires that the user adhere to the same minor version of TensorRT as distributed with the docker. The TAO docker includes TensorRT version 8.0.

Instructions for x86

For an x86 platform with discrete GPUs, the default TAO package includes the tao-converter built for TensorRT 8.2.5.1 with CUDA 11.4 and cuDNN 8.2. For any other version of CUDA and TensorRT, refer to the overview section to download the matching tao-converter. Once the tao-converter is downloaded, follow the instructions below to generate a TensorRT engine.

Unzip the zip file on the target machine. Install the OpenSSL package using the command:

sudo apt-get install libssl-dev

Export the following environment variables:

$ export TRT_LIB_PATH="/usr/lib/x86_64-linux-gnu"
$ export TRT_INC_PATH="/usr/include/x86_64-linux-gnu"

Run the tao-converter using the sample command below and generate the engine. Instructions to build TensorRT OSS on x86 can be found in the TensorRT OSS on x86 section above or in this GitHub repo.

Note: Make sure to use the output node names given in the Exporting the Model section of the respective model.





Instructions for Jetson

For the Jetson platform, the tao-converter is available to download in the NVIDIA developer zone. You may choose the version you wish to download as listed in the overview section. Once the tao-converter is downloaded, please follow the instructions below to generate a TensorRT engine.

Unzip the zip file on the target machine. Install the OpenSSL package using the command:

sudo apt-get install libssl-dev

Export the following environment variables:

$ export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
$ export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"

For Jetson devices, TensorRT comes pre-installed with JetPack. If you are using an older JetPack version, upgrade to JetPack-5.0DP. Instructions to build TensorRT OSS on Jetson can be found in the TensorRT OSS on Jetson (ARM64) section above or in this GitHub repo. Run the tao-converter using the sample command below and generate the engine.

Note: Make sure to use the output node names given in the Exporting the Model section of the respective model.





Using the tao-converter

Here is a sample command to generate the PoseClassificationNet engine through tao-converter:

# Convert the ST-GCN model with 3D poses, an input sequence length of 300, and 34 keypoints:
tao-converter <etlt_model> \
  -k <key_to_etlt_model> \
  -d 3,300,34,1 \
  -p input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1 \
  -o fc_pred \
  -t fp16 \
  -m 16 \
  -e <path_to_generated_trt_engine>

This command will generate an optimized TensorRT engine.
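The -p argument packs the input tensor name and the minimum, optimum, and maximum shapes of the optimization profile into one comma-separated string. As a rough illustration of how that string is laid out, the following Python sketch parses it and checks whether a batch size falls inside the profile. The helper names parse_profile and batch_in_profile are made up for this example and are not part of tao-converter.

```python
def parse_profile(profile):
    """Split '<name>,<min>,<opt>,<max>' into a name and three shape tuples."""
    name, *shapes = profile.split(",")
    if len(shapes) != 3:
        raise ValueError("expected min, opt, and max shapes")
    dims = [tuple(int(d) for d in s.split("x")) for s in shapes]
    return name, dims[0], dims[1], dims[2]


def batch_in_profile(batch_size, min_shape, max_shape):
    """Return True if batch_size lies within the profile's batch range."""
    return min_shape[0] <= batch_size <= max_shape[0]


# Profile string copied from the sample command above.
name, min_shape, opt_shape, max_shape = parse_profile(
    "input,1x3x300x34x1,4x3x300x34x1,16x3x300x34x1"
)

# The leading dimension is the batch size; the remaining dimensions
# match the per-sample shape given to -d (3,300,34,1).
print(name, min_shape[0], opt_shape[0], max_shape[0])
print(batch_in_profile(8, min_shape, max_shape))
```

In the sample command, the largest batch size in the profile (16) is the same value passed to -m.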

Running the Triton Inference Sample

You can generate the TensorRT engine when starting the Triton server using the following command:

bash scripts/start_server.sh

When the server is running, you can get results from a NumPy array of test data with the client using the command mentioned below:

python tao_client.py <path_to_test_data> \
  -m pose_classification_tao \
  -x 1 \
  -b 1 \
  --mode Pose_classification \
  -i https \
  -u localhost:8000 \
  --async \
  --output_path <path_to_output_directory>

Note: The server will perform inference on the input test data. The results are saved as a text file where each line is formatted as [sequence_index], [rank1_pred_score]([rank1_class_index])=[rank1_class_name], [rank2_pred_score]([rank2_class_index])=[rank2_class_name], ..., [rankN_pred_score]([rankN_class_index])=[rankN_class_name]. The expected output for the NVIDIA test data would be as follows:

0, 27.6388(2)=sitting, 12.0806(3)=standing, 7.0409(1)=getting_up, -3.4164(0)=sitting_down, -16.4449(4)=walking, -26.9046(5)=jumping
1, 21.5809(2)=sitting, 8.4994(3)=standing, 5.1917(1)=getting_up, -2.3813(0)=sitting_down, -12.4322(4)=walking, -20.4436(5)=jumping
2, 5.6206(0)=sitting_down, 4.7264(4)=walking, -1.0996(5)=jumping, -2.3501(1)=getting_up, -3.2933(3)=standing, -3.5337(2)=sitting
....
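If you want to post-process the result file, each line can be split back into its ranked entries. The following is a minimal Python sketch that assumes the line format described in the note above and that class names contain no commas. The function name parse_result_line is chosen for this example and is not part of the Triton Apps.

```python
import re

# One ranked entry looks like "27.6388(2)=sitting".
ENTRY = re.compile(r"^(-?\d+(?:\.\d+)?)\((\d+)\)=(.+)$")


def parse_result_line(line):
    """Return (sequence_index, [(score, class_index, class_name), ...])."""
    fields = [f.strip() for f in line.split(",")]
    sequence_index = int(fields[0])
    ranked = []
    for field in fields[1:]:
        match = ENTRY.match(field)
        if match is None:
            raise ValueError(f"unrecognized entry: {field!r}")
        score, class_index, class_name = match.groups()
        ranked.append((float(score), int(class_index), class_name))
    return sequence_index, ranked


# Third line of the expected output shown above.
index, ranked = parse_result_line(
    "2, 5.6206(0)=sitting_down, 4.7264(4)=walking, -1.0996(5)=jumping, "
    "-2.3501(1)=getting_up, -3.2933(3)=standing, -3.5337(2)=sitting"
)
top_score, top_class, top_name = ranked[0]
print(index, top_name, top_score)
```

Because the entries are already sorted by score, the first entry of each line is the predicted class for that sequence.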

You can also get inference results from the JSON output of the deepstream-bodypose-3d app using the following command:

python tao_client.py <path_to_json_file> \
  --dataset_convert_config ../dataset_convert_specs/dataset_convert_config_pose_classification.yaml \
  -m pose_classification_tao \
  -x 1 \
  -b 1 \
  --mode Pose_classification \
  -i https \
  -u localhost:8000 \
  --async \
  --output_path <path_to_output_directory>

Note: The server will perform inference on the input JSON file. The results are also saved as a JSON file, which follows the same format as the input and adds the predicted "action" to each object at each frame. A sample of the JSON output would be as follows:

The skeleton sequence of each object is broken into segments by a dataset converter (refer to the figure below). The sequence_length and sequence_overlap are configurable in dataset_convert_config_pose_classification.yaml. The output labels are assigned to frames after a certain period of time.
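To make the segmentation concrete, here is a small Python sketch of a sliding window over one object's frames. It assumes that sequence_overlap is the fraction of a segment shared with the next one and that only full-length segments are emitted; the actual dataset converter may behave differently. The function name segment_starts and the values 250, 100, and 0.5 are illustrative only.

```python
def segment_starts(num_frames, sequence_length, sequence_overlap):
    """Return the start frame of each full-length segment.

    Assumes sequence_overlap is the fraction (0 <= overlap < 1) of a
    segment shared with the next one; the real converter may differ.
    """
    if not 0 <= sequence_overlap < 1:
        raise ValueError("sequence_overlap must be in [0, 1)")
    # Distance in frames between the starts of consecutive segments.
    step = max(1, int(sequence_length * (1 - sequence_overlap)))
    return list(range(0, num_frames - sequence_length + 1, step))


# Illustrative values: 250 tracked frames, 100-frame segments, 50% overlap.
starts = segment_starts(num_frames=250, sequence_length=100, sequence_overlap=0.5)
print(starts)  # [0, 50, 100, 150]
```

Under these assumptions, no segment exists until an object has been tracked for sequence_length frames, which would fit with labels appearing only after a delay.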







End-to-End Inference Using Triton

A sample for end-to-end inference from video is also provided in the TAO Toolkit Triton Apps. The sample runs deepstream-bodypose-3d to generate metadata of bounding boxes, tracked IDs, and 2D/3D poses that are saved in JSON format. The client implicitly converts the metadata into arrays of skeleton sequences and sends them to the Triton server. The predicted action for each sequence is returned and appended to the JSON metadata at corresponding frames. A video with overlaid metadata is also generated for visualization.
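The step of appending predictions to the metadata can be pictured with the following Python sketch. It uses a deliberately simplified, hypothetical metadata layout: the keys frame_num, objects, and object_id are placeholders invented for this example and do not reflect the actual deepstream-bodypose-3d JSON schema. Only the "action" field name comes from the description above.

```python
import json


def append_actions(frames, predictions):
    """Add an "action" field to each object that has a prediction.

    frames: list of dicts with hypothetical keys "frame_num" and "objects";
            each object carries a hypothetical "object_id" key.
    predictions: dict mapping (object_id, frame_num) to an action name.
    """
    for frame in frames:
        for obj in frame["objects"]:
            key = (obj["object_id"], frame["frame_num"])
            if key in predictions:
                obj["action"] = predictions[key]
    return frames


# Two frames of one tracked object; only frame 1 has a prediction.
frames = [
    {"frame_num": 0, "objects": [{"object_id": 7}]},
    {"frame_num": 1, "objects": [{"object_id": 7}]},
]
annotated = append_actions(frames, {(7, 1): "walking"})
print(json.dumps(annotated))
```

Frames without a prediction are left unchanged in this sketch.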

You can start the Triton server using the following command (only the Pose Classification model will be downloaded and converted into a TensorRT engine):

bash scripts/pose_cls_e2e_inference/start_server.sh

Once the Triton server has started, open up another terminal and run the following command to begin body pose estimation using DeepStream and run Pose Classification on the DeepStream output using the Triton server instance that you previously spun up: