
Running and Building Sample Applications

Enter the client container using the TAO Toolkit CV Quick Start Scripts. We have sample applications available to run along with their source code.

Each of the sample applications follows the format:


./path/to/binary path/to/config/file

All binaries should be run from the directory that the container opens to by default (/workspace/tao_cv-pkg).

Below are instructions for running the out-of-the-box samples. The config files all assume the video handle /dev/video0 has been added to the Quick Start configuration and will be opened at the resolutions and FPS specified.

The config files contain the following common fields (a hypothetical example follows the list):

  • video_path: This is the device handle or an absolute path to a video.

  • fps: This is the frames per second at which to open the video_path. Ensure your device can handle this.

  • is_video_path_file: This is a boolean true or false to indicate whether the video_path is a file.

  • resolution_whc: This is the resolution (width, height, channels) to open video_path. Ensure your device can handle this.

  • roi_xywh: This is a region of interest (x, y, width, height) for applications that work with a single person.

  • visualization: This is a boolean true or false to indicate whether to open a window for visualization.

  • use_right_hand: This is a boolean true or false to indicate whether to run inference on the right hand (specific for gesture application).

  • use_decoded_image_api: This is a boolean true or false to indicate whether to use an API that sends single decoded image buffers instead of using a device/video handle within the Pipeline.
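
For illustration only, here is a minimal sketch of what a demo.conf using these fields could look like. The field names come from the list above, but the key=value layout, the comment marker, and the values are assumptions of this sketch; always start from the demo.conf files shipped with each sample.

# Hypothetical example only -- layout and values are assumptions, not a reference.
video_path=/dev/video0
is_video_path_file=false
fps=30
resolution_whc=640,480,3
roi_xywh=0,0,640,480
visualization=true
use_right_hand=true
use_decoded_image_api=false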

Instantiations of TAO Toolkit CV Inference Pipelines enable users to access fused inferences. The table below shows the support matrix for these inferences and requests. Each Pipeline uses its respective TAO network and its dependencies. The sample applications demonstrate the usage of these Pipelines.

CV Pipeline Enum   | Available Responses
-------------------|-----------------------------------------------
BODY_POSE          | FRAME, BODY_POSE
EMOTION            | FRAME, FACE_DETECT, FACIAL_LANDMARKS, EMOTION
FACE_DETECT        | FRAME, FACE_DETECT
FACIAL_LANDMARKS   | FRAME, FACE_DETECT, FACIAL_LANDMARKS
GAZE               | FRAME, FACE_DETECT, FACIAL_LANDMARKS, GAZE
GESTURE            | FRAME, BODY_POSE, GESTURE
HEART_RATE         | FRAME, FACE_DETECT, HEART_RATE

For Pipelines that support multiple responses (e.g. Emotion supports obtaining face detections and landmarks), the ordering of the inferences is preserved per person. That is to say, the first emotion is for the first face in the face detection result.
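
As a rough sketch of how this ordering could be used (assuming an already-initialized EMOTION pipeline, and assuming for this example that the face and emotion payloads describe the same frame with one entry per person), the i-th emotion can be paired with the i-th face detection:

const auto pipelineType = nja::PipelineType::EMOTION;
nja::TAOCVAPI cvAPI(pipelineType);
auto facePayload = cvAPI.getFaceDetect();   // FACE_DETECT is an available response of the EMOTION Pipeline
auto emotionPayload = cvAPI.getEmotion();
if (facePayload && emotionPayload)
{
    // Assumption for this sketch: both payloads are ordered per person for the
    // same frame, so index i refers to the same person in both groups.
    size_t i = 0;
    for (const auto& emotionElem: emotionPayload->group)
    {
        const auto& box = facePayload->group[i].box;
        std::cout << "Person " << i << " with face at (" << box.x << ", " << box.y
                  << ") has emotion enum value " << static_cast<int>(emotionElem.emotion) << std::endl;
        ++i;
    }
}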


./samples/tao_cv/demo_bodypose/bodypose samples/tao_cv/demo_bodypose/demo.conf

This demo supports multiple people. The visualization will draw joints and lines on body parts as well as a bounding box over each body.

Note

This Body Pose Estimation Demo requires the Body Pose TAO model to be trained and deployed.

Ensure BodyPoseNet and its dependencies are loaded as READY during the Triton Server startup:


+----------------------------------+---------+------------------------------------------+
| Model                            | Version | Status                                   |
+----------------------------------+---------+------------------------------------------+
| bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
| bodypose_384x288_postprocess_tlt | 1       | READY                                    |
| bodypose_384x288_tlt             | 1       | READY                                    |
| ...                              | ...     | ...                                      |
| hcgesture_tlt                    | 1       | READY                                    |
+----------------------------------+---------+------------------------------------------+
...
I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002



Body Pose Configuration

If deploying a non-default shape BodyPoseNet, ensure the appropriate variants are loaded as READY. For example, if you created a model for a network input size of width = 320 and height = 224, ensure that bodypose_320x224_* is loaded as READY.

You also need to modify a configuration file inside the client container that contains the samples:


CURRENT_WIDTH=384
DESIRED_WIDTH=320
CURRENT_HEIGHT=288
DESIRED_HEIGHT=224
sed -i "s/${CURRENT_WIDTH}/${DESIRED_WIDTH}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json
sed -i "s/${CURRENT_HEIGHT}/${DESIRED_HEIGHT}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json


If you would like to perform this change manually, modify the following locations:

  • The preprocessor output shape configuration or output_image_meta

  • The resize operation for ResizeNormalizeFP32Image

  • The Triton model_name to request inference, which is the ensemble bodypose_320x224_ensemble_tlt

Note

This Body Pose Estimation Demo cannot consume a BodyPoseNet with fewer trained joints.

By default, BodyPoseNet will estimate 18 joints. Retraining with 14 joints and deploying in this inference pipeline is not supported.

Body Pose API Usage

The TAO Toolkit CV API for Body Pose Estimation returns up to 18 body joints (nose, neck, shoulders, elbows, wrists, knees, ankles, hips, eyes, and ears) in 2D pixel space. The returned structure also includes a bounding box over the joints.

The following code snippet is a glimpse of how a developer would use the API to get a body pose. Assuming the initialized pipeline supports a BODY_POSE response, let’s get the nose coordinate for each body in the frame. First, we must check whether the nose is present in the body by obtaining its index. Then, we can access the coordinates using the index and draw the coordinate on an image.


const auto pipelineType = nja::PipelineType::BODY_POSE;
nja::TAOCVAPI cvAPI(pipelineType);
auto posePayload = cvAPI.getBodyPose();
if (posePayload)
{
    auto &poses = posePayload->group;
    for (auto const &pose: poses)
    {
        auto joint = njv::NOSE; // desired joint
        using joint_t = std::underlying_type<njv::Joint>::type;
        // Find the index of the body part in the payload vector using `bodyPartPosition`.
        // If the value is < 0, then that body part is invalid, so we can skip drawing.
        int32_t indexJoint = pose.bodyPartPosition[static_cast<joint_t>(joint)];
        if (indexJoint < 0) // check validity of joint
        {
            continue;
        }
        auto const &bodyPart = pose.bodyParts[indexJoint];
        auto pt = Point(bodyPart.x, bodyPart.y);
        drawPoint(image, pt);
    }
}

The call to getBodyPose() returns the latest new result from the Pipeline, or nullptr if that result has already been obtained or no result is available. The API also provides the ability to block on this call until a result is obtained.
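
Because the non-blocking call returns nullptr when there is no new result, a simple way to wait for the next result is to poll, as in the illustrative loop below (the sleep interval is arbitrary and this is not the sample application’s actual loop; the blocking variant of the call can be used instead):

// Illustrative polling loop (requires <thread> and <chrono>).
nja::TAOCVAPI cvAPI(nja::PipelineType::BODY_POSE);
auto posePayload = cvAPI.getBodyPose();
while (!posePayload)
{
    // No new result yet; back off briefly before asking again.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    posePayload = cvAPI.getBodyPose();
}
// posePayload now holds the latest new result from the Pipeline.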

More code showcasing this API is featured in the sample application.


./samples/tao_cv/demo_emotion/emotion samples/tao_cv/demo_emotion/demo.conf

This demo will support multiple people. The visualization will draw a bounding box around each face and text indicating the classified emotion.

Emotion API Usage

The TAO Toolkit CV API for Emotion classification returns one of 7 emotions (see the sketch after this list for mapping them to display labels):

  • neutral

  • happy

  • surprise

  • squint

  • disgust

  • scream

  • not applicable
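
For display (as the emotion demo does when it overlays text on the visualization), the returned enum can be mapped to a label. Only njv::HAPPY appears in this page’s snippets; the enum type name and the other enumerator names in this sketch are assumptions and may differ from the actual headers:

// Sketch only: the enum type njv::Emotion and all enumerators other than
// njv::HAPPY are assumptions; check the SDK headers for the real names.
std::string emotionToLabel(njv::Emotion emotion)
{
    switch (emotion)
    {
        case njv::NEUTRAL:  return "neutral";
        case njv::HAPPY:    return "happy";
        case njv::SURPRISE: return "surprise";
        case njv::SQUINT:   return "squint";
        case njv::DISGUST:  return "disgust";
        case njv::SCREAM:   return "scream";
        default:            return "not applicable";
    }
}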

The following code snippet is a glimpse of how a developer would use the API to get emotions. Assuming the initialized pipeline supports an EMOTION response, let’s access the returned emotions and check if a person is happy.


const auto pipelineType = nja::PipelineType::EMOTION;
nja::TAOCVAPI cvAPI(pipelineType);
auto emotionPayload = cvAPI.getEmotion();
if (emotionPayload)
{
    for (const auto& emotionElem: emotionPayload->group)
    {
        if (emotionElem.emotion == njv::HAPPY)
        {
            std::cout << "Found a Happy person!" << std::endl;
        }
    }
}

The call to getEmotion() returns the latest new result from the Pipeline, or nullptr if that result has already been obtained or no result is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.


./samples/tao_cv/demo_facedetect/facedetect samples/tao_cv/demo_facedetect/demo.conf

This demo will support multiple people. The visualization will draw a bounding box around each face.

Face Detection API Usage

The TAO Toolkit CV API for Face Detection returns a bounding box of X, Y, W, H in 2D pixel space of the original image size.

The following code snippet is a glimpse of how a developer would use the API to get face detections. Assuming the initialized pipeline supports a FACE_DETECT response, let’s print the coordinates of the bounding boxes.


const auto pipelineType = nja::PipelineType::FACE_DETECT;
nja::TAOCVAPI cvAPI(pipelineType);
auto facePayload = cvAPI.getFaceDetect();
if (facePayload)
{
    for (const auto& elem: facePayload->group)
    {
        const auto& box = elem.box;
        // Box coordinates provided in original image space
        std::cout << "x = " << box.x << std::endl;
        std::cout << "y = " << box.y << std::endl;
        std::cout << "w = " << box.w << std::endl;
        std::cout << "h = " << box.h << std::endl;
    }
}

The call to getFaceDetect() returns the latest new result from the Pipeline, or nullptr if that result has already been obtained or no result is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.


./samples/tao_cv/demo_faciallandmarks/faciallandmarks samples/tao_cv/demo_faciallandmarks/demo.conf

This demo will support multiple people. The visualization will draw landmarks for each face.

Facial Landmarks API Usage

The TAO Toolkit CV API for Facial Landmarks returns 80 landmarks in 2D pixel space.

The following code snippet is a glimpse of how a developer would use the API to get the landmarks. Assuming the initialized pipeline supports a FACIAL_LANDMARKS response, let’s print the coordinates of the landmarks.


const auto pipelineType = nja::PipelineType::FACIAL_LANDMARKS;
nja::TAOCVAPI cvAPI(pipelineType);
auto landmarksPayload = cvAPI.getFacialLandmarks();
if (landmarksPayload)
{
    for (const auto& elem: landmarksPayload->group)
    {
        const auto& landmarks = elem.landmarks;
        for (size_t landmarkIndex = 0; landmarkIndex < landmarks.size(); landmarkIndex++)
        {
            std::cout << "index = " << landmarkIndex
                      << "; x = " << landmarks[landmarkIndex].x
                      << "; y = " << landmarks[landmarkIndex].y << std::endl;
        }
    }
}

The call to getFacialLandmarks() returns the latest new result from the Pipeline, or nullptr if that result has already been obtained or no result is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.


./samples/tao_cv/demo_gaze/gaze samples/tao_cv/demo_gaze/demo.conf

This demo will support multiple people. The visualization will print out the 3D gaze vector for each face.

The sample application has custom fields for the Perspective-n-Point (PnP) problem. The camera calibration matrix, distortion coefficients, and 3D facial landmarks are all provided for the visualization of the gaze vector.

The camera-dependent default values were generated from an off-the-shelf webcam; for more accurate visualization, users can supply their own.
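
For context, the snippet below is a rough sketch (not the sample application’s actual code) of how such calibration data is typically combined with detected 2D landmarks and a 3D facial landmark model via OpenCV’s solvePnP to recover a head pose from which the gaze vector can be drawn; all variable names here are placeholders.

// Sketch only (requires <opencv2/calib3d.hpp>): recover head pose from
// corresponding 3D model points and detected 2D landmark points.
std::vector<cv::Point3f> modelPoints3d;   // 3D facial landmarks (from the sample config)
std::vector<cv::Point2f> imagePoints2d;   // corresponding detected 2D landmarks
cv::Mat cameraMatrix;                     // camera calibration matrix (from the sample config)
cv::Mat distCoeffs;                       // distortion coefficients (from the sample config)
cv::Mat rvec, tvec;                       // outputs: head rotation and translation
bool ok = cv::solvePnP(modelPoints3d, imagePoints2d, cameraMatrix, distCoeffs, rvec, tvec);
// If ok, rvec/tvec give the head pose in camera space; a gaze vector can then be
// drawn from a facial point projected back into the image with cv::projectPoints.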

Gaze Estimation API Usage

The TAO Toolkit CV API for Gaze returns a 3D vector in the camera coordinate system. This X, Y, Z location is where the person is looking relative to the camera. The units are millimeters.

The Gaze payload also contains theta and phi outputs, which are independent of the 3D vector. This output is a more general representation of a free-standing gaze vector in 3D space. When applied to an origin (not provided), such as the pupil center or the center of both eyes, it represents the general gaze direction, which can be extended to any arbitrary point in front of the user. This vector can then optionally be used to determine whether it intersects with an object in space to determine another point of regard.

For improved accuracy, we suggest using the point of regard x, y, z coordinates instead of the theta and phi outputs.
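
To make the theta/phi description concrete, the sketch below converts the two angles into a 3D unit direction under one common pitch/yaw convention. The convention actually used by the Pipeline (which angle is pitch versus yaw, and the axis signs) is not documented here, so treat these formulas as assumptions and verify against the sample application.

// Sketch only (requires <cmath>): the pitch/yaw convention and axis signs
// are assumptions and must be verified against the shipped samples.
struct Direction3 { float x, y, z; };

Direction3 gazeAnglesToDirection(float theta /* pitch, radians */, float phi /* yaw, radians */)
{
    Direction3 dir;
    dir.x = -std::cos(theta) * std::sin(phi);
    dir.y = -std::sin(theta);
    dir.z = -std::cos(theta) * std::cos(phi);
    return dir; // unit-length direction to anchor at, e.g., the pupil center
}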

The following code snippet is a glimpse of how a developer would use the API to get gaze. Assuming the initialized pipeline supports a GAZE response, let’s print the gaze outputs.


const auto pipelineType = nja::PipelineType::GAZE;
nja::TAOCVAPI cvAPI(pipelineType);
auto gazePayload = cvAPI.getGaze();
if (gazePayload)
{
    for (const auto& gazeElem: gazePayload->group)
    {
        const auto& gaze = gazeElem.gaze;
        std::cout << "x = " << gaze.x << std::endl;
        std::cout << "y = " << gaze.y << std::endl;
        std::cout << "z = " << gaze.z << std::endl;
        std::cout << "theta = " << gaze.theta << std::endl;
        std::cout << "phi = " << gaze.phi << std::endl;
    }
}

The call to getGaze() returns the latest new result from the Pipeline, or nullptr if that result has already been obtained or no result is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.


./samples/tao_cv/demo_gesture/gesture samples/tao_cv/demo_gesture/demo.conf

This demo supports only a single person, specified by an ROI. It will only print the classified gesture for a single hand, as specified in the config.

Ensure BodyPoseNet and its dependencies are loaded as READY during the Triton Server startup:


+----------------------------------+---------+------------------------------------------+
| Model                            | Version | Status                                   |
+----------------------------------+---------+------------------------------------------+
| bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
| bodypose_384x288_postprocess_tlt | 1       | READY                                    |
| bodypose_384x288_tlt             | 1       | READY                                    |
| ...                              | ...     | ...                                      |
| hcgesture_tlt                    | 1       | READY                                    |
+----------------------------------+---------+------------------------------------------+
...
I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002


Gesture Classification API Usage

The TAO Toolkit CV API for Gesture Classification returns one of the following gestures and a bounding box for that gesture:

  • Thumbs up

  • Fist

  • Stop

  • Okay

  • Two (also known as Raise)

  • Random

Only a single gesture for a single hand is returned.

The following code snippet is a glimpse of how a developer would use the API to get the gesture. Assuming the initialized pipeline supports a GESTURE response, let’s check the gesture.


const auto pipelineType = nja::PipelineType::GESTURE;
nja::TAOCVAPI cvAPI(pipelineType);
auto gesturePayload = cvAPI.getGesture();
if (gesturePayload)
{
    // For the sake of example, assume only 1 person in frame here.
    const auto &gestElem = gesturePayload->group[0];
    // Ensure validity of the gesture element. The struct GesturePayload has a
    // pre-allocated array of MAX_NUM_USER_SUPPORT gestures. However, since we
    // restrict the number of users using a region of interest, only the first
    // element has the possibility of being valid.
    if (gestElem.valid)
    {
        // Bounding box of the relevant gesture (available for drawing)
        const auto &box = gestElem.bbox;
        if (gestElem.gesture == njv::GestureType::THUMBS_UP)
        {
            std::cout << "Thumbs up" << std::endl;
        }
    }
}

The call to getGesture() returns the latest new result from the Pipeline, or nullptr if that result has already been obtained or no result is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.


./samples/tao_cv/demo_heartrate/heartrate samples/tao_cv/demo_heartrate/demo.conf

This demo supports only a single person, specified by an ROI. It will print the estimated heart rate for that person and draw a bounding box around the face. The person of interest should be well illuminated, keep still, and face the camera directly.

This demo is unique in that it only supports frame rates between 15 and 30 FPS and requires your camera handle to support uncompressed YUYV.

Note

Make note of the resolutions and FPS supported by your video handle (e.g., using the command v4l2-ctl --list-formats-ext).

Heart Rate Estimation API Usage

The TAO Toolkit CV API for Heart Rate returns a structure with the beats per minute, along with extra booleans and flags to ensure validity. It also returns a bounding box for that person’s face.

Only a single person’s heart rate is returned.

The following code snippet is a glimpse of how a developer would use the API to get the heart rate. Assuming the initialized pipeline supports a HEART_RATE response, let’s check the heart rate.


const auto pipelineType = nja::PipelineType::HEART_RATE;
nja::TAOCVAPI cvAPI(pipelineType);
auto payload = cvAPI.getHeartRate();
if (payload)
{
    // For the sake of example, assume only 1 person in frame here.
    const auto& firstPerson = payload->group[0];
    // Ensure validity of the heart rate element.
    if (firstPerson.valid)
    {
        // Heart Rate is fragile to poor lighting, so a USB camera will
        // process the frames to increase exposure, contrast, etc.
        // automatically. We check if the FPS is valid within a certain range.
        if (!firstPerson.isFPSValid)
        {
            std::cerr << "Poor Lighting!" << std::endl;
        }
        else
        {
            // Heart Rate is fragile to motion. We use this boolean to
            // determine if the person is available to be estimated.
            if (firstPerson.available)
            {
                std::cout << "Heart Rate = " << std::to_string(firstPerson.heartRate) << std::endl;
            }
        }
    }
}

The call to getHeartRate() returns the latest new result from the Pipeline, or nullptr if that result has already been obtained or no result is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.

Source code and CMakeLists have been provided so users can recompile the sources. Users who want to modify the sources should leverage the Quick Start Script config.sh field called volume_mnt_samples.

First, ensure the client container is running. Then in another terminal, copy the samples to the host machine:


docker cp image_tao_cv_client:/workspace/tao_cv-pkg/samples /path/on/host

Then, stop the client container and modify the config.sh field called volume_mnt_samples to point to /path/on/host. The next time you start the client container, the host samples folder will be volume mounted inside the client container. This allows the user to modify the sample code outside of the container while maintaining a developer workflow inside the container.

Now, to recompile the sources, the user must be inside the client container.

  1. Make a directory to save new binaries:


    mkdir -p /workspace/tao_cv-pkg/samples/tao_cv_install

  2. Enter the location of our source files:


    cd samples/tao_cv/

  3. Make and enter a build directory for CMake:


    mkdir -p build && cd build

  4. Build and install the new binaries:


    cmake -DPROJECT_ROOT:STRING=/workspace/tao_cv-pkg/ -DINSTALL_DIR=/workspace/tao_cv-pkg/samples/tao_cv_install/ ..
    make install

    Note

    If using CMake on a Jetson device, add an additional flag -DTARGET=1 which will result in cmake -DTARGET=1 ...

  5. Verify binaries exist in the folder:


    ls -al /workspace/tao_cv-pkg/samples/tao_cv_install/tao_cv/

  6. These binaries can be run just like the precompiled binaries we provide, but they must still be run from the folder /workspace/tao_cv-pkg:


    cd /workspace/tao_cv-pkg
    ./samples/tao_cv_install/tao_cv/demo_facedetect/facedetect samples/tao_cv_install/tao_cv/demo_facedetect/demo.conf
