Running and Building Sample Applications

Enter the client container using the TLT CV Quick Start Scripts. Sample applications are available to run, along with their source code.

Each of the sample applications follows the format:

./path/to/binary path/to/config/file

All the binaries should be run with respect to the folder that the container opens to by default (/workspace/tlt_cv-pkg).

Below are instructions for running the out-of-the-box samples. The config files all assume the video handle /dev/video0 has been added to the Quick Start configuration and will be opened at the resolutions and FPS specified.

The config files contain the following common fields (a hypothetical example follows the list):

  • video_path: This is the device handle or an absolute path to a video

  • fps: This is the frames per second at which to open the video_path. Ensure your device can handle this.

  • is_video_path_file: This is a boolean true or false to indicate whether the video_path is a file.

  • resolution_whc: This is the resolution (width, height, channels) at which to open video_path. Ensure your device can handle this.

  • roi_xywh: This is a region of interest (x, y, width, height) for applications that work with a single person.

  • visualization: This is a boolean true or false to indicate whether to open a window for visualization.

  • use_right_hand: This is a boolean true or false to indicate whether to run inference on the right hand (specific for gesture application).

  • use_decoded_image_api: This is a boolean true or false to indicate whether to use an API that sends single decoded image buffers instead of using a device/video handle within the Pipeline.
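
For illustration only, a configuration using these fields might look like the sketch below. The field names come from the list above, but the values and the exact file syntax are assumptions; refer to the demo.conf shipped with each sample for the authoritative format.

# Hypothetical sketch only -- consult the demo.conf shipped with each sample
# for the exact syntax and accepted values.
video_path=/dev/video0
is_video_path_file=false
fps=30
resolution_whc=1280,720,3
roi_xywh=0,0,1280,720
visualization=true
use_right_hand=true
use_decoded_image_api=false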

TLT CV Inference Pipelines

Instantiations of TLT CV Inference Pipelines enable users to access fused inferences. Below is a table visualizing the support matrix for these inferences and requests. Each Pipeline uses its respective TLT network and its dependencies. Sample Applications demo the usage of these Pipelines.

CV Pipeline Enum      Available Responses
EMOTION               FRAME, FACE_DETECT, FACIAL_LANDMARKS, EMOTION
FACE_DETECT           FRAME, FACE_DETECT
FACIAL_LANDMARKS      FRAME, FACE_DETECT, FACIAL_LANDMARKS
GAZE                  FRAME, FACE_DETECT, FACIAL_LANDMARKS, GAZE
GESTURE               FRAME, GESTURE
HEART_RATE            FRAME, FACE_DETECT, HEART_RATE

For Pipelines that support multiple responses (e.g. Emotion supports obtaining face detections and landmarks), the ordering of the inferences is preserved per person. That is to say, the first emotion is for the first face in the face detection result.
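
For instance, with an EMOTION Pipeline a developer could request both the face detections and the emotions from the same API object and pair them by index. The sketch below is illustrative only; it assumes the getter calls shown in the sample sections later on this page, that both getters are available on the same API object, and that both payloads refer to the same frame.

// Illustrative sketch only: pair face boxes with emotions from an EMOTION
// Pipeline. Assumes both payloads refer to the same frame and that the i-th
// emotion corresponds to the i-th detected face (ordering preserved per person).
const auto pipelineType = nja::PipelineType::EMOTION;
nja::TLTCVAPI cvAPI(pipelineType);

auto facePayload = cvAPI.getFaceDetect();
auto emotionPayload = cvAPI.getEmotion();
if (facePayload && emotionPayload)
{
    size_t personIndex = 0;
    for (const auto& emotionElem : emotionPayload->group)
    {
        if (emotionElem.emotion == njv::HAPPY)
        {
            const auto& box = facePayload->group[personIndex].box;
            std::cout << "Happy person with face at x = " << box.x
                      << ", y = " << box.y << std::endl;
        }
        personIndex++;
    }
}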

Performance

The performance of the TLT CV Inference Pipeline can be impacted by several factors.

  • The sample applications accept a device handle or an encoded video. Common webcams perform live postprocessing of the video stream for contrast, brightness, and more. This can impact the FPS at which a device can serve frames.

  • Enabling visualization in sample applications will affect GPU resources.

  • For Pipelines supporting outputs for each person, as the number of people in frame increases, the Pipeline will take longer.

  • Increasing resolution will increase preprocessing time for the Pipelines.

Also note that all of the available Pipelines are multi-threaded applications, so decreasing the FPS of the application lightens the load on the underlying scheduler.

Below is a table of the maximum frames per second (FPS) for the supported Pipelines with a single person in the frame or in the applicable Region of Interest. Camera decoding is included in these numbers. Visualization is disabled.

While configuring the sample applications, please ensure the video handle can support the desired FPS at the desired resolution.

Max FPS for each Pipeline with 1 person of interest

CV Pipeline Enum      Resolution    Jetson AGX Xavier *    Jetson Xavier NX *
EMOTION               720p          43                     24
FACE_DETECT           720p          52                     30
FACIAL_LANDMARKS      720p          45                     28
GAZE                  720p          35                     22
GESTURE               720p          30                     21
HEART_RATE            480p**        30                     25

Note

*Jetson devices run on Power Mode 0 with jetson_clocks enabled.

Note

**Heart Rate is limited to 30 FPS and requires special device handle specifications for uncompressed data which is usually provided at 480p. Please refer to Running the Heart Rate Estimation Sample for more details.

Running the Emotion Classification Sample

./samples/tlt_cv/demo_emotion/emotion samples/tlt_cv/demo_emotion/demo.conf

This demo will support multiple people. The visualization will draw a bounding box around each face and text indicating the classified emotion.

Emotion API Usage

The TLT CV API for Emotion classification returns one of 7 emotions:

  • neutral

  • happy

  • surprise

  • squint

  • disgust

  • scream

  • not applicable

The following code snippet is a glimpse of how a developer would use the API to get emotions. Assuming the initialized pipeline supports an EMOTION response, let’s access the returned emotions and check whether a person is happy.

const auto pipelineType = nja::PipelineType::EMOTION;
nja::TLTCVAPI cvAPI(pipelineType);
auto emotionPayload = cvAPI.getEmotion();
if (emotionPayload)
{
    for (const auto& emotionElem: emotionPayload->group)
    {
        if (emotionElem.emotion == njv::HAPPY)
        {
            std::cout << "Found a Happy person!" << std::endl;
        }
    }
}

The call to getEmotion() will return the latest new result from the Pipeline, or nullptr if that result has already been obtained or none is available. The API also provides the ability to block on this call until a result is obtained.
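
For example, a simple polling loop (a sketch; the blocking variant of the call is not shown here) could skip iterations where no new result is available yet:

// Polling sketch: getEmotion() returns nullptr when no new result is
// available, so wait briefly and try again. Requires <thread> and <chrono>.
// `cvAPI` is the nja::TLTCVAPI instance constructed above; `running` is an
// application-defined loop condition.
while (running)
{
    auto emotionPayload = cvAPI.getEmotion();
    if (!emotionPayload)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        continue;
    }
    // ... process emotionPayload->group as shown above ...
}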

More code showcasing this API is featured in the sample application.

Running the Face Detection Sample

./samples/tlt_cv/demo_facedetect/facedetect samples/tlt_cv/demo_facedetect/demo.conf

This demo will support multiple people. The visualization will draw a bounding box around each face.

Face Detection API Usage

The TLT CV API for Face Detection returns a bounding box of X, Y, W, H in 2D pixel space of the original image size.

The following code snippet is a glimpse of how a developer would use the API to get face detections. Assuming the initialized pipeline supports a FACE_DETECT response, let’s print the coordinates of the bounding boxes.

const auto pipelineType = nja::PipelineType::FACE_DETECT;
nja::TLTCVAPI cvAPI(pipelineType);
auto facePayload = cvAPI.getFaceDetect();
if (facePayload)
{
    for (const auto& elem: facePayload->group)
    {
        const auto& box = elem.box;
        // Box coordinates provided in original image space
        std::cout << "x = " << box.x << std::endl;
        std::cout << "y = " << box.y << std::endl;
        std::cout << "w = " << box.w << std::endl;
        std::cout << "h = " << box.h << std::endl;
    }
}

The call to getFaceDetect() will return the latest new result from the Pipeline, or nullptr if that result has already been obtained or none is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.

Running the Facial Landmarks Estimation Sample

./samples/tlt_cv/demo_faciallandmarks/faciallandmarks samples/tlt_cv/demo_faciallandmarks/demo.conf

This demo will support multiple people. The visualization will draw landmarks for each face.

Facial Landmarks API Usage

The TLT CV API for Facial Landmarks returns 80 landmarks in 2D pixel space.

The following code snippet is a glimpse of how a developer would use the API to get the landmarks. Assuming the initialized pipeline supports a FACIAL_LANDMARKS response, let’s print the coordinates of the landmarks.

const auto pipelineType = nja::PipelineType::FACIAL_LANDMARKS;
nja::TLTCVAPI cvAPI(pipelineType);
auto landmarksPayload = cvAPI.getFacialLandmarks();
if (landmarksPayload)
{
    for (const auto& elem: landmarksPayload->group)
    {
        const auto& landmarks = elem.landmarks;
        for (size_t landmarkIndex = 0; landmarkIndex < landmarks.size(); landmarkIndex++)
        {
            // Each landmark is a 2D point in the original image space.
            const auto& landmark = landmarks[landmarkIndex];
            std::cout << "index = " << landmarkIndex << "; x = " << landmark.x
                      << "; y = " << landmark.y << std::endl;
        }
    }
}

The call to getFacialLandmarks() will return the latest new result from the Pipeline, or nullptr if that result has already been obtained or none is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.

Running the Gaze Estimation Sample

./samples/tlt_cv/demo_gaze/gaze samples/tlt_cv/demo_gaze/demo.conf

This demo will support multiple people. The visualization will print out the 3D gaze vector for each face.

The sample application has custom fields for the Perspective-n-Point (PNP) problem. The camera calibration matrix, distortion coefficients, and 3D Facial Landmarks are all provided for the visualization of the gaze vector.

The camera-dependent default values are generated from an off-the-shelf webcam, and for more accurate visualization, users can use their own.

Gaze Estimation API Usage

The TLT CV API for Gaze returns a 3D vector in the camera coordinate system. This X, Y, Z location is where the person is looking relative to the camera. The units are millimeters.

Also in the Gaze payload are theta and phi outputs, which are independent of the 3D vector. They are a more general representation of a free-standing gaze vector in 3D space. When applied to an origin (not provided), say the pupil center or the center of both eyes, they represent the general gaze direction, which can be extended to any arbitrary point in front of the user. This vector can then optionally be intersected with an object in space to determine another point of regard.

For improved accuracy, we suggest using the point-of-regard x, y, z coordinates instead of the theta and phi outputs.
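
If the theta and phi outputs are used, a minimal sketch of how they might be turned into a direction vector is shown below. The angle convention (which axes theta and phi rotate about) and the helper names are assumptions for illustration only and should be checked against the Pipeline’s actual definition.

// Hypothetical helpers, assuming theta is a horizontal (yaw) angle and phi
// is a vertical (pitch) angle, both in radians, in a camera-centric frame.
// Verify the actual angle convention used by the Pipeline before relying on this.
#include <array>
#include <cmath>

std::array<float, 3> gazeDirection(float theta, float phi)
{
    // Spherical-to-Cartesian conversion for a unit direction vector.
    return { std::cos(phi) * std::sin(theta),
             std::sin(phi),
             std::cos(phi) * std::cos(theta) };
}

std::array<float, 3> pointAlongGaze(const std::array<float, 3>& originMm,
                                    float theta, float phi, float distanceMm)
{
    // Extend the direction from an assumed origin (e.g. the midpoint between
    // the eyes, obtained elsewhere) to a point distanceMm in front of the user.
    const auto dir = gazeDirection(theta, phi);
    return { originMm[0] + distanceMm * dir[0],
             originMm[1] + distanceMm * dir[1],
             originMm[2] + distanceMm * dir[2] };
}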

The following code snippet is a glimpse of how a developer would use the API to get gaze. Assuming the initialized pipeline supports a GAZE response, let’s print the gaze outputs.

const auto pipelineType = nja::PipelineType::GAZE;
nja::TLTCVAPI cvAPI(pipelineType);
auto gazePayload = cvAPI.getGaze();
if (gazePayload)
{
    for (const auto& gazeElem: gazePayload->group)
    {
        const auto& gaze = gazeElem.gaze;
        std::cout << "x = " << gaze.x << std::endl;
        std::cout << "y = " << gaze.y << std::endl;
        std::cout << "z = " << gaze.z << std::endl;
        std::cout << "theta = " << gaze.theta << std::endl;
        std::cout << "phi = " << gaze.phi << std::endl;
    }
}

The call to getGaze() will return the latest new result from the Pipeline, or nullptr if that result has already been obtained or none is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.

Running the Gesture Classification Sample

./samples/tlt_cv/demo_gesture/gesture samples/tlt_cv/demo_gesture/demo.conf

This demo will support only a single person, specified by an ROI. It will print the classified gesture only for the single hand specified in the config.

Gesture Classification API Usage

The TLT CV API for Gesture classification returns one of 6 gestures and a bounding box for that gesture:

  • Thumbs up

  • Fist

  • Stop

  • Okay

  • Two (also known as Raise)

  • Random

Only a single gesture for a single hand is returned.

The following code snippet is a glimpse of how a developer would use the API to get the gesture. Assuming the initialized pipeline supports a GESTURE response, let’s check the gesture.

const auto pipelineType = nja::PipelineType::GESTURE;
nja::TLTCVAPI cvAPI(pipelineType);
auto gesturePayload = cvAPI.getGesture();
if (gesturePayload)
{
    // For sake of example, assuming only 1 person in frame here.
    const auto &gestElem = gesturePayload->group[0];

    // Ensure validity of the gesture element. The struct GesturePayload has a
    // pre-allocated array of MAX_NUM_USER_SUPPORT gestures. However,
    // since we restrict the number of users using a region of interest, only the
    // first element has the possibility of being valid.
    if (gestElem.valid)
    {
        // Draw bounding box of relevant gesture
        const auto &box = gestElem.bbox;

        if (gestElem.gesture == njv::GestureType::THUMBS_UP)
        {
            std::cout << "Thumbs up" << std::endl;
        }
    }
}

The call to getGesture() will return the latest new result from the Pipeline, or nullptr if that result has already been obtained or none is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.

Running the Heart Rate Estimation Sample

./samples/tlt_cv/demo_heartrate/heartrate samples/tlt_cv/demo_heartrate/demo.conf

This demo will support only a single person, specified by an ROI. It will print the estimated heart rate for that person and draw a bounding box around the face. The person of interest should be well illuminated, keep still, and face the camera directly.

This demo is unique in that it only supports FPS between 15 and 30 and requires your camera handle to support uncompressed YUYV.

Note

Make note of the resolutions and FPS supported by your video handle (e.g., using the command v4l2-ctl --list-formats-ext).
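
For example, to inspect the formats, resolutions, and frame rates offered by the first video device (assuming the v4l-utils package is installed on the host):

v4l2-ctl --device=/dev/video0 --list-formats-ext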

Heart Rate Estimation API Usage

The TLT CV API for Heart Rate returns a structure with the beats per minute, along with extra booleans and flags to ensure validity. It also returns a bounding box for that person’s face.

Only a single person’s heart rate is returned.

The following code snippet is a glimpse of how a developer would use the API to get the heart rate. Assuming the initialized pipeline supports a HEART_RATE response, let’s check the heart rate.

const auto pipelineType = nja::PipelineType::HEART_RATE;
nja::TLTCVAPI cvAPI(pipelineType);
auto payload = cvAPI.getHeartRate();
if (payload)
{
    // For sake of example, assuming only 1 person in frame here.
    const auto& firstPerson = payload->group[0];

    // Ensure validity of the heart rate element.
    if (firstPerson.valid)
    {
        // Heart Rate is fragile to poor lighting, so a USB camera will
        // automatically process the frames to increase exposure, contrast,
        // etc., which can affect the delivered FPS. We check whether the FPS
        // is within a valid range.
        if (!firstPerson.isFPSValid)
        {
            std::cerr << "Poor Lighting!" << std::endl;
        }
        else
        {
            // Heart Rate is fragile to motion. We use this boolean to
            // determine if the person is available to be estimated.
            if (firstPerson.available)
            {
                std::cout << "Heart Rate = " << std::to_string(firstPerson.heartRate) << std::endl;
            }
        }
    }
}

The call to getHeartRate() will return the latest new result from the Pipeline, or nullptr if that result has already been obtained or none is available. The API also provides the ability to block on this call until a result is obtained.

More code showcasing this API is featured in the sample application.

Building the Sample Applications

Source code and CMakeLists have been provided so users can recompile the sources. Users who want to modify the sources are recommended to leverage the Quick Start Script config.sh field called volume_mnt_samples.

First, ensure the client container is running. Then in another terminal, copy the samples to the host machine:

docker cp image_tlt_cv_client:/workspace/tlt_cv-pkg/samples /path/on/host

Then, stop the client container and modify the config.sh field called volume_mnt_samples to point to /path/on/host. The next time you start the client container, the host samples folder will be volume mounted inside the client container. This allows the user to modify the sample code outside of the container while maintaining a developer workflow inside the container.
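
For example, assuming config.sh uses plain shell variable assignments (check the Quick Start Scripts for the exact variable format), the change might look like:

# In the Quick Start config.sh -- hypothetical sketch of the assignment
volume_mnt_samples="/path/on/host"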

Now, to recompile the sources, the user must be inside the client container.

  1. Make a directory to save new binaries:

    mkdir -p /workspace/tlt_cv-pkg/samples/tlt_cv_install
    
  2. Enter the location of our source files:

    cd samples/tlt_cv/
    
  3. Make and enter a build directory for CMake:

    mkdir -p build && cd build
    
  4. Build and install the new binaries:

    cmake -DPROJECT_ROOT:STRING=/workspace/tlt_cv-pkg/ -DINSTALL_DIR=/workspace/tlt_cv-pkg/samples/tlt_cv_install/ ..
    make install
    

    Note

    If using CMake on a Jetson device, add an additional flag -DTARGET=1 which will result in cmake -DTARGET=1 ...

  5. Verify binaries exist in the folder:

    ls -al /workspace/tlt_cv-pkg/samples/tlt_cv_install/tlt_cv/
    
  6. These binaries can be run just like the precompiled binaries we provide, but must still be run with respect to the folder /workspace/tlt_cv-pkg:

    cd /workspace/tlt_cv-pkg
    ./samples/tlt_cv_install/tlt_cv/demo_facedetect/facedetect samples/tlt_cv_install/tlt_cv/demo_facedetect/demo.conf