.. _tlt_cv_inference_pipeline_sample_applications:

Running and Building Sample Applications
========================================

Enter the client container using the :ref:`TLT CV Quick Start Scripts`. Sample applications are
available to run along with their source code. Each sample application follows the format:

.. code-block:: bash

   ./path/to/binary path/to/config/file

All binaries should be run relative to the folder that the container opens to by default
(:code:`/workspace/tlt_cv-pkg`). Below are instructions for running the out-of-the-box samples.
The config files all assume the video handle :code:`/dev/video0` has been added to the Quick Start
configuration and will be opened at the resolution and FPS specified.

The config files contain the following common fields:

* :code:`video_path`: The device handle or an absolute path to a video file.
* :code:`fps`: The frames per second at which to open :code:`video_path`. Ensure your device can handle this (see the check below).
* :code:`is_video_path_file`: A boolean :code:`true` or :code:`false` indicating whether :code:`video_path` is a file.
* :code:`resolution_whc`: The resolution (width, height, channels) at which to open :code:`video_path`. Ensure your device can handle this.
* :code:`roi_xywh`: A region of interest (x, y, width, height) for applications that work with a single person.
* :code:`visualization`: A boolean :code:`true` or :code:`false` indicating whether to open a window for visualization.
* :code:`use_right_hand`: A boolean :code:`true` or :code:`false` indicating whether to run inference on the right hand (specific to the gesture application).
* :code:`use_decoded_image_api`: A boolean :code:`true` or :code:`false` indicating whether to use an API that sends single decoded image buffers instead of using a device/video handle within the Pipeline.
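The :code:`fps` and :code:`resolution_whc` values must correspond to a mode your camera actually
supports. One way to check this (using the same :code:`v4l2-ctl` utility mentioned in the Heart Rate
section below; the device path here is an assumption) is:

.. code-block:: bash

   # List the available video devices, then the formats, resolutions, and frame
   # rates offered by the device you plan to reference in the config file.
   v4l2-ctl --list-devices
   v4l2-ctl -d /dev/video0 --list-formats-ext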
TLT CV Inference Pipelines
^^^^^^^^^^^^^^^^^^^^^^^^^^

Instantiations of TLT CV Inference Pipelines enable users to access fused inferences. The table below
shows the support matrix for these inferences and requests. Each Pipeline uses its respective TLT
network and its dependencies. The sample applications demonstrate the usage of these Pipelines.

======================== =============================================================================
CV Pipeline Enum         Available Responses
======================== =============================================================================
:code:`BODY_POSE`        :code:`FRAME`, :code:`BODY_POSE`
:code:`EMOTION`          :code:`FRAME`, :code:`FACE_DETECT`, :code:`FACIAL_LANDMARKS`, :code:`EMOTION`
:code:`FACE_DETECT`      :code:`FRAME`, :code:`FACE_DETECT`
:code:`FACIAL_LANDMARKS` :code:`FRAME`, :code:`FACE_DETECT`, :code:`FACIAL_LANDMARKS`
:code:`GAZE`             :code:`FRAME`, :code:`FACE_DETECT`, :code:`FACIAL_LANDMARKS`, :code:`GAZE`
:code:`GESTURE`          :code:`FRAME`, :code:`BODY_POSE`, :code:`GESTURE`
:code:`HEART_RATE`       :code:`FRAME`, :code:`FACE_DETECT`, :code:`HEART_RATE`
======================== =============================================================================

For Pipelines that support multiple responses (e.g. Emotion supports obtaining face detections and
landmarks), the ordering of the inferences is preserved per person. That is, the first emotion
corresponds to the first face in the face detection result.

.. _tlt_cv_inference_pipeline_sample_applications_body_pose_estimation:

Running the Body Pose Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_bodypose/bodypose samples/tlt_cv/demo_bodypose/demo.conf

This demo supports multiple people. The visualization will draw joints and lines on body parts as
well as a bounding box over each body.

.. Note::

   This Body Pose Estimation Demo requires the Body Pose TLT model to be trained and deployed. Ensure
   BodyPoseNet and its dependencies are loaded as :code:`READY` during the Triton Server startup:

   .. code-block:: bash

      +----------------------------------+---------+------------------------------------------+
      | Model                            | Version | Status                                   |
      +----------------------------------+---------+------------------------------------------+
      | bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
      | bodypose_384x288_postprocess_tlt | 1       | READY                                    |
      | bodypose_384x288_tlt             | 1       | READY                                    |
      | ...                              | ...     | ...                                      |
      | hcgesture_tlt                    | 1       | READY                                    |
      +----------------------------------+---------+------------------------------------------+
      ...
      I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
      I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
      I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002

Body Pose Configuration
-----------------------

If deploying a non-default shape BodyPoseNet, ensure the appropriate variants are loaded as
:code:`READY`. For example, if you created a model for a network input size of :code:`width = 320`
and :code:`height = 224`, ensure that :code:`bodypose_320x224_*` is loaded as :code:`READY`
(a client-side check is shown below). You also need to modify a configuration file inside the client
container:

.. code-block:: bash

   CURRENT_WIDTH=384
   DESIRED_WIDTH=320
   CURRENT_HEIGHT=288
   DESIRED_HEIGHT=224
   sed -i "s/${CURRENT_WIDTH}/${DESIRED_WIDTH}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json
   sed -i "s/${CURRENT_HEIGHT}/${DESIRED_HEIGHT}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json

If you would like to make this change manually, modify the following locations:

- The preprocessor output shape configuration or :code:`output_image_meta`
- The resize operation for :code:`ResizeNormalizeFP32Image`
- The Triton :code:`model_name` to request inference, which is the ensemble :code:`bodypose_320x224_ensemble_tlt`
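To confirm from the client side which model variants are currently loaded, Triton exposes a
per-model readiness endpoint over the HTTP service shown in the startup log above (port 8000). This
is a sketch only: the host name is an assumption, so substitute the address of your Triton Server
container:

.. code-block:: bash

   # Returns HTTP status 200 if the model is loaded and READY; the model name is
   # the 320x224 ensemble used as the example above.
   curl -s -o /dev/null -w "%{http_code}\n" \
       http://localhost:8000/v2/models/bodypose_320x224_ensemble_tlt/ready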
.. Note::

   This Body Pose Estimation Demo cannot consume a BodyPoseNet with fewer trained joints. By default,
   BodyPoseNet will estimate 18 joints. Retraining with 14 joints and deploying in this inference
   pipeline is not supported.

Body Pose API Usage
-------------------

The TLT CV API for Body Pose Estimation returns up to 18 body joints (nose, neck, shoulders, elbows,
wrists, knees, ankles, hips, eyes, and ears) in 2D pixel space. The returned structure also contains
a bounding box over the joints.

The following code snippet is a glimpse of how a developer would use the API to get a body pose.
Assuming the initialized pipeline supports a :code:`BODY_POSE` response, let's get the coordinates of
the noses of the bodies in frame. First, we must check whether the nose is present in the body by
obtaining its index. Then, we can access the coordinates using the index and draw the point on an
image.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::BODY_POSE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto posePayload = cvAPI.getBodyPose();
   if (posePayload)
   {
       auto &poses = posePayload->group;
       for (auto const &pose : poses)
       {
           auto joint = njv::NOSE; // desired joint
           using joint_t = std::underlying_type<decltype(joint)>::type;

           // Find the index of the body part in the payload vector using `bodyPartPosition`.
           // If the value is < 0, then that body part is invalid, so we can skip drawing.
           int32_t indexJoint = pose.bodyPartPosition[static_cast<joint_t>(joint)];
           if (indexJoint < 0) // check validity of joint
           {
               continue;
           }

           auto const &bodyPart = pose.bodyParts[indexJoint];
           auto pt = Point(bodyPart.x, bodyPart.y); // Point/drawPoint are illustrative drawing helpers
           drawPoint(image, pt);
       }
   }

The call to :code:`getBodyPose()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

Running the Emotion Classification Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_emotion/emotion samples/tlt_cv/demo_emotion/demo.conf

This demo supports multiple people. The visualization will draw a bounding box around each face and
text indicating the classified emotion.

Emotion API Usage
-----------------

The TLT CV API for Emotion classification returns one of 7 emotions:

* neutral
* happy
* surprise
* squint
* disgust
* scream
* not applicable

The following code snippet is a glimpse of how a developer would use the API to get emotions.
Assuming the initialized pipeline supports an :code:`EMOTION` response, let's access the returned
emotions and check whether a person is happy.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::EMOTION;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto emotionPayload = cvAPI.getEmotion();
   if (emotionPayload)
   {
       for (const auto &emotionElem : emotionPayload->group)
       {
           if (emotionElem.emotion == njv::HAPPY)
           {
               std::cout << "Found a Happy person!" << std::endl;
           }
       }
   }

The call to :code:`getEmotion()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.
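Because :code:`getEmotion()` returns :code:`nullptr` when no new result is available, a non-blocking
caller typically polls. The sketch below counts happy faces across frames; the sleep interval and
stop condition are our own additions for illustration, not part of the API:

.. code-block:: c++

   #include <chrono>
   #include <thread>

   const auto pipelineType = nja::PipelineType::EMOTION;
   nja::TLTCVAPI cvAPI(pipelineType);

   size_t happyFaces = 0;
   while (happyFaces < 10) // arbitrary stop condition for this sketch
   {
       auto emotionPayload = cvAPI.getEmotion();
       if (!emotionPayload)
       {
           // No new result yet (or the latest result was already consumed).
           std::this_thread::sleep_for(std::chrono::milliseconds(10));
           continue;
       }
       for (const auto &emotionElem : emotionPayload->group)
       {
           if (emotionElem.emotion == njv::HAPPY)
           {
               ++happyFaces;
           }
       }
   }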
Running the Face Detection Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_facedetect/facedetect samples/tlt_cv/demo_facedetect/demo.conf

This demo supports multiple people. The visualization will draw a bounding box around each face.

Face Detection API Usage
------------------------

The TLT CV API for Face Detection returns a bounding box of X, Y, W, H in 2D pixel space of the
original image size.

The following code snippet is a glimpse of how a developer would use the API to get face detections.
Assuming the initialized pipeline supports a :code:`FACE_DETECT` response, let's print the
coordinates of the bounding boxes.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::FACE_DETECT;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto facePayload = cvAPI.getFaceDetect();
   if (facePayload)
   {
       for (const auto &elem : facePayload->group)
       {
           const auto &box = elem.box;
           // Box coordinates provided in original image space
           std::cout << "x = " << box.x << std::endl;
           std::cout << "y = " << box.y << std::endl;
           std::cout << "w = " << box.w << std::endl;
           std::cout << "h = " << box.h << std::endl;
       }
   }

The call to :code:`getFaceDetect()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

Running the Facial Landmarks Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_faciallandmarks/faciallandmarks samples/tlt_cv/demo_faciallandmarks/demo.conf

This demo supports multiple people. The visualization will draw landmarks for each face.

Facial Landmarks API Usage
--------------------------

The TLT CV API for Facial Landmarks returns 80 landmarks in 2D pixel space.

The following code snippet is a glimpse of how a developer would use the API to get the landmarks.
Assuming the initialized pipeline supports a :code:`FACIAL_LANDMARKS` response, let's print the
coordinates of the landmarks.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::FACIAL_LANDMARKS;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto landmarksPayload = cvAPI.getFacialLandmarks();
   if (landmarksPayload)
   {
       for (const auto &elem : landmarksPayload->group)
       {
           const auto &landmarks = elem.landmarks;
           for (size_t landmarkIndex = 0; landmarkIndex < landmarks.size(); landmarkIndex++)
           {
               std::cout << "index = " << landmarkIndex
                         << "; x = " << landmarks[landmarkIndex].x
                         << "; y = " << landmarks[landmarkIndex].y << std::endl;
           }
       }
   }

The call to :code:`getFacialLandmarks()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.
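Because the landmarks are returned in the original image's pixel space, simple geometry can be
computed from them directly. As an illustration (not part of the API), the following computes an
axis-aligned bounding box over one face's landmarks, assuming each landmark exposes :code:`x` and
:code:`y` as in the snippet above:

.. code-block:: c++

   #include <algorithm>

   // `landmarks` is the per-face container from the payload shown above.
   if (!landmarks.empty())
   {
       auto xMin = landmarks[0].x, xMax = landmarks[0].x;
       auto yMin = landmarks[0].y, yMax = landmarks[0].y;
       for (const auto &lm : landmarks)
       {
           xMin = std::min(xMin, lm.x);
           xMax = std::max(xMax, lm.x);
           yMin = std::min(yMin, lm.y);
           yMax = std::max(yMax, lm.y);
       }
       // Tight box around the face's landmarks, in pixels.
       std::cout << "landmark box: " << (xMax - xMin) << " x " << (yMax - yMin) << std::endl;
   }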
Running the Gaze Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_gaze/gaze samples/tlt_cv/demo_gaze/demo.conf

This demo supports multiple people. The visualization will print out the 3D gaze vector for each
face.

The sample application has custom fields for the Perspective-n-Point (PnP) problem. The camera
calibration matrix, distortion coefficients, and 3D facial landmarks are all provided for the
visualization of the gaze vector. The default values are camera dependent and were generated from an
off-the-shelf webcam; for more accurate visualization, users can supply their own.

Gaze Estimation API Usage
-------------------------

The TLT CV API for Gaze returns a 3D vector in the camera coordinate system. This X, Y, Z location is
where the person is looking relative to the camera. The units are millimeters.

The Gaze payload also contains :code:`theta` and :code:`phi` outputs, which are independent of the 3D
vector. They are a more general representation of a free-standing gaze vector in 3D space. When
applied to an origin (not provided), say the pupil center or the center of both eyes, they represent
the general gaze direction, which can be extended to any arbitrary point in front of the user. This
vector can then optionally be used to determine whether it intersects an object in space, yielding
another point of regard. For improved accuracy, we suggest using the point-of-regard
:code:`x, y, z` coordinates instead of the :code:`theta` and :code:`phi` outputs.

The following code snippet is a glimpse of how a developer would use the API to get gaze. Assuming
the initialized pipeline supports a :code:`GAZE` response, let's print the gaze values.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::GAZE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto gazePayload = cvAPI.getGaze();
   if (gazePayload)
   {
       for (const auto &gazeElem : gazePayload->group)
       {
           const auto &gaze = gazeElem.gaze;
           std::cout << "x = " << gaze.x << std::endl;
           std::cout << "y = " << gaze.y << std::endl;
           std::cout << "z = " << gaze.z << std::endl;
           std::cout << "theta = " << gaze.theta << std::endl;
           std::cout << "phi = " << gaze.phi << std::endl;
       }
   }

The call to :code:`getGaze()` will return the latest new result from the Pipeline or :code:`nullptr`
if already obtained or unavailable. The API also provides the ability to block on this call until a
result is obtained. More code showcasing this API is featured in the sample application.
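To apply :code:`theta` and :code:`phi` to an origin of your choosing (for example, the pupil center),
they can be converted into a unit direction vector. The angle convention is not spelled out here, so
treat the mapping below as an assumption to verify against your deployment; the conversion itself is
ordinary spherical-to-Cartesian math:

.. code-block:: c++

   #include <cmath>

   // Hypothetical helper: turn the gaze angles into a unit direction vector in the
   // camera coordinate system. ASSUMPTION: theta is pitch (up/down) and phi is yaw
   // (left/right); verify the axis and sign convention for your model before use.
   struct GazeDirection { float x; float y; float z; };

   GazeDirection anglesToDirection(float theta, float phi)
   {
       GazeDirection d;
       d.x = -std::cos(theta) * std::sin(phi);
       d.y = -std::sin(theta);
       d.z = -std::cos(theta) * std::cos(phi);
       return d; // unit length for any theta and phi
   }

Extending this direction from the chosen origin gives the ray that can be intersected with objects in
the scene, as described above.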
Running the Gesture Classification Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_gesture/gesture samples/tlt_cv/demo_gesture/demo.conf

This demo supports only a single person, specified by an ROI. It will only print the classified
gesture for a single hand, as specified in the config.

.. Note::

   This Gesture Demo requires the Body Pose TLT model to be trained and deployed. Body Pose, along
   with heuristics, determines a hand bounding box to crop a region for GestureNet. Ensure
   BodyPoseNet and its dependencies are loaded as :code:`READY` during the Triton Server startup:

   .. code-block:: bash

      +----------------------------------+---------+------------------------------------------+
      | Model                            | Version | Status                                   |
      +----------------------------------+---------+------------------------------------------+
      | bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
      | bodypose_384x288_postprocess_tlt | 1       | READY                                    |
      | bodypose_384x288_tlt             | 1       | READY                                    |
      | ...                              | ...     | ...                                      |
      | hcgesture_tlt                    | 1       | READY                                    |
      +----------------------------------+---------+------------------------------------------+
      ...
      I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
      I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
      I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002

Gesture Classification API Usage
--------------------------------

The TLT CV API for Gesture returns one of the following gestures and a bounding box for that gesture:

* Thumbs up
* Fist
* Stop
* Okay
* Two (also known as Raise)
* Random

Only a single gesture for a single hand is returned.

The following code snippet is a glimpse of how a developer would use the API to get the gesture.
Assuming the initialized pipeline supports a :code:`GESTURE` response, let's check the gesture.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::GESTURE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto gesturePayload = cvAPI.getGesture();
   if (gesturePayload)
   {
       // For the sake of example, assume only 1 person in frame here.
       const auto &gestElem = gesturePayload->group[0];

       // Ensure validity of the gesture element. The struct GesturePayload has a
       // pre-allocated array of MAX_NUM_USER_SUPPORT gestures. However, since we
       // restrict the number of users using a region of interest, only the first
       // element has the possibility of being valid.
       if (gestElem.valid)
       {
           // Bounding box of the relevant gesture (drawing omitted here)
           const auto &box = gestElem.bbox;
           if (gestElem.gesture == njv::GestureType::THUMBS_UP)
           {
               std::cout << "Thumbs up" << std::endl;
           }
       }
   }

The call to :code:`getGesture()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

.. _running_hr_sample:

Running the Heart Rate Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_heartrate/heartrate samples/tlt_cv/demo_heartrate/demo.conf

This demo supports only a single person, specified by an ROI. It will only print the estimated heart
rate for that person and a bounding box around the face. The person of interest should be well
illuminated, keep still, and face the camera directly. This demo is unique in that it only supports
FPS between 15 and 30 and needs your camera handle to support uncompressed YUYV.

.. Note::

   Make note of the resolutions and FPS supported by your video handle (e.g. using the command
   :code:`v4l2-ctl --list-formats-ext`).
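For example, to confirm that the camera offers an uncompressed YUYV mode at a frame rate within the
supported 15-30 range, you can query the frame sizes and intervals for that pixel format (the device
path and the 640x480 size below are assumptions; use the values you intend to put in the config):

.. code-block:: bash

   # Frame sizes available in the YUYV pixel format.
   v4l2-ctl -d /dev/video0 --list-framesizes=YUYV
   # Frame intervals (and therefore FPS) for a specific YUYV frame size.
   v4l2-ctl -d /dev/video0 --list-frameintervals=width=640,height=480,pixelformat=YUYV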
Heart Rate Estimation API Usage
-------------------------------

The TLT CV API for Heart Rate returns a structure with the beats per minute, along with extra
booleans and flags to check validity. It also returns a bounding box for that person's face. Only a
single person's heart rate is returned.

The following code snippet is a glimpse of how a developer would use the API to get the heart rate.
Assuming the initialized pipeline supports a :code:`HEART_RATE` response, let's check the heart rate.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::HEART_RATE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto payload = cvAPI.getHeartRate();
   if (payload)
   {
       // For the sake of example, assume only 1 person in frame here.
       const auto &firstPerson = payload->group[0];

       // Ensure validity of the heart rate element.
       if (firstPerson.valid)
       {
           // Heart Rate is fragile to poor lighting, so a USB camera will process
           // the frames to increase exposure, contrast, etc. automatically. We
           // check whether the FPS is valid within a certain range.
           if (!firstPerson.isFPSValid)
           {
               std::cerr << "Poor Lighting!" << std::endl;
           }
           else
           {
               // Heart Rate is fragile to motion. We use this boolean to determine
               // whether the person is available to be estimated.
               if (firstPerson.available)
               {
                   std::cout << "Heart Rate = " << std::to_string(firstPerson.heartRate) << std::endl;
               }
           }
       }
   }

The call to :code:`getHeartRate()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

Building the Sample Applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Source code and CMakeLists have been provided for users to recompile the sources. Users who want to
modify the sources should leverage the Quick Start Script :code:`config.sh` field called
:code:`volume_mnt_samples`. First, ensure the client container is running. Then, in another terminal,
copy the samples to the host machine:

.. code-block:: bash

   docker cp image_tlt_cv_client:/workspace/tlt_cv-pkg/samples /path/on/host

Then, stop the client container and modify the :code:`config.sh` field called
:code:`volume_mnt_samples` to point to :code:`/path/on/host`. The next time you start the client
container, the host samples folder will be volume mounted inside the client container. This allows
the user to modify the sample code outside of the container while maintaining a developer workflow
inside the container.

Now, to recompile the sources, the user must be inside the client container.

1. Make a directory to save the new binaries:

   .. code-block:: bash

      mkdir -p /workspace/tlt_cv-pkg/samples/tlt_cv_install

2. Enter the location of the source files:

   .. code-block:: bash

      cd samples/tlt_cv/

3. Make and enter a build directory for CMake:

   .. code-block:: bash

      mkdir -p build && cd build

4. Build and install the new binaries:

   .. code-block:: bash

      cmake -DPROJECT_ROOT:STRING=/workspace/tlt_cv-pkg/ -DINSTALL_DIR=/workspace/tlt_cv-pkg/samples/tlt_cv_install/ ..
      make install

   .. Note::

      If using CMake on a Jetson device, add an *additional* flag :code:`-DTARGET=1`, which will
      result in :code:`cmake -DTARGET=1 ...`

5. Verify that the binaries exist in the folder:

   .. code-block:: bash

      ls -al /workspace/tlt_cv-pkg/samples/tlt_cv_install/tlt_cv/

6. These binaries can be run just like the precompiled binaries we provide, but they must still be
   run with respect to the folder :code:`/workspace/tlt_cv-pkg`:

   .. code-block:: bash

      cd /workspace/tlt_cv-pkg
      ./samples/tlt_cv_install/tlt_cv/demo_facedetect/facedetect samples/tlt_cv_install/tlt_cv/demo_facedetect/demo.conf
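For convenience, the numbered steps above can be run as a single block from the container's default
directory. This simply chains the same commands together; on a Jetson device, add :code:`-DTARGET=1`
to the :code:`cmake` invocation as noted above:

.. code-block:: bash

   cd /workspace/tlt_cv-pkg
   mkdir -p samples/tlt_cv_install
   cd samples/tlt_cv
   mkdir -p build && cd build
   cmake -DPROJECT_ROOT:STRING=/workspace/tlt_cv-pkg/ \
         -DINSTALL_DIR=/workspace/tlt_cv-pkg/samples/tlt_cv_install/ ..
   make install
   ls -al /workspace/tlt_cv-pkg/samples/tlt_cv_install/tlt_cv/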