Running and Building Sample Applications¶
Enter the client container using the TLT CV Quick Start Scripts. We have sample applications available to run along with their source code.
Each of the sample applications follows the format:
./path/to/binary path/to/config/file
All the binaries should be run with respect to the folder that the container
opens to by default (/workspace/tlt_cv-pkg).
Below are instructions for running the out-of-the-box samples. The config files
all assume the video handle /dev/video0 has been added to the Quick Start
configuration and will be opened at the resolutions and FPS specified.
The config files will contain the following common fields:
video_path: This is the device handle or an absolute path to a video.
fps: This is the frames per second at which to open the video_path. Ensure your device can handle this.
is_video_path_file: This is a boolean true or false to indicate whether the video_path is a file.
resolution_whc: This is the resolution (width, height, channels) at which to open video_path. Ensure your device can handle this.
roi_xywh: This is a region of interest (x, y, width, height) for applications that work with a single person.
visualization: This is a boolean true or false to indicate whether to open a window for visualization.
use_right_hand: This is a boolean true or false to indicate whether to run inference on the right hand (specific to the gesture application).
use_decoded_image_api: This is a boolean true or false to indicate whether to use an API that sends single decoded image buffers instead of using a device/video handle within the Pipeline.
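As a rough illustration of how these fields relate to each other, the sketch below mirrors them in a plain C++ struct and checks that roi_xywh fits inside resolution_whc. The struct, the default values, and the function name are assumptions made for this example; the sample applications ship their own config parser, which is not shown here.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical mirror of the common demo.conf fields (not the samples' real parser).
// The default values below are placeholders chosen for the example.
struct DemoConfig
{
    std::string videoPath = "/dev/video0";          // video_path
    int fps = 30;                                   // fps
    bool isVideoPathFile = false;                   // is_video_path_file
    std::array<int, 3> resolutionWhc{640, 480, 3};  // resolution_whc
    std::array<int, 4> roiXywh{0, 0, 640, 480};     // roi_xywh
    bool visualization = true;                      // visualization
    bool useRightHand = true;                       // use_right_hand
    bool useDecodedImageApi = false;                // use_decoded_image_api
};

// Returns true when the ROI is non-empty and lies fully inside the frame.
inline bool roiFitsResolution(const DemoConfig &cfg)
{
    const int width = cfg.resolutionWhc[0];
    const int height = cfg.resolutionWhc[1];
    const int x = cfg.roiXywh[0];
    const int y = cfg.roiXywh[1];
    const int w = cfg.roiXywh[2];
    const int h = cfg.roiXywh[3];
    return x >= 0 && y >= 0 && w > 0 && h > 0 && x + w <= width && y + h <= height;
}
```

A check like this can catch a region of interest that was left over from a different camera resolution before the pipeline is started.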
TLT CV Inference Pipelines¶
Instantiations of TLT CV Inference Pipelines enable users to access fused inferences. Below is a table visualizing the support matrix for these inferences and requests. Each Pipeline uses its respective TLT network and its dependencies. Sample Applications demo the usage of these Pipelines.
CV Pipeline Enum | Available Responses
---|---
For Pipelines that support multiple responses (e.g. Emotion supports obtaining face detections and landmarks), the ordering of the inferences is preserved per person. That is to say, the first emotion is for the first face in the face detection result.
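The per-person ordering can be pictured with a small sketch that pairs the i-th face with the i-th emotion. The types and the function name are stand-ins invented for this example, and the sketch assumes both lists come from the same result and therefore have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in types for illustration only; the real payload structs differ.
struct FaceBox { int x; int y; int w; int h; };
enum class Emotion { Neutral, Happy };

// Pairs the i-th face with the i-th emotion, relying on the per-person ordering.
inline std::vector<std::pair<FaceBox, Emotion>>
pairFacesWithEmotions(const std::vector<FaceBox> &faces, const std::vector<Emotion> &emotions)
{
    // Assumption: both lists describe the same people, so their sizes match.
    assert(faces.size() == emotions.size());
    std::vector<std::pair<FaceBox, Emotion>> paired;
    paired.reserve(faces.size());
    for (std::size_t i = 0; i < faces.size(); ++i)
    {
        paired.emplace_back(faces[i], emotions[i]);
    }
    return paired;
}
```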
Running the Body Pose Estimation Sample¶
./samples/tlt_cv/demo_bodypose/bodypose samples/tlt_cv/demo_bodypose/demo.conf
This demo supports multiple people. The visualization will draw joints and lines on body parts as well as a bounding box over each body.
Note
This Body Pose Estimation Demo requires the Body Pose TLT model to be trained and deployed.
Ensure BodyPoseNet and its dependencies are loaded as READY during the Triton Server startup:
+----------------------------------+---------+------------------------------------------+
| Model                            | Version | Status                                   |
+----------------------------------+---------+------------------------------------------+
| bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
| bodypose_384x288_postprocess_tlt | 1       | READY                                    |
| bodypose_384x288_tlt             | 1       | READY                                    |
| ...                              | ...     | ...                                      |
| hcgesture_tlt                    | 1       | READY                                    |
+----------------------------------+---------+------------------------------------------+
...
I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002
If deploying a non-default shape BodyPoseNet, ensure the appropriate variants are loaded as READY.
Body Pose Configuration¶
If deploying a non-default shape BodyPoseNet, ensure the appropriate variants are loaded as READY.
For example, if you created a model for a network input size of width = 320 and height = 224,
ensure that bodypose_320x224_* is loaded as READY.
You also need to modify a configuration file inside the container of the samples:
CURRENT_WIDTH=384
DESIRED_WIDTH=320
CURRENT_HEIGHT=288
DESIRED_HEIGHT=224
sed -i "s/${CURRENT_WIDTH}/${DESIRED_WIDTH}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json
sed -i "s/${CURRENT_HEIGHT}/${DESIRED_HEIGHT}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json
If you would like to perform this manually, modify the following locations:
The preprocessor output shape configuration or output_image_meta
The resize operation for ResizeNormalizeFP32Image
The Triton model_name to request inference, which is the ensemble bodypose_320x224_ensemble_tlt
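The model names in the Triton startup table follow a bodypose_<width>x<height>_* pattern. The sketch below builds the three expected names for a given input size so they can be compared against the server log. The helper function is hypothetical and is not part of the TLT CV package.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper: builds the three model names that follow the
// bodypose_<W>x<H>_* pattern seen in the Triton startup table.
inline std::array<std::string, 3> bodyposeModelNames(int width, int height)
{
    assert(width > 0 && height > 0);
    const std::string prefix =
        "bodypose_" + std::to_string(width) + "x" + std::to_string(height);
    return {{prefix + "_ensemble_tlt", prefix + "_postprocess_tlt", prefix + "_tlt"}};
}
```

For a 320 x 224 network, the first entry is the ensemble name that the Triton model_name setting refers to.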
Note
This Body Pose Estimation Demo cannot consume a BodyPoseNet with fewer trained joints.
By default, BodyPoseNet will estimate 18 joints. Retraining with 14 joints and deploying in this inference pipeline is not supported.
Body Pose API Usage¶
The TLT CV API for Body Pose Estimation returns up to 18 body joints (nose, neck, shoulders, elbows, wrists, knees, ankles, hips, eyes, and ears) in 2D pixel space. The returned structure also includes a bounding box over the joints.
The following code snippet is a glimpse of how a developer would use the API
to get a body pose. Assuming the initialized pipeline supports a BODY_POSE
response, let’s get the coordinate of the noses of the bodies in frame. First, we must check
whether the nose is present in the body by obtaining its index. Then, we can
access the coordinates using the index and can draw the coordinate on an image.
const auto pipelineType = nja::PipelineType::BODY_POSE;
nja::TLTCVAPI cvAPI(pipelineType);
auto posePayload = cvAPI.getBodyPose();
if (posePayload)
{
    auto &poses = posePayload->group;
    for (auto const &pose: poses)
    {
        auto joint = njv::NOSE; // desired joint
        // std::underlying_type is declared in <type_traits>.
        using joint_t = std::underlying_type<njv::Joint>::type;
        // Find the index of the body part in the payload vector using `bodyPartPosition`.
        // If the value is < 0, then that body part is invalid, so we can skip drawing.
        int32_t indexJoint = pose.bodyPartPosition[static_cast<joint_t>(joint)];
        if (indexJoint < 0) // check validity of joint
        {
            continue;
        }
        auto const &bodyPart = pose.bodyParts[indexJoint];
        // Point, drawPoint, and image stand for the application's own drawing code.
        auto pt = Point(bodyPart.x, bodyPart.y);
        drawPoint(image, pt);
    }
}
The call to getBodyPose() returns the latest new result from the Pipeline,
or nullptr if that result was already obtained or none is available. The API
can also block on this call until a result is obtained.
More code showcasing this API is featured in the sample application.
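The joint lookup through bodyPartPosition can be isolated into a small helper. The sketch below uses stand-in types that only imitate the fields used in the snippet above (bodyPartPosition and bodyParts); the real definitions live in the TLT CV headers and may differ. The enum, struct layout, and function name are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in types for illustration; only two of the 18 joints are named here.
enum class Joint : std::size_t { Nose = 0, Neck = 1, Count = 18 };
struct BodyPart { float x; float y; };
struct Pose
{
    // Index into bodyParts for each joint, or a negative value if it is absent.
    std::array<std::int32_t, static_cast<std::size_t>(Joint::Count)> bodyPartPosition;
    std::vector<BodyPart> bodyParts;
};

// Returns a pointer to the requested joint, or nullptr if the joint is absent.
inline const BodyPart *findJoint(const Pose &pose, Joint joint)
{
    assert(joint != Joint::Count);  // Count is a size marker, not a joint
    const std::int32_t index = pose.bodyPartPosition[static_cast<std::size_t>(joint)];
    if (index < 0 || static_cast<std::size_t>(index) >= pose.bodyParts.size())
    {
        return nullptr;
    }
    return &pose.bodyParts[static_cast<std::size_t>(index)];
}
```

Returning a pointer keeps the validity check and the access in one place, so callers only need a single null test before drawing.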
Running the Emotion Classification Sample¶
./samples/tlt_cv/demo_emotion/emotion samples/tlt_cv/demo_emotion/demo.conf
This demo will support multiple people. The visualization will draw a bounding box around each face and text indicating the classified emotion.
Emotion API Usage¶
The TLT CV API for Emotion classification returns one of 7 emotions:
neutral
happy
surprise
squint
disgust
scream
not applicable
The following code snippet is a glimpse of how a developer would use the API
to get emotions. Assuming the initialized pipeline supports an EMOTION
response, let’s access the returned emotions and check if a person is happy.
const auto pipelineType = nja::PipelineType::EMOTION;
nja::TLTCVAPI cvAPI(pipelineType);
auto emotionPayload = cvAPI.getEmotion();
if (emotionPayload)
{
    for (const auto& emotionElem: emotionPayload->group)
    {
        if (emotionElem.emotion == njv::HAPPY)
        {
            std::cout << "Found a Happy person!" << std::endl;
        }
    }
}
The call to getEmotion() returns the latest new result from the Pipeline,
or nullptr if that result was already obtained or none is available. The API
can also block on this call until a result is obtained.
More code showcasing this API is featured in the sample application.
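The "new result once, then nullptr" behavior described for getEmotion() can be modeled with a single-slot mailbox. This is only a sketch of the semantics under the assumption that each result is handed out at most once; the class and its method names are hypothetical and say nothing about how the TLT CV API is implemented internally.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical single-slot mailbox: take() hands out each published result
// once and returns nullptr until a newer result is published.
template <typename T>
class LatestResult
{
public:
    // Called by the producer; overwrites any result that was never taken.
    void publish(T value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::make_unique<T>(std::move(value));
    }

    // Called by the consumer; returns nullptr if nothing new is available.
    std::unique_ptr<T> take()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::move(latest_);  // a moved-from unique_ptr is left empty
    }

private:
    std::mutex mutex_;
    std::unique_ptr<T> latest_;
};
```

With this model, a render loop can call take() every frame and simply skip drawing when it receives nullptr.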
Running the Face Detection Sample¶
./samples/tlt_cv/demo_facedetect/facedetect samples/tlt_cv/demo_facedetect/demo.conf
This demo will support multiple people. The visualization will draw a bounding box around each face.
Face Detection API Usage¶
The TLT CV API for Face Detection returns a bounding box of X, Y, W, H in 2D pixel space of the original image size.
The following code snippet is a glimpse of how a developer would use the API
to get face detections. Assuming the initialized pipeline supports a FACE_DETECT
response, let’s print the coordinates of the bounding boxes.
const auto pipelineType = nja::PipelineType::FACE_DETECT;
nja::TLTCVAPI cvAPI(pipelineType);
auto facePayload = cvAPI.getFaceDetect();
if (facePayload)
{
    for (const auto& elem: facePayload->group)
    {
        const auto& box = elem.box;
        // Box coordinates provided in original image space
        std::cout << "x = " << box.x << std::endl;
        std::cout << "y = " << box.y << std::endl;
        std::cout << "w = " << box.w << std::endl;
        std::cout << "h = " << box.h << std::endl;
    }
}
The call to getFaceDetect() returns the latest new result from the Pipeline,
or nullptr if that result was already obtained or none is available. The API
can also block on this call until a result is obtained.
More code showcasing this API is featured in the sample application.
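One common follow-up to face detection is choosing a single face to focus on. The sketch below picks the box with the largest area from a list of x, y, w, h boxes. The Box struct is a stand-in for the payload's box type, and the selection rule is only an example of consuming the detections, not something the API does for you.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the payload's bounding box (x, y, w, h in image pixels).
struct Box { int x; int y; int w; int h; };

// Returns the index of the box with the largest area, or -1 for an empty list.
inline int largestBoxIndex(const std::vector<Box> &boxes)
{
    int best = -1;
    long long bestArea = -1;
    for (std::size_t i = 0; i < boxes.size(); ++i)
    {
        const long long area = static_cast<long long>(boxes[i].w) * boxes[i].h;
        if (area > bestArea)
        {
            bestArea = area;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```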
Running the Facial Landmarks Estimation Sample¶
./samples/tlt_cv/demo_faciallandmarks/faciallandmarks samples/tlt_cv/demo_faciallandmarks/demo.conf
This demo will support multiple people. The visualization will draw landmarks for each face.
Facial Landmarks API Usage¶
The TLT CV API for Facial Landmarks returns 80 landmarks in 2D pixel space.
The following code snippet is a glimpse of how a developer would use the API
to get the landmarks. Assuming the initialized pipeline supports a FACIAL_LANDMARKS
response, let’s print the coordinates of the landmarks.
const auto pipelineType = nja::PipelineType::FACIAL_LANDMARKS;
nja::TLTCVAPI cvAPI(pipelineType);
auto landmarksPayload = cvAPI.getFacialLandmarks();
if (landmarksPayload)
{
    for (const auto& elem: landmarksPayload->group)
    {
        const auto& landmarks = elem.landmarks;
        for (size_t landmarkIndex = 0; landmarkIndex < landmarks.size(); landmarkIndex++)
        {
            // Assumes each landmark exposes x and y members, like the other 2D points in this API.
            const auto& landmark = landmarks[landmarkIndex];
            std::cout << "index = " << landmarkIndex << "; x = " << landmark.x << "; y = " << landmark.y << std::endl;
        }
    }
}
The call to getFacialLandmarks() returns the latest new result from the Pipeline,
or nullptr if that result was already obtained or none is available. The API
can also block on this call until a result is obtained.
More code showcasing this API is featured in the sample application.
Running the Gaze Estimation Sample¶
./samples/tlt_cv/demo_gaze/gaze samples/tlt_cv/demo_gaze/demo.conf
This demo will support multiple people. The visualization will print out the 3D gaze vector for each face.
The sample application has custom fields for the Perspective-n-Point (PNP) problem. The camera calibration matrix, distortion coefficients, and 3D Facial Landmarks are all provided for the visualization of the gaze vector.
The camera-dependent default values are generated from an off-the-shelf webcam, and for more accurate visualization, users can use their own.
Gaze Estimation API Usage¶
The TLT CV API for Gaze returns a 3D vector in the camera coordinate system. This X, Y, Z location is where the person is looking relative to the camera. The units are millimeters.
The Gaze payload also contains theta and phi outputs, which are independent
of the 3D vector. These outputs are a more general representation of a free-standing
gaze vector in 3D space. When applied to an origin (not provided),
say the pupil center or the center of both eyes, they represent the general gaze
direction, which can be extended to any arbitrary point in front of the user.
This vector can then optionally be used to determine whether it intersects
an object in space, which gives another point of regard.
For improved accuracy, we suggest using the point of regard x, y, z coordinates
instead of the theta and phi outputs.
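Testing whether a gaze direction hits something in space usually comes down to a ray intersection. The sketch below intersects a ray with a plane of constant Z in the camera coordinate system. It assumes the caller already has an origin (for example an eye position in millimeters) and a direction vector; how theta and phi map to that direction is not specified here and is left to the caller. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x; double y; double z; };

// Intersects the ray origin + t * direction (t >= 0) with the plane z = planeZ.
// Returns false when the ray is parallel to the plane or points away from it.
inline bool intersectPlaneZ(const Vec3 &origin, const Vec3 &direction, double planeZ, Vec3 &hit)
{
    if (std::fabs(direction.z) < 1e-9)
    {
        return false;  // ray runs parallel to the plane
    }
    const double t = (planeZ - origin.z) / direction.z;
    if (t < 0.0)
    {
        return false;  // plane lies behind the ray origin
    }
    hit = {origin.x + t * direction.x, origin.y + t * direction.y, planeZ};
    return true;
}
```

For instance, an origin 600 mm in front of the camera with a direction pointing back toward the plane z = 0 yields the point on that plane the ray passes through.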
The following code snippet is a glimpse of how a developer would use the API
to get gaze. Assuming the initialized pipeline supports a GAZE
response, let’s print the gaze vector along with the theta and phi outputs.
const auto pipelineType = nja::PipelineType::GAZE;
nja::TLTCVAPI cvAPI(pipelineType);
auto gazePayload = cvAPI.getGaze();
if (gazePayload)
{
    for (const auto& gazeElem: gazePayload->group)
    {
        const auto& gaze = gazeElem.gaze;
        std::cout << "x = " << gaze.x << std::endl;
        std::cout << "y = " << gaze.y << std::endl;
        std::cout << "z = " << gaze.z << std::endl;
        std::cout << "theta = " << gaze.theta << std::endl;
        std::cout << "phi = " << gaze.phi << std::endl;
    }
}
The call to getGaze() returns the latest new result from the Pipeline,
or nullptr if that result was already obtained or none is available. The API
can also block on this call until a result is obtained.
More code showcasing this API is featured in the sample application.
Running the Gesture Classification Sample¶
./samples/tlt_cv/demo_gesture/gesture samples/tlt_cv/demo_gesture/demo.conf
This demo will support only a single person specified in a ROI. It will only print the classified gesture for a single hand as specified in the config.
Note
This Gesture Demo requires the Body Pose TLT model to be trained and deployed. Body Pose along with heuristics determine a hand bounding box to crop a region for GestureNet.
Ensure BodyPoseNet and its dependencies are loaded as READY during the Triton Server startup:
+----------------------------------+---------+------------------------------------------+
| Model                            | Version | Status                                   |
+----------------------------------+---------+------------------------------------------+
| bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
| bodypose_384x288_postprocess_tlt | 1       | READY                                    |
| bodypose_384x288_tlt             | 1       | READY                                    |
| ...                              | ...     | ...                                      |
| hcgesture_tlt                    | 1       | READY                                    |
+----------------------------------+---------+------------------------------------------+
...
I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002
Gesture Classification API Usage¶
The TLT CV API for Gesture returns one of the following gesture classes and a bounding box for that gesture:
Thumbs up
Fist
Stop
Okay
Two (also known as Raise)
Random
Only a single gesture for a single hand is returned.
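For display or logging, the gesture classes listed above can be mapped to readable labels. The enum below is a stand-in written for this example: only THUMBS_UP appears as an identifier in the API snippet in this section, and the other enumerator names are assumptions.

```cpp
#include <cassert>
#include <string>

// Stand-in enum; every enumerator name except THUMBS_UP is an assumption.
enum class GestureType { THUMBS_UP, FIST, STOP, OKAY, TWO, RANDOM };

// Maps a gesture class to the label used in the list above.
inline std::string gestureLabel(GestureType gesture)
{
    switch (gesture)
    {
    case GestureType::THUMBS_UP: return "Thumbs up";
    case GestureType::FIST:      return "Fist";
    case GestureType::STOP:      return "Stop";
    case GestureType::OKAY:      return "Okay";
    case GestureType::TWO:       return "Two";
    case GestureType::RANDOM:    return "Random";
    }
    return "Unknown";  // reached only for values outside the enum
}
```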
The following code snippet is a glimpse of how a developer would use the API
to get the gesture. Assuming the initialized pipeline supports a GESTURE
response, let’s check the gesture.
const auto pipelineType = nja::PipelineType::GESTURE;
nja::TLTCVAPI cvAPI(pipelineType);
auto gesturePayload = cvAPI.getGesture();
if (gesturePayload)
{
    // For sake of example, assuming only 1 person in frame here.
    const auto &gestElem = gesturePayload->group[0];
    // Ensure validity of gesture element. The struct GesturePayload has a
    // pre-allocated array of MAX_NUM_USER_SUPPORT gestures. However,
    // since we restrict the number of users using a region of interest, only the
    // first element has the possibility of being valid.
    if (gestElem.valid)
    {
        // Bounding box of the relevant gesture, available for drawing
        const auto &box = gestElem.bbox;
        if (gestElem.gesture == njv::GestureType::THUMBS_UP)
        {
            std::cout << "Thumbs up" << std::endl;
        }
    }
}
The call to getGesture() returns the latest new result from the Pipeline,
or nullptr if that result was already obtained or none is available. The API
can also block on this call until a result is obtained.
More code showcasing this API is featured in the sample application.
Running the Heart Rate Estimation Sample¶
./samples/tlt_cv/demo_heartrate/heartrate samples/tlt_cv/demo_heartrate/demo.conf
This demo will support only a single person specified in a ROI. It will only print the estimated heart rate for that person and a bounding box around the face. The person of interest should be well illuminated and keep still, facing the camera directly.
This demo is unique in that it only supports FPS between 15 - 30 and needs your camera handle to support uncompressed YUYV.
Note
Make note of the resolutions and FPS support for your video handle (e.g. using the command v4l2-ctl --list-formats-ext).
Heart Rate Estimation API Usage¶
The TLT CV API for Heart Rate returns a structure with the beats per minute, along with extra booleans and flags to ensure validity. It also returns a bounding box for that person’s face.
Only a single person’s heart rate is returned.
The following code snippet is a glimpse of how a developer would use the API
to get the heart rate. Assuming the initialized pipeline supports a HEART_RATE
response, let’s check the heart rate.
const auto pipelineType = nja::PipelineType::HEART_RATE;
nja::TLTCVAPI cvAPI(pipelineType);
auto payload = cvAPI.getHeartRate();
if (payload)
{
    // For sake of example, assuming only 1 person in frame here.
    const auto& firstPerson = payload->group[0];
    // Ensure validity of heart rate element.
    if (firstPerson.valid)
    {
        // Heart Rate is fragile to poor lighting, so a USB camera will
        // process the frames to increase exposure, contrast, etc.
        // automatically. We check if the FPS is valid within a certain range.
        if (!firstPerson.isFPSValid)
        {
            std::cerr << "Poor Lighting!" << std::endl;
        }
        else
        {
            // Heart Rate is fragile to motion. We use this boolean to
            // determine if the person is available to be estimated.
            if (firstPerson.available)
            {
                std::cout << "Heart Rate = " << std::to_string(firstPerson.heartRate) << std::endl;
            }
        }
    }
}
The call to getHeartRate() returns the latest new result from the Pipeline,
or nullptr if that result was already obtained or none is available. The API
can also block on this call until a result is obtained.
More code showcasing this API is featured in the sample application.
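The nested flag checks in the snippet above can also be flattened into a single status value, which tends to be easier to log or display. The struct below only imitates the fields used in that snippet (valid, isFPSValid, available, heartRate); the status enum and function name are hypothetical.

```cpp
#include <cassert>

// Stand-in for one heart rate element; field names follow the snippet above.
struct HeartRateElem
{
    bool valid;
    bool isFPSValid;
    bool available;
    float heartRate;
};

enum class HeartRateStatus { Invalid, FpsOutOfRange, NotAvailable, Ok };

// Applies the same checks as the snippet, in the same order.
inline HeartRateStatus classify(const HeartRateElem &elem)
{
    if (!elem.valid)
    {
        return HeartRateStatus::Invalid;
    }
    if (!elem.isFPSValid)
    {
        return HeartRateStatus::FpsOutOfRange;  // often a sign of poor lighting
    }
    if (!elem.available)
    {
        return HeartRateStatus::NotAvailable;   // for example, too much motion
    }
    return HeartRateStatus::Ok;
}
```

Only when the status is Ok should the heartRate field be read.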
Building the Sample Applications¶
Source code and CMakeLists have been provided for users to recompile sources.
Users who want to modify the sources are encouraged to use the
volume_mnt_samples field in the Quick Start Script config.sh.
First, ensure the client container is running. Then in another terminal, copy the samples to the host machine:
docker cp image_tlt_cv_client:/workspace/tlt_cv-pkg/samples /path/on/host
Then, stop the client container and modify the config.sh field called
volume_mnt_samples to point to /path/on/host. The next time you start the
client container, the host samples folder will be volume mounted inside the client container.
This allows you to modify the sample code outside of the container while
maintaining a developer workflow inside the container.
Now, to recompile the sources, the user must be inside the client container.
Make a directory to save new binaries:
mkdir -p /workspace/tlt_cv-pkg/samples/tlt_cv_install
Enter the location of our source files:
cd samples/tlt_cv/
Make and enter a build directory for CMake:
mkdir -p build && cd build
Build and install the new binaries:
cmake -DPROJECT_ROOT:STRING=/workspace/tlt_cv-pkg/ -DINSTALL_DIR=/workspace/tlt_cv-pkg/samples/tlt_cv_install/ ..
make install
Note
If using CMake on a Jetson device, add the additional flag -DTARGET=1, which results in cmake -DTARGET=1 ...
Verify binaries exist in the folder:
ls -al /workspace/tlt_cv-pkg/samples/tlt_cv_install/tlt_cv/
These binaries can be run just like the precompiled binaries we provide, but they must still be run with respect to the folder /workspace/tlt_cv-pkg:
cd /workspace/tlt_cv-pkg
./samples/tlt_cv_install/tlt_cv/demo_facedetect/facedetect samples/tlt_cv_install/tlt_cv/demo_facedetect/demo.conf