.. _tlt_cv_inference_pipeline_sample_applications:

Running and Building Sample Applications
========================================

Enter the client container using the :ref:`TLT CV Quick Start Scripts`. Sample applications are
available to run along with their source code. Each sample application follows the format:

.. code-block:: bash

   ./path/to/binary path/to/config/file

All binaries should be run relative to the folder that the container opens to by default
(:code:`/workspace/tlt_cv-pkg`). Below are instructions for running the out-of-the-box samples.
The config files all assume the video handle :code:`/dev/video0` has been added to the Quick Start
configuration and will be opened at the resolution and FPS specified.

The config files contain the following common fields:

* :code:`video_path`: The device handle or an absolute path to a video file.
* :code:`fps`: The frames per second at which to open :code:`video_path`. Ensure your device can handle this (see the check below).
* :code:`is_video_path_file`: A boolean :code:`true` or :code:`false` indicating whether :code:`video_path` is a file.
* :code:`resolution_whc`: The resolution (width, height, channels) at which to open :code:`video_path`. Ensure your device can handle this.
* :code:`roi_xywh`: A region of interest (x, y, width, height) for applications that work with a single person.
* :code:`visualization`: A boolean :code:`true` or :code:`false` indicating whether to open a window for visualization.
* :code:`use_right_hand`: A boolean :code:`true` or :code:`false` indicating whether to run inference on the right hand (specific to the gesture application).
* :code:`use_decoded_image_api`: A boolean :code:`true` or :code:`false` indicating whether to use an API that sends single decoded image buffers instead of using a device/video handle within the Pipeline.
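The :code:`fps` and :code:`resolution_whc` values must correspond to a mode your camera actually
supports. One way to check this (using the same :code:`v4l2-ctl` utility mentioned in the Heart Rate
section below; the device path here is an assumption) is:

.. code-block:: bash

   # List the available video devices, then the formats, resolutions, and frame
   # rates offered by the device you plan to reference in the config file.
   v4l2-ctl --list-devices
   v4l2-ctl -d /dev/video0 --list-formats-ext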
TLT CV Inference Pipelines
^^^^^^^^^^^^^^^^^^^^^^^^^^

Instantiations of TLT CV Inference Pipelines enable users to access fused inferences. The table below
shows the support matrix for these inferences and requests. Each Pipeline uses its respective TLT
network and its dependencies. The sample applications demonstrate the usage of these Pipelines.

======================== =============================================================================
CV Pipeline Enum         Available Responses
======================== =============================================================================
:code:`BODY_POSE`        :code:`FRAME`, :code:`BODY_POSE`
:code:`EMOTION`          :code:`FRAME`, :code:`FACE_DETECT`, :code:`FACIAL_LANDMARKS`, :code:`EMOTION`
:code:`FACE_DETECT`      :code:`FRAME`, :code:`FACE_DETECT`
:code:`FACIAL_LANDMARKS` :code:`FRAME`, :code:`FACE_DETECT`, :code:`FACIAL_LANDMARKS`
:code:`GAZE`             :code:`FRAME`, :code:`FACE_DETECT`, :code:`FACIAL_LANDMARKS`, :code:`GAZE`
:code:`GESTURE`          :code:`FRAME`, :code:`BODY_POSE`, :code:`GESTURE`
:code:`HEART_RATE`       :code:`FRAME`, :code:`FACE_DETECT`, :code:`HEART_RATE`
======================== =============================================================================

For Pipelines that support multiple responses (e.g. Emotion supports obtaining face detections and
landmarks), the ordering of the inferences is preserved per person. That is, the first emotion
corresponds to the first face in the face detection result.

.. _tlt_cv_inference_pipeline_sample_applications_body_pose_estimation:

Running the Body Pose Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_bodypose/bodypose samples/tlt_cv/demo_bodypose/demo.conf

This demo supports multiple people. The visualization will draw joints and lines on body parts as
well as a bounding box over each body.

.. Note::

   This Body Pose Estimation Demo requires the Body Pose TLT model to be trained and deployed. Ensure
   BodyPoseNet and its dependencies are loaded as :code:`READY` during the Triton Server startup:

   .. code-block:: bash

      +----------------------------------+---------+------------------------------------------+
      | Model                            | Version | Status                                   |
      +----------------------------------+---------+------------------------------------------+
      | bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
      | bodypose_384x288_postprocess_tlt | 1       | READY                                    |
      | bodypose_384x288_tlt             | 1       | READY                                    |
      | ...                              | ...     | ...                                      |
      | hcgesture_tlt                    | 1       | READY                                    |
      +----------------------------------+---------+------------------------------------------+
      ...
      I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
      I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
      I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002

Body Pose Configuration
-----------------------

If deploying a non-default shape BodyPoseNet, ensure the appropriate variants are loaded as
:code:`READY`. For example, if you created a model for a network input size of :code:`width = 320`
and :code:`height = 224`, ensure that :code:`bodypose_320x224_*` is loaded as :code:`READY`
(a client-side check is shown below). You also need to modify a configuration file inside the client
container:

.. code-block:: bash

   CURRENT_WIDTH=384
   DESIRED_WIDTH=320
   CURRENT_HEIGHT=288
   DESIRED_HEIGHT=224
   sed -i "s/${CURRENT_WIDTH}/${DESIRED_WIDTH}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json
   sed -i "s/${CURRENT_HEIGHT}/${DESIRED_HEIGHT}/g" pipelines/vision/subgraphs/bodypose2d_tlt.subgraph.json

If you would like to make this change manually, modify the following locations:

- The preprocessor output shape configuration or :code:`output_image_meta`
- The resize operation for :code:`ResizeNormalizeFP32Image`
- The Triton :code:`model_name` to request inference, which is the ensemble :code:`bodypose_320x224_ensemble_tlt`
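To confirm from the client side which model variants are currently loaded, Triton exposes a
per-model readiness endpoint over the HTTP service shown in the startup log above (port 8000). This
is a sketch only: the host name is an assumption, so substitute the address of your Triton Server
container:

.. code-block:: bash

   # Returns HTTP status 200 if the model is loaded and READY; the model name is
   # the 320x224 ensemble used as the example above.
   curl -s -o /dev/null -w "%{http_code}\n" \
       http://localhost:8000/v2/models/bodypose_320x224_ensemble_tlt/ready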
.. Note::

   This Body Pose Estimation Demo cannot consume a BodyPoseNet with fewer trained joints. By default,
   BodyPoseNet will estimate 18 joints. Retraining with 14 joints and deploying in this inference
   pipeline is not supported.

Body Pose API Usage
-------------------

The TLT CV API for Body Pose Estimation returns up to 18 body joints (nose, neck, shoulders, elbows,
wrists, knees, ankles, hips, eyes, and ears) in 2D pixel space. The returned structure also contains
a bounding box over the joints.

The following code snippet is a glimpse of how a developer would use the API to get a body pose.
Assuming the initialized pipeline supports a :code:`BODY_POSE` response, let's get the coordinates of
the noses of the bodies in frame. First, we must check whether the nose is present in the body by
obtaining its index. Then, we can access the coordinates using the index and draw the point on an
image.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::BODY_POSE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto posePayload = cvAPI.getBodyPose();
   if (posePayload)
   {
       auto &poses = posePayload->group;
       for (auto const &pose : poses)
       {
           auto joint = njv::NOSE; // desired joint
           using joint_t = std::underlying_type<decltype(joint)>::type;

           // Find the index of the body part in the payload vector using `bodyPartPosition`.
           // If the value is < 0, then that body part is invalid, so we can skip drawing.
           int32_t indexJoint = pose.bodyPartPosition[static_cast<joint_t>(joint)];
           if (indexJoint < 0) // check validity of joint
           {
               continue;
           }

           auto const &bodyPart = pose.bodyParts[indexJoint];
           auto pt = Point(bodyPart.x, bodyPart.y); // Point/drawPoint are illustrative drawing helpers
           drawPoint(image, pt);
       }
   }

The call to :code:`getBodyPose()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

Running the Emotion Classification Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_emotion/emotion samples/tlt_cv/demo_emotion/demo.conf

This demo supports multiple people. The visualization will draw a bounding box around each face and
text indicating the classified emotion.

Emotion API Usage
-----------------

The TLT CV API for Emotion classification returns one of 7 emotions:

* neutral
* happy
* surprise
* squint
* disgust
* scream
* not applicable

The following code snippet is a glimpse of how a developer would use the API to get emotions.
Assuming the initialized pipeline supports an :code:`EMOTION` response, let's access the returned
emotions and check whether a person is happy.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::EMOTION;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto emotionPayload = cvAPI.getEmotion();
   if (emotionPayload)
   {
       for (const auto &emotionElem : emotionPayload->group)
       {
           if (emotionElem.emotion == njv::HAPPY)
           {
               std::cout << "Found a Happy person!" << std::endl;
           }
       }
   }

The call to :code:`getEmotion()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.
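Because :code:`getEmotion()` returns :code:`nullptr` when no new result is available, a non-blocking
caller typically polls. The sketch below counts happy faces across frames; the sleep interval and
stop condition are our own additions for illustration, not part of the API:

.. code-block:: c++

   #include <chrono>
   #include <thread>

   const auto pipelineType = nja::PipelineType::EMOTION;
   nja::TLTCVAPI cvAPI(pipelineType);

   size_t happyFaces = 0;
   while (happyFaces < 10) // arbitrary stop condition for this sketch
   {
       auto emotionPayload = cvAPI.getEmotion();
       if (!emotionPayload)
       {
           // No new result yet (or the latest result was already consumed).
           std::this_thread::sleep_for(std::chrono::milliseconds(10));
           continue;
       }
       for (const auto &emotionElem : emotionPayload->group)
       {
           if (emotionElem.emotion == njv::HAPPY)
           {
               ++happyFaces;
           }
       }
   }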
Running the Face Detection Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_facedetect/facedetect samples/tlt_cv/demo_facedetect/demo.conf

This demo supports multiple people. The visualization will draw a bounding box around each face.

Face Detection API Usage
------------------------

The TLT CV API for Face Detection returns a bounding box of X, Y, W, H in 2D pixel space of the
original image size.

The following code snippet is a glimpse of how a developer would use the API to get face detections.
Assuming the initialized pipeline supports a :code:`FACE_DETECT` response, let's print the
coordinates of the bounding boxes.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::FACE_DETECT;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto facePayload = cvAPI.getFaceDetect();
   if (facePayload)
   {
       for (const auto &elem : facePayload->group)
       {
           const auto &box = elem.box;
           // Box coordinates provided in original image space
           std::cout << "x = " << box.x << std::endl;
           std::cout << "y = " << box.y << std::endl;
           std::cout << "w = " << box.w << std::endl;
           std::cout << "h = " << box.h << std::endl;
       }
   }

The call to :code:`getFaceDetect()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

Running the Facial Landmarks Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_faciallandmarks/faciallandmarks samples/tlt_cv/demo_faciallandmarks/demo.conf

This demo supports multiple people. The visualization will draw landmarks for each face.

Facial Landmarks API Usage
--------------------------

The TLT CV API for Facial Landmarks returns 80 landmarks in 2D pixel space.

The following code snippet is a glimpse of how a developer would use the API to get the landmarks.
Assuming the initialized pipeline supports a :code:`FACIAL_LANDMARKS` response, let's print the
coordinates of the landmarks.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::FACIAL_LANDMARKS;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto landmarksPayload = cvAPI.getFacialLandmarks();
   if (landmarksPayload)
   {
       for (const auto &elem : landmarksPayload->group)
       {
           const auto &landmarks = elem.landmarks;
           for (size_t landmarkIndex = 0; landmarkIndex < landmarks.size(); landmarkIndex++)
           {
               std::cout << "index = " << landmarkIndex
                         << "; x = " << landmarks[landmarkIndex].x
                         << "; y = " << landmarks[landmarkIndex].y << std::endl;
           }
       }
   }

The call to :code:`getFacialLandmarks()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.
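Because the landmarks are returned in the original image's pixel space, simple geometry can be
computed from them directly. As an illustration (not part of the API), the following computes an
axis-aligned bounding box over one face's landmarks, assuming each landmark exposes :code:`x` and
:code:`y` as in the snippet above:

.. code-block:: c++

   #include <algorithm>

   // `landmarks` is the per-face container from the payload shown above.
   if (!landmarks.empty())
   {
       auto xMin = landmarks[0].x, xMax = landmarks[0].x;
       auto yMin = landmarks[0].y, yMax = landmarks[0].y;
       for (const auto &lm : landmarks)
       {
           xMin = std::min(xMin, lm.x);
           xMax = std::max(xMax, lm.x);
           yMin = std::min(yMin, lm.y);
           yMax = std::max(yMax, lm.y);
       }
       // Tight box around the face's landmarks, in pixels.
       std::cout << "landmark box: " << (xMax - xMin) << " x " << (yMax - yMin) << std::endl;
   }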
Running the Gaze Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_gaze/gaze samples/tlt_cv/demo_gaze/demo.conf

This demo supports multiple people. The visualization will print out the 3D gaze vector for each
face.

The sample application has custom fields for the Perspective-n-Point (PnP) problem. The camera
calibration matrix, distortion coefficients, and 3D facial landmarks are all provided for the
visualization of the gaze vector. The default values are camera dependent and were generated from an
off-the-shelf webcam; for more accurate visualization, users can supply their own.

Gaze Estimation API Usage
-------------------------

The TLT CV API for Gaze returns a 3D vector in the camera coordinate system. This X, Y, Z location is
where the person is looking relative to the camera. The units are millimeters.

The Gaze payload also contains :code:`theta` and :code:`phi` outputs, which are independent of the 3D
vector. They are a more general representation of a free-standing gaze vector in 3D space. When
applied to an origin (not provided), say the pupil center or the center of both eyes, they represent
the general gaze direction, which can be extended to any arbitrary point in front of the user. This
vector can then optionally be used to determine whether it intersects an object in space, yielding
another point of regard. For improved accuracy, we suggest using the point-of-regard
:code:`x, y, z` coordinates instead of the :code:`theta` and :code:`phi` outputs.

The following code snippet is a glimpse of how a developer would use the API to get gaze. Assuming
the initialized pipeline supports a :code:`GAZE` response, let's print the gaze values.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::GAZE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto gazePayload = cvAPI.getGaze();
   if (gazePayload)
   {
       for (const auto &gazeElem : gazePayload->group)
       {
           const auto &gaze = gazeElem.gaze;
           std::cout << "x = " << gaze.x << std::endl;
           std::cout << "y = " << gaze.y << std::endl;
           std::cout << "z = " << gaze.z << std::endl;
           std::cout << "theta = " << gaze.theta << std::endl;
           std::cout << "phi = " << gaze.phi << std::endl;
       }
   }

The call to :code:`getGaze()` will return the latest new result from the Pipeline or :code:`nullptr`
if already obtained or unavailable. The API also provides the ability to block on this call until a
result is obtained. More code showcasing this API is featured in the sample application.
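To apply :code:`theta` and :code:`phi` to an origin of your choosing (for example, the pupil center),
they can be converted into a unit direction vector. The angle convention is not spelled out here, so
treat the mapping below as an assumption to verify against your deployment; the conversion itself is
ordinary spherical-to-Cartesian math:

.. code-block:: c++

   #include <cmath>

   // Hypothetical helper: turn the gaze angles into a unit direction vector in the
   // camera coordinate system. ASSUMPTION: theta is pitch (up/down) and phi is yaw
   // (left/right); verify the axis and sign convention for your model before use.
   struct GazeDirection { float x; float y; float z; };

   GazeDirection anglesToDirection(float theta, float phi)
   {
       GazeDirection d;
       d.x = -std::cos(theta) * std::sin(phi);
       d.y = -std::sin(theta);
       d.z = -std::cos(theta) * std::cos(phi);
       return d; // unit length for any theta and phi
   }

Extending this direction from the chosen origin gives the ray that can be intersected with objects in
the scene, as described above.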
Running the Gesture Classification Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_gesture/gesture samples/tlt_cv/demo_gesture/demo.conf

This demo supports only a single person, specified by an ROI. It will only print the classified
gesture for a single hand, as specified in the config.

.. Note::

   This Gesture Demo requires the Body Pose TLT model to be trained and deployed. Body Pose, along
   with heuristics, determines a hand bounding box to crop a region for GestureNet. Ensure
   BodyPoseNet and its dependencies are loaded as :code:`READY` during the Triton Server startup:

   .. code-block:: bash

      +----------------------------------+---------+------------------------------------------+
      | Model                            | Version | Status                                   |
      +----------------------------------+---------+------------------------------------------+
      | bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
      | bodypose_384x288_postprocess_tlt | 1       | READY                                    |
      | bodypose_384x288_tlt             | 1       | READY                                    |
      | ...                              | ...     | ...                                      |
      | hcgesture_tlt                    | 1       | READY                                    |
      +----------------------------------+---------+------------------------------------------+
      ...
      I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
      I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
      I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002

Gesture Classification API Usage
--------------------------------

The TLT CV API for Gesture returns one of the following gestures and a bounding box for that gesture:

* Thumbs up
* Fist
* Stop
* Okay
* Two (also known as Raise)
* Random

Only a single gesture for a single hand is returned.

The following code snippet is a glimpse of how a developer would use the API to get the gesture.
Assuming the initialized pipeline supports a :code:`GESTURE` response, let's check the gesture.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::GESTURE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto gesturePayload = cvAPI.getGesture();
   if (gesturePayload)
   {
       // For the sake of example, assume only 1 person in frame here.
       const auto &gestElem = gesturePayload->group[0];

       // Ensure validity of the gesture element. The struct GesturePayload has a
       // pre-allocated array of MAX_NUM_USER_SUPPORT gestures. However, since we
       // restrict the number of users using a region of interest, only the first
       // element has the possibility of being valid.
       if (gestElem.valid)
       {
           // Bounding box of the relevant gesture (drawing omitted here)
           const auto &box = gestElem.bbox;
           if (gestElem.gesture == njv::GestureType::THUMBS_UP)
           {
               std::cout << "Thumbs up" << std::endl;
           }
       }
   }

The call to :code:`getGesture()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

.. _running_hr_sample:

Running the Heart Rate Estimation Sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   ./samples/tlt_cv/demo_heartrate/heartrate samples/tlt_cv/demo_heartrate/demo.conf

This demo supports only a single person, specified by an ROI. It will only print the estimated heart
rate for that person and a bounding box around the face. The person of interest should be well
illuminated, keep still, and face the camera directly. This demo is unique in that it only supports
FPS between 15 and 30 and needs your camera handle to support uncompressed YUYV.

.. Note::

   Make note of the resolutions and FPS supported by your video handle (e.g. using the command
   :code:`v4l2-ctl --list-formats-ext`).
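For example, to confirm that the camera offers an uncompressed YUYV mode at a frame rate within the
supported 15-30 range, you can query the frame sizes and intervals for that pixel format (the device
path and the 640x480 size below are assumptions; use the values you intend to put in the config):

.. code-block:: bash

   # Frame sizes available in the YUYV pixel format.
   v4l2-ctl -d /dev/video0 --list-framesizes=YUYV
   # Frame intervals (and therefore FPS) for a specific YUYV frame size.
   v4l2-ctl -d /dev/video0 --list-frameintervals=width=640,height=480,pixelformat=YUYV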
Heart Rate Estimation API Usage
-------------------------------

The TLT CV API for Heart Rate returns a structure with the beats per minute, along with extra
booleans and flags to check validity. It also returns a bounding box for that person's face. Only a
single person's heart rate is returned.

The following code snippet is a glimpse of how a developer would use the API to get the heart rate.
Assuming the initialized pipeline supports a :code:`HEART_RATE` response, let's check the heart rate.

.. code-block:: c++

   const auto pipelineType = nja::PipelineType::HEART_RATE;
   nja::TLTCVAPI cvAPI(pipelineType);

   auto payload = cvAPI.getHeartRate();
   if (payload)
   {
       // For the sake of example, assume only 1 person in frame here.
       const auto &firstPerson = payload->group[0];

       // Ensure validity of the heart rate element.
       if (firstPerson.valid)
       {
           // Heart Rate is fragile to poor lighting, so a USB camera will process
           // the frames to increase exposure, contrast, etc. automatically. We
           // check whether the FPS is valid within a certain range.
           if (!firstPerson.isFPSValid)
           {
               std::cerr << "Poor Lighting!" << std::endl;
           }
           else
           {
               // Heart Rate is fragile to motion. We use this boolean to determine
               // whether the person is available to be estimated.
               if (firstPerson.available)
               {
                   std::cout << "Heart Rate = " << std::to_string(firstPerson.heartRate) << std::endl;
               }
           }
       }
   }

The call to :code:`getHeartRate()` will return the latest new result from the Pipeline or
:code:`nullptr` if already obtained or unavailable. The API also provides the ability to block on
this call until a result is obtained. More code showcasing this API is featured in the sample
application.

Building the Sample Applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Source code and CMakeLists have been provided for users to recompile the sources. Users who want to
modify the sources should leverage the Quick Start Script :code:`config.sh` field called
:code:`volume_mnt_samples`. First, ensure the client container is running. Then, in another terminal,
copy the samples to the host machine:

.. code-block:: bash

   docker cp image_tlt_cv_client:/workspace/tlt_cv-pkg/samples /path/on/host

Then, stop the client container and modify the :code:`config.sh` field called
:code:`volume_mnt_samples` to point to :code:`/path/on/host`. The next time you start the client
container, the host samples folder will be volume mounted inside the client container. This allows
the user to modify the sample code outside of the container while maintaining a developer workflow
inside the container.

Now, to recompile the sources, the user must be inside the client container.

1. Make a directory to save the new binaries:

   .. code-block:: bash

      mkdir -p /workspace/tlt_cv-pkg/samples/tlt_cv_install

2. Enter the location of the source files:

   .. code-block:: bash

      cd samples/tlt_cv/

3. Make and enter a build directory for CMake:

   .. code-block:: bash

      mkdir -p build && cd build

4. Build and install the new binaries:

   .. code-block:: bash

      cmake -DPROJECT_ROOT:STRING=/workspace/tlt_cv-pkg/ -DINSTALL_DIR=/workspace/tlt_cv-pkg/samples/tlt_cv_install/ ..
      make install

   .. Note::

      If using CMake on a Jetson device, add an *additional* flag :code:`-DTARGET=1`, which will
      result in :code:`cmake -DTARGET=1 ...`

5. Verify that the binaries exist in the folder:

   .. code-block:: bash

      ls -al /workspace/tlt_cv-pkg/samples/tlt_cv_install/tlt_cv/

6. These binaries can be run just like the precompiled binaries we provide, but they must still be
   run with respect to the folder :code:`/workspace/tlt_cv-pkg`:

   .. code-block:: bash

      cd /workspace/tlt_cv-pkg
      ./samples/tlt_cv_install/tlt_cv/demo_facedetect/facedetect samples/tlt_cv_install/tlt_cv/demo_facedetect/demo.conf
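For convenience, the numbered steps above can be run as a single block from the container's default
directory. This simply chains the same commands together; on a Jetson device, add :code:`-DTARGET=1`
to the :code:`cmake` invocation as noted above:

.. code-block:: bash

   cd /workspace/tlt_cv-pkg
   mkdir -p samples/tlt_cv_install
   cd samples/tlt_cv
   mkdir -p build && cd build
   cmake -DPROJECT_ROOT:STRING=/workspace/tlt_cv-pkg/ \
         -DINSTALL_DIR=/workspace/tlt_cv-pkg/samples/tlt_cv_install/ ..
   make install
   ls -al /workspace/tlt_cv-pkg/samples/tlt_cv_install/tlt_cv/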