TLT CV Inference Pipeline Quick Start Scripts
=============================================

.. _tlt_cv_quick_start_scripts:

This page describes how to use the TLT CV Inference Pipeline Quick Start Scripts after the :ref:`Installation Prerequisites` are installed. The Quick Start Scripts hide container downloads, updates, model compilation, and more.

Here is a flow diagram of the Quick Start process:

.. image:: ../../content/tlt_cv_inf_pipeline_quick_start_flow.png

First, navigate to the directory that houses the scripts and ensure that they are executable:

.. code-block:: bash

    cd scripts
    chmod +x *.sh

All the scripts should be executed within this :code:`scripts` directory. These scripts will automatically pull containers and models for x86 or aarch64 (Jetson).

Configuration
^^^^^^^^^^^^^

General configuration for the containers deployed using the Quick Start Scripts can be viewed in the :code:`config.sh` file. By default, the configuration file is set to launch all available containers on the supported GPU, which is selected automatically based on the system architecture.

If you would like to use a video handle, ensure your video device handle (for example, :code:`/dev/video0`) has been entered in :code:`config.sh` to make it discoverable to the relevant Client container.

.. Note::

    Make note of the resolutions and FPS supported by your video handle (e.g. using the command :code:`v4l2-ctl --list-formats-ext`).

Models are automatically downloaded to the host machine at the location (absolute path) specified by the variable :code:`models_location` inside :code:`config.sh`. This location becomes important in the context of retraining and replacing the TensorRT models.

By default, deployable TLT models come encrypted with their own keys. The keys listed in the config are specific to these models and do not need to be modified unless you wish to work with retrained and re-encrypted TLT models.

The :code:`config.sh` file also contains a field to specify a volume mount for the sample applications. This is useful if you want to modify the applications and save the new source to the host machine rather than to the container (which, if exited, can result in loss of modifications).

All of the configuration options are documented within the configuration file itself.

.. Note::

    The NVIDIA Triton server will be listening/broadcasting on ports :code:`8001` for gRPC, :code:`8000` for HTTP, and :code:`8002` for Triton metrics.

Initialization
^^^^^^^^^^^^^^

Run the following:

.. code-block:: bash

    bash tlt_cv_init.sh

The :code:`tlt_cv_init.sh` script will pull all the relevant containers and models to the machine. It will also download specific third-party dependencies into the client container. Successful completion of this download will result in the following:

.. code-block:: bash

    [INFO] Finished pulling containers and models

The script will then compile the TLT models into TensorRT models to deploy for the NVIDIA Triton Server. This step will take up to 10 minutes as it compiles all the TLT models for the Inference Pipeline. Upon successful completion, you will see the following:

.. code-block:: bash

    [INFO] SUCCESS: Proceed to 'tlt_cv_start_server.sh'
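If you would like to confirm what was downloaded, you can optionally list the models location configured in :code:`config.sh`. This is only a sanity check; the exact directory names vary by release and architecture:

.. code-block:: bash

    # List the host directory that tlt_cv_init.sh populated with models.
    source config.sh
    ls "${models_location}"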
Launching the Server and Client Containers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Run the following:

.. code-block:: bash

    bash tlt_cv_start_server.sh

This will launch the NVIDIA Triton Server for inference requests. To verify that the server has started correctly, check whether the output shows the following:

.. Note::

    By default, BodyPoseNet and its accompanying dependencies are not loaded since the user must train and deploy their own model. Only :code:`bodypose_384x288_ensemble_tlt` and :code:`bodypose_384x288_tlt` should not have the :code:`READY` state. For more information on how to deploy a trained BodyPoseNet model, refer to the :ref:`Integrating Body Pose Estimation into the TLT CV Inference Pipeline` section.

.. code-block:: bash

    +----------------------------------+---------+------------------------------------------------------------------------------------------+
    | Model                            | Version | Status                                                                                     |
    +----------------------------------+---------+------------------------------------------------------------------------------------------+
    | bodypose_384x288_ensemble_tlt    | -       | Not loaded: No model version was found                                                     |
    | bodypose_384x288_postprocess_tlt | 1       | READY                                                                                      |
    | bodypose_384x288_tlt             | 1       | UNAVAILABLE: Internal: unable to find PLAN model 'model.plan' for bodypose_384x288_tlt     |
    | emotionmlp_tlt                   | 1       | READY                                                                                      |
    | facedetect_ensemble_tlt          | 1       | READY                                                                                      |
    | facedetect_postprocess_tlt       | 1       | READY                                                                                      |
    | facedetect_tlt                   | 1       | READY                                                                                      |
    | faciallandmarks_tlt              | 1       | READY                                                                                      |
    | gaze_facegrid_tlt                | 1       | READY                                                                                      |
    | hcgesture_tlt                    | 1       | READY                                                                                      |
    | heartrate_two_branch_tlt         | 1       | READY                                                                                      |
    +----------------------------------+---------+------------------------------------------------------------------------------------------+
    ...
    E0428 23:20:38.947826 1 tritonserver.cc:1629] Internal: failed to load all models
    I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
    I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
    I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002

To stop the server, use :code:`ctrl-c` in the relevant terminal.

Next, in another terminal, run the following:

.. code-block:: bash

    bash tlt_cv_start_client.sh

This will open an interactive container session with sample applications and all the necessary libraries. For more information regarding the Inference Pipeline sample applications, refer to :ref:`Running and Building Sample Applications`.

Stopping
^^^^^^^^

To stop active containers, run the following:

.. code-block:: bash

    bash tlt_cv_stop.sh

Cleaning
^^^^^^^^

To clean your machine of containers and/or models that were downloaded at init, run the following and follow the prompts:

.. code-block:: bash

    bash tlt_cv_clean.sh

Deploying TLT Models
^^^^^^^^^^^^^^^^^^^^

A utility script :code:`tlt_cv_compile.sh` is provided to simplify the deployment of TLT models into the Inference Pipeline. The models are downloaded to the host system in the :code:`models_location` specified in :code:`config.sh`. Simply replace the default "deployable" model with the newly-trained ETLT model in the respective :code:`tlt_*/` folder while preserving the name, and run the corresponding command from the sections that follow for the new model.

For ease of use, save the encoding key beforehand:

.. code-block:: bash

    export ENCODING_KEY=

.. Note::

    Remember to rename the new TLT model to match the name of the default model that is already present.

.. Note::

    Default encoding keys for the original deployable TLT models exist in :code:`config.sh`.

The NVIDIA Triton Server points to the :code:`models_location`, so during the next :code:`tlt_cv_start_server.sh` call, the newly deployed TensorRT model will serve inferences.
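As an end-to-end illustration of this flow, the sketch below redeploys a retrained EmotionNet model. The :code:`.etlt` file path is a placeholder, and the encoding key must match the one used to export the retrained model; the sections that follow give the exact per-model folders and flags:

.. code-block:: bash

    # Sketch only: run from the Quick Start scripts directory.
    source config.sh
    # Replace the default deployable model, preserving the expected file name.
    cp /path/to/my_retrained_emotion.etlt ${models_location}/tlt_emotionnet_v${tlt_model_version_emotion}/model.etlt
    # Convert the ETLT model into a TensorRT engine in the Triton models location.
    bash tlt_cv_compile.sh -m emotion -k $ENCODING_KEY
    # Relaunch the server so the new engine serves inferences.
    bash tlt_cv_start_server.sh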
Body Pose Estimation
--------------------

.. _tlt_cv_inf_body_pose_estimation:

BodyPoseNet must be trained and deployed by the user. Let us say we have a new body pose estimation TLT model that we would like to deploy for int8 or fp16 inference. Follow these steps to deploy it into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_bpnet_custom

3. Add the new TLT model in this folder and name it :code:`model.etlt`.

4. (Optional) For int8 calibration, place the calibration file in this same folder and name it :code:`calibration.bin`.

5. Change directory back to the Quick Start Scripts folder and run one of the following commands, which use :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline. Choose the command based on the existence of the calibration file or your precision preference. Since the width and height of BodyPoseNet are flexible, you can alter the defaults using the :code:`-w` and :code:`-h` flags. Refer to the :ref:`Body Pose Estimation Configuration` section for more details on further steps when modifying the default values.

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m bodypose_int8 -k $ENCODING_KEY -w 384 -h 288
      bash tlt_cv_compile.sh -m bodypose_fp16 -k $ENCODING_KEY -w 384 -h 288

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

6. Start the Triton Server again and ensure the startup is successful.

   .. code-block:: bash

      +----------------------------------+---------+------------------------------------------+
      | Model                            | Version | Status                                   |
      +----------------------------------+---------+------------------------------------------+
      | ...                              | ...     | ...                                      |
      | bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
      | bodypose_384x288_postprocess_tlt | 1       | READY                                    |
      | bodypose_384x288_tlt             | 1       | READY                                    |
      | ...                              | ...     | ...                                      |

Note that this deployment will overwrite any previous :code:`model.plan` that exists in this folder in the model repository, and that this location is shared between the int8 and fp16 versions.

Body Pose Estimation Configuration
""""""""""""""""""""""""""""""""""

.. _tlt_cv_inf_body_pose_estimation_configuration:

The BodyPoseNet TLT model can be converted with a wide set of width and height parameters; refer to the training documentation for constraints. With the Quick Start script :code:`tlt_cv_compile.sh`, the width and height flags simplify this conversion step.

Let us say we want a smaller network input size of :code:`width = 320` and :code:`height = 224`. The first step is to copy the default configurations in Triton to their own folders with a different shape for easier maintainability:

.. code-block:: bash

    source config.sh
    pushd ${models_location}/triton_model_repository/repository
    cp -r bodypose_384x288_ensemble_tlt bodypose_320x224_ensemble_tlt
    cp -r bodypose_384x288_postprocess_tlt bodypose_320x224_postprocess_tlt
    cp -r bodypose_384x288_tlt bodypose_320x224_tlt

Next, we must modify the :code:`config.pbtxt` files for these newly created Triton folders:

1. Use a text editor to modify the ensemble config with a simple find-and-replace of the default width and height:
   .. code-block:: bash

      sed -i 's/384/320/g' bodypose_320x224_ensemble_tlt/config.pbtxt
      sed -i 's/288/224/g' bodypose_320x224_ensemble_tlt/config.pbtxt

   If you would like to perform this manually, the following fields would need the substitutions:

   - The name of the model
   - The input shape, which should be [Height, Width, Channel]
   - The ensemble schedule for the TensorRT model name
   - The ensemble schedule for the postprocess model name

2. Use a text editor to modify the TensorRT config, using a simple find-and-replace for the default width and height and a manual update for the output shapes (a consolidated sketch of the updated output block appears at the end of this section).

   .. code-block:: bash

      sed -i 's/384/320/g' bodypose_320x224_tlt/config.pbtxt
      sed -i 's/288/224/g' bodypose_320x224_tlt/config.pbtxt

   If you would like to perform this manually, the following fields would need the substitutions:

   - The name of the model
   - The input shape, which should be [Height, Width, Channel]

   Next, you need to manually calculate the output dimensions:

   .. code-block:: bash

      vim bodypose_320x224_tlt/config.pbtxt

   The network output :code:`conv2d_transpose_1/BiasAdd:0` has dimension :code:`[Network_Input_Height/2, Network_Input_Width/2, 38]`, and the network output :code:`heatmap_out/BiasAdd:0` has dimension :code:`[Network_Input_Height/8, Network_Input_Width/8, 19]`. These become :code:`[224/2, 320/2, 38]` and :code:`[224/8, 320/8, 19]`, so the final configuration is :code:`[112, 160, 38]` and :code:`[28, 40, 19]`, respectively.

3. Use a text editor to modify the postprocess config, using a simple find-and-replace for the default width and height and a manual update for the input shapes.

   .. code-block:: bash

      sed -i 's/384/320/g' bodypose_320x224_postprocess_tlt/config.pbtxt
      sed -i 's/288/224/g' bodypose_320x224_postprocess_tlt/config.pbtxt

   If you would like to perform this manually, the following field would need the substitution:

   - The name of the model

   Next, perform the same manual calculation for the postprocess input dimensions:

   .. code-block:: bash

      vim bodypose_320x224_postprocess_tlt/config.pbtxt

   The dimensions for the input :code:`input_pafmap` correspond to the previous :code:`conv2d_transpose_1/BiasAdd:0`, and the dimensions for the input :code:`input_heatmap` correspond to the previous :code:`heatmap_out/BiasAdd:0`. The input dimensions would look like the following:

   .. code-block:: bash

      input [
        {
          name: "input_pafmap"
          data_type: TYPE_FP32
          dims: [112, 160, 38]
        },
        {
          name: "input_heatmap"
          data_type: TYPE_FP32
          dims: [28, 40, 19]
        }
      ]

With the configuration for Triton complete, you can move on to compiling the model. Assuming you have trained a model and obtained a calibration file per the prior instructions, you can compile the model as follows:

.. code-block:: bash

    popd
    bash tlt_cv_compile.sh -m bodypose_int8 -k $ENCODING_KEY -w 320 -h 224

This will automatically populate :code:`bodypose_320x224_tlt/1/model.plan` under the :code:`${models_location}`. You can now start the Triton Server; you should see the desired models ready for inference:

.. code-block:: bash

    +----------------------------------+---------+------------------------------------------+
    | Model                            | Version | Status                                   |
    +----------------------------------+---------+------------------------------------------+
    | ...                              | ...     | ...                                      |
    | bodypose_320x224_ensemble_tlt    | 1       | READY                                    |
    | bodypose_320x224_postprocess_tlt | 1       | READY                                    |
    | bodypose_320x224_tlt             | 1       | READY                                    |
    | ...                              | ...     | ...                                      |
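For reference, after step 2 above, the manually updated output section of :code:`bodypose_320x224_tlt/config.pbtxt` would look similar to the following sketch. Only the dimensions derived above change; leave the other fields as they appear in the default configuration:

.. code-block:: bash

    output [
      {
        name: "conv2d_transpose_1/BiasAdd:0"
        data_type: TYPE_FP32
        dims: [112, 160, 38]
      },
      {
        name: "heatmap_out/BiasAdd:0"
        data_type: TYPE_FP32
        dims: [28, 40, 19]
      }
    ]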
Lastly, before running the sample application with this newly shaped model, you must modify a configuration file in the client container. Refer to the :ref:`Running the Body Pose Estimation Sample` section for more details.

Emotion
-------

Follow these steps to deploy a new emotion TLT model into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_emotionnet_v${tlt_model_version_emotion}

3. Replace the default TLT model in this location and rename the new TLT model to :code:`model.etlt`.

4. Run the following script, which uses :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline:

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m emotion -k $ENCODING_KEY

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

5. Start the Triton Server again and ensure the startup is successful.

Face Detect (Pruned and Quantized)
----------------------------------

Let us say we have a new face detect **pruned and quantized** TLT model to deploy for int8 inference. Follow these steps to deploy it into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_facenet_v${tlt_model_version_facedetect_int8}

3. Replace the default TLT model in this location and rename the new TLT model to :code:`model.etlt`.

4. Replace the default TLT calibration file in this location and rename the new calibration file to :code:`int8_calibration.txt`.

5. Change directory back to the Quick Start Scripts folder and run the following script, which uses :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline:

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m facedetect_int8 -k $ENCODING_KEY

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

6. Start the Triton Server again and ensure the startup is successful.

Note that this deployment will overwrite any previous :code:`model.plan` that exists in the model repository, and that this location is shared with the pruned-only Face Detect version.
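Because the pruned-only and the pruned and quantized Face Detect models share the same :code:`model.plan` location, you may want to back up the current engine before a new conversion overwrites it. The path below is an assumption based on the default repository layout under :code:`models_location`; verify it on your system before copying:

.. code-block:: bash

    source config.sh
    # Assumed default location of the Face Detect TensorRT engine; adjust if your layout differs.
    cp ${models_location}/triton_model_repository/repository/facedetect_tlt/1/model.plan \
       ${models_location}/triton_model_repository/repository/facedetect_tlt/1/model.plan.bak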
Face Detect (Pruned)
--------------------

Let us say we have a new face detect TLT model that is pruned only, and we would like to deploy it for fp16 inference. Follow these steps to deploy it into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_facenet_v${tlt_model_version_facedetect_fp16}

3. Replace the default TLT model in this location and rename the new TLT model to :code:`model.etlt`.

4. Change directory back to the Quick Start Scripts folder and run the following script, which uses :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline:

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m facedetect_fp16 -k $ENCODING_KEY

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

5. Start the Triton Server again and ensure the startup is successful.

Note that this deployment will overwrite any previous :code:`model.plan` that exists in the model repository, and that this location is shared with the pruned and quantized Face Detect version.

Facial Landmarks
----------------

Follow these steps to deploy a facial landmarks TLT model into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_fpenet_v${tlt_model_version_faciallandmarks}

3. Replace the default TLT model in this location and rename the new TLT model to :code:`model.etlt`.

4. Run the following script, which uses :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline:

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m faciallandmarks -k $ENCODING_KEY

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

   .. Note::

      By default, the TLT CV Inference Pipeline assumes 80 landmarks from the TensorRT model. For example, for a newly trained TLT model with 68 output landmarks, you must modify the Triton configuration at :code:`${models_location}/triton_model_repository/faciallandmarks_tlt/config.pbtxt`. Ensure that both outputs (not inputs) are changed to 68 (or the corresponding output count of the new model).

5. Start the Triton Server again and ensure the startup is successful.

Gaze
----

Follow these steps to deploy a new gaze TLT model into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_gazenet_v${tlt_model_version_gaze}

3. Replace the default TLT model in this location and rename the new TLT model to :code:`model.etlt`.

4. Run the following script, which uses :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline:

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m gaze -k $ENCODING_KEY

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

5. Start the Triton Server again and ensure the startup is successful.

Gesture
-------

Follow these steps to deploy a new gesture TLT model into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_gesturenet_v${tlt_model_version_gesture}

3. Replace the default TLT model in this location and rename the new TLT model to :code:`model.etlt`.

4. Run the following script, which uses :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline:

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m gesture -k $ENCODING_KEY

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.
5. Start the Triton Server again and ensure the startup is successful.

Heart Rate
----------

Follow these steps to deploy a new heart rate TLT model into the TLT CV Inference Pipeline:

1. Stop the Triton Server using :code:`ctrl-c`.

2. Enter the Quick Start Scripts folder, use :code:`source` on :code:`config.sh`, and change directories into the model location:

   .. code-block:: bash

      source config.sh
      pushd ${models_location}/tlt_heartratenet_v${tlt_model_version_heartrate}

3. Replace the default TLT model in this location and rename the new TLT model to :code:`model.etlt`.

4. Run the following script, which uses :code:`tlt-converter` to generate a TensorRT model that will work with the Inference Pipeline:

   .. code-block:: bash

      popd
      bash tlt_cv_compile.sh -m heartrate -k $ENCODING_KEY

   This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

5. Start the Triton Server again and ensure the startup is successful.
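After deploying one or more retrained models, a quick way to confirm which TensorRT engines currently exist in the Triton model repository is to list the generated :code:`model.plan` files. This is only a sketch; the repository lives under the :code:`models_location` configured in :code:`config.sh`:

.. code-block:: bash

    # List every TensorRT engine currently present in the model repository.
    source config.sh
    find "${models_location}/triton_model_repository" -name model.plan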