TAO Toolkit CV Inference Pipeline Quick Start Scripts

This page describes how to use the TAO Toolkit CV Inference Pipeline Quick Start Scripts after the Installation Prerequisites are installed.

The Quick Start Scripts hide container downloads, updates, model compilation, and more. Here is a flow diagram of the Quick Start process:

[Figure: flow diagram of the TAO Toolkit CV Inference Pipeline Quick Start process]

First, navigate to the directory that houses the scripts and ensure that they are executable:

cd scripts
chmod +x *.sh

All the scripts should be executed within this scripts directory.

These scripts will automatically pull containers and models for x86 or aarch64 (Jetson).

Configuration

General configuration for the containers deployed using the Quick Start Scripts can be viewed in the config.sh file. By default, the configuration file is set to launch all available containers on the supported GPU, which is selected automatically based on the system architecture.

If you would like to use a video handle, ensure your video device handle (for example, /dev/video0) has been entered in config.sh to make it discoverable to the relevant Client container.

Note

Make note of the resolutions and FPS support for your video handle (e.g. using the command v4l2-ctl --list-formats-ext).
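Before entering a device handle in config.sh, it can help to confirm that the device node exists. The following is a minimal sketch, not part of the Quick Start Scripts: the function names are hypothetical, the device path is passed as an argument, and v4l2-ctl is assumed to be installed on the host.

```shell
# Hypothetical helpers, not part of the Quick Start Scripts.

# Return 0 if the given path is a character device node (as V4L2 devices are).
video_device_ok() {
    local dev="$1"
    [ -c "$dev" ]
}

# List the supported formats, resolutions, and frame rates for a device,
# but only if the device exists and v4l2-ctl is available.
list_video_formats() {
    local dev="$1"
    video_device_ok "$dev" || { echo "No such video device: $dev" >&2; return 1; }
    command -v v4l2-ctl >/dev/null 2>&1 || { echo "v4l2-ctl not found" >&2; return 1; }
    v4l2-ctl --device="$dev" --list-formats-ext
}
```

For example, `list_video_formats /dev/video0` prints the formats only when the handle is present.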

Models are automatically downloaded to the host machine at the location (absolute path) specified by the variable models_location inside the config.sh. This location becomes important in the context of retraining and replacing the TensorRT models.
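Because models_location must be an absolute path, a quick check before initialization can catch a relative value early. This is a small sketch with a hypothetical function name; it only inspects the string and makes no assumption about whether the directory already exists.

```shell
# Hypothetical helper: succeed only if the argument is an absolute path.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example usage (run from the scripts directory):
#   source config.sh
#   is_absolute_path "${models_location}" || echo "models_location must be absolute" >&2
```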

By default, deployable TAO models come encrypted with their own keys. The keys listed in the config are specific to these models and do not need to be modified unless a user wishes to work with retrained and re-encrypted TAO models.

The config.sh file contains a field to specify a volume mount for the sample applications. This is useful if you want to modify the applications and save the new source to the host machine rather than to the container, where modifications can be lost when the container exits.

All of the configuration options are documented within the configuration file itself.

Note

The NVIDIA Triton server will be listening/broadcasting on ports 8001 for gRPC, 8000 for HTTP, and 8002 for Triton metrics.

Initialization

Run the following:

bash tao_cv_init.sh

The tao_cv_init.sh script will pull all the relevant containers and models to the machine. It will also download specific third-party dependencies to the client container.

Successful completion of this download will result in the following:

[INFO] Finished pulling containers and models

The script will then compile the TAO models into TensorRT models to deploy for the NVIDIA Triton Server. This step will take up to 10 minutes as it compiles all the TAO models for the Inference Pipeline. Upon successful completion, you will see the following:

[INFO] SUCCESS: Proceed to 'tao_cv_start_server.sh'
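When initialization runs unattended, the success line above can be checked from a saved log. The sketch below assumes the script output has been captured to a file; the function name and the log file name are illustrative and not part of the Quick Start Scripts.

```shell
# Hypothetical helper: return 0 if the log contains the final success line.
init_succeeded() {
    local log_file="$1"
    # -F treats the pattern as a fixed string, so "[" and "'" need no escaping.
    grep -qF "[INFO] SUCCESS: Proceed to 'tao_cv_start_server.sh'" "$log_file"
}

# Example usage (run from the scripts directory):
#   bash tao_cv_init.sh 2>&1 | tee tao_cv_init.log
#   init_succeeded tao_cv_init.log && echo "Initialization complete"
```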

Launching the Server and Client Containers

Run the following:

bash tao_cv_start_server.sh

This will launch the NVIDIA Triton Server for inference requests. To verify that the server has started correctly, you can check if the output shows the following:

+----------------------------------+---------+-------------+
| Model                            | Version | Status      |
+----------------------------------+---------+-------------+
| bodypose_384x288_ensemble_tlt    | -       | READY       |
| bodypose_384x288_postprocess_tlt | 1       | READY       |
| bodypose_384x288_tlt             | 1       | READY       |
| emotionmlp_tlt                   | 1       | READY       |
| facedetect_ensemble_tlt          | 1       | READY       |
| facedetect_postprocess_tlt       | 1       | READY       |
| facedetect_tlt                   | 1       | READY       |
| faciallandmarks_tlt              | 1       | READY       |
| gaze_facegrid_tlt                | 1       | READY       |
| hcgesture_tlt                    | 1       | READY       |
| heartrate_two_branch_tlt         | 1       | READY       |
+----------------------------------+---------+-------------+

...

I0428 23:20:38.955865 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
I0428 23:20:38.957249 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0428 23:20:38.999728 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002

To stop the server, use ctrl-c in the relevant terminal.
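Besides reading the startup log, readiness can be probed from a second terminal through Triton's HTTP health endpoint (v2/health/ready). This is a sketch under assumptions: the function names are hypothetical, curl is assumed to be installed, and port 8000 is assumed to be reachable from where the command runs.

```shell
# Hypothetical helpers, not part of the Quick Start Scripts.

# Build the readiness URL; host and port default to localhost:8000.
triton_ready_url() {
    local host="${1:-localhost}" port="${2:-8000}"
    echo "http://${host}:${port}/v2/health/ready"
}

# Return 0 if the server answers the readiness probe with a 2xx status.
triton_ready() {
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output.
    curl -sf -o /dev/null "$(triton_ready_url "$@")"
}

# Poll the default endpoint up to N times (default 30), one second apart.
wait_for_triton() {
    local tries="${1:-30}" i
    for ((i = 0; i < tries; i++)); do
        triton_ready && return 0
        sleep 1
    done
    return 1
}
```

For example, `wait_for_triton 60 && echo "Triton is ready"` waits up to about a minute for the server to come up.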

Next, in another terminal, run the following:

bash tao_cv_start_client.sh

This will open an interactive container session with sample applications and all the necessary libraries. For more information regarding the Inference Pipeline sample applications, refer to Running and Building Sample Applications.

Stopping

To stop active containers, run the following:

bash tao_cv_stop.sh

Cleaning

To clean your machine of containers and/or models that were downloaded at init, run and follow the prompts:

bash tao_cv_clean.sh

Deploying TAO Models

A utility script, tao_cv_compile.sh, is provided to simplify the deployment of TAO models into the Inference Pipeline. The models are downloaded to the host system in the models_location specified in config.sh. Simply replace the default “deployable” model with the newly trained TAO model (.etlt file) in the respective tao_*/ folder while preserving the name, and run the matching command from the model-specific sections below.

For ease of use, save the encoding key beforehand:

export ENCODING_KEY=<key>
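Since every compile command on this page passes $ENCODING_KEY, a small guard can catch an unset key before a conversion is attempted. The function name below is hypothetical and not part of the Quick Start Scripts.

```shell
# Hypothetical helper: fail with a message if ENCODING_KEY is empty or unset.
require_encoding_key() {
    if [ -z "${ENCODING_KEY:-}" ]; then
        echo "ENCODING_KEY is not set; run: export ENCODING_KEY=<key>" >&2
        return 1
    fi
}

# Example usage:
#   require_encoding_key && bash tao_cv_compile.sh -m emotion -k "$ENCODING_KEY"
```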

Note

Remember to give the new TAO model the same name as the default model that is already present.

Note

Default encoding keys for the original deployable TAO models exist in config.sh.

The NVIDIA Triton Server points to the models_location, so during the next tao_cv_start_server.sh call, the newly deployed TensorRT model will serve inferences.

Body Pose Estimation

Let us say we have a new body pose estimation TAO model that we would like to deploy for int8 or fp16 inference. Follow these steps to deploy it into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/bodyposenet_v${tao_model_version_bodypose}
    
  3. Add the new TAO model in this folder and name it model.etlt.

  4. (Optional) For int8 calibration, place the calibration file in this same folder and name it calibration.bin.

  5. Change the directory back to the Quick Start scripts folder and run the following, which uses the tao-converter to generate a TensorRT model that will work with the Inference Pipeline.

    Choose which command to execute based on whether a calibration file exists, or on your preference. Since the width and height of BodyPoseNet are flexible, you can change the defaults using the -w and -h flags. Refer to the Body Pose Estimation Configuration section for the further steps required when modifying the default values.

    popd
    bash tao_cv_compile.sh -m bodypose_int8 -k $ENCODING_KEY -w 384 -h 288
    bash tao_cv_compile.sh -m bodypose_fp16 -k $ENCODING_KEY -w 384 -h 288
    

    This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

  6. Start the Triton Server again and ensure the startup is successful.

    +----------------------------------+---------+------------------------------------------+
    | Model                            | Version | Status                                   |
    +----------------------------------+---------+------------------------------------------+
    | ...                              | ...     | ...                                      |
    | bodypose_384x288_ensemble_tlt    | 1       | READY                                    |
    | bodypose_384x288_postprocess_tlt | 1       | READY                                    |
    | bodypose_384x288_tlt             | 1       | READY                                    |
    | ...                              | ...     | ...                                      |
    

Note that this deployment will overwrite any previous model.plan that exists in this folder in the model repository, and that this location is shared between the int8 and fp16 versions.
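The choice between the int8 and fp16 commands in the steps above can be scripted from the presence of calibration.bin. This is a minimal sketch with a hypothetical function name; it assumes that a present calibration file means int8 is wanted, which remains a matter of preference.

```shell
# Hypothetical helper: print the compile mode for a BodyPoseNet model folder.
bodypose_mode() {
    local model_dir="$1"
    if [ -f "${model_dir}/calibration.bin" ]; then
        echo "bodypose_int8"   # calibration file found, so int8 is possible
    else
        echo "bodypose_fp16"   # no calibration file, fall back to fp16
    fi
}

# Example usage (run from the scripts directory):
#   source config.sh
#   mode=$(bodypose_mode "${models_location}/bodyposenet_v${tao_model_version_bodypose}")
#   bash tao_cv_compile.sh -m "$mode" -k "$ENCODING_KEY" -w 384 -h 288
```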

Body Pose Estimation Configuration

The BodyPoseNet TAO model can be converted with a wide set of width and height parameters. Refer to the training documentation for constraints.

Using the Quick Start script tao_cv_compile.sh, the width and height flags simplify this conversion step. Let us say we want to have a smaller network input size of width = 320 and height = 224.

The first step is to copy the default Triton configurations into new folders named for the new shape, which makes them easier to maintain:

source config.sh
pushd ${models_location}/triton_model_repository/repository
cp -r bodypose_384x288_ensemble_tlt bodypose_320x224_ensemble_tlt
cp -r bodypose_384x288_postprocess_tlt bodypose_320x224_postprocess_tlt
cp -r bodypose_384x288_tlt bodypose_320x224_tlt

Next, we must modify the config.pbtxt files for these newly created Triton folders:

  1. Use a text editor to modify the ensemble config with a simple find-and-replace of the default width and height:

    sed -i 's/384/320/g' bodypose_320x224_ensemble_tlt/config.pbtxt
    sed -i 's/288/224/g' bodypose_320x224_ensemble_tlt/config.pbtxt
    

    If you would like to perform this manually, update the following:

    • The name of the model

    • The input shape, which should be [Height, Width, Channel]

    • The ensemble schedule for the TensorRT model name

    • The ensemble schedule for the postprocess model name

  2. Use a text editor to modify the TensorRT config, using a simple find-and-replace for the default width and height and a manual update for the output shapes.

    sed -i 's/384/320/g' bodypose_320x224_tlt/config.pbtxt
    sed -i 's/288/224/g' bodypose_320x224_tlt/config.pbtxt
    

    If you would like to perform this manually, update the following:

    • The name of the model

    • The input shape, which should be [Height, Width, Channel]

    Next, you need to perform the manual calculation of the output dimensions:

    vim bodypose_320x224_tlt/config.pbtxt
    

    The network output conv2d_transpose_1/BiasAdd:0 will have dimension [Network_Input_Height/2, Network_Input_Width/2, 38]. The network output heatmap_out/BiasAdd:0 will have dimension [Network_Input_Height/8, Network_Input_Width/8, 19].

    This would become [224/2, 320/2, 38] and [224/8, 320/8, 19] respectively. The final configuration is [112, 160, 38] and [28, 40, 19] respectively.

  3. Use a text editor to modify the postprocess config, using a simple find-and-replace for the default width and height and a manual update for the input shapes.

    sed -i 's/384/320/g' bodypose_320x224_postprocess_tlt/config.pbtxt
    sed -i 's/288/224/g' bodypose_320x224_postprocess_tlt/config.pbtxt
    

    If you would like to perform this manually, update the following:

    • The name of the model

    Next, you need to perform the same manual calculation for the postprocess input dimensions.

    vim bodypose_320x224_postprocess_tlt/config.pbtxt
    

    The dimensions for the input input_pafmap correspond to the previous conv2d_transpose_1/BiasAdd:0. The dimensions for the input input_heatmap correspond to the previous heatmap_out/BiasAdd:0. The input dimensions would look like the following:

    input [
      {
        name: "input_pafmap"
        data_type: TYPE_FP32
        dims: [112, 160, 38]
      },
      {
        name: "input_heatmap"
        data_type: TYPE_FP32
        dims: [28, 40, 19]
      }
    ]
    

With the configuration for Triton complete, you can move on to compiling the model.
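The shapes entered in the TensorRT and postprocess configs follow directly from the network input size, so the arithmetic can be scripted. The sketch below uses hypothetical function names and assumes the width and height divide evenly by 8, since shell arithmetic truncates integer division.

```shell
# Hypothetical helpers: print Triton dims for a given network input size.

# conv2d_transpose_1/BiasAdd:0 (input_pafmap): [Height/2, Width/2, 38]
bodypose_pafmap_dims() {
    local width="$1" height="$2"
    echo "[$((height / 2)), $((width / 2)), 38]"
}

# heatmap_out/BiasAdd:0 (input_heatmap): [Height/8, Width/8, 19]
bodypose_heatmap_dims() {
    local width="$1" height="$2"
    echo "[$((height / 8)), $((width / 8)), 19]"
}

# Example usage:
#   bodypose_pafmap_dims 320 224    # prints [112, 160, 38]
#   bodypose_heatmap_dims 320 224   # prints [28, 40, 19]
```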

Assuming you have trained a model and obtained a calibration file per the prior instructions, you can compile the model as follows:

popd
bash tao_cv_compile.sh -m bodypose_int8 -k $ENCODING_KEY -w 320 -h 224

This will automatically write the TensorRT model to bodypose_320x224_tlt/1/model.plan within ${models_location}.

You can start the Triton Server. You should see the desired models ready for inference:

+----------------------------------+---------+------------------------------------------+
| Model                            | Version | Status                                   |
+----------------------------------+---------+------------------------------------------+
| ...                              | ...     | ...                                      |
| bodypose_320x224_ensemble_tlt    | 1       | READY                                    |
| bodypose_320x224_postprocess_tlt | 1       | READY                                    |
| bodypose_320x224_tlt             | 1       | READY                                    |
| ...                              | ...     | ...                                      |

Lastly, before running the sample application with this newly shaped model, you must modify a configuration file in the client container. Refer to the section Running the Body Pose Estimation Sample for more details.

Emotion

Follow these steps to deploy a new emotion TAO model into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/emotionnet_v${tao_model_version_emotion}
    
  3. Replace the default TAO model in this location and rename the new TAO model to model.etlt.

  4. Run the following script, which uses tao-converter to generate a TensorRT model that will work with the Inference Pipeline:

    popd
    bash tao_cv_compile.sh -m emotion -k $ENCODING_KEY
    

    This will automatically drop the TensorRT model into the Triton Server model location. Ensure that the conversion is successful.

  5. Start the Triton Server again and ensure the startup is successful.

Face Detect (Pruned and Quantized)

Let us say we have a new face detect pruned and quantized TAO model to deploy for int8 inference. Follow these steps to deploy it into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/facenet_v${tao_model_version_facedetect_int8}
    
  3. Replace the default TAO model in this location and rename it so that the new TAO model will be named model.etlt.

  4. Replace the default TAO calibration file in this location and rename the new calibration file to “int8_calibration.txt”.

  5. Change directory back to the Quick Start Scripts folder and run the following script, which uses tao-converter to generate a TensorRT model that will work with the Inference Pipeline:

    popd
    bash tao_cv_compile.sh -m facedetect_int8 -k $ENCODING_KEY
    

    This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

  6. Start the Triton Server again and ensure the startup is successful.

Note that this deployment will overwrite any previous model.plan that exists in the model repository, and that this location is shared with the pruned-only Face Detect version.
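The int8 deployment above needs both the model and the calibration file in place before compiling. A minimal pre-flight check might look like the following; the function name is hypothetical and the file names are the ones given in the steps.

```shell
# Hypothetical helper: verify that both required files exist in the model folder.
facedetect_int8_inputs_ok() {
    local model_dir="$1" f missing=0
    for f in model.etlt int8_calibration.txt; do
        if [ ! -f "${model_dir}/${f}" ]; then
            echo "Missing ${model_dir}/${f}" >&2
            missing=1
        fi
    done
    # Succeed only if nothing was reported missing.
    [ "$missing" -eq 0 ]
}

# Example usage (run from the scripts directory):
#   source config.sh
#   facedetect_int8_inputs_ok "${models_location}/facenet_v${tao_model_version_facedetect_int8}" \
#       && bash tao_cv_compile.sh -m facedetect_int8 -k "$ENCODING_KEY"
```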

Face Detect (Pruned)

Let us say we have a new face detect TAO model that is pruned only, and we would like to deploy it for fp16 inference. Follow these steps to deploy it into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/facenet_v${tao_model_version_facedetect_fp16}
    
  3. Replace the default TAO model in this location and rename the new TAO model to model.etlt.

  4. Change directory back to the Quick Start Scripts folder and run the following script, which uses tao-converter to generate a TensorRT model that will work with the Inference Pipeline:

    popd
    bash tao_cv_compile.sh -m facedetect_fp16 -k $ENCODING_KEY
    

    This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

  5. Start the Triton Server again and ensure the startup is successful.

Note that this deployment will overwrite any previous model.plan that exists in the model repository, and that this location is shared with the pruned and quantized Face Detect version.

Facial Landmarks

Follow these steps to deploy a facial landmarks TAO model into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/fpenet_v${tao_model_version_faciallandmarks}
    
  3. Replace the default TAO model in this location and rename the new TAO model to model.etlt.

  4. Run the following script, which uses tao-converter to generate a TensorRT model that will work with the Inference Pipeline:

    popd
    bash tao_cv_compile.sh -m faciallandmarks -k $ENCODING_KEY
    

    This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

Note

By default, the TAO Toolkit CV Inference Pipeline assumes 80 landmarks from the TensorRT model. For example, for a newly trained TAO model with 68 output landmarks, you must modify the Triton configuration at ${models_location}/triton_model_repository/faciallandmarks_tlt/config.pbtxt. Ensure that both outputs (not inputs) are changed to 68 (or the corresponding output of the new model).

  5. Start the Triton Server again and ensure startup is successful.
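When changing the landmark count described in the note above, it can help to list the shape entries in the Triton configuration before and after editing. This sketch assumes the shapes appear in dims fields, as in the body pose postprocess example earlier on this page; the function name is hypothetical.

```shell
# Hypothetical helper: print every line that mentions dims, with line numbers,
# so the output shapes can be reviewed by hand.
show_config_dims() {
    local config_file="$1"
    grep -n 'dims' "$config_file"
}

# Example usage:
#   source config.sh
#   show_config_dims "${models_location}/triton_model_repository/faciallandmarks_tlt/config.pbtxt"
```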

Gaze

Follow these steps to deploy a new gaze TAO model into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/gazenet_v${tao_model_version_gaze}
    
  3. Replace the default TAO model in this location and rename the new TAO model to model.etlt.

  4. Run the following script, which uses tao-converter to generate a TensorRT model that will work with the Inference Pipeline:

    popd
    bash tao_cv_compile.sh -m gaze -k $ENCODING_KEY
    

    This will automatically drop the TensorRT model into the Triton Server models location. Ensure that the conversion is successful.

  5. Start the Triton Server again and ensure the startup is successful.

Gesture

Follow these steps to deploy a new gesture TAO model into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/gesturenet_v${tao_model_version_gesture}
    
  3. Replace the default TAO model in this location and rename the new TAO model to model.etlt.

  4. Run the following script, which uses tao-converter to generate a TensorRT model that will work with the Inference Pipeline:

    popd
    bash tao_cv_compile.sh -m gesture -k $ENCODING_KEY
    

    This will automatically drop the TensorRT model into the Triton Server model location. Ensure that the conversion is successful.

  5. Start the Triton Server again and ensure the startup is successful.

Heart Rate

Follow these steps to deploy a new heart rate TAO model into the TAO Toolkit CV Inference Pipeline:

  1. Stop the Triton Server by using ctrl-c.

  2. Enter the Quick Start Scripts folder, use source on config.sh, and change directories into the model location:

    source config.sh
    pushd ${models_location}/heartratenet_v${tao_model_version_heartrate}
    
  3. Replace the default TAO model in this location and rename the new TAO model to model.etlt.

  4. Run the following script, which uses tao-converter to generate a TensorRT model that will work with the Inference Pipeline:

    popd
    bash tao_cv_compile.sh -m heartrate -k $ENCODING_KEY
    

    This will automatically drop the TensorRT model into the Triton Server model location. Ensure that the conversion is successful.

  5. Start the Triton Server again and ensure the startup is successful.
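The per-model procedures above share the same staging step: place the new model in the model folder under the name model.etlt. A minimal sketch of that step follows. The function name is hypothetical, and because the copy overwrites the existing model.etlt, keep your own copy of the default model if you may need it again.

```shell
# Hypothetical helper: copy a newly trained model into a model folder as model.etlt.
stage_model() {
    local new_model="$1" model_dir="$2"
    [ -f "$new_model" ] || { echo "No such model file: $new_model" >&2; return 1; }
    [ -d "$model_dir" ] || { echo "No such model folder: $model_dir" >&2; return 1; }
    # Overwrites any existing model.etlt in the folder.
    cp "$new_model" "${model_dir}/model.etlt"
}

# Example usage for the gaze model (run from the scripts directory):
#   source config.sh
#   stage_model /path/to/new_gaze.etlt "${models_location}/gazenet_v${tao_model_version_gaze}"
#   bash tao_cv_compile.sh -m gaze -k "$ENCODING_KEY"
```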