MLRecogNet with TAO Deploy#

To generate an optimized TensorRT engine, tao-deploy takes as input the MLRecogNet .onnx file produced by tao export. Currently, MLRecogNet supports the FP32, FP16, and INT8 data types.

For more information about training an MLRecogNet model, refer to the MLRecogNet training documentation.

Each task is explained in detail in the following sections.

Note

  • Throughout this documentation, you will see references to $EXPERIMENT_ID and $DATASET_ID in the FTMS Client sections.

    • For instructions on creating a dataset using the remote client, see the Creating a dataset section in the Remote Client documentation.

    • For instructions on creating an experiment using the remote client, see the Creating an experiment section in the Remote Client documentation.

  • The spec format is YAML for TAO Launcher and JSON for FTMS Client.

  • File-related parameters, such as dataset paths or pretrained model paths, are required only for TAO Launcher and not for FTMS Client.

Converting ONNX File into TensorRT Engine#

The following is an example spec, $TRT_GEN_SPEC, for generating a TensorRT engine from the exported MLRecogNet ONNX model.

trt_config#

The trt_config parameter provides options related to TensorRT generation.

SPECS=$(tao-client ml_recog get-spec --action gen_trt_engine --id $EXPERIMENT_ID)
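
The returned $SPECS value is a JSON document whose fields correspond to the parameters described in the tables below. The following is a minimal sketch of inspecting and overriding it before running the action; it assumes jq is available and that the precision is exposed under a trt_config.data_type key, so verify the exact key names against your own get-spec output.

echo "$SPECS" | jq .                                           # view the default gen_trt_engine spec
SPECS=$(echo "$SPECS" | jq '.trt_config.data_type = "FP16"')   # request an FP16 engine (key name assumed)
echo "$SPECS" | jq '.trt_config'                               # confirm the override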

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| data_type | string | FP32 | The precision to be used for the TensorRT engine | FP32/FP16/INT8 |
| workspace_size | unsigned int | 1024 | The maximum workspace size (in MiB) for the TensorRT engine | >1024 |
| min_batch_size | unsigned int | 1 | The minimum batch size for the optimization profile shape | >0 |
| opt_batch_size | unsigned int | 1 | The optimal batch size for the optimization profile shape | >0 |
| max_batch_size | unsigned int | 1 | The maximum batch size for the optimization profile shape | >0 |
| calibration | dict config | None | The configuration for INT8 calibration | |

Calibration Config#

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| cal_cache_file | string | None | The path to the calibration cache file. If no calibration cache file exists at this path, a cache file is generated based on the other calibration config parameters. | |
| cal_batch_size | unsigned int | 1 | The batch size of the calibration dataset | >0 |
| cal_batches | unsigned int | 1 | The number of batches used for calibration. In total, cal_batches x cal_batch_size calibration images are used. | >0 |
| cal_image_dir | string | None | The directory containing the calibration images | |
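
As an illustration of how the calibration parameters above fit together, the following sketch requests an INT8 engine calibrated on 100 batches of 8 images (800 images in total). The parameter names come from the tables in this section, but the nesting inside the spec JSON and the paths are assumptions; confirm them against the output of get-spec before running the action.

CAL_IMAGE_DIR=/path/to/calibration_images   # hypothetical directory of calibration images
CAL_CACHE_FILE=/path/to/cal.bin             # hypothetical location for the generated calibration cache
SPECS=$(echo "$SPECS" | jq \
  --arg dir "$CAL_IMAGE_DIR" --arg cache "$CAL_CACHE_FILE" '
  .trt_config.data_type = "INT8"
  | .trt_config.calibration.cal_image_dir = $dir
  | .trt_config.calibration.cal_cache_file = $cache
  | .trt_config.calibration.cal_batch_size = 8
  | .trt_config.calibration.cal_batches = 100')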

Use the following command to run MLRecogNet engine generation:

GEN_TRT_ENGINE_JOB_ID=$(tao-client ml_recog experiment-run-action --action gen_trt_engine --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $EXPORT_JOB_ID)

Note

$EXPORT_JOB_ID is the job ID of the export action from the Exporting the model section; the sketch below shows how it is typically obtained.
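
The export job ID follows the same get-spec/experiment-run-action pattern used throughout this page. The commands below are illustrative only: refer to the MLRecogNet export documentation for the spec fields of the export action, and note that $TRAIN_JOB_ID is a placeholder for the ID of your training job.

# Illustrative only -- see the Exporting the model section for the export spec details.
EXPORT_SPECS=$(tao-client ml_recog get-spec --action export --id $EXPERIMENT_ID)
EXPORT_JOB_ID=$(tao-client ml_recog experiment-run-action --action export --id $EXPERIMENT_ID --specs "$EXPORT_SPECS" --parent_job_id $TRAIN_JOB_ID)  # $TRAIN_JOB_ID: placeholder for the training job ID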

The output log example is shown below:

Starting ml_recog gen_trt_engine.
[06/22/2023-18:17:12] [TRT] [I] [MemUsageChange] Init CUDA: CPU +318, GPU +0, now: CPU 356, GPU 1003 (MiB)
[06/22/2023-18:17:14] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +443, GPU +116, now: CPU 853, GPU 1119 (MiB)
[06/22/2023-18:17:14] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Parsing ONNX model
[06/22/2023-18:17:14] [TRT] [W] The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
[06/22/2023-18:17:15] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Network Description
Input 'input' with shape (-1, 3, 224, 224) and dtype DataType.FLOAT
Output 'fc_pred' with shape (-1, 256) and dtype DataType.FLOAT
dynamic batch size handling
TensorRT engine build configurations:
  OptimizationProfile:
    "input": (1, 3, 224, 224), (10, 3, 224, 224), (10, 3, 224, 224)

  BuilderFlag.TF32

  Note: max representabile value is 2,147,483,648 bytes or 2GB.
  MemoryPoolType.WORKSPACE = 1073741824 bytes
  MemoryPoolType.DLA_MANAGED_SRAM = 0 bytes
  MemoryPoolType.DLA_LOCAL_DRAM = 1073741824 bytes
  MemoryPoolType.DLA_GLOBAL_DRAM = 536870912 bytes

  Tactic Sources = 31
[06/22/2023-18:17:17] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +854, GPU +362, now: CPU 1800, GPU 1481 (MiB)
[06/22/2023-18:17:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1926, GPU 1539 (MiB)
[06/22/2023-18:17:17] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/22/2023-18:17:22] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[06/22/2023-18:17:30] [TRT] [I] Total Activation Memory: 1565556736
[06/22/2023-18:17:30] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[06/22/2023-18:17:30] [TRT] [I] Total Host Persistent Memory: 132192
[06/22/2023-18:17:30] [TRT] [I] Total Device Persistent Memory: 140288
[06/22/2023-18:17:30] [TRT] [I] Total Scratch Memory: 134217728
[06/22/2023-18:17:30] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 658 MiB
[06/22/2023-18:17:30] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 91 steps to complete.
[06/22/2023-18:17:30] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 1.66392ms to assign 5 blocks to 91 nodes requiring 184394240 bytes.
[06/22/2023-18:17:30] [TRT] [I] Total Activation Memory: 184394240
[06/22/2023-18:17:30] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2491, GPU 1889 (MiB)
[06/22/2023-18:17:30] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +101, now: CPU 0, GPU 101 (MiB)
Export finished successfully.
Gen_trt_engine finished successfully.

Running Evaluation through TensorRT Engine#

TensorRT engine evaluation uses the same spec file as TAO evaluation. The following is a sample spec file, $EVAL_SPEC:

SPECS=$(tao-client ml_recog get-spec --action evaluate --id $EXPERIMENT_ID)

Use the following command to run MLRecogNet engine evaluation:

EVALUATE_JOB_ID=$(tao-client ml_recog experiment-run-action --action evaluate --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $GEN_TRT_ENGINE_JOB_ID)

The output log example is shown below:

Starting ml_recog evaluation.
[06/22/2023-20:41:53] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[06/22/2023-20:41:53] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/22/2023-20:41:53] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Loading gallery dataset...
...
Top 1 scores: 0.9958333333333333
Top 5 scores: 1.0
Confusion Matrix
[[ 34   0   0   0   0]
[  0 106   0   0   0]
[  0   0  29   0   0]
[  0   0   0  31   0]
[  0   0   0   1  47]]
Classification Report
              precision    recall  f1-score   support

    c000001       1.00      1.00      1.00        34
    c000002       1.00      1.00      1.00       106
    c000003       1.00      1.00      1.00        29
    c000004       0.97      1.00      0.98        31
    c000005       1.00      0.98      0.99        48

    accuracy                          1.00       248
  macro avg       0.99      1.00      0.99       248
weighted avg      1.00      1.00      1.00       248

Finished evaluation.
Evaluation finished successfully.

Running Inference through TensorRT Engine#

TensorRT engine inference uses the same spec file as TAO inference. The following is a sample spec file, $INFERENCE_SPEC:

SPECS=$(tao-client ml_recog get-spec --action inference --id $EXPERIMENT_ID)

Use the following command to run MLRecogNet engine inference:

INFERENCE_JOB_ID=$(tao-client ml_recog experiment-run-action --action inference --id $EXPERIMENT_ID --specs "$SPECS" --parent_job_id $GEN_TRT_ENGINE_JOB_ID)

The output log example is shown below:

Starting ml_recog inference.
[06/22/2023-20:46:39] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[06/22/2023-20:46:39] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/22/2023-20:46:39] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Loading gallery dataset...
...
Finished inference.
Inference finished successfully.