Here is an example spec, $TRT_GEN_SPEC, for generating a TensorRT engine from the exported MLRecogNet ONNX model.

The tensorrt parameter (under gen_trt_engine) provides options related to TensorRT engine generation.

    results_dir: /path/to/results/dir
    dataset:
      val_dataset:
        reference: /path/to/reference/set
        query: /path/to/query/set
      pixel_mean: [0.485, 0.456, 0.406]
      pixel_std: [0.226, 0.226, 0.226]
    model:
      input_channel: 3
      input_width: 224
      input_height: 224
    gen_trt_engine:
      gpu_id: 0
      onnx_file: /path/to/exported/onnx/file
      trt_engine: /path/to/trt/engine/to/generate
      tensorrt:
        data_type: int8
        workspace_size: 1024
        min_batch_size: 1
        opt_batch_size: 10
        max_batch_size: 10
        calibration:
          cal_cache_file: /path/to/calibration/cache/file/to/generate
          cal_batch_size: 16
          cal_batches: 100
          cal_image_dir:
            - /path/to/calibration/image/folder

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| data_type | string | FP32 | The precision to be used for the TensorRT engine | FP32/FP16/INT8 |
| workspace_size | unsigned int | 1024 | The maximum workspace size for the TensorRT engine | >1024 |
| min_batch_size | unsigned int | 1 | The minimum batch size for the optimization profile shape | >0 |
| opt_batch_size | unsigned int | 1 | The optimal batch size for the optimization profile shape | >0 |
| max_batch_size | unsigned int | 1 | The maximum batch size for the optimization profile shape | >0 |
| calibration | dict config | None | The configuration for the INT8 calibration | |
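The three batch-size fields describe a TensorRT optimization profile, which requires min <= opt <= max. The following is a minimal Python sketch of a pre-flight check on these values, not part of TAO: the helper name `check_tensorrt_config` is hypothetical, and treating workspace_size as MiB is an assumption inferred from the sample log further down (a value of 1024 is reported as 1073741824 bytes).

```python
SUPPORTED_DATA_TYPES = {"FP32", "FP16", "INT8"}


def check_tensorrt_config(cfg):
    """Validate a dict mirroring the gen_trt_engine.tensorrt section.

    Hypothetical helper, not a TAO API. Returns the workspace size in bytes,
    ASSUMING workspace_size is expressed in MiB.
    """
    data_type = str(cfg.get("data_type", "FP32")).upper()
    if data_type not in SUPPORTED_DATA_TYPES:
        raise ValueError(f"unsupported data_type: {data_type}")

    min_bs = cfg.get("min_batch_size", 1)
    opt_bs = cfg.get("opt_batch_size", 1)
    max_bs = cfg.get("max_batch_size", 1)
    # An optimization profile needs min <= opt <= max, all positive.
    if not (0 < min_bs <= opt_bs <= max_bs):
        raise ValueError(
            "expected 0 < min_batch_size <= opt_batch_size <= max_batch_size"
        )

    # INT8 needs calibration data or a calibration cache.
    if data_type == "INT8" and not cfg.get("calibration"):
        raise ValueError("INT8 requires a calibration section")

    return cfg.get("workspace_size", 1024) * 1024 * 1024


example = {
    "data_type": "int8",
    "workspace_size": 1024,
    "min_batch_size": 1,
    "opt_batch_size": 10,
    "max_batch_size": 10,
    "calibration": {"cal_cache_file": "/path/to/calibration/cache/file/to/generate"},
}
print(check_tensorrt_config(example))  # 1073741824
```

Running a check like this before launching the build surfaces profile mistakes early, instead of partway through engine generation.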

Calibration Config

| Parameter | Datatype | Default | Description | Supported Values |
|---|---|---|---|---|
| cal_cache_file | string | None | The path to the calibration cache file. If there is no calibration cache file at this path, a cache file is generated based on the other calibration config parameters. | |
| cal_batch_size | unsigned int | 1 | The batch size of the calibration dataset | >0 |
| cal_batches | unsigned int | 1 | The number of batches used for calibration. In total, cal_batches x cal_batch_size calibration images are used. | >0 |
| cal_image_dir | string | None | The directory containing the calibration images | |
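As a worked example of the image count described above, the sketch below computes how many calibration images a configuration consumes and checks it against the number of images you have. Both function names are hypothetical and only illustrate the arithmetic; they are not TAO functions.

```python
def calibration_images_needed(cal_batch_size, cal_batches):
    """Total images used for calibration: cal_batches x cal_batch_size."""
    if cal_batch_size <= 0 or cal_batches <= 0:
        raise ValueError("cal_batch_size and cal_batches must both be > 0")
    return cal_batch_size * cal_batches


def has_enough_images(available, cal_batch_size, cal_batches):
    """True if `available` images cover the requested calibration batches."""
    return available >= calibration_images_needed(cal_batch_size, cal_batches)


# Values from the example spec: cal_batch_size=16, cal_batches=100.
print(calibration_images_needed(16, 100))  # 1600
print(has_enough_images(1000, 16, 100))    # False
```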

Use the following command to run MLRecogNet engine generation:

    tao deploy ml_recog gen_trt_engine -e /path/to/spec.yaml \
        gen_trt_engine.onnx_file=/path/to/onnx/file \
        gen_trt_engine.trt_engine=/path/to/engine/file \
        gen_trt_engine.tensorrt.data_type=<data_type>

-e, --experiment_spec: The experiment spec file to set up the TensorRT engine generation. This should be the same as the export specification file.

gen_trt_engine.onnx_file: The .onnx model to be converted.

gen_trt_engine.trt_engine: The path where the generated engine will be stored.

gen_trt_engine.tensorrt.data_type: MLRecogNet supports FP32, FP16, and INT8 TensorRT engine generation. When using INT8, you must provide the calibration dataset or calibration cache file.
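If you launch the command from a script, the key=value overrides can be assembled programmatically. The sketch below only builds and prints the argument list; it does not run anything. The helper name `build_gen_trt_engine_cmd` is hypothetical, and the command layout simply mirrors the invocation shown above.

```python
import shlex


def build_gen_trt_engine_cmd(spec_path, overrides):
    """Build the argv list for the gen_trt_engine command shown above.

    Hypothetical helper: `overrides` maps dotted spec keys to values and is
    rendered as key=value arguments after the spec file.
    """
    cmd = ["tao", "deploy", "ml_recog", "gen_trt_engine", "-e", spec_path]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


cmd = build_gen_trt_engine_cmd(
    "/path/to/spec.yaml",
    {
        "gen_trt_engine.onnx_file": "/path/to/onnx/file",
        "gen_trt_engine.trt_engine": "/path/to/engine/file",
        "gen_trt_engine.tensorrt.data_type": "FP16",
    },
)
# shlex.join quotes arguments where needed, so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where the `tao` launcher is installed.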

Here’s an example of using the gen_trt_engine command to generate an FP16 TensorRT engine:

    tao deploy ml_recog gen_trt_engine -e $TRT_GEN_SPEC \
        gen_trt_engine.onnx_file=$ONNX_FILE \
        gen_trt_engine.trt_engine=$ENGINE_FILE \
        gen_trt_engine.tensorrt.data_type=FP16

Here’s an example of the output file $RESULTS_DIR/status.json:

    {"date": "6/22/2023", "time": "18:17:11", "status": "STARTED", "verbosity": "INFO", "message": "Starting ml_recog gen_trt_engine."}
    {"date": "6/22/2023", "time": "18:17:30", "status": "SUCCESS", "verbosity": "INFO", "message": "Gen_trt_engine finished successfully."}
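A script can use the status file to decide whether engine generation succeeded. The sketch below assumes the file holds one JSON object per line, as in the sample above; the helper name `last_status` is hypothetical.

```python
import json


def last_status(lines):
    """Return the "status" field of the last non-empty JSON line, or None."""
    records = [json.loads(line) for line in lines if line.strip()]
    return records[-1]["status"] if records else None


# The two records from the sample status.json above.
sample = [
    '{"date": "6/22/2023", "time": "18:17:11", "status": "STARTED", '
    '"verbosity": "INFO", "message": "Starting ml_recog gen_trt_engine."}',
    '{"date": "6/22/2023", "time": "18:17:30", "status": "SUCCESS", '
    '"verbosity": "INFO", "message": "Gen_trt_engine finished successfully."}',
]
print(last_status(sample))  # SUCCESS
```

To read a real file, pass the open file handle directly, for example `last_status(open(path))`, since iterating a file yields its lines.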

The output log example is shown below:

    Starting ml_recog gen_trt_engine.
    [06/22/2023-18:17:12] [TRT] [I] [MemUsageChange] Init CUDA: CPU +318, GPU +0, now: CPU 356, GPU 1003 (MiB)
    [06/22/2023-18:17:14] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +443, GPU +116, now: CPU 853, GPU 1119 (MiB)
    [06/22/2023-18:17:14] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
    Parsing ONNX model
    [06/22/2023-18:17:14] [TRT] [W] The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
    [06/22/2023-18:17:15] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
    Network Description
    Input 'input' with shape (-1, 3, 224, 224) and dtype DataType.FLOAT
    Output 'fc_pred' with shape (-1, 256) and dtype DataType.FLOAT
    dynamic batch size handling
    TensorRT engine build configurations:
    OptimizationProfile:
    "input": (1, 3, 224, 224), (10, 3, 224, 224), (10, 3, 224, 224)
    BuilderFlag.TF32
    Note: max representabile value is 2,147,483,648 bytes or 2GB.
    MemoryPoolType.WORKSPACE = 1073741824 bytes
    MemoryPoolType.DLA_MANAGED_SRAM = 0 bytes
    MemoryPoolType.DLA_LOCAL_DRAM = 1073741824 bytes
    MemoryPoolType.DLA_GLOBAL_DRAM = 536870912 bytes
    Tactic Sources = 31
    [06/22/2023-18:17:17] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +854, GPU +362, now: CPU 1800, GPU 1481 (MiB)
    [06/22/2023-18:17:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1926, GPU 1539 (MiB)
    [06/22/2023-18:17:17] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
    [06/22/2023-18:17:22] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
    [06/22/2023-18:17:30] [TRT] [I] Total Activation Memory: 1565556736
    [06/22/2023-18:17:30] [TRT] [I] Detected 1 inputs and 1 output network tensors.
    [06/22/2023-18:17:30] [TRT] [I] Total Host Persistent Memory: 132192
    [06/22/2023-18:17:30] [TRT] [I] Total Device Persistent Memory: 140288
    [06/22/2023-18:17:30] [TRT] [I] Total Scratch Memory: 134217728
    [06/22/2023-18:17:30] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 658 MiB
    [06/22/2023-18:17:30] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 91 steps to complete.
    [06/22/2023-18:17:30] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 1.66392ms to assign 5 blocks to 91 nodes requiring 184394240 bytes.
    [06/22/2023-18:17:30] [TRT] [I] Total Activation Memory: 184394240
    [06/22/2023-18:17:30] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2491, GPU 1889 (MiB)
    [06/22/2023-18:17:30] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +101, now: CPU 0, GPU 101 (MiB)
    Export finished successfully.
    Gen_trt_engine finished successfully.
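Most of this log is informational, so it can help to pull out only the TensorRT warnings. The sketch below filters lines by the bracketed level tag ([I], [W]) seen in the sample; the regular expression is derived from that sample alone, and the helper name `trt_messages` is hypothetical.

```python
import re

# Matches lines such as: [06/22/2023-18:17:14] [TRT] [W] message text
TRT_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[TRT\] \[(?P<level>[A-Z])\] (?P<message>.*)$"
)


def trt_messages(lines, level="W"):
    """Return the message text of TensorRT log lines with the given level."""
    messages = []
    for line in lines:
        match = TRT_LINE.match(line.strip())
        if match and match.group("level") == level:
            messages.append(match.group("message"))
    return messages


# A few lines taken from the sample log above.
log_lines = [
    "Starting ml_recog gen_trt_engine.",
    "[06/22/2023-18:17:12] [TRT] [I] [MemUsageChange] Init CUDA: CPU +318, GPU +0, now: CPU 356, GPU 1003 (MiB)",
    "[06/22/2023-18:17:14] [TRT] [W] The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.",
    "Parsing ONNX model",
]
for message in trt_messages(log_lines):
    print(message)
```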



