DriveWorks SDK Reference
4.0.0 Release
For Test and Development only

tools/tensorRT_optimization/README-tensorRT_optimization.md
# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dwx_tensorRT_tool TensorRT Optimizer Tool
@tableofcontents

@section dwx_tensorRT_tool_description Description

The NVIDIA<sup>&reg;</sup> DriveWorks TensorRT Optimizer Tool optimizes a given Caffe, UFF, or ONNX model using TensorRT.

For specific examples, please refer to the following:

- @ref dwx_tensorRT_tool_examples_uff.
- @ref dwx_tensorRT_tool_examples_caffe.
- @ref dwx_tensorRT_tool_examples_onnx.

@section dwx_tensorRT_tool_prerequisites Prerequisites

This tool is available on NVIDIA DRIVE<sup>&trade;</sup> OS Linux.

By default, this tool places its output files in the current working directory. For convenience, ensure that:

- Write permissions are enabled for the current working directory.
- The tools folder is included in the system's binary search path.
- The tool is executed from your home directory.

@section dwx_tensorRT_tool_usage Running the Tool

The TensorRT Optimizer Tool accepts the following parameters. Several of them are required depending on the model type. \n
For more information, please refer to the @ref dwx_tensorRT_tool_examples.

Run the tool by executing:

    ./tensorRT_optimization --modelType=[uff|caffe|onnx]
                            --outputBlobs=[<output_blob1>,<output_blob2>,...]
                            --prototxt=[path to file]
                            --caffemodel=[path to file]
                            --uffFile=[path to file]
                            --inputBlobs=[<input_blob1>,<input_blob2>,...]
                            --inputDims=[<NxNxN>,<NxNxN>,...]
                            --onnxFile=[path to file]
                            [--iterations=[int]]
                            [--batchSize=[int]]
                            [--half2=[int]]
                            [--out=[path to file]]
                            [--int8]
                            [--calib=[calibration file name]]
                            [--cudaDevice=[CUDA GPU index]]
                            [--verbose=[int]]
                            [--useDLA]
                            [--useSafeDLA]
                            [--dlaLayerConfig=[path to json layer config]]
                            [--pluginConfig=[path to plugin config file]]
                            [--precisionConfig=[path to precision config file]]
                            [--testFile=[path to binary file]]
                            [--useGraph=[int]]
                            [--workspaceSize=[int]]
                            [--explicitBatch=[int]]

@subsection dwx_tensorRT_tool_params Parameters

    --modelType=[uff|caffe|onnx]
    Description: The type of model to be converted to the TensorRT network.
    Warning: The uff and caffe model types are deprecated and will be dropped in the next major release.

    --outputBlobs=[<output_blob1>,<output_blob2>,...]
    Description: Names of the output blobs, separated by commas.
    Example: --outputBlobs=bboxes,coverage

    --prototxt=[path to file]
    Description: Deploy file describing the Caffe network.
    Example: --prototxt=deploy.prototxt

    --caffemodel=[path to file]
    Description: Caffe model file containing the weights.
    Example: --caffemodel=weights.caffemodel

    --uffFile=[path to file]
    Description: Path to a UFF file.
    Example: --uffFile=~/myNetwork.uff

    --inputBlobs=[<input_blob1>,<input_blob2>,...]
    Description: Names of the input blobs, separated by commas. Ignored if the model is ONNX or Caffe.
    Example: --inputBlobs=data0,data1

    --inputDims=[<NxNxN>,<NxNxN>,...]
    Description: Input dimensions for each input blob, separated by commas and given in the same
    order as the input blobs. The dimensions of each blob are separated by `x` and given in CHW format.
    Example: --inputDims=3x480x960,1x1x10
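
For illustration only, the `--inputDims` value can be decoded as follows (a minimal Python sketch of the documented format, not part of the tool):

```python
def parse_input_dims(arg: str):
    """Parse an --inputDims value such as "3x480x960,1x1x10" into a list of
    (C, H, W)-style integer tuples, one per input blob, in blob order."""
    return [tuple(int(n) for n in blob.split("x")) for blob in arg.split(",")]

# Each tuple lines up with the input blob given at the same position.
dims = parse_input_dims("3x480x960,1x1x10")
print(dims)  # [(3, 480, 960), (1, 1, 10)]
```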

    --onnxFile=[path to file]
    Description: Path to an ONNX file.
    Example: --onnxFile=~/myNetwork.onnx

    --iterations=[int]
    Description: Number of iterations to run when measuring speed.
    This parameter is optional.
    Example: --iterations=100
    Default value: 10

    --batchSize=[int]
    Description: Batch size of the model to be generated.
    This parameter is optional.
    Example: --batchSize=2
    Default value: 1

    --half2=[int]
    Description: Runs the network in paired fp16 mode. Requires a platform with native fp16 support.
    This parameter is optional.
    Example: --half2=1
    Default value: 0

    --out=[path to file]
    Description: Name of the optimized model file.
    This parameter is optional.
    Example: --out=model.bin
    Default value: optimized.bin

    --int8
    Description: If specified, runs in INT8 mode.
    This parameter is optional.

    --calib=[calibration file name]
    Description: INT8 calibration file name.
    This parameter is optional.
    Example: --calib=calib.cache

    --cudaDevice=[CUDA GPU index]
    Description: Index of a CUDA capable GPU device.
    This parameter is optional.
    Example: --cudaDevice=1
    Default value: 0

    --verbose=[int]
    Description: Enables TensorRT verbose logging.
    This parameter is optional.
    Default value: 0
    --useDLA
    Description: If specified, generates a model to be executed on DLA. This argument is only valid on platforms with DLA hardware.
    This parameter is optional.

    --useSafeDLA
    Description: If specified, generates a model to be executed on DLA in safe mode.
    Safe mode requires that all layers be executable on DLA, that the input/output of the DNN module
    be provided in the corresponding precision and format, and that the input/output tensors be provided
    as NvMediaTensor for best performance.
    The `dwDNN` module is capable of streaming NvMediaTensors from/to CUDA and
    converting precisions and formats. For more information, please refer to the `dwDNN` module's documentation.

    --dlaLayerConfig=[path to json layer config]
    Description: If specified, the layers to be forced to the GPU are read from this JSON file.
    Layers to run on the GPU can be specified by layer type or layer number, both of which can be
    obtained from the logs by running with the default template. This argument is only valid when
    --useDLA is specified.
    This parameter is optional.
    Example: --dlaLayerConfig=./template_dlaconfig.json

    --pluginConfig=[path to plugin config file]
    Description: Path to a plugin configuration file. See template_plugin.json for an example.
    This parameter is optional.
    Example: --pluginConfig=template_plugin.json

    --precisionConfig=[path to precision config file]
    Description: Path to a precision configuration file for generating models with mixed precision.
    For layers not included in the configuration file, the builder mode determines the precision,
    and TensorRT may choose any precision for better performance. If 'output_types' is not provided
    for a layer, the data type of its output tensors is set to the precision of the layer.
    For layers with precision set to INT8, scaling factors for the input/output tensors should be provided.
    This file can also be used to set the scaling factor of each tensor by name; values provided in
    this file override the scaling factors specified in the calibration file (if provided).
    See 'template_precision.json' for an example.
    This parameter is optional.
    Example: --precisionConfig=template_precision.json

    --testFile=[path to binary file]
    Description: Name of a binary file for model input/output validation. This file should contain
    flattened pairs of inputs and expected outputs in the same order as the TensorRT model expects.
    The file is assumed to hold 32-bit floats. The number of test pairs is detected automatically.
    This parameter is optional.
    Example: Data with two inputs and two outputs would be laid out in the file as follows:
    > \[input 1\]\[input 2\]\[output 1\]\[output 2\]\[input 1\]\[input 2\]\[output 1\]\[output 2\]...
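
To illustrate this layout, the following Python sketch writes a file of flattened test pairs using only the standard library. The helper name, the sample blob sizes, and the little-endian byte order are assumptions for illustration; they are not part of the tool:

```python
import struct

def write_test_pairs(path, pairs):
    """Write flattened (inputs, outputs) pairs as 32-bit floats.

    `pairs` is a list of (inputs, outputs) tuples, where `inputs` and
    `outputs` are lists of flat float lists, one per blob, in model order.
    Little-endian byte order is assumed here.
    """
    with open(path, "wb") as f:
        for inputs, outputs in pairs:
            for blob in inputs + outputs:
                f.write(struct.pack("<%df" % len(blob), *blob))

# One test pair for a model with two inputs and two outputs
# (blob sizes are made up for the example).
write_test_pairs("test.bin", [
    ([[0.0] * 6, [1.0] * 4],    # input 1, input 2 (flattened)
     [[0.5] * 2, [0.25] * 3]),  # expected output 1, output 2
])
```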

    --useGraph
    Description: If specified, executes the optimized network through a CUDA graph. This helps verify
    that the optimized network works with CUDA graph acceleration.
    This parameter is optional.

    --workspaceSize=[int]
    Description: Maximum workspace size in megabytes. Limits the maximum amount of scratch memory
    that any layer in the network can use. If insufficient scratch is provided, TensorRT may not be
    able to find an implementation for a given layer.
    This parameter is optional.

    --explicitBatch=[int]
    Description: Determines whether explicit batch is enabled.
    For TensorRT versions 6.3 and higher, this flag is automatically set to 1 if an ONNX model is
    provided as input.
    This parameter is optional.

@section dwx_tensorRT_tool_examples Examples

@subsection dwx_tensorRT_tool_examples_uff Optimizing UFF Models

    ./tensorRT_optimization --modelType=uff
                            --outputBlobs=bboxes,coverage
                            --uffFile=~/myNetwork.uff
                            --inputBlobs=data0,data1
                            --inputDims=3x480x960,1x1x10

@subsection dwx_tensorRT_tool_examples_caffe Optimizing Caffe Models

    ./tensorRT_optimization --modelType=caffe
                            --outputBlobs=bboxes,coverage
                            --prototxt=deploy.prototxt
                            --caffemodel=weights.caffemodel

@note The `--inputBlobs` and `--inputDims` parameters are ignored if you select the Caffe model type. <br>All the input blobs will be automatically marked as input.

@subsection dwx_tensorRT_tool_examples_onnx Optimizing ONNX Models

    ./tensorRT_optimization --modelType=onnx
                            --onnxFile=~/myNetwork.onnx

@note The `--inputBlobs`, `--inputDims`, and `--outputBlobs` parameters are ignored if you select the ONNX model type.<br>All the input and output blobs will be automatically marked as input or output, respectively.
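
When invoking the tool from a script, command lines like the ones above can be assembled programmatically. A minimal Python sketch; the helper name is hypothetical, and the flag names simply mirror the parameters documented above:

```python
def build_optimize_cmd(model_type, out="optimized.bin", **options):
    """Assemble a tensorRT_optimization command line as an argument list.

    Any extra keyword (e.g. half2=1, batchSize=2) is passed through
    as --key=value; keys are sorted for a deterministic command line.
    """
    cmd = ["./tensorRT_optimization",
           "--modelType=%s" % model_type,
           "--out=%s" % out]
    cmd += ["--%s=%s" % (k, v) for k, v in sorted(options.items())]
    return cmd

cmd = build_optimize_cmd("onnx", out="model.bin",
                         onnxFile="~/myNetwork.onnx", half2=1)
print(" ".join(cmd))
# To actually run it on a DRIVE OS target, pass `cmd` to
# subprocess.run(cmd, check=True) after `import subprocess`.
```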