TensorRT
========

.. _tensorrt_deployment:

NVIDIA TensorRT is an SDK for high-performance deep learning inference. TensorRT
provides APIs and parsers to import trained models from all major deep learning
frameworks. It then generates optimized runtime engines deployable in the
datacenter, as well as in automotive and embedded environments. To understand
TensorRT and its capabilities better, refer to the official `TensorRT documentation`_.

.. _TensorRT documentation: https://developer.nvidia.com/tensorrt

The models trained in TLT are deployed to NVIDIA's inference SDKs, such as
DeepStream and Jarvis, via TensorRT. While the conversational AI models trained
using TLT can be consumed via TensorRT only through Jarvis, the computer vision
models trained by TLT can be consumed by TensorRT directly, via the
:code:`tlt-converter` tool. The **TLT converter** parses the exported
:code:`.etlt` model file and generates an optimized TensorRT engine. These
engines can be generated to support inference at low precision, such as
:code:`FP16` or :code:`INT8`.

While most of the TLT models support direct integration of the :code:`.etlt`
files with DeepStream 5.1, DeepStream can also consume the optimized engine
generated by the :code:`tlt-converter`. The TensorRT engines generated by the
:code:`tlt-converter` are specific to the GPU on which they were generated. So,
based on the platform that the model is being deployed to, you will need to
download the platform-specific version of the :code:`tlt-converter` and generate
the engine there. The TLT models have been verified to integrate with TensorRT
versions 7.0, 7.1, and 7.2.

TensorRT Open Source Software
-----------------------------

.. _tensorrt_open_source_software:

Even though TensorRT contains optimized implementations for several common
operations used in Deep Neural Networks (DNNs), with deep learning being such a
quickly evolving discipline, TensorRT provides users a method to bring new
operations into the model graph via custom :code:`TensorRT Plugins`. Several
samples of these custom plug-ins are hosted on GitHub under the repository
called `TensorRT OSS`_.

.. _TensorRT OSS: https://github.com/NVIDIA/TensorRT

Instructions to build and install TensorRT OSS can be found in the
`TensorRT OSS`_ repository. The TLT applications that require TensorRT OSS are:

* FasterRCNN
* SSD
* DSSD
* YOLOv3
* YOLOv4
* RetinaNet
* MaskRCNN

Installing the TLT-Converter
----------------------------

.. _installing_the_tlt_converter:

The :code:`tlt-converter` is distributed as a separate binary for x86 and Jetson
platforms. The following tables list the links where you can download the
:code:`tlt-converter`.

.. _tlt_converter_matrix:

.. csv-table:: TLT Converter Support Matrix for x86
   :file: ../content/tlt_converter_x86.csv
   :widths: 30,30,30
   :class: longtable
   :header-rows: 1

.. csv-table:: TLT Converter Support Matrix for Jetson
   :file: ../content/tlt_converter_aarch.csv
   :widths: 30,30
   :class: longtable
   :header-rows: 1

Installing on an x86 platform
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: excerpts/instructions_for_x86_with_OSS.rst

Installing on a Jetson platform
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: excerpts/instructions_for_jetson_with_OSS.rst

Running the TLT converter
-------------------------

Using the tlt-converter
^^^^^^^^^^^^^^^^^^^^^^^

.. code::

   tlt-converter [-h] -k <encryption_key>
                 -d <input_dimensions>
                 -o <comma-separated output nodes>
                 [-c <path to calibration cache file>]
                 [-e <path to output engine>]
                 [-b <calibration batch size>]
                 [-m <maximum batch size of the TRT engine>]
                 [-t <engine data type>]
                 [-w <maximum workspace size of the TRT engine>]
                 [-i <input dimension ordering>]
                 [-p <optimization_profiles>]
                 [-s]
                 [-u <DLA_core>]
                 input_file
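For example, a DetectNet_v2 model exported from TLT could be converted to an
:code:`FP16` engine with an invocation along the following lines. The key
variable, file names, input dimensions, and output node names below are
illustrative placeholders; substitute the values from your own
:code:`tlt export` run.

.. code::

   # Sketch of an FP16 conversion for a hypothetical DetectNet_v2 export.
   # $KEY must hold the same encryption key that was used during training.
   tlt-converter -k $KEY \
                 -d 3,544,960 \
                 -o output_cov/Sigmoid,output_bbox/BiasAdd \
                 -t fp16 \
                 -e trt.fp16.engine \
                 detectnet_v2_resnet18.etlt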
Required Arguments
~~~~~~~~~~~~~~~~~~

* :code:`input_file`: The path to the :code:`.etlt` model exported using :code:`tlt export`.
* :code:`-k`: The key used to encode the :code:`.tlt` model when training.
* :code:`-d`: A comma-separated list of input dimensions that should match the dimensions used for :code:`tlt export`.
* :code:`-o`: A comma-separated list of output blob names that should match the output configuration used for :code:`tlt export`.

Optional Arguments
~~~~~~~~~~~~~~~~~~

* :code:`-e`: The path to save the engine to. The default value is :code:`./saved.engine`.
* :code:`-t`: The desired engine data type. This generates a calibration cache if in INT8 mode. The default value is :code:`fp32`. The options are {:code:`fp32`, :code:`fp16`, :code:`int8`}.
* :code:`-w`: The maximum workspace size for the TensorRT engine. The default value is :code:`1073741824` (i.e. 1<<30).
* :code:`-i`: The input dimension ordering; all other TLT commands use NCHW. The default value is :code:`nchw`. The options are {:code:`nchw`, :code:`nhwc`, :code:`nc`}.
* :code:`-p`: Optimization profiles for :code:`.etlt` models with dynamic shape. This is a comma-separated list of optimization profile shapes in the format :code:`<input_name>,<min_shape>,<opt_shape>,<max_shape>`, where each shape has the format :code:`<n>x<c>x<h>x<w>`. It can be specified multiple times if there are multiple input tensors for the model. This argument is only useful for new models introduced in TLT 3.0 and is not required for models that already existed in TLT 2.0.
* :code:`-s`: A Boolean to apply TensorRT strict type constraints when building the TensorRT engine.
* :code:`-u`: Specifies the DLA core index to use when building the TensorRT engine on Jetson devices.

INT8 Mode Arguments
~~~~~~~~~~~~~~~~~~~

* :code:`-c`: The path to the calibration cache file; only used in INT8 mode. The default value is :code:`./cal.bin`.
* :code:`-b`: The batch size used during the export step for INT8 calibration cache generation. The default value is :code:`8`.
* :code:`-m`: The maximum batch size for the TensorRT engine. The default value is :code:`16`. If you run into out-of-memory issues, decrease this value accordingly. This parameter is not required for :code:`.etlt` models generated with dynamic shape (which is only possible for new models introduced in TLT 3.0). An example INT8 invocation is sketched below.
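A sketch of an INT8 conversion using these arguments, again with placeholder
file names, key, dimensions, and output nodes:

.. code::

   # Sketch of an INT8 conversion; calibration.bin stands in for the
   # calibration cache produced during `tlt export`, and all other names
   # here are placeholders as well.
   tlt-converter -k $KEY \
                 -d 3,544,960 \
                 -o output_cov/Sigmoid,output_bbox/BiasAdd \
                 -c calibration.bin \
                 -t int8 \
                 -b 8 \
                 -m 16 \
                 -e trt.int8.engine \
                 detectnet_v2_resnet18.etlt

The usage for each TLT Computer Vision model is explained in the respective
model's chapter.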