.. _open_model_architectures:

Open Model Architectures
=============================

Transfer Learning Toolkit supports image classification, seven object detection architectures (YOLOv3, YOLOv4, FasterRCNN, SSD, DSSD, RetinaNet, and DetectNet_v2), and one architecture each for instance and semantic segmentation: MaskRCNN and UNet, respectively. In addition, TLT supports 16 classification backbones. For a complete list of the permutations that are supported by TLT, see the matrix below:

+---------------------------+-------------------------+--------------------------------------------------------------------------------------------------+---------------------------+---------------------------+
|                           | **ImageClassification** | **Object Detection**                                                                             | **Instance Segmentation** | **Semantic Segmentation** |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
| **Backbone**              |                         | **DetectNet_V2** | **FasterRCNN** | **SSD** | **YOLOv3** | **RetinaNet** | **DSSD** | **YOLOv4** | **MaskRCNN**              | **UNet**                  |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|ResNet10/18/34/50/101      | Yes                     | Yes              | Yes            | Yes     | Yes        | Yes           | Yes      | Yes        | Yes                       | Yes                       |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|VGG 16/19                  | Yes                     | Yes              | Yes            | Yes     | Yes        | Yes           | Yes      | Yes        |                           | Yes                       |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|GoogLeNet                  | Yes                     | Yes              | Yes            | Yes     | Yes        | Yes           | Yes      | Yes        |                           |                           |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|MobileNet V1/V2            | Yes                     | Yes              | Yes            | Yes     | Yes        | Yes           | Yes      | Yes        |                           |                           |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|SqueezeNet                 | Yes                     | Yes              |                | Yes     | Yes        | Yes           | Yes      | Yes        |                           |                           |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|DarkNet 19/53              | Yes                     | Yes              | Yes            | Yes     | Yes        | Yes           | Yes      | Yes        |                           |                           |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|CSPDarkNet 19/53           | Yes                     |                  |                |         |            |               |          | Yes        |                           |                           |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+
|EfficientNet B0            | Yes                     |                  | Yes            | Yes     |            | Yes           | Yes      |            |                           |                           |
+---------------------------+-------------------------+------------------+----------------+---------+------------+---------------+----------+------------+---------------------------+---------------------------+

Model Requirements
------------------

Classification
^^^^^^^^^^^^^^

* **Input size**: 3 * H * W (W, H >= 16)
* **Input format**: JPG, JPEG, PNG

.. Note:: Classification input images do not need to be manually resized. The input dataloader resizes images as needed.
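As a quick sanity check before assembling a dataset, the classification requirements above can be expressed in a few lines of Python. The helper below is illustrative only (its name and signature are not part of the TLT tools):

```python
def meets_classification_requirements(width: int, height: int, filename: str) -> bool:
    """Return True if one image satisfies the classification input
    requirements listed above: a JPG/JPEG/PNG extension and both
    sides at least 16 pixels.

    Illustrative helper only; not part of the TLT tools.
    """
    supported_extensions = (".jpg", ".jpeg", ".png")
    has_supported_extension = filename.lower().endswith(supported_extensions)
    return has_supported_extension and width >= 16 and height >= 16


# A 224x224 PNG passes; a 12-pixel-wide image or a BMP does not.
print(meets_classification_requirements(224, 224, "sample.png"))  # True
print(meets_classification_requirements(12, 224, "sample.png"))   # False
```

Because the classification dataloader resizes images for you, a check like this only needs to catch unsupported formats and degenerate image sizes.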
Object Detection
^^^^^^^^^^^^^^^^

DetectNet_v2
************

* **Input size**: C * W * H (where C = 1 or 3, W >= 480, H >= 272, and W, H are multiples of 16)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

.. Note:: The :code:`train` tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size, and the corresponding bounding boxes must be scaled accordingly.

FasterRCNN
**********

* **Input size**: C * W * H (where C = 1 or 3, W >= 160, H >= 160)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

.. Note:: The :code:`train` tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size, and the corresponding bounding boxes must be scaled accordingly.

SSD
***

* **Input size**: C * W * H (where C = 1 or 3, W >= 128, H >= 128)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

DSSD
****

* **Input size**: C * W * H (where C = 1 or 3, W >= 128, H >= 128)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

YOLOv3
******

* **Input size**: C * W * H (where C = 1 or 3, W >= 128, H >= 128, and W, H are multiples of 32)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

YOLOv4
******

* **Input size**: C * W * H (where C = 1 or 3, W >= 128, H >= 128, and W, H are multiples of 32)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

RetinaNet
*********

* **Input size**: C * W * H (where C = 1 or 3, W >= 128, H >= 128, and W, H are multiples of 32)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

Instance Segmentation
^^^^^^^^^^^^^^^^^^^^^

MaskRCNN
********

* **Input size**: C * W * H (where C = 3, W >= 128, H >= 128, and W, H are multiples of 32)
* **Image format**: JPG
* **Label format**: COCO detection

Semantic Segmentation
^^^^^^^^^^^^^^^^^^^^^

UNet
****

* **Input size**: C * W * H (where C = 3, W >= 128, H >= 128, and W, H are multiples of 32)
* **Image format**: JPG, JPEG, PNG, BMP
* **Label format**: Image/Mask pair

.. Note:: The :code:`train` tool does not support training on images of multiple resolutions. All of the images and masks must be of equal size. However, the images and masks need not match the model input size; they are resized to the model input size during training.

Training
^^^^^^^^

The TLT container includes Jupyter notebooks and the spec files necessary to train any supported network combination. Pre-trained weights for each backbone are provided on NGC. These models are trained on the Open Images dataset and provide a strong starting point for applying transfer learning on your own dataset. To get started, first choose the type of model that you want to train, then go to the appropriate model card on NGC and choose one of the supported backbones.

+----------------------+-------------------------------+
| **Model to train**   | **NGC model card**            |
+----------------------+-------------------------------+
| YOLOv3               | `TLT object detection`_       |
+----------------------+                               |
| YOLOv4               |                               |
+----------------------+                               |
| SSD                  |                               |
+----------------------+                               |
| FasterRCNN           |                               |
+----------------------+                               |
| RetinaNet            |                               |
+----------------------+                               |
| DSSD                 |                               |
+----------------------+-------------------------------+
| DetectNet_v2         | `TLT DetectNet_v2 detection`_ |
+----------------------+-------------------------------+
| MaskRCNN             | `TLT instance segmentation`_  |
+----------------------+-------------------------------+
| Image Classification | `TLT image classification`_   |
+----------------------+-------------------------------+
| UNet                 | `TLT semantic segmentation`_  |
+----------------------+-------------------------------+

.. _TLT object detection: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_object_detection
.. _TLT DetectNet_v2 detection: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_detectnet_v2
.. _TLT instance segmentation: https://ngc.nvidia.com/catalog/models/nvidia:tlt_instance_segmentation
.. _TLT image classification: https://ngc.nvidia.com/catalog/models/nvidia:tlt_pretrained_classification
.. _TLT semantic segmentation: https://ngc.nvidia.com/catalog/models/nvidia:tlt_semantic_segmentation

Once you have picked the appropriate pre-trained model, follow the TLT workflow to train on your dataset and export a tuned model adapted to your use case. The **TLT Workflow** sections walk you through all the steps in training.

.. image:: ../content/tlt_workflow.png

Deployment
^^^^^^^^^^

You can deploy your trained model on any edge device using DeepStream and TensorRT. See the :code:`Deploying to DeepStream` chapter of each network for deployment instructions.

.. image:: ../content/tlt_overview.png

.. tabularcolumns:: |p{1cm}|p{7cm}|

.. csv-table:: Open Model Deployment
   :file: ../content/open_arch_tables.csv
   :widths: 30,30,20,10,30,30,20
   :class: longtable
   :header-rows: 1
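The object detection requirements above note that DetectNet_v2 and FasterRCNN need all images resized offline, with the KITTI bounding boxes scaled by the same factors. A minimal sketch of the label-side bookkeeping, assuming standard KITTI text labels (one object per line, with the 2D box as whitespace-separated fields 4-7: xmin, ymin, xmax, ymax); the helper name is illustrative and not part of the TLT tools:

```python
def scale_kitti_boxes(label_text: str, orig_size, target_size) -> str:
    """Rescale the 2D bounding boxes in KITTI detection labels to
    match an offline image resize from orig_size to target_size
    (both given as (width, height) in pixels).

    Only the four 2D box fields are changed; all other fields
    (class, truncation, occlusion, 3D box, etc.) pass through.
    Illustrative helper only; not part of the TLT tools.
    """
    sx = target_size[0] / orig_size[0]  # horizontal scale factor
    sy = target_size[1] / orig_size[1]  # vertical scale factor
    scaled_lines = []
    for line in label_text.strip().splitlines():
        fields = line.split()
        # Fields 4 and 6 are x coordinates; 5 and 7 are y coordinates.
        for index, factor in ((4, sx), (5, sy), (6, sx), (7, sy)):
            fields[index] = f"{float(fields[index]) * factor:.2f}"
        scaled_lines.append(" ".join(fields))
    return "\n".join(scaled_lines)


# Halving a 1280x720 image halves every 2D box coordinate.
label = "car 0.00 0 0.00 100.00 50.00 200.00 150.00 1.50 1.60 3.90 1.80 1.40 8.40 0.00"
print(scale_kitti_boxes(label, (1280, 720), (640, 360)))
```

The same scale factors must, of course, be applied to the image itself (for example with any offline image-resizing tool), and images and labels must be kept in matching pairs.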