Open Model Architectures

Transfer Learning Toolkit supports image classification, seven object detection architectures (YOLOv3, YOLOv4, FasterRCNN, SSD, DSSD, RetinaNet, and DetectNet_v2), an instance segmentation architecture (MaskRCNN), and a semantic segmentation architecture (UNet). In addition, TLT supports 16 classification backbones. For a complete list of the supported combinations, see the matrix below:

In the matrix, DetectNet_V2, FasterRCNN, SSD, YOLOv3, RetinaNet, DSSD, and YOLOv4 are the object detection columns; MaskRCNN covers instance segmentation and UNet covers semantic segmentation. A blank cell means the combination is not supported.

| Backbone               | Image Classification | DetectNet_V2 | FasterRCNN | SSD | YOLOv3 | RetinaNet | DSSD | YOLOv4 | MaskRCNN | UNet |
|------------------------|----------------------|--------------|------------|-----|--------|-----------|------|--------|----------|------|
| ResNet 10/18/34/50/101 | Yes                  | Yes          | Yes        | Yes | Yes    | Yes       | Yes  | Yes    | Yes      | Yes  |
| VGG 16/19              | Yes                  | Yes          | Yes        | Yes | Yes    | Yes       | Yes  | Yes    |          | Yes  |
| GoogLeNet              | Yes                  | Yes          | Yes        | Yes | Yes    | Yes       | Yes  | Yes    |          |      |
| MobileNet V1/V2        | Yes                  | Yes          | Yes        | Yes | Yes    | Yes       | Yes  | Yes    |          |      |
| SqueezeNet             | Yes                  | Yes          | Yes        | Yes | Yes    | Yes       | Yes  |        |          |      |
| DarkNet 19/53          | Yes                  | Yes          | Yes        | Yes | Yes    | Yes       | Yes  | Yes    |          |      |
| CSPDarkNet 19/53       | Yes                  |              |            |     |        |           |      | Yes    |          |      |
| EfficientNet B0        | Yes                  |              | Yes        | Yes |        | Yes       | Yes  |        |          |      |

Model Requirements

Classification

  • Input size: 3 * H * W (W, H >= 16)

  • Input format: JPG, JPEG, PNG

Note

Classification input images do not need to be manually resized. The input dataloader resizes images as needed.
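
If you want to sanity-check a classification dataset against these constraints before training, a scan along the following lines is enough. This is a minimal sketch, not a TLT tool; the flat directory layout and the use of Pillow are assumptions for illustration:

    # Minimal sketch: flag classification images that violate the
    # requirements above (W, H >= 16; JPG/JPEG/PNG formats).
    # The directory layout is an assumption for illustration.
    from pathlib import Path
    from PIL import Image

    ALLOWED = {".jpg", ".jpeg", ".png"}

    def check_classification_dataset(root):
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() not in ALLOWED:
                print(f"unsupported format: {path}")
                continue
            w, h = Image.open(path).size
            if w < 16 or h < 16:
                print(f"too small ({w}x{h}): {path}")

    check_classification_dataset("dataset/train")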

Object Detection

DetectNet_v2

  • Input size: C * W * H (where C = 1 or 3; W >= 480; H >= 272; and W, H are multiples of 16)

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection

Note

The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
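
Because resizing must happen offline, a preprocessing pass has to scale both the images and the KITTI labels. The sketch below shows one way to do it, assuming Pillow, an illustrative 960x544 target (multiples of 16, above the 480x272 floor), and the standard 15-field KITTI label line; it is not a TLT tool:

    # Minimal sketch: offline resize for DetectNet_v2/FasterRCNN with KITTI
    # bbox rescaling. KITTI label lines have 15 space-separated fields;
    # fields 4-7 are xmin, ymin, xmax, ymax in pixels, e.g.:
    #   car 0.00 0 0.00 603.15 186.90 629.10 211.35 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    # All paths and the 960x544 target are illustrative assumptions.
    from PIL import Image

    TARGET_W, TARGET_H = 960, 544

    def resize_sample(img_in, img_out, lbl_in, lbl_out):
        img = Image.open(img_in)
        sx, sy = TARGET_W / img.width, TARGET_H / img.height
        img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(img_out)
        with open(lbl_in) as src, open(lbl_out, "w") as dst:
            for line in src:
                parts = line.split()
                parts[4] = f"{float(parts[4]) * sx:.2f}"  # xmin
                parts[5] = f"{float(parts[5]) * sy:.2f}"  # ymin
                parts[6] = f"{float(parts[6]) * sx:.2f}"  # xmax
                parts[7] = f"{float(parts[7]) * sy:.2f}"  # ymax
                dst.write(" ".join(parts) + "\n")

    resize_sample("images/0001.png", "resized/images/0001.png",
                  "labels/0001.txt", "resized/labels/0001.txt")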

FasterRCNN

  • Input size: C * W * H (where C = 1 or 3; W >= 160; H >= 160)

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection

Note

The train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.

SSD

  • Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128)

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection

DSSD

  • Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128)

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection

YOLOv3

  • Input size: C * W * H (where C = 1 or 3; W >= 128; H >= 128; and W, H are multiples of 32)

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection

YOLOv4

  • Input size: C * W * H (where C = 1 or 3; W >= 128; H >= 128; and W, H are multiples of 32)

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection

RetinaNet

  • Input size: C * W * H (where C = 1 or 3; W >= 128; H >= 128; and W, H are multiples of 32; see the sizing sketch below)

  • Image format: JPG, JPEG, PNG

  • Label format: KITTI detection
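
Because SSD, DSSD, YOLOv3, YOLOv4, and RetinaNet all require W and H to be multiples of 32, an arbitrary capture resolution usually needs to be snapped to a valid size first. A minimal sketch of that calculation (rounding up, with the 128-pixel floor); the 1280x720 input is just an example:

    import math

    # Snap a desired resolution to the nearest valid size for SSD, DSSD,
    # YOLOv3, YOLOv4, and RetinaNet: multiples of 32, at least 128 pixels.
    def valid_size(w, h, multiple=32, floor=128):
        snap = lambda v: max(floor, math.ceil(v / multiple) * multiple)
        return snap(w), snap(h)

    print(valid_size(1280, 720))  # -> (1280, 736)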

Instance Segmentation

MaskRCNN

  • Input size: C * W * H (where C = 3; W >= 128; H >= 128; and W, H are multiples of 32)

  • Image format: JPG

  • Label format: COCO detection
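
For reference, COCO detection labels are a single JSON file whose images, annotations, and categories arrays tie boxes and polygon masks to image IDs. A minimal illustrative skeleton follows; every value in it is made up:

    # Minimal sketch of the COCO-style annotation layout used by MaskRCNN.
    # All IDs, file names, and coordinates are illustrative.
    import json

    coco = {
        "images": [
            {"id": 1, "file_name": "000000000001.jpg", "width": 1280, "height": 736},
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 1,
                "bbox": [100.0, 150.0, 80.0, 60.0],  # [x, y, width, height]
                "segmentation": [[100, 150, 180, 150, 180, 210, 100, 210]],  # polygon
                "area": 4800.0,
                "iscrowd": 0,
            }
        ],
        "categories": [{"id": 1, "name": "car"}],
    }

    with open("annotations.json", "w") as f:
        json.dump(coco, f)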

Semantic Segmentation

UNet

  • Input size: C * W * H (where C = 3; W >= 128; H >= 128; and W, H are multiples of 32)

  • Image format: JPG, JPEG, PNG, BMP

  • Label format: Image/Mask pair

Note

The train tool does not support training on images of multiple resolutions. All of the images and masks must be of equal size. However, the images and masks need not match the model input size; they will be resized to the model input size during training.
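
A quick pre-training check that every image has a mask of identical size can save a failed run. A minimal sketch, assuming parallel images/ and masks/ directories with PNG masks named after their images; the layout is an assumption, not a TLT requirement:

    # Minimal sketch: verify each image/mask pair has matching dimensions,
    # as the UNet train tool requires. Directory layout is illustrative.
    from pathlib import Path
    from PIL import Image

    def check_pairs(image_dir, mask_dir):
        for img_path in sorted(Path(image_dir).iterdir()):
            mask_path = Path(mask_dir) / (img_path.stem + ".png")
            if not mask_path.exists():
                print(f"missing mask for {img_path.name}")
            elif Image.open(img_path).size != Image.open(mask_path).size:
                print(f"size mismatch: {img_path.name}")

    check_pairs("data/images", "data/masks")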

Training

The TLT container includes Jupyter notebooks and the spec files needed to train any supported network combination. Pre-trained weights for each backbone are provided on NGC; these models are trained on the Google Open Images dataset and provide a great starting point for applying transfer learning to your own dataset.

To get started, first choose the type of model that you want to train, then go to the appropriate model card on NGC and choose one of the supported backbones.

| Model to train       | NGC model card             |
|----------------------|----------------------------|
| YOLOv3               | TLT object detection       |
| YOLOv4               | TLT object detection       |
| SSD                  | TLT object detection       |
| FasterRCNN           | TLT object detection       |
| RetinaNet            | TLT object detection       |
| DSSD                 | TLT object detection       |
| DetectNet_v2         | TLT DetectNet_v2 detection |
| MaskRCNN             | TLT instance segmentation  |
| Image Classification | TLT image classification   |
| UNet                 | TLT semantic segmentation  |
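
Pre-trained weights can be pulled with the NGC CLI. A minimal sketch driving it from Python; the model string follows the pattern used in the TLT notebooks, but check the model card for the exact name and available versions:

    # Minimal sketch: download pre-trained backbone weights via the NGC CLI.
    # The model string and destination are illustrative; see the NGC model
    # card for the exact model name and versions.
    import subprocess

    def download_pretrained(model, dest="pretrained"):
        subprocess.run(
            ["ngc", "registry", "model", "download-version", model, "--dest", dest],
            check=True,
        )

    download_pretrained("nvidia/tlt_pretrained_object_detection:resnet18")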

Once you have picked the appropriate pre-trained model, follow the TLT workflow to train it on your dataset and export a tuned model adapted to your use case. The TLT Workflow sections walk you through all the steps in training.

(Figure: TLT workflow)
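
In code, the workflow steps map onto tlt launcher invocations. The sketch below uses DetectNet_v2 as an example; spec files, output paths, and the key are illustrative placeholders, and the exact arguments for each task are documented per network:

    # Minimal sketch: the train -> prune -> retrain -> export loop via the
    # tlt launcher. All paths, spec files, and the key are placeholders.
    import subprocess

    KEY = "nvidia_tlt"  # your model encryption key

    def tlt(task, *args):
        subprocess.run(["tlt", "detectnet_v2", task, *args], check=True)

    tlt("train", "-e", "specs/train.txt", "-r", "results/train", "-k", KEY)
    tlt("prune", "-m", "results/train/weights/model.tlt",
        "-o", "results/pruned/model.tlt", "-k", KEY)
    # Retraining reuses the train task with a spec that points at the pruned model.
    tlt("train", "-e", "specs/retrain.txt", "-r", "results/retrain", "-k", KEY)
    tlt("export", "-m", "results/retrain/weights/model.tlt",
        "-o", "results/export/model.etlt", "-k", KEY)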

Deployment

You can deploy your trained model on any edge device using DeepStream and TensorRT. See the Deploying to DeepStream chapter for each network for deployment instructions.

(Figure: TLT overview)
Open Model Deployment

| Model        | Model output format | Prunable | INT8 | Compatible with DS5.0/5.0.1 | Compatible with DS5.1 | TRT-OSS required |
|--------------|---------------------|----------|------|-----------------------------|-----------------------|------------------|
| DetectNet_v2 | Encrypted UFF       | Yes      | Yes  | Yes                         | Yes                   | No               |
| FasterRCNN   | Encrypted UFF       | Yes      | Yes  | Yes                         | Yes                   | Yes              |
| SSD          | Encrypted UFF       | Yes      | Yes  | Yes                         | Yes                   | Yes              |
| YOLOv3       | Encrypted UFF       | Yes      | Yes  | Yes                         | Yes (with TRT 7.1)    | Yes              |
| YOLOv4       | Encrypted UFF       | Yes      | Yes  | Yes                         | Yes (with TRT 7.1)    | Yes              |
| DSSD         | Encrypted UFF       | Yes      | Yes  | Yes                         | Yes                   | Yes              |
| RetinaNet    | Encrypted UFF       | Yes      | Yes  | Yes                         | Yes                   | Yes              |
| MaskRCNN     | Encrypted UFF       | No       |      |                             |                       | Yes              |
| UNet         | Encrypted ONNX      | No       | Yes  | No                          | Yes                   |                  |