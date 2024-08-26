There are two types of pre-trained models that you can start with:

General-purpose vision models: The pre-trained weights for these models act as a starting point for building more complex models. For computer vision use cases, these weights are trained on the Open Images dataset, and they provide a much better starting point for training than a random initialization of weights.

Purpose-built pre-trained models: These are highly accurate models that are trained on thousands of data inputs for a specific task. These domain-focused models can either be used directly for inference or can be used with TAO Toolkit for transfer learning on your own dataset.

* New in TAO Toolkit 3.0-21.08 GA

You can choose from 100+ permutations of model architecture and backbone with the general-purpose vision models.

Purpose-built models are built for high accuracy and performance. You can deploy these models out of the box for applications such as smart city, retail, public safety, and healthcare, or you can retrain them with your own data. All models are trained on thousands of proprietary images and achieve very high accuracy on NVIDIA test data. More information about each of these models is available in the individual model cards. Typical use cases and some model KPIs are provided in the table below. PeopleNet can be used for detecting and counting people in smart buildings, retail stores, hospitals, and similar settings. For smart traffic applications, TrafficCamNet and DashCamNet can be used to detect and track vehicles on the road.

Note: The accuracy reported for BodyPoseNet is based on a model trained using the COCO dataset. To reproduce the same accuracy, use the sample notebook.





The performance of these pre-trained models across various NVIDIA platforms is summarized in the table below. The numbers in the table are inference performance figures measured with the trtexec tool from the TensorRT samples.
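As a rough illustration of how such a measurement might be scripted, the sketch below assembles a trtexec command line for benchmarking an already-built TensorRT engine. The engine file name, the iteration count, and the warm-up duration are assumptions chosen for illustration; the source does not state the settings used to produce the published numbers.

```python
from typing import List


def build_trtexec_cmd(engine_path: str, iterations: int = 100, warmup_ms: int = 200) -> List[str]:
    """Build an argument list that benchmarks a serialized TensorRT engine.

    engine_path is a hypothetical file name supplied by the caller.
    iterations and warmup_ms are illustrative values, not the settings
    behind the published performance table.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    if warmup_ms < 0:
        raise ValueError("warmup_ms must not be negative")
    return [
        "trtexec",
        f"--loadEngine={engine_path}",  # load a prebuilt engine file
        f"--iterations={iterations}",   # run at least this many inference iterations
        f"--warmUp={warmup_ms}",        # warm up for this many milliseconds first
    ]


# The list can be passed to subprocess.run(cmd, check=True) on a machine
# where trtexec is installed; it is only printed here.
cmd = build_trtexec_cmd("model.engine")
print(" ".join(cmd))
```

trtexec prints latency and throughput statistics at the end of a run, which is the kind of output a performance table would be compiled from.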

With general-purpose models, you can train an image classification model, an object detection model, or an instance segmentation model.

For classification, you can train using one of the available architectures such as ResNet, EfficientNet, VGG, MobileNet, GoogLeNet, SqueezeNet, or DarkNet.

For object detection tasks, you can choose from the popular YOLOv3/v4/v4-tiny, FasterRCNN, SSD, RetinaNet, and DSSD architectures, as well as NVIDIA’s own DetectNet_v2 architecture.

For segmentation, you can use MaskRCNN for instance segmentation or UNET for semantic segmentation.

This gives you the flexibility and control to build AI models for any number of applications, from smaller, lightweight models for edge GPUs to larger models for more complex tasks. For all the permutations and combinations, refer to the table below and see the Open Model Architectures section.
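The task-to-architecture choices described above can be summarized as a simple lookup. The following is a minimal sketch, not part of TAO Toolkit: the dictionary keys and the `architectures_for` helper are hypothetical names, and the entries include only the architectures named in this section. Backbone options are not listed here because they are not given in the text.

```python
from typing import Dict, List

# Architectures named in this section, grouped by task.
# The task keys are illustrative labels, not TAO Toolkit identifiers.
ARCHITECTURES: Dict[str, List[str]] = {
    "classification": [
        "ResNet", "EfficientNet", "VGG", "MobileNet",
        "GoogLeNet", "SqueezeNet", "DarkNet",
    ],
    "object_detection": [
        "YOLOv3", "YOLOv4", "YOLOv4-tiny", "FasterRCNN",
        "SSD", "RetinaNet", "DSSD", "DetectNet_v2",
    ],
    "instance_segmentation": ["MaskRCNN"],
    "semantic_segmentation": ["UNET"],
}


def architectures_for(task: str) -> List[str]:
    """Return the architectures listed for a task, e.g. 'object detection'."""
    key = task.strip().lower().replace(" ", "_")
    if key not in ARCHITECTURES:
        known = ", ".join(sorted(ARCHITECTURES))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the shared table.
    return list(ARCHITECTURES[key])


for task in ARCHITECTURES:
    print(task, "->", ", ".join(architectures_for(task)))
```

Keeping the mapping in plain data makes it easy to extend with backbone choices once they are taken from the official architecture table.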

The table below summarizes the computer vision models and the features enabled.