Model Pruning

Model pruning is one of the key differentiators for TAO. Pruning removes the nodes of a neural network that contribute least to the model's overall accuracy. This shrinks the model, significantly reduces its memory footprint, and increases inference throughput, all of which are important for edge deployment.
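
To make the idea concrete, here is a minimal sketch of magnitude-based node pruning on a toy dense layer. It is an illustration only and is not taken from TAO: this section does not say which criterion TAO uses, so the normalized L2-norm score, the `threshold` parameter, and the `prune_dense_layer` helper are all assumptions chosen for the example.

```python
import math


def prune_dense_layer(weights, next_weights, threshold):
    """Drop output nodes whose normalized L2 norm falls below `threshold`.

    Hypothetical helper for illustration; not a TAO API.

    weights:      rows are output nodes of this layer, columns are its inputs.
    next_weights: rows are output nodes of the following layer; each column
                  consumes one output node of this layer.
    """
    # Score every node by the L2 norm of its incoming weights.
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    max_norm = max(norms)
    if max_norm == 0:
        keep = list(range(len(weights)))  # nothing to rank, keep every node
    else:
        # Normalize by the largest norm so the threshold is scale-free.
        keep = [i for i, n in enumerate(norms) if n / max_norm >= threshold]

    pruned = [weights[i] for i in keep]
    # Removing a node also removes the matching input column downstream.
    pruned_next = [[row[i] for i in keep] for row in next_weights]
    return pruned, pruned_next, keep


# Toy weights (made up): nodes 1 and 3 have very small incoming weights.
layer = [
    [0.9, -1.2, 0.4],
    [0.01, 0.02, -0.01],
    [1.5, 0.3, -0.7],
    [0.05, -0.04, 0.03],
]
next_layer = [
    [0.2, 0.7, -0.5, 0.1],
    [-0.3, 0.6, 0.8, -0.9],
]

pruned, pruned_next, keep = prune_dense_layer(layer, next_layer, threshold=0.1)
params_before = sum(len(r) for r in layer) + sum(len(r) for r in next_layer)
params_after = sum(len(r) for r in pruned) + sum(len(r) for r in pruned_next)
print(keep, params_before, params_after)  # [0, 2] 20 10
```

In this toy case, dropping two low-magnitude nodes halves the parameter count across the two layers, because each removed node also removes its input column in the following layer. That is where the smaller footprint and higher throughput described above come from.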

Currently, pruning is supported for a subset of computer vision (CV) models. The following graph shows an example of the performance gains achieved when moving from an unpruned CV model to a pruned one. (Inference was run on an NVIDIA T4; TrafficCamNet, DashCamNet, and PeopleNet are three of the custom pretrained models that are available on NGC.)

Figure: Pruned vs Unpruned Performance (../_images/pruned_vs_unpruned.png)