Running TLT in the Cloud

Training deep learning models can be very resource-intensive. An accurate model typically needs several hours of training time and data on the order of gigabytes. Beyond a single training run, you will also need to run several experiments to find the best hyperparameter configuration. These demands make running the NVIDIA Transfer Learning Toolkit (TLT) in the cloud an appealing option.

TLT 3.0 is designed to run interactively on a virtual machine. The following sections describe how to run TLT on Amazon Web Services (AWS) or on Google Cloud Platform (GCP).

  1. Running TLT on AWS

  2. Running TLT on GCP

Note

Running TLT in the cloud requires you to lease and instantiate virtual machines, which can become expensive if they are left running unattended. Remember to shut down your instances when you are done training.