Running TLT on an AWS VM

Amazon Web Services provides the Elastic Compute Cloud (EC2) instance for running compute jobs in the cloud. This page provides instructions for running TLT on an EC2 VM.

Pre-Requisites

To define the security group and AWS key pair, complete the preliminary setup instructions here.
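If you prefer the command line to the console, the preliminary setup can be approximated with the AWS CLI. The following is a minimal sketch, not the linked instructions themselves: it assumes the AWS CLI is installed and configured, that your account has a default VPC, and that the names tlt-key and tlt-sg and the CIDR range are hypothetical placeholders. Port 22 is opened for SSH and port 8888 for the Jupyter server used later on this page. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: create a key pair and a security group with the AWS CLI.
# KEY_NAME, SG_NAME, and MY_CIDR are hypothetical placeholders.
KEY_NAME="${KEY_NAME:-tlt-key}"
SG_NAME="${SG_NAME:-tlt-sg}"
MY_CIDR="${MY_CIDR:-203.0.113.0/24}"  # documentation-only range; use your own IP range
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_prereqs() {
  # The private key material is written to stdout by this call.
  run aws ec2 create-key-pair --key-name "$KEY_NAME" \
    --query KeyMaterial --output text
  run aws ec2 create-security-group --group-name "$SG_NAME" \
    --description "TLT VM access"
  # Allow SSH (22) and the Jupyter port used on this page (8888).
  for port in 22 8888; do
    run aws ec2 authorize-security-group-ingress --group-name "$SG_NAME" \
      --protocol tcp --port "$port" --cidr "$MY_CIDR"
  done
}

setup_prereqs
```

When running the key pair command for real, redirect its output to a .pem file and restrict the file's permissions (for example with chmod 400) before using it with ssh.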

Setting up an AWS EC2 instance

  1. Log in to your AWS account, or create one by following the instructions on the official AWS Getting Started web page.

  2. Once you have logged in, select EC2 under Compute.

  3. Select the AWS region. For the purpose of this tutorial, use “US East (N. Virginia)”.

  4. Click Launch Instance.

  5. Start an EC2 Virtual Machine Instance. For running TLT, use the NVIDIA Deep Learning Amazon Machine Image (AMI). To use this AMI, select the AWS Marketplace and search for the NVIDIA Deep Learning AMI.

    Note

    The Amazon EC2 P3 and G4 instances are optimized for the NVIDIA Volta/Turing GPUs.

  6. Select one of the Amazon EC2 P3 or G4 instance types according to your compute requirements.

  7. Click Review and Launch to review the default configuration settings.

  8. After choosing an instance type, click Next: Configure Instance Details.

    Note

    There are no instance details that need to be configured, so you can proceed to the next step.

  9. Add storage by clicking Next: Add Storage.

    Note

    For TLT, request at least 200 GB of storage space.

  10. Add tags. Naming your instances helps to keep multiple instances organized.

  11. Continue to Configure a Security Group. Click Select an Existing Security Group and select the Security Group you created during Preliminary Setup. You have now configured your AWS instance.

  12. Click Review and Launch to launch your instance. A pop-up should ask which key pair you would like to use. Choose the key pair that you created during the preliminary setup.

  13. You may now connect to your instance by following the instructions on this webpage.
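The console steps above can also be approximated from the command line. The sketch below prints an aws ec2 run-instances command and an ssh command rather than running them. The AMI ID, security group ID, key name, instance tag, and root device name are all hypothetical placeholders or assumptions; us-east-1 is the region code for N. Virginia, and the ubuntu login user is inferred from the /home/ubuntu paths used later on this page. Marketplace AMIs generally need to be subscribed to in the console before they can be launched this way.

```shell
#!/usr/bin/env bash
# Sketch only: prints AWS CLI and ssh commands instead of running them.
# All values below are hypothetical placeholders; replace them with your own.
REGION="${REGION:-us-east-1}"                 # US East (N. Virginia)
AMI_ID="${AMI_ID:-ami-REPLACE_ME}"            # ID of the NVIDIA Deep Learning AMI in your region
INSTANCE_TYPE="${INSTANCE_TYPE:-p3.2xlarge}"  # any P3 or G4 instance type
KEY_NAME="${KEY_NAME:-tlt-key}"               # key pair from the preliminary setup
SG_ID="${SG_ID:-sg-REPLACE_ME}"               # security group from the preliminary setup
VOLUME_GB="${VOLUME_GB:-200}"                 # storage size suggested in step 9
ROOT_DEVICE="${ROOT_DEVICE:-/dev/sda1}"       # assumed root device name; check the AMI

# Print a copy-pasteable launch command built from the variables above.
launch_cmd() {
  cat <<EOF
aws ec2 run-instances \\
  --region $REGION \\
  --image-id $AMI_ID \\
  --instance-type $INSTANCE_TYPE \\
  --key-name $KEY_NAME \\
  --security-group-ids $SG_ID \\
  --block-device-mappings '[{"DeviceName":"$ROOT_DEVICE","Ebs":{"VolumeSize":$VOLUME_GB}}]' \\
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=tlt-vm}]'
EOF
}

# Print the ssh command for a given Public IPv4 DNS name ($1).
ssh_cmd() {
  printf 'ssh -i %s.pem ubuntu@%s\n' "$KEY_NAME" "$1"
}

launch_cmd
ssh_cmd "<public-ipv4-dns>"
```

Review the printed commands before running them, since a P3 or G4 instance starts accruing charges as soon as it launches.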

Installing the Pre-Requisites for TLT in the VM

The NVIDIA Deep Learning AMI comes with several dependencies pre-installed by default to launch NVIDIA-built Deep Learning Containers. To run TLT, you need to install a few additional dependencies.

  1. Install prerequisite apt packages:

    sudo apt update
    sudo apt install python-pip python3-pip unzip
    pip3 install --upgrade pip
    
  2. Install virtualenv wrapper:

    pip3 install virtualenvwrapper
    
  3. Configure the virtualenv wrapper:

    export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
    export WORKON_HOME=/home/ubuntu/.virtualenvs
    export PATH=/home/ubuntu/.local/bin:$PATH
    source /home/ubuntu/.local/bin/virtualenvwrapper.sh
    

    Note

    You may also add these commands to the /home/ubuntu/.bashrc file of the VM so that the configuration persists for multiple sessions.

  4. Create a virtualenv for the launcher using the following command:

    mkvirtualenv -p /usr/bin/python3 launcher
    

    Note

    You only need to create a virtualenv once in the instance. When you restart the instance, simply run the commands in step 3 and invoke the same virtualenv using the command below:

    workon launcher
    
  5. Install jupyterlab in the virtualenv using the command below:

    pip3 install jupyterlab
    
  6. Log in to the NGC docker registry named nvcr.io:

    docker login nvcr.io
    

    The username here is $oauthtoken and the password is your NGC API key. You can generate this API key from the NGC website.
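The note in step 3 suggests adding the virtualenvwrapper configuration to /home/ubuntu/.bashrc. One way to do that without duplicating the lines on repeated runs is sketched below. The function name and the marker comment are hypothetical; the exported values are copied from step 3.

```shell
#!/usr/bin/env bash
# Sketch: append the virtualenvwrapper settings from step 3 to a shell
# startup file, once. The marker line and function name are hypothetical.
persist_venv_config() {
  local rc_file="${1:-$HOME/.bashrc}"
  local marker="# >>> TLT virtualenvwrapper setup >>>"

  # Skip if the block was already added by an earlier run.
  if [ -f "$rc_file" ] && grep -qF "$marker" "$rc_file"; then
    return 0
  fi

  # Quoted heredoc delimiter keeps $PATH literal in the startup file.
  cat >> "$rc_file" <<'EOF'
# >>> TLT virtualenvwrapper setup >>>
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=/home/ubuntu/.virtualenvs
export PATH=/home/ubuntu/.local/bin:$PATH
source /home/ubuntu/.local/bin/virtualenvwrapper.sh
EOF
}

# Usage (not run automatically):
#   persist_venv_config                 # appends to ~/.bashrc
#   persist_venv_config /tmp/test.rc    # appends to a file of your choice
```

After the block has been added, new login sessions pick up the configuration automatically, so only workon launcher is needed to re-enter the virtualenv.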

Download and run the test samples

Now that you have created a virtualenv and installed all the dependencies, you are ready to download and run the TLT samples in the notebook. The instructions below assume that you are running the TLT Computer Vision samples. For Conversational AI samples, refer to the sample notebooks in this section.

  1. Download and unzip the notebooks from NGC using the commands below:

    wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tlt_cv_samples/versions/v1.1.0/zip -O tlt_cv_samples_v1.1.0.zip
    unzip -u tlt_cv_samples_v1.1.0.zip -d ./tlt_cv_samples_v1.1.0 && cd ./tlt_cv_samples_v1.1.0
    
  2. Launch the jupyter notebook using the command below:

    jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token=<notebook_token>
    

    This starts the Jupyter notebook server in the VM. To access the server, navigate to http://<dns_name>:8888/ and, when prompted, enter the <notebook_token> used to start the notebook server. Here, dns_name is the Public IPv4 DNS of the VM, shown on the EC2 dashboard for your instance.
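Because the notebook server listens on all interfaces, a hard-to-guess token is a sensible choice for <notebook_token>. The sketch below shows one way to generate a random token and print the matching launch command and URL. The helper names make_token and notebook_url are hypothetical, and the script only prints the jupyter command instead of starting the server.

```shell
#!/usr/bin/env bash
# Sketch: generate a random notebook token and print the launch command and URL.
# make_token and notebook_url are hypothetical helper names.

# Read 16 random bytes and render them as 32 lowercase hex characters.
make_token() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Build the URL to open in a browser; $1 is the instance's Public IPv4 DNS.
notebook_url() {
  printf 'http://%s:8888/\n' "$1"
}

NOTEBOOK_TOKEN="$(make_token)"
echo "jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token=${NOTEBOOK_TOKEN}"
notebook_url "<dns_name>"
```

Run the printed jupyter command inside the launcher virtualenv, then open the printed URL with <dns_name> replaced by your instance's Public IPv4 DNS and enter the token when prompted.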