Running TAO Toolkit on Google Cloud Platform

Google Cloud Platform provides Compute Engine, a computing and hosting service that lets you create and run virtual machines (VMs) on Google infrastructure. Compute Engine offers both Linux and Windows VMs. To run TAO Toolkit, you need to set up a Linux VM.

Setting up a Linux VM Instance

Instructions to set up a VM are outlined in the official Compute Engine instructions.

1. In the console, select Compute Engine and open the VM Instances option.
2. Create a new instance using the Create Instance tab.
3. Set the machine family of the instance to GPU.
4. Set the boot image to Ubuntu, with the following options:
   Boot disk type: Balanced persistent disk
   Size (GB): > 200
5. Select your default network.
6. Spin up the VM by clicking Create.

Note

NVIDIA recommends using the A2 series of VM instances, which are powered by NVIDIA A100 GPUs, for the best training performance.
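
The console settings above can also be approximated from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The instance name `tao-vm`, the zone `us-central1-a`, the machine type `a2-highgpu-1g`, the Ubuntu 20.04 image family, and the 250 GB disk size are illustrative assumptions, not values taken from this guide. By default the function only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: assemble a gcloud command that mirrors the console settings above.
# INSTANCE_NAME and ZONE are hypothetical placeholders; change them for your project.
INSTANCE_NAME="${INSTANCE_NAME:-tao-vm}"
ZONE="${ZONE:-us-central1-a}"

create_tao_vm() {
  local cmd=(
    gcloud compute instances create "$INSTANCE_NAME"
    --zone="$ZONE"
    --machine-type=a2-highgpu-1g     # A2 series with one A100 GPU
    --image-family=ubuntu-2004-lts   # assumed Ubuntu release
    --image-project=ubuntu-os-cloud
    --boot-disk-type=pd-balanced     # balanced persistent disk
    --boot-disk-size=250GB           # any size above 200 GB
    --maintenance-policy=TERMINATE   # GPU VMs cannot live-migrate
  )
  if [ "${1:-}" = "run" ]; then
    "${cmd[@]}"                      # actually create the instance
  else
    echo "${cmd[*]}"                 # dry run: print the command only
  fi
}

create_tao_vm   # prints the command; call `create_tao_vm run` to execute it
```

Printing first makes it easy to review the flags before any billable resource is created.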

Using the VM

Once the instance is running, note the external IP address of the VM shown in the console.

Set up SSH access
1. Generate an SSH key from the terminal you intend to use to log in to the created VM. You can do so by running the command below and following the prompts:
```
ssh-keygen -t rsa -b 4096
```
2. Copy the contents of the ~/.ssh/id_rsa.pub file and add it to the instance.
3. Use the login ID in the public key to log in to the public IP address of the instance.
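
The login ID is normally the user name in the comment field at the end of the public key (`user@host`). As a rough sketch of the last step, the helper below reads that field; the function name `login_id_from_pubkey` and the `EXTERNAL_IP` variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Print the login ID stored in the comment field of a public key file.
# Public key lines look like: "<type> <base64-key> <user>@<host>".
login_id_from_pubkey() {
  local pubkey_file="$1"
  # The third field is the comment; keep only the part before "@".
  awk '{print $3}' "$pubkey_file" | cut -d'@' -f1
}

# Usage (not executed here); EXTERNAL_IP stands for the address noted from the console:
#   ssh -i ~/.ssh/id_rsa "$(login_id_from_pubkey ~/.ssh/id_rsa.pub)@$EXTERNAL_IP"
```

If the key was generated with a custom comment, the third field may not be a user name, so check the value before relying on it.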

Setting up the VM and Enabling GPUs

Prepare the OS dependencies and check the GPUs:

```
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y pciutils

# Match case-insensitively: lspci reports the vendor as "NVIDIA Corporation"
lspci | grep -i nvidia
```
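
Because lspci usually lists the vendor as "NVIDIA Corporation", a case-insensitive match is the safer check. The sketch below wraps that check in a small helper so a setup script can stop early when no GPU is visible; `count_nvidia_devices` is a hypothetical name, not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Count the lines on standard input that mention NVIDIA, ignoring case.
count_nvidia_devices() {
  grep -ci 'nvidia' || true   # grep exits non-zero when the count is 0
}

# Usage on the VM (not executed here):
#   if [ "$(lspci | count_nvidia_devices)" -eq 0 ]; then
#     echo "No NVIDIA GPU detected; check the machine type of the instance." >&2
#   fi
```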

Install the NVIDIA GPU driver, Docker, and supporting tools:

```
sudo apt-get -y install nvidia-driver-460
sudo apt-get -y install docker.io
sudo apt-get -y install python3-pip unzip
```

Install nvidia-docker2, restart Docker, and add your user to the docker group:

```
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo usermod -a -G docker $USER
```

Log out and log back in so that the new docker group membership takes effect.

You can verify the docker installation and the GPU instances, as shown below:

```
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
|-------------------------------+----------------------+----------------------+
```

Log in to the docker registry nvcr.io by running the command below:
```
docker login nvcr.io
```
The username is the literal string $oauthtoken, and the password is your NGC API key. You can generate this API key on the NGC website.
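
For scripted setups, the login can be done without the interactive prompt. This is a minimal sketch, assuming the API key has been exported in an environment variable; the name `NGC_API_KEY` and the function `ngc_docker_login` are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: log in to nvcr.io without an interactive prompt.
# NGC_API_KEY is an assumed environment variable name holding your NGC API key.
ngc_docker_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # Single quotes keep $oauthtoken literal: it is the fixed user name, not a shell variable.
  printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Usage (not executed here):
#   export NGC_API_KEY=<your_key>
#   ngc_docker_login
```

Passing the key on standard input keeps it out of the shell history and the process list.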

Installing the Prerequisites for TAO Toolkit

Upgrade python-pip to the latest version:

```
pip3 install --upgrade pip
```

Install the virtualenv wrapper:

```
pip3 install virtualenvwrapper
```

Configure the virtualenv wrapper:

```
# $HOME is used instead of a fixed /home/ubuntu so the paths match your login ID
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=$HOME/.virtualenvs
export PATH=$HOME/.local/bin:$PATH
source $HOME/.local/bin/virtualenvwrapper.sh
```

Note

You may also add these commands to the ~/.bashrc of the VM to retain them for multiple sessions.
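
One way to do this without duplicating the lines on repeated runs is sketched below. The function name `persist_virtualenvwrapper` and the marker comment are hypothetical, and the sketch writes `$HOME` rather than a fixed home directory so that it works for any login ID.

```shell
#!/usr/bin/env bash
# Append the virtualenvwrapper settings to a shell startup file, once.
# The marker comment is a hypothetical tag used only to detect a previous run.
persist_virtualenvwrapper() {
  local rc_file="${1:-$HOME/.bashrc}"
  local marker="# virtualenvwrapper settings for the TAO launcher"
  # Do nothing if the settings were already added.
  grep -qF "$marker" "$rc_file" 2>/dev/null && return 0
  # The quoted 'EOF' keeps $HOME and $PATH unexpanded until the file is sourced.
  cat >> "$rc_file" <<'EOF'
# virtualenvwrapper settings for the TAO launcher
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=$HOME/.virtualenvs
export PATH=$HOME/.local/bin:$PATH
source $HOME/.local/bin/virtualenvwrapper.sh
EOF
}

# Usage (not executed here):
#   persist_virtualenvwrapper            # appends to ~/.bashrc
#   persist_virtualenvwrapper ~/.profile # or to another startup file
```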

Create a virtualenv for the launcher using the following command:
```
mkvirtualenv -p /usr/bin/python3 launcher
```
Note
You only need to create the virtualenv once per instance. When you restart the instance, re-run the virtualenv wrapper configuration commands above and activate the same virtualenv using the command below:

```
workon launcher
```

Install jupyterlab in the virtualenv using the command below:

```
pip3 install jupyterlab
```

Downloading and Running Test Samples

Now that you have created a virtualenv and installed all the dependencies, you are ready to download and run the TAO Toolkit samples in a notebook. The instructions below assume that you are running the TAO Computer Vision samples. For Conversational AI samples, refer to the sample notebooks in this section.

Download and unzip the notebooks from NGC using the commands below:

```
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.2.0/zip -O cv_samples_v1.2.0.zip
unzip -u cv_samples_v1.2.0.zip -d ./cv_samples_v1.2.0 && cd ./cv_samples_v1.2.0
```

Launch the jupyter notebook using the command below:
```
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token=<notebook_token>
```
This starts the Jupyter notebook server in the VM. To access the server, navigate to http://<dns_name>:8888/ and, when prompted, enter the <notebook_token> used to start the notebook server. Here, dns_name is the external IP address (or public DNS name) of the VM that you noted earlier.
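
Because the server listens on all interfaces, it helps to use a token that is hard to guess. The sketch below shows one way to generate such a value with standard Linux utilities; `generate_notebook_token` and `NOTEBOOK_TOKEN` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Generate a random 32-character hexadecimal string to use as <notebook_token>.
generate_notebook_token() {
  # 16 random bytes, printed as hex, with spaces and newlines removed.
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

NOTEBOOK_TOKEN="$(generate_notebook_token)"
echo "Notebook token: $NOTEBOOK_TOKEN"

# Usage (not executed here):
#   jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token="$NOTEBOOK_TOKEN"
```

If port 8888 is not reachable from your machine, one alternative is to forward it over SSH, for example `ssh -L 8888:localhost:8888 <login_id>@<external_ip>`, and then browse to http://localhost:8888/.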