Running TAO Toolkit on Google Cloud Platform
Date: 2024-08-26
Google Cloud Platform provides Compute Engine, a computing and hosting service that lets you create and run virtual machines on Google infrastructure. Compute Engine offers both Linux and Windows VMs. To run TAO Toolkit, you need to set up a Linux VM.
Instructions to set up a VM are outlined in the official Compute Engine documentation. The main steps are:
In the console, open Compute Engine and select the VM Instances option.
Create a new instance using the Create Instance tab.
Set the machine family of the instance to GPU.
Set boot image to Ubuntu, with the following options:
Boot disk type: Balanced persistent disk
Size (GB): > 200
Select your default network.
Spin up the VM by clicking Create.
NVIDIA recommends the A2 series of VM instances, which are powered by NVIDIA A100 GPUs, for best training performance.
Once you have set up the instance, note the IP address of the VM created from the console.
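The console steps above can also be approximated from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The instance name, zone, machine type (a2-highgpu-1g), Ubuntu image family, and 256 GB disk size are illustrative assumptions, not values from this guide. The helper names create_tao_vm and tao_vm_ip are hypothetical.

```shell
# Sketch: build a "gcloud compute instances create" command for a GPU VM.
# By default the command is only printed; set DRY_RUN=0 to execute it.
create_tao_vm() {
  local name="${1:-tao-vm}"          # hypothetical instance name
  local zone="${2:-us-central1-a}"   # assumed zone; A2 availability varies
  set -- gcloud compute instances create "$name" \
    --zone="$zone" \
    --machine-type=a2-highgpu-1g \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-type=pd-balanced \
    --boot-disk-size=256GB \
    --maintenance-policy=TERMINATE
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"    # print the command for review
  else
    "$@"         # run the command
  fi
}

# Sketch: print the external IP address of an existing instance.
tao_vm_ip() {
  gcloud compute instances describe "$1" --zone="$2" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}
```

Calling create_tao_vm my-vm us-central1-a prints the command for review. Running it with DRY_RUN=0 creates the instance, after which tao_vm_ip my-vm us-central1-a prints the address to note down.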
Set up SSH access
Generate an SSH key from the terminal you intend to use to log in to the created VM. You can do so by running the command below and following the prompts:
ssh-keygen -t rsa -b 4096
Copy the contents of the ~/.ssh/id_rsa.pub file and add it to the instance.
Use the login ID in the public key to log in to the public IP address of the instance.
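The login ID is the user part of the comment at the end of the public key line. As a small sketch, the hypothetical helper below extracts it, assuming the key has the default "user@host" comment that ssh-keygen writes.

```shell
# Print the login ID embedded in an OpenSSH public key.
# A key line has the form "<type> <base64-key> <comment>", and the
# default comment is "user@host"; the part before "@" is printed.
login_id_from_pubkey() {
  local keyfile="${1:-$HOME/.ssh/id_rsa.pub}"
  awk '{ split($3, parts, "@"); print parts[1]; exit }' "$keyfile"
}
```

With the helper defined, a login command could look like ssh -i ~/.ssh/id_rsa "$(login_id_from_pubkey)@<public_ip>", where <public_ip> is the address noted from the console.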
Prepare the OS dependencies and check the GPUs:
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get install -y pciutils
lspci | grep -i nvidia
Install the NVIDIA GPU driver:
sudo apt-get -y install nvidia-driver-460
sudo apt-get -y install docker.io
sudo apt-get install -y python3-pip unzip
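After the driver is installed, it can help to confirm that it is loaded before continuing. The sketch below uses the hypothetical helper name check_nvidia_driver and relies on the nvidia-smi utility that ships with the driver. A reboot may be required before the driver becomes active.

```shell
# Report the GPU name and driver version, or fail with a hint when
# nvidia-smi is not available on the PATH.
check_nvidia_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: driver not installed or reboot pending" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```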
Install NVIDIA Container Toolkit:
Follow the instructions given in Installing the NVIDIA Container Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html> to install nvidia-docker.
Log in to the Docker registry nvcr.io by running the command below:
docker login nvcr.io
The username here is $oauthtoken and the password is the NGC API KEY. You can generate this API key from the NGC website.
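For scripted setups, the login can be done without the interactive prompt. This is a minimal sketch, assuming the API key has been exported in an environment variable named NGC_API_KEY. The variable name and the helper name ngc_docker_login are illustrative choices.

```shell
# Log in to nvcr.io without a prompt, reading the key from NGC_API_KEY.
ngc_docker_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # Single quotes keep the literal user name $oauthtoken from being
  # expanded by the shell; the key is passed on stdin, not on argv.
  printf '%s' "$NGC_API_KEY" |
    docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Passing the key through --password-stdin keeps it out of the shell history and the process list.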
Upgrade python-pip to the latest version:
pip3 install --upgrade pip
Install the virtualenv wrapper:
pip3 install virtualenvwrapper
Configure the virtualenv wrapper:
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=/home/ubuntu/.virtualenvs
export PATH=/home/ubuntu/.local/bin:$PATH
source /home/ubuntu/.local/bin/virtualenvwrapper.sh
Note
You may also add these commands to the ~/.bashrc of the VM to retain them for multiple sessions.
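One way to persist the configuration is sketched below. The hypothetical helper persist_virtualenvwrapper_config appends the same four commands to a shell startup file. It uses a marker comment, chosen for this sketch, so that running it more than once does not duplicate the lines.

```shell
# Append the virtualenvwrapper setup to a startup file exactly once.
persist_virtualenvwrapper_config() {
  local rcfile="${1:-$HOME/.bashrc}"
  local marker="# virtualenvwrapper setup for TAO launcher"
  # Skip the append when the marker line is already present.
  if [ -f "$rcfile" ] && grep -qxF "$marker" "$rcfile"; then
    return 0
  fi
  # The quoted delimiter keeps $PATH literal in the written file.
  cat >> "$rcfile" <<'EOF'
# virtualenvwrapper setup for TAO launcher
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=/home/ubuntu/.virtualenvs
export PATH=/home/ubuntu/.local/bin:$PATH
source /home/ubuntu/.local/bin/virtualenvwrapper.sh
EOF
}
```

Running persist_virtualenvwrapper_config with no argument targets ~/.bashrc, and the settings take effect in the next login shell.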
Create a virtualenv for the launcher using the following command:
mkvirtualenv -p /usr/bin/python3 launcher
Note
You only need to create the virtualenv once per instance. After restarting the instance, rerun the virtualenv wrapper configuration commands from step 3, then activate the same virtualenv using the command below:
workon launcher
Install jupyterlab in the virtualenv using the command below:
pip3 install jupyterlab
Now that you have created a virtualenv and installed all the dependencies, you are ready to download and run the TAO Toolkit samples in a notebook. The instructions below assume that you are running the TAO Computer Vision samples.
Download and unzip the notebooks from NGC using the commands below:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/5.0.0/zip -O tao-getting-started_5.0.0.zip
unzip -u tao-getting-started_5.0.0.zip -d ./tao-getting-started_5.0.0 && cd ./tao-getting-started_5.0.0
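The version string appears several times in the commands above, so it can be convenient to keep it in one place. The sketch below wraps the same wget and unzip calls in hypothetical helpers. It assumes that other versions follow the same URL pattern as 5.0.0, which this guide does not confirm.

```shell
# Build the NGC download URL for a given samples version.
tao_samples_url() {
  local version="${1:-5.0.0}"
  printf 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/%s/zip\n' "$version"
}

# Download and unpack the samples for a given version.
fetch_tao_samples() {
  local version="${1:-5.0.0}"
  local archive="tao-getting-started_${version}.zip"
  wget --content-disposition "$(tao_samples_url "$version")" -O "$archive" &&
    unzip -u "$archive" -d "./tao-getting-started_${version}"
}
```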
Launch the jupyter notebook using the command below:
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token=<notebook_token>
This starts the Jupyter notebook server in the VM. To access the server, navigate to http://<dns_name>:8888/ and enter the <notebook_token> used to start the notebook server when prompted. Here, dns_name is the public IP address (or DNS name) of the VM that you noted down earlier.
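Because the server listens on all interfaces, a hard-to-guess token is advisable. The following sketch shows one way to generate a random token and print the URL to open. The helper names make_notebook_token and notebook_url are hypothetical, and the token format (32 hex characters) is a choice made for this example.

```shell
# Generate a random token: 16 bytes from /dev/urandom as 32 hex characters.
make_notebook_token() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Print the notebook URL for a host and an optional port (default 8888).
notebook_url() {
  local host="$1"
  local port="${2:-8888}"
  printf 'http://%s:%s/\n' "$host" "$port"
}
```

The token can be stored with token="$(make_notebook_token)" and passed to the launch command as --NotebookApp.token="$token". Calling notebook_url <dns_name> then prints the address to open in a browser.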