Running TAO Toolkit on an Azure VM

Microsoft Azure offers several GPU-optimized virtual machines (VMs) with access to NVIDIA A100, V100, and T4 GPUs.

  1. Azure provides several VMs that are powered by NVIDIA GPUs, including the ND A100 v4, NCv3, and NCasT4_v3 series. We recommend using the NVIDIA-provided GPU-optimized image as the base image for the VM. This base image includes all the lower-level dependencies, which reduces the friction of installing drivers and other prerequisites.

    Get the GPU-optimized image from the Azure Marketplace by clicking the Get it Now button.

    gpu_optimized_image.png

    Under Software plan, select version v21.04.1, which is the latest version at the time of writing and includes the latest NVIDIA drivers and CUDA toolkit. Once you select the version, you are directed to the Azure portal, where you create your VM.

    azure_image_version_selection_window.png

  2. Configure your VM.

    1. In the Azure portal, click Create to start configuring the VM.

      azure_portal.png

      This opens the following page, where you can select your subscription, resource group, region, and hardware configuration.

    2. Provide a name for your VM. Then click the Review + Create button at the end to do a final review.

      Note

      The default disk size is 32 GB. We recommend using a disk larger than 128 GB for this experiment.

      azure_create_vm.png

    3. Make a final review of the offering that you are creating. Then click the Create button to spin up your VM in Azure.

      Note

      Once you create the VM, you start incurring costs, so review the pricing details before you proceed.

      azure_vm_review.png

  3. Log in to your VM: once the VM is created, SSH into it using your username and the domain name or IP address of the VM.

    ssh <username>@<ip_address>
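Once logged in, it can help to confirm that the VM matches the setup described above before continuing. The following is a minimal sketch and not part of the official instructions: `has_min_disk_gb` and `gpu_visible` are hypothetical helper names, and the 128 GB threshold is taken from the disk recommendation in the note above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of TAO Toolkit or Azure tooling.

# Succeeds if the filesystem containing path $1 has a total size of at least $2 GB.
has_min_disk_gb() {
    local path="$1" min_gb="$2" size_kb
    # df -P prints one POSIX-format row per filesystem; -k reports 1024-byte blocks.
    size_kb=$(df -Pk "$path" | awk 'NR==2 {print $2}')
    [ "$(( size_kb / 1024 / 1024 ))" -ge "$min_gb" ]
}

# Succeeds if nvidia-smi is installed and can reach the GPU driver.
gpu_visible() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}

has_min_disk_gb / 128 || echo "Warning: root filesystem is smaller than 128 GB" >&2
gpu_visible || echo "Warning: nvidia-smi did not detect a usable GPU" >&2
```

If either warning appears, it is probably worth revisiting the VM size or disk configuration before installing anything else.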

  1. Configure user permissions in the VM:

    sudo su - root
    usermod -a -G docker azureuser

  2. Install the pre-requisite apt packages:

    apt-get -y install python3-pip unzip

  3. Install the virtualenv wrapper:

    pip3 install virtualenvwrapper

  4. Configure the virtualenv wrapper:

    export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
    source /usr/local/bin/virtualenvwrapper.sh

  5. Create a virtualenv for the launcher using the following command:

    mkvirtualenv -p /usr/bin/python3 launcher

    Note

    You only need to create a virtualenv once in the instance. When you restart the instance, simply rerun the virtualenv wrapper configuration commands from step 4 and invoke the same virtualenv using the command below:

    workon launcher

  6. Install JupyterLab in the virtualenv using the command below:

    pip3 install jupyterlab

  7. Log in to the NGC docker registry named nvcr.io:

    docker login nvcr.io

    The username is $oauthtoken and the password is your NGC API key. You can generate this API key on the NGC website.

  8. Install the TAO Toolkit launcher package:

    pip3 install nvidia-tao

  9. Verify the launcher installation using the tao info --verbose command. The output should resemble the following:

    Configuration of the TAO Toolkit Instance

    task_group:
        model:
            dockers:
                nvidia/tao/tao-toolkit:
                    5.0.0-tf2.9.1:
                        docker_registry: nvcr.io
                        tasks:
                            1. classification_tf2
                            2. efficientdet_tf2
                    5.0.0-tf1.15.5:
                        docker_registry: nvcr.io
                        tasks:
                            1. bpnet
                            2. classification_tf1
                            3. converter
                            4. detectnet_v2
                            5. dssd
                            6. efficientdet_tf1
                            7. faster_rcnn
                            8. fpenet
                            9. lprnet
                            10. mask_rcnn
                            11. multitask_classification
                            12. retinanet
                            13. ssd
                            14. unet
                            15. yolo_v3
                            16. yolo_v4
                            17. yolo_v4_tiny
                    5.0.0-pyt:
                        docker_registry: nvcr.io
                        tasks:
                            1. action_recognition
                            2. classification_pyt
                            3. deformable_detr
                            4. dino
                            5. mal
                            6. ml_recog
                            7. ocdnet
                            8. ocrnet
                            9. optical_inspection
                            10. pointpillars
                            11. pose_classification
                            12. re_identification
                            13. re_identification_transformer
                            14. segformer
        dataset:
            dockers:
                nvidia/tao/tao-toolkit:
                    5.0.0-dataservice:
                        docker_registry: nvcr.io
                        tasks:
                            1. augmentation
                            2. auto_label
                            3. annotations
                            4. analytics
        deploy:
            dockers:
                nvidia/tao/tao-toolkit:
                    5.0.0-deploy:
                        docker_registry: nvcr.io
                        tasks:
                            1. classification_pyt
                            2. classification_tf1
                            3. classification_tf2
                            4. deformable_detr
                            5. detectnet_v2
                            6. dino
                            7. dssd
                            8. efficientdet_tf1
                            9. efficientdet_tf2
                            10. faster_rcnn
                            11. lprnet
                            12. mask_rcnn
                            13. ml_recog
                            14. multitask_classification
                            15. ocdnet
                            16. ocrnet
                            17. optical_inspection
                            18. retinanet
                            19. segformer
                            20. ssd
                            21. unet
                            22. yolo_v3
                            23. yolo_v4
                            24. yolo_v4_tiny
    format_version: 3.0
    toolkit_version: 5.0.0
    published_date: 05/31/2023
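The installation steps above can be checked and partly automated from a script. The sketch below is illustrative only: `missing_tools` and `ngc_docker_login` are hypothetical helper names, and storing the API key in an environment variable called `NGC_API_KEY` is an assumption, not something the steps above require. The `--username` and `--password-stdin` options are standard `docker login` flags.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the NGC_API_KEY variable are assumptions for illustration.

# Print each command name from the arguments that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Log in to nvcr.io without an interactive prompt.
# The username is the literal string $oauthtoken, so it is single-quoted.
ngc_docker_login() {
    if [ -z "${NGC_API_KEY:-}" ]; then
        echo "NGC_API_KEY is not set" >&2
        return 1
    fi
    printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Example: list anything the earlier steps should have installed but did not.
missing_tools docker python3 pip3 unzip jupyter tao
```

Because jupyter and tao are installed inside the virtualenv, run the check after `workon launcher`; an empty result means every listed command was found.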

Now that you have created a virtualenv and installed all the dependencies, you are ready to download and run the TAO samples in a notebook. The instructions below assume that you are running the TAO Computer Vision samples.

  1. Download and unzip the notebooks from NGC using the commands below:

    wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/5.0.0/zip -O tao-getting-started_5.0.0.zip
    unzip -u tao-getting-started_5.0.0.zip -d ./tao-getting-started_5.0.0 && cd ./tao-getting-started_5.0.0

  2. Launch the Jupyter notebook server using the command below:

    jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root --NotebookApp.token=<notebook_token>

    This starts the Jupyter notebook server in the VM. To access this server, navigate to http://<dns_name>:8888/ and, when prompted, enter the <notebook_token> used to start the notebook server. The dns_name here is the public DNS name or public IP address of the VM, which is shown on the Overview page of the VM in the Azure portal.
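Two details of the notebook step can be scripted: choosing a value for `<notebook_token>`, and reaching the server when port 8888 is not open to inbound traffic (whether it is open depends on your network rules, which this guide does not cover). The following is a hedged sketch; `new_notebook_token` and `tunnel_command` are hypothetical helper names, and SSH local port forwarding is offered as one possible alternative, not as the documented access method.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative.

# Print a random 32-character hexadecimal string to use as <notebook_token>.
new_notebook_token() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print the SSH command that forwards a local port to the same port on the VM.
# Arguments: username, VM address, port (defaults to 8888).
tunnel_command() {
    local user="$1" host="$2" port="${3:-8888}"
    printf 'ssh -N -L %s:localhost:%s %s@%s\n' "$port" "$port" "$user" "$host"
}
```

On the VM, the output of `new_notebook_token` can be passed to `--NotebookApp.token`. On your local machine, running the command printed by `tunnel_command <username> <ip_address>` should make the server reachable at http://localhost:8888/ for as long as the tunnel stays open.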

© Copyright 2023, NVIDIA. Last updated on Dec 8, 2023.