NVIDIA GPU-Optimized Virtual Machine Images are available on Microsoft Azure compute instances with NVIDIA A100, T4, and V100 GPUs.

For those familiar with the Azure platform, launching an instance is as simple as logging into Azure, selecting the NVIDIA GPU-optimized image of your choice, configuring settings as needed, and launching the VM. After launching the VM, you can SSH into it and start building AI applications in deep learning, machine learning, and data science using the GPU-accelerated containers, pre-trained models, and resources available from NGC.

This document provides step-by-step instructions for accomplishing this, including how to use the Azure CLI.



Cloud security starts with the security policies of your CSP account. Refer to the following link for how to configure your security policies for your CSP:

Users must follow the security guidelines and best practices of their CSP to secure their VM and account.

These instructions assume the following:

You have an Azure account - https://portal.azure.com, with either permissions to create a Resource Group or with a Resource Group already available to you.

You have browsed the NGC website and identified an available NGC container and tag to run on the Virtual Machine Instance (VMI).

If you plan to use the Azure CLI or Terraform, then the Azure CLI 2.0 must be installed.

Windows Users: The CLI code snippets are for bash on Linux or Mac OS X. If you are using Windows and want to use the snippets as-is, you can use the Windows Subsystem for Linux and use the bash shell (you will be in Ubuntu Linux).

Be sure you are familiar with the information in this chapter before starting to use the NVIDIA GPU Cloud Machine Image on Microsoft Azure.



If you do not already have SSH keys set up specifically for Azure, you will need to set one up and have it on the machine you will use to SSH to the VM. In the examples, the key is named "azure-key".

On Linux or OS X, generate a new key with the following command:

    ssh-keygen -t rsa -b 2048 -f ~/.ssh/azure-key

On Windows, the location will depend on the SSH client you use, so modify the path above in the snippets or in your SSH client configuration.

Alternatively, you can authenticate with a username and password that you set up while creating the VM. However, SSH key authentication is the more secure method.

For more information, refer to Create and manage SSH keys for authentication to a Linux VM in Azure.
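The key setup described above can be wrapped in a small guard so that an existing key is never overwritten. This is a minimal sketch: the function name `ensure_azure_key` is hypothetical, and the default path simply reuses the "azure-key" name from the example.

```shell
# Generate the key pair only if it does not already exist.
# ensure_azure_key is a hypothetical helper, not part of any NVIDIA or Azure tooling.
ensure_azure_key() (
    key_path="${1:-$HOME/.ssh/azure-key}"
    if [ -f "$key_path" ]; then
        echo "Key already exists: $key_path"
    else
        # Same options as the ssh-keygen command above; it prompts for a passphrase.
        ssh-keygen -t rsa -b 2048 -f "$key_path"
    fi
)

# Usage:
#   ensure_azure_key                      # defaults to ~/.ssh/azure-key
#   ensure_azure_key /path/to/other-key
```

The public half of the key pair is written next to the private key with a `.pub` suffix; that is the file you supply to Azure.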

When creating your NVIDIA GPU Cloud VM, Azure sets up a network security group for the VM and you should choose to allow external access to inbound ports 22 (for SSH) and 443 (for HTTPS). You can add inbound rules to the network security group later for other ports as needed, such as port 8888 for DIGITS.



You can also set up a separate network security group ahead of time so that it is available whenever you create a new NVIDIA GPU Cloud VM. Refer to the Microsoft instructions to Create, Change, or Delete a Network Security Group.

Add the following inbound rules to your network security group:

SSH Destination port ranges: 22 Protocol: TCP Name: SSH

HTTPS Destination port ranges: 443 Protocol: TCP Name: HTTPS

Others as needed Example: DIGITS Destination port ranges: 8888 Protocol: TCP Name: DIGITS
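If you prefer the Azure CLI to the portal, the inbound rules listed above could be created along the following lines. This is a sketch under stated assumptions: the function name `create_nvgpu_nsg` and the priority values are illustrative and do not come from this guide.

```shell
# Create a network security group and the inbound rules listed above.
# create_nvgpu_nsg and the priorities (1000, 1010, 1020) are illustrative
# assumptions; any unused priority between 100 and 4096 works.
create_nvgpu_nsg() (
    resource_group="$1"
    nsg_name="$2"

    az network nsg create \
        --resource-group "$resource_group" \
        --name "$nsg_name" || exit 1

    # Each entry is "rule-name:port:priority".
    for rule in SSH:22:1000 HTTPS:443:1010 DIGITS:8888:1020; do
        name=${rule%%:*}
        rest=${rule#*:}
        port=${rest%%:*}
        priority=${rest#*:}
        az network nsg rule create \
            --resource-group "$resource_group" \
            --nsg-name "$nsg_name" \
            --name "$name" \
            --priority "$priority" \
            --direction Inbound \
            --access Allow \
            --protocol Tcp \
            --destination-port-ranges "$port" || exit 1
    done
)

# Usage: create_nvgpu_nsg ACME_RG my-nvgpu-nsg
```

As written, the rules accept traffic from any source address. Consider adding `--source-address-prefixes` to each rule to restrict access to known networks.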



Security Warning

It is important to use proper precautions and security safeguards prior to granting access to, or sharing, your VMI over the internet. By default, internet connectivity to the VM instance is blocked. You are solely responsible for enabling and securing access to your VM. Please refer to Azure guides for managing security groups.

1. Log into the Azure portal (https://portal.azure.com).

2. Select Create a resource from the Azure services menu.

3. On the New pane, search for "nvidia", and then select the NVIDIA Virtual Machine Image you would like to use. You can choose between the NVIDIA AI Enterprise image and the various NVIDIA GPU-Optimized images. The next steps differ slightly depending on your selection.

   Selecting the pay-as-you-go NVIDIA AI Enterprise image:

   a. At the listing page, click Get It Now.
   b. At the Azure app creation dialog, select your plan based on the number of GPUs you want to launch and click Continue.
   c. At the VMI deployment page, verify your plan selection and pricing, then click Create.
   d. From the Create a virtual machine page, the rest of the process is similar for all options. You can find detailed instructions for this process in Step 5, provided below.

   Selecting the free GPU-Optimized image:

   a. At the listing page, click Get It Now.
   b. At the Azure app creation dialog, review the information and click Continue.
   c. At the VMI deployment page, select your desired release version from the software plan menu and click Create.
   d. From the Create a virtual machine page, the rest of the process is similar for all options. You can find detailed instructions for this process in Step 5, provided below.

4. Select the latest release version (or version of your choice if required) from the software plan menu and then click Create.

5. Complete the settings under the Basics tab as follows:

   - Subscription and Resource Group: Choose relevant options to your subscription
   - Virtual Machine Name: Name of choice
   - Region: Select a region with instance types featuring the latest NVIDIA GPUs (NC-v3 Series). In this example we use the (US) East US region. A list of available instance types by region can be found here
   - Authentication Choice: SSH, with username of choice
   - SSH public key: Paste in your SSH public key that you previously generated

6. Click Next to select a Premium SSD and add data disks.

7. In the Networking section, select the Network Security Group you created earlier under the Configure network security group option.

8. Make other Settings selections as needed, then click OK. After the validation passes, the portal presents the details of your new image, which you can download as a template to automate deployment later.

9. Click Deploy to deploy the image. The deployment starts, as indicated by the traveling bar underneath the Alert icon. It may take a few minutes to complete.

1. Open the VM instance that you created.

   a. Navigate to the Azure portal home page and click on Virtual Machines under the Azure services menu.
   b. Select the VM you created and want to connect to.

2. Click Connect from the action bar at the top and then select SSH.

If the instructions to log in via SSH do not work, refer to the Troubleshooting SSH connections to an Azure Linux VM that fails, errors out, or is refused documentation for further troubleshooting.

1. Open the VM instance you created.

   a. Navigate to the Azure portal home page and click on Virtual Machines under the Azure services menu.
   b. Select the VM you created and want to manage.

2. Click Start or Stop from the action bar at the top.

When you created your VM, other resources for that instance were automatically created for you, such as a network interface, public IP address, and boot disk. If you delete your VM, you will also need to delete these resources.



1. Open the VM instance you created.

   a. Navigate to the Azure portal home page and click on Virtual Machines under the Azure services menu.
   b. Select the VM you created and want to delete.

2. Click Delete from the action bar at the top and confirm your choice by typing ‘yes’ on the pane to the left that pops up.

If you plan to use Azure CLI, then the CLI must be installed.

Some of the CLI snippets in these instructions make use of jq, which should be installed on the machine from which you'll run the CLI. You may paste these snippets into your own bash scripts or type them at the command line.
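Before running the snippets, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name.

```shell
# Print the name of every required tool that is not found on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() (
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
)

# Usage: prints nothing when both tools are installed.
#   missing_tools az jq
```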



Use the following table as a guide for determining the values you will need for creating your GPU Cloud VM. The variable names are arbitrary and used in the instructions that follow.

| VARIABLE | DESCRIPTION | EXAMPLE |
|---|---|---|
| AZ_VM_NAME | Name for your GPU Cloud VM | my-nvgpu-vmi |
| AZ_RESOURCE_GROUP | Your resource group | ACME_RG |
| AZ_IMAGE | The NVIDIA GPU-Optimized Image. See the release notes NVIDIA Virtual Machine Images on Azure for the latest release. | NVIDIA-GPU-Cloud-Image |
| AZ_LOCATION | A zone that contains GPUs. Refer to https://azure.microsoft.com/en-us/global-infrastructure/services/ to see available locations for NCv2 and NCv3 series SKUs. | westus2 |
| AZ_SIZE | The SKU specified by the number of vCPUs, RAM, and GPUs. Refer to https://docs.microsoft.com/en-us/azure/virtual-machines/linux/sizes-gpu for the list of P40, P100, and V100 SKUs to choose from. | NC6s_v2 |
| AZ_SSH_KEY | <path>/<public-azure-key.pub> | ~/.ssh/azure-key.pub |
| AZ_USER | Your username | jsmith |
| AZ_NSG | Your network security group | my-nvgpu-nsg |
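As a sketch, the variables could be defined in your shell as follows. The values are the example values from the table above, not working defaults; in particular, take the actual image identifier for AZ_IMAGE from the release notes.

```shell
# Example values copied from the table above; replace each with your own.
export AZ_VM_NAME="my-nvgpu-vmi"
export AZ_RESOURCE_GROUP="ACME_RG"
export AZ_IMAGE="NVIDIA-GPU-Cloud-Image"   # placeholder; see the release notes
export AZ_LOCATION="westus2"
export AZ_SIZE="NC6s_v2"
export AZ_SSH_KEY="$HOME/.ssh/azure-key.pub"
export AZ_USER="jsmith"
export AZ_NSG="my-nvgpu-nsg"
```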

Be sure you have installed Azure CLI and that you are ready with the VM setup information listed in the section Set Up Environment Variables. You can then either manually replace the variable names in the commands in this section with the actual values or define the variables ahead of time.
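If you define the variables ahead of time, a quick check that none were left empty can save a failed `az vm create` call. This is a minimal sketch; `unset_vars` is a hypothetical helper name.

```shell
# Print the name of every listed variable that is unset or empty.
# unset_vars is a hypothetical helper name.
unset_vars() (
    for var_name in "$@"; do
        # Look up the variable whose name is stored in var_name.
        eval "value=\${$var_name:-}"
        [ -n "$value" ] || echo "$var_name"
    done
)

# Usage: prints nothing when every variable has a value.
#   unset_vars AZ_VM_NAME AZ_RESOURCE_GROUP AZ_IMAGE AZ_LOCATION \
#              AZ_SIZE AZ_SSH_KEY AZ_USER AZ_NSG
```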



1. Log in to the Azure CLI.

    az login

2. Enter the following:

    az vm create \
      --name ${AZ_VM_NAME} \
      --resource-group ${AZ_RESOURCE_GROUP} \
      --image ${AZ_IMAGE} \
      --location ${AZ_LOCATION} \
      --size ${AZ_SIZE} \
      --ssh-key-value ${AZ_SSH_KEY} \
      --admin-username ${AZ_USER} \
      --nsg ${AZ_NSG}

   If successful, you should see output consisting of a JSON description of your VM. The GPU Cloud VM gets deployed. Note the public IP address for use when establishing an SSH connection to the VM.

   You can also set up an AZ_PUBLIC_IP variable by capturing the JSON output of the command in a variable as follows:

    AZ_JSON=$(az vm create \
      --name ${AZ_VM_NAME} \
      --resource-group ${AZ_RESOURCE_GROUP} \
      --image ${AZ_IMAGE} \
      --location ${AZ_LOCATION} \
      --size ${AZ_SIZE} \
      --ssh-key-value ${AZ_SSH_KEY} \
      --admin-username ${AZ_USER} \
      --nsg ${AZ_NSG})
    AZ_PUBLIC_IP=$(echo $AZ_JSON | jq .publicIpAddress | sed 's/\"//g') && \
    echo $AZ_JSON && echo AZ_PUBLIC_IP=$AZ_PUBLIC_IP
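As an alternative to piping through jq and sed, the Azure CLI's global `--query` and `--output tsv` options can extract the address directly. The following is a sketch, not the method this guide prescribes; `create_vm_get_ip` is a hypothetical helper name, and it assumes the AZ_* variables described earlier are already set.

```shell
# Create the VM and print only its public IP address.
# create_vm_get_ip is a hypothetical helper; it expects the AZ_* variables
# (AZ_VM_NAME, AZ_RESOURCE_GROUP, and so on) to be set beforehand.
create_vm_get_ip() {
    az vm create \
        --name "$AZ_VM_NAME" \
        --resource-group "$AZ_RESOURCE_GROUP" \
        --image "$AZ_IMAGE" \
        --location "$AZ_LOCATION" \
        --size "$AZ_SIZE" \
        --ssh-key-value "$AZ_SSH_KEY" \
        --admin-username "$AZ_USER" \
        --nsg "$AZ_NSG" \
        --query publicIpAddress \
        --output tsv
}

# Usage:
#   AZ_PUBLIC_IP=$(create_vm_get_ip) && echo "AZ_PUBLIC_IP=$AZ_PUBLIC_IP"
```

The trade-off is that the full JSON description of the VM is no longer printed, so use the original form if you want to keep that output.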

Azure sets up a non-persistent scratch disk for each VM. See the section Persistent Data Storage for Azure Virtual Machines for instructions on setting up alternate storage for your datasets.

Using a CLI on Mac or Linux (Windows users: use OpenSSH on Windows PowerShell or use the Windows Subsystem for Linux), run ssh to connect to your GPU VM instance.

    ssh -i $AZ_SSH_KEY $AZ_USER@$AZ_PUBLIC_IP

VMs can be stopped and started again without losing any of their storage and other resources.

To stop and deallocate a running VM:

    az vm deallocate --resource-group $AZ_RESOURCE_GROUP --name $AZ_VM_NAME

To start a stopped VM:

    az vm start --resource-group $AZ_RESOURCE_GROUP --name $AZ_VM_NAME

When starting a stopped VM, you will need to update the public IP variable, as it will change with the newly started VM.

    AZ_PUBLIC_IP=$(az network public-ip show \
      --resource-group $AZ_RESOURCE_GROUP \
      --name ${AZ_VM_NAME}PublicIP | jq .ipAddress | sed 's/\"//g') && \
    echo AZ_PUBLIC_IP=$AZ_PUBLIC_IP
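The command above assumes the public IP resource is named after the VM with a `PublicIP` suffix. If your resource is named differently, one alternative is to query the VM itself. This is a sketch; `vm_public_ip` is a hypothetical helper name.

```shell
# Print the current public IP address of a VM by querying the VM itself,
# so the name of the public IP resource does not need to be known.
# vm_public_ip is a hypothetical helper name.
vm_public_ip() {
    az vm show \
        --show-details \
        --resource-group "$1" \
        --name "$2" \
        --query publicIps \
        --output tsv
}

# Usage:
#   AZ_PUBLIC_IP=$(vm_public_ip "$AZ_RESOURCE_GROUP" "$AZ_VM_NAME")
```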

When you created your VM, other resources for that instance were automatically created for you, such as a network interface, public IP address, and boot disk. If you delete your instance, you will also need to delete these resources.

Perform the deletions in the following order.



1. Delete your VM.

    az vm delete -g $AZ_RESOURCE_GROUP -n $AZ_VM_NAME

2. Delete the VM OS disk.

   a. List the disks in your Resource Group.

    az disk list -g $AZ_RESOURCE_GROUP

      The associated OS disk will have the name of your VM as the base name.

   b. Delete the OS disk.

    az disk delete -g $AZ_RESOURCE_GROUP -n MyDisk

      See https://docs.microsoft.com/en-us/cli/azure/disk?view=azure-cli-latest#az-disk-delete for more information.

3. Delete the VM network interface.

   a. List the network interface resources in your Resource Group.

    az network nic list -g $AZ_RESOURCE_GROUP

      The associated network interface will have the name of your VM as the base name.

   b. Delete the network interface resource.

    az network nic delete -g $AZ_RESOURCE_GROUP -n MyNic

      See https://docs.microsoft.com/en-us/cli/azure/network/nic?view=azure-cli-latest#az-network-nic-delete for more information.

4. Delete the VM public IP address.

   a. List the public IPs in your Resource Group.

    az network public-ip list -g $AZ_RESOURCE_GROUP

      The associated public IP will have the name of your VM as the base name.

   b. Delete the public IP.

    az network public-ip delete -g $AZ_RESOURCE_GROUP -n MyIp

      See https://docs.microsoft.com/en-us/cli/azure/network/public-ip?view=azure-cli-latest#az-network-public-ip-delete for more information.
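Once you have looked up the resource names with the list commands, the deletion order above could be scripted as follows. This is a sketch: `delete_vm_resources` is a hypothetical helper name, and the `--yes` flag skips the confirmation prompt, so double-check the names before running it.

```shell
# Delete a VM and its associated resources in the order given above:
# VM, OS disk, network interface, public IP. Stops at the first failure.
# delete_vm_resources is a hypothetical helper name.
delete_vm_resources() (
    resource_group="$1"
    vm_name="$2"
    disk_name="$3"
    nic_name="$4"
    ip_name="$5"

    # --yes suppresses the interactive confirmation; deletions cannot be undone.
    az vm delete --resource-group "$resource_group" --name "$vm_name" --yes || exit 1
    az disk delete --resource-group "$resource_group" --name "$disk_name" --yes || exit 1
    az network nic delete --resource-group "$resource_group" --name "$nic_name" || exit 1
    az network public-ip delete --resource-group "$resource_group" --name "$ip_name" || exit 1
)

# Usage (MyDisk, MyNic, and MyIp stand for the names found with the list commands):
#   delete_vm_resources "$AZ_RESOURCE_GROUP" "$AZ_VM_NAME" MyDisk MyNic MyIp
```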

You can create Premium Storage SSD from the Azure dashboard. Premium Storage SSDs are ideal for persistent storage of a large number of datasets and offer better performance.



1. Open the VM instance that you created.

   a. Navigate to the Azure portal home page and click on Virtual Machines under the Azure services menu.
   b. Select the VM that you created and want to manage.

2. Select Disks under the Settings category in the control panel on the left.

3. Click Add Disk, then click the Name field and select Create Disk from the drop-down menu.

4. On the Create Managed Disk pane:

   a. Enter a disk name.
   b. Select a resource group.
   c. Select Premium SSD for Account type.
   d. Enter a disk size.
   e. Click Create.

5. When the validation is completed, click Save.

To create a new data disk and attach it to your VM, include the following option in the az vm create command.

    --data-disk-sizes-gb <data-disk-size>

To attach an existing data disk to your VM when creating it, include the following option in the az vm create command.

    --attach-data-disks <data-disk-name>
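Both options above apply when the VM is created. For a VM that already exists, a new Premium SSD data disk could be created and attached with `az vm disk attach`. This is a sketch that goes beyond what this guide covers; `attach_new_data_disk` is a hypothetical helper name.

```shell
# Create a new Premium SSD data disk and attach it to an existing VM.
# attach_new_data_disk is a hypothetical helper name.
# Arguments: resource group, VM name, disk name, disk size in GB.
attach_new_data_disk() {
    az vm disk attach \
        --resource-group "$1" \
        --vm-name "$2" \
        --name "$3" \
        --size-gb "$4" \
        --sku Premium_LRS \
        --new
}

# Usage (the disk name and size are illustrative):
#   attach_new_data_disk "$AZ_RESOURCE_GROUP" "$AZ_VM_NAME" my-data-disk 2048
```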

1. Once the data disk is created, establish an SSH connection to your VM.

2. Create a filesystem on the data disk. You can view the volume by running the lsblk command.

    ~# lsblk
    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sdb       8:16   0  1.5T  0 disk
    └─sdb1    8:17   0  1.4T  0 part /mnt
    sr0      11:0    1  628K  0 rom
    sdc       8:32   0    2T  0 disk
    └─sdc1    8:33   0    2T  0 part
    sda       8:0    0  240G  0 disk
    └─sda1    8:1    0  240G  0 part /
    ~# mkfs.ext4 /dev/sdc1

3. Mount the volume to a mount directory. The directory (/data in this example) must already exist.

    ~# mount /dev/sdc1 /data

4. To mount the volume automatically every time the VM is stopped and restarted, add an entry to /etc/fstab. When adding an entry to /etc/fstab, use a UUID-based device path (see device-names-problem for details). For example:

    UUID=33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e /data ext4 defaults,nofail 1 2
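The /etc/fstab entry can be generated from the filesystem's actual UUID rather than typed by hand. The following is a minimal sketch; `fstab_entry` is a hypothetical helper name, and the mount options mirror the example entry above.

```shell
# Build an /etc/fstab line for an ext4 data disk from its filesystem UUID.
# fstab_entry is a hypothetical helper name.
# Arguments: filesystem UUID, mount point.
fstab_entry() {
    printf 'UUID=%s %s ext4 defaults,nofail 1 2\n' "$1" "$2"
}

# Usage (as root, after creating the filesystem and the /data directory):
#   uuid=$(blkid -s UUID -o value /dev/sdc1)
#   fstab_entry "$uuid" /data >> /etc/fstab
#   mount -a    # confirms the new entry mounts without errors
```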

You can delete a Data Disk only if it is not attached to a VM. Be aware that once you delete a Data Disk, you cannot undo the action.

