Launching an NVIDIA GPU Cloud VM Using Azure CLI

This section explains how to create a GPU Cloud VM instance using the Azure CLI. For complete CLI documentation and sample scripts, visit the Azure CLI 2.0 Documentation.

Using Example Python Scripts

A comprehensive set of example Python scripts for automating the CLI is provided at https://github.com/nvidia/ngc-examples/tree/master/ncsp. You can download the scripts and modify them to meet your requirements. The code examples that follow use environment variables and a structure similar to those in the scripts.

Using the Instructions in this Chapter

This flow and the code snippets in this section are for Linux or Mac OS X. If you are using Windows, you can use the Windows Subsystem for Linux and run the commands in the bash shell (an Ubuntu Linux environment). Many of these CLI commands can take a significant amount of time to complete.

Installing Azure CLI

Follow the instructions at https://docs.microsoft.com/en-us/cli/azure/install-azure-cli. These include instructions for Linux, Mac, and Windows.
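Once the installation finishes, a quick check can confirm that the tools used later in this section are on your PATH. The following is a minimal sketch for a POSIX-compatible shell. The helper name require_cmd is hypothetical, and jq is checked only because later commands in this section pipe JSON output through it.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool that the later commands rely on.
for tool in az jq; do
  if require_cmd "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found, install it before continuing"
  fi
done
```

If az is found, running az --version prints the installed version.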


Preparing Your VM Variables

Use the following table as a guide for determining the values you will need for creating your GPU Cloud VM. The variable names are arbitrary, and used in the instructions that follow.

Note: Linux host name cannot exceed 64 characters in length or contain the following characters: ` ~ ! @ # $ % ^ & * ( ) = + _ [ ] { } \ | ; : ' " , < > / ?.
VARIABLE | DESCRIPTION | EXAMPLE
AZ_VM_NAME | Name for your GPU Cloud VM | my-nvgpu-vmi
AZ_RESOURCE_GROUP | Your resource group | ACME_RG
AZ_IMAGE | The GPU Cloud VMI. See the release notes https://docs.nvidia.com/ngc/ngc-azure-vmi-release-notes for the latest release. | NVIDIA-GPU-Cloud-Image
AZ_LOCATION | A zone that contains GPUs. Refer to https://azure.microsoft.com/en-us/global-infrastructure/services/ to see available locations for NCv2 and NCv3 series SKUs. | westus2
AZ_SIZE | The SKU specified by the number of vCPUs, RAM, and GPUs. Refer to https://docs.microsoft.com/en-us/azure/virtual-machines/linux/sizes-gpu for the list of P40, P100, and V100 SKUs to choose from. | NC6s_v2
AZ_SSH_KEY | <path>/<public-azure-key.pub> | ~/.ssh/azure-key.pub
AZ_USER | Your username | jsmith
AZ_NSG | Your network security group | my-nvgpu-nsg
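One way to define these variables ahead of time is to assign them in your shell session. The sketch below uses the example values from the table and adds a hypothetical helper, check_vm_name, that applies the host name note. As a simplifying assumption, the helper accepts only letters, digits, hyphens, and periods. This is stricter than the note requires, but it rejects every character the note lists.

```shell
# Example values taken from the table; replace them with your own.
AZ_VM_NAME="my-nvgpu-vmi"
AZ_RESOURCE_GROUP="ACME_RG"
AZ_IMAGE="NVIDIA-GPU-Cloud-Image"
AZ_LOCATION="westus2"
AZ_SIZE="NC6s_v2"
AZ_SSH_KEY="$HOME/.ssh/azure-key.pub"
AZ_USER="jsmith"
AZ_NSG="my-nvgpu-nsg"

# Hypothetical helper: conservative check of a Linux host name.
# Assumption: only letters, digits, hyphens, and periods are accepted,
# which is stricter than the note but rejects every listed character.
check_vm_name() {
  if [ -z "$1" ] || [ "${#1}" -gt 64 ]; then
    echo "name must be 1 to 64 characters long" >&2
    return 1
  fi
  case "$1" in
    *[!A-Za-z0-9.-]*)
      echo "name contains a character other than letters, digits, '.', '-'" >&2
      return 1
      ;;
  esac
  return 0
}

check_vm_name "$AZ_VM_NAME" && echo "VM name OK: $AZ_VM_NAME"
```

A name such as my_vm fails this check because the underscore is one of the characters the note excludes.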

Creating Your GPU Cloud VM

Be sure you have installed Azure CLI and that you are ready with the VM setup information listed in the section Preparing Your VM Variables. You can then either manually replace the variable names in the commands in this section with the actual values, or define the variables ahead of time.
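If you define the variables ahead of time, a short pre-flight check can catch a missing value before you run az vm create. The following sketch assumes the variable names from the table. The helper name missing_vm_vars is hypothetical.

```shell
# Hypothetical helper: print the name of each required variable
# that is unset or empty, one per line.
missing_vm_vars() {
  local name value
  for name in AZ_VM_NAME AZ_RESOURCE_GROUP AZ_IMAGE AZ_LOCATION \
              AZ_SIZE AZ_SSH_KEY AZ_USER AZ_NSG; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

missing=$(missing_vm_vars)
if [ -n "$missing" ]; then
  echo "Set these variables before creating the VM:" $missing
else
  echo "All VM variables are set."
fi
```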

  1. Log in to the Azure CLI.
    az login
  2. Enter the following:
    az vm create \
     --name ${AZ_VM_NAME} \
     --resource-group ${AZ_RESOURCE_GROUP} \
     --image ${AZ_IMAGE} \
     --location ${AZ_LOCATION} \
     --size ${AZ_SIZE} \
     --ssh-key-value ${AZ_SSH_KEY} \
     --admin-username ${AZ_USER} \
     --nsg ${AZ_NSG}
    
    If successful, the GPU Cloud VM is deployed and you should see output consisting of a JSON description of your VM. Note the public IP address for use when establishing an SSH connection to the VM. Alternatively, you can capture the JSON output in a variable and extract the address into an AZ_PUBLIC_IP variable as follows:
    AZ_JSON=$(az vm create \
     --name ${AZ_VM_NAME} \
     --resource-group ${AZ_RESOURCE_GROUP} \
     --image ${AZ_IMAGE} \
     --location ${AZ_LOCATION} \
     --size ${AZ_SIZE} \
     --ssh-key-value ${AZ_SSH_KEY} \
     --admin-username ${AZ_USER} \
     --nsg ${AZ_NSG})
    AZ_PUBLIC_IP=$(echo "$AZ_JSON" | jq .publicIpAddress | sed 's/"//g') && \
     echo "$AZ_JSON" && echo AZ_PUBLIC_IP=$AZ_PUBLIC_IP
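As an alternative to post-processing the JSON with jq and sed, the Azure CLI's global --query and --output options can return the address directly. The sketch below wraps the same az vm create call in a hypothetical function, create_vm_get_ip. It assumes the AZ_* variables are already defined. Because of --output tsv, the full JSON description is not printed.

```shell
# Hypothetical helper: create the VM and print only its public IP address.
# Assumes the AZ_* variables from the table are already set.
create_vm_get_ip() {
  az vm create \
    --name "${AZ_VM_NAME}" \
    --resource-group "${AZ_RESOURCE_GROUP}" \
    --image "${AZ_IMAGE}" \
    --location "${AZ_LOCATION}" \
    --size "${AZ_SIZE}" \
    --ssh-key-value "${AZ_SSH_KEY}" \
    --admin-username "${AZ_USER}" \
    --nsg "${AZ_NSG}" \
    --query publicIpAddress \
    --output tsv
}
```

To capture the address, run AZ_PUBLIC_IP=$(create_vm_get_ip) after logging in with az login.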
Azure sets up a non-persistent scratch disk for each VM. See the sections Using Premium Storage SSD for Datasets and Using File Storage for Datasets for instructions on setting up alternate storage for your datasets.

Connecting to Your GPU Instance with SSH

Run ssh to connect to your GPU VM instance.

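ssh -i ${AZ_SSH_KEY%.pub} $AZ_USER@$AZ_PUBLIC_IP

The -i option takes the private key file. This command assumes the private key is stored next to the public key under the same name without the .pub extension.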
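A newly created or restarted VM may not accept SSH connections right away. The sketch below retries the connection a fixed number of times. The helper name wait_for_ssh, the default of 10 attempts, and the 15-second delay are all assumptions chosen for illustration. The key argument is the path to your private key.

```shell
# Hypothetical helper: retry an SSH connection until it succeeds.
# Usage: wait_for_ssh USER HOST PRIVATE_KEY [ATTEMPTS] [DELAY_SECONDS]
wait_for_ssh() {
  local user="$1" host="$2" key="$3"
  local attempts="${4:-10}" delay="${5:-15}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # "true" is a no-op remote command; ConnectTimeout bounds each attempt.
    if ssh -i "$key" -o ConnectTimeout=10 "$user@$host" true; then
      echo "SSH is reachable after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "SSH not reachable after $attempts attempts" >&2
  return 1
}
```

On the first successful connection, ssh may still ask you to confirm the host key of the new VM.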

Stopping (Deallocating) and Starting VMs with the CLI

VMs can be stopped and started again without losing any of their storage and other resources.

To stop and deallocate a running VM:

az vm deallocate --resource-group $AZ_RESOURCE_GROUP --name $AZ_VM_NAME

To start a stopped VM:

az vm start --resource-group $AZ_RESOURCE_GROUP --name $AZ_VM_NAME

After starting a stopped VM, you will need to update the AZ_PUBLIC_IP variable, because the public IP address changes when the VM is started again.

AZ_PUBLIC_IP=$(az network public-ip show \
  --resource-group $AZ_RESOURCE_GROUP \
  --name ${AZ_VM_NAME}PublicIP | jq .ipAddress | sed 's/"//g') && \
  echo AZ_PUBLIC_IP=$AZ_PUBLIC_IP
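The start and refresh steps can be combined so the variable is never left stale. The following sketch uses a hypothetical function, start_vm_refresh_ip. It assumes the public IP resource has the default name of the VM name followed by PublicIP, as in the command above. It reads the address with the CLI's --query and --output tsv options instead of jq and sed.

```shell
# Hypothetical helper: start the VM, then refresh AZ_PUBLIC_IP.
# Assumes the public IP resource is named <VM name>PublicIP.
start_vm_refresh_ip() {
  az vm start --resource-group "$AZ_RESOURCE_GROUP" --name "$AZ_VM_NAME" \
    || return 1
  AZ_PUBLIC_IP=$(az network public-ip show \
    --resource-group "$AZ_RESOURCE_GROUP" \
    --name "${AZ_VM_NAME}PublicIP" \
    --query ipAddress --output tsv) || return 1
  echo "AZ_PUBLIC_IP=$AZ_PUBLIC_IP"
}
```

Call the function in your current shell (not in a subshell) so that the updated AZ_PUBLIC_IP remains available to later ssh commands.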

Deleting VMs and Associated Resources with the CLI

When you created your VM, other resources for that instance were automatically created for you, such as a network interface, public IP address, and boot disk. If you delete your instance, you will also need to delete these resources.

Perform the deletions in the following order.

  1. Delete your VM.
    az vm delete -g $AZ_RESOURCE_GROUP -n $AZ_VM_NAME
  2. Delete the VM OS disk.
    1. List the disks in your Resource Group.
      az disk list -g $AZ_RESOURCE_GROUP
      The associated OS disk will have the name of your VM as the base name.
    2. Delete the OS disk, replacing MyDisk with the disk name from the previous step.
      az disk delete -g $AZ_RESOURCE_GROUP -n MyDisk
    See https://docs.microsoft.com/en-us/cli/azure/disk?view=azure-cli-latest#az-disk-delete for more information.
  3. Delete the VM network interface.
    1. List the network interface resources in your Resource Group.
      az network nic list -g $AZ_RESOURCE_GROUP
      The associated network interface will have the name of your VM as the base name.
    2. Delete the network interface resource, replacing MyNic with the network interface name from the previous step.
      az network nic delete -g $AZ_RESOURCE_GROUP -n MyNic
    See https://docs.microsoft.com/en-us/cli/azure/network/nic?view=azure-cli-latest#az-network-nic-delete for more information.
  4. Delete the VM public IP address.
    1. List the public IPs in your Resource Group.
      az network public-ip list -g $AZ_RESOURCE_GROUP
      The associated public IP will have the name of your VM as the base name.
    2. Delete the public IP, replacing MyIp with the public IP name from the previous step.
      az network public-ip delete -g $AZ_RESOURCE_GROUP -n MyIp
    See https://docs.microsoft.com/en-us/cli/azure/network/public-ip?view=azure-cli-latest#az-network-public-ip-delete for more information.
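The deletion sequence above can be sketched as a single function. The name delete_vm_and_resources is hypothetical. The sketch assumes that every resource to remove has a name beginning with the VM name, and it selects those resources with a JMESPath starts_with filter passed to --query. Because the filter matches any resource in the group with that prefix, review the output of the list commands before relying on it.

```shell
# Hypothetical helper: delete a VM, then the resources named after it.
# Usage: delete_vm_and_resources RESOURCE_GROUP VM_NAME
delete_vm_and_resources() {
  local rg="$1" vm="$2" name
  # JMESPath filter: keep only names that begin with the VM name.
  local query="[?starts_with(name, '$vm')].name"

  # 1. The VM itself (--yes skips the confirmation prompt).
  az vm delete -g "$rg" -n "$vm" --yes || return 1

  # 2. The OS disk.
  for name in $(az disk list -g "$rg" --query "$query" -o tsv); do
    az disk delete -g "$rg" -n "$name" --yes
  done

  # 3. The network interface.
  for name in $(az network nic list -g "$rg" --query "$query" -o tsv); do
    az network nic delete -g "$rg" -n "$name"
  done

  # 4. The public IP address.
  for name in $(az network public-ip list -g "$rg" --query "$query" -o tsv); do
    az network public-ip delete -g "$rg" -n "$name"
  done
}
```

For example, delete_vm_and_resources "$AZ_RESOURCE_GROUP" "$AZ_VM_NAME" runs the four steps in the order listed above and stops early if the VM deletion fails.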