Launching an NVIDIA GPU Cloud Image with the gcloud CLI

This section explains how to create a GPU Cloud instance using the gcloud CLI.

Using Example Python Scripts

A comprehensive set of example Python scripts for automating the CLI is provided at https://github.com/nvidia/ngc-examples/tree/master/ncsp. You can download the scripts and modify them to meet your requirements. The code examples that follow use environment variables and a structure similar to those in the scripts.

Using the Instructions in this Chapter

This flow and the code snippets in this section are for Linux or Mac OS X. If you are using Windows, you can use the Windows Subsystem for Linux and run the commands from its bash shell (an Ubuntu Linux environment). Many of these CLI commands can take a significant amount of time to complete.

For more information about creating a deployment using gcloud CLI, see Creating a Deployment using gcloud or the API.

Installing and Setting Up gcloud CLI

Follow the instructions at https://cloud.google.com/sdk/docs/quickstarts. These include instructions for Linux, Mac, and Windows.

The instructions walk you through the platform-specific installation and the initial gcloud login.

On the Mac at least, the installer offers a long list of additional gcloud components to install, such as extensions for Go, Python, and Java. You can accept the defaults for now and use the gcloud components command later to list, install, or remove components.

Once the setup is complete, start a new shell so that the updated environment takes effect.
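As a quick check that the new shell picked up the installation, something like the following sketch can be run. The function name gcloud_status is hypothetical and used only for illustration; the gcloud subcommands shown are standard ones.

```shell
# Report whether the gcloud CLI is reachable from the current shell.
# gcloud_status is a hypothetical helper name used only in this sketch.
gcloud_status() {
  if command -v gcloud >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

if [ "$(gcloud_status)" = "found" ]; then
  gcloud --version        # installed SDK and component versions
  gcloud config list      # active account, project, and other defaults
else
  echo "gcloud is not on PATH; start a new shell or repeat the installation" >&2
fi
```

Once the CLI is found, gcloud components list shows which optional components are installed.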

Preparing the Create Instance Options

You will need to specify the following options when creating the custom GPU instance.

OPTION [1] / VALUE / NOTES

<instance-name>
    Value: Name of your choosing. Ex. "my-ngc-instance"
    Notes: Must be all lowercase, with no spaces. Hyphens and numbers are allowed.

--project
    Value: "<my-project-id>"
    Notes: This is the project in which the VM will be created. Use gcloud projects list to view the PROJECT ID to use for this field.

--zone
    Value: One of the following zones that contain GPUs:
        "us-west1-b"
        "us-east1-c"
        "us-east1-d"
        "europe-west1-b"
        "europe-west1-d"
        "asia-east1-a"
        "asia-east1-b"
    Notes: Pick the one nearest you that has the GPUs you want to use.

--machine-type
    Value: One of the following:
        "custom-10-61440" (for 1x P100 or V100)
        "custom-20-122880" (for 2x P100)
        "custom-40-212992" (for 4x P100)
        "custom-80-491520" (for 8x V100)
    Notes: vCPU/memory configuration of the VM in "custom-<#vCPUs>-<memory MB>" format. Recommended ratio is 1 GPU : 10 vCPUs : 60 GB memory.

--subnet
    Value: "default", or the name of the VPC network to use

--metadata
    Value: "ssh-keys=<user-id>:ssh-rsa <ssh-key> <user-email>"

--maintenance-policy
    Value: "TERMINATE"
    Notes: What to do with your instance when Google performs maintenance on the host.

--service-account
    Value: Compute Engine identity attached to the instance.
    Notes: Use gcloud iam service-accounts list to view the email for your account.

--scopes
    Value:
        "https://www.googleapis.com/auth/devstorage.read_only",
        "https://www.googleapis.com/auth/logging.write",
        "https://www.googleapis.com/auth/monitoring.write",
        "https://www.googleapis.com/auth/servicecontrol",
        "https://www.googleapis.com/auth/service.management.readonly",
        "https://www.googleapis.com/auth/trace.append"
    Notes: Default values (recommended). Specifies the permissions for your instance.

--accelerator
    Value: type=nvidia-tesla-p100,count=[1,2,4]
    Notes: Which GPU to attach, and how many.

--min-cpu-platform
    Value: "Intel Broadwell" (for P100 instances)

--image
    Value: Name of the latest NVIDIA GPU Cloud Image (see the NGC GCP VMI Release Notes for the current name)

--image-project
    Value: "nvidia-ngc-public"
    Notes: Project name in which the NVIDIA GPU Cloud Image is located.

--boot-disk-size
    Value: 32

--boot-disk-type
    Value: "pd-standard"

--boot-disk-device-name
    Value: Name of your choosing
    Notes: Recommend using the same name as your VM instance for easy correlation.
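The options above can be collected in shell variables before building the command. The following is a minimal sketch: the variable and function names are hypothetical, the name check implements only the rule stated above (Compute Engine may apply stricter rules), and the machine types are looked up from the listed values rather than computed, because the 4x P100 entry is smaller than the 60 GB per GPU ratio would give.

```shell
# Hypothetical variable names; the values come from the options described above.
INSTANCE_NAME="my-ngc-instance"
ZONE="us-west1-b"
GPU_COUNT=1

# Check the stated naming rule: lowercase letters, digits, and hyphens only.
valid_instance_name() {
  case "$1" in
    ''|*[![:lower:][:digit:]-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Map a GPU count to the machine type listed for it.
machine_type_for_gpus() {
  case "$1" in
    1) echo "custom-10-61440" ;;
    2) echo "custom-20-122880" ;;
    4) echo "custom-40-212992" ;;
    8) echo "custom-80-491520" ;;
    *) echo "unsupported GPU count: $1" >&2; return 1 ;;
  esac
}

if valid_instance_name "$INSTANCE_NAME"; then
  MACHINE_TYPE="$(machine_type_for_gpus "$GPU_COUNT")"
  echo "instance=$INSTANCE_NAME zone=$ZONE machine-type=$MACHINE_TYPE"
else
  echo "invalid instance name: $INSTANCE_NAME" >&2
fi
```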

Creating a Custom GPU Instance

Use the Python scripts provided at https://github.com/nvidia/ngc-examples/tree/master/ncsp to create your custom GPU instance. You can also enter the following, using the information gathered in the previous section:

gcloud compute \
--project "<project-id>" \
instances create "<instance-name>" \
--zone "<zone>" \
--machine-type "<vCPU-mem-config>" \
--subnet "<subnet-name>" \
--metadata "ssh-keys=<user-id>:ssh-rsa <ssh-key> <user-email>" \
--maintenance-policy "<maintenance-policy>" \
--service-account "<service-account-email>" \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring.write","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--accelerator type=<accelerator-type>,count=<gpu-count> \
--min-cpu-platform "<CPU-platform>" \
--image "<nvidia-gpu-cloud-image>" \
--image-project "<project-name>" \
--boot-disk-size "32" \
--boot-disk-type "pd-standard" \
--boot-disk-device-name "<boot-disk-name>"

The GPU Cloud instance starts running as soon as it is created.
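Because a mistyped option can create a billable instance, it may help to print the command before running it. The sketch below assumes a hypothetical run helper and DRY_RUN variable (neither is part of gcloud), uses placeholder values throughout, and leaves out --subnet, --metadata, --service-account, and --scopes on the assumption that the gcloud defaults are acceptable.

```shell
# Placeholder values (assumptions); substitute the options you gathered earlier.
PROJECT_ID="my-project-id"
INSTANCE_NAME="my-ngc-instance"
ZONE="us-west1-b"
IMAGE_NAME="<nvidia-gpu-cloud-image>"   # placeholder, see the release notes

DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command, any other value = run it

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create an instance with one P100.
run gcloud compute instances create "$INSTANCE_NAME" \
  --project "$PROJECT_ID" \
  --zone "$ZONE" \
  --machine-type "custom-10-61440" \
  --maintenance-policy "TERMINATE" \
  --accelerator "type=nvidia-tesla-p100,count=1" \
  --min-cpu-platform "Intel Broadwell" \
  --image "$IMAGE_NAME" \
  --image-project "nvidia-ngc-public" \
  --boot-disk-size "32" \
  --boot-disk-type "pd-standard" \
  --boot-disk-device-name "$INSTANCE_NAME"

# Query the instance state; a newly created instance should report RUNNING.
run gcloud compute instances describe "$INSTANCE_NAME" \
  --project "$PROJECT_ID" --zone "$ZONE" --format "value(status)"
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands instead of printing them.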

Connecting to Your GPU Instance with SSH

If you ran the scripts from https://github.com/nvidia/ngc-examples/tree/master/ncsp, you should already be connected to your instance. Otherwise, run ssh to connect to your GPU instance, or enter the following gcloud command.

Command syntax:

gcloud compute --project "<project-id>" ssh --zone "<zone>" "<instance-name>"

See https://cloud.google.com/compute/docs/instances/connecting-to-instance for more information about connecting to your GPU instance.
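To connect with plain ssh rather than gcloud, you need the external IP address of the instance. The sketch below looks it up with gcloud and then calls ssh. It assumes the first network interface of the instance has an external address and that the public key for the user was supplied through --metadata when the instance was created; the function names are hypothetical.

```shell
# Usage: connect.sh <user-id> <instance-name> <project-id> <zone>

# Print the external IP of an instance.
# Assumes the first network interface has an external (NAT) address.
external_ip() {
  gcloud compute instances describe "$1" --project "$2" --zone "$3" \
    --format 'get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Connect with plain ssh as the given user.
ssh_to_instance() {
  ip="$(external_ip "$2" "$3" "$4")" || return 1
  [ -n "$ip" ] || return 1
  ssh "$1@$ip"
}

# Only connect when all four arguments were passed to the script.
if [ "$#" -eq 4 ]; then
  ssh_to_instance "$@"
fi
```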

The latest NVIDIA drivers must be installed on the NVIDIA GPU Cloud Image instance before the instance can be used. If the drivers have not yet been installed on this instance, then upon connecting, the instance startup script asks whether you want to download and install the latest NVIDIA drivers.

NVIDIA GPU Cloud (NGC) is an optimized software environment that requires the latest NVIDIA drivers to operate. If you do not download the NVIDIA drivers at this time, your instance will shut down. Would you like to download the latest NVIDIA drivers so NGC can finish installing? (Y/n)
  1. Press Y to install the latest NVIDIA drivers and proceed with the connection.

    If you press N, then the connection process will abort and the instance will be stopped.

    The script also initiates the Docker login process automatically, at which point you must enter your NGC API Key.
  2. Enter your NGC API Key to complete the login.

Stopping and Restarting Your GPU Instance

Once an instance is running, you can stop and (re)start your instance.

Stop:

gcloud compute instances stop <instance-name>

Start or Restart:

gcloud compute instances start <instance-name> --zone <zone>
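The stop and start commands can be wrapped in a small script that always passes the zone with --zone and can also report the instance state. This is a sketch only; instance_ctl is a hypothetical name, and the status values noted in the comments (TERMINATED after a stop, RUNNING after a start) are the states Compute Engine normally reports.

```shell
# Usage: instance.sh <stop|start|status> <instance-name> <zone>

# Hypothetical wrapper around the gcloud stop, start, and describe commands.
instance_ctl() {
  action="$1"; name="$2"; zone="$3"
  case "$action" in
    stop|start)
      gcloud compute instances "$action" "$name" --zone "$zone" ;;
    status)
      # Normally TERMINATED once stopped, RUNNING once started.
      gcloud compute instances describe "$name" --zone "$zone" \
        --format 'value(status)' ;;
    *)
      echo "usage: instance_ctl <stop|start|status> <instance-name> <zone>" >&2
      return 2 ;;
  esac
}

# Only act when all three arguments were passed to the script.
if [ "$#" -eq 3 ]; then
  instance_ctl "$@"
fi
```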

More Advanced CLI Usage

For more CLI documentation, visit the gcloud Compute Documentation.