Date: 2024-09-17

NVIDIA makes three virtual machine images (VMIs) available on Alibaba Cloud. These GPU-optimized VMIs are for Alibaba Cloud VM instances with NVIDIA V100 or NVIDIA T4 GPUs.

NVIDIA GPU-Optimized Image for Deep Learning, Machine Learning & HPC: The base GPU-optimized image. Includes Ubuntu Server, the NVIDIA driver, Docker CE, and the NVIDIA Container Runtime for Docker.

NVIDIA GPU-Optimized Image for TensorFlow: The base image with NVIDIA’s GPU-accelerated TensorFlow container pre-installed.

NVIDIA GPU-Optimized Image for PyTorch: The base image with NVIDIA’s GPU-accelerated PyTorch container pre-installed.

For those familiar with the Alibaba Cloud platform, the process of launching the instance is as simple as logging in, selecting the NVIDIA GPU-optimized Image of choice, selecting and configuring a cloud instance with at least one supported NVIDIA GPU, and then launching the VM. After launching the VM, you can SSH into it and start using the wide range of GPU-accelerated containers, pre-trained models, and other resources available from the NGC Catalog.

This document provides step-by-step instructions for accomplishing this, including how to use the Alibaba Cloud CLI.

Prerequisites

These instructions assume the following:

You have an Alibaba account - https://home-intl.console.aliyun.com/ with permissions to create resources.

You have browsed the NGC website and identified an available NGC container and tag to run on the VMI.

Windows Users: The CLI code snippets are for bash on Linux or Mac OS X. If you are using Windows and want to use the snippets as-is, you can use the Windows Subsystem for Linux and use the bash shell (you will be in Ubuntu Linux).

Cloud security starts with the security policies of your CSP account. Refer to the following link for how to configure your security policies for your CSP:

Users must follow the security guidelines and best practices of their CSP to secure their VM and account.

If you do not already have SSH keys set up specifically for Alibaba, you will need to set one up and have it on the machine you will use to SSH to the VM. In the examples, the key is named "alibaba-key".



1. From a browser, log in to the ECS console - https://ecs.console.aliyun.com/.

2. Open the left navigation menu tab and then click SSH Key Pairs from the Network & Security group.

3. From the upper right of the screen, click Create SSH Key Pair.

4. Give it a name, such as "alibaba-key", and click OK. A .pem file will immediately download. This is the ONLY time you can download it.

5. After downloading the .pem file, move it to the .ssh directory.

    mv alibaba-key.pem ~/.ssh/
    chmod 400 ~/.ssh/alibaba-key.pem

On Windows, the location will depend on the SSH client you use, so modify the path above and in the snippets or your SSH client configuration. See the Alibaba documentation for Creating an SSH key pair.
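OpenSSH refuses private keys that other users can read, which is why the key is set to mode 400 above. The following is a minimal sketch of a check you could run before connecting. The function name `key_is_private` is hypothetical, and the check relies on the standard `ls -l` permission column rather than any Alibaba tooling.

```shell
# Return 0 if the file exists and grants no permissions to group or others.
# key_is_private is a hypothetical helper name, not part of any CLI.
key_is_private() {
  [ -f "$1" ] || return 1
  # Characters 5-10 of the ls -l mode string are the group and other bits.
  group_other=$(ls -ld "$1" | cut -c5-10)
  [ "$group_other" = "------" ]
}

# Usage: warn if the key from this guide is too permissive.
# key_is_private ~/.ssh/alibaba-key.pem || echo "run: chmod 400 ~/.ssh/alibaba-key.pem"
```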

In order to create instances, you need to put them in a Security Group.



1. Log in to the ECS console - https://ecs.console.aliyun.com/.

2. Open the left navigation menu tab and then click Security Groups from the Network & Security group.

3. From the upper right of the screen, click Create Security Group.

4. Give it a name and description, and create a Virtual Private Cloud (VPC) if one doesn't exist yet.

5. Under the inbound tab, configure the following options. Add SSH and HTTPS. At Custom Port Range, select TCP and then enter 5000/5000. Set Authorization Object = 0.0.0.0/0 or the IP address from which you will access the instance.

6. Click OK.

Security Warning: It is important to use proper precautions and security safeguards prior to granting access, or sharing your VMI over the internet. By default, internet connectivity to the VMI instance is blocked. You are solely responsible for enabling and securing access to your VMI. Please refer to Alibaba guides for managing security groups.

Configure the following instance settings.

Billing Method: Pay-As-You-Go

Region: Select a region that has GPU instances (Note: Not all regions have GPUs)

Instance Type: Select Heterogeneous Computing and select an instance type with NVIDIA V100 or T4 GPUs

Image: Ensure the NVIDIA GPU-Optimized Image you chose previously is selected

Storage: Add a disk for dataset storage by clicking Add Disk under Data Disk, and then entering the storage size. The recommended minimum dataset storage size is 1 TB (1024 GB).

Click Next: Networking and select the security group you previously created in the Before You Get Started section.

Click Next: System Configuration and select the SSH Key Pair you previously created in the Before You Get Started section.

Click Preview, review the configuration and accept the terms of service, and then click Create Instance.

Click Console on the Create page. Wait until the status of your VM displays “Running” and then connect via SSH using the actions section of the VM details.

Once started, you can SSH into your instance using the SSH key for the root user. If you followed the setup in this tutorial, your key is in ~/.ssh/ .

Command syntax:

    ssh -i <KEYPATH> root@<IP>

Example:

    ssh -i ~/.ssh/alibaba-key.pem root@47.89.248.188

Refer to Connect to a Linux Instance for more instructions on connecting to your instance.

Navigate to Instances under the Instances & Images section in the navigation pane on the left. Select the virtual machine instance you wish to manage and use the options bar at the bottom to start/stop, and release to terminate the instance and delete any associated resources.





This flow and the code snippets in this section are for Linux or Mac OS X. If you are using Windows, you can use the Windows Subsystem for Linux and use the bash shell (where you will be in Ubuntu Linux).

Many of these CLI commands can take a significant amount of time to complete.

For complete CLI documentation and sample scripts visit the Alibaba Documentation Center.

To use the Alibaba CLI, follow the Alibaba CLI Install Instructions and also install the ECS SDK.



Install the ECS SDK.

    sudo pip install aliyun-python-sdk-ecs

Configure the CLI with your keys.

    aliyuncli configure
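Before running the remaining commands, it can help to confirm that the CLI is actually on your PATH. This is a minimal sketch; `require_cmd` is a hypothetical helper name, and it only checks that the command exists, not that it has been configured with valid keys.

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
# require_cmd is a hypothetical helper, not part of the Alibaba CLI.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing command: $1" >&2
    return 1
  fi
}

# Usage: stop early if the CLI is not installed.
# require_cmd aliyuncli || exit 1
```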


You need to specify a source ImageID when creating an instance. Use this command to find the latest ImageID of the NVIDIA-GPU-Cloud-Machine-Image:

    aliyuncli ecs DescribeImages --RegionId us-west-1 \
      --ImageName "NVIDIA-GPU-Cloud-Virtual-Machine" \
      --output json --filter Images.Image[0].ImageId



The command outputs the image ID, for example "m-rj9iy0xjiod3ghkyhz4p".
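Because the image ID is printed as a quoted JSON string, you may want to strip the quotes before reusing it in later commands. The following is a sketch under that assumption; `strip_json_quotes` is a hypothetical helper name, and the exact output format of your CLI version may differ.

```shell
# Remove double quotes and whitespace from a value such as "m-rj9iy0xjiod3ghkyhz4p".
# strip_json_quotes is a hypothetical helper name.
strip_json_quotes() {
  printf '%s' "$1" | tr -d '"[:space:]'
}

# Usage (requires a configured CLI; not run here):
# IMAGE_ID=$(strip_json_quotes "$(aliyuncli ecs DescribeImages --RegionId us-west-1 \
#   --ImageName "NVIDIA-GPU-Cloud-Virtual-Machine" \
#   --output json --filter 'Images.Image[0].ImageId')")
```

The filter expression is single-quoted in the usage line so that the shell does not try to expand the square brackets as a filename pattern.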

Creating an instance with the CLI is done using the `aliyuncli ecs CreateInstance` command.

Full syntax documentation - https://www.alibabacloud.com/help/doc-detail/25499.htm



Recommended Instance Options

"--InternetMaxBandwidthOut 10" sets the peak outbound network bandwidth to 10 Mbps. The valid range is [1, 200].

"--InstanceChargeType PostPaid" sets the billing method to pay-as-you-go. Change this to "PrePaid" to set it to a subscription billing.

Other Notable Create Instance Options

The inbound network bandwidth defaults to 200 Mbps. Use "--InternetMaxBandwidthIn" to change this. The valid range is [1, 200].

To change the size of the system disk (default is 40 GB), use the "--SystemDiskSize" option. Valid values are [40, 500].

To add a data disk (up to 16), use the "--DataDiskNSize" and "--DataDiskNCategory" options, where "N" is [1, 16]. Valid values are:

    DataDiskNCategory    DataDiskNSize    Description
    cloud                [5, 2000]        (default) Basic cloud disk
    cloud_efficiency     [20, 32768]      Ultra cloud disk
    cloud_ssd            [20, 32768]      Cloud SSD
    ephemeral_ssd        [5, 800]         Ephemeral SSD
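The size ranges above can be checked locally before a launch request is sent. This is a minimal sketch that encodes only the four categories and ranges listed in this section; `valid_data_disk_size` is a hypothetical helper name, and the CLI itself remains the authority on what it accepts.

```shell
# Return 0 if SIZE falls inside the range listed above for CATEGORY, else non-zero.
# valid_data_disk_size is a hypothetical helper name.
valid_data_disk_size() {
  category=$1
  size=$2
  case "$category" in
    cloud)            min=5;  max=2000 ;;
    cloud_efficiency) min=20; max=32768 ;;
    cloud_ssd)        min=20; max=32768 ;;
    ephemeral_ssd)    min=5;  max=800 ;;
    *) return 1 ;;  # unknown category
  esac
  [ "$size" -ge "$min" ] && [ "$size" -le "$max" ]
}

# Usage: check a 1024 cloud SSD data disk before adding
# --DataDisk1Category cloud_ssd --DataDisk1Size 1024 to the create command.
# valid_data_disk_size cloud_ssd 1024 && echo "size ok"
```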

Launch Example

Launch the instance and capture the resulting JSON:

    aliyuncli ecs CreateInstance \
      --RegionId us-west-1 \
      --ImageId "m-rj9iy0xjiod3ghkyhz4p" \
      --SecurityGroupId "sg-rj94krsusal2k5l6gnnz" \
      --InstanceType ecs.gn5-c4g1.xlarge \
      --InstanceName "my-instance" \
      --InternetMaxBandwidthOut 10 \
      --InstanceChargeType PostPaid \
      --KeyPairName alibaba-key

The output shows the instance ID.

    {
      "InstanceId": "i-rj9a0iw25hryafj0fm4v",
      "RequestId": "440ECC70-09F9-492C-AB9E-21AA9C4E0531"
    }
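The instance ID is needed by the later commands, so it is convenient to pull it out of the JSON into a shell variable. The following is a sketch that uses only sed and assumes flat JSON with string values, as in the output shown here. `json_field` is a hypothetical helper name; a real JSON parser is the safer choice for anything more complex.

```shell
# Print the string value of the given key from flat JSON read on stdin.
# json_field is a hypothetical helper; it handles only simple "key": "value" pairs.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example using the output shown in this section:
sample='{ "InstanceId": "i-rj9a0iw25hryafj0fm4v", "RequestId": "440ECC70-09F9-492C-AB9E-21AA9C4E0531" }'
printf '%s\n' "$sample" | json_field InstanceId   # prints i-rj9a0iw25hryafj0fm4v
```

The same helper can read other string fields from similar responses by changing the key name.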

Instances created via CLI are not automatically given a public IP address.



To assign a public IP address to the instance you just created, run:

    aliyuncli ecs AllocatePublicIpAddress --RegionId us-west-1 \
      --InstanceId "i-rj9a0iw25hryafj0fm4v"

Successful completion of the command will return the IP address:

    {
      "IpAddress": "47.89.248.188",
      "RequestId": "65EB59AE-FA75-446F-B5C7-2BA0F9A77CDC"
    }

Instances created via CLI are not started automatically.



To start the instance you just created, run:

    aliyuncli ecs StartInstance --InstanceId "i-rj9a0iw25hryafj0fm4v"

Once started, you can SSH into your instance using the SSH key for the root user. If you followed the setup in this tutorial, your key is in ~/.ssh/ .

Command syntax:

    ssh -i <KEYPATH> root@<IP>

Example:

    ssh -i ~/.ssh/alibaba-key.pem root@47.89.248.188



Refer to Connect to a Linux Instance for more instructions on connecting to your instance.
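Since CLI operations can take a while to complete, an SSH attempt made right after starting the instance may fail until the VM has finished booting. One way to handle this, sketched below, is a small retry loop. `retry` is a hypothetical helper name, and the try count and delay in the usage line are illustrative values, not recommendations from Alibaba or NVIDIA.

```shell
# Run a command up to MAX_TRIES times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# retry is a hypothetical helper name.
retry() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage: try to reach the instance for up to 10 attempts, 15 seconds apart.
# retry 10 15 ssh -i ~/.ssh/alibaba-key.pem -o ConnectTimeout=5 root@47.89.248.188 true
```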

Once an instance is running, you can stop, (re)start, or delete your instance.

Stop:

    aliyuncli ecs StopInstance --InstanceId INSTANCE_ID

Start or Restart:

    aliyuncli ecs StartInstance --InstanceId INSTANCE_ID

Delete: