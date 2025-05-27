NVIDIA makes three virtual machine images (VMIs) available on the Amazon Web Services (AWS) platform, where they are known as Amazon Machine Images (AMIs). These GPU-optimized AMIs support AWS instances with NVIDIA V100 GPUs (EC2 P3 instances), NVIDIA T4 GPUs (EC2 G4 instances), and NVIDIA A100 GPUs (EC2 P4d instances). The NVIDIA AMI also supports ARM64 (EC2 G5g) instances.

For those familiar with the AWS platform, the process of launching the instance is as simple as logging in, selecting the NVIDIA GPU-optimized image of choice, configuring settings as needed, then launching the VM. After launching the VM, you can SSH into it and start building a host of AI applications in deep learning, machine learning and data science by leveraging the wide range of GPU-accelerated containers, pre-trained models and resources available from the NGC Catalog.

This document provides step-by-step instructions for accomplishing this, including how to use the AWS CLI.

Prerequisites

These instructions assume the following:

You have an AWS account - https://aws.amazon.com

You have browsed the NGC website and identified an available NGC container and tag to run on the virtual machine instance (VMI).

Windows Users: The CLI code snippets are for bash on Linux or Mac OS X. If you are using Windows and want to use the snippets as-is, you can use the Windows Subsystem for Linux and use the bash shell (you will be in Ubuntu Linux).

Cloud security starts with the security policies of your CSP account. Refer to the following link for how to configure your security policies for your CSP:

Users must follow the security guidelines and best practices of their CSP to secure their VM and account.

Perform these preliminary setup tasks to simplify the process of launching the NVIDIA Deep Learning AMI.



If you do not already have a key pair defined, you will need to set up an AWS key pair and have it on the machine from which you will use the AWS CLI or SSH to the instance. In the examples, the key pair is named "my-key-pair".

Once you have downloaded your key pair, make sure it is readable only by you, and (on Linux or OS X) move it to your ~/.ssh/ directory.

chmod 400 my-key-pair*

mv my-key-pair* ~/.ssh/
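If you prefer to create the key pair from the command line instead of downloading it from the console, the sketch below shows one possible approach. The function name create_key_pair and its arguments are hypothetical; it relies on the aws ec2 create-key-pair command and assumes the AWS CLI is already installed and configured.

```bash
# Hypothetical helper: create an EC2 key pair and store the private key
# in a directory (default ~/.ssh) with owner-only read permission.
create_key_pair() {
    key_name=$1                     # for example: my-key-pair
    key_dir=${2:-"$HOME/.ssh"}      # destination directory
    key_file="$key_dir/$key_name.pem"

    mkdir -p "$key_dir" || return 1

    # Refuse to overwrite an existing private key.
    if [ -e "$key_file" ]; then
        echo "refusing to overwrite $key_file" >&2
        return 1
    fi

    # umask 077 keeps the file private from the moment it is created.
    ( umask 077 && aws ec2 create-key-pair --key-name "$key_name" \
        --query 'KeyMaterial' --output text > "$key_file" ) \
        || { rm -f "$key_file"; return 1; }

    chmod 400 "$key_file"
    echo "$key_file"
}
```

Calling create_key_pair my-key-pair would print the path of the new .pem file, which matches the key naming used in the rest of this guide.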

Security groups define the network connection restrictions you place on your virtual machine instance. In order to connect to your running instances you will need a Security Group allowing (at minimum) SSH access.



1. Log into the AWS Console (https://aws.amazon.com), then click EC2 under the Console section located within the All Services drop-down menu.

2. Enter the Security Groups screen, located on the left under "Network & Security", "Security Groups".

3. Click Create Security Group.

4. Give the Security Group a name (for example, "my-sg") and a description.

5. Under the "Inbound" section, click Add Rule and add a rule with the following parameters to enable SSH:

   Type: SSH
   Protocol: TCP
   Port Range: 22
   Source: My IP

   You may need to widen the resulting IP filter if you're not on a fixed IP address, or want to access the instance from multiple locations such as work and home. The following shows the filled-out Create Security Group form using the example naming.

6. (Optional) Add additional rules. You may need to add rules for HTTP, HTTPS, or other Custom TCP ports depending on the deep learning frameworks you use. Continue by clicking Add Rule, then create rules as needed. Examples:

   For DIGITS4
   Type: Custom TCP Rule
   Protocol: TCP
   Port Range: 3448
   Source: My IP

   For HTTPS secure web frameworks
   Type: HTTPS
   Protocol: TCP
   Port Range: 443
   Source: My IP

   Security Warning: It is important to use proper precautions and security safeguards prior to granting access, or sharing your AMI over the internet. By default, internet connectivity to the AMI instance is blocked. You are solely responsible for enabling and securing access to your AMI. Please refer to AWS guides for managing security groups.

7. Click Create Security Group in the bottom right corner to complete creation of the Security Group. Once created, the Group ID is listed in the Security Group table.
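The console steps above can also be approximated from the AWS CLI. The following is a minimal sketch, not the method this guide prescribes: the function name create_ssh_security_group is hypothetical, the CIDR value is a placeholder you must replace with your own address range, and the group is created in your default VPC because no VPC is specified.

```bash
# Hypothetical helper: create a security group and allow inbound SSH
# (TCP port 22) from a single CIDR range. Prints the new Group ID.
create_ssh_security_group() {
    sg_name=$1      # for example: my-sg
    sg_desc=$2      # free-text description
    ssh_cidr=$3     # for example: 203.0.113.7/32 (your own IP, assumed)

    sg_id=$(aws ec2 create-security-group \
        --group-name "$sg_name" \
        --description "$sg_desc" \
        --query 'GroupId' --output text) || return 1

    # Equivalent of the console rule Type: SSH, Protocol: TCP, Port Range: 22.
    aws ec2 authorize-security-group-ingress \
        --group-id "$sg_id" \
        --protocol tcp --port 22 \
        --cidr "$ssh_cidr" > /dev/null || return 1

    echo "$sg_id"
}
```

For example, create_ssh_security_group my-sg "SSH access" 203.0.113.7/32 would print the Group ID of the new group. Keeping the CIDR as narrow as possible follows the security warning above.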

1. Log into the AWS Console (https://aws.amazon.com), then under the Compute section, click EC2.

2. Select the AWS region from the upper right of the top menu. To use NVIDIA Volta and Turing GPUs in AWS, you must select a region that has Amazon EC2 P3 or G4 instances available. The examples in this guide use instances in US West (Oregon) - us-west-2. Check with AWS for Amazon EC2 P3 or G4 instance availability in other regions.
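One way to check whether a region offers a given instance type is the aws ec2 describe-instance-type-offerings command. The sketch below wraps it in a hypothetical function, instance_type_offered, and assumes the AWS CLI is installed and configured.

```bash
# Hypothetical helper: succeed (exit 0) if the instance type is offered
# in the region, fail (exit 1) if it is not, exit 2 if the query failed.
instance_type_offered() {
    itype=$1     # for example: p3.2xlarge
    region=$2    # for example: us-west-2

    offered=$(aws ec2 describe-instance-type-offerings \
        --region "$region" \
        --location-type region \
        --filters "Name=instance-type,Values=$itype" \
        --query 'InstanceTypeOfferings[].InstanceType' \
        --output text) || return 2

    # An empty result means the type is not offered in this region.
    [ -n "$offered" ]
}
```

For example, instance_type_offered p3.2xlarge us-west-2 && echo available prints "available" only when the region lists that instance type.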

NVIDIA publishes and maintains multiple flavors of a GPU-optimized AMI with all the software needed to pull and run content from NGC. These AMIs should be used to launch your GPU instances.



1. Click Launch Instance.

2. Select the NVIDIA Deep Learning AMI. Select AWS Marketplace in Step 1 of the process. Type "nvidia" into the search bar, then select the NVIDIA GPU-optimized AMI that best suits your purpose. Click Continue on the details page.

1. Select one of the Amazon EC2 P3 or G4 instance types according to your GPU, CPU, and memory requirements. Click Review and Launch to review the default configuration settings, or continue with the instructions in the next section to configure each setting step by step.

2. After choosing an instance type, click Next: Configure Instance Details. There are no instance details that need to be configured, so you can proceed to the next step.

3. Add storage. Click Next: Add Storage. While the default 32 GiB for the root volume can be changed, do not use the root volume for storing datasets, since the root volume is destroyed when the instance is terminated.

4. Add tags. Naming your instances helps to keep multiple instances organized. Click Next: Add Tag. Click Add Tag and then fill in the following information:

   Key: "Name"
   Value: <instance name, such as "My GPU">

5. Configure a Security Group. Click Next: Configure Security Group. Click Select an existing security group and select the Security Group you created during Before You Get Started.

Click Review and Launch. A window pops up and asks which key pair to use. Select Choose an existing key pair, select your key pair, then check the acknowledgement checkbox. Click Launch Instances.

After launching your instance, click View Instances, locate your instance from the list, then wait until it is in the ‘running’ state. When it is in the running state, select it from the list and then click Connect. Follow the instructions in the pop-up window to establish an SSH connection to the instance. Be sure to use 'ubuntu' for the username. If the instructions for SSH login do not work, see the AWS Connect to Your Linux Instance documentation for additional information.

Once you are done with your instance you can stop (to be started again later) or terminate (delete) it. Refer to the Instance Lifecycle in the AWS documentation for more information.

Instances can be controlled from the Instances page, using the "Actions" -> "Instance State" menu to stop, start, or terminate instances.

If you plan to use AWS CLI, then the CLI must be installed (Windows Users: inside the Windows Subsystem for Linux), updated to the latest version, and configured.

Some of the AWS CLI snippets in these instructions make use of jq , which should be installed on the machine from which you'll run the AWS CLI. You may paste these snippets into your own bash scripts or type them at the command line.
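Before running the snippets, it can help to confirm that the required tools are on your PATH. The following is a small sketch using only the shell built-in command -v; the function name require_cmds is hypothetical.

```bash
# Hypothetical helper: report every command that is not on PATH and
# return non-zero if any is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For this guide, require_cmds aws jq would check for both the AWS CLI and jq before you continue.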



Set up the following environment variables which can be used in the commands for launching the VM instance:



Security Group

The Security Group ID is used as part of the instance creation process. Once created, the Group ID can be looked up in the AWS Console, or retrieved by name with the following snippet and stored in the $NVAWS_SG_ID environment variable.

NVAWS_SG_NAME='my-sg'

NVAWS_SG_ID=$(aws ec2 describe-security-groups --group-names "$NVAWS_SG_NAME" | jq -r '.SecurityGroups[0].GroupId') && echo NVAWS_SG_ID=$NVAWS_SG_ID

Image ID

The following snippet retrieves the current "NVIDIA Deep Learning AMI" Image ID and stores it in the $NVAWS_IMAGE_ID environment variable.

NVAWS_IMAGE_NAME='NVIDIA Deep Learning AMI'

NVAWS_IMAGE_ID=$(aws ec2 describe-images --filters "Name=name,Values=$NVAWS_IMAGE_NAME" | jq -r '.Images[0].ImageId') && echo NVAWS_IMAGE_ID=$NVAWS_IMAGE_ID
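If more than one image matches the name filter, taking the first element of the result may not return the newest one. As an alternative sketch, the AWS CLI's built-in --query option (JMESPath) can sort by CreationDate and pick the last entry without jq. The function name latest_image_id is hypothetical, and treating the newest image as the one you want is an assumption.

```bash
# Hypothetical helper: print the ImageId of the most recently created
# image whose name matches, or fail if nothing matches.
latest_image_id() {
    image_name=$1

    image_id=$(aws ec2 describe-images \
        --filters "Name=name,Values=$image_name" \
        --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
        --output text) || return 1

    # With --output text, a null result is printed as the string "None".
    if [ -z "$image_id" ] || [ "$image_id" = "None" ]; then
        echo "no image found with name: $image_name" >&2
        return 1
    fi
    echo "$image_id"
}
```

For example, NVAWS_IMAGE_ID=$(latest_image_id "$NVAWS_IMAGE_NAME") leaves the variable empty and returns a non-zero status when no image matches, which makes a failed lookup easier to notice.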

Other Environment Variables

Set up the other environment variables as follows, using your information:

NVAWS_KEYNAME=my-key-pair
NVAWS_KEYPATH=~/.ssh/
NVAWS_REGION=us-west-2
NVAWS_INSTANCE_TYPE=p3.2xlarge
NVAWS_EBS_GB=32
NVAWS_NAME_TAG='My GPU'

Be sure to set a unique NVAWS_NAME_TAG for each instance you launch.

Launch the instance and capture the resulting JSON:

NVAWS_LAUNCH_JSON=$(aws ec2 run-instances --image-id $NVAWS_IMAGE_ID \
  --instance-type $NVAWS_INSTANCE_TYPE \
  --region $NVAWS_REGION \
  --key-name $NVAWS_KEYNAME \
  --security-group-ids $NVAWS_SG_ID \
  --block-device-mappings "[{\"DeviceName\":\"/dev/sda1\",\"Ebs\":{\"VolumeSize\":$NVAWS_EBS_GB}}]" \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$NVAWS_NAME_TAG}]")

NVAWS_INSTANCE_ID=$(echo "$NVAWS_LAUNCH_JSON" | jq -r '.Instances[0].InstanceId') && echo NVAWS_INSTANCE_ID=$NVAWS_INSTANCE_ID

The resulting Instance ID is stored in the NVAWS_INSTANCE_ID environment variable.

The launch process can take several minutes once a machine is available, and can be watched in the AWS Console Instances page or with the CLI using:

aws ec2 describe-instance-status --instance-ids $NVAWS_INSTANCE_ID | jq -r '.InstanceStatuses[0].InstanceState.Name + " " + .InstanceStatuses[0].SystemStatus.Status'

Once the instance is "running initializing", you will be able to get the Public DNS name with:

NVAWS_DNS=$(aws ec2 describe-instances --instance-ids $NVAWS_INSTANCE_ID | jq -r '.Reservations[0].Instances[0].PublicDnsName') && \
  echo NVAWS_DNS=$NVAWS_DNS

SSH should work shortly after the instance reaches "running ok".
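Instead of polling the status by hand, you could let the AWS CLI block until the instance is ready. The sketch below uses the aws ec2 wait instance-running and aws ec2 wait instance-status-ok subcommands; the function name wait_until_ready is hypothetical.

```bash
# Hypothetical helper: block until the instance is in the "running"
# state and its status checks report "ok".
wait_until_ready() {
    instance_id=$1

    # Returns once the instance state is "running".
    aws ec2 wait instance-running --instance-ids "$instance_id" || return 1

    # Returns once the instance and system status checks are "ok".
    aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1

    echo "instance $instance_id is running and passed status checks"
}
```

With the variables from this guide, the call would be wait_until_ready "$NVAWS_INSTANCE_ID". The waiters give up after a fixed number of polling attempts, so a non-zero return may simply mean the instance needs more time.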

If you launched the instance with the CLI snippets and environment variables above, the command to SSH to your instance is:

ssh -i $NVAWS_KEYPATH/$NVAWS_KEYNAME.pem ubuntu@$NVAWS_DNS

Otherwise use your .pem key filename and the Public DNS name from the AWS Console to connect:

ssh -i my-key-pair.pem ubuntu@public-dns-name

If these instructions for SSH login do not work, see the AWS Connect to Your Linux Instance documentation for additional information.

Once you are done with your instance you can stop (to be started again later) or terminate (delete) it. Refer to the Instance Lifecycle in the AWS documentation for more information.

Stop:

aws ec2 stop-instances --instance-ids $NVAWS_INSTANCE_ID

Start:

aws ec2 start-instances --instance-ids $NVAWS_INSTANCE_ID

Terminate:

aws ec2 terminate-instances --instance-ids $NVAWS_INSTANCE_ID

You can create Elastic Block Store (EBS) volumes from the AWS Console. EBS is used for persistent data storage; however, an EBS volume cannot be shared across multiple VMs. To share persistent data storage, you need to use EFS.

These instructions set up a general purpose SSD volume type. However, you can specify a provisioned IOPS SSD for higher throughput, or set up software RAID, using mdadm, to create a volume with multiple EBS volumes.

See the Amazon documentation RAID Configuration on Linux for instructions on how to set up software RAID on local disks.

EBS is available in most regions with Amazon EC2 P3 or G4 instances.
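The volume type choice described above can also be expressed with the AWS CLI. The following is only a sketch under stated assumptions: the function name create_data_volume and its argument order are hypothetical, and the Availability Zone you pass must be the one your instance runs in, because a volume can only be attached to an instance in the same zone.

```bash
# Hypothetical helper: create an EBS volume and print its VolumeId.
# Defaults to General Purpose SSD (gp2); io1 additionally needs IOPS.
create_data_volume() {
    az=$1                # Availability Zone, for example: us-west-2a
    size_gb=$2           # volume size in GiB
    vol_type=${3:-gp2}   # gp2 (default) or io1
    iops=${4:-}          # required only for io1

    if [ "$vol_type" = "io1" ]; then
        if [ -z "$iops" ]; then
            echo "io1 volumes need an IOPS value" >&2
            return 1
        fi
        aws ec2 create-volume --availability-zone "$az" --size "$size_gb" \
            --volume-type io1 --iops "$iops" \
            --query 'VolumeId' --output text
    else
        aws ec2 create-volume --availability-zone "$az" --size "$size_gb" \
            --volume-type "$vol_type" \
            --query 'VolumeId' --output text
    fi
}

# Example (not run here): create a 250 GiB gp2 volume, then attach it.
#   vol=$(create_data_volume us-west-2a 250)
#   aws ec2 attach-volume --volume-id "$vol" \
#       --instance-id "$NVAWS_INSTANCE_ID" --device /dev/sdf
```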



1. Open the EBS Volumes Console. Go to the main AWS console, click EC2, then expand Elastic Block Store from the side menu, if necessary, and click Volumes.

2. Click Create Volume.

3. Make selections at the Create Volume page.

   Select General Purpose SSD (GP2) for the Volume Type. If higher throughput is needed, select Provisioned IOPS SSD (IO1) for the Volume Type.

   Specify the volume size and Availability Zone.

   (Optional) Add Tags.

   Encryption is not needed if you are working with public datasets.

   Snapshot ID is not needed.

4. Review the options and then click Create Volume.

5. Once you have created the EBS volume, select the volume and then select Actions->Attach Volume.

6. Specify your EC2 instance ID as well as a drive letter for the device name (for example, sdf), then click Attach. This creates a /dev/xvdf (or the drive letter that you picked) virtual disk on your EC2 instance. You can view the volume by running the lsblk command.

       ~$ lsblk
       NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
       xvda    202:0    0  128G  0 disk
       └─xvda1 202:1    0  128G  0 part /
       xvdf    202:16   0  250G  0 disk

7. Create a filesystem on the EBS volume.

       ~# mkfs.ext4 /dev/xvdf
       mke2fs 1.42.13 (17-May-2015)
       Creating filesystem with 65536000 4k blocks and 16384000 inodes
       Filesystem UUID: b0e3dee3-bf86-4e69-9488-cf4d4b57b367
       Superblock backups stored on blocks:
           32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
           4096000, 7962624, 11239424, 20480000, 23887872
       Allocating group tables: done
       Writing inode tables: done
       Creating journal (32768 blocks): done
       Writing superblocks and filesystem accounting information: done

8. Mount the volume to a mount directory.

       ~# mount /dev/xvdf /data

   To mount the volume automatically every time the instance is stopped and restarted, add an entry to /etc/fstab. Refer to the Amazon documentation Making a Volume Available for Use.
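As an illustration of the /etc/fstab entry mentioned above, the sketch below formats one line that identifies the volume by filesystem UUID rather than by device name, since device names can change between boots. The function name fstab_entry is hypothetical, and the mount options shown (defaults,nofail) are one common choice rather than a requirement of this guide.

```bash
# Hypothetical helper: print an /etc/fstab line for a volume identified
# by its filesystem UUID.
fstab_entry() {
    fs_uuid=$1            # from: sudo blkid -s UUID -o value /dev/xvdf
    mount_point=$2        # for example: /data
    fs_type=${3:-ext4}    # filesystem created with mkfs.ext4 above

    # nofail lets the instance boot even if the volume is not attached.
    printf 'UUID=%s %s %s defaults,nofail 0 2\n' \
        "$fs_uuid" "$mount_point" "$fs_type"
}

# Example (not run here): append the entry for /dev/xvdf mounted at /data.
#   fstab_entry "$(sudo blkid -s UUID -o value /dev/xvdf)" /data \
#       | sudo tee -a /etc/fstab
```

After editing /etc/fstab, running sudo mount -a is a quick way to confirm the entry parses before the next reboot.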

Be aware that once you delete an EBS volume, you cannot undelete it.

1. Open the EBS Volumes Console. Go to the main AWS console, click EC2, then expand Elastic Block Store from the side menu, if necessary, and click Volumes.

2. Select your EBS volume.

3. Detach the volume from the EC2 instance. Select Actions->Detach Volume, then click Yes, Detach in the confirmation dialog.

4. Delete the storage volume. Select Actions->Delete Volume and then click Yes, Delete in the confirmation dialog.
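For reference, the same detach-then-delete sequence might look like the following from the AWS CLI. This is a sketch, not a recommended replacement for the console: the function name delete_data_volume is hypothetical, and it assumes you have already unmounted the filesystem on the instance (for example with sudo umount /data). Deletion is permanent.

```bash
# Hypothetical helper: detach an EBS volume, wait until it is free,
# then delete it. The deletion cannot be undone.
delete_data_volume() {
    volume_id=$1    # for example: vol-0123456789abcdef0

    aws ec2 detach-volume --volume-id "$volume_id" > /dev/null || return 1

    # Block until the volume state is "available" (fully detached).
    aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

    aws ec2 delete-volume --volume-id "$volume_id" || return 1
}
```

Waiting for the "available" state between the two calls mirrors the console flow, where a volume that is still attached cannot be deleted.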

Once you have created the EBS volume, you can upload datasets to the volume.



1. Mount the EBS volume to /data. Issue the following to perform the one-time mount.

       sudo mkdir /data
       sudo mount /dev/xvdf /data
       sudo chmod 777 /data

2. Copy the dataset onto the EBS volume in /data.

       scp -i <.pem> -r local_dataset_dir/ ubuntu@<ec2-instance>:/data

1. Mount the EFS storage to /efs, using the EFS storage DNS name. Issue the following to perform the one-time mount.

       sudo mkdir /efs
       sudo mount -t nfs4 -o \
         nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
         EFS-DNS-NAME:/ /efs
       sudo chmod 777 /efs

2. Copy the dataset from the EFS to the EBS volume.

       sudo cp -r /efs/<dataset> /data

It is recommended that you use the AWS Console for EBS management. If you need to manage EBS file systems with the CLI, NVIDIA has created scripts available on GitHub at https://github.com/nvidia/ngc-examples.

These scripts will let you perform basic EBS management and can serve as the basis for further automation.