NGC on AWS Virtual Machines

This NGC on AWS Virtual Machines documentation explains how to set up an NVIDIA Deep Learning AMI on Amazon EC2 services, and also provides release notes for each version of the NVIDIA image.

1. Using NGC on AWS Virtual Machines

NVIDIA makes four different VMIs available on the Amazon Web Services (AWS) platform, known within the AWS ecosystem as Amazon Machine Images (AMIs). These are GPU-optimized AMIs for AWS instances with NVIDIA V100 (EC2 P3 instances) or NVIDIA T4 GPUs (EC2 G4 instances).

  • NVIDIA GPU-Optimized Image for Deep Learning, Machine Learning & HPC

    The base GPU-optimized image includes Ubuntu Server, the NVIDIA driver, Docker CE, and the NVIDIA Container Runtime for Docker.

  • NVIDIA GPU-Optimized Image for TensorFlow

    The base image with the NVIDIA GPU-Accelerated TensorFlow container pre-installed

  • NVIDIA GPU-Optimized Image for PyTorch

    The base image with the NVIDIA GPU-Accelerated PyTorch container pre-installed

  • NVIDIA HPC SDK Image

    The base image with the NVIDIA HPC SDK pre-installed

For those familiar with the AWS platform, the process of launching the instance is as simple as logging in, selecting the NVIDIA GPU-optimized image of choice, configuring settings as needed, then launching the VM. After launching the VM, you can SSH into it and start building a host of AI applications in deep learning, machine learning and data science by leveraging the wide range of GPU-accelerated containers, pre-trained models and resources available from the NGC Catalog.      

This document provides step-by-step instructions for accomplishing this, including how to use the AWS CLI.

Prerequisites

These instructions assume the following:

  • You have an AWS account - https://aws.amazon.com

  • You have browsed the NGC website and identified an available NGC container and tag to run on the virtual machine instance (VMI).
  • Windows Users: The CLI code snippets are for bash on Linux or Mac OS X. If you are using Windows and want to use the snippets as-is, you can use the Windows Subsystem for Linux and use the bash shell (you will be in Ubuntu Linux).

1.1. Before You Get Started

Perform these preliminary setup tasks to simplify the process of launching the NVIDIA Deep Learning AMI.

1.1.1. Setting Up Your AWS Key Pair

If you do not already have an AWS Key Pair defined, you will need to set one up and have it on the machine on which you will use the AWS CLI, or from which you will SSH to the instance. In the examples, the key pair is named "my-key-pair".
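If you prefer to work from the CLI, a key pair can also be created and saved locally with a snippet along the following lines (a sketch, assuming the AWS CLI is already installed and configured; the key name matches the examples in this guide):

# Create the key pair in AWS and save the private key locally
aws ec2 create-key-pair --key-name my-key-pair \
  --query 'KeyMaterial' --output text > my-key-pair.pem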

Once you have your key pair downloaded, make sure the key files are readable only by you, and (on Linux or OS X) move them to your ~/.ssh/ directory.

chmod 400 my-key-pair*
mv my-key-pair* ~/.ssh/

1.1.2. Set Up Security Groups for the EC2 Instance

Security groups define the network connection restrictions you place on your virtual machine instance. In order to connect to your running instances you will need a Security Group allowing (at minimum) SSH access.

  1. Log into the AWS Console (https://aws.amazon.com), then click EC2 under the Console section located within the All Services drop-down menu.
  2. Enter the Security Groups screen, located on the left under "Network & Security" > "Security Groups".

  3. Click Create Security Group.
  4. Give the Security Group a name (for example, "my-sg") and a description, then click Add Rule.
  5. Under the "Inbound" section, add a rule with the following parameters to enable SSH:
    • Type: SSH
    • Protocol: TCP
    • Port Range: 22
    • Source: My IP

    You may need to widen the resulting IP filter if you're not on a fixed IP address, or want to access the instance from multiple locations such as work and home.

    [Figure: the filled-out Create Security Group form using the example naming.]

  6. (Optional) Add additional rules.

    You may need to add additional rules for HTTP, HTTPS, or other Custom TCP ports depending on the deep learning frameworks you use.

    Continue adding additional rules by clicking Add Rule, then create rules as needed.

    Examples:

    • For DIGITS4

      • Type: Custom TCP Rule
      • Protocol: TCP
      • Port Range: 3448
      • Source: My IP
    • For HTTPS secure web frameworks

      • Type: HTTPS
      • Protocol: TCP
      • Port Range: 443
      • Source: My IP
  7. Click Create Security Group in the bottom-right corner to complete creation of the Security Group.

    Once created, the Group ID is listed in the Security Group table.
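For reference, the same Security Group can also be created from the AWS CLI. The following is a minimal sketch; the description text and the CIDR (substitute your own IP address with a /32 suffix) are placeholders:

# Create the group and capture its ID (a description is required)
NVAWS_SG_ID=$(aws ec2 create-security-group --group-name my-sg \
  --description "SSH access for NGC instances" | jq .GroupId | sed 's/\"//g')
# Allow inbound SSH (port 22) from a single IP address
aws ec2 authorize-security-group-ingress --group-id $NVAWS_SG_ID \
  --protocol tcp --port 22 --cidr 203.0.113.10/32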

1.2. Creating an NGC Certified Virtual Machine using AWS Console

1.2.1. Log In and Select the AWS Region

  1. Log into the AWS Console (https://aws.amazon.com), then under the Compute section, click EC2.
  2. Select the AWS Region from the upper right of the top menu.

    In order to use NVIDIA Volta and Turing GPUs in AWS, you must select a region that has Amazon EC2 P3 or G4 instances available. The examples in this guide use instances in US West (Oregon) - us-west-2. Check with AWS for Amazon EC2 P3 or G4 instance availability in other regions; a CLI check is sketched below.
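Instance availability in a region can also be checked from the AWS CLI; a sketch, assuming a CLI version recent enough to include describe-instance-type-offerings:

# List the Availability Zones in us-west-2 that offer p3.2xlarge instances
aws ec2 describe-instance-type-offerings --location-type availability-zone \
  --filters Name=instance-type,Values=p3.2xlarge --region us-west-2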

1.2.2. Create a VM and Choose an NVIDIA GPU-Optimized AMI

NVIDIA publishes and maintains multiple flavors of a GPU-optimized AMI with all the software needed to pull and run content from NGC. These AMIs should be used to launch your GPU instances.

  1. Click Launch Instance.

  2. Select the NVIDIA Deep Learning AMI.
    1. Select AWS Marketplace in Step 1 of the process.
    2. Search for and select the NVIDIA GPU-optimized AMI that best suits your purpose by typing "nvidia" into the search bar.
    3. Click Continue on the details page.

1.2.3. Select an Instance Type with GPUs and Configure Instance Settings

  1. Select one of the Amazon EC2 P3 or G4 instance types according to your GPU, CPU, and memory requirements.
  2. Click Review and Launch to review the default configuration settings, or continue with the instructions in the next section to configure each setting step-by-step.
  3. After choosing an instance type, click Next: Configure Instance Details.

    There are no instance details that need to be configured, so you can proceed to the next step.

  4. Add storage.

    Click Next: Add Storage.

    While the default 32 GiB for the root volume can be changed, users should not use the root volume for storing datasets since the root volume is destroyed when the instance is terminated.

  5. Add tags.

    Naming your instances helps to keep multiple instances organized.

    1. Click Next: Add Tag.
    2. Click Add Tag and then fill in the following information:

      Key: "Name"

      Value: <instance name, such as "My GPU">

  6. Configure a Security Group
    1. Click Next: Configure Security Group.
    2. Click Select an existing security group and select the Security Group you created in Before You Get Started.

1.2.4. Launching Your VM Instance

  1. Click Review and Launch.

    A window pops up and asks which key pair to use.

  2. Select Choose an existing key pair, select your key pair, then check the acknowledgement checkbox.
  3. Click Launch Instances.

1.2.5. Connect to Your VM Instance

  1. After launching your instance, click View Instances, locate your instance from the list, then wait until it is in the ‘running’ state.
  2. When it is in the running state, select it from the list and then click Connect.
  3. Follow the instructions in the pop-up window to establish an SSH connection to the instance.

    Be sure to use 'ubuntu' for the username.

    If the instructions for SSH login do not work, see the AWS Connect to Your Linux Instance documentation for additional information.

1.2.6. Start/Stop/Terminate Your VM Instance

Once you are done with your instance you can stop (to be started again later) or terminate (delete) it. Refer to the Instance Lifecycle in the AWS documentation for more information.

Instances can be controlled from the Instances page, using the "Actions" > "Instance State" menu to stop, start, or terminate instances.

1.3. Creating an NGC Certified Virtual Machine Through the AWS CLI

If you plan to use AWS CLI, then the CLI must be installed (Windows Users: inside the Windows Subsystem for Linux), updated to the latest version, and configured.

Some of the AWS CLI snippets in these instructions make use of jq, which should be installed on the machine from which you'll run the AWS CLI. You may paste these snippets into your own bash scripts or type them at the command line.
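If jq is not already present, it can typically be installed with the system package manager; for example, on Ubuntu or Debian:

sudo apt-get install -y jq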

1.3.1. Set Up Environment Variables

Set up the following environment variables which can be used in the commands for launching the VM instance:

Security Group

The Security Group ID is used as part of the instance creation process. Once created, the Group ID can be looked up in the AWS Console, or retrieved by name with the following snippet and stored in the $NVAWS_SG_ID environment variable.

NVAWS_SG_NAME='my-sg'
NVAWS_SG_ID=$(aws ec2 describe-security-groups --group-names "$NVAWS_SG_NAME" | jq .SecurityGroups[0].GroupId | sed 's/\"//g') && echo NVAWS_SG_ID=$NVAWS_SG_ID

Image ID

The following snippet looks up the Image ID of the current "NVIDIA Deep Learning AMI" and stores it in the $NVAWS_IMAGE_ID environment variable.

NVAWS_IMAGE_NAME='NVIDIA Deep Learning AMI'
NVAWS_IMAGE_ID=$(aws ec2 describe-images --filters "Name=name,Values=$NVAWS_IMAGE_NAME" | jq .Images[0].ImageId | sed 's/\"//g') && echo NVAWS_IMAGE_ID=$NVAWS_IMAGE_ID

Other Environment Variables

Set up the other environment variables as follows, using your own information:

NVAWS_KEYNAME=my-key-pair
NVAWS_KEYPATH=~/.ssh/
NVAWS_REGION=us-west-2
NVAWS_INSTANCE_TYPE=p3.2xlarge
NVAWS_EBS_GB=32
NVAWS_NAME_TAG='My GPU'

Be sure to set a unique NVAWS_NAME_TAG for each instance you launch.

1.3.2. Launch Your VM Instance

Launch the instance and capture the resulting JSON:
NVAWS_LAUNCH_JSON=$(aws ec2 run-instances --image-id $NVAWS_IMAGE_ID \
  --instance-type $NVAWS_INSTANCE_TYPE \
  --region $NVAWS_REGION \
  --key-name $NVAWS_KEYNAME \
  --security-group-ids $NVAWS_SG_ID \
  --block-device-mappings "[{\"DeviceName\":\"/dev/sda1\",\"Ebs\":{\"VolumeSize\":$NVAWS_EBS_GB}}]" \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$NVAWS_NAME_TAG}]")
NVAWS_INSTANCE_ID=$(echo $NVAWS_LAUNCH_JSON | jq .Instances[0].InstanceId | sed 's/\"//g') && echo NVAWS_INSTANCE_ID=$NVAWS_INSTANCE_ID

The resulting Instance ID is stored in the NVAWS_INSTANCE_ID environment variable.

The launch process can take several minutes once a machine is available, and can be watched in the AWS Console Instances page or with the CLI using:

aws ec2 describe-instance-status --instance-ids $NVAWS_INSTANCE_ID | jq '.InstanceStatuses[0].InstanceState.Name + " " + .InstanceStatuses[0].SystemStatus.Status'
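Alternatively, the CLI's built-in waiter blocks until the instance is ready instead of polling by hand (a sketch; instance-status-ok corresponds to the "running ok" state described below):

# Block until the instance passes both status checks (may take several minutes)
aws ec2 wait instance-status-ok --instance-ids $NVAWS_INSTANCE_ID && \
  echo "instance is running and passed status checks"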

Once the instance is "running initializing", you will be able to get the Public DNS name with:

NVAWS_DNS=$(aws ec2 describe-instances --instance-ids $NVAWS_INSTANCE_ID | jq '.Reservations[0].Instances[0].PublicDnsName' | sed 's/\"//g') && echo NVAWS_DNS=$NVAWS_DNS

1.3.3. Connect To Your VM Instance

SSH should work shortly after the instance reaches "running ok".

If started with CLI snippets and environment variables above, the command to SSH to your instance is:

ssh -i $NVAWS_KEYPATH/$NVAWS_KEYNAME.pem ubuntu@$NVAWS_DNS

Otherwise use your .pem key filename and the Public DNS name from the AWS Console to connect:

ssh -i my-key-pair.pem ubuntu@public-dns-name

If these instructions for SSH login do not work, see the AWS Connect to Your Linux Instance documentation for additional information.

1.3.4. Start/Stop/Terminate Your VM Instance

Once you are done with your instance you can stop (to be started again later) or terminate (delete) it. Refer to the Instance Lifecycle in the AWS documentation for more information.

Stop:

aws ec2 stop-instances --instance-ids $NVAWS_INSTANCE_ID

Start:

aws ec2 start-instances --instance-ids $NVAWS_INSTANCE_ID

Terminate:

aws ec2 terminate-instances --instance-ids $NVAWS_INSTANCE_ID

1.4. Persistent Data Storage for AWS Virtual Machines

You can create Amazon Elastic Block Store (EBS) volumes from the AWS Console. EBS is used for persistent data storage; however, an EBS volume cannot be shared across multiple VMs. To share persistent data storage, you need to use EFS.

These instructions set up a General Purpose SSD volume type. However, you can specify a Provisioned IOPS SSD for higher throughput, or use mdadm to set up software RAID and create a single volume from multiple EBS volumes.

See the Amazon documentation RAID Configuration on Linux for instructions on how to set up software RAID on local disks.

EBS is available in most regions with Amazon EC2 P3 or G4 instances.

1.4.1. Create an EBS

  1. Open the EBS Volumes Console.

    Go to the main AWS console, click EC2, then expand Elastic Block Store from the side menu, if necessary, and click Volumes.

  2. Click Create Volume.
  3. Make selections at the Create Volume page.
    • Select General Purpose SSD (GP2) for the Volume Type.

      If higher throughput is needed, select Provisioned IOPS SSD (IO1).

    • Specify the volume size and Availability Zone.
    • (Optional) Add Tags.
    • Encryption is not needed if you are working with public datasets.
    • Snapshot ID is not needed.
  4. Review the options and then click Create Volume.

1.4.2. Attach an EBS Volume to an EC2 Instance

  1. Once you have created the EBS volume, select the volume and then select Actions->Attach Volume.
  2. Specify your EC2 instance ID as well as a device letter for the device name (for example, sdf), then click Attach.

    This creates a /dev/xvdf (or the device letter that you picked) virtual disk on your EC2 instance.

    You can view the volume by running the lsblk command.
    ~$ lsblk

    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda    202:0    0  128G  0 disk
    └─xvda1 202:1    0  128G  0 part /
    xvdf    202:16   0  250G  0 disk
  3. Create a filesystem on the EBS volume.
    ~# mkfs.ext4 /dev/xvdf 
    
    mke2fs 1.42.13 (17-May-2015) 
    Creating filesystem with 65536000 4k blocks and 16384000 inodes 
    Filesystem UUID: b0e3dee3-bf86-4e69-9488-cf4d4b57b367 
    Superblock backups stored on blocks:
     32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
     4096000, 7962624, 11239424, 20480000, 23887872 
    Allocating group tables: done 
    Writing inode tables: done 
    Creating journal (32768 blocks): done 
    Writing superblocks and filesystem accounting information: done 
  4. Mount the volume to a mount directory.
    ~# mount /dev/xvdf /data

    To mount the volume automatically every time the instance is stopped and restarted, add an entry to /etc/fstab. Refer to the Amazon documentation Making a Volume Available for Use.
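    A sketch of such an entry, reusing the example filesystem UUID printed by mkfs.ext4 above (look up your own with sudo blkid /dev/xvdf; the nofail option keeps the instance bootable if the volume is ever detached):

    # /etc/fstab entry: mount the EBS volume at /data on every boot
    UUID=b0e3dee3-bf86-4e69-9488-cf4d4b57b367  /data  ext4  defaults,nofail  0  2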

1.4.3. Delete an EBS Volume

Be aware that once you delete an EBS, you cannot undelete it.

  1. Open the EBS Volumes Console.

    Go to the main AWS console, click EC2, then expand Elastic Block Store from the side menu, if necessary, and click Volumes.

  2. Select your EBS.
  3. Detach the volume from the EC2 instance.

    Select Actions->Detach Volume, then click Yes, Detach from the confirmation dialog.

  4. Delete the storage volume.

    Select Actions->Delete Volume and then click Yes, Delete from the confirmation dialog.

1.4.4. Add a Dataset to an EBS Volume

Once you have created the EBS volume, you can upload datasets to the volume.

1.4.4.1. Upload a Dataset to the EBS Volume

  1. Mount the EBS volume to /data.

    Issue the following to perform the one-time mount.

    sudo mkdir /data 
    sudo mount /dev/xvdf /data 
    sudo chmod 777 /data 
  2. Copy the dataset onto the EBS volume in /data.
    scp -i <.pem> -r local_dataset_dir/ ubuntu@<ec2-instance>:/data

1.4.4.2. Copy an Existing Dataset from EFS

  1. Mount the EFS storage to /efs, using the EFS storage DNS name.

    Issue the following to perform the one-time mount.

    sudo mkdir /efs
    sudo mount -t nfs4 -o \
      nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
      EFS-DNS-NAME:/ /efs
    sudo chmod 777 /efs

  2. Copy the dataset from the EFS to the EBS volume.
    sudo cp -r /efs/<dataset> /data

1.4.5. Manage an EBS Volume Using AWS CLI

It is recommended that you use the AWS Console for EBS management. If you need to manage EBS file systems with the CLI, NVIDIA has created scripts available on GitHub at https://github.com/nvidia/ngc-examples.

These scripts will let you perform basic EBS management and can serve as the basis for further automation.

2. Release Notes for NVIDIA Virtual Machine Images on AWS

NVIDIA makes available on the Amazon Web Services (AWS) platform a customized Amazon Machine Image (AMI) optimized for the latest generations of NVIDIA GPUs - NVIDIA Volta™ GPUs and NVIDIA Turing GPUs. Running NVIDIA® GPU Cloud containers on AWS instances with NVIDIA Volta or NVIDIA Turing GPUs provides optimum performance of NGC containers for deep learning, machine learning, and HPC workloads.

See the NGC AWS Setup Guide for instructions on setting up and using the AMI, including instructions on using the following features:

  • Automated login to the NGC container registry.

  • Elastic Block Storage (EBS) mounting.

2.1. Version 20.03.1

Image Name

  • NGC AMI: NVIDIA Deep Learning AMI 20.03.1
  • TensorFlow from NVIDIA AMI: NVIDIA Deep Learning tensorflow AMI 20.03.1
  • PyTorch from NVIDIA AMI: NVIDIA Deep Learning pytorch AMI 20.03.1

Contents of the NVIDIA Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  440.64.01
  • Docker Engine:   19.03.6
  • NVIDIA Container Toolkit v1.0.5-1

    Includes new command to run containers: docker run --gpus all <container>

  • TensorFlow container (TensorFlow from NVIDIA image): nvcr.io/nvidia/tensorflow:20.02-tf1-py3, nvcr.io/nvidia/tensorflow:20.02-tf2-py3
  • PyTorch container (PyTorch from NVIDIA image): nvcr.io/nvidia/pytorch:20.02-py3
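As an example, the pre-installed TensorFlow container listed above can be started with the new docker run syntax as follows (a sketch; the interactive flags and any bind mounts will vary with your workload):

# Run the TF2 container interactively, removing it on exit
docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:20.02-tf2-py3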

Key Changes

  • Updated Docker Engine to 19.03.6
  • Updated NVIDIA Driver to 440.64.01

Known Issues

Installing GPU drivers on the VM via a CUDA Install Succeeds Erroneously

Issue

Attempting to install CUDA on the VM will succeed, resulting in a potential conflict with the NVIDIA GPU driver included in the VM image.

Explanation

The configuration file to prevent driver installs is not working. This will be resolved in a later release of the VM image.

2.2. Version 19.11.3

Image Name

  • NGC AMI: NVIDIA Deep Learning AMI 19.11.3
  • TensorFlow from NVIDIA AMI: NVIDIA Deep Learning tensorflow AMI 19.11.3
  • PyTorch from NVIDIA AMI: NVIDIA Deep Learning pytorch AMI 19.11.3

Contents of the NVIDIA Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  440.33.01
  • Docker CE:   19.03.4-ce
  • NVIDIA Container Toolkit v1.0.5-1

    Includes new command to run containers: docker run --gpus all <container>

  • TensorFlow container (TensorFlow from NVIDIA image): nvcr.io/nvidia/tensorflow:19.10-py3
  • PyTorch container (PyTorch from NVIDIA image): nvcr.io/nvidia/pytorch:19.10-py3

Key Changes

  • Updated Docker-CE to 19.03.4
  • Updated NVIDIA Driver to 440.33.01

Known Issues

Installing GPU drivers on the VM via a CUDA Install Succeeds Erroneously

Issue

Attempting to install CUDA on the VM will succeed, resulting in a potential conflict with the NVIDIA GPU driver included in the VM image.

Explanation

The configuration file to prevent driver installs is not working. This will be resolved in a later release of the VM image.

2.3. Version 19.10.2

Image Name

  • NGC AMI: NVIDIA Deep Learning AMI 19.10.2
  • TensorFlow from NVIDIA AMI: NVIDIA Deep Learning tensorflow AMI 19.10.2
  • PyTorch from NVIDIA AMI: NVIDIA Deep Learning pytorch AMI 19.10.2

Contents of the NVIDIA Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  418.87.00
  • Docker CE:   19.03.2-ce
  • NVIDIA Container Toolkit v1.0.5-1
  • TensorFlow container (TensorFlow from NVIDIA image): nvcr.io/nvidia/tensorflow:19.09-py3
  • PyTorch container (PyTorch from NVIDIA image): nvcr.io/nvidia/pytorch:19.09-py3

Key Changes

  • Updated Docker-CE to 19.03.2
  • Replaced the NVIDIA Container Runtime for Docker with the NVIDIA Container Toolkit

    Includes new command to run containers: docker run --gpus all <container>

Known Issues

Installing GPU drivers on the VM via a CUDA Install Succeeds Erroneously

Issue

Attempting to install CUDA on the VM will succeed, resulting in a potential conflict with the NVIDIA GPU driver included in the VM image.

Explanation

The configuration file to prevent driver installs is not working. This will be resolved in a later release of the VM image.

2.4. Version 19.08.0

Image Name

  • NGC AMI: NVIDIA Deep Learning AMI 19.08.0
  • TensorFlow from NVIDIA AMI: NVIDIA Deep Learning tensorflow AMI 19.08.0
  • PyTorch from NVIDIA AMI: NVIDIA Deep Learning pytorch AMI 19.08.0

Contents of the NVIDIA Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  418.87
  • Docker CE:   18.09.8-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.1.0-1
  • TensorFlow container (TensorFlow from NVIDIA image): nvcr.io/nvidia/tensorflow:19.06-py3
  • PyTorch container (PyTorch from NVIDIA image): nvcr.io/nvidia/pytorch:19.06-py3

Key Changes

  • Updated NVIDIA Driver to version 418.87
  • Updated Docker-CE to 18.09.8
  • Updated NVIDIA Container Runtime for Docker to v2.1.0-1

Known Issues

Installing GPU drivers on the VM via a CUDA Install Succeeds Erroneously

Issue

Attempting to install CUDA on the VM will succeed, resulting in a potential conflict with the NVIDIA GPU driver included in the VM image.

Explanation

The configuration file to prevent driver installs is not working. This will be resolved in a later release of the VM image.

2.5. Version 19.07.0

Image Name

  • NGC Image: NVIDIA Deep Learning AMI 19.07.0
  • TensorFlow from NVIDIA Image: NVIDIA Deep Learning tensorflow AMI 19.07.0
  • PyTorch from NVIDIA Image: NVIDIA Deep Learning pytorch AMI 19.07.0

Contents of the NVIDIA Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  418.67
  • Docker CE:   18.09.7-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3
  • TensorFlow container (TensorFlow from NVIDIA image): nvcr.io/nvidia/tensorflow:19.06-py3
  • PyTorch container (PyTorch from NVIDIA image): nvcr.io/nvidia/pytorch:19.06-py3

Key Changes

Known Issues

Installing GPU drivers on the VM via a CUDA Install Succeeds Erroneously

Issue

Attempting to install CUDA on the VM will succeed, resulting in a potential conflict with the NVIDIA GPU driver included in the VM image.

Explanation

The configuration file to prevent driver installs is not working. This will be resolved in a later release of the VM image.

2.6. Version 19.05.1

Image Name

  • NGC Image: NVIDIA Driver 418.67 NVIDIA Deep Learning AMI
  • TensorFlow from NVIDIA Image: NVIDIA Driver 418.67 NVIDIA Deep Learning tensorflow AMI
  • PyTorch from NVIDIA Image: NVIDIA Driver 418.67 NVIDIA Deep Learning pytorch AMI

Contents of the NVIDIA Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  418.67
  • Docker CE:   18.09.4-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3
  • TensorFlow container (TensorFlow from NVIDIA image): nvcr.io/nvidia/tensorflow:19.04-py3
  • PyTorch container (PyTorch from NVIDIA image): nvcr.io/nvidia/pytorch:19.04-py3

Key Changes

19.05.1

19.05.0

  • Initial release of PyTorch from NVIDIA and TensorFlow from NVIDIA images
  • Updated the NVIDIA Driver to 418.67
  • Updated Docker to 18.09.4-ce

Known Issues

Installing GPU drivers on the VM via a CUDA Install Succeeds Erroneously

Issue

Attempting to install CUDA on the VM will succeed, resulting in a potential conflict with the NVIDIA GPU driver included in the VM image.

Explanation

The configuration file to prevent driver installs is not working. This will be resolved in a later release of the VM image.

2.7. Version 19.03.0

Image Name

NVIDIA Volta Deep Learning AMI 19.03.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  418.40.04
  • Docker CE:   18.09.2-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA Driver to 418.40.04
  • Updated Docker to 18.09.2-ce

Known Issues

There are no known issues in this release.

2.8. Version 19.02.0

Image Name

NVIDIA Volta Deep Learning AMI 19.02.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  410.104
  • Docker CE:   18.09.1-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA Driver to 410.104
  • Updated Docker to 18.09.1-ce

Known Issues

There are no known issues in this release.

2.9. Version 19.01.0

Image Name

NVIDIA Volta Deep Learning AMI 19.01.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 18.04 LTS
  • NVIDIA Driver:  410.79
  • Docker CE:   18.06.1
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the Ubuntu Server to 18.04 LTS.

Known Issues

There are no known issues in this release.

2.10. Version 18.11.1

Image Name

NVIDIA Volta Deep Learning AMI 18.11.1

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 16.04 LTS
  • NVIDIA Driver:  410.79
  • Docker CE:   18.06.1
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA driver to 410.79.

Known Issues

There are no known issues in this release.

2.11. Version 18.09.1

Image Name

NVIDIA Volta Deep Learning AMI 18.09.1

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 16.04 LTS
  • NVIDIA Driver:  410.48
  • Docker CE:   18.06.1
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA driver to 410.48.
  • Updated Docker CE to 18.06.1

Known Issues

There are no known issues in this release.

2.12. Version 18.08.0

Image Name

NVIDIA Volta Deep Learning AMI 18.08.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 16.04 LTS
  • NVIDIA Driver:  396.44
  • Docker CE:   18.06-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA driver to 396.44.
  • Updated Docker CE to 18.06

Known Issues

There are no known issues in this release.

2.13. Version 18.07.0

Image Name

NVIDIA Volta Deep Learning AMI 18.07.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 16.04 LTS
  • NVIDIA Driver:  396.37
  • Docker CE:   18.03.1-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA driver to 396.37.

Known Issues

There are no known issues in this release.

2.14. Version 18.06.0

Image Name

NVIDIA Volta Deep Learning AMI 18.06.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 16.04 LTS
  • NVIDIA Driver:  396.26
  • Docker CE:   18.03.1-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA driver to 396.26.

Known Issues

There are no known issues in this release.

2.15. Version 18.05.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu Server: 16.04 LTS
  • NVIDIA Driver:  384.125
  • Docker CE:   18.03.1-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Includes Ubuntu 16.04 security updates
  • Updated Docker CE to version 18.03.1-ce

Known Issues

There are no known issues in this release.

2.16. Version 18.04.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu: 16.04 LTS
  • NVIDIA Driver:  384.125
  • Docker CE:   18.03.0-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Changes

  • Updated the NVIDIA Driver to version 384.125
  • Updated Docker CE to version 18.03.0-ce

Known Issues

There are no known issues in this release.

2.17. Version 18.03.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu: 16.04 LTS
  • NVIDIA Driver:  384.111
  • Docker CE:   17.12.1-ce
  • NVIDIA Container Runtime for Docker: (nvidia-docker2) v2.0.3

Key Features and Enhancements

  • Installs available Ubuntu updates at boot.
  • Updated the NVIDIA Driver to version 384.111
  • Updated Docker CE to version 17.12.1-ce
  • Updated the NVIDIA Container Runtime for Docker (nvidia-docker2) to v2.0.3

Known Issues

There are no known issues in this release.

2.18. Version 18.01.0

Contents of the NVIDIA Volta Deep Learning AMI

NVIDIA is providing updates to help mitigate the Intel CPU security issues and maintain compatibility with recent Linux updates for these security issues.

  • Ubuntu: 16.04 LTS
  • NVIDIA Driver:  384.111
  • Docker CE:   17.12.0-ce
  • NVIDIA Container Runtime for Docker (nvidia-docker v2.0)

For details on the vulnerability, refer to Security Bulletin 4611. To see NVIDIA security bulletins, subscribe to security bulletin notifications, or learn more about NVIDIA's product security management process, go to NVIDIA Product Security.

Key Features and Enhancements

  • Installs available Ubuntu updates at boot.
  • Updated the NVIDIA Driver to version 384.111
  • Updated Docker CE to version 17.12.0-ce
  • Updated to the NVIDIA Container Runtime for Docker (nvidia-docker2 v2.0)

Known Issues

There are no known issues in this release.

2.19. Version 17.10.1

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu: 16.04.3
  • NVIDIA Driver:  384.81
  • Docker CE:   17.09.0-ce
  • Docker Engine Utility for NVIDIA GPUs: 1.0.1

Key Features and Enhancements

  • Installs available Ubuntu updates at boot.

  • Provided a new MNIST example script with correct container image tags.

    • Removed the mnist_tensorflow.sh and mnist_pytorch.sh scripts.
    • Added the mnist_example.sh script.

Known Issues

There are no known issues in this release.

2.20. Version 17.10.0

Contents of the NVIDIA Volta Deep Learning AMI

  • Ubuntu: 16.04.3
  • NVIDIA Driver:  384.81
  • Docker CE:   17.09.0-ce
  • Docker Engine Utility for NVIDIA GPUs: 1.0.1

Key Features and Enhancements

  • Installs available Ubuntu updates at boot.

Known Issues

  • Container Tags in Example Scripts are Incorrect

    • Description

      Two example scripts are provided as part of the AMI to show how to run NGC Deep Learning containers. These scripts are in the Ubuntu home directory, named mnist_pytorch.sh and mnist_tensorflow.sh. These scripts reference container tags 17.09 instead of 17.10.

    • Workaround

      Edit 17.09 to 17.10 in both scripts before running them. This issue will be fixed in the next release of the NVIDIA Volta Deep Learning AMI.

Notices

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA and the NVIDIA logo are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.