NGC on AWS Virtual Machines

This NGC on AWS Virtual Machines documentation explains how to set up an NVIDIA AMI on Amazon EC2 services, and also provides release notes for each version of the NVIDIA image.

1. Using NGC on AWS Virtual Machines

NVIDIA makes three different virtual machine images available on the Amazon Web Services (AWS) platform, known within the AWS ecosystem as Amazon Machine Images (AMIs). These are GPU-optimized AMIs for AWS instances with NVIDIA V100 GPUs (EC2 P3 instances), NVIDIA T4 GPUs (EC2 G4 instances), and NVIDIA A100 GPUs (EC2 P4d instances). Additionally, the NVIDIA AMI also supports ARM64 (EC2 G5g) instances.

For those familiar with the AWS platform, the process of launching the instance is as simple as logging in, selecting the NVIDIA GPU-optimized image of choice, configuring settings as needed, then launching the VM. After launching the VM, you can SSH into it and start building a host of AI applications in deep learning, machine learning and data science by leveraging the wide range of GPU-accelerated containers, pre-trained models and resources available from the NGC Catalog.      

This document provides step-by-step instructions for accomplishing this, including how to use the AWS CLI.

Prerequisites

These instructions assume the following:

  • You have an AWS account - https://aws.amazon.com

  • You have browsed the NGC website and identified an available NGC container and tag to run on the virtual machine instance (VMI).
  • Windows Users: The CLI code snippets are for bash on Linux or Mac OS X. If you are using Windows and want to use the snippets as-is, you can use the Windows Subsystem for Linux and use the bash shell (you will be in Ubuntu Linux).

1.1. Before You Get Started

Perform these preliminary setup tasks to simplify the process of launching the NVIDIA Deep Learning AMI.

1.1.1. Setting Up Your AWS Key Pair

If you do not already have a key pair defined, you will need to set up your AWS key pair and have it on the machine on which you will use the AWS CLI, or from which you will SSH to the instance. In the examples, the key pair is named "my-key-pair".

Once you have downloaded your key pair files, make sure they are readable only by you, and (on Linux or OS X) move them to your ~/.ssh/ directory.

chmod 400 my-key-pair*
mv my-key-pair* ~/.ssh/
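
The key pair can also be created from the AWS CLI instead of the console. The following is a sketch, not the official procedure: by default it only prints the command for review, and runs it against your account only when NVAWS_RUN=1 is set.

```shell
# Sketch: create a key pair from the AWS CLI. `create-key-pair` and the
# `--query KeyMaterial` projection are standard AWS CLI usage; the command
# is printed rather than executed unless NVAWS_RUN=1.
NVAWS_KEYNAME=my-key-pair
cmd="aws ec2 create-key-pair --key-name $NVAWS_KEYNAME --query KeyMaterial --output text"
if [ "${NVAWS_RUN:-0}" = "1" ]; then
  $cmd > ~/.ssh/$NVAWS_KEYNAME.pem
  chmod 400 ~/.ssh/$NVAWS_KEYNAME.pem
else
  echo "$cmd > ~/.ssh/$NVAWS_KEYNAME.pem"
fi
```

Creating the key this way leaves the private key already on the machine from which you will SSH, so no separate download step is needed.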

1.1.2. Set Up Security Groups for the EC2 Instance

Security groups define the network connection restrictions you place on your virtual machine instance. In order to connect to your running instances you will need a Security Group allowing (at minimum) SSH access.

  1. Log into the AWS Console (https://aws.amazon.com), then click EC2 under the Compute section located within the All Services drop-down menu.
  2. Enter the Security Groups screen, located on the left under "Network & Security", "Security Groups".

  3. Click Create Security Group.
  4. Give the Security Group a name (for example, "my-sg") and a description.
  5. Under the "Inbound" section, click Add Rule and create a rule with the following parameters to enable SSH:
    • Type: SSH
    • Protocol: TCP
    • Port Range: 22
    • Source: My IP

    You may need to widen the resulting IP filter if you're not on a fixed IP address, or want to access the instance from multiple locations such as work and home.


  6. (Optional) Add additional rules.

    You may need to add additional rules for HTTP, HTTPS, or other Custom TCP ports depending on the deep learning frameworks you use.

    Continue adding additional rules by clicking Add Rule, then create rules as needed.

    Examples:

    • For DIGITS4

      • Type: Custom TCP Rule
      • Protocol: TCP
      • Port Range: 3448
      • Source: My IP
    • For HTTPS secure web frameworks

      • Type: HTTPS
      • Protocol: TCP
      • Port Range: 443
      • Source: My IP
  7. Click Create Security Group in the bottom right corner to complete creation of the Security Group.

    Once created, the Group ID is listed in the Security Group table.
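
For reference, the same Security Group can be created from the AWS CLI. The sketch below is not the official procedure: the commands are printed for review rather than executed (set NVAWS_RUN=1 to run them), and the CIDR shown is a documentation placeholder for your own "My IP" value.

```shell
# CLI sketch of the console steps above. `create-security-group` and
# `authorize-security-group-ingress` are standard AWS CLI commands;
# 203.0.113.0/32 is a placeholder for your own IP.
NVAWS_SG_NAME='my-sg'
NVAWS_MY_CIDR='203.0.113.0/32'
create="aws ec2 create-security-group --group-name $NVAWS_SG_NAME --description NGC-instance-access"
ssh_rule="aws ec2 authorize-security-group-ingress --group-name $NVAWS_SG_NAME --protocol tcp --port 22 --cidr $NVAWS_MY_CIDR"
for cmd in "$create" "$ssh_rule"; do
  if [ "${NVAWS_RUN:-0}" = "1" ]; then $cmd; else echo "$cmd"; fi
done
```

Additional ingress rules (for example, TCP 443 for HTTPS frameworks) follow the same authorize-security-group-ingress pattern with a different port.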

1.2. Creating an NGC Certified Virtual Machine using AWS Console

1.2.1. Log In and Select the AWS Region

  1. Log into the AWS Console (https://aws.amazon.com), then under the Compute section, click EC2 .
  2. Select the AWS Region from the upper right of the top menu.

    In order to use NVIDIA Volta and Turing GPUs in AWS, you must select a region that has Amazon EC2 P3 or G4 instances available. The examples in this guide use instances in US West (Oregon) - us-west-2. Check with AWS for Amazon EC2 P3 or G4 instance availability in other regions.

1.2.2. Create a VM and Choose an NVIDIA GPU-Optimized AMI

NVIDIA publishes and maintains multiple flavors of a GPU-optimized AMI with all the software needed to pull and run content from NGC. These AMIs should be used to launch your GPU instances.

  1. Click Launch Instance.

  2. Select the NVIDIA Deep Learning AMI.
    1. In Step 1 of the process, select AWS Marketplace.
    2. Search for and select the NVIDIA GPU-optimized AMI that best suits your purpose by typing "nvidia" into the search bar.
    3. Click Continue on the details page.

1.2.3. Select an Instance Type with GPUs and Configure Instance Settings

  1. Select one of the Amazon EC2 P3 or G4 instance types according to your GPU, CPU, and memory requirements.
  2. Click Review and Launch to review the default configuration settings, or continue with the following steps to configure each setting step-by-step.
  3. After choosing an instance type, click Next: Configure Instance Details.

    There are no instance details that need to be configured, so you can proceed to the next step.

  4. Add storage.

    Click Next: Add Storage.

    While the default 32 GiB for the root volume can be changed, users should not use the root volume for storing datasets since the root volume is destroyed when the instance is terminated.

  5. Add tags.

    Naming your instances helps to keep multiple instances organized.

    1. Click Next: Add Tags.
    2. Click Add Tag and then fill in the following information:

      Key: "Name"

      Value: <instance name, such as "My GPU">

  6. Configure a Security Group
    1. Click Next: Configure Security Group.
    2. Click Select an existing security group and select the Security Group you created in Before You Get Started.

1.2.4. Launching Your VM Instance

  1. Click Review and Launch.

    A window pops up and asks which key pair to use.

  2. Select Choose an existing key pair, select your key pair, then check the acknowledgement checkbox.
  3. Click Launch Instances.

1.2.5. Connect to Your VM Instance

  1. After launching your instance, click View Instances, locate your instance from the list, then wait until it is in the ‘running’ state.
  2. When it is in the running state, select it from the list and then click Connect.
  3. Follow the instructions in the pop-up window to establish an SSH connection to the instance.

    Be sure to use 'ubuntu' for the username.

    If the instructions for SSH login do not work, see the AWS Connect to Your Linux Instance documentation for additional information.

1.2.6. Start/Stop/Terminate Your VM Instance

Once you are done with your instance you can stop (to be started again later) or terminate (delete) it. Refer to the Instance Lifecycle in the AWS documentation for more information.

Instances can be controlled from the Instances page, using the "Actions" > "Instance State" menu to stop, start, or terminate instances.

1.3. Creating an NGC Certified Virtual Machine Through the AWS CLI

If you plan to use the AWS CLI, the CLI must be installed (Windows users: inside the Windows Subsystem for Linux), updated to the latest version, and configured.

Some of the AWS CLI snippets in these instructions make use of jq, which should be installed on the machine from which you'll run the AWS CLI. You may paste these snippets into your own bash scripts or type them at the command line.

1.3.1. Set Up Environment Variables

Set up the following environment variables which can be used in the commands for launching the VM instance:

Security Group

The Security Group ID is used as part of the instance creation process. Once created, the Group ID can be looked up in the AWS Console, or retrieved by name with the following snippet and stored in the $NVAWS_SG_ID environment variable.

NVAWS_SG_NAME='my-sg'
NVAWS_SG_ID=$(aws ec2 describe-security-groups --group-names "$NVAWS_SG_NAME" | jq -r .SecurityGroups[0].GroupId) && echo NVAWS_SG_ID=$NVAWS_SG_ID

Image ID

The following snippet looks up the current "NVIDIA Deep Learning AMI" Image ID and stores it in the $NVAWS_IMAGE_ID environment variable.

NVAWS_IMAGE_NAME='NVIDIA Deep Learning AMI'
NVAWS_IMAGE_ID=$(aws ec2 describe-images --filters "Name=name,Values=$NVAWS_IMAGE_NAME" | jq -r .Images[0].ImageId) && echo NVAWS_IMAGE_ID=$NVAWS_IMAGE_ID
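
To see what these lookups extract, here is the same jq pattern run against a minimal illustrative payload. The JSON shape matches describe-images output, but the ImageId value is made up; jq's -r flag emits the raw string, which is why no quote-stripping step is needed.

```shell
# Parse an ImageId out of describe-images-style JSON with jq.
# The payload below is illustrative only; a real call returns many more fields.
sample='{"Images":[{"Name":"NVIDIA Deep Learning AMI","ImageId":"ami-0123456789abcdef0"}]}'
NVAWS_IMAGE_ID=$(echo "$sample" | jq -r '.Images[0].ImageId')
echo "NVAWS_IMAGE_ID=$NVAWS_IMAGE_ID"
```

The same pattern (pipe to jq -r with a path expression) is reused for the Security Group ID, Instance ID, and Public DNS lookups in this guide.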

Other Environment Variables

Set up the other environment variables as follows, using your own information:

NVAWS_KEYNAME=my-key-pair
NVAWS_KEYPATH=~/.ssh/
NVAWS_REGION=us-west-2
NVAWS_INSTANCE_TYPE=p3.2xlarge
NVAWS_EBS_GB=32
NVAWS_NAME_TAG='My GPU'

Be sure to set a unique NVAWS_NAME_TAG for each instance you launch.
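
Before launching, it can help to confirm nothing was skipped. The following is a hypothetical convenience helper, not part of the official instructions: it lists any of the launch variables that are still unset.

```shell
# Optional sanity check (hypothetical helper): report launch variables that
# are still unset before running the run-instances command.
nvaws_check_vars() {
  local missing=0 v
  for v in NVAWS_KEYNAME NVAWS_KEYPATH NVAWS_REGION NVAWS_INSTANCE_TYPE NVAWS_EBS_GB NVAWS_NAME_TAG; do
    if [ -z "${!v}" ]; then
      echo "unset: $v"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all launch variables set"
  fi
  return 0
}
nvaws_check_vars
```

Run it after setting the variables above; any "unset:" line means the launch command in the next section would be missing a required value.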

1.3.2. Launch Your VM Instance

Launch the instance and capture the resulting JSON:
NVAWS_LAUNCH_JSON=$(aws ec2 run-instances --image-id $NVAWS_IMAGE_ID \
  --instance-type $NVAWS_INSTANCE_TYPE \
  --region $NVAWS_REGION \
  --key-name $NVAWS_KEYNAME \
  --security-group-ids $NVAWS_SG_ID \
  --block-device-mappings \
    "[{\"DeviceName\":\"/dev/sda1\",\"Ebs\":{\"VolumeSize\":$NVAWS_EBS_GB}}]" \
  --tag-specifications \
    "ResourceType=instance,Tags=[{Key=Name,Value=$NVAWS_NAME_TAG}]")
NVAWS_INSTANCE_ID=$(echo $NVAWS_LAUNCH_JSON | jq -r .Instances[0].InstanceId) && echo NVAWS_INSTANCE_ID=$NVAWS_INSTANCE_ID

The resulting Instance ID is stored in the NVAWS_INSTANCE_ID environment variable.

The launch process can take several minutes once a machine is available, and can be watched in the AWS Console Instances page or with the CLI using:

aws ec2 describe-instance-status --instance-ids $NVAWS_INSTANCE_ID | jq -r '.InstanceStatuses[0].InstanceState.Name + " " + .InstanceStatuses[0].SystemStatus.Status'
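
If you want to block until the instance is ready rather than re-running the status command by hand, a small polling loop works. The sketch below is a hypothetical convenience, not part of the official instructions; a local stub stands in for the AWS status command so the sketch can run anywhere.

```shell
# Generic poll helper: re-run a command until its output contains a pattern
# or the attempts run out.
poll_until() {
  local pattern=$1 tries=$2
  shift 2
  local i out
  for ((i = 1; i <= tries; i++)); do
    out=$("$@")
    if [[ $out == *"$pattern"* ]]; then
      echo "$out"
      return 0
    fi
    sleep 1
  done
  return 1
}

# Stub that reports "running ok" on its third call, mimicking an instance
# that finishes initializing. For real use, replace the stub with the
# describe-instance-status pipeline above.
stub_status() {
  local f=/tmp/nvaws_poll_count n
  n=$(( $(cat "$f" 2>/dev/null || echo 0) + 1 ))
  echo "$n" > "$f"
  if (( n >= 3 )); then echo "running ok"; else echo "pending initializing"; fi
}

rm -f /tmp/nvaws_poll_count
STATUS=$(poll_until "running ok" 5 stub_status)
echo "$STATUS"   # prints: running ok
```

Alternatively, the AWS CLI ships a built-in waiter that does this for you: aws ec2 wait instance-status-ok --instance-ids $NVAWS_INSTANCE_ID.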

Once the instance is "running initializing", you will be able to get the Public DNS name with:

NVAWS_DNS=$(aws ec2 describe-instances --instance-ids $NVAWS_INSTANCE_ID | jq -r '.Reservations[0].Instances[0].PublicDnsName') && \
  echo NVAWS_DNS=$NVAWS_DNS

1.3.3. Connect To Your VM Instance

SSH should work shortly after the instance reaches "running ok".

If started with CLI snippets and environment variables above, the command to SSH to your instance is:

ssh -i $NVAWS_KEYPATH/$NVAWS_KEYNAME.pem ubuntu@$NVAWS_DNS

Otherwise use your .pem key filename and the Public DNS name from the AWS Console to connect:

ssh -i my-key-pair.pem ubuntu@public-dns-name

If these instructions for SSH login do not work, see the AWS Connect to Your Linux Instance documentation for additional information.
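
If you connect to the instance often, an entry in ~/.ssh/config saves retyping the key path and username. The following is an illustrative sketch; "nvaws" is an arbitrary alias, and the HostName placeholder must be replaced with your instance's Public DNS name.

```shell
# Illustrative ~/.ssh/config entry (not part of the official instructions):
#
#   Host nvaws
#       HostName <public-dns-name>
#       User ubuntu
#       IdentityFile ~/.ssh/my-key-pair.pem
#
# After adding it, connect with: ssh nvaws
```

Note that the Public DNS name changes each time a stopped instance is restarted, so the HostName line must be updated after a stop/start cycle.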

1.3.4. Start/Stop/Terminate Your VM Instance

Once you are done with your instance you can stop (to be started again later) or terminate (delete) it. Refer to the Instance Lifecycle in the AWS documentation for more information.

Stop:

aws ec2 stop-instances --instance-ids $NVAWS_INSTANCE_ID

Start:

aws ec2 start-instances --instance-ids $NVAWS_INSTANCE_ID

Terminate:

aws ec2 terminate-instances --instance-ids $NVAWS_INSTANCE_ID
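
The three lifecycle commands differ only in the verb, so they can be wrapped in one helper. This is a hypothetical convenience, not part of the official instructions; with NVAWS_DRY_RUN=1 (the default here) it prints the command instead of running it, so the sketch is safe to run anywhere.

```shell
# Hypothetical wrapper around the instance lifecycle commands.
nvaws_instance() {
  local action=$1 id=$2
  case "$action" in
    stop|start|terminate) ;;
    *) echo "usage: nvaws_instance {stop|start|terminate} <instance-id>" >&2; return 1 ;;
  esac
  local cmd=(aws ec2 "${action}-instances" --instance-ids "$id")
  if [ "${NVAWS_DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

nvaws_instance stop i-0abc1234   # prints: aws ec2 stop-instances --instance-ids i-0abc1234
```

Set NVAWS_DRY_RUN=0 and pass $NVAWS_INSTANCE_ID to act on a real instance.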

1.4. Persistent Data Storage for AWS Virtual Machines

You can create Elastic Block Store (EBS) volumes from the AWS Console. EBS is used for persistent data storage; however, an EBS volume cannot be shared across multiple VMs. To share persistent data storage, you need to use Amazon Elastic File System (EFS).

The instructions set up a General Purpose SSD volume type. However, you can specify a Provisioned IOPS SSD for higher throughput, or set up software RAID, using mdadm, to create a single volume from multiple EBS volumes.

See the Amazon documentation RAID Configuration on Linux for instructions on how to set up software RAID on local disks.

EBS is available in most regions with Amazon EC2 P3 or G4 instances.

1.4.1. Create an EBS

  1. Open the EBS Volumes Console.

    Go to the main AWS console, click EC2, then expand Elastic Block Store from the side menu, if necessary, and click Volumes.

  2. Click Create Volume.
  3. Make selections at the Create Volume page.
    • Select General Purpose SSD (GP2) for the Volume Type.

      If higher throughput is needed, select Provisioned IOPS SSD (IO1).

    • Specify the volume size and Availability Zone.
    • (Optional) Add Tags.
    • Encryption is not needed if you are working with public datasets.
    • Snapshot ID is not needed.
  4. Review the options and then click Create Volume.

1.4.2. Attach an EBS Volume to an EC2 Instance

  1. Once you have created the EBS volume, select the volume and then select Actions->Attach Volume.
  2. Specify your EC2 instance ID as well as a drive letter for the device name (for example, sdf), then click Attach.

    This creates a /dev/xvdf (or the drive letter that you picked) virtual disk on your EC2 instance.

    You can view the volume by running the lsblk command.
    ~$ lsblk

    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda    202:0    0  128G  0 disk
    └─xvda1 202:1    0  128G  0 part /
    xvdf    202:16   0  250G  0 disk
  3. Create a filesystem on the EBS volume.
    ~# mkfs.ext4 /dev/xvdf 
    
    mke2fs 1.42.13 (17-May-2015) 
    Creating filesystem with 65536000 4k blocks and 16384000 inodes 
    Filesystem UUID: b0e3dee3-bf86-4e69-9488-cf4d4b57b367 
    Superblock backups stored on blocks:
     32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
     4096000, 7962624, 11239424, 20480000, 23887872 
    Allocating group tables: done 
    Writing inode tables: done 
    Creating journal (32768 blocks): done 
    Writing superblocks and filesystem accounting information: done 
  4. Mount the volume to a mount directory.
    ~# mount /dev/xvdf /data

    To mount the volume automatically every time the instance is stopped and restarted, add an entry to /etc/fstab. Refer to the Amazon documentation Making a Volume Available for Use.
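
For reference, an /etc/fstab entry for this volume might look like the following sketch. The UUID shown is the one from the mkfs output above; on your own instance, read the real UUID with sudo blkid /dev/xvdf. The nofail option keeps the instance bootable even if the volume is detached.

```shell
# Illustrative /etc/fstab line (use your volume's own UUID from `blkid`):
#
#   UUID=b0e3dee3-bf86-4e69-9488-cf4d4b57b367  /data  ext4  defaults,nofail  0  2
```

Using the UUID rather than /dev/xvdf makes the entry robust to device-name changes across reboots.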

1.4.3. Delete an EBS Volume

Be aware that once you delete an EBS volume, you cannot recover it.

  1. Open the EBS Volumes Console.

    Go to the main AWS console, click EC2, then expand Elastic Block Store from the side menu, if necessary, and click Volumes.

  2. Select your EBS.
  3. Detach the volume from the EC2 instance.

    Select Actions->Detach Volume, then click Yes, Detach from the confirmation dialog.

  4. Delete the storage volume.

    Select Actions->Delete Volume and then click Yes, Delete from the confirmation dialog.

1.4.4. Add a Dataset to an EBS Volume

Once you have created the EBS volume, you can upload datasets to the volume.

1.4.4.1. Upload a Dataset to the EBS Volume

  1. Mount the EBS volume to /data.

    Issue the following to perform the one-time mount.

    sudo mkdir /data 
    sudo mount /dev/xvdf /data 
    sudo chmod 777 /data 
  2. Copy the dataset onto the EBS volume in /data.
    scp -i <.pem> -r local_dataset_dir/ ubuntu@<ec2-instance>:/data

1.4.4.2. Copy an Existing Dataset from EFS

  1. Mount the EFS storage to /efs, using the EFS storage DNS name.

    Issue the following to perform the one-time mount.

    sudo mkdir /efs
    sudo mount -t nfs4 -o \
      nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
      EFS-DNS-NAME:/ /efs
    sudo chmod 777 /efs
  2. Copy the dataset from the EFS to the EBS volume.
    sudo cp -r /efs/<dataset> /data

1.4.5. Manage an EBS Volume Using AWS CLI

It is recommended that you use the AWS Console for EBS management. If you need to manage EBS file systems with the CLI, NVIDIA has created scripts available on GitHub at https://github.com/nvidia/ngc-examples.

These scripts will let you perform basic EBS management and can serve as the basis for further automation.

2. NVIDIA Virtual Machine Images on AWS

NVIDIA makes available on the Amazon Web Services (AWS) platform a customized Amazon Machine Image (AMI) optimized for the latest generations of NVIDIA GPUs - NVIDIA Volta™ GPUs and NVIDIA Turing GPUs. Running NVIDIA® GPU Cloud containers on AWS instances with NVIDIA Volta or NVIDIA Turing GPUs provides optimum performance of NGC containers for deep learning, machine learning, and HPC workloads.

See the NGC AWS Setup Guide for instructions on setting up and using the AMI, including instructions on using the following features:

  • Automated login to the NGC container registry.

  • Elastic Block Storage (EBS) mounting.

NVIDIA GPU-Optimized AMI

Information

The NVIDIA GPU-Optimized AMI is a virtual machine image for accelerating your Machine Learning, Deep Learning, Data Science and HPC workloads. Using this AMI, you can spin up a GPU-accelerated EC2 VM instance in minutes with a pre-installed Ubuntu OS, GPU driver, Docker and NVIDIA container toolkit.

Moreover, this AMI provides easy access to NVIDIA's NGC Catalog, a hub for GPU-optimized software, for pulling and running performance-tuned, tested, and NVIDIA-certified Docker containers. NGC provides free access to containerized AI, Data Science, and HPC applications, pre-trained models, AI SDKs and other resources to enable data scientists, developers, and researchers to focus on building solutions, gathering insights, and delivering business value.

This GPU-optimized AMI is provided free of charge for developers with an enterprise support option. For more information on enterprise support, please visit NVIDIA AI Enterprise.

Release Notes

Version 22.06.0

  • Ubuntu Server 20.04
  • NVIDIA Driver 515.48.07
  • Docker-ce 20.10.17
  • NVIDIA Container Toolkit 1.10.0-1
  • NVIDIA Container Runtime 3.10.0-1
  • AWS Command Line Interface (CLI)
  • Miniconda 4.13.0
  • JupyterLab 3.4.3 and other Jupyter core packages
  • NGC-CLI 3.0.0
  • Git, Python3-PIP
Key Changes
  • Updated NVIDIA Driver to 515.48.07
  • Updated Docker-ce to 20.10.17
  • Updated Nvidia Container Toolkit to Version 1.10.0-1
  • Updated Nvidia Container Runtime to Version 3.10.0-1
  • Packaged additional tools: Miniconda, JupyterLab, NGC-CLI, Git, Python3-PIP

Version 22.03.0

  • Ubuntu Server 20.04
  • NVIDIA Driver 470.103.01
  • Docker-ce 20.10.12
  • NVIDIA Container Toolkit 1.8.1
  • NVIDIA Container Runtime 3.8.1
  • AWS Command Line Interface (CLI)

NVIDIA Deep Learning AMI (to be Deprecated June 30, 2022)

Information

The NVIDIA Deep Learning AMI is an optimized environment for running the Deep Learning, Data Science, and HPC containers available from NVIDIA's NGC Catalog. The Docker containers available on the NGC Catalog are tuned, tested, and certified by NVIDIA to take full advantage of NVIDIA Ampere, Volta and Turing Tensor Cores, the driving force behind artificial intelligence. Deep Learning, Data Science, and HPC containers from the NGC Catalog require this AMI for the best GPU acceleration on AWS P4D, P3 and G4 instances.

Note: This AMI will be deprecated on June 30, 2022. Please use the "NVIDIA GPU-Optimized AMI" instead.

Release Notes

Version 21.11.0

  • Ubuntu Server 20.04
  • NVIDIA Driver 470.82.01
  • Docker-ce 20.10.10
  • NVIDIA Container Toolkit 1.5.1-1
  • NVIDIA Container Runtime 3.5.0-1
  • AWS CLI

PyTorch from NVIDIA AMI

Information

NVIDIA's GPU-optimized distribution of PyTorch.

PyTorch is a GPU accelerated tensor computational framework with a Python front end. Functionality can be easily extended with common Python libraries such as NumPy, SciPy and Cython.

This image bundles NVIDIA's container for PyTorch into the NGC base image for AWS. The NGC base image is an optimized environment for running the GPU-optimized containers for Deep Learning and HPC available from the NGC container registry. The Docker containers available on the NGC container registry are tuned, tested, and certified by NVIDIA to take full advantage of NVIDIA GPUs.

Release Notes

Version 22.03.0

  • Ubuntu Server 20.04
  • NVIDIA Driver 470.103.01
  • Docker-ce 20.10.12
  • NVIDIA Container Toolkit 1.8.1
  • NVIDIA Container Runtime 3.8.1
  • NVIDIA's GPU-optimized PyTorch container 22.02-py3
Key Changes
  • Updated NVIDIA Driver to 470.103.01
  • Updated Docker Engine to 20.10.12
  • Updated NVIDIA Container Toolkit to 1.8.1
  • Updated NVIDIA Container Runtime to 3.8.1
  • Updated NVIDIA PyTorch container to 22.02-py3

Version 21.11.0

  • Ubuntu Server 20.04
  • NVIDIA Driver Version: 470.82.01
  • Docker Version: 20.10.10
  • NVIDIA Container Toolkit Version: 1.5.1-1
  • NVIDIA Container Runtime Version: 3.5.0-1
  • NVIDIA's GPU-optimized PyTorch container 21.10-py3

TensorFlow from NVIDIA AMI

Information

TensorFlow is an open source software library for numerical computation using data flow graphs. TensorFlow was originally developed by researchers and engineers for the purposes of conducting machine learning and deep neural networks research.

This image bundles NVIDIA's GPU-optimized TensorFlow container along with the base NGC AMI. The NGC AMI is an optimized environment for running the containers available on the NGC container registry. The Docker containers available on the NGC container registry are tuned, tested, and certified by NVIDIA to take full advantage of NVIDIA GPUs, the driving force behind innovations in artificial intelligence.

Release Notes

Version 22.03.0

  • Ubuntu Server 20.04
  • NVIDIA Driver 470.103.01
  • Docker-ce 20.10.12
  • NVIDIA Container Toolkit 1.8.1
  • NVIDIA Container Runtime 3.8.1
  • NVIDIA's distribution of TensorFlow 1 and 2, tags 22.02-tf2-py3 and 22.02-tf1-py3
Key Changes
  • Updated NVIDIA Driver to 470.103.01
  • Updated Docker Engine to 20.10.12
  • Updated NVIDIA Container Toolkit to 1.8.1
  • Updated NVIDIA Container Runtime to 3.8.1
  • Updated NVIDIA TensorFlow containers to 22.02-tf1-py3 and 22.02-tf2-py3

Version 21.11.0

  • Ubuntu Server 20.04
  • NVIDIA Driver Version: 470.82.01
  • Docker-CE Version: 20.10.10
  • NVIDIA Container Toolkit Version: 1.5.1-1
  • NVIDIA Container Runtime: 3.5.0-1
  • NVIDIA's Distribution of TensorFlow 1 and 2: Tags 21.10-tf1-py3 and 21.10-tf2-py3

NVIDIA HPC SDK GPU-Optimized AMI

Information

The NVIDIA HPC SDK is a comprehensive suite of compilers, libraries and tools essential to maximizing developer productivity and the performance and portability of HPC applications.

The NVIDIA HPC SDK C, C++, and Fortran compilers support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC directives, and CUDA. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming. Performance profiling and debugging tools simplify porting and optimization of HPC applications, and containerization tools enable easy deployment on-premises or in the cloud.

Key features of the NVIDIA HPC SDK for Linux include:
  • Support for NVIDIA Ampere Architecture GPUs with FP16, TF32 and FP64 tensor cores
  • NVC++ ISO C++17 compiler with Parallel Algorithms acceleration on GPUs, OpenACC and OpenMP
  • NVFORTRAN ISO Fortran 2003 compiler with array intrinsics acceleration on GPUs, CUDA Fortran, OpenACC and OpenMP
  • NVC ISO C11 compiler with OpenACC and OpenMP
  • NVCC NVIDIA CUDA C++ compiler
  • NVIDIA Math Libraries including cuBLAS, cuSOLVER, cuSPARSE, cuFFT, cuTENSOR and cuRAND
  • Thrust, CUB, and libcu++ GPU-accelerated libraries of C++ parallel algorithms and data structures
  • NCCL, NVSHMEM and Open MPI libraries for fast multi-GPU/multi-node communications
  • NVIDIA Nsight Systems and Nsight Compute interactive performance profilers for HPC applications

Release Notes

Version 22.03.0

  • Ubuntu Server 20.04
  • NVIDIA Driver Version: 470.103.01
  • Docker Version: 20.10.12
  • NVIDIA Container Toolkit Version: 1.8.1-1
  • NVIDIA Container Runtime Version: 3.8.1-1
  • MOFED Version: 5.5-1.0.3.2
  • NVIDIA Peer Memory Version: 1.3
  • NVIDIA HPC SDK Version: 22.3
Key Changes
  • Updated Docker-ce to 20.10.12
  • Updated NVIDIA Container Toolkit to Version 1.8.1-1
  • Updated NVIDIA Container Runtime to Version 3.8.1-1
  • Updated NVIDIA MOFED to Version 5.5-1.0.3.2
  • Updated NVIDIA Peer Memory to Version 1.3
  • Updated NVIDIA HPC SDK Version: 22.3

Version 22.01.0

  • Ubuntu Server 20.04
  • NVIDIA Driver 470.103.01
  • Docker-ce 20.10.11
  • NVIDIA Container Toolkit 1.7.0-1
  • NVIDIA Container Runtime 3.7.0-1
  • MOFED Version: 5.4-1.0.3.0
  • NVIDIA Peer Memory Version: 1.2
  • NVIDIA HPC SDK Version: 22.1

NVIDIA Deep Learning AMI (ARM64)

Information

The NVIDIA Deep Learning AMI (ARM64) is an optimized environment for running the Deep Learning, Data Science, and HPC containers available from NVIDIA's NGC Catalog. The Docker containers available on the NGC Catalog are tuned, tested, and certified by NVIDIA to take full advantage of NVIDIA GPUs with ARM CPU instances. Deep Learning, Data Science, and HPC containers from the NGC Catalog require this AMI for the best GPU acceleration on AWS ARM64 GPU instances.

Release Notes

Version 21.11.0

  • Ubuntu Server 20.04
  • NVIDIA Driver 460.73.01
  • Docker-ce 20.10.6
  • NVIDIA Container Toolkit 1.5.0-1
  • NVIDIA Container Runtime 3.5.0-1

Notices

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA and the NVIDIA logo are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.