This NGC Best Practices Guide provides recommendations to help administrators and users work with NGC. NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing. NGC includes NGC containers, the NGC container registry, the NGC website, and platform software for running the deep learning containers.

1. NVIDIA NGC Cloud Services Best Practices For AWS

The NVIDIA® GPU Cloud™ (NGC) runs on associated cloud providers such as Amazon Web Services (AWS). This section provides some tips and best practices for using NVIDIA NGC Cloud Services.

The following tips and best practices are from NVIDIA and should not be taken as best practices from AWS. It’s best to consult with AWS before implementing any of these best practices. For specific AWS documentation, see the Amazon Web Services web page.

1.1. Users And Authentication

The first step in using NGC is to follow the instructions provided in the NGC Getting Started Guide. Your AWS credentials are tied to a specific region; therefore, if you are going to change regions, be sure you use the correct key for that region. A good practice is to name the key file with the region in the actual name.

Next, spend some time getting to know AWS IAM (Identity and Access Management). At a high level, IAM allows you to securely create, manage, and control user (individual) and group access to your AWS account. It is very flexible and provides a rich set of tools and policies for managing your account.

AWS provides some best practices around IAM that you should read immediately after creating your AWS account. There are some very important points in regard to IAM. The first thing you should be aware of is that when you create your account on AWS, you are essentially creating a root account. If someone gains access to your root credentials, they can do anything they want to your account including locking you out and running up a large bill. Therefore, you should immediately lock away your root account access keys.

After you've secured your root credentials, create an individual IAM user. This is very similar to creating a user on a *nix system. It allows you to create a unique set of security credentials which can be applied to a group of users or to individual users.

You should also assign a user to a group. The groups can have pre-assigned permissions to resources - much like giving permissions to *nix users. This allows you to control access to resources. AWS has some pre-defined groups that you can use. For more information about pre-defined groups on AWS, see Creating Your First IAM Admin User and Group. For IAM best practices, see the AWS Identity And Access Management User Guide.

1.1.1. User Credentials In Other Regions

The credentials that you created are only good for the region where you created them. If you created them in us-east-1 (Virginia), then you can’t use them for the region in Japan. If you want to only use the region where you created your credentials, then no action is needed. However, if you want the option to run in different regions, then you have two choices:
  • Option 1: create credentials in every region where you plan to run, or
  • Option 2: copy your credentials from your initial region to all other regions.

Option 1 isn’t difficult but it can be tedious depending upon how many regions you might use. To keep track of the different keys, you should include the region name in the key name.

Option 2 isn’t too difficult thanks to a quick and simple bash script:
myKEYNAME="bb-key"
myKEYFILE="${HOME}/.ssh/id_rsa.pub"

if [ ! -f "${myKEYFILE}" ]; then
  echo "I can't find that file: ${myKEYFILE}"
  exit 2
fi

myKEY=$(cat "${myKEYFILE}")

for region in $( aws --output text ec2 describe-regions | cut -s -f3 | sort ); do
  echo "importing ${myKEYNAME} into region ${region}"
  aws --region "${region}" ec2 import-key-pair --key-name "${myKEYNAME}" --public-key-material "${myKEY}"
done
In this script, the keyname for your first region is bb-key and is assigned to myKEYNAME. The file that contains the key is located in ~/.ssh/id_rsa.pub. After defining those two variables, you can run the script and it will import that key to all other AWS regions.

Before running GPU enabled instances, it is a good idea to check with AWS on what regions have GPU enabled instances (not all of them currently have them).

1.2. Data Transfer Best Practices

One of the fundamental questions users have around best practices for AWS is uploading and downloading data from AWS. This can be a very complicated question and it’s best to engage with AWS to discuss the various options. For more information about uploading, downloading, and managing objects, see the Amazon Simple Storage Service Console User Guide.

In the meantime, to help you get started, the following sections offer ideas for how to upload data to AWS.

1.2.1. Upload Directly To EC2 Instance

When you first begin to use AWS, you may have some data on your laptop, workstation, or company system that you want to upload to an EC2 instance that is running. This means that you want to directly upload data to the compute instance you started. A quick and easy way to do this is to use scp to copy the data from your local system to the running instance. You’ll need the IP address or name of the instance as well as your AWS key. An example command using scp is the following:
$ cd data
$ scp -i my-key-pair.pem -r * ubuntu@public-dns-name:/home/ubuntu

In this example, the training data is located in a subdirectory called data on your system. You cd into that directory and then recursively upload all the data in it to the EC2 instance that was started with the NVIDIA Volta Deep Learning AMI. You will need your AWS key to upload to the instance. The -r option means recursive, so everything in the data directory, including subdirectories, is copied to the AWS instance.

Finally, you need to specify the user on the instance (ubuntu), the machine name (public-dns-name), and the full path where the data is to be uploaded (/home/ubuntu, which is the default home directory for the ubuntu user).

There are a few key points in using scp. The first is that you need to have the SSH port (port 22) open on your AWS instance and your local system. This is done via security groups.
Note: There are other ways to open and block ports in AWS, however, they are not covered in this guide.
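As a sketch of that security-group step (the group ID and CIDR below are placeholders, not values from this guide), the AWS CLI can open the SSH port with authorize-security-group-ingress:

```shell
# Sketch: open TCP port 22 on a security group so scp/ssh can reach the instance.
# sg-0123456789abcdef0 and the CIDR are placeholders -- substitute your own.
SG_ID="sg-0123456789abcdef0"
MY_CIDR="203.0.113.0/24"

CMD="aws ec2 authorize-security-group-ingress \
  --group-id ${SG_ID} --protocol tcp --port 22 --cidr ${MY_CIDR}"

# Echo the command by default; set RUN_AWS=1 to execute it for real.
if [ -n "${RUN_AWS:-}" ]; then
  eval "${CMD}"
else
  echo "${CMD}"
fi
```

The command is echoed rather than executed by default so you can review it first; where possible, restrict the CIDR to your own address range rather than 0.0.0.0/0.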

The second thing to note is that scp is single-threaded. That is, a single thread on your system is doing the data transfer. This may not be enough to saturate the NIC (Network Interface Card) on your system. In that case, you might want to break up the data into chunks and upload them to the instance. You can upload them serially (one after the other), or you can upload them in groups (in essence, in parallel).
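A minimal sketch of the chunked approach, assuming a hypothetical destination instance (the scp step is skipped unless RUN_SCP=1 is set, so the sketch can be tried locally):

```shell
# Sketch: split a large file into chunks and upload them in parallel.
# DEST and the key file are placeholders; scp only runs when RUN_SCP=1.
DEST="ubuntu@public-dns-name:/home/ubuntu"

mkdir -p chunks
dd if=/dev/urandom of=bigfile.dat bs=1M count=8 2>/dev/null   # small stand-in file
split -b 2M -d bigfile.dat chunks/bigfile.part.               # numbered 2 MB chunks

if [ -n "${RUN_SCP:-}" ]; then
  for part in chunks/bigfile.part.*; do
    scp -i my-key-pair.pem "${part}" "${DEST}" &   # one transfer per chunk
  done
  wait   # block until every background upload finishes
fi
# On the instance: cat bigfile.part.* > bigfile.dat  to reassemble.
```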

There are a couple of options you can use for uploading the data that might help. The first one is using tar to create a tar file of a directory and all subdirectories. You can then upload that tar file to the running AWS EC2 instance.

Another option is to compress the tar file using one of many compression utilities (for example, gzip, bzip2, xz, lzma, or 7zip). There are also parallel versions of compression tools such as pigz (parallel gzip), plzip (parallel lzip, using lzlib), pbzip2 (parallel bzip2), pxz (parallel xz), or lrzip (parallel lzma utility).
Note: You can use your favorite compression tool in combination with tar via the following option:
$ tar --use-compress-program=… -cf file.tar <directory>
The combination allows you to specify the path to the compression utility you want to use with the --use-compress-program option.
After tarring and compressing the data, upload the file to the instance using scp. Then, ssh into the instance and uncompress and untar the file before running your framework.
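The whole round trip can be sketched as follows; the upload host and key file are placeholders, and pigz is used when available, with gzip as a fallback:

```shell
# Sketch: tar up a data directory with compression, then (optionally) scp it over.
# Uses pigz when available, falling back to plain gzip.
COMPRESS=$(command -v pigz || command -v gzip)

mkdir -p data/sub
echo "sample training file" > data/sub/train.txt

tar --use-compress-program="${COMPRESS}" -cf data.tar.gz data

# Hypothetical upload step (host and key are placeholders):
#   scp -i my-key-pair.pem data.tar.gz ubuntu@public-dns-name:/home/ubuntu
# Then on the instance:
#   tar --use-compress-program="${COMPRESS}" -xf data.tar.gz

# Local round trip to verify the archive:
mkdir -p restore
tar --use-compress-program="${COMPRESS}" -xf data.tar.gz -C restore
```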
Note: Compressing or creating a tar file does not encrypt the data. Encryption is not covered in this guide; however, scp encrypts the data during the transfer unless you have specifically told it not to.

Another utility that might increase the upload speed is bbcp. It is a point-to-point network file copy application that can use multiple threads to increase the upload speed.

As explained, there are many options for uploading data directly to an AWS EC2 instance. There are also some things working against you to reduce the upload speed. One big impediment to improving upload speeds is your connection to the Internet and the network between you and the AWS instance.

If you have a 100Mbps connection to the Internet or are connecting from home using a cable or phone modem, then your upload speeds might be limited compared to a 1 Gbps connection (or faster). The best advice is to test data transfer speeds using a variety of file sizes and number of files. You don’t have to do an exhaustive search but running some tests should help you get a feel for data upload speeds.

Another aspect you have to consider is the packet size on your network. The network inside your company or home may be using jumbo frames, which set the MTU to 9,000 bytes. This is great for closed networks because the frame size can be controlled so that you get jumbo frames from one system to the next. However, as soon as those packets hit the Internet, they drop to the normal MTU of 1,500 bytes. This means you have to send many more packets to upload the data, which causes more CPU usage on both sides of the transfer.

Jumbo frames also reduce the percentage of the packet that is devoted to overhead (not data). Jumbo frames are therefore more efficient when sending data from system to system. But as soon as the data hits the Internet, the percentage devoted to overhead increases and you end up having to send more packets to transfer the data.

1.2.2. Upload Data To S3

Another option is to upload the data to an AWS S3 bucket. S3 is an object store that basically has unlimited capacity. It is a very resilient and durable storage system so it’s not necessary to store your data in multiple locations. However, S3 is not POSIX compliant so you can’t use applications that read and write directly to S3 without rewriting the IO portions of your code.

S3 is a solution for storing your input and output data for your applications because it’s so reliable and durable. To use the data, you copy it from S3 to the instances you are using and copy data from the instance to S3 for longer-term storage. This allows you to shut down your instances and only pay for the data stored in S3.

Fundamentally, S3 is an object store (not POSIX compliant), that can scale to extremely large sizes and is very durable and resilient. S3 does not understand the concept of directories or folders, meaning the storage is flat. However, you still use folders and directories to create a hierarchy. These directories just become part of the name of the object in S3. Applications that understand how to read the object names can present you a view of the objects that includes directories or folders.

There are a multiple ways to copy data into S3 before you start up your instances. AWS makes a set of CLI (Command Line Interface) tools available that can do data transfer for you. The basic command is simple. Here is an example:
$ aws s3 cp <local-file> s3://mybucket/<location>

This command copies a local file on your laptop or server to your S3 bucket. In the command, s3://mybucket/<location> is the location in your S3 bucket. This command doesn’t use any directories or folders, instead, it puts everything into the root of your S3 bucket.

A slightly more complex command might look like the following:
$ aws s3 cp /work/TEST s3://data-compression/example2 --recursive --include "*"

This copies an entire directory on your host system (such as your laptop) to a directory on S3 with the name data-compression/example2. It copies the entire contents of the local directory because of the --recursive flag and the --include "*" option. The command will create subdirectories on S3 as needed. Remember, subdirectories don’t really exist; they are part of the object name on S3.

S3 has the concept of a multi-part upload. This was designed for uploading large files to S3 so that if a network error is encountered you don’t have to start the upload all over again. It breaks the object into parts and uploads these parts, sometimes in parallel, to S3 and re-assembles them into the object once all of the uploads are done.

Each part is a contiguous portion of the object’s data. If you want to do multi-part upload manually, then you can control how the object parts are uploaded to S3. They can be in any order and can even be done in parallel. After all of the parts are uploaded, you then have to assemble them into the final object. The general rule of thumb is that when the object is greater than 100MB, using multi-part upload is a good idea. For objects larger than 5GB, multi-part upload is mandatory.
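The mechanics can be illustrated locally; the aws s3api calls that would perform the actual upload are shown as comments with a placeholder bucket name:

```shell
# Sketch: how multi-part upload breaks an object into contiguous parts.
# The parts are created locally here; the aws s3api calls that would
# upload them (placeholder bucket/key) are shown as comments.
dd if=/dev/urandom of=object.bin bs=1M count=12 2>/dev/null
split -b 5M -d object.bin part.     # contiguous 5 MB parts: part.00, part.01, ...

# 1. Start the upload and note the UploadId in the response:
#      aws s3api create-multipart-upload --bucket mybucket --key object.bin
# 2. Upload each part (any order, possibly in parallel):
#      aws s3api upload-part --bucket mybucket --key object.bin \
#        --part-number 1 --body part.00 --upload-id <UploadId>
# 3. Assemble the parts into the final object:
#      aws s3api complete-multipart-upload --bucket mybucket --key object.bin \
#        --upload-id <UploadId> --multipart-upload file://parts.json

ls part.*                            # the local stand-in for the uploaded parts
```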

While multi-part upload was designed to upload large files, it also helps improve your throughput since you can upload the parts in parallel. One of the nice features of the AWS CLI tools is that all aws s3 cp commands use multi-part automatically. This includes aws s3 mv and aws s3 sync. You don’t have to do anything manually. Consequently, any uploads using this tool can be very fast.

Another option is to use open-source tools for uploading data to S3 in parallel. The concept is not to use multi-part upload but to upload objects in parallel to improve performance. One tool that is worth examining is s3-parallel-put.

You can also use tar and a compression tool to collect many objects and compress them before uploading to S3. This can result in fast performance because the number of files has been reduced and the amount of data to be transferred is reduced. However, S3 isn’t a POSIX compliant file system so you cannot uncompress nor untar the data within S3 itself. You would need to copy the data to a POSIX file system first and then perform the actions. Alternatively, you could use AWS Lambda to perform these operations, but that is outside the scope of this document.

For a video tutorial about S3, see AWS S3 Tutorial For Beginners - Amazon Simple Storage Service.

S3 Data Upload Examples

To understand how the various upload options impact performance, let’s look at three examples. All three test uploading data from an EC2 instance to an S3 bucket. A d2.8xlarge instance is used because it has a large amount of memory (244GB). The instance has a 10GbE connection along with 36 vCPUs (18 physical cores with Hyper-Threading).

All data is created using /dev/urandom. Each example has a varying number of files and file sizes.

Example 1: Testing s3-parallel-put For Uploading

This example is fairly simple. It follows an astronomy pattern for the sake of discussion. It has two file sizes, 500MB and 5GB. For every three 500MB files, there are two 5GB files. All of the files were created in a single directory with a total 50 files consuming 115GB. In total there are 20x 5GB files and 30x 500 MB files.

This test used the s3-parallel-put tool to upload all of the files. The wall time was recorded when the process started and when it ended giving an elapsed time for the upload. The number of simultaneous uploads was varied from 1 to 32 which indicated how many files were being uploaded at the same time. The data was then normalized by the run time for uploading one file at a time.

The results are presented in the chart below along with the theoretical speedup (perfect scaling).
Figure 1. Using the s3-parallel-put tool to upload

Notice that the scaling is fairly good up to about 8 processes. After that, the results from using the tool fall below the theoretical curve, and from 24 to 32 processes there is basically no further improvement in upload time.

Example 2: Testing s3-parallel-put, AWS CLI, And Tar For Uploading

This example uploads a large number of smaller files and uploads them from the instance to an S3 bucket. For this test, the following file distribution was used:
  • 500,000 2KB files
  • 9,500,000 100KB files
All of the files were evenly split across 10 directories.
The tests uploaded the files individually, but creating a compressed tar file and uploading it to S3 was also tested. The specific tests were:
  1. Upload using s3-parallel-put
  2. Upload using AWS CLI tools
  3. Tar all of the files first, then use AWS CLI tools to upload the tar file (no data compression)
The tests were run with the wall clock time recorded at the start and at the end. The results are shown below.
Note: The y-axis has been normalized to an arbitrary value but the larger the value, the longer it takes to upload the data.
Figure 2. Comparing the s3-parallel-put tool, AWS CLI, and tar to upload
From the chart you can see that the AWS CLI tool is about 3x faster than s3-parallel-put. However, the fastest upload time is when all of the files were first tarred together and then uploaded. That is about 33% faster than not tarring the files.
Note: The actual upload of the tar file alone takes about ¼ of the time needed to upload all of the files individually.
Remember that instead of having individual files in S3 (individual objects), you have one large object which is a tar file.

Example 3: Testing The AWS CLI For Uploading

This example goes back to the first example, increases the number of files in the same proportion, and adds a very small file (less than 1.9KB). There are 40x 5GB files and 60x 500MB files for a total of 100 files. Two files were added to the data set to force the uploads to contend with one very small file (1.9KB) and one large file (50GB). This is a grand total of 102 files.

The AWS CLI tools were tested. While using the CLI tool, a few combinations of using the tool along with tar and various compression tools were also used.
  1. Tar the files into a single .tar file, upload with CLI
  2. Tar the files with inline compression (gzip), upload with CLI
  3. Tar the files into a single .tar file, compress it as a separate step, upload with CLI
  4. Tar the files with inline parallel compression (pigz), upload with CLI
  5. Tar the files into a single .tar file, parallel compress it as a separate step, upload with CLI
The time to complete the tar and to complete the data compression are included in the overall time.
Note: The y-axis has been normalized to an arbitrary value but the larger the value, the longer it takes to upload the data.
Figure 3. Using the AWS CLI tool to upload
From the testing results, the following observations were made:
  • The CLI tool alone is the fastest
  • Using serial compression tools such as gzip or bzip2 with tar greatly increases the total upload time (fourth bar from the right).
  • Tarring all of the files together while using pigz (parallel gzip), results in the second fastest upload time (second bar from the right). Just remember that the files are now in one large, compressed file on S3.
  • Using separate tasks for tar and then compression slows down the overall upload time
  • pigz appears to be about 6 times faster than gzip on this EC2 instance

1.2.3. S3 Object Keys

Since S3 does not understand the concept of directories or folders, the full path becomes part of the object name. In essence, S3 acts like a key-value store in that each key points to its associated object (S3 is more sophisticated than a key-value store, but at a fundamental level it behaves like one). An object key therefore contains the entire path: several directory (folder) names followed by the actual file name, such as jtables.js.

Keys are unique within a bucket. They do not contain meta-data beyond the key name; meta-data is kept in a separate object that is associated with the object. While patterns in keys will not necessarily improve upload performance, they can improve subsequent read/write performance.

An S3 bucket begins with a single index to all objects. Depending upon what your data looks like, this can become a performance bottleneck, because all queries go through the partition associated with the index regardless of the key name. Ideally, you want to spread your objects across multiple partitions (multiple indices) to improve performance. This means that more storage servers are used, which provides more aggregate CPU resources, memory, and network bandwidth.

The partitions are based on the object key (plus the bucket key and a version number that might be associated with the object). If the first few characters of the object key are all the same, then S3 will assign the objects to the same partition, resulting in only one server servicing any data requests.

S3 tries to spread the object keys across multiple partitions as best it can, satisfying a number of constraints while trying to increase the number of partitions. However, while you are uploading data, S3 cannot create partitions on the fly, so you are likely to be using one partition. You can contact AWS prior to your data upload to discuss “pre-warming” partitions for a bucket. Over time, as you use the data, more partitions are added as needed. The exact rules that determine how and when partitions are created (or not) are proprietary to AWS. But it is guaranteed that if the object keys are fairly similar, particularly in the first few characters, they will all be on a single partition. The best way around this is to add some randomness to the first few characters of an object key.

S3 Object Key Example

To better understand how you might introduce some randomness into key names, let’s look at a simple example. Below is a table of objects in a bucket. It includes the bucket key and the object key for each object. When storing data, a common pattern is to use the date as the first part of the name. This carries over to the object keys as shown below.
Figure 4. Objects in a bucket

Notice that the object key is the same for each file for the first 5 characters since the “year” was used first. There is not much variation in the year especially if you are working with recent data within the last year or two.

One option to improve randomness for the first few characters is to reverse the date on the object keys.
Figure 5. Objects in a bucket - reverse date

This results in more randomness in the object keys. There are now 31 options for the first two characters (01 to 31). This gives S3 more leeway in creating partitions for the data.

One problem you might have with introducing more randomness into the object key is you might have to change your applications to read the date backwards and then convert it. But, if you are doing a great deal of data processing, it might be worth the code change to get improved S3 performance.

Another way to introduce randomness in the object key is to make the first four characters of each object key the first four characters of the md5sum of the object, as shown below.
Figure 6. Objects in a bucket - reverse first four characters

This introduces a great deal of randomness in the first four characters while still allowing you to keep the classic format for the date. But again, you may have to modify your application to drop the first 5 characters from the file name.
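Both transformations can be sketched in shell; the file name and date below are hypothetical examples:

```shell
# Sketch: two ways to add randomness to the leading characters of an S3
# object key. The file name and date here are hypothetical examples.
printf 'sample data' > results.dat

# Reverse the date so the (variable) day comes first: 2017-07-14 -> 14-07-2017
datekey=$(echo "2017-07-14" | awk -F- '{print $3"-"$2"-"$1}')

# Or prefix the key with the first four characters of the object's md5sum:
md5prefix=$(md5sum results.dat | cut -c1-4)

key1="${datekey}-results.dat"
key2="${md5prefix}-results.dat"
echo "${key1}"
echo "${key2}"
```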

1.3. Storage

AWS has many storage options, including network storage, object storage, network block storage, and even local storage in the instance, which is referred to as ephemeral storage. To plan your use of storage within AWS, it’s best to discuss the options with your AWS contact. The following sections discuss your storage options for artificial intelligence, deep learning, and machine learning.

When you use the NVIDIA Volta Deep Learning AMI, by default you have a single AWS EBS (Elastic Block Storage) volume that is formatted with a file system and mounted as / on the instance. The EBS volume is a general purpose (gp2) type. While not entirely accurate, you can think of an EBS volume as an iSCSI volume that you might mount on a system from a SAN. However, EBS volumes have some features that you might not get from a SAN, such as very high durability and availability, encryption (at rest and in transit), snapshots, and elastic scalability.

Note about your options:
  • Using encryption at rest and in transit will impact throughput. It’s up to you to make the decision between performance and security.
  • You can resize EBS volumes on the fly. However, this doesn’t resize the file system that is using the volume. Therefore, you will need to know how to grow the file system on the EBS volume:
    • The current size limit on EBS volumes is 16TB. For anything greater than that, you either need to use EFS or use a second volume with a RAID level.
  • Snapshots are a great way to save current data for the future.
  • There are performance limits on an EBS volume, including throughput and IOPS. Remember that in general, deep learning IO is fairly IOPS-heavy (read IOPS in particular).
  • Some instances are EBS optimized, meaning they have a much better connection to EBS volumes for improved performance.
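As a sketch of growing the file system after a volume resize (demonstrated on a file-backed image so it can run without root; on a real instance you would run resize2fs against the EBS device):

```shell
# Sketch: growing an ext4 file system after its volume grows. Demonstrated
# on a file-backed image; on an instance you would run resize2fs against
# the EBS device (for example, /dev/xvdf) after resizing the volume.
PATH="${PATH}:/sbin:/usr/sbin"      # e2fsprogs tools often live in sbin

truncate -s 64M ebs.img
mkfs.ext4 -F -q ebs.img
before=$(dumpe2fs -h ebs.img 2>/dev/null | awk '/^Block count:/ {print $3}')

truncate -s 128M ebs.img            # stand-in for "resize the EBS volume"
e2fsck -f -p ebs.img >/dev/null 2>&1
resize2fs ebs.img >/dev/null 2>&1   # grow the file system to fill the volume
after=$(dumpe2fs -h ebs.img 2>/dev/null | awk '/^Block count:/ {print $3}')

echo "blocks before: ${before}, after: ${after}"
```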

Before training your model, you may have to upload your data from a local host to the running instance. Before uploading, ensure you have enough EBS capacity to store everything. You might estimate the size on your local host first before starting the instance, then increase the EBS volume size so that it’s larger than the data set. Another option is to upload your data to Amazon’s S3 object storage. Your applications won’t be able to read or write directly to S3, but you can copy the data from S3 to your NVIDIA Volta Deep Learning instance, train the model, and upload any results back to S3. This keeps all of the data within AWS, and if the instance and your S3 bucket are in the same region, data transfer throughput benefits. You can upload your data file by file to S3, or you can create a tar file or compressed tar file and upload everything to S3. Your S3 data can also be encrypted.

1.3.1. Network Storage

In the previous section, the simple option of using a single EBS volume with the NVIDIA Volta Deep Learning AMI was discussed. This section presents other options, such as using EFS or using multiple EBS volumes in a RAID group.

Elastic File System (EFS)

The AWS Elastic File System (EFS) can be thought of as “NFS as a service”. It allows you to create an NFS service with very high durability and availability that you can mount on instances in a specific region across multiple AZs (in other words, EFS is a regionally based service). The amount of storage space in EFS is elastic, so it can grow to petabyte scale. It uses NFSv4.1 (NFSv3 is not supported) to improve security. As you add data to EFS, its performance increases. It also allows you to encrypt the data in the file system for more security.

Perhaps the best feature of EFS is that it is fully managed. You don’t have to create an instance to act as an NFS server and allocate storage to attach to that server. Instead, you create an EFS file system and a mount point for the clients. As you add data, EFS automatically increases the storage as needed; in other words, you don’t have to add storage and extend the file system.

For a brand new EFS file system, the performance is likely to be fairly low. The AWS documentation indicates that for every TB of data, you get about 50 MB/s of guaranteed throughput. For NGC, EFS is a great AWS product for easily creating a very durable NFS storage system but the performance may be low until the file system contains a large amount of data (multiple TB’s).
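Mounting an EFS file system from an instance can be sketched as follows; the file system DNS name is a placeholder, and the mount options follow AWS's commonly recommended NFSv4.1 settings (the command is echoed rather than executed unless RUN_MOUNT=1 is set):

```shell
# Sketch: mount an EFS file system over NFSv4.1. The file system ID and
# region in the DNS name are placeholders; runs only when RUN_MOUNT=1.
EFS_DNS="fs-12345678.efs.us-east-1.amazonaws.com"
OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"

CMD="sudo mount -t nfs4 -o ${OPTS} ${EFS_DNS}:/ /mnt/efs"
if [ -n "${RUN_MOUNT:-}" ]; then
  sudo mkdir -p /mnt/efs
  eval "${CMD}"
else
  echo "${CMD}"
fi
```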

One important thing to remember is that your throughput performance will be governed by the network of your instance type. If your instance type has a 10 Gbps network, then that will govern your NFS performance.

EBS Volumes In RAID-0

As mentioned previously, the NVIDIA Volta Deep Learning AMI comes with a single EBS volume. Currently, EBS volumes are limited to 16TB. To get more than 16TB, you will have to take two or more EBS volumes and combine them with Linux Software RAID (mdadm).

Linux Software RAID (mdadm) allows you to create all kinds of RAID levels. EBS volumes are already durable and available, which means that RAID levels that provide resiliency in the event of a block device failure, such as RAID-5, are not necessary. Therefore, it’s recommended to use RAID-0.

You can combine a fairly large number of EBS volumes into a RAID group. This allows you to create, for example, a 160TB RAID group for an instance from ten 16TB volumes. However, this should be done for capacity reasons only. Adding EBS volumes doesn’t improve the IO performance of single-threaded applications, and single-threaded IO is very common in deep learning applications.
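A minimal sketch of creating such a RAID-0 group with mdadm (device names are placeholders; the commands are echoed rather than executed unless RUN_MDADM=1 is set):

```shell
# Sketch: combine two EBS volumes into a RAID-0 group with mdadm.
# Device names are placeholders; runs only when RUN_MDADM=1 is set.
DEVICES="/dev/xvdf /dev/xvdg"

CMD="sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 ${DEVICES}"
if [ -n "${RUN_MDADM:-}" ]; then
  eval "${CMD}"
  sudo mkfs.ext4 /dev/md0          # put a file system on the RAID group
  sudo mkdir -p /data
  sudo mount /dev/md0 /data        # and mount it
else
  echo "${CMD}"
fi
```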





NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.


NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, DGX Station, GRID, Jetson, Kepler, NVIDIA GPU Cloud, Maxwell, NCCL, NVLink, Pascal, Tegra, TensorRT, Tesla and Volta are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.