Abstract

This Best Practices section provides recommendations to help administrators and users work with Docker, extend frameworks, and administer and manage DGX products. Learn more about what these best practices sections cover.

1. About Best Practices Documentation

The best practices sections provide recommendations to help administrators and users work with Docker®, extend frameworks, and administer and manage the DGX-1™, DGX Station™, and NVIDIA® GPU Cloud™ (NGC) products. Although this entire guide provides best practices and, whenever possible, the reasons behind those recommendations, the most effective recommendations are labeled as such:
Important:

The best practices sections do not provide step-by-step instructions. For additional procedural instruction, see the Preparing To Use NVIDIA Containers Getting Started Guide and the NVIDIA Containers for Deep Learning Frameworks User Guide.

2. Introduction

The DGX-1, DGX Station, and the NVIDIA NGC Cloud Services are designed to run containers. Containers hold the application as well as any libraries or code that are needed to run the application. Containers are portable within an operating system family. For example, you can create a container using Red Hat Enterprise Linux and run it on an Ubuntu system, or vice versa. The only common thread required between the two operating systems is that they each need to have the container software so they can run containers.

Using containers allows you to create the software on whatever OS you are comfortable with and then run the application wherever you want. It also allows you to share the application with other users without having to rebuild the application on the OS they are using.

Containers differ from virtual machines (VMs), such as those run with VMware. A VM has a complete operating system and possibly applications and data files. Containers do not contain a complete operating system. They contain only the software needed to run the application. The container relies on the host OS for things such as file system services, networking, and the OS kernel. The application in the container runs the same way wherever the container is run, regardless of the host OS or compute environment.
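As a minimal sketch of the kernel sharing described above, the following Python snippet reports the kernel release and the distribution name visible to the current process. It assumes a Linux system that provides the conventional /etc/os-release file; the function names are illustrative and not part of any NVIDIA or Docker tooling.

```python
import platform
from pathlib import Path


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def describe_environment(os_release_path="/etc/os-release"):
    """Return the kernel release and distribution name seen by this process."""
    path = Path(os_release_path)
    distro = parse_os_release(path.read_text()) if path.exists() else {}
    return {
        # The kernel is shared with the host, even inside a container.
        "kernel": platform.release(),
        # The distribution name comes from the file system the process sees.
        "distro": distro.get("PRETTY_NAME", "unknown"),
    }


if __name__ == "__main__":
    print(describe_environment())
```

Run once on the host and once inside a container built from a different distribution: the kernel value should match in both cases, while the distribution name reflects the container's own file system.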

All three products, the DGX-1, the DGX Station, and the NVIDIA NGC Cloud Services, use Docker. Docker is one of the most popular container services available and is very commonly used by developers in the Artificial Intelligence (AI) space. There is a public Docker repository that holds pre-built Docker containers. These containers can be a simple base OS such as CentOS, or they may be a complete application such as TensorFlow™. You can use these Docker containers to run the applications that they contain. You can also use them as the basis for creating other containers, for example by extending an existing container.

To enable portability in Docker images that leverage GPUs, NVIDIA developed the NVIDIA Container Runtime for Docker, also known as nvidia-docker. We will refer to the NVIDIA Container Runtime for Docker simply as nvidia-docker for the remainder of these sections.

With the three products, the DGX-1, DGX Station, and the NVIDIA NGC Cloud Services, NVIDIA provides access to Docker containers that have been specially built, tuned, and optimized for NVIDIA GPUs. This is done through NVIDIA’s private Docker repository, nvcr.io. Some of these containers are for deep learning frameworks and some contain the building blocks of GPU applications. They are there for your use, but are only licensed for use on these three systems: the DGX-1, DGX Station, and the NVIDIA NGC Cloud Services. You are not restricted to using only the nvidia-docker containers; you can use public Docker containers or other Docker containers on these systems as well.
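To illustrate how a container from nvcr.io is typically named and launched, here is a short Python sketch that assembles the image reference and the corresponding command lines without executing them. The repository and tag values are hypothetical placeholders, not real image names; the helper functions are illustrative only. The sketch assumes the standard Docker reference form of registry/repository:tag and the `nvidia-docker run` wrapper command.

```python
import shlex


def image_reference(repository, tag, registry="nvcr.io"):
    """Join registry, repository, and tag into a Docker image reference."""
    return f"{registry}/{repository}:{tag}"


def pull_command(image):
    """Build the command that downloads the image from its registry."""
    return ["docker", "pull", image]


def run_command(image, command=None, launcher="nvidia-docker"):
    """Build an interactive run command; --rm removes the container on exit."""
    args = [launcher, "run", "--rm", "-it", image]
    if command:
        # Optional command to execute inside the container.
        args.extend(command)
    return args


if __name__ == "__main__":
    # Hypothetical placeholders: substitute a real repository and tag.
    ref = image_reference("example-namespace/example-framework", "example-tag")
    print(shlex.join(pull_command(ref)))
    print(shlex.join(run_command(ref)))
```

Building the commands as argument lists keeps each argument separate, so the same lists can later be passed to a process launcher without shell quoting problems. A public container can be handled the same way by changing the registry and launcher arguments.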

Containers are not difficult to use. There are just a few basic commands. It’s also not difficult to build a container, particularly if you are starting with an existing container and building upon it. If you are new to containers, especially Docker containers, the next section provides some best practices around Docker and its commands.

Notices

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, DGX Station, GRID, Jetson, Kepler, NVIDIA GPU Cloud, Maxwell, NCCL, NVLink, Pascal, Tegra, TensorRT, Tesla and Volta are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.