This NVIDIA Docker Containers for Deep Learning Frameworks User Guide provides a detailed overview into using, customizing, and extending containers and frameworks.

1. NVIDIA Docker Containers

Over the last few years there has been a dramatic rise in the use of software containers for simplifying deployment of data center applications at scale. Containers encapsulate an application along with its libraries and other dependencies to provide reproducible and reliable execution of applications and services without the overhead of a full virtual machine.

NVIDIA® Docker enables GPU-based applications that are portable across multiple machines, in a similar way to how Docker enables CPU-based applications to be deployed across multiple machines. It accomplishes this through the use of Docker containers.

Docker container
A Docker container is an instance of a Docker image. A Docker container deploys a single application or service per container.
Docker image
A Docker image is simply the software (including the filesystem and parameters) that you run within a Docker container.

1.1. What is a Docker Container?

A Docker container is a mechanism for bundling a Linux application with all of its libraries, data files, and environment variables so that the execution environment is always the same, on whatever Linux system it runs and between instances on the same host.

Unlike a VM, which has its own isolated kernel, containers use the host system kernel. Therefore, all kernel calls from the container are handled by the host system kernel. DGX-1 uses Docker containers as the mechanism for deploying deep learning frameworks.

A Docker container is the running instance of a Docker image.

1.2. Why Use a Container?

The key benefits of using containers include:

  • Install your application, dependencies and environment variables one time into the container image; rather than on each system you run on.
  • There is no risk of conflict with libraries that are installed by others.
  • Containers allow use of multiple different deep learning frameworks, which may have conflicting software dependencies, on the same server.
  • After you build your application into a container, you can run it in many other places, especially servers, without having to install any software.
  • Legacy accelerated compute applications can be containerized and deployed on newer systems, on premises, or in the cloud.
  • Specific GPU resources can be allocated to a container for isolation and better performance.
  • You can easily share, collaborate, and test applications across different environments.
  • Multiple instances of a given deep learning framework can be run concurrently with each having one or more specific GPUs assigned.
  • Containers can be used to resolve network-port conflicts between applications by mapping container-ports to specific externally-visible ports when launching the container.

2. Installing Docker and NVIDIA Docker

To enable portability in Docker images that leverage GPUs, NVIDIA developed nvidia-docker, an open-source project that provides a command line tool to mount the user mode components of the NVIDIA driver and the GPUs into the Docker container at launch.

By default, Docker containers run with root privileges, so consult your IT department for assistance on how to properly set up Docker to conform to your organization's security policies.

These instructions describe command line entries made from the DGX-1 Linux shell.


The instructions below are provided as a convenient method for accessing Docker containers; however, the resulting docker group is equivalent to the root user, which may violate your organization's security policies. See the Docker Daemon Attack Surface for information on how this can affect security in your system. Always consult your IT department to make sure the installation complies with the security policies of your data center.
Ensure your environment meets the prerequisites before installing Docker. For more information, see Getting Started with Docker.
  1. Install Docker.
    $ sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys
    $ echo deb https://apt.dockerproject.org/repo ubuntu-trusty main | sudo tee /etc/apt/sources.list.d/docker.list
    $ sudo apt-get update
    $ sudo apt-get -y install docker-engine=1.12.6-0~ubuntu-trusty
  2. Edit the /etc/default/docker file.
    To prevent IP address conflicts between Docker and the DGX-1.
    To ensure that the DGX-1 can access the network interfaces for nvidia-docker containers, the nvidia-docker containers should be configured to use a subnet distinct from other network resources used by the DGX-1. By default, Docker uses the subnet. If addresses within this range are already used on the DGX-1 network, the nvidia-docker network can be changed by modifying either the /etc/docker/daemon.json file or the /etc/systemd/system/docker.service.d/docker-override.conf file to specify the DNS, bridge IP address, and container address range to be used by nvidia-docker containers.
    For example, if your DNS server exists at IP address, and the subnet is not otherwise needed by the DGX-1, you can add the following line:
    DOCKER_OPTS="--dns --bip= --fixed-cidr="
    To use the Overlay2 storage driver.
    The Overlay2 storage driver is preferable to the default AUFS storage driver. Add the following option to the DOCKER_OPTS line as previously mentioned.
    If you are using the base OS, the dgx-docker-options package already sets the storage driver to Overlay2 by default. The following list shows the NVIDIA-recommended Docker options:
    • use the Overlay2 storage driver
    • disable the use of legacy registries
    • increase the stack size to 64 MB
    • unlimited locked memory size
    To use proxies to access external websites or repositories (if applicable).
    If your network requires use of a proxy, then edit the file /etc/apt/apt.conf.d/proxy.conf and make sure the following lines are present:
    Acquire::ftp::proxy "ftp://<username>:<password>@<host>:<port>/";

    If you will be using the DGX-1 in base OS mode, then after installing Docker on the system, refer to the information at Control and configure Docker with systemd. This is to ensure that Docker is able to access the DGX Container Registry through the proxy.

    Save and close the /etc/default/docker file when done.

  3. Restart Docker with the new configuration.
    $ sudo service docker restart
  4. Install NVIDIA Docker.
    1. Install nvidia-docker and nvidia-docker-plugin. The following example installs both nvidia-docker and the nvidia-docker-plugin.
      $ wget -P /tmp  
      $ sudo dpkg -i /tmp/nvidia-docker*.deb && rm
    2. Choose which users will have access to the docker group. This is required for users who want to be able to launch containers with docker and nvidia-docker. To add a user to the docker group, first see which groups the user already belongs to.
      $ groups <username>
      1. If the user is not part of the docker group, then they can be easily added using the following command.
        Note: This command requires sudo access; therefore, this step should be performed by a system administrator.
        $ sudo usermod -a -G docker <username>
    3. If there are no user accounts on the machine, add a user by performing the following steps:
      1. Create a user account that will use Docker through the docker group. In the following steps, replace <user1> with the actual user name.
      2. Add the user.
        $ sudo useradd <user1>
      3. Setup the password.
        $ sudo passwd <user1>
        Enter a password at the prompts:
        Enter new UNIX password:
        Retype new UNIX password:
        passwd: password updated successfully
      4. Add the user to the docker group.
        $ sudo usermod -a -G docker <user1>
      5. Switch to the new user.
        $ su <user1>
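Step 2 mentions /etc/docker/daemon.json as an alternative to DOCKER_OPTS. The following is a minimal sketch, not an NVIDIA-provided tool: a hypothetical bash function, make_daemon_json, that prints a daemon.json body containing the DNS server, bridge IP (bip), container address range (fixed-cidr), and the Overlay2 storage driver. The addresses in the usage line are placeholders from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24; substitute values that suit your network. Do not set the same option both in daemon.json and in DOCKER_OPTS, because the Docker daemon refuses to start when an option is configured in both places.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker or nvidia-docker): print a daemon.json
# body for the network and storage settings discussed in step 2.
#   $1  DNS server address        -> "dns"
#   $2  bridge IP with prefix     -> "bip"
#   $3  container address range   -> "fixed-cidr" (must lie inside the bridge subnet)
make_daemon_json() {
  local dns="$1" bip="$2" fixed_cidr="$3"
  cat <<EOF
{
  "dns": ["${dns}"],
  "bip": "${bip}",
  "fixed-cidr": "${fixed_cidr}",
  "storage-driver": "overlay2"
}
EOF
}
```

After sourcing the function, you might write the file with placeholder addresses and then restart Docker as in step 3:
    $ make_daemon_json 192.0.2.53 198.51.100.1/24 198.51.100.0/24 | sudo tee /etc/docker/daemon.json
    $ sudo service docker restart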

2.1. Getting Your NVIDIA DGX Cloud Services API Key

Your NVIDIA DGX Cloud Services API key authenticates your access to DGX Container Registry from the command line.


You need to generate your NVIDIA DGX Cloud Services API key only once. Anyone with your API key can access all the services and resources to which you are entitled through your NVIDIA DGX Cloud Services account. Therefore, keep your API key secret and do not share it or store it where others can see or copy it.

  1. Use a web browser to log in to your NVIDIA DGX Cloud Services account on the DGX Cloud Services website.
  2. In the top right corner, click your user account icon and select API KEY.
  3. In the API Key page that opens, click GENERATE API KEY.
    Note: If you misplace your API key, you can get a new API key from the DGX Cloud Services website whenever you need it. When you get your API key, a new key is generated, which invalidates any keys you may have obtained previously.
  4. In response to the warning that your old API key will become invalid, click CONTINUE. Your NVIDIA DGX Cloud Services API key is displayed with examples of how to use it.
    Tip: You can copy your API key to the clipboard by clicking the Copy icon to the right of the API key.

2.2. Accessing DGX™ Container Registry

You can access the DGX™ Container Registry by running a Docker command from your client computer. You are not limited to using your NVIDIA DGX platform to access the DGX™ Container Registry. You can use any Linux computer with Internet access on which Docker is installed.
Before accessing DGX™ Container Registry, ensure that the following prerequisites are met:
  • Your NVIDIA® DGX™ Cloud Services account is activated.
  • You have an NVIDIA® DGX™ Cloud Services API key for authenticating your access to DGX™ Container Registry.
  • You are logged in to your client computer as an administrator user.

An alternative approach for enabling other users to run containers without giving them sudo privileges, and without having to type sudo before each Docker command, is to add each user to the docker group with the following command:

$ sudo usermod -aG docker $USER

While this approach is more convenient and commonly used, it is less secure, because any user who can send commands to the Docker engine can escalate privileges and run root-level operations. If you choose this method, add to the docker group only users whom you would trust with root privileges.
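Whichever method you choose, you can confirm a user's membership before they try to run containers. The following is a small sketch using the standard id, tr, and grep utilities; user_in_group is a hypothetical helper name. A group added with usermod takes effect only in new login sessions, so the user may need to log out and back in (or run newgrp docker) before docker commands work without sudo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if user $1 belongs to group $2.
# `id -nG` prints the user's group names separated by spaces; we split them
# one per line and look for an exact, whole-line match.
user_in_group() {
  local user="$1" group="$2"
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$group"
}
```

For example:
    $ user_in_group "$USER" docker && echo "in docker group" || echo "not in docker group"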

  1. Log in to the DGX™ Container Registry.
    $ docker login nvcr.io
  2. When prompted for your user name, enter the following text:
    $oauthtoken
    The $oauthtoken user name is a special user name that indicates that you will authenticate with an API key and not a user name and password.

  3. When prompted for your password, enter your NVIDIA® DGX™ Cloud Services API key as shown in the following example.
    Username: $oauthtoken
    Password: k7cqFTUvKKdiwGsPnWnyQFYGnlAlsCIRmlP67Qxa
    Tip: When you get your API key, copy it to the clipboard so that you can paste the API key into the command shell when you are prompted for your password.
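For scripted use, the interactive prompts can be replaced by piping the API key to docker login. The sketch below assumes a Docker client that supports the --password-stdin option (added in Docker 17.07; the docker-engine 1.12.6 package from the installation steps does not have it) and keeps the key in an environment variable named NGC_API_KEY, a name chosen here for illustration only. Reading the key from standard input keeps it out of the process list, where a --password argument would be visible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: log in to nvcr.io without interactive prompts.
# Assumes NGC_API_KEY (an illustrative variable name) holds your API key and
# that the Docker client supports --password-stdin (Docker 17.07 or later).
nvcr_login() {
  if [[ -z "${NGC_API_KEY:-}" ]]; then
    echo "nvcr_login: NGC_API_KEY is not set" >&2
    return 1
  fi
  # '$oauthtoken' is single-quoted so the shell passes it literally
  # instead of expanding it as a variable.
  printf '%s' "$NGC_API_KEY" |
    docker login --username '$oauthtoken' --password-stdin nvcr.io
}
```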

3. Pulling a Container

You can pull (download) an NVIDIA container that is already built, tested, tuned, and ready to run. Each NVIDIA deep learning container includes the code required to build the framework so that you can make changes to the internals. The containers do not contain sample data sets or sample model definitions unless they are included with the source for the framework.

NVIDIA provides a number of containers for download from the DGX™ Container Registry. If your organization has provided you with access to any custom containers, you can download them as well.

In each container, the framework source is located in /opt/<framework>, where <framework> is the name of your container.

You can use the docker pull command to pull images from the NVIDIA DGX Container Registry.

Before pulling an NVIDIA Docker container, ensure that the following prerequisites are met:
  • You have read access to the registry space that contains the container.
  • You are logged into DGX™ Container Registry as explained in Accessing DGX™ Container Registry.
  • You are a member of the docker group, which enables you to use docker commands.
Tip: To browse the available containers in the DGX™ Container Registry, use a web browser to log in to your NVIDIA® DGX™ Cloud Services account on the DGX Cloud Services website.

To pull a container from the registry, use the following procedure.

  1. Run the command to download the container that you want from the registry.
    $ docker pull nvcr.io/nvidia/<repository>:<tag>
    where nvcr.io is the hostname of the NVIDIA DGX Container Registry. For example, you could issue the following command.
    $ docker pull nvcr.io/nvidia/caffe:17.03
    In this case, the container is being pulled from the caffe repository and is version 17.03 (the tag is 17.03).
  2. To confirm that the container was downloaded, list the Docker images on your system.
    $ docker images
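The two steps above can be combined into a small script that fails loudly if the image is not present after the pull. This is a sketch; pull_and_verify is a hypothetical helper name. It relies on docker images -q, which prints only the image ID and prints nothing when no local image matches the reference.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull an image, then confirm it exists locally.
#   $1  full image reference, e.g. nvcr.io/nvidia/caffe:17.03
pull_and_verify() {
  local ref="$1"
  docker pull "$ref" || return 1
  # `docker images -q REF` prints the image ID, or nothing if REF is absent.
  if [[ -n "$(docker images -q "$ref")" ]]; then
    echo "OK: $ref is available locally"
  else
    echo "ERROR: $ref not found after pull" >&2
    return 1
  fi
}
```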

3.1. Pulling a Container from NVIDIA Container Registry

A Docker registry is the service that stores Docker images. The service can be on the internet, on the company intranet, or on a local machine. For example, http://nvcr.io/ is the location of the NVIDIA DGX Container Registry for NVIDIA Docker images.

All http://nvcr.io/ Docker images use explicit version-tags to avoid ambiguous versioning which can result from using the latest tag. For example, a locally tagged latest version of an image may actually override a different latest version in the registry.

For more information pertaining to your specific container, refer to the /workspace/README.md file inside the container.

Before you can pull a container from the DGX Container Registry, you must have Docker and NVIDIA Docker installed. For more information, see Installing Docker and NVIDIA Docker.

The following task assumes:
  1. You have a DGX-1 and it is connected to the network.
  2. Your DGX-1 has Docker installed.
  3. You have access to a browser to go to https://compute.nvidia.com and your DGX Cloud Services account is activated.
  4. You now want to pull a container onto your client machine.
  5. You want to push the container onto your private registry.
  6. You want to pull and run the container on your DGX-1. You will need to have a terminal window open to an SSH session on your DGX-1 to complete this step.

To pull the container, perform the following steps:
  1. Open a web browser and log in to DGX Cloud Services.
  2. Select the container that you want to pull from the left navigation. For example, click caffe.
  3. In the Tags section, locate the release that you want to run. For example, hover over release 17.03.
  4. In the Actions column, hover over the Download icon. Click the Download icon to display the docker pull command.
  5. Copy the docker pull command and click Close.
  6. Open a command prompt and paste:
    docker pull
    The pulling of the container image begins. Ensure the pull completes successfully.
  7. After you have the Docker container file on your local system, load the container into your local Docker registry.
  8. Verify that the image is loaded into your local Docker registry.
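Items 5 and 6 of the task assume that you push the image to a private registry and then pull it on the DGX-1. A minimal sketch using docker tag and docker push follows; the helper name push_to_private_registry and the registry address registry.example.com:5000 are hypothetical, and the helper assumes the source reference begins with a registry host such as nvcr.io. Your private registry may also require its own docker login first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-tag an image for a private registry and push it.
#   $1  source reference that includes a registry host, e.g. nvcr.io/nvidia/caffe:17.03
#   $2  private registry host[:port], e.g. registry.example.com:5000 (placeholder)
# Prints the new reference on stdout if both steps succeed.
push_to_private_registry() {
  local src="$1" registry="$2"
  # Drop everything up to the first '/', i.e. the original registry host.
  local dst="${registry}/${src#*/}"
  docker tag "$src" "$dst" || return 1
  docker push "$dst" >&2 || return 1   # send push progress to stderr
  echo "$dst"
}
```

On the DGX-1, you could then run the pushed image directly, for example:
    $ nvidia-docker run --rm -ti registry.example.com:5000/nvidia/caffe:17.03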

4. NVIDIA Docker Images

As previously mentioned, the DGX-1 can use pre-built NVIDIA containers for various frameworks. The containers are hosted in the NVIDIA DGX Container Registry, nvcr.io. As you read in the previous section, these containers can be “pulled” from the registry and used for deep learning.

A Docker image is simply a file-system that a developer builds. An NVIDIA Docker image serves as the template for the container, and is a software stack that consists of several layers. Each layer depends on the layer below it in the stack.

From a Docker image, a container is formed. When creating a container, you add a writable layer on top of the stack. A Docker image with a writable container layer added to it is a container. A container is simply a running instance of that image. All changes and modifications made to the container are made to the writable layer. You can delete the container; however, the Docker image remains untouched.

Figure 1 depicts the NVIDIA Docker stack for the DGX-1. Notice that the NVIDIA Docker tools sit above the host OS and the NVIDIA Drivers. The tools are used to create and use NVIDIA containers; these are the layers above the NVIDIA Docker layer. These containers have applications, deep learning SDKs, and CUDA Toolkits. The NVIDIA Docker tools take care of mounting the appropriate NVIDIA Drivers.
Figure 1. NVIDIA Docker mounts the user mode components of the NVIDIA driver and the GPUs into the Docker container at launch.

4.1. NVIDIA Docker Images Versions

Each release of an NVIDIA Docker deep learning framework image is identified by the year and month of its release. For example, the 17.01 release of an image was released in January, 2017.
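Because the tag encodes the release date, it can be decoded mechanically. The following sketch (tag_release_month is a hypothetical helper) converts a YY.MM tag into a month and year and rejects anything else, such as latest or a tag with a suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a YY.MM release tag, e.g. 17.01 -> "January 2017".
tag_release_month() {
  local tag="$1"
  local re='^([0-9]{2})\.([0-9]{2})$'
  local -a months=(January February March April May June
                   July August September October November December)
  if [[ ! "$tag" =~ $re ]]; then
    echo "not a YY.MM tag: $tag" >&2
    return 1
  fi
  # 10# forces base 10 so that "08" and "09" are not parsed as octal.
  local yy="${BASH_REMATCH[1]}" mm=$((10#${BASH_REMATCH[2]}))
  if (( mm < 1 || mm > 12 )); then
    echo "month out of range: $tag" >&2
    return 1
  fi
  printf '%s %d\n' "${months[mm-1]}" "$((2000 + 10#$yy))"
}
```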

An image name consists of two parts separated by a colon. The first part is the name of the container in the repository and the second part is the “tag” associated with the container. These two pieces of information are shown in Figure 2, which is the output from issuing the docker images command.

Figure 2. Output from docker images command
Figure 2 shows simple examples of image names, such as:
  • nvidia-cuda:8.0-devel
  • ubuntu:latest
  • nvcr.io/nvidia/tensorflow:17.01
If you do not add a tag to an image name, the tag “latest” is used by default.
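The name-and-tag rule can be expressed as a small parser. This sketch (parse_image_ref is a hypothetical helper) splits a reference at the last colon of its final path component, so that a registry port such as localhost:5000 is not mistaken for a tag, and falls back to latest when no tag is given. It does not handle digest references (name@sha256:...).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<name> <tag>" for an image reference.
# Only the last path component can carry a tag, so a colon that belongs to a
# registry port (e.g. localhost:5000/caffe) is left alone.
parse_image_ref() {
  local ref="$1" name tag
  local last="${ref##*/}"        # e.g. caffe:17.03 from nvcr.io/nvidia/caffe:17.03
  if [[ "$last" == *:* ]]; then
    tag="${last##*:}"
    name="${ref%:*}"             # strip ":<tag>" from the end
  else
    tag="latest"                 # Docker's default when no tag is given
    name="$ref"
  fi
  printf '%s %s\n' "$name" "$tag"
}
```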

In the next sections, you will use these image names for running containers. Later in the document, there is also a section on creating your own containers or customizing and extending existing containers.

5. Running a Container

To run a container, issue the nvidia-docker run command, specifying the registry, repository, and tag.

Before you can run an NVIDIA Docker deep learning framework container, you must have nvidia-docker installed. For more information, see Installing Docker and NVIDIA Docker.
  1. As a user, run the container interactively.
    $ nvidia-docker run --rm -ti nvcr.io/nvidia/<framework>:<tag>

    The following example runs the December 2016 release (16.12) of the NVIDIA Caffe container in interactive mode. The container is automatically removed when the user exits the container.

    $ nvidia-docker run --rm -ti nvcr.io/nvidia/caffe:16.12
    == Caffe ==
    NVIDIA Release 16.12 (build 6217)
    Container image Copyright (c) 2016, NVIDIA CORPORATION.  All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
  2. From within the container, start the job that you want to run. The precise command to run depends on the deep learning framework in the container that you are running and the job that you want to run. For details see the /workspace/README.md file for the container.

    The following example runs the caffe time command on one GPU to measure the execution time of the deploy.prototxt model.

    # caffe time -model models/bvlc_alexnet/ -solver deploy.prototxt -gpu=0
  3. Optional: Run the December 2016 release (16.12) of the same NVIDIA Caffe container but in non-interactive mode.
    $ nvidia-docker run --rm nvcr.io/nvidia/caffe:16.12 caffe time -model \
          /workspace/models/bvlc_alexnet -solver /workspace/deploy.prototxt -gpu=0

5.1. nvidia-docker run

When you run the nvidia-docker run command:

  • The Docker engine loads the image into a container which runs the software.
  • You define the runtime resources of the container by including additional flags and settings that are used with the command. These flags and settings are described in the following sections.
  • The GPUs are explicitly defined for the Docker container (all GPUs by default; a subset can be specified using the NV_GPU environment variable).

5.2. Specifying a User

Unless otherwise specified, the user inside the container is the root user.

When running within the container, the root user can access files created on the host operating system or network volumes. This is unacceptable for some users, who will want to set the ID of the user in the container. For example, to set the user in the container to be the currently running user, issue the following:
% nvidia-docker run -ti --rm -u $(id -u):$(id -g) nvcr.io/nvidia/<repository>:<tag>
Typically, this results in warnings because the specified user and group do not exist in the container. You might see a message similar to the following:
groups: cannot find name for group ID 1000
I have no name!@c177b61e5a93:/workspace$
The warning can usually be ignored.

5.3. Setting the Remove Flag

By default, Docker containers remain on the system after they exit. Repeated run operations therefore use up more and more space on the local disk, so it is important to clean up Docker containers after exiting.
Note: Do not use the --rm flag if you have made changes to the container that you want to save, or if you want to access job logs after the run finishes.
To automatically remove a container when exiting, add the --rm flag to the run command.
% nvidia-docker run --rm nvcr.io/nvidia/<repository>:<tag>

5.4. Setting the Interactive Flag

By default, containers run in batch mode; that is, the container runs once and then exits without any user interaction. Containers can also be run in interactive mode, with a terminal attached so that you can issue commands inside the container.

To run in interactive mode, add the -ti flag to the run command.
% nvidia-docker run -ti --rm nvcr.io/nvidia/<repository>:<tag>

5.5. Setting the Volumes Flag

No data sets are included with the containers; therefore, if you want to use data sets, you need to mount volumes into the container from the host operating system. For more information, see Manage data in containers.

Typically, you would use either Docker volumes or host data volumes. The primary difference between them is that Docker volumes are private to Docker and can only be shared among Docker containers. Docker volumes are not visible from the host operating system, and Docker manages the data storage. Host data volumes are any directory that is available from the host operating system. This can be your local disk or network volumes.

Example 1
Mount a directory /raid/imagedata on the host operating system as /images in the container.
% nvidia-docker run -ti --rm -v /raid/imagedata:/images nvcr.io/nvidia/<repository>:<tag>
Example 2
Mount a local Docker volume named data (created if not already present) in the container as /imagedata.
% nvidia-docker run -ti --rm -v data:/imagedata nvcr.io/nvidia/<repository>:<tag>

5.6. Setting the Mapping Ports Flag

Applications such as DIGITS open a port for communications. You can control whether that port is open only on the local system or is available to other computers on the network outside of the local system.

Using DIGITS as an example: in DIGITS 5.0, starting with container image 16.12, the DIGITS server listens on port 5000 by default. However, after the container is started, you may not easily know the IP address of that container. Instead of looking up that address, you can make the server reachable in one of the following ways:
  • Expose the port using the local system network stack (--net=host) where port 5000 of the container is made available as port 5000 of the local system.
  • Map the port (-p 8080:5000) where port 5000 of the container is made available as port 8080 of the local system.

In either case, users outside the local system have no visibility that DIGITS is running in a container. If you do not publish the port, it is still reachable from the host, but not from other computers.

5.7. Setting the Shared Memory Flag

Certain applications, such as PyTorch and the Cognitive Toolkit, use shared memory buffers to communicate between processes. Shared memory can also be required by single process applications, such as MXNet and TensorFlow, which use the NCCL library.

By default Docker containers are allotted 64MB of shared memory. This can be insufficient, particularly when using all 8 GPUs. To increase the shared memory limit to a specified size, for example 1GB, include the --shm-size=1g flag in your docker run command.

Alternatively, you can specify the --ipc=host flag to reuse the host’s shared memory space inside the container. However, this approach has security implications, because any data in shared memory buffers could be visible to other containers.

5.8. Restricting Exposure of GPUs

From inside the container, the scripts and software are written to take advantage of all available GPUs. To coordinate GPU usage at a higher level, you can use the NV_GPU environment variable to restrict which host GPUs are exposed to the container. For example, if you want only GPU 0 and GPU 1 to be visible in the container, issue the following:
$ NV_GPU=0,1 nvidia-docker run ...

Setting NV_GPU as a prefix to the command creates a temporary environment variable that applies only to that invocation and restricts which GPUs are used.

Specified GPUs are defined per container using the docker device-mapping feature, which is currently based on Linux cgroups.
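The options from Sections 5.2 through 5.8 are often combined in a single invocation. The sketch below, a hypothetical dry-run helper named build_run_cmd, assembles such a command as a bash array and prints it instead of running it, so you can review it first. The port mapping (8080:5000, as in the DIGITS example) and the 1 GB shared-memory size are illustrative choices, not requirements.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print an nvidia-docker run command that combines
# the flags from Sections 5.2-5.8, without executing it.
#   $1 image   $2 GPU list for NV_GPU   $3 host data dir   $4 mount point in container
build_run_cmd() {
  local image="$1" gpus="$2" host_dir="$3" mount_point="$4"
  local -a cmd=(nvidia-docker run)
  cmd+=(--rm -ti)                          # 5.3/5.4: remove on exit, interactive
  cmd+=(-u "$(id -u):$(id -g)")            # 5.2: run as the calling user
  cmd+=(-v "${host_dir}:${mount_point}")   # 5.5: host data volume
  cmd+=(-p 8080:5000)                      # 5.6: container port 5000 -> host port 8080
  cmd+=(--shm-size=1g)                     # 5.7: 1 GB of shared memory
  cmd+=("$image")
  # 5.8: NV_GPU prefixes the command so it applies to this invocation only.
  printf 'NV_GPU=%s %s\n' "$gpus" "${cmd[*]}"
}
```

To execute the command instead of printing it, replace the final printf with NV_GPU="$gpus" "${cmd[@]}", which passes NV_GPU to nvidia-docker for that single call.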

5.9. Container Lifetime

The state of an exited container is preserved indefinitely if you do not pass the --rm flag to the nvidia-docker run command. You can list all of the saved exited containers and their size on the disk with the following command:
$ docker ps --all --size --filter status=exited

The size of a container on disk depends on the files created during its execution; therefore, exited containers typically take only a small amount of disk space.

You can permanently remove an exited container by issuing:
$ docker rm [CONTAINER ID]
By saving the state of containers after they have exited, you can still interact with them using the standard Docker commands. For example:
  • You can examine logs from a past execution by issuing the docker logs command.
    $ docker logs 9489d47a054e
  • You can extract files using the docker cp command.
    $ docker cp 9489d47a054e:/log.txt .
  • You can restart a stopped container using the docker restart command.
    $ docker restart <container name>
    For example, if you started the Caffe container with the --name caffe option, issue this command:
    $ docker restart caffe
  • You can save your changes by creating a new image using the docker commit command. For more information, see Example 3: Customizing a Container using docker commit.
    Note: Use care when committing docker container changes, as data files created during use of the container will be added to the resulting image. In particular, core dump files and logs can dramatically increase the size of the resulting image.
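To reclaim disk space from many exited containers at once, the listing command above can feed docker rm. The following sketch (remove_exited_containers is a hypothetical helper) uses only docker ps and docker rm, which are available in the Docker 1.12 release installed earlier; newer Docker releases also offer docker container prune for a similar purpose. Removing a container discards its logs and any uncommitted changes, so extract or commit anything you need first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove every container whose status is "exited".
remove_exited_containers() {
  local ids
  ids="$(docker ps --all --quiet --filter status=exited)" || return 1
  if [[ -z "$ids" ]]; then
    echo "No exited containers to remove."
    return 0
  fi
  # Word splitting on $ids is intentional: one container ID per line.
  # shellcheck disable=SC2086
  docker rm $ids
}
```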

6. NVIDIA Deep Learning Software Stack

The NVIDIA Deep Learning Software Developer Kit (SDK) is everything in the NVIDIA area of the DGX Container Registry, including the CUDA Toolkit, the DIGITS workflow, and all of the deep learning frameworks.

The NVIDIA Deep Learning SDK accelerates widely-used deep learning frameworks such as Caffe, Caffe2, Cognitive Toolkit, MXNet, PyTorch, TensorFlow, Theano, and Torch.

The software stack provides containerized versions of these frameworks optimized for the system. These frameworks, including all necessary dependencies, are pre-built, tested, tuned, and ready to run. For users who need more flexibility to build custom deep learning solutions, each framework container image also includes the framework source code to enable custom modifications and enhancements, along with the complete software development stack.

The design of the platform software is centered around a minimal OS and driver install on the server, and provisioning of all application and SDK software in NVIDIA Docker containers through the DGX Container Registry. Figure 1 presents a graphical layout of the layers of the software stack.

6.1. OS Layer

Within the software stack, the lowest layer (or base layer) is the user space of the OS. The software in this layer includes all of the security patches that are available within the month of the release.

6.2. CUDA Layer

CUDA is a parallel computing platform and programming model created by NVIDIA to give application developers access to the massive parallel processing capability of GPUs. CUDA is the foundation for GPU acceleration of deep learning as well as a wide range of other computation- and memory-intensive applications ranging from astronomy, to molecular dynamics simulation, to computational finance.

6.2.1. CUDA Runtime

The CUDA runtime layer provides the components needed to execute CUDA applications in the deployment environment. The CUDA runtime is packaged with the toolkit and includes all of the shared libraries, but none of the CUDA compiler components.

6.2.2. CUDA Toolkit

The NVIDIA CUDA Toolkit provides a development environment for developing optimized GPU-accelerated applications. With the CUDA Toolkit, you can develop, optimize and deploy your applications to GPU-accelerated embedded systems, desktop workstations, enterprise data-centers and the cloud. The toolkit includes libraries, tools for debugging and optimization, a compiler and a runtime library to deploy your application.

The following library provides GPU-accelerated primitives for deep neural networks:
CUDA Basic Linear Algebra Subroutines library (cuBLAS)
cuBLAS is a GPU-accelerated version of the complete standard BLAS library that delivers significant speedup running on GPUs. The cuBLAS generalized matrix-matrix multiplication (GEMM) routine is a key computation used in deep neural networks, for example in computing fully connected layers.

6.3. Deep Learning Libraries Layer

The following libraries are critical to Deep Learning on NVIDIA GPUs. These libraries are a part of the NVIDIA Deep Learning Software Development Kit (SDK).

6.3.1. NCCL

The NVIDIA Collective Communications Library (NCCL, pronounced “Nickel”) is a library of multi-GPU collective communication primitives that are topology-aware and can be easily integrated into applications.

Collective communication algorithms employ many processors working in concert to aggregate data. NCCL is not a full-blown parallel programming framework; rather, it is a library focused on accelerating collective communication primitives. The following collective operations are currently supported:
  • AllReduce
  • Broadcast
  • Reduce
  • AllGather
  • ReduceScatter

Tight synchronization between communicating processors is a key aspect of collective communication. CUDA-based collectives would traditionally be realized through a combination of CUDA memory copy operations and CUDA kernels for local reductions. NCCL, on the other hand, implements each collective in a single kernel that handles both communication and computation operations. This allows for fast synchronization and minimizes the resources needed to reach peak bandwidth.

NCCL conveniently removes the need for developers to optimize their applications for specific machines. NCCL provides fast collectives over multiple GPUs both within and across nodes. It supports a variety of interconnect technologies, including PCIe, NVLink, InfiniBand Verbs, and IP sockets. NCCL also automatically adapts its communication strategy to match the system’s underlying GPU interconnect topology.

Apart from performance, ease of programming was the primary consideration in the design of NCCL. NCCL uses a simple C API, which can be easily accessed from a variety of programming languages. NCCL closely follows the popular collectives API defined by MPI (Message Passing Interface), so anyone familiar with MPI will find NCCL’s API very natural to use. In a minor departure from MPI, NCCL collectives take a “stream” argument, which provides direct integration with the CUDA programming model. Finally, NCCL is compatible with virtually any multi-GPU parallelization model, for example:
  • single-threaded
  • multi-threaded, for example, using one thread per GPU
  • multi-process, for example, MPI combined with multi-threaded operation on GPUs

NCCL is widely used in deep learning frameworks, where the AllReduce collective is central to neural network training. The multi-GPU and multi-node communication that NCCL provides makes efficient scaling of neural network training possible.
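To make the semantics of AllReduce concrete, the following Bash sketch reproduces only the result of an AllReduce with a sum reduction, computed on the CPU. Every participant contributes a buffer, and every participant ends up holding the element-wise sum. It is not NCCL code and says nothing about how NCCL schedules the transfers. The function name allreduce_sum is hypothetical, and each argument stands in for one GPU's buffer.

```shell
#!/usr/bin/env bash
# Illustration only: computes what an AllReduce (sum) leaves on each rank.
# This is plain Bash on the CPU, not NCCL; each argument stands in for
# one GPU's buffer as a space-separated list of integers.
set -euo pipefail

allreduce_sum() {
  local -a total=()
  local -a v=()
  local buf i n
  # Reduce step: add the buffers element by element.
  for buf in "$@"; do
    read -r -a v <<< "$buf"
    for i in "${!v[@]}"; do
      total[i]=$(( ${total[i]:-0} + v[i] ))
    done
  done
  # "All" step: every rank receives the same reduced buffer.
  for (( n = 0; n < $#; n++ )); do
    echo "${total[*]}"
  done
}
```

For example, allreduce_sum "1 2 3" "10 20 30" "100 200 300" prints 111 222 333 once per participant. By contrast, a Reduce would leave the sum on a single root rank only, and a Broadcast would copy one rank's buffer to all the others.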

6.3.2. cuDNN Layer

The CUDA Deep Neural Network library (cuDNN) provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

Because frameworks do not all progress at the same rate, and because the cuDNN library is not backward compatible across versions, cuDNN is kept in its own container. As a result, multiple CUDA and cuDNN containers are available, each with its own tag, which a framework must specify in its Dockerfile.

6.4. Framework Containers

The framework layer includes all of the requirements for the specific deep learning framework. The primary goal of this layer is to provide a basic working framework. The frameworks can be further customized by a Platform Container layer specification.

Within the frameworks layer, you can choose to:
  • Run a framework exactly as delivered by NVIDIA. In this case, the framework is built and ready to run inside that container image.
  • Start with the framework as delivered by NVIDIA and modify it. In this case, you start from NVIDIA’s container image, apply your modifications, and recompile the framework inside the container.
  • Start from scratch and build whatever application you want on top of the CUDA, cuDNN, and NCCL layers that NVIDIA provides.

In the next section, the NVIDIA deep learning framework containers are presented.

7. NVIDIA Deep Learning Framework Containers

A deep learning framework is part of a software stack that consists of several layers. Each layer depends on the layer below it in the stack. This software architecture has many advantages:
  • Because each deep learning framework is in a separate container, each framework can use different versions of libraries such as libc, cuDNN, and others without interfering with the others.
  • A key reason for having layered containers is that each container can be tailored to what its users require.
  • As deep learning frameworks are improved for performance or bug fixes, new versions of the containers are made available in the registry.
  • The system is easy to maintain, and the OS image stays clean since applications are not installed directly on the OS.
  • Security updates, driver updates and OS patches can be delivered seamlessly.

The following sections present the framework containers that are in nvcr.io.

7.1. Why Use a Framework?

Frameworks have been created to make researching and applying deep learning more accessible and efficient. The key benefits of using frameworks include:

  • Frameworks provide highly optimized GPU enabled code specific to the computations required for training Deep Neural Networks (DNN).
  • NVIDIA's frameworks are tuned and tested for the best possible GPU performance.
  • Frameworks provide access to code through simple command line or scripting language interfaces such as Python.
  • Many powerful DNNs can be trained and deployed using these frameworks without writing any GPU code or other complex compiled code, while still benefiting from the training speed-up afforded by GPU acceleration.

7.2. Caffe

Caffe is a deep learning framework made with flexibility, speed, and modularity in mind. It was originally developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.

NVIDIA Caffe is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations. NVIDIA Caffe includes:
  • Support for 16-bit (half) floating-point training and inference.
  • Mixed-precision support, which allows data to be stored and/or computed in 64-, 32-, or 16-bit formats. Precision can be defined per layer (the forward and backward phases can differ), or set as a default for the whole Net.
  • Integration with cuDNN v6.
  • Automatic selection of the best cuDNN convolution algorithm.
  • Integration with v1.3.4 of NCCL library for improved multi-GPU scaling.
  • Optimized GPU memory management for data and parameters storage, I/O buffers and workspace for convolutional layers.
  • Parallel data parser and transformer for improved I/O performance.
  • Parallel back-propagation and gradient reduction on multi-GPU systems.
  • Fast solvers implementation with fused CUDA kernels for weights and history update.
  • Multi-GPU test phase for even memory load across multiple GPUs.
  • Backward compatibility with BVLC Caffe and NVIDIA Caffe 0.15.
  • Extended set of optimized models (including 16-bit floating point examples).

7.3. Caffe2

Caffe2 is a deep-learning framework designed to easily express all model types, for example, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and more, in a friendly Python-based API, and execute them using a highly efficient C++ and CUDA backend.

It gives users a great deal of flexibility in assembling their models, whether for inference or training, from combinations of high-level, expressive operations. A model can then be run through the same Python interface, which allows for easy visualization, or serialized and run directly on the underlying C++ implementation.

Caffe2 supports single and multi-GPU execution, along with support for multi-node execution.

The following list summarizes the DGX-1 Caffe2 optimizations and changes:
  • Use of the latest cuDNN release
  • Performance fine-tuning
  • GPU-accelerated image input pipeline
  • Automatic selection of the best convolution algorithm

7.4. Cognitive Toolkit

The Cognitive Toolkit (CNTK) is a unified deep learning toolkit that allows users to easily realize and combine popular model types such as feed-forward deep neural networks (DNNs), CNNs, and RNNs.

The Cognitive Toolkit implements Stochastic Gradient Descent (SGD) learning with automatic differentiation and parallelization across multiple GPUs and servers. The Cognitive Toolkit can be called as a library from Python or C++ applications, or executed as a standalone tool using the BrainScript model description language.

NVIDIA and Microsoft worked closely together to accelerate the Cognitive Toolkit on GPU-based systems such as DGX-1 and Azure N-Series virtual machines. This combination offers startups and major enterprises alike tremendous ease of use and scalability, since a single framework can be used to first train models on premises with the DGX-1 and later deploy those models at scale in the Microsoft Azure cloud.

The following list summarizes the DGX-1 Cognitive Toolkit optimizations and changes:
  • Use of the latest cuDNN release
  • Integration of the latest version of NCCL with NVLink support for improved multi-GPU scaling. NCCL with NVLink boosts the training performance of ResNet-50 by 2x when using data parallel SGD.
  • Image reader pipeline improvements allow AlexNet to train at over 12,000 images/second.
  • Reduced GPU memory overhead for multi-GPU training by up to 2 GB per GPU.
  • Dilated convolution support
  • Optimizations reducing the memory footprint needed for cuDNN workspaces

7.5. MXNet

MXNet is a deep learning framework designed for both efficiency and flexibility, which allows you to mix symbolic and imperative programming to maximize efficiency and productivity.

At the core of MXNet is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, and scales to multiple GPUs and multiple machines.

The following list summarizes the DGX-1 MXNet optimizations and changes:
  • Use of the latest cuDNN release
  • Improved input pipeline for image processing
  • Optimized embedding layer CUDA kernels
  • Optimized tensor broadcast and reduction CUDA kernels

7.6. TensorFlow

TensorFlow is an open-source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

For visualizing TensorFlow results, the TensorFlow Docker image also contains TensorBoard, a suite of visualization tools. For example, you can view training histories as well as the model's graph.

The following list summarizes the DGX-1 TensorFlow optimizations and changes:
  • Use of the latest cuDNN release
  • Integration of the latest version of NCCL with NVLink support for improved multi-GPU scaling. NCCL with NVLink boosts the training performance of ResNet-50 by 2x when using data parallel SGD.
  • Support for fused color adjustment kernels by default
  • Support for use of non-fused Winograd convolution algorithms by default

7.7. Theano

Theano is a Python library that allows you to efficiently define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays. Theano has been powering large-scale computationally intensive scientific investigations since 2007.

The following list summarizes the DGX-1 Theano optimizations and changes:
  • Use of the latest cuDNN release
  • Runtime code generation: evaluate expressions faster
  • Extensive unit-testing and self-verification: detect and diagnose many types of errors

7.8. Torch

Torch is a scientific computing framework with wide support for deep learning algorithms. Torch is easy to use and efficient, thanks to an easy and fast scripting language, Lua, and an underlying C/CUDA implementation.

Torch offers popular neural network and optimization libraries that are easy to use yet provide maximum flexibility to build complex neural network topologies.

The following list summarizes the DGX-1 Torch optimizations and changes:
  • Use of the latest cuDNN release
  • Integration of the latest version of NCCL with NVLink support for improved multi-GPU scaling. NCCL with NVLink boosts the training performance of ResNet-50 by 2x when using data parallel SGD.
  • Buffering of parameters to be communicated by NCCL to reduce latency overhead
  • cuDNN bindings for recurrent networks (RNN, GRU, LSTM), including persistent versions, which greatly improve the performance of small-batch training.
  • Dilated convolution support
  • Support for 16- and 32-bit floating point (FP16 and FP32) data input to cuDNN routines
  • Support for operations on FP16 tensors (using FP32 arithmetic)

7.9. PyTorch

PyTorch is a Python package that provides two high-level features:
  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep Neural Networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

The following list summarizes the DGX-1 PyTorch optimizations and changes:
  • Use of the latest cuDNN release
  • Integration of the latest version of NCCL with NVLink support
  • Buffering of parameters to be communicated by NCCL to reduce latency overhead
  • Dilated convolution support
  • Optimizations to avoid unnecessary copies of data and zeroing of buffers

7.10. DIGITS

The NVIDIA Deep Learning GPU Training System (DIGITS) puts the power of deep learning into the hands of engineers and data scientists.

DIGITS is not a framework. DIGITS is a wrapper for Caffe and Torch that provides a graphical web interface to those frameworks, so you do not have to work with them directly on the command line.

DIGITS can be used to rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks. DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.

8. Customizing and Extending Containers and Frameworks

NVIDIA Docker images come prepackaged, tuned, and ready to run; however, you may want to build a new image from scratch or augment an existing image with custom code, libraries, data, or settings for your corporate infrastructure. This section will guide you through exercises that will highlight how to create a container from scratch, customize a container, extend a deep learning framework to add features, develop some code using that extended framework from the developer environment, then package that code as a versioned release.

By default, you do not need to build a container. The DGX-1 container repository from NVIDIA, nvcr.io, has a number of containers that can be used immediately. These include containers for deep learning as well as containers with just the CUDA Toolkit.

One of the great things about containers is that they can be used as starting points for creating new containers. This can be referred to as “customizing” or “extending” a container. You can create a container completely from scratch; however, since these containers are likely to run on the DGX-1, it is recommended that you at least start with an nvcr.io container that contains the OS and CUDA. You are not limited to this, though: you can also create a container that runs on the DGX-1 CPUs and does not use the GPUs. In this case, you can start with a bare OS container from Docker Hub. To make development easier, you can still start with a CUDA container; CUDA is simply unused when the container runs.

The customized or extended containers can be saved to a user’s private container repository. They can also be shared with other users of the DGX-1 but this requires some administrator help.

It is important to note that all NVIDIA Docker deep learning framework images include the source to build the framework itself as well as all of the prerequisites.
Attention: Do not install an NVIDIA driver into the docker image at docker build time. nvidia-docker is essentially a wrapper around docker that transparently provisions a container with the necessary components to execute code on the GPU.

A best practice is to avoid docker commit usage for developing new docker images, and to use Dockerfiles instead. The Dockerfile method provides visibility and capability to efficiently version-control changes made during development of a docker image. The docker commit method is appropriate for short-lived, disposable images only.

For more information on writing a Dockerfile, see the best practices documentation.

8.1. Customizing a Container

NVIDIA provides a large set of images in the Docker Registry that are already tested, tuned, and are ready to run. You can pull any one of these images to create a container and add software or data of your choosing.

A best practice is to avoid docker commit usage for developing new docker images, and to use Dockerfiles instead. The Dockerfile method provides visibility and capability to efficiently version-control changes made during development of a docker image. The docker commit method is appropriate for short-lived, disposable images only (see Example 3: Customizing a Container using docker commit for an example).

8.1.1. Benefits and Limitations to Customizing a Container

There are many reasons you might customize a container to fit your specific needs; for example, you may depend on specific software that is not included in the container that NVIDIA provides. Whatever your reasons, you can customize a container.

The container images do not contain sample data-sets or sample model definitions unless they are included with the framework source. Be sure to check the container for sample data-sets or models.

8.1.2. Example 1: Building a Container from Scratch

Docker uses Dockerfiles to create or build a Docker image. Dockerfiles are scripts that contain commands that Docker uses successively to create a new Docker image. Simply put, a Dockerfile is the source code for the container image. Dockerfiles always start with a base image to inherit from.

For more information, see Best practices for writing Dockerfiles.

  1. Create a working directory on your local hard-drive.
  2. In that directory, open a text editor and create a file called Dockerfile. Save the file to your working directory.
  3. Open your Dockerfile and include the following:
    FROM ubuntu:14.04
    RUN apt-get update && apt-get install -y curl
    CMD echo "hello from inside a container"
    The last line, CMD, sets the command that runs when a container is started from the image. Running it is a simple way to check that the image was built correctly.

    For this example, the base image is pulled from Docker Hub rather than from the DGX-1 repository (nvcr.io). Subsequent examples use the NVIDIA repository.

  4. Save and close your Dockerfile.
  5. Build the image. Issue the following command to build the image and create a tag.
    $ docker build -t <new_image_name>:<new_tag> .
    Note: This command was issued in the same directory where the Dockerfile is located.

    The output from the docker build process lists “Steps”, one for each instruction in the Dockerfile.

    For example, let us name the image test1 and tag it with latest. Also, for illustrative purposes, let us assume our private DGX repository is called nvidian_sas. The command below builds the image. Some of the output is shown below so you know what to expect.
    $ docker build -t test1:latest .
    Sending build context to Docker daemon 3.072 kB
    Step 1/3 : FROM ubuntu:14.04
    14.04: Pulling from library/ubuntu
    Step 2/3 : RUN apt-get update && apt-get install -y curl
    Step 3/3 : CMD echo "hello from inside a container"
     ---> Running in 1f491b9235d8
     ---> 934785072daf
    Removing intermediate container 1f491b9235d8
    Successfully built 934785072daf

    For information about building your image, see docker build. For information about tagging your image, see docker tag.

  6. Verify that the build was successful. You should see a message similar to the following:
    Successfully built 934785072daf
    This message indicates that the build was successful. Any other message means that the build failed.
    Note: The image ID, 934785072daf, is generated when the image is built and will be different on your system.
  7. Confirm you can view your image. Issue the following command to list your images.
    $ docker images
    REPOSITORY      TAG            IMAGE ID        CREATED                SIZE
    test1           latest         934785072daf    19 minutes ago         222 MB
    The new image is now available to be used.
    Note: The image is local to this DGX-1. If you want to store it in your private repository, follow the next step.
  8. Store the container in your private Docker repository by pushing it.
    1. The first step in pushing it is to tag it.
      $ docker tag test1 nvcr.io/nvidian_sas/test1:latest
    2. Now that the image has been tagged, you can push it.
      $ docker push nvcr.io/nvidian_sas/test1:latest
      The push refers to a repository [nvcr.io/nvidian_sas/test1]
    3. Verify that the image appears in the nvidian_sas repository on the DGX Cloud Services website.
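Steps 5 through 8 can also be collected into a small script. This is a sketch under stated assumptions: the private repository is nvcr.io/nvidian_sas, as in the example, and you have already authenticated to nvcr.io (for example with docker login nvcr.io). The build_tag_push function and the DRY_RUN switch are conventions of this sketch, not Docker features. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: build, tag, and push an image in one go (Example 1, steps 5-8).
# Assumes `docker login nvcr.io` has already succeeded. REGISTRY and
# DRY_RUN are hypothetical knobs of this script, not Docker options.
set -euo pipefail

REGISTRY=${REGISTRY:-nvcr.io/nvidian_sas}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_tag_push() {
  local name=$1 tag=${2:-latest}
  run docker build -t "$name:$tag" .
  run docker tag "$name:$tag" "$REGISTRY/$name:$tag"
  run docker push "$REGISTRY/$name:$tag"
}

# Usage, from the directory containing the Dockerfile:
#   build_tag_push test1 latest
```

Because set -e is active, a failed build stops the script before anything is tagged or pushed.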

8.1.3. Example 2: Customizing a Container using Dockerfile

This example uses a Dockerfile to customize the caffe container in nvcr.io. Before customizing the container, make sure the caffe 17.03 image has been pulled to your local system with the docker pull command:
$ docker pull nvcr.io/nvidia/caffe:17.03

As mentioned earlier in this document, the Docker containers on nvcr.io also include sample Dockerfiles that explain how to patch a framework and rebuild the Docker image. The directory /workspace/docker-examples contains two sample Dockerfiles. For this example, we will use the Dockerfile.customcaffe file as a template for customizing a container.

  1. Create a working directory called my_docker_images on your local hard drive.
  2. Open a text editor and create a file called Dockerfile. Save the file to your working directory.
  3. Open your Dockerfile again and include the following lines in the file:
    FROM nvcr.io/nvidia/caffe:17.03
    # Bring in changes from outside container to /tmp
    # (assumes my-caffe-modifications.patch is in the same directory as the Dockerfile)
    #COPY my-caffe-modifications.patch /tmp
    # Change working directory to NVCaffe source path
    WORKDIR /opt/caffe
    # Apply modifications
    #RUN patch -p1 < /tmp/my-caffe-modifications.patch
    # Note that the default workspace for caffe is /workspace
    RUN mkdir build && cd build && \
      cmake -DCUDA_ARCH_PTX="61" .. && \
      make -j"$(nproc)" install && \
      make clean && \
      cd .. && rm -rf build
    # Reset default working directory
    WORKDIR /workspace
    Save the file.
  4. Build the image using the docker build command and specify the repository name and tag. In the following example, the repository name is corp/caffe and the tag is 17.03.1PlusChanges. In this case, the command is:
    $ docker build -t corp/caffe:17.03.1PlusChanges .
  5. Run the Docker image using the nvidia-docker run command. For example:
    $ nvidia-docker run -ti --rm corp/caffe:17.03.1PlusChanges
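The Dockerfile above applies my-caffe-modifications.patch with patch -p1 from /opt/caffe. That means the paths in the patch must carry exactly one leading directory that patch can strip. One way to produce such a patch is sketched below. It assumes you keep an unmodified copy (caffe.orig) and an edited copy (caffe.new) of the NVIDIA Caffe source tree side by side under one parent directory. Both directory names and the make_patch function are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a patch that `patch -p1` can apply from /opt/caffe.
# Assumes an unmodified copy (caffe.orig) and an edited copy (caffe.new)
# of the NVIDIA Caffe source tree sit side by side under one parent
# directory; both directory names are hypothetical.
set -euo pipefail

make_patch() {
  local parent=$1 orig=$2 new=$3 out=$4
  (
    # Diff from the parent so paths read caffe.orig/src/...; patch -p1
    # strips that first component and finds src/... under /opt/caffe.
    cd "$parent"
    # diff exits 1 when the trees differ; only status 2 (trouble) is an error.
    diff -ruN "$orig" "$new" || [ $? -eq 1 ]
  ) > "$out"
}

# Usage: make_patch ~/src caffe.orig caffe.new my-caffe-modifications.patch
```

Place the resulting file next to the Dockerfile, then uncomment the COPY and RUN patch lines before building.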

8.1.4. Example 3: Customizing a Container using docker commit

This example uses the docker commit command to flush the current state of the container to a Docker image. This is not a recommended best practice; however, it is useful when you have a running container to which you have made changes that you want to save. In this example, we use the apt-get command to install packages, which requires running as root.
  • The Caffe image release 17.04 is used in the example instructions for illustrative purposes.
  • Do not use the --rm flag when running the container. If you use the --rm flag when running the container, your changes will be lost when exiting the container.
  1. Pull the Docker container from the nvcr.io repository to the DGX-1 system. For example, the following command will pull the Caffe container:
    $ docker pull nvcr.io/nvidia/caffe:17.04
  2. Run the container on the DGX-1 using nvidia-docker.
    $ nvidia-docker run -ti nvcr.io/nvidia/caffe:17.04
    == NVIDIA Caffe ==
    NVIDIA Release 17.04 (build 26740)
    Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be insufficient for NVIDIA Caffe.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
  3. You should now be the root user in the container (notice the prompt). You can use the command apt to pull down a package and put it in the container.
    Note: The NVIDIA containers are built using Ubuntu, which uses the apt-get package manager. Check the container release notes in the Deep Learning Documentation for details on the specific container you are using.
    In this example, we will install Octave, the GNU clone of MATLAB, into the container.
    # apt-get update
    # apt install octave
    Note: You must run apt-get update before installing octave with apt.
  4. Exit the container.
    # exit
  5. Display the list of containers using docker ps -a. As an example, here is some of the output from the docker ps -a command:
    $ docker ps -a
    CONTAINER ID    IMAGE                        CREATED       ...
    1fe228556a97    nvcr.io/nvidia/caffe:17.04   3 minutes ago ...
  6. Now you can create a new image from the container in which you installed Octave. The container has stopped, but it still exists because --rm was not used. Commit the container with the following command:
    $ docker commit 1fe228556a97 nvcr.io/nvidian_sas/caffe_octave:17.04
  7. Display the list of images.
    $ docker images
    REPOSITORY                 	TAG             	IMAGE ID     ...
    nvcr.io/nvidian_sas/caffe_octave   	17.04           	75211f8ec225 ...
  8. To verify, run a container from the new image and check that Octave is actually there.
    $ nvidia-docker run -ti nvcr.io/nvidian_sas/caffe_octave:17.04
    == NVIDIA Caffe ==
    NVIDIA Release 17.04 (build 26740)
    Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved. Copyright (c) 2014, 2015, The Regents of the University of California (Regents) All rights reserved.
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved. NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be insufficient for NVIDIA Caffe.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
    root@2fc3608ad9d8:/workspace# octave
    octave: X11 DISPLAY environment variable not set
    octave: disabling GUI features
    GNU Octave, version 4.0.0
    Copyright (C) 2015 John W. Eaton and others.
    This is free software; see the source code for copying conditions.
    FITNESS FOR A PARTICULAR PURPOSE.  For details, type 'warranty'.
    Octave was configured for "x86_64-pc-linux-gnu".
    Additional information about Octave is available at http://www.octave.org.
    Please contribute if you find this software useful.
    For more information, visit http://www.octave.org/get-involved.html
    Read http://www.octave.org/bugs.html to learn how to submit bug reports.
    For information about changes from previous versions, type 'news'.

    Because Octave starts and prints its banner, Octave is installed.

  9. If you want to save the image to your private repository (Docker calls this a “push”), use the docker push command:
    $ docker push nvcr.io/nvidian_sas/caffe_octave:17.04

The new Docker image is now available for use. You can check your local Docker repository for it.
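Because this guide recommends Dockerfiles over docker commit, the same customization can also be captured as a Dockerfile. The sketch below writes that Dockerfile with a heredoc so it can be checked into version control. The file name Dockerfile.octave and the function name are hypothetical. apt-get install needs -y here because docker build cannot answer interactive prompts.

```shell
#!/usr/bin/env bash
# Sketch: the Dockerfile equivalent of Example 3, written to a file.
# File and function names are hypothetical.
set -euo pipefail

write_octave_dockerfile() {
  # Quoted 'EOF' keeps the shell from expanding anything in the Dockerfile
  cat > "${1:-Dockerfile.octave}" <<'EOF'
FROM nvcr.io/nvidia/caffe:17.04
# Same package as the interactive session, now recorded in a file
# that can be version-controlled
RUN apt-get update && \
    apt-get install -y octave && \
    rm -rf /var/lib/apt/lists/*
EOF
}

# Usage (requires Docker on the DGX-1):
#   write_octave_dockerfile
#   docker build -f Dockerfile.octave -t nvcr.io/nvidian_sas/caffe_octave:17.04 .
```

Removing /var/lib/apt/lists/* in the same RUN instruction keeps the package index out of the image layer.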

8.1.5. Example 4: Developing a Container using Docker

There are two primary use cases for a developer to extend a container:
  1. Create a development image that contains all of the immutable dependencies for the project, but not the source code itself.
  2. Create a production or testing image that contains a fixed version of the source and all of the software dependencies.

The datasets are not packaged in the container image. Ideally, the container image is designed to expect volume mounts for datasets and results.

In these examples, we mount our local dataset from /raid/datasets on the host to /datasets as a read-only volume inside the container. We also mount a job-specific directory to capture the output from the current run.

In these examples, we will create a timestamped output directory on each container launch and map that into the container at /output. Using this method, the output for each successive container launch is captured and isolated.

Including the source in a container for developing and iterating on a model creates awkward challenges that can overcomplicate the entire workflow. For instance, if your source code is in the container, then your editor, version control software, dotfiles, and so on also need to be in the container.

However, if you create a development image that contains everything you need to run your source code, you can map your source code into the container to make use of your host workstation’s developer environment. For sharing a fixed version of a model, it is best to package a versioned copy of the source code and trained weights with the development environment.

As an example, we will work through a development and delivery example for the open source implementation of the work found in Image-to-Image Translation with Conditional Adversarial Networks by Isola et al., which is available at pix2pix. Pix2Pix is a Torch implementation for learning a mapping from input images to output images using a Conditional Adversarial Network. Since online projects can change over time, we will focus on the snapshot version d7e7b8b557229e75140cbe42b7f5dbf85a67d097 change-set.

In this section, we are using the container as a virtual environment, in that the container has all the programs and libraries needed for our project.
Note: We have kept the network definition and training script separate from the container image. This is a useful model for iterative development because the files that are actively being worked on are persistent on the host and only mapped into the container at runtime.

The differences from the original project can be found in Comparing changes.

If the machine you are developing on is not the same machine on which you will be running long training sessions, then you may want to package your current development state in the container.

  1. Create a working directory on your local hard-drive.
    $ mkdir -p ~/Projects
    $ cd ~/Projects
  2. Git clone the Pix2Pix Git repository.
    $ git clone https://github.com/phillipi/pix2pix.git
    $ cd pix2pix
  3. Run the git checkout command.
    $ git checkout -b devel d7e7b8b557229e75140cbe42b7f5dbf85a67d097
  4. Download the dataset:
    $ bash ./datasets/download_dataset.sh facades
    Move the dataset to the fast /raid storage:
    $ mkdir -p /raid/datasets
    $ mv ./datasets/facades /raid/datasets
  5. Create a file called Dockerfile, and add the following lines:
    FROM nvcr.io/nvidia/torch:17.03
    RUN luarocks install nngraph
    RUN luarocks install 
    WORKDIR /source
  6. Build the development Docker container image (build-devel.sh).
    docker build -t nv/pix2pix-torch:devel .
  7. Create the following train.sh script:
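    #!/bin/bash -x
    # Defaults match the facades dataset mounted at /datasets
    DATASET=${DATASET:-facades}
    DATA_ROOT=${DATA_ROOT:-/datasets/$DATASET}
    DATA_ROOT=$DATA_ROOT name="${DATASET}_generation" \
      which_direction=BtoA th train.lua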

    If you were actually developing this model, you would be iterating by making changes to the files on the host and running the training script which executes inside the container.

  8. Optional: Edit the files and execute the next step after each change.
  9. Run the training script (run-devel.sh). Make sure train.sh is executable first (chmod +x train.sh).
    nvidia-docker run --rm -ti -v "$PWD:/source" \
      -v /raid/datasets:/datasets:ro nv/pix2pix-torch:devel ./train.sh
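The job-specific, timestamped output directory described at the start of this example is not part of run-devel.sh. One way to add it is sketched below. The results root /raid/results, the IMAGE and RESULTS_ROOT variables, the launch_run function, and the DRY_RUN switch are all hypothetical. The sketch also assumes that the training script writes its results under /output.

```shell
#!/usr/bin/env bash
# Sketch: run-devel.sh extended with a per-launch, timestamped output
# directory mounted at /output. RESULTS_ROOT, IMAGE, and DRY_RUN are
# hypothetical knobs; DRY_RUN=1 prints the command instead of running it.
set -euo pipefail

RESULTS_ROOT=${RESULTS_ROOT:-/raid/results}
IMAGE=${IMAGE:-nv/pix2pix-torch:devel}

launch_run() {
  # One directory per launch, for example /raid/results/20170523-175725
  local outdir
  outdir="$RESULTS_ROOT/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$outdir"

  # Source from the host, datasets read-only, results in the new directory
  local -a cmd=(nvidia-docker run --rm -ti
    -v "$PWD:/source"
    -v /raid/datasets:/datasets:ro
    -v "$outdir:/output"
    "$IMAGE" "$@")

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage: launch_run ./train.sh
```

If two runs can start within the same second, they would share a directory. In that case, a finer-grained timestamp or mktemp -d keeps them apart.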

Example 4.1: Package the Source into the Container

Packaging the model definition and script into the container is very simple. We simply add a COPY step to the Dockerfile.

The run script changes only slightly: drop the volume mount of the local source directory so that the container uses the source packaged inside the image. The packaged container image is now much more portable than the devel image because the code inside it is fixed. It is good practice to version this container image with a specific tag and store it in a container registry.
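A sketch of the run and publishing steps described above, assuming the packaged image was built and tagged nv/pix2pix-torch:v1 and that registry.example.com/myteam stands in for your own registry (both names are hypothetical). The function names run_packaged and publish_packaged are also hypothetical. Set DRY_RUN to any non-empty value to print the commands instead of running them.

```shell
# Hypothetical names: substitute your own image tag and registry path.
PACKAGED_IMAGE="nv/pix2pix-torch:v1"
REGISTRY="registry.example.com/myteam"

run_packaged() {
  # Unlike run-devel.sh there is no -v "$PWD":/source mount: the container
  # runs the source baked into the image. Only the datasets come from the host.
  ${DRY_RUN:+echo} nvidia-docker run --rm -ti \
    -v /raid/datasets:/datasets \
    "$PACKAGED_IMAGE" ./train.sh
}

publish_packaged() {
  # Re-tag the versioned image under the registry path, then push it.
  # ${PACKAGED_IMAGE#*/} strips the leading "nv/" namespace.
  target="$REGISTRY/${PACKAGED_IMAGE#*/}"
  ${DRY_RUN:+echo} docker tag "$PACKAGED_IMAGE" "$target"
  ${DRY_RUN:+echo} docker push "$target"
}
```

Keeping a fixed version tag such as v1, rather than reusing a floating tag, makes it possible to reproduce a training run later from exactly the same image.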

8.2. Customizing a Framework

Each Docker image contains the code required to build the framework so that you can make changes to the framework itself. In each image, the framework source is located in the /workspace directory.

8.2.1. Benefits and Limitations to Customizing a Framework

Customizing a framework is useful if you want to apply patches or modifications to the framework that are not in the NVIDIA repository, such as a special-purpose patch of your own.

8.2.2. Example 1: Customizing a Framework

This example illustrates how you can customize a framework and rebuild the container. For this example, we will use the Caffe 17.03 container.

Currently, the Caffe framework prints the following message when a network layer is created:
“Created Layer”
For example, you can see this output by running the following commands from a bash shell in a Caffe 17.03 container.
# which caffe
# caffe time --model /workspace/models/bvlc_alexnet/deploy.prototxt
I0523 17:57:25.603410 41 net.cpp:161] Created Layer data (0)
I0523 17:57:25.603426 41 net.cpp:501] data -> data
I0523 17:57:25.604748 41 net.cpp:216] Setting up data

The following steps show you how to change the message “Created Layer” in Caffe to “Just Created Layer”. This example illustrates how you might modify an existing framework.

Ensure you run the framework container in interactive mode.

  1. Pull the Caffe 17.03 container image from the nvcr.io registry.
    $ docker pull nvcr.io/nvidia/caffe:17.03
  2. Run the container on the DGX-1.
    $ nvidia-docker run --rm -ti nvcr.io/nvidia/caffe:17.03
    Note: This will make you the root user in the container. Notice the change in the prompt.
  3. Edit the NVIDIA Caffe source file /opt/caffe/src/caffe/net.cpp. The line you want to change is around line 162.
    # vi /opt/caffe/src/caffe/net.cpp
    :162 s/Created Layer/Just Created Layer/
    :wq
    Note: This uses vi. The idea is to change “Created Layer” to “Just Created Layer”.
  4. Rebuild and reinstall Caffe:
    # cd /opt/caffe
    61" -DCUDA_ARCH_PTX="61" ..
    # make -j"$(nproc)"
    # make install
    # ldconfig
  5. Before running the updated Caffe framework, ensure that the caffe binary on your PATH is the rebuilt one, for example, the one installed under /usr/local/.
    # which caffe
  6. Run Caffe and look for the changed message in its output:
    # caffe time --model /workspace/models/bvlc_alexnet/deploy.prototxt
    I0523 18:29:06.942697  7795 net.cpp:161] Just Created Layer data (0)
    I0523 18:29:06.942711  7795 net.cpp:501] data -> data
    I0523 18:29:06.944180  7795 net.cpp:216] Setting up data
  7. Save your container to your private DGX registry (see Example 2: Customizing a Container using Dockerfile for an example).
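If you prefer to script steps 3 and 6 instead of editing in vi and reading the log by eye, the following is a minimal sketch. It matches the message text rather than relying on line 162, and it skips lines that already read “Just Created Layer”, so running it twice is harmless. The function names patch_layer_message and count_patched_messages are hypothetical. The sketch also assumes GNU sed's -i option, as found in typical Linux containers.

```shell
# Replace "Created Layer" with "Just Created Layer" in net.cpp, skipping lines
# that already contain the new text (so the edit is idempotent).
patch_layer_message() {
  sed -i '/Just Created Layer/!s/Created Layer/Just Created Layer/' \
    "${1:-/opt/caffe/src/caffe/net.cpp}"
}

# Read Caffe log output on stdin and print how many lines carry the new message.
count_patched_messages() {
  grep -c 'Just Created Layer'
}
```

Inside the container, run `patch_layer_message` before rebuilding. After rebuilding, run `caffe time --model /workspace/models/bvlc_alexnet/deploy.prototxt 2>&1 | count_patched_messages`; a non-zero count indicates that the rebuilt binary is in use. The `2>&1` is included because Caffe logs through glog, which may write to stderr rather than stdout.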

9. Troubleshooting

For more information about NVIDIA-Docker containers, see the NVIDIA-Docker GitHub site and the NVIDIA-Docker blog.

For deep learning frameworks release notes and additional product documentation, see the Deep Learning Documentation website: Release Notes for Deep Learning Frameworks.





NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.


NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, Jetson, Kepler, NVIDIA Maxwell, NCCL, NVLink, Pascal, Tegra, TensorRT, and Tesla are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.