



NVIDIA Docker images come prepackaged, tuned, and ready to run; however, you may want to build a new image from scratch or augment an existing image with custom code, libraries, data, or settings for your corporate infrastructure. This section will guide you through exercises that will highlight how to create a container from scratch, customize a container, extend a deep learning framework to add features, develop some code using that extended framework from the developer environment, then package that code as a versioned release.

By default, you do not need to build a container. The NGC container registry, nvcr.io, has a number of containers that can be used immediately. These include containers for deep learning, scientific computing and visualization, as well as containers with just the CUDA Toolkit.

One of the great things about containers is that they can be used as starting points for creating new containers. This can be referred to as “customizing” or “extending” a container. You can create a container completely from scratch; however, since these containers are likely to run on a GPU system, it is recommended that you at least start with an nvcr.io container that contains the OS and CUDA. You are not limited to this, and can create a container that runs only on the CPUs in the system and does not use the GPUs. In this case, you can start with a bare OS container from Docker. To make development easier, you can still start with a container with CUDA; it is simply not used when the container runs.

In the case of DGX systems, you can push or save your modified/extended containers to the NGC container registry, nvcr.io. They can also be shared with other users of the DGX system, but this requires some administrator help.

It is important to note that all deep learning framework images include the source to build the framework itself as well as all of the prerequisites.

Attention: Do not install an NVIDIA driver into the Docker image at Docker build time.

NVIDIA provides a large set of images in the NGC container registry that are already tested, tuned, and are ready to run. You can pull any one of these images to create a container and add software or data of your choosing.

A best practice is to avoid using docker commit for developing new Docker images, and to use Dockerfiles instead. The Dockerfile method makes changes visible and lets you version-control them efficiently during development of a Docker image. The docker commit method is appropriate for short-lived, disposable images only (see Example 3: Customizing A Container Using docker commit for an example).

For more information on writing a Dockerfile, see the best practices documentation.



You can customize a container to fit your specific needs for numerous reasons; for example, you depend upon specific software that is not included in the container that NVIDIA provides. No matter your reasons, you can customize a container.

The container images do not contain sample data-sets or sample model definitions unless they are included with the framework source. Be sure to check the container for sample data-sets or models.

About this task

Docker uses Dockerfiles to create or build a Docker image. Dockerfiles are scripts that contain commands that Docker uses successively to create a new Docker image. Simply put, a Dockerfile is the source code for the container image. Dockerfiles always start with a base image to inherit from even if you are just using a base OS.

For best practices on writing Dockerfiles, see Best practices for writing Dockerfiles.

As an example, let’s create a container from a Dockerfile that uses Ubuntu 20.04 as a base OS. Let’s also update the OS when we create our container.



Procedure

1. Create a working directory on your local hard drive.

2. In that directory, open a text editor and create a file called Dockerfile. Save the file to your working directory.

3. Open your Dockerfile and include the following:

    FROM ubuntu:20.04
    RUN apt-get update && apt-get install -y curl
    CMD echo "hello from inside a container"

   The last line, CMD, specifies the command that is executed when a container is started from the image. This is a way to check that the container was built correctly. In this example, we are also pulling the base image from the Docker repository and not the NGC repository. There will be subsequent examples using the NVIDIA® repository.

4. Save and close your Dockerfile.

5. Build the image. Issue the following command to build the image and create a tag.

    $ docker build -t <new_image_name>:<new_tag> .

   Note: This command was issued in the same directory where the Dockerfile is located.

   The output from the docker build process lists "Steps", one for each line in the Dockerfile.

   For example, let's name the container test1 and tag it with latest. Also, for illustrative purposes, let's assume our private DGX system repository is called nvidian_sas (the exact name depends upon how you registered the DGX. This is typically the company name in some fashion.) The command below builds the container. Some of the output is shown below so you know what to expect.

    $ docker build -t test1:latest .
    Sending build context to Docker daemon 8.012 kB
    Step 1/3 : FROM ubuntu:20.04
    20.04: Pulling from library/ubuntu
    ...
    Step 2/3 : RUN apt-get update && apt-get install -y curl
    ...
    Step 3/3 : CMD echo "hello from inside a container"
     ---> Running in 1f391b9285d8
     ---> 934785072daf
    Removing intermediate container 1f391b9285d8
    Successfully built 934785072daf

   For information about building your image, see docker build. For information about tagging your image, see docker tag.

6. Verify that the build was successful. You should see a message similar to the following:

    Successfully built 934785072daf

   This message indicates that the build was successful. Any other final message indicates that the build was not successful.

   Note: The number, 934785072daf, is the image ID assigned when the image is built; the ID of your image will differ.

7. Confirm you can view your image. Issue the following command to view your container.

    $ docker images
    REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
    test1        latest   934785072daf   19 minutes ago   222 MB

   The new container is now available to be used.

   Note: The container is local to this DGX system. If you want to store the container in your private repository, follow the next step.

   Note: You need to have a DGX system to do this.

8. Store the container in your private Docker repository by pushing it.

   a. The first step in pushing it is to tag it.

    $ docker tag test1 nvcr.io/nvidian_sas/test1:latest

   b. Now that the image has been tagged, you can push it, for example, to a private project on nvcr.io named nvidian_sas.

    $ docker push nvcr.io/nvidian_sas/test1:latest
    The push refers to a repository [nvcr.io/nvidian_sas/test1]
    …

   c. Verify that the container appears in the nvidian_sas repository.
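The tag-and-push steps above can be sketched as a small script. This is only a minimal sketch, assuming the nvcr.io registry and the illustrative nvidian_sas project from the example; the registry_ref function and the DRY_RUN variable are hypothetical names introduced here, not part of Docker or NGC. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: compose a registry-qualified image reference.
registry_ref() {
    # $1 = registry host, $2 = project/namespace, $3 = image name, $4 = tag
    printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Values taken from the example above; adjust them for your own registry project.
REGISTRY="nvcr.io"
PROJECT="nvidian_sas"
IMAGE="test1"
TAG="latest"

TARGET="$(registry_ref "$REGISTRY" "$PROJECT" "$IMAGE" "$TAG")"

# DRY_RUN is an assumption of this sketch; it defaults to 1 so nothing is pushed by accident.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker tag $IMAGE:$TAG $TARGET"
    echo "docker push $TARGET"
else
    docker tag "$IMAGE:$TAG" "$TARGET" && docker push "$TARGET"
fi
```

Running the script with DRY_RUN=0 on a system that is logged in to the registry would perform the actual tag and push.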

About this task

This example uses a Dockerfile to customize the PyTorch container in nvcr.io. Before customizing the container, ensure that the PyTorch 21.02 container has been pulled from the registry to your local system using the docker pull command.

    $ docker pull nvcr.io/nvidia/pytorch:21.02-py3

As mentioned earlier in this document, the Docker containers on nvcr.io also provide a sample Dockerfile that explains how to patch a framework and rebuild the Docker image. In the directory /workspace/docker-examples , there are two sample Dockerfiles. For this example, we will use the Dockerfile.customcaffe file as a template for customizing a container.



Procedure

1. Create a working directory called my_docker_images on your local hard drive.

2. Open a text editor and create a file called Dockerfile. Save the file to your working directory.

3. Open your Dockerfile again and include the following lines in the file:

    FROM nvcr.io/nvidia/pytorch:21.02-py3

    # APPLY CUSTOMER PATCHES TO PYTORCH

    # Bring in changes from outside container to /tmp
    # (assumes my-pytorch-modifications.patch is in same directory as Dockerfile)
    #COPY my-pytorch-modifications.patch /tmp

    # Change working directory to PyTorch source path
    WORKDIR /opt/pytorch

    # Apply modifications
    #RUN patch -p1 < /tmp/my-pytorch-modifications.patch

    # Note that the default workspace for caffe is /workspace
    RUN mkdir build && cd build && \
      cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local -DUSE_NCCL=ON -DUSE_CUDNN=ON -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="35 52 60 61" -DCUDA_ARCH_PTX="61" .. && \
      make -j"$(nproc)" install && \
      make clean && \
      cd .. && rm -rf build

    # Reset default working directory
    WORKDIR /workspace

4. Save the file.

5. Build the image using the docker build command and specify the repository name and tag. In the following example, the repository name is corp/pytorch and the tag is 21.02.1PlusChanges. For this case, the command would be the following:

    $ docker build -t corp/pytorch:21.02.1PlusChanges .

6. Run the Docker image.

    $ docker run --gpus all -ti --rm corp/pytorch:21.02.1PlusChanges

About this task

This example uses the docker commit command to flush the current state of the container to a Docker image. This is not a recommended best practice; however, it is useful when you have made changes to a running container and want to save them. In this example, we use the apt-get command to install packages, which requires that the user run as root.

Note: The NVCaffe image release 17.04 is used in the example instructions for illustrative purposes.

Do not use the --rm flag when running the container. If you use the --rm flag when running the container, your changes will be lost when exiting the container.

Procedure

1. Pull the Docker container from the nvcr.io repository to the DGX system. For example, the following command will pull the NVCaffe container:

    $ docker pull nvcr.io/nvidia/caffe:17.04

2. Run the container on the DGX system.

    $ docker run --gpus all -ti nvcr.io/nvidia/caffe:17.04

    ==================
    == NVIDIA Caffe ==
    ==================

    NVIDIA Release 17.04 (build 26740)

    Container image Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.

    Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

    NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
    insufficient for NVIDIA Caffe. NVIDIA recommends the use of the following flags:
    docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

    root@1fe228556a97:/workspace#

3. You should now be the root user in the container (notice the prompt). You can use the apt command to pull down a package and put it in the container.

   Note: The NVIDIA containers are built using Ubuntu, which uses the apt-get package manager. Check the container release notes in the Deep Learning Documentation for details on the specific container you are using.

   In this example, we will install Octave, the GNU clone of MATLAB, into the container.

    # apt-get update
    # apt install octave

   Note: You have to first issue apt-get update before you install Octave using apt.

4. Exit the workspace.

    # exit

5. Display the list of containers using docker ps -a. As an example, here is a snippet of output from the docker ps -a command:

    $ docker ps -a
    CONTAINER ID    IMAGE                        CREATED         ...
    1fe228556a97    nvcr.io/nvidia/caffe:17.04   3 minutes ago   ...

6. Now you can create a new image from the container in which you installed Octave. You can commit the container with the following command.

    $ docker commit 1fe228556a97 nvcr.io/nvidian_sas/caffe_octave:17.04
    sha256:0248470f46e22af7e6cd90b65fdee6b4c6362d08779a0bc84f45de53a6ce9294

7. Display the list of images.

    $ docker images
    REPOSITORY                         TAG     IMAGE ID       ...
    nvcr.io/nvidian_sas/caffe_octave   17.04   75211f8ec225   ...

8. To verify, let's run the container again and see if Octave is actually there.

   Note: This only works for the DGX-1 and the DGX Station.

    $ docker run --gpus all -ti nvcr.io/nvidian_sas/caffe_octave:17.04

    ==================
    == NVIDIA Caffe ==
    ==================

    NVIDIA Release 17.04 (build 26740)

    Container image Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.

    Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

    NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
    insufficient for NVIDIA Caffe. NVIDIA recommends the use of the following flags:
    docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

    root@2fc3608ad9d8:/workspace# octave
    octave: X11 DISPLAY environment variable not set
    octave: disabling GUI features
    GNU Octave, version 4.0.0
    Copyright (C) 2015 John W. Eaton and others.
    This is free software; see the source code for copying conditions.
    There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE. For details, type 'warranty'.

    Octave was configured for "x86_64-pc-linux-gnu".

    Additional information about Octave is available at http://www.octave.org.

    Please contribute if you find this software useful.
    For more information, visit http://www.octave.org/get-involved.html

    Read http://www.octave.org/bugs.html to learn how to submit bug reports.
    For information about changes from previous versions, type 'news'.

    octave:1>

   Since the Octave prompt is displayed, Octave is installed.

9. If you want to save the container into your private repository (Docker uses the phrase “push”), then you can use the command docker push ... .

    $ docker push nvcr.io/nvidian_sas/caffe_octave:17.04

Results

The new Docker image is now available for use. You can check your local Docker repository for it.

About this task

There are two primary use cases for a developer to extend a container:

- Create a development image that contains all of the immutable dependencies for the project, but not the source code itself.
- Create a production or testing image that contains a fixed version of the source and all of the software dependencies.

The datasets are not packaged in the container image. Ideally, the container image is designed to expect volume mounts for datasets and results.

In these examples, we mount our local dataset from /raid/datasets on our host to /datasets as a read-only volume inside the container. We also mount a job-specific directory to capture the output from the current run.

In these examples, we will create a timestamped output directory on each container launch and map that into the container at /output . Using this method, the output for each successive container launch is captured and isolated.
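The mounting scheme described above can be sketched as follows. This is a minimal sketch under stated assumptions: OUTPUT_ROOT and RUN_ID are hypothetical names, the timestamp format is an arbitrary choice, the dataset is mounted at /datasets (the path the training script in this example reads from), and nv/pix2pix-torch:devel is the development image built later in this example. The script only prints the docker run command, so it can be tried on a machine without GPUs.

```shell
#!/bin/sh
# Host locations: DATASET_DIR follows the text above, OUTPUT_ROOT is a hypothetical choice.
DATASET_DIR="${DATASET_DIR:-/raid/datasets}"
OUTPUT_ROOT="${OUTPUT_ROOT:-${TMPDIR:-/tmp}/pix2pix-output}"

# One directory per launch, named by launch time, so runs never overwrite each other.
RUN_ID="$(date +%Y%m%d-%H%M%S)"
OUTPUT_DIR="$OUTPUT_ROOT/$RUN_ID"
mkdir -p "$OUTPUT_DIR"

# ":ro" makes the dataset mount read-only inside the container;
# the output directory is mounted read-write at /output.
RUN_CMD="docker run --gpus all --rm -ti -v $PWD:/source -v $DATASET_DIR:/datasets:ro -v $OUTPUT_DIR:/output nv/pix2pix-torch:devel ./train.sh"

# Print rather than execute, so the sketch is safe to try anywhere.
echo "$RUN_CMD"
```

Because each launch gets its own directory under OUTPUT_ROOT, results from earlier runs stay untouched and can be compared later.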

Including the source in a container for developing and iterating on a model has many challenges that can overcomplicate the entire workflow. For instance, if your source code is in the container, then your editor, version control software, dotfiles, etc. also need to be in the container.

However, if you create a development image that contains everything you need to run your source code, you can map your source code into the container to make use of your host workstation’s developer environment. For sharing a fixed version of a model, it is best to package a versioned copy of the source code and trained weights with the development environment.

As an example, we will work through a development and delivery example for the open source implementation of the work found in Image-to-Image Translation with Conditional Adversarial Networks by Isola et al., which is available at pix2pix. Pix2Pix is a Torch implementation for learning a mapping from input images to output images using a Conditional Adversarial Network. Since online projects can change over time, we will focus our attention on the snapshot at change-set d7e7b8b557229e75140cbe42b7f5dbf85a67d097.

In this section, we are using the container as a virtual environment, in that the container has all the programs and libraries needed for our project.

Note: We have kept the network definition and training script separate from the container image. This is a useful model for iterative development because the files that are actively being worked on are persistent on the host and only mapped into the container at runtime.

The differences from the original project can be found in Comparing changes.

If the machine you are developing on is not the same machine on which you will be running long training sessions, then you may want to package your current development state in the container.



Procedure

1. Create a working directory on your local hard drive.

    $ mkdir ~/Projects
    $ cd ~/Projects

2. Git clone the Pix2Pix Git repository.

    $ git clone https://github.com/phillipi/pix2pix.git
    $ cd pix2pix

3. Run the git checkout command.

    $ git checkout -b devel d7e7b8b557229e75140cbe42b7f5dbf85a67d097

4. Download the dataset. In this example, the dataset is then moved to the fast /raid storage.

    $ bash ./datasets/download_dataset.sh facades
    $ mkdir -p /raid/datasets
    $ mv ./datasets/facades /raid/datasets

5. Create a file called Dockerfile and add the following lines:

    FROM nvcr.io/nvidia/torch:17.03
    RUN luarocks install nngraph
    RUN luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
    WORKDIR /source

6. Build the development Docker container image (build-devel.sh).

    $ docker build -t nv/pix2pix-torch:devel .

7. Create the following train.sh script:

    #!/bin/bash -x
    ROOT="${ROOT:-/source}"
    DATASET="${DATASET:-facades}"
    DATA_ROOT="${DATA_ROOT:-/datasets/$DATASET}"
    DATA_ROOT=$DATA_ROOT name="${DATASET}_generation" which_direction=BtoA th train.lua

   If you were actually developing this model, you would be iterating by making changes to the files on the host and running the training script, which executes inside the container.

8. Optional: Edit the files and execute the next step after each change.

9. Run the training script (run-devel.sh).

    $ docker run --gpus all --rm -ti -v $PWD:/source -v /raid/datasets:/datasets nv/pix2pix-torch:devel ./train.sh

About this task

Packaging the model definition and script into the container is very simple. We simply add a COPY step to the Dockerfile.

We’ve updated the run script to simply drop the volume mounting and use the source packaged in the container. The packaged container is now much more portable than our devel container image because the internal code is fixed. It would be good practice to version control this container image with a specific tag and store it in a container registry.

The updates to run the container are equally subtle. We simply drop the volume mounting of our local source into the container.
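The packaging change described above can be sketched as follows. This is only an illustration, not the project's actual release files: the file name Dockerfile.release, the BUILD_DIR variable, and the 1.0 tag are hypothetical, and the base image and luarocks lines are assumed to match the development Dockerfile from the previous example. The script writes the Dockerfile and prints the build and run commands instead of executing them.

```shell
#!/bin/sh
# BUILD_DIR is a hypothetical scratch location for the generated Dockerfile.
BUILD_DIR="${BUILD_DIR:-$(mktemp -d)}"

# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > "$BUILD_DIR/Dockerfile.release" <<'EOF'
FROM nvcr.io/nvidia/torch:17.03
RUN luarocks install nngraph
RUN luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
WORKDIR /source
# Package the model definition and training script into the image.
COPY . /source
EOF

# Versioned tag (1.0 is an assumed example); run from the project directory so "." is the source.
echo "docker build -t nv/pix2pix-torch:1.0 -f $BUILD_DIR/Dockerfile.release ."

# Same run command as before, minus the -v \$PWD:/source mount of the local source.
echo "docker run --gpus all --rm -ti -v /raid/datasets:/datasets nv/pix2pix-torch:1.0 ./train.sh"
```

The only differences from the development workflow are the added COPY line and the dropped source mount, which is what makes the resulting image self-contained.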

Each Docker image contains the code required to build the framework so that you can make changes to the framework itself. The location of the framework source in each image is in the /workspace directory.

For specific directory locations, see the Deep Learning Framework Release Notes for your specific framework.



Customizing a framework is useful if you have patches or modifications you want to make to the framework outside of the NVIDIA repository or if you have a special patch that you want to add to the framework.

About this task

This Dockerfile example illustrates a method to apply patches to the source code in the NVCaffe container image and to rebuild NVCaffe. The RUN command included below will rebuild NVCaffe in the same way as it was built in the original image.

By applying customizations through a Dockerfile and docker build in this manner rather than modifying the container interactively, it will be straightforward to apply the same changes to later versions of the NVCaffe container image.

For more information, see Dockerfile reference.



Procedure

1. Create a working directory for the Dockerfile.

    $ mkdir docker
    $ cd docker

2. Open a text editor and create a file called Dockerfile and add the following lines:

    FROM nvcr.io/nvidia/caffe:17.04
    RUN apt-get update && apt-get install -y bc

3. Bring in changes from outside the container to /tmp.

   Note: This assumes my-caffe-modifications.patch is in the same directory as the Dockerfile.

    COPY my-caffe-modifications.patch /tmp

4. Change your working directory to the NVCaffe source path.

    WORKDIR /opt/caffe

5. Apply your modifications.

    RUN patch -p1 < /tmp/my-caffe-modifications.patch

6. Rebuild NVCaffe.

    RUN mkdir build && cd build && \
      cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local -DUSE_NCCL=ON -DUSE_CUDNN=ON \
        -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="35 52 60 61" -DCUDA_ARCH_PTX="61" .. && \
      make -j"$(nproc)" install && \
      make clean && \
      cd .. && rm -rf build

7. Reset the default working directory.

    WORKDIR /workspace

About this task

This example illustrates how you can customize a framework and rebuild the container. For this example, we will use the NVCaffe 17.03 framework.

Currently, the NVCaffe framework returns the following output message to stdout when a network layer is created:

    “Created Layer”

For example, you can see this output by running the following command from a bash shell in an NVCaffe 17.03 container.

    # which caffe
    /usr/local/bin/caffe
    # caffe time --model /workspace/models/bvlc_alexnet/deploy.prototxt --gpu=0
    …
    I0523 17:57:25.603410 41 net.cpp:161] Created Layer data (0)
    I0523 17:57:25.603426 41 net.cpp:501] data -> data
    I0523 17:57:25.604748 41 net.cpp:216] Setting up data
    …

The following steps show you how to change the message “Created Layer” in NVCaffe to “Just Created Layer” . This example illustrates how you might modify an existing framework.



Before you begin

Ensure you run the framework container in interactive mode.

Procedure

1. Pull the NVCaffe 17.03 container from the nvcr.io repository.

    $ docker pull nvcr.io/nvidia/caffe:17.03

2. Run the container on the DGX system.

    $ docker run --gpus all --rm -ti nvcr.io/nvidia/caffe:17.03

   Note: This will make you the root user in the container. Notice the change in the prompt.

3. Edit the NVCaffe source file /opt/caffe/src/caffe/net.cpp. The line you want to change is around line 162.

    # vi /opt/caffe/src/caffe/net.cpp
    :162 s/Created Layer/Just Created Layer

   Note: This uses vi. Change “Created Layer” to “Just Created Layer”.

4. Rebuild NVCaffe.

    # cd /opt/caffe
    # mkdir build && cd build
    # cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local -DUSE_NCCL=ON -DUSE_CUDNN=ON -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="35 52 60 61" -DCUDA_ARCH_PTX="61" ..
    # make -j"$(nproc)" install
    # make install
    # ldconfig

5. Before running the updated NVCaffe framework, ensure the updated NVCaffe binary is in the correct location, for example, /usr/local/.

    # which caffe
    /usr/local/bin/caffe

6. Run NVCaffe and look for a change in the output to stdout:

    # caffe time --model /workspace/models/bvlc_alexnet/deploy.prototxt --gpu=0
    /usr/local/bin/caffe
    …
    I0523 18:29:06.942697 7795 net.cpp:161] Just Created Layer data (0)
    I0523 18:29:06.942711 7795 net.cpp:501] data -> data
    I0523 18:29:06.944180 7795 net.cpp:216] Setting up data
    ...

7. Save your container to your private DGX repository on nvcr.io or your private Docker repository (see Example 2: Customizing A Container Using Dockerfile for an example).

The Docker container format using layers was specifically designed to limit the amount of data that would need to be transferred when a container image is instantiated. When a Docker container image is instantiated or “pulled” from a repository, Docker may need to copy the layers from the repository to the local host. It checks which layers it already has on the host using the hash for each layer. If a layer is already on the local host, Docker won’t “re-download” it, saving time and, to a smaller degree, network usage.

This is particularly useful for NVIDIA’s NGC because all the containers are built with the same base OS and libraries. If you run one container image from NGC, then run another, it is likely that many of the layers from the first container are used in the second container, reducing the time to pull down the second container image so the container can be started quickly.

You can put almost anything you want into a container, allowing users or container developers to create very large (GB+) containers. Even though it is not recommended to put data in your Docker container image, users and developers do this (there are some good reasons). This can further inflate the size of the container image, which increases the amount of time needed to download a container image or its various layers. Users and developers are now asking for ways to reduce the size of the container image or the individual layers.

The following subsections present some options that you can use if the container image or the layer sizes are too large or you want them smaller. There is no single option that works best, so be sure to try them on your container images.



In a Dockerfile, using one line for each RUN command is very convenient. The code is easy to read since you can see each command. However, Docker will create a layer for each command. Each layer keeps some information (metadata) about its origins, when the layer was created, what is contained in the layer, and a hash for each layer. If you have a large number of commands, you are going to have a large amount of metadata.

A simple way to reduce the size of the container image is to put all of the RUN commands that you can into a single RUN statement. This may result in a very large RUN command; however, it greatly reduces the amount of metadata. It is recommended that you group as many RUN commands together as possible. Depending upon your Dockerfile, you may not be able to put all RUN commands into a single RUN statement. Do your best to reduce the number of RUN commands, but keep the grouping logical.

Below is a simple Dockerfile example used to build a container image.

    $ cat Dockerfile
    FROM ubuntu:20.04
    RUN date > /build-info.txt
    RUN uname -r >> /build-info.txt

Notice there are two RUN commands in this simple Dockerfile. The container image can be built using the following command and associated output.

    $ docker build -t first-image -f Dockerfile .
    …
    Step 2/3 : RUN date > /build-info.txt
     ---> Using cache
     ---> af12c4b34f91
    Step 3/3 : RUN uname -r >> /build-info.txt
     ---> Running in 0f883f37e3c8
    …

Notice that the RUN commands each created a layer in the container image.

Let’s examine the container image for details on the layers.

    $ docker run --rm -it first-image cat /build-info.txt
    Mon Jan 18 10:14:02 UTC 2021
    5.5.115-1.el7.elrepo.x86_64

    $ docker history first-image
    IMAGE          CREATED          CREATED BY                                      SIZE
    d2c03aa61290   11 seconds ago   /bin/sh -c uname -r >> /build-info.txt          57B
    af12c4b34f91   16 minutes ago   /bin/sh -c date > /build-info.txt               29B
    5e8b97a2a082   6 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0B
    <missing>      6 weeks ago      /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
    <missing>      6 weeks ago      /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$…   2.76kB
    <missing>      6 weeks ago      /bin/sh -c rm -rf /var/lib/apt/lists/*          0B
    <missing>      6 weeks ago      /bin/sh -c set -xe && echo '#!/bin/sh' > /…     745B
    <missing>      6 weeks ago      /bin/sh -c #(nop) ADD file:d37ff24540ea7700d…   114MB

The output of this command gives you information about each of the layers. Notice that there is a layer for each RUN command.

Now, let’s take the Dockerfile and combine the two RUN commands.

    $ cat Dockerfile
    FROM ubuntu:20.04
    RUN date > /build-info.txt && uname -r >> /build-info.txt

    $ docker build -t one-layer -f Dockerfile .

    $ docker history one-layer
    IMAGE          CREATED         CREATED BY                                      SIZE
    3b1ef5bc19b2   6 seconds ago   /bin/sh -c date > /build-info.txt && uname -…   57B
    5e8b97a2a082   6 weeks ago     /bin/sh -c #(nop) CMD ["/bin/bash"]             0B
    <missing>      6 weeks ago     /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
    <missing>      6 weeks ago     /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$…   2.76kB
    <missing>      6 weeks ago     /bin/sh -c rm -rf /var/lib/apt/lists/*          0B
    <missing>      6 weeks ago     /bin/sh -c set -xe && echo '#!/bin/sh' > /…     745B
    <missing>      6 weeks ago     /bin/sh -c #(nop) ADD file:d37ff24540ea7700d…   114MB

Notice that there is now only one layer that has both RUN commands included.
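As a rough way to spot Dockerfiles that would benefit from this kind of grouping, you can count their RUN instructions before building. The sketch below is an approximation under stated assumptions: count_run is a hypothetical helper, it only matches lines that begin with RUN (so it ignores other layer-creating instructions such as COPY and ADD), and the two Dockerfiles are written to a temporary directory purely for illustration.

```shell
#!/bin/sh
# Count RUN instructions in a Dockerfile; each one adds a layer to the image.
count_run() {
    # grep -c prints 0 but exits non-zero when nothing matches, so ignore its status.
    grep -c '^[[:space:]]*RUN[[:space:]]' "$1" || true
}

WORK="$(mktemp -d)"

# The two-RUN Dockerfile from the example above.
printf '%s\n' 'FROM ubuntu:20.04' \
    'RUN date > /build-info.txt' \
    'RUN uname -r >> /build-info.txt' > "$WORK/Dockerfile.two"

# The combined version with a single RUN statement.
printf '%s\n' 'FROM ubuntu:20.04' \
    'RUN date > /build-info.txt && uname -r >> /build-info.txt' > "$WORK/Dockerfile.one"

echo "separate RUN commands: $(count_run "$WORK/Dockerfile.two")"
echo "combined RUN command:  $(count_run "$WORK/Dockerfile.one")"
```

For an image that has already been built, docker history remains the authoritative view of its layers.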

On the other hand, a good reason to keep some RUN commands separate is that, if you have multiple layers, it’s easy to modify one layer in the container image without having to modify the entire container image.

If space is at a premium, there is a way to take the existing container image and get rid of all the history. It operates on a container created from the image rather than on the image itself (docker export accepts a running or a stopped container). Once you have such a container, run the following two commands:

    # export the container to a tarball
    docker export <CONTAINER ID> > /home/export.tar

    # import it back
    cat /home/export.tar | docker import - some-name:<tag>

This gets rid of the history of each layer. Because docker import builds the new image from a single filesystem archive, the image is also “flattened” to a single layer, which gets rid of all the redundancies in the original layers.

The same result can be obtained without the intermediate tarball by piping the export directly into the import:

    docker export <CONTAINER ID> | docker import - some-image-name:<tag>

This pipeline exports the container through the import command, creating a new image that has only one layer. For more information, see this blog post.

A few years ago, before Docker added the ability to “squash” images, a tool called docker-squash was created. It hasn’t been updated for a couple of years; however, it is still a popular tool for reducing the size of Docker container images. The tool takes a Docker container image and “squashes” it to a single layer, removing the commonalities between layers and the history of the layers to produce the smallest possible container image.

The tool retains Docker commands such as PORT, ENV, etc. so that the squashed images work exactly the same as before they were squashed. Moreover, the files that are deleted during the squashing process are actually removed from the image.

A simple example for running docker-squash is below.

    docker save <ID> | docker-squash -t <TAG> [-from <ID>] | docker load

This pipeline takes the current image, saves it, squashes it with a new tag, and reloads the image. The resulting image has all the layers beneath the initial FROM layer squashed into a single layer. The default options in docker-squash retain the base image layer so that it does not need to be repeatedly transferred when pushing and pulling updates to the image.

The tool is really designed for containers that are finalized and not likely to be updated. Consequently, there is little need for details about the layers and history. It can then be squashed and put into production. Having the smallest size image will allow users to quickly download the image and get it running because it’s almost as small as possible.

Not long after Docker came out, people started creating giant images that took a long time to transfer. At that point, users and developers started working on ideas to reduce the container size. Not too long ago, some patches were proposed for Docker to allow it to squash images as they were being built. The squash option was added in Docker 1.13 (API 1.25), when Docker still followed a different versioning scheme. As of Docker 17.06‑ce the option is still classified as experimental. You can tell Docker to allow the use of experimental options if you want (refer to Docker documentation). However, NVIDIA does not support this option.

The --squash option is used when the container is built. An example of the command is the following:

    docker build --squash -t chamilad/testdocker:0.1 .

This command uses the file named “Dockerfile” in the current directory to build the image.

The --squash option creates an image that has two layers. The first layer results from the FROM that usually starts off a Dockerfile. The subsequent layers are all “squashed” together into a single layer. This gets rid of the history in all the layers but the first one. It also eliminates redundant files.
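One way to check how many filesystem layers an image ends up with is to ask `docker image inspect` for the length of its layer list. A minimal sketch; layer_count is a hypothetical helper name, and the count reported for a squashed image depends on how many layers the base image itself contains:

```shell
# layer_count IMAGE: print the number of filesystem layers in IMAGE.
layer_count() {
    docker image inspect --format '{{len .RootFS.Layers}}' "$1"
}
```

Running `layer_count chamilad/testdocker:0.1` on images built with and without --squash gives a quick comparison of the two results.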

Since it is still an experimental feature, the amount by which the image shrinks varies. There have been reports of a 50% reduction in image size.

There are some other options that can be used to reduce the size of images. A couple of them are Docker based; the rest are classic Linux commands.

There is a Docker build option that deals with building applications in Docker containers. If you want to build an application when the container image is created, you may not want to leave the build tools in the image because of their size. This is especially true when the container is meant to be executed, not modified, when it is run. Recall that Docker images are built in layers. We can use that fact when building images to copy binaries from one layer to another.

For example, the Dockerfile below:

    $ cat Dockerfile
    FROM ubuntu:20.04
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            build-essential \
            gcc && \
        rm -rf /var/lib/apt/lists/*
    COPY hello.c /tmp/hello.c
    RUN gcc -o /tmp/hello /tmp/hello.c

builds a container image, installs gcc, and builds a simple “hello world” application. Checking the history of the image gives us the size of each layer:

    $ docker history hello
    IMAGE          CREATED         CREATED BY                                      SIZE
    49fef0e11806   8 minutes ago   /bin/sh -c gcc -o /tmp/hello /tmp/hello.c       8.6kB
    44a449445055   8 minutes ago   /bin/sh -c #(nop) COPY file:8f0c1776b2571c38…   63B
    c2e5b659a549   8 minutes ago   /bin/sh -c apt-get update -y && apt-get …       181MB
    5e8b97a2a082   6 weeks ago     /bin/sh -c #(nop) CMD ["/bin/bash"]             0B
    <missing>      6 weeks ago     /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
    <missing>      6 weeks ago     /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$…   2.76kB
    <missing>      6 weeks ago     /bin/sh -c rm -rf /var/lib/apt/lists/*          0B
    <missing>      6 weeks ago     /bin/sh -c set -xe && echo '#!/bin/sh' > /…     745B
    <missing>      6 weeks ago     /bin/sh -c #(nop) ADD file:d37ff24540ea7700d…   114MB

Notice that the layer with the build tools is 181MB in size, yet the application layer is only 8.6kB in size. If the build tools aren’t needed in the final container, then we can get rid of them from the image. However, if you simply run an apt-get remove … command, the build tools are not actually erased: the removal only adds a new layer on top, and the earlier layer that installed the tools is still part of the image.
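To see this for yourself, the sketch below writes a variant of the Dockerfile above that uninstalls the build tools in a final RUN step. The file name Dockerfile.remove and the image tag hello-remove used in the note that follows are hypothetical, chosen only for illustration.

```shell
# Write a Dockerfile that tries to shrink the image by removing the
# build tools in a later layer.
cat > Dockerfile.remove <<'EOF'
FROM ubuntu:20.04
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        build-essential \
        gcc && \
    rm -rf /var/lib/apt/lists/*
COPY hello.c /tmp/hello.c
RUN gcc -o /tmp/hello /tmp/hello.c
# This step adds a new layer; the install layer beneath it is unchanged.
RUN apt-get remove -y build-essential gcc
EOF
```

Building it with `docker build -f Dockerfile.remove -t hello-remove .` and then running `docker history hello-remove` should still list the large install layer, with the removal appearing as one more layer above it.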

A solution is to copy the binary from the previous layer to a new layer as in this Dockerfile:

    $ cat Dockerfile
    FROM ubuntu:16.04 AS build
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            build-essential \
            gcc && \
        rm -rf /var/lib/apt/lists/*
    COPY hello.c /tmp/hello.c
    RUN gcc -o /tmp/hello /tmp/hello.c

    FROM ubuntu:16.04
    COPY --from=build /tmp/hello /tmp/hello

This can be termed a “multi-stage” build. In this Dockerfile, the first stage starts with the OS and names it “build”. Then the build tools are installed, the source is copied into the container, and the binary is built.

The second stage starts with a fresh OS FROM command. Docker saves only the layers from this second stage onward; in other words, the first-stage layers that installed the build tools are not part of the final image. The second stage can copy the binary from the first stage, so no build tools are included in it. Building the container image is the same as before.
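Because the first stage is named, it can also be built on its own, which can help when debugging the compile step. A minimal sketch, assuming a Docker version with multi-stage build support; the function name build_stage and the tag hello-build are hypothetical:

```shell
# build_stage STAGE TAG: build ./Dockerfile only up to the named stage
# and tag the result as TAG.
build_stage() {
    docker build --target "$1" -t "$2" .
}
```

Here `build_stage build hello-build` would produce an image that stops at the “build” stage and still contains the compiler, while a plain `docker build -t hello-rt .` produces the final image from the last stage.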

If we compare the size of the container with the first Dockerfile to the size using the second Dockerfile, we can see the following:

    $ docker images hello
    REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
    hello        latest   49fef0e11806   21 minutes ago   295MB
    $ docker images hello-rt
    REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
    hello-rt     latest   f0cef59a05dd   2 minutes ago   114MB

The first output is for the original Dockerfile. The second output is for the multi-stage Dockerfile. Notice the difference in size between the two.
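The numbers reported above can be checked with a little shell arithmetic. This sketch only reuses the sizes already shown in the docker history and docker images output; the variable names are illustrative.

```shell
# Sizes in MB, copied from the docker history and docker images output.
base=114      # base OS layer
tools=181     # layer that installed build-essential and gcc
full=295      # image from the original Dockerfile (hello)
runtime=114   # image from the multi-stage Dockerfile (hello-rt)

saved=$(( full - runtime ))
percent=$(( saved * 100 / full ))   # integer division, rounds down

echo "base + tools = $(( base + tools )) MB"
echo "saved ${saved} MB (${percent}%)"
```

The saving matches the size of the build-tools layer, which is exactly what the multi-stage build leaves out.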

An option to reduce the size of the Docker container is to start with a small base image. Usually, the base images for a distribution are fairly lean, but it might be a good idea to see what is installed in the image. If there are things that aren’t needed, you can then try creating your own base image that removes the unneeded tools.
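One way to see what is installed in a base image is to list its largest packages. The following is a minimal sketch that assumes a Debian- or Ubuntu-based image, where dpkg-query is available; the function name largest_packages is hypothetical.

```shell
# largest_packages IMAGE [COUNT]: list the COUNT (default 10) largest
# installed packages in IMAGE as "size-in-KiB<TAB>name", biggest first.
largest_packages() {
    docker run --rm "$1" dpkg-query -W -f='${Installed-Size}\t${Package}\n' \
        | sort -rn \
        | head -n "${2:-10}"
}
```

For example, `largest_packages ubuntu:16.04 20` shows the twenty largest packages, which is a reasonable starting point for deciding what a custom base image could leave out.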