Step #5: Modify Image with Dockerfile
After inspecting the PyTorch container, we will now create a modified version of the base image that includes custom applications and scripts so we don’t need to install them manually every time we launch a container. This involves creating a Dockerfile, which allows us to specify the commands and settings we want to invoke while building a new image.
To create a Dockerfile, open a new file named “Dockerfile” (make sure it uses a capital “D” and has no extension) in your local working directory using a text editor. For more information on Dockerfile syntax and usage, see the official documentation from Docker.
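Any text editor works; for example, using nano (one of many options):
$ nano Dockerfile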
First, we need to specify the base image that will be used. The base image provides us a starting point for our custom image that we can build upon. Since we will use the PyTorch image we inspected in the previous section as our base, our custom image will look identical to the PyTorch image for the first step. To specify our base image in the Dockerfile, we will add the following on the first line:
FROM nvcr.io/nvidia/pytorch:22.03-py3
Note that Docker will first try to use a local copy of the listed image if one is available, and otherwise will attempt to pull the image from the specified container registry. If we built the image now, without adding anything else to the Dockerfile, it would be identical to the PyTorch image from NGC.
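If you want the base image available locally before building, so the build does not pause to download it, you can pull it ahead of time with the standard pull command:
$ docker pull nvcr.io/nvidia/pytorch:22.03-py3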
To make the custom image more useful, let’s clone a repo that we can use to run some exciting deep learning examples. The repo we will use is the DeepLearningExamples repo found on NVIDIA’s GitHub. We will use this repo to run an example image classification application using PyTorch on GPUs.
Dockerfiles have a “RUN” instruction which tells Docker to execute the indicated command at that step of the build. This allows us to run commands inside the image that later instructions depend on, or to install packages for the user. For example, if we wanted to install htop as we did when inspecting the container in the previous section, we would write the following in our Dockerfile. Note that both commands are chained into a single “RUN” instruction: Docker creates a new image layer for every instruction in the Dockerfile, so chaining related commands into fewer instructions keeps the image small.
RUN apt update && \
    apt install -y htop
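By contrast, splitting the same commands into separate RUN instructions, as sketched below, would create an extra image layer for no benefit. For apt in particular, it also risks a later rebuild reusing a cached “apt update” layer, so the install step would run against a stale package index:
RUN apt update
RUN apt install -y htop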
For our purposes, we don’t need the htop package and can skip adding the lines above to our Dockerfile. Instead, we will clone the DeepLearningExamples repository using git (which is already installed in the base image) at a specific commit hash. The commit hash shown below was found by navigating to https://github.com/nvidia/deeplearningexamples and copying the latest commit hash at the time of writing. Because images can be built at any time, and the referenced code and packages may be updated at any point, it is recommended to pin specific versions, tags, or commits whenever possible to avoid unexpected breaking changes down the road.
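If you have a local clone of a repository, one way to print the full hash of the commit you currently have checked out is:
$ git rev-parse HEAD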
To clone the repository in the Dockerfile, add a blank line after the “FROM” line above, then add the following command:
RUN git clone https://github.com/nvidia/deeplearningexamples && \
    cd deeplearningexamples && \
    git checkout f3dbf8a69522d69c63c4508769bd8137658786a1
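Note that the “cd” above only lasts for the duration of that single RUN instruction; every RUN starts in the image’s current working directory. Splitting the clone and checkout across two RUN instructions, as sketched below, would fail because the checkout would run outside the cloned repository:
RUN git clone https://github.com/nvidia/deeplearningexamples && cd deeplearningexamples
RUN git checkout f3dbf8a69522d69c63c4508769bd8137658786a1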
Your Dockerfile should now look like this:
FROM nvcr.io/nvidia/pytorch:22.03-py3
RUN git clone https://github.com/nvidia/deeplearningexamples && \
    cd deeplearningexamples && \
    git checkout f3dbf8a69522d69c63c4508769bd8137658786a1
Another useful instruction is “WORKDIR”. This allows us to specify the working directory for all subsequent instructions as well as for the container when it is launched. Note that WORKDIR can be updated multiple times in a Dockerfile as necessary; the final WORKDIR listed is the directory the container opens in on launch.
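As a brief illustration (with hypothetical paths), each WORKDIR applies to the instructions that follow it:
WORKDIR /workspace
RUN pwd    # prints /workspace
WORKDIR /workspace/project
RUN pwd    # prints /workspace/project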
We will use the WORKDIR instruction to make a specific directory in the DeepLearningExamples repository the default directory for future commands as well as at container runtime. After adding another blank line after the “git checkout” line of the previous instruction, add the following to your Dockerfile:
WORKDIR /workspace/deeplearningexamples/PyTorch/Classification/ConvNets
Next, let’s install some packages necessary for running the example. Unlike when we ran the container previously, packages we install during the build process are included in our custom image, so we will not need to install them again when running a container based on that image.
Add another blank line after the WORKDIR instruction, then create another RUN instruction to install the application’s dependencies using Python’s package manager, pip:
RUN pip install -r requirements.txt nvidia-imageinary==1.1.3
With the dependencies installed, our custom Dockerfile is now complete and should look like the following:
FROM nvcr.io/nvidia/pytorch:22.03-py3
RUN git clone https://github.com/nvidia/deeplearningexamples && \
    cd deeplearningexamples && \
    git checkout f3dbf8a69522d69c63c4508769bd8137658786a1
WORKDIR /workspace/deeplearningexamples/PyTorch/Classification/ConvNets
RUN pip install -r requirements.txt nvidia-imageinary==1.1.3
Now that our Dockerfile is complete, we can build the image, which runs through the steps specified in the file and saves a copy of the resulting image to your local workstation. To do so, run the following command, which builds a new image named “nvcr.io/nv-launchpad-orgname/sample-image” with the tag “1.0”. The full image reference is specified after the “-t” flag: everything before the colon is the image name and everything after it is the tag. Note that your organization name (nv-launchpad-orgname in this case) will likely be different and should be updated to the org name accessible from your account. You are otherwise free to change the image name (“sample-image” in this case) and tag as desired. Don’t forget the “.” at the end of the command, which tells Docker to use the current directory as the build context.
$ docker build -t nvcr.io/nv-launchpad-orgname/sample-image:1.0 .
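By default, docker build looks for a file named “Dockerfile” at the root of the build context (the “.” at the end of the command above). If your Dockerfile has a different name or location, the “-f” flag points the build at it, for example (hypothetical path):
$ docker build -t nvcr.io/nv-launchpad-orgname/sample-image:1.0 -f docker/Dockerfile.custom .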
If you used the same Dockerfile as shown above, this will generate text similar to the following:
Sending build context to Docker daemon 2.048kB
Step 1/4 : FROM nvcr.io/nvidia/pytorch:22.03-py3
---> 4730bc516b92
Step 2/4 : RUN git clone https://github.com/nvidia/deeplearningexamples && cd deeplearningexamples && git checkout f3dbf8a69522d69c63c4508769bd8137658786a1
---> Running in 326d5fb91a89
Cloning into 'deeplearningexamples'...
Note: switching to 'f3dbf8a69522d69c63c4508769bd8137658786a1'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at f3dbf8a6 [BERT/PyT] add JIT autocast
Removing intermediate container 326d5fb91a89
---> 08e1c8dd26f8
Step 3/4 : WORKDIR /workspace/deeplearningexamples/PyTorch/Classification/ConvNets
---> Running in 63a3c614dfb1
Removing intermediate container 63a3c614dfb1
---> a0cc6bb5202e
Step 4/4 : RUN pip install -r requirements.txt nvidia-imageinary==1.1.3
---> Running in 801287fbc5df
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting dllogger
Cloning https://github.com/NVIDIA/dllogger (to revision v1.0.0) to /tmp/pip-install-17k4pxj6/dllogger_ae526c1927be46a8b397f50ec3a3fe85
Running command git clone -q https://github.com/NVIDIA/dllogger /tmp/pip-install-17k4pxj6/dllogger_ae526c1927be46a8b397f50ec3a3fe85
Resolved https://github.com/NVIDIA/dllogger to commit 89913fd227b720a3026550b904cdca0d49d82100
Collecting nvidia-imageinary==1.1.3
Downloading https://developer.download.nvidia.com/compute/redist/nvidia-imageinary/nvidia_imageinary-1.1.3-py3-none-any.whl (13 kB)
Collecting pynvml==11.0.0
Downloading pynvml-11.0.0-py3-none-any.whl (46 kB)
Requirement already satisfied: Pillow>=7.1.2 in /opt/conda/lib/python3.8/site-packages (from nvidia-imageinary==1.1.3) (9.0.0)
Requirement already satisfied: numpy>=1.18.0 in /opt/conda/lib/python3.8/site-packages (from nvidia-imageinary==1.1.3) (1.22.3)
Building wheels for collected packages: dllogger
Building wheel for dllogger (setup.py): started
Building wheel for dllogger (setup.py): finished with status 'done'
Created wheel for dllogger: filename=DLLogger-1.0.0-py3-none-any.whl size=5670 sha256=1f358bd0e559e49885ac67146957c1f343e24c400f5142fde7bcffef824dfaaa
Stored in directory: /tmp/pip-ephem-wheel-cache-0h6p75zr/wheels/32/ff/4a/1d61bdc575b373a327658f1de2513a0af81094c50c9c56fa8b
Successfully built dllogger
Installing collected packages: pynvml, nvidia-imageinary, dllogger
Attempting uninstall: pynvml
Found existing installation: pynvml 11.4.1
Uninstalling pynvml-11.4.1:
Successfully uninstalled pynvml-11.4.1
Successfully installed dllogger-1.0.0 nvidia-imageinary-1.1.3 pynvml-11.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container 801287fbc5df
---> b98e9cad2b60
Successfully built b98e9cad2b60
Successfully tagged nvcr.io/nv-launchpad-orgname/sample-image:1.0
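Before pushing, you can verify the image exists locally and that the WORKDIR took effect by listing it and running a quick throwaway container (adjust the image name if you changed it):
$ docker images nvcr.io/nv-launchpad-orgname/sample-image
$ docker run --rm nvcr.io/nv-launchpad-orgname/sample-image:1.0 pwd
The second command should print /workspace/deeplearningexamples/PyTorch/Classification/ConvNets before the container exits.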
Now that we have a modified image built locally, we can push it to NGC so we can use it on other machines and collaborate with teammates. This requires being authenticated to the NGC registry, as covered in the steps above.
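If you are not already logged in, you can authenticate with the standard Docker login command, using “$oauthtoken” as the username and your NGC API key as the password:
$ docker login nvcr.io
Once authenticated, push the image with the following command, updating the image name and tag as necessary: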
$ docker push nvcr.io/nv-launchpad-orgname/sample-image:1.0
While the image is being pushed, you will see output similar to the following:
The push refers to repository [nvcr.io/nv-launchpad-orgname/sample-image]
c170ca729e6b: Pushed
62f753b7286f: Pushed
9c7a2e08fe4c: Mounted from nvidia/pytorch
fe48bfeac91d: Mounted from nvidia/pytorch
6d24369f9726: Mounted from nvidia/pytorch
0f010943c2be: Mounted from nvidia/pytorch
80df8233699e: Mounted from nvidia/pytorch
4995463ed504: Mounted from nvidia/pytorch
c68289e5466a: Mounted from nvidia/pytorch
b92c6f3cf8ba: Mounted from nvidia/pytorch
ea54ed1c9d39: Mounted from nvidia/pytorch
e94fa5c9c518: Mounted from nvidia/pytorch
8456d4967bfe: Mounted from nvidia/pytorch
40f364efa84f: Mounted from nvidia/pytorch
65c14c7eaf47: Mounted from nvidia/pytorch
14e6ddddf256: Mounted from nvidia/pytorch
7821737d952f: Mounted from nvidia/pytorch
77a776e8014b: Mounted from nvidia/pytorch
7a7051e759c4: Mounted from nvidia/pytorch
e1aa1f9ee97e: Mounted from nvidia/pytorch
3b720402b8ab: Mounted from nvidia/pytorch
3b1792efdad9: Mounted from nvidia/pytorch
5f70bf18a086: Mounted from nvidia/pytorch
6ba71d233b75: Mounted from nvidia/pytorch
5342e89df8e3: Mounted from nvidia/pytorch
fc3209a87194: Mounted from nvidia/pytorch
1ee80d85e1cf: Mounted from nvidia/pytorch
489f24d7d381: Mounted from nvidia/pytorch
f7655918bfe6: Mounted from nvidia/pytorch
5ec341fc8fe7: Mounted from nvidia/pytorch
8fb729c89bb4: Mounted from nvidia/pytorch
852255d743c1: Mounted from nvidia/pytorch
abf81ae6f4c8: Mounted from nvidia/pytorch
f89ef356505e: Mounted from nvidia/pytorch
6fb2a344ac89: Mounted from nvidia/pytorch
850236713495: Mounted from nvidia/pytorch
b9dfd77f5b0a: Mounted from nvidia/pytorch
6a1014d46250: Mounted from nvidia/pytorch
85f49f4e6923: Mounted from nvidia/pytorch
2f175b794573: Mounted from nvidia/pytorch
899455397741: Mounted from nvidia/pytorch
2df8c0a32afe: Mounted from nvidia/pytorch
a060c5cefec7: Mounted from nvidia/pytorch
83cdade3c9b5: Mounted from nvidia/pytorch
fec6965e7a6b: Mounted from nvidia/pytorch
2ff0ade8d3c9: Mounted from nvidia/pytorch
01e996931197: Mounted from nvidia/pytorch
867d0767a47c: Mounted from nvidia/pytorch
1.0: digest: sha256:64e3f9abb33ac2b7287d8626d79ef9bff2d5126eadce40699a2651c7aee72ec9 size: 10642
Once the push completes, the image will be available on NGC for use with Base Command, and it can be pulled down locally on other systems.
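From any other machine that is logged in to NGC with access to your org, the image can then be pulled with the standard command, updating the name and tag to match yours:
$ docker pull nvcr.io/nv-launchpad-orgname/sample-image:1.0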