Step #5: Modify Image with Dockerfile
After inspecting the PyTorch container, we will now create a modified version of the base image with custom applications and scripts included so we don’t need to install them manually every time we launch a container. This involves creating a Dockerfile, which lets us specify the commands and settings we want to invoke while building a new image.
To create a Dockerfile, open a new file named “Dockerfile” (make sure it uses a capital “D” and no extension) in the local directory using a text editor. For more information on the Dockerfile syntax and usage, reference the official documentation from Docker.
First, we need to specify the base image that will be used. The base image provides us a starting point for our custom image that we can build upon. Since we will use the PyTorch image we inspected in the previous section as our base, our custom image will look identical to the PyTorch image for the first step. To specify our base image in the Dockerfile, we will add the following on the first line:
FROM nvcr.io/nvidia/pytorch:22.03-py3
Note that Docker will first use a local copy of the listed image if one is available; otherwise, it will attempt to pull the image from the specified container registry. If we were to build the image at this point, without making any additional changes to the Dockerfile, the resulting image would be identical to the PyTorch image from NGC.
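To see whether the base image is already cached locally, or to pre-pull it before building, the standard Docker CLI can be used. This is an optional sketch, not a required step, and the output will vary by system:

```shell
# List any locally cached copies of the base image
$ docker images nvcr.io/nvidia/pytorch

# Pre-pull the exact tag we will build from (optional; "docker build"
# will pull it automatically if it is not available locally)
$ docker pull nvcr.io/nvidia/pytorch:22.03-py3
```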
To make the custom image more useful, let’s clone a repo that we can use to run some exciting deep learning examples. The repo we will use is the DeepLearningExamples repo found on NVIDIA’s GitHub. We will use this repo to run an example image classification application using PyTorch on GPUs.
The Dockerfile “RUN” instruction tells Docker to execute the indicated command at that step in the build. This allows us to run steps inside the container that might be necessary for later instructions or to install packages inside the image for the user. For example, if we wanted to install htop as we did when inspecting the container in the previous section, we would write the following in our Dockerfile. Note that both commands are chained in a single “RUN” instruction. Since Docker creates a new image layer for every instruction, we want to reduce the overall number of instructions in the Dockerfile to keep our image small in size.
RUN apt update && \
    apt install -y htop
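To illustrate the layering point, the same two commands written as separate instructions would produce two image layers instead of one. This is a hypothetical comparison sketch, not something to add to our Dockerfile:

```dockerfile
# Less efficient: each RUN instruction creates its own image layer,
# so the apt package index fetched by the first layer is baked into
# the final image even if a later layer cleans it up
RUN apt update
RUN apt install -y htop

# More efficient: both commands share one build step and one layer
RUN apt update && \
    apt install -y htop
```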
For our purposes, we don’t need the htop package and can skip the line above in our Dockerfile. Instead, we want to clone the DeepLearningExamples repository using git (which is already installed in the base image) at a specific commit hash. The commit hash shown below was found by navigating to https://github.com/nvidia/deeplearningexamples and selecting the latest commit at the time of writing. Since images can be built at any time and referenced code and packages may be updated at any point, it is recommended to pin specific versions, tags, or commits when possible to avoid unexpected breaking changes down the road.
To clone the repository in the Dockerfile, create an empty line after the “FROM” line above and on the next line, add the following command:
RUN git clone https://github.com/nvidia/deeplearningexamples && \
    cd deeplearningexamples && \
    git checkout f3dbf8a69522d69c63c4508769bd8137658786a1
Your Dockerfile should now look like this:
FROM nvcr.io/nvidia/pytorch:22.03-py3

RUN git clone https://github.com/nvidia/deeplearningexamples && \
    cd deeplearningexamples && \
    git checkout f3dbf8a69522d69c63c4508769bd8137658786a1
Another useful instruction is “WORKDIR”. This lets us specify the working directory for all subsequent instructions and for when the container is launched. Note that WORKDIR can be updated multiple times in a Dockerfile as necessary; the final listed WORKDIR will be the directory the container opens on launch.
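As a hypothetical sketch of this behavior (the directory names below are made up for illustration only):

```dockerfile
# Instructions after this line execute in /workspace/build
WORKDIR /workspace/build
RUN ./configure && make

# Later instructions switch to /workspace/app; since this is the
# final WORKDIR, a launched container will also start here
WORKDIR /workspace/app
RUN cp /workspace/build/output .
```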
We will use the WORKDIR instruction to make a specific directory in the DeepLearningExamples repository the default directory for future instructions as well as at container runtime. After adding another blank line after the “git checkout” line in the previous instruction, add the following to your Dockerfile:
WORKDIR /workspace/deeplearningexamples/PyTorch/Classification/ConvNets
Next, let’s install some packages necessary for running the example. Unlike when we ran the container previously, these packages we install during the build process will be included with our custom image and we will not need to install them again while running a container based on the custom image.
Add another empty newline after the WORKDIR instruction and create another RUN instruction to install the application’s dependencies using Python’s package manager, pip.
RUN pip install -r requirements.txt nvidia-imageinary==1.1.3
With the dependencies installed, we are now finished with our custom Dockerfile which should look like the following:
FROM nvcr.io/nvidia/pytorch:22.03-py3

RUN git clone https://github.com/nvidia/deeplearningexamples && \
    cd deeplearningexamples && \
    git checkout f3dbf8a69522d69c63c4508769bd8137658786a1

WORKDIR /workspace/deeplearningexamples/PyTorch/Classification/ConvNets

RUN pip install -r requirements.txt nvidia-imageinary==1.1.3
Now that our Dockerfile is complete, we can build the image, which runs through the steps specified in the file and saves a copy of the resulting image to your local workstation. To do so, run the following command, which builds a new image named “nvcr.io/nv-launchpad-orgname/sample-image” with the tag “1.0”. The full image reference is specified after the “-t” flag: everything before the colon is the image name and everything after it is the tag. Note that your organization name (“nv-launchpad-orgname” in this case) will likely be different and should be updated to reflect the org name accessible from your account. Otherwise, you are free to change the image name (“sample-image” in this case) and tag as desired. Don’t forget the “.” at the end of the command, which tells Docker to use the current directory as the build context.
$ docker build -t nvcr.io/nv-launchpad-orgname/sample-image:1.0 .
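Once the build finishes, the new image should appear in the local image list, and the WORKDIR setting can be spot-checked by running a one-off command in the container. This is an optional sketch; the NGC container entrypoint may print banner text before the command output:

```shell
# Confirm the image was built and tagged locally
$ docker images nvcr.io/nv-launchpad-orgname/sample-image

# Print the container's starting directory, which should be the
# ConvNets path set by WORKDIR in the Dockerfile
$ docker run --rm nvcr.io/nv-launchpad-orgname/sample-image:1.0 pwd
```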
If you used the same Dockerfile as shown above, this will generate text similar to the following:
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM nvcr.io/nvidia/pytorch:22.03-py3
 ---> 4730bc516b92
Step 2/4 : RUN git clone https://github.com/nvidia/deeplearningexamples &&     cd deeplearningexamples &&     git checkout f3dbf8a69522d69c63c4508769bd8137658786a1
 ---> Running in 326d5fb91a89
Cloning into 'deeplearningexamples'...
Note: switching to 'f3dbf8a69522d69c63c4508769bd8137658786a1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at f3dbf8a6 [BERT/PyT] add JIT autocast
Removing intermediate container 326d5fb91a89
 ---> 08e1c8dd26f8
Step 3/4 : WORKDIR /workspace/deeplearningexamples/PyTorch/Classification/ConvNets
 ---> Running in 63a3c614dfb1
Removing intermediate container 63a3c614dfb1
 ---> a0cc6bb5202e
Step 4/4 : RUN pip install -r requirements.txt nvidia-imageinary==1.1.3
 ---> Running in 801287fbc5df
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting dllogger
  Cloning https://github.com/NVIDIA/dllogger (to revision v1.0.0) to /tmp/pip-install-17k4pxj6/dllogger_ae526c1927be46a8b397f50ec3a3fe85
  Running command git clone -q https://github.com/NVIDIA/dllogger /tmp/pip-install-17k4pxj6/dllogger_ae526c1927be46a8b397f50ec3a3fe85
  Resolved https://github.com/NVIDIA/dllogger to commit 89913fd227b720a3026550b904cdca0d49d82100
Collecting nvidia-imageinary==1.1.3
  Downloading https://developer.download.nvidia.com/compute/redist/nvidia-imageinary/nvidia_imageinary-1.1.3-py3-none-any.whl (13 kB)
Collecting pynvml==11.0.0
  Downloading pynvml-11.0.0-py3-none-any.whl (46 kB)
Requirement already satisfied: Pillow>=7.1.2 in /opt/conda/lib/python3.8/site-packages (from nvidia-imageinary==1.1.3) (9.0.0)
Requirement already satisfied: numpy>=1.18.0 in /opt/conda/lib/python3.8/site-packages (from nvidia-imageinary==1.1.3) (1.22.3)
Building wheels for collected packages: dllogger
  Building wheel for dllogger (setup.py): started
  Building wheel for dllogger (setup.py): finished with status 'done'
  Created wheel for dllogger: filename=DLLogger-1.0.0-py3-none-any.whl size=5670 sha256=1f358bd0e559e49885ac67146957c1f343e24c400f5142fde7bcffef824dfaaa
  Stored in directory: /tmp/pip-ephem-wheel-cache-0h6p75zr/wheels/32/ff/4a/1d61bdc575b373a327658f1de2513a0af81094c50c9c56fa8b
Successfully built dllogger
Installing collected packages: pynvml, nvidia-imageinary, dllogger
  Attempting uninstall: pynvml
    Found existing installation: pynvml 11.4.1
    Uninstalling pynvml-11.4.1:
      Successfully uninstalled pynvml-11.4.1
Successfully installed dllogger-1.0.0 nvidia-imageinary-1.1.3 pynvml-11.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container 801287fbc5df
 ---> b98e9cad2b60
Successfully built b98e9cad2b60
Successfully tagged nvcr.io/nv-launchpad-orgname/sample-image:1.0
Now that we have a modified image built locally, we can push it to NGC so we can use it on other machines and collaborate with teammates. Assuming you are logged into NGC locally by following the steps above, run this command, updating the image name and tag as necessary:
$ docker push nvcr.io/nv-launchpad-orgname/sample-image:1.0
While the image is being pushed, you will see output similar to the following:
The push refers to repository [nvcr.io/nv-launchpad-orgname/sample-image]
c170ca729e6b: Pushed
62f753b7286f: Pushed
9c7a2e08fe4c: Mounted from nvidia/pytorch
fe48bfeac91d: Mounted from nvidia/pytorch
6d24369f9726: Mounted from nvidia/pytorch
0f010943c2be: Mounted from nvidia/pytorch
80df8233699e: Mounted from nvidia/pytorch
4995463ed504: Mounted from nvidia/pytorch
c68289e5466a: Mounted from nvidia/pytorch
b92c6f3cf8ba: Mounted from nvidia/pytorch
ea54ed1c9d39: Mounted from nvidia/pytorch
e94fa5c9c518: Mounted from nvidia/pytorch
8456d4967bfe: Mounted from nvidia/pytorch
40f364efa84f: Mounted from nvidia/pytorch
65c14c7eaf47: Mounted from nvidia/pytorch
14e6ddddf256: Mounted from nvidia/pytorch
7821737d952f: Mounted from nvidia/pytorch
77a776e8014b: Mounted from nvidia/pytorch
7a7051e759c4: Mounted from nvidia/pytorch
e1aa1f9ee97e: Mounted from nvidia/pytorch
3b720402b8ab: Mounted from nvidia/pytorch
3b1792efdad9: Mounted from nvidia/pytorch
5f70bf18a086: Mounted from nvidia/pytorch
6ba71d233b75: Mounted from nvidia/pytorch
5342e89df8e3: Mounted from nvidia/pytorch
fc3209a87194: Mounted from nvidia/pytorch
1ee80d85e1cf: Mounted from nvidia/pytorch
489f24d7d381: Mounted from nvidia/pytorch
f7655918bfe6: Mounted from nvidia/pytorch
5ec341fc8fe7: Mounted from nvidia/pytorch
8fb729c89bb4: Mounted from nvidia/pytorch
852255d743c1: Mounted from nvidia/pytorch
abf81ae6f4c8: Mounted from nvidia/pytorch
f89ef356505e: Mounted from nvidia/pytorch
6fb2a344ac89: Mounted from nvidia/pytorch
850236713495: Mounted from nvidia/pytorch
b9dfd77f5b0a: Mounted from nvidia/pytorch
6a1014d46250: Mounted from nvidia/pytorch
85f49f4e6923: Mounted from nvidia/pytorch
2f175b794573: Mounted from nvidia/pytorch
899455397741: Mounted from nvidia/pytorch
2df8c0a32afe: Mounted from nvidia/pytorch
a060c5cefec7: Mounted from nvidia/pytorch
83cdade3c9b5: Mounted from nvidia/pytorch
fec6965e7a6b: Mounted from nvidia/pytorch
2ff0ade8d3c9: Mounted from nvidia/pytorch
01e996931197: Mounted from nvidia/pytorch
867d0767a47c: Mounted from nvidia/pytorch
1.0: digest: sha256:64e3f9abb33ac2b7287d8626d79ef9bff2d5126eadce40699a2651c7aee72ec9 size: 10642
Once the image has been fully pushed, it will be available on NGC, where it can be used on Base Command or pulled down locally on other systems.
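On another machine, the image can then be retrieved with a matching pull after authenticating to the registry. This is an optional sketch; update the org name, image name, and tag to match what you pushed:

```shell
# Authenticate to NGC (username is "$oauthtoken"; the password is
# your NGC API key)
$ docker login nvcr.io

# Pull the custom image pushed above
$ docker pull nvcr.io/nv-launchpad-orgname/sample-image:1.0
```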