Using Containers
Containers provide a way to encapsulate all the software dependencies of an application and enable it to be deployed on different systems. Containers are the preferred way to run applications on the DGX SuperPOD.
The DGX SuperPOD is deployed with two tools, Pyxis and Enroot, that simplify the secure use of containers. Pyxis extends Slurm so that jobs can be launched directly into a container with srun. Enroot is a lightweight container runtime that runs traditional container images in unprivileged mode.
Examples
Here are some example commands for working with user containers:
Submit a job to Slurm on a worker node.
    srun grep PRETTY /etc/os-release
    PRETTY_NAME="Ubuntu 20.04.4 LTS"
Submit a job to Slurm and launch it in a container.
The --container-image option specifies which container image to use.
    srun --container-image=centos grep PRETTY /etc/os-release
    PRETTY_NAME="CentOS Linux 7 (Core)"
Mount a file from the host and run the command on it from inside the container.
    srun --container-image=nvcr.io/nvidia/pytorch:22.12-py3 --container-mounts=/etc/os-release:/host/os-release grep PRETTY /host/os-release
    pyxis: importing docker image: nvcr.io/nvidia/pytorch:22.12-py3
    pyxis: imported docker image: nvcr.io/nvidia/pytorch:22.12-py3
    PRETTY_NAME="Ubuntu 20.04.4 LTS"
The --container-mounts option can mount both files and directories into the container environment. Separate multiple mounts with commas.
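As a minimal sketch of the comma-separated form, the snippet below joins several SOURCE:DESTINATION pairs into a single value for --container-mounts. The helper name join_mounts and the /raid/datasets path are hypothetical and chosen only for illustration. The command is printed rather than executed, so the sketch can be tried on a machine without Slurm.

```shell
# Join any number of SRC:DST mount specs with commas.
# join_mounts is a hypothetical helper, not part of Pyxis or Slurm.
join_mounts() {
    local IFS=','
    echo "$*"
}

# /raid/datasets is an assumed example path; substitute a real directory.
mounts=$(join_mounts "/etc/os-release:/host/os-release" "/raid/datasets:/data")

# Print the resulting command instead of running it.
echo srun --container-image=centos --container-mounts="$mounts" ls /data
```

Dropping the leading echo would submit the job with both mounts in place.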
    srun -N 2 --ntasks-per-node=1 --container-image=nvcr.io/nvidia/pytorch:22.12-py3 --container-mounts=/etc/os-release:/host/os-release grep PRETTY /host/os-release
    pyxis: imported docker image: nvcr.io/nvidia/pytorch:22.12-py3
    pyxis: imported docker image: nvcr.io/nvidia/pytorch:22.12-py3
Submit the same command across two nodes, mounting the current directory as /work in the container.
Note that Enroot writes the full image name differently from the usual Docker form: the registry hostname (nvcr.io in this case) is separated from the rest of the image name by a #, not a slash (/), as in nvcr.io#nvidia/pytorch:22.12-py3.
    srun -N 2 --ntasks-per-node=1 \
    --container-image=nvcr.io/nvidia/pytorch:22.12-py3 --container-mounts=$(pwd):/work \
    /bin/bash -c 'uname -n && cat /etc/os-release | grep PRETTY_NAME'
    dgx1
    PRETTY_NAME="Ubuntu 20.04.5 LTS"
    dgx2
    PRETTY_NAME="Ubuntu 20.04.5 LTS"
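To illustrate the # separator that Enroot expects between the registry and the image name, here is a small sketch that rewrites a slash-style image reference into that form. The function name to_enroot_ref is hypothetical. Treating the first path component as a registry only when it contains a dot or a colon is an assumed heuristic, so check the result before relying on it.

```shell
# Rewrite REGISTRY/IMAGE[:TAG] as REGISTRY#IMAGE[:TAG].
# to_enroot_ref is a hypothetical helper, not part of Enroot or Pyxis.
to_enroot_ref() {
    local ref="$1"
    case "$ref" in
        *'#'*) echo "$ref"; return ;;   # already uses the # separator
        */*) ;;                         # has a slash: inspect it below
        *) echo "$ref"; return ;;       # bare image name such as "centos"
    esac
    local registry="${ref%%/*}"         # text before the first slash
    local path="${ref#*/}"              # text after the first slash
    case "$registry" in
        # Assumed heuristic: a dot or colon marks a registry hostname.
        *.*|*:*) echo "${registry}#${path}" ;;
        # Otherwise treat it as a namespace and leave the name unchanged.
        *) echo "$ref" ;;
    esac
}

to_enroot_ref "nvcr.io/nvidia/pytorch:22.12-py3"
```

The converted name can then be passed to --container-image in place of the slash-separated form.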
Further resources are available at these links: