Running Microsoft Cognitive Toolkit

Before running the container, use the docker pull command to ensure an up-to-date image is installed. Once the pull is complete, you can run the container image.

  1. Determine the pull command for the release of the container that you want. The following command pulls release 18.05; change the tag to pull a different release.
    docker pull nvcr.io/nvidia/cntk:18.05
  2. Open a command prompt and paste the pull command. The container image starts downloading. Ensure the pull completes successfully before proceeding to the next step.
  3. Run the container image. A typical command to launch the container is:
    nvidia-docker run -it --rm -v local_dir:container_dir \
      nvcr.io/nvidia/cntk:<xx.xx>

    Where:
    • -it runs the container interactively with a terminal attached
    • --rm deletes the container when it exits
    • -v mounts a host directory into the container
    • local_dir is the directory or file from your host system (absolute path) that you want to access from inside your container. For example, the local_dir in the following option is /home/jsmith/data/mnist.
      -v /home/jsmith/data/mnist:/data/mnist

      Inside the container, running ls /data/mnist lists the same files as running ls /home/jsmith/data/mnist on the host.

    • container_dir is the target directory when you are inside your container. For example, /data/mnist is the target directory in the example:
      -v /home/jsmith/data/mnist:/data/mnist
    • <xx.xx> is the tag. For example, 18.05.
    1. When running on a single GPU, the Microsoft Cognitive Toolkit can be invoked using a command similar to the following:
      cntk configFile=myscript.cntk ...
    2. When running on multiple GPUs, run the Microsoft Cognitive Toolkit through MPI. The following example uses 4 GPUs, numbered 0..3, for training:
      export OMP_NUM_THREADS=10
      export CUDA_DEVICE_ORDER=PCI_BUS_ID
      export CUDA_VISIBLE_DEVICES=0,1,2,3
      mpirun --allow-run-as-root --oversubscribe --npernode 4 \
             -x OMP_NUM_THREADS -x CUDA_DEVICE_ORDER -x CUDA_VISIBLE_DEVICES \
             cntk configFile=myscript.cntk ...

    3. When running on all 8 GPUs together, the command is simpler:
      export OMP_NUM_THREADS=10
      mpirun --allow-run-as-root --oversubscribe --npernode 8 \
             -x OMP_NUM_THREADS cntk configFile=myscript.cntk ...
      Note: You can vary the number of GPUs with the option --npernode X where X is the number of GPUs. For the DGX-1™ this is a maximum of 8 GPUs per node. For the DGX Station™ it is a maximum of 4 GPUs. For NVIDIA® GPU Cloud™ (NGC) the number of GPUs depends upon the instance type that you have selected.
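    If you do not know in advance how many GPUs an instance exposes, the following minimal POSIX shell sketch, run inside the container, counts them with nvidia-smi -L and passes that count to --npernode. It assumes nvidia-smi is on the PATH and falls back to a single GPU otherwise; NGPUS, DEVICES, and RUN are illustrative variable names, and the trailing ... arguments from the commands above are left out. By default it only prints the mpirun command; set RUN=1 to execute it.

```shell
#!/bin/sh
# Sketch: size the MPI launch to the number of GPUs that nvidia-smi reports.
export OMP_NUM_THREADS=10
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# nvidia-smi -L prints one line per GPU; fall back to 1 if it is unavailable.
if command -v nvidia-smi >/dev/null 2>&1; then
  NGPUS=$(( $(nvidia-smi -L | wc -l) ))
else
  NGPUS=1
fi
[ "$NGPUS" -ge 1 ] || NGPUS=1

# Build the device list 0,1,...,NGPUS-1.
DEVICES=0
i=1
while [ "$i" -lt "$NGPUS" ]; do
  DEVICES="$DEVICES,$i"
  i=$((i + 1))
done
export CUDA_VISIBLE_DEVICES="$DEVICES"

# One MPI rank per GPU; append your own cntk arguments after the config file.
set -- mpirun --allow-run-as-root --oversubscribe --npernode "$NGPUS" \
  -x OMP_NUM_THREADS -x CUDA_DEVICE_ORDER -x CUDA_VISIBLE_DEVICES \
  cntk configFile=myscript.cntk

# Print the command; only execute it when RUN=1.
echo "$*"
if [ "${RUN:-0}" = 1 ]; then
  "$@"
fi
```

    Printing the command by default makes it easy to check the rank count and device list before committing GPUs to a training run.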

    You might want to pull in data and model descriptions from locations outside the container for use by Microsoft Cognitive Toolkit, or save results to locations outside the container. The easiest way to do this is to mount one or more host directories as Docker data volumes.
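    As a sketch of mounting more than one host directory, the command below mounts a dataset read-only (the :ro suffix is standard Docker volume syntax) and a separate writable directory for results. The host paths, the /results mount point, and the 18.05 tag are illustrative assumptions, not required names. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical host paths; replace them with absolute paths on your system.
DATA_DIR=/home/jsmith/data/mnist
RESULTS_DIR=/home/jsmith/results
TAG=18.05

# Data is mounted read-only (:ro); results are writable so that output
# survives after --rm deletes the container.
set -- nvidia-docker run -it --rm \
  -v "$DATA_DIR:/data/mnist:ro" \
  -v "$RESULTS_DIR:/results" \
  "nvcr.io/nvidia/cntk:$TAG"

# Print the command; only execute it when RUN=1.
echo "$*"
if [ "${RUN:-0}" = 1 ]; then
  "$@"
fi
```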

    Note: To share data between ranks, NVIDIA® Collective Communications Library™ (NCCL) may require shared system memory for IPC and pinned (page-locked) system memory resources. The operating system's limits on these resources may need to be increased accordingly. Refer to your system's documentation for details.
    In particular, Docker® containers default to limited shared and pinned memory resources. When using NCCL inside a container, it is recommended that you increase these resources by adding the following options to the nvidia-docker run command line:
    --shm-size=1g --ulimit memlock=-1
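    A launch command that combines these options with a data mount might look like the following minimal sketch. The mount paths reuse the /home/jsmith/data/mnist example from this section and the tag is assumed to be 18.05; as before, it only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
TAG=18.05

# --shm-size enlarges /dev/shm beyond Docker's small default, and
# memlock=-1 removes the locked-memory limit so pinned host memory
# can be allocated.
set -- nvidia-docker run -it --rm \
  --shm-size=1g --ulimit memlock=-1 \
  -v /home/jsmith/data/mnist:/data/mnist \
  "nvcr.io/nvidia/cntk:$TAG"

# Print the command; only execute it when RUN=1.
echo "$*"
if [ "${RUN:-0}" = 1 ]; then
  "$@"
fi
```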
  4. See /workspace/README.md inside the container for information on customizing your container image.

    For more information about the Microsoft Cognitive Toolkit, including tutorials, documentation, and examples, see the Microsoft Cognitive Toolkit wiki.