Examples of Running Containers

Logging Into the NGC Container Registry

If you provided your NGC API Key when logging in to the VM via SSH, you can skip this section. Otherwise, you must log in to the registry manually as described below.

Log in to the NGC container registry using the following Docker command.

docker login nvcr.io

You will be prompted to enter a Username and Password. Type “$oauthtoken” exactly as shown, and enter your NGC API key obtained during NGC account setup:

Username: $oauthtoken

Password: <Your NGC API Key>

From this point you can run Docker commands and access the NGC container registry from the VM instance.
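If you want to script the login rather than type the credentials interactively, it can be wrapped in a small helper. This is a sketch only: the `NGC_API_KEY` environment variable is an assumed convention (NGC does not set it for you), while `$oauthtoken` is the literal username the registry expects.

```shell
#!/bin/sh
# Sketch of a non-interactive NGC registry login. Assumes the API key has
# been exported as NGC_API_KEY before calling.
ngc_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # --password-stdin keeps the key out of shell history and the process list.
  printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Export the key, then call `ngc_login`; the helper refuses to run if the key is missing.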

Example: MNIST Training Run Using PyTorch Container

Once logged in to the Amazon EC2 P3 instance, you can run the MNIST example under PyTorch.

Note that the PyTorch example will download the MNIST dataset from the web.

Pull and run the PyTorch container:

docker pull nvcr.io/nvidia/pytorch:17.10
nvidia-docker run --rm -it nvcr.io/nvidia/pytorch:17.10

Run the MNIST example:

cd /opt/pytorch/examples/mnist
python main.py
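Because the container above was started with --rm, anything written inside it (checkpoints, logs) is discarded when it exits. One way to keep results is to bind-mount a host directory with -v when launching the container. The host path below is only an example, and this snippet just prints the command so you can review it before running:

```shell
# Sketch: persist training output by bind-mounting a host directory into
# the container. /home/ubuntu/mnist-output is an example path; adjust it.
run_cmd="nvidia-docker run --rm -it -v /home/ubuntu/mnist-output:/output nvcr.io/nvidia/pytorch:17.10"
echo "$run_cmd"
```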

Example: MNIST Training Run Using TensorFlow Container

Once logged in to the Amazon EC2 P3 instance, you can run the MNIST example under TensorFlow.

Note that the TensorFlow example will download the MNIST dataset from the web.

Pull and run the TensorFlow container:

docker pull nvcr.io/nvidia/tensorflow:17.10
nvidia-docker run --rm -it nvcr.io/nvidia/tensorflow:17.10

This example follows the tutorial at https://www.tensorflow.org/get_started/mnist/beginners

Run the MNIST_with_summaries example:

cd /opt/tensorflow/tensorflow/examples/tutorials/mnist
python mnist_with_summaries.py
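As it trains, mnist_with_summaries.py writes TensorBoard event files, which you can inspect with the tensorboard command inside the container. The log directory below is an assumption based on the script's --log_dir flag default, so verify it against the script if nothing appears; the snippet only echoes the command to run:

```shell
# Sketch: view training curves with TensorBoard. The log directory is the
# script's assumed default (check its --log_dir flag).
logdir="/tmp/tensorflow/mnist/logs/mnist_with_summaries"
echo "tensorboard --logdir $logdir --port 6006"
```

If you run TensorBoard inside the container, remember to publish the port when launching (for example, -p 6006:6006 on the docker run command line).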

Example: ResNet50 Training on TensorFlow, Using EFS to Host the ImageNet Dataset

This example involves downloading the ImageNet dataset, and requires:

  • A p3.16xlarge instance.
  • An EFS file system that you have created.

  1. Mount the EFS file system to /data with the EFS file system’s DNS name.

    The following commands perform the one-time mount:

    sudo mkdir /data
    sudo mount -t nfs4 -o \
      nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
      EFS-DNS-NAME:/ /data
    sudo chmod 777 /data

    If you plan to stop and then start your VM instance, edit /etc/fstab according to the instructions at Mounting an EFS.
  2. Launch the TensorFlow container in interactive mode with the EFS /data volume mounted to /data inside the container:
    docker pull nvcr.io/nvidia/tensorflow:17.10
    nvidia-docker run --rm -it -v /data:/data nvcr.io/nvidia/tensorflow:17.10 
  3. In the running container, move to the ImageNet download script directory:
    cd /opt/tensorflow/nvidia-examples/build_imagenet_data/
  4. Read the README.md file, and follow the instructions for downloading ImageNet. This may take several hours.
  5. Train ResNet50 with TensorFlow.
    cd /opt/tensorflow/nvidia-examples/cnn
    python nvcnn.py --model=resnet50 \
                    --data_dir=/data/imagenet_tfrecord \
                    --batch_size=64 \
                    --num_gpus=8 \
                    --num_epochs=120 \
                    --display_every=50 \
                    --log_dir=/home/train/resnet50-1
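If you stop and start the instance, the one-time mount from step 1 does not persist. A persistent mount via /etc/fstab would look roughly like the line below, built from the same NFS options used in the mount command above; _netdev is added so the mount waits for networking, but verify the exact entry against the EFS documentation linked in step 1. The snippet only prints the candidate line:

```shell
# Sketch of an /etc/fstab entry for the EFS file system. Replace
# EFS-DNS-NAME with your file system's DNS name, then append the line
# to /etc/fstab as root.
fstab_line='EFS-DNS-NAME:/ /data nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0'
echo "$fstab_line"
```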