Examples of Running Containers

Logging Into the NGC Container Registry

When you connect to the VM instance, the instance script initiates the Docker login process automatically, at which point you must enter your NGC API Key.

If necessary, log in to the NGC container registry manually using the following Docker command.

docker login nvcr.io

You will be prompted for a username and password. For the username, type $oauthtoken exactly as shown; for the password, enter the NGC API key you obtained during NGC account setup:

Username: $oauthtoken

Password: <Your NGC API Key>

From this point you can run Docker commands and access the NGC container registry from the VM instance.
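For scripted setups, the login can also be performed non-interactively. A minimal sketch, assuming the API key has been saved to a file (the path below is hypothetical); the docker call itself is left commented out so the script is safe to run as-is:

```shell
#!/bin/sh
# Sketch: non-interactive NGC registry login.
# Assumption: the NGC API key was saved to ~/ngc_api_key.txt (hypothetical path).
NGC_USER='$oauthtoken'                 # literal username, not a shell variable
NGC_KEY_FILE="${HOME}/ngc_api_key.txt"
# Reading the key from stdin keeps it out of shell history and process lists:
# docker login nvcr.io -u "$NGC_USER" --password-stdin < "$NGC_KEY_FILE"
echo "would log in to nvcr.io as ${NGC_USER}"
```

Uncomment the docker login line once the key file is in place; --password-stdin avoids passing the key on the command line.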

Example: MNIST Training Run Using PyTorch Container

Once logged in to the NVIDIA GPU Cloud Image instance, you can run the MNIST example under PyTorch.

Note that the PyTorch example will download the MNIST dataset from the web.

  1. Pull and run the PyTorch container:
    docker pull nvcr.io/nvidia/pytorch:18.02-py3
    docker run --runtime=nvidia --rm -it nvcr.io/nvidia/pytorch:18.02-py3
  2. Run the MNIST example:
    cd /opt/pytorch/examples/mnist
    python main.py

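The two steps above can also be combined into a single non-interactive run. A sketch, assuming the image tag and example path shown in the steps; the docker command is echoed rather than executed so it can be inspected first:

```shell
#!/bin/sh
# Sketch: one-shot MNIST training run with the PyTorch container.
IMAGE="nvcr.io/nvidia/pytorch:18.02-py3"
TRAIN_CMD="cd /opt/pytorch/examples/mnist && python main.py"
# Remove the leading 'echo' to actually launch the container:
echo docker run --runtime=nvidia --rm "$IMAGE" bash -c "$TRAIN_CMD"
```

Dropping -it is deliberate here: a one-shot training run needs no interactive terminal.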
Example: MNIST Training Run Using TensorFlow Container

Once logged in to the NVIDIA GPU Cloud Image instance, you can run the MNIST example under TensorFlow.

Note that the TensorFlow built-in example will pull the MNIST dataset from the web.

  1. Pull and run the TensorFlow container.
    docker pull nvcr.io/nvidia/tensorflow:18.02-py3
    docker run --runtime=nvidia --rm -it nvcr.io/nvidia/tensorflow:18.02-py3
  2. Follow this tutorial: https://www.tensorflow.org/get_started/mnist/beginners
  3. Run the MNIST_with_summaries example:
    cd /opt/tensorflow/tensorflow/examples/tutorials/mnist
    python mnist_with_summaries.py
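Because the container is started with --rm, any TensorBoard event files written by mnist_with_summaries.py are lost when it exits. A sketch that bind-mounts a host directory into the container to preserve them; the assumption that the script writes its summaries under /tmp/tensorflow inside the container may vary by release, so check the script's output path first:

```shell
#!/bin/sh
# Sketch: keep TensorBoard summaries from mnist_with_summaries.py on the host.
# Assumption: the script writes under /tmp/tensorflow inside the container
# (the exact default path may differ between releases).
IMAGE="nvcr.io/nvidia/tensorflow:18.02-py3"
HOST_LOG_DIR="${HOME}/tf-logs"
mkdir -p "$HOST_LOG_DIR"
# Remove the leading 'echo' to actually launch the container:
echo docker run --runtime=nvidia --rm -it \
    -v "$HOST_LOG_DIR":/tmp/tensorflow "$IMAGE"
```

After a run, point TensorBoard at $HOME/tf-logs on the host to view the summaries.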

Example: Persistent SSD Dataset Disk with ImageNet and ResNet50 for TensorFlow

This example involves downloading the ImageNet dataset and requires:

  • A Volta-based GPU Cloud instance.
  • A Persistent SSD Disk created as a data volume.
  1. Mount the Persistent SSD Disk volume to /data.

    These commands perform a one-time mount; the mount does not persist across reboots (add an entry to /etc/fstab if you want the volume remounted automatically):

    sudo mkdir /data
    sudo mount /dev/sdb1 /data
    sudo chmod 777 /data
     
  2. Copy the ImageNet dataset onto the SSD file system at /data (run scp from your local machine), then pull and run the TensorFlow container on the VM instance with the volume mounted:
    scp -r local_dataset_dir/ <username>@<GCP_VM_Instance>:/data
    docker pull nvcr.io/nvidia/tensorflow:18.02-py3
    docker run --runtime=nvidia --rm -it -v /data:/data nvcr.io/nvidia/tensorflow:18.02-py3
    
  3. In the running container, move to the ImageNet download script directory:
    cd /opt/tensorflow/nvidia-examples/build_imagenet_data/
  4. Read the README.md file, and follow the instructions for downloading ImageNet. This may take several hours.
  5. Train ResNet50 with TensorFlow.
    cd /opt/tensorflow/nvidia-examples/cnn
    python nvcnn.py --model=resnet50 \
                    --data_dir=/data/imagenet_tfrecord \
                    --batch_size=64 \
                    --num_gpus=1 \
                    --num_epochs=120 \
                    --display_every=50 \
                    --log_dir=/home/train/resnet50-1
    
    For --num_gpus, specify the number of GPUs attached to the VM instance.
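Rather than hard-coding the GPU count, it can be taken from nvidia-smi at run time. A sketch, reusing the nvcnn.py invocation from step 5; the training command is echoed rather than executed, and the fallback to 1 is an assumption for machines where nvidia-smi is unavailable:

```shell
#!/bin/sh
# Sketch: match --num_gpus to the GPUs actually attached to the instance.
NUM_GPUS=$(nvidia-smi -L 2>/dev/null | wc -l)
[ "$NUM_GPUS" -gt 0 ] || NUM_GPUS=1    # fall back to 1 if nvidia-smi is absent
# Remove the leading 'echo' to actually start training:
echo python nvcnn.py --model=resnet50 \
    --data_dir=/data/imagenet_tfrecord \
    --batch_size=64 \
    --num_gpus="$NUM_GPUS" \
    --num_epochs=120 \
    --display_every=50 \
    --log_dir=/home/train/resnet50-1
```

nvidia-smi -L prints one line per GPU, so the line count is the number of devices visible to the driver.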