Run and Monitor a Job (Run:ai UI)

The NVIDIA NGC TensorFlow container is optimized for GPU acceleration and contains a validated set of libraries that enable and optimize GPU performance. In this example, the Run:ai UI is used to submit an unattended ResNet-50 training job with the NGC TensorFlow container.

  1. Go to the Dashboard and select the drop-down for Jobs.

  2. Select + NEW JOB on the top right of the page.

    _images/runai-ui-01.png
  3. In the New Job screen, enter the required information, such as the project name, job name, number of GPUs, image name, and commands. Then, select SUBMIT.

    The following image and command arguments were used to launch this training job:

    • Image: nvcr.io/nvidia/tensorflow:22.01-tf1-py3

    • Arguments: ./nvidia-examples/cnn/resnet.py --layers=50 --precision=fp16 -i 100 -u epoch

    _images/runai-ui-02.png
  4. Monitor the status of the job using the Jobs screen.

    _images/runai-ui-03.png
  5. The Status should be Succeeded when the job completes.

    _images/runai-ui-04.png
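
The values entered in step 3 can also be kept in a small script so the same job is easy to reproduce later. The following is a minimal sketch, not part of Run:ai: it only gathers the form fields in a dictionary and splits the argument string into tokens the way a shell would. The helper name `build_job_fields` and the project and job names are hypothetical placeholders; the image and arguments are copied from the steps above. It does not call any Run:ai API.

```python
import shlex

# Values copied from step 3 of this walkthrough.
IMAGE = "nvcr.io/nvidia/tensorflow:22.01-tf1-py3"
ARGUMENTS = (
    "./nvidia-examples/cnn/resnet.py "
    "--layers=50 --precision=fp16 -i 100 -u epoch"
)


def build_job_fields(project, job_name, gpus):
    """Collect the New Job form fields in one dictionary.

    Hypothetical helper for illustration only; it does not talk to Run:ai.
    """
    if gpus <= 0:
        raise ValueError("GPU count must be positive")
    return {
        "project": project,
        "name": job_name,
        "gpus": gpus,
        "image": IMAGE,
        # shlex.split tokenizes the string the way a POSIX shell would.
        "arguments": shlex.split(ARGUMENTS),
    }


if __name__ == "__main__":
    # "my-project" and "resnet50-fp16" are placeholder names (assumptions).
    fields = build_job_fields("my-project", "resnet50-fp16", 1)
    for key, value in fields.items():
        print(f"{key}: {value}")
```

Printing the fields before filling in the New Job screen makes it easier to spot a mistyped flag or image tag before the job is submitted.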