Running Torch

Before you can run an NGC deep learning framework container, your Docker environment must support NVIDIA GPUs. To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers And Frameworks User Guide and specify the registry, repository, and tags.
On a system with GPU support for NGC containers, the following occurs when running a container:

  • The Docker engine loads the image into a container which runs the software.
  • You define the runtime resources of the container by including additional flags and settings that are used with the command. These flags and settings are described in Running A Container.
  • The GPUs are explicitly defined for the Docker container. By default, all GPUs are used; you can select specific GPUs with the NV_GPU environment variable.

The method implemented in your system depends on the DGX OS version installed (for DGX systems), the specific NGC Cloud Image provided by a Cloud Service Provider, or the software that you have installed in preparation for running NGC containers on TITAN PCs, Quadro PCs, or vGPUs.
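As a rough illustration of the GPU selection mentioned in the list above, the sketch below builds the `--gpus` argument used by the `docker run` commands in this section. It assumes Docker 19.03 or later with the NVIDIA Container Toolkit, where `--gpus` accepts either `all` or a `device=` list. The helper name `gpus_flag` is hypothetical and not part of any NVIDIA tooling, and setups that rely on the NV_GPU environment variable are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print the --gpus argument for docker run.
# With no argument it selects all GPUs; otherwise it expects a
# comma-separated list of GPU indices such as "0,1".
gpus_flag() {
    if [ -z "${1:-}" ]; then
        printf '%s\n' '--gpus all'
    else
        # The inner double quotes keep the comma-separated list together
        # when the flag is pasted into a shell command line.
        printf '%s\n' "--gpus '\"device=$1\"'"
    fi
}

gpus_flag        # prints: --gpus all
gpus_flag 0,1    # prints: --gpus '"device=0,1"'
```

The printed flag can be pasted in place of `--gpus all` in any of the run commands in this section.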

  1. Issue the command for the applicable release of the container that you want. The following command pulls the 18.08 release; substitute the tag of the release you need.

    docker pull nvcr.io/nvidia/torch:18.08

  2. Open a command prompt and paste the pull command. The container image begins downloading. Ensure that the pull completes successfully before proceeding to the next step.
  3. Run the container image. To run the container, choose interactive mode or non-interactive mode.
    1. Interactive mode: Open a command prompt and issue:

      docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/torch:<xx.xx>

    2. Non-interactive mode: Open a command prompt and issue:

      docker run --gpus all --rm -v local_dir:container_dir nvcr.io/nvidia/torch:<xx.xx> <command>

    You might want to pull in data and model descriptions from locations outside the container for use by Torch, or save results to locations outside the container. The easiest way to do this is to mount one or more host directories as Docker data volumes with the -v flag.
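To make the volume mount concrete, here is a minimal sketch that assembles the interactive and non-interactive commands from step 3 with a host directory mounted into the container. The paths `$HOME/torch-data` and `/data`, the default tag 18.08 (taken from the pull command above), and the function name `torch_run_cmd` are illustrative assumptions; `nvidia-smi` is only a placeholder for whatever command you want to run. The script prints the commands so that they can be reviewed before use.

```shell
#!/bin/sh
# Illustrative values (assumptions; adjust to your system).
HOST_DIR="${HOST_DIR:-$HOME/torch-data}"   # directory on the host
CONTAINER_DIR="${CONTAINER_DIR:-/data}"    # where it appears in the container
TAG="${TAG:-18.08}"                        # container release tag

# Hypothetical helper: print a docker run command.
# First argument: "-it" for interactive mode, "" for non-interactive mode.
# Remaining arguments: the command to run inside the container, if any.
torch_run_cmd() {
    mode="$1"
    shift
    cmd="$*"
    echo "docker run --gpus all ${mode:+$mode }--rm" \
         "-v ${HOST_DIR}:${CONTAINER_DIR} nvcr.io/nvidia/torch:${TAG}${cmd:+ $cmd}"
}

torch_run_cmd -it              # interactive session in the container
torch_run_cmd "" nvidia-smi    # non-interactive: run one command, then exit
```

Files written to the container directory then appear in the host directory, and remain there after the container is removed by `--rm`.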

    Note: Deep Learning GPU Training System™ (DIGITS) uses shared memory to share data between processes. For example, if you use Torch multiprocessing for multi-threaded data loaders, the default shared memory segment size that the container runs with may not be enough. In that case, increase the shared memory size by specifying either:


    --ipc=host

    or


    --shm-size=<requested memory size>

    as an option to the following command:

    docker run --gpus all
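Combining the note with the run command from step 3, a complete invocation might look like the sketch below. The `8g` size and the 18.08 tag are placeholder assumptions, not recommended values, and the helper `shm_flag` is hypothetical. The two options are alternatives: `--ipc=host` shares the host's IPC namespace with the container, while `--shm-size` sets the size of the container's own /dev/shm.

```shell
#!/bin/sh
# Hypothetical helper: choose one of the two shared-memory options.
# With no argument it prints --ipc=host; with a size such as "8g"
# it prints --shm-size=<size>.
shm_flag() {
    if [ -z "${1:-}" ]; then
        printf '%s\n' '--ipc=host'
    else
        printf '%s\n' "--shm-size=$1"
    fi
}

# Print the resulting commands for review instead of running them.
echo "docker run --gpus all $(shm_flag) -it --rm nvcr.io/nvidia/torch:18.08"
echo "docker run --gpus all $(shm_flag 8g) -it --rm nvcr.io/nvidia/torch:18.08"
```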


© Copyright 2024, NVIDIA. Last updated on Jan 27, 2020.