Note
For training, evaluation, and inference, we expose two variables for each task: num_gpus and gpu_ids, which
default to 1 and [0], respectively. If both are passed, but are inconsistent, for example num_gpus = 1,
gpu_ids = [0, 1], then they are modified to follow the setting that implies more GPUs; in the same example num_gpus is modified from 1 to 2.
In some cases multi-GPU training may result in a segmentation fault. You can circumvent this by
setting the enviroment variable OMP_NUM_THREADS to 1. Depending upon your model of execution, you may use the following methods to set
this variable:
CLI Launcher:
You may set the environment variable by adding the following fields to the
Envsfield of your~/.tao_mounts.jsonfile as mentioned in bullet 3 in ths section Running the launcher.{ "Envs": [ { "variable": "OMP_NUM_THREADSR", "value": "1" } }
Docker:
You may set environment variables in Docker by setting the
-eflag in the Docker command line.docker run -it --rm --gpus all \ -e OMP_NUM_THREADS=1 \ -v /path/to/local/mount:/path/to/docker/mount nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt <model> train -e