Step #4: Training Using Base Command Platform
This step walks through creating a job and provides sample commands for running training on 1, 2, 4, and 8 GPUs.
This training code is contained in the nv-launchpad-bc:brats-monai-lab container.
brats_training_ddp.py is the script used to train the models. It exposes flags that let you change arguments and experiment with model development. Below are the available flags, followed by sample commands for single- and multi-GPU jobs on Base Command Platform.
--dir sets the dataset directory. Defaults to /mount/workspace/brats2021.
--epochs sets the total number of epochs to run. Defaults to 300.
--lr sets the learning rate. Defaults to 1e-4.
--batch_size sets the batch size. Defaults to 1.
--seed sets the seed for initializing training. Defaults to None.
--cache_rate sets the fraction of the dataset to cache. A larger cache rate needs more GPU memory to store the transformed images. Defaults to 0.1.
--val_interval sets the validation interval. Defaults to every 20 epochs.
--network sets the network architecture, either UNet or SegResNet. Defaults to SegResNet.
--wandb enables Weights and Biases for visualization. Defaults to True.
Play around with these arguments and see how they affect your model speed and accuracy!
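For reference, flags like these are typically defined with Python's argparse module. The sketch below only illustrates that pattern; the names and defaults mirror the list above, but the actual parser code inside brats_training_ddp.py may differ.

# Illustrative sketch only; the real parser in brats_training_ddp.py may differ.
import argparse

def get_args():
    parser = argparse.ArgumentParser(description="BraTS DDP training")
    parser.add_argument("--dir", type=str, default="/mount/workspace/brats2021", help="dataset directory")
    parser.add_argument("--epochs", type=int, default=300, help="total number of epochs to run")
    parser.add_argument("--lr", type=float, default=1e-4, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=1, help="batch size per GPU")
    parser.add_argument("--seed", type=int, default=None, help="seed for initializing training")
    parser.add_argument("--cache_rate", type=float, default=0.1, help="fraction of the dataset to cache; larger values need more GPU memory")
    parser.add_argument("--val_interval", type=int, default=20, help="run validation every N epochs")
    parser.add_argument("--network", type=str, default="SegResNet", choices=["UNet", "SegResNet"], help="network architecture")
    parser.add_argument("--wandb", type=lambda s: s.lower() not in ("false", "0"), default=True, help="use Weights and Biases for visualization")
    return parser.parse_args()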
This part of the lab will need to be completed using the desktop that is accessible from the left-hand navigation pane.
Log into NVIDIA NGC by visiting https://ngc.nvidia.com/signin
Expand the Base Command section by clicking the downward-facing chevron and select Dashboard
Click Create Job.
Select your Accelerated Computing Environment (ACE)
Set the instance size to dgxa100.80g.1.norm
Select the Workspaces tab, select the workspace you created in step 1, and set the mount point to /mount/workspace
Set the result mount point to /results
Select the nv-launchpad-bc:brats-monai-lab container from the dropdown and the 1.0 tag
Enter the command below in Run Command to run a 1-GPU job with WandB MLOps visualization integrated
wandb login <api key>; torchrun --nproc_per_node=1 --nnodes=1 /workspace/brats-monai-curated-lab/brats_training_ddp.py --epochs=300 --val_interval=10 --cache_rate=0.15
To run without WandB visualization, add the --wandb=False flag:
torchrun --nproc_per_node=1 --nnodes=1 /workspace/brats-monai-curated-lab/brats_training_ddp.py --epochs=300 --val_interval=5 --cache_rate=0.15 --wandb=False
Rename the job to "single gpu BraTS training"
Note: Once you have filled in all the required fields, you can copy the command at the bottom of the Create Job page and paste it into your terminal to run the job via the NGC CLI tool.
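For reference, a CLI submission for this job generally looks like the command below. This is only an illustration: the image path, ACE, and workspace values are placeholders, and flag spellings can vary with your org, team, and NGC CLI version, so prefer the command generated on the Create Job page.

ngc batch run --name "single gpu BraTS training" --ace <your ACE> --instance dgxa100.80g.1.norm --image "nv-launchpad-bc/brats-monai-lab:1.0" --workspace <your workspace>:/mount/workspace:RW --result /results --commandline "wandb login <api key>; torchrun --nproc_per_node=1 --nnodes=1 /workspace/brats-monai-curated-lab/brats_training_ddp.py --epochs=300 --val_interval=10 --cache_rate=0.15"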
Wait for the job to run to completion; this should take about 37 hours.
If you do not wish to run the job to completion, click the kebab menu to the right of the Results button and then click Kill Job.
While you wait, Base Command can run other jobs concurrently. Try running some of the multi-GPU jobs at the same time!
To run the job on multiple GPUs using PyTorch distributed data-parallel training, follow the single-GPU job creation process, update the torchrun command, and select an instance size with the required number of GPUs.
The following examples optimize GPU utilization by increasing the amount of data cached on each GPU. As the number of GPUs increases, each GPU works on a smaller shard of the dataset while the total GPU memory grows, so a larger fraction of the dataset can be stored in GPU memory (a pattern sketched after the commands below).
Two GPUs
torchrun --nproc_per_node=2 --nnodes=1 /workspace/brats-monai-curated-lab/brats_training_ddp.py --epochs=300 --val_interval=5 --cache_rate=0.375
Four GPUs
torchrun --nproc_per_node=4 --nnodes=1 /workspace/brats-monai-curated-lab/brats_training_ddp.py --epochs=300 --val_interval=5 --cache_rate=0.75
Eight GPUs
torchrun --nproc_per_node=8 --nnodes=1 /workspace/brats-monai-curated-lab/brats_training_ddp.py --epochs=300 --val_interval=5 --cache_rate=1.0
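The cache_rate scaling above follows the way MONAI caching is commonly combined with DDP: the training files are partitioned across ranks and each rank caches only its own shard, which shrinks as GPUs are added, so cache_rate can be raised without exceeding per-GPU memory. The sketch below illustrates that pattern; train_files, train_transforms, and args are placeholders, and the exact code in brats_training_ddp.py may differ.

# Illustrative per-rank caching pattern with MONAI; the lab script may differ.
import torch.distributed as dist
from monai.data import CacheDataset, DataLoader, partition_dataset

dist.init_process_group(backend="nccl")  # torchrun sets the rank/world-size environment variables
rank, world_size = dist.get_rank(), dist.get_world_size()

# Each rank keeps only its shard of the file list, so the same --cache_rate
# covers a larger fraction of that shard as more GPUs are added.
shard = partition_dataset(data=train_files, num_partitions=world_size, shuffle=True, even_divisible=True)[rank]
train_ds = CacheDataset(data=shard, transform=train_transforms, cache_rate=args.cache_rate)
train_loader = DataLoader(train_ds, batch_size=args.batch_size, shuffle=True, num_workers=4)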
As you go through the lab, Weights and Biases is a great tool for analyzing important metrics for your model. You should see metrics similar to those shown below when running the different training commands.
As data gets distributed across multiple GPUs, the training time goes down significantly.
The time it takes to reach a benchmark of 80% mean Dice accuracy also decreases as GPUs are added. It is important to note that the data is not shuffled between epochs; shuffling may help the model reach a given accuracy sooner because more varied data is introduced to the individual GPUs.
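If you want the data reshuffled across GPUs between epochs, the standard PyTorch approach is a DistributedSampler whose epoch counter is advanced on every pass. This is a hypothetical alternative sketch, not what the lab script does by default; train_ds and args are placeholders from the earlier sketch.

# Hypothetical per-epoch reshuffling with a DistributedSampler; not the lab default.
from torch.utils.data import DataLoader, DistributedSampler

sampler = DistributedSampler(train_ds, shuffle=True)  # shards the dataset across ranks
train_loader = DataLoader(train_ds, batch_size=args.batch_size, sampler=sampler)

for epoch in range(args.epochs):
    sampler.set_epoch(epoch)  # re-seeds the shuffle so each GPU sees a different mix every epoch
    for batch in train_loader:
        ...  # training step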
Weights and Biases allows you to run jobs on Base Command and visualize them in real time to see the loss and accuracy of the model. This lets you compare multiple models against each other in real time. The code is already set up to showcase the training and validation loss, as well as the mean accuracy and individual label accuracies. The x-axis is currently set to show accuracy over time; however, it can be changed to plot the y-axis values against other quantities, such as the epoch number.
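Under the hood, this kind of W&B integration generally boils down to a wandb.init call plus periodic wandb.log calls, which is what produces these curves. The sketch below shows that pattern; the project name, metric keys, and helper functions are illustrative placeholders rather than the lab script's exact code, and rank comes from the DDP sketch above.

# Illustrative W&B logging pattern; metric names and helpers here are placeholders.
import wandb

if rank == 0:  # in DDP jobs, logging is usually done from a single rank
    wandb.init(project="brats-monai", config=vars(args))

for epoch in range(args.epochs):
    train_loss = train_one_epoch(...)  # placeholder helper
    if rank == 0:
        wandb.log({"epoch": epoch, "train_loss": train_loss})
    if (epoch + 1) % args.val_interval == 0:
        mean_dice = validate(...)  # placeholder helper
        if rank == 0:
            wandb.log({"epoch": epoch, "val_mean_dice": mean_dice})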
If you opt into using Weights and Biases, it will also generate system analytics such as those shown below. These can help you determine how the model is affecting your GPUs. Some of the graphs below have been customized to use an exponential moving average to visualize the average system usage and utilization over time. Higher GPU utilization may allow your model to train faster; this can potentially be achieved by caching more images in GPU memory or increasing the batch size.