MONAI Toolkit Documentation
MONAI Version 1.1

Performance Benchmarks

We provide comprehensive benchmarks of various machine learning algorithms, examining their performance across a variety of configurations. These benchmarks report model training time over a fixed number of epochs, GPU utilization, and per-GPU memory usage. The two main workflows considered are Auto3DSeg and Self-Supervised Learning. The evaluations span varying GPU counts and types, with a focus on the 80GB A100 GPU. The benchmarks provided in this document can guide developers and researchers in selecting the best configuration for their specific needs and constraints.

The subsequent section presents the benchmarking outcomes of the Auto3DSeg algorithms with respect to computational efficiency. The TotalSegmentator dataset has been selected for demonstration purposes, as it is among the largest publicly available 3D medical image datasets, containing over 1,000 CT images with segmentation annotations for 104 foreground classes. This dataset features substantial variations in field-of-view and organ/bone shapes.

To ensure equitable comparisons, we adhere to the original methodology employed in TotalSegmentator, dividing the 104 foreground classes into five segments and utilizing one segment, comprised of 17 foreground classes, for model training. We provide numerical results for each fold in a 5-fold cross-validation of three algorithms: DiNTS, 3D SegResNet, and SwinUNETR. Note that, for this particular dataset, 2D SegResNet is not employed in model training due to the data spacing distribution and the internal algorithm selection logic we utilize. GPU utilization and memory usage are measured with the widely used DCGM library.
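To make the evaluation protocol concrete, the fold construction can be sketched in plain Python. This is a simplified illustration, not the actual Auto3DSeg data-list tooling; the round-robin fold assignment and the placeholder case IDs are assumptions.

```python
def five_fold_splits(cases, n_folds=5):
    """Partition a list of case IDs into n_folds cross-validation splits.

    Each split holds out one fold for validation and trains on the
    remaining folds, mirroring the 5-fold protocol described above.
    """
    # Round-robin assignment keeps fold sizes balanced (within one case).
    folds = [cases[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        train = [c for i, fold in enumerate(folds) if i != k for c in fold]
        splits.append({"fold": k, "train": train, "val": folds[k]})
    return splits

# Example with placeholder case IDs; a real run would use the
# TotalSegmentator image list.
splits = five_fold_splits([f"case_{i:04d}" for i in range(1000)])
```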

The following table provides a comparison of the three algorithms when used with an 80GB A100 GPU and varying GPU counts ranging from 1 to 32:

Algorithm        GPU        No. of GPUs  Model Training Time (Hrs)  GPU Utilization
DiNTS            80GB A100   1           19.0                       92%
DiNTS            80GB A100   8            2.5                       92%
DiNTS            80GB A100  16            1.5                       89%
DiNTS            80GB A100  32            0.9                       84%
SegResNet (3D)   80GB A100   1           13.8                       92%
SegResNet (3D)   80GB A100   8            2.8                       91%
SegResNet (3D)   80GB A100  16            1.5                       89%
SegResNet (3D)   80GB A100  32            0.8                       88%
SwinUNETR        80GB A100   1           15.6                       95%
SwinUNETR        80GB A100   8            2.2                       94%
SwinUNETR        80GB A100  16            1.0                       93%
SwinUNETR        80GB A100  32            0.6                       91%
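One way to read the table is through strong-scaling efficiency, i.e., the fraction of ideal linear speedup retained as GPUs are added. The following sketch derives it from the training times above (a derived calculation, not an additional measurement):

```python
# Training times (hours) from the table above, keyed by GPU count.
times = {
    "DiNTS":          {1: 19.0, 8: 2.5, 16: 1.5, 32: 0.9},
    "SegResNet (3D)": {1: 13.8, 8: 2.8, 16: 1.5, 32: 0.8},
    "SwinUNETR":      {1: 15.6, 8: 2.2, 16: 1.0, 32: 0.6},
}

def strong_scaling_efficiency(times_by_gpus):
    """Efficiency = T(1) / (N * T(N)); 1.0 means perfect linear scaling."""
    t1 = times_by_gpus[1]
    return {n: t1 / (n * t) for n, t in times_by_gpus.items()}

for algo, t in times.items():
    eff = strong_scaling_efficiency(t)
    print(algo, {n: round(e, 2) for n, e in eff.items()})
```

For example, DiNTS retains about 95% efficiency at 8 GPUs (19.0 / (8 × 2.5)) but about 66% at 32 GPUs, so adding GPUs keeps reducing wall-clock time, with diminishing returns at larger scales.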

The following table illustrates the performance of the SSL (Self-Supervised Learning) algorithm when trained on approximately 35,000 3D volumes with varying GPU counts. The testing was conducted using the 80GB A100 GPU model.

The table presents key performance metrics for each configuration, including the Model Training Time for 200 epochs, Per GPU Memory Used, and GPU Utilization. These metrics provide insights into the algorithm’s efficiency and resource utilization during the training process.

Algorithm  GPU        No. of GPUs  Model Training Time (Hrs)  Per-GPU Memory Used  GPU Utilization
SSL        80GB A100   4           316                        ~72GB                77%
SSL        80GB A100   8           154                        ~73GB                86%
SSL        80GB A100  16            82                        ~71GB                90%
SSL        80GB A100  32            54                        ~72GB                89%
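For context, the wall-clock numbers translate into per-epoch time and speedup relative to the smallest configuration. The sketch below derives both from the table (a derived calculation over the 200-epoch runs, not additional measurements):

```python
# SSL training times (hours) for 200 epochs, keyed by GPU count,
# taken from the table above.
ssl_hours = {4: 316, 8: 154, 16: 82, 32: 54}
EPOCHS = 200

# Minutes per epoch at each GPU count.
per_epoch_min = {n: h * 60 / EPOCHS for n, h in ssl_hours.items()}

# Speedup relative to the 4-GPU baseline.
speedup = {n: ssl_hours[4] / h for n, h in ssl_hours.items()}

for n in ssl_hours:
    print(f"{n:>2} GPUs: {per_epoch_min[n]:.1f} min/epoch, "
          f"{speedup[n]:.2f}x vs 4 GPUs")
```

Scaling from 4 to 32 GPUs (an 8× increase in resources) yields roughly a 5.9× speedup (316 / 54), consistent with the GPU utilization figures staying high but not perfect at larger GPU counts.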
© Copyright 2023, NVIDIA. Last updated on Aug 15, 2023.