Performance Benchmarks

We provide comprehensive benchmarks of several machine learning algorithms, examining their performance across a variety of configurations. These benchmarks report model training time for a fixed number of epochs, GPU utilization, and memory usage. The two main workflows considered are Auto3DSeg and Self-Supervised Learning. The evaluations vary GPU count and type, with a focus on the 80GB A100 GPU. The benchmarks in this document can guide developers and researchers in selecting the best configuration for their specific needs and constraints.

The following section presents benchmarking results for the Auto3DSeg algorithms in terms of computational efficiency. The TotalSegmentator dataset has been selected for demonstration purposes, as it is among the largest publicly available 3D medical image datasets, containing over 1,000 CT images with segmentation annotations for 104 foreground classes. The dataset features substantial variation in field-of-view and in organ and bone shapes.

To ensure fair comparisons, we follow the methodology of the original TotalSegmentator work, dividing the 104 foreground classes into five groups and using one group, comprising 17 foreground classes, for model training. We provide numerical results for each fold of a 5-fold cross-validation of three algorithms: DiNTS, 3D SegResNet, and SwinUNETR. Note that 2D SegResNet is not used for model training on this particular dataset, due to the dataset's spacing distribution and our internal algorithm selection logic. GPU utilization and memory usage are measured with the widely used NVIDIA DCGM library.
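
For reference, Auto3DSeg experiments of this kind are typically launched through MONAI's AutoRunner interface. The sketch below shows a minimal setup under stated assumptions: the task file path, datalist, and the algorithm name strings are illustrative placeholders, not the exact configuration used for these benchmarks.

```python
# Minimal sketch of launching the Auto3DSeg pipeline via MONAI's AutoRunner.
# Paths and the algorithm list are illustrative assumptions.
from monai.apps.auto3dseg import AutoRunner

# task.yaml is assumed to describe the dataset, e.g.:
#   modality: CT
#   datalist: ./totalsegmentator_datalist.json   # hypothetical 5-fold datalist
#   dataroot: ./TotalSegmentator
runner = AutoRunner(
    input="./task.yaml",
    algos=["dints", "segresnet", "swinunetr"],  # restrict to the three benchmarked algorithms
)
runner.set_num_fold(num_fold=5)  # 5-fold cross-validation, as in the experiments above
runner.run()
```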

The following table compares the three algorithms on 80GB A100 GPUs with GPU counts ranging from 1 to 32:

| Algorithm      | GPU       | No. of GPUs | Model Training Time (Hrs) | GPU Utilization |
|----------------|-----------|-------------|---------------------------|-----------------|
| DiNTS          | 80GB A100 | 1           | 19.0                      | 92%             |
| DiNTS          | 80GB A100 | 8           | 2.5                       | 92%             |
| DiNTS          | 80GB A100 | 16          | 1.5                       | 89%             |
| DiNTS          | 80GB A100 | 32          | 0.9                       | 84%             |
| SegResNet (3D) | 80GB A100 | 1           | 13.8                      | 92%             |
| SegResNet (3D) | 80GB A100 | 8           | 2.8                       | 91%             |
| SegResNet (3D) | 80GB A100 | 16          | 1.5                       | 89%             |
| SegResNet (3D) | 80GB A100 | 32          | 0.8                       | 88%             |
| SwinUNETR      | 80GB A100 | 1           | 15.6                      | 95%             |
| SwinUNETR      | 80GB A100 | 8           | 2.2                       | 94%             |
| SwinUNETR      | 80GB A100 | 16          | 1.0                       | 93%             |
| SwinUNETR      | 80GB A100 | 32          | 0.6                       | 91%             |
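
As GPU count grows, training time drops but scaling efficiency declines (for example, DiNTS utilization falls from 92% on 1 GPU to 84% on 32). A quick way to read the table is to compute speedup and parallel efficiency against the single-GPU run; the sketch below does this for the DiNTS rows above.

```python
# Compute multi-GPU speedup and scaling efficiency from the table above,
# using the DiNTS training times as an example.
single_gpu_hours = 19.0  # DiNTS on 1x 80GB A100

for gpus, hours in [(8, 2.5), (16, 1.5), (32, 0.9)]:
    speedup = single_gpu_hours / hours
    efficiency = speedup / gpus  # 1.0 would be perfect linear scaling
    print(f"{gpus:>2} GPUs: speedup {speedup:4.1f}x, efficiency {efficiency:5.1%}")

# Output:
#  8 GPUs: speedup  7.6x, efficiency 95.0%
# 16 GPUs: speedup 12.7x, efficiency 79.2%
# 32 GPUs: speedup 21.1x, efficiency 66.0%
```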

The following table illustrates the performance of the SSL (Self-Supervised Learning) algorithm when trained with approximately 35,000 3D volumes of training data and varying GPU counts. The testing was conducted on the 80GB A100 GPU model.

The table presents key performance metrics for each configuration, including the Model Training Time for 200 epochs, Per GPU Memory Used, and GPU Utilization. These metrics provide insights into the algorithm’s efficiency and resource utilization during the training process.

| Algorithm | GPU       | No. of GPUs | Model Training Time for 200 Epochs (Hrs) | Per GPU Memory Used | GPU Utilization |
|-----------|-----------|-------------|------------------------------------------|---------------------|-----------------|
| SSL       | 80GB A100 | 4           | 316                                      | ~72GB               | 77%             |
| SSL       | 80GB A100 | 8           | 154                                      | ~73GB               | 86%             |
| SSL       | 80GB A100 | 16          | 82                                       | ~71GB               | 90%             |
| SSL       | 80GB A100 | 32          | 54                                       | ~72GB               | 89%             |
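
The utilization and memory figures above were collected with DCGM. For a quick in-process spot check of the same two metrics, the sketch below uses the NVML Python bindings (the nvidia-ml-py / pynvml package) instead of DCGM; it is a simplified stand-in, and the sampling loop shown is purely illustrative.

```python
# Lightweight sketch for sampling GPU utilization and memory during training.
# The benchmarks above used NVIDIA DCGM; this uses the NVML Python bindings
# (pip install nvidia-ml-py) as a simpler stand-in for spot checks.
import time
import pynvml

pynvml.nvmlInit()
handles = [
    pynvml.nvmlDeviceGetHandleByIndex(i)
    for i in range(pynvml.nvmlDeviceGetCount())
]

for _ in range(5):  # sample a few times; run alongside training in practice
    for i, handle in enumerate(handles):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)             # bytes
        print(f"GPU {i}: util {util:3d}%, "
              f"memory {mem.used / 2**30:.1f} GB / {mem.total / 2**30:.1f} GB")
    time.sleep(1.0)

pynvml.nvmlShutdown()
```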
