Performance Benchmarks

We provide comprehensive benchmarks of several machine learning algorithms, examining their performance across a variety of configurations. These benchmarks report model training time for a fixed number of epochs, GPU utilization, and memory usage. The two main workflows considered are Auto3DSeg and Self-Supervised Learning. The evaluations vary GPU count and type, with a focus on the 80GB A100 GPU. The benchmarks in this document can guide developers and researchers in selecting the best configuration for their specific needs and constraints.

The following section presents benchmarking results for the Auto3DSeg algorithms in terms of computational efficiency. The TotalSegmentator dataset has been selected for demonstration purposes, as it is among the largest publicly available 3D medical image datasets, containing over 1,000 CT images with segmentation annotations for 104 foreground classes. The dataset features substantial variation in field-of-view and in organ and bone shapes.

To ensure fair comparisons, we follow the methodology of the original TotalSegmentator work, dividing the 104 foreground classes into five groups and using one group, comprising 17 foreground classes, for model training. We provide numerical results for each fold of a 5-fold cross-validation of three algorithms: DiNTS, 3D SegResNet, and SwinUNETR. Note that 2D SegResNet is not used for model training on this particular dataset, due to the dataset's spacing distribution and our internal algorithm selection logic. GPU utilization and memory usage are measured with the widely used NVIDIA DCGM library.
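
For reference, Auto3DSeg experiments of this kind are typically launched through MONAI's AutoRunner interface. The sketch below shows a minimal setup under stated assumptions: the task file path, datalist, and the algorithm name strings are illustrative placeholders, not the exact configuration used for these benchmarks.

```python
# Minimal sketch of launching the Auto3DSeg pipeline via MONAI's AutoRunner.
# Paths and the algorithm list are illustrative assumptions.
from monai.apps.auto3dseg import AutoRunner

# task.yaml is assumed to describe the dataset, e.g.:
#   modality: CT
#   datalist: ./totalsegmentator_datalist.json   # hypothetical 5-fold datalist
#   dataroot: ./TotalSegmentator
runner = AutoRunner(
    input="./task.yaml",
    algos=["dints", "segresnet", "swinunetr"],  # restrict to the three benchmarked algorithms
)
runner.set_num_fold(num_fold=5)  # 5-fold cross-validation, as in the experiments above
runner.run()
```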

The following table compares the three algorithms on 80GB A100 GPUs with GPU counts ranging from 1 to 32:

| Algorithm      | GPU       | No. of GPUs | Model Training Time (Hrs) | GPU Utilization |
|----------------|-----------|-------------|---------------------------|-----------------|
| DiNTS          | 80GB A100 | 1           | 19.0                      | 92%             |
| DiNTS          | 80GB A100 | 8           | 2.5                       | 92%             |
| DiNTS          | 80GB A100 | 16          | 1.5                       | 89%             |
| DiNTS          | 80GB A100 | 32          | 0.9                       | 84%             |
| SegResNet (3D) | 80GB A100 | 1           | 13.8                      | 92%             |
| SegResNet (3D) | 80GB A100 | 8           | 2.8                       | 91%             |
| SegResNet (3D) | 80GB A100 | 16          | 1.5                       | 89%             |
| SegResNet (3D) | 80GB A100 | 32          | 0.8                       | 88%             |
| SwinUNETR      | 80GB A100 | 1           | 15.6                      | 95%             |
| SwinUNETR      | 80GB A100 | 8           | 2.2                       | 94%             |
| SwinUNETR      | 80GB A100 | 16          | 1.0                       | 93%             |
| SwinUNETR      | 80GB A100 | 32          | 0.6                       | 91%             |
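
As GPU count grows, training time drops but scaling efficiency declines (for example, DiNTS utilization falls from 92% on 1 GPU to 84% on 32). A quick way to read the table is to compute speedup and parallel efficiency against the single-GPU run; the sketch below does this for the DiNTS rows above.

```python
# Compute multi-GPU speedup and scaling efficiency from the table above,
# using the DiNTS training times as an example.
single_gpu_hours = 19.0  # DiNTS on 1x 80GB A100

for gpus, hours in [(8, 2.5), (16, 1.5), (32, 0.9)]:
    speedup = single_gpu_hours / hours
    efficiency = speedup / gpus  # 1.0 would be perfect linear scaling
    print(f"{gpus:>2} GPUs: speedup {speedup:4.1f}x, efficiency {efficiency:5.1%}")

# Output:
#  8 GPUs: speedup  7.6x, efficiency 95.0%
# 16 GPUs: speedup 12.7x, efficiency 79.2%
# 32 GPUs: speedup 21.1x, efficiency 66.0%
```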

The following table illustrates the performance of the SSL (Self-Supervised Learning) algorithm when trained with approximately 35,000 3D volumes of training data and varying GPU counts. The testing was conducted on the 80GB A100 GPU model.

The table presents key performance metrics for each configuration, including the Model Training Time for 200 epochs, Per GPU Memory Used, and GPU Utilization. These metrics provide insights into the algorithm’s efficiency and resource utilization during the training process.

| Algorithm | GPU       | No. of GPUs | Model Training Time for 200 Epochs (Hrs) | Per GPU Memory Used | GPU Utilization |
|-----------|-----------|-------------|------------------------------------------|---------------------|-----------------|
| SSL       | 80GB A100 | 4           | 316                                      | ~72GB               | 77%             |
| SSL       | 80GB A100 | 8           | 154                                      | ~73GB               | 86%             |
| SSL       | 80GB A100 | 16          | 82                                       | ~71GB               | 90%             |
| SSL       | 80GB A100 | 32          | 54                                       | ~72GB               | 89%             |
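
The utilization and memory figures above were collected with DCGM. For a quick in-process spot check of the same two metrics, the sketch below uses the NVML Python bindings (the nvidia-ml-py / pynvml package) instead of DCGM; it is a simplified stand-in, and the sampling loop shown is purely illustrative.

```python
# Lightweight sketch for sampling GPU utilization and memory during training.
# The benchmarks above used NVIDIA DCGM; this uses the NVML Python bindings
# (pip install nvidia-ml-py) as a simpler stand-in for spot checks.
import time
import pynvml

pynvml.nvmlInit()
handles = [
    pynvml.nvmlDeviceGetHandleByIndex(i)
    for i in range(pynvml.nvmlDeviceGetCount())
]

for _ in range(5):  # sample a few times; run alongside training in practice
    for i, handle in enumerate(handles):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)             # bytes
        print(f"GPU {i}: util {util:3d}%, "
              f"memory {mem.used / 2**30:.1f} GB / {mem.total / 2**30:.1f} GB")
    time.sleep(1.0)

pynvml.nvmlShutdown()
```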
