Improving inference performance

If your model was trained with dynamic input shape (the Readme.md included in the model’s MMAR specifies whether this is the case), you may be able to obtain significantly better inference performance, in both accuracy and speed.

TensorFlow 1.13 introduced support for dynamic network input shapes. This allows the model’s computation graph to be built with placeholders of dynamic shape [None, None, None], which can accept input data of any size. Inference performance can vary greatly for different input sizes. In general, inference tends to have better performance with bigger network input sizes.

Note

This is only true within a certain size range. Beyond that range, the overall speed drops considerably, even though the total number of scanning windows is smaller. As of now, it is not clear how to accurately determine the upper bound.

Note

Only the model SegAhnet has been modified to support dynamic input shape. SegResnet, used in all brain-related models (segmentation_mri_brain_tumors_br16_full, segmentation_mri_brain_tumors_br16_t1c2tc, and annotation_mri_brain_tumors_br16_t1ce_tc), does not support dynamic input shape. SegResnet uses a group normalization layer whose implementation calls tf.reshape, which must be given concrete shape sizes. Dynamic input shape sets the shape sizes to None, which is not concrete, and is therefore incompatible with SegResnet.

Clara supports training with dynamic shape. To use dynamic network input, you must modify config_train.json of your model, as shown here:

Set global variable

Set the global variable dynamic_input_shape to true:

{
    "epochs": 5000,
    "num_training_epoch_per_valid": 20,
    "learning_rate": 1e-4,
    "multi_gpu": false,
    "dynamic_input_shape": true,
    ...
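
If you prefer to make this change from a script rather than by hand, a minimal sketch follows. It assumes only what the snippet above shows: that dynamic_input_shape is a top-level key of config_train.json. The function name enable_dynamic_input_shape is hypothetical and not part of Clara.

```python
import json


def enable_dynamic_input_shape(config_text):
    """Return the training config text with dynamic_input_shape set to true.

    Hypothetical helper: it treats the config as ordinary JSON and sets one
    top-level key, leaving every other setting untouched.
    """
    config = json.loads(config_text)
    config["dynamic_input_shape"] = True
    return json.dumps(config, indent=4)


# Example with a small stand-in config (not a complete config_train.json).
original = '{"epochs": 5000, "learning_rate": 1e-4, "multi_gpu": false}'
updated = enable_dynamic_input_shape(original)
print(updated)
```

To apply it to a real file, read config_train.json into a string, pass it through the function, and write the result back.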

Note

Previously in TLT, it was necessary to modify ImagePipeline’s crop size to [-1, -1, -1], but this is no longer needed. In fact, ImagePipeline itself should not be used because there are now task-specific subclasses. These subclasses have output_crop_size as a parameter, which should remain the size of the image coming out of the image pipeline, even when training with dynamic input shape.


ScanWindowInferer

ScanWindowInferer can perform inference on images that are too large to be fed to the model directly. It works as follows:

The ScanWindowInferer is first configured with a roi_size (roi = region of interest). Based on the roi_size, it splits the image into a set of overlapping patches called slices. It then computes prediction for each slice. It finally computes the overall prediction by aggregating the results from slice predictions.
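
The three steps above can be sketched in plain Python. This is an illustration only, not Clara's implementation: the 25% overlap, the way the last window is placed flush with the image edge, and averaging as the aggregation rule are all assumptions, and the function names are hypothetical.

```python
from itertools import product


def window_starts(length, roi, overlap=0.25):
    """Start offsets of windows of size `roi` covering one axis of size `length`."""
    if roi >= length:
        return [0]  # a single window covers the whole axis
    step = max(1, int(roi * (1 - overlap)))  # assumed overlap fraction
    starts = list(range(0, length - roi, step))
    starts.append(length - roi)  # last window sits flush with the far edge
    return starts


def scan_windows(image_shape, roi_size, overlap=0.25):
    """Corner coordinates of every slice for an N-dimensional image."""
    per_axis = [window_starts(n, r, overlap) for n, r in zip(image_shape, roi_size)]
    return list(product(*per_axis))


def infer_1d(signal, roi, predict, overlap=0.25):
    """1-D version of the full loop: slice, predict, then average the overlaps."""
    total = [0.0] * len(signal)
    count = [0] * len(signal)
    for start in window_starts(len(signal), roi, overlap):
        prediction = predict(signal[start:start + roi])
        for offset, value in enumerate(prediction):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c for t, c in zip(total, count)]


# Number of slices for a 512 x 512 x 512 volume at two ROI sizes.
print(len(scan_windows((512, 512, 512), (160, 160, 160))))
print(len(scan_windows((512, 512, 512), (224, 224, 224))))
```

Under these assumptions, a 512 x 512 x 512 volume needs 64 slices at an ROI size of 160 and 27 slices at 224, which illustrates why a larger roi_size reduces the number of scanning windows.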

If your model uses scanning-window-based inference during validation, you must now explicitly set its roi_size. Choose a size that suits your model: one that produces good accuracy without exceeding the upper bound beyond which speed drops. In general, this size should be no less than the size of the training crops. For SegAhnet-based models, the size must be divisible by 32. Here’s an example:

"Inferer": {
    "name": "TFScanWindowInferer",
    "args": {
        "roi_size": [160, 160, 160]
    }
}

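The two rules stated above for choosing roi_size (no smaller than the training crop, and divisible by 32 for SegAhnet-based models) can be checked before launching a run. The helper below is a hypothetical sketch, not part of Clara; it only encodes those two rules.

```python
def check_roi_size(roi_size, train_crop_size, multiple=32):
    """Return a list of problems with roi_size; an empty list means it passes.

    `multiple=32` reflects the SegAhnet requirement; other models may differ.
    """
    problems = []
    if len(roi_size) != len(train_crop_size):
        problems.append("roi_size and train_crop_size differ in number of dimensions")
        return problems
    for axis, (roi, crop) in enumerate(zip(roi_size, train_crop_size)):
        if roi % multiple != 0:
            problems.append(f"axis {axis}: {roi} is not divisible by {multiple}")
        if roi < crop:
            problems.append(f"axis {axis}: {roi} is smaller than the training crop {crop}")
    return problems


print(check_roi_size([160, 160, 160], [96, 96, 96]))  # passes both rules
print(check_roi_size([150, 160, 160], [96, 96, 96]))  # 150 is not a multiple of 32
```

Note that the "no smaller than the training crop" rule is a general recommendation in this guide rather than a hard requirement, so treat that message as a warning.
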
Note

Do NOT change the crop size of any training transforms. They determine the actual size of the crops fed into the network during training.


To run validation or inference with a model trained with dynamic input shape, you must also modify the ScanWindowInferer configuration in config_validation.json by explicitly specifying its roi_size. You can be a little more generous here, since more GPU memory is available during validation and inference. To obtain optimal performance (higher accuracy at faster speed), you should experiment with different ROI sizes. Here’s an example:

"Inferer": {
    "name": "TFScanWindowInferer",
    "args": {
        "roi_size": [224, 224, 224]
    }
}

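One way to organize the experimentation recommended above is a small sweep that records accuracy and wall-clock time for each candidate ROI size. The sketch below is hypothetical: run_validation stands for whatever you use to launch validation with a given roi_size and obtain its score, and fake_validation is only a stand-in so the example runs on its own.

```python
import time


def sweep_roi_sizes(run_validation, candidates):
    """Call run_validation once per candidate ROI size and collect the results."""
    results = []
    for roi_size in candidates:
        start = time.perf_counter()
        score = run_validation(roi_size=list(roi_size))
        elapsed = time.perf_counter() - start
        results.append({"roi_size": list(roi_size), "score": score, "seconds": elapsed})
    return results


def fake_validation(roi_size):
    """Stand-in for a real validation run; returns a dummy score."""
    return 0.0


results = sweep_roi_sizes(fake_validation, [(160, 160, 160), (224, 224, 224)])
for row in results:
    print(row["roi_size"], row["score"], round(row["seconds"], 3))
```

Replace fake_validation with a function that writes the roi_size into config_validation.json, runs validation, and returns the reported metric.
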
The ScanWindowInferer offers another technique for improving inference speed: batch_size. The basic algorithm computes the prediction for each slice one by one, which might not fully utilize the GPU’s computing power. With a batch_size greater than 1, the predictions for multiple slices are computed in a single pass, which can increase the overall speed:

"Inferer": {
    "name": "TFScanWindowInferer",
    "args": {
        "roi_size": [224, 224, 224],
        "batch_size": 2
    }
}

Note

It is not always true that the bigger the size, the faster the overall inference. It takes some experimentation to determine the best roi_size.

Changing the roi_size can change both inference accuracy and speed; whereas batch_size only causes changes to inference speed (i.e. inference should produce exactly the same accuracy for the same roi_size, regardless of batch_size).
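
The effect of batch_size can be illustrated with a toy model. In this sketch, which is not Clara code, CountingModel is a hypothetical stand-in that treats every slice independently and counts how many times it is called; grouping slices into batches reduces the number of calls without changing any prediction.

```python
def batches(items, batch_size):
    """Yield consecutive groups of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


class CountingModel:
    """Toy model: the 'prediction' for a slice is the sum of its values."""

    def __init__(self):
        self.calls = 0

    def predict_batch(self, batch):
        self.calls += 1  # one forward pass per batch
        return [sum(slice_values) for slice_values in batch]


def predict_slices(model, slices, batch_size=1):
    """Predict every slice, batch_size slices per forward pass."""
    predictions = []
    for batch in batches(slices, batch_size):
        predictions.extend(model.predict_batch(batch))
    return predictions


slices = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
one_by_one, in_pairs = CountingModel(), CountingModel()
a = predict_slices(one_by_one, slices, batch_size=1)
b = predict_slices(in_pairs, slices, batch_size=2)
print(a == b, one_by_one.calls, in_pairs.calls)
```

Five slices take five passes with batch_size 1 and three passes with batch_size 2, and both runs return the same predictions.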

The accuracy of the model is determined by the validation performed during training. With a fixed network input shape, both training and validation (which runs inference against the graph) use the same network input shape, so the reported accuracy of the model is also fixed. With dynamic input shapes, however, training and validation no longer have to use the same input size. For example, you can use [96, 96, 96] as the crop size for training and [160, 160, 160] as the ROI size of the ScanWindowInferer for validation. Using different ROI sizes for validation can produce different accuracy values for the same trained model.

This raises two questions: how important is the accuracy value produced by the training process, and does the quality of the trained model depend on the ROI size of the ScanWindowInferer used for validation?

To find answers to these questions, we ran multiple rounds of training with different ROI sizes with deterministic training enabled. All these runs produced the best model at exactly the same epoch with different “best mean dice” values.

Based on the results of these experiments:

  • It appears that the quality of the trained model does not depend on the ROI size used for validation, even though the accuracy values do vary with ROI size. In this sense the trained model is objective: the validation ROI size changes the reported score, not which model is selected as best.

  • The accuracy value determined by the training process is still important, but only in a relative sense. You can probably still compare two models and judge which is better, but you should do so using the same ROI size for validation.

© Copyright 2020, NVIDIA. Last updated on Feb 2, 2023.