NVIDIA Clara Train 4.1

AutoML search space definition for Clara Train

Clara Train 4.1 configurations require MONAI- and PyTorch-based components, but the AutoML functionality remains mostly the same as in previous versions of Clara Train. For a full AutoML configuration that uses Clara Train 4.1 components, see Example AutoML configuration with Clara Train 4.1 components.

A training config (typically named config_train.json) consists of the following definitions:

A component defines the configuration of a Python object. Many components are used in a training config: model, loss, optimizer, transforms, metrics, image pipelines, etc.

The general format of component config is:


{
  "name/path": "Class name or path",
  "args": {
    Component init args
  },
  attributes
},

To fully describe a component, you must specify the following:

Class information

Python objects are instantiated from classes. You specify the class through either the “name” element or, for BYOC (Bring Your Own Components), the “path” element.

Args

The args section specifies the values of init args of the Python object. For example, for the following transform component, the args are fields and magnitude:


{
  "name": "ScaleIntensityOscillation",
  "args": {
    "fields": "image",
    "magnitude": 0.10
  }
}


Attributes

Additional attributes can be specified. Currently, the only supported attribute is “disabled”. When a component is disabled, the component is ignored in training. When not specified, the default value of this attribute is false.

The “disabled” attribute is used in Examples 2 and 3 below.

Note

Attributes are not part of the class definition.
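To make the three parts concrete, the sketch below shows one way a component config could be turned into a Python object. It is illustrative only and is not Clara Train's actual loader: `build_component`, the `registry` dict, and the stand-in `ScaleIntensityOscillation` class are hypothetical names introduced here.

```python
class ScaleIntensityOscillation:
    """Stand-in class for illustration only; not the real Clara Train transform."""

    def __init__(self, fields, magnitude):
        self.fields = fields
        self.magnitude = magnitude


def build_component(config, registry):
    """Instantiate the class named in `config`, or return None if it is disabled."""
    # Attributes such as "disabled" sit next to "args"; they are not init args.
    if config.get("disabled", False):
        return None
    cls = registry[config["name"]]  # class information
    return cls(**config.get("args", {}))  # args become init keyword arguments


# Hypothetical name-to-class lookup table.
registry = {"ScaleIntensityOscillation": ScaleIntensityOscillation}

config = {
    "name": "ScaleIntensityOscillation",
    "args": {"fields": "image", "magnitude": 0.10},
}

transform = build_component(config, registry)
skipped = build_component({**config, "disabled": True}, registry)
```

In this sketch a disabled component simply yields `None`, which mirrors the statement that disabled components are ignored in training.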


First-level parameters

In addition to components, the training config also specifies parameters to control the behavior of the training. Examples of such parameters include epochs, num_interval_per_valid, etc.

You can define a search space for the args and attributes of any component by adding a “search” key to the component definition. For example, to search on the probability arg of the RandomAxisFlip transform component:


{
  "name": "RandomAxisFlip",
  "args": {
    "fields": ["image", "label"],
    "probability": 0.0
  },
  "search": [
    {
      "domain": "transform",
      "args": ["probability"],
      "type": "float",
      "targets": [0.0, 1.0]
    }
  ]
}
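As a rough illustration of what such a search entry means, the following sketch draws one candidate value for a "float" entry and writes it into a copy of the component's args. It assumes that the two targets of a float search are the lower and upper bounds of the range. The `sample_float_search` helper and the uniform random sampling are assumptions made for illustration; how candidates are actually chosen depends on the AutoML controller in use.

```python
import copy
import random


def sample_float_search(component, rng):
    """Return a copy of `component` with each searched float arg set to a sampled value."""
    candidate = copy.deepcopy(component)
    for entry in component.get("search", []):
        if entry["type"] != "float":
            continue  # other types (for example "enum") are not handled in this sketch
        low, high = entry["targets"]  # assumed to be [lower bound, upper bound]
        for arg in entry["args"]:
            candidate["args"][arg] = rng.uniform(low, high)
    # The search definition is not an init arg, so drop it from the candidate.
    candidate.pop("search", None)
    return candidate


component = {
    "name": "RandomAxisFlip",
    "args": {"fields": ["image", "label"], "probability": 0.0},
    "search": [
        {"domain": "transform", "args": ["probability"], "type": "float", "targets": [0.0, 1.0]}
    ],
}

rng = random.Random(0)  # seeded only to make the sketch repeatable
candidate = sample_float_search(component, rng)
```

The original component is left untouched, so the same definition can be sampled again for the next candidate.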

Example 1 - Simple alias

In this example, you define a search on the arg probability in the component RandomAxisFlip, assign the alias prob to it (by adding “#prob” after the arg name probability), and then use its value in the component RandomRotate3D by referencing the alias in that component's “apply” section:


{
  "name": "RandomAxisFlip",
  "args": {
    "fields": ["image", "label"],
    "probability": 0.0
  },
  "search": [
    {
      "domain": "transform",
      "args": ["probability#prob"],
      "type": "enum",
      "targets": [0.0, 1.0]
    }
  ]
},
{
  "name": "RandomRotate3D",
  "args": {
    "fields": ["image", "label"],
    "probability": 0.0
  },
  "apply": {
    "probability": "prob"
  }
}

The effect is that the probability used for RandomAxisFlip and RandomRotate3D will be the same value in your runs.

Tip

You can apply the same alias in any number of other components.
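The alias mechanism in Example 1 can be pictured with a small sketch. The `resolve_aliases` function below is hypothetical and not part of Clara Train; it only shows how a value chosen for the alias prob could be written into the component that defines it and into every component that applies it. It handles arg aliases only, not attribute keys such as “@disabled”.

```python
def resolve_aliases(components, chosen):
    """Apply chosen search values to their own args and to components using the alias.

    `chosen` maps an alias name to the value picked for this run.
    """
    resolved = []
    for comp in components:
        args = dict(comp.get("args", {}))
        # "probability#prob": search the arg "probability", publish it as alias "prob".
        for entry in comp.get("search", []):
            for spec in entry["args"]:
                arg, _, alias = spec.partition("#")
                if alias in chosen:
                    args[arg] = chosen[alias]
        # "apply" maps an arg of this component to an alias defined elsewhere.
        for arg, alias in comp.get("apply", {}).items():
            args[arg] = chosen[alias]
        resolved.append({"name": comp["name"], "args": args})
    return resolved


components = [
    {
        "name": "RandomAxisFlip",
        "args": {"fields": ["image", "label"], "probability": 0.0},
        "search": [
            {"domain": "transform", "args": ["probability#prob"], "type": "enum", "targets": [0.0, 1.0]}
        ],
    },
    {
        "name": "RandomRotate3D",
        "args": {"fields": ["image", "label"], "probability": 0.0},
        "apply": {"probability": "prob"},
    },
]

# Suppose the search picked 1.0 for the alias "prob" in this run.
resolved = resolve_aliases(components, {"prob": 1.0})
```

Both resolved components end up with the same probability, which is the effect described for this example.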


Example 2 - Use first-level search params to disable/enable multiple components

All first-level params can be searched. Moreover, you can define any number of additional first-level search params, as long as their names do not conflict with existing ones. All first-level params can be used in “apply”.

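In this example, the two transforms RandomAxisFlip and RandomRotate3D are made mutually exclusive. Each enum target assigns d1 and d2 together, so exactly one of the two transforms is disabled in any given run: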


"search": [
  {
    "domain": "transform",
    "type": "enum",
    "args": ["d1", "d2"],
    "targets": [[true, false], [false, true]]
  }
],
...
{
  "name": "RandomAxisFlip",
  "args": {
    "fields": ["image", "label"],
    "probability": 0.0
  },
  "apply": {
    "@disabled": "d1"
  }
},
{
  "name": "RandomRotate3D",
  "args": {
    "fields": ["image", "label"],
    "probability": 0.0
  },
  "apply": {
    "@disabled": "d2"
  }
}
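To see why these targets make the two transforms mutually exclusive, the sketch below enumerates the candidates and lists which components stay enabled in each. It assumes that every inner list in "targets" assigns values to the names in "args" by position. The helpers `enum_candidates` and `enabled_components` are hypothetical names used only for this illustration.

```python
def enum_candidates(search_entry):
    """Yield one {param: value} dict per enum target (values matched by position)."""
    for target in search_entry["targets"]:
        yield dict(zip(search_entry["args"], target))


def enabled_components(components, params):
    """Return the names of components that are not disabled for this candidate."""
    names = []
    for comp in components:
        flag = comp.get("apply", {}).get("@disabled")
        # Fall back to a literal "disabled" attribute when no search param is applied.
        disabled = params[flag] if flag is not None else comp.get("disabled", False)
        if not disabled:
            names.append(comp["name"])
    return names


search = {
    "domain": "transform",
    "type": "enum",
    "args": ["d1", "d2"],
    "targets": [[True, False], [False, True]],
}
components = [
    {"name": "RandomAxisFlip", "apply": {"@disabled": "d1"}},
    {"name": "RandomRotate3D", "apply": {"@disabled": "d2"}},
]

runs = [enabled_components(components, params) for params in enum_candidates(search)]
```

Under this reading, the first target disables RandomAxisFlip and the second disables RandomRotate3D, so each run uses exactly one of the two transforms.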


Example 3 - Try different optimizers

By using the technique in Example 2, you can use AutoML to search over optimizers.

To do this, define the “optimizer” as a list of optimizer choices. Clara Train accepts both a list and a dict (the dict form defines a single optimizer). When you use a list, make sure that exactly one optimizer is enabled in each run; otherwise, training will not start:


"search": [
  {
    "domain": "transform",
    "type": "enum",
    "args": ["d1", "d2"],
    "targets": [[true, false], [false, true]]
  }
],
...
"optimizer": [
  {
    "name": "NovoGrad",
    "apply": {
      "@disabled": "d1"
    }
  },
  {
    "name": "Adam",
    "apply": {
      "@disabled": "d2"
    }
  }
],

Tip

You can do this for model, loss, and LR policy as well.
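The "one and only one" requirement can be checked before launching a search. The sketch below is a hypothetical pre-flight check, not Clara Train's own validation: `select_single_enabled` accepts either the dict form or the list form and raises an error unless exactly one choice is enabled for the given search params.

```python
def select_single_enabled(choices, params):
    """Return the single enabled entry from `choices`, or raise ValueError."""
    if isinstance(choices, dict):
        return choices  # the dict form already describes a single component
    enabled = []
    for choice in choices:
        flag = choice.get("apply", {}).get("@disabled")
        disabled = params[flag] if flag is not None else choice.get("disabled", False)
        if not disabled:
            enabled.append(choice)
    if len(enabled) != 1:
        raise ValueError(f"expected exactly one enabled entry, found {len(enabled)}")
    return enabled[0]


optimizers = [
    {"name": "NovoGrad", "apply": {"@disabled": "d1"}},
    {"name": "Adam", "apply": {"@disabled": "d2"}},
]

# Candidate from the first enum target: d1 is true, d2 is false.
picked = select_single_enabled(optimizers, {"d1": True, "d2": False})

# A candidate that enables both optimizers is rejected.
try:
    select_single_enabled(optimizers, {"d1": False, "d2": False})
    rejected = False
except ValueError:
    rejected = True
```

The same check could be applied to a list of models, losses, or LR policies, since it only looks at the “@disabled” entry of each choice.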


Note

In versions before Clara Train 4.1, the @ and # notation could be used for AutoML, but these symbols are now reserved for denoting objects or code in the configuration (see Upgrading from previous versions of Clara Train for details).

Example AutoML configuration with Clara Train 4.1 components

{
  "epochs": 1260,
  "num_interval_per_valid": 1,
  "multi_gpu": false,
  "amp": true,
  "learning_rate": 2e-4,
  "determinism": { "random_seed": 0 },
  "cudnn_benchmark": false,
  "dont_load_ckpt_model": true,
  "search": [
    {
      "domain": "lr",
      "args": ["learning_rate"],
      "type": "float",
      "targets": [0.0001, 0.001]
    },
    {
      "domain": "transform",
      "type": "enum",
      "args": ["mySearchLoss1", "mySearchLoss2"],
      "targets": [[true, false], [false, true]]
    }
  ],
  "train": {
    "loss": [
      {
        "apply": { "@disabled": "mySearchLoss1" },
        "name": "DiceLoss",
        "args": { "to_onehot_y": true, "softmax": true }
      },
      {
        "apply": { "@disabled": "mySearchLoss2" },
        "name": "FocalLoss",
        "args": { "to_onehot_y": true }
      }
    ],
    "optimizer": {
      "name": "Adam",
      "args": { "params": "#@model.parameters()", "lr": "{learning_rate}" }
    },
    "lr_scheduler": {
      "name": "StepLR",
      "args": { "optimizer": "@optimizer", "step_size": 5000, "gamma": 0.1 }
    },
    "model": {
      "name": "UNet",
      "args": {
        "dimensions": 3,
        "in_channels": "{INPUT_CHANNELS}",
        "out_channels": "{OUTPUT_CHANNELS}",
        "channels": [16, 32, 64, 128, 256],
        "strides": [2, 2, 2, 2],
        "num_res_units": 2,
        "norm": "batch"
      },
      "search": [
        {
          "type": "enum",
          "args": ["channels"],
          "targets": [[[16, 32, 64, 128, 256]], [[8, 16, 32, 64, 128]]],
          "domain": "net"
        },
        {
          "type": "enum",
          "args": ["num_res_units"],
          "targets": [[2], [1]],
          "domain": "net"
        }
      ]
    },
    "pre_transforms": [
      { "name": "LoadImaged", "args": { "keys": ["image", "label"] } },
      { "name": "EnsureChannelFirstd", "args": { "keys": ["image", "label"] } },
      {
        "name": "ScaleIntensityRanged",
        "args": {
          "keys": "image",
          "a_min": -57,
          "a_max": 164,
          "b_min": 0.0,
          "b_max": 1.0,
          "clip": true
        },
        "search": [
          {
            "type": "enum",
            "args": ["a_min", "a_max"],
            "targets": [[-50, 100], [-60, 150]],
            "domain": "transform"
          }
        ]
      },
      {
        "name": "CropForegroundd",
        "args": { "keys": ["image", "label"], "source_key": "image" }
      },
      {
        "name": "RandCropByPosNegLabeld",
        "args": {
          "keys": ["image", "label"],
          "label_key": "label",
          "spatial_size": [96, 96, 96],
          "pos": 1,
          "neg": 1,
          "num_samples": 4,
          "image_key": "image",
          "image_threshold": 0
        }
      },
      {
        "name": "RandShiftIntensityd",
        "args": { "keys": "image", "offsets": 0.1, "prob": 0.5 },
        "search": [
          {
            "type": "float",
            "args": ["prob#myProb"],
            "targets": [0, 1],
            "domain": "transform"
          }
        ]
      },
      {
        "name": "RandGaussianNoised",
        "args": { "keys": "image", "prob": 0.10 },
        "apply": { "prob": "myProb" }
      },
      { "name": "ToTensord", "args": { "keys": ["image", "label"] } }
    ],
    "dataset": {
      "name": "CacheDataset",
      "data_list_file_path": "{DATASET_JSON}",
      "data_file_base_dir": "{DATA_ROOT}",
      "data_list_key": "{TRAIN_DATALIST_KEY}",
      "args": {
        "transform": "@pre_transforms",
        "cache_num": 32,
        "cache_rate": 1.0,
        "num_workers": 2
      }
    },
    "dataloader": {
      "name": "DataLoader",
      "args": {
        "dataset": "@dataset",
        "batch_size": 2,
        "shuffle": true,
        "num_workers": 2
      }
    },
    "inferer": { "name": "SimpleInferer" },
    "handlers": [
      {
        "name": "LrScheduleHandler",
        "args": { "lr_scheduler": "@lr_scheduler", "print_lr": true }
      },
      {
        "name": "ValidationHandler",
        "args": { "validator": "@evaluator", "interval": 1, "epoch_level": true }
      },
      {
        "name": "CheckpointSaver",
        "rank": 0,
        "args": {
          "save_dir": "{MMAR_CKPT_DIR}",
          "save_dict": {
            "model": "@model",
            "optimizer": "@optimizer",
            "lr_scheduler": "@lr_scheduler",
            "train_conf": "@conf"
          },
          "save_final": true,
          "save_interval": 400
        }
      },
      {
        "name": "StatsHandler",
        "rank": 0,
        "args": {
          "tag_name": "train_loss",
          "output_transform": "#monai.handlers.from_engine(['loss'], first=True)"
        }
      },
      {
        "name": "TensorBoardStatsHandler",
        "rank": 0,
        "args": {
          "log_dir": "{MMAR_CKPT_DIR}",
          "tag_name": "train_loss",
          "output_transform": "#monai.handlers.from_engine(['loss'], first=True)"
        }
      }
    ],
    "post_transforms": [
      { "name": "Activationsd", "args": { "keys": "pred", "softmax": true } },
      {
        "name": "AsDiscreted",
        "args": { "keys": ["pred", "label"], "argmax": [true, false], "to_onehot": 2 }
      }
    ],
    "key_metric": {
      "name": "Accuracy",
      "log_label": "train_acc",
      "args": {
        "output_transform": "#monai.handlers.from_engine(['pred', 'label'])"
      }
    },
    "trainer": {
      "name": "SupervisedTrainer",
      "args": {
        "max_epochs": "{epochs}",
        "device": "cuda",
        "train_data_loader": "@dataloader",
        "network": "@model",
        "loss_function": "@loss",
        "optimizer": "@optimizer",
        "inferer": "@inferer",
        "postprocessing": "@post_transforms",
        "key_train_metric": "@key_metric",
        "train_handlers": "@handlers",
        "amp": "{amp}"
      }
    }
  },
  "validate": {
    "pre_transforms": [
      { "ref": "LoadImaged" },
      { "ref": "EnsureChannelFirstd" },
      { "ref": "ScaleIntensityRanged" },
      { "ref": "CropForegroundd" },
      { "ref": "ToTensord" }
    ],
    "dataset": {
      "name": "CacheDataset",
      "data_list_file_path": "{DATASET_JSON}",
      "data_file_base_dir": "{DATA_ROOT}",
      "data_list_key": "validation",
      "args": {
        "transform": "@pre_transforms",
        "cache_num": 9,
        "cache_rate": 1.0,
        "num_workers": 2
      }
    },
    "dataloader": {
      "name": "DataLoader",
      "args": {
        "dataset": "@dataset",
        "batch_size": 1,
        "shuffle": false,
        "num_workers": 2
      }
    },
    "inferer": {
      "name": "SlidingWindowInferer",
      "args": { "roi_size": [160, 160, 160], "sw_batch_size": 4, "overlap": 0.5 }
    },
    "handlers": [
      {
        "name": "StatsHandler",
        "rank": 0,
        "args": { "output_transform": "lambda x: None" }
      },
      {
        "name": "TensorBoardStatsHandler",
        "rank": 0,
        "args": {
          "log_dir": "{MMAR_CKPT_DIR}",
          "output_transform": "lambda x: None"
        }
      },
      {
        "name": "CheckpointSaver",
        "rank": 0,
        "args": {
          "save_dir": "{MMAR_CKPT_DIR}",
          "save_dict": { "model": "@model", "train_conf": "@conf" },
          "save_key_metric": true
        }
      }
    ],
    "post_transforms": [
      { "ref": "Activationsd" },
      { "ref": "AsDiscreted" }
    ],
    "key_metric": {
      "name": "MeanDice",
      "log_label": "val_mean_dice",
      "args": {
        "include_background": false,
        "output_transform": "#monai.handlers.from_engine(['pred', 'label'])"
      }
    },
    "additional_metrics": [
      {
        "name": "Accuracy",
        "log_label": "val_acc",
        "args": {
          "output_transform": "#monai.handlers.from_engine(['pred', 'label'])"
        }
      }
    ],
    "evaluator": {
      "name": "SupervisedEvaluator",
      "args": {
        "device": "cuda",
        "val_data_loader": "@dataloader",
        "network": "@model",
        "inferer": "@inferer",
        "postprocessing": "@post_transforms",
        "key_val_metric": "@key_metric",
        "additional_metrics": "@additional_metrics",
        "val_handlers": "@handlers",
        "amp": "{amp}"
      }
    }
  }
}
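In a configuration of this size, search entries are spread across the first level and several components. When reviewing such a file, it can help to list them all in one place. The sketch below is a hypothetical helper, `collect_search_entries`, that walks a parsed config and records where each “search” entry lives. It is run here on a trimmed excerpt of the configuration above rather than the full file.

```python
def collect_search_entries(node, path="config"):
    """Recursively gather (location, entry) pairs for every "search" list in a config."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "search" and isinstance(value, list):
                found.extend((path, entry) for entry in value)
            else:
                found.extend(collect_search_entries(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(collect_search_entries(item, f"{path}[{index}]"))
    return found


# Trimmed excerpt of the example configuration, as it would look after json.load().
config = {
    "learning_rate": 2e-4,
    "search": [
        {"domain": "lr", "args": ["learning_rate"], "type": "float", "targets": [0.0001, 0.001]},
    ],
    "train": {
        "model": {
            "name": "UNet",
            "args": {"num_res_units": 2},
            "search": [
                {"type": "enum", "args": ["num_res_units"], "targets": [[2], [1]], "domain": "net"},
            ],
        },
        "pre_transforms": [
            {
                "name": "RandShiftIntensityd",
                "args": {"keys": "image", "offsets": 0.1, "prob": 0.5},
                "search": [
                    {"type": "float", "args": ["prob#myProb"], "targets": [0, 1], "domain": "transform"},
                ],
            },
        ],
    },
}

entries = collect_search_entries(config)
locations = [location for location, _ in entries]
searched_args = [entry["args"] for _, entry in entries]
```

For the full file, the same function could be called on the result of `json.load`, giving a quick overview of every searched arg and its domain.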

© Copyright 2021, NVIDIA. Last updated on Feb 2, 2023.