AutoML user guide - NVIDIA Docs

Specify search space

AutoML works the same way in Clara Train 4.1 as in previous versions of Clara Train, although any components you use must be available in Clara Train 4.1. See AutoML search space definition for Clara Train for additional details.

Network parameter search

Add the “search” section to the “model” component:

"model": {
  "name": "SegAhnet",
  "args": {
    "num_classes": 2,
    "if_use_psp": false,
    "pretrain_weight_name": "{PRETRAIN_WEIGHTS_FILE}",
    "plane": "z",
    "final_activation": "softmax",
    "n_spatial_dim": 3
  },
  "search": [
    {
      "args": ["if_use_psp", "final_activation"],
      "domain": "net",
      "type": "enum",
      "targets": [[true, "softmax"], [false, "sigmoid"]]
    }
  ]
}

Note that the “search” section is added at the same level as the existing “name” and “args” of the component definition. In this example, the args “if_use_psp” and “final_activation” are grouped, so only the combinations [true, “softmax”] and [false, “sigmoid”] are tried. This is freely customizable, and the sections below explain the options in more detail. To search the args individually, use a “search” section like:

"search": [
  {
    "args": ["if_use_psp"],
    "domain": "net",
    "type": "enum",
    "targets": [[true], [false]]
  },
  {
    "args": ["final_activation"],
    "domain": "net",
    "type": "enum",
    "targets": [["softmax"], ["sigmoid"]]
  }
]
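To make the difference between the grouped and the individual layout concrete, the following Python sketch enumerates the arg assignments each one allows. It is an illustration only: candidate_assignments is a hypothetical helper, not part of Clara Train, and it assumes that separate search entries vary independently, so the search space is their Cartesian product. The controller in use decides which of these candidates are actually tried.

```python
from itertools import product


def candidate_assignments(search):
    """Enumerate arg assignments implied by a list of "enum" search entries.

    Hypothetical helper for illustration; not a Clara Train API.
    Assumes separate entries vary independently (Cartesian product).
    """
    per_entry = []
    for entry in search:
        args = entry["args"]
        if isinstance(args, str):
            # Tolerate a single arg name given as a plain string.
            args = [args]
        # Each choice supplies one value per arg named in the entry.
        per_entry.append([dict(zip(args, choice)) for choice in entry["targets"]])

    combos = []
    for parts in product(*per_entry):
        merged = {}
        for part in parts:
            merged.update(part)
        combos.append(merged)
    return combos


grouped = [
    {"args": ["if_use_psp", "final_activation"], "domain": "net",
     "type": "enum", "targets": [[True, "softmax"], [False, "sigmoid"]]},
]
individual = [
    {"args": ["if_use_psp"], "domain": "net", "type": "enum",
     "targets": [[True], [False]]},
    {"args": ["final_activation"], "domain": "net", "type": "enum",
     "targets": [["softmax"], ["sigmoid"]]},
]

print(len(candidate_assignments(grouped)))     # 2
print(len(candidate_assignments(individual)))  # 4
```

Grouping restricts the search to the listed pairs, while individual entries also admit pairs such as true with “sigmoid”.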

Transform parameter search

Add the “search” section to any transform component:

{
  "name": "RandomAxisFlip",
  "args": {
    "fields": [
      "image",
      "label"
    ],
    "probability": 0.0
  },
  "search": [
    {
      "domain": "transform",
      "type": "float",
      "args": ["probability"],
      "targets": [0.0, 1.0]
    }
  ]
}

Attention

In this example, “targets” is a flat list of two numbers that specifies a continuous range, as opposed to a list of lists that denotes discrete values.

Configuring the “search” section

You make a component’s init args searchable by adding them to the “search” section of the component, which is defined as a list.

Each item in the “search” list specifies the search ranges for one or more args:

domain - the search domain of the args. Currently supported: lr, net, transform.
type - the data type of the search. Currently supported: float, enum.
args - a list of arg names. Each must be an existing arg in the component’s init args.
targets - the search range candidates. The format depends on the “type”.
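As a rough illustration of these four fields, the sketch below checks one “search” entry against the rules listed above. The function check_search_entry and the two constant sets are hypothetical names introduced here; this is not Clara Train’s own validation logic, only a pre-flight check you could run on a config before starting AutoML.

```python
ALLOWED_DOMAINS = {"lr", "net", "transform"}
ALLOWED_TYPES = {"float", "enum"}


def check_search_entry(entry, component_args):
    """Return a list of problems found in one "search" entry (empty if none).

    Hypothetical checker for illustration; not a Clara Train API.
    """
    problems = []
    if entry.get("domain") not in ALLOWED_DOMAINS:
        problems.append(f"unknown domain: {entry.get('domain')!r}")
    if entry.get("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {entry.get('type')!r}")

    args = entry.get("args")
    if not isinstance(args, list):
        problems.append("args must be a list of arg names")
    else:
        for name in args:
            # Every searched arg must already exist in the component's args.
            if name not in component_args:
                problems.append(f"arg not in component args: {name!r}")

    if "targets" not in entry:
        problems.append("missing targets")
    return problems


flip = {
    "name": "RandomAxisFlip",
    "args": {"fields": ["image", "label"], "probability": 0.0},
    "search": [
        {"domain": "transform", "type": "float",
         "args": ["probability"], "targets": [0.0, 1.0]},
    ],
}

for entry in flip["search"]:
    print(check_search_entry(entry, flip["args"]))  # []
```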

Float type

For the “float” type, “targets” is a list of two numbers: the min and max of the range. If multiple args are listed in “args”, the same search result (a single float) is applied to all of them.
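The float rule can be sketched in Python as follows. The helper apply_float_result is hypothetical, and the uniform random draw is only a stand-in for whatever value the AutoML controller actually recommends; the point is that one float within the range is written to every arg named in the entry.

```python
import random


def apply_float_result(component_args, entry, value):
    """Assign one searched float to every arg listed in a "float" entry.

    Hypothetical helper for illustration; not a Clara Train API.
    """
    low, high = entry["targets"]
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range [{low}, {high}]")
    updated = dict(component_args)  # leave the original args untouched
    for name in entry["args"]:
        updated[name] = value
    return updated


entry = {"domain": "transform", "type": "float",
         "args": ["probability"], "targets": [0.0, 1.0]}
args = {"fields": ["image", "label"], "probability": 0.0}

# Stand-in for the controller's recommendation: a uniform draw from the range.
value = random.uniform(*entry["targets"])
print(apply_float_result(args, entry, value))
```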

Enum type

For the “enum” type, “targets” is a list of choices, and each choice is a list of values, one for each arg in the args list.

Consider the following example:

"search": [
  {
    "args": ["if_use_psp", "final_activation"],
    "domain": "net",
    "type": "enum",
    "targets": [[true, "softmax"], [false, "sigmoid"], [true, "sigmoid"]]
  }
]

Two args are specified in “args” (“if_use_psp” and “final_activation”). There are three target choices:

Choice 0: [true, “softmax”]
Choice 1: [false, “sigmoid”]
Choice 2: [true, “sigmoid”]

If the search result is choice 2, then true is assigned to “if_use_psp”, and “sigmoid” is assigned to “final_activation”.

This supports the use case of args being related and needing to be searched together.
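The assignment described above can be sketched in Python. The helper apply_enum_choice is a hypothetical name used for illustration, not a Clara Train function; it simply pairs each arg name with the value at the same position in the selected choice.

```python
def apply_enum_choice(component_args, entry, choice_index):
    """Apply one choice of an "enum" entry to a copy of the component args.

    Hypothetical helper for illustration; not a Clara Train API.
    """
    choice = entry["targets"][choice_index]
    if len(choice) != len(entry["args"]):
        raise ValueError("each choice needs exactly one value per arg")
    updated = dict(component_args)
    # Pair arg names with the values at the same positions in the choice.
    updated.update(zip(entry["args"], choice))
    return updated


entry = {
    "args": ["if_use_psp", "final_activation"],
    "domain": "net",
    "type": "enum",
    "targets": [[True, "softmax"], [False, "sigmoid"], [True, "sigmoid"]],
}
model_args = {"num_classes": 2, "if_use_psp": False, "final_activation": "softmax"}

chosen = apply_enum_choice(model_args, entry, 2)
print(chosen["if_use_psp"], chosen["final_activation"])  # True sigmoid
```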

Command Line Interface (CLI) for AutoML

automl.sh

To start AutoML in Clara Train, run automl.sh in the “commands” folder of the MMAR.

automl.sh is a short shell script:

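#!/usr/bin/env bash

# Source the environment settings from the script's own folder.
my_dir="$(dirname "$0")"
. "$my_dir/set_env.sh"

echo "MMAR_ROOT set to $MMAR_ROOT"

# Extra key=value settings passed on the command line.
additional_options="$*"

# Start AutoML. Settings in additional_options override the
# defaults for run_id and workers given here.
python -u -m medl.apps.automl.train \
   -m "$MMAR_ROOT" \
   --set \
   run_id=a \
   workers=0:1 \
   ${additional_options}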

The most important details to note are the settings of run_id and workers. The script sets their default values, but you can override them by specifying them explicitly on the command line.

Specify run_id

As described above, run_id represents one AutoML experiment. Each experiment must have a unique run_id. To specify a run_id, simply append the following to the command line when running automl.sh:

run_id=<run_id>

Specify workers

You must define how many workers to use and assign GPU devices to each worker. The syntax is:

workers=<gpu_id_list_for_worker1>:<gpu_id_list_for_worker2>:...

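For each worker, specify a list of GPU device IDs separated by commas. Worker specs are separated by colons. For example, workers=0,1:2,3 defines two workers, the first using GPUs 0 and 1 and the second using GPUs 2 and 3.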
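The syntax can be illustrated with a small Python parser. The function parse_workers is a hypothetical helper written for this guide and is not how Clara Train itself reads the setting; it only shows how colons and commas split the value.

```python
def parse_workers(spec):
    """Parse a workers value such as "0,1:2,3" into per-worker GPU ID lists.

    Hypothetical helper for illustration; not a Clara Train API.
    """
    workers = []
    for worker_spec in spec.split(":"):        # colons separate workers
        gpu_ids = [int(gpu) for gpu in worker_spec.split(",")]  # commas separate GPUs
        workers.append(gpu_ids)
    return workers


print(parse_workers("0:1"))      # [[0], [1]]
print(parse_workers("0,1:2,3"))  # [[0, 1], [2, 3]]
```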

Print log to main console

To output the contents of trace.log to the main console, append the following to the command when running automl.sh:

traceout=both

For additional information, set:

engtrace=1

Examples for running AutoML

To run AutoML with run ID “test1” and two workers assigned to GPU 0 and 1 respectively:

automl.sh run_id=test1 workers=0:1
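If you launch AutoML from a Python script, the command line can be assembled from the same key=value settings. The sketch below only builds the command string; automl_command is a hypothetical helper, and it assumes automl.sh is reachable under that name from the current directory.

```python
import shlex


def automl_command(run_id, workers, **settings):
    """Build an automl.sh command line from key=value settings.

    Hypothetical helper for illustration; not part of Clara Train.
    """
    parts = ["automl.sh", f"run_id={run_id}", f"workers={workers}"]
    # Any further settings (for example traceout) are appended the same way.
    parts += [f"{key}={value}" for key, value in settings.items()]
    return shlex.join(parts)


print(automl_command("test1", "0:1"))
# automl.sh run_id=test1 workers=0:1
print(automl_command("test1", "0:1", traceout="both"))
# automl.sh run_id=test1 workers=0:1 traceout=both
```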

AutoML worker names

Workers are named like:

W<workerId>

where workerId is an integer starting from 1 (e.g. W1, W2, etc.).

Note

Worker names are used as a prefix to jobs’ MMAR names.
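As a small illustration of the naming scheme, the Python sketch below maps worker names to the GPU IDs in a workers value. The helper name_workers is hypothetical, and it assumes that worker IDs follow the order in which the workers appear in the value, which this guide does not state explicitly.

```python
def name_workers(spec):
    """Map worker names (W1, W2, ...) to the GPU IDs in a workers value.

    Hypothetical helper for illustration; assumes worker IDs follow the
    order of the worker specs in the value.
    """
    names = {}
    for index, worker_spec in enumerate(spec.split(":"), start=1):
        names[f"W{index}"] = [int(gpu) for gpu in worker_spec.split(",")]
    return names


print(name_workers("0:1"))  # {'W1': [0], 'W2': [1]}
```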

How to configure workers efficiently for AutoML?

When multiple GPUs are available, how can they be used efficiently? Should each job be executed with multiple GPUs, or should each job be assigned a single GPU? The answer is: it depends.

If the controller produces multiple recommendations each time, it might be more efficient to run each job with a single GPU. You still keep all GPUs busy since all jobs run in parallel, and you avoid the cross-device synchronization overhead of multi-GPU training (for example, with Horovod).

However, if the controller always produces a single recommendation each time based on the previous job score, then there would be no parallel job execution. In this case, you should arrange to run the job with multiple GPUs. Note that there may be limitations to assigning multiple GPUs to multiple workers, so a single worker with multiple GPUs may be optimal.

If the controller is implemented in a phased approach, first producing multiple recommendations and then single recommendations, it can be tricky to configure the workers optimally.
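The rule of thumb for the two simple cases can be written down as a short Python sketch. Both the function suggest_workers and its parameter recommendations_per_round are hypothetical names; the function only encodes the guidance above and does not account for phased controllers or hardware-specific limits.

```python
def suggest_workers(gpu_ids, recommendations_per_round):
    """Suggest a workers value from the rule of thumb described above.

    Hypothetical helper for illustration; not a Clara Train API.
    """
    if recommendations_per_round <= 1:
        # Jobs run one at a time: give a single worker every GPU.
        return ",".join(str(gpu) for gpu in gpu_ids)
    # Jobs run in parallel: one GPU per worker, with no more workers
    # than there are jobs to run at once.
    count = min(len(gpu_ids), recommendations_per_round)
    return ":".join(str(gpu) for gpu in gpu_ids[:count])


print(suggest_workers([0, 1, 2, 3], 1))  # 0,1,2,3
print(suggest_workers([0, 1, 2, 3], 4))  # 0:1:2:3
```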

Custom name for config_automl.json

AutoML supports a user-specified name for the AutoML config file via the --automlconf command-line option, as in this example of automl.sh:

...
python -u -m medl.apps.automl.train \
    -m $MMAR_ROOT \
    --automlconf my_custom_config_automl.json \
    --set \
    run_id=a \
    workers=0:1 \
    traceout=both \
    trainconf=config_train_for_automl.json \
    ${additional_options}

Note

my_custom_config_automl.json must be in the MMAR’s “config” folder!

When AutoML is started, the file name of the AutoML config file that is used will be printed. Make sure it is what you specified.
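Before launching, you can confirm that the custom config file sits where AutoML expects it. The Python sketch below is a hypothetical pre-flight check, not part of Clara Train; automl_config_path is an illustrative name, and the demonstration builds a throwaway folder layout rather than touching a real MMAR.

```python
import tempfile
from pathlib import Path


def automl_config_path(mmar_root, config_name="config_automl.json"):
    """Return the AutoML config path inside the MMAR's "config" folder.

    Hypothetical pre-flight check; raises if the file is not there.
    """
    path = Path(mmar_root) / "config" / config_name
    if not path.is_file():
        raise FileNotFoundError(f"{config_name} not found in {path.parent}")
    return path


# Demonstrate with a throwaway MMAR layout.
with tempfile.TemporaryDirectory() as mmar_root:
    config_dir = Path(mmar_root) / "config"
    config_dir.mkdir()
    (config_dir / "my_custom_config_automl.json").write_text("{}")
    found = automl_config_path(mmar_root, "my_custom_config_automl.json")
    print(found.name)  # my_custom_config_automl.json
```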