AutoML user guide
Learning rate search
Add this line to the config file:
"lr_search": [0.0001, 0.001]
This setting tells AutoML to search for the learning rate in the range 0.0001 to 0.001.
Network parameter search
Add the “search” section to the “model” component:
"model": {
"name": "SegAhnet",
"args": {
"num_classes": 2,
"if_use_psp": false,
"pretrain_weight_name": "{PRETRAIN_WEIGHTS_FILE}",
"plane": "z",
"final_activation": "softmax",
"n_spatial_dim": 3
},
"search": [
{
"args": ["if_use_psp", "final_activation"],
"domain": "net",
"type": "enum",
"targets": [[true, "softmax"], [false, "sigmoid"]]
}
]
}
Note that the “search” section is added at the same level as the component’s existing “name” and “args”. In this example, the args “if_use_psp” and “final_activation” are grouped, so only the combinations [true, “softmax”] and [false, “sigmoid”] are tried. The grouping is freely customizable, and the sections below explain it in more detail. To search on the args individually instead, use a “search” section like:
"search": [
{
"args": ["if_use_psp"],
"domain": "net",
"type": "enum",
"targets": [[true], [false]]
},
{
"args": ["final_activation"],
"domain": "net",
"type": "enum",
"targets": [["softmax"], ["sigmoid"]]
}
]
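The difference between the grouped and the individual form can be made concrete by counting the arg combinations each one allows. The following is a minimal sketch, not part of Clara Train: the helper name `combinations` is hypothetical, "args" is written as a list in every entry, and the way a real controller walks the search space may differ.

```python
from itertools import product

# Grouped entry: each target is one complete combination of both args.
grouped = [
    {
        "args": ["if_use_psp", "final_activation"],
        "targets": [[True, "softmax"], [False, "sigmoid"]],
    }
]

# Individual entries: each arg is searched independently of the other.
individual = [
    {"args": ["if_use_psp"], "targets": [[True], [False]]},
    {"args": ["final_activation"], "targets": [["softmax"], ["sigmoid"]]},
]


def combinations(search):
    """Enumerate every arg assignment reachable from a list of enum entries."""
    per_entry = [
        [dict(zip(entry["args"], target)) for target in entry["targets"]]
        for entry in search
    ]
    results = []
    for picks in product(*per_entry):
        merged = {}
        for pick in picks:
            merged.update(pick)
        results.append(merged)
    return results


grouped_combos = combinations(grouped)
individual_combos = combinations(individual)
print(len(grouped_combos), "grouped combinations")
print(len(individual_combos), "individual combinations")
```

The grouped form restricts the search to the listed pairs, while the individual form also reaches pairs such as true with “sigmoid”.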
Transform parameter search
Add the “search” section to any transform component:
{
"name": "RandomAxisFlip",
"args": {
"fields": [
"image",
"label"
],
"probability": 0.0
},
"search": [
{
"domain": "transform",
"type": "float",
"args": ["probability"],
"targets": [0.0, 1.0]
}
]
}
The “targets” in this example is a single list that specifies a continuous range, as opposed to a list of lists that denotes discrete values.
You make a component’s init args searchable by adding them to the component’s “search” section, which is defined as a list.
Each item in the “search” list specifies the search ranges for one or more args:
domain - the search domain of the args. Currently lr, net, or transform.
type - the data type of the search. Currently float or enum.
args - a list of arg names. They must be existing args in the component’s init args.
targets - the search range candidates. The format depends on the “type”.
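The four fields above can be checked before a run starts. The sketch below is an illustration only: `check_search_entry` is a hypothetical helper, not a Clara Train function, and it encodes just the rules stated in this guide (allowed domains and types, args that exist in the component, and the “targets” format for each type).

```python
ALLOWED_DOMAINS = {"lr", "net", "transform"}
ALLOWED_TYPES = {"float", "enum"}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_search_entry(entry, component_args):
    """Return a list of problems found in one "search" entry (empty if none)."""
    problems = []
    if entry.get("domain") not in ALLOWED_DOMAINS:
        problems.append(f"unknown domain: {entry.get('domain')!r}")
    if entry.get("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {entry.get('type')!r}")

    args = entry.get("args", [])
    if isinstance(args, str):
        # Tolerate a bare string by treating it as a one-item list.
        args = [args]
    for name in args:
        if name not in component_args:
            problems.append(f"arg not in component args: {name!r}")

    targets = entry.get("targets", [])
    if entry.get("type") == "float":
        # A float search expects [min, max].
        if len(targets) != 2 or not all(_is_number(t) for t in targets):
            problems.append("float targets must be [min, max]")
        elif targets[0] > targets[1]:
            problems.append("float targets must satisfy min <= max")
    elif entry.get("type") == "enum":
        # An enum search expects one value per arg in every choice.
        for index, choice in enumerate(targets):
            if not isinstance(choice, list) or len(choice) != len(args):
                problems.append(f"choice {index} must list one value per arg")
    return problems


flip_args = {"fields": ["image", "label"], "probability": 0.0}
flip_entry = {
    "domain": "transform",
    "type": "float",
    "args": ["probability"],
    "targets": [0.0, 1.0],
}
print(check_search_entry(flip_entry, flip_args))
```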
Float type
For the “float” type, “targets” is a list of two numbers - min and max of the range. If multiple args are specified in the “args”, the same search result (which is a float number) is applied to all of them.
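As an illustration of the float rule, the sketch below assigns one float result to every arg named in the entry. `apply_float_result` is a hypothetical helper, and drawing the value uniformly from the range is an assumption made only for this example; the AutoML controller decides how values are actually chosen.

```python
import random


def apply_float_result(component_args, entry, value):
    """Assign one float search result to every arg named in the entry."""
    low, high = entry["targets"]
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range [{low}, {high}]")
    updated = dict(component_args)  # leave the original args untouched
    for name in entry["args"]:
        updated[name] = value
    return updated


entry = {
    "domain": "transform",
    "type": "float",
    "args": ["probability"],
    "targets": [0.0, 1.0],
}
args = {"fields": ["image", "label"], "probability": 0.0}

# Assumption: uniform sampling stands in for whatever the controller does.
rng = random.Random(0)
value = rng.uniform(*entry["targets"])
new_args = apply_float_result(args, entry, value)
print(new_args["probability"])
```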
Enum type
For the “enum” type, “targets” is a list of choices, and each choice is a list of values, one for each arg in the args list.
Consider the following example:
"search": [
{
"args": ["if_use_psp", "final_activation"],
"domain": "net",
"type": "enum",
"targets": [[true, "softmax"], [false, "sigmoid"], [true, "sigmoid"]]
}
]
Two args are specified in “args” (“if_use_psp” and “final_activation”). There are three target choices:
Choice 0: [true, “softmax”]
Choice 1: [false, “sigmoid”]
Choice 2: [true, “sigmoid”]
If the search result is choice 2, then true is assigned to “if_use_psp”, and “sigmoid” is assigned to “final_activation”.
This supports the use case of args being related and needing to be searched together.
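The mapping from a chosen enum index to arg values can be sketched as follows. `apply_enum_choice` is a hypothetical helper written for this illustration, not a Clara Train function.

```python
def apply_enum_choice(component_args, entry, choice_index):
    """Assign the values of one enum choice to the args, position by position."""
    choice = entry["targets"][choice_index]
    if len(choice) != len(entry["args"]):
        raise ValueError("each choice needs one value per arg")
    updated = dict(component_args)  # leave the original args untouched
    updated.update(zip(entry["args"], choice))
    return updated


entry = {
    "args": ["if_use_psp", "final_activation"],
    "domain": "net",
    "type": "enum",
    "targets": [[True, "softmax"], [False, "sigmoid"], [True, "sigmoid"]],
}
model_args = {"num_classes": 2, "if_use_psp": False, "final_activation": "softmax"}

chosen = apply_enum_choice(model_args, entry, 2)
print(chosen["if_use_psp"], chosen["final_activation"])
```

Because the values in a choice are applied together, related args never end up in a combination that was not listed.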
automl_train_round.sh
To start Clara Train-based AutoML, run automl.sh in the “commands” folder of the MMAR.
automl.sh is a very simple shell script:
#!/usr/bin/env bash
my_dir="$(dirname "$0")"
. $my_dir/set_env.sh
echo "MMAR_ROOT set to $MMAR_ROOT"
additional_options="$*"
# Data list containing all data
python -u -m nvmidl.apps.automl.train \
-m $MMAR_ROOT \
--set \
run_id=a \
workers=0,1,2:1,2:1,3 \
${additional_options}
The most important details to note are the settings of run_id and workers. The script sets their default values, but you can override them by specifying them explicitly on the command line.
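One way to picture the override is that the script’s defaults and your options form a single list of key=value tokens, and a later token replaces an earlier one with the same key. That "last one wins" rule is an assumption made for this sketch; the guide states only that command-line values override the defaults, and `merge_settings` is a hypothetical helper.

```python
def merge_settings(tokens):
    """Parse key=value tokens; a later token replaces an earlier one."""
    settings = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        settings[key] = value
    return settings


# Defaults as written in automl.sh, followed by options typed by the user.
script_defaults = ["run_id=a", "workers=0,1,2:1,2:1,3"]
user_options = ["run_id=test1", "workers=0:1"]

effective = merge_settings(script_defaults + user_options)
print(effective)
```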
Specify run_id
As described above, run_id represents one AutoML experiment. Each experiment must have a unique run_id. To specify a run_id, simply append the following to the command line when running automl.sh:
run_id=<run_id>
Specify workers
You must define how many workers to use and assign GPU devices to each worker. The syntax is this:
workers=<gpu_id_list_for_worker1>:<gpu_id_list_for_worker2>:...
For each worker, you specify a list of GPU device IDs, separated by commas. Worker specs are separated by colons.
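The syntax can be read as "split on colons, then split on commas". The sketch below shows that reading; `parse_workers` is a hypothetical helper for illustration and is not the parser used by AutoML.

```python
def parse_workers(spec):
    """Split a workers spec into one list of GPU device IDs per worker."""
    workers = []
    for worker_spec in spec.split(":"):
        if not worker_spec:
            raise ValueError(f"empty worker spec in {spec!r}")
        # int() rejects empty or non-numeric GPU IDs with a ValueError.
        workers.append([int(gpu_id) for gpu_id in worker_spec.split(",")])
    return workers


# The default from automl.sh: three workers with overlapping GPU lists.
default_workers = parse_workers("0,1,2:1,2:1,3")
for gpus in default_workers:
    print(gpus)
```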
Print log to main console
To output the contents of trace.log to the main console, append the following to the command when running automl.sh:
traceout=both
Examples for running AutoML
To run AutoML with run ID “test1” and two workers assigned to GPU 0 and 1 respectively:
automl.sh run_id=test1 workers=0:1
To run AutoML with run ID “test2” and two workers, with worker 1 assigned to GPU 0 and 1, and worker 2 assigned to GPU 2 and 3:
automl.sh run_id=test2 workers=0,1:2,3
You can assign the same GPU to multiple workers, provided the GPU has enough memory to run all of those workers at the same time.
For example, if you want 4 workers to share two GPUs:
automl.sh run_id=test3 workers=0:0:1:1
AutoML worker names
Workers are named like:
W<workerId>
where workerId is an integer starting from 1 (e.g. W1, W2, etc.).
Worker names are used as a prefix to jobs’ MMAR names.
How to configure workers efficiently for AutoML?
When multiple GPUs are available, how can they be used efficiently? Should each job be executed with multiple GPUs, or should each job be assigned a single GPU? The answer is: it depends.
If the controller produces multiple recommendations each time, it might be more efficient to run each job with a single GPU. You still keep all GPUs busy because the jobs run in parallel, and you avoid the cross-device synchronization overhead of multi-GPU training (for example, with Horovod).
However, if the controller always produces a single recommendation each time based on the previous job score, then there would be no parallel job execution. In this case, you should arrange to run the job with multiple GPUs.
If the controller is implemented in a phased approach, with multiple recommendations produced then single recommendations produced, it can get tricky to optimally configure the workers.
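The two clear-cut cases above can be turned into a rough starting point. The sketch below is a heuristic invented for illustration, not a Clara Train feature: `suggest_workers` is a hypothetical helper, it assumes you know how many recommendations the controller produces per round, and it does not consider sharing one GPU among several workers.

```python
def suggest_workers(num_gpus, jobs_per_round):
    """Suggest a workers spec from the GPU count and the parallel job count."""
    if num_gpus < 1 or jobs_per_round < 1:
        raise ValueError("num_gpus and jobs_per_round must be at least 1")
    # Never create more workers than there are GPUs or jobs to run.
    num_workers = min(num_gpus, jobs_per_round)
    # Split the GPU IDs into contiguous groups, as evenly as possible.
    base, extra = divmod(num_gpus, num_workers)
    groups = []
    start = 0
    for index in range(num_workers):
        size = base + (1 if index < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return ":".join(",".join(str(gpu) for gpu in group) for group in groups)


print("workers=" + suggest_workers(4, 4))  # many recommendations per round
print("workers=" + suggest_workers(4, 1))  # one recommendation per round
```

For a phased controller, neither setting is optimal for the whole run, so treat the result as a first guess to refine.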
Custom name for config_automl.json
In Clara Train 3.1, AutoML supports user-specified names for the AutoML config file through the --automlconf command-line option, as shown in this example of automl.sh:
...
python -u -m nvmidl.apps.automl.train \
-m $MMAR_ROOT \
--automlconf my_custom_config_automl.json \
--set \
run_id=a \
workers=0:1 \
traceout=both \
trainconf=config_train_for_automl.json \
${additional_options}
my_custom_config_automl.json must be in the MMAR’s “config” folder!
When AutoML is started, the file name of the AutoML config file that is used will be printed. Make sure it is what you specified.
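Before launching, you can confirm that the custom file is where AutoML expects it. The following is a small sketch under the stated rule that the file must sit in the MMAR’s “config” folder; `resolve_automl_config` is a hypothetical helper, and the temporary directory only stands in for a real MMAR.

```python
import tempfile
from pathlib import Path


def resolve_automl_config(mmar_root, config_name):
    """Return the path of an AutoML config inside the MMAR's "config" folder."""
    path = Path(mmar_root) / "config" / config_name
    if not path.is_file():
        raise FileNotFoundError(f"{config_name} not found in {path.parent}")
    return path


# Demonstration with a throwaway directory in place of a real MMAR.
with tempfile.TemporaryDirectory() as mmar_root:
    config_dir = Path(mmar_root) / "config"
    config_dir.mkdir()
    (config_dir / "my_custom_config_automl.json").write_text("{}")
    found = resolve_automl_config(mmar_root, "my_custom_config_automl.json")
    print(found.name)
```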