AutoML user guide

Specify search space

Configuring the “search” section

You make a component’s init args searchable by adding a “search” section to the component, which is defined as a list.

Each item in the “search” list specifies the search ranges for one or more args:

  • domain - the search domain of the args. Currently supported domains: lr, net, transform.

  • type - the data type of the search. Currently supported types: float, enum.

  • args - a list of arg names. They must be existing args in the component’s init args.

  • targets - the search range candidates. Its format depends on the “type”.

Float type

For the “float” type, “targets” is a list of two numbers: the min and max of the range. If multiple args are specified in “args”, the same search result (a float number) is applied to all of them.
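For example, a float spec searching a learning rate in the range [0.0001, 0.01] might look like this (the arg name “learning_rate” and the range are illustrative assumptions, not taken from a real component):

```json
"search": [
   {
     "args": ["learning_rate"],
     "domain": "lr",
     "type": "float",
     "targets": [0.0001, 0.01]
   }
]
```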

Enum type

For the “enum” type, “targets” is a list of choices, and each choice is a list of values, one for each arg in the args list.

Consider the following example:

"search": [
   {
     "args": ["if_use_psp", "final_activation"],
     "domain": "net",
     "type": "enum",
     "targets": [[true, "softmax"], [false, "sigmoid"], [true, "sigmoid"]]
   }
]

Two args are specified in “args” (“if_use_psp” and “final_activation”). There are three target choices:

  • Choice 0: [true, “softmax”]

  • Choice 1: [false, “sigmoid”]

  • Choice 2: [true, “sigmoid”]

If the search result is choice 2, then true is assigned to “if_use_psp”, and “sigmoid” is assigned to “final_activation”.

This supports the use case where args are related and need to be searched together.
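The mapping from a chosen enum target to the component’s init args can be sketched in Python as follows (the function name and data layout are illustrative, not Clara Train’s actual API):

```python
# Sketch: apply one enum search choice to a component's init args.
# The spec mirrors the "search" item shown above.
spec = {
    "args": ["if_use_psp", "final_activation"],
    "type": "enum",
    "targets": [[True, "softmax"], [False, "sigmoid"], [True, "sigmoid"]],
}

def apply_enum_choice(init_args, spec, choice_index):
    """Assign each value of the chosen target to the matching arg name."""
    chosen = spec["targets"][choice_index]
    for arg_name, value in zip(spec["args"], chosen):
        init_args[arg_name] = value
    return init_args

# If the search result is choice 2, "if_use_psp" gets True and
# "final_activation" gets "sigmoid":
result = apply_enum_choice({}, spec, 2)
```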

Command Line Interface (CLI) for AutoML

automl.sh

To start Clara Train-based AutoML, simply run automl.sh in the “commands” folder of the MMAR.

automl.sh is a very simple shell script:

#!/usr/bin/env bash

my_dir="$(dirname "$0")"
. $my_dir/set_env.sh

echo "MMAR_ROOT set to $MMAR_ROOT"

additional_options="$*"

# Data list containing all data
python -u -m nvmidl.apps.automl.train \
   -m $MMAR_ROOT \
   --set \
   run_id=a \
   workers=0,1,2:1,2:1,3 \
   ${additional_options}

The most important details to note are the settings of run_id and workers. The script sets their default values, but you can override them by specifying them explicitly on the command line.

Specify run_id

As described above, run_id represents one AutoML experiment. Each experiment must have a unique run_id. To specify a run_id, simply append the following to the command line when running automl.sh:

run_id=<run_id>

Specify workers

You must define how many workers to use and assign GPU devices to each worker. The syntax is:

workers=<gpu_id_list_for_worker1>:<gpu_id_list_for_worker2>:...

For each worker, you specify a list of GPU device IDs, separated by commas. Worker specs are separated by colons.
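The structure of such a spec can be illustrated with a small Python sketch (a hypothetical helper, not part of the actual CLI):

```python
def parse_workers(spec):
    """Parse a workers spec like "0,1:2,3" into one GPU-ID list per worker.

    Worker specs are separated by colons; within a worker, GPU IDs are
    separated by commas.
    """
    return [[int(gpu) for gpu in worker.split(",")]
            for worker in spec.split(":")]

# Two workers: worker 1 on GPUs 0 and 1, worker 2 on GPUs 2 and 3.
print(parse_workers("0,1:2,3"))  # [[0, 1], [2, 3]]
```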

Examples for running AutoML

To run AutoML with run ID “test1” and two workers assigned to GPUs 0 and 1 respectively:

automl.sh run_id=test1 workers=0:1

To run AutoML with run ID “test2” and two workers, with worker 1 assigned to GPUs 0 and 1, and worker 2 assigned to GPUs 2 and 3:

automl.sh run_id=test2 workers=0,1:2,3

Note

You can assign the same GPU to multiple workers, provided the GPU has enough memory to serve all of these workers at the same time.

For example, if you want 4 workers to share two GPUs:

automl.sh run_id=test3 workers=0:0:1:1

AutoML worker names

Workers are named like:

W<workerId>

where workerId is an integer starting from 1 (W1, W2, etc.).

Note

Worker names are used as a prefix to jobs’ MMAR names.

How to configure workers efficiently for AutoML?

When multiple GPUs are available, how can they be used efficiently? Should each job be executed with multiple GPUs, or should each job be assigned a single GPU? The answer is: it depends.

If the controller produces multiple recommendations each time, it may be more efficient to run each job with a single GPU. All GPUs stay busy because the jobs run in parallel, and you avoid the cross-device synchronization overhead of multi-GPU training (e.g. with Horovod).

However, if the controller always produces a single recommendation based on the previous job’s score, there is no parallel job execution. In this case, you should run each job with multiple GPUs.

If the controller is implemented in a phased approach, producing multiple recommendations in some phases and a single recommendation in others, configuring the workers optimally can be tricky.

Custom name for config_automl.json

In Clara Train 3.1, AutoML has been enhanced to support user-specified names for the AutoML config file via the command line, as highlighted in this example of automl.sh:

...
python -u -m nvmidl.apps.automl.train \
    -m $MMAR_ROOT \
    --automlconf my_custom_config_automl.json \
    --set \
    run_id=a \
    workers=0:1 \
    traceout=both \
    trainconf=config_train_for_automl.json \
    ${additional_options}

Note

my_custom_config_automl.json must be in the MMAR’s “config” folder!

When AutoML starts, it prints the file name of the AutoML config file in use. Make sure it is the one you specified.