-
class
BaseMMARExecutor
(poll_interval=1.0, remove_search_augs=False) Bases:
automl.components.executors.executor.Executor
Implements an Executor that does MMAR-based model training.
-
abort
(ctx: automl.defs.Context) Called to abort the training process. The model training is run in a subprocess. This method tries to kill that subprocess.
- Parameters
ctx – the job execution context.
Returns:
-
check_process
(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
-
determine_search_space
(ctx: automl.defs.Context) → automl.defs.SearchSpace Determine search space based on config_train.json. The config_train.json has been augmented with search parameter definitions. This method extracts these definitions and creates a search space.
- Parameters
ctx – execution context
Returns: a SearchSpace object
-
do_job
(job_name, my_mmar_root, ctx: automl.defs.Context)
-
execute
(recommendation: automl.defs.Recommendation, ctx: automl.defs.Context) → object Do model training based on the specified recommendation:
1. Clone an MMAR from mmar_root and place it in the run_root. Name the MMAR based on ‘job_name’ in the ctx.
2. Create the config_train.json based on train_config and the recommendation.
3. Put the MMAR path in the ctx (key: ‘job_mmar_root’).
4. Kick off a Clara Train subprocess from the job_mmar_root.
The method then monitors the progress of the training subprocess. When training finishes, it extracts the best validation value from the training stats file and returns it as the final score.
- Parameters
recommendation – the recommendation to be executed
ctx – job context
Returns: a score
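The clone-and-configure steps and the final score lookup can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the executor's implementation: `prepare_job_mmar` and `read_final_score` are hypothetical helpers, the recommendation is assumed to be already merged into a plain `dict` config, run_root is assumed to live under an `automl` folder inside mmar_root (so that folder is skipped when cloning, which also prevents copying the MMAR into itself), and `train_stats.json` is assumed to be a flat JSON object in the job MMAR root that stores the score under `best_validation_metric` (the `MMARStd.TRAIN_STATS` and `MMARStd.SCORE_KEY` values listed below). Launching and monitoring the training subprocess is left out.

```python
import json
import os
import shutil

# Names taken from MMARStd; the folder layout itself is an assumption.
AUTOML_DIR = "automl"
CONFIG_DIR = "config"
TRAIN_CONFIG = "config_train.json"
TRAIN_STATS = "train_stats.json"
SCORE_KEY = "best_validation_metric"


def prepare_job_mmar(mmar_root: str, run_root: str, job_name: str, train_config: dict) -> str:
    """Hypothetical helper: clone mmar_root into run_root/job_name and write the job config."""
    job_mmar_root = os.path.join(run_root, job_name)
    # Skip the AutoML work folder: run_root is assumed to sit inside it, and copying
    # it would recurse into the clone. copytree also refuses an existing target.
    shutil.copytree(mmar_root, job_mmar_root, ignore=shutil.ignore_patterns(AUTOML_DIR))
    config_path = os.path.join(job_mmar_root, CONFIG_DIR, TRAIN_CONFIG)
    os.makedirs(os.path.dirname(config_path), exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(train_config, f, indent=2)
    return job_mmar_root


def read_final_score(job_mmar_root: str):
    """Hypothetical helper: return the best validation value, or None if unavailable."""
    stats_path = os.path.join(job_mmar_root, TRAIN_STATS)  # assumed location
    if not os.path.isfile(stats_path):
        return None
    with open(stats_path) as f:
        stats = json.load(f)
    return stats.get(SCORE_KEY)
```

Returning `None` when the stats file is missing lets the caller treat a crashed job as "no score" instead of raising inside the job thread.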
-
stop_process
(prc, ctx: automl.defs.Context)
-
submit_process
(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context) → object
-
-
class
MMARPropKey
Bases:
object
Defines context key names for MMAR-related component implementations.
-
AUTOML_ROOT
= '_automlRoot'
-
EXEC_SUBPROCESS
= '_execSubprocess'
-
JOB_MMAR_ROOT
= '_jobMmarRoot'
-
JOB_START_TIME
= '_mmarJobStart'
-
MMAR_ROOT
= '_mmarRoot'
-
MMAR_TRACE
= '_mmarTrace'
-
MMAR_TRAIN_STATS
= '_mmarTrainStats'
-
RUN_ID
= '_runId'
-
RUN_ROOT
= '_runRoot'
-
TEMP_MMAR_ROOT
= '_tempMmarRoot'
-
TRAIN_CONFIG_FILE
= '_trainConfigFile'
-
-
class
MMARStd
Bases:
object
Defines standard folder and file names used by MMAR.
-
AUTOML_CONFIG
= 'config_automl.json'
-
AUTOML_DIR
= 'automl'
-
AUTOML_STATS_FILE
= 'automl_stats_log.json'
-
COMMAND_DIR
= 'commands'
-
CONFIG_DIR
= 'config'
-
ENV_CONFIG
= 'environment.json'
-
EVAL_DIR
= 'eval'
-
LOG_CONFIG
= 'resources/log.config'
-
LOG_FILE
= 'log.txt'
-
MODELS_DIR
= 'models'
-
SCORE_KEY
= 'best_validation_metric'
-
TRACE_FILE
= 'trace.txt'
-
TRAIN_CONFIG
= 'config_train.json'
-
TRAIN_ROUND_CMD
= 'automl_train_round.sh'
-
TRAIN_STATS
= 'train_stats.json'
-
-
class
DummyController
(max_rounds=1000) Bases:
automl.components.controllers.controller.Controller
-
initial_recommendation
(ctx) This method is called by the AutoML workflow engine to produce the initial set of recommendations. The controller must produce 1 or more recommendations. If no recommendation is produced, the AutoML workflow will stop immediately.
This method is called only once at the beginning of the AutoML process.
- Parameters
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations
-
refine_recommendation
(outcome: automl.defs.Outcome, ctx: automl.defs.Context) This method is called by the AutoML workflow engine to produce a set of recommendations based on the result from a previous job.
The controller can produce 0 or more recommendations.
This method is called every time a job finishes executing a previous recommendation.
- Parameters
outcome – the result of executing the previous recommendation
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations, could be empty
-
set_search_space
(space, ctx) Set the search space. This is the search space that the controller will search against to produce recommendations. The controller must keep it for later use.
- Parameters
space – the search space
ctx – the context that enables across-component data sharing and communication
Returns:
NOTE: the controller should validate the search space and make sure it is acceptable. If the search space is not acceptable, the controller should either raise an exception or ask to stop the workflow by calling: ctx.ask_to_stop().
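The controller contract above (validate and keep the space, return at least one initial recommendation, return zero or more refined ones) can be illustrated with a standalone class. This is a hypothetical sketch, not DummyController's source and not a subclass of the real Controller base: it assumes the search space is a `dict` mapping a parameter name to a `(min, max)` pair, that a recommendation is a plain `dict`, and that `max_rounds` caps the total number of recommendations. Random sampling is only a placeholder search strategy.

```python
import random


class RoundLimitedController:
    """Hypothetical stand-in for a Controller; not the automl base class."""

    def __init__(self, max_rounds=1000, seed=0):
        self.max_rounds = max_rounds
        self.rounds = 0
        self.space = None
        self._rng = random.Random(seed)

    def set_search_space(self, space, ctx):
        # Validate before keeping the space, as the NOTE above requires.
        if not space:
            raise ValueError("empty search space")
        for name, (lo, hi) in space.items():
            if lo > hi:
                raise ValueError(f"bad range for {name}: {lo} > {hi}")
        self.space = space

    def _sample(self):
        self.rounds += 1
        return {name: self._rng.uniform(lo, hi) for name, (lo, hi) in self.space.items()}

    def initial_recommendation(self, ctx):
        # Must return at least one recommendation, or the workflow stops at once.
        return [self._sample()]

    def refine_recommendation(self, outcome, ctx):
        # An empty list means this controller has nothing more to suggest.
        if self.rounds >= self.max_rounds:
            return []
        return [self._sample()]
```

Raising from `set_search_space` is the simpler of the two options the NOTE allows; a real controller could call `ctx.ask_to_stop()` instead.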
-
-
class
DummyHandler
Bases:
automl.components.handlers.handler.Handler
-
end_job
(ctx) The job execution has ended. Called from the job thread. See the notes in the start_job method. You can check the job status from the ContextKey.JOB_STATUS prop in the ctx.
NOTE: if start_job is called, it is guaranteed that end_job will be called.
- Parameters
ctx – the job context
Returns:
-
recommendations_available
(ctx) The recommendations are available. Called from the main thread. You can get the recommendations from the ContextKey.RECOMMENDATIONS prop in the ctx.
- Parameters
ctx – main context
Returns:
-
-
class
MMARExecutor
(msg_destination: str = None, poll_interval=1.0, remove_search_augs=False) Bases:
nvmidl.apps.automl.base_mmar_exec.BaseMMARExecutor
Implements an Executor that does MMAR-based model training.
- Parameters
msg_destination (str) – destination for log messages produced during execution. Possible values: file (default), console, both, none
poll_interval (float) – how often to poll job execution status. Default 1 second.
remove_search_augs – whether to remove search augmentation JSON elements from generated train config. Default False.
-
check_process
(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
-
stop_process
(prc, ctx: automl.defs.Context) Called to stop the training process. The model training is run in a subprocess. This method tries to kill that subprocess.
- Parameters
prc – the process object
ctx – the job execution context.
Returns:
-
submit_process
(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context) → object
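The submit/check/stop cycle described for MMARExecutor, with polling every `poll_interval` seconds, can be sketched with `subprocess.Popen`. This is a minimal sketch, not the executor's code: the `Status` enum is a hypothetical stand-in for `automl.defs.ProcessStatus`, the helper names are illustrative, and the command list (for example `["bash", train_cmd_script]`) is an assumption.

```python
import subprocess
import time
from enum import Enum


class Status(Enum):
    """Hypothetical stand-in for automl.defs.ProcessStatus."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def submit_process(cmd):
    # Run training in its own subprocess so the job thread can keep polling.
    return subprocess.Popen(cmd)


def check_process(prc):
    code = prc.poll()  # None while the process is still running
    if code is None:
        return Status.RUNNING
    return Status.SUCCEEDED if code == 0 else Status.FAILED


def stop_process(prc, timeout=5.0):
    # Ask the process to exit first, then force-kill it if it does not exit in time.
    if prc.poll() is None:
        prc.terminate()
        try:
            prc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            prc.kill()
            prc.wait()


def wait_for(prc, poll_interval=1.0):
    # Poll until the process leaves the RUNNING state.
    while (status := check_process(prc)) is Status.RUNNING:
        time.sleep(poll_interval)
    return status
```

A terminated process has a non-zero return code, so after `stop_process` the sketch reports `FAILED`; a real executor might track "aborted" separately.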
-
class
MMARHandler
(num_mmars_to_keep=1, stop_threshold=None, train_config_file=None, work_dir=None, key_mmar_content_only=False, data_list_file_name=None, keep_bad_mmars=True) Bases:
automl.components.handlers.handler.Handler, dlmed.hci.reg.CommandModule
MMARHandler implements functions to adapt AutoML to MMAR-based model training.
- Parameters
num_mmars_to_keep – number of job MMARs to keep during AutoML. Extra MMARs are deleted.
stop_threshold – if specified, when any job’s finishing score meets or exceeds this threshold, stop AutoML.
train_config_file – if specified, the name of train config file to use. If not specified, use config_train.json.
work_dir – if specified, the directory for the AutoML work space. If not specified, defaults to ‘automl’.
-
end_automl
(ctx: automl.defs.Context) Print results produced from this run
- Parameters
ctx – main context
Returns:
-
end_job
(ctx: automl.defs.Context) Update the job MMAR bookkeeping when a job ends:
1. Add job_mmar_root to self.job_mmars.
2. Check whether the result of this round (the score in the ctx) is better, and adjust best_mmars accordingly.
3. Keep only num_mmars_to_keep MMARs and remove all others.
- Parameters
ctx – job context
Returns:
NOTE: this method is called from different threads. Data access must be protected!
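The end_job bookkeeping and the stop_threshold check can be sketched with a lock-protected list. This is a hypothetical helper, not MMARHandler's code: it assumes a higher score is better, that jobs without a score are left on disk (in the spirit of `keep_bad_mmars=True`), and it takes the delete function as a parameter so it can be tested without touching the file system.

```python
import shutil
import threading


class BestMMARTracker:
    """Hypothetical helper mirroring the end_job steps; not the MMARHandler source."""

    def __init__(self, num_mmars_to_keep=1, stop_threshold=None, remove=shutil.rmtree):
        self.num_mmars_to_keep = num_mmars_to_keep
        self.stop_threshold = stop_threshold
        self._remove = remove
        self._lock = threading.Lock()  # end_job runs on several job threads
        self.best = []  # (score, job_mmar_root) pairs, best first

    def end_job(self, job_mmar_root, score):
        """Record a finished job; return True if AutoML should stop."""
        if score is None:
            return False  # assumption: unscored MMARs are kept for inspection
        with self._lock:
            self.best.append((score, job_mmar_root))
            self.best.sort(key=lambda item: item[0], reverse=True)
            dropped = self.best[self.num_mmars_to_keep:]
            del self.best[self.num_mmars_to_keep:]
        # Delete outside the lock; dropped folders are no longer tracked by anyone.
        for _, root in dropped:
            self._remove(root)
        return self.stop_threshold is not None and score >= self.stop_threshold
```

Only the shared list is guarded; the slow directory deletion happens after the lock is released so other job threads are not blocked by it.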
-
get_spec
()
-
handle_summary
(conn: dlmed.hci.conn.Connection, args: [])
-
start_automl
(ctx: automl.defs.Context)
1. Create the folder for the run (run_root) in mmar_root, based on run_id, if the folder does not exist yet. If the folder already exists, warn the user and quit (the run already exists).
2. Determine the base config_train.json and place it in run_root.
- Parameters
ctx – main context
Returns:
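The two start_automl steps can be sketched as follows. This is a minimal sketch, not the handler's code: the `<mmar_root>/<work_dir>/<run_id>` layout, the `config/` location of the base train config, and the hypothetical `RunExistsError` (standing in for "warn the user and quit") are all assumptions; `work_dir` and `train_config_file` mirror the MMARHandler parameters and their documented defaults.

```python
import os
import shutil


class RunExistsError(Exception):
    """Hypothetical error; the documented behaviour is to warn the user and quit."""


def start_run(mmar_root, run_id, work_dir="automl", train_config_file="config_train.json"):
    """Create run_root and copy the base train config into it; return run_root."""
    # Assumed layout: <mmar_root>/<work_dir>/<run_id>
    run_root = os.path.join(mmar_root, work_dir, run_id)
    try:
        os.makedirs(run_root)  # fails if the run folder already exists
    except FileExistsError:
        raise RunExistsError(f"run '{run_id}' already exists at {run_root}") from None
    # Assumed source of the base config: <mmar_root>/config/<train_config_file>
    src = os.path.join(mmar_root, "config", train_config_file)
    shutil.copy(src, os.path.join(run_root, train_config_file))
    return run_root
```

Letting `os.makedirs` detect the existing folder avoids a separate existence check that another process could race.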
-
start_job
(ctx: automl.defs.Context) The job execution is about to start. Called from the job thread.
NOTE: this method could be called from multiple job threads at the same time. If you want to store across-job state data in the handler, you should ensure thread safety when updating such state data.
NOTE: the ctx is a per-job context, hence it is not subject to multi-thread access. Consider using the ctx to store per-job state data.
- Parameters
ctx – the job context
Returns:
-
startup
(ctx: automl.defs.Context) The handler is being started up. Use this method to initialize the handler based on info in the context.
NOTE: this method is called in the order of handler chain. If your handler depends on some info provided by previous handlers, you can get such info from the ctx.
- Parameters
ctx – main context
Returns:
-
class
MMARSummary
(root: str, score, status: automl.defs.ProcessStatus) Bases:
object
Defines a simple structure for the job execution stats summary.
- Parameters
root – the job’s MMAR root
score – finishing score of the job
-
class
MMARTrainer
(run_id: str, mmar_root: str, workers: [], controller: automl.components.controllers.controller.Controller, executor: automl.components.executors.executor.Executor = None, handlers: [] = None, trace: str = 'none', engine_trace=False, max_job_run_time=None, max_sys_run_time=None, port=33330) Bases:
object
-
train
(ctx: automl.defs.Context)
-
-
class
NGCExecutor
(image, dataset_id, dataset_path, workspace_id, workspace_path, org='nvidian', team='dlmed', ace='nv-us-west-2', instance='dgx1v.16g.1.norm', result_dir='/results', poll_interval=30, remove_search_augs=False) Bases:
nvmidl.apps.automl.base_mmar_exec.BaseMMARExecutor
Implements an Executor that does MMAR-based model training deployed onto NGC clusters.
-
check_process
(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
-
stop_process
(prc, ctx: automl.defs.Context) Called to abort the training process. NGC jobs can be stopped via the web UI, via ‘ngc batch kill’, or by the NGC system on timeout. This method sends ‘ngc batch kill’ to stop the job.
- Parameters
prc – the ngc job id
ctx – the job execution context.
Returns:
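Sending `ngc batch kill` from Python can be sketched with `subprocess.run`. This is a hypothetical helper, not NGCExecutor's code: it assumes the `ngc` CLI is on the PATH and takes the job id as a single positional argument, and it makes the runner injectable so the command can be checked without contacting NGC.

```python
import subprocess


def stop_ngc_job(job_id, run=subprocess.run):
    """Hypothetical helper: send 'ngc batch kill <job_id>' for the given NGC job."""
    cmd = ["ngc", "batch", "kill", str(job_id)]
    # check=False: the job may already have ended (killed in the web UI or timed out),
    # so a non-zero exit code is not treated as an error here.
    return run(cmd, capture_output=True, text=True, check=False)
```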
-
submit_process
(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context)
-
-
class
EnumParamMapper
(component, args: [], range_def: list, global_vars: dict, attr_comp) Bases:
nvmidl.apps.automl.ssu.ParamMapper
EnumParamMapper is responsible for the management of enum parameters.
- Parameters
component – the JSON component to be managed
args – list of arg names of the component
range_def – list of choices, each choice is a list of values, one for each arg
-
do_update_component
(search_value) Apply the search result to the component. It maps the values of the choice to the component’s args.
- Parameters
search_value – search value to be applied. Must be an integer specifying the choice ID.
Returns:
-
get_actual_value
(search_value) Return the mapped values
Returns: actual value in enum case
-
get_search_ranges
() → [<class ‘automl.defs.SearchRange’>] Determine the search ranges, which is a list of SearchRanges, one for each choice.
Returns: a list of SearchRanges, one for each choice. Each range is simply the choice ID, starting from 0.
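The enum mapping described above (one search range per choice ID, and each choice supplying one value per arg) can be sketched on a plain JSON component. This is a hypothetical sketch, not the nvmidl class: `SearchRange` and its `minv`/`maxv` fields are stand-ins for `automl.defs.SearchRange`, and the component is assumed to be a `dict` whose arguments live under an `"args"` key.

```python
from dataclasses import dataclass


@dataclass
class SearchRange:
    """Hypothetical stand-in for automl.defs.SearchRange."""
    minv: float
    maxv: float


class EnumMapper:
    """Sketch of the enum mapping described above; not the nvmidl class."""

    def __init__(self, component, args, range_def):
        # Each choice must supply exactly one value per managed arg.
        for choice in range_def:
            if len(choice) != len(args):
                raise ValueError("each choice needs one value per arg")
        self.component = component
        self.args = args
        self.range_def = range_def

    def get_search_ranges(self):
        # One degenerate range per choice: the range is just the choice ID.
        return [SearchRange(i, i) for i in range(len(self.range_def))]

    def get_actual_value(self, search_value):
        return self.range_def[int(search_value)]

    def do_update_component(self, search_value):
        # Map the chosen values onto the component's args, in order.
        choice = self.get_actual_value(search_value)
        for arg, value in zip(self.args, choice):
            self.component["args"][arg] = value
```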
-
class
FloatParamMapper
(component, args: [], range_def: list, global_vars: dict, attr_comp) Bases:
nvmidl.apps.automl.ssu.ParamMapper
FloatParamMapper is responsible for management of float parameters
- Parameters
component – the JSON component to be managed
args – list of arg names of the component
range_def – list of two numbers: min and max values of the range
-
do_update_component
(search_value) Apply the search result to the component.
- Parameters
search_value – search value to be applied. Must be a valid number between the min and max values of the mapper.
Returns:
-
get_search_ranges
() → [<class ‘automl.defs.SearchRange’>] Determine the search ranges: a list containing a single SearchRange that spans the min and max values of the mapper.
Returns: a list of a single SearchRange
NOTE: the mapper does not normalize range to (0, 1).
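The float mapping can be sketched the same way: a single range in the parameter's own units, plus a bounds check before the value is written. This is a hypothetical sketch, not the nvmidl class: `SearchRange` is a stand-in for `automl.defs.SearchRange`, the component is assumed to keep its arguments under `"args"`, and writing the same value to every managed arg is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SearchRange:
    """Hypothetical stand-in for automl.defs.SearchRange."""
    minv: float
    maxv: float


class FloatMapper:
    """Sketch of the float mapping described above; not the nvmidl class."""

    def __init__(self, component, args, range_def):
        lo, hi = (float(v) for v in range_def)  # exactly two numbers: min and max
        if lo > hi:
            raise ValueError(f"min {lo} is greater than max {hi}")
        self.component = component
        self.args = args
        self.lo, self.hi = lo, hi

    def get_search_ranges(self):
        # A single range in the parameter's own units; no (0, 1) normalization.
        return [SearchRange(self.lo, self.hi)]

    def do_update_component(self, search_value):
        value = float(search_value)
        if not self.lo <= value <= self.hi:
            raise ValueError(f"{value} is outside [{self.lo}, {self.hi}]")
        # Assumption: every managed arg receives the same value.
        for arg in self.args:
            self.component["args"][arg] = value
```

Because the range is not normalized, a controller working on such a space has to sample directly between `minv` and `maxv`.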
-
class
ParamApplicator
(component, arg: str, expression: str, attr_comp) Bases:
object
-
apply
(gvars: dict)
-
-
class
ParamMapper
(component, args: [], global_vars: dict, attr_comp) Bases:
object
ParamMapper is used to help manage search parameters when building the search space from config_train.json in an MMAR. Specifically, it is responsible for:
1. Creating search ranges for a searchable parameter defined in the JSON.
2. Applying a search result to the parameter and updating the appropriate component args in the JSON.
- Parameters
component – the JSON component to be managed
args – list of arg names of the component
-
do_update_component
(search_value)
-
get_actual_value
(search_value) Return the mapped values (mainly for enum type search_value)
Returns: search_value or actual value in enum case
-
get_prop
(name: str)
-
get_search_ranges
() → [<class ‘automl.defs.SearchRange’>] Compute the search ranges of the param
Returns: a list of SearchRanges.
-
update_component
(search_value, global_vars: dict) Update the component’s arg(s) based on the search_value
- Parameters
search_value – the search result to be applied to the component
global_vars – global vars to be updated
Returns:
-
update_prop
(name: str, value)
-
class
SearchSpaceManager
(train_config, location: str, remove_search_augs=False) Bases:
dlmed.utils.json_scanner.JsonObjectProcessor
The SearchSpaceManager manages the search space determined from config_train.json in an MMAR.
- Parameters
train_config – the JSON data from config_train.json in an MMAR
-
apply_search_result
(search_result: automl.defs.SearchResult) → str Apply search result to the config JSON and create a new config JSON. For each PRL/Value pair in the SearchResult, it finds the appropriate mapper corresponding to the PRL and calls the mapper to apply the value to the param.
- Parameters
search_result – a SearchResult object
Returns: an updated JSON config after applying the search result.
-
dump_vars
(prefix='\t')
-
extract_search_space
() → automl.defs.SearchSpace Determine the search space from the JSON config. It calls scan_json to scan the JSON structure and processes each element to find the ones that define search parameters. An appropriate ParamMapper is created for each search param based on its data type. All mappers are stored in a dict of PRL=>Mapper. Finally, it creates the search space by calling the mappers to produce search ranges for each param.
Returns: a SearchSpace object
-
get_vars_digest
(vars)
-
map_search_result
(search_result: automl.defs.SearchResult) → dict Map the values in search_result to concrete values.
- Parameters
search_result – a SearchResult object
Returns: a dictionary mapping each PRL to its concrete value
-
process_element
(node: dlmed.utils.json_scanner.Node) Implements the process_element method required by JsonObjectProcessor. It checks whether the JSON object contains a search definition. If so, it creates a ParamMapper based on the search range definition. It creates a PRL for the param and keeps the mapper in a dict of PRL => Mapper.
- Parameters
node – json node
Returns:
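The scan, extract, and apply flow of SearchSpaceManager can be sketched on plain JSON. Everything format-specific here is hypothetical and chosen only for illustration: a search definition is assumed to be a `search` list placed next to a component's `args`, each entry carrying `type` (`"float"` or `"enum"`), `args`, and `targets`, and the PRL is an invented path string. The real config_train.json schema and PRL syntax may differ. The float and enum cases are inlined instead of using separate ParamMapper objects.

```python
import copy
import json


def find_search_params(config, path="root"):
    """Yield (prl, node, search_def) for every object carrying a 'search' list."""
    if isinstance(config, dict):
        for i, search_def in enumerate(config.get("search", [])):
            prl = f"{path}#{i}:{','.join(search_def['args'])}"  # hypothetical PRL format
            yield prl, config, search_def
        for key, value in config.items():
            if key != "search":
                yield from find_search_params(value, f"{path}/{key}")
    elif isinstance(config, list):
        for i, item in enumerate(config):
            yield from find_search_params(item, f"{path}[{i}]")


def extract_search_space(config):
    """Return {prl: (min, max)} for float params and {prl: [choice IDs]} for enums."""
    space = {}
    for prl, _, sdef in find_search_params(config):
        if sdef["type"] == "float":
            space[prl] = tuple(sdef["targets"])
        else:  # enum: one choice ID per entry, starting from 0
            space[prl] = list(range(len(sdef["targets"])))
    return space


def apply_search_result(config, search_result, remove_search_augs=False):
    """Apply {prl: value} to a deep copy of config and return it as a JSON string."""
    new_config = copy.deepcopy(config)
    for prl, node, sdef in list(find_search_params(new_config)):
        if prl not in search_result:
            continue
        value = search_result[prl]
        # Enum values are choice IDs; map them to the concrete per-arg values.
        if sdef["type"] == "enum":
            values = sdef["targets"][value]
        else:
            values = [value] * len(sdef["args"])
        for arg, v in zip(sdef["args"], values):
            node["args"][arg] = v
    if remove_search_augs:
        # Strip the search definitions so the output is a plain train config.
        for _, node, _ in list(find_search_params(new_config)):
            node.pop("search", None)
    return json.dumps(new_config, indent=2)
```

Working on a deep copy keeps the original config intact, so the same base config can be reused for every recommendation.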
-
class
TestController
(total_recs, max_recs_each_time) Bases:
automl.components.controllers.controller.Controller
-
initial_recommendation
(ctx: automl.defs.Context) → [<class ‘automl.defs.Recommendation’>] This method is called by the AutoML workflow engine to produce the initial set of recommendations. The controller must produce 1 or more recommendations. If no recommendation is produced, the AutoML workflow will stop immediately.
This method is called only once at the beginning of the AutoML process.
- Parameters
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations
-
refine_recommendation
(outcome: automl.defs.Outcome, ctx: automl.defs.Context) → [<class ‘automl.defs.Recommendation’>] This method is called by the AutoML workflow engine to produce a set of recommendations based on the result from a previous job.
The controller can produce 0 or more recommendations.
This method is called every time a job finishes executing a previous recommendation.
- Parameters
outcome – the result of executing the previous recommendation
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations, could be empty
-
set_search_space
(space: automl.defs.SearchSpace, ctx: automl.defs.Context) Set the search space. This is the search space that the controller will search against to produce recommendations. The controller must keep it for later use.
- Parameters
space – the search space
ctx – the context that enables across-component data sharing and communication
Returns:
NOTE: the controller should validate the search space and make sure it is acceptable. If the search space is not acceptable, the controller should either raise an exception or ask to stop the workflow by calling: ctx.ask_to_stop().
-