medl.apps.automl package
- class BaseMMARExecutor(poll_interval=1.0, remove_search_augs=False)
Bases:
automl.components.executors.executor.Executor
Implements an Executor that does MMAR-based model training.
- abort(ctx: automl.defs.Context)
Called to abort the training process. The model training is run in a subprocess. This method tries to kill that subprocess.
- Parameters
ctx – the job execution context.
Returns:
- check_process(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
- determine_search_space(ctx: automl.defs.Context) → automl.defs.SearchSpace
Determine the search space based on config_train.json, which has been augmented with search parameter definitions. This method extracts these definitions and creates a search space.
- Parameters
ctx – execution context
Returns: a SearchSpace object
- do_job(job_name, my_mmar_root, ctx: automl.defs.Context)
- execute(recommendation: automl.defs.Recommendation, ctx: automl.defs.Context) → object
Do model training based on the specified recommendation:
1. Clone an MMAR from mmar_root and place it in the run_root. Name the MMAR based on ‘job_name’ in the ctx.
2. Create the config_train.json based on train_config and the recommendation.
3. Put the MMAR path in ctx (key: ‘job_mmar_root’).
4. Kick off the clara train subprocess from the job_mmar_root.
The method then monitors the progress of the training subprocess. When the subprocess finishes, it extracts the best validation value from the training stats file and returns it as the final score. (A sketch of how the submit_process, check_process, and stop_process hooks fit together follows this class listing.)
- Parameters
recommendation – the recommendation to be executed
ctx – job context
Returns: a score
- stop_process(prc, ctx: automl.defs.Context)
- submit_process(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context) → object
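For orientation, here is a minimal sketch of a concrete executor built on these hooks. The hook signatures follow the documentation above; the use of subprocess.Popen and the ProcessStatus member names (RUNNING, DONE) are illustrative assumptions, not the shipped implementation.

```python
import subprocess

from automl.defs import Context, ProcessStatus
from medl.apps.automl.base_mmar_exec import BaseMMARExecutor


class LocalShellExecutor(BaseMMARExecutor):
    """Hypothetical executor that runs the training round script locally."""

    def submit_process(self, job_name, my_mmar_root, train_cmd_script, ctx: Context) -> object:
        # Launch the per-job training script from the cloned MMAR.
        return subprocess.Popen(["bash", train_cmd_script], cwd=my_mmar_root)

    def check_process(self, job_name, prc, ctx: Context) -> ProcessStatus:
        # Poll the training subprocess and map its state to a ProcessStatus.
        # RUNNING/DONE are assumed member names; check automl.defs.ProcessStatus.
        if prc.poll() is None:
            return ProcessStatus.RUNNING
        return ProcessStatus.DONE

    def stop_process(self, prc, ctx: Context):
        # Kill the training subprocess when the job is aborted or stopped.
        prc.terminate()
```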
- class MMARPropKey
Bases:
object
Defines context key names for MMAR related component implementation
- AUTOML_ROOT = '_automlRoot'
- EXEC_SUBPROCESS = '_execSubprocess'
- JOB_MMAR_ROOT = '_jobMmarRoot'
- JOB_START_TIME = '_mmarJobStart'
- MMAR_ROOT = '_mmarRoot'
- MMAR_TRACE = '_mmarTrace'
- MMAR_TRAIN_STATS = '_mmarTrainStats'
- RUN_ID = '_runId'
- RUN_ROOT = '_runRoot'
- TEMP_MMAR_ROOT = '_tempMmarRoot'
- TRAIN_CONFIG_FILE = '_trainConfigFile'
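For illustration, a handler or executor would typically read these props from the context. This is a minimal sketch assuming the Context exposes a get_prop accessor and that MMARPropKey lives in medl.apps.automl.base_mmar_exec; verify both against the actual package.

```python
from medl.apps.automl.base_mmar_exec import MMARPropKey  # assumed module path


def log_job_location(ctx):
    # Read MMAR-related props from the job context (get_prop is assumed).
    run_id = ctx.get_prop(MMARPropKey.RUN_ID)
    job_mmar_root = ctx.get_prop(MMARPropKey.JOB_MMAR_ROOT)
    print(f"run {run_id}: job MMAR cloned to {job_mmar_root}")
```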
- class MMARStd
Bases:
object
Defines standard folder and file names used by MMAR.
- AUTOML_CONFIG = 'config_automl.json'
- AUTOML_DIR = 'automl'
- AUTOML_STATS_FILE = 'automl_stats_log.json'
- COMMAND_DIR = 'commands'
- CONFIG_DIR = 'config'
- ENV_CONFIG = 'environment.json'
- EVAL_DIR = 'eval'
- LOG_CONFIG = 'resources/log.config'
- LOG_FILE = 'log.txt'
- MODELS_DIR = 'models'
- SCORE_KEY = 'best_validation_metric'
- TRACE_FILE = 'trace.txt'
- TRAIN_CONFIG = 'config_train.json'
- TRAIN_ROUND_CMD = 'automl_train_round.sh'
- TRAIN_STATS = 'train_stats.json'
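A small sketch showing how these constants might be combined to locate files inside an MMAR. The config/ and commands/ layout is inferred from the constants above, and the import path is an assumption.

```python
import os

from medl.apps.automl.base_mmar_exec import MMARStd  # assumed module path


def train_config_path(mmar_root: str) -> str:
    # e.g. <mmar_root>/config/config_train.json
    return os.path.join(mmar_root, MMARStd.CONFIG_DIR, MMARStd.TRAIN_CONFIG)


def train_round_script(mmar_root: str) -> str:
    # e.g. <mmar_root>/commands/automl_train_round.sh
    return os.path.join(mmar_root, MMARStd.COMMAND_DIR, MMARStd.TRAIN_ROUND_CMD)
```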
- class DummyController(max_rounds=1000)
Bases:
automl.components.controllers.controller.Controller
- initial_recommendation(ctx)
This method is called by the AutoML workflow engine to produce the initial set of recommendations. The controller must produce 1 or more recommendations. If no recommendation is produced, the AutoML workflow will stop immediately.
This method is called only once at the beginning of the AutoML process.
- Parameters
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations
- refine_recommendation(outcome: automl.defs.Outcome, ctx: automl.defs.Context)
This method is called by the AutoML workflow engine to produce a set of recommendations based on the result from a previous job.
The controller can produce 0 or more recommendations.
This method is called every time a job finishes executing a previous recommendation.
- Parameters
outcome – the result of executing the previous recommendation
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations, could be empty
- set_search_space(space, ctx)
Set the search space. This is the search space that the controller will search against to produce recommendations. The controller must keep it for later use.
- Parameters
space – the search space
ctx – the context that enables across-component data sharing and communication
Returns:
NOTE: the controller should validate the search space and make sure it is acceptable. If the search space is not acceptable, the controller should either raise an exception or ask to stop the workflow by calling ctx.ask_to_stop().
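To make the controller contract concrete, here is a skeleton subclass. It is a sketch only: how a Recommendation is built from the search space depends on automl.defs and is left as a hypothetical helper.

```python
from automl.components.controllers.controller import Controller


class RoundLimitedController(Controller):
    """Hypothetical controller that stops after a fixed number of rounds."""

    def __init__(self, max_rounds=10):
        Controller.__init__(self)
        self.max_rounds = max_rounds
        self.space = None
        self.rounds_done = 0

    def set_search_space(self, space, ctx):
        # Validate the space; if it is unusable, ask the workflow to stop.
        if not space:
            ctx.ask_to_stop()
            return
        self.space = space

    def initial_recommendation(self, ctx):
        # Must return at least one recommendation or the workflow stops.
        return [self._sample()]

    def refine_recommendation(self, outcome, ctx):
        # Called after each finished job; return [] to wind the search down.
        self.rounds_done += 1
        if self.rounds_done >= self.max_rounds:
            return []
        return [self._sample()]

    def _sample(self):
        # Hypothetical helper: build an automl.defs.Recommendation from
        # self.space. The exact construction is omitted here.
        raise NotImplementedError
```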
- class DummyHandler
Bases:
automl.components.handlers.handler.Handler
- end_job(ctx)
The job execution has ended. Called from the job thread. See notes in start_job method. You can check the job status from prop ContextKey.JOB_STATUS in the ctx.
NOTE: if the start_job is called, it is guaranteed that end_job will be called.
- Parameters
ctx – the job context
Returns:
- recommendations_available(ctx)
The recommendations are available. Called from the main thread. You can get recommendations from prop ContextKey.RECOMMENDATIONS in the ctx.
- Parameters
ctx – main context
Returns:
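A minimal handler sketch built on these callbacks. It assumes ctx.get_prop and ContextKey (with JOB_STATUS and RECOMMENDATIONS members) are available from automl.defs; check the actual names before using.

```python
from automl.components.handlers.handler import Handler
from automl.defs import ContextKey  # assumed import location


class JobStatusLogger(Handler):
    """Hypothetical handler that logs the status of every finished job."""

    def end_job(self, ctx):
        # end_job is guaranteed to be called whenever start_job was called.
        status = ctx.get_prop(ContextKey.JOB_STATUS)  # get_prop is assumed
        print(f"job ended with status: {status}")

    def recommendations_available(self, ctx):
        recs = ctx.get_prop(ContextKey.RECOMMENDATIONS)  # get_prop is assumed
        print(f"received {len(recs)} new recommendation(s)")
```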
- class MMARExecutor(msg_destination: Optional[str] = None, poll_interval=1.0, remove_search_augs=False)
Bases:
medl.apps.automl.base_mmar_exec.BaseMMARExecutor
Implements an Executor that does MMAR-based model training.
- Parameters
msg_destination (str) – destination for log messages produced during execution. Possible values: file (default), console, both, none
poll_interval (float) – how often to poll job execution status. Default 1 second.
remove_search_augs – whether to remove search augmentation JSON elements from generated train config. Default False.
- check_process(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
- stop_process(prc, ctx: automl.defs.Context)
Called to stop the training process. The model training is run in a subprocess. This method tries to kill that subprocess.
- Parameters
prc – the process object
ctx – the job execution context.
Returns:
- submit_process(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context) → object
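For illustration, a direct instantiation with the documented arguments. The module path is an assumption; in a real run the executor is more likely assembled from the AutoML configuration.

```python
from medl.apps.automl.mmar_exec import MMARExecutor  # assumed module path

# Log to both file and console, poll the training subprocess every 2 seconds,
# and strip search-augmentation elements from the generated train config.
executor = MMARExecutor(
    msg_destination="both",
    poll_interval=2.0,
    remove_search_augs=True,
)
```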
- class MMARHandler(num_mmars_to_keep=1, stop_threshold=None, train_config_file=None, work_dir=None, key_mmar_content_only=False, data_list_file_name=None, keep_bad_mmars=True)
Bases:
automl.components.handlers.handler.Handler
,dlmed.hci.reg.CommandModule
MMARHandler implements functions to adapt AutoML to MMAR-based model training.
- Parameters
num_mmars_to_keep – number of job MMARs to keep during AutoML. Extra MMARs are deleted.
stop_threshold – if specified, when any job’s finishing score meets or exceeds this threshold, stop AutoML.
train_config_file – if specified, the name of the train config file to use. If not specified, use config_train.json.
work_dir – if specified, the directory for the AutoML work space. If not specified, defaults to ‘automl’.
- end_automl(ctx: automl.defs.Context)
Print results produced from this run
- Parameters
ctx – main context
Returns:
- end_job(ctx: automl.defs.Context)
Add job_mmar_root to self.job_mmars. Check whether the result of this round (ctx: score) is better and adjust best_mmars. Only keep num_mmars_to_keep MMARs and remove all others.
- Parameters
ctx – job context
Returns:
NOTE: this method is called from different threads. Data access must be protected!
- get_spec()
- handle_summary()
- start_automl(ctx: automl.defs.Context)
1. Create the folder for the run (run_root) based on run_id in mmar_root if the folder does not exist yet. If the folder already exists, warn the user and quit (the run already exists).
2. Determine the base config_train.json and place it in run_root.
- Parameters
ctx – main context
Returns:
- start_job(ctx: automl.defs.Context)
The job execution is about to start. Called from the job thread.
NOTE: this method could be called from multiple job threads at the same time. If you want to store across-job state data in the handler, you should ensure thread safety when updating such state data.
NOTE: the ctx is a per-job context, hence it is not subject to multi-thread access. Consider using the ctx to store per-job state data.
- Parameters
ctx – the job context
Returns:
- startup(ctx: automl.defs.Context)
The handler is being started up. Use this method to initialize the handler based on info in the context.
NOTE: this method is called in the order of handler chain. If your handler depends on some info provided by previous handlers, you can get such info from the ctx.
- Parameters
ctx – main context
Returns:
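An instantiation sketch using only the documented arguments (module path assumed):

```python
from medl.apps.automl.mmar_handler import MMARHandler  # assumed module path

# Keep the three best job MMARs, stop AutoML once a job scores >= 0.90,
# and place the AutoML workspace under "automl_runs".
handler = MMARHandler(
    num_mmars_to_keep=3,
    stop_threshold=0.90,
    work_dir="automl_runs",
)
```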
- class MMARSummary(root: str, score, status: automl.defs.ProcessStatus)
Bases:
object
Define a simple structure for job execution stats summary
- Parameters
root – the job’s MMAR root
score – finishing score of the job
status – the job’s finishing ProcessStatus
- class MMARStatsHandler(stats_file_name='automl_stats_log.json')
Bases:
automl.components.handlers.handler.Handler
Defines a stats handler that writes the search space, the recommendations, and the stats of each job (after it finishes) to a JSON file.
When a new set of recommendations becomes available, a new file is written for that set; this is currently a workaround to avoid overwriting the recommendations in the original JSON.
- Parameters
stats_file_name – name of the file to write the JSON stats to, relative to this run’s root directory
- end_job(ctx: automl.defs.Context)
Update the JSON stats with the stats of the round after each round completes
- Parameters
ctx – job context
- recommendations_available(ctx: automl.defs.Context)
Update the stats JSON with recommendations
- Parameters
ctx – job context
- search_space_available(ctx: automl.defs.Context)
Initialize file to write to, and write out search space to it
- Parameters
ctx – job context
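A minimal instantiation sketch (module path assumed):

```python
from medl.apps.automl.mmar_stats_handler import MMARStatsHandler  # assumed module path

# Write the search space, recommendations, and per-job stats to a custom file
# under the run's root directory.
stats_handler = MMARStatsHandler(stats_file_name="my_automl_stats.json")
```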
- class npEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases:
json.encoder.JSONEncoder
Constructor for JSONEncoder, with sensible defaults.
If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.
If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.
If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an OverflowError). Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.
If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
- default(o)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:

```python
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
```
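The class name suggests this encoder’s default() handles the NumPy types produced by the stats handler; that behavior, and the import path below, are assumptions. A usage sketch:

```python
import json

import numpy as np

from medl.apps.automl.mmar_stats_handler import npEncoder  # assumed module path

# Serialize a stats dict that may contain NumPy scalars, assuming npEncoder
# converts them to plain Python types in its default() override.
stats = {"best_validation_metric": np.float32(0.87), "round": np.int64(3)}
print(json.dumps(stats, cls=npEncoder, indent=2))
```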
- class MMARTrainer
Bases:
object
- train(ctx: automl.defs.Context)
- class NGCExecutor(image, dataset_id, dataset_path, workspace_id, workspace_path, org='nvidian', team='dlmed', ace='nv-us-west-2', instance='dgx1v.16g.1.norm', result_dir='/results', poll_interval=30, remove_search_augs=False)
Bases:
medl.apps.automl.base_mmar_exec.BaseMMARExecutor
Implements an Executor that does MMAR-based model training deployed onto NGC clusters.
- check_process(job_name, prc, ctx: automl.defs.Context) → <a class="reference internal" href="../automl/apidocs/automl.html#automl.defs.ProcessStatus" title="automl.defs.ProcessStatus" target="_self">automl.defs.ProcessStatus</a>
- stop_process(prc, ctx: automl.defs.Context)
Called to stop the training process. NGC jobs can be stopped via the web UI, via ‘ngc batch kill’, or by the NGC system when they time out. This method issues ‘ngc batch kill’ to stop the job.
- Parameters
prc – the ngc job id
ctx – the job execution context.
Returns:
- submit_process(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context)
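An instantiation sketch using the documented constructor; every concrete value below (image name, IDs, paths) is a placeholder, and the module path is an assumption.

```python
from medl.apps.automl.ngc_exec import NGCExecutor  # assumed module path

executor = NGCExecutor(
    image="nvcr.io/nvidia/clara-train-sdk:latest",  # placeholder image
    dataset_id=12345,                               # placeholder dataset ID
    dataset_path="/data",
    workspace_id="my-workspace",                    # placeholder workspace ID
    workspace_path="/workspace",
    poll_interval=60,                               # check NGC job status every 60s
)
```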
- class EnumParamMapper
Bases:
medl.apps.automl.ssu.ParamMapper
EnumParamMapper is responsible for management of enum parameters
- Parameters
component – the JSON component to be managed
args – list of arg names of the component
range_def – list of choices, each choice is a list of values, one for each arg
- do_update_component(search_value)
Apply the search result to the component. It maps the values of the selected choice to the component’s args. For example, with args ['a', 'b'] and range_def [[1, 'x'], [2, 'y']], a search value of 1 selects the second choice and sets a=2 and b='y'.
- Parameters
search_value – search value to be applied. Must be an integer specifying the choice ID.
Returns:
- get_actual_value(search_value)
Return the mapped values
Returns: actual value in enum case
- get_search_ranges()
Determine the search ranges, which is a list of SearchRanges, one for each choice.
Returns: a list of SearchRanges, one for each choice. Each range is simply the choice ID, starting from 0.
- class FloatParamMapper
Bases:
medl.apps.automl.ssu.ParamMapper
FloatParamMapper is responsible for management of float parameters
- Parameters
component – the JSON component to be managed
args – list of arg names of the component
range_def – list of two numbers: min and max values of the range
- do_update_component(search_value)
Apply the search result to the component.
- Parameters
search_value – search value to be applied. Must be a valid number between the min and max values of the mapper.
Returns:
- get_search_ranges()
Determine the search ranges, which is a list containing a single SearchRange spanning the min and max values of the mapper.
Returns: a list containing a single SearchRange
NOTE: the mapper does not normalize range to (0, 1).
- class ParamApplicator(component, arg: str, expression: str, attr_comp)
Bases:
object
- apply(gvars: dict)
- class ParamMapper
Bases:
object
ParamMapper is used to help manage search parameters when building the search space from config_train.json in an MMAR. Specifically, it is responsible for:
Creating search ranges for a searchable parameter defined in the JSON.
Applying a search result to the parameter and updating the appropriate component args in the JSON.
- Parameters
component – the JSON component to be managed
args – list of arg names of the component
- do_update_component(search_value)
- get_actual_value(search_value)
Return the mapped values (mainly for enum type search_value)
Returns: search_value or actual value in enum case
- get_prop(name: str)
- get_search_ranges()
Compute the search ranges of the param
Returns: a list of SearchRanges.
- update_component(search_value, global_vars: dict)
Update the component’s arg(s) based on the search_value
- Parameters
search_value – the search result to be applied to the component
global_vars – global vars to be updated
Returns:
- update_prop(name: str, value)
- class SearchSpaceManager(train_config, location: str, remove_search_augs=False)
Bases:
dlmed.utils.json_scanner.JsonObjectProcessor
The SearchSpaceManager manages the search space determined from config_train.json in an MMAR.
- Parameters
train_config – the JSON data from config_train.json in an MMAR
- apply_search_result(search_result: automl.defs.SearchResult) → str
Apply search result to the config JSON and create a new config JSON. For each PRL/Value pair in the SearchResult, it finds the appropriate mapper corresponding to the PRL and calls the mapper to apply the value to the param.
- Parameters
search_result – a SearchResult object
Returns: an updated JSON config after applying the search result.
- dump_vars(prefix='\t')
- extract_search_space() → automl.defs.SearchSpace
Determine the search space from the JSON config. It calls scan_json to scan the JSON structure and processes each element to find the ones that define search parameters. An appropriate ParamMapper is created for each search param based on its data type. All mappers are stored in a dict of PRL=>Mapper. Finally, it creates the search space by calling the mappers to produce search ranges for each param.
Returns: a SearchSpace object
- get_vars_digest(vars)
- map_search_result(search_result: automl.defs.SearchResult) → dict
Map value of search_result into concrete value
- Parameters
search_result – a SearchResult object
Returns: a dictionary with keys == prl and values == concrete value
- process_element(node: dlmed.utils.json_scanner.Node)
Implements the process_json method required by JsonObjectProcessor. It checks whether the JSON object contains search definition. If so, it creates a ParamMapper based on the search range definition. It creates a PRL for the param and keeps the mapper in a dict of PRL => Mapper.
- Parameters
node – json node
Returns:
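A usage sketch of the two main entry points. It assumes, per the descriptions above, that extract_search_space must run first so the PRL-to-mapper dict exists before apply_search_result is called; the SearchSpaceManager import from the ssu module is inferred from the ParamMapper base-class paths above, and the file path is a placeholder.

```python
import json

from medl.apps.automl.ssu import SearchSpaceManager  # assumed module path


def apply_recommendation(config_path: str, search_result) -> str:
    """Hypothetical helper: search_result is an automl.defs.SearchResult."""
    with open(config_path) as f:
        train_config = json.load(f)

    manager = SearchSpaceManager(train_config, location=config_path)
    space = manager.extract_search_space()   # builds the PRL -> mapper dict
    print(f"search space: {space}")

    # Produce the updated config JSON for this recommendation's search result.
    return manager.apply_search_result(search_result)
```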
- class TestController(total_recs, max_recs_each_time)
Bases:
automl.components.controllers.controller.Controller
- initial_recommendation(ctx)
This method is called by the AutoML workflow engine to produce the initial set of recommendations. The controller must produce 1 or more recommendations. If no recommendation is produced, the AutoML workflow will stop immediately.
This method is called only once at the beginning of the AutoML process.
- Parameters
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations
- refine_recommendation(outcome: automl.defs.Outcome, ctx: automl.defs.Context)
This method is called by the AutoML workflow engine to produce a set of recommendations based on the result from a previous job.
The controller can produce 0 or more recommendations.
This method is called every time a job finishes executing a previous recommendation.
- Parameters
outcome – the result of executing the previous recommendation
ctx – the context that enables across-component data sharing and communication
Returns: a list of recommendations, could be empty
- set_search_space(space: automl.defs.SearchSpace, ctx: automl.defs.Context)
Set the search space. This is the search space that the controller will search against to produce recommendations. The controller must keep it for later use.
- Parameters
space – the search space
ctx – the context that enables across-component data sharing and communication
Returns:
NOTE: the controller should validate the search space and make sure it is acceptable. If the search space is not acceptable, the controller should either raise an exception or ask to stop the workflow by calling ctx.ask_to_stop().