NVIDIA Clara Train 4.1

medl.apps.automl package

class BaseMMARExecutor(poll_interval=1.0, remove_search_augs=False)

Bases: automl.components.executors.executor.Executor

Implements an Executor that does MMAR-based model training.

abort(ctx: automl.defs.Context)

Called to abort the training process. The model training is run in a subprocess. This method tries to kill that subprocess.

Parameters

ctx – the job execution context.

Returns:

check_process(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
determine_search_space(ctx: automl.defs.Context) → automl.defs.SearchSpace

Determine search space based on config_train.json. The config_train.json has been augmented with search parameter definitions. This method extracts these definitions and creates a search space.

Parameters

ctx – execution context

Returns: a SearchSpace object
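
The augmentation syntax itself lives in the MMAR train config; the snippet below is only a hypothetical illustration of the idea, shown as a Python dict, and is not the documented Clara Train schema. It sketches a component that carries an extra search definition naming the args to vary and their range.

    # Hypothetical shape of a search-augmented component in config_train.json.
    # Field names ("search", "type", "args", "range") are illustrative
    # assumptions, not the actual Clara Train schema.
    component_with_search = {
        "name": "Adam",
        "args": {"learning_rate": 0.0001},
        "search": [
            {
                "type": "float",            # would be handled by a float param mapper
                "args": ["learning_rate"],  # arg names of the component to update
                "range": [1e-5, 1e-2],      # min and max of the search range
            }
        ],
    }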

do_job(job_name, my_mmar_root, ctx: automl.defs.Context)
execute(recommendation: automl.defs.Recommendation, ctx: automl.defs.Context) → object

Do model training based on the specified recommendation.

1. Clone a MMAR from mmar_root and place it in the run_root. Name the MMAR based on ‘job_name’ in the ctx.

2. Create the config_train.json based on train_config and the recommendation.

3. Put the MMAR path in ctx (key: ‘job_mmar_root’).

4. Kick off the Clara Train subprocess from the job_mmar_root.

It monitors the progress of the training subprocess. When the subprocess finishes, it extracts the best validation value from the training stats file and returns it as the final score.

Parameters
  • recommendation – the recommendation to be executed

  • ctx – job context

Returns: a score
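
As a rough illustration of the final scoring step, the sketch below reads the best validation value from the job's train_stats.json using the MMARStd names defined later in this module (TRAIN_STATS, SCORE_KEY). The exact location of the stats file inside the job MMAR is an assumption here, not taken from the source; this is not the executor's actual code.

    import json
    import os

    def read_final_score(job_mmar_root):
        # Illustrative sketch only. Assumes train_stats.json lives under the
        # MMAR's models directory; the real layout may differ.
        stats_path = os.path.join(job_mmar_root, "models", "train_stats.json")
        with open(stats_path) as f:
            stats = json.load(f)
        return stats.get("best_validation_metric")  # MMARStd.SCORE_KEY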

stop_process(prc, ctx: automl.defs.Context)
submit_process(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context) → object
class MMARPropKey

Bases: object

Defines context key names for MMAR related component implementation

AUTOML_ROOT = '_automlRoot'
EXEC_SUBPROCESS = '_execSubprocess'
JOB_MMAR_ROOT = '_jobMmarRoot'
JOB_START_TIME = '_mmarJobStart'
MMAR_ROOT = '_mmarRoot'
MMAR_TRACE = '_mmarTrace'
MMAR_TRAIN_STATS = '_mmarTrainStats'
RUN_ID = '_runId'
RUN_ROOT = '_runRoot'
TEMP_MMAR_ROOT = '_tempMmarRoot'
TRAIN_CONFIG_FILE = '_trainConfigFile'
class MMARStd

Bases: object

Defines standard folder and file names used by MMAR.

AUTOML_CONFIG = 'config_automl.json'
AUTOML_DIR = 'automl'
AUTOML_STATS_FILE = 'automl_stats_log.json'
COMMAND_DIR = 'commands'
CONFIG_DIR = 'config'
ENV_CONFIG = 'environment.json'
EVAL_DIR = 'eval'
LOG_CONFIG = 'resources/log.config'
LOG_FILE = 'log.txt'
MODELS_DIR = 'models'
SCORE_KEY = 'best_validation_metric'
TRACE_FILE = 'trace.txt'
TRAIN_CONFIG = 'config_train.json'
TRAIN_ROUND_CMD = 'automl_train_round.sh'
TRAIN_STATS = 'train_stats.json'
class DummyController(max_rounds=1000)

Bases: automl.components.controllers.controller.Controller

initial_recommendation(ctx)

This method is called by the AutoML workflow engine to produce the initial set of recommendations. The controller must produce 1 or more recommendations. If no recommendation is produced, the AutoML workflow will stop immediately.

This method is called only once at the beginning of the AutoML process.

Parameters

ctx – the context that enables across-component data sharing and communication

Returns: a list of recommendations

refine_recommendation(outcome: automl.defs.Outcome, ctx: automl.defs.Context)

This method is called by the AutoML workflow engine to produce a set of recommendations based on the result from a previous job.

The controller can produce 0 or more recommendations.

This method is called every time a job finishes executing a previous recommendation.

Parameters
  • outcome – the result of executing the previous recommendation

  • ctx – the context that enables across-component data sharing and communication

Returns: a list of recommendations, could be empty

set_search_space(space, ctx)

Set the search space. This is the search space that the controller will search against to produce recommendations. The controller must keep it for later use.

Parameters
  • space – the search space

  • ctx – the context that enables across-component data sharing and communication

Returns:

NOTE: the controller should validate the search space and make sure it is acceptable. If the search space is not acceptable, the controller should either raise an exception or ask to stop the workflow by calling ctx.ask_to_stop().
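
A minimal sketch of that validation pattern in a controller subclass follows. The attribute used to inspect the space ("targets") is a placeholder assumption; only ctx.ask_to_stop() and the requirement to keep the space are taken from this page.

    def set_search_space(self, space, ctx):
        # Sketch only: keep the space for later use; if it is unusable,
        # ask the AutoML workflow to stop as described in the note above.
        if space is None or not getattr(space, "targets", None):  # placeholder attribute
            ctx.ask_to_stop()
            return
        self.search_space = space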

class DummyHandler

Bases: automl.components.handlers.handler.Handler

end_job(ctx)

The job execution has ended. Called from the job thread. See notes in start_job method. You can check the job status from prop ContextKey.JOB_STATUS in the ctx.

NOTE: if the start_job is called, it is guaranteed that end_job will be called.

Parameters

ctx – the job context

Returns:

recommendations_available(ctx)

The recommendations are available. Called from the main thread. You can get recommendations from prop ContextKey.RECOMMENDATIONS in the ctx.

Parameters

ctx – main context

Returns:

class MMARExecutor(msg_destination: Optional[str] = None, poll_interval=1.0, remove_search_augs=False)

Bases: medl.apps.automl.base_mmar_exec.BaseMMARExecutor

Implements an Executor that does MMAR-based model training.

Parameters
  • msg_destination (str) – destination for log messages produced during execution. Possible values: file (default), console, both, none

  • poll_interval (float) – how often to poll job execution status. Default 1 second.

  • remove_search_augs – whether to remove search augmentation JSON elements from generated train config. Default False.

check_process(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
stop_process(prc, ctx: automl.defs.Context)

Called to stop the training process. The model training is run in a subprocess. This method tries to kill that subprocess.

Parameters
  • prc – the process object

  • ctx – the job execution context.

Returns:

submit_process(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context) → object
class MMARHandler(num_mmars_to_keep=1, stop_threshold=None, train_config_file=None, work_dir=None, key_mmar_content_only=False, data_list_file_name=None, keep_bad_mmars=True)

Bases: automl.components.handlers.handler.Handler, dlmed.hci.reg.CommandModule

MMARHandler implements functions to adapt AutoML to MMAR-based model training.

Parameters
  • num_mmars_to_keep – number of job MMARs to keep during AutoML. Extra MMARs are deleted.

  • stop_threshold – if specified, when any job’s finishing score meets or exceeds this threshold, stop AutoML.

  • train_config_file – if specified, the name of train config file to use. If not specified, use config_train.json.

  • work_dir – if specified, the directory for AutoML work space. If not, default to ‘automl’.

end_automl(ctx: automl.defs.Context)

Print results produced from this run

Parameters

ctx – main context

Returns:

end_job(ctx: automl.defs.Context)

Add job_mmar_root to self.job_mmars. Check whether the result of this round (ctx: score) is better and adjust best_mmars. Only keep num_mmars_to_keep MMARs and remove all others.

Parameters

ctx – job context

Returns:

NOTE: this method is called from different threads. Data access must be protected!
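
The bookkeeping described above could look roughly like the sketch below: a lock-protected record of (score, mmar_root) pairs trimmed to the best num_mmars_to_keep entries. This is only an illustration of the thread-safety requirement, not MMARHandler's actual implementation, and it assumes higher scores are better.

    import threading

    class _BestMmarTracker:
        # Illustrative sketch: thread-safe tracking of the best N job MMARs.
        def __init__(self, num_mmars_to_keep=1):
            self.num_mmars_to_keep = num_mmars_to_keep
            self.best_mmars = []              # list of (score, mmar_root)
            self._lock = threading.Lock()

        def record(self, mmar_root, score):
            with self._lock:
                self.best_mmars.append((score, mmar_root))
                # assumes higher scores are better
                self.best_mmars.sort(key=lambda item: item[0], reverse=True)
                dropped = self.best_mmars[self.num_mmars_to_keep:]
                self.best_mmars = self.best_mmars[:self.num_mmars_to_keep]
                # the handler would delete the dropped MMAR folders here
                return [root for _, root in dropped]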

get_spec()
handle_summary()
start_automl(ctx: automl.defs.Context)

1. Create a folder for the run (run_root) based on run_id in mmar_root, if the folder does not exist yet. If the folder already exists, warn the user and quit (the run already exists).

2. Determine the base config_train.json and place it in run_root.

Parameters

ctx – main context

Returns:

start_job(ctx: automl.defs.Context)

The job execution is about to start. Called from the job thread.

NOTE: this method could be called from multiple job threads at the same time. If you want to store across-job state data in the handler, you should ensure thread safety when updating such state data.

NOTE: the ctx is a per-job context, hence it is not subject to multi-thread access. Consider using the ctx to store per-job state data.

Parameters

ctx – the job context

Returns:

startup(ctx: automl.defs.Context)

The handler is being started up. Use this method to initialize the handler based on info in the context.

NOTE: this method is called in the order of handler chain. If your handler depends on some info provided by previous handlers, you can get such info from the ctx.

Parameters

ctx – main context

Returns:

class MMARSummary(root: str, score, status: automl.defs.ProcessStatus)

Bases: object

Define a simple structure for job execution stats summary

Parameters
  • root – the job’s MMAR root

  • score – finishing score of the job

class MMARStatsHandler(stats_file_name='automl_stats_log.json')

Bases: automl.components.handlers.handler.Handler

Defines a stats handler that writes the search space, the recommendations, and the stats of each job to a json file after the job finishes.

If a new set of recommendations becomes available, a new file is written for each set of recommendations, as a workaround for now to avoid overwriting the recommendations in the original json file.

Parameters

stats_file_name – string of file to write output of json stats to, relative to this run’s root directory

end_job(ctx: automl.defs.Context)

Update json stats with stats of the round after each round is completed

Parameters

ctx – job context

recommendations_available(ctx: automl.defs.Context)

Update stats json with recommendations

Parameters

ctx – job context

search_space_available(ctx: automl.defs.Context)

Initialize file to write to, and write out search space to it

Parameters

ctx – job context

class npEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.encoder.JSONEncoder

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an OverflowError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

default(o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
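
For comparison, here is a minimal sketch of an encoder that handles NumPy types, in the spirit of what a subclass like npEncoder presumably does; the exact set of types npEncoder converts is an assumption here.

    import json
    import numpy as np

    class NumpyFriendlyEncoder(json.JSONEncoder):
        # Illustrative sketch only: convert common NumPy types to plain
        # Python types so json.dumps can serialize them.
        def default(self, o):
            if isinstance(o, np.integer):
                return int(o)
            if isinstance(o, np.floating):
                return float(o)
            if isinstance(o, np.ndarray):
                return o.tolist()
            return super().default(o)

    stats = {"best_validation_metric": np.float32(0.87), "epochs": np.int64(100)}
    print(json.dumps(stats, cls=NumpyFriendlyEncoder))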

class MMARTrainer

Bases: object

train(ctx: automl.defs.Context)
class NGCExecutor(image, dataset_id, dataset_path, workspace_id, workspace_path, org='nvidian', team='dlmed', ace='nv-us-west-2', instance='dgx1v.16g.1.norm', result_dir='/results', poll_interval=30, remove_search_augs=False)

Bases: medl.apps.automl.base_mmar_exec.BaseMMARExecutor

Implements an Executor that does MMAR-based model training deployed onto NGC clusters.

check_process(job_name, prc, ctx: automl.defs.Context) → automl.defs.ProcessStatus
stop_process(prc, ctx: automl.defs.Context)

Called to abort the training process. NGC jobs can be stopped via the web UI, via ‘ngc batch kill’, or by the NGC system when they time out. This method sends ‘ngc batch kill’ to stop the job.

Parameters
  • prc – the ngc job id

  • ctx – the job execution context.

Returns:

submit_process(job_name, my_mmar_root, train_cmd_script, ctx: automl.defs.Context)
class EnumParamMapper

Bases: medl.apps.automl.ssu.ParamMapper

EnumParamMapper is responsible for management of enum parameters

Parameters
  • component – the JSON component to be managed

  • args – list of arg names of the component

  • range_def – list of choices, each choice is a list of values, one for each arg

do_update_component(search_value)

Apply the search result to the component. It maps the values of the choice to the component’s args.

Parameters

search_value – search value to be applied. Must be an integer specifying the choice ID.

Returns:

get_actual_value(search_value)

Return the mapped values

Returns: actual value in enum case

get_search_ranges()

Determine the search ranges, which is a list of SearchRanges, one for each choice.

Returns: a list of SearchRanges, one for each choice. Each range is simply the choice ID, starting from 0.
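
For example, under the args/range_def convention described above, mapping a choice ID back to concrete arg values could look like the following illustrative sketch (not the actual mapper code):

    def map_choice_to_arg_values(choice_id, args, range_def):
        # Each choice in range_def provides one value per arg; the search
        # value is the index of the chosen choice, starting from 0.
        choice = range_def[choice_id]
        return dict(zip(args, choice))

    # Example: choice ID 1 selects the second choice for both args.
    map_choice_to_arg_values(1, ["dropout", "activation"], [[0.1, "relu"], [0.2, "elu"]])
    # -> {'dropout': 0.2, 'activation': 'elu'}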

class FloatParamMapper

Bases: medl.apps.automl.ssu.ParamMapper

FloatParamMapper is responsible for management of float parameters

Parameters
  • component – the JSON component to be managed

  • args – list of arg names of the component

  • range_def – list of two numbers: min and max values of the range

do_update_component(search_value)

Apply the search result to the component.

Parameters
  • search_value – search value to be applied. Must be a valid number between the min and max values of the mapper.

Returns:

get_search_ranges()

Determine the search ranges, which is a list containing a single SearchRange spanning the min and max values of the mapper.

Returns: a list containing a single SearchRange

NOTE: the mapper does not normalize the range to (0, 1).

class ParamApplicator(component, arg: str, expression: str, attr_comp)

Bases: object

apply(gvars: dict)
class ParamMapper

Bases: object

ParamMapper is used to help manage search parameters when building search space from config_train.json in a MMAR. Specifically, it is responsible for:

  1. Creating search ranges for a searchable parameter defined in the JSON.

  2. Applying a search result to the parameter and updating the appropriate component args in the JSON.

Parameters
  • component – the JSON component to be managed

  • args – list of arg names of the component

do_update_component(search_value)
get_actual_value(search_value)

Return the mapped values (mainly for enum type search_value)

Returns: search_value or actual value in enum case

get_prop(name: str)
get_search_ranges()

Compute the search ranges of the param

Returns: a list of SearchRanges.

update_component(search_value, global_vars: dict)

Update the component’s arg(s) based on the search_value

Parameters
  • search_value – the search result to be applied to the component

  • global_vars – global vars to be updated

Returns:

update_prop(name: str, value)
class SearchSpaceManager(train_config, location: str, remove_search_augs=False)

Bases: dlmed.utils.json_scanner.JsonObjectProcessor

The SearchSpaceManager manages the search space determined from config_train.json in MMAR.

Parameters

train_config – the JSON data from config_train.json in a MMAR

apply_search_result(search_result: automl.defs.SearchResult) → str

Apply search result to the config JSON and create a new config JSON. For each PRL/Value pair in the SearchResult, it finds the appropriate mapper corresponding to the PRL and calls the mapper to apply the value to the param.

Parameters

search_result – a SearchResult object

Returns: an updated JSON config after applying the search result.

dump_vars(prefix='\t')
extract_search_space() → automl.defs.SearchSpace

Determine the search space from the JSON config. It calls scan_json to scan the JSON structure and processes each element to find the ones that define search parameters. An appropriate ParamMapper is created for each search param based on its data type. All mappers are stored in a dict of PRL=>Mapper. Finally, it creates the search space by calling the mappers to produce search ranges for each param.

Returns: a SearchSpace object
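
Conceptually, the scan walks the config JSON and collects every element that carries a search definition, keyed by its location (the PRL). The recursion below is only an illustrative sketch of that idea using a hypothetical "search" key; the real implementation relies on scan_json and the ParamMapper classes.

    def find_search_definitions(config, path=""):
        # Illustrative sketch only: yield (location, search_definition) pairs
        # for every element carrying a hypothetical "search" key.
        for key, value in config.items():
            here = f"{path}/{key}" if path else key
            if key == "search":
                yield path, value
            elif isinstance(value, dict):
                yield from find_search_definitions(value, here)
            elif isinstance(value, list):
                for i, item in enumerate(value):
                    if isinstance(item, dict):
                        yield from find_search_definitions(item, f"{here}#{i}")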

get_vars_digest(vars)
map_search_result(search_result: automl.defs.SearchResult) → dict

Map value of search_result into concrete value

Parameters

search_result – a SearchResult object

Returns: a dictionary with keys == prl and values == concrete value

process_element(node: dlmed.utils.json_scanner.Node)

Implements the process_json method required by JsonObjectProcessor. It checks whether the JSON object contains search definition. If so, it creates a ParamMapper based on the search range definition. It creates a PRL for the param and keeps the mapper in a dict of PRL => Mapper.

Parameters

node – json node

Returns:

class TestController(total_recs, max_recs_each_time)

Bases: automl.components.controllers.controller.Controller

initial_recommendation()

This method is called by the AutoML workflow engine to produce the initial set of recommendations. The controller must produce 1 or more recommendations. If no recommendation is produced, the AutoML workflow will stop immediately.

This method is called only once at the beginning of the AutoML process.

Parameters

ctx – the context that enables across-component data sharing and communication

Returns: a list of recommendations

refine_recommendation()

This method is called by the AutoML workflow engine to produce a set of recommendations based on the result from a previous job.

The controller can produce 0 or more recommendations.

This method is called every time a job finishes executing a previous recommendation.

Parameters
  • outcome – the result of executing the previous recommendation

  • ctx – the context that enables across-component data sharing and communication

Returns: a list of recommendations, could be empty

set_search_space(space: automl.defs.SearchSpace, ctx: automl.defs.Context)

Set the search space. This is the search space that the controller will search against to produce recommendations. The controller must keep it for later use.

Parameters
  • space – the search space

  • ctx – the context that enables across-component data sharing and communication

Returns:

NOTE: the controller should validate the search space and make sure it is acceptable. If the search space is not acceptable, the controller should either raise an exception or ask to stop the workflow by calling ctx.ask_to_stop().

© Copyright 2021, NVIDIA. Last updated on Feb 2, 2023.