Modulus Launch Logging
- class modulus.launch.logging.launch.LaunchLogger(name_space, *args, **kwargs)[source]
Bases: object
Modulus Launch logger
An abstracted logger class that takes care of several fundamental logging functions. This class should first be initialized and then used via a context manager, which will automatically compute epoch metrics. This is the standard logger for Modulus examples.
- Parameters
name_space (str) – Namespace of logger to use. This will define the logger's title in the console and the WandB group in which the metric is plotted
epoch (int, optional) – Current epoch, by default 1
num_mini_batch (Union[int, None], optional) – Number of mini-batches used to calculate the epoch's progress, by default None
profile (bool, optional) – Profile code using nvtx markers, by default False
mini_batch_log_freq (int, optional) – Frequency to log mini-batch losses, by default 100
epoch_alert_freq (Union[int, None], optional) – Epoch frequency to send training alert, by default None
Example
>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> epochs = 3
>>> for i in range(epochs):
...     with LaunchLogger("Train", epoch=i) as log:
...         # Log 3 mini-batches manually
...         log.log_minibatch({"loss": 1.0})
...         log.log_minibatch({"loss": 2.0})
...         log.log_minibatch({"loss": 3.0})
- static initialize(use_wandb: bool = False, use_mlflow: bool = False)[source]
Initialize logging singleton
- Parameters
use_wandb (bool, optional) – Use WandB logging, by default False
use_mlflow (bool, optional) – Use MLFlow logging, by default False
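A minimal sketch of enabling an external backend; it assumes the corresponding client (here WandB) has already been set up, e.g. via initialize_wandb below:
>>> from modulus.launch.logging import LaunchLogger
>>> # Route LaunchLogger metrics to WandB in addition to the console
>>> LaunchLogger.initialize(use_wandb=True)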
- log_epoch(losses: Dict[str, float])[source]
Logs metrics for a single epoch
- Parameters
losses (Dict[str, float]) – Dictionary of metrics/loss values to log
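For illustration, a sketch logging an epoch-level metric alongside the auto-averaged mini-batch losses; the learning-rate value is a placeholder:
>>> with LaunchLogger("Train", epoch=1) as log:
...     log.log_minibatch({"loss": 1.0})
...     # Epoch-level metrics are logged once, not averaged over mini-batches
...     log.log_epoch({"learning_rate": 1e-3})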
- log_minibatch(losses: Dict[str, float])[source]
Logs metrics for a single mini-batch iteration
This function should be called every mini-batch iteration. It will accumulate loss values over the datapipe. At the end of an epoch, the average of these losses across mini-batches is calculated.
- Parameters
losses (Dict[str, float]) – Dictionary of metrics/loss values to log
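A sketch of the accumulation behavior: with three mini-batch calls, the value logged at the end of the epoch should be their average (2.0 here); num_mini_batch is optional and only used for progress reporting:
>>> with LaunchLogger("Train", epoch=1, num_mini_batch=3) as log:
...     for loss in (1.0, 2.0, 3.0):
...         log.log_minibatch({"loss": loss})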
- classmethod toggle_mlflow(value: bool)[source]
Toggle MLFlow logging
- Parameters
value (bool) – Use MLFlow logging
- classmethod toggle_wandb(value: bool)[source]
Toggle WandB logging
- Parameters
value (bool) – Use WandB logging
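A sketch of toggling backends at runtime; enabling a backend still assumes the corresponding client has been initialized:
>>> LaunchLogger.toggle_wandb(False)   # stop sending metrics to WandB
>>> LaunchLogger.toggle_mlflow(True)   # start sending metrics to MLFlow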
- class modulus.launch.logging.console.PythonLogger(name: str = 'launch')[source]
Bases: object
Simple console logger for DL training. This is a WIP.
- error(message: str)[source]
Log error
- file_logging(file_name: str = 'launch.log')[source]
Log to file
- info(message: str)[source]
Log info
- log(message: str)[source]
Log message
- success(message: str)[source]
Log success
- warning(message: str)[source]
Log warning
- class modulus.launch.logging.console.RankZeroLoggingWrapper(obj, dist)[source]
Bases: object
Wrapper class to only log from rank 0 process in distributed training.
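A sketch of wrapping a PythonLogger, assuming Modulus's DistributedManager is used to provide the dist handle:
>>> from modulus.distributed import DistributedManager
>>> from modulus.launch.logging.console import PythonLogger, RankZeroLoggingWrapper
>>> DistributedManager.initialize()
>>> dist = DistributedManager()
>>> logger = PythonLogger("train")
>>> rank_zero_logger = RankZeroLoggingWrapper(logger, dist)
>>> rank_zero_logger.info("printed once, from rank 0 only")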
- modulus.launch.logging.mlflow.check_mlflow_logged_in(client: MlflowClient)[source]
Checks to see if MLFlow URI is functioning
This isn't the best solution right now and overrides the HTTP timeout. It can be updated if MLFlow usage increases.
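A usage sketch; the tracking URI is a placeholder, and the exact failure behavior (raise vs. warn) is not specified here:
>>> from mlflow.tracking import MlflowClient
>>> from modulus.launch.logging.mlflow import check_mlflow_logged_in
>>> client = MlflowClient(tracking_uri="http://localhost:5000")  # placeholder URI
>>> check_mlflow_logged_in(client)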
- modulus.launch.logging.mlflow.initialize_mlflow(experiment_name: str, experiment_desc: Optional[str] = None, run_name: Optional[str] = None, run_desc: Optional[str] = None, user_name: Optional[str] = None, mode: Literal['offline', 'online', 'ngc'] = 'offline', tracking_location: Optional[str] = None, artifact_location: Optional[str] = None) → Tuple[MlflowClient, Run][source]
Initializes MLFlow logging client and run.
- Parameters
experiment_name (str) – Experiment name
experiment_desc (str, optional) – Experiment description, by default None
run_name (str, optional) – Run name, by default None
run_desc (str, optional) – Run description, by default None
user_name (str, optional) – User name, by default None
mode (str, optional) – MLFlow mode. Supports “offline”, “online” and “ngc”. Offline mode records logs to the local file system. Online mode is for remote tracking servers. NGC is a standardized setup specific to NGC runs, by default “offline”
tracking_location (str, optional) – Tracking location for MLFlow. For offline mode this would be an absolute folder path. For online mode this would be an HTTP URI or Databricks. For NGC, this option is ignored, by default “/<run directory>/mlruns”
artifact_location (str, optional) – Optional separate artifact location, by default None
Note: For NGC mode, one needs to mount an NGC workspace / folder system with a metric folder at /mlflow/mlflow_metrics/ and an artifact folder at /mlflow/mlflow_artifacts/.
Note: This will set up the Modulus Launch logger for MLFlow logging. Only one MLFlow logging client is supported with the Modulus Launch logger.
- Returns
MLFlow logging client and active run object
- Return type
Tuple[MlflowClient, Run]
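A minimal offline-mode sketch (experiment and run names are placeholders); per the note above, LaunchLogger.initialize(use_mlflow=True) is then called to route metrics to MLFlow:
>>> from modulus.launch.logging import LaunchLogger
>>> from modulus.launch.logging.mlflow import initialize_mlflow
>>> client, run = initialize_mlflow(
...     experiment_name="my_experiment",
...     run_name="baseline",
...     mode="offline",
... )
>>> LaunchLogger.initialize(use_mlflow=True)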
Weights and Biases Routines and Utilities
- modulus.launch.logging.wandb.alert(title, text, duration=300, level=0, is_master=True)[source]
Send alert.
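A usage sketch; the semantics of duration and level are assumptions, since the docstring above does not document them:
>>> from modulus.launch.logging.wandb import alert
>>> alert(
...     title="Training complete",
...     text="Final loss: 0.01",
...     duration=300,    # presumably seconds the alert remains active
...     is_master=True,  # only the master process sends the alert
... )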
- modulus.launch.logging.wandb.initialize_wandb(project: str, entity: str, name: str = 'train', group: Optional[str] = None, sync_tensorboard: bool = False, save_code: bool = False, resume: Optional[str] = None, config=None, mode: Literal['offline', 'online', 'disabled'] = 'offline', results_dir: Optional[str] = None)[source]
Function to initialize the WandB client with the Weights & Biases server.
- Parameters
project (str) – Name of the project to sync data with
entity (str) – Name of the wandb entity
sync_tensorboard (bool, optional) – Sync the TensorBoard summary writer with WandB, by default False
save_code (bool, optional) – Whether to push a copy of the code to wandb dashboard, by default False
name (str, optional) – Name of the task running, by default “train”
group (str, optional) – Group name of the task running. Useful to set for DDP runs, by default None
resume (str, optional) – Sets the resuming behavior. Options: “allow”, “must”, “never”, “auto” or None, by default None.
config (optional) – A dictionary-like object for saving inputs, such as hyperparameters. If a dict, argparse, or absl.flags object, it will load the key-value pairs into the wandb.config object. If a str, it will look for a YAML file by that name, by default None.
mode (str, optional) – Can be “offline”, “online” or “disabled”, by default “offline”
results_dir (str, optional) – Output directory of the experiment, by default “/<run directory>/wandb”
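A minimal sketch (project, entity, and group names are placeholders), followed by LaunchLogger.initialize to route metrics to WandB:
>>> from modulus.launch.logging import LaunchLogger
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> initialize_wandb(
...     project="my_project",
...     entity="my_team",
...     name="train",
...     group="experiment_0",
...     mode="offline",  # switch to "online" to sync with the server
... )
>>> LaunchLogger.initialize(use_wandb=True)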
- modulus.launch.logging.wandb.is_wandb_initialized()[source]
Check if wandb has been initialized.
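A sketch of guarding against double initialization; project and entity names are placeholders:
>>> from modulus.launch.logging.wandb import initialize_wandb, is_wandb_initialized
>>> if not is_wandb_initialized():
...     initialize_wandb(project="my_project", entity="my_team")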
- modulus.launch.logging.utils.create_ddp_group_tag(group_name: Optional[str] = None) → str[source]
Creates a common group tag for logging
Note that this currently does not work with multi-node runs; there appears to be a bug in PyTorch when a distributed utility is used before DDP is initialized.
- Parameters
group_name (str, optional) – Optional group name prefix. If None will use “DDP_Group_”, by default None
- Returns
Group tag
- Return type
str
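A sketch of using the tag to group per-rank WandB runs; the exact tag format is an implementation detail (only the “DDP_Group_” prefix is documented above):
>>> from modulus.launch.logging.utils import create_ddp_group_tag
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> group = create_ddp_group_tag()
>>> initialize_wandb(project="my_project", entity="my_team", group=group)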