Modulus Launch Logging
- class modulus.launch.logging.launch.LaunchLogger(name_space, *args, **kwargs)[source]
Bases:
object
Modulus Launch logger
An abstracted logger class that takes care of several fundamental logging functions. This class should first be initialized and then used via a context manager, which will automatically compute epoch metrics. This is the standard logger for Modulus examples.
- Parameters
name_space (str) – Namespace of the logger to use. This defines the logger's title in the console and the WandB group the metric is plotted in
epoch (int, optional) – Current epoch, by default 1
num_mini_batch (Union[int, None], optional) – Number of mini-batches used to calculate the epoch's progress, by default None
profile (bool, optional) – Profile code using nvtx markers, by default False
mini_batch_log_freq (int, optional) – Frequency to log mini-batch losses, by default 100
epoch_alert_freq (Union[int, None], optional) – Epoch frequency to send training alert, by default None
Example
>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> epochs = 3
>>> for i in range(epochs):
...     with LaunchLogger("Train", epoch=i) as log:
...         # Log 3 mini-batches manually
...         log.log_minibatch({"loss": 1.0})
...         log.log_minibatch({"loss": 2.0})
...         log.log_minibatch({"loss": 3.0})
- static initialize(use_wandb: bool = False, use_mlflow: bool = False)[source]
Initialize logging singleton
- Parameters
use_wandb (bool, optional) – Use WandB logging, by default False
use_mlflow (bool, optional) – Use MLFlow logging, by default False
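For illustration, a minimal sketch of selecting the logging backends at initialization. Both flags default to False, so console logging alone is used unless a backend is enabled; depending on the setup, the WandB client may need to be configured first (see initialize_wandb below).
>>> from modulus.launch.logging import LaunchLogger
>>> # Console logging only (both backends default to False):
>>> LaunchLogger.initialize()
>>> # Or additionally log metrics to an experiment-tracking backend, e.g. WandB:
>>> LaunchLogger.initialize(use_wandb=True)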
- log_epoch(losses: Dict[str, float])[source]
Logs metrics for a single epoch
- Parameters
losses (Dict[str, float]) – Dictionary of metrics/loss values to log
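As a sketch of where log_epoch fits, the snippet below records an epoch-level quantity that should not be averaged over mini-batches; optimizer is a placeholder for your own PyTorch optimizer.
>>> with LaunchLogger("Train", epoch=1) as log:
...     # ... mini-batch logging happens here ...
...     # optimizer is a placeholder PyTorch optimizer
...     log.log_epoch({"learning_rate": optimizer.param_groups[0]["lr"]})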
- log_figure(figure, artifact_file: str = 'artifact', plot_dir: str = './', log_to_file: bool = False)[source]
Logs figures on the root process to WandB or MLFlow. The figure is stored to file if neither backend is selected.
- Parameters
figure (Figure) – matplotlib or plotly figure to plot
artifact_file (str, optional) – File name. Caution: overwrites existing files of the same name
plot_dir (str, optional) – Output directory for the plot
log_to_file (bool, optional) – Set to True to also store the figure to file in addition to logging it to MLFlow/WandB
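A hedged sketch of logging a matplotlib figure; the plotted data and file name are placeholders, and log_to_file=True additionally writes the figure under plot_dir.
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> _ = ax.plot([0.0, 1.0, 4.0], label="prediction")   # placeholder data
>>> _ = ax.legend()
>>> with LaunchLogger("Validation", epoch=1) as log:
...     log.log_figure(fig, artifact_file="validation_epoch_1", log_to_file=True)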
- log_minibatch(losses: Dict[str, float])[source]
Logs metrics for a single mini-batch
This function should be called every mini-batch iteration. It will accumulate loss values over a datapipe. At the end of an epoch the average of these losses from each mini-batch is calculated.
- Parameters
losses (Dict[str, float]) – Dictionary of metrics/loss values to log
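To make the averaging concrete, the sketch below logs three mini-batch losses; on exiting the context manager the epoch value recorded for "loss" should be their mean (2.0 here).
>>> with LaunchLogger("Train", epoch=1, num_mini_batch=3) as log:
...     for loss_value in (1.0, 2.0, 3.0):
...         log.log_minibatch({"loss": loss_value})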
- classmethod toggle_mlflow(value: bool)[source]
Toggle MLFlow logging
- Parameters
value (bool) – Use MLFlow logging
- classmethod toggle_wandb(value: bool)[source]
Toggle WandB logging
- Parameters
value (bool) – Use WandB logging
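A small sketch of the toggles, for example to silence the experiment-tracking backends on non-zero ranks of a distributed run.
>>> LaunchLogger.toggle_wandb(False)
>>> LaunchLogger.toggle_mlflow(False)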
- class modulus.launch.logging.console.PythonLogger(name: str = 'launch')[source]
Bases:
object
Simple console logger for DL training. This is a WIP.
- error(message: str)[source]
Log error
- file_logging(file_name: str = 'launch.log')[source]
Log to file
- info(message: str)[source]
Log info
- log(message: str)[source]
Log message
- success(message: str)[source]
Log success
- warning(message: str)[source]
Log warning
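A brief usage sketch of PythonLogger; the namespace, file name, and messages are placeholders.
>>> from modulus.launch.logging.console import PythonLogger
>>> logger = PythonLogger("main")            # namespace shown in the console
>>> logger.file_logging("launch.log")        # also write messages to a file
>>> logger.info("Starting training loop")
>>> logger.warning("Checkpoint not found, training from scratch")
>>> logger.success("Training completed")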
- class modulus.launch.logging.console.RankZeroLoggingWrapper(obj, dist)[source]
Bases:
object
Wrapper class to only log from rank 0 process in distributed training.
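A sketch of wrapping a logger so that only rank 0 emits messages. It assumes dist is the Modulus DistributedManager; treat that exact type as an assumption based on typical usage.
>>> from modulus.distributed import DistributedManager   # assumed distributed utility
>>> from modulus.launch.logging.console import PythonLogger, RankZeroLoggingWrapper
>>> DistributedManager.initialize()
>>> dist = DistributedManager()
>>> rank_zero_logger = RankZeroLoggingWrapper(PythonLogger("train"), dist)
>>> rank_zero_logger.info("Printed once, from rank 0 only")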
Weights and Biases Routines and Utilities
- modulus.launch.logging.wandb.alert(title, text, duration=300, level=0, is_master=True)[source]
Send alert.
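For example, a hedged sketch of sending an alert from the master process; the title and text are placeholders, and an active WandB run is presumably required for the alert to be delivered.
>>> from modulus.launch.logging.wandb import alert
>>> alert(title="Validation finished", text="Epoch 10 validation error: 0.02")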
- modulus.launch.logging.wandb.initialize_wandb(project: str, entity: str, name: str = 'train', group: Optional[str] = None, sync_tensorboard: bool = False, save_code: bool = False, resume: Optional[str] = None, config=None, mode: Literal['offline', 'online', 'disabled'] = 'offline', results_dir: Optional[str] = None)[source]
Function to initialize the WandB client with the Weights & Biases server.
- Parameters
project (str) – Name of the project to sync data with
entity (str) – Name of the WandB entity
sync_tensorboard (bool, optional) – Sync the TensorBoard summary writer with WandB, by default False
save_code (bool, optional) – Whether to push a copy of the code to wandb dashboard, by default False
name (str, optional) – Name of the task running, by default “train”
group (str, optional) – Group name of the task running. Useful to set for DDP runs, by default None
resume (str, optional) – Sets the resuming behavior. Options: “allow”, “must”, “never”, “auto” or None, by default None.
config (optional) – A dictionary-like object for saving inputs such as hyperparameters. If a dict, argparse, or absl.flags object, the key-value pairs are loaded into the wandb.config object. If a str, it will look for a YAML file by that name, by default None.
mode (str, optional) – Can be “offline”, “online” or “disabled”, by default “offline”
results_dir (str, optional) – Output directory of the experiment, by default “/<run directory>/wandb”
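A minimal sketch of a call; the project, entity, and group names and the config values are placeholders, and mode="offline" (the default) keeps results local until synced.
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> initialize_wandb(
...     project="my-project",        # placeholder project name
...     entity="my-team",            # placeholder entity
...     name="train",
...     group="experiment-1",
...     mode="offline",              # switch to "online" to sync with the W&B server
...     config={"lr": 1e-3, "batch_size": 16},
... )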
- modulus.launch.logging.wandb.is_wandb_initialized()[source]
Check if wandb has been initialized.
- modulus.launch.logging.utils.create_ddp_group_tag(group_name: Optional[str] = None) → str[source]
Creates a common group tag for logging
For some reason this does not work with multi-node runs; there seems to be a bug in PyTorch when a distributed utility is used before DDP.
- Parameters
group_name (str, optional) – Optional group name prefix. If None, "DDP_Group_" will be used, by default None
- Returns
Group tag
- Return type
str
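A sketch of how the tag might be used to group per-rank runs under one WandB group; the project and entity names are placeholders.
>>> from modulus.launch.logging.utils import create_ddp_group_tag
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> group = create_ddp_group_tag()   # defaults to the "DDP_Group_" prefix
>>> initialize_wandb(project="my-project", entity="my-team", group=group)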