NVIDIA Modulus Core (Latest Release)

Modulus Launch Logging

class modulus.launch.logging.launch.LaunchLogger(name_space, *args, **kwargs)[source]

Bases: object

Modulus Launch logger

An abstracted logger class that handles several fundamental logging functions. The class should first be initialized and then used via a context manager, which automatically computes epoch metrics. This is the standard logger for Modulus examples.

Parameters
  • name_space (str) – Namespace of the logger. This defines the logger's title in the console and the wandb group the metric is plotted in

  • epoch (int, optional) – Current epoch, by default 1

  • num_mini_batch (Union[int, None], optional) – Number of mini-batches used to calculate the epoch's progress, by default None

  • profile (bool, optional) – Profile code using nvtx markers, by default False

  • mini_batch_log_freq (int, optional) – Frequency to log mini-batch losses, by default 100

  • epoch_alert_freq (Union[int, None], optional) – Epoch frequency to send training alert, by default None

Example

>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> epochs = 3
>>> for i in range(epochs):
...     with LaunchLogger("Train", epoch=i) as log:
...         # Log 3 mini-batches manually
...         log.log_minibatch({"loss": 1.0})
...         log.log_minibatch({"loss": 2.0})
...         log.log_minibatch({"loss": 3.0})

static initialize(use_wandb: bool = False, use_mlflow: bool = False)[source]

Initialize logging singleton

Parameters
  • use_wandb (bool, optional) – Use WandB logging, by default False

  • use_mlflow (bool, optional) – Use MLFlow logging, by default False
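
For instance, a minimal sketch of enabling W&B-backed logging; the project and entity names below are placeholders, and the W&B client is set up first via initialize_wandb (documented further down):

>>> from modulus.launch.logging import LaunchLogger
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> initialize_wandb(project="my_project", entity="my_entity", mode="offline")  # placeholder names
>>> LaunchLogger.initialize(use_wandb=True)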

log_epoch(losses: Dict[str, float])[source]

Logs metrics for a single epoch

Parameters

losses (Dict[str, float]) – Dictionary of metrics/loss values to log
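
As a sketch, an epoch-level metric such as the learning rate can be logged once per epoch from inside the context manager (metric names are illustrative):

>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> with LaunchLogger("Train", epoch=1) as log:
...     log.log_minibatch({"loss": 0.5})
...     log.log_epoch({"learning_rate": 1e-3})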

log_figure(figure, artifact_file: str = 'artifact', plot_dir: str = './', log_to_file: bool = False)[source]

Logs a figure on the root process to wandb or mlflow. The figure is stored to a file if neither is enabled.

Parameters
  • figure (Figure) – matplotlib or plotly figure to plot

  • artifact_file (str, optional) – File name. Caution: overwrites existing files of the same name

  • plot_dir (str, optional) – Output directory for the plot

  • log_to_file (bool, optional) – Set to True to store the figure to a file in addition to logging it to mlflow/wandb
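
A sketch of logging a matplotlib figure from inside a logging context; the artifact name is illustrative:

>>> import matplotlib.pyplot as plt
>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> fig, ax = plt.subplots()
>>> _ = ax.plot([0, 1], [0, 1])
>>> with LaunchLogger("Validation", epoch=1) as log:
...     log.log_figure(fig, artifact_file="validation_plot", log_to_file=True)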

log_minibatch(losses: Dict[str, float])[source]

Logs metrics for a mini-batch iteration

This function should be called every mini-batch iteration. It accumulates loss values over a datapipe; at the end of an epoch, the average of these losses across all mini-batches is calculated.

Parameters

losses (Dict[str, float]) – Dictionary of metrics/loss values to log

classmethod toggle_mlflow(value: bool)[source]

Toggle MLFlow logging

Parameters

value (bool) – Use MLFlow logging

classmethod toggle_wandb(value: bool)[source]

Toggle WandB logging

Parameters

value (bool) – Use WandB logging
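
For example, a backend can be switched off at runtime, e.g. to silence W&B during a debugging run (a minimal sketch):

>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.toggle_wandb(False)
>>> LaunchLogger.toggle_mlflow(False)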

class modulus.launch.logging.console.PythonLogger(name: str = 'launch')[source]

Bases: object

Simple console logger for DL training. This is a work in progress.

error(message: str)[source]

Log error

file_logging(file_name: str = 'launch.log')[source]

Log to file

info(message: str)[source]

Log info

log(message: str)[source]

Log message

success(message: str)[source]

Log success

warning(message: str)[source]

Log warning
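
A short usage sketch of the console logger, writing to the console and, optionally, to a log file (the file name is illustrative):

>>> from modulus.launch.logging.console import PythonLogger
>>> logger = PythonLogger("train")
>>> logger.file_logging("train.log")  # also write messages to this file
>>> logger.info("Starting training loop")
>>> logger.warning("Checkpoint not found, starting from scratch")
>>> logger.success("Training completed")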

class modulus.launch.logging.console.RankZeroLoggingWrapper(obj, dist)[source]

Bases: object

Wrapper class to only log from rank 0 process in distributed training.
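
A sketch of restricting console output to the rank 0 process in a DDP job, assuming the distributed state is managed by modulus.distributed.DistributedManager:

>>> from modulus.distributed import DistributedManager
>>> from modulus.launch.logging.console import PythonLogger, RankZeroLoggingWrapper
>>> DistributedManager.initialize()
>>> dist = DistributedManager()
>>> logger = PythonLogger("train")
>>> rank_zero_logger = RankZeroLoggingWrapper(logger, dist)
>>> rank_zero_logger.info("This message is only printed on rank 0")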

Weights and Biases Routines and Utilities

modulus.launch.logging.wandb.alert(title, text, duration=300, level=0, is_master=True)[source]

Send a W&B alert.
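
For example, a training alert might be sent when a milestone is reached (a sketch; this assumes an active W&B run, and the title and text are illustrative):

>>> from modulus.launch.logging.wandb import alert
>>> alert(title="Training finished", text="Run completed after 100 epochs")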

modulus.launch.logging.wandb.initialize_wandb(project: str, entity: str, name: str = 'train', group: Optional[str] = None, sync_tensorboard: bool = False, save_code: bool = False, resume: Optional[str] = None, wandb_id: Optional[str] = None, config=None, mode: Literal['offline', 'online', 'disabled'] = 'offline', results_dir: Optional[str] = None)[source]

Initialize the wandb client with the Weights & Biases server.

Parameters
  • project (str) – Name of the project to sync data with

  • entity (str) – Name of the wandb entity

  • sync_tensorboard (bool, optional) – sync tensorboard summary writer with wandb, by default False

  • save_code (bool, optional) – Whether to push a copy of the code to wandb dashboard, by default False

  • name (str, optional) – Name of the task running, by default “train”

  • group (str, optional) – Group name of the task running. Useful to set for DDP runs, by default None

  • resume (str, optional) – Sets the resuming behavior. Options: “allow”, “must”, “never”, “auto” or None, by default None.

  • wandb_id (str, optional) – A unique ID for this run, used for resuming. Used in conjunction with resume parameter to enable experiment resuming. See W&B documentation for more details: https://docs.wandb.ai/guides/runs/resuming/

  • config (optional) – A dictionary-like object for saving inputs, such as hyperparameters. If a dict, argparse, or absl.flags object, it will load the key-value pairs into the wandb.config object. If a str, it will look for a YAML file by that name, by default None.

  • mode (str, optional) – Can be “offline”, “online” or “disabled”, by default “offline”

  • results_dir (str, optional) – Output directory of the experiment, by default “/<run directory>/wandb”
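
A minimal sketch of initializing the client for an offline run; the project, entity, and group names are placeholders:

>>> from modulus.launch.logging.wandb import initialize_wandb
>>> initialize_wandb(
...     project="modulus_example",
...     entity="my_team",
...     name="train",
...     group="experiment_1",
...     mode="offline",
... )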

modulus.launch.logging.wandb.is_wandb_initialized()[source]

Check if wandb has been initialized.
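
For example, to guard against initializing the client twice (a sketch, using the same placeholder names as above):

>>> from modulus.launch.logging.wandb import initialize_wandb, is_wandb_initialized
>>> if not is_wandb_initialized():
...     initialize_wandb(project="modulus_example", entity="my_team", mode="offline")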

modulus.launch.logging.utils.create_ddp_group_tag(group_name: Optional[str] = None) → str[source]

Creates a common group tag for logging

Note: this currently does not work with multi-node runs; there appears to be a bug in PyTorch when a distributed utility is used before DDP is initialized.

Parameters

group_name (str, optional) – Optional group name prefix. If None will use "DDP_Group_", by default None

Returns

Group tag

Return type

str
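
A sketch of using the generated tag as the W&B group for a DDP run (placeholder names as above):

>>> from modulus.launch.logging.utils import create_ddp_group_tag
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> tag = create_ddp_group_tag("my_experiment")
>>> initialize_wandb(project="modulus_example", entity="my_team", group=tag, mode="offline")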
