NVIDIA Modulus Core (Latest Release)

Modulus Launch Logging

class modulus.launch.logging.launch.LaunchLogger(name_space, *args, **kwargs)[source]

Bases: object

Modulus Launch logger

An abstracted logger class that handles several fundamental logging functions. The class should first be initialized and then used via a context manager, which automatically computes epoch metrics. This is the standard logger for Modulus examples.

Parameters
  • name_space (str) – Namespace of the logger. This defines the logger's title in the console and the wandb group the metric is plotted in

  • epoch (int, optional) – Current epoch, by default 1

  • num_mini_batch (Union[int, None], optional) – Number of mini-batches used to calculate the epoch's progress, by default None

  • profile (bool, optional) – Profile code using nvtx markers, by default False

  • mini_batch_log_freq (int, optional) – Frequency to log mini-batch losses, by default 100

  • epoch_alert_freq (Union[int, None], optional) – Epoch frequency to send training alert, by default None

Example

>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> epochs = 3
>>> for i in range(epochs):
...     with LaunchLogger("Train", epoch=i) as log:
...         # Log 3 mini-batches manually
...         log.log_minibatch({"loss": 1.0})
...         log.log_minibatch({"loss": 2.0})
...         log.log_minibatch({"loss": 3.0})

static initialize(use_wandb: bool = False, use_mlflow: bool = False)[source]

Initialize logging singleton

Parameters
  • use_wandb (bool, optional) – Use WandB logging, by default False

  • use_mlflow (bool, optional) – Use MLFlow logging, by default False
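
For instance, a minimal sketch of enabling W&B-backed logging; the project and entity names below are placeholders, and the W&B client is set up first via initialize_wandb (documented further down):

>>> from modulus.launch.logging import LaunchLogger
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> initialize_wandb(project="my_project", entity="my_entity", mode="offline")  # placeholder names
>>> LaunchLogger.initialize(use_wandb=True)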

log_epoch(losses: Dict[str, float])[source]

Logs metrics for a single epoch

Parameters

losses (Dict[str, float]) – Dictionary of metrics/loss values to log
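
As a sketch, an epoch-level metric such as the learning rate can be logged once per epoch from inside the context manager (metric names are illustrative):

>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> with LaunchLogger("Train", epoch=1) as log:
...     log.log_minibatch({"loss": 0.5})
...     log.log_epoch({"learning_rate": 1e-3})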

log_figure(figure, artifact_file: str = 'artifact', plot_dir: str = './', log_to_file: bool = False)[source]

Logs a figure on the root process to wandb or mlflow. The figure is stored to a file if neither is enabled.

Parameters
  • figure (Figure) – matplotlib or plotly figure to plot

  • artifact_file (str, optional) – File name. Caution: overwrites existing files of the same name

  • plot_dir (str, optional) – Output directory for the plot

  • log_to_file (bool, optional) – Set to True to store the figure to a file in addition to logging it to mlflow/wandb
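
A sketch of logging a matplotlib figure from inside a logging context; the artifact name is illustrative:

>>> import matplotlib.pyplot as plt
>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.initialize()
>>> fig, ax = plt.subplots()
>>> _ = ax.plot([0, 1], [0, 1])
>>> with LaunchLogger("Validation", epoch=1) as log:
...     log.log_figure(fig, artifact_file="validation_plot", log_to_file=True)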

log_minibatch(losses: Dict[str, float])[source]

Logs metrics for a mini-batch iteration

This function should be called every mini-batch iteration. It accumulates loss values over a datapipe; at the end of an epoch, the average of these losses across all mini-batches is calculated.

Parameters

losses (Dict[str, float]) – Dictionary of metrics/loss values to log

classmethod toggle_mlflow(value: bool)[source]

Toggle MLFlow logging

Parameters

value (bool) – Use MLFlow logging

classmethod toggle_wandb(value: bool)[source]

Toggle WandB logging

Parameters

value (bool) – Use WandB logging
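
For example, a backend can be switched off at runtime, e.g. to silence W&B during a debugging run (a minimal sketch):

>>> from modulus.launch.logging import LaunchLogger
>>> LaunchLogger.toggle_wandb(False)
>>> LaunchLogger.toggle_mlflow(False)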

class modulus.launch.logging.console.PythonLogger(name: str = 'launch')[source]

Bases: object

Simple console logger for DL training. This is a work in progress.

error(message: str)[source]

Log error

file_logging(file_name: str = 'launch.log')[source]

Log to file

info(message: str)[source]

Log info

log(message: str)[source]

Log message

success(message: str)[source]

Log success

warning(message: str)[source]

Log warning
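
A short usage sketch of the console logger, writing to the console and, optionally, to a log file (the file name is illustrative):

>>> from modulus.launch.logging.console import PythonLogger
>>> logger = PythonLogger("train")
>>> logger.file_logging("train.log")  # also write messages to this file
>>> logger.info("Starting training loop")
>>> logger.warning("Checkpoint not found, starting from scratch")
>>> logger.success("Training completed")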

class modulus.launch.logging.console.RankZeroLoggingWrapper(obj, dist)[source]

Bases: object

Wrapper class to only log from rank 0 process in distributed training.
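
A sketch of restricting console output to the rank 0 process in a DDP job, assuming the distributed state is managed by modulus.distributed.DistributedManager:

>>> from modulus.distributed import DistributedManager
>>> from modulus.launch.logging.console import PythonLogger, RankZeroLoggingWrapper
>>> DistributedManager.initialize()
>>> dist = DistributedManager()
>>> logger = PythonLogger("train")
>>> rank_zero_logger = RankZeroLoggingWrapper(logger, dist)
>>> rank_zero_logger.info("This message is only printed on rank 0")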

Weights and Biases Routines and Utilities

modulus.launch.logging.wandb.alert(title, text, duration=300, level=0, is_master=True)[source]

Send a W&B alert.
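
For example, a training alert might be sent when a milestone is reached (a sketch; this assumes an active W&B run, and the title and text are illustrative):

>>> from modulus.launch.logging.wandb import alert
>>> alert(title="Training finished", text="Run completed after 100 epochs")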

modulus.launch.logging.wandb.initialize_wandb(project: str, entity: str, name: str = 'train', group: Optional[str] = None, sync_tensorboard: bool = False, save_code: bool = False, resume: Optional[str] = None, wandb_id: Optional[str] = None, config=None, mode: Literal['offline', 'online', 'disabled'] = 'offline', results_dir: Optional[str] = None)[source]

Initialize the wandb client with the Weights & Biases server.

Parameters
  • project (str) – Name of the project to sync data with

  • entity (str) – Name of the wandb entity

  • sync_tensorboard (bool, optional) – sync tensorboard summary writer with wandb, by default False

  • save_code (bool, optional) – Whether to push a copy of the code to wandb dashboard, by default False

  • name (str, optional) – Name of the task running, by default “train”

  • group (str, optional) – Group name of the task running. Useful to set for DDP runs, by default None

  • resume (str, optional) – Sets the resuming behavior. Options: “allow”, “must”, “never”, “auto” or None, by default None.

  • wandb_id (str, optional) – A unique ID for this run, used for resuming. Used in conjunction with resume parameter to enable experiment resuming. See W&B documentation for more details: https://docs.wandb.ai/guides/runs/resuming/

  • config (optional) – A dictionary-like object for saving inputs, such as hyperparameters. If a dict, argparse, or absl.flags object, it will load the key-value pairs into the wandb.config object. If a str, it will look for a YAML file by that name, by default None.

  • mode (str, optional) – Can be “offline”, “online” or “disabled”, by default “offline”

  • results_dir (str, optional) – Output directory of the experiment, by default “/<run directory>/wandb”
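
A minimal sketch of initializing the client for an offline run; the project, entity, and group names are placeholders:

>>> from modulus.launch.logging.wandb import initialize_wandb
>>> initialize_wandb(
...     project="modulus_example",
...     entity="my_team",
...     name="train",
...     group="experiment_1",
...     mode="offline",
... )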

modulus.launch.logging.wandb.is_wandb_initialized()[source]

Check if wandb has been initialized.
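
For example, to guard against initializing the client twice (a sketch, using the same placeholder names as above):

>>> from modulus.launch.logging.wandb import initialize_wandb, is_wandb_initialized
>>> if not is_wandb_initialized():
...     initialize_wandb(project="modulus_example", entity="my_team", mode="offline")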

modulus.launch.logging.utils.create_ddp_group_tag(group_name: Optional[str] = None) → str[source]

Creates a common group tag for logging

Note: this currently does not work with multi-node runs; there appears to be a bug in PyTorch when a distributed utility is used before DDP is initialized.

Parameters

group_name (str, optional) – Optional group name prefix. If None will use "DDP_Group_", by default None

Returns

Group tag

Return type

str
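
A sketch of using the generated tag as the W&B group for a DDP run (placeholder names as above):

>>> from modulus.launch.logging.utils import create_ddp_group_tag
>>> from modulus.launch.logging.wandb import initialize_wandb
>>> tag = create_ddp_group_tag("my_experiment")
>>> initialize_wandb(project="modulus_example", entity="my_team", group=tag, mode="offline")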
