bridge.training.tensor_inspect
Module Contents
Functions
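initialize_tensor_inspect_pre_model_initialization | Initialize NVIDIA-DL-Framework-Inspect before model construction.
_maybe_attach_metric_loggers | Attach supported metric loggers (TensorBoard, W&B raw module).
finalize_tensor_inspect_post_model_initialization | Finalize setup after model creation: attach loggers, set names and groups.
tensor_inspect_step_if_enabled | Advance DLFw Inspect step if enabled.
tensor_inspect_end_if_enabled | Shut down DLFw Inspect if enabled.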
Data
MISSING_NVINSPECT_MSG
API
- bridge.training.tensor_inspect.MISSING_NVINSPECT_MSG
'nvdlfw_inspect is not available. Please install it with `pip install nvdlfw-inspect`.'
- bridge.training.tensor_inspect.initialize_tensor_inspect_pre_model_initialization(
      tensor_inspect_config: megatron.bridge.training.config.TensorInspectConfig | None,
  )
Initialize NVIDIA-DL-Framework-Inspect before model construction.
If inspection is enabled and the nvdlfw_inspect API is unavailable or its initialization fails, an exception is raised so that training stops.
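The fail-fast behavior described above can be sketched as follows. This is a minimal illustration, not the module's implementation: `SketchInspectConfig`, its `enabled` field, the `sketch_pre_model_init` helper, and the choice of `ImportError` are all assumptions made for the example. Only the message text comes from this page.

```python
from __future__ import annotations

import importlib.util
from dataclasses import dataclass

# Message text as documented for MISSING_NVINSPECT_MSG.
MISSING_NVINSPECT_MSG = (
    "nvdlfw_inspect is not available. "
    "Please install it with `pip install nvdlfw-inspect`."
)


@dataclass
class SketchInspectConfig:
    """Hypothetical stand-in for TensorInspectConfig; `enabled` is an assumed field."""

    enabled: bool = False


def sketch_pre_model_init(
    config: SketchInspectConfig | None,
    module_name: str = "nvdlfw_inspect",
) -> bool:
    """Return False when inspection is off, True when the dependency is importable.

    Raises ImportError (an assumed exception type) when inspection is
    enabled but the dependency cannot be found, so training stops early.
    """
    if config is None or not config.enabled:
        return False
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(MISSING_NVINSPECT_MSG)
    return True
```

Checking availability before the model is built means a missing dependency surfaces immediately rather than after an expensive model construction.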
- bridge.training.tensor_inspect._maybe_attach_metric_loggers(
      tensorboard_logger: Any | None,
      wandb_logger: Any | None,
  )
Attach supported metric loggers (TensorBoard, W&B raw module).
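Both logger arguments are optional, so a caller can pass only the ones that exist. The sketch below shows one way to select the loggers that are present. The helper `sketch_collect_loggers` and the returned mapping are hypothetical, and the calls the real function makes into nvdlfw_inspect are not documented on this page and are not reproduced here.

```python
from __future__ import annotations

from typing import Any


def sketch_collect_loggers(
    tensorboard_logger: Any | None,
    wandb_logger: Any | None,
) -> dict[str, Any]:
    """Return only the loggers that were actually provided (hypothetical helper)."""
    candidates = {"tensorboard": tensorboard_logger, "wandb": wandb_logger}
    # Skip any logger that is None so that nothing is attached for it.
    return {name: logger for name, logger in candidates.items() if logger is not None}
```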
- bridge.training.tensor_inspect.finalize_tensor_inspect_post_model_initialization(
      tensor_inspect_config: megatron.bridge.training.config.TensorInspectConfig | None,
      model: list[megatron.core.transformer.MegatronModule],
      tensorboard_logger: Any | None,
      wandb_logger: Any | None,
      current_training_step: int | None = None,
  )
Finalize setup after model creation: attach loggers, set names and groups.
- bridge.training.tensor_inspect.tensor_inspect_step_if_enabled(
      tensor_inspect_config: megatron.bridge.training.config.TensorInspectConfig | None,
  )
Advance DLFw Inspect step if enabled.
- bridge.training.tensor_inspect.tensor_inspect_end_if_enabled(
      tensor_inspect_config: megatron.bridge.training.config.TensorInspectConfig | None,
  )
Shut down DLFw Inspect if enabled.
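Taken together, the descriptions imply a call order: initialize before the model is built, finalize after it exists, advance once per training step, and shut down at the end. The sketch below illustrates that order with hypothetical stand-ins. `RecordingHooks` and `run_training` are not part of this module. Advancing after each training step and shutting down in a `finally` block are assumptions of the sketch.

```python
from __future__ import annotations

from typing import Any, Callable


class RecordingHooks:
    """Hypothetical stand-ins for the four public functions; each only records its call."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def pre_model_init(self, config: Any) -> None:
        self.events.append("pre_model_init")

    def post_model_init(self, config: Any, model: list[Any]) -> None:
        self.events.append("post_model_init")

    def step(self, config: Any) -> None:
        self.events.append("step")

    def end(self, config: Any) -> None:
        self.events.append("end")


def run_training(
    hooks: RecordingHooks,
    config: Any,
    build_model: Callable[[], list[Any]],
    train_one_step: Callable[[list[Any], int], None],
    num_steps: int,
) -> list[Any]:
    hooks.pre_model_init(config)          # before model construction
    model = build_model()
    hooks.post_model_init(config, model)  # after model creation
    try:
        for step in range(num_steps):
            train_one_step(model, step)
            hooks.step(config)            # once per completed training step
    finally:
        hooks.end(config)                 # shut down even if a step raises
    return model


hooks = RecordingHooks()
run_training(
    hooks,
    config=None,
    build_model=lambda: ["chunk0"],
    train_one_step=lambda model, step: None,
    num_steps=2,
)
print(hooks.events)
# → ['pre_model_init', 'post_model_init', 'step', 'step', 'end']
```

Placing the shutdown call in `finally` is a design choice of this sketch: it keeps the inspection session from being left open when a training step fails.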