Debugging - NVIDIA Docs

Errors and bugs are common when you work with a customized version of the NeMo code. To debug them effectively, follow these steps:

Switch to Interactive Mode: Start the NeMo framework container with srun or docker run so that you can debug your deep learning code interactively.
Modify the Launcher Command: Rerun the launcher command that triggered the issue with two changes: set cluster_type=interactive and reduce the job to a single node. With these settings, the NeMo launcher operates in interactive mode and executes the job via bash. To generate only the YAML configuration file and bash scripts without running them, enable the dry run feature by setting the environment variable NEMO_LAUNCHER_DEBUG=1.
Use Generated Scripts for Debugging: After reproducing the issue on a single node with bash, go to the launcher experiment results folder. You can run the generated .sh scripts there directly to call the NeMo framework for further debugging.
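
The steps above can be sketched as a small Python helper. This is an illustrative sketch under stated assumptions, not part of the NeMo launcher. The function names, the `main.py` entry point, the `cluster_type=bcm` value, and the `training.trainer.num_nodes` override key are hypothetical placeholders; substitute the command and the node-count setting your own configuration uses. Only `cluster_type=interactive` and `NEMO_LAUNCHER_DEBUG=1` come from the steps above.

```python
import os
from pathlib import Path


def make_interactive_command(original_args, node_override="training.trainer.num_nodes=1"):
    """Return a copy of the launcher arguments adjusted for interactive debugging.

    `node_override` is a hypothetical key=value override; use whichever
    setting controls the node count in your configuration.
    """
    node_key = node_override.split("=", 1)[0]
    # Drop any earlier cluster_type or node-count overrides so the new ones apply.
    kept = [
        arg
        for arg in original_args
        if not arg.startswith("cluster_type=") and not arg.startswith(node_key + "=")
    ]
    return kept + ["cluster_type=interactive", node_override]


def dry_run_env(base_env=None):
    """Return an environment mapping with the launcher dry run feature enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["NEMO_LAUNCHER_DEBUG"] = "1"
    return env


def find_generated_scripts(results_dir):
    """List the .sh scripts found anywhere under the results folder, sorted by path."""
    return sorted(Path(results_dir).rglob("*.sh"))


if __name__ == "__main__":
    # Hypothetical failing command; replace it with the one that triggered your issue.
    failing = ["python3", "main.py", "cluster_type=bcm", "training.trainer.num_nodes=8"]
    print(" ".join(make_interactive_command(failing)))
```

To launch the adjusted command, you could pass the returned list to `subprocess.run`, adding `env=dry_run_env()` if you only want the YAML file and scripts generated. Each path returned by `find_generated_scripts` can then be run with bash inside the container.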
