Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Best Practices for NeMo Developers#
Import Guarding#
Sometimes, developers have an optional package they would like to use only when it is available. In this case, the developer may want to follow different code paths depending on whether the optional package is present. Other times, a developer may want to require a package for their collection, but they may not want to make that package required for all collections. In either of these cases, it’s important to guard the optional imports.
In import_utils.py, NeMo provides the utilities required to handle the import of optional packages effectively. This script is adapted from cuML’s safe_imports module. The two functions developers should be aware of are:
safe_import: A function used to import optional modules. Developers can provide an optional error message to be displayed if the module is used after a failed import. Alternatively, they can provide an alternate module to be used if the import of the optional module fails. safe_import returns a tuple containing:
the successfully imported optional module or, if the import fails, the given alternate module or a placeholder UnavailableMeta class instance, and
a boolean indicating whether the import of the optional module was successful.
The returned boolean can be used throughout the script to ensure you only use the optional module when it is present. For example, in the LLM collection, we use safe_import to determine whether Transformer Engine (TE) is installed. When creating the default GPT layer spec, we use the value of HAVE_TE to determine whether the default layer spec uses Transformer Engine:
_, HAVE_TE = safe_import("transformer_engine")
...
def default_layer_spec(config: "GPTConfig") -> ModuleSpec:
    if HAVE_TE:
        return transformer_engine_layer_spec(config)
    else:
        return local_layer_spec(config)
safe_import_from: A function used to import symbols from modules that may not be available. As in the case of safe_import, developers can provide a message to display whenever the symbol is used after a failed import, or they can provide an object to be used in place of the symbol if the import of the symbol fails. safe_import_from returns the same kind of tuple, containing:
the successfully imported optional symbol or, if the import fails, the given alternate object or a placeholder UnavailableMeta class instance, and
a boolean indicating whether the import of the desired symbol was successful.
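The tuple-returning pattern described above can be sketched with the standard library alone. This is a minimal illustration, not NeMo's implementation: the names guarded_import, guarded_import_from, and _Unavailable are hypothetical, and the real utilities in import_utils.py may differ in signature and behavior.

```python
import importlib


class _Unavailable:
    """Hypothetical placeholder that raises a helpful error when it is used."""

    def __init__(self, msg):
        self._msg = msg

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. any real use.
        raise ImportError(self._msg)

    def __call__(self, *args, **kwargs):
        raise ImportError(self._msg)


def guarded_import(module_name, msg=None, alt=None):
    """Return (module, True) on success, else (alt or placeholder, False)."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        if alt is not None:
            return alt, False
        return _Unavailable(msg or f"{module_name} is not installed"), False


def guarded_import_from(module_name, symbol, msg=None, alt=None):
    """Return (symbol, True) on success, else (alt or placeholder, False)."""
    try:
        return getattr(importlib.import_module(module_name), symbol), True
    except (ImportError, AttributeError):
        if alt is not None:
            return alt, False
        return _Unavailable(msg or f"{module_name}.{symbol} is not available"), False


# An installed module imports normally; a missing one yields a placeholder.
math_mod, HAVE_MATH = guarded_import("math")
sqrt, HAVE_SQRT = guarded_import_from("math", "sqrt")
missing, HAVE_MISSING = guarded_import("module_that_does_not_exist_for_demo")
```

In this sketch the placeholder defers the failure: the import itself never raises, but any later attempt to use the missing object does, with the message supplied at import time.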
safe_import and safe_import_from are used throughout the NeMo codebase. megatron_gpt_model.py is one example:
transformer_engine, HAVE_TE = safe_import("transformer_engine")
te_module, HAVE_TE_MODULE = safe_import_from("transformer_engine.pytorch", "module")
get_gpt_layer_with_te_and_hyena_spec, HAVE_HYENA_SPEC = safe_import_from(
    "nemo.collections.nlp.modules.common.hyena.hyena_spec", "get_gpt_layer_with_te_and_hyena_spec"
)
HAVE_TE = HAVE_TE and HAVE_TE_MODULE and HAVE_HYENA_SPEC
Transformer Engine is required for FP8 and CUDA graphs. The value of HAVE_TE is used throughout megatron_gpt_model.py to determine whether these features can be enabled and to gracefully handle the case when a user requests these features and Transformer Engine is not present.
For example, when a user enables CUDA graphs, we use the value of HAVE_TE to ensure that Transformer Engine is present. If HAVE_TE is False, a useful message is printed.
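A check of this kind might look like the following sketch. The helper resolve_cuda_graphs and its fallback behavior are hypothetical; the source only says that a useful message is printed, so whether the real code disables the feature or raises an error is an assumption here.

```python
import logging

logger = logging.getLogger("feature_guard_example")


def resolve_cuda_graphs(requested: bool, have_te: bool) -> bool:
    """Hypothetical helper: decide whether CUDA graphs can actually be enabled."""
    if requested and not have_te:
        # Assumption: fall back with a warning; real code might raise instead.
        logger.warning(
            "CUDA graphs were requested but Transformer Engine is not installed; "
            "continuing with CUDA graphs disabled."
        )
        return False
    return requested
```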
One consequence of import guarding is that a failed import no longer stops execution. If a developer expects a particular module to be present but its guarded import fails, execution continues along a different code path than the developer expects. During development, it can therefore be useful to run in debug mode. This causes the logger to report any failed imports along with the corresponding traceback, which can help the developer catch unexpected failed imports and understand why the expected modules are missing.
Debug mode can be enabled with the following code:
from nemo.utils import logging
logging.set_verbosity(logging.DEBUG)
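To illustrate what reporting a failed import with its traceback can look like, here is a minimal sketch that uses Python's standard logging module rather than NeMo's logger. The function import_or_none is hypothetical and is not part of NeMo.

```python
import importlib
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("guarded_import_debug")


def import_or_none(module_name):
    """Hypothetical guarded import that records failures at DEBUG level."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        # exc_info=True attaches the full traceback to the log record.
        logger.debug("Import of %s failed", module_name, exc_info=True)
        return None, False


module, ok = import_or_none("module_that_does_not_exist_for_demo")
```

At a higher log level such as INFO, the same failure would pass silently, which is why debug mode is useful while developing.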
Working with Hugging Face Models#
Some of the NeMo examples require accessing gated Hugging Face models. If you try to run a model and get an error that looks like this:
OSError: You are trying to access a gated repo.
Make sure to have access to it at <URL>
you likely need to set up your HF_TOKEN environment variable. You must first request access to the gated model by following the URL provided.
After access has been granted, make sure you have a Hugging Face access token (if you do not have one, follow the Hugging Face documentation on user access tokens to generate one).
Finally, be sure to set the HF_TOKEN variable in your environment:
export HF_TOKEN=<your_access_token>
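A script can fail early with a clear message when the token is missing, instead of surfacing the gated-repo error later. The helper below is a hypothetical sketch and is not part of NeMo or the Hugging Face libraries; it only inspects the environment.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from the environment or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Request access to the gated model, "
            "then export HF_TOKEN before running the script."
        )
    return token
```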
Working with scripts in NeMo 2.0#
When working with any scripts in NeMo 2.0, please make sure you wrap your code in an if __name__ == "__main__": block. Otherwise, your code may hang unexpectedly.
The reason for this is that NeMo 2.0 uses Python’s multiprocessing module in the backend when running a multi-GPU job. The multiprocessing module creates new Python processes that import the current module (your script). If you did not add the if __name__ == "__main__": guard, each new process runs your script’s top-level code when it imports the module and spawns new processes of its own, which in turn do the same. This results in an infinite loop of process spawning.
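The guard can be demonstrated with a small standalone script that uses only the standard library. This is a sketch, not a NeMo script: the "spawn" start method is an assumption chosen because it re-imports the script in each child, and math.sqrt stands in for the real work.

```python
import math
import multiprocessing as mp


def main():
    # Assumption: the "spawn" start method, where each child re-imports this script.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        # math.sqrt stands in for real work done in the worker processes.
        return pool.map(math.sqrt, [1, 4, 9])


if __name__ == "__main__":
    # Runs only in the launching process, not when a child imports the script.
    print(main())
```

If the call to main() were moved out of the guarded block, each child would try to start its own pool while importing the script, which is the failure mode described above.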