Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Launcher Introduction
Important: NeMo Framework Launcher is compatible with NeMo version 1.0 only. NeMo-Run is recommended for launching experiments using NeMo 2.0.
The NeMo Launcher provides a single interface for managing and organizing NeMo Framework experiments across various environments. It is built on the Hydra framework, so users can compose and adjust hierarchical configurations through configuration files and command-line arguments.
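To make the idea of adjusting a hierarchical configuration from the command line more concrete, here is a minimal sketch in plain Python. It is not Hydra's implementation and does not use Hydra's API; it only imitates the `dotted.key=value` override style. The function names and the configuration keys (`cluster_type`, `training.trainer.num_nodes`) are assumptions chosen for illustration.

```python
from copy import deepcopy


def parse_scalar(raw):
    """Convert an override value to int when possible, else keep the string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_overrides(base, overrides):
    """Return a copy of `base` with `dotted.key=value` overrides applied."""
    cfg = deepcopy(base)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for part in parents:
            # Walk down the hierarchy, creating missing levels as needed.
            node = node.setdefault(part, {})
        node[leaf] = parse_scalar(raw_value)
    return cfg


# Hypothetical defaults, standing in for values loaded from configuration files.
defaults = {"cluster_type": "bcm", "training": {"trainer": {"num_nodes": 1}}}

# Hypothetical command-line arguments.
cfg = apply_overrides(
    defaults, ["training.trainer.num_nodes=4", "cluster_type=interactive"]
)
print(cfg)
```

The defaults are copied before modification, so the same base configuration can be reused for several experiments with different overrides.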
Key Features
Developed on GitHub: Engage with its development at NVIDIA/NeMo-Framework-Launcher.
Model Support: Compatible with NeMo LLM (Large Language Models) and Multimodal models.
Optimized Configurations: Features ready-to-use, fully-tested configurations for model training and tuning.
Extensive Environment Support: Accommodates various environments including local (single node support), BCM (Slurm), BCP, AWS, Azure, and OCI.
Customizable Workflow: Supports complete workflows from data download and preparation to training, evaluation, and model export.
Experiment Organization: Efficiently organizes and stores all experiment configurations, scripts, and results in a single folder, simplifying debugging and future reproductions.
How the Launcher Operates
When you initiate the NeMo Megatron Launcher using python3 main.py, it performs several critical operations to ensure a smooth and efficient workflow:
Updating Hydra Configurations: The launcher first updates the default Hydra configurations. These preloaded configuration files are tailored based on the user’s input arguments.
Configuration File Storage: After updating the configurations, the launcher saves these modified settings into a YAML file. This file is then used for subsequent calls to the NeMo Framework.
Launch Script Creation: The launcher then generates submission scripts or launch scripts that contain the calls to NeMo and to the other components the pipeline requires. If multiple scripts are involved, they are chained through dependencies so that the workflow follows the selected model type, learning stage, saved configurations, and target platform. These scripts are what the launcher uses to execute the workflow.
Script Execution on Target Platform: Finally, the launcher executes the script on the chosen platform. The execution method varies depending on the environment: it might use a bash run for interactive (local) setups, an sbatch submission for Slurm setups, etc.
It’s important to note that the launcher itself does not perform the heavy lifting for your experiments. Instead, it acts as a wrapper around the NeMo Framework, simplifying the process of running and managing experiments through this framework.
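The four steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the launcher's actual code: the function names, the file names (`config.yaml`, `launch.sh`), the `cluster_type` values, and the placeholder `echo` command are all hypothetical. The sketch only builds the command that would be run; it does not execute it.

```python
import tempfile
from pathlib import Path


def to_yaml(cfg, indent=0):
    """Render a nested dict of scalars as simple YAML text (no lists, no quoting)."""
    lines = []
    pad = "  " * indent
    for key, value in cfg.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


def prepare_run(cfg, results_dir):
    """Save the config, write a launch script, and return the command to run it."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)

    # Step 2: store the final configuration as YAML.
    config_path = results_dir / "config.yaml"
    config_path.write_text(to_yaml(cfg) + "\n")

    # Step 3: create a launch script; the echo is a placeholder for real NeMo calls.
    script_path = results_dir / "launch.sh"
    script_path.write_text(f'#!/bin/bash\necho "would run NeMo with {config_path}"\n')

    # Step 4: choose how the script would be executed on the target platform.
    if cfg["cluster_type"] == "interactive":
        return ["bash", str(script_path)]
    if cfg["cluster_type"] == "bcm":
        return ["sbatch", str(script_path)]
    raise NotImplementedError(f"unsupported cluster_type: {cfg['cluster_type']}")


# Step 1 is assumed to have produced this final (already updated) configuration.
final_cfg = {"cluster_type": "bcm", "training": {"trainer": {"num_nodes": 2}}}
run_dir = tempfile.mkdtemp()
command = prepare_run(final_cfg, run_dir)
print(command)
```

Because the configuration and the script both land in one directory, the run can be inspected or repeated later from that folder alone. A caller could pass the returned list to `subprocess.run` to actually start the job.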
Support Matrix
This section provides a table outlining the compatible versions of Python and required Python packages for NeMo Framework Launcher.
Component | Version(s) |
---|---|
Python | >=3.8,<=3.10 |
best_download | >=0.0.6 |
huggingface_hub | >=0.13.0 |
hydra-core | >=1.2.0,<1.3 |
img2dataset | 1.45.0 |
omegaconf | >=2.2,<2.3 |
pynvml | 11.4.1 |
pytablewriter | 0.58.0 |
requests | >=2.26.0 |
tqdm | >=4.62.3 |
kubeflow-training | >=0.15.2 |
hera | <=5.17.0 |
pydantic | <=2.8.2 |
kubernetes | <=31.0.0 |
sqlitedict | <=2.1.0 |
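As a rough illustration of how the constraints in this table can be read, the sketch below checks a version against a specifier string such as `>=1.2.0,<1.3`. It is a simplified, standard-library-only helper written for this example, not the resolver that pip uses. Two assumptions are made: a bare version such as `1.45.0` is treated as an exact pin, and a bound is compared only to as many components as it specifies, so `<=3.10` accepts any 3.10.x release. Versions with non-numeric parts (for example, release candidates) are not handled.

```python
import operator
import sys

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn a purely numeric version string such as '1.2.0' into (1, 2, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Check a version tuple against a comma-separated specifier string."""
    for clause in spec.split(","):
        clause = clause.strip()
        compare = operator.eq  # assumption: a bare version means an exact pin
        for symbol, func in _OPERATORS:
            if clause.startswith(symbol):
                compare, clause = func, clause[len(symbol):]
                break
        bound = parse_version(clause)
        # Pad with zeros, then compare only the components the bound specifies.
        padded = version + (0,) * (len(bound) - len(version))
        if not compare(padded[: len(bound)], bound):
            return False
    return True


# Check the running interpreter against the Python row of the table.
python_ok = satisfies(tuple(sys.version_info[:3]), ">=3.8,<=3.10")
print("Python version supported:", python_ok)

# Check an example hydra-core version string against its row.
print("hydra-core 1.2.0 supported:", satisfies(parse_version("1.2.0"), ">=1.2.0,<1.3"))
```

For real dependency checks, a dedicated tool such as pip is the more reliable choice, since it implements the full version specifier rules.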