Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container:
nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
Quickstart Guide for NeMo Launcher
Installation Steps
Clone the NeMo Launcher:
Start by cloning the repository from GitHub. This is where you’ll find the necessary launcher scripts:
git clone https://github.com/NVIDIA/NeMo-Framework-Launcher
The launcher scripts are located in NeMo-Framework-Launcher/launcher_scripts.
Set Up Your Python Environment:
Install the required packages to prepare your environment (it is recommended to use a virtual environment):
cd NeMo-Framework-Launcher
python -m venv my_project_env
source my_project_env/bin/activate
pip install -r requirements.txt
Additional Setup Requirements:
Ensure you have the following ready:
A dataset for training, tuning, or evaluation.
Your Weights & Biases (wandb) API key, if you log to a wandb server.
The NeMo Framework source code, if you are using a custom version of NeMo.
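Before launching a job, it can help to confirm that these inputs are actually in place. The following is a minimal, hypothetical preflight check; the function name check_prereqs and the example paths are illustrative assumptions and are not part of NeMo Launcher.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of NeMo Launcher): verifies that the
# dataset directory exists and that the wandb API key file is non-empty.
# The paths passed below are placeholders.

check_prereqs() {
  local data_dir="$1"    # directory holding the training dataset
  local wandb_key="$2"   # file containing your wandb API key
  local status=0

  if [ ! -d "$data_dir" ]; then
    echo "missing dataset directory: $data_dir" >&2
    status=1
  fi
  # -s is true only if the file exists and has a size greater than zero
  if [ ! -s "$wandb_key" ]; then
    echo "missing or empty wandb key file: $wandb_key" >&2
    status=1
  fi
  return "$status"
}

if check_prereqs /path/to/dataset/the_pile_gpt3 /path/to/wandb_key; then
  echo "prerequisites found"
else
  echo "fix the paths reported above before launching" >&2
fi
```

If you are not logging to wandb, drop the key-file check.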
Starting the NeMo Framework Container
Start the container with the command appropriate for your environment (for example, srun or docker run), making sure the launcher and data directories are mounted into it.
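As one example for a single machine with Docker, the sketch below assembles a docker run command that mounts the launcher and data directories. The function name, the container-side paths under /workspace, the 16g shared-memory size, and the image tag placeholder are all assumptions; substitute the NeMo Framework container and paths you actually use. It prints the command for review instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: build a `docker run` command that mounts the launcher and data
# directories. Host paths, container paths (/workspace/...), the shm size,
# and the image tag are placeholders, not values required by NeMo.

build_container_cmd() {
  local launcher_dir="$1"   # host path to NeMo-Framework-Launcher
  local data_dir="$2"       # host path to the dataset directory
  local image="$3"          # NeMo Framework container image
  # --gpus all needs the NVIDIA Container Toolkit on the host.
  # --shm-size raises Docker's small default /dev/shm, which PyTorch
  # data-loader workers use; 16g is an arbitrary example value.
  CONTAINER_CMD=(
    docker run --rm -it
    --gpus all
    --shm-size=16g
    -v "${launcher_dir}:/workspace/NeMo-Framework-Launcher"
    -v "${data_dir}:/workspace/data"
    "$image" bash
  )
}

build_container_cmd /path/to/NeMo-Framework-Launcher /path/to/dataset "nvcr.io/nvidia/nemo:<tag>"
# Print the assembled command; to run it instead, use: "${CONTAINER_CMD[@]}"
printf '%q ' "${CONTAINER_CMD[@]}"; echo
```

On a Slurm cluster the equivalent step would go through srun with your site's container plugin, so the flags differ.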
Example: GPT3 5B Pretraining
Configuration:
The launcher uses hierarchical Hydra configurations, with the main file at conf/config.yaml (paths are relative to launcher_scripts). This example uses the GPT3 5B training configuration from conf/training/gpt3/5b.yaml. To change settings, edit the config files or override values with command-line arguments. For more information, see the Hydra tutorial.
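To check the effect of overrides before training, you can ask Hydra to print the composed configuration. This sketch assumes main.py is a standard Hydra application, so the generic Hydra flag --cfg job applies, and that you run it from launcher_scripts; the override values mirror the example command.

```shell
#!/usr/bin/env bash
# Sketch: preview the composed config for the GPT3 5B example without
# launching a job. Assumes main.py is a Hydra app; `--cfg job` is a standard
# Hydra flag that prints the job config and exits.

# training=gpt3/5b selects a file from the training config group;
# dotted keys such as training.trainer.num_nodes override nested fields.
overrides=(
  training=gpt3/5b
  training.trainer.num_nodes=1
  training.trainer.max_time=00:03:50:00
)

if [ -f main.py ]; then
  python3 main.py --cfg job "${overrides[@]}"
else
  echo "Run this from the launcher_scripts directory (main.py not found)." >&2
fi
```

Printing the config this way does not resolve interpolations, so values such as paths may appear as references to other keys.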
Execution Script: Go to the launcher scripts directory and run the following commands for training:
cd /path/to/NeMo-Framework-Launcher/launcher_scripts
python3 main.py \
stages=[training] \
launcher_scripts_path=/path/to/NeMo-Framework-Launcher/launcher_scripts \
data_dir=/path/to/dataset/the_pile_gpt3 \
wandb_api_key_file=/path/to/wandb_key \
cluster_type=interactive \
training=gpt3/5b \
training.trainer.max_time=00:03:50:00 \
training.trainer.num_nodes=1
Your training logs and results will be written to /path/to/NeMo-Framework-Launcher/launcher_scripts/results/gpt3_5b.
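The training.trainer.max_time override in the example uses PyTorch Lightning's DD:HH:MM:SS duration format, so 00:03:50:00 means 3 hours and 50 minutes. The hypothetical helper below converts a time budget in minutes into that string; its name is illustrative and not part of the launcher.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a wall-clock budget in minutes into the
# DD:HH:MM:SS string used by training.trainer.max_time (seconds fixed at 00).

minutes_to_max_time() {
  local total="$1"
  local days=$(( total / 1440 ))            # 1440 minutes per day
  local hours=$(( (total % 1440) / 60 ))    # remaining whole hours
  local mins=$(( total % 60 ))              # remaining minutes
  printf '%02d:%02d:%02d:00\n' "$days" "$hours" "$mins"
}

# 3 h 50 min, as in the example command; prints 00:03:50:00
minutes_to_max_time 230
```

The result can then be passed on the command line, for example training.trainer.max_time="$(minutes_to_max_time 230)".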