Quick Start Guide for NeMo Launcher

  1. Clone the NeMo Launcher: Start by cloning the repository from GitHub. This is where you’ll find the necessary launcher scripts:

    git clone https://github.com/NVIDIA/NeMo-Megatron-Launcher

    Locate the scripts in NeMo-Megatron-Launcher/launcher_scripts.

  2. Set Up Your Python Environment: Install the required packages to prepare your environment (it is recommended to use a virtual environment):

    python -m venv my_project_env
    source my_project_env/bin/activate
    pip install -r requirements.txt

  3. Additional Setup Requirements: Ensure you have the following ready:

    - A NeMo Framework container.
    - A dataset for training, tuning, or evaluation.
    - Your wandb API key, for logging to a wandb server.
    - The NeMo Framework source code, if using a custom version of NeMo.
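
The prerequisites above can be checked before launching. The following is a minimal sketch, not part of NeMo Launcher: `check_prereqs` is a hypothetical helper, and it assumes the dataset lives in a directory and the wandb key is stored in a plain file, as the placeholder paths in this guide suggest.

```shell
# Pre-flight check for the launcher prerequisites.
# check_prereqs is a hypothetical helper, not part of NeMo Launcher.
check_prereqs() {
    # $1: dataset directory, $2: file containing the wandb API key
    status=0
    if [ ! -d "$1" ]; then
        echo "dataset directory not found: $1" >&2
        status=1
    fi
    if [ ! -s "$2" ]; then
        echo "wandb key file missing or empty: $2" >&2
        status=1
    fi
    return "$status"
}

# Example call with the placeholder paths used in this guide:
# check_prereqs /path/to/dataset/the_pile_gpt3 /path/to/wandb_key
```

The function returns a non-zero status if either item is missing, so it can gate the launch command with `&&`.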

Run the container with a command appropriate for your environment (such as srun or docker run), making sure the launcher and data directories are mounted.
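
As one possible illustration for the Docker case, the sketch below prints (but does not run) a `docker run` command with both directories mounted. `build_docker_cmd` is a hypothetical helper, the image name is a placeholder for whichever NeMo Framework container you use, and the choice of flags (`--gpus all -it --rm`) is an assumption about a typical interactive GPU session rather than a requirement of the launcher.

```shell
# Print (not run) a docker command that mounts the launcher and data directories.
# build_docker_cmd is a hypothetical helper; the image name is a placeholder.
build_docker_cmd() {
    # $1: launcher directory, $2: data directory, $3: container image
    # Each directory is mounted at the same path inside the container,
    # so paths passed to the launcher stay valid in both places.
    printf 'docker run --gpus all -it --rm -v %s:%s -v %s:%s %s bash\n' \
        "$1" "$1" "$2" "$2" "$3"
}

build_docker_cmd /path/to/NeMo-Megatron-Launcher /path/to/dataset your-nemo-framework-image
```

Mounting each directory at an identical path inside the container is a design choice that keeps values such as `launcher_scripts_path` and `data_dir` the same on the host and in the container.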

Configuration:

- The launcher uses hierarchical configurations, with the main file at conf/config.yaml.
- This example uses the GPT3 5B training configuration from conf/training/gpt3/5b.yaml.
- To change settings, edit the config files or pass overrides as command-line arguments. For more information, see the Hydra tutorial.
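
To make the command-line route concrete, here is a small sketch of the two kinds of Hydra overrides this guide relies on. The variable names are illustrative only, and `echo` prints the command instead of running it; remove `echo` to launch from the launcher scripts directory.

```shell
# Two kinds of Hydra overrides, using values from this guide:
#   group=option      selects a config file, e.g. conf/training/gpt3/5b.yaml
#   dotted.key=value  overrides a single field inside the selected config
GROUP_OVERRIDE="training=gpt3/5b"
FIELD_OVERRIDE="training.trainer.num_nodes=1"

# Quoting stages=[training] keeps the shell from treating the brackets as a glob pattern.
echo python3 main.py 'stages=[training]' "$GROUP_OVERRIDE" "$FIELD_OVERRIDE"
```

Overrides given this way apply to a single run and leave the YAML files unchanged, which is convenient for short experiments.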

Execution Script: Go to the launcher scripts directory and run the following commands for training:

cd /path/to/NeMo-Megatron-Launcher/launcher_scripts
python3 main.py \
    stages=[training] \
    launcher_scripts_path=/path/to/launcher_scripts \
    data_dir=/path/to/dataset/the_pile_gpt3 \
    wandb_api_key_file=/path/to/wandb_key \
    cluster_type=interactive \
    training=gpt3/5b \
    training.trainer.max_time=00:03:50:00 \
    training.trainer.num_nodes=1

Your training logs and results will be located in /path/to/launcher_scripts/results/gpt3_5b.

© Copyright 2023-2024, NVIDIA. Last updated on Apr 25, 2024.