Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Quickstart Guide for NeMo Launcher

Installation Steps

  1. Clone the NeMo Launcher:

    Start by cloning the repository from GitHub. This is where you’ll find the necessary launcher scripts:

    git clone https://github.com/NVIDIA/NeMo-Framework-Launcher
    

    The launcher scripts are in NeMo-Framework-Launcher/launcher_scripts.

  2. Set Up Your Python Environment:

    Install the required packages. Using a virtual environment is recommended:

    python -m venv my_project_env
    source my_project_env/bin/activate
    # requirements.txt is at the root of the cloned repository
    pip install -r NeMo-Framework-Launcher/requirements.txt
    
  3. Additional Setup Requirements:

    Ensure you have the following ready:

    1. A NeMo Framework container.

    2. A dataset for training, tuning, or evaluation.

    3. Your Weights & Biases (wandb) API key, stored in a file, for logging to a wandb server.

    4. The NeMo Framework source code, if you are using a custom version of NeMo.
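Before launching anything, it can help to confirm that the local prerequisites are in place. The following is a minimal sketch and not part of the launcher: check_prereqs is a hypothetical helper name, and the paths passed to it are placeholders for your own dataset directory and wandb key file.

```shell
# Hypothetical helper (not part of NeMo-Framework-Launcher).
# Usage: check_prereqs DATA_DIR WANDB_KEY_FILE
# Returns 0 if both prerequisites are present, 1 otherwise.
check_prereqs() {
    rc=0
    # The dataset location must be an existing directory.
    if [ ! -d "$1" ]; then
        echo "dataset directory not found: $1" >&2
        rc=1
    fi
    # The key file must be a regular, non-empty file.
    if [ ! -f "$2" ] || [ ! -s "$2" ]; then
        echo "wandb key file missing or empty: $2" >&2
        rc=1
    fi
    return "$rc"
}
```

For example: check_prereqs /path/to/dataset/the_pile_gpt3 /path/to/wandb_key && echo "prerequisites OK"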

Starting the NeMo Framework Container

Start the container with the command appropriate for your environment (for example, srun or docker run), and make sure the launcher and data directories are mounted into it.
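As an illustration for a single machine with Docker, the mounts could be set up roughly as follows. This is a sketch under assumptions: start_container is a hypothetical wrapper, the in-container mount targets (/opt/NeMo-Framework-Launcher and /data) are arbitrary choices, and --gpus all presumes the NVIDIA Container Toolkit is installed on the host.

```shell
# Hypothetical wrapper; the mount targets inside the container are assumptions.
# Usage: start_container IMAGE LAUNCHER_DIR DATA_DIR
start_container() {
    # --gpus all : expose all host GPUs to the container
    # --rm -it   : interactive session, container removed on exit
    # -v SRC:DST : bind-mount the launcher and data directories
    docker run --gpus all --rm -it \
        -v "$2:/opt/NeMo-Framework-Launcher" \
        -v "$3:/data" \
        "$1" bash
}
```

For example: start_container <your-nemo-image> /path/to/NeMo-Framework-Launcher /path/to/dataset, where the image name is whichever NeMo Framework container you use. On a Slurm cluster the equivalent would be an srun invocation, whose container options depend on the cluster's setup.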

Example: GPT3 5B Pretraining

Configuration:

  • The launcher uses hierarchical configurations, with the main file at conf/config.yaml.

  • This example employs the GPT3 5B training configuration from conf/training/gpt3/5b.yaml.

  • To change settings, edit the config files or pass overrides as command-line arguments. For more information, see the Hydra tutorial.
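In Hydra, a dotted command-line override such as training.trainer.num_nodes=1 sets the num_nodes key nested under trainer in the selected training config. Hydra applications also accept a --cfg job flag that prints the composed configuration instead of running the application. Assuming main.py is a standard Hydra entry point (an assumption, not something this guide confirms), a small wrapper like the hypothetical preview_config below could be used to inspect the effect of overrides before launching a job.

```shell
# Hypothetical helper: print the composed Hydra config without launching a job.
# Assumes it is run from the launcher_scripts directory and that main.py
# honors Hydra's standard --cfg flag.
# Usage: preview_config [HYDRA_OVERRIDE ...]
preview_config() {
    python3 main.py "$@" --cfg job
}
```

For example: preview_config training=gpt3/5b training.trainer.num_nodes=1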

Execution Script: Go to the launcher scripts directory and run the following commands for training:

cd /path/to/NeMo-Framework-Launcher/launcher_scripts
# Quote the list override so the shell does not expand the brackets as a glob pattern.
python3 main.py \
    'stages=[training]' \
    launcher_scripts_path=/path/to/launcher_scripts \
    data_dir=/path/to/dataset/the_pile_gpt3 \
    wandb_api_key_file=/path/to/wandb_key \
    cluster_type=interactive \
    training=gpt3/5b \
    training.trainer.max_time=00:03:50:00 \
    training.trainer.num_nodes=1

Training logs and results are written to /path/to/launcher_scripts/results/gpt3_5b.