Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Supervised Fine-Tuning (SFT)
Important
For all the steps below, you must authenticate with NVIDIA NGC: generate an API key from NGC, add the key to your credentials by following the instructions in this guide, and start the NVIDIA NeMo dev container nvcr.io/nvidia/nemo:dev.
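A minimal sketch of pulling and entering the dev container with Docker; the host mount path and shared-memory size are assumptions, so adjust them to your environment:
# Pull the NeMo dev container and open an interactive shell with all GPUs visible.
# The host path /path/to/workspace is a placeholder for wherever your .nemo
# checkpoints and datasets live; --shm-size=16g is an arbitrary but common choice.
docker pull nvcr.io/nvidia/nemo:dev
docker run --gpus all -it --rm \
    --shm-size=16g \
    -v /path/to/workspace:/workspace/mount \
    nvcr.io/nvidia/nemo:dev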
Run Fine-Tuning
For full fine-tuning, run the following script:
#!/bin/bash
MBS=4
GBS=128
TP=4 # Must match the TP size of the saved checkpoint
SP=True # Enable sequence parallelism only when TP > 1; otherwise set to False
SEQ_LEN=2048
NUM_DEVICES=8
PATH_TO_NEMO_MODEL=<path to .nemo file>
TRAIN_DATASET_PATH=<path to training dataset file>
VAL_DATASET_PATH=<path to validation dataset file>
CONFIG_PATH="/opt/NeMo/examples/nlp/language_modeling/tuning/conf/"
CONFIG_NAME="megatron_mamba_finetuning_config"
SAVE_DIR=<path to the saving directory>
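# Select Transformer Engine's fused attention kernel and disable the FlashAttention backend.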
export NVTE_FUSED_ATTN=1
export NVTE_FLASH_ATTN=0
torchrun --nproc_per_node=${NUM_DEVICES} \
/opt/NeMo/examples/nlp/language_modeling/tuning/megatron_mamba_finetuning.py \
--config-path=${CONFIG_PATH} \
--config-name=${CONFIG_NAME} \
trainer.devices=${NUM_DEVICES} \
trainer.precision=bf16 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=100 \
trainer.limit_val_batches=50 \
+trainer.num_sanity_val_steps=0 \
+trainer.accumulate_grad_batches=1 \
trainer.max_steps=700 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=${SAVE_DIR} \
exp_manager.resume_if_exists=True \
exp_manager.create_checkpoint_callback=True \
exp_manager.create_wandb_logger=True \
model.tensor_model_parallel_size=${TP} \
model.sequence_parallel=${SP} \
model.peft.peft_scheme='none' \
model.megatron_amp_O2=True \
model.encoder_seq_length=${SEQ_LEN} \
model.data.validation_ds.pad_to_max_length=True \
model.data.train_ds.pad_to_max_length=True \
model.optim.name="distributed_fused_adam" \
model.data.train_ds.max_seq_length=${SEQ_LEN} \
model.data.validation_ds.max_seq_length=${SEQ_LEN} \
model.micro_batch_size=${MBS} \
model.global_batch_size=${GBS} \
model.restore_from_path=${PATH_TO_NEMO_MODEL} \
model.data.train_ds.file_names=[${TRAIN_DATASET_PATH}] \
model.data.validation_ds.file_names=[${VAL_DATASET_PATH}] \
model.optim.lr=5e-6 \
model.optim.sched.min_lr=1e-7
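The training and validation files referenced above are expected to be JSONL, with one example per line. A toy sketch of creating such a file follows; the "input"/"output" field names are an assumption based on the default SFT prompt template ("{input} {output}"), so check the prompt_template in your config:
# Write a two-example toy dataset in the JSONL layout consumed by
# model.data.train_ds.file_names / model.data.validation_ds.file_names.
# The "input" and "output" keys are assumed from the default SFT prompt
# template; adjust if your config uses different field names.
cat > /tmp/toy_train.jsonl << 'EOF'
{"input": "Context: NeMo is NVIDIA's framework for generative AI. Question: Who develops NeMo?", "output": "NVIDIA"}
{"input": "Context: Mamba is a selective state space model. Question: What kind of model is Mamba?", "output": "A selective state space model"}
EOF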
Evaluate the Fine-Tuned Model
To evaluate the fine-tuned model, run the following script:
#!/bin/bash
MBS=32
GBS=64
TP=4 # Must match the TP size of the fine-tuned checkpoint
SP=True # Enable sequence parallelism only when TP > 1; otherwise set to False
SEQ_LEN=2048
NUM_DEVICES=8
PATH_TO_NEMO_MODEL=<path to .nemo file>
TEST_DATASET="[<path to test datasets (list)>]"
CONFIG_PATH="/opt/NeMo/examples/nlp/language_modeling/tuning/conf/"
CONFIG_NAME="megatron_mamba_finetuning_config"
SAVE_DIR=<path to the saving directory>
export NVTE_FUSED_ATTN=1
export NVTE_FLASH_ATTN=0
CONFIG_PATH="/opt/NeMo/examples/nlp/language_modeling/tuning/conf/"
CONFIG_NAME="megatron_mamba_generate_config"
torchrun --nproc_per_node=${NUM_DEVICES} /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_mamba_generate.py \
--config-path=${CONFIG_PATH} \
--config-name=${CONFIG_NAME} \
trainer.devices=${NUM_DEVICES} \
trainer.precision=bf16 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=10 \
trainer.limit_val_batches=20 \
++trainer.num_sanity_val_steps=0 \
++trainer.accumulate_grad_batches=1 \
trainer.max_steps=1000 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=${SAVE_DIR} \
exp_manager.resume_if_exists=False \
exp_manager.create_wandb_logger=False \
model.megatron_amp_O2=True \
model.peft.restore_from_path=False \
+model.peft.restore_from_ckpt.checkpoint_dir=False \
+model.peft.restore_from_ckpt.checkpoint_name=False \
model.tensor_model_parallel_size=${TP} \
model.micro_batch_size=${MBS} \
model.global_batch_size=${GBS} \
model.restore_from_path=${PATH_TO_NEMO_MODEL} \
model.data.test_ds.file_names=${TEST_DATASET} \
model.data.test_ds.names=["squad"] \
model.data.test_ds.global_batch_size=${GBS} \
model.data.test_ds.micro_batch_size=${MBS} \
model.data.test_ds.tokens_to_generate=30 \
model.answer_only_loss=True \
inference.greedy=True \
exp_manager.checkpoint_callback_params.monitor=validation_loss \
++inference.verbose=True \
model.data.test_ds.write_predictions_to_file=True \
model.data.test_ds.output_file_path_prefix=${SAVE_DIR}/shorteval \
&& echo "Eval finished, calculating scores" \
&& python /opt/NeMo/scripts/metric_calculation/peft_metric_calc.py --label_field original_answers \
--pred_file ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.jsonl > ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.score \
&& cat ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.score
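Once the run completes, a quick way to sanity-check the written predictions; the file name below follows the output_file_path_prefix and test_ds name set in the script above:
# Inspect the first prediction/label record written by the evaluation run.
# The file name is derived from output_file_path_prefix (${SAVE_DIR}/shorteval)
# and the test_ds name ("squad"); the .score file printed above holds the metrics.
head -n 1 ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.jsonl | python3 -m json.tool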
Run SFT with NeMo Launcher
Please refer to the Launcher Guide section to understand the basics of NeMo Launcher.
To run SFT, update conf/config.yaml:

defaults:
  - fine_tuning: mamba/sft

stages:
  - fine_tuning
Execute the launcher pipeline:
python3 main.py
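If you prefer not to edit conf/config.yaml, the same selections can be passed as Hydra overrides on the command line. A sketch, assuming the fine_tuning config group and the mamba/sft config described in this guide:
# Hypothetical Hydra overrides; the group and option names assume
# conf/config.yaml exposes a fine_tuning config group backed by
# conf/fine_tuning/mamba/sft.yaml, as described below.
python3 main.py \
    fine_tuning=mamba/sft \
    stages=[fine_tuning]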
Configure Fine-Tuning
You can find the default configuration for fine-tuning on the SQuAD dataset in conf/fine_tuning/mamba/sft.yaml.
The fine-tuning configuration is divided into four sections: run, trainer, exp_manager, and model.
Set the run parameters:
run:
  name: sft_mamba
  results_dir: ${base_results_dir}/${fine_tuning.run.name}
  time_limit: "04:00:00"
  dependency: "singleton"
Set the number of devices for fine-tuning:
trainer:
  num_nodes: 1
  devices: 8
Set the path to the .nemo checkpoint to fine-tune:

model:
  restore_from_path: <path to .nemo model>
restore_from_path sets the path to the .nemo checkpoint on which to run fine-tuning.
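Instead of editing sft.yaml, the checkpoint path can also be supplied as a launcher override at submission time; a sketch, assuming the fine_tuning config group used above, with the path left as a placeholder:
# Hypothetical override of the checkpoint path at launch time; the
# fine_tuning.* prefix assumes the config group named in conf/config.yaml.
python3 main.py \
    stages=[fine_tuning] \
    fine_tuning.model.restore_from_path=<path to .nemo model>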