Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Supervised Fine-Tuning (SFT)
Important
For all the steps below, you must authenticate with NVIDIA NGC: generate an API key from NGC, add the key to your credentials by following the instructions in this guide, and start the NVIDIA NeMo dev container nvcr.io/nvidia/nemo:dev.
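A minimal sketch of pulling and entering the dev container with Docker; the host mount path and shared-memory size are assumptions, so adjust them to your environment:
# Pull the NeMo dev container and open an interactive shell with all GPUs visible.
# The host path /path/to/workspace is a placeholder for wherever your .nemo
# checkpoints and datasets live; --shm-size=16g is an arbitrary but common choice.
docker pull nvcr.io/nvidia/nemo:dev
docker run --gpus all -it --rm \
    --shm-size=16g \
    -v /path/to/workspace:/workspace/mount \
    nvcr.io/nvidia/nemo:dev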
Run Fine-Tuning
For full fine-tuning, run the following script:
#!/bin/bash
MBS=4
GBS=128
TP=4 # Must match the TP size of the saved checkpoint
SP=True # Enable sequence parallelism only when TP > 1; otherwise set to False
SEQ_LEN=2048
NUM_DEVICES=8
PATH_TO_NEMO_MODEL=<path to .nemo file>
TRAIN_DATASET_PATH=<path to training dataset file>
VAL_DATASET_PATH=<path to validation dataset file>
CONFIG_PATH="/opt/NeMo/examples/nlp/language_modeling/tuning/conf/"
CONFIG_NAME="megatron_mamba_finetuning_config"
SAVE_DIR=<path to the saving directory>
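# Select Transformer Engine's fused attention kernel and disable the FlashAttention backend.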
export NVTE_FUSED_ATTN=1
export NVTE_FLASH_ATTN=0
torchrun --nproc_per_node=${NUM_DEVICES} \
/opt/NeMo/examples/nlp/language_modeling/tuning/megatron_mamba_finetuning.py \
--config-path=${CONFIG_PATH} \
--config-name=${CONFIG_NAME} \
trainer.devices=${NUM_DEVICES} \
trainer.precision=bf16 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=100 \
trainer.limit_val_batches=50 \
+trainer.num_sanity_val_steps=0 \
+trainer.accumulate_grad_batches=1 \
trainer.max_steps=700 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=${SAVE_DIR} \
exp_manager.resume_if_exists=True \
exp_manager.create_checkpoint_callback=True \
exp_manager.create_wandb_logger=True \
model.tensor_model_parallel_size=${TP} \
model.sequence_parallel=${SP} \
model.peft.peft_scheme='none' \
model.megatron_amp_O2=True \
model.encoder_seq_length=${SEQ_LEN} \
model.data.validation_ds.pad_to_max_length=True \
model.data.train_ds.pad_to_max_length=True \
model.optim.name="distributed_fused_adam" \
model.data.train_ds.max_seq_length=${SEQ_LEN} \
model.data.validation_ds.max_seq_length=${SEQ_LEN} \
model.micro_batch_size=${MBS} \
model.global_batch_size=${GBS} \
model.restore_from_path=${PATH_TO_NEMO_MODEL} \
model.data.train_ds.file_names=[${TRAIN_DATASET_PATH}] \
model.data.validation_ds.file_names=[${VAL_DATASET_PATH}] \
model.optim.lr=5e-6 \
model.optim.sched.min_lr=1e-7
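The training and validation files referenced above are expected to be JSONL, with one example per line. A toy sketch of creating such a file follows; the "input"/"output" field names are an assumption based on the default SFT prompt template ("{input} {output}"), so check the prompt_template in your config:
# Write a two-example toy dataset in the JSONL layout consumed by
# model.data.train_ds.file_names / model.data.validation_ds.file_names.
# The "input" and "output" keys are assumed from the default SFT prompt
# template; adjust if your config uses different field names.
cat > /tmp/toy_train.jsonl << 'EOF'
{"input": "Context: NeMo is NVIDIA's framework for generative AI. Question: Who develops NeMo?", "output": "NVIDIA"}
{"input": "Context: Mamba is a selective state space model. Question: What kind of model is Mamba?", "output": "A selective state space model"}
EOF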
Evaluate the Fine-Tuned Model
To evaluate the fine-tuned model, run the following script:
#!/bin/bash
MBS=32
GBS=64
TP=4 # Must match the TP size of the fine-tuned checkpoint
SP=True # Enable sequence parallelism only when TP > 1; otherwise set to False
SEQ_LEN=2048
NUM_DEVICES=8
PATH_TO_NEMO_MODEL=<path to .nemo file>
TEST_DATASET="[<path to test datasets (list)>]"
CONFIG_PATH="/opt/NeMo/examples/nlp/language_modeling/tuning/conf/"
CONFIG_NAME="megatron_mamba_finetuning_config"
SAVE_DIR=<path to the saving directory>
export NVTE_FUSED_ATTN=1
export NVTE_FLASH_ATTN=0
CONFIG_PATH="/opt/NeMo/examples/nlp/language_modeling/tuning/conf/"
CONFIG_NAME="megatron_mamba_generate_config"
torchrun --nproc_per_node=${NUM_DEVICES} /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_mamba_generate.py \
--config-path=${CONFIG_PATH} \
--config-name=${CONFIG_NAME} \
trainer.devices=${NUM_DEVICES} \
trainer.precision=bf16 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=10 \
trainer.limit_val_batches=20 \
++trainer.num_sanity_val_steps=0 \
++trainer.accumulate_grad_batches=1 \
trainer.max_steps=1000 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=${SAVE_DIR} \
exp_manager.resume_if_exists=False \
exp_manager.create_wandb_logger=False \
model.megatron_amp_O2=True \
model.peft.restore_from_path=False \
+model.peft.restore_from_ckpt.checkpoint_dir=False \
+model.peft.restore_from_ckpt.checkpoint_name=False \
model.tensor_model_parallel_size=${TP} \
model.micro_batch_size=${MBS} \
model.global_batch_size=${GBS} \
model.restore_from_path=${PATH_TO_NEMO_MODEL} \
model.data.test_ds.file_names=${TEST_DATASET} \
model.data.test_ds.names=["squad"] \
model.data.test_ds.global_batch_size=${GBS} \
model.data.test_ds.micro_batch_size=${MBS} \
model.data.test_ds.tokens_to_generate=30 \
model.answer_only_loss=True \
inference.greedy=True \
exp_manager.checkpoint_callback_params.monitor=validation_loss \
++inference.verbose=True \
model.data.test_ds.write_predictions_to_file=True \
model.data.test_ds.output_file_path_prefix=${SAVE_DIR}/shorteval \
&& echo "Eval finished, calculating scores" \
&& python /opt/NeMo/scripts/metric_calculation/peft_metric_calc.py --label_field original_answers \
--pred_file ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.jsonl > ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.score \
&& cat ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.score
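Once the run completes, a quick way to sanity-check the written predictions; the file name below follows the output_file_path_prefix and test_ds name set in the script above:
# Inspect the first prediction/label record written by the evaluation run.
# The file name is derived from output_file_path_prefix (${SAVE_DIR}/shorteval)
# and the test_ds name ("squad"); the .score file printed above holds the metrics.
head -n 1 ${SAVE_DIR}/shorteval_test_squad_inputs_preds_labels.jsonl | python3 -m json.tool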
Run SFT with NeMo Launcher
Please refer to the Launcher Guide section to understand the basics of NeMo Launcher.
To run SFT, update conf/config.yaml:

defaults:
  - fine_tuning: mamba/sft

stages:
  - fine_tuning
Execute the launcher pipeline:
python3 main.py
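If you prefer not to edit conf/config.yaml, the same selections can be passed as Hydra overrides on the command line. A sketch, assuming the fine_tuning config group and the mamba/sft config described in this guide:
# Hypothetical Hydra overrides; the group and option names assume
# conf/config.yaml exposes a fine_tuning config group backed by
# conf/fine_tuning/mamba/sft.yaml, as described below.
python3 main.py \
    fine_tuning=mamba/sft \
    stages=[fine_tuning]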
Configure Fine-Tuning
You can find the default configuration for fine-tuning on the SQuAD dataset in conf/fine_tuning/mamba/sft.yaml.
The fine-tuning configuration is divided into four sections: run, trainer, exp_manager, and model.
Set the run parameters:
run:
  name: sft_mamba
  results_dir: ${base_results_dir}/${fine_tuning.run.name}
  time_limit: "04:00:00"
  dependency: "singleton"
Set the number of devices for fine-tuning:
trainer:
  num_nodes: 1
  devices: 8
Set the path to the .nemo checkpoint to fine-tune:

model:
  restore_from_path: <path to .nemo model>
restore_from_path sets the path to the .nemo checkpoint on which to run fine-tuning.
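Instead of editing sft.yaml, the checkpoint path can also be supplied as a launcher override at submission time; a sketch, assuming the fine_tuning config group used above, with the path left as a placeholder:
# Hypothetical override of the checkpoint path at launch time; the
# fine_tuning.* prefix assumes the config group named in conf/config.yaml.
python3 main.py \
    stages=[fine_tuning] \
    fine_tuning.model.restore_from_path=<path to .nemo model>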