Parameter Efficient Fine-Tuning (PEFT)

Run PEFT with NeMo Launcher

To run PEFT, update conf/config.yaml:

defaults:
  - peft: mixtral/squad

stages:
  - peft

Specify the desired model size by selecting the corresponding peft configuration: mixtral/squad for the 8x7B model or mixtral/squad_8x22b for the 8x22B model.
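For example, to target the 8x22B model instead, only the peft entry in conf/config.yaml changes; a minimal sketch of the relevant lines:

defaults:
  - peft: mixtral/squad_8x22b

stages:
  - peft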

Execute the launcher pipeline: python3 main.py
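Because NeMo Launcher is Hydra-based, the same selection can also be made from the command line instead of editing conf/config.yaml. A sketch, assuming standard Hydra override syntax:

# Equivalent command-line overrides (illustrative; editing conf/config.yaml achieves the same)
python3 main.py stages=[peft] peft=mixtral/squad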

Configuration

For the Mixtral-8x7B model, the default configuration for PEFT on SQuAD can be found in conf/peft/mixtral/squad.yaml (or conf/peft/mixtral/squad_8x22b.yaml for the 8x22B model). We continue the presentation using the 8x7B model, since the two configurations are similar. The fine-tuning configuration is divided into four sections: run, trainer, exp_manager, and model.

run:
  name: mixtral-8x7b
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: mixtral-8x7b
  convert_dir: ${base_results_dir}/${peft.run.model_train_name}/${peft.run.convert_name}
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/peft_${.task_name}

Set the number of nodes and devices for fine-tuning:

trainer:
  num_nodes: 1
  devices: 8
model:
  restore_from_path: ${peft.run.convert_dir}/results/megatron_mixtral.nemo
  tensor_model_parallel_size: 8

restore_from_path sets the path to the .nemo checkpoint used for fine-tuning.

Set the tensor parallel and pipeline parallel sizes according to the model size.
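Parallelism is controlled in the model section. A minimal sketch; the pipeline_model_parallel_size value shown is illustrative, not a recommendation for a specific model size:

model:
  tensor_model_parallel_size: 8
  pipeline_model_parallel_size: 1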

Set the PEFT-specific configuration:

model:
  peft:
    peft_scheme: "lora"

peft_scheme sets the fine-tuning scheme to be used. Supported schemes include: lora, adapter, ia3, ptuning.
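To use a different scheme, either edit the YAML or override it at launch time. A sketch, assuming the standard Hydra override syntax used by the launcher (the adapter choice here is only an example):

python3 main.py stages=[peft] peft=mixtral/squad peft.model.peft.peft_scheme=adapter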

Gated Model Assets

Mistral’s tokenizer is hosted on Hugging Face, which requires login. To access the tokenizer assets, prepend the HF_TOKEN=<token> environment variable to the NeMo Launcher invocation command.

In NeMo Launcher this can be achieved by appending ++env_vars.HF_TOKEN=<user-token> to the argument list.
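A minimal sketch of both options described above; <token> is a placeholder for the user's own Hugging Face access token:

# Option 1: prepend the token as an environment variable
HF_TOKEN=<token> python3 main.py

# Option 2: pass the token through the launcher's env_vars config
python3 main.py ++env_vars.HF_TOKEN=<token>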