Important

NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Continual Learning with Pretrained Checkpoints

Continual learning allows LLMs to acquire new skills and stay up-to-date with the rapidly evolving landscape of human knowledge. In this guide, we explore how to continue training from existing pretrained checkpoints using the examples/nlp/language_modeling/megatron_gpt_pretraining.py script. This process applies to a variety of models, including Llama 1/2/3, Gemma, Mistral, Mixtral, and others.

Configure Continual Learning

To enable continual learning, modify the model configuration in your script as follows:

model:
  # Use the following settings to specify the source for continual learning:
  restore_from_path: null  # Set this to a .nemo file path to restore only the model weights.
  restore_from_ckpt: null  # Set this to a checkpoint path to restore both model weights and optimizer states.

  • restore_from_path: Use this when you want to restore only the model weights from a .nemo file.

  • restore_from_ckpt: Use this to restore both model weights and optimizer states from a PyTorch checkpoint; a filled-in example follows this list.
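
For example, to continue pretraining from the weights of an existing .nemo checkpoint while starting with a fresh optimizer state, you could set the fields as in the minimal sketch below (the checkpoint path is a placeholder; substitute your own file):

model:
  # Restore only the model weights; the .nemo path below is a placeholder.
  restore_from_path: /path/to/pretrained_model.nemo
  # Leave this null so the optimizer state is initialized from scratch.
  restore_from_ckpt: null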

Adjust Training Configurations

When engaging in continual learning, it is often beneficial to modify various training configurations (a combined configuration sketch follows this list). For example:

  • Model Parallelism: Depending on the computational resources available, you may adjust the model’s parallelism settings.

  • Data Blend: You can change the dataset or modify how data is blended during training to better suit the new training objectives.

  • Learning Rate Scheduler: Adjusting the learning rate schedule can help optimize training for the new conditions.
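
The exact keys depend on the config file you launch with; the sketch below assumes a typical Megatron GPT-style NeMo config, and every value and path is a placeholder to adapt to your own setup:

model:
  # Model parallelism: size these to the available GPUs.
  tensor_model_parallel_size: 4
  pipeline_model_parallel_size: 1

  # Data blend: alternating weights and dataset prefixes (placeholder paths).
  data:
    data_prefix:
      - 0.7
      - /path/to/original_domain_text_document
      - 0.3
      - /path/to/new_domain_text_document

  # Learning rate schedule: continual learning often restarts with a lower peak LR.
  optim:
    lr: 1.0e-5
    sched:
      name: CosineAnnealing
      warmup_steps: 500
      min_lr: 1.0e-6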

Refer to the pretrain section of the documentation for details on configuring these parameters.

Important Notes on Continual Learning

  • State Reset: When engaging in continual learning, all training states, including the model parallelism configuration, dataset state, learning rate scheduler, and randomness, are reset; only the states loaded from the checkpoint are preserved.

  • Resuming Training: If training is interrupted (e.g., due to a system failure) and you wish to resume the continual learning run, ensure that the restore flags (restore_from_path or restore_from_ckpt) are set back to null, as illustrated below. This prevents the system from resetting the states again, which could lead to incorrect results.
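
A minimal sketch of the resumed run's configuration, assuming you rely on NeMo's exp_manager auto-resume behavior (verify the exp_manager field names against your own config file):

model:
  restore_from_path: null  # Reset so the .nemo weights are not loaded (and states reset) again.
  restore_from_ckpt: null  # Reset so optimizer states are not reloaded from the original checkpoint.

exp_manager:
  resume_if_exists: True             # Resume from the latest checkpoint written by the interrupted run.
  resume_ignore_no_checkpoint: True  # Do not fail if no checkpoint has been written yet.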