Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Checkpoints
There are three main ways to load pretrained checkpoints in NeMo:
1. Using the restore_from() method to load a local checkpoint file (.nemo), or
2. Converting a partially trained .ckpt (intermediate) checkpoint to .nemo format, or
3. Converting HuggingFace public checkpoints to .nemo format.
Refer to the following sections for instructions and examples for each.
Note that these instructions are for loading fully trained checkpoints for evaluation or fine-tuning.
Loading .nemo Checkpoints
NeMo automatically saves checkpoints of a model that it trains in the .nemo format. Alternatively, to save the model manually at any point, call model.save_to("<checkpoint_path>.nemo").
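For example, a minimal sketch, assuming model is a trained NeMo model instance already in memory (the path is illustrative):

# Assumes `model` is a trained NeMo model instance; the path is a placeholder.
model.save_to("checkpoints/my_model.nemo")  # writes a self-contained .nemo archive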
If there is a local .nemo checkpoint that you’d like to load, use the restore_from() method:
import nemo.collections.multimodal as nemo_multimodal
model = nemo_multimodal.models.<MODEL_BASE_CLASS>.restore_from(restore_path="<path/to/checkpoint/file.nemo>")
where <MODEL_BASE_CLASS> is the multimodal model class of the original checkpoint.
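As a concrete sketch for a Stable Diffusion checkpoint (the class name MegatronLatentDiffusion and the path below are illustrative assumptions; substitute the base class that matches your checkpoint):

import nemo.collections.multimodal as nemo_multimodal

# Illustrative only: the model class and path are placeholders, not fixed API.
model = nemo_multimodal.models.MegatronLatentDiffusion.restore_from(
    restore_path="checkpoints/stable_diffusion.nemo"
)
model.eval()  # switch to evaluation mode before running inference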
Converting Intermediate Checkpoints
To evaluate a partially trained checkpoint, you may need to convert it to the .nemo format first, using the conversion script <ADD convert_ckpt_to_nemo.py PATH>. Launch the script with one process per model-parallel rank; that is, nproc_per_node must equal tensor_model_parallel_size × pipeline_model_parallel_size:
python -m torch.distributed.launch --nproc_per_node=<num_gpus> \
convert_ckpt_to_nemo.py \
--checkpoint_folder <path_to_PTL_checkpoints_folder> \
--checkpoint_name <checkpoint_name> \
--nemo_file_path <path_to_output_nemo_file> \
--tensor_model_parallel_size <tensor_model_parallel_size> \
--pipeline_model_parallel_size <pipeline_model_parallel_size>
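As a concrete sketch, converting a checkpoint trained with tensor parallelism 2 and pipeline parallelism 1, so nproc_per_node is 2 × 1 = 2 (every path and checkpoint name below is an illustrative placeholder):

# Illustrative values: nproc_per_node = tensor_model_parallel_size (2) x pipeline_model_parallel_size (1)
python -m torch.distributed.launch --nproc_per_node=2 \
    convert_ckpt_to_nemo.py \
    --checkpoint_folder nemo_experiments/stable-diffusion-train/checkpoints \
    --checkpoint_name 'megatron_stable_diffusion--step=1000.ckpt' \
    --nemo_file_path stable-diffusion.nemo \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 1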
Converting HuggingFace Checkpoints
To take full advantage of NeMo's optimized training pipeline and framework/TRT inference pipeline, we provide scripts that convert popular HuggingFace checkpoints into the NeMo format. Once converted, you can fine-tune or run inference on these checkpoints.
Stable Diffusion & ControlNet
We provide a script <ADD convert_hf_ckpt_to_nemo.py PATH> to convert a HuggingFace checkpoint to the .nemo format, which can then be used within our inference pipeline:
python convert_hf_ckpt_to_nemo.py \
--ckpt_path <path_to_HF_checkpoints> \
--hparams_file <path_to_hparams_file> \
--nemo_file_path <path_to_output_nemo_file> \
--model_type <model_to_be_converted> \
--nemo_clip_path <clip_ckpt_in_nemo_format>
hparams_file: The config file to be combined with the model weights to generate the .nemo checkpoint. It can be generated from a dummy run and can be found at, for example, nemo_experiments/stable-diffusion-train/version_0/hparams.yaml.
model_type: This script supports converting stable_diffusion and controlnet checkpoints.
nemo_clip_path: Required only when the cond_stage_config in hparams_file refers to a NeMo CLIP model; it is ignored when cond_stage_config refers to a HuggingFace CLIP model. See Model Configuration for more details.
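For example, a sketch of converting a Stable Diffusion checkpoint whose cond_stage_config points at a NeMo CLIP model (every path below is an illustrative placeholder, not a fixed filename):

# All paths are illustrative placeholders; substitute your own files.
python convert_hf_ckpt_to_nemo.py \
    --ckpt_path v1-5-pruned.ckpt \
    --hparams_file nemo_experiments/stable-diffusion-train/version_0/hparams.yaml \
    --nemo_file_path stable-diffusion.nemo \
    --model_type stable_diffusion \
    --nemo_clip_path openai_clip.nemo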
Imagen
We will provide a conversion script if the Imagen research team releases their checkpoints in the future. A conversion script for DeepFloyd IF models will be provided in the next release.