In this section, we present four key functionalities of NVIDIA NeMo related to checkpoint management:
Checkpoint Loading: Load local .nemo checkpoint files with the restore_from() method.
Partial Checkpoint Conversion: Convert partially trained .ckpt checkpoints to the .nemo format.
Community Checkpoint Conversion: Convert checkpoints from community sources, such as HuggingFace, into the .nemo format.
Model Parallelism Adjustment: Modify model parallelism to efficiently train models that exceed the memory of a single GPU. NeMo employs both tensor (intra-layer) and pipeline (inter-layer) model parallelism. For details, see “Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM”. This tool adjusts model parallelism for users who need to deploy on larger GPU arrays due to memory constraints.
A .nemo checkpoint is fundamentally a tar file that bundles the model configuration (a YAML file), model weights, and other pertinent artifacts such as tokenizer models or vocabulary files. This consolidated design streamlines sharing, loading, tuning, evaluation, and inference.
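Because a .nemo file is a tar archive, its contents can be inspected with the Python standard library before the model is loaded. The following is a minimal sketch; the helper names are hypothetical, and the default config file name model_config.yaml is an assumption about the archive layout that should be checked against the actual listing.

```python
import tarfile
from typing import List


def list_nemo_contents(nemo_path: str) -> List[str]:
    """Return the names of all members stored in a .nemo (tar) archive."""
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(nemo_path, "r:*") as archive:
        return archive.getnames()


def read_nemo_config_text(nemo_path: str, config_suffix: str = "model_config.yaml") -> str:
    """Return the raw YAML text of the bundled model configuration.

    config_suffix is an assumed file name; adjust it to match the listing
    returned by list_nemo_contents().
    """
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(config_suffix):
                handle = archive.extractfile(member)
                return handle.read().decode("utf-8")
    raise FileNotFoundError(f"No member ending in {config_suffix!r} in {nemo_path}")
```

Listing the archive first is a cheap way to confirm that the tokenizer and vocabulary artifacts expected by the model are actually bundled.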
On the other hand, the .ckpt file is a product of PyTorch Lightning training. It stores model weights and optimizer states and is generally used for resuming training.
Subsequent sections delve into each of the previously listed functionalities, emphasizing the loading of fully trained checkpoints for evaluation or additional fine-tuning.
NeMo saves model checkpoints in the .nemo format by default. To manually save a model at any stage:
model.save_to("<checkpoint_path>.nemo")
To load a local .nemo checkpoint:
import nemo.collections.multimodal as nemo_multimodal
model = nemo_multimodal.models.<MODEL_BASE_CLASS>.restore_from(restore_path="<path/to/checkpoint/file.nemo>")
Replace <MODEL_BASE_CLASS> with the appropriate multimodal (MM) model class.
The training script automatically converts only the final checkpoint into the .nemo format. To evaluate an intermediate training checkpoint, you may first need to convert it to .nemo:
python -m torch.distributed.launch --nproc_per_node=<tensor_model_parallel_size> * <pipeline_model_parallel_size> \
examples/multimodal/convert_ckpt_to_nemo.py \
--checkpoint_folder <path_to_PTL_checkpoints_folder> \
--checkpoint_name <checkpoint_name> \
--nemo_file_path <path_to_output_nemo_file> \
--tensor_model_parallel_size <tensor_model_parallel_size> \
--pipeline_model_parallel_size <pipeline_model_parallel_size>
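In the command above, the value of --nproc_per_node is the product of the tensor and pipeline model parallel sizes. The sketch below assembles the same command as an argument list and computes that product explicitly. The function name is hypothetical, and the values used in the usage note are placeholders rather than recommended settings.

```python
from typing import List


def build_convert_command(
    checkpoint_folder: str,
    checkpoint_name: str,
    nemo_file_path: str,
    tensor_model_parallel_size: int,
    pipeline_model_parallel_size: int,
) -> List[str]:
    """Assemble the convert_ckpt_to_nemo.py launch command as an argv list."""
    if tensor_model_parallel_size < 1 or pipeline_model_parallel_size < 1:
        raise ValueError("Parallel sizes must be positive integers")
    # One process per model-parallel rank: TP size times PP size.
    nproc_per_node = tensor_model_parallel_size * pipeline_model_parallel_size
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        "examples/multimodal/convert_ckpt_to_nemo.py",
        "--checkpoint_folder", checkpoint_folder,
        "--checkpoint_name", checkpoint_name,
        "--nemo_file_path", nemo_file_path,
        "--tensor_model_parallel_size", str(tensor_model_parallel_size),
        "--pipeline_model_parallel_size", str(pipeline_model_parallel_size),
    ]
```

For example, a tensor parallel size of 4 with a pipeline parallel size of 2 yields --nproc_per_node=8. The returned list can be passed to subprocess.run from the NeMo repository root.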
NeVA Checkpoints
Currently, the conversion mainly supports LLaVA checkpoints based on “llama-2 chat” checkpoints. As a reference, we’ll consider the checkpoint llava-llama-2-13b-chat-lightning-preview.
After downloading this checkpoint and saving it at /path/to/llava-llama-2-13b-chat-lightning-preview, follow the steps below:
Modifying the Tokenizer
NeMo requires specific tokens to be added to the tokenizer model for best performance. To modify the existing tokenizer at /path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer.model, run the following in the NeMo container:
cd /opt/sentencepiece/src/
protoc --python_out=/opt/NeMo/scripts/tokenizers/ sentencepiece_model.proto
python /opt/NeMo/scripts/tokenizers/add_special_tokens_to_sentencepiece.py \
--input_file /path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer.model \
--output_file /path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer_neva.model \
--is_userdefined \
--tokens "<extra_id_0>" "<extra_id_1>" "<extra_id_2>" "<extra_id_3>" \
"<extra_id_4>" "<extra_id_5>" "<extra_id_6>" "<extra_id_7>"
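After the script finishes, it can be worth confirming that the new tokenizer model actually contains the eight tokens. The sketch below is one way to check, assuming the sentencepiece Python package is available (as it typically is in an environment that can run the script above). Both helper names are hypothetical.

```python
from typing import List


def extra_id_tokens(count: int = 8) -> List[str]:
    """Return the special tokens <extra_id_0> ... <extra_id_{count-1}>."""
    return [f"<extra_id_{index}>" for index in range(count)]


def find_missing_tokens(tokenizer_model_path: str, tokens: List[str]) -> List[str]:
    """Return the tokens that the SentencePiece model does not know.

    sentencepiece is imported lazily so that extra_id_tokens() stays usable
    without the package installed.
    """
    import sentencepiece as spm

    processor = spm.SentencePieceProcessor(model_file=tokenizer_model_path)
    unknown_id = processor.unk_id()
    # piece_to_id() maps pieces outside the vocabulary to the unknown id.
    return [token for token in tokens if processor.piece_to_id(token) == unknown_id]
```

Calling find_missing_tokens on the tokenizer_neva.model file with extra_id_tokens() should return an empty list if the modification succeeded.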
Checkpoint Conversion
Convert the checkpoint with:
python examples/multimodal/mllm/neva/convert_hf_llava_to_neva.py \
--in-file /path/to/llava-llama-2-13b-chat-lightning-preview \
--out-file /path/to/neva-llava-llama-2-13b-chat-lightning-preview.nemo \
--tokenizer-model /path/to/llava-llama-2-13b-chat-lightning-preview/tokenizer_neva.model \
--conv-template llama_2
NeVA Checkpoints
Adjust model parallelism with:
python examples/nlp/language_modeling/megatron_change_num_partitions.py \
--model_file=/path/to/source.nemo \
--target_file=/path/to/target.nemo \
--tensor_model_parallel_size=??? \
--target_tensor_model_parallel_size=??? \
--pipeline_model_parallel_size=??? \
--target_pipeline_model_parallel_size=??? \
--model_class="nemo.collections.multimodal.models.multimodal_llm.neva.neva_model.MegatronNevaModel" \
--precision=32 \
--tokenizer_model_path=/path/to/tokenizer.model \
--tp_conversion_only
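Conceptually, changing the tensor model parallel size means merging the existing weight shards and splitting the result into the target number of partitions. The toy sketch below illustrates that idea on a small nested-list matrix split by columns. It is an illustration under simplifying assumptions, not the logic of megatron_change_num_partitions.py, and all helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def split_columns(weight: Matrix, parts: int) -> List[Matrix]:
    """Split a weight matrix into equal column blocks, one per tensor-parallel rank."""
    columns = len(weight[0])
    if parts < 1 or columns % parts != 0:
        raise ValueError(f"{columns} columns cannot be split evenly into {parts} parts")
    width = columns // parts
    return [
        [row[rank * width:(rank + 1) * width] for row in weight]
        for rank in range(parts)
    ]


def merge_columns(shards: List[Matrix]) -> Matrix:
    """Concatenate per-rank column blocks back into the full matrix."""
    rows = len(shards[0])
    return [
        [value for shard in shards for value in shard[row_index]]
        for row_index in range(rows)
    ]


def reshard(shards: List[Matrix], target_parts: int) -> List[Matrix]:
    """Go from the current number of shards to target_parts shards."""
    return split_columns(merge_columns(shards), target_parts)
```

The divisibility check mirrors a practical constraint: the partitioned dimension has to divide evenly by the target tensor parallel size, so not every target value is valid for a given model.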