Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Checkpoint Conversion
NVIDIA provides a simple tool to convert checkpoints from the `.ckpt` format to the `.nemo` format. The `.nemo` checkpoint is used for evaluation and inference.
Run Conversion
To run checkpoint conversion, update `conf/config.yaml`:

```yaml
defaults:
  - conversion: chatglm/convert_chatglm

stages:
  - conversion
```

Then execute the launcher pipeline: `python3 main.py`.
Configuration
Default configurations for conversion can be found in `conf/conversion/chatglm/convert_chatglm.yaml`.
```yaml
run:
  name: convert_${conversion.run.model_train_name}
  nodes: ${divide_ceil:${conversion.model.model_parallel_size}, 8} # 8 gpus per node
  time_limit: "2:00:00"
  ntasks_per_node: ${divide_ceil:${conversion.model.model_parallel_size}, ${.nodes}}
  convert_name: convert_nemo
  model_train_name: chatglm3_6b
  train_dir: ${base_results_dir}/${.model_train_name}
  results_dir: ${.train_dir}/${.convert_name}
  output_path: ${.train_dir}/${.convert_name}
  nemo_file_name: megatron_chatglm.nemo # name of nemo checkpoint; must be .nemo file
```
`nemo_file_name` sets the filename of the converted `.nemo` checkpoint. `output_path` sets the directory where the converted `.nemo` checkpoint is written.
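The path interpolations in the `run` section can be traced by hand. A minimal sketch, assuming `base_results_dir=/results` (an illustrative value, not a launcher default):

```python
# Illustrative resolution of the run-section interpolations; not launcher code.
base_results_dir = "/results"  # assumed example value
model_train_name = "chatglm3_6b"
convert_name = "convert_nemo"
nemo_file_name = "megatron_chatglm.nemo"

train_dir = f"{base_results_dir}/{model_train_name}"  # ${base_results_dir}/${.model_train_name}
output_path = f"{train_dir}/{convert_name}"           # ${.train_dir}/${.convert_name}
converted_ckpt = f"{output_path}/{nemo_file_name}"

print(converted_ckpt)  # /results/chatglm3_6b/convert_nemo/megatron_chatglm.nemo
```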
```yaml
model:
  model_type: gpt
  checkpoint_folder: ${conversion.run.train_dir}/results/checkpoints
  checkpoint_name: latest # latest OR name pattern of a checkpoint (e.g. megatron_gpt-*last.ckpt)
  hparams_file: ${conversion.run.train_dir}/results/hparams.yaml
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
  model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
  tokenizer_model: ${data_dir}/chatglm/chatglm_tokenizer.model
```
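The `multiply` and `divide_ceil` resolvers used above implement plain arithmetic. A standalone sketch of how the model-parallel size and node counts are derived (pure Python, not launcher code):

```python
import math

def divide_ceil(a: int, b: int) -> int:
    """Mirrors the launcher's divide_ceil resolver: ceiling division."""
    return math.ceil(a / b)

# Values from the default config above.
tensor_model_parallel_size = 1
pipeline_model_parallel_size = 1

# ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
model_parallel_size = tensor_model_parallel_size * pipeline_model_parallel_size

nodes = divide_ceil(model_parallel_size, 8)               # 8 GPUs per node
ntasks_per_node = divide_ceil(model_parallel_size, nodes)

print(model_parallel_size, nodes, ntasks_per_node)  # 1 1 1
```

With `tensor_model_parallel_size: 4` and `pipeline_model_parallel_size: 2`, the same arithmetic yields `model_parallel_size=8`, `nodes=1`, and `ntasks_per_node=8`.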
`checkpoint_folder` sets the input checkpoint folder to be used for conversion. `checkpoint_name` sets the input checkpoint filename (or name pattern) to be used for conversion.
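As a hypothetical sketch of the "latest OR name pattern" behavior (the actual selection logic lives in the conversion script and may differ), the two cases could be resolved like this:

```python
import glob
import os
from typing import Optional

def resolve_checkpoint(folder: str, name: str) -> Optional[str]:
    """Hypothetical helper: pick the newest .ckpt for 'latest',
    otherwise return the first match of a glob pattern such as
    'megatron_gpt-*last.ckpt'."""
    if name == "latest":
        candidates = glob.glob(os.path.join(folder, "*.ckpt"))
        return max(candidates, key=os.path.getmtime) if candidates else None
    matches = sorted(glob.glob(os.path.join(folder, name)))
    return matches[0] if matches else None
```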