Checkpoint Conversion


NVIDIA provides a tool to convert checkpoints from the .ckpt format to the .nemo format. The converted .nemo checkpoint is then used for evaluation and inference.

To run checkpoint conversion, update conf/config.yaml:


    defaults:
      - conversion: chatglm/convert_chatglm

    stages:
      - conversion

Execute launcher pipeline: python3


The default conversion configuration is in the file conf/conversion/chatglm/convert_chatglm.yaml.


    run:
      name: convert_${}
      nodes: ${divide_ceil:${conversion.model.model_parallel_size}, 8} # 8 gpus per node
      time_limit: "2:00:00"
      ntasks_per_node: ${divide_ceil:${conversion.model.model_parallel_size}, ${.nodes}}
      convert_name: convert_nemo
      model_train_name: chatglm3_6b
      train_dir: ${base_results_dir}/${.model_train_name}
      results_dir: ${.train_dir}/${.convert_name}
      output_path: ${.train_dir}/${.convert_name}
      nemo_file_name: megatron_chatglm.nemo # name of nemo checkpoint; must be .nemo file

nemo_file_name sets the output filename of the converted .nemo checkpoint.

output_path sets the output location of the converted .nemo checkpoint.
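
The nodes and ntasks_per_node values in the run section are derived from the model parallel size. The following Python sketch reproduces that arithmetic, assuming divide_ceil performs a ceiling division, as its name suggests. The divide_ceil and run_layout functions here are hypothetical stand-ins for illustration, not the launcher's own resolver code.

```python
from typing import Tuple

GPUS_PER_NODE = 8  # taken from the "# 8 gpus per node" comment in the config


def divide_ceil(a: int, b: int) -> int:
    """Ceiling division (assumed behavior of the config's divide_ceil)."""
    return -(-a // b)


def run_layout(model_parallel_size: int) -> Tuple[int, int]:
    """Return (nodes, ntasks_per_node) for a given model parallel size."""
    # nodes: enough 8-GPU nodes to hold every model-parallel rank
    nodes = divide_ceil(model_parallel_size, GPUS_PER_NODE)
    # ntasks_per_node: ranks spread evenly across those nodes, rounded up
    ntasks_per_node = divide_ceil(model_parallel_size, nodes)
    return nodes, ntasks_per_node


if __name__ == "__main__":
    for size in (1, 8, 16):
        print(size, run_layout(size))
```

With the default tensor and pipeline parallel sizes of 1, the model parallel size is 1, so this sketch yields one node running one task. A model parallel size of 16 would yield two nodes with eight tasks each.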


    model:
      model_type: gpt
      checkpoint_folder: ${}/results/checkpoints
      checkpoint_name: latest # latest OR name pattern of a checkpoint (e.g. megatron_gpt-*last.ckpt)
      hparams_file: ${}/results/hparams.yaml
      tensor_model_parallel_size: 1
      pipeline_model_parallel_size: 1
      model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
      tokenizer_model: ${data_dir}/chatglm/chatglm_tokenizer.model

checkpoint_folder sets the input checkpoint folder to be used for conversion.

checkpoint_name sets the input checkpoint filename to be used for conversion. It accepts either latest or a name pattern such as megatron_gpt-*last.ckpt.
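
To illustrate how a checkpoint_name of latest or a name pattern might map to a file inside checkpoint_folder, here is a minimal Python sketch. It rests on two assumptions that the guide does not confirm: that latest means the most recently modified .ckpt file, and that a name pattern is treated as a shell-style glob. The resolve_checkpoint function is hypothetical and is not part of the conversion tool.

```python
from pathlib import Path


def resolve_checkpoint(checkpoint_folder: str, checkpoint_name: str = "latest") -> Path:
    """Pick a checkpoint file from a folder (illustrative only)."""
    folder = Path(checkpoint_folder)
    # Assumption: "latest" considers every .ckpt file in the folder;
    # any other value is used as a glob pattern.
    pattern = "*.ckpt" if checkpoint_name == "latest" else checkpoint_name
    candidates = list(folder.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"no checkpoint matching {pattern!r} in {folder}")
    # Assumption: when several files match, prefer the newest by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling resolve_checkpoint with only a folder returns the newest .ckpt file under these assumptions, while passing a pattern such as megatron_gpt-*last.ckpt narrows the candidates first.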

Last updated on Jun 19, 2024.