Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Checkpoint Conversion
NVIDIA provides a tool to convert checkpoints from the .ckpt format to the .nemo format. The resulting .nemo checkpoint is used for evaluation and inference.
Run Conversion
To run checkpoint conversion, update conf/config.yaml:
defaults:
  - conversion: chatglm/convert_chatglm
stages:
  - conversion
Execute the launcher pipeline: python3 main.py.
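Because the launcher is Hydra-based, the same stage selection can alternatively be passed as command-line overrides instead of editing conf/config.yaml. A minimal sketch, assuming your launcher version accepts these override keys (they mirror the defaults and stages entries above):

```shell
# Build the launch command with Hydra-style overrides rather than
# editing conf/config.yaml in place. Verify the override keys against
# your launcher version before running.
overrides="conversion=chatglm/convert_chatglm stages=[conversion]"
launch_cmd="python3 main.py ${overrides}"
echo "${launch_cmd}"
```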
Configuration
Default configurations for conversion can be found in conf/conversion/chatglm/convert_chatglm.yaml.
run:
    name: convert_${conversion.run.model_train_name}
    nodes: ${divide_ceil:${conversion.model.model_parallel_size}, 8} # 8 gpus per node
    time_limit: "2:00:00"
    ntasks_per_node: ${divide_ceil:${conversion.model.model_parallel_size}, ${.nodes}}
    convert_name: convert_nemo
    model_train_name: chatglm3_6b
    train_dir: ${base_results_dir}/${.model_train_name}
    results_dir: ${.train_dir}/${.convert_name}
    output_path: ${.train_dir}/${.convert_name}
    nemo_file_name: megatron_chatglm.nemo # name of nemo checkpoint; must be .nemo file
nemo_file_name sets the output filename of the converted .nemo checkpoint.
output_path sets the output location of the converted .nemo checkpoint.
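The nodes and ntasks_per_node values above are derived from model_parallel_size by the divide_ceil resolver (ceiling division). A sketch of that arithmetic, assuming 8 GPUs per node as in the config comment:

```shell
# Ceiling division, as performed by the divide_ceil resolver.
# Example: tensor_model_parallel_size=4 * pipeline_model_parallel_size=2
model_parallel_size=8
gpus_per_node=8
# nodes = ceil(model_parallel_size / gpus_per_node)
nodes=$(( (model_parallel_size + gpus_per_node - 1) / gpus_per_node ))
# ntasks_per_node = ceil(model_parallel_size / nodes)
ntasks_per_node=$(( (model_parallel_size + nodes - 1) / nodes ))
echo "nodes=${nodes} ntasks_per_node=${ntasks_per_node}"  # nodes=1 ntasks_per_node=8
```

With the default tensor and pipeline parallel sizes of 1, this resolves to a single node running a single task.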
model:
    model_type: gpt
    checkpoint_folder: ${conversion.run.train_dir}/results/checkpoints
    checkpoint_name: latest # latest OR name pattern of a checkpoint (e.g. megatron_gpt-*last.ckpt)
    hparams_file: ${conversion.run.train_dir}/results/hparams.yaml
    tensor_model_parallel_size: 1
    pipeline_model_parallel_size: 1
    model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
    tokenizer_model: ${data_dir}/chatglm/chatglm_tokenizer.model
checkpoint_folder sets the input checkpoint folder to be used for conversion.
checkpoint_name sets the input checkpoint filename to be used for conversion. Set it to latest to pick the most recent checkpoint, or to a name pattern (e.g. megatron_gpt-*last.ckpt).
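When checkpoint_name is given as a pattern rather than latest, it is matched as a filename glob inside checkpoint_folder. A sketch of how the example pattern megatron_gpt-*last.ckpt selects a file (the directory and checkpoint names below are made up for illustration):

```shell
# Create a throwaway directory with two fake checkpoint files.
ckpt_dir=$(mktemp -d)
touch "${ckpt_dir}/megatron_gpt-step=1000-last.ckpt" \
      "${ckpt_dir}/megatron_gpt-step=500.ckpt"
# Only the "-last" checkpoint matches the glob pattern:
match=$(ls "${ckpt_dir}"/megatron_gpt-*last.ckpt)
echo "${match}"
rm -rf "${ckpt_dir}"
```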