Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Checkpoint Conversion
NVIDIA provides a simple tool to convert checkpoints from the .ckpt format to the .nemo format. The .nemo checkpoint is used for evaluation and inference.
Run Conversion
To run the checkpoint conversion, update conf/config.yaml:

```yaml
defaults:
  - conversion: qwen2/convert_qwen2

stages:
  - conversion
```
Execute the launcher pipeline:

```shell
python3 main.py
```
Configure Settings
You can find the default configurations for conversion in conf/conversion/qwen2/convert_qwen2.yaml. To configure the conversion stage, adjust the run settings:
```yaml
run:
  name: convert_${conversion.run.model_train_name}
  nodes: ${divide_ceil:${conversion.model.model_parallel_size}, 8} # 8 gpus per node
  time_limit: "2:00:00"
  ntasks_per_node: ${divide_ceil:${conversion.model.model_parallel_size}, ${.nodes}}
  convert_name: convert_nemo
  model_train_name: qwen2_7b
  train_dir: ${base_results_dir}/${.model_train_name}
  results_dir: ${.train_dir}/${.convert_name}
  output_path: ${.train_dir}/${.convert_name}
  nemo_file_name: megatron_qwen2.nemo # name of nemo checkpoint; must be .nemo file
```
nemo_file_name sets the filename of the converted .nemo checkpoint. output_path sets the directory where the converted .nemo checkpoint is written.
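Putting those two settings together, the converted checkpoint lands at output_path joined with nemo_file_name. A minimal sketch of how the final path resolves, assuming an illustrative value of /results for ${base_results_dir} (the variable names below are plain Python stand-ins, not launcher internals):

```python
import os

# Assumed value of ${base_results_dir}; substitute your own results directory.
base_results_dir = "/results"
model_train_name = "qwen2_7b"
convert_name = "convert_nemo"
nemo_file_name = "megatron_qwen2.nemo"

# Mirrors the interpolations in the run: section above.
train_dir = os.path.join(base_results_dir, model_train_name)
output_path = os.path.join(train_dir, convert_name)
final_checkpoint = os.path.join(output_path, nemo_file_name)

print(final_checkpoint)  # /results/qwen2_7b/convert_nemo/megatron_qwen2.nemo
```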
```yaml
model:
  model_type: gpt
  checkpoint_folder: ${conversion.run.train_dir}/results/checkpoints
  checkpoint_name: latest # latest OR name pattern of a checkpoint (e.g. megatron_gpt-*last.ckpt)
  hparams_file: ${conversion.run.train_dir}/results/hparams.yaml
  tensor_model_parallel_size: 2
  pipeline_model_parallel_size: 1
  model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
  tokenizer_model: ${data_dir}/qwen2/qwen2_tokenizer.model
```
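The multiply and divide_ceil interpolations derive the model parallel size and the node and task counts from the two parallelism settings. With the defaults above, the arithmetic works out as in this sketch (plain Python standing in for the launcher's resolvers):

```python
import math

# Defaults from the model: section above.
tensor_model_parallel_size = 2
pipeline_model_parallel_size = 1

# model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
model_parallel_size = tensor_model_parallel_size * pipeline_model_parallel_size

# nodes: ${divide_ceil:${conversion.model.model_parallel_size}, 8} -- 8 GPUs per node
nodes = math.ceil(model_parallel_size / 8)

# ntasks_per_node: ${divide_ceil:${conversion.model.model_parallel_size}, ${.nodes}}
ntasks_per_node = math.ceil(model_parallel_size / nodes)

print(model_parallel_size, nodes, ntasks_per_node)  # 2 1 2
```

So a tensor-parallel-2, pipeline-parallel-1 conversion fits on a single node running two tasks.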
checkpoint_folder sets the input checkpoint folder to be used for conversion. checkpoint_name sets the input checkpoint filename to be used for conversion.