Important

NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Note

Attention: Dedicated Container for StarCoder2

For StarCoder2 models, please use the nvcr.io/nvidia/nemo:24.07 container. Also check our StarCoder2 playbooks.

Checkpoint Conversion

NVIDIA provides scripts to convert external StarCoder2 checkpoints from Hugging Face format to .nemo format. The .nemo checkpoint is used for SFT, PEFT, and inference. NVIDIA also provides scripts to convert a .nemo checkpoint back to Hugging Face format.

  1. Run the container using the following command:

    docker run --gpus device=1 --shm-size=2g --net=host --ulimit memlock=-1 --rm -it -v ${PWD}:/workspace -w /workspace -v ${PWD}/results:/results nvcr.io/nvidia/nemo:24.07 bash
    
  2. Convert the Hugging Face StarCoder2 model to a .nemo model:

    python3 /opt/NeMo/scripts/checkpoint_converters/convert_starcoder2_hf_to_nemo.py \
    --input_name_or_path /path/to/starcoder2/checkpoints/hf \
    --output_path /path/to/starcoder2.nemo
    

    The generated starcoder2.nemo file uses distributed checkpointing and can be loaded with any tensor parallel (TP) or pipeline parallel (PP) combination without reshaping or splitting, as illustrated in the sketch below.
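    The sketch below shows, for illustration only, how such a checkpoint could be restored inside the container with a chosen TP/PP layout. The paths, parallel sizes, and trainer settings are placeholder assumptions, and multi-GPU layouts must be launched with a matching number of processes (for example, via torchrun).

    # Minimal sketch (assumptions: placeholder paths, TP=2 / PP=1, single node).
    from pytorch_lightning import Trainer
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy, NLPSaveRestoreConnector

    trainer = Trainer(devices=2, num_nodes=1, accelerator="gpu", strategy=NLPDDPStrategy())

    # Read the stored config, adjust the parallelism, then restore with the override.
    cfg = MegatronGPTModel.restore_from(
        "/path/to/starcoder2.nemo",
        trainer=trainer,
        return_config=True,
        save_restore_connector=NLPSaveRestoreConnector(),
    )
    cfg.tensor_model_parallel_size = 2
    cfg.pipeline_model_parallel_size = 1

    model = MegatronGPTModel.restore_from(
        "/path/to/starcoder2.nemo",
        trainer=trainer,
        override_config_path=cfg,
        save_restore_connector=NLPSaveRestoreConnector(),
    )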

  3. Convert the StarCoder2 .nemo model to Hugging Face format:

    python3 /opt/NeMo/scripts/checkpoint_converters/convert_starcoder2_nemo_to_hf.py \
    --input_name_or_path /path/to/starcoder2/nemo/checkpoint \
    --output_path /path/to/hf/folder
    

    You can load the generated Hugging Face checkpoint folder using the Hugging Face Transformers pipeline and then upload it to the Hugging Face Hub, as sketched below.
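    The following minimal sketch illustrates this flow; the local folder path, prompt, and Hub repository name are placeholders, and pushing to the Hub assumes you have already authenticated (for example, with huggingface-cli login). If tokenizer files are not present in the converted folder, load the tokenizer from the original Hugging Face checkpoint instead.

    # Minimal sketch with placeholder paths and repository name.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    model_path = "/path/to/hf/folder"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    # Quick sanity check with a text-generation pipeline.
    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
    print(generator("def fibonacci(n):", max_new_tokens=32)[0]["generated_text"])

    # Upload the converted model and tokenizer to the Hugging Face Hub.
    model.push_to_hub("your-username/starcoder2-custom")
    tokenizer.push_to_hub("your-username/starcoder2-custom")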