StarCoder2

Released in 2024, BigCode’s StarCoder2 is a code-generation large language model based on the transformer decoder architecture. It was trained on The Stack v2, a massive code dataset containing over 900B tokens, and is offered in 3B, 7B, and 15B parameter sizes, competing with much larger models on code-generation tasks. More information can be found in the companion paper “StarCoder 2 and The Stack v2: The Next Generation”.
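For quick experimentation outside the training stack, the StarCoder2 checkpoints are also published on the Hugging Face Hub. Below is a minimal inference sketch using the `transformers` library, assuming the `bigcode/starcoder2-3b` checkpoint (the 7B and 15B variants follow the same pattern):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"  # 7B/15B variants follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # halves memory vs. FP32; needs a BF16-capable GPU
    device_map="auto",           # requires the `accelerate` package
)

# StarCoder2 is a completion model: it continues a source-code prefix.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```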

| Feature                                   | Status |
|-------------------------------------------|--------|
| Data parallelism                          | ✓      |
| Tensor parallelism                        | ✓      |
| Pipeline parallelism                      | ✓      |
| Interleaved Pipeline Parallelism Schedule | N/A    |
| Sequence parallelism                      | ✓      |
| Selective activation checkpointing        | ✓      |
| Gradient checkpointing                    | ✓      |
| Partial gradient checkpointing            | ✓      |
| FP32/TF32                                 | ✓      |
| AMP/FP16                                  | ✓      |
| BF16                                      | ✓      |
| TransformerEngine/FP8                     | ✓      |
| Multi-GPU                                 | ✓      |
| Multi-Node                                | ✓      |
| Inference                                 | N/A    |
| Slurm                                     | ✓      |
| Base Command Manager                      | ✓      |
| Base Command Platform                     | ✓      |
| Distributed data preprocessing            | ✓      |
| NVfuser                                   | ✓      |
| P-Tuning and Prompt Tuning                | ✓      |
| IA3 and Adapter learning                  | ✓      |
| Distributed Optimizer                     | ✓      |
| Distributed Checkpoint                    | ✓      |
| Fully Sharded Data Parallel               | N/A    |
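Several of the features above map onto keys in NeMo-style training configs. The sketch below is illustrative only: the file name `starcoder2_config.yaml` is hypothetical, and the Megatron-style keys shown (`tensor_model_parallel_size`, `pipeline_model_parallel_size`, and so on) may vary between releases.

```python
# Hedged sketch: overriding parallelism and precision settings in a
# NeMo-style YAML config. Assumes the config already defines `model`
# and `trainer` sections; file names here are hypothetical.
from omegaconf import OmegaConf

cfg = OmegaConf.load("starcoder2_config.yaml")

# Split each layer's weights across 2 GPUs (tensor parallelism) and the
# layer stack across 2 stages (pipeline parallelism): 4 GPUs per replica.
cfg.model.tensor_model_parallel_size = 2
cfg.model.pipeline_model_parallel_size = 2

# Train in BF16 with the distributed optimizer, both listed as supported above.
cfg.trainer.precision = "bf16"
cfg.model.optim.name = "distributed_fused_adam"

OmegaConf.save(cfg, "starcoder2_config_override.yaml")
```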