StarCoder2

Released in 2024, BigCode’s StarCoder2 is a code-generation large language model based on the transformer decoder architecture. The model was trained on The Stack v2, a massive code dataset containing over 900B tokens. StarCoder2 is offered in 3B, 7B, and 15B parameter sizes and competes with models well above its size on code-generation tasks. More information can be found in the companion paper “StarCoder 2 and The Stack v2: The Next Generation”.
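As a decoder-only transformer, StarCoder2 generates code autoregressively: each position may attend only to earlier positions via a causal mask. A minimal single-head sketch of causal self-attention in numpy (hypothetical dimensions, not the model's actual implementation):

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention: position i attends only to j <= i."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: set scores for future positions to -inf before softmax.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf
    # Row-wise softmax over the unmasked (past and current) positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq, d_model = 4, 8
x = rng.normal(size=(seq, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Because of the mask, perturbing a later token cannot change the outputs at earlier positions, which is what makes left-to-right generation possible.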

| Feature | Status |
| --- | --- |
| Data parallelism | ✓ |
| Tensor parallelism | ✓ |
| Pipeline parallelism | ✓ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✓ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✓ |
| BF16 | ✓ |
| TransformerEngine/FP8 | ✓ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | N/A |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Base Command Platform | ✓ |
| Distributed data preprocessing | ✓ |
| nvFuser | ✓ |
| P-Tuning and Prompt Tuning | ✓ |
| IA3 and Adapter learning | ✓ |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | N/A |
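Several of the features above concern how training is split across GPUs. As an illustration only (not NeMo code), tensor parallelism shards a layer's weight matrix across devices so each one computes part of the output; a column-parallel linear layer can be sketched with numpy, using two simulated devices and hypothetical shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))     # activation batch, replicated on every device
w = rng.normal(size=(8, 16))    # full weight matrix of one linear layer

# Column-parallel split: each of two simulated devices owns half the
# output columns, so each local matmul is half the work.
w_shards = np.split(w, 2, axis=1)
partials = [x @ shard for shard in w_shards]   # computed independently per device
y_parallel = np.concatenate(partials, axis=1)  # stands in for the all-gather step

assert np.allclose(y_parallel, x @ w)          # matches the unsharded layer
```

Data parallelism, by contrast, replicates the full weights and splits the batch; the two can be combined, which is why the table lists them as separate features.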