StarCoder2

Released in 2024, BigCode’s StarCoder2 is a code-generation large language model based on the transformer decoder architecture. The model was trained on The Stack v2, a massive code dataset containing over 900B tokens. StarCoder2 is offered in 3B, 7B, and 15B parameter sizes and competes with models well above its size on code-generation tasks. More information can be found in the companion paper “StarCoder 2 and The Stack v2: The Next Generation”.
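As a decoder-only transformer, StarCoder2 generates code autoregressively: each position may attend only to earlier positions via a causal mask. A minimal single-head sketch of causal self-attention in numpy (hypothetical dimensions, not the model's actual implementation):

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention: position i attends only to j <= i."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: set scores for future positions to -inf before softmax.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf
    # Row-wise softmax over the unmasked (past and current) positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq, d_model = 4, 8
x = rng.normal(size=(seq, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Because of the mask, perturbing a later token cannot change the outputs at earlier positions, which is what makes left-to-right generation possible.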

| Feature | Status |
| --- | --- |
| Data parallelism | ✓ |
| Tensor parallelism | ✓ |
| Pipeline parallelism | ✓ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✓ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✓ |
| BF16 | ✓ |
| TransformerEngine/FP8 | ✓ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | N/A |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Base Command Platform | ✓ |
| Distributed data preprocessing | ✓ |
| nvFuser | ✓ |
| P-Tuning and Prompt Tuning | ✓ |
| IA3 and Adapter learning | ✓ |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | N/A |
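Several of the features above concern how training is split across GPUs. As an illustration only (not NeMo code), tensor parallelism shards a layer's weight matrix across devices so each one computes part of the output; a column-parallel linear layer can be sketched with numpy, using two simulated devices and hypothetical shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))     # activation batch, replicated on every device
w = rng.normal(size=(8, 16))    # full weight matrix of one linear layer

# Column-parallel split: each of two simulated devices owns half the
# output columns, so each local matmul is half the work.
w_shards = np.split(w, 2, axis=1)
partials = [x @ shard for shard in w_shards]   # computed independently per device
y_parallel = np.concatenate(partials, axis=1)  # stands in for the all-gather step

assert np.allclose(y_parallel, x @ w)          # matches the unsharded layer
```

Data parallelism, by contrast, replicates the full weights and splits the batch; the two can be combined, which is why the table lists them as separate features.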