Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Baichuan 2
Released in 2023 by Baichuan Intelligence Inc., Baichuan 2 is a large multilingual language model intended to fill a gap in the English-dominated LLM ecosystem. It follows the prevailing transformer-decoder architecture with a few notable adjustments: SwiGLU activations, memory-efficient attention from xFormers, and Layer Normalization applied to the input of each transformer block. For position encoding, the 7B model uses Rotary Position Embedding (RoPE), while the 13B model uses Attention with Linear Biases (ALiBi). More information is available in the paper “Baichuan 2: Open Large-scale Language Models”.
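The sketch below illustrates the block design described above (SwiGLU feed-forward and normalization on the block input) in plain PyTorch. It is a minimal, illustrative example only, not the NeMo or Baichuan 2 implementation; all class and parameter names are hypothetical, and positional encoding (RoPE/ALiBi) is omitted for brevity.

```python
# Illustrative sketch of a pre-norm decoder block with a SwiGLU feed-forward,
# matching the architectural choices described above. Hypothetical names; not
# the NeMo or Baichuan 2 source code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down( SiLU(gate(x)) * up(x) )."""

    def __init__(self, hidden_size: int, ffn_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_size, bias=False)
        self.down_proj = nn.Linear(ffn_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class DecoderBlock(nn.Module):
    """Pre-norm decoder block: Layer Normalization is applied to the block
    input before attention, and again before the SwiGLU feed-forward."""

    def __init__(self, hidden_size: int = 512, num_heads: int = 8, ffn_size: int = 1376):
        super().__init__()
        self.input_norm = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.post_attn_norm = nn.LayerNorm(hidden_size)
        self.mlp = SwiGLU(hidden_size, ffn_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.input_norm(x)                       # normalize the block input
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                             # residual connection
        x = x + self.mlp(self.post_attn_norm(x))     # SwiGLU feed-forward
        return x


if __name__ == "__main__":
    block = DecoderBlock()
    tokens = torch.randn(2, 16, 512)   # (batch, sequence, hidden)
    print(block(tokens).shape)         # torch.Size([2, 16, 512])
```

In practice, NeMo and the reference Baichuan 2 implementation use fused, memory-efficient attention kernels (e.g. from xFormers) rather than the standard `nn.MultiheadAttention` used here for readability.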
| Feature | Status |
|---|---|
| Data parallelism | ✓ |
| Tensor parallelism | ✓ |
| Pipeline parallelism | ✓ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✓ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✗ |
| BF16 | ✓ |
| TransformerEngine/FP8 | ✗ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | N/A |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Base Command Platform | ✓ |
| Distributed data preprocessing | ✓ |
| NVfuser | ✗ |
| P-Tuning and Prompt Tuning | ✓ |
| IA3 and Adapter learning | ✓ |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | N/A |