Baichuan 2

Released in 2023 by Baichuan Intelligence Inc., Baichuan 2 is a large multilingual language model meant to fill a gap in the English-dominated LLM ecosystem. Built on the prevailing transformer-decoder architecture, Baichuan 2 makes a few notable adjustments: SwiGLU activations, memory-efficient attention from xFormers, and Layer Normalization applied to the input of each transformer block. Additionally, the 7B model uses Rotary Position Embedding (RoPE), while the 13B model uses Attention with Linear Biases (ALiBi). More information is available in the paper “Baichuan 2: Open Large-scale Language Models”.
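To make two of these choices concrete, here is a minimal PyTorch sketch of a SwiGLU feed-forward block and of the geometric per-head slope schedule used by ALiBi. This is an illustrative sketch, not Baichuan 2’s actual implementation: the class name, the projection names (gate_proj, up_proj, down_proj), the dimension arguments, and the alibi_slopes helper are assumptions for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUFeedForward(nn.Module):
    """Illustrative SwiGLU MLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int) -> None:
        super().__init__()
        # SwiGLU uses three projections instead of the classic two-layer MLP.
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated elementwise product, projected back to the model width.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Geometric per-head ALiBi slopes; exact for power-of-two head counts."""
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])


# Example: a forward pass through the SwiGLU block and slopes for 8 heads.
ffn = SwiGLUFeedForward(hidden_size=512, intermediate_size=1376)
out = ffn(torch.randn(2, 16, 512))   # (batch, sequence, hidden)
print(out.shape)                     # torch.Size([2, 16, 512])
print(alibi_slopes(8))               # tensor([0.5000, 0.2500, ..., 0.0039])
```

At attention time, each head subtracts its slope times the query-key distance from the pre-softmax scores, which is how ALiBi encodes position without explicit position embeddings; RoPE, by contrast, rotates query and key vectors as a function of position.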

Feature                                    Status
-------------------------------------------------
Data parallelism                           ✓
Tensor parallelism                         ✓
Pipeline parallelism                       ✓
Interleaved Pipeline Parallelism Schedule  N/A
Sequence parallelism                       ✓
Selective activation checkpointing         ✓
Gradient checkpointing                     ✓
Partial gradient checkpointing             ✓
FP32/TF32                                  ✓
AMP/FP16                                   ✓
BF16                                       ✓
TransformerEngine/FP8                      ✓
Multi-GPU                                  ✓
Multi-Node                                 ✓
Inference                                  N/A
Slurm                                      ✓
Base Command Manager                       ✓
Base Command Platform                      ✓
Distributed data preprocessing             ✓
NVfuser                                    ✓
P-Tuning and Prompt Tuning                 ✓
IA3 and Adapter learning                   ✓
Distributed Optimizer                      ✓
Distributed Checkpoint                     ✓
Fully Sharded Data Parallel                N/A