Baichuan 2

Released in 2023, Baichuan Intelligence Inc.’s Baichuan 2 is a large multilingual language model intended to address the gap left by an LLM ecosystem focused predominantly on English. Built on the prevailing decoder-only transformer architecture, Baichuan 2 makes a few notable adjustments: SwiGLU activations, memory-efficient attention from xFormers, and Layer Normalization applied to the input of each transformer block. In addition, the 7B model uses Rotary Position Embedding (RoPE), while the 13B model uses Attention with Linear Biases (ALiBi). More information is available in the paper “Baichuan 2: Open Large-scale Language Models”.
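To make the architectural choices above concrete, the following is a minimal, self-contained PyTorch sketch of a pre-norm transformer block with a SwiGLU feed-forward. It is an illustrative approximation only, not the Baichuan 2 or NeMo implementation; layer names and sizes are placeholders, and RoPE/ALiBi and the xFormers memory-efficient attention are omitted for brevity.

import torch
import torch.nn as nn


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: (SiLU(x W_gate) * x W_up) W_down."""

    def __init__(self, hidden: int, ffn_hidden: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, ffn_hidden, bias=False)
        self.up_proj = nn.Linear(hidden, ffn_hidden, bias=False)
        self.down_proj = nn.Linear(ffn_hidden, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))


class PreNormBlock(nn.Module):
    """Transformer block with Layer Normalization applied to the block input (pre-norm)."""

    def __init__(self, hidden: int = 512, heads: int = 8, ffn_hidden: int = 1376):
        super().__init__()
        self.attn_norm = nn.LayerNorm(hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.ffn_norm = nn.LayerNorm(hidden)
        self.ffn = SwiGLU(hidden, ffn_hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm: normalize before attention / FFN, then add the residual.
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.ffn_norm(x))
        return x


if __name__ == "__main__":
    block = PreNormBlock()
    out = block(torch.randn(2, 16, 512))  # (batch, sequence, hidden)
    print(out.shape)  # torch.Size([2, 16, 512])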

Feature                                    Status
Data parallelism                           ✓
Tensor parallelism                         ✓
Pipeline parallelism                       ✓
Interleaved Pipeline Parallelism Schedule  N/A
Sequence parallelism                       ✓
Selective activation checkpointing         ✓
Gradient checkpointing                     ✓
Partial gradient checkpointing             ✓
FP32/TF32                                  ✓
AMP/FP16                                   ✓
BF16                                       ✓
TransformerEngine/FP8                      ✓
Multi-GPU                                  ✓
Multi-Node                                 ✓
Inference                                  N/A
Slurm                                      ✓
Base Command Manager                       ✓
Base Command Platform                      ✓
Distributed data preprocessing             ✓
NVfuser                                    ✓
P-Tuning and Prompt Tuning                 ✓
IA3 and Adapter learning                   ✓
Distributed Optimizer                      ✓
Distributed Checkpoint                     ✓
Fully Sharded Data Parallel                N/A