BERT

Released in 2018, Bidirectional Encoder Representations from Transformers (BERT) is an encoder-only transformer network whose self-attention layers are bidirectional, as its name indicates. BERT has become a staple of Natural Language Processing, and the ideas in the original paper have been adapted to many other domains and applications.
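To make "bidirectional" concrete, the following minimal sketch contrasts the attention pattern of an encoder-only model such as BERT with the causal pattern of a decoder-only model. It is an illustration only, not code from this guide: the helper name `attention_mask` is hypothetical, and padding tokens are ignored for simplicity.

```python
def attention_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len mask (hypothetical helper for illustration).

    mask[i][j] is True when position i may attend to position j.
    """
    return [
        # Bidirectional: every position sees every other position.
        # Causal: position i sees only positions j <= i.
        [bidirectional or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


bert_style = attention_mask(4, bidirectional=True)
causal = attention_mask(4, bidirectional=False)

for name, mask in (("bidirectional", bert_style), ("causal", causal)):
    print(name)
    for row in mask:
        print(" ".join("1" if allowed else "0" for allowed in row))
```

In the bidirectional mask every entry is allowed, so each token's representation can draw on context from both its left and its right. In the causal mask only the lower triangle is allowed.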

Feature                                     Status

Data parallelism
Tensor parallelism
Pipeline parallelism
Interleaved Pipeline Parallelism Schedule   N/A
Sequence parallelism
Selective activation checkpointing
Gradient checkpointing
Partial gradient checkpointing
FP32/TF32
AMP/FP16
BF16
TransformerEngine/FP8
Multi-GPU
Multi-Node
Inference                                   N/A
Slurm
Base Command Manager
Base Command Platform
Distributed data preprocessing
NVfuser
P-Tuning and Prompt Tuning                  N/A
IA3 and Adapter learning                    N/A
Distributed Optimizer
Distributed Checkpoint
Fully Sharded Data Parallel                 N/A
Torch Distributed Checkpoint
Last updated on Jun 19, 2024.