Released in 2018, Bidirectional Encoder Representations from Transformers (BERT) is an encoder-only transformer network whose namesake bidirectional self-attention lets every token attend to both its left and right context. BERT has become a staple of the Natural Language Processing domain, and the ideas in the original paper have been adapted to many other domains and applications.
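The bidirectionality described above comes down to the attention mask. The following is a minimal PyTorch sketch (not NeMo's implementation; the `self_attention` helper is hypothetical) contrasting BERT-style bidirectional attention with the causal masking used in decoder-only models:

```python
import torch
import torch.nn.functional as F

def self_attention(q, k, v, causal=False):
    # q, k, v: (batch, seq_len, d_model)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (batch, seq, seq)
    if causal:
        # Decoder-style: each token may only attend to earlier positions.
        seq = scores.size(-1)
        mask = torch.triu(
            torch.ones(seq, seq, dtype=torch.bool, device=scores.device),
            diagonal=1,
        )
        scores = scores.masked_fill(mask, float("-inf"))
    # Encoder-style (causal=False): every token attends to the full left
    # *and* right context, which is what "bidirectional" means for BERT.
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 8, 64)
out = self_attention(x, x, x, causal=False)  # bidirectional, as in BERT
```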
| Feature | Status |
|---|---|
| Data parallelism | ✓ |
| Tensor parallelism | ✓ |
| Pipeline parallelism | ✓ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✓ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✗ |
| BF16 | ✓ |
| TransformerEngine/FP8 | ✗ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | N/A |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Base Command Platform | ✓ |
| Distributed data preprocessing | ✓ |
| NVfuser | ✗ |
| P-Tuning and Prompt Tuning | N/A |
| IA3 and Adapter learning | N/A |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | N/A |
| Torch Distributed Checkpoint | ✓ |
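Since the matrix above marks BF16 as supported while AMP/FP16 is not, here is a minimal sketch, assuming plain PyTorch on a CUDA device with bfloat16 support, of what BF16 mixed precision looks like at the framework level. It is illustrative only, not NeMo's training loop:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

optimizer.zero_grad()
# BF16 keeps FP32's exponent range, so no gradient scaler is required
# (unlike AMP/FP16, which the matrix above marks as unsupported).
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(data), target)
loss.backward()
optimizer.step()
```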