Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
mT5
Released in 2020, Google’s mT5 is a multilingual language model based on the transformer encoder-decoder framework. Trained on a massive multilingual dataset covering more than 100 languages, mT5 achieved state-of-the-art benchmark scores across multiple languages. mT5 is available in “small”, “base”, “large”, “xl”, and “xxl” sizes to suit a range of compute budgets and use cases. More information is available in the companion paper, “mT5: A massively multilingual pre-trained text-to-text transformer”.
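The public mT5 checkpoints are also distributed outside the NeMo framework. As a quick way to compare the released sizes, the following is a minimal sketch using the Hugging Face `transformers` library; the `google/mt5-small` checkpoint name and the parameter-count comparison are illustrative and not part of the NeMo workflow documented here.

```python
# Minimal sketch (not part of the NeMo workflow): load a public mT5 checkpoint
# with Hugging Face transformers and report its parameter count. Swap
# "google/mt5-small" for "google/mt5-base", "-large", "-xl", or "-xxl" to
# compare the released sizes.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

checkpoint = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MT5ForConditionalGeneration.from_pretrained(checkpoint)

num_params = sum(p.numel() for p in model.parameters())
print(f"{checkpoint}: {num_params / 1e6:.0f}M parameters")

# mT5 uses the same text-to-text interface as T5; the raw pretrained checkpoint
# is trained only with a span-corruption objective, so it needs fine-tuning
# before it is useful on downstream tasks.
inputs = tokenizer("This is a multilingual sentence.", return_tensors="pt")
print(inputs["input_ids"].shape)
```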
| Feature | T5/mT5 |
|---|---|
| Data parallelism | ✓ |
| Tensor parallelism | ✓ |
| Pipeline parallelism | ✓ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✗ |
| Selective activation checkpointing | ✗ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✗ |
| BF16 | ✓ |
| TransformerEngine/FP8 | ✗ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | N/A |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Base Command Platform | ✓ |
| Distributed data preprocessing | ✓ |
| NVfuser | ✗ |
| P-Tuning and Prompt Tuning | ✓ |
| IA3 and Adapter learning | ✓ |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | N/A |
| Fully Sharded Data Parallel | N/A |
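Several of the switches in the table above (tensor and pipeline parallelism, multi-GPU/multi-node scaling, BF16) are driven from the training configuration. Below is a hedged sketch of how such a configuration might look; the key names follow common NeMo/Megatron conventions (e.g. `tensor_model_parallel_size`) but are assumptions here and should be checked against the mT5 config files shipped with the framework.

```python
# Hedged sketch of a Megatron-style parallelism/precision configuration.
# Key names mirror common NeMo/Megatron conventions and are illustrative,
# not authoritative; consult the shipped mT5 config files for exact names.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "trainer": {
        "devices": 8,          # Multi-GPU: GPUs per node
        "num_nodes": 2,        # Multi-Node
        "precision": "bf16",   # BF16 is supported; FP8 is not (per the table)
    },
    "model": {
        "tensor_model_parallel_size": 2,    # Tensor parallelism
        "pipeline_model_parallel_size": 2,  # Pipeline parallelism
        "sequence_parallel": False,         # Not supported for T5/mT5
    },
})

# Data-parallel size is implied: (devices * num_nodes) / (TP * PP) = 4 here.
print(OmegaConf.to_yaml(cfg))
```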