> **Important**
>
> NeMo 2.0 is an experimental feature and is currently released in the dev container only: `nvcr.io/nvidia/nemo:dev`. Please refer to the NeMo 2.0 overview for information on getting started.
# mT5
Released in 2020, Google’s mT5 is a multilingual language model based on the transformer encoder-decoder framework. Pretrained on a massive multilingual dataset covering over 100 languages, mT5 achieved state-of-the-art benchmark scores across many languages. It is available in five sizes, “small”, “base”, “large”, “xl”, and “xxl”, to suit a range of compute budgets. More information is available in the companion paper, “mT5: A massively multilingual pre-trained text-to-text transformer”.
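As a quick illustration of mT5’s text-to-text, encoder-decoder interface, the minimal sketch below loads a public mT5 checkpoint through the Hugging Face `transformers` library. This is outside NeMo and purely illustrative: the checkpoint name `google/mt5-small` and the span-corruption prompt are assumptions for the example, and the pretrained checkpoint needs fine-tuning before it is useful on downstream tasks.

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Illustrative only: load a public mT5 checkpoint (requires sentencepiece).
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# mT5 is pretrained on span corruption, so sentinel tokens mark masked spans.
text = "The capital of France is <extra_id_0>."
inputs = tokenizer(text, return_tensors="pt")

# The encoder reads the input; the decoder generates text for the masked span.
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```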
| Feature | T5/mT5 | 
|---|---|
| Data parallelism | ✓ | 
| Tensor parallelism | ✓ | 
| Pipeline parallelism | ✓ | 
| Interleaved Pipeline Parallelism Schedule | N/A | 
| Sequence parallelism | ✗ | 
| Selective activation checkpointing | ✗ | 
| Gradient checkpointing | ✓ | 
| Partial gradient checkpointing | ✓ | 
| FP32/TF32 | ✓ | 
| AMP/FP16 | ✗ | 
| BF16 | ✓ | 
| TransformerEngine/FP8 | ✗ | 
| Multi-GPU | ✓ | 
| Multi-Node | ✓ | 
| Inference | N/A | 
| Slurm | ✓ | 
| Base Command Manager | ✓ | 
| Base Command Platform | ✓ | 
| Distributed data preprocessing | ✓ | 
| NVfuser | ✗ | 
| P-Tuning and Prompt Tuning | ✓ | 
| IA3 and Adapter learning | ✓ | 
| Distributed Optimizer | ✓ | 
| Distributed Checkpoint | N/A | 
| Fully Sharded Data Parallel | N/A |
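Several of the capabilities in the table (multi-GPU and multi-node training, tensor and pipeline parallelism, BF16) are driven by the training configuration. The fragment below is a minimal, hypothetical sketch of such overrides built with OmegaConf; the field names (for example `tensor_model_parallel_size` and `pipeline_model_parallel_size`) follow the NeMo Megatron config convention but should be checked against the actual config file for your NeMo version.

```python
from omegaconf import OmegaConf

# Hypothetical override fragment for a NeMo Megatron T5/mT5 training config.
overrides = OmegaConf.create({
    "trainer": {
        "devices": 8,        # GPUs per node (multi-GPU)
        "num_nodes": 2,      # multi-node training
        "precision": "bf16", # BF16 mixed precision
    },
    "model": {
        "tensor_model_parallel_size": 2,    # tensor parallelism
        "pipeline_model_parallel_size": 2,  # pipeline parallelism
        "micro_batch_size": 4,
        "global_batch_size": 256,  # remaining capacity is used as data parallelism
    },
})
print(OmegaConf.to_yaml(overrides))
```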