| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | Yes | Yes |
| Pipeline parallelism | Yes | Yes (Megatron-LM checkpoints) |
| Interleaved pipeline parallelism schedule | Yes | N/A |
| Sequence parallelism | Yes | No |
| Selective activation checkpointing | Yes | No |
| Gradient checkpointing | Yes | N/A |
| Partial gradient checkpointing | Yes | N/A |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | No | Yes |
| BF16 | Yes | Yes |
| TransformerEngine/FP8 | Yes | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton with TensorRT-LLM |
| Software stack support | Slurm / Base Command Manager / Base Command Platform | Slurm / Base Command Manager / Base Command Platform |
| Distributed data preprocessing | Yes (the Pile only) | N/A |
| NVfuser | No | N/A |
| P-Tuning and Prompt Tuning | Yes | N/A |
| IA3 and Adapter learning | Yes | N/A |
| Distributed Optimizer | Yes | N/A |
| Distributed Checkpoint | Yes | N/A |
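The tensor parallelism listed above splits each weight matrix across GPUs, with partial results combined by a collective. A minimal single-process sketch of a column-parallel linear layer, using NumPy arrays to stand in for per-GPU shards (the names here are illustrative, not NeMo or Megatron-LM APIs):

```python
import numpy as np

def column_parallel_linear(x, weight_shards):
    """Each 'device' holds one column shard of W; the partial outputs
    are concatenated (an all-gather on real hardware)."""
    return np.concatenate([x @ w for w in weight_shards], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4 tokens, hidden size 8
W = rng.standard_normal((8, 16))   # full weight matrix
shards = np.split(W, 2, axis=1)    # 2-way tensor parallelism

y_parallel = column_parallel_linear(x, shards)
y_full = x @ W
assert np.allclose(y_parallel, y_full)  # sharded result matches the full matmul
```

Because column sharding is exact block matrix multiplication, the parallel result is numerically identical to the unsharded layer; only memory and communication change.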
| Feature | Training |
|---|---|
| Data parallelism | Yes |
| Tensor parallelism | Yes |
| Pipeline parallelism | Yes |
| Sequence parallelism | No |
| Selective activation checkpointing | No |
| Gradient checkpointing | Yes |
| Partial gradient checkpointing | Yes |
| FP32/TF32 | Yes |
| AMP/FP16 | No |
| BF16 | Yes |
| Multi-GPU | Yes |
| Multi-Node | Yes |
| Inference deployment | N/A |
| Software stack support | Slurm / Base Command |
| Distributed data preprocessing | Yes (Pile dataset for T5, mC4 dataset for mT5) |
| NVfuser | No |
| P-Tuning and Prompt Tuning | Yes |
| IA3 and Adapter learning | Yes |
| AutoConfigurator | Yes |
| Distributed Optimizer | Yes |
| Mixture of Experts | Yes (no expert parallelism) |
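The Mixture of Experts entry above supports multiple experts without expert parallelism, i.e. all experts reside on the same device and a gating network routes each token to one of them. A toy top-1 routing sketch in NumPy (illustrative only, not the NeMo implementation):

```python
import numpy as np

def moe_top1(x, gate_w, experts):
    """Top-1 Mixture of Experts: a gating network scores the experts
    per token and each token is processed by its highest-scoring
    expert only. Without expert parallelism, every expert's weights
    live on the same device."""
    logits = x @ gate_w              # (tokens, n_experts) gating scores
    choice = logits.argmax(axis=-1)  # hard top-1 routing decision
    out = np.empty((x.shape[0], experts[0].shape[1]))
    for e, w in enumerate(experts):
        mask = choice == e
        out[mask] = x[mask] @ w      # each token sees only its chosen expert
    return out, choice

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))                        # 6 tokens, hidden size 4
gate_w = rng.standard_normal((4, 3))                   # gate over 3 experts
experts = [rng.standard_normal((4, 5)) for _ in range(3)]
out, choice = moe_top1(x, gate_w, experts)
```

With expert parallelism the experts would instead be sharded across devices and tokens exchanged via all-to-all communication; the table notes that this variant is not enabled here.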
| Feature | Training |
|---|---|
| Data parallelism | Yes |
| Tensor parallelism | Yes |
| Pipeline parallelism | Yes |
| Sequence parallelism | Yes |
| Selective activation checkpointing | Yes |
| Gradient checkpointing | Yes |
| Partial gradient checkpointing | Yes |
| FP32/TF32 | Yes |
| AMP/FP16 | No |
| BF16 | Yes |
| Multi-GPU | Yes |
| Multi-Node | Yes |
| Inference deployment | N/A |
| Software stack support | Slurm / Base Command Manager / Base Command Platform |
| Distributed data preprocessing | Yes (the Pile only) |
| NVfuser | Yes |
| P-Tuning and Prompt Tuning | N/A |
| IA3 and Adapter learning | N/A |
| Distributed Optimizer | Yes |
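Data parallelism, supported in all three matrices, replicates the model and gives each replica its own micro-batch; after the backward pass the per-replica gradients are averaged with an all-reduce so every replica applies the same update. A minimal NumPy sketch of that averaging step (illustrative stand-in for the collective, not a NeMo API):

```python
import numpy as np

def all_reduce_mean(grads):
    """Average per-replica gradients, as an all-reduce would across
    data-parallel GPUs. Each array stands in for one replica's
    gradient of the same parameter."""
    return sum(grads) / len(grads)

# gradients computed independently on two replicas' micro-batches
g0 = np.array([1.0, 2.0])
g1 = np.array([3.0, 4.0])
avg = all_reduce_mean([g0, g1])  # every replica now holds the same gradient
```

Averaging (rather than summing) keeps the effective learning rate independent of the data-parallel degree when the loss is a mean over the global batch.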