Feature Matrix

GPT Models

| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | Yes | Yes |
| Pipeline parallelism | Yes | Yes (Megatron-LM checkpoints) |
| Interleaved pipeline parallelism schedule | Yes | N/A |
| Sequence parallelism | Yes | No |
| Selective activation checkpointing | Yes | No |
| Gradient checkpointing | Yes | N/A |
| Partial gradient checkpointing | Yes | N/A |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | No | Yes |
| BF16 | Yes | Yes |
| TransformerEngine/FP8 | Yes | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported with TRT-LLM |
| Software stack support | Slurm / Base Command Manager / Base Command Platform | Slurm / Base Command Manager / Base Command Platform |
| Distributed data preprocessing | Yes (the Pile only) | N/A |
| NVfuser | No | N/A |
| P-Tuning and Prompt Tuning | Yes | N/A |
| IA3 and Adapter learning | Yes | N/A |
| Distributed Optimizer | Yes | N/A |
| Distributed Checkpoint | Yes | N/A |
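The training-side rows above correspond to options set in the launcher's configuration files. As a rough sketch only (the key names below follow common NeMo-Megatron GPT configs but are assumptions here, not the authoritative schema for your release; it also assumes `omegaconf` is installed), a set of overrides enabling several of these features might look like:

```python
# Minimal sketch: assembling config overrides that mirror rows of the matrix.
# Key names are assumed from typical NeMo-Megatron GPT configs; verify them
# against the launcher's own YAML files before use.
from omegaconf import OmegaConf

overrides = OmegaConf.create({
    "trainer": {
        "devices": 8,          # Multi-GPU
        "num_nodes": 2,        # Multi-Node
        "precision": "bf16",   # BF16 training (FP32/TF32 also supported)
    },
    "model": {
        "tensor_model_parallel_size": 4,            # Tensor parallelism
        "pipeline_model_parallel_size": 2,          # Pipeline parallelism
        "virtual_pipeline_model_parallel_size": 2,  # Interleaved pipeline schedule
        "sequence_parallel": True,                  # Sequence parallelism
        "activations_checkpoint_granularity": "selective",  # Selective activation checkpointing
        "optim": {"name": "distributed_fused_adam"},         # Distributed Optimizer
    },
})

print(OmegaConf.to_yaml(overrides))
```
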
T5 and mT5 Models

| Feature | Training |
|---|---|
| Data parallelism | Yes |
| Tensor parallelism | Yes |
| Pipeline parallelism | Yes |
| Sequence parallelism | No |
| Selective activation checkpointing | No |
| Gradient checkpointing | Yes |
| Partial gradient checkpointing | Yes |
| FP32/TF32 | Yes |
| AMP/FP16 | No |
| BF16 | Yes |
| Multi-GPU | Yes |
| Multi-Node | Yes |
| Inference deployment | N/A |
| Software stack support | Slurm / Base Command Manager / Base Command Platform |
| Distributed data preprocessing | Yes (Pile dataset for T5, mC4 dataset for mT5) |
| NVfuser | No |
| P-Tuning and Prompt Tuning | Yes |
| IA3 and Adapter learning | Yes |
| AutoConfigurator | Yes |
| Distributed Optimizer | Yes |
| Mixture of Experts | Yes (no expert parallelism) |
BERT Models

| Feature | Training |
|---|---|
| Data parallelism | Yes |
| Tensor parallelism | Yes |
| Pipeline parallelism | Yes |
| Sequence parallelism | Yes |
| Selective activation checkpointing | Yes |
| Gradient checkpointing | Yes |
| Partial gradient checkpointing | Yes |
| FP32/TF32 | Yes |
| AMP/FP16 | No |
| BF16 | Yes |
| Multi-GPU | Yes |
| Multi-Node | Yes |
| Inference deployment | N/A |
| Software stack support | Slurm / Base Command Manager / Base Command Platform |
| Distributed data preprocessing | Yes (the Pile only) |
| NVfuser | Yes |
| P-Tuning and Prompt Tuning | N/A |
| IA3 and Adapter learning | N/A |
| Distributed Optimizer | Yes |