We measured the throughput of training VideoNeVA models on different numbers of DGX H100 nodes and achieved near-linear scaling.
The following table shows the pretraining performance results on NVIDIA DGX SuperPODs (16 nodes x 8 H100 80GB GPUs) for VideoNeVA Llama2 Chat 13B model pretraining.
| Model | Metric | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|---|
| VideoNeVA Llama2 Chat 13B | Samples per Second | 53 | 106 | 211 | 424 | 822 |
| | Perfect Linear Scaling (Samples) | 53 | 107 | 214 | 428 | 857 |
| | Speedup | 1x | 1.99x | 3.94x | 7.93x | 15.36x |
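The scaling metrics above can be derived directly from the measured throughput. A minimal sketch of that calculation (the function name is hypothetical; the table's ideal-scaling row appears to use an unrounded single-node baseline, so its exact figures may differ slightly from this rounded version):

```python
# Hypothetical helper: derive ideal-scaling and speedup rows from
# measured samples/sec at each node count.
def scaling_metrics(nodes, samples_per_sec):
    # Per-node throughput at the smallest measured scale.
    base = samples_per_sec[0] / nodes[0]
    # Perfect linear scaling: base throughput multiplied by node count.
    ideal = [round(base * n) for n in nodes]
    # Speedup relative to the single-node measurement.
    speedup = [s / samples_per_sec[0] for s in samples_per_sec]
    return ideal, speedup

nodes = [1, 2, 4, 8, 16]
measured = [53, 106, 211, 424, 822]
ideal, speedup = scaling_metrics(nodes, measured)
```

Comparing `measured` against `ideal` at each node count gives the scaling efficiency; values close to 1.0 indicate near-linear scaling.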