Performance
Training Performance Results
We measured the throughput of training VideoNeVA models on different numbers of DGX H100 nodes and achieved near-linear scaling on the DGX H100 platform.
The following table shows the pretraining performance results on NVIDIA DGX SuperPODs (16 nodes x 8 x H100 80GB) for VideoNeVA Llama2 Chat 13B model pretraining.
| Model | Metric | 1 | 2 | 4 | 8 | 16 |
|---|---|---|---|---|---|---|
| VideoNeVA Llama2 Chat 13B | Samples per Second | 53 | 106 | 211 | 424 | 822 |
| | Perfect Linear Scaling (Samples) | 54 | 107 | 214 | 428 | 857 |
| | Speedup | 1x | 1.99x | 3.94x | 7.93x | 15.36x |
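The speedup and scaling-efficiency figures can be derived from the measured throughput. A minimal sketch, using the table's node counts and samples-per-second values (the published speedup column appears to use a fractional single-node baseline, so exact values may differ slightly from this integer-baseline calculation):

```python
# Derive speedup and scaling efficiency from measured training throughput.
# Node counts and samples/sec are taken from the table above.
nodes = [1, 2, 4, 8, 16]
samples_per_sec = [53, 106, 211, 424, 822]

base = samples_per_sec[0]  # measured single-node throughput

for n, sps in zip(nodes, samples_per_sec):
    speedup = sps / base       # observed speedup over one node
    ideal = base * n           # perfect linear scaling at n nodes
    efficiency = sps / ideal   # fraction of ideal throughput achieved
    print(f"{n:>2} nodes: {speedup:5.2f}x speedup, {efficiency:6.1%} efficiency")
```

At 16 nodes the run retains roughly 97% of ideal linear throughput, which is what "near-linear scaling" refers to above.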