Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Performance
Training Performance Results
We measured the throughput of training VideoNeVA models on different numbers of DGX H100 nodes and achieved near-linear scaling on the DGX H100 platform.
The following table and chart show the pretraining performance results on NVIDIA DGX SuperPODs (16 nodes x 8 x H100 80GB GPUs) for VideoNeVA Llama2 Chat 13B model pretraining.
| Model | Metric | 1 Node | 2 Nodes | 4 Nodes | 8 Nodes | 16 Nodes |
|---|---|---|---|---|---|---|
| VideoNeVA Llama2 Chat 13B | Samples per Second | 53 | 106 | 211 | 424 | 822 |
| | Perfect Linear Scaling (Samples) | 37 | 107 | 214 | 428 | 857 |
| | Speedup | 1x | 1.99x | 3.94x | 7.93x | 15.36x |
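
The snippet below is a minimal sketch, not part of the NeMo codebase, showing how the Perfect Linear Scaling and Speedup rows relate to the measured Samples per Second values: linear scaling is the single-node throughput multiplied by the node count, and speedup is the measured throughput divided by the single-node throughput. It uses the rounded figures from the table, so the printed values may differ slightly from the published rows, which were presumably derived from unrounded measurements.

```python
# Sketch of how the derived rows in the table above can be computed.
# The throughputs are the rounded per-node-count measurements from the table.

measured = {1: 53, 2: 106, 4: 211, 8: 424, 16: 822}  # nodes -> measured samples/sec

base_nodes = min(measured)
base_throughput = measured[base_nodes]

for nodes, samples_per_sec in sorted(measured.items()):
    # Perfect linear scaling: single-node throughput multiplied by the node count.
    perfect_linear = base_throughput * nodes // base_nodes
    # Speedup: measured throughput relative to the single-node measurement.
    speedup = samples_per_sec / base_throughput
    print(f"{nodes:>2} nodes: measured={samples_per_sec:>4}  "
          f"perfect_linear={perfect_linear:>4}  speedup={speedup:.2f}x")
```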