Performance


We measured the throughput of training VideoNeVA models on varying numbers of DGX H100 nodes and achieved near-linear scaling.

The following table and chart show the pretraining performance results for the VideoNeVA Llama2 Chat 13B model on NVIDIA DGX SuperPODs (16 nodes x 8 H100 80GB GPUs per node).

Number of Nodes                                1     2      4      8      16
VideoNeVA Llama2 Chat 13B Samples per Second   53    106    211    424    822
Perfect Linear Scaling (Samples)               37    107    214    428    857
Speedup                                        1x    1.99x  3.94x  7.93x  15.36x
Figure: VideoNeVA Llama2 Chat 13B NeMo Throughput (H100).
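In the table, speedup at N nodes is the measured throughput divided by the single-node throughput, and perfect linear scaling is the single-node throughput multiplied by N. The snippet below is a minimal sketch, not part of NeMo, that recomputes these quantities from the rounded table values; because the published rows were presumably derived from unrounded measurements, the last decimals may differ slightly.

```python
# Recompute scaling metrics from the measured throughput in the table above.
nodes = [1, 2, 4, 8, 16]
samples_per_sec = [53, 106, 211, 424, 822]  # measured samples/second

base = samples_per_sec[0]  # single-node throughput
for n, s in zip(nodes, samples_per_sec):
    ideal = base * n          # perfect linear scaling at n nodes
    speedup = s / base        # throughput relative to one node
    efficiency = speedup / n  # fraction of the ideal achieved
    print(f"{n:>2} nodes: {s:>4} samples/s  "
          f"ideal {ideal:>4}  speedup {speedup:5.2f}x  efficiency {efficiency:.1%}")
```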
