Performance

Training Performance Results

We measured the throughput of training VideoNeVA models on varying numbers of DGX H100 nodes and achieved near-linear scaling on the DGX H100 platform.

The following table and chart show the pretraining performance results on an NVIDIA DGX SuperPOD (up to 16 nodes x 8 H100 80GB GPUs per node) for VideoNeVA Llama2 Chat 13B model pretraining. Columns indicate the number of DGX H100 nodes used.

| VideoNeVA Llama2 Chat 13B        | 1   | 2     | 4     | 8     | 16     |
|----------------------------------|-----|-------|-------|-------|--------|
| Samples per Second               | 53  | 106   | 211   | 424   | 822    |
| Perfect Linear Scaling (Samples) | 37  | 107   | 214   | 428   | 857    |
| Speedup                          | 1x  | 1.99x | 3.94x | 7.93x | 15.36x |

Figure: VideoNeVA Llama2 Chat 13B NeMo Throughput (H100)
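As a minimal sketch of how the scaling figures above can be derived, the snippet below computes speedup and parallel efficiency from the measured samples-per-second values, taking the 1-node result as the baseline. Note that the table's published speedup column appears to use its own (unrounded) baseline throughput, so values computed from the rounded table entries may differ slightly.

```python
def scaling_stats(nodes, throughput):
    """Return (speedup, efficiency) per node count, relative to the first entry.

    Efficiency of 1.0 means perfect linear scaling.
    """
    base = throughput[0]
    return [(t / base, (t / base) / n) for n, t in zip(nodes, throughput)]

# Measured values from the table above (samples/s on 1-16 DGX H100 nodes).
nodes = [1, 2, 4, 8, 16]
samples_per_sec = [53, 106, 211, 424, 822]

for n, (speedup, eff) in zip(nodes, scaling_stats(nodes, samples_per_sec)):
    print(f"{n:>2} nodes: {speedup:5.2f}x speedup, {eff:6.1%} efficiency")
```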