Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Performance

Training Performance Results

We measured the throughput of training VideoNeVA models on varying numbers of DGX H100 nodes and achieved near-linear scaling on the DGX H100 platform.

The following table and chart show the pretraining performance results for VideoNeVA Llama2 Chat 13B on an NVIDIA DGX SuperPOD (16 nodes × 8 × H100 80 GB).

| VideoNeVA Llama2 Chat 13B | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|
| Samples per Second | 53 | 106 | 211 | 424 | 822 |
| Perfect Linear Scaling (Samples) | 37 | 107 | 214 | 428 | 857 |
| Speedup | 1x | 1.99x | 3.94x | 7.93x | 15.36x |

Figure: VideoNeVA Llama2 Chat 13B NeMo Throughput (H100)
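
The speedup and perfect-linear-scaling rows in the table follow directly from the measured samples per second. As a rough sanity check, the Python sketch below (not part of the NeMo codebase) reproduces that arithmetic; it uses the rounded integers from the table, so the computed ratios differ slightly from the published speedups, which were presumably derived from unrounded measurements.

```python
# Illustrative sketch: derive ideal scaling, speedup, and scaling efficiency
# from the measured throughput values shown in the table above.

nodes = [1, 2, 4, 8, 16]
samples_per_sec = [53, 106, 211, 424, 822]  # measured throughput per node count

baseline = samples_per_sec[0]  # 1-node throughput used as the reference

for n, sps in zip(nodes, samples_per_sec):
    ideal = baseline * n          # perfect linear scaling: baseline x node count
    speedup = sps / baseline      # measured speedup over the 1-node run
    efficiency = sps / ideal      # scaling efficiency (1.0 = perfectly linear)
    print(f"{n:>2} nodes: {sps:>4} samples/s, ideal {ideal:>4}, "
          f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```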