We measured the throughput of training VideoNeVA models on different numbers of DGX H100 nodes and achieved near-linear scaling.
The following table shows the pretraining performance results on NVIDIA DGX SuperPODs (16 nodes x 8 H100 80GB GPUs) for VideoNeVA Llama2 Chat 13B model pretraining.
| Model | Metric | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|---|
| VideoNeVA Llama2 Chat 13B | Samples per Second | 53 | 106 | 211 | 424 | 822 |
| | Perfect Linear Scaling (Samples) | 53 | 107 | 214 | 428 | 857 |
| | Speedup | 1x | 1.99x | 3.94x | 7.93x | 15.36x |
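The scaling metrics above can be derived directly from the measured throughput. A minimal sketch of that calculation (the function name is hypothetical; the table's ideal-scaling row appears to use an unrounded single-node baseline, so its exact figures may differ slightly from this rounded version):

```python
# Hypothetical helper: derive ideal-scaling and speedup rows from
# measured samples/sec at each node count.
def scaling_metrics(nodes, samples_per_sec):
    # Per-node throughput at the smallest measured scale.
    base = samples_per_sec[0] / nodes[0]
    # Perfect linear scaling: base throughput multiplied by node count.
    ideal = [round(base * n) for n in nodes]
    # Speedup relative to the single-node measurement.
    speedup = [s / samples_per_sec[0] for s in samples_per_sec]
    return ideal, speedup

nodes = [1, 2, 4, 8, 16]
measured = [53, 106, 211, 424, 822]
ideal, speedup = scaling_metrics(nodes, measured)
```

Comparing `measured` against `ideal` at each node count gives the scaling efficiency; values close to 1.0 indicate near-linear scaling.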