Performance

Training Accuracy: NVIDIA DGX SuperPOD (8 x 8 x A100 80GB)

Here we show two responses generated by the NeVA 13B model.

(Input image: mona_lisa.jpg)

prompt_1: <image>Describe this painting in detail.

response_1: The painting is a portrait of a woman, likely Leonardo da Vinci's famous painting, "Mona Lisa." The woman is depicted with a smile on her face, which is a characteristic feature of the painting. The portrait is painted in a realistic style, capturing the woman's facial expression and features in great detail. The painting is set against a backdrop of a mountainous landscape, which adds depth and context to the scene. The overall composition of the painting is visually appealing and showcases the artist's skill in capturing the essence of the subject.

prompt_2: <image>Discuss the historical impact and the significance of this painting in the art world.

response_2: The painting, featuring a woman with a smiling expression and a red background, is a portrait of Mona Lisa, created by the famous artist Leonardo da Vinci. Mona Lisa is widely considered one of the most iconic paintings in the history of art, and it has had a significant impact on the art world. The painting's mysterious and enigmatic smile has captivated viewers for centuries, and it has been the subject of numerous interpretations and artistic reinterpretations. The painting's historical significance lies in its representation of the Renaissance period, which marked a significant shift in artistic styles and techniques. The painting's intricate details, such as the subtle smile and the sfumato technique used to create a soft, hazy effect, showcase the mastery of Leonardo da Vinci and the artistic advancements of the time. Moreover, the painting's enduring popularity and influence have made it a symbol of artistic excellence and a cultural icon. It has inspired countless artists, photographers, and filmmakers, and continues to be a source of fascination and inspiration for people around the world.
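The `<image>` token in the prompts above marks where the encoded image features are spliced into the text sequence. A minimal sketch of assembling such prompts (the token name and helper function here are illustrative, not the NeMo API):

```python
# Build NeVA-style prompts: an <image> placeholder marks where the
# encoded image features are inserted into the token stream.
IMAGE_TOKEN = "<image>"  # illustrative; the actual special token is model-specific

def build_prompt(question: str, image_token: str = IMAGE_TOKEN) -> str:
    """Prefix a user question with the image placeholder token."""
    return f"{image_token}{question}"

prompts = [
    build_prompt("Describe this painting in detail."),
    build_prompt("Discuss the historical impact and the significance of "
                 "this painting in the art world."),
]
```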

We measured the throughput of training and fine-tuning NeVA models on different numbers of DGX A100 nodes and DGX H100 nodes, and achieved near-linear scaling on both platforms.

We compare the out-of-the-box performance of DGX H100 machines against DGX A100 machines running the same configuration. This is an apples-to-apples comparison, evaluating the relative performance of the two machine types under equivalent conditions and configurations.

The tables and charts below show the performance results.

Pretraining Performance:

  • NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Pretraining)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    37    74     145    285    554
    Perfect Linear Scaling (Samples)           37    75     150    299    598
    Speedup                                    1x    1.99x  3.87x  7.63x  14.82x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (A100).svg)
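The speedup and efficiency implied by a throughput table like the one above can be recomputed from the samples-per-second row alone. A small sketch using the A100 pretraining numbers (note the table's "Perfect Linear Scaling" row was evidently derived from the unrounded one-node rate, so extrapolating from the rounded value 37 differs slightly):

```python
# Scaling check for the A100 pretraining table:
# speedup     = measured rate / 1-node rate
# efficiency  = measured rate / perfect-linear rate (1-node rate * node count)
nodes = [1, 2, 4, 8, 16]
samples_per_sec = [37, 74, 145, 285, 554]  # measured, from the table above

base = samples_per_sec[0]
for n, s in zip(nodes, samples_per_sec):
    perfect = base * n          # perfect linear scaling from the rounded 1-node rate
    speedup = s / base
    efficiency = s / perfect
    print(f"{n:>2} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```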

  • NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Pretraining)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    79    157    302    569    1124
    Perfect Linear Scaling (Samples)           79    158    315    631    1262
    Speedup                                    1x    1.99x  3.83x  7.21x  14.26x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (H100).svg)

Fine-tuning Performance:

  • NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    12    23     45     88     172
    Perfect Linear Scaling (Samples)           12    25     49     99     197
    Speedup                                    1x    1.88x  3.68x  7.11x  13.94x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (A100) FT.svg)

  • NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    24    43     86     164    316
    Perfect Linear Scaling (Samples)           24    47     94     189    377
    Speedup                                    1x    1.83x  3.66x  6.96x  13.41x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (H100) FT.svg)

  • DGX A100 vs. DGX H100: A Comparative Analysis of NeVA Training

    Model                 Stage        Nodes  Global Batch Size  Micro Batch Size  Precision  Global Batch/Sec (A100)  Global Batch/Sec (H100)  Speedup (x)
    --------------------  -----------  -----  -----------------  ----------------  ---------  -----------------------  -----------------------  -----------
    NeVA Llama2 Chat 7B   Pretraining  2      128                32                bf16 (O2)  1.30                     2.67                     2.1
    NeVA Llama2 Chat 13B  Pretraining  2      128                32                bf16 (O2)  0.58                     1.21                     2.1
    NeVA Llama2 Chat 7B   Fine-tuning  4      32                 4                 bf16 (O2)  2.94                     5.81                     2.0
    NeVA Llama2 Chat 13B  Fine-tuning  4      32                 4                 bf16 (O2)  1.41                     2.68                     1.9

    (Figure: NeVA Training Throughput Comparison.svg)
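The speedup column follows directly from the two throughput columns (H100 rate divided by A100 rate), and per-sample throughput is the global batch size times global batches per second. A quick check, using the numbers from the comparison table above:

```python
# Rows: (model, stage, global_batch_size, a100_gb_per_sec, h100_gb_per_sec)
rows = [
    ("NeVA Llama2 Chat 7B",  "Pretraining", 128, 1.30, 2.67),
    ("NeVA Llama2 Chat 13B", "Pretraining", 128, 0.58, 1.21),
    ("NeVA Llama2 Chat 7B",  "Fine-tuning",  32, 2.94, 5.81),
    ("NeVA Llama2 Chat 13B", "Fine-tuning",  32, 1.41, 2.68),
]

for model, stage, gbs, a100, h100 in rows:
    speedup = h100 / a100                # H100-over-A100 ratio, as in the table
    samples_per_sec_h100 = gbs * h100    # samples/sec = global batch size * batches/sec
    print(f"{model} {stage}: {speedup:.1f}x, "
          f"{samples_per_sec_h100:.0f} samples/s on H100")
```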

© Copyright 2023-2024, NVIDIA. Last updated on Apr 25, 2024.