Performance

Training Accuracy: NVIDIA DGX SuperPOD (8 x 8 x A100 80GB)

Here we show two responses generated by the NeVA 13B model.

(Input image: mona_lisa.jpg)

prompt_1: <image>Describe this painting in detail.

response_1: The painting is a portrait of a woman, likely Leonardo da Vinci's famous painting, "Mona Lisa." The woman is depicted with a smile on her face, which is a characteristic feature of the painting. The portrait is painted in a realistic style, capturing the woman's facial expression and features in great detail. The painting is set against a backdrop of a mountainous landscape, which adds depth and context to the scene. The overall composition of the painting is visually appealing and showcases the artist's skill in capturing the essence of the subject.

prompt_2: <image>Discuss the historical impact and the significance of this painting in the art world.

response_2: The painting, featuring a woman with a smiling expression and a red background, is a portrait of Mona Lisa, created by the famous artist Leonardo da Vinci. Mona Lisa is widely considered one of the most iconic paintings in the history of art, and it has had a significant impact on the art world. The painting's mysterious and enigmatic smile has captivated viewers for centuries, and it has been the subject of numerous interpretations and artistic reinterpretations. The painting's historical significance lies in its representation of the Renaissance period, which marked a significant shift in artistic styles and techniques. The painting's intricate details, such as the subtle smile and the sfumato technique used to create a soft, hazy effect, showcase the mastery of Leonardo da Vinci and the artistic advancements of the time. Moreover, the painting's enduring popularity and influence have made it a symbol of artistic excellence and a cultural icon. It has inspired countless artists, photographers, and filmmakers, and continues to be a source of fascination and inspiration for people around the world.
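The `<image>` token in the prompts above marks where the encoded image features are spliced into the text sequence. A minimal sketch of assembling such prompts (the token name and helper function here are illustrative, not the NeMo API):

```python
# Build NeVA-style prompts: an <image> placeholder marks where the
# encoded image features are inserted into the token stream.
IMAGE_TOKEN = "<image>"  # illustrative; the actual special token is model-specific

def build_prompt(question: str, image_token: str = IMAGE_TOKEN) -> str:
    """Prefix a user question with the image placeholder token."""
    return f"{image_token}{question}"

prompts = [
    build_prompt("Describe this painting in detail."),
    build_prompt("Discuss the historical impact and the significance of "
                 "this painting in the art world."),
]
```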

We measured the throughput of training and fine-tuning NeVA models on different numbers of DGX A100 nodes and DGX H100 nodes, and achieved near-linear scaling on both platforms.

We compare the out-of-the-box performance of DGX H100 machines against DGX A100 machines running the same configuration. This is an apples-to-apples comparison, evaluating the relative performance of the two machine types under equivalent conditions and configurations.

The tables and charts below show the performance results.

Pretraining Performance:

  • NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Pretraining)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    37    74     145    285    554
    Perfect Linear Scaling (Samples)           37    75     150    299    598
    Speedup                                    1x    1.99x  3.87x  7.63x  14.82x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (A100).svg)
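The speedup and efficiency implied by a throughput table like the one above can be recomputed from the samples-per-second row alone. A small sketch using the A100 pretraining numbers (note the table's "Perfect Linear Scaling" row was evidently derived from the unrounded one-node rate, so extrapolating from the rounded value 37 differs slightly):

```python
# Scaling check for the A100 pretraining table:
# speedup     = measured rate / 1-node rate
# efficiency  = measured rate / perfect-linear rate (1-node rate * node count)
nodes = [1, 2, 4, 8, 16]
samples_per_sec = [37, 74, 145, 285, 554]  # measured, from the table above

base = samples_per_sec[0]
for n, s in zip(nodes, samples_per_sec):
    perfect = base * n          # perfect linear scaling from the rounded 1-node rate
    speedup = s / base
    efficiency = s / perfect
    print(f"{n:>2} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```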

  • NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Pretraining)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    79    157    302    569    1124
    Perfect Linear Scaling (Samples)           79    158    315    631    1262
    Speedup                                    1x    1.99x  3.83x  7.21x  14.26x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (H100).svg)

Fine-tuning Performance:

  • NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    12    23     45     88     172
    Perfect Linear Scaling (Samples)           12    25     49     99     197
    Speedup                                    1x    1.88x  3.68x  7.11x  13.94x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (A100) FT.svg)

  • NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)

    Nodes                                      1     2      4      8      16
    -----------------------------------------  ----  -----  -----  -----  ------
    NeVA Llama2 Chat 13B Samples per Second    24    43     86     164    316
    Perfect Linear Scaling (Samples)           24    47     94     189    377
    Speedup                                    1x    1.83x  3.66x  6.96x  13.41x

    (Figure: NeVA Llama2 Chat 13B NeMo Throughput (H100) FT.svg)

  • DGX A100 vs. DGX H100: A Comparative Analysis of NeVA Training

    Model                 Stage        Nodes  Global Batch Size  Micro Batch Size  Precision  Global Batch/Sec (A100)  Global Batch/Sec (H100)  Speedup (x)
    --------------------  -----------  -----  -----------------  ----------------  ---------  -----------------------  -----------------------  -----------
    NeVA Llama2 Chat 7B   Pretraining  2      128                32                bf16 (O2)  1.30                     2.67                     2.1
    NeVA Llama2 Chat 13B  Pretraining  2      128                32                bf16 (O2)  0.58                     1.21                     2.1
    NeVA Llama2 Chat 7B   Fine-tuning  4      32                 4                 bf16 (O2)  2.94                     5.81                     2.0
    NeVA Llama2 Chat 13B  Fine-tuning  4      32                 4                 bf16 (O2)  1.41                     2.68                     1.9

    (Figure: NeVA Training Throughput Comparison.svg)
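The speedup column follows directly from the two throughput columns (H100 rate divided by A100 rate), and per-sample throughput is the global batch size times global batches per second. A quick check, using the numbers from the comparison table above:

```python
# Rows: (model, stage, global_batch_size, a100_gb_per_sec, h100_gb_per_sec)
rows = [
    ("NeVA Llama2 Chat 7B",  "Pretraining", 128, 1.30, 2.67),
    ("NeVA Llama2 Chat 13B", "Pretraining", 128, 0.58, 1.21),
    ("NeVA Llama2 Chat 7B",  "Fine-tuning",  32, 2.94, 5.81),
    ("NeVA Llama2 Chat 13B", "Fine-tuning",  32, 1.41, 2.68),
]

for model, stage, gbs, a100, h100 in rows:
    speedup = h100 / a100                # H100-over-A100 ratio, as in the table
    samples_per_sec_h100 = gbs * h100    # samples/sec = global batch size * batches/sec
    print(f"{model} {stage}: {speedup:.1f}x, "
          f"{samples_per_sec_h100:.0f} samples/s on H100")
```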

© Copyright 2023-2024, NVIDIA. Last updated on Apr 25, 2024.