Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Performance
Training Quality Results
Training Accuracy: NVIDIA DGX SuperPOD (8 x 8 x A100 80GB)
Here we show two responses generated by the NeVA 13B model.
prompt_1: <image>Describe this painting in detail.
response_1: The painting is a portrait of a woman, likely Leonardo da Vinci's famous painting, "Mona Lisa." The woman is depicted with a smile on her face, which is a characteristic feature of the painting. The portrait is painted in a realistic style, capturing the woman's facial expression and features in great detail. The painting is set against a backdrop of a mountainous landscape, which adds depth and context to the scene. The overall composition of the painting is visually appealing and showcases the artist's skill in capturing the essence of the subject.
prompt_2: <image>Discuss the historical impact and the significance of this painting in the art world.
response_2: The painting, featuring a woman with a smiling expression and a red background, is a portrait of Mona Lisa, created by the famous artist Leonardo da Vinci. Mona Lisa is widely considered one of the most iconic paintings in the history of art, and it has had a significant impact on the art world. The painting's mysterious and enigmatic smile has captivated viewers for centuries, and it has been the subject of numerous interpretations and artistic reinterpretations.

The painting's historical significance lies in its representation of the Renaissance period, which marked a significant shift in artistic styles and techniques. The painting's intricate details, such as the subtle smile and the sfumato technique used to create a soft, hazy effect, showcase the mastery of Leonardo da Vinci and the artistic advancements of the time.

Moreover, the painting's enduring popularity and influence have made it a symbol of artistic excellence and a cultural icon. It has inspired countless artists, photographers, and filmmakers, and continues to be a source of fascination and inspiration for people around the world.
Training Performance Results
We measured the throughput of training and fine-tuning NeVA models on different numbers of DGX A100 nodes and DGX H100 nodes, and we achieved near-linear scaling on both platforms.
We compare the out-of-the-box performance of DGX H100 machines against DGX A100 machines running the same configuration. This is an apples-to-apples comparison, ensuring that we evaluate the relative performance of the two machine types under equivalent conditions and configurations.
The tables below show the performance results.
Pretraining Performance:
NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Pretraining)
| Model | Metric | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|---|
| NeVA Llama2 Chat 13B | Samples per Second | 37 | 74 | 145 | 285 | 554 |
| | Perfect Linear Scaling (Samples) | 37 | 75 | 150 | 299 | 598 |
| | Speedup | 1x | 1.99x | 3.87x | 7.63x | 14.82x |
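The Speedup row is simply measured throughput divided by the single-node baseline, and dividing speedup by node count gives scaling efficiency. The sketch below (a hypothetical helper, not part of NeMo) reproduces these ratios from the A100 pretraining numbers above; the published speedups were computed from unrounded throughput, so the ratios here track but do not exactly equal the table's values.

```python
# Derive speedup and scaling efficiency from measured throughput.
# Illustrative helper only; not part of the NeMo codebase.

def scaling_metrics(samples_per_sec, nodes):
    """Return (speedup, efficiency) per node count, relative to 1 node."""
    base = samples_per_sec[0]
    speedup = [s / base for s in samples_per_sec]
    # Efficiency = achieved speedup / ideal speedup (ideal = node count)
    efficiency = [su / n for su, n in zip(speedup, nodes)]
    return speedup, efficiency

nodes = [1, 2, 4, 8, 16]
measured = [37, 74, 145, 285, 554]  # samples/sec from the A100 table

speedup, eff = scaling_metrics(measured, nodes)
print([round(s, 2) for s in speedup])  # tracks the table's Speedup row
print([round(e, 2) for e in eff])      # fraction of perfect linear scaling
```

Efficiency staying near 1.0 out to 16 nodes is what "near-linear scaling" means in practice.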
NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Pretraining)
| Model | Metric | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|---|
| NeVA Llama2 Chat 13B | Samples per Second | 79 | 157 | 302 | 569 | 1124 |
| | Perfect Linear Scaling (Samples) | 79 | 158 | 315 | 631 | 1262 |
| | Speedup | 1x | 1.99x | 3.83x | 7.21x | 14.26x |
Fine-tuning Performance:
NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)
| Model | Metric | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|---|
| NeVA Llama2 Chat 13B | Samples per Second | 12 | 23 | 45 | 88 | 172 |
| | Perfect Linear Scaling (Samples) | 12 | 25 | 49 | 99 | 197 |
| | Speedup | 1x | 1.88x | 3.68x | 7.11x | 13.94x |
NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)
| Model | Metric | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|---|
| NeVA Llama2 Chat 13B | Samples per Second | 24 | 43 | 86 | 164 | 316 |
| | Perfect Linear Scaling (Samples) | 24 | 47 | 94 | 189 | 377 |
| | Speedup | 1x | 1.83x | 3.66x | 6.96x | 13.41x |
DGX A100 vs. DGX H100: A Comparative Analysis of NeVA Training
| Model | Stage | Nodes | Global Batch Size | Micro Batch Size | Precision | Global Batch / Sec (A100) | Global Batch / Sec (H100) | Speedup (x) |
|---|---|---|---|---|---|---|---|---|
| NeVA Llama2 Chat 7B | Pretraining | 2 | 128 | 32 | bf16 (O2) | 1.30 | 2.67 | 2.1 |
| NeVA Llama2 Chat 13B | Pretraining | 2 | 128 | 32 | bf16 (O2) | 0.58 | 1.21 | 2.1 |
| NeVA Llama2 Chat 7B | Fine-tuning | 4 | 32 | 4 | bf16 (O2) | 2.94 | 5.81 | 2.0 |
| NeVA Llama2 Chat 13B | Fine-tuning | 4 | 32 | 4 | bf16 (O2) | 1.41 | 2.68 | 1.9 |
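The comparison table reports throughput in global batches per second, so multiplying by the global batch size recovers samples per second, and the ratio of the H100 and A100 figures reproduces the Speedup column. A minimal sketch of that arithmetic, using the 7B pretraining row as an example (helper function is illustrative, not a NeMo API):

```python
# Convert "Global Batch / Sec" into samples/sec and check the H100 speedup.
# Illustrative arithmetic only, based on the 7B pretraining row above.

def samples_per_sec(global_batch_size, global_batches_per_sec):
    """Throughput in samples/sec = batch size x batches completed per second."""
    return global_batch_size * global_batches_per_sec

# NeVA Llama2 Chat 7B pretraining: GBS 128, 1.30 (A100) vs. 2.67 (H100)
a100 = samples_per_sec(128, 1.30)
h100 = samples_per_sec(128, 2.67)
print(round(h100 / a100, 1))  # matches the table's 2.1x speedup
```

The same conversion applied to any row lets the per-node and comparison tables be cross-checked against each other.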