Performance#

Training Quality Results#

Training Accuracy: NVIDIA DGX SuperPOD (8 x 8 x A100 80GB)

Here we show two responses generated by the NeVA 13B model.

[Image: mona_lisa.jpg]
prompt_1: <image>Describe this painting in detail.
response_1: The painting is a portrait of a woman, likely Leonardo da Vinci's famous painting, "Mona Lisa."
The woman is depicted with a smile on her face, which is a characteristic feature of the painting. The portrait
is painted in a realistic style, capturing the woman's facial expression and features in great detail. The
painting is set against a backdrop of a mountainous landscape, which adds depth and context to the scene. The
overall composition of the painting is visually appealing and showcases the artist's skill in capturing the
essence of the subject.

prompt_2: <image>Discuss the historical impact and the significance of this painting in the art world.
response_2: The painting, featuring a woman with a smiling expression and a red background, is a portrait of Mona
Lisa, created by the famous artist Leonardo da Vinci. Mona Lisa is widely considered one of the most iconic paintings
in the history of art, and it has had a significant impact on the art world. The painting's mysterious and enigmatic
smile has captivated viewers for centuries, and it has been the subject of numerous interpretations and artistic
reinterpretations.

The painting's historical significance lies in its representation of the Renaissance period,
which marked a significant shift in artistic styles and techniques. The painting's intricate details, such as the
subtle smile and the sfumato technique used to create a soft, hazy effect, showcase the mastery of Leonardo da Vinci
and the artistic advancements of the time.

Moreover, the painting's enduring popularity and influence have made
it a symbol of artistic excellence and a cultural icon. It has inspired countless artists, photographers, and
filmmakers, and continues to be a source of fascination and inspiration for people around the world.

Training Performance Results#

We measured the throughput of training and fine-tuning NeVA models on different numbers of DGX A100 nodes and DGX H100 nodes, and we achieved near-linear scaling on both platforms.

We compare the out-of-the-box performance of DGX H100 machines against DGX A100 machines running the same configuration. This is an apples-to-apples comparison: the relative performance of the two machine types is evaluated under equivalent conditions and configurations.

The tables and charts below show the performance results.
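In these tables, "Perfect Linear Scaling" is the single-node throughput multiplied by the node count, and "Speedup" is the measured multi-node throughput divided by the single-node throughput. The short Python sketch below spells out that arithmetic; the example values are the rounded A100 pretraining numbers from the first table, so the outputs differ slightly from the published columns, which are derived from unrounded measurements. This is an illustration only, not NeMo code.

```python
def perfect_linear_scaling(single_node_throughput: float, nodes: int) -> float:
    """Ideal throughput if throughput grew exactly linearly with node count."""
    return single_node_throughput * nodes


def speedup(multi_node_throughput: float, single_node_throughput: float) -> float:
    """Measured speedup relative to the single-node run."""
    return multi_node_throughput / single_node_throughput


# Example with the rounded A100 pretraining numbers from the first table below.
# The published columns are computed from unrounded measurements, so the results
# here differ slightly from the table (e.g. 37 x 16 = 592 vs. the published 598).
single_node = 37        # samples/sec on 1 node
sixteen_nodes = 554     # samples/sec on 16 nodes

print(perfect_linear_scaling(single_node, 16))          # 592
print(round(speedup(sixteen_nodes, single_node), 2))    # 14.97
```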

Pretraining Performance:

  • NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Pretraining)

Nodes                                   1        2        4        8       16

NeVA Llama2 Chat 13B
  Samples per Second                   37       74      145      285      554
  Perfect Linear Scaling (Samples)     37       75      150      299      598
  Speedup                              1x    1.99x    3.87x    7.63x   14.82x

[Figure: NeVA Llama2 Chat 13B NeMo Throughput (A100)]

  • NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Pretraining)

Nodes                                   1        2        4        8       16

NeVA Llama2 Chat 13B
  Samples per Second                   79      157      302      569     1124
  Perfect Linear Scaling (Samples)     79      158      315      631     1262
  Speedup                              1x    1.99x    3.83x    7.21x   14.26x

[Figure: NeVA Llama2 Chat 13B NeMo Throughput (H100)]

Fine-tuning Performance:

  • NVIDIA DGX SuperPODs (16 x 8 x A100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)

Nodes                                   1        2        4        8       16

NeVA Llama2 Chat 13B
  Samples per Second                   12       23       45       88      172
  Perfect Linear Scaling (Samples)     12       25       49       99      197
  Speedup                              1x    1.88x    3.68x    7.11x   13.94x

[Figure: NeVA Llama2 Chat 13B NeMo Throughput (A100) FT]

  • NVIDIA DGX SuperPODs (16 x 8 x H100 80GB for NeVA Llama2 Chat 13B Model Fine-tuning)

Nodes                                   1        2        4        8       16

NeVA Llama2 Chat 13B
  Samples per Second                   24       43       86      164      316
  Perfect Linear Scaling (Samples)     24       47       94      189      377
  Speedup                              1x    1.83x    3.66x    6.96x   13.41x

[Figure: NeVA Llama2 Chat 13B NeMo Throughput (H100) FT]
  • DGX A100 vs. DGX H100: A Comparative Analysis of NeVA Training

Model                  Stage         Nodes   Global Batch Size   Micro Batch Size   Precision   Global Batch / Sec (A100)   Global Batch / Sec (H100)   Speedup (x)
NeVA Llama2 Chat 7B    Pretraining   2       128                 32                 bf16 (O2)   1.30                        2.67                        2.1
NeVA Llama2 Chat 13B   Pretraining   2       128                 32                 bf16 (O2)   0.58                        1.21                        2.1
NeVA Llama2 Chat 7B    Fine-tuning   4       32                  4                  bf16 (O2)   2.94                        5.81                        2.0
NeVA Llama2 Chat 13B   Fine-tuning   4       32                  4                  bf16 (O2)   1.41                        2.68                        1.9

[Figure: NeVA Training Throughput Comparison]
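The comparison above reports throughput in global batches per second; multiplying by the global batch size recovers samples per second (the unit used in the scaling tables), and dividing the H100 rate by the A100 rate gives the speedup column. Below is a minimal sketch of that arithmetic, using the NeVA Llama2 Chat 13B pretraining row as the example; the values are copied from the table, and the snippet is an illustration only, not NeMo code.

```python
# NeVA Llama2 Chat 13B pretraining row from the comparison table above.
global_batch_size = 128
a100_batches_per_sec = 0.58
h100_batches_per_sec = 1.21

# Samples per second = global batches per second x global batch size.
a100_samples_per_sec = a100_batches_per_sec * global_batch_size   # ~74, matches the 2-node A100 scaling table
h100_samples_per_sec = h100_batches_per_sec * global_batch_size   # ~155, close to the 2-node H100 entry (157)

# H100-over-A100 speedup, as reported in the last column.
h100_speedup = h100_batches_per_sec / a100_batches_per_sec        # ~2.09, reported as 2.1

print(a100_samples_per_sec, h100_samples_per_sec, round(h100_speedup, 1))
```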