Important

NeMo 2.0 is an experimental feature and is currently available only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Qwen2

Released in 2024, Qwen2 is a series of large language models from Alibaba Group, trained on data in 29 languages, including English and Chinese. Qwen2 uses grouped-query attention (GQA) across all model sizes, which yields faster inference and lower memory usage. All base models are pretrained on data with a context length of 32K tokens, and the Qwen2-7B-Instruct and Qwen2-72B-Instruct models can handle context lengths of up to 128K tokens. More information is available in the blog post “Hello Qwen2”.
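As an illustration of getting started inside the dev container, the sketch below converts a Hugging Face Qwen2-7B checkpoint into NeMo 2.0 format. It is a minimal sketch, assuming that the llm collection exposes Qwen2Config7B, Qwen2Model, and import_ckpt as shown; adjust the names to match your installed NeMo version.

```python
# Hedged sketch: converting Hugging Face Qwen2-7B weights into NeMo 2.0 format.
# Assumes the NeMo 2.0 dev container (nvcr.io/nvidia/nemo:dev) and that the llm
# collection exposes Qwen2Config7B, Qwen2Model, and import_ckpt as used below.
from nemo.collections import llm

if __name__ == "__main__":
    # Build a Qwen2-7B model description; the weights are filled in by import_ckpt.
    model = llm.Qwen2Model(llm.Qwen2Config7B())

    # Pull the checkpoint from the Hugging Face Hub and write it out in NeMo 2.0 format.
    llm.import_ckpt(model=model, source="hf://Qwen/Qwen2-7B")
```

The resulting checkpoint can then be used for fine-tuning or evaluation with the NeMo 2.0 recipes.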

Feature                                      Status
---------------------------------------------------
Data parallelism                             ✓
Tensor parallelism                           ✓
Pipeline parallelism                         ✓
Interleaved Pipeline Parallelism Schedule    N/A
Sequence parallelism                         ✓
Selective activation checkpointing           ✓
Gradient checkpointing                       ✓
Partial gradient checkpointing               ✓
FP32/TF32                                    ✓
AMP/FP16                                     ✓
BF16                                         ✓
TransformerEngine/FP8                        ✓
Multi-GPU                                    ✓
Multi-Node                                   ✓
Inference                                    N/A
Slurm                                        ✓
Base Command Manager                         ✓
Base Command Platform                        ✓
Distributed data preprocessing               ✓
NVfuser                                      ✓
P-Tuning and Prompt Tuning                   ✓
IA3 and Adapter learning                     N/A
Distributed Optimizer                        ✓
Distributed Checkpoint                       ✓
Fully Sharded Data Parallel                  N/A
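The parallelism and precision features listed in the table are configured on the NeMo 2.0 trainer. The sketch below is illustrative only, assuming that nemo.lightning exposes MegatronStrategy, MegatronMixedPrecision, and Trainer with the parameter names shown; the sizes are placeholders, not recommended settings for Qwen2.

```python
# Hedged sketch: mapping the table's parallelism and precision features onto a
# NeMo 2.0 trainer. Assumes nemo.lightning provides MegatronStrategy,
# MegatronMixedPrecision, and Trainer with these parameter names.
from nemo import lightning as nl

# Tensor, pipeline, and sequence parallelism are set on the Megatron strategy.
strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=2,    # tensor parallelism
    pipeline_model_parallel_size=2,  # pipeline parallelism
    sequence_parallel=True,          # sequence parallelism (requires TP > 1)
)

# BF16 mixed precision and the multi-GPU / multi-node layout are set on the trainer.
trainer = nl.Trainer(
    devices=8,          # GPUs per node (multi-GPU)
    num_nodes=1,        # increase for multi-node runs
    accelerator="gpu",
    strategy=strategy,
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
)
```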