> **Important:** NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
# Qwen2
Released in 2024, Qwen2 is a series of large language models from Alibaba Group, trained on data in 29 languages, including English and Chinese. Qwen2 uses grouped-query attention (GQA) across all model sizes, which yields faster inference and lower memory usage. All base language models are pretrained with a context length of 32K tokens, and the Qwen2-7B-Instruct and Qwen2-72B-Instruct models can handle context lengths of up to 128K tokens. More information is available in the blog post “Hello Qwen2”.
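As a minimal, self-contained illustration of the GQA mechanism mentioned above, here is a PyTorch sketch showing how several query heads share one key/value head, shrinking the KV cache. All sizes and names are illustrative; this is not the Qwen2 or NeMo implementation (which also applies rotary embeddings and other details omitted here).

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """Toy GQA: n_heads query heads share n_kv_heads key/value heads."""
    b, s, d = x.shape
    head_dim = d // n_heads
    q = (x @ wq).view(b, s, n_heads, head_dim).transpose(1, 2)     # (b, H,   s, hd)
    k = (x @ wk).view(b, s, n_kv_heads, head_dim).transpose(1, 2)  # (b, Hkv, s, hd)
    v = (x @ wv).view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    # Each KV head serves a group of query heads: repeat K/V to match Q.
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, s, d)

# Toy usage: 8 query heads share 2 KV heads -> 4x smaller KV cache.
d_model, n_heads, n_kv_heads = 64, 8, 2
x = torch.randn(1, 16, d_model)
wq = torch.randn(d_model, d_model)
wk = torch.randn(d_model, d_model * n_kv_heads // n_heads)
wv = torch.randn(d_model, d_model * n_kv_heads // n_heads)
print(grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads).shape)
# torch.Size([1, 16, 64])
```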
| Feature | Status |
|---|---|
| Data parallelism | ✓ |
| Tensor parallelism | ✓ |
| Pipeline parallelism | ✓ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✓ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✗ |
| BF16 | ✓ |
| TransformerEngine/FP8 | ✗ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | N/A |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Base Command Platform | ✓ |
| Distributed data preprocessing | ✓ |
| NVfuser | ✗ |
| P-Tuning and Prompt Tuning | ✓ |
| IA3 and Adapter learning | N/A |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | N/A |
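To show how the supported parallelism options in the table combine in practice, here is a hedged sketch of a NeMo 2.0-style setup. The class and argument names below (`MegatronStrategy`, `MegatronMixedPrecision`, `Qwen2Model`, `Qwen2Config7B`) are assumptions about the NeMo 2.0 API; verify them against the NeMo 2.0 overview and the dev container before use.

```python
# Hedged sketch: combining parallelism features from the table above in a
# NeMo 2.0-style run. The API names used here are assumptions, not a
# confirmed NeMo interface -- check the NeMo 2.0 overview.
from nemo import lightning as nl
from nemo.collections import llm

strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=2,    # tensor parallelism
    pipeline_model_parallel_size=2,  # pipeline parallelism
    sequence_parallel=True,          # sequence parallelism
)

trainer = nl.Trainer(
    accelerator="gpu",
    devices=8,                       # multi-GPU
    num_nodes=2,                     # multi-node
    strategy=strategy,
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),  # BF16
)

# Assumed config/model classes for Qwen2 7B in nemo.collections.llm.
model = llm.Qwen2Model(llm.Qwen2Config7B())
```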