Parallelisms

NeMo Megatron supports several types of parallelism, which can be mixed together arbitrarily:

Distributed Data Parallelism (DDP) creates identical copies of the model on multiple GPUs; each replica processes a different portion of the input data, and gradients are synchronized across replicas after each backward pass.

[Figure: ddp.gif]
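As a minimal sketch of the idea, here is data parallelism with plain PyTorch's `DistributedDataParallel` rather than NeMo's own training loop; it assumes the script is launched with `torchrun` so the rank environment variables are set.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda()   # identical copy on every rank
model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced automatically

x = torch.randn(8, 1024, device="cuda")      # each rank sees a different data shard
loss = model(x).sum()
loss.backward()                              # backward triggers gradient synchronization
```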

With Tensor Parallelism (TP), a tensor is split into non-overlapping pieces, and the different parts are distributed to and processed on separate GPUs.

[Figure: tp.gif]
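The toy single-process sketch below illustrates the column-parallel split of a linear layer's weight matrix; it is not NeMo's actual Megatron-Core implementation, which runs each shard on its own GPU and reassembles the result with collective communication.

```python
import torch

world_size = 2
W = torch.randn(1024, 4096)                # full weight matrix, conceptually
shards = W.chunk(world_size, dim=1)        # non-overlapping column slices, one per GPU

x = torch.randn(8, 1024)
partials = [x @ shard for shard in shards] # each GPU computes its slice of the output
y = torch.cat(partials, dim=1)             # an all-gather reassembles the full output
assert torch.allclose(y, x @ W, atol=1e-5)
```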

With Pipeline Parallelism (PP), consecutive chunks of layers are assigned to different GPUs.

[Figure: pp.gif]
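A minimal sketch of the placement idea, assuming two GPUs: each stage holds a contiguous chunk of layers, and activations flow from one stage to the next. Real pipeline schedules (e.g. 1F1B) also interleave micro-batches to keep all stages busy, which is omitted here.

```python
import torch
import torch.nn as nn

stage0 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda:0")
stage1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda:1")

x = torch.randn(8, 1024, device="cuda:0")
h = stage0(x)                 # first chunk of layers runs on GPU 0
y = stage1(h.to("cuda:1"))    # activation moves to GPU 1, second chunk runs there
```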

Expert Parallelism (EP) distributes the experts of a Mixture-of-Experts (MoE) model across GPUs.

[Figure: ep.png]
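The toy sketch below shows only the placement idea, assuming two GPUs: each expert (an independent feed-forward block) lives on a different device, and a router picks an expert per token. Production MoE layers batch tokens per expert and exchange them with all-to-all communication, which is not shown here.

```python
import torch
import torch.nn as nn

experts = [nn.Linear(1024, 1024).to(f"cuda:{i}") for i in range(2)]
router = nn.Linear(1024, len(experts))     # scores each token for each expert

tokens = torch.randn(8, 1024)
choice = router(tokens).argmax(dim=-1)     # top-1 expert per token
out = torch.empty_like(tokens)
for i, expert in enumerate(experts):
    mask = choice == i                     # tokens routed to expert i
    if mask.any():
        out[mask] = expert(tokens[mask].to(f"cuda:{i}")).cpu()
```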

When reading and modifying NeMo Megatron code, you will encounter the following terms (summarized in the figure below).

[Figure: pnom.gif]
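As a hedged sketch of how these terms typically appear in practice, the snippet below sets the parallelism sizes in an OmegaConf config using the NeMo/Megatron naming convention (`tensor_model_parallel_size`, `pipeline_model_parallel_size`; check your NeMo version's config schema for the exact keys). The model-parallel sizes must divide the world size, and the remaining factor becomes the data-parallel size.

```python
from omegaconf import OmegaConf

# Parallelism sizes as they commonly appear in a NeMo Megatron model config.
cfg = OmegaConf.create({
    "tensor_model_parallel_size": 2,    # TP: split individual tensors across 2 GPUs
    "pipeline_model_parallel_size": 4,  # PP: split the layer stack into 4 stages
})

world_size = 16  # total number of GPUs in the job (assumed for this example)
tp = cfg.tensor_model_parallel_size
pp = cfg.pipeline_model_parallel_size
print("data-parallel replicas:", world_size // (tp * pp))  # -> 2
```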
