Custom Pipeline Model Parallel Layout#

This is an experimental feature and may be changed.

--pipeline-model-parallel-layout takes a string that defines pipeline parallel partitioning. Use it to balance partitioning for an imbalanced model. For example, to partition a DeepSeek-V3-style stack (61 decoder layers and one MTP layer) with PP16 and VPP2, pass arguments similar to the following:

--pipeline-model-parallel-size 16
--pipeline-model-parallel-layout "Et*3|(tt|)*29,m|L"

The table below shows one possible rank map for that layout:

PP \ VPP rank

0

1

0

embedding + 3 × decoder

2 × decoder

1~13

2 × decoder

2 × decoder

14

2 × decoder

mtp

15

2 × decoder

loss

In the layout string, stages are split by |. Replicated stages or layers use multiplication (for example, t*3). Commas are optional for readability. Symbols:

  • E: embedding layer

  • t: transformer decoder layer

  • m: MTP layer

  • L: loss calculation layer

Note: Empty stages are allowed, for example E||t|L (the second stage is empty).