core.pipeline_parallel.p2p_communication#

Module Contents#

Classes#

P2PCommunicator

P2P (Point-to-Point) Communicator for pipeline parallelism.

Functions#

_batched_p2p_ops

_p2p_ops

is_single_shape

Check if the input is a single shape.

Data#

API#

core.pipeline_parallel.p2p_communication.Shape#

None

core.pipeline_parallel.p2p_communication._batched_p2p_ops(
*,
tensor_send_prev: Optional[torch.Tensor],
tensor_recv_prev: Optional[torch.Tensor],
tensor_send_next: Optional[torch.Tensor],
tensor_recv_next: Optional[torch.Tensor],
group: torch.distributed.ProcessGroup,
prev_pipeline_rank: int,
next_pipeline_rank: int,
)#
core.pipeline_parallel.p2p_communication._p2p_ops(
*,
tensor_send_prev: Optional[torch.Tensor],
tensor_recv_prev: Optional[torch.Tensor],
tensor_send_next: Optional[torch.Tensor],
tensor_recv_next: Optional[torch.Tensor],
group: torch.distributed.ProcessGroup,
prev_pipeline_rank: int,
next_pipeline_rank: int,
)#
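
Both helpers post the same four optional transfers with the previous and next pipeline ranks; the batched variant groups them into a single torch.distributed.batch_isend_irecv call, while the non-batched variant issues individual isend/irecv requests. The sketch below illustrates the batched pattern with plain torch.distributed primitives; it is illustrative only, not the library's exact implementation.

    from typing import Optional

    import torch
    import torch.distributed as dist

    def batched_p2p_sketch(
        *,
        tensor_send_prev: Optional[torch.Tensor],
        tensor_recv_prev: Optional[torch.Tensor],
        tensor_send_next: Optional[torch.Tensor],
        tensor_recv_next: Optional[torch.Tensor],
        group: dist.ProcessGroup,
        prev_pipeline_rank: int,
        next_pipeline_rank: int,
    ) -> list:
        # Collect every requested transfer as a P2POp, then launch them together
        # so the backend can schedule the sends and receives as one group.
        ops = []
        if tensor_send_prev is not None:
            ops.append(dist.P2POp(dist.isend, tensor_send_prev, prev_pipeline_rank, group))
        if tensor_recv_prev is not None:
            ops.append(dist.P2POp(dist.irecv, tensor_recv_prev, prev_pipeline_rank, group))
        if tensor_send_next is not None:
            ops.append(dist.P2POp(dist.isend, tensor_send_next, next_pipeline_rank, group))
        if tensor_recv_next is not None:
            ops.append(dist.P2POp(dist.irecv, tensor_recv_next, next_pipeline_rank, group))
        return dist.batch_isend_irecv(ops) if ops else []
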
core.pipeline_parallel.p2p_communication.is_single_shape(x) → bool#

Check if the input is a single shape.
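
The send/recv methods below accept either one shape or a list of shapes, and this helper is what distinguishes the two cases. A rough, illustrative equivalent (the library's exact rules may differ):

    import torch

    def is_single_shape_sketch(x) -> bool:
        # A torch.Size, or a flat list/tuple of ints, counts as one shape;
        # anything else (e.g. a list of such objects) is treated as several shapes.
        if isinstance(x, torch.Size):
            return True
        if isinstance(x, (list, tuple)):
            return all(isinstance(dim, int) for dim in x)
        return False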

class core.pipeline_parallel.p2p_communication.P2PCommunicator(
pp_group: torch.distributed.ProcessGroup,
config: megatron.core.model_parallel_config.ModelParallelConfig,
)#

P2P (Point-to-Point) Communicator for pipeline parallelism.

This class handles communication between consecutive pipeline stages by managing the tensor exchanges (activations sent forward, gradients sent backward) that the pipeline schedules require.

Initialization
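
Construction only needs the pipeline-parallel process group and a ModelParallelConfig. A minimal sketch, assuming torch.distributed and Megatron's model-parallel state are already initialized; the specific config fields shown (pipeline_dtype, variable_seq_lengths, batch_p2p_comm) are assumptions and may differ between versions:

    import torch
    from megatron.core import parallel_state
    from megatron.core.model_parallel_config import ModelParallelConfig
    from megatron.core.pipeline_parallel.p2p_communication import P2PCommunicator

    # Assumes torch.distributed is initialized and the model-parallel groups exist,
    # e.g. via parallel_state.initialize_model_parallel(pipeline_model_parallel_size=4).
    config = ModelParallelConfig(
        pipeline_model_parallel_size=4,
        pipeline_dtype=torch.bfloat16,   # dtype used for p2p tensor transfers (assumed field)
        variable_seq_lengths=False,      # True enables the shape handshake below (assumed field)
        batch_p2p_comm=True,             # prefer batched isend/irecv (assumed field)
    )

    pp_group = parallel_state.get_pipeline_model_parallel_group()
    communicator = P2PCommunicator(pp_group=pp_group, config=config)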

_communicate_shapes(
tensor_send_next,
tensor_send_prev,
recv_prev,
recv_next,
)#

Communicate tensor shapes between stages. Used to exchange tensor shapes before the actual tensor communication happens, so that the receiving stage can allocate correctly sized buffers. This is required when the sequence lengths across micro-batches are not uniform.

Parameters:
  • tensor_send_next – tensor to send to next rank (no tensor sent if set to None).

  • tensor_send_prev – tensor to send to prev rank (no tensor sent if set to None).

  • recv_prev – boolean for whether tensor should be received from previous rank.

  • recv_next – boolean for whether tensor should be received from next rank.

Returns:

(recv_prev_shape, recv_next_shape)
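
When shapes are not known in advance, the receiving stage cannot pre-allocate its buffer, so each pair of neighbours first exchanges a small int64 tensor holding the shape and only then transfers the payload. A simplified sketch of that handshake between two adjacent ranks, written with plain torch.distributed for illustration (the real method handles both directions and uses the pipeline process group):

    import torch
    import torch.distributed as dist

    def send_with_shape(tensor: torch.Tensor, next_rank: int) -> None:
        # Sender: ship the shape ahead of the payload so the receiver can
        # allocate a matching buffer before the real transfer starts.
        shape_tensor = torch.tensor(tensor.size(), dtype=torch.int64, device="cuda")
        dist.send(shape_tensor, dst=next_rank)
        dist.send(tensor, dst=next_rank)

    def recv_with_shape(ndim: int, dtype: torch.dtype, prev_rank: int) -> torch.Tensor:
        # Receiver: get the shape first, allocate, then receive the payload.
        shape_tensor = torch.empty(ndim, dtype=torch.int64, device="cuda")
        dist.recv(shape_tensor, src=prev_rank)
        buffer = torch.empty(*shape_tensor.tolist(), dtype=dtype, device="cuda")
        dist.recv(buffer, src=prev_rank)
        return buffer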

_communicate(
*,
tensor_send_next: Optional[torch.Tensor],
tensor_send_prev: Optional[torch.Tensor],
recv_prev: bool,
recv_next: bool,
tensor_shape: core.pipeline_parallel.p2p_communication.Shape,
wait_on_reqs: bool = True,
) → Tuple[torch.Tensor, torch.Tensor]#

Communicate tensors between stages. Used as a helper method by the other communication methods, which are in turn used by the pipeline schedules in megatron/schedules.py.

Parameters:
  • tensor_send_next (torch.Tensor, optional) – Tensor to send to next rank (no tensor sent if None)

  • tensor_send_prev (torch.Tensor, optional) – Tensor to send to prev rank (no tensor sent if None)

  • recv_prev (boolean, required) – whether tensor should be received from previous rank.

  • recv_next (boolean, required) – whether tensor should be received from next rank.

  • tensor_shape (List[int] or torch.Size, required) – shape of tensor to receive (this method assumes that all tensors sent and received in a single function call are the same shape).

  • wait_on_reqs (boolean, optional, default=True) – For non-batched p2p communication, wait on each request before returning.

Returns:

tuple containing

  • tensor_recv_prev: torch.Tensor if recv_prev is True, None otherwise.

  • tensor_recv_next: torch.Tensor if recv_next is True, None otherwise.
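
All of the public send/recv methods below reduce to a call of this helper. As an example, a steady-state step on an intermediate stage can push its output to the next stage while posting a receive from the previous stage in a single call. A hedged sketch, assuming a communicator constructed as above:

    from typing import Tuple

    import torch

    def steady_state_exchange(
        communicator,                        # a P2PCommunicator (see the construction sketch)
        output_tensor: torch.Tensor,         # this stage's activation for the current micro-batch
        tensor_shape: Tuple[int, int, int],  # e.g. (seq_len, micro_batch_size, hidden_size)
    ) -> torch.Tensor:
        # Send activations forward and, in the same exchange, receive the
        # next micro-batch's activations from the previous stage.
        input_tensor, _ = communicator._communicate(
            tensor_send_next=output_tensor,
            tensor_send_prev=None,
            recv_prev=True,
            recv_next=False,
            tensor_shape=tensor_shape,
            wait_on_reqs=True,
        )
        return input_tensor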

recv_forward(
tensor_shapes,
is_first_stage: bool,
) → Union[torch.Tensor, list[torch.Tensor]]#

Receive tensor from previous rank in pipeline (forward receive).

recv_backward(
tensor_shapes,
is_last_stage: bool,
) → Union[torch.Tensor, list[torch.Tensor]]#

Receive tensor from next rank in pipeline (backward receive).

send_forward(output_tensors, is_last_stage: bool) → None#

Send tensor to next rank in pipeline (forward send).

send_backward(input_tensor_grads, is_first_stage: bool) → None#

Send tensor to previous rank in pipeline (backward send).
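
Taken together, the four methods above cover the unfused exchanges a pipeline schedule needs. A simplified, hedged sketch of one micro-batch on an intermediate stage; model_chunk is a placeholder callable and the backward pass itself is elided:

    import torch

    def run_one_microbatch(communicator, model_chunk, tensor_shapes,
                           is_first_stage: bool, is_last_stage: bool) -> None:
        # Forward: the first stage reads real data from its data loader instead of
        # receiving activations, which is what the is_first_stage flag signals.
        input_tensor = communicator.recv_forward(tensor_shapes, is_first_stage=is_first_stage)
        output_tensor = model_chunk(input_tensor)
        communicator.send_forward(output_tensor, is_last_stage=is_last_stage)

        # Backward: the last stage owns the loss, which is what is_last_stage signals.
        # output_tensor_grad would seed the elided backward pass.
        output_tensor_grad = communicator.recv_backward(tensor_shapes, is_last_stage=is_last_stage)
        input_tensor_grad = torch.zeros_like(input_tensor) if input_tensor is not None else None
        communicator.send_backward(input_tensor_grad, is_first_stage=is_first_stage)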

send_forward_recv_backward(
output_tensors,
tensor_shapes,
is_last_stage: bool,
) → Union[torch.Tensor, list[torch.Tensor]]#

Batched send and recv with next rank in pipeline.

send_backward_recv_forward(
input_tensor_grads,
tensor_shapes,
is_first_stage: bool,
) → Union[torch.Tensor, list[torch.Tensor]]#

Batched send and recv with previous rank in pipeline.
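
These two fused variants let the steady state of a 1F1B schedule overlap the forward send with the backward receive and vice versa, halving the number of separate exchanges per step. A hedged sketch of the steady-state body using them; the backward pass is again elided:

    import torch

    def steady_state_step(communicator, model_chunk, tensor_shapes,
                          is_first_stage: bool, is_last_stage: bool):
        # Run the forward pass, then ship the activation forward while picking up
        # the gradient for an earlier micro-batch from the next stage.
        input_tensor = communicator.recv_forward(tensor_shapes, is_first_stage=is_first_stage)
        output_tensor = model_chunk(input_tensor)
        output_tensor_grad = communicator.send_forward_recv_backward(
            output_tensor, tensor_shapes, is_last_stage=is_last_stage
        )

        # After the (elided) backward pass, send the input gradient to the previous
        # stage while receiving the activations for the next micro-batch.
        input_tensor_grad = torch.zeros_like(input_tensor) if input_tensor is not None else None
        next_input_tensor = communicator.send_backward_recv_forward(
            input_tensor_grad, tensor_shapes, is_first_stage=is_first_stage
        )
        return next_input_tensor, output_tensor_grad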

send_forward_recv_forward(
output_tensor: torch.Tensor,
recv_prev: bool,
tensor_shape: core.pipeline_parallel.p2p_communication.Shape,
overlap_p2p_comm: bool = False,
) → torch.Tensor#

Batched recv from previous rank and send to next rank in pipeline.
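
This method lets interleaved schedules fuse a chunk's forward send with the receive of the next chunk's input; with overlap_p2p_comm=True the call is expected not to block, so the exchange can overlap with compute (the return value in that mode is not shown here). A hedged sketch of the blocking form:

    from typing import Tuple

    import torch

    def forward_exchange(communicator, output_tensor: torch.Tensor,
                         tensor_shape: Tuple[int, int, int], recv_prev: bool) -> torch.Tensor:
        # Send this chunk's activation to the next stage and, when recv_prev is True,
        # receive the previous stage's activation for the next micro-batch.
        return communicator.send_forward_recv_forward(
            output_tensor,
            recv_prev=recv_prev,
            tensor_shape=tensor_shape,
            overlap_p2p_comm=False,  # block until the exchange completes
        )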

send_backward_recv_backward(
input_tensor_grad: torch.Tensor,
recv_next: bool,
tensor_shape: core.pipeline_parallel.p2p_communication.Shape,
overlap_p2p_comm: bool = False,
) → torch.Tensor#

Batched recv from next rank and send to previous rank in pipeline.

send_forward_backward_recv_forward_backward(
output_tensor: torch.Tensor,
input_tensor_grad: torch.Tensor,
recv_prev: bool,
recv_next: bool,
tensor_shape: core.pipeline_parallel.p2p_communication.Shape,
) → torch.Tensor#

Batched send and recv with previous and next ranks in pipeline.
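
This variant fuses all four directions into one exchange: an activation goes forward and an input gradient goes backward while their counterparts are received. A hedged call sketch; the structure of the returned value is left unexamined here since how it bundles the received activation and gradient may differ across versions:

    import torch

    def full_exchange(communicator, output_tensor: torch.Tensor,
                      input_tensor_grad: torch.Tensor, tensor_shape):
        # Push the current activation forward and the current input gradient backward,
        # while also receiving the next activation and the next output gradient.
        return communicator.send_forward_backward_recv_forward_backward(
            output_tensor,
            input_tensor_grad,
            recv_prev=True,
            recv_next=True,
            tensor_shape=tensor_shape,
        )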