core.models.multimodal.context_parallel#

Multimodal Sequence Parallel (SP) and Context Parallel (CP) functionality.

Module Contents#

Functions#

get_padding

Calculate padding needed for SP, CP, TP comm overlap, and FP8.

get_packed_seq_params

Get PackedSeqParams for CP.

API#

core.models.multimodal.context_parallel.get_padding(
seq_len,
cp_size,
tp_size,
has_sp,
decoder_tp_comm_overlap=False,
decoder_seq_len=None,
fp8_enabled=False,
fp8_recipe=None,
)#

Calculate padding needed for SP, CP, TP comm overlap, and FP8.

Parameters:
  • seq_len (int) – Model sequence length.

  • cp_size (int) – Context parallel size.

  • tp_size (int) – Tensor parallel size.

  • has_sp (bool) – Model uses sequence parallelism.

  • decoder_tp_comm_overlap (bool) – Decoder (LLM) uses tensor parallel communication overlap.

  • decoder_seq_len (int) – Decoder (LLM) maximum sequence length.

  • fp8_enabled (bool) – FP8 is enabled.

  • fp8_recipe (str) – FP8 recipe. Affects required padding.

Returns:

Padding needed given model configuration.

Return type:

padding (int)
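A minimal sketch of the kind of calculation this function performs, assuming the common divisibility rules: context parallelism typically requires the sequence length to be divisible by 2 * cp_size (each CP rank holds two chunks for load-balanced causal attention), and sequence parallelism requires divisibility by tp_size. The function name `get_padding_sketch` and the exact divisor logic are illustrative assumptions, not the library's actual implementation (which also accounts for TP comm overlap and the FP8 recipe):

```python
def get_padding_sketch(seq_len: int, cp_size: int, tp_size: int, has_sp: bool) -> int:
    """Illustrative padding calculation; hypothetical, not the real implementation."""
    divisor = 1
    if cp_size > 1:
        # Assumed CP constraint: 2 chunks per rank for load balancing.
        divisor *= 2 * cp_size
    if has_sp:
        # Assumed SP constraint: sequence is scattered across tp_size ranks.
        divisor *= tp_size
    # Pad up to the next multiple of the combined divisor.
    return (divisor - seq_len % divisor) % divisor
```

For example, with `seq_len=1000`, `cp_size=2`, `tp_size=4`, and SP enabled, the combined divisor is 16, so 8 padding tokens are needed to reach 1008.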

core.models.multimodal.context_parallel.get_packed_seq_params(
tokens,
img_seq_len,
padding_needed,
cp_size,
use_packed_sequence=False,
)#

Get PackedSeqParams for CP.

Parameters:
  • tokens (torch.Tensor) – [batch, seq_len] input tokens.

  • img_seq_len (int) – Image sequence length.

  • padding_needed (int) – Padding to add.

  • cp_size (int) – Context parallel size.

  • use_packed_sequence (bool) – Whether sequence packing is used.

Returns:

Parameters to be sent to Transformer Engine.

Return type:

packed_seq_params (PackedSeqParams)
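To illustrate the shape of the data involved, here is a hedged sketch of building cumulative-sequence-length tensors of the kind Transformer Engine's `PackedSeqParams` carries (`cu_seqlens_q`, `cu_seqlens_kv`, `max_seqlen_q`, `qkv_format`). The function name `packed_seq_params_sketch`, the dict standing in for the `PackedSeqParams` dataclass, and the assumption that each sample is image embeddings followed by text tokens minus padding are all illustrative, not the library's actual logic:

```python
import torch


def packed_seq_params_sketch(tokens: torch.Tensor, img_seq_len: int,
                             padding_needed: int, cp_size: int) -> dict:
    """Illustrative sketch; a dict stands in for TE's PackedSeqParams."""
    batch, text_seq_len = tokens.shape
    # Assumed layout: image embeddings + text tokens, excluding padding.
    combined = img_seq_len + text_seq_len - padding_needed
    # Cumulative sequence lengths: [0, L, 2L, ...] for a batch of equal-length samples.
    cu_seqlens = torch.arange(0, (batch + 1) * combined, combined, dtype=torch.int32)
    # cp_size is accepted for signature parity but unused in this simplified sketch.
    _ = cp_size
    return {
        "cu_seqlens_q": cu_seqlens,
        "cu_seqlens_kv": cu_seqlens,
        "max_seqlen_q": combined,
        "qkv_format": "thd",
    }
```

With a `[2, 100]` token batch, `img_seq_len=576`, and 4 padding tokens, each combined sequence is 672 tokens long and `cu_seqlens_q` is `[0, 672, 1344]`.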