core.models.multimodal.context_parallel
Multimodal Sequence Parallel (SP) and Context Parallel (CP) functionality.
Module Contents
Functions
| get_padding | Calculate padding needed for SP, CP, TP comm overlap, and FP8. |
| get_packed_seq_params | Get PackedSeqParams for CP. |
API
core.models.multimodal.context_parallel.get_padding(seq_len, cp_size, tp_size, has_sp, decoder_tp_comm_overlap=False, decoder_seq_len=None, fp8_enabled=False, fp8_recipe=None)
Calculate padding needed for SP, CP, TP comm overlap, and FP8.
- Parameters:
seq_len (int) – Model sequence length.
cp_size (int) – Context parallel size.
tp_size (int) – Tensor parallel size.
has_sp (bool) – Whether the model uses sequence parallelism.
decoder_tp_comm_overlap (bool) – Whether the decoder (LLM) uses tensor parallel communication overlap.
decoder_seq_len (int, optional) – Decoder (LLM) maximum sequence length.
fp8_enabled (bool) – Whether FP8 is enabled.
fp8_recipe (str, optional) – FP8 recipe; affects the required padding.
- Returns:
Padding needed given model configuration.
- Return type:
padding (int)
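To illustrate the kind of divisibility logic such a padding calculation involves, here is a hypothetical sketch (not the actual Megatron-Core implementation; the exact divisors are assumptions): CP load balancing typically requires the sequence to split into 2 * cp_size equal chunks, SP shards the sequence across tp_size ranks, and FP8 kernels commonly prefer lengths that are multiples of 16.

```python
import math


def get_padding_sketch(seq_len, cp_size, tp_size, has_sp, fp8_enabled=False):
    """Illustrative sketch: padding to make seq_len divisible by the
    constraints imposed by CP, SP, and FP8. Divisor choices are assumptions,
    not the real get_padding logic."""
    divisor = 1
    if cp_size > 1:
        # CP splits each sequence into 2 * cp_size chunks for load balancing.
        divisor *= 2 * cp_size
    if has_sp and tp_size > 1:
        # SP shards the sequence dimension across tensor-parallel ranks.
        divisor *= tp_size
    if fp8_enabled:
        # FP8 kernels commonly prefer sequence lengths in multiples of 16.
        divisor = math.lcm(divisor, 16)
    # Smallest non-negative padding that rounds seq_len up to the divisor.
    return (divisor - seq_len % divisor) % divisor
```

For example, with seq_len=1000, cp_size=2, tp_size=4, and SP enabled, the combined divisor is 16, so 8 tokens of padding round the sequence up to 1008.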
core.models.multimodal.context_parallel.get_packed_seq_params(tokens, img_seq_len, padding_needed, cp_size, use_packed_sequence=False)
Get PackedSeqParams for CP.
- Parameters:
tokens (torch.Tensor) – [batch, seq_len] input tokens.
img_seq_len (int) – Image sequence length.
padding_needed (int) – Padding to add.
cp_size (int) – Context parallel size.
use_packed_sequence (bool) – Whether sequence packing is used.
- Returns:
Parameters to be sent to Transformer Engine.
- Return type:
packed_seq_params (PackedSeqParams)
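The core of a PackedSeqParams object is a set of cumulative-sequence-length offsets (cu_seqlens) that tell Transformer Engine's attention kernels where each sample begins and ends in the packed token stream. The helper below is a minimal sketch of that bookkeeping only, assuming every sample shares one effective length (text tokens plus image tokens plus padding); it is not the real get_packed_seq_params and omits the tensor types and extra fields the actual object carries.

```python
def cu_seqlens_sketch(batch_size, text_seq_len, img_seq_len, padding_needed):
    """Illustrative sketch: cumulative sequence-length offsets of the kind
    PackedSeqParams carries. Assumes a uniform per-sample length; the real
    function derives lengths from the tokens tensor."""
    # Each sample's effective length: text tokens + image tokens + padding.
    seq_len = text_seq_len + img_seq_len + padding_needed
    # cu_seqlens marks sample boundaries in the flattened (packed) batch:
    # [0, L, 2L, ..., batch_size * L].
    return [i * seq_len for i in range(batch_size + 1)]
```

For a batch of 2 with 100 text tokens, 576 image tokens, and 4 tokens of padding per sample, the offsets are [0, 680, 1360]: attention for sample 0 covers positions 0–679 and sample 1 covers 680–1359.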