core.packed_seq_params#
Module Contents#
Classes#
- PackedSeqParams: parameters to TEDotProductAttention and fused rope kernels for the thd (packed) sequence format.
API#
- class core.packed_seq_params.PackedSeqParams#
Parameters to TEDotProductAttention and fused rope kernels for the thd (packed) sequence format.

- qkv_format: str#
None
- cu_seqlens_q: torch.Tensor#
None
- cu_seqlens_kv: torch.Tensor#
None
- cu_seqlens_q_padded: torch.Tensor#
None
- cu_seqlens_kv_padded: torch.Tensor#
None
- max_seqlen_q: int#
None
- max_seqlen_kv: int#
None
- local_cp_size: int#
None
- cp_group: torch.distributed.ProcessGroup#
None
- total_tokens: int#
None
- seq_idx: torch.Tensor#
None
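The cu_seqlens_* attributes hold cumulative sequence lengths: entry i is the token offset at which sequence i starts in the packed buffer. As a minimal illustration (plain Python rather than torch.Tensor, and the sequence lengths are hypothetical), the [0, 5, 7, 11] example used below can be built from per-sequence lengths like this:

```python
from itertools import accumulate

# Hypothetical per-sequence token counts for three packed sequences.
seq_lens = [5, 2, 4]

# Cumulative offsets, prepended with 0 — the form cu_seqlens_q holds
# (as a torch.Tensor in the real dataclass).
cu_seqlens = [0] + list(accumulate(seq_lens))
print(cu_seqlens)  # [0, 5, 7, 11]
```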
- __post_init__()#
Pre-compute seq_idx for Mamba mixer CUDA graph compatibility.
If total_tokens is 16 (for example), this method takes packed_seq_params.cu_seqlens_q_padded (or cu_seqlens_q), which has the form [0, 5, 7, 11], and returns a tensor of the form [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3], i.e. [0] × (5-0) + [1] × (7-5) + [2] × (11-7) + [3] × (16-11).

In this example there are three sequences in the pack. In general, the output carries one additional sequence index (e.g. 3 after 0, 1, 2) so that any tokens beyond the last padded input sequence are accounted for as an extra sequence. However, if cu_seqlens_q_padded[-1] == max_seqlen, this additional sequence index is not included.
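The mapping described above can be sketched in plain Python (the real method operates on torch tensors; the helper name build_seq_idx is an assumption for illustration):

```python
def build_seq_idx(cu_seqlens, total_tokens):
    """Expand cumulative sequence boundaries into a per-token sequence index.

    cu_seqlens: cumulative offsets, e.g. [0, 5, 7, 11]
    total_tokens: total length of the packed buffer, e.g. 16
    """
    boundaries = list(cu_seqlens)
    # If tokens remain beyond the last padded sequence, account for them
    # as one extra trailing sequence index.
    if boundaries[-1] != total_tokens:
        boundaries.append(total_tokens)
    seq_idx = []
    for i in range(len(boundaries) - 1):
        seq_idx.extend([i] * (boundaries[i + 1] - boundaries[i]))
    return seq_idx

print(build_seq_idx([0, 5, 7, 11], 16))
# [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

When the last boundary already equals the total token count, no extra index is appended, matching the max_seqlen case noted above.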