core.transformer.enums#
Module Contents#
Classes#
ModelType | Model Type |
LayerType | Layer type. embedding: embedding layer; loss: loss layer; encoder: encoder layer (not implemented yet, expected to be used in MLLM models); decoder: decoder layer; mtp: multi-token prediction layer (not implemented yet) |
AttnType | Attention type |
AttnMaskType | Attention Mask Type |
AttnBackend | Attention Backend |
CudaGraphModule | Named capture regions for per-layer CUDA graphs. |
CudaGraphScope | Deprecated predecessor of CudaGraphModule. |
InferenceCudaGraphScope | Inference CUDA graph scope. |
API#
- class core.transformer.enums.ModelType(*args, **kwds)#
Bases: enum.Enum
Model Type
encoder_or_decoder: for BERT, GPT, etc.
Initialization
- encoder_or_decoder#
1
- class core.transformer.enums.LayerType(*args, **kwds)#
Bases: enum.Enum
Layer type
embedding: embedding layer
loss: loss layer
encoder: encoder layer; not implemented yet, expected to be used in MLLM models
decoder: decoder layer
mtp: multi-token prediction layer; not implemented yet
Initialization
- embedding#
1
- loss#
2
- encoder#
3
- decoder#
4
- mtp#
5
- class core.transformer.enums.AttnType(*args, **kwds)#
Bases: enum.Enum
Attention type
Initialization
- self_attn#
1
- cross_attn#
2
- class core.transformer.enums.AttnMaskType(*args, **kwds)#
Bases: enum.Enum
Attention Mask Type
Initialization
- padding#
1
- causal#
2
- no_mask#
3
- padding_causal#
4
- arbitrary#
5
- causal_bottom_right#
6
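The member names alone do not show how `causal` and `causal_bottom_right` differ. The sketch below is an illustration only, not this module's implementation. It assumes the common convention in which `causal` aligns the mask diagonal with the top-left corner of the score matrix and `causal_bottom_right` aligns it with the bottom-right corner, a distinction that only matters when the query and key lengths differ. The helpers `causal_mask` and `render` are hypothetical, and `True` here means "may attend".

```python
def causal_mask(sq, skv, bottom_right=False):
    """Return an sq x skv grid of booleans; True means the query may attend to the key.

    Assumed convention: the diagonal is anchored top-left by default and
    shifted by (skv - sq) when anchored bottom-right.
    """
    offset = skv - sq if bottom_right else 0
    return [[k <= q + offset for k in range(skv)] for q in range(sq)]


def render(mask):
    """Draw each row as a string: 'x' = may attend, '.' = masked out."""
    return ["".join("x" if allowed else "." for allowed in row) for row in mask]


# Two queries attending over four keys (for example, decoding with cached keys).
top_left = render(causal_mask(2, 4))
bottom_right = render(causal_mask(2, 4, bottom_right=True))

print(top_left)      # ['x...', 'xx..']
print(bottom_right)  # ['xxx.', 'xxxx']
```

With equal query and key lengths the two variants coincide, so the choice between them is only visible for rectangular score matrices.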
- class core.transformer.enums.AttnBackend(*args, **kwds)#
Bases: enum.Enum
Attention Backend
Initialization
- flash#
1
- fused#
2
- unfused#
3
- local#
4
- auto#
5
- class core.transformer.enums.CudaGraphModule(*args, **kwds)#
Bases: enum.Enum
Named capture regions for per-layer CUDA graphs.
Whole-layer capture is represented outside this enum by an empty scope. Current per-layer implementations that consume these values are local and transformer_engine.
Initialization
- attn#
1
- mlp#
2
- moe#
3
- moe_router#
4
- moe_preprocess#
5
- mamba#
6
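The note that whole-layer capture is expressed as an empty scope can be sketched as follows. This is a minimal illustration under stated assumptions: the enum is a local mirror of the members listed above rather than an import of the real module, the scope is assumed to be a plain list of members, and `parse_scope` and `describe_scope` are hypothetical helpers.

```python
import enum


class CudaGraphModule(enum.Enum):
    """Local mirror of the members listed above, for illustration only."""
    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


def parse_scope(names):
    """Hypothetical helper: turn region names into enum members by name lookup."""
    return [CudaGraphModule[name] for name in names]


def describe_scope(scope):
    """Hypothetical helper: an empty scope stands for whole-layer capture."""
    if not scope:
        return "whole layer"
    return ", ".join(member.name for member in scope)


print(describe_scope(parse_scope([])))                      # whole layer
print(describe_scope(parse_scope(["attn", "moe_router"])))  # attn, moe_router
```

Looking members up by name (`CudaGraphModule[name]`) rather than by number keeps the sketch independent of the ordinals.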
- class core.transformer.enums.CudaGraphScope(*args, **kwds)#
Bases: enum.Enum
Deprecated predecessor of CudaGraphModule.
Preserved as a standalone class (not an alias) so that pre-refactor checkpoints that stored CudaGraphScope enum instances can be deserialized correctly. The original ordinals differ from CudaGraphModule (full_iteration=1, attn=2, …), so a simple alias would silently reconstruct enum members with the wrong identity.
Do NOT use in new code. Migration guide:
full_iteration → cuda_graph_impl="full_iteration"
full_iteration_inference → inference_cuda_graph_scope=InferenceCudaGraphScope.block
all other members → equivalent CudaGraphModule member
Initialization
- full_iteration#
1
- attn#
2
- mlp#
3
- moe#
4
- moe_router#
5
- moe_preprocess#
6
- mamba#
7
- full_iteration_inference#
8
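The ordinal mismatch and the migration guide above can be sketched together. This is an illustration under stated assumptions, not the library's migration code: the three enums are local mirrors of the listings on this page, the stored form is assumed to be reconstructed by value (which is how Python's default `Enum` pickling behaves), and `migrate` with its `"module"` label is hypothetical.

```python
import enum


class CudaGraphModule(enum.Enum):
    """Local mirror, for illustration only."""
    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


class CudaGraphScope(enum.Enum):
    """Local mirror of the deprecated enum, for illustration only."""
    full_iteration = 1
    attn = 2
    mlp = 3
    moe = 4
    moe_router = 5
    moe_preprocess = 6
    mamba = 7
    full_iteration_inference = 8


class InferenceCudaGraphScope(enum.Enum):
    """Local mirror, for illustration only."""
    none = 1
    layer = 2
    block = 3


# Why an alias would be unsafe: the same number names different regions.
old = CudaGraphScope.attn
by_value = CudaGraphModule(old.value)  # value 2 is mlp in the new enum
by_name = CudaGraphModule[old.name]    # name lookup keeps the identity
print(by_value.name, by_name.name)     # mlp attn


def migrate(member):
    """Hypothetical helper following the migration guide above.

    Returns a (setting, value) pair; the "module" label is illustrative.
    """
    if member is CudaGraphScope.full_iteration:
        return ("cuda_graph_impl", "full_iteration")
    if member is CudaGraphScope.full_iteration_inference:
        return ("inference_cuda_graph_scope", InferenceCudaGraphScope.block)
    return ("module", CudaGraphModule[member.name])


print(migrate(CudaGraphScope.moe_router))
```

Mapping by name rather than by value is the design choice that matters here, since every shared member shifted by one when `full_iteration` was removed.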
- class core.transformer.enums.InferenceCudaGraphScope(*args, **kwds)#
Bases: enum.Enum
Inference CUDA graph scope.
This controls the ownership boundary for inference CUDA graphs:
none: inference runs in eager mode (no CUDA graphs).
layer: graphs are owned at the module/layer boundary, e.g. TransformerLayer or MambaLayer.
block: graphs are owned by the enclosing block, e.g. TransformerBlock or HybridBlock.
Initialization
- none#
1
- layer#
2
- block#
3
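The ownership rule described for this enum can be sketched as a small predicate. This is only an illustration: the enum is a local mirror of the members above, and `owns_inference_graph` together with its `boundary` argument (`"layer"` or `"block"`) is a hypothetical helper, not part of the documented API.

```python
import enum


class InferenceCudaGraphScope(enum.Enum):
    """Local mirror of the members listed above, for illustration only."""
    none = 1
    layer = 2
    block = 3


def owns_inference_graph(scope, boundary):
    """Hypothetical helper: does a module at `boundary` own the inference graph?

    `boundary` is "layer" (e.g. TransformerLayer, MambaLayer) or
    "block" (e.g. TransformerBlock, HybridBlock).
    """
    if scope is InferenceCudaGraphScope.none:
        return False  # eager mode: nobody owns a CUDA graph
    return scope.name == boundary


for scope in InferenceCudaGraphScope:
    owners = [b for b in ("layer", "block") if owns_inference_graph(scope, b)]
    print(scope.name, owners)
```

Under this reading exactly one boundary owns the graphs for `layer` and `block`, and none does for `none`.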