core.transformer.enums#

Module Contents#

Classes#

ModelType

Model Type

LayerType

Layer type.

  • embedding: embedding layer

  • loss: loss layer

  • encoder: encoder layer; not implemented yet, expected to be used in MLLM models

  • decoder: decoder layer

  • mtp: multi-token prediction layer; not implemented yet

AttnType

Attention type

AttnMaskType

Attention Mask Type

AttnBackend

Attention Backend

CudaGraphModule

Named capture regions for per-layer CUDA graphs.

CudaGraphScope

Deprecated predecessor of CudaGraphModule.

InferenceCudaGraphScope

Inference CUDA graph scope.

API#

class core.transformer.enums.ModelType(*args, **kwds)#

Bases: enum.Enum

Model Type

encoder_or_decoder: used for BERT, GPT, etc.

Initialization

encoder_or_decoder#

1

class core.transformer.enums.LayerType(*args, **kwds)#

Bases: enum.Enum

Layer type.

  • embedding: embedding layer

  • loss: loss layer

  • encoder: encoder layer; not implemented yet, expected to be used in MLLM models

  • decoder: decoder layer

  • mtp: multi-token prediction layer; not implemented yet

Initialization

embedding#

1

loss#

2

encoder#

3

decoder#

4

mtp#

5

class core.transformer.enums.AttnType(*args, **kwds)#

Bases: enum.Enum

Attention type

Initialization

self_attn#

1

cross_attn#

2

class core.transformer.enums.AttnMaskType(*args, **kwds)#

Bases: enum.Enum

Attention Mask Type

Initialization

padding#

1

causal#

2

no_mask#

3

padding_causal#

4

arbitrary#

5

causal_bottom_right#

6

class core.transformer.enums.AttnBackend(*args, **kwds)#

Bases: enum.Enum

Attention Backend

Initialization

flash#

1

fused#

2

unfused#

3

local#

4

auto#

5

class core.transformer.enums.CudaGraphModule(*args, **kwds)#

Bases: enum.Enum

Named capture regions for per-layer CUDA graphs.

Whole-layer capture is represented outside this enum by an empty scope. Current per-layer implementations that consume these values are local and transformer_engine.
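The empty-scope convention can be sketched as follows. The enum is mirrored locally so the snippet runs without Megatron-Core installed. The helper `captures_whole_layer` and the choice to represent a scope as a sequence of members are assumptions made for illustration, not part of the documented API.

```python
from enum import Enum
from typing import Sequence


class CudaGraphModule(Enum):
    """Local mirror of the members listed on this page (illustration only)."""
    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


def captures_whole_layer(scope: Sequence[CudaGraphModule]) -> bool:
    """Hypothetical helper: an empty scope stands for whole-layer capture."""
    return len(scope) == 0


partial_scope = [CudaGraphModule.attn, CudaGraphModule.mlp]
print(captures_whole_layer([]))             # empty scope: whole layer
print(captures_whole_layer(partial_scope))  # only the named regions
```

Keeping whole-layer capture out of the enum means every member names a real sub-region, so code that iterates over the members never has to special-case a "whole layer" entry.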

Initialization

attn#

1

mlp#

2

moe#

3

moe_router#

4

moe_preprocess#

5

mamba#

6

class core.transformer.enums.CudaGraphScope(*args, **kwds)#

Bases: enum.Enum

Deprecated predecessor of CudaGraphModule.

Preserved as a standalone class (not an alias) so that pre-refactor checkpoints that stored CudaGraphScope enum instances can be deserialized correctly. The original ordinals differ from CudaGraphModule (full_iteration=1, attn=2, …), so a simple alias would silently reconstruct enum members with the wrong identity.
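The ordinal mismatch can be seen in a small sketch. Calling a Python `enum.Enum` class with a value returns the member that has that value, so the same stored integer resolves to different members in the two classes. Both enums below are local mirrors of the values listed on this page, and the sketch assumes the stored member is restored by value.

```python
from enum import Enum


class CudaGraphModule(Enum):
    """Local mirror of the current enum (illustration only)."""
    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


class CudaGraphScope(Enum):
    """Local mirror of the deprecated enum (illustration only)."""
    full_iteration = 1
    attn = 2
    mlp = 3
    moe = 4
    moe_router = 5
    moe_preprocess = 6
    mamba = 7
    full_iteration_inference = 8


# Value that an old checkpoint would carry for CudaGraphScope.attn.
stored_value = CudaGraphScope.attn.value

# Restoring through the original class recovers the intended member.
restored_ok = CudaGraphScope(stored_value)

# Restoring through an alias to CudaGraphModule would pick a different member.
restored_via_alias = CudaGraphModule(stored_value)

print(restored_ok.name)
print(restored_via_alias.name)
```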

Do NOT use in new code. Migration guide:

  • full_iteration → cuda_graph_impl="full_iteration"

  • full_iteration_inference → inference_cuda_graph_scope=InferenceCudaGraphScope.block

  • all other members → equivalent CudaGraphModule member
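The migration guide above could be applied programmatically along these lines. This is a minimal sketch, not the library's own migration code: the enums are local mirrors of this page, `migrate_scope` is a hypothetical helper, and the `cuda_graph_module` dictionary key is an invented placeholder for wherever the resulting `CudaGraphModule` member would be stored.

```python
from enum import Enum


class CudaGraphModule(Enum):
    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


class CudaGraphScope(Enum):
    full_iteration = 1
    attn = 2
    mlp = 3
    moe = 4
    moe_router = 5
    moe_preprocess = 6
    mamba = 7
    full_iteration_inference = 8


class InferenceCudaGraphScope(Enum):
    none = 1
    layer = 2
    block = 3


def migrate_scope(old: CudaGraphScope) -> dict:
    """Hypothetical helper: translate one deprecated member per the guide."""
    if old is CudaGraphScope.full_iteration:
        return {"cuda_graph_impl": "full_iteration"}
    if old is CudaGraphScope.full_iteration_inference:
        return {"inference_cuda_graph_scope": InferenceCudaGraphScope.block}
    # Remaining members map by *name*, because the values differ between enums.
    # "cuda_graph_module" is a placeholder key, not a documented setting.
    return {"cuda_graph_module": CudaGraphModule[old.name]}


for member in CudaGraphScope:
    print(member.name, "->", migrate_scope(member))
```

Looking members up by name (`CudaGraphModule[old.name]`) rather than by value avoids the ordinal mismatch between the two enums.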

Initialization

full_iteration#

1

attn#

2

mlp#

3

moe#

4

moe_router#

5

moe_preprocess#

6

mamba#

7

full_iteration_inference#

8

class core.transformer.enums.InferenceCudaGraphScope(*args, **kwds)#

Bases: enum.Enum

Inference CUDA graph scope.

This controls the ownership boundary for inference CUDA graphs:

  • none: inference runs in eager mode (no CUDA graphs).

  • layer: graphs are owned at the module/layer boundary, e.g. TransformerLayer or MambaLayer.

  • block: graphs are owned by the enclosing block, e.g. TransformerBlock or HybridBlock.
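The three ownership levels can be summarized in a short sketch. The enum is a local mirror of the members on this page, and `graph_owner` is a hypothetical helper that only restates the descriptions above; it does not reflect how the library dispatches on this setting.

```python
from enum import Enum
from typing import Optional


class InferenceCudaGraphScope(Enum):
    """Local mirror of the members listed on this page (illustration only)."""
    none = 1
    layer = 2
    block = 3


def graph_owner(scope: InferenceCudaGraphScope) -> Optional[str]:
    """Hypothetical helper: describe which kind of module would own the graphs."""
    owners = {
        InferenceCudaGraphScope.none: None,
        InferenceCudaGraphScope.layer: "layer (e.g. TransformerLayer or MambaLayer)",
        InferenceCudaGraphScope.block: "block (e.g. TransformerBlock or HybridBlock)",
    }
    return owners[scope]


for scope in InferenceCudaGraphScope:
    print(scope.name, "->", graph_owner(scope) or "eager mode, no CUDA graphs")
```

One detail worth keeping in mind: `InferenceCudaGraphScope.none` is an ordinary enum member, not Python's `None`, and it is truthy. Checks such as `if scope:` will therefore not detect eager mode; compare with `scope is InferenceCudaGraphScope.none` instead.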

Initialization

none#

1

layer#

2

block#

3