`core.transformer.enums`

Module Contents

Classes

`ModelType`	Model Type
`LayerType`	Layer type: embedding, loss, encoder (not implemented yet), decoder, or mtp (multi-token prediction; not implemented yet).
`AttnType`	Attention type
`AttnMaskType`	Attention Mask Type
`AttnBackend`	Attention Backend
`CudaGraphModule`	Named capture regions for per-layer CUDA graphs.
`CudaGraphScope`	Deprecated predecessor of CudaGraphModule.
`InferenceCudaGraphScope`	Inference CUDA graph scope.

API

class core.transformer.enums.ModelType(*args, **kwds)

Bases: enum.Enum

Model type.

encoder_or_decoder: used for BERT, GPT, etc.

Initialization

encoder_or_decoder: 1

class core.transformer.enums.LayerType(*args, **kwds)

Bases: enum.Enum

Layer type:

embedding: embedding layer
loss: loss layer
encoder: encoder layer; not implemented yet, expected to be used in MLLM models
decoder: decoder layer
mtp: multi-token prediction layer; not implemented yet

Initialization

embedding: 1

loss: 2

encoder: 3

decoder: 4

mtp: 5
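Because `LayerType` is a plain `enum.Enum`, members can be looked up by name (for example from a config string) or by their stored integer value. The minimal sketch below re-declares the enum locally with the documented values so it runs without Megatron-Core installed; `NOT_IMPLEMENTED` and `is_implemented` are hypothetical helpers that only encode the "not implemented yet" notes above.

```python
import enum


class LayerType(enum.Enum):
    """Local mirror of core.transformer.enums.LayerType (documented values)."""

    embedding = 1
    loss = 2
    encoder = 3
    decoder = 4
    mtp = 5


# Hypothetical helper set: members the docstring marks as "not implemented yet".
NOT_IMPLEMENTED = {LayerType.encoder, LayerType.mtp}


def is_implemented(layer_type: LayerType) -> bool:
    """Return False for layer types documented as not implemented yet."""
    return layer_type not in NOT_IMPLEMENTED


# Lookup by name (e.g. from a config string) and by stored integer value.
print(LayerType["decoder"])  # LayerType.decoder
print(LayerType(5))          # LayerType.mtp
print([t.name for t in LayerType if is_implemented(t)])
# ['embedding', 'loss', 'decoder']
```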

class core.transformer.enums.AttnType(*args, **kwds)

Bases: enum.Enum

Attention type

Initialization

self_attn: 1

cross_attn: 2

class core.transformer.enums.AttnMaskType(*args, **kwds)

Bases: enum.Enum

Attention Mask Type

Initialization

padding: 1

causal: 2

no_mask: 3

padding_causal: 4

arbitrary: 5

causal_bottom_right: 6
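When a mask type arrives as a string (for example from a command-line flag), `Enum.__getitem__` maps the name to a member. The sketch below re-declares `AttnMaskType` locally with the documented values; `parse_mask_type` is a hypothetical helper that turns the bare `KeyError` into a message listing the valid names.

```python
import enum


class AttnMaskType(enum.Enum):
    """Local mirror of core.transformer.enums.AttnMaskType (documented values)."""

    padding = 1
    causal = 2
    no_mask = 3
    padding_causal = 4
    arbitrary = 5
    causal_bottom_right = 6


def parse_mask_type(name: str) -> AttnMaskType:
    """Hypothetical helper: map a config string to an AttnMaskType member."""
    try:
        return AttnMaskType[name]
    except KeyError:
        valid = ", ".join(m.name for m in AttnMaskType)
        raise ValueError(
            f"unknown attention mask type {name!r}; expected one of: {valid}"
        ) from None


print(parse_mask_type("padding_causal"))             # AttnMaskType.padding_causal
print(parse_mask_type("causal_bottom_right").value)  # 6
```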

class core.transformer.enums.AttnBackend(*args, **kwds)

Bases: enum.Enum

Attention Backend

Initialization

flash: 1

fused: 2

unfused: 3

local: 4

auto: 5
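One way to persist an `AttnBackend` choice in a JSON config is to store the member's name rather than its integer value, since the name stays readable and does not depend on ordinals. This is a minimal sketch with the enum re-declared locally; the `"attention_backend"` key is a hypothetical config key used only for illustration.

```python
import enum
import json


class AttnBackend(enum.Enum):
    """Local mirror of core.transformer.enums.AttnBackend (documented values)."""

    flash = 1
    fused = 2
    unfused = 3
    local = 4
    auto = 5


# Serialize by member name ("attention_backend" is a hypothetical key).
config = {"attention_backend": AttnBackend.flash.name}
text = json.dumps(config)

# Restore the member by name when reading the config back.
restored = AttnBackend[json.loads(text)["attention_backend"]]

print(text)      # {"attention_backend": "flash"}
print(restored)  # AttnBackend.flash
```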

class core.transformer.enums.CudaGraphModule(*args, **kwds)

Bases: enum.Enum

Named capture regions for per-layer CUDA graphs.

Whole-layer capture is not a member of this enum; it is represented by an empty scope. The per-layer implementations that currently consume these values are `local` and `transformer_engine`.

Initialization

attn: 1

mlp: 2

moe: 3

moe_router: 4

moe_preprocess: 5

mamba: 6
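The "empty scope means whole layer" convention can be illustrated with a scope expressed as a sequence of `CudaGraphModule` members. This is a sketch under the assumption that a scope is a list of members; `describe_capture` is a hypothetical helper, and the enum is re-declared locally with the documented values.

```python
import enum
from typing import Sequence


class CudaGraphModule(enum.Enum):
    """Local mirror of core.transformer.enums.CudaGraphModule (documented values)."""

    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


def describe_capture(scope: Sequence[CudaGraphModule]) -> str:
    """Hypothetical helper: an empty scope stands for whole-layer capture."""
    if not scope:
        return "whole layer"
    return ", ".join(m.name for m in scope)


print(describe_capture([]))  # whole layer
print(describe_capture([CudaGraphModule.attn, CudaGraphModule.moe_router]))  # attn, moe_router
```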

class core.transformer.enums.CudaGraphScope(*args, **kwds)

Bases: enum.Enum

Deprecated predecessor of CudaGraphModule.

Preserved as a standalone class (not an alias) so that pre-refactor checkpoints that stored CudaGraphScope enum instances can be deserialized correctly. The original ordinals differ from CudaGraphModule (full_iteration=1, attn=2, …), so a simple alias would silently reconstruct enum members with the wrong identity.
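The ordinal mismatch described above can be reproduced with local copies of both enums (documented values). This sketch assumes a checkpoint that recorded a member's integer value; `AliasedScope` is a hypothetical stand-in for what a plain alias `CudaGraphScope = CudaGraphModule` would have done.

```python
import enum


class CudaGraphModule(enum.Enum):
    """Local mirror of the new enum (documented values)."""

    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


class CudaGraphScope(enum.Enum):
    """Local mirror of the legacy enum (documented values)."""

    full_iteration = 1
    attn = 2
    mlp = 3
    moe = 4
    moe_router = 5
    moe_preprocess = 6
    mamba = 7
    full_iteration_inference = 8


# Assumption: the old checkpoint recorded the legacy member's integer value.
stored_value = CudaGraphScope.attn.value  # 2

# The standalone legacy class rebuilds the intended member ...
print(CudaGraphScope(stored_value))  # CudaGraphScope.attn

# ... while an alias would decode the same ordinal as a different member.
AliasedScope = CudaGraphModule  # hypothetical alias, shown only to illustrate the problem
print(AliasedScope(stored_value))  # CudaGraphModule.mlp
```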

Do NOT use in new code. Migration guide:

full_iteration → cuda_graph_impl="full_iteration"
full_iteration_inference → inference_cuda_graph_scope=InferenceCudaGraphScope.block
all other members → equivalent CudaGraphModule member
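The migration guide above amounts to a small mapping from legacy members to new-style settings. The sketch below encodes it as a hypothetical `migrate_legacy_scope` helper over locally re-declared enums. The `cuda_graph_impl` and `inference_cuda_graph_scope` keys come from the guide; the `"module"` key is a placeholder, because this reference does not name the setting that takes `CudaGraphModule` members.

```python
import enum


class CudaGraphModule(enum.Enum):
    attn = 1
    mlp = 2
    moe = 3
    moe_router = 4
    moe_preprocess = 5
    mamba = 6


class InferenceCudaGraphScope(enum.Enum):
    none = 1
    layer = 2
    block = 3


class CudaGraphScope(enum.Enum):
    full_iteration = 1
    attn = 2
    mlp = 3
    moe = 4
    moe_router = 5
    moe_preprocess = 6
    mamba = 7
    full_iteration_inference = 8


def migrate_legacy_scope(old: CudaGraphScope) -> dict:
    """Hypothetical helper following the migration guide.

    The "module" key is a placeholder, not a real config field name.
    """
    if old is CudaGraphScope.full_iteration:
        return {"cuda_graph_impl": "full_iteration"}
    if old is CudaGraphScope.full_iteration_inference:
        return {"inference_cuda_graph_scope": InferenceCudaGraphScope.block}
    # Every remaining legacy member has a same-named CudaGraphModule member.
    return {"module": CudaGraphModule[old.name]}


for old in CudaGraphScope:
    print(old.name, "->", migrate_legacy_scope(old))
```

Looking members up by name rather than by value is what keeps this mapping correct despite the shifted ordinals.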

Initialization

full_iteration: 1

attn: 2

mlp: 3

moe: 4

moe_router: 5

moe_preprocess: 6

mamba: 7

full_iteration_inference: 8

class core.transformer.enums.InferenceCudaGraphScope(*args, **kwds)

Bases: enum.Enum

Inference CUDA graph scope.

This controls the ownership boundary for inference CUDA graphs:

none: inference runs in eager mode (no CUDA graphs).
layer: graphs are owned at the module/layer boundary, e.g. TransformerLayer or MambaLayer.
block: graphs are owned by the enclosing block, e.g. TransformerBlock or HybridBlock.

Initialization

none: 1

layer: 2

block: 3
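The ownership rules above can be read as a predicate: given the configured scope, does a module at a given boundary own the inference graph? This is a minimal sketch with the enum re-declared locally; `owns_inference_graph` and its `"layer"`/`"block"` boundary strings are hypothetical and only paraphrase the documented semantics.

```python
import enum


class InferenceCudaGraphScope(enum.Enum):
    """Local mirror of core.transformer.enums.InferenceCudaGraphScope (documented values)."""

    none = 1
    layer = 2
    block = 3


def owns_inference_graph(scope: InferenceCudaGraphScope, boundary: str) -> bool:
    """Hypothetical helper: does a module at `boundary` ("layer" or "block") own the graph?"""
    if boundary not in ("layer", "block"):
        raise ValueError(f"unknown boundary {boundary!r}")
    if scope is InferenceCudaGraphScope.none:
        return False  # eager mode: no CUDA graphs are captured
    return scope.name == boundary


scope = InferenceCudaGraphScope["block"]
print(owns_inference_graph(scope, "layer"))  # False
print(owns_inference_graph(scope, "block"))  # True
```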
