nemo_automodel.components.models.step3p5.model#

Module Contents#

Classes#

Block

Step3p5 transformer block with attention, MLP/MoE, and shared experts.

Step3p5Model

Step3p5 transformer model.

Step3p5ForCausalLM

Step3p5 model for causal language modeling.

Functions#

parse_moe_layers_enum

Parse moe_layers_enum to get set of MoE layer indices.

Data#

ModelClass

API#

nemo_automodel.components.models.step3p5.model.parse_moe_layers_enum(
    moe_layers_enum: str | tuple | list | None,
    num_hidden_layers: int,
) → set[int]#

Parse moe_layers_enum to get set of MoE layer indices.

Parameters:
  • moe_layers_enum – A tuple or list of layer indices, a comma-separated string of indices, or None. The Hugging Face Step-3.5-Flash config uses the tuple form, for example (3, 4, 5, …, 44).

  • num_hidden_layers – Total number of hidden layers.

Returns:

Set of layer indices that should be MoE layers.
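
The accepted input forms can be illustrated with a minimal sketch. This is not the library's implementation: parse_layer_indices_sketch is a hypothetical stand-in, and its handling of None (treating every layer except layer 0 as MoE) is an assumption, since this page does not state what None resolves to.

```python
from __future__ import annotations


def parse_layer_indices_sketch(
    moe_layers_enum: str | tuple | list | None,
    num_hidden_layers: int,
) -> set[int]:
    """Hypothetical stand-in for parse_moe_layers_enum; not the library code."""
    if moe_layers_enum is None:
        # ASSUMPTION: the fallback for None is not documented on this page.
        # This sketch treats every layer except layer 0 as an MoE layer.
        return set(range(1, num_hidden_layers))
    if isinstance(moe_layers_enum, str):
        # Comma-separated string such as "3, 4,5"; empty fragments are skipped.
        return {int(part) for part in moe_layers_enum.split(",") if part.strip()}
    # Tuple or list of indices, the form used by the HF Step-3.5-Flash config.
    return {int(idx) for idx in moe_layers_enum}


# The string and tuple forms describe the same set of layers.
print(parse_layer_indices_sketch("3, 4,5", 45) == parse_layer_indices_sketch((3, 4, 5), 45))  # True
```

Returning a set makes the per-layer membership check a constant-time lookup and removes any duplicate indices in the input.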

class nemo_automodel.components.models.step3p5.model.Block(
    layer_idx: int,
    config: Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
)#

Bases: torch.nn.Module

Step3p5 transformer block with attention, MLP/MoE, and shared experts.

forward(
    x: torch.Tensor,
    *,
    freqs_cis: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    **attn_kwargs: Any,
) → torch.Tensor#

init_weights(buffer_device: torch.device) → None#
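
The summary says each block carries either a dense MLP or an MoE feed-forward, but this page does not say how the choice is made. One plausible reading, assumed here, is that layer_idx is checked against the set returned by parse_moe_layers_enum. The helper plan_layer_kinds below is hypothetical and only illustrates that membership test.

```python
from __future__ import annotations


def plan_layer_kinds(moe_layers: set[int], num_hidden_layers: int) -> list[str]:
    """Hypothetical helper: label each layer index as 'moe' or 'mlp'.

    ASSUMPTION: a layer uses MoE exactly when its index is in moe_layers.
    """
    return [
        "moe" if layer_idx in moe_layers else "mlp"
        for layer_idx in range(num_hidden_layers)
    ]


# Six layers, of which indices 3, 4 and 5 are marked as MoE.
kinds = plan_layer_kinds({3, 4, 5}, 6)
print(kinds)  # ['mlp', 'mlp', 'mlp', 'moe', 'moe', 'moe']
```
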

class nemo_automodel.components.models.step3p5.model.Step3p5Model(
    config: Any,
    backend: nemo_automodel.components.models.common.BackendConfig,
    *,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
)#

Bases: torch.nn.Module

Step3p5 transformer model.

forward(
    input_ids: torch.Tensor,
    *,
    position_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    **attn_kwargs: Any,
) → torch.Tensor#

init_weights(buffer_device: torch.device | None = None) → None#

class nemo_automodel.components.models.step3p5.model.Step3p5ForCausalLM(
    config: Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs,
)#

Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, torch.nn.Module, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin

Step3p5 model for causal language modeling.

classmethod from_config(
    config: Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs,
)#

classmethod from_pretrained(
    pretrained_model_name_or_path: str,
    *model_args,
    **kwargs,
)#

forward(
    input_ids: torch.Tensor,
    *,
    position_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    **attn_kwargs: Any,
) → torch.Tensor#

initialize_weights(
    buffer_device: torch.device | None = None,
    dtype: torch.dtype = torch.bfloat16,
) → None#
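
The bare * in the forward signatures above makes every argument after input_ids keyword-only. The sketch below shows what that means for callers, using a plain-Python stand-in rather than the real model: forward_stub and the extra_flag keyword are hypothetical, and lists are used in place of tensors.

```python
from __future__ import annotations

from typing import Any


def forward_stub(
    input_ids: Any,
    *,
    position_ids: Any = None,
    attention_mask: Any = None,
    padding_mask: Any = None,
    **attn_kwargs: Any,
) -> dict:
    """Hypothetical stand-in that mirrors the documented parameter list."""
    return {
        "input_ids": input_ids,
        "has_attention_mask": attention_mask is not None,
        # Unrecognized keywords are collected in attn_kwargs.
        "extra": sorted(attn_kwargs),
    }


# Keyword arguments are accepted, including extras gathered by **attn_kwargs.
call = forward_stub([1, 2, 3], attention_mask=[1, 1, 1], extra_flag=True)

# Passing position_ids positionally is rejected by Python itself.
try:
    forward_stub([1, 2, 3], [0, 1, 2])
    positional_ok = True
except TypeError:
    positional_ok = False
```

Here positional_ok ends up False, so masks and position IDs must be named at the call site.
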

nemo_automodel.components.models.step3p5.model.ModelClass#

None