`nemo_automodel.components.models.step3p5.model`#

Module Contents#

Classes#

`Block`	Step3p5 transformer block with attention, MLP/MoE, and shared experts.
`Step3p5Model`	Step3p5 transformer model.
`Step3p5ForCausalLM`	Step3p5 model for causal language modeling.

Functions#

`parse_moe_layers_enum`	Parse `moe_layers_enum` to get the set of MoE layer indices.

Data#

ModelClass

API#

nemo_automodel.components.models.step3p5.model.parse_moe_layers_enum( moe_layers_enum: str | tuple | list | None, num_hidden_layers: int, ) → set[int]#

Parse `moe_layers_enum` to get the set of MoE layer indices.

Parameters:

moe_layers_enum – Tuple/list of layer indices, comma-separated string, or None. HF Step-3.5-Flash uses tuple format like (3, 4, 5, …, 44).
num_hidden_layers – Total number of hidden layers.

Returns:

Set of layer indices that should be MoE layers.
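The accepted input forms can be illustrated with a small stand-alone sketch. `parse_moe_layers_sketch` is a hypothetical stand-in written for this page, not the library function. In particular, the fallback used when `moe_layers_enum` is `None` (every layer except layer 0) is an assumption; the reference above does not state what the real function returns in that case.

```python
from __future__ import annotations


def parse_moe_layers_sketch(
    moe_layers_enum: str | tuple | list | None,
    num_hidden_layers: int,
) -> set[int]:
    """Hypothetical stand-in for parse_moe_layers_enum (illustration only)."""
    if moe_layers_enum is None:
        # ASSUMPTION: with no explicit list, treat every layer except
        # layer 0 as MoE. The real default is not documented on this page.
        return set(range(1, num_hidden_layers))
    if isinstance(moe_layers_enum, str):
        # Comma-separated string such as "3, 4,5"; skip empty tokens.
        return {int(tok) for tok in moe_layers_enum.split(",") if tok.strip()}
    # Tuple or list of layer indices, e.g. (3, 4, 5).
    return {int(i) for i in moe_layers_enum}


print(sorted(parse_moe_layers_sketch("3, 4,5", 8)))   # [3, 4, 5]
print(sorted(parse_moe_layers_sketch((3, 4, 5), 8)))  # [3, 4, 5]
```

The sketch does not validate indices against `num_hidden_layers`; whether the real function does so is not specified here.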

class nemo_automodel.components.models.step3p5.model.Block( layer_idx: int, config: Any, moe_config: nemo_automodel.components.moe.config.MoEConfig, backend: nemo_automodel.components.models.common.BackendConfig, )#

Bases: torch.nn.Module

Step3p5 transformer block with attention, MLP/MoE, and shared experts.

Initialization

forward( x: torch.Tensor, *, freqs_cis: torch.Tensor, attention_mask: torch.Tensor | None = None, padding_mask: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, **attn_kwargs: Any, ) → torch.Tensor#

init_weights(buffer_device: torch.device) → None#
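Because each `Block` receives its own `layer_idx`, a set of MoE layer indices such as the one returned by `parse_moe_layers_enum` can determine which blocks use MoE and which use a dense MLP. The plain-Python sketch below only illustrates that mapping. `plan_layer_kinds` is a hypothetical helper, and how `Block` actually makes this choice internally is not shown in this reference.

```python
def plan_layer_kinds(moe_layers: set[int], num_hidden_layers: int) -> list[str]:
    """Hypothetical helper: label each layer index as 'moe' or 'dense'."""
    return [
        "moe" if layer_idx in moe_layers else "dense"
        for layer_idx in range(num_hidden_layers)
    ]


# Layers 3, 4, and 5 of a 6-layer stack are marked as MoE.
print(plan_layer_kinds({3, 4, 5}, 6))
# ['dense', 'dense', 'dense', 'moe', 'moe', 'moe']
```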

class nemo_automodel.components.models.step3p5.model.Step3p5Model( config: Any, backend: nemo_automodel.components.models.common.BackendConfig, *, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, )#

Bases: torch.nn.Module

Step3p5 transformer model.

Initialization

forward( input_ids: torch.Tensor, *, position_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, padding_mask: torch.Tensor | None = None, **attn_kwargs: Any, ) → torch.Tensor#

init_weights(buffer_device: torch.device | None = None) → None#

class nemo_automodel.components.models.step3p5.model.Step3p5ForCausalLM( config: Any, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, backend: nemo_automodel.components.models.common.BackendConfig | None = None, **kwargs, )#

Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, torch.nn.Module, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin

Step3p5 model for causal language modeling.

Initialization

classmethod from_config( config: Any, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, backend: nemo_automodel.components.models.common.BackendConfig | None = None, **kwargs, )#

classmethod from_pretrained( pretrained_model_name_or_path: str, *model_args, **kwargs, )#

forward( input_ids: torch.Tensor, *, position_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, padding_mask: torch.Tensor | None = None, **attn_kwargs: Any, ) → torch.Tensor#

initialize_weights( buffer_device: torch.device | None = None, dtype: torch.dtype = torch.bfloat16, ) → None#

nemo_automodel.components.models.step3p5.model.ModelClass#: None
