nemo_automodel.components.models.step3p5.model#
Module Contents#
Classes#
Block — Step3p5 transformer block with attention, MLP/MoE, and shared experts.
Step3p5Model — Step3p5 transformer model.
Step3p5ForCausalLM — Step3p5 model for causal language modeling.
Functions#
parse_moe_layers_enum — Parse moe_layers_enum to get the set of MoE layer indices.
Data#
ModelClass
API#
- nemo_automodel.components.models.step3p5.model.parse_moe_layers_enum(
- moe_layers_enum: str | tuple | list | None,
- num_hidden_layers: int,
- )
Parse moe_layers_enum to get the set of MoE layer indices.
- Parameters:
moe_layers_enum – Tuple/list of layer indices, a comma-separated string, or None. HF Step-3.5-Flash uses a tuple format like (3, 4, 5, …, 44).
num_hidden_layers – Total number of hidden layers.
- Returns:
Set of layer indices that should be MoE layers.
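The parsing logic described above can be sketched in plain Python. This is an illustrative reimplementation, not the library's code: the function name carries a `_sketch` suffix, and the choice to return an empty set for None is an assumption (the real function may default differently, e.g. making every layer MoE).

```python
def parse_moe_layers_enum_sketch(moe_layers_enum, num_hidden_layers):
    """Illustrative parse of a tuple/list, comma-separated string, or None
    into the set of MoE layer indices (assumption-laden sketch)."""
    if moe_layers_enum is None:
        # Assumed default: no MoE layers when nothing is specified.
        return set()
    if isinstance(moe_layers_enum, str):
        # Comma-separated string form, e.g. "3,4,5".
        return {int(tok) for tok in moe_layers_enum.split(",") if tok.strip()}
    # Tuple/list form, e.g. (3, 4, 5, ..., 44) as used by HF Step-3.5-Flash.
    indices = {int(i) for i in moe_layers_enum}
    # Sanity check against the total layer count.
    assert all(0 <= i < num_hidden_layers for i in indices)
    return indices
```

Either input form yields the same set, so downstream code can test layer membership with a plain `in`.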
- class nemo_automodel.components.models.step3p5.model.Block(
- layer_idx: int,
- config: Any,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- )
Bases: torch.nn.Module
Step3p5 transformer block with attention, MLP/MoE, and shared experts.
Initialization
- forward(
- x: torch.Tensor,
- *,
- freqs_cis: torch.Tensor,
- attention_mask: torch.Tensor | None = None,
- padding_mask: torch.Tensor | None = None,
- position_ids: torch.Tensor | None = None,
- **attn_kwargs: Any,
- )
- init_weights(buffer_device: torch.device) → None#
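The class summary says each Block pairs attention with either a dense MLP or an MoE path plus shared experts, selected per layer. A minimal sketch of that per-layer dispatch, assuming the set from parse_moe_layers_enum drives the choice; the helper name and string labels here are illustrative stand-ins, not the library's API.

```python
def build_block_ffn(layer_idx, moe_layer_indices):
    """Illustrative dispatch: pick the feed-forward flavor for one block.

    `moe_layer_indices` is assumed to be the set produced by
    parse_moe_layers_enum; the real Block constructs torch modules,
    while this sketch just records which path would be built.
    """
    if layer_idx in moe_layer_indices:
        return "moe+shared_experts"
    return "dense_mlp"

# Layers 3..44 as MoE, matching the Step-3.5-Flash-style tuple
# (3, 4, 5, ..., 44) mentioned in parse_moe_layers_enum's docs.
moe_layers = set(range(3, 45))
layout = [build_block_ffn(i, moe_layers) for i in range(46)]
```

Under this layout the first few blocks stay dense while the bulk of the stack routes through experts, a common pattern for stabilizing early layers.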
- class nemo_automodel.components.models.step3p5.model.Step3p5Model(
- config: Any,
- backend: nemo_automodel.components.models.common.BackendConfig,
- *,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- )
Bases: torch.nn.Module
Step3p5 transformer model.
Initialization
- forward(
- input_ids: torch.Tensor,
- *,
- position_ids: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- padding_mask: torch.Tensor | None = None,
- **attn_kwargs: Any,
- )
- init_weights(buffer_device: torch.device | None = None) → None#
- class nemo_automodel.components.models.step3p5.model.Step3p5ForCausalLM(
- config: Any,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )
Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, torch.nn.Module, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin
Step3p5 model for causal language modeling.
Initialization
- classmethod from_config(
- config: Any,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )
- classmethod from_pretrained(
- pretrained_model_name_or_path: str,
- *model_args,
- **kwargs,
- )
- forward(
- input_ids: torch.Tensor,
- *,
- position_ids: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- padding_mask: torch.Tensor | None = None,
- **attn_kwargs: Any,
- )
- initialize_weights(
- buffer_device: torch.device | None = None,
- dtype: torch.dtype = torch.bfloat16,
- )
- nemo_automodel.components.models.step3p5.model.ModelClass#
None