nemo_automodel.components.models.nemotron_v3.model#
Module Contents#
Classes#
| NemotronV3Model | NemotronV3 base model (without LM head). |
| NemotronHForCausalLM | NemotronV3 model with language modeling head. |
Data#
| ModelClass | |
API#
- class nemo_automodel.components.models.nemotron_v3.model.NemotronV3Model(
- config,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- *,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- )
Bases:
torch.nn.Module

NemotronV3 base model (without LM head).
This is a hybrid architecture with Mamba2, Attention, MLP, and MoE layers.
Initialization
Initialize NemotronV3Model.
- Parameters:
config – NemotronH config with model parameters
backend – Backend configuration for MoE and other components
moe_config – MoE configuration (optional, will create default if None)
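A minimal construction sketch. The checkpoint name below is a placeholder, and loading the NemotronH config via Hugging Face `AutoConfig` is an assumption; `backend` and `moe_config` are left as `None` so the documented defaults apply.

```python
from transformers import AutoConfig

from nemo_automodel.components.models.nemotron_v3.model import NemotronV3Model

# Placeholder checkpoint id -- substitute a real NemotronH checkpoint.
config = AutoConfig.from_pretrained("nvidia/<nemotron-h-checkpoint>")

# backend=None and moe_config=None fall back to the documented defaults.
model = NemotronV3Model(config)
```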
- forward(
- input_ids: torch.LongTensor | None = None,
- *,
- attention_mask: torch.Tensor | None = None,
- causal_mask_mapping: dict[str, torch.Tensor] | None = None,
- inputs_embeds: torch.Tensor | None = None,
- **kwargs: Any,
- )
Forward pass through the model.
- Parameters:
input_ids – Input token IDs [batch_size, seq_len] (optional)
attention_mask – 2D padding mask [batch_size, seq_len] (1=real, 0=padding)
causal_mask_mapping – Dict with precomputed 4D causal masks for attention layers
inputs_embeds – Input embeddings [batch_size, seq_len, hidden_size] (optional)
**kwargs – Additional arguments (ignored)
- Returns:
Hidden states tensor [batch_size, seq_len, hidden_size]
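A forward-pass sketch, assuming the `model` and `config` from the construction sketch above and a config that exposes `vocab_size` and `hidden_size` (standard Hugging Face config attributes, but an assumption here). Note that `attention_mask` is keyword-only.

```python
import torch

batch_size, seq_len = 2, 16
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)  # 1 = real token

hidden_states = model(input_ids, attention_mask=attention_mask)
assert hidden_states.shape == (batch_size, seq_len, config.hidden_size)
```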
- initialize_weights(buffer_device: torch.device | None = None) → None#
Initialize model weights according to NemotronV3 spec.
- Parameters:
buffer_device – Device to use for buffer initialization
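A usage sketch for re-initializing weights; placing buffers on the model's current parameter device is an assumption, not a documented requirement.

```python
# Re-initialize weights per the NemotronV3 spec; buffers are created on the
# model's current device here (an assumption -- any torch.device works).
device = next(model.parameters()).device
model.initialize_weights(buffer_device=device)
```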
- class nemo_automodel.components.models.nemotron_v3.model.NemotronHForCausalLM(
- config,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )
Bases:
nemo_automodel.components.models.common.HFCheckpointingMixin, torch.nn.Module, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin

NemotronV3 model with language modeling head.
Initialization
Initialize NemotronHForCausalLM.
- Parameters:
config – NemotronH config
backend – Backend configuration
**kwargs – Additional arguments
- classmethod from_config(
- config,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )
Create model from config.
- Parameters:
config – NemotronH config
backend – Backend configuration
**kwargs – Additional arguments
- Returns:
NemotronHForCausalLM instance
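A sketch of the `from_config` factory; as above, loading the config through `AutoConfig` and the placeholder checkpoint id are assumptions.

```python
from transformers import AutoConfig

from nemo_automodel.components.models.nemotron_v3.model import NemotronHForCausalLM

# Placeholder checkpoint id -- substitute a real NemotronH checkpoint.
config = AutoConfig.from_pretrained("nvidia/<nemotron-h-checkpoint>")
model = NemotronHForCausalLM.from_config(config)  # backend defaults to None
```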
- classmethod from_pretrained(
- pretrained_model_name_or_path: str,
- *model_args,
- **kwargs,
- )
Load pretrained model.
- Parameters:
pretrained_model_name_or_path – Path or name of pretrained model
*model_args – Additional positional arguments
**kwargs – Additional keyword arguments
- Returns:
NemotronHForCausalLM instance
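A minimal `from_pretrained` sketch; the checkpoint id is again a placeholder.

```python
from nemo_automodel.components.models.nemotron_v3.model import NemotronHForCausalLM

# Placeholder id -- substitute a real NemotronH checkpoint path or hub id.
model = NemotronHForCausalLM.from_pretrained("nvidia/<nemotron-h-checkpoint>")
```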
- forward(
- input_ids: torch.LongTensor | None = None,
- *,
- attention_mask: torch.Tensor | None = None,
- causal_mask_mapping: dict[str, torch.Tensor] | None = None,
- **kwargs: Any,
- )
Forward pass with optional loss computation.
- Parameters:
input_ids – Input token IDs [batch_size, seq_len] (optional)
attention_mask – 2D padding mask [batch_size, seq_len]
causal_mask_mapping – Dict with precomputed 4D causal masks
**kwargs – Additional arguments
- Returns:
logits tensor [batch_size, seq_len, vocab_size]
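Since the documented return value is the logits tensor, this sketch computes a next-token cross-entropy loss manually from the returned logits; it reuses the `model` and `config` from the sketches above (the mechanism behind the "optional loss computation" mentioned in the docstring is not shown here).

```python
import torch
import torch.nn.functional as F

input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

logits = model(input_ids, attention_mask=attention_mask)  # [2, 16, vocab_size]

# Shift so that position t predicts token t+1, then flatten for cross-entropy.
shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
shift_labels = input_ids[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels)
```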
- initialize_weights(
- buffer_device: torch.device | None = None,
- dtype: torch.dtype = torch.bfloat16,
- )
Initialize model weights.
- Parameters:
buffer_device – Device to use for buffer initialization
dtype – Target dtype for model weights
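A short usage sketch; initializing on CPU and the bfloat16 default are taken from the signature above, while the choice of device is an assumption.

```python
import torch

# Initialize weights in bfloat16 (the documented default); CPU is used here
# as an assumed fallback when no accelerator is available.
model.initialize_weights(buffer_device=torch.device("cpu"), dtype=torch.bfloat16)
```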
- nemo_automodel.components.models.nemotron_v3.model.ModelClass#
None