nemo_automodel.components.models.qwen2.model#
Custom Qwen2 model implementation for NeMo Automodel.
This module provides a self-contained Qwen2 implementation with combined QKV/gate_up projections. Uses shared components from common/ for fused projections.
Example (YAML):
model:
  _target_: nemo_automodel.components.models.qwen2.build_qwen2_model
  pretrained_model_name_or_path: Qwen/Qwen2.5-7B
  use_fused_qkv: true
  use_fused_gate_up: true
Module Contents#
Classes#
| Class | Description |
|---|---|
| Qwen2Attention | Multi-headed attention with combined QKV projection. |
| Qwen2DecoderLayer | Single Qwen2 decoder layer with RMSNorm, attention, and combined MLP. |
| Qwen2PreTrainedModel | Abstract class for Qwen2 pretrained models. |
| Qwen2Model | Qwen2 transformer model (embeddings + decoder layers + norm). |
| Qwen2ForCausalLM | Qwen2 model with causal language modeling head. |
Functions#
| Function | Description |
|---|---|
| build_qwen2_model | Build a custom Qwen2 model with combined projections. |
Data#
API#
- nemo_automodel.components.models.qwen2.model.__all__#
['build_qwen2_model', 'Qwen2ForCausalLM']
- class nemo_automodel.components.models.qwen2.model.Qwen2Attention(config: transformers.Qwen2Config, layer_idx: int)#
Bases: nemo_automodel.components.models.common.combined_projection.CombinedQKVAttentionMixin, torch.nn.Module

Multi-headed attention with combined QKV projection.
Uses CombinedQKVAttentionMixin for efficient combined QKV projection. ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- forward(
- hidden_states: torch.Tensor,
- position_embeddings: tuple[torch.Tensor, torch.Tensor],
- attention_mask: Optional[torch.Tensor],
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- cache_position: Optional[torch.LongTensor] = None,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
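The combined projection replaces the three separate q/k/v linears with one wider linear whose output is split after the matmul. The sketch below is illustrative only, not the CombinedQKVAttentionMixin code: the dimensions are hypothetical, and only the `qkv_proj` name is borrowed from the conversion notes under `save_pretrained_hf_format` below.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; in the real module they come from the Qwen2Config
# (hidden_size, num_attention_heads, num_key_value_heads, head_dim).
hidden_size, num_heads, num_kv_heads, head_dim = 896, 14, 2, 64

# One fused linear instead of separate q_proj / k_proj / v_proj.
qkv_proj = nn.Linear(hidden_size, (num_heads + 2 * num_kv_heads) * head_dim, bias=True)

x = torch.randn(1, 8, hidden_size)  # (batch, seq, hidden)
qkv = qkv_proj(x)                   # a single GEMM
q, k, v = qkv.split(
    [num_heads * head_dim, num_kv_heads * head_dim, num_kv_heads * head_dim], dim=-1
)
```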
- class nemo_automodel.components.models.qwen2.model.Qwen2DecoderLayer(config: transformers.Qwen2Config, layer_idx: int)#
Bases: transformers.modeling_layers.GradientCheckpointingLayer

Single Qwen2 decoder layer with RMSNorm, attention, and combined MLP.
ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- forward(
- hidden_states: torch.Tensor,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- use_cache: Optional[bool] = False,
- cache_position: Optional[torch.LongTensor] = None,
- position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
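The "combined MLP" in this layer fuses the gate and up projections the same way. Again a hedged sketch rather than the module's actual code; the sizes are made up, and only the `gate_up_proj` name comes from the conversion notes under `save_pretrained_hf_format` below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes; the real values come from the Qwen2Config.
hidden_size, intermediate_size = 896, 4864

gate_up_proj = nn.Linear(hidden_size, 2 * intermediate_size, bias=False)  # fused gate + up
down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

x = torch.randn(1, 8, hidden_size)
gate, up = gate_up_proj(x).chunk(2, dim=-1)  # one GEMM, then split
y = down_proj(F.silu(gate) * up)             # SiLU gating, then down-projection
```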
- class nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel#
Bases: transformers.modeling_utils.PreTrainedModel

Abstract class for Qwen2 pretrained models.
- config_class#
None
- base_model_prefix#
'model'
- supports_gradient_checkpointing#
True
- _no_split_modules#
['Qwen2DecoderLayer']
- _skip_keys_device_placement#
['past_key_values']
- _supports_flash_attn#
True
- _supports_sdpa#
True
- _supports_flex_attn#
True
- _can_compile_fullgraph#
True
- _supports_attention_backend#
True
- _can_record_outputs#
None
- class nemo_automodel.components.models.qwen2.model.Qwen2Model(config: transformers.Qwen2Config)#
Bases: nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel

Qwen2 transformer model (embeddings + decoder layers + norm).
ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- use_cache: Optional[bool] = None,
- cache_position: Optional[torch.LongTensor] = None,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
- class nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM(
- config: transformers.Qwen2Config,
- backend: Optional[nemo_automodel.components.moe.utils.BackendConfig] = None,
Bases: nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel

Qwen2 model with causal language modeling head.
ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- _tied_weights_keys#
['lm_head.weight']
- _tp_plan#
None
- _pp_plan#
None
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- labels: Optional[torch.LongTensor] = None,
- use_cache: Optional[bool] = None,
- cache_position: Optional[torch.LongTensor] = None,
- logits_to_keep: Union[int, torch.Tensor] = 0,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
Forward pass returning CausalLMOutputWithPast.
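A minimal usage sketch, assuming the class can be instantiated directly from a tiny randomly initialized Qwen2Config (so nothing is downloaded); real runs would normally go through build_qwen2_model instead.

```python
import torch
from transformers import Qwen2Config
from nemo_automodel.components.models.qwen2.model import Qwen2ForCausalLM

config = Qwen2Config(
    hidden_size=64, intermediate_size=128, num_hidden_layers=2,
    num_attention_heads=4, num_key_value_heads=2, vocab_size=1024,
)
model = Qwen2ForCausalLM(config)

input_ids = torch.randint(0, config.vocab_size, (1, 16))
out = model(input_ids=input_ids, labels=input_ids)  # labels should yield the standard LM loss
print(out.loss, out.logits.shape)                   # fields of CausalLMOutputWithPast
```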
- save_pretrained_hf_format(save_directory: str, **kwargs)#
Save model in HuggingFace-compatible format by converting combined projections.
This method converts the custom model's combined projections (qkv_proj, gate_up_proj) back to HuggingFace's separate-projection format before saving, making the checkpoint loadable with AutoModelForCausalLM.from_pretrained() (see the round-trip sketch after the parameter list).
- Parameters:
save_directory – Directory where the model will be saved
**kwargs – Additional arguments passed to config.save_pretrained and save_file
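A hedged round-trip sketch based on the conversion described above: build the combined-projection model, save it in the HuggingFace layout, and reload it with vanilla transformers. The checkpoint path is a placeholder.

```python
from transformers import AutoModelForCausalLM

from nemo_automodel.components.models.qwen2.model import build_qwen2_model

model = build_qwen2_model("Qwen/Qwen2.5-7B", torch_dtype="bf16")
model.save_pretrained_hf_format("./qwen2_hf_ckpt")  # qkv_proj / gate_up_proj split back into separate projections
hf_model = AutoModelForCausalLM.from_pretrained("./qwen2_hf_ckpt")
```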
- nemo_automodel.components.models.qwen2.model.build_qwen2_model(
- pretrained_model_name_or_path: str,
- **kwargs: Any,
Build a custom Qwen2 model with combined projections.
This custom implementation ALWAYS uses combined QKV and gate_up projections for better efficiency. The state dict adapter handles conversion from HuggingFace checkpoints (which have separate projections) to the combined format.
- Parameters:
pretrained_model_name_or_path – HuggingFace model card name (e.g., 'Qwen/Qwen2.5-7B')
**kwargs – Override config parameters. Common parameters include:
- torch_dtype: Model dtype ('bf16', 'fp32', etc.)
- attn_implementation: Attention backend ('eager', 'sdpa', 'flash_attention_2')
- num_hidden_layers: Number of layers (useful for testing)
- Returns:
Qwen2ForCausalLM model instance with combined projections
Example:

# Load custom Qwen2 with combined projections (ALWAYS enabled)
model = build_qwen2_model("Qwen/Qwen2.5-7B", torch_dtype="bf16")
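The **kwargs overrides documented above can be combined; below is a sketch of a deliberately truncated build for quick tests, using only parameters listed in this section.

```python
from nemo_automodel.components.models.qwen2.model import build_qwen2_model

# Hypothetical test configuration: config fields overridden via **kwargs.
tiny = build_qwen2_model(
    "Qwen/Qwen2.5-7B",
    torch_dtype="bf16",
    attn_implementation="sdpa",
    num_hidden_layers=2,  # small layer count for testing
)
```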