nemo_automodel.components.models.qwen2.model#

Custom Qwen2 model implementation for NeMo Automodel.

This module provides a self-contained Qwen2 implementation with combined QKV/gate_up projections. Uses shared components from common/ for fused projections.

Example (YAML):

model:
  _target_: nemo_automodel.components.models.qwen2.build_qwen2_model
  pretrained_model_name_or_path: Qwen/Qwen2.5-7B
  use_fused_qkv: true
  use_fused_gate_up: true
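The equivalent direct call in Python, mirroring the YAML above (a sketch; the builder accepts these flags as keyword arguments the same way the _target_ instantiation passes them):

from nemo_automodel.components.models.qwen2.model import build_qwen2_model

model = build_qwen2_model(
    "Qwen/Qwen2.5-7B",
    use_fused_qkv=True,
    use_fused_gate_up=True,
)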

Module Contents#

Classes#

Qwen2Attention

Multi-headed attention with combined QKV projection.

Qwen2DecoderLayer

Single Qwen2 decoder layer with RMSNorm, attention, and combined MLP.

Qwen2PreTrainedModel

Abstract class for Qwen2 pretrained models.

Qwen2Model

Qwen2 transformer model (embeddings + decoder layers + norm).

Qwen2ForCausalLM

Qwen2 model with causal language modeling head.

Functions#

build_qwen2_model

Build a custom Qwen2 model with combined projections.

Data#

API#

nemo_automodel.components.models.qwen2.model.__all__#

['build_qwen2_model', 'Qwen2ForCausalLM']

class nemo_automodel.components.models.qwen2.model.Qwen2Attention(config: transformers.Qwen2Config, layer_idx: int)#

Bases: nemo_automodel.components.models.common.combined_projection.CombinedQKVAttentionMixin, torch.nn.Module

Multi-headed attention with combined QKV projection.

Uses CombinedQKVAttentionMixin for an efficient combined QKV projection. Combined projections are always used; they are the defining feature of this custom implementation.

Initialization

forward(
hidden_states: torch.Tensor,
position_embeddings: tuple[torch.Tensor, torch.Tensor],
attention_mask: Optional[torch.Tensor],
past_key_values: Optional[transformers.cache_utils.Cache] = None,
cache_position: Optional[torch.LongTensor] = None,
**kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
) → tuple[torch.Tensor, torch.Tensor]#
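To illustrate the idea behind the combined projection: the three separate Q/K/V linears are replaced by one wider linear whose output is split afterwards, so the projection runs as a single GEMM. A minimal sketch with Qwen2.5-7B-like shapes (the dimensions and the qkv_proj name are illustrative, not read from this module):

import torch
import torch.nn as nn

hidden_size, num_heads, num_kv_heads, head_dim = 3584, 28, 4, 128

# One fused linear instead of separate q_proj / k_proj / v_proj.
qkv_proj = nn.Linear(hidden_size, (num_heads + 2 * num_kv_heads) * head_dim, bias=True)

x = torch.randn(2, 16, hidden_size)  # (batch, seq, hidden)
qkv = qkv_proj(x)                    # single GEMM produces Q, K, and V together
q, k, v = qkv.split(
    [num_heads * head_dim, num_kv_heads * head_dim, num_kv_heads * head_dim],
    dim=-1,
)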
class nemo_automodel.components.models.qwen2.model.Qwen2DecoderLayer(config: transformers.Qwen2Config, layer_idx: int)#

Bases: transformers.modeling_layers.GradientCheckpointingLayer

Single Qwen2 decoder layer with RMSNorm, attention, and combined MLP.

Combined projections are always used; they are the defining feature of this custom implementation.

Initialization

forward(
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[transformers.cache_utils.Cache] = None,
use_cache: Optional[bool] = False,
cache_position: Optional[torch.LongTensor] = None,
position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None,
**kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
) → torch.Tensor#
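The forward pass follows the standard Qwen2 pre-norm residual pattern; a simplified sketch (attribute names follow HuggingFace conventions and are illustrative here):

def decoder_layer_forward(self, hidden_states, position_embeddings, **kwargs):
    # Attention block: pre-RMSNorm, attention, residual add.
    residual = hidden_states
    hidden_states = self.input_layernorm(hidden_states)
    hidden_states, _ = self.self_attn(hidden_states, position_embeddings, None, **kwargs)
    hidden_states = residual + hidden_states

    # MLP block with combined gate_up projection: pre-RMSNorm, MLP, residual add.
    residual = hidden_states
    hidden_states = self.post_attention_layernorm(hidden_states)
    hidden_states = self.mlp(hidden_states)
    return residual + hidden_states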
class nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel#

Bases: transformers.modeling_utils.PreTrainedModel

Abstract class for Qwen2 pretrained models.

config_class#

None

base_model_prefix#

'model'

supports_gradient_checkpointing#

True

_no_split_modules#

['Qwen2DecoderLayer']

_skip_keys_device_placement#

['past_key_values']

_supports_flash_attn#

True

_supports_sdpa#

True

_supports_flex_attn#

True

_can_compile_fullgraph#

True

_supports_attention_backend#

True

_can_record_outputs#

None

class nemo_automodel.components.models.qwen2.model.Qwen2Model(config: transformers.Qwen2Config)#

Bases: nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel

Qwen2 transformer model (embeddings + decoder layers + norm).

Combined projections are always used; they are the defining feature of this custom implementation.

Initialization

forward(
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[transformers.cache_utils.Cache] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
**kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
) → transformers.modeling_outputs.BaseModelOutputWithPast#
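Conceptually, the forward pass embeds the tokens, computes the rotary position embeddings once, threads the hidden states through every decoder layer, and applies the final RMSNorm. A simplified sketch, ignoring caching and masking (attribute names are illustrative):

def model_forward(self, input_ids):
    hidden_states = self.embed_tokens(input_ids)
    position_ids = torch.arange(input_ids.shape[1]).unsqueeze(0)
    position_embeddings = self.rotary_emb(hidden_states, position_ids)  # (cos, sin)
    for layer in self.layers:
        hidden_states = layer(hidden_states, position_embeddings=position_embeddings)
    return self.norm(hidden_states)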
class nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM(
config: transformers.Qwen2Config,
backend: Optional[nemo_automodel.components.moe.utils.BackendConfig] = None,
)#

Bases: nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel

Qwen2 model with causal language modeling head.

Combined projections are always used; they are the defining feature of this custom implementation.

Initialization

_tied_weights_keys#

['lm_head.weight']

_tp_plan#

None

_pp_plan#

None

forward(
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[transformers.cache_utils.Cache] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
labels: Optional[torch.LongTensor] = None,
use_cache: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
logits_to_keep: Union[int, torch.Tensor] = 0,
**kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
) → transformers.modeling_outputs.CausalLMOutputWithPast#

Forward pass returning CausalLMOutputWithPast.
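A minimal usage sketch, following standard HuggingFace causal-LM conventions (passing labels makes the model compute the LM loss internally; the tokenizer setup is an assumption):

from transformers import AutoTokenizer

from nemo_automodel.components.models.qwen2.model import build_qwen2_model

model = build_qwen2_model("Qwen/Qwen2.5-7B", torch_dtype="bf16")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

batch = tokenizer("Hello, Qwen2!", return_tensors="pt")
outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
print(outputs.loss)           # scalar LM loss
print(outputs.logits.shape)   # (batch, seq_len, vocab_size)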

save_pretrained_hf_format(save_directory: str, **kwargs)#

Save model in HuggingFace-compatible format by converting combined projections.

This method converts the custom model’s combined projections (qkv_proj, gate_up_proj) back to HuggingFace’s separate projections format before saving, making the checkpoint loadable with AutoModelForCausalLM.from_pretrained().

Parameters:
  • save_directory – Directory where the model will be saved

  • **kwargs – Additional arguments passed to config.save_pretrained and save_file
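For example (the directory name is illustrative), a checkpoint saved this way reloads with vanilla transformers:

model.save_pretrained_hf_format("./qwen2_hf_checkpoint")

# The exported checkpoint uses the separate q/k/v and gate/up projections,
# so it loads like any stock HuggingFace Qwen2 checkpoint.
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("./qwen2_hf_checkpoint")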

nemo_automodel.components.models.qwen2.model.build_qwen2_model(
pretrained_model_name_or_path: str,
**kwargs: Any,
) → torch.nn.Module#

Build a custom Qwen2 model with combined projections.

This custom implementation ALWAYS uses combined QKV and gate_up projections for better efficiency. The state dict adapter handles conversion from HuggingFace checkpoints (which have separate projections) to the combined format.

Parameters:
• pretrained_model_name_or_path – HuggingFace model card name (e.g., "Qwen/Qwen2.5-7B")

  • **kwargs –

    Override config parameters. Common parameters include:

• torch_dtype: Model dtype ("bf16", "fp32", etc.)

    • attn_implementation: Attention backend ("eager", "sdpa", "flash_attention_2")

    • num_hidden_layers: Number of layers (useful for testing)

Returns:

Qwen2ForCausalLM model instance with combined projections

Example#

Load a custom Qwen2 with combined projections (always enabled):

model = build_qwen2_model("Qwen/Qwen2.5-7B", torch_dtype="bf16")
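The checkpoint conversion performed by the state dict adapter amounts to concatenating weights along the output dimension. A simplified sketch of the QKV case (the key names follow HuggingFace's Qwen2 layout and are illustrative):

import torch

def fuse_qkv_weight(hf_state_dict: dict, layer_idx: int) -> torch.Tensor:
    # Stack the separate HuggingFace Q/K/V weight matrices into one
    # combined qkv weight, rows ordered as [q; k; v].
    prefix = f"model.layers.{layer_idx}.self_attn"
    return torch.cat(
        [hf_state_dict[f"{prefix}.{name}.weight"] for name in ("q_proj", "k_proj", "v_proj")],
        dim=0,
    )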