nemo_automodel.components.models.qwen2.model#
Custom Qwen2 model implementation for NeMo Automodel.
This module provides a self-contained Qwen2 implementation with combined QKV and gate_up projections, built on the shared fused-projection components from common/.
Example (YAML):

```yaml
model:
  _target_: nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-7B
  use_fused_qkv: true
  use_fused_gate_up: true
```
Module Contents#
Classes#
| Class | Description |
|---|---|
| Qwen2Attention | Multi-headed attention with combined QKV projection. |
| Qwen2DecoderLayer | Single Qwen2 decoder layer with RMSNorm, attention, and combined MLP. |
| Qwen2PreTrainedModel | Abstract class for Qwen2 pretrained models. |
| Qwen2Model | Qwen2 transformer model (embeddings + decoder layers + norm). |
| Qwen2ForCausalLM | Qwen2 model with causal language modeling head. |
Data#
API#
- nemo_automodel.components.models.qwen2.model.__all__#
['Qwen2ForCausalLM']
- nemo_automodel.components.models.qwen2.model.check_model_inputs#
'get_check_model_inputs_decorator(…)'
- class nemo_automodel.components.models.qwen2.model.Qwen2Attention(config: transformers.Qwen2Config, layer_idx: int)#
Bases: nemo_automodel.components.models.common.CombinedQKVAttentionMixin, torch.nn.Module

Multi-headed attention with combined QKV projection.
Uses CombinedQKVAttentionMixin for efficient combined QKV projection. ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- forward(
- hidden_states: torch.Tensor,
- position_embeddings: tuple[torch.Tensor, torch.Tensor],
- attention_mask: Optional[torch.Tensor],
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- cache_position: Optional[torch.LongTensor] = None,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],

)#
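As a sketch of what the combined QKV projection does: a single `nn.Linear` produces query, key, and value in one matmul, and the output is then split along the last dimension (Qwen2 uses grouped-query attention, so K/V get fewer heads than Q). The dimensions and names below are illustrative, not the mixin's actual internals.

```python
import torch
import torch.nn as nn

hidden, head_dim = 64, 16
num_attention_heads, num_key_value_heads = 4, 2  # GQA: fewer K/V heads than Q heads

q_size = num_attention_heads * head_dim   # 64
kv_size = num_key_value_heads * head_dim  # 32

# One fused matmul instead of separate q_proj / k_proj / v_proj.
qkv_proj = nn.Linear(hidden, q_size + 2 * kv_size, bias=True)

x = torch.randn(2, 8, hidden)  # (batch, seq, hidden)
q, k, v = qkv_proj(x).split([q_size, kv_size, kv_size], dim=-1)

# Reshape to (batch, heads, seq, head_dim) for the attention kernel.
q = q.view(2, 8, num_attention_heads, head_dim).transpose(1, 2)
k = k.view(2, 8, num_key_value_heads, head_dim).transpose(1, 2)
v = v.view(2, 8, num_key_value_heads, head_dim).transpose(1, 2)
```

The fused layout trades three small GEMMs for one larger one, which is generally friendlier to GPU utilization.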
- class nemo_automodel.components.models.qwen2.model.Qwen2DecoderLayer(
- config: transformers.Qwen2Config,
- layer_idx: int,
- backend: nemo_automodel.components.models.common.BackendConfig,

)#
Bases: transformers.modeling_layers.GradientCheckpointingLayer

Single Qwen2 decoder layer with RMSNorm, attention, and combined MLP.
ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- forward(
- hidden_states: torch.Tensor,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- use_cache: Optional[bool] = False,
- cache_position: Optional[torch.LongTensor] = None,
- position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],

)#
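The "combined MLP" fuses the SwiGLU gate and up projections into a single matmul whose output is split in two. A minimal sketch with illustrative dimensions (not the actual Qwen2 MLP internals):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, intermediate = 64, 128

# Fused gate/up: one matmul producing both halves, split before the activation.
gate_up_proj = nn.Linear(hidden, 2 * intermediate, bias=False)
down_proj = nn.Linear(intermediate, hidden, bias=False)

x = torch.randn(2, 8, hidden)
gate, up = gate_up_proj(x).chunk(2, dim=-1)
out = down_proj(F.silu(gate) * up)  # SwiGLU: silu(gate) elementwise-scaled by up
```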
- class nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel#
Bases: transformers.modeling_utils.PreTrainedModel

Abstract class for Qwen2 pretrained models.
- config_class#
None
- base_model_prefix#
'model'
- supports_gradient_checkpointing#
True
- _no_split_modules#
['Qwen2DecoderLayer']
- _skip_keys_device_placement#
['past_key_values']
- _supports_flash_attn#
True
- _supports_sdpa#
True
- _supports_flex_attn#
True
- _can_compile_fullgraph#
True
- _supports_attention_backend#
True
- _can_record_outputs#
None
- class nemo_automodel.components.models.qwen2.model.Qwen2Model(
- config: transformers.Qwen2Config,
- backend: nemo_automodel.components.models.common.BackendConfig,

)#
Bases: nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel

Qwen2 transformer model (embeddings + decoder layers + norm).
ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- use_cache: Optional[bool] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
- return_dict: Optional[bool] = None,
- cache_position: Optional[torch.LongTensor] = None,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],

)#
- class nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM(
- config: transformers.Qwen2Config,
- backend: Optional[nemo_automodel.components.models.common.BackendConfig] = None,

)#
Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel

Qwen2 model with causal language modeling head.
ALWAYS uses combined projections - this is the whole point of the custom implementation.
Initialization
- _tied_weights_keys#
None
- _tp_plan#
None
- _pp_plan#
None
- get_input_embeddings()#
- set_input_embeddings(value)#
- get_output_embeddings()#
- set_output_embeddings(new_embeddings)#
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- labels: Optional[torch.LongTensor] = None,
- use_cache: Optional[bool] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
- return_dict: Optional[bool] = None,
- cache_position: Optional[torch.LongTensor] = None,
- logits_to_keep: Union[int, torch.Tensor] = 0,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],

)#
Forward pass returning CausalLMOutputWithPast.
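The `logits_to_keep` argument limits the lm_head matmul to the trailing positions, which matters during generation when only the final token's logits are needed. A hedged sketch of the slicing semantics (the names `hidden_states` and `lm_head` are illustrative, not the class's verbatim code):

```python
import torch

batch, seq, hidden, vocab = 2, 10, 16, 32
hidden_states = torch.randn(batch, seq, hidden)  # stand-in for the model's output
lm_head = torch.nn.Linear(hidden, vocab, bias=False)

logits_to_keep = 1  # keep only the last position, as during decoding
# slice(-n, None) selects the trailing n positions; 0 keeps every position.
indices = slice(-logits_to_keep, None) if logits_to_keep else slice(None)
logits = lm_head(hidden_states[:, indices, :])
```

With `logits_to_keep=0` the full `(batch, seq, vocab)` logits are produced, which is what a training step with `labels` requires.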
- nemo_automodel.components.models.qwen2.model.ModelClass#
None