
# nemo_automodel.components.models.qwen2.model

Custom Qwen2 model implementation for NeMo Automodel.

This module provides a self-contained Qwen2 implementation with separate
HuggingFace-style q/k/v and gate/up projections.

Example (YAML):

```yaml
model:
  _target_: nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-7B
```
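By the usual `_target_` convention, the value names a dotted import path and the remaining keys are passed to that callable as keyword arguments. The following is a minimal, standard-library-only sketch of how such a node could be resolved. `resolve_target` and `instantiate` are hypothetical helpers written for illustration, not NeMo Automodel's actual config loader, and the example node points at `fractions.Fraction.from_float` only so the snippet runs without extra dependencies.

```python
import importlib
from typing import Any, Callable


def resolve_target(dotted_path: str) -> Callable[..., Any]:
    """Resolve a dotted path such as ``pkg.module.Class.method`` (hypothetical helper)."""
    parts = dotted_path.split(".")
    # Try the longest importable module prefix first, then walk the attributes.
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"Cannot resolve {dotted_path!r}")


def instantiate(node: dict) -> Any:
    """Call the ``_target_`` callable with every other key as a keyword argument."""
    kwargs = {key: value for key, value in node.items() if key != "_target_"}
    return resolve_target(node["_target_"])(**kwargs)


# Stand-in node with the same module.Class.method shape as the YAML above.
example_node = {"_target_": "fractions.Fraction.from_float", "f": 0.5}
print(instantiate(example_node))  # prints 1/2
```

With the YAML above, the same mechanism would call `from_pretrained` with `pretrained_model_name_or_path="Qwen/Qwen2.5-7B"`, assuming the loader follows this convention.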

## Module Contents

### Classes

| Name                                                                                         | Description                                                                           |
| -------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| [`Qwen2Attention`](#nemo_automodel-components-models-qwen2-model-Qwen2Attention)             | Multi-headed attention with separate QKV projections (HuggingFace default layout).    |
| [`Qwen2DecoderLayer`](#nemo_automodel-components-models-qwen2-model-Qwen2DecoderLayer)       | Single Qwen2 decoder layer with RMSNorm, attention, and MLP.                          |
| [`Qwen2ForCausalLM`](#nemo_automodel-components-models-qwen2-model-Qwen2ForCausalLM)         | Qwen2 model with causal language modeling head.                                       |
| [`Qwen2Model`](#nemo_automodel-components-models-qwen2-model-Qwen2Model)                     | Qwen2 transformer model (embeddings + decoder layers + norm).                         |
| [`Qwen2PreTrainedModel`](#nemo_automodel-components-models-qwen2-model-Qwen2PreTrainedModel) | Abstract class for Qwen2 pretrained models.                                           |
| [`Qwen2SeparateMLP`](#nemo_automodel-components-models-qwen2-model-Qwen2SeparateMLP)         | SwiGLU MLP with separate gate\_proj and up\_proj (identical to HuggingFace default).  |

### Data

[`ModelClass`](#nemo_automodel-components-models-qwen2-model-ModelClass)

[`__all__`](#nemo_automodel-components-models-qwen2-model-__all__)

[`check_model_inputs`](#nemo_automodel-components-models-qwen2-model-check_model_inputs)

### API

```python
class nemo_automodel.components.models.qwen2.model.Qwen2Attention(
    config: transformers.Qwen2Config,
    layer_idx: int,
    backend: typing.Optional['BackendConfig'] = None
)
```

**Bases:** `Module`

Multi-headed attention with separate QKV projections (HuggingFace default layout).

```python
nemo_automodel.components.models.qwen2.model.Qwen2Attention.forward(
    hidden_states: torch.Tensor,
    position_embeddings: tuple[torch.Tensor, torch.Tensor],
    attention_mask: typing.Optional[torch.Tensor],
    past_key_values: typing.Optional[transformers.cache_utils.Cache] = None,
    cache_position: typing.Optional[torch.LongTensor] = None,
    **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs]
) -> tuple[torch.Tensor, torch.Tensor]
```
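"Separate QKV projections" means three independent weight matrices rather than one fused matrix. The sketch below computes the projection shapes under the assumption that the module follows the HuggingFace Qwen2 layout, where grouped-query attention makes the key and value projections narrower than the query projection. `TinyAttentionConfig` and its numbers are hypothetical stand-ins for the corresponding `Qwen2Config` fields, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TinyAttentionConfig:
    """Hypothetical stand-in for the fields read from a Qwen2-style config."""

    hidden_size: int = 1024
    num_attention_heads: int = 16
    num_key_value_heads: int = 4


def projection_shapes(cfg: TinyAttentionConfig) -> dict[str, tuple[int, int]]:
    """Return (in_features, out_features) for each separate projection."""
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    q_out = cfg.num_attention_heads * head_dim
    # Key/value projections use fewer heads under grouped-query attention.
    kv_out = cfg.num_key_value_heads * head_dim
    return {
        "q_proj": (cfg.hidden_size, q_out),
        "k_proj": (cfg.hidden_size, kv_out),
        "v_proj": (cfg.hidden_size, kv_out),
        "o_proj": (q_out, cfg.hidden_size),
    }


shapes = projection_shapes(TinyAttentionConfig())
for name, shape in shapes.items():
    print(name, shape)
```

Keeping the projections separate means each checkpoint tensor maps to one parameter by name, with no splitting or concatenation step when loading HuggingFace weights.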

```python
class nemo_automodel.components.models.qwen2.model.Qwen2DecoderLayer(
    config: transformers.Qwen2Config,
    layer_idx: int,
    backend: nemo_automodel.components.models.common.BackendConfig
)
```

**Bases:** `GradientCheckpointingLayer`

Single Qwen2 decoder layer with RMSNorm, attention, and MLP.

```python
nemo_automodel.components.models.qwen2.model.Qwen2DecoderLayer.forward(
    hidden_states: torch.Tensor,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    past_key_values: typing.Optional[transformers.cache_utils.Cache] = None,
    use_cache: typing.Optional[bool] = False,
    cache_position: typing.Optional[torch.LongTensor] = None,
    position_embeddings: typing.Optional[tuple[torch.Tensor, torch.Tensor]] = None,
    **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs]
) -> torch.Tensor
```
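The way RMSNorm, attention, and the MLP combine can be sketched without any tensor library. The example assumes the pre-norm residual ordering used by the HuggingFace Qwen2 reference implementation; this page only states that the layer contains the three components, so treat the ordering as an assumption. It works on a single vector, omits position embeddings and caching, and `rms_norm` and `decoder_layer` are illustrative names rather than functions from this module.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def decoder_layer(
    x: Vector,
    attn: Callable[[Vector], Vector],
    mlp: Callable[[Vector], Vector],
    norm1_weight: Vector,
    norm2_weight: Vector,
) -> Vector:
    """Pre-norm residual block: x + attn(norm(x)), then h + mlp(norm(h))."""
    # Attention sub-block: normalize, attend, add the residual.
    h = [a + b for a, b in zip(x, attn(rms_norm(x, norm1_weight)))]
    # MLP sub-block: normalize, transform, add the residual.
    return [a + b for a, b in zip(h, mlp(rms_norm(h, norm2_weight)))]


ones = [1.0, 1.0]
normed = rms_norm([3.0, 4.0], ones, eps=0.0)
print(normed)
```

Because each sub-block only adds to the residual stream, a layer whose attention and MLP outputs are zero returns its input unchanged.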

```python
class nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM(
    config: transformers.Qwen2Config,
    backend: typing.Optional[nemo_automodel.components.models.common.BackendConfig] = None
)
```

**Bases:** [HFCheckpointingMixin](/nemo-automodel/nemo_automodel/components/models/common/hf_checkpointing_mixin#nemo_automodel-components-models-common-hf_checkpointing_mixin-HFCheckpointingMixin), [Qwen2PreTrainedModel](#nemo_automodel-components-models-qwen2-model-Qwen2PreTrainedModel)

Qwen2 model with causal language modeling head.

Uses separate q/k/v and gate/up projections (HuggingFace layout).

```python
nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM.forward(
    input_ids: typing.Optional[torch.LongTensor] = None,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    past_key_values: typing.Optional[transformers.cache_utils.Cache] = None,
    inputs_embeds: typing.Optional[torch.FloatTensor] = None,
    labels: typing.Optional[torch.LongTensor] = None,
    use_cache: typing.Optional[bool] = None,
    output_attentions: typing.Optional[bool] = None,
    output_hidden_states: typing.Optional[bool] = None,
    return_dict: typing.Optional[bool] = None,
    cache_position: typing.Optional[torch.LongTensor] = None,
    logits_to_keep: typing.Union[int, torch.Tensor] = 0,
    **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs]
) -> transformers.modeling_outputs.CausalLMOutputWithPast
```

Forward pass returning CausalLMOutputWithPast.
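The `logits_to_keep` parameter limits which sequence positions are passed through the language modeling head. The sketch below assumes this implementation follows the convention of HuggingFace causal LM heads: an integer `k` keeps the last `k` positions, `0` keeps all of them, and a tensor is treated as explicit indices. `positions_kept` is a hypothetical helper that models the selection on plain Python lists instead of tensors.

```python
from typing import List, Sequence, Union


def positions_kept(
    seq_len: int, logits_to_keep: Union[int, Sequence[int]] = 0
) -> List[int]:
    """Return the sequence positions for which logits would be computed."""
    positions = list(range(seq_len))
    if isinstance(logits_to_keep, int):
        # slice(-0, None) equals slice(0, None), so 0 keeps every position.
        return positions[slice(-logits_to_keep, None)]
    # Otherwise treat the value as explicit indices along the sequence dimension.
    return [positions[i] for i in logits_to_keep]


print(positions_kept(5))          # prints [0, 1, 2, 3, 4]
print(positions_kept(5, 1))       # prints [4]
print(positions_kept(5, [0, 2]))  # prints [0, 2]
```

During generation only the last position is needed to pick the next token, so passing `1` avoids materializing logits for the whole sequence.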

```python
nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM.get_input_embeddings()
```

```python
nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM.get_output_embeddings()
```

```python
nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM.set_input_embeddings(
    value
)
```

```python
nemo_automodel.components.models.qwen2.model.Qwen2ForCausalLM.set_output_embeddings(
    new_embeddings
)
```

```python
class nemo_automodel.components.models.qwen2.model.Qwen2Model(
    config: transformers.Qwen2Config,
    backend: nemo_automodel.components.models.common.BackendConfig
)
```

**Bases:** [Qwen2PreTrainedModel](#nemo_automodel-components-models-qwen2-model-Qwen2PreTrainedModel)

Qwen2 transformer model (embeddings + decoder layers + norm).

```python
nemo_automodel.components.models.qwen2.model.Qwen2Model.forward(
    input_ids: typing.Optional[torch.LongTensor] = None,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    past_key_values: typing.Optional[transformers.cache_utils.Cache] = None,
    inputs_embeds: typing.Optional[torch.FloatTensor] = None,
    use_cache: typing.Optional[bool] = None,
    output_attentions: typing.Optional[bool] = None,
    output_hidden_states: typing.Optional[bool] = None,
    return_dict: typing.Optional[bool] = None,
    cache_position: typing.Optional[torch.LongTensor] = None,
    **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs]
) -> transformers.modeling_outputs.BaseModelOutputWithPast
```

```python
class nemo_automodel.components.models.qwen2.model.Qwen2PreTrainedModel()
```

**Bases:** `PreTrainedModel`

Abstract class for Qwen2 pretrained models.

```python
class nemo_automodel.components.models.qwen2.model.Qwen2SeparateMLP(
    config: transformers.Qwen2Config
)
```

**Bases:** `Module`

SwiGLU MLP with separate gate\_proj and up\_proj (identical to HuggingFace default).

```python
nemo_automodel.components.models.qwen2.model.Qwen2SeparateMLP.forward(
    x: torch.Tensor
) -> torch.Tensor
```
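The SwiGLU computation with separate gate and up projections is `down_proj(silu(gate_proj(x)) * up_proj(x))`. The following dependency-free sketch spells that out on plain lists, assuming bias-free linear layers as in the HuggingFace Qwen2 MLP. The helper names `matvec`, `silu`, and `swiglu_mlp` are illustrative and do not come from this module.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Bias-free linear layer: one output element per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def silu(v: float) -> float:
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_mlp(x: Vector, gate_w: Matrix, up_w: Matrix, down_w: Matrix) -> Vector:
    """Compute down_proj(silu(gate_proj(x)) * up_proj(x))."""
    gate = matvec(gate_w, x)  # separate gate projection
    up = matvec(up_w, x)      # separate up projection
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(down_w, hidden)


identity = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu_mlp([1.0, 2.0], identity, identity, identity)
print(out)
```

A fused variant would compute the gate and up projections with one matrix multiply and split the result; the separate form shown here matches the parameter names found in HuggingFace checkpoints.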

```python
nemo_automodel.components.models.qwen2.model.ModelClass = Qwen2ForCausalLM
```

```python
nemo_automodel.components.models.qwen2.model.__all__ = ['Qwen2ForCausalLM']
```

```python
nemo_automodel.components.models.qwen2.model.check_model_inputs = get_check_model_inputs_decorator()
```