
# nemo_automodel.components.models.baichuan.model

Native Baichuan2 model implementation for NeMo Automodel.

Adapted from the Baichuan2 remote-code model on the Hugging Face Hub, with the
following changes:

* Removed xformers / quantization / chat / streaming dependencies.
* Added `**kwargs` to forward signatures so that extra batch keys
  (`padding_mask`, `loss_mask`, …) pass through without error.
* Uses `HFCheckpointingMixin` for unified checkpointing.
* Uses `torch.nn.functional.scaled_dot_product_attention` only.
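
The `**kwargs` change matters when a training loop unpacks an entire batch dictionary into the model. The sketch below uses a hypothetical `ToyCausalLM`, not the real Baichuan classes, to show the pattern: keys that the forward pass does not name are absorbed by `**kwargs` instead of raising `TypeError`.

```python
import torch
from torch import nn


class ToyCausalLM(nn.Module):
    """Hypothetical stand-in; only the forward-signature pattern mirrors the docs."""

    def __init__(self, vocab_size: int = 16, hidden_size: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # Extra batch keys such as padding_mask or loss_mask land in kwargs and are ignored.
        return self.head(self.embed(input_ids))


batch = {
    "input_ids": torch.tensor([[1, 2, 3]]),
    "attention_mask": torch.ones(1, 3, dtype=torch.long),
    "padding_mask": torch.zeros(1, 3, dtype=torch.bool),
    "loss_mask": torch.ones(1, 3),
}
logits = ToyCausalLM()(**batch)  # without **kwargs this call would raise TypeError
print(logits.shape)  # torch.Size([1, 3, 16])
```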

Example (YAML):

```yaml
model:
  _target_: nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained
  pretrained_model_name_or_path: baichuan-inc/Baichuan2-7B-Chat
```
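
Configs like this follow the Hydra-style `_target_` convention: the dotted path names a callable, and the remaining keys become its keyword arguments. NeMo Automodel's own config loader is not documented on this page, so the resolver below (`resolve_target`, `instantiate`) is a hypothetical sketch of the convention, not the library's implementation.

```python
import importlib
from typing import Any, Callable, Dict


def resolve_target(path: str) -> Callable[..., Any]:
    """Import the longest importable module prefix of `path`, then walk the remaining attributes."""
    parts = path.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {path!r}")


def instantiate(cfg: Dict[str, Any]) -> Any:
    """Call the `_target_` callable with the remaining keys as keyword arguments."""
    cfg = dict(cfg)
    target = resolve_target(cfg.pop("_target_"))
    return target(**cfg)


# Same shape as the YAML (Class.classmethod plus keyword args), but using the stdlib
# so the sketch runs without nemo_automodel installed or a model download.
print(instantiate({"_target_": "fractions.Fraction.from_float", "f": 0.5}))  # 1/2
```

With the real config, the equivalent call would resolve `nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained` and pass `pretrained_model_name_or_path` as a keyword argument.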

## Module Contents

### Classes

| Name                                                                                                  | Description |
| ----------------------------------------------------------------------------------------------------- | ----------- |
| [`Attention`](#nemo_automodel-components-models-baichuan-model-Attention)                             | -           |
| [`BaichuanForCausalLM`](#nemo_automodel-components-models-baichuan-model-BaichuanForCausalLM)         | -           |
| [`BaichuanModel`](#nemo_automodel-components-models-baichuan-model-BaichuanModel)                     | -           |
| [`BaichuanPreTrainedModel`](#nemo_automodel-components-models-baichuan-model-BaichuanPreTrainedModel) | -           |
| [`DecoderLayer`](#nemo_automodel-components-models-baichuan-model-DecoderLayer)                       | -           |
| [`MLP`](#nemo_automodel-components-models-baichuan-model-MLP)                                         | -           |
| [`NormHead`](#nemo_automodel-components-models-baichuan-model-NormHead)                               | -           |
| [`RMSNorm`](#nemo_automodel-components-models-baichuan-model-RMSNorm)                                 | -           |
| [`RotaryEmbedding`](#nemo_automodel-components-models-baichuan-model-RotaryEmbedding)                 | -           |

### Functions

| Name                                                                                              | Description |
| ------------------------------------------------------------------------------------------------- | ----------- |
| [`_apply_rotary_pos_emb`](#nemo_automodel-components-models-baichuan-model-_apply_rotary_pos_emb) | -           |
| [`_expand_mask`](#nemo_automodel-components-models-baichuan-model-_expand_mask)                   | -           |
| [`_make_causal_mask`](#nemo_automodel-components-models-baichuan-model-_make_causal_mask)         | -           |
| [`_rotate_half`](#nemo_automodel-components-models-baichuan-model-_rotate_half)                   | -           |

### Data

[`ModelClass`](#nemo_automodel-components-models-baichuan-model-ModelClass)

[`logger`](#nemo_automodel-components-models-baichuan-model-logger)

### API

```python
class nemo_automodel.components.models.baichuan.model.Attention(
    config: nemo_automodel.components.models.baichuan.configuration.BaichuanConfig
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.baichuan.model.Attention.forward(
    hidden_states: torch.Tensor,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    past_key_value: typing.Optional[typing.Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False
) -> typing.Tuple[torch.Tensor, typing.Optional[torch.Tensor], typing.Optional[typing.Tuple[torch.Tensor]]]
```
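
The module notes say attention is computed with `torch.nn.functional.scaled_dot_product_attention` only. As a sketch, assuming `Attention.forward` passes an additive float mask of shape `[bsz, 1, q_len, kv_len]` (0 for visible keys, a large negative value for hidden ones) as `attn_mask`, SDPA matches the explicit softmax formula. Rotary embeddings and the KV cache are omitted here.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
bsz, heads, seq, head_dim = 2, 4, 5, 8
q = torch.randn(bsz, heads, seq, head_dim)
k = torch.randn(bsz, heads, seq, head_dim)
v = torch.randn(bsz, heads, seq, head_dim)

# Additive causal mask: 0 on and below the diagonal, most negative float above it.
neg = torch.finfo(q.dtype).min
mask = torch.triu(torch.full((seq, seq), neg), diagonal=1)[None, None]  # [1, 1, seq, seq]

out_sdpa = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Reference: softmax(Q K^T / sqrt(d) + mask) V, written out explicitly.
scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim) + mask
out_ref = torch.softmax(scores, dim=-1) @ v

print(torch.allclose(out_sdpa, out_ref, atol=1e-5))  # True
```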

```python
class nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM(
    config: nemo_automodel.components.models.baichuan.configuration.BaichuanConfig,
    **model_kwargs
)
```

**Bases:** [HFCheckpointingMixin](/nemo-automodel/nemo_automodel/components/models/common/hf_checkpointing_mixin#nemo_automodel-components-models-common-hf_checkpointing_mixin-HFCheckpointingMixin), [BaichuanPreTrainedModel](#nemo_automodel-components-models-baichuan-model-BaichuanPreTrainedModel), `GenerationMixin`

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM._reorder_cache(
    past_key_values,
    beam_idx
)
```

**Decorator:** `staticmethod`
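
In Hugging Face models, `_reorder_cache` is used during beam search to reorder cached keys and values so that each row follows the beam it now belongs to. A minimal sketch, assuming the legacy tuple-of-`(key, value)` cache layout implied by the `past_key_value: Tuple[torch.Tensor]` annotation, with batch-times-beams on dimension 0; `reorder_cache` is a hypothetical stand-in for the static method.

```python
import torch


def reorder_cache(past_key_values, beam_idx):
    """Select the surviving beams (dim 0) in every cached key/value tensor of every layer."""
    return tuple(
        tuple(state.index_select(0, beam_idx.to(state.device)) for state in layer_past)
        for layer_past in past_key_values
    )


# Two layers, each holding (key, value) of shape [beams=3, heads=1, seq=2, head_dim=1].
past = tuple(
    (torch.arange(6.0).view(3, 1, 2, 1), torch.arange(6.0).view(3, 1, 2, 1) + 100)
    for _ in range(2)
)
beam_idx = torch.tensor([2, 2, 0])  # beam 2 survives twice, beam 0 once
reordered = reorder_cache(past, beam_idx)
print(reordered[0][0][:, 0, :, 0].tolist())  # [[4.0, 5.0], [4.0, 5.0], [0.0, 1.0]]
```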

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.forward(
    input_ids: torch.LongTensor = None,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None,
    inputs_embeds: typing.Optional[torch.FloatTensor] = None,
    labels: typing.Optional[torch.LongTensor] = None,
    use_cache: typing.Optional[bool] = None,
    output_attentions: typing.Optional[bool] = None,
    output_hidden_states: typing.Optional[bool] = None,
    return_dict: typing.Optional[bool] = None,
    logits_to_keep: typing.Union[int, torch.Tensor] = 0,
    **kwargs
) -> typing.Union[typing.Tuple, transformers.modeling_outputs.CausalLMOutputWithPast]
```
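
When `labels` is given, Hugging Face causal-LM heads conventionally compute a shifted next-token cross-entropy, and `logits_to_keep` selects which sequence positions receive logits (`0` keeps all, `n > 0` keeps the last `n`). This page does not spell out how `BaichuanForCausalLM.forward` does either, so the helpers below are hypothetical sketches of that convention.

```python
import torch
import torch.nn.functional as F


def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor, ignore_index: int = -100) -> torch.Tensor:
    """Shifted next-token cross-entropy: the logits at position t are scored against labels[t + 1]."""
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,  # e.g. prompt or padding tokens labeled -100
    )


def keep_last_logits(states: torch.Tensor, logits_to_keep: int) -> torch.Tensor:
    """0 keeps every position (slice(-0, None) == slice(0, None)); n > 0 keeps the last n."""
    return states[:, slice(-logits_to_keep, None), :]


torch.manual_seed(0)
logits = torch.randn(2, 4, 10)  # [batch, seq, vocab]
labels = torch.randint(0, 10, (2, 4))
labels[0, :2] = -100  # mask the first two tokens of the first sequence
print(causal_lm_loss(logits, labels).item())
print(keep_last_logits(logits, 0).shape, keep_last_logits(logits, 1).shape)
# torch.Size([2, 4, 10]) torch.Size([2, 1, 10])
```

During generation only the last position's logits are needed, which is why keeping a single position saves memory on long prompts.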

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.get_decoder()
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.get_input_embeddings()
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.get_output_embeddings()
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.prepare_inputs_for_generation(
    input_ids,
    past_key_values = None,
    attention_mask = None,
    inputs_embeds = None,
    **kwargs
)
```
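
A common pattern in Hugging Face `prepare_inputs_for_generation` overrides is to derive `position_ids` from the attention mask, so that left-padded prompts still start at position 0, and to keep only the newest token's entries once a KV cache exists. Whether Baichuan's override does exactly this is an assumption; `positions_from_mask` is a hypothetical helper.

```python
import torch


def positions_from_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Count real tokens left to right; padded slots get a harmless dummy position of 1."""
    position_ids = attention_mask.long().cumsum(-1) - 1
    position_ids.masked_fill_(attention_mask == 0, 1)
    return position_ids


attention_mask = torch.tensor([[0, 0, 1, 1, 1],   # left-padded prompt
                               [1, 1, 1, 1, 1]])
position_ids = positions_from_mask(attention_mask)
print(position_ids.tolist())  # [[1, 1, 0, 1, 2], [0, 1, 2, 3, 4]]

# With a populated KV cache only the newest token is fed, so only its position is kept.
print(position_ids[:, -1:].tolist())  # [[2], [4]]
```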

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.set_decoder(
    decoder
)
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.set_input_embeddings(
    value
)
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanForCausalLM.set_output_embeddings(
    new_embeddings
)
```

```python
class nemo_automodel.components.models.baichuan.model.BaichuanModel(
    config: nemo_automodel.components.models.baichuan.configuration.BaichuanConfig
)
```

**Bases:** [BaichuanPreTrainedModel](#nemo_automodel-components-models-baichuan-model-BaichuanPreTrainedModel)

```python
nemo_automodel.components.models.baichuan.model.BaichuanModel._prepare_decoder_attention_mask(
    attention_mask,
    input_shape,
    inputs_embeds,
    past_key_values_length
)
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanModel.forward(
    input_ids: torch.LongTensor = None,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None,
    inputs_embeds: typing.Optional[torch.FloatTensor] = None,
    use_cache: typing.Optional[bool] = None,
    output_attentions: typing.Optional[bool] = None,
    output_hidden_states: typing.Optional[bool] = None,
    return_dict: typing.Optional[bool] = None,
    **kwargs
) -> typing.Union[typing.Tuple, transformers.modeling_outputs.BaseModelOutputWithPast]
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanModel.get_input_embeddings()
```

```python
nemo_automodel.components.models.baichuan.model.BaichuanModel.set_input_embeddings(
    value
)
```

```python
class nemo_automodel.components.models.baichuan.model.BaichuanPreTrainedModel()
```

**Bases:** `PreTrainedModel`

```python
nemo_automodel.components.models.baichuan.model.BaichuanPreTrainedModel._init_weights(
    module
)
```
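
`_init_weights` is the per-module hook that Hugging Face `PreTrainedModel` calls when initializing weights. A typical implementation is sketched below, assuming a normal initializer whose standard deviation comes from a config field such as `initializer_range` (the value 0.02 here is hypothetical).

```python
import torch
from torch import nn

INITIALIZER_RANGE = 0.02  # hypothetical; a real model would read this from its config


def init_weights(module: nn.Module, std: float = INITIALIZER_RANGE) -> None:
    """HF-style init: N(0, std) weights, zero biases, zeroed embedding padding row."""
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()


model = nn.Sequential(nn.Embedding(100, 64, padding_idx=0), nn.Linear(64, 64))
model.apply(init_weights)  # nn.Module.apply visits every submodule recursively
print(model[0].weight[0].abs().sum().item(), model[1].bias.abs().sum().item())  # 0.0 0.0
```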

```python
nemo_automodel.components.models.baichuan.model.BaichuanPreTrainedModel._set_gradient_checkpointing(
    module,
    value = False
)
```
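
In older `transformers` releases, `_set_gradient_checkpointing(module, value)` toggles a `gradient_checkpointing` flag on the decoder stack, and the stack's forward then wraps each layer in `torch.utils.checkpoint.checkpoint` so activations are recomputed during backward instead of stored. The toy stack below is hypothetical, not Baichuan code; it shows that checkpointing trades compute for memory without changing gradients.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class ToyStack(nn.Module):
    """Hypothetical decoder stack carrying an HF-style `gradient_checkpointing` flag."""

    def __init__(self, hidden: int = 8, n_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(n_layers)])
        self.gradient_checkpointing = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Activations inside `layer` are recomputed during backward instead of kept.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
            x = torch.tanh(x)
        return x


torch.manual_seed(0)
model = ToyStack()
x = torch.randn(4, 8)

model(x).sum().backward()
ref_grads = [p.grad.clone() for p in model.parameters()]
model.zero_grad()

model.gradient_checkpointing = True  # the flag a _set_gradient_checkpointing hook would set
model(x).sum().backward()
same = all(torch.allclose(p.grad, r) for p, r in zip(model.parameters(), ref_grads))
print(same)  # True
```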

```python
class nemo_automodel.components.models.baichuan.model.DecoderLayer(
    config: nemo_automodel.components.models.baichuan.configuration.BaichuanConfig
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.baichuan.model.DecoderLayer.forward(
    hidden_states: torch.Tensor,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    past_key_value: typing.Optional[typing.Tuple[torch.Tensor]] = None,
    output_attentions: typing.Optional[bool] = False,
    use_cache: typing.Optional[bool] = False
) -> typing.Tuple[torch.FloatTensor, ...]
```
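
Baichuan2 decoder layers follow the pre-norm residual layout common to Llama-style models: normalize, attend, add the residual, then normalize, apply the MLP, and add the residual again. The sketch uses hypothetical stand-ins for the sublayers (`nn.Linear` for attention, `nn.LayerNorm` for `RMSNorm`); the attribute names follow the Llama convention and are assumptions. Unlike the documented `forward`, it returns a bare tensor rather than a tuple.

```python
import torch
from torch import nn


class ToyDecoderLayer(nn.Module):
    """Pre-norm residual block with placeholder sublayers."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden)  # RMSNorm in the real model
        self.self_attn = nn.Linear(hidden, hidden)  # placeholder for Attention
        self.post_attention_layernorm = nn.LayerNorm(hidden)
        self.mlp = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.SiLU(), nn.Linear(4 * hidden, hidden))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Attention sub-block: x + Attn(Norm(x))
        hidden_states = hidden_states + self.self_attn(self.input_layernorm(hidden_states))
        # MLP sub-block: h + MLP(Norm(h))
        return hidden_states + self.mlp(self.post_attention_layernorm(hidden_states))


layer = ToyDecoderLayer()
out = layer(torch.randn(2, 3, 16))
print(out.shape)  # torch.Size([2, 3, 16])
```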

```python
class nemo_automodel.components.models.baichuan.model.MLP(
    hidden_size,
    intermediate_size,
    hidden_act
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.baichuan.model.MLP.forward(
    x
)
```
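
The `MLP(hidden_size, intermediate_size, hidden_act)` signature matches the gated feed-forward used by Llama-style models: `down_proj(act(gate_proj(x)) * up_proj(x))`. A sketch assuming `hidden_act` resolves to SiLU (the usual Baichuan2 setting; treat it as an assumption) and bias-free projections.

```python
import torch
import torch.nn.functional as F
from torch import nn


class GatedMLP(nn.Module):
    """Gated MLP sketch: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The activated gate scales the "up" branch elementwise before projecting back.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


mlp = GatedMLP(hidden_size=32, intermediate_size=88)
y = mlp(torch.randn(2, 5, 32))
print(y.shape)  # torch.Size([2, 5, 32])
```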

```python
class nemo_automodel.components.models.baichuan.model.NormHead(
    hidden_size,
    vocab_size,
    bias = False
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.baichuan.model.NormHead.forward(
    hidden_states
)
```
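
`NormHead` is the output projection of Baichuan2. In the upstream remote code this module is adapted from, the rows of its `[vocab_size, hidden_size]` weight are L2-normalized before the linear projection, and the normalized weight is cached outside training. The sketch below shows only the normalization step; it omits the eval-time caching and the `bias` argument.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn


class NormHeadSketch(nn.Module):
    """LM head whose weight rows are L2-normalized before projecting to the vocabulary."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(vocab_size, hidden_size))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Each vocabulary row gets unit L2 norm, so logits are h . w_v / ||w_v||.
        norm_weight = F.normalize(self.weight, dim=-1)
        return F.linear(hidden_states, norm_weight)


torch.manual_seed(0)
head = NormHeadSketch(hidden_size=16, vocab_size=50)
h = torch.randn(2, 3, 16)
logits = head(h)
print(logits.shape)  # torch.Size([2, 3, 50])
```

Because every row has unit norm, no logit can exceed the norm of the hidden state, which keeps the output scale independent of how large individual embedding rows grow.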

```python
class nemo_automodel.components.models.baichuan.model.RMSNorm(
    hidden_size,
    eps = 1e-06
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.baichuan.model.RMSNorm.forward(
    hidden_states
)
```
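
`RMSNorm` rescales each hidden vector by its root-mean-square instead of centering it like `LayerNorm`: `y = weight * x / sqrt(mean(x**2) + eps)`. A sketch, assuming the Llama-style convention of computing in float32 and casting back to the input dtype; `eps = 1e-06` matches the documented default.

```python
import torch
from torch import nn


class RMSNormSketch(nn.Module):
    """y = weight * x / sqrt(mean(x^2) + eps) over the last dimension."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        x = hidden_states.float()  # accumulate the mean of squares in float32
        variance = x.pow(2).mean(-1, keepdim=True)
        x = x * torch.rsqrt(variance + self.eps)
        return self.weight * x.to(input_dtype)


torch.manual_seed(0)
norm = RMSNormSketch(16)
x = torch.randn(2, 3, 16) * 5
y = norm(x)
# With weight == 1, every output vector has mean square (almost exactly) 1.
print(torch.allclose(y.pow(2).mean(-1), torch.ones(2, 3), atol=1e-4))  # True
```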

```python
class nemo_automodel.components.models.baichuan.model.RotaryEmbedding(
    dim,
    max_position_embeddings = 2048,
    base = 10000,
    device = None
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.baichuan.model.RotaryEmbedding.forward(
    x,
    seq_len = None
)
```
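
Rotary embeddings precompute cosine and sine tables from the inverse frequencies `base ** (-2i / dim)`. In the common Llama-style design, `forward(x, seq_len)` then returns the first `seq_len` rows of those tables. A sketch using the documented defaults (`max_position_embeddings=2048`, `base=10000`); the `[max_positions, dim]` table layout is an assumption, and the real module may add leading broadcast dimensions.

```python
import torch


def rotary_cache(dim: int, max_position_embeddings: int = 2048, base: float = 10000.0):
    """Return cos/sin tables of shape [max_position_embeddings, dim]."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))  # [dim / 2]
    t = torch.arange(max_position_embeddings).float()                   # [max_positions]
    freqs = torch.outer(t, inv_freq)                                    # [max_positions, dim / 2]
    emb = torch.cat((freqs, freqs), dim=-1)                             # [max_positions, dim]
    return emb.cos(), emb.sin()


cos, sin = rotary_cache(dim=8, max_position_embeddings=16)
seq_len = 5
cos, sin = cos[:seq_len], sin[:seq_len]  # the slice a forward(x, seq_len=5) call would return
print(cos.shape)  # torch.Size([5, 8])
```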

```python
nemo_automodel.components.models.baichuan.model._apply_rotary_pos_emb(
    q,
    k,
    cos_,
    sin_,
    position_ids
)
```
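
`_apply_rotary_pos_emb` rotates each `(x[i], x[i + dim/2])` pair of `q` and `k` by a position-dependent angle: `q * cos + rotate_half(q) * sin`, with `cos` and `sin` gathered at `position_ids`. A sketch assuming `[max_positions, dim]` tables and `[bsz, heads, seq, dim]` queries and keys; the final check shows why RoPE is useful, namely that attention scores depend only on relative positions.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary(q, k, cos, sin, position_ids):
    """q, k: [bsz, heads, seq, dim]; cos, sin: [max_pos, dim]; position_ids: [bsz, seq]."""
    cos = cos[position_ids].unsqueeze(1)  # [bsz, 1, seq, dim], broadcast over heads
    sin = sin[position_ids].unsqueeze(1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin


# Small cos/sin tables built the standard rotary way.
dim, max_pos = 8, 16
inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
emb = torch.cat([torch.outer(torch.arange(max_pos).float(), inv_freq)] * 2, dim=-1)
cos_table, sin_table = emb.cos(), emb.sin()

torch.manual_seed(0)
q = torch.randn(1, 2, 4, dim)
k = torch.randn(1, 2, 4, dim)
position_ids = torch.arange(4).unsqueeze(0)

q_rot, k_rot = apply_rotary(q, k, cos_table, sin_table, position_ids)
scores = q_rot @ k_rot.transpose(-2, -1)

# Shifting every position by the same offset leaves q.k scores unchanged.
q_s, k_s = apply_rotary(q, k, cos_table, sin_table, position_ids + 3)
scores_shifted = q_s @ k_s.transpose(-2, -1)
print(torch.allclose(scores, scores_shifted, atol=1e-4))  # True
```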

```python
nemo_automodel.components.models.baichuan.model._expand_mask(
    mask,
    dtype,
    tgt_len = None
)
```
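
`_expand_mask` appears to follow the helper of the same name in older Hugging Face Llama code: it turns a `[bsz, src_len]` padding mask of 1s and 0s into an additive `[bsz, 1, tgt_len, src_len]` float mask. A sketch of those assumed semantics:

```python
from typing import Optional

import torch


def expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None) -> torch.Tensor:
    """[bsz, src_len] 1/0 padding mask -> [bsz, 1, tgt_len, src_len] additive float mask."""
    bsz, src_len = mask.shape
    tgt_len = tgt_len if tgt_len is not None else src_len
    expanded = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    inverted = 1.0 - expanded
    # Padded keys (mask == 0) get the most negative finite value; real keys get 0.
    return inverted.masked_fill(inverted.to(torch.bool), torch.finfo(dtype).min)


out = expand_mask(torch.tensor([[1, 1, 0]]), torch.float32, tgt_len=2)
print((out < 0).tolist())  # [[[[False, False, True], [False, False, True]]]]
```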

```python
nemo_automodel.components.models.baichuan.model._make_causal_mask(
    input_ids_shape,
    dtype,
    device,
    past_key_values_length = 0
)
```
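
`_make_causal_mask` builds the matching causal part. In the Hugging Face pattern it appears to mirror (an assumption here), query `i` may see keys `0..i` plus every cached position, and `_prepare_decoder_attention_mask` typically adds this mask to the expanded padding mask. A sketch:

```python
import torch


def make_causal_mask(input_ids_shape, dtype, device, past_key_values_length: int = 0):
    """[bsz, tgt_len] -> [bsz, 1, tgt_len, past + tgt_len] additive causal mask."""
    bsz, tgt_len = input_ids_shape
    mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
    cond = torch.arange(tgt_len, device=device)
    # Entry [i, j] is visible (0) when j <= i.
    mask.masked_fill_(cond < (cond + 1).view(tgt_len, 1), 0)
    mask = mask.to(dtype)
    if past_key_values_length > 0:
        # Every new query may attend to all cached positions.
        past = torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device)
        mask = torch.cat([past, mask], dim=-1)
    return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)


m = make_causal_mask((1, 3), torch.float32, "cpu", past_key_values_length=2)
print((m[0, 0] == 0).int().tolist())  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
```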

```python
nemo_automodel.components.models.baichuan.model._rotate_half(
    x
)
```
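
`_rotate_half` is the helper behind the rotary formula: it splits the last dimension into halves `(x1, x2)` and returns `cat(-x2, x1)`, a 90-degree rotation of each `(x1[i], x2[i])` pair. A sketch of that standard definition:

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Split the last dim into halves (x1, x2) and return cat(-x2, x1)."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((-x2, x1), dim=-1)


x = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(rotate_half(x).tolist())  # [-3.0, -4.0, 1.0, 2.0]
```

Applying it twice negates the input (two 90-degree rotations), which is consistent with `cos` and `sin` tables combining into a proper rotation.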

```python
nemo_automodel.components.models.baichuan.model.ModelClass = BaichuanForCausalLM
```

```python
nemo_automodel.components.models.baichuan.model.logger = logging.get_logger(__name__)
```