# nemo_automodel.components.models.step3p7.model

## Module Contents

### Classes

| Name                                                                                                                 | Description                                                              |
| -------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| [`Step3p7CausalLMOutput`](#nemo_automodel-components-models-step3p7-model-Step3p7CausalLMOutput)                     | `CausalLMOutputWithPast` plus optional per-depth MTP logits.             |
| [`Step3p7ForConditionalGeneration`](#nemo_automodel-components-models-step3p7-model-Step3p7ForConditionalGeneration) | Native Step3.7 VLM implementation for MedPix fine-tuning with EP and PP. |
| [`Step3p7Model`](#nemo_automodel-components-models-step3p7-model-Step3p7Model)                                       | Step3.7 VLM wrapper using the native Step3.5 MoE language backbone.      |

### Functions

| Name                                                                                             | Description |
| ------------------------------------------------------------------------------------------------ | ----------- |
| [`_debug_vision_enabled`](#nemo_automodel-components-models-step3p7-model-_debug_vision_enabled) | -           |
| [`_debug_vision_log`](#nemo_automodel-components-models-step3p7-model-_debug_vision_log)         | -           |
| [`_rank`](#nemo_automodel-components-models-step3p7-model-_rank)                                 | -           |

### Data

[`ModelClass`](#nemo_automodel-components-models-step3p7-model-ModelClass)

[`logger`](#nemo_automodel-components-models-step3p7-model-logger)

### API

```python
class nemo_automodel.components.models.step3p7.model.Step3p7CausalLMOutput(
    mtp_per_depth_logits: list[torch.Tensor] | None = None,
    mtp_loss_scaling_factor: float | None = None
)
```

Dataclass

**Bases:** `CausalLMOutputWithPast`

`CausalLMOutputWithPast` plus optional per-depth multi-token prediction (MTP) logits.

Subclassing HF's `CausalLMOutputWithPast` (a `ModelOutput`) gives this output
the standard `logits`/`hidden_states` fields. As a result, `"hidden_states" in out`
and `getattr(out, "hidden_states")` behave as they do for every other model, and
the fused cross-entropy path can read the final hidden states. The MTP fields are
also declared dataclass fields, so they survive output-restructuring layers such as
FSDP2's mixed-precision output cast, which rebuild `ModelOutput` instances from
declared fields only.
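The following is a minimal, pure-Python analogue of why declaring the MTP fields matters. It assumes only that the rebuilding layer reconstructs the output from the dataclass fields the class declares. `ToyCausalLMOutput` and `rebuild_from_declared_fields` are hypothetical stand-ins, not HF or FSDP2 code.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyCausalLMOutput:
    """Simplified stand-in for an HF ModelOutput subclass (not the real class)."""
    logits: Optional[list] = None
    hidden_states: Optional[list] = None
    # MTP fields declared as dataclass fields, mirroring Step3p7CausalLMOutput.
    mtp_per_depth_logits: Optional[list] = None
    mtp_loss_scaling_factor: Optional[float] = None


def rebuild_from_declared_fields(out):
    """Mimic a wrapper that reconstructs an output from its declared fields only,
    e.g. after casting each field to another precision."""
    cls = type(out)
    return cls(**{f.name: getattr(out, f.name) for f in fields(cls)})


out = ToyCausalLMOutput(logits=[0.1, 0.9], mtp_per_depth_logits=[[0.2, 0.8]],
                        mtp_loss_scaling_factor=0.3)
out.extra_debug_info = "attached ad hoc"  # set dynamically, not a declared field

rebuilt = rebuild_from_declared_fields(out)
print(rebuilt.mtp_per_depth_logits)         # [[0.2, 0.8]]
print(hasattr(rebuilt, "extra_debug_info"))  # False
```

An attribute attached after construction is silently lost on reconstruction, while declared fields carry over.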

```python
class nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration(
    config: nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7Config,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs: typing.Any
)
```

**Bases:** [HFCheckpointingMixin](/nemo-automodel/nemo_automodel/components/models/common/hf_checkpointing_mixin#nemo_automodel-components-models-common-hf_checkpointing_mixin-HFCheckpointingMixin), `Module`, [MoEFSDPSyncMixin](/nemo-automodel/nemo_automodel/components/moe/fsdp_mixin#nemo_automodel-components-moe-fsdp_mixin-MoEFSDPSyncMixin)

Native Step3.7 VLM implementation for MedPix fine-tuning with expert parallelism (EP) and pipeline parallelism (PP).

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration._build_mtp_embed_inputs_from_embeds(
    inputs_embeds: torch.Tensor
) -> tuple[torch.Tensor, ...]
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration._is_pipeline_parallel_stage() -> bool
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration._make_position_ids(
    hidden: torch.Tensor,
    position_ids: torch.Tensor | None
) -> torch.Tensor
```
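No description is generated for `_make_position_ids`. A common convention, assumed here rather than confirmed by this page, is to pass explicit `position_ids` through unchanged and otherwise build positions `0..seq_len-1` for every sequence, reading batch size and sequence length from `hidden`. The helper name below is hypothetical, and nested lists stand in for tensors.

```python
from typing import Optional


def make_position_ids(hidden: list, position_ids: Optional[list] = None) -> list:
    """Hypothetical default: reuse explicit positions, else 0..seq_len-1 per row.

    `hidden` stands in for a [batch, seq_len, hidden_size] tensor as nested lists.
    """
    if position_ids is not None:
        return position_ids
    batch_size, seq_len = len(hidden), len(hidden[0])
    return [list(range(seq_len)) for _ in range(batch_size)]


hidden = [[[0.0] * 8 for _ in range(4)] for _ in range(2)]  # batch=2, seq_len=4
print(make_position_ids(hidden))  # [[0, 1, 2, 3], [0, 1, 2, 3]]
```

With real tensors, the same default is typically written as `torch.arange(seq_len, device=hidden.device).unsqueeze(0).expand(batch_size, -1)`.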

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.customize_pipeline_stage_modules(
    module_names_per_stage: list[list[str]],
    layers_prefix: str,
    text_model: torch.nn.Module | None = None
) -> list[list[str]]
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.forward(
    input_ids: torch.Tensor | None = None,
    *mtp_embed_inputs: torch.Tensor,
    position_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    inputs_embeds: torch.Tensor | None = None,
    cache_position: torch.Tensor | None = None,
    logits_to_keep: typing.Union[int, torch.Tensor] = 0,
    output_hidden_states: typing.Optional[bool] = None,
    **kwargs: typing.Any
) -> torch.Tensor | nemo_automodel.components.models.step3p7.model.Step3p7CausalLMOutput
```
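The `logits_to_keep` parameter of `forward` shares its name with the Hugging Face Transformers argument. Assuming it keeps the HF semantics (an `int` N keeps the last N positions, with `0` keeping all of them; a tensor of indices selects those positions), the selection reduces to the slice below. This is a plain-list sketch with a hypothetical helper name, not this model's code.

```python
from typing import Sequence, Union


def select_positions(hidden_per_position: list,
                     logits_to_keep: Union[int, Sequence[int]]) -> list:
    """Pick which sequence positions get projected to vocabulary logits."""
    if isinstance(logits_to_keep, int):
        # slice(-0, None) == slice(0, None), so 0 keeps every position.
        return hidden_per_position[slice(-logits_to_keep, None)]
    # Explicit indices select arbitrary positions.
    return [hidden_per_position[i] for i in logits_to_keep]


seq = ["h0", "h1", "h2", "h3"]
print(select_positions(seq, 0))       # ['h0', 'h1', 'h2', 'h3']
print(select_positions(seq, 1))       # ['h3']
print(select_positions(seq, [0, 2]))  # ['h0', 'h2']
```

Keeping only the last position is what makes incremental decoding cheap: the LM head runs on one position instead of the whole sequence.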

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.from_config(
    config: nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7Config,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs: typing.Any
)
```

classmethod

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path: str,
    *model_args: typing.Any,
    **kwargs: typing.Any
)
```

classmethod

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.get_decoder()
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.get_input_embeddings()
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.get_output_embeddings()
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.get_pipeline_stage_metas(
    is_first: bool,
    microbatch_size: int,
    seq_len: int,
    dtype: torch.dtype
) -> tuple[tuple[torch.Tensor, ...], tuple[torch.Tensor, ...]]
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.initialize_weights(
    buffer_device: torch.device | None = None,
    dtype: torch.dtype = torch.bfloat16
) -> None
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.prepare_model_inputs_for_cp(
    input_ids: torch.Tensor,
    pixel_values: torch.Tensor | None = None,
    patch_pixel_values: torch.Tensor | None = None,
    num_patches: torch.Tensor | list[int] | tuple[int, ...] | None = None,
    image_embeds: torch.Tensor | None = None,
    **_: typing.Any
) -> dict[str, torch.Tensor]
```

Merge vision features into token embeddings before context-parallel (CP) sequence sharding.
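The ordering matters because image placeholder tokens can fall on either side of a shard boundary, and a rank that sees only its own shard cannot tell how many image-feature rows earlier shards consumed. The pure-Python sketch below illustrates this with simple contiguous sharding. The placeholder id, the helper names, and the sharding scheme are illustrative assumptions; context-parallel implementations commonly use load-balanced chunk layouts for causal attention instead.

```python
IMAGE_TOKEN_ID = -1  # hypothetical image placeholder id


def merge(token_ids, image_rows, text_embed=lambda t: f"txt{t}"):
    """Replace each placeholder with the next image-feature row, in order."""
    rows = iter(image_rows)
    return [next(rows) if t == IMAGE_TOKEN_ID else text_embed(t) for t in token_ids]


def shard(seq, cp_size, rank):
    """Contiguous sharding: rank r gets the r-th equal chunk."""
    chunk = len(seq) // cp_size
    return seq[rank * chunk:(rank + 1) * chunk]


tokens = [7, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 8, 9]
image_rows = ["img0", "img1", "img2"]

# Correct: merge on the full sequence, then shard.
good = [shard(merge(tokens, image_rows), 2, r) for r in range(2)]
print(good)  # [['txt7', 'img0', 'img1'], ['img2', 'txt8', 'txt9']]

# Wrong: shard first. Rank 1 restarts at img0 because it cannot see
# the two placeholders that landed on rank 0.
bad = [merge(shard(tokens, 2, r), image_rows) for r in range(2)]
print(bad)   # [['txt7', 'img0', 'img1'], ['img0', 'txt8', 'txt9']]
```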

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.set_decoder(
    decoder
)
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.set_input_embeddings(
    value
)
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7ForConditionalGeneration.set_output_embeddings(
    new_embeddings
)
```

```python
class nemo_automodel.components.models.step3p7.model.Step3p7Model(
    config: nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7Config,
    backend: nemo_automodel.components.models.common.BackendConfig,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    moe_overrides: dict | None = None
)
```

**Bases:** `Module`

Step3.7 VLM wrapper using the native Step3.5 MoE language backbone.

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model._process_image_features(
    image_features: torch.Tensor
) -> torch.Tensor
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model._process_image_input(
    pixel_values: torch.Tensor,
    patch_pixel_values: torch.Tensor | None = None,
    num_patches: torch.Tensor | list[int] | tuple[int, ...] | None = None
) -> list[torch.Tensor]
```
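The `num_patches` argument suggests that `patch_pixel_values` stacks the patch crops of all images along one dimension, with `num_patches[i]` crops belonging to image `i`. Under that assumption, which this page does not confirm, regrouping per image is a split by counts, the same operation `torch.split(x, sizes)` performs along dimension 0 of a tensor. A pure-Python sketch with a hypothetical helper name:

```python
from typing import Sequence


def split_by_counts(patches: Sequence, num_patches: Sequence[int]) -> list:
    """Regroup a flat run of patch crops into one group per image."""
    if sum(num_patches) != len(patches):
        raise ValueError(
            f"num_patches sums to {sum(num_patches)}, got {len(patches)} patches")
    groups, start = [], 0
    for n in num_patches:
        groups.append(list(patches[start:start + n]))
        start += n
    return groups


print(split_by_counts(["a0", "a1", "b0", "c0", "c1", "c2"], [2, 1, 3]))
# [['a0', 'a1'], ['b0'], ['c0', 'c1', 'c2']]
```

Validating the total up front turns a silent misalignment between images and crops into an immediate error.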

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model._vision_dtype_device() -> tuple[torch.dtype, torch.device]
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model.forward(
    input_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    inputs_embeds: torch.Tensor | None = None,
    pixel_values: torch.Tensor | None = None,
    patch_pixel_values: torch.Tensor | None = None,
    num_patches: torch.Tensor | list[int] | tuple[int, ...] | None = None,
    image_embeds: torch.Tensor | None = None,
    **kwargs: typing.Any
) -> torch.Tensor
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model.get_decoder()
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model.get_input_embeddings()
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model.get_multimodal_embeddings(
    pixel_values: torch.Tensor | None = None,
    patch_pixel_values: torch.Tensor | None = None,
    num_patches: torch.Tensor | list[int] | tuple[int, ...] | None = None,
    image_embeds: torch.Tensor | None = None,
    **_: typing.Any
) -> list[torch.Tensor] | torch.Tensor | None
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model.prepare_inputs_embeds(
    input_ids: torch.Tensor,
    multimodal_embeddings: list[torch.Tensor] | torch.Tensor | None = None
) -> torch.Tensor
```
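`prepare_inputs_embeds` accepts `multimodal_embeddings` as a list, a single tensor, or `None`, matching the return type of `get_multimodal_embeddings`. One way to handle those three cases, sketched here under assumptions rather than taken from the method's source, is to concatenate per-image features in order, check the row count against the placeholder tokens, and scatter the rows in; `None` leaves the text embeddings untouched. Strings stand in for embedding vectors, and the placeholder id and function names are hypothetical.

```python
from typing import Callable, Optional, Union

IMAGE_TOKEN_ID = -1  # hypothetical image placeholder id

# Each "row" is a string standing in for one token's embedding vector.
PerImage = list[list[str]]  # like list[torch.Tensor], one entry per image
Flat = list[str]            # like a single pre-concatenated torch.Tensor


def flatten_features(multimodal: Union[PerImage, Flat]) -> Flat:
    """Concatenate per-image feature rows in image order; pass flat rows through."""
    if multimodal and isinstance(multimodal[0], list):
        return [row for image in multimodal for row in image]
    return list(multimodal)


def prepare_inputs_embeds_sketch(input_ids: list[int],
                                 embed_token: Callable[[int], str],
                                 multimodal: Optional[Union[PerImage, Flat]] = None) -> Flat:
    embeds = [embed_token(t) for t in input_ids]
    if multimodal is None:
        return embeds  # text-only input: nothing to scatter
    rows = flatten_features(multimodal)
    slots = [i for i, t in enumerate(input_ids) if t == IMAGE_TOKEN_ID]
    if len(slots) != len(rows):
        raise ValueError(f"{len(slots)} image placeholders but {len(rows)} feature rows")
    for i, row in zip(slots, rows):
        embeds[i] = row
    return embeds


ids = [5, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 6]
print(prepare_inputs_embeds_sketch(ids, lambda t: f"txt{t}", [["a0"], ["b0"]]))
# ['txt5', 'a0', 'b0', 'txt6']
```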

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model.set_decoder(
    decoder
)
```

```python
nemo_automodel.components.models.step3p7.model.Step3p7Model.set_input_embeddings(
    value
)
```

```python
nemo_automodel.components.models.step3p7.model._debug_vision_enabled() -> bool
```

```python
nemo_automodel.components.models.step3p7.model._debug_vision_log(
    message: str,
    *args: typing.Any
) -> None
```

```python
nemo_automodel.components.models.step3p7.model._rank() -> int
```
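`_debug_vision_enabled`, `_debug_vision_log`, and `_rank` have no generated descriptions; their names suggest an opt-in, rank-aware debug logger for the vision path. The sketch below shows that pattern only. The environment variable `STEP3P7_DEBUG_VISION` is a hypothetical name, and reading the rank from the `RANK` variable (which `torchrun` exports) is an assumption about how `_rank` works.

```python
import logging
import os

logger = logging.getLogger(__name__)


def debug_vision_enabled() -> bool:
    # Hypothetical opt-in switch; the module's real variable name may differ.
    return os.environ.get("STEP3P7_DEBUG_VISION", "0") == "1"


def rank() -> int:
    # torchrun exports RANK; fall back to 0 for single-process runs.
    return int(os.environ.get("RANK", "0"))


def debug_vision_log(message: str, *args) -> None:
    if not debug_vision_enabled():
        return
    # %-style args are only formatted if the record is actually emitted.
    logger.info("[rank %d] " + message, rank(), *args)
```

Passing `*args` through to the logger instead of pre-formatting with an f-string avoids building large tensor-shape strings when debugging is off or the log level filters the record.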

```python
nemo_automodel.components.models.step3p7.model.ModelClass = Step3p7ForConditionalGeneration
```
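`ModelClass` is a module-level alias that presumably lets loader code discover this file's main model class without hard-coding its name. The sketch below shows how a registry could consume such an alias; `register_from_module` and `ToyForConditionalGeneration` are hypothetical, and the module is built in memory so the example runs standalone.

```python
import types


def register_from_module(registry: dict, module: types.ModuleType) -> None:
    """Hypothetical registry hook: index a module's ModelClass by class name."""
    model_cls = getattr(module, "ModelClass", None)
    if model_cls is not None:
        registry[model_cls.__name__] = model_cls


class ToyForConditionalGeneration:  # placeholder class, not the real model
    pass


# Stand-in module exposing the same alias pattern as this file.
fake_module = types.ModuleType("fake_model_module")
fake_module.ModelClass = ToyForConditionalGeneration

registry = {}
register_from_module(registry, fake_module)
print(list(registry))  # ['ToyForConditionalGeneration']
```

Keying by class name lets a loader map an architecture string from a checkpoint config to the implementing class.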

```python
nemo_automodel.components.models.step3p7.model.logger = logging.getLogger(__name__)
```