# nemo_automodel.components.models.step3p7.processing_step3

## Module Contents

### Classes

| Name                                                                                                                    | Description                                                                |
| ----------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
| [`GPUToTensor`](#nemo_automodel-components-models-step3p7-processing_step3-GPUToTensor)                                 | -                                                                          |
| [`ImagePatcher`](#nemo_automodel-components-models-step3p7-processing_step3-ImagePatcher)                               | -                                                                          |
| [`Step3VLImageEmbeddingInputs`](#nemo_automodel-components-models-step3p7-processing_step3-Step3VLImageEmbeddingInputs) | -                                                                          |
| [`Step3VLImagePixelInputs`](#nemo_automodel-components-models-step3p7-processing_step3-Step3VLImagePixelInputs)         | -                                                                          |
| [`Step3VLProcessor`](#nemo_automodel-components-models-step3p7-processing_step3-Step3VLProcessor)                       | Processor that expands Step3.7 image inputs into image-token placeholders. |
| [`Step3VisionProcessor`](#nemo_automodel-components-models-step3p7-processing_step3-Step3VisionProcessor)               | -                                                                          |

### Data

[`ImageWithPatches`](#nemo_automodel-components-models-step3p7-processing_step3-ImageWithPatches)

[`MAX_IMAGE_SIZE`](#nemo_automodel-components-models-step3p7-processing_step3-MAX_IMAGE_SIZE)

[`__all__`](#nemo_automodel-components-models-step3p7-processing_step3-__all__)

### API

```python
class nemo_automodel.components.models.step3p7.processing_step3.GPUToTensor()
```

**Bases:** `Module`

```python
nemo_automodel.components.models.step3p7.processing_step3.GPUToTensor.forward(
    raw_image: typing.Union[numpy.ndarray, PIL.Image.Image]
) -> torch.Tensor
```

```python
class nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher()
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.__call__(
    img: PIL.Image.Image
) -> tuple[PIL.Image.Image, list[PIL.Image.Image], list[bool] | None]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.determine_window_size(
    long: int,
    short: int
) -> int
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.get_image_size_for_crop(
    img_width: int,
    img_height: int,
    window_size: int
)
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.get_image_size_for_padding(
    img_width: int,
    img_height: int
) -> tuple[int, int]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.get_image_size_for_preprocess(
    img_width: int,
    img_height: int
) -> tuple[int, int]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.get_num_patches(
    img_width: int,
    img_height: int
) -> tuple[int, int]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.patch_crop(
    img: PIL.Image.Image,
    i: int,
    j: int,
    th: int,
    tw: int
)
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.slide_window(
    width: int,
    height: int,
    sizes: list[tuple[int, int]],
    steps: list[tuple[int, int]],
    img_rate_thr: float = 0.6
) -> tuple[list[tuple[int, int, int, int]], tuple[int, int]]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImagePatcher.square_pad(
    img: PIL.Image.Image
) -> PIL.Image.Image
```

```python
class nemo_automodel.components.models.step3p7.processing_step3.Step3VLImageEmbeddingInputs
```

**Bases:** `typing.TypedDict`

```python
class nemo_automodel.components.models.step3p7.processing_step3.Step3VLImagePixelInputs
```

**Bases:** `typing.TypedDict`

```python
class nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor(
    tokenizer = None,
    chat_template = None,
    **kwargs
)
```

**Bases:** `ProcessorMixin`

Processor that expands Step3.7 image inputs into image-token placeholders.

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor.__call__(
    text: typing.Optional[typing.Union[str, list[str]]] = None,
    images: transformers.image_utils.ImageInput | None = None,
    return_tensors: typing.Optional[typing.Union[str, transformers.feature_extraction_utils.TensorType]] = None,
    **kwargs
) -> transformers.feature_extraction_utils.BatchFeature
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor._convert_images_to_pixel_values(
    images: list[PIL.Image.Image],
    is_patch: bool = False
) -> list[torch.Tensor]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor._get_image_repl(
    num_images: int
) -> tuple[str, list[int]]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor._get_image_repl_features(
    num_images: int,
    num_patches: int,
    patch_new_line_idx: typing.Optional[list[bool]]
) -> tuple[str, list[int]]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor._get_patch_repl(
    num_patches: int,
    patch_newline_mask: list[bool] | None
) -> tuple[str, list[int]]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor._normalize_batched_images(
    images,
    batch_size: int
) -> list[list[PIL.Image.Image]]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor._split_images(
    images: list[PIL.Image.Image]
) -> list[nemo_automodel.components.models.step3p7.processing_step3.ImageWithPatches]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor.batch_decode(
    *args,
    **kwargs
)
```

This method forwards all its arguments to GemmaTokenizerFast's `PreTrainedTokenizer.batch_decode`. Refer to that method's docstring for more information.

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor.decode(
    *args,
    **kwargs
)
```

This method forwards all its arguments to GemmaTokenizerFast's `PreTrainedTokenizer.decode`. Refer to that method's docstring for more information.
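The forwarding behavior described for `decode` and `batch_decode` is a plain delegation pattern. The sketch below illustrates it with a stub; `StubTokenizer` and `ProcessorSketch` are hypothetical names introduced for illustration and are not part of the library, and the toy vocabulary is made up.

```python
class StubTokenizer:
    """Hypothetical stand-in for a real tokenizer."""

    vocab = {0: "<pad>", 1: "hello", 2: "world"}

    def decode(self, token_ids, skip_special_tokens=False):
        words = [self.vocab[i] for i in token_ids]
        if skip_special_tokens:
            words = [w for w in words if not w.startswith("<")]
        return " ".join(words)

    def batch_decode(self, sequences, **kwargs):
        return [self.decode(ids, **kwargs) for ids in sequences]


class ProcessorSketch:
    """Hypothetical processor that only shows the forwarding pattern."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def decode(self, *args, **kwargs):
        # Pass positional and keyword arguments through unchanged.
        return self.tokenizer.decode(*args, **kwargs)

    def batch_decode(self, *args, **kwargs):
        return self.tokenizer.batch_decode(*args, **kwargs)


processor = ProcessorSketch(StubTokenizer())
print(processor.decode([1, 2, 0], skip_special_tokens=True))  # hello world
print(processor.batch_decode([[1], [2, 0]]))  # ['hello', 'world <pad>']
```

Because the wrapper accepts `*args, **kwargs`, any option the underlying tokenizer supports is available on the processor without being redeclared.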

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor.get_num_image_tokens(
    img_width: int,
    img_height: int
) -> int
```

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VLProcessor.replace_placeholder(
    text: str,
    placeholder: str,
    repls: list[str]
) -> str
```
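Judging only from the signature, `replace_placeholder` likely substitutes each occurrence of `placeholder` in `text` with the corresponding entry of `repls`, in order. The following is a minimal sketch under that assumption, not the library's code; the function name `replace_placeholder_sketch`, the `<image>` placeholder string, and the error raised on a count mismatch are all hypothetical.

```python
def replace_placeholder_sketch(text: str, placeholder: str, repls: list[str]) -> str:
    """Replace the i-th occurrence of `placeholder` with `repls[i]`."""
    parts = text.split(placeholder)
    # N placeholders split the text into N + 1 parts.
    if len(parts) - 1 != len(repls):
        raise ValueError(
            f"found {len(parts) - 1} placeholders but got {len(repls)} replacements"
        )
    pieces = [parts[0]]
    for repl, tail in zip(repls, parts[1:]):
        pieces.append(repl)
        pieces.append(tail)
    return "".join(pieces)


prompt = "Compare <image> with <image>."
expanded = replace_placeholder_sketch(prompt, "<image>", ["[IMG-A]", "[IMG-B]"])
print(expanded)  # Compare [IMG-A] with [IMG-B].
```

Splitting once and interleaving avoids the pitfall of repeated `str.replace` calls, where a replacement string that itself contains the placeholder would be substituted again.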

```python
class nemo_automodel.components.models.step3p7.processing_step3.Step3VisionProcessor(
    size,
    interpolation_mode = 'bicubic',
    patch_size = None
)
```

**Bases:** `BaseImageProcessor`

```python
nemo_automodel.components.models.step3p7.processing_step3.Step3VisionProcessor.__call__(
    image,
    is_patch = False
)
```

```python
nemo_automodel.components.models.step3p7.processing_step3.ImageWithPatches = tuple[Image.Image, list[Image.Image], list[int] | None]
```

```python
nemo_automodel.components.models.step3p7.processing_step3.MAX_IMAGE_SIZE: int = 3024
```

```python
nemo_automodel.components.models.step3p7.processing_step3.__all__ = ['Step3VLProcessor']
```