# nemo_automodel.components.models.llama_nemotron_vl.model

## Module Contents

### Classes

| Name                                                                                                             | Description                                                                             |
| ---------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| [`LlamaBidirectionalConfig`](#nemo_automodel-components-models-llama_nemotron_vl-model-LlamaBidirectionalConfig) | Configuration for bidirectional (non-causal) LLaMA model.                               |
| [`LlamaBidirectionalModel`](#nemo_automodel-components-models-llama_nemotron_vl-model-LlamaBidirectionalModel)   | LlamaModel modified to use bidirectional (non-causal) attention.                        |
| [`LlamaNemotronVLConfig`](#nemo_automodel-components-models-llama_nemotron_vl-model-LlamaNemotronVLConfig)       | Base configuration for vision-language models combining vision and language components. |
| [`LlamaNemotronVLModel`](#nemo_automodel-components-models-llama_nemotron_vl-model-LlamaNemotronVLModel)         | LlamaNemotron VL model for vision-language reranking.                                   |

### Functions

| Name                                                                                                                                             | Description                                                        |
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
| [`_filter_vision_embeddings_by_image_flags`](#nemo_automodel-components-models-llama_nemotron_vl-model-_filter_vision_embeddings_by_image_flags) | Keep only vision embeddings marked as real images.                 |
| [`_register_with_hf_auto_classes`](#nemo_automodel-components-models-llama_nemotron_vl-model-_register_with_hf_auto_classes)                     | Register bidirectional models with HuggingFace Auto classes.       |
| [`_replace_image_token_embeddings`](#nemo_automodel-components-models-llama_nemotron_vl-model-_replace_image_token_embeddings)                   | Replace image placeholder token embeddings with vision embeddings. |
| [`pool`](#nemo_automodel-components-models-llama_nemotron_vl-model-pool)                                                                         | -                                                                  |
| [`split_model`](#nemo_automodel-components-models-llama_nemotron_vl-model-split_model)                                                           | -                                                                  |

### Data

[`ModelClass`](#nemo_automodel-components-models-llama_nemotron_vl-model-ModelClass)

[`_DYNAMIC_CACHE_ACCEPTS_CONFIG`](#nemo_automodel-components-models-llama_nemotron_vl-model-_DYNAMIC_CACHE_ACCEPTS_CONFIG)

[`_HAS_NATIVE_BIDIRECTIONAL_MASK`](#nemo_automodel-components-models-llama_nemotron_vl-model-_HAS_NATIVE_BIDIRECTIONAL_MASK)

[`_USE_PLURAL_CACHE_PARAM`](#nemo_automodel-components-models-llama_nemotron_vl-model-_USE_PLURAL_CACHE_PARAM)

[`__all__`](#nemo_automodel-components-models-llama_nemotron_vl-model-__all__)

[`_decoder_forward_params`](#nemo_automodel-components-models-llama_nemotron_vl-model-_decoder_forward_params)

[`_dynamic_cache_init_params`](#nemo_automodel-components-models-llama_nemotron_vl-model-_dynamic_cache_init_params)

[`logger`](#nemo_automodel-components-models-llama_nemotron_vl-model-logger)

### API

```python
class nemo_automodel.components.models.llama_nemotron_vl.model.LlamaBidirectionalConfig(
    pooling = 'avg',
    temperature = 1.0,
    **kwargs
)
```

**Bases:** `LlamaConfig`

Configuration for bidirectional (non-causal) LLaMA model.

```python
class nemo_automodel.components.models.llama_nemotron_vl.model.LlamaBidirectionalModel(
    config: nemo_automodel.components.models.llama_nemotron_vl.model.LlamaBidirectionalConfig
)
```

**Bases:** `LlamaModel`

LlamaModel modified to use bidirectional (non-causal) attention.
Supports transformers 4.44+ through 5.x with a unified forward() implementation.
See [https://huggingface.co/nvidia/llama-nemotron-embed-1b-v2](https://huggingface.co/nvidia/llama-nemotron-embed-1b-v2) for version notes.
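
To make the difference between causal and bidirectional (non-causal) attention concrete, the following plain-Python sketch builds both masks for one padded sequence. It is an illustration only, not the library's `_create_bidirectional_mask`: the helper names are hypothetical, and it assumes a convention where `True` means "this query position may attend to this key position" and a padding mask where `1` marks real tokens.

```python
from typing import List


def causal_mask(attention_mask: List[int]) -> List[List[bool]]:
    """Query i may attend to key j only if j <= i and j is not padding."""
    n = len(attention_mask)
    return [[j <= i and attention_mask[j] == 1 for j in range(n)] for i in range(n)]


def bidirectional_mask(attention_mask: List[int]) -> List[List[bool]]:
    """Query i may attend to every non-padding key j, before or after it."""
    n = len(attention_mask)
    return [[attention_mask[j] == 1 for j in range(n)] for _ in range(n)]


# Three real tokens followed by one padding token (assumed right padding).
padding = [1, 1, 1, 0]
causal = causal_mask(padding)
bidir = bidirectional_mask(padding)

for name, mask in (("causal", causal), ("bidirectional", bidir)):
    print(name)
    for row in mask:
        print("  ", ["x" if allowed else "." for allowed in row])
```

In the bidirectional case only the padding column is masked out, so the first token can already see the last real token, which is what an embedding or reranking model typically wants.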

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaBidirectionalModel._create_bidirectional_mask(
    input_embeds: torch.Tensor,
    attention_mask: torch.Tensor | None
) -> torch.Tensor | None
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaBidirectionalModel.forward(
    input_ids: torch.LongTensor | None = None,
    attention_mask: torch.Tensor | None = None,
    position_ids: torch.LongTensor | None = None,
    past_key_values: transformers.cache_utils.Cache | None = None,
    inputs_embeds: torch.FloatTensor | None = None,
    cache_position: torch.LongTensor | None = None,
    use_cache: bool | None = None,
    output_hidden_states: bool | None = None,
    **kwargs
) -> transformers.modeling_outputs.BaseModelOutputWithPast
```

```python
class nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLConfig(
    vision_config = None,
    llm_config = None,
    use_backbone_lora = 0,
    use_llm_lora = 0,
    select_layer = -1,
    force_image_size = None,
    downsample_ratio = 0.5,
    template = None,
    dynamic_image_size = False,
    use_thumbnail = False,
    min_dynamic_patch = 1,
    max_dynamic_patch = 6,
    mlp_checkpoint = True,
    pre_feature_reduction = False,
    keep_aspect_ratio = False,
    vocab_size = -1,
    q_max_length: typing.Optional[int] = 512,
    p_max_length: typing.Optional[int] = 10240,
    query_prefix: str = 'query:',
    passage_prefix: str = 'passage:',
    pooling: str = 'last',
    bidirectional_attention: bool = False,
    max_input_tiles: int = 2,
    img_context_token_id: int = 128258,
    **kwargs
)
```

**Bases:** `PretrainedConfig`

Base configuration for vision-language models combining vision and language components.
This serves as the foundation for LlamaNemotronVL configurations.

```python
class nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel(
    config: nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLConfig,
    vision_model: typing.Optional[transformers.PreTrainedModel] = None,
    language_model: typing.Optional[transformers.PreTrainedModel] = None
)
```

**Bases:** `PreTrainedModel`

LlamaNemotron VL model for vision-language reranking.
Combines a vision encoder (SigLIP) with a bidirectional language model (LLaMA)
for cross-modal reranking tasks.

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel._embed_batch(
    inputs: typing.Dict[str, typing.Any],
    pool_type: typing.Optional[str] = None
)
```

Encodes the inputs into a tensor of embeddings.

**Args:**

- `inputs`: A dictionary of inputs to the model. Prepare it with the `processor.process_queries` and `processor.process_documents` methods.
- `pool_type`: The type of pooling to use. If `None`, the pooling type configured in the model is used.

**Returns:** A tensor of embeddings.

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.build_collator(
    processor = None,
    **kwargs
)
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.encode_documents(
    images: typing.Optional[typing.List[typing.Any]] = None,
    texts: typing.Optional[typing.List[str]] = None,
    **kwargs
)
```

Encodes the input document images and texts into a tensor of embeddings.

**Args:**

- `images`: A list of `PIL.Image` objects, one per document page image.
- `texts`: A list of document page texts.

**Returns:** A tensor of embeddings.

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.encode_queries(
    queries: typing.List[str],
    **kwargs
)
```

Encodes the input queries into a tensor of embeddings.

**Args:**

- `queries`: A list of queries.

**Returns:** A tensor of embeddings.

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.extract_feature(
    pixel_values
)
```

Extract and project vision features to language model space.

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.forward(
    pixel_values: torch.FloatTensor = None,
    input_ids: torch.LongTensor = None,
    attention_mask: typing.Optional[torch.Tensor] = None,
    position_ids: typing.Optional[torch.LongTensor] = None,
    image_flags: typing.Optional[torch.LongTensor] = None,
    past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None,
    labels: typing.Optional[torch.LongTensor] = None,
    use_cache: typing.Optional[bool] = None,
    output_attentions: typing.Optional[bool] = None,
    output_hidden_states: typing.Optional[bool] = None,
    return_dict: typing.Optional[bool] = None,
    num_patches_list: typing.Optional[typing.List[torch.Tensor]] = None,
    run_dummy_vision: typing.Optional[bool] = None
) -> typing.Union[typing.Tuple, transformers.modeling_outputs.CausalLMOutputWithPast]
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.get_input_embeddings()
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.get_output_embeddings()
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.pixel_shuffle(
    x,
    scale_factor = 0.5
)
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.LlamaNemotronVLModel.post_loss(
    loss,
    inputs
)
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model._filter_vision_embeddings_by_image_flags(
    vit_embeds: torch.Tensor,
    image_flags: typing.Optional[torch.Tensor]
) -> torch.Tensor
```

Keep only vision embeddings marked as real images.
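
The filtering idea can be sketched on plain Python lists. This is a hedged illustration rather than the actual tensor implementation: the function name `filter_by_image_flags` is hypothetical, a flag value of `1` is assumed to mean "real image", and passing everything through when `image_flags` is `None` is an assumption based only on the `Optional` type in the signature.

```python
from typing import List, Optional, Sequence


def filter_by_image_flags(
    vit_embeds: Sequence[Sequence[float]],
    image_flags: Optional[Sequence[int]],
) -> List[List[float]]:
    """Keep embeddings whose flag is 1; keep all of them if no flags are given."""
    if image_flags is None:
        # Assumption: no flags means every embedding belongs to a real image.
        return [list(emb) for emb in vit_embeds]
    if len(image_flags) != len(vit_embeds):
        raise ValueError("expected exactly one flag per vision embedding")
    return [list(emb) for emb, flag in zip(vit_embeds, image_flags) if flag == 1]


# Three image slots in a batch; the middle one is a dummy (flag 0).
embeds = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
real_only = filter_by_image_flags(embeds, [1, 0, 1])
print(real_only)
```

A flag-based filter like this lets a batch carry placeholder images for text-only samples without those placeholders leaking into the language model input.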

```python
nemo_automodel.components.models.llama_nemotron_vl.model._register_with_hf_auto_classes()
```

Register bidirectional models with HuggingFace Auto classes.

This is needed so that `AutoModel.from_config(LlamaBidirectionalConfig)`
works inside `LlamaForSequenceClassification.__init__`.

```python
nemo_automodel.components.models.llama_nemotron_vl.model._replace_image_token_embeddings(
    input_embeds: torch.Tensor,
    input_ids: torch.Tensor,
    vit_embeds: torch.Tensor,
    img_context_token_id: int
) -> torch.Tensor
```

Replace image placeholder token embeddings with vision embeddings.
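
As a rough picture of what this replacement involves, the sketch below works on a single flattened sequence of plain Python lists. It is not the library code: the function name is hypothetical, the real function operates on tensors, and the strict length check is an assumption. The placeholder ID `128258` is the `img_context_token_id` default from `LlamaNemotronVLConfig`.

```python
from typing import List, Sequence

IMG_CONTEXT_TOKEN_ID = 128258  # default from LlamaNemotronVLConfig


def replace_image_tokens(
    input_embeds: Sequence[Sequence[float]],
    input_ids: Sequence[int],
    vit_embeds: Sequence[Sequence[float]],
    img_context_token_id: int,
) -> List[List[float]]:
    """Return a copy of input_embeds with placeholder rows swapped for vision rows."""
    positions = [i for i, tok in enumerate(input_ids) if tok == img_context_token_id]
    if len(positions) != len(vit_embeds):
        # Assumption: every placeholder must receive exactly one vision embedding.
        raise ValueError(
            f"{len(positions)} placeholder tokens but {len(vit_embeds)} vision embeddings"
        )
    out = [list(row) for row in input_embeds]  # copy so the input is not mutated
    for pos, vec in zip(positions, vit_embeds):
        out[pos] = list(vec)
    return out


ids = [1, IMG_CONTEXT_TOKEN_ID, IMG_CONTEXT_TOKEN_ID, 2]
text_embeds = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
vision = [[0.5, 0.5], [0.7, 0.7]]
merged = replace_image_tokens(text_embeds, ids, vision, IMG_CONTEXT_TOKEN_ID)
print(merged)
```

Vision embeddings are consumed in order, so the first placeholder position receives the first vision row and so on.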

```python
nemo_automodel.components.models.llama_nemotron_vl.model.pool(
    last_hidden_states: torch.Tensor,
    attention_mask: torch.Tensor,
    pool_type: str
) -> torch.Tensor
```
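
The source gives no description for `pool`, so the following is only a guess at the general technique, written on nested Python lists instead of tensors. It assumes the two pooling names that appear as config defaults on this page, `'avg'` and `'last'`, mean masked mean pooling and last-non-padding-token pooling respectively. The name `pool_sketch` is hypothetical and the real function may support other modes or behave differently.

```python
from typing import List, Sequence


def pool_sketch(
    last_hidden_states: Sequence[Sequence[Sequence[float]]],
    attention_mask: Sequence[Sequence[int]],
    pool_type: str,
) -> List[List[float]]:
    """Reduce [batch][seq][hidden] states to [batch][hidden], ignoring padding."""
    pooled = []
    for states, mask in zip(last_hidden_states, attention_mask):
        # Keep only the hidden states of real (mask == 1) tokens.
        kept = [vec for vec, m in zip(states, mask) if m == 1]
        if not kept:
            raise ValueError("sequence has no unmasked tokens")
        if pool_type == "avg":
            hidden = len(kept[0])
            pooled.append([sum(v[d] for v in kept) / len(kept) for d in range(hidden)])
        elif pool_type == "last":
            pooled.append(list(kept[-1]))
        else:
            raise ValueError(f"unsupported pool_type: {pool_type!r}")
    return pooled


# One sequence: two real tokens and one padding token.
states = [[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]
mask = [[1, 1, 0]]
print(pool_sketch(states, mask, "avg"))
print(pool_sketch(states, mask, "last"))
```

Selecting the last unmasked token, rather than the last position, keeps the result independent of how much padding was appended.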

```python
nemo_automodel.components.models.llama_nemotron_vl.model.split_model(
    model_path,
    device
)
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.ModelClass = [LlamaNemotronVLModel]
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model._DYNAMIC_CACHE_ACCEPTS_CONFIG = 'config' in _dynamic_cache_init_params
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model._HAS_NATIVE_BIDIRECTIONAL_MASK = True
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model._USE_PLURAL_CACHE_PARAM = 'past_key_values' in _decoder_forward_params
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model.__all__ = ['LlamaNemotronVLModel', 'LlamaNemotronVLConfig', 'ModelClass']
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model._decoder_forward_params = inspect.signature(LlamaDecoderLayer.forward).parameters
```

```python
nemo_automodel.components.models.llama_nemotron_vl.model._dynamic_cache_init_params = inspect.signature(DynamicCache.__init__).parameters
```
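
Several of the module-level flags above are computed by checking whether a parameter name appears in an `inspect.signature(...).parameters` mapping. The sketch below shows that feature-detection pattern with two hypothetical stand-in functions in place of `LlamaDecoderLayer.forward`. The singular `past_key_value` name for the older spelling is an assumption made for illustration.

```python
import inspect


def new_style_forward(hidden_states, past_key_values=None):
    """Stand-in for a forward() that uses the plural parameter name."""
    return past_key_values


def old_style_forward(hidden_states, past_key_value=None):
    """Stand-in for a forward() that uses the (assumed) singular name."""
    return past_key_value


def call_with_cache(forward, hidden_states, cache):
    """Pass the cache under whichever keyword the target function accepts."""
    params = inspect.signature(forward).parameters
    use_plural = "past_key_values" in params
    key = "past_key_values" if use_plural else "past_key_value"
    return forward(hidden_states, **{key: cache})


print(call_with_cache(new_style_forward, None, "cache"))
print(call_with_cache(old_style_forward, None, "cache"))
```

Inspecting the signature once at import time, as the module-level constants do, avoids repeating the check on every forward pass and avoids comparing library version strings.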

```python
nemo_automodel.components.models.llama_nemotron_vl.model.logger = logging.get_logger(__name__)
```