nemo_automodel.components.models.ministral_bidirectional.model#
Bidirectional Ministral3 model for embedding tasks.
This module provides a modified Ministral3Model that uses bidirectional (non-causal) attention, suitable for generating embeddings where each token should attend to all other tokens in the sequence.
Module Contents#
Classes#
Ministral3BidirectionalConfig | Configuration for Ministral3BidirectionalModel with pooling and temperature settings.
Ministral3BidirectionalModel | Ministral3Model modified to use bidirectional (non-causal) attention.
Functions#
_register_with_hf_auto_classes | Register bidirectional Ministral3 with HuggingFace Auto classes.
Data#
API#
- nemo_automodel.components.models.ministral_bidirectional.model.logger#
'get_logger(…)'
- class nemo_automodel.components.models.ministral_bidirectional.model.Ministral3BidirectionalConfig(
- pooling: str = 'avg',
- temperature: float = 1.0,
- **kwargs,
- )#
Bases: transformers.models.ministral3.configuration_ministral3.Ministral3Config
Configuration for Ministral3BidirectionalModel with pooling and temperature settings.
Initialization
- model_type#
'ministral3_bidirec'
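This page does not say how the pooling and temperature settings are consumed downstream. As a minimal sketch, assuming that 'avg' means a mean over non-padded token states and that temperature divides a cosine similarity score (both are assumptions, not confirmed here), the two settings could act roughly as follows. The helper names avg_pool and scaled_similarity are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def avg_pool(hidden_states, attention_mask):
    """Mean of token vectors whose attention_mask entry is 1 (assumed 'avg' pooling)."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


def scaled_similarity(a, b, temperature=1.0):
    """Cosine similarity divided by temperature (assumed role of the setting)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm / temperature


# Three token states; the last position is padding and is ignored by pooling.
states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
embedding = avg_pool(states, [1, 1, 0])
score = scaled_similarity([1.0, 0.0], [1.0, 0.0], temperature=0.5)
```

Under these assumptions, a temperature below 1.0 sharpens similarity scores and the default of 1.0 leaves them unchanged.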
- class nemo_automodel.components.models.ministral_bidirectional.model.Ministral3BidirectionalModel(config)#
Bases: transformers.models.ministral3.modeling_ministral3.Ministral3Model
Ministral3Model modified to use bidirectional (non-causal) attention.
In standard Ministral3, each token can only attend to previous tokens (causal attention). This model removes that restriction, allowing each token to attend to all tokens in the sequence, which is useful for embedding tasks.
The key modifications are:
1. Setting is_causal=False on all attention layers
2. Using a bidirectional attention mask instead of a causal mask
Loading a Mistral3 VLM checkpoint (e.g. mistralai/Ministral-3-3B-Base-2512 or mistralai/Ministral-3-3B-Instruct-2512) requires extracting the language tower; this is driven by the recipe YAML via extract_submodel: language_model and handled by nemo_automodel._transformers.retrieval.build_encoder_backbone.
Text-only checkpoints (e.g. mistralai/Ministral-3B-Instruct) load directly via the standard from_pretrained path with no extraction needed.
Initialization
- config_class#
None
- forward(
- input_ids: torch.LongTensor | None = None,
- attention_mask: torch.Tensor | None = None,
- position_ids: torch.LongTensor | None = None,
- past_key_values: transformers.cache_utils.Cache | None = None,
- inputs_embeds: torch.FloatTensor | None = None,
- use_cache: bool | None = None,
- cache_position: torch.LongTensor | None = None,
- **kwargs,
- )#
Forward pass with bidirectional attention.
Identical to Ministral3Model.forward() except the causal mask is replaced with a bidirectional mask, allowing all tokens to attend to each other.
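To make the difference between the two masks concrete, here is a small sketch that uses nested Python lists in place of tensors. It is illustrative only and is not the module's implementation; the function names are hypothetical. Entry [q][k] is True when query position q may attend to key position k.

```python
def causal_mask(seq_len):
    """Query q may attend only to keys at positions <= q."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


def bidirectional_mask(attention_mask):
    """Every query may attend to every non-padded key.

    attention_mask is a list of 0/1 padding flags for one sequence.
    """
    n = len(attention_mask)
    return [[bool(attention_mask[k]) for k in range(n)] for _ in range(n)]


causal = causal_mask(3)
# Third position is padding, so no query attends to it.
bidirectional = bidirectional_mask([1, 1, 0])
```

In the causal case the first token sees only itself. In the bidirectional case it sees every real token, and padding is still excluded through attention_mask.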
- nemo_automodel.components.models.ministral_bidirectional.model.ModelClass#
None
- nemo_automodel.components.models.ministral_bidirectional.model._register_with_hf_auto_classes() → None#
Register bidirectional Ministral3 with HuggingFace Auto classes.
Needed so that AutoModel.from_config(Ministral3BidirectionalConfig) and checkpoint reload paths that use Auto resolution work consistently.
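The body of this function is not shown on this page. A plausible sketch, assuming it relies on the two registration hooks that transformers exposes (AutoConfig.register(model_type, config_class) and AutoModel.register(config_class, model_class)), is given below. The Auto classes are passed in as arguments, and small stand-in classes (all hypothetical) are used so that the snippet runs without transformers installed.

```python
def register_with_auto_classes(auto_config, auto_model, config_cls, model_cls):
    """Register a custom config/model pair with HF-style Auto classes."""
    # Map the model_type string to the config class.
    auto_config.register(config_cls.model_type, config_cls)
    # Map the config class to the model class.
    auto_model.register(config_cls, model_cls)


# Hypothetical stand-ins that mimic only the calls used above.
class FakeAutoConfig:
    registry = {}

    @classmethod
    def register(cls, model_type, config_cls):
        cls.registry[model_type] = config_cls


class FakeAutoModel:
    registry = {}

    @classmethod
    def register(cls, config_cls, model_cls):
        cls.registry[config_cls] = model_cls

    @classmethod
    def from_config(cls, config):
        # Resolve the model class from the type of the config instance.
        return cls.registry[type(config)](config)


class ToyConfig:
    model_type = "ministral3_bidirec"


class ToyModel:
    def __init__(self, config):
        self.config = config


register_with_auto_classes(FakeAutoConfig, FakeAutoModel, ToyConfig, ToyModel)
model = FakeAutoModel.from_config(ToyConfig())
```

With the real library, the same helper would be called with transformers.AutoConfig, transformers.AutoModel, Ministral3BidirectionalConfig, and Ministral3BidirectionalModel. Without the registration step, resolving the model from its config through the Auto classes would be expected to fail for an unknown model_type.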
- nemo_automodel.components.models.ministral_bidirectional.model.__all__#
['Ministral3BidirectionalModel', 'Ministral3BidirectionalConfig', 'ModelClass']