nemo_automodel.components.models.llama_bidirectional.model#
Llama Bidirectional model for embedding and retrieval tasks.
This module provides a bidirectional variant of Llama that is auto-discovered by the ModelRegistry via the ModelClass export.
To add support for other backbones (e.g., Qwen2, Mistral), create a similar module in a new directory (e.g., qwen2_bidirectional/) with its own ModelClass export.
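A minimal sketch of what such a module could look like, assuming a Qwen2 backbone. The class and file names mirror the pattern of this module and are assumptions, not an existing implementation; the `_update_causal_mask` override may need adapting to the installed transformers version:

```python
# Hypothetical sketch: qwen2_bidirectional/model.py
from transformers.models.qwen2.configuration_qwen2 import Qwen2Config
from transformers.models.qwen2.modeling_qwen2 import Qwen2Model


class Qwen2BidirectionalConfig(Qwen2Config):
    model_type = "Qwen2BidirectionalModel"

    def __init__(self, pooling: str = "avg", temperature: float = 1.0, **kwargs):
        super().__init__(**kwargs)
        self.pooling = pooling
        self.temperature = temperature


class Qwen2BidirectionalModel(Qwen2Model):
    config_class = Qwen2BidirectionalConfig

    def _update_causal_mask(self, attention_mask, input_tensor=None, **kwargs):
        # Assumption: returning None disables the causal mask so attention
        # becomes bidirectional; a padding-only mask could be built here instead.
        return None


# The ModelRegistry discovers the module through this export.
ModelClass = Qwen2BidirectionalModel
```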
Module Contents#
Classes#
- LlamaBidirectionalConfig: Configuration class for LlamaBidirectionalModel.
- LlamaBidirectionalModel: Llama Model with bidirectional attention.
- LlamaBidirectionalForSequenceClassification: Llama Bidirectional Model with a sequence classification/regression head.
Functions#
- _pool: Pool hidden states using the specified pooling method.
Data#
- ModelClass
- __all__
API#
- class nemo_automodel.components.models.llama_bidirectional.model.LlamaBidirectionalConfig(pooling: str = 'avg', temperature: float = 1.0, **kwargs)#
Bases: transformers.models.llama.configuration_llama.LlamaConfig

Configuration class for LlamaBidirectionalModel.
Extends LlamaConfig with additional parameters for bidirectional attention and pooling configurations.
Initialization
Initialize LlamaBidirectionalConfig.
- Parameters:
pooling – Pooling strategy ('avg', 'cls', 'last', etc.)
temperature – Temperature for scaling logits
**kwargs – Additional arguments passed to LlamaConfig
- model_type#
'LlamaBidirectionalModel'
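A minimal usage sketch; the extra keyword arguments shown are standard LlamaConfig parameters chosen for illustration:

```python
from nemo_automodel.components.models.llama_bidirectional.model import (
    LlamaBidirectionalConfig,
)

# Mean pooling with temperature-scaled logits; remaining kwargs
# are forwarded to LlamaConfig.
config = LlamaBidirectionalConfig(
    pooling="avg",
    temperature=0.05,
    hidden_size=2048,
    num_hidden_layers=16,
    num_attention_heads=16,
)
print(config.model_type)  # 'LlamaBidirectionalModel'
```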
- class nemo_automodel.components.models.llama_bidirectional.model.LlamaBidirectionalModel(config: transformers.models.llama.configuration_llama.LlamaConfig)#
Bases: transformers.models.llama.modeling_llama.LlamaModel

Llama Model with bidirectional attention.
This model removes causal masking from all attention layers, allowing tokens to attend to all other tokens in the sequence. This is useful for embedding and retrieval tasks where bidirectional context is beneficial.
The model is auto-discovered by ModelRegistry via the ModelClass export, enabling it to be loaded via NeMoAutoModelBiencoder.from_pretrained().
Initialization
Initialize LlamaBidirectionalModel.
- Parameters:
config – Model configuration
- config_class#
None
- _update_causal_mask(attention_mask: torch.Tensor, input_tensor: Optional[torch.Tensor] = None, **kwargs)#
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- cache_position: Optional[torch.LongTensor] = None,
- use_cache: Optional[bool] = None,
- **kwargs: transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs],
- )#
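A minimal forward-pass sketch; the tiny config and random inputs are illustrative assumptions, not a recommended setup:

```python
import torch

from nemo_automodel.components.models.llama_bidirectional.model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalModel,
)

# Hypothetical small config for illustration.
config = LlamaBidirectionalConfig(
    pooling="avg",
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=32000,
)
model = LlamaBidirectionalModel(config)

input_ids = torch.randint(0, config.vocab_size, (1, 8))
attention_mask = torch.ones(1, 8, dtype=torch.long)

# Every token attends to every other token; no causal mask is applied.
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 8, 256])
```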
- nemo_automodel.components.models.llama_bidirectional.model._pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor, pool_type: str)#
Pool hidden states using the specified pooling method.
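As a reference point, a sketch of the common 'avg' case: an assumption mirroring typical mean-pooling implementations, not the module's verbatim code:

```python
import torch


def mean_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average over the valid tokens only.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_states.dtype)
    summed = (last_hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts
```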
- class nemo_automodel.components.models.llama_bidirectional.model.LlamaBidirectionalForSequenceClassification(config)#
Bases: transformers.models.llama.modeling_llama.LlamaForSequenceClassification

Llama Bidirectional Model with a sequence classification/regression head.
This model adds a classification head on top of the bidirectional Llama model and includes configurable pooling strategies.
Initialization
- config_class#
None
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[Union[transformers.cache_utils.Cache, List[torch.FloatTensor]]] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- labels: Optional[torch.LongTensor] = None,
- use_cache: Optional[bool] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
- return_dict: Optional[bool] = None,
- )#
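A classification sketch under assumed settings; the tiny config, `num_labels`, and random inputs are illustrative:

```python
import torch

from nemo_automodel.components.models.llama_bidirectional.model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalForSequenceClassification,
)

config = LlamaBidirectionalConfig(
    pooling="avg",
    num_labels=2,        # size of the classification head
    pad_token_id=0,
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=32000,
)
model = LlamaBidirectionalForSequenceClassification(config)

input_ids = torch.randint(1, config.vocab_size, (2, 8))
attention_mask = torch.ones(2, 8, dtype=torch.long)
labels = torch.tensor([0, 1])

# Passing labels makes the head compute a classification loss.
out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(out.loss, out.logits.shape)  # scalar loss, torch.Size([2, 2])
```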
- nemo_automodel.components.models.llama_bidirectional.model.ModelClass#
None
- nemo_automodel.components.models.llama_bidirectional.model.__all__#
['LlamaBidirectionalModel', 'LlamaBidirectionalConfig', 'LlamaBidirectionalForSequenceClassification']