nemo_automodel.components.models.biencoder.llama_bidirectional_model#
Llama Bidirectional Model for NeMo AutoModel.
This module provides a bidirectional attention variant of Llama that is useful for embedding and retrieval tasks. Unlike the standard causal Llama model, this version can attend to all tokens bidirectionally.
Module Contents#
Classes#
| Class | Description |
|---|---|
| LlamaBidirectionalConfig | Configuration class for LlamaBidirectionalModel. |
| LlamaBidirectionalModel | Llama Model with bidirectional attention. |
| LlamaBidirectionalForSequenceClassification | Llama Bidirectional Model with a sequence classification/regression head. |
| BiencoderOutput | Output dataclass for biencoder model. |
| BiencoderModel | Biencoder Model with essential functions for training. |
Functions#
| Function | Description |
|---|---|
| contrastive_scores_and_labels | Compute contrastive scores and labels without in-batch negatives. |
| pool | Pool hidden states using the specified pooling method. |
Data#
API#
- nemo_automodel.components.models.biencoder.llama_bidirectional_model.logger#
'get_logger(…)'
- nemo_automodel.components.models.biencoder.llama_bidirectional_model.contrastive_scores_and_labels(
- query: torch.Tensor,
- key: torch.Tensor,
- current_train_n_passages: int,
)#
Compute contrastive scores and labels without in-batch negatives.
- Parameters:
query – Query embeddings [batch_size, hidden_dim]
key – Key/passage embeddings [batch_size * n_passages, hidden_dim]
current_train_n_passages – Number of passages per query
- Returns:
Tuple of (scores, labels) where scores is [batch_size, n_passages] and labels is [batch_size] of zeros (positive is first passage)
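A minimal usage sketch based on the documented signature and shapes; the tensors below are random stand-ins for real query/passage embeddings.

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    contrastive_scores_and_labels,
)

batch_size, n_passages, hidden_dim = 4, 2, 16
query = torch.randn(batch_size, hidden_dim)             # [batch_size, hidden_dim]
key = torch.randn(batch_size * n_passages, hidden_dim)  # [batch_size * n_passages, hidden_dim]

scores, labels = contrastive_scores_and_labels(query, key, current_train_n_passages=n_passages)
# scores: [batch_size, n_passages]; labels: zeros, since the positive passage comes first
```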
- nemo_automodel.components.models.biencoder.llama_bidirectional_model.pool(
- last_hidden_states: torch.Tensor,
- attention_mask: torch.Tensor,
- pool_type: str,
)#
Pool hidden states using the specified pooling method.
- Parameters:
last_hidden_states – Hidden states from the model [batch_size, seq_len, hidden_size]
attention_mask – Attention mask [batch_size, seq_len]
pool_type – Type of pooling to apply
- Returns:
Pooled embeddings [batch_size, hidden_size]
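A minimal sketch of pooling random hidden states; 'avg' is one of the pooling strategies referenced elsewhere in this module.

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import pool

batch_size, seq_len, hidden_size = 2, 8, 16
last_hidden_states = torch.randn(batch_size, seq_len, hidden_size)
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)

embeddings = pool(last_hidden_states, attention_mask, pool_type="avg")
# embeddings: [batch_size, hidden_size]
```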
- class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalConfig(
- pooling: str = 'avg',
- temperature: float = 1.0,
- **kwargs,
)#
Bases:
transformers.models.llama.configuration_llama.LlamaConfig

Configuration class for LlamaBidirectionalModel.
Extends LlamaConfig with additional parameters for bidirectional attention and pooling configurations.
Initialization
Initialize LlamaBidirectionalConfig.
- Parameters:
pooling – Pooling strategy ('avg', 'cls', 'last', etc.)
temperature – Temperature for scaling logits
**kwargs – Additional arguments passed to LlamaConfig
- model_type#
'llama_bidirec'
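A minimal sketch of constructing the config; pooling and temperature are the documented extensions, while the remaining keyword arguments are standard LlamaConfig fields shown here with illustrative (not default) values.

```python
from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
)

config = LlamaBidirectionalConfig(
    pooling="avg",          # pooling strategy ('avg', 'cls', 'last', ...)
    temperature=1.0,        # scaling applied to logits
    hidden_size=64,         # illustrative LlamaConfig fields, not defaults
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    vocab_size=1000,
)
print(config.model_type)    # 'llama_bidirec'
```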
- class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalModel(
- config: transformers.models.llama.configuration_llama.LlamaConfig,
)#
Bases:
transformers.models.llama.modeling_llama.LlamaModel

Llama Model with bidirectional attention.
This model removes causal masking from all attention layers, allowing tokens to attend to all other tokens in the sequence. This is useful for embedding and retrieval tasks where bidirectional context is beneficial.
Initialization
Initialize LlamaBidirectionalModel.
- Parameters:
config – Model configuration
- config_class#
None
- _update_causal_mask(attention_mask: torch.Tensor)#
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[transformers.cache_utils.Cache] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- use_cache: Optional[bool] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
- cache_position: Optional[torch.LongTensor] = None,
- **flash_attn_kwargs,
)#
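A minimal sketch of running the bidirectional encoder with a tiny, randomly initialized configuration and pooling the outputs into embeddings; all sizes are illustrative only.

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalModel,
    pool,
)

config = LlamaBidirectionalConfig(
    pooling="avg",
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=1000,
)
model = LlamaBidirectionalModel(config)

input_ids = torch.randint(0, config.vocab_size, (2, 10))
attention_mask = torch.ones_like(input_ids)

# No causal mask: every token attends to every other token in the sequence.
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
embeddings = pool(outputs.last_hidden_state, attention_mask, pool_type="avg")
# embeddings: [batch_size, hidden_size]
```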
- class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalForSequenceClassification(config)#
Bases:
transformers.models.llama.modeling_llama.LlamaForSequenceClassification

Llama Bidirectional Model with a sequence classification/regression head.
This model adds a classification head on top of the bidirectional Llama model and includes configurable pooling strategies.
Initialization
Initialize LlamaBidirectionalForSequenceClassification.
- Parameters:
config – Model configuration
- config_class#
None
- forward(
- input_ids: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[Union[transformers.cache_utils.Cache, List[torch.FloatTensor]]] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- labels: Optional[torch.LongTensor] = None,
- use_cache: Optional[bool] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
- return_dict: Optional[bool] = None,
)#
Forward pass for sequence classification.
- Parameters:
input_ids – Input token IDs
attention_mask – Attention mask
position_ids – Position IDs
past_key_values – Past key values for generation
inputs_embeds – Input embeddings (alternative to input_ids)
labels – Labels for computing loss
use_cache – Whether to use cache
output_attentions – Whether to output attentions
output_hidden_states – Whether to output hidden states
return_dict – Whether to return a dict
- Returns:
SequenceClassifierOutputWithPast with loss, logits, and optional outputs
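A minimal sketch of scoring a batch with a tiny, randomly initialized model; num_labels=1 mimics a single relevance/regression score per sequence, and all sizes are illustrative.

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalForSequenceClassification,
)

config = LlamaBidirectionalConfig(
    pooling="avg",
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    vocab_size=1000,
    num_labels=1,        # one relevance/regression score per sequence
    pad_token_id=0,
)
model = LlamaBidirectionalForSequenceClassification(config)

input_ids = torch.randint(1, config.vocab_size, (2, 10))
attention_mask = torch.ones_like(input_ids)

out = model(input_ids=input_ids, attention_mask=attention_mask)
# out.logits: [batch_size, num_labels]
```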
- class nemo_automodel.components.models.biencoder.llama_bidirectional_model.BiencoderOutput#
Bases:
transformers.modeling_outputs.ModelOutput

Output dataclass for biencoder model.
- q_reps: Optional[torch.Tensor]#
None
- p_reps: Optional[torch.Tensor]#
None
- loss: Optional[torch.Tensor]#
None
- labels: Optional[torch.Tensor]#
None
- scores: Optional[torch.Tensor]#
None
- class nemo_automodel.components.models.biencoder.llama_bidirectional_model.BiencoderModel(
- lm_q: transformers.PreTrainedModel,
- lm_p: transformers.PreTrainedModel,
- linear_pooler: torch.nn.Module = None,
- train_n_passages: int = 1,
- eval_negative_size: int = 0,
- pooling: str = 'avg',
- l2_normalize: bool = True,
- t: float = 1.0,
- share_encoder: bool = True,
- add_linear_pooler: bool = False,
)#
Bases:
torch.nn.Module

Biencoder Model with essential functions for training.
This model encodes queries and passages separately and computes contrastive loss.
Initialization
- forward(
- query: Dict[str, torch.Tensor] = None,
- passage: Dict[str, torch.Tensor] = None,
)#
Forward pass for training.
- _encode(
- encoder: transformers.PreTrainedModel,
- input_dict: dict,
)#
Encode input using the encoder.
- _compute_scores(
- current_train_n_passages: int,
- query: Dict[str, torch.Tensor] = None,
- passage: Dict[str, torch.Tensor] = None,
)#
Compute similarity scores and labels.
- classmethod build(
- model_name_or_path: str,
- share_encoder: bool = True,
- add_linear_pooler: bool = False,
- out_dimension: int = 768,
- do_gradient_checkpointing: bool = False,
- train_n_passages: int = 1,
- eval_negative_size: int = 0,
- pooling: str = 'avg',
- l2_normalize: bool = True,
- t: float = 1.0,
- **hf_kwargs,
)#
Build biencoder model from pretrained.
- Parameters:
model_name_or_path – Path to pretrained model or model identifier
share_encoder – Whether to share encoder weights between query and passage
add_linear_pooler – Whether to add a linear pooler layer
out_dimension – Output dimension for linear pooler
do_gradient_checkpointing – Whether to enable gradient checkpointing
train_n_passages – Number of passages per query during training
eval_negative_size – Number of negative samples during evaluation
pooling – Pooling strategy ('avg', 'cls', 'last', etc.)
l2_normalize – Whether to L2 normalize embeddings
t – Temperature for scaling similarity scores
**hf_kwargs – Additional arguments passed to model loading
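A sketch of building a shared-encoder biencoder from a checkpoint; the model path is a placeholder, and the commented forward call illustrates the expected query/passage inputs (dicts of tokenized tensors) and the fields of the returned BiencoderOutput.

```python
from nemo_automodel.components.models.biencoder.llama_bidirectional_model import BiencoderModel

model = BiencoderModel.build(
    model_name_or_path="path/or/hub-id-of-a-llama-checkpoint",  # placeholder
    share_encoder=True,
    train_n_passages=2,    # e.g. one positive followed by one negative per query
    pooling="avg",
    l2_normalize=True,
    t=1.0,
)

# query and passage are dicts of tokenized tensors (e.g. input_ids, attention_mask)
# produced by a Hugging Face tokenizer:
#   output = model(query=query_batch, passage=passage_batch)
#   output.loss, output.scores, output.labels, output.q_reps, output.p_reps

model.save("./biencoder_checkpoint")
```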
- save(output_dir: str)#
Save model to output directory.