nemo_automodel.components.models.biencoder.llama_bidirectional_model#

Llama Bidirectional Model for NeMo AutoModel.

This module provides a bidirectional-attention variant of Llama that is useful for embedding and retrieval tasks. Unlike the standard causal Llama model, this variant allows every token to attend to all other tokens in the sequence.

Module Contents#

Classes#

LlamaBidirectionalConfig

Configuration class for LlamaBidirectionalModel.

LlamaBidirectionalModel

Llama Model with bidirectional attention.

LlamaBidirectionalForSequenceClassification

Llama Bidirectional Model with a sequence classification/regression head.

BiencoderOutput

Output dataclass for biencoder model.

BiencoderModel

Biencoder Model with essential functions for training.

Functions#

contrastive_scores_and_labels

Compute contrastive scores and labels without in-batch negatives.

pool

Pool hidden states using the specified pooling method.

Data#

API#

nemo_automodel.components.models.biencoder.llama_bidirectional_model.logger#

'get_logger(…)'

nemo_automodel.components.models.biencoder.llama_bidirectional_model.contrastive_scores_and_labels(
query: torch.Tensor,
key: torch.Tensor,
current_train_n_passages: int,
) → Tuple[torch.Tensor, torch.Tensor]#

Compute contrastive scores and labels without in-batch negatives.

Parameters:
  • query – Query embeddings [batch_size, hidden_dim]

  • key – Key/passage embeddings [batch_size * n_passages, hidden_dim]

  • current_train_n_passages – Number of passages per query

Returns:

Tuple of (scores, labels) where scores is [batch_size, n_passages] and labels is [batch_size] of zeros (positive is first passage)
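A minimal usage sketch, assuming only the shapes documented above; the tensors here are random placeholders:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    contrastive_scores_and_labels,
)

batch_size, n_passages, hidden_dim = 4, 3, 16
query = torch.randn(batch_size, hidden_dim)             # [batch_size, hidden_dim]
key = torch.randn(batch_size * n_passages, hidden_dim)  # [batch_size * n_passages, hidden_dim]

scores, labels = contrastive_scores_and_labels(query, key, current_train_n_passages=n_passages)
print(scores.shape)  # torch.Size([4, 3]) -> one row of passage scores per query
print(labels)        # zeros: the positive passage is the first in each query's group
```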

nemo_automodel.components.models.biencoder.llama_bidirectional_model.pool(
last_hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
pool_type: str,
) → torch.Tensor#

Pool hidden states using the specified pooling method.

Parameters:
  • last_hidden_states – Hidden states from the model [batch_size, seq_len, hidden_size]

  • attention_mask – Attention mask [batch_size, seq_len]

  • pool_type – Type of pooling to apply

Returns:

Pooled embeddings [batch_size, hidden_size]
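A short sketch with random hidden states; "avg" is one of the pooling strategies mentioned under LlamaBidirectionalConfig:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import pool

last_hidden_states = torch.randn(2, 8, 32)           # [batch_size, seq_len, hidden_size]
attention_mask = torch.ones(2, 8, dtype=torch.long)  # [batch_size, seq_len]

embeddings = pool(last_hidden_states, attention_mask, pool_type="avg")
print(embeddings.shape)  # torch.Size([2, 32])
```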

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalConfig(
pooling: str = 'avg',
temperature: float = 1.0,
**kwargs,
)#

Bases: transformers.models.llama.configuration_llama.LlamaConfig

Configuration class for LlamaBidirectionalModel.

Extends LlamaConfig with additional parameters for bidirectional attention and pooling configurations.

Initialization

Initialize LlamaBidirectionalConfig.

Parameters:
  • pooling – Pooling strategy ('avg', 'cls', 'last', etc.)

  • temperature – Temperature for scaling logits

  • **kwargs – Additional arguments passed to LlamaConfig

model_type#

'llama_bidirec'
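A configuration sketch; the size-related keyword arguments are standard LlamaConfig options forwarded through **kwargs:

```python
from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
)

config = LlamaBidirectionalConfig(
    pooling="avg",        # pooling strategy ('avg', 'cls', 'last', ...)
    temperature=0.05,     # temperature for scaling logits
    hidden_size=256,      # LlamaConfig options pass through **kwargs
    num_hidden_layers=2,
    num_attention_heads=4,
)
print(config.model_type)  # 'llama_bidirec'
```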

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalModel(
config: transformers.models.llama.configuration_llama.LlamaConfig,
)#

Bases: transformers.models.llama.modeling_llama.LlamaModel

Llama Model with bidirectional attention.

This model removes causal masking from all attention layers, allowing tokens to attend to all other tokens in the sequence. This is useful for embedding and retrieval tasks where bidirectional context is beneficial.

Initialization

Initialize LlamaBidirectionalModel.

Parameters:

config – Model configuration

config_class#

None

_update_causal_mask(attention_mask: torch.Tensor)#

Overridden mask construction that skips causal masking, so every token can attend to every other token.

forward(
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[transformers.cache_utils.Cache] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
**flash_attn_kwargs,
) → transformers.modeling_outputs.BaseModelOutputWithPast#

Forward pass with bidirectional attention. Accepts the same arguments as LlamaModel.forward and returns a BaseModelOutputWithPast.
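A minimal encoding sketch under an assumed small configuration (the sizes below are illustrative, not defaults), combining the model with the pool helper documented above:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalModel,
    pool,
)

# Small illustrative config; real checkpoints define their own sizes.
config = LlamaBidirectionalConfig(
    pooling="avg",
    vocab_size=1000,
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaBidirectionalModel(config)

input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# Every position now carries bidirectional context; pool it into one embedding per sequence.
embeddings = pool(outputs.last_hidden_state, attention_mask, pool_type=config.pooling)
print(embeddings.shape)  # torch.Size([2, 256])
```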
class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalForSequenceClassification(config)#

Bases: transformers.models.llama.modeling_llama.LlamaForSequenceClassification

Llama Bidirectional Model with a sequence classification/regression head.

This model adds a classification head on top of the bidirectional Llama model and includes configurable pooling strategies.

Initialization

Initialize LlamaBidirectionalForSequenceClassification.

Parameters:

config – Model configuration

config_class#

None

forward(
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[Union[transformers.cache_utils.Cache, List[torch.FloatTensor]]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
labels: Optional[torch.LongTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) → Union[Tuple, transformers.modeling_outputs.SequenceClassifierOutputWithPast]#

Forward pass for sequence classification.

Parameters:
  • input_ids – Input token IDs

  • attention_mask – Attention mask

  • position_ids – Position IDs

  • past_key_values – Past key values for generation

  • inputs_embeds – Input embeddings (alternative to input_ids)

  • labels – Labels for computing loss

  • use_cache – Whether to use cache

  • output_attentions – Whether to output attentions

  • output_hidden_states – Whether to output hidden states

  • return_dict – Whether to return a dict

Returns:

SequenceClassifierOutputWithPast with loss, logits, and optional outputs
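A hedged sketch of a classification forward pass; the tiny configuration and label tensor are illustrative placeholders, and num_labels / pad_token_id are standard LlamaConfig options:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalForSequenceClassification,
)

config = LlamaBidirectionalConfig(
    pooling="avg",
    vocab_size=1000,
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_labels=2,        # size of the classification head
    pad_token_id=0,
)
model = LlamaBidirectionalForSequenceClassification(config)

input_ids = torch.randint(1, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1])

out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(out.loss)          # classification loss for the provided labels
print(out.logits.shape)  # torch.Size([2, 2]) -> one logit per label
```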

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.BiencoderOutput#

Bases: transformers.modeling_outputs.ModelOutput

Output dataclass for biencoder model.

q_reps: Optional[torch.Tensor]#

None

p_reps: Optional[torch.Tensor]#

None

loss: Optional[torch.Tensor]#

None

labels: Optional[torch.Tensor]#

None

scores: Optional[torch.Tensor]#

None

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.BiencoderModel(
lm_q: transformers.PreTrainedModel,
lm_p: transformers.PreTrainedModel,
linear_pooler: torch.nn.Module = None,
train_n_passages: int = 1,
eval_negative_size: int = 0,
pooling: str = 'avg',
l2_normalize: bool = True,
t: float = 1.0,
share_encoder: bool = True,
add_linear_pooler: bool = False,
)#

Bases: torch.nn.Module

Biencoder Model with essential functions for training.

This model encodes queries and passages separately and computes contrastive loss.

Initialization

Initialize BiencoderModel with a query encoder (lm_q), a passage encoder (lm_p), and the pooling/contrastive options described under build().

forward(
query: Dict[str, torch.Tensor] = None,
passage: Dict[str, torch.Tensor] = None,
)#

Forward pass for training.

_encode(
encoder: transformers.PreTrainedModel,
input_dict: dict,
) → Optional[torch.Tensor]#

Encode input using the encoder.

_compute_scores(
current_train_n_passages: int,
query: Dict[str, torch.Tensor] = None,
passage: Dict[str, torch.Tensor] = None,
) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]#

Compute similarity scores and labels.

classmethod build(
model_name_or_path: str,
share_encoder: bool = True,
add_linear_pooler: bool = False,
out_dimension: int = 768,
do_gradient_checkpointing: bool = False,
train_n_passages: int = 1,
eval_negative_size: int = 0,
pooling: str = 'avg',
l2_normalize: bool = True,
t: float = 1.0,
**hf_kwargs,
)#

Build biencoder model from pretrained.

Parameters:
  • model_name_or_path – Path to pretrained model or model identifier

  • share_encoder – Whether to share encoder weights between query and passage

  • add_linear_pooler – Whether to add a linear pooler layer

  • out_dimension – Output dimension for linear pooler

  • do_gradient_checkpointing – Whether to enable gradient checkpointing

  • train_n_passages – Number of passages per query during training

  • eval_negative_size – Number of negative samples during evaluation

  • pooling – Pooling strategy ('avg', 'cls', 'last', etc.)

  • l2_normalize – Whether to L2 normalize embeddings

  • t – Temperature for scaling similarity scores

  • **hf_kwargs – Additional arguments passed to model loading

save(output_dir: str)#

Save model to output directory.
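A hedged end-to-end sketch tying build, forward, and save together; the checkpoint path and the query/passage batches are placeholders:

```python
from nemo_automodel.components.models.biencoder.llama_bidirectional_model import BiencoderModel

# "path/to/llama-checkpoint" is a placeholder for any compatible pretrained model.
model = BiencoderModel.build(
    model_name_or_path="path/to/llama-checkpoint",
    share_encoder=True,    # one encoder shared by queries and passages
    train_n_passages=2,    # 1 positive + 1 negative passage per query
    pooling="avg",
    l2_normalize=True,
    t=0.05,                # temperature for scaling similarity scores
)

# query_batch / passage_batch are tokenizer outputs (dicts with input_ids / attention_mask);
# the forward pass returns a BiencoderOutput with loss, scores, labels, q_reps, and p_reps.
# output = model(query=query_batch, passage=passage_batch)
# output.loss.backward()

model.save("outputs/biencoder")
```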