nemo_automodel.components.models.biencoder.llama_bidirectional_model#

Llama Bidirectional Model for NeMo AutoModel.

This module provides a bidirectional-attention variant of Llama that is useful for embedding and retrieval tasks. Unlike the standard causal Llama model, this variant allows every token to attend to all other tokens in the sequence.

Module Contents#

Classes#

LlamaBidirectionalConfig

Configuration class for LlamaBidirectionalModel.

LlamaBidirectionalModel

Llama Model with bidirectional attention.

LlamaBidirectionalForSequenceClassification

Llama Bidirectional Model with a sequence classification/regression head.

BiencoderOutput

Output dataclass for biencoder model.

BiencoderModel

Biencoder Model with essential functions for training.

Functions#

contrastive_scores_and_labels

Compute contrastive scores and labels without in-batch negatives.

pool

Pool hidden states using the specified pooling method.

Data#

API#

nemo_automodel.components.models.biencoder.llama_bidirectional_model.logger#

'get_logger(…)'

nemo_automodel.components.models.biencoder.llama_bidirectional_model.contrastive_scores_and_labels(
query: torch.Tensor,
key: torch.Tensor,
current_train_n_passages: int,
) → Tuple[torch.Tensor, torch.Tensor]#

Compute contrastive scores and labels without in-batch negatives.

Parameters:
  • query – Query embeddings [batch_size, hidden_dim]

  • key – Key/passage embeddings [batch_size * n_passages, hidden_dim]

  • current_train_n_passages – Number of passages per query

Returns:

Tuple of (scores, labels) where scores is [batch_size, n_passages] and labels is [batch_size] of zeros (positive is first passage)
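A minimal usage sketch, assuming only the shapes documented above; the tensors here are random placeholders:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    contrastive_scores_and_labels,
)

batch_size, n_passages, hidden_dim = 4, 3, 16
query = torch.randn(batch_size, hidden_dim)             # [batch_size, hidden_dim]
key = torch.randn(batch_size * n_passages, hidden_dim)  # [batch_size * n_passages, hidden_dim]

scores, labels = contrastive_scores_and_labels(query, key, current_train_n_passages=n_passages)
print(scores.shape)  # torch.Size([4, 3]) -> one row of passage scores per query
print(labels)        # zeros: the positive passage is the first in each query's group
```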

nemo_automodel.components.models.biencoder.llama_bidirectional_model.pool(
last_hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
pool_type: str,
) → torch.Tensor#

Pool hidden states using the specified pooling method.

Parameters:
  • last_hidden_states – Hidden states from the model [batch_size, seq_len, hidden_size]

  • attention_mask – Attention mask [batch_size, seq_len]

  • pool_type – Type of pooling to apply

Returns:

Pooled embeddings [batch_size, hidden_size]
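A short sketch with random hidden states; "avg" is one of the pooling strategies mentioned under LlamaBidirectionalConfig:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import pool

last_hidden_states = torch.randn(2, 8, 32)           # [batch_size, seq_len, hidden_size]
attention_mask = torch.ones(2, 8, dtype=torch.long)  # [batch_size, seq_len]

embeddings = pool(last_hidden_states, attention_mask, pool_type="avg")
print(embeddings.shape)  # torch.Size([2, 32])
```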

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalConfig(
pooling: str = 'avg',
temperature: float = 1.0,
**kwargs,
)#

Bases: transformers.models.llama.configuration_llama.LlamaConfig

Configuration class for LlamaBidirectionalModel.

Extends LlamaConfig with additional parameters for bidirectional attention and pooling configurations.

Initialization

Initialize LlamaBidirectionalConfig.

Parameters:
  • pooling – Pooling strategy ('avg', 'cls', 'last', etc.)

  • temperature – Temperature for scaling logits

  • **kwargs – Additional arguments passed to LlamaConfig

model_type#

'llama_bidirec'
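A configuration sketch; the size-related keyword arguments are standard LlamaConfig options forwarded through **kwargs:

```python
from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
)

config = LlamaBidirectionalConfig(
    pooling="avg",        # pooling strategy ('avg', 'cls', 'last', ...)
    temperature=0.05,     # temperature for scaling logits
    hidden_size=256,      # LlamaConfig options pass through **kwargs
    num_hidden_layers=2,
    num_attention_heads=4,
)
print(config.model_type)  # 'llama_bidirec'
```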

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalModel(
config: transformers.models.llama.configuration_llama.LlamaConfig,
)#

Bases: transformers.models.llama.modeling_llama.LlamaModel

Llama Model with bidirectional attention.

This model removes causal masking from all attention layers, allowing tokens to attend to all other tokens in the sequence. This is useful for embedding and retrieval tasks where bidirectional context is beneficial.

Initialization

Initialize LlamaBidirectionalModel.

Parameters:

config – Model configuration

config_class#

None

_update_causal_mask(attention_mask: torch.Tensor)#

Overridden mask construction that skips causal masking, so every token can attend to every other token.

forward(
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[transformers.cache_utils.Cache] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
**flash_attn_kwargs,
) → transformers.modeling_outputs.BaseModelOutputWithPast#

Forward pass with bidirectional attention. Accepts the same arguments as LlamaModel.forward and returns a BaseModelOutputWithPast.
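A minimal encoding sketch under an assumed small configuration (the sizes below are illustrative, not defaults), combining the model with the pool helper documented above:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalModel,
    pool,
)

# Small illustrative config; real checkpoints define their own sizes.
config = LlamaBidirectionalConfig(
    pooling="avg",
    vocab_size=1000,
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaBidirectionalModel(config)

input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# Every position now carries bidirectional context; pool it into one embedding per sequence.
embeddings = pool(outputs.last_hidden_state, attention_mask, pool_type=config.pooling)
print(embeddings.shape)  # torch.Size([2, 256])
```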
class nemo_automodel.components.models.biencoder.llama_bidirectional_model.LlamaBidirectionalForSequenceClassification(config)#

Bases: transformers.models.llama.modeling_llama.LlamaForSequenceClassification

Llama Bidirectional Model with a sequence classification/regression head.

This model adds a classification head on top of the bidirectional Llama model and includes configurable pooling strategies.

Initialization

Initialize LlamaBidirectionalForSequenceClassification.

Parameters:

config – Model configuration

config_class#

None

forward(
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[Union[transformers.cache_utils.Cache, List[torch.FloatTensor]]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
labels: Optional[torch.LongTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) → Union[Tuple, transformers.modeling_outputs.SequenceClassifierOutputWithPast]#

Forward pass for sequence classification.

Parameters:
  • input_ids – Input token IDs

  • attention_mask – Attention mask

  • position_ids – Position IDs

  • past_key_values – Past key values for generation

  • inputs_embeds – Input embeddings (alternative to input_ids)

  • labels – Labels for computing loss

  • use_cache – Whether to use cache

  • output_attentions – Whether to output attentions

  • output_hidden_states – Whether to output hidden states

  • return_dict – Whether to return a dict

Returns:

SequenceClassifierOutputWithPast with loss, logits, and optional outputs
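A hedged sketch of a classification forward pass; the tiny configuration and label tensor are illustrative placeholders, and num_labels / pad_token_id are standard LlamaConfig options:

```python
import torch

from nemo_automodel.components.models.biencoder.llama_bidirectional_model import (
    LlamaBidirectionalConfig,
    LlamaBidirectionalForSequenceClassification,
)

config = LlamaBidirectionalConfig(
    pooling="avg",
    vocab_size=1000,
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_labels=2,        # size of the classification head
    pad_token_id=0,
)
model = LlamaBidirectionalForSequenceClassification(config)

input_ids = torch.randint(1, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1])

out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(out.loss)          # classification loss for the provided labels
print(out.logits.shape)  # torch.Size([2, 2]) -> one logit per label
```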

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.BiencoderOutput#

Bases: transformers.modeling_outputs.ModelOutput

Output dataclass for biencoder model.

q_reps: Optional[torch.Tensor]#

None

p_reps: Optional[torch.Tensor]#

None

loss: Optional[torch.Tensor]#

None

labels: Optional[torch.Tensor]#

None

scores: Optional[torch.Tensor]#

None

class nemo_automodel.components.models.biencoder.llama_bidirectional_model.BiencoderModel(
lm_q: transformers.PreTrainedModel,
lm_p: transformers.PreTrainedModel,
linear_pooler: torch.nn.Module = None,
train_n_passages: int = 1,
eval_negative_size: int = 0,
pooling: str = 'avg',
l2_normalize: bool = True,
t: float = 1.0,
share_encoder: bool = True,
add_linear_pooler: bool = False,
)#

Bases: torch.nn.Module

Biencoder Model with essential functions for training.

This model encodes queries and passages separately and computes contrastive loss.

Initialization

Initialize BiencoderModel with a query encoder (lm_q), a passage encoder (lm_p), and the pooling/contrastive options described under build().

forward(
query: Dict[str, torch.Tensor] = None,
passage: Dict[str, torch.Tensor] = None,
)#

Forward pass for training.

_encode(
encoder: transformers.PreTrainedModel,
input_dict: dict,
) → Optional[torch.Tensor]#

Encode input using the encoder.

_compute_scores(
current_train_n_passages: int,
query: Dict[str, torch.Tensor] = None,
passage: Dict[str, torch.Tensor] = None,
) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]#

Compute similarity scores and labels.

classmethod build(
model_name_or_path: str,
share_encoder: bool = True,
add_linear_pooler: bool = False,
out_dimension: int = 768,
do_gradient_checkpointing: bool = False,
train_n_passages: int = 1,
eval_negative_size: int = 0,
pooling: str = 'avg',
l2_normalize: bool = True,
t: float = 1.0,
**hf_kwargs,
)#

Build biencoder model from pretrained.

Parameters:
  • model_name_or_path – Path to pretrained model or model identifier

  • share_encoder – Whether to share encoder weights between query and passage

  • add_linear_pooler – Whether to add a linear pooler layer

  • out_dimension – Output dimension for linear pooler

  • do_gradient_checkpointing – Whether to enable gradient checkpointing

  • train_n_passages – Number of passages per query during training

  • eval_negative_size – Number of negative samples during evaluation

  • pooling – Pooling strategy ('avg', 'cls', 'last', etc.)

  • l2_normalize – Whether to L2 normalize embeddings

  • t – Temperature for scaling similarity scores

  • **hf_kwargs – Additional arguments passed to model loading

save(output_dir: str)#

Save model to output directory.
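A hedged end-to-end sketch tying build, forward, and save together; the checkpoint path and the query/passage batches are placeholders:

```python
from nemo_automodel.components.models.biencoder.llama_bidirectional_model import BiencoderModel

# "path/to/llama-checkpoint" is a placeholder for any compatible pretrained model.
model = BiencoderModel.build(
    model_name_or_path="path/to/llama-checkpoint",
    share_encoder=True,    # one encoder shared by queries and passages
    train_n_passages=2,    # 1 positive + 1 negative passage per query
    pooling="avg",
    l2_normalize=True,
    t=0.05,                # temperature for scaling similarity scores
)

# query_batch / passage_batch are tokenizer outputs (dicts with input_ids / attention_mask);
# the forward pass returns a BiencoderOutput with loss, scores, labels, q_reps, and p_reps.
# output = model(query=query_batch, passage=passage_batch)
# output.loss.backward()

model.save("outputs/biencoder")
```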