HFAutoModelForCausalLM Class#
Overview#
HFAutoModelForCausalLM is a PyTorch Lightning module designed to facilitate the integration and training of Hugging Face’s causal language models within the NeMo Framework. It leverages functionalities from lightning.pytorch, transformers, and NeMo’s utilities to provide a flexible and efficient setup for language model training and fine-tuning.
Inheritance#
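As noted in the overview, HFAutoModelForCausalLM is implemented as a subclass of lightning.pytorch.LightningModule, so it can be driven by a standard Lightning Trainer loop.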
Constructor#
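The constructor accepts arguments corresponding to the attributes listed below. The call sketched here is illustrative and inferred from that list; the default values shown are assumptions, not guaranteed by this page.

```python
import torch

from nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm import (
    HFAutoModelForCausalLM,
    masked_cross_entropy,
)

# Illustrative constructor call; parameter names mirror the attributes
# documented below, and the defaults shown here are assumptions.
model = HFAutoModelForCausalLM(
    model_name='gpt2',              # name or path of the pre-trained model
    load_pretrained_weights=True,   # load HF pre-trained weights vs. random init
    tokenizer=None,                 # optional pre-built tokenizer
    loss_fn=masked_cross_entropy,   # loss callable (see Utility Functions below)
    model_transform=None,           # optional transform applied to the model
    model_accelerator=None,         # optional accelerator function
    trust_remote_code=False,        # trust remote code when loading
    default_dtype=torch.bfloat16,   # assumed default parameter dtype
    load_in_4bit=False,             # 4-bit quantized loading
    attn_implementation="sdpa",     # assumed attention backend
)
```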
Attributes#
- model_name (str)
  Name or path of the pre-trained model.
- _tokenizer (AutoTokenizer or None)
  Tokenizer instance used for encoding and decoding text.
- model (AutoModelForCausalLM or None)
  The underlying Hugging Face causal language model.
- loss_fn (callable)
  Loss function used for training.
- load_pretrained_weights (bool)
  Flag indicating whether to load pre-trained weights.
- is_hf_model (bool)
  Indicates whether the model is a Hugging Face model.
- model_transform (callable or None)
  Optional transformation function applied to the model.
- model_accelerator (callable or None)
  Optional accelerator function applied to the model.
- trust_remote_code (bool)
  Flag to trust remote code when loading the model.
- default_dtype (torch.dtype)
  Default data type for model parameters.
- load_in_4bit (bool)
  Flag to load the model in 4-bit precision.
- attn_implementation (str)
  Attention mechanism implementation to use.
Properties#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.tokenizer#
Returns the tokenizer instance. If not already set, it configures a tokenizer based on model_name.
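For illustration, accessing the property on a constructed instance (assuming a `model` created as in the usage example at the end of this page):

```python
# First access lazily builds the tokenizer from model_name; later accesses reuse it.
tokenizer = model.tokenizer
```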
Methods#
configure_tokenizer#
- static nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.configure_tokenizer(model_name, trust_remote_code=False)#
Configures and returns a tokenizer based on the given model_name.
Parameters#
- model_name (str)
  Name or path of the pre-trained tokenizer to load.
- trust_remote_code (bool, optional)
  Whether to trust remote code when loading the tokenizer (default is False).
Returns#
- AutoTokenizer
Configured tokenizer instance.
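For example, assuming the class is importable from the module path shown on this page:

```python
from nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm import HFAutoModelForCausalLM

# Build a tokenizer for the given checkpoint without constructing the full model.
tokenizer = HFAutoModelForCausalLM.configure_tokenizer("gpt2", trust_remote_code=False)
```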
configure_model#
Configures the underlying Hugging Face model. If load_pretrained_weights is True, it loads the model with pre-trained weights; otherwise, it initializes the model from scratch using the specified configuration. It also applies Fully Sharded Data Parallel (FSDP) and tensor parallelism when a device_mesh is provided, and applies any specified model accelerator.
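As a rough sketch of the pretrained-vs-scratch branch described above, using plain `transformers` APIs (illustrative only, not the NeMo source; `build_hf_model` is a hypothetical helper):

```python
from transformers import AutoConfig, AutoModelForCausalLM

def build_hf_model(model_name, load_pretrained_weights=True, trust_remote_code=False):
    if load_pretrained_weights:
        # Load both the architecture and the checkpoint weights.
        return AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=trust_remote_code)
    # Build the architecture from its config only; weights are randomly initialized.
    config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code)
    return AutoModelForCausalLM.from_config(config, trust_remote_code=trust_remote_code)
```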
forward#
training_step#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.training_step(batch, batch_idx=None)#
Defines the training step. Computes the loss using the specified loss function and logs the training loss.
Parameters#
- batch (dict)
  A batch of training data.
- batch_idx (int, optional)
  Index of the batch (default is None).
Returns#
- torch.Tensor
Computed loss for the batch.
validation_step#
save_pretrained#
_remove_extra_batch_keys#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm._remove_extra_batch_keys(batch, reserved_keys=['labels', 'loss_mask'])#
Removes keys from the batch that are not required by the model’s forward method, except for reserved keys.
Parameters#
- batch (dict)
  Dictionary of tensors representing a batch.
- reserved_keys (list of str, optional)
  Keys to retain in the batch regardless of the model's forward method requirements (default is ['labels', 'loss_mask']).
Returns#
- dict
Filtered batch containing only the necessary keys.
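A minimal sketch of this filtering logic, assuming the model's forward signature can be inspected (illustrative; `filter_batch_keys` is a hypothetical helper, not the NeMo implementation):

```python
import inspect

def filter_batch_keys(model, batch, reserved_keys=('labels', 'loss_mask')):
    # Keep only the keys accepted by model.forward, plus the reserved keys.
    accepted = inspect.signature(model.forward).parameters
    return {k: v for k, v in batch.items() if k in accepted or k in reserved_keys}
```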
Utility Functions#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.masked_cross_entropy(logits, targets, mask=None)#
Computes the cross-entropy loss, optionally applying a mask to the loss values.
Parameters#
- logits (torch.Tensor)
  Logits output from the model.
- targets (torch.Tensor)
  Ground-truth target indices.
- mask (torch.Tensor or None, optional)
  Mask to apply to the loss values (default is None).
Returns#
- torch.Tensor
Computed loss.
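A minimal sketch of such a masked loss, assuming logits flattened to `[tokens, vocab]` and integer targets of shape `[tokens]` (illustrative, not the NeMo source):

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy_sketch(logits, targets, mask=None):
    # Per-token cross-entropy; average only over positions where mask == 1.
    if mask is not None:
        loss = F.cross_entropy(logits, targets, reduction='none')
        return torch.mean(loss[mask == 1])
    return F.cross_entropy(logits, targets)
```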
Usage Example#
```python
import torch

from nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm import HFAutoModelForCausalLM

# Initialize the model
model = HFAutoModelForCausalLM(
    model_name='gpt2',
    load_pretrained_weights=True,
    trust_remote_code=False,
)

# Configure the model (typically called internally)
model.configure_model()

# Example training step (replace the ... placeholders with real token IDs)
batch = {
    'input_ids': torch.tensor([[...]]),
    'labels': torch.tensor([[...]]),
    'loss_mask': torch.tensor([[...]]),
}
loss = model.training_step(batch)
```