HFAutoModelForCausalLM Class#
Overview#
HFAutoModelForCausalLM is a PyTorch Lightning module designed to facilitate the integration and training of Hugging Face’s causal language models within the NeMo Framework. It leverages functionalities from lightning.pytorch, transformers, and NeMo’s utilities to provide a flexible and efficient setup for language model training and fine-tuning.
Inheritance#
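As noted in the overview, HFAutoModelForCausalLM is implemented as a subclass of lightning.pytorch.LightningModule, so it can be driven by a standard Lightning Trainer loop.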
Constructor#
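The constructor accepts arguments corresponding to the attributes listed below. The call sketched here is illustrative and inferred from that list; the default values shown are assumptions, not guaranteed by this page.

```python
import torch

from nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm import (
    HFAutoModelForCausalLM,
    masked_cross_entropy,
)

# Illustrative constructor call; parameter names mirror the attributes
# documented below, and the defaults shown here are assumptions.
model = HFAutoModelForCausalLM(
    model_name='gpt2',              # name or path of the pre-trained model
    load_pretrained_weights=True,   # load HF pre-trained weights vs. random init
    tokenizer=None,                 # optional pre-built tokenizer
    loss_fn=masked_cross_entropy,   # loss callable (see Utility Functions below)
    model_transform=None,           # optional transform applied to the model
    model_accelerator=None,         # optional accelerator function
    trust_remote_code=False,        # trust remote code when loading
    default_dtype=torch.bfloat16,   # assumed default parameter dtype
    load_in_4bit=False,             # 4-bit quantized loading
    attn_implementation="sdpa",     # assumed attention backend
)
```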
Attributes#
- model_name (str)
  Name or path of the pre-trained model.
- _tokenizer (AutoTokenizer or None)
  Tokenizer instance used for encoding and decoding text.
- model (AutoModelForCausalLM or None)
  The underlying Hugging Face causal language model.
- loss_fn (callable)
  Loss function used for training.
- load_pretrained_weights (bool)
  Flag indicating whether to load pre-trained weights.
- is_hf_model (bool)
  Indicates whether the model is a Hugging Face model.
- model_transform (callable or None)
  Optional transformation function applied to the model.
- model_accelerator (callable or None)
  Optional accelerator function applied to the model.
- trust_remote_code (bool)
  Flag to trust remote code when loading the model.
- default_dtype (torch.dtype)
  Default data type for model parameters.
- load_in_4bit (bool)
  Flag to load the model in 4-bit precision.
- attn_implementation (str)
  Attention mechanism implementation to use.
Properties#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.tokenizer#
Returns the tokenizer instance. If not already set, it configures a tokenizer based on model_name.
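For illustration, accessing the property on a constructed instance (assuming a `model` created as in the usage example at the end of this page):

```python
# First access lazily builds the tokenizer from model_name; later accesses reuse it.
tokenizer = model.tokenizer
```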
Methods#
configure_tokenizer#
- static nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.configure_tokenizer(model_name, trust_remote_code=False)#
Configures and returns a tokenizer based on the given model_name.
Parameters#
- model_name (str)
  Name or path of the pre-trained tokenizer to load.
- trust_remote_code (bool, optional)
  Whether to trust remote code when loading the tokenizer (default is False).
Returns#
- AutoTokenizer
Configured tokenizer instance.
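For example, assuming the class is importable from the module path shown on this page:

```python
from nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm import HFAutoModelForCausalLM

# Build a tokenizer for the given checkpoint without constructing the full model.
tokenizer = HFAutoModelForCausalLM.configure_tokenizer("gpt2", trust_remote_code=False)
```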
configure_model#
Configures the underlying Hugging Face model. If load_pretrained_weights is True, it loads the model with pre-trained weights; otherwise, it initializes the model from scratch using the specified configuration. It also applies Fully Sharded Data Parallel (FSDP) and tensor parallelism when a device_mesh is provided, and applies any specified model accelerator.
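As a rough sketch of the pretrained-vs-scratch branch described above, using plain `transformers` APIs (illustrative only, not the NeMo source; `build_hf_model` is a hypothetical helper):

```python
from transformers import AutoConfig, AutoModelForCausalLM

def build_hf_model(model_name, load_pretrained_weights=True, trust_remote_code=False):
    if load_pretrained_weights:
        # Load both the architecture and the checkpoint weights.
        return AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=trust_remote_code)
    # Build the architecture from its config only; weights are randomly initialized.
    config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code)
    return AutoModelForCausalLM.from_config(config, trust_remote_code=trust_remote_code)
```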
forward#
training_step#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.training_step(batch, batch_idx=None)#
Defines the training step. Computes the loss using the specified loss function and logs the training loss.
Parameters#
- batch (dict)
  A batch of training data.
- batch_idx (int, optional)
  Index of the batch (default is None).
Returns#
- torch.Tensor
Computed loss for the batch.
validation_step#
save_pretrained#
_remove_extra_batch_keys#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm._remove_extra_batch_keys(batch, reserved_keys=['labels', 'loss_mask'])#
Removes keys from the batch that are not required by the model’s forward method, except for reserved keys.
Parameters#
- batch (dict)
  Dictionary of tensors representing a batch.
- reserved_keys (list of str, optional)
  Keys to retain in the batch regardless of the model's forward method requirements (default is ['labels', 'loss_mask']).
Returns#
- dict
Filtered batch containing only the necessary keys.
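A minimal sketch of this filtering logic, assuming the model's forward signature can be inspected (illustrative; `filter_batch_keys` is a hypothetical helper, not the NeMo implementation):

```python
import inspect

def filter_batch_keys(model, batch, reserved_keys=('labels', 'loss_mask')):
    # Keep only the keys accepted by model.forward, plus the reserved keys.
    accepted = inspect.signature(model.forward).parameters
    return {k: v for k, v in batch.items() if k in accepted or k in reserved_keys}
```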
Utility Functions#
- nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm.masked_cross_entropy(logits, targets, mask=None)#
Computes the cross-entropy loss, optionally applying a mask to the loss values.
Parameters#
- logits (torch.Tensor)
  Logits output from the model.
- targets (torch.Tensor)
  Ground-truth target indices.
- mask (torch.Tensor or None, optional)
  Mask to apply to the loss values (default is None).
Returns#
- torch.Tensor
Computed loss.
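A minimal sketch of such a masked loss, assuming logits flattened to `[tokens, vocab]` and integer targets of shape `[tokens]` (illustrative, not the NeMo source):

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy_sketch(logits, targets, mask=None):
    # Per-token cross-entropy; average only over positions where mask == 1.
    if mask is not None:
        loss = F.cross_entropy(logits, targets, reduction='none')
        return torch.mean(loss[mask == 1])
    return F.cross_entropy(logits, targets)
```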
Usage Example#
```python
import torch

from nemo.collections.llm.gpt.model.hf_auto_model_for_causal_lm import HFAutoModelForCausalLM

# Initialize the model
model = HFAutoModelForCausalLM(
    model_name='gpt2',
    load_pretrained_weights=True,
    trust_remote_code=False,
)

# Configure the model (typically called internally)
model.configure_model()

# Example training step (replace the ... placeholders with real token IDs)
batch = {
    'input_ids': torch.tensor([[...]]),
    'labels': torch.tensor([[...]]),
    'loss_mask': torch.tensor([[...]]),
}
loss = model.training_step(batch)
```