HFAutoModelForImageTextToText#
A PyTorch Lightning module that wraps Hugging Face’s AutoModelForImageTextToText for seamless integration with NeMo Framework. This class supports training and evaluation of image-text-to-text models, providing functionality such as loading pretrained weights, customizing the loss function, and handling processors.
Inheritance#
HFAutoModelForImageTextToText inherits from:
- pytorch_lightning.LightningModule
- nemo.lightning.io.IOMixin
- nemo.collections.llm.fn.FNMixin
Initialization#
Constructor Parameters#
- model_name : str, optional
Name or path of the Hugging Face model to load. Default is ‘gpt2’.
- load_pretrained_weights : bool, optional
Whether to load pretrained weights from the specified model. Default is True.
- processor : transformers.PreTrainedProcessor, optional
A Hugging Face processor instance. If not provided, it will be configured based on model_name.
- loss_fn : callable, optional
Loss function to use during training. Defaults to masked_cross_entropy.
- model_transform : callable, optional
A function to apply transformations to the model after loading.
- trust_remote_code : bool, optional
Whether to trust remote code when loading the model. Default is False.
- default_dtype : torch.dtype, optional
Default data type for the model. Default is torch.bfloat16.
- load_in_4bit : bool, optional
Whether to load the model in 4-bit precision. Default is False.
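A construction sketch combining these parameters is shown below. The nemo.collections.vlm import path and the llava-hf/llava-1.5-7b-hf checkpoint are assumptions, not confirmed by this page, and my_loss is a hypothetical custom loss following the same (logits, targets, mask) signature as masked_cross_entropy.

import torch
import torch.nn.functional as F
from nemo.collections.vlm import HFAutoModelForImageTextToText  # assumed import path

# Hypothetical custom loss; positions labeled -100 are ignored.
def my_loss(logits, targets, mask=None):
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-100
    )

model = HFAutoModelForImageTextToText(
    model_name='llava-hf/llava-1.5-7b-hf',  # assumed checkpoint
    load_pretrained_weights=True,
    loss_fn=my_loss,
    default_dtype=torch.bfloat16,
)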
Attributes#
- model_name : str
Name or path of the Hugging Face model.
- _processor : transformers.PreTrainedProcessor or None
Processor instance for preprocessing inputs.
- tokenizer : transformers.PreTrainedTokenizer or None
Tokenizer associated with the model.
- model : transformers.PreTrainedModel or None
The underlying Hugging Face model.
- loss_fn : callable
The loss function used for training.
- load_pretrained_weights : bool
Flag indicating whether pretrained weights are loaded.
- is_hf_model : bool
Indicates whether the model is a Hugging Face model.
- model_transform : callable or None
Transformation function applied to the model.
- trust_remote_code : bool
Flag indicating whether to trust remote code.
- load_in_4bit : bool
Flag indicating whether to load the model in 4-bit precision.
Methods#
processor#
Property
Returns the processor associated with the model. If not already set, it initializes the processor using the model_name.
Getter - Returns: transformers.PreTrainedProcessor
Setter - Parameters:
- value : transformers.PreTrainedProcessor
The processor to set.
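A short sketch of both accessors (the import path and checkpoint name are illustrative assumptions):

from transformers import AutoProcessor
from nemo.collections.vlm import HFAutoModelForImageTextToText  # assumed import path

model = HFAutoModelForImageTextToText(model_name='llava-hf/llava-1.5-7b-hf')

# First access lazily builds a processor from model_name.
processor = model.processor

# The setter injects a preconfigured processor instead.
model.processor = AutoProcessor.from_pretrained('llava-hf/llava-1.5-7b-hf')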
forward#
Signature: forward(batch)
Runs a forward pass through the model with the provided batch.
Parameters: - batch : dict
A batch of input data.
Returns: - outputs : transformers.modeling_outputs.ModelOutput
The model’s output.
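A sketch of a forward pass, assuming the model instance from the previous sketch and that the processor's output keys match what the wrapped model's forward expects:

from PIL import Image

image = Image.open('example.jpg')  # hypothetical local image
batch = dict(model.processor(images=image, text='Describe this image.', return_tensors='pt'))
outputs = model.forward(batch)
print(outputs.logits.shape)  # e.g. (batch, sequence, vocab) for generative models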
training_step#
Signature: training_step(batch)
Executes a single training step.
Parameters: - batch : dict
A batch of input data.
Returns: - loss : torch.Tensor
The computed loss for the batch.
validation_step#
Signature: validation_step(batch, batch_idx)
Executes a single validation step.
Parameters: - batch : dict
A batch of input data.
- batch_idx : int
Index of the batch.
Returns: - None
save_pretrained#
Signature: save_pretrained(path)
Saves the model and processor to the specified path using Hugging Face’s save_pretrained method.
Parameters: - path : str
Directory path where the model and processor will be saved.
Returns: - None
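Since both the model and the processor are written to the path, a saved checkpoint should be reloadable with stock Hugging Face APIs; a round-trip sketch, assuming the model instance from the earlier sketches:

model.save_pretrained('/path/to/save')

from transformers import AutoModelForImageTextToText, AutoProcessor

reloaded = AutoModelForImageTextToText.from_pretrained('/path/to/save')
processor = AutoProcessor.from_pretrained('/path/to/save')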
extract_skipped_token_ids#
Signature: extract_skipped_token_ids(tokenizer)
Identifies and returns token IDs that should be masked in the labels based on predefined special tokens.
Parameters: - tokenizer : transformers.PreTrainedTokenizer
The tokenizer to inspect for special tokens.
Returns: - skipped_token_ids : torch.IntTensor
Tensor containing the IDs of tokens to skip.
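A usage sketch (the checkpoint is illustrative, and the call is shown class-level; depending on the NeMo version this may instead be an instance method or standalone helper):

from transformers import AutoTokenizer
from nemo.collections.vlm import HFAutoModelForImageTextToText  # assumed import path

tokenizer = AutoTokenizer.from_pretrained('llava-hf/llava-1.5-7b-hf')
skipped = HFAutoModelForImageTextToText.extract_skipped_token_ids(tokenizer)
# `skipped` holds token IDs whose label positions should be excluded from the loss.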
configure_model#
Signature: configure_model()
Initializes the Hugging Face model based on the provided configuration and parameters. Loads pretrained weights if specified.
Parameters: - None
Returns: - None
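This enables deferred initialization: construct the wrapper cheaply, then build the network later. A sketch (the import path and checkpoint name are assumptions):

from nemo.collections.vlm import HFAutoModelForImageTextToText  # assumed import path

# With load_pretrained_weights=False, the architecture is built from the
# checkpoint's config and the weights start freshly initialized (assumed behavior).
model = HFAutoModelForImageTextToText('llava-hf/llava-1.5-7b-hf', load_pretrained_weights=False)
model.configure_model()  # typically invoked by the Lightning Trainer during setup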
configure_processor#
Signature: configure_processor(model_name, trust_remote_code=False)
Initializes and returns a Hugging Face AutoProcessor based on the model name.
Parameters: - model_name : str
Name or path of the Hugging Face model.
- trust_remote_code : bool, optional
Whether to trust remote code. Default is False.
Returns: - processor : transformers.PreTrainedProcessor
The initialized processor.
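A sketch of building a processor without instantiating the full module (shown as a class-level call; adjust if your NeMo version exposes it differently):

from nemo.collections.vlm import HFAutoModelForImageTextToText  # assumed import path

processor = HFAutoModelForImageTextToText.configure_processor(
    'llava-hf/llava-1.5-7b-hf', trust_remote_code=False  # illustrative checkpoint
)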
Utility Functions#
masked_cross_entropy#
Signature: masked_cross_entropy(logits, targets, mask=None)
Computes the cross-entropy loss with an optional mask to ignore certain tokens.
Parameters: - logits : torch.Tensor
Logits output from the model.
- targets : torch.Tensor
Ground truth target tokens.
- mask : torch.Tensor or None, optional
Mask to apply to the loss. Tokens with mask=0 will be ignored.
Returns: - loss : torch.Tensor
The computed loss.
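A minimal sketch of the described semantics, not necessarily the exact NeMo implementation: per-token cross-entropy in which tokens with mask == 0 are excluded from the average.

import torch.nn.functional as F

def masked_cross_entropy_sketch(logits, targets, mask=None):
    # Per-token loss, flattened over batch and sequence dimensions.
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), reduction='none'
    )
    if mask is not None:
        mask = mask.view(-1).to(loss.dtype)
        # Average only over unmasked tokens; clamp guards against an all-zero mask.
        return (loss * mask).sum() / mask.sum().clamp(min=1)
    return loss.mean()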
Example Usage#
from nemo.collections.vlm import HFAutoModelForImageTextToText  # import path may vary by NeMo version

# Initialize the model. 'gpt2' is the documented default, but it is not an
# image-text-to-text model; an actual multimodal checkpoint is used here.
model = HFAutoModelForImageTextToText(
    model_name='llava-hf/llava-1.5-7b-hf',
    load_pretrained_weights=True,
    trust_remote_code=False,  # set True only for checkpoints that ship custom code
    load_in_4bit=True,        # requires the bitsandbytes package
)

# Configure the model (normally invoked by the Trainer during setup)
model.configure_model()

# Example training loop using the PyTorch Lightning Trainer.
# train_dataloader and val_dataloader are assumed to be DataLoaders that
# yield processor-formatted batches.
from pytorch_lightning import Trainer

trainer = Trainer(max_epochs=3)
trainer.fit(model, train_dataloader, val_dataloader)

# Save the trained model and processor
model.save_pretrained('/path/to/save')