HFAutoModelForImageTextToText#
A PyTorch Lightning module that wraps Hugging Face’s AutoModelForImageTextToText for seamless integration with NeMo Framework. This class facilitates training and evaluation of image-text-to-text models, providing functionalities such as loading pretrained weights, customizing loss functions, and handling processors.
Inheritance#
HFAutoModelForImageTextToText inherits from:
- pytorch_lightning.LightningModule
- nemo.lightning.io.IOMixin
- nemo.collections.llm.fn.FNMixin
Initialization#
Constructor Parameters#
- model_name : str, optional
Name or path of the Hugging Face model to load. Default is 'gpt2'.
- load_pretrained_weights : bool, optional
Whether to load pretrained weights from the specified model. Default is True.
- processor : transformers.ProcessorMixin, optional
A Hugging Face processor instance. If not provided, one is configured based on model_name.
- loss_fn : callable, optional
Loss function to use during training. Defaults to masked_cross_entropy.
- model_transform : callable, optional
A function that applies transformations to the model after loading.
- trust_remote_code : bool, optional
Whether to trust remote code when loading the model. Default is False.
- default_dtype : torch.dtype, optional
Default data type for the model. Default is torch.bfloat16.
- load_in_4bit : bool, optional
Whether to load the model in 4-bit precision. Default is False.
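A construction sketch follows. The import path, checkpoint name, and custom loss below are assumptions for illustration, not part of this reference; any callable matching the loss_fn signature may be passed.
import torch
import torch.nn.functional as F
from nemo.collections.vlm import HFAutoModelForImageTextToText  # assumed import path

def my_loss(logits, targets, mask=None):
    # Hypothetical custom loss with the same signature as masked_cross_entropy.
    return F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))

model = HFAutoModelForImageTextToText(
    model_name='Qwen/Qwen2-VL-2B-Instruct',  # illustrative checkpoint
    loss_fn=my_loss,
    default_dtype=torch.bfloat16,
)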
Attributes#
- model_name : str
Name or path of the Hugging Face model.
- _processor : transformers.ProcessorMixin or None
Processor instance for preprocessing inputs.
- tokenizer : transformers.PreTrainedTokenizer or None
Tokenizer associated with the model.
- model : transformers.PreTrainedModel or None
The underlying Hugging Face model.
- loss_fn : callable
The loss function used for training.
- load_pretrained_weights : bool
Flag indicating whether pretrained weights are loaded.
- is_hf_model : bool
Indicates that the wrapped model is a Hugging Face model.
- model_transform : callable or None
Transformation function applied to the model.
- trust_remote_code : bool
Flag indicating whether to trust remote code.
- load_in_4bit : bool
Flag indicating whether to load the model in 4-bit precision.
Methods#
processor#
Property
Returns the processor associated with the model. If not already set, it is initialized from model_name on first access.
Getter - Returns: transformers.ProcessorMixin
Setter - Parameters:
- value : transformers.ProcessorMixin
The processor to set.
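Because the getter initializes lazily, the processor can be read directly from a constructed model, or replaced with a preconfigured instance via the setter. A sketch, with an illustrative checkpoint name:
from transformers import AutoProcessor

proc = model.processor  # lazily initialized from model_name on first access
model.processor = AutoProcessor.from_pretrained('Qwen/Qwen2-VL-2B-Instruct')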
forward#
Signature: forward(batch)
Runs a forward pass through the model with the provided batch.
Parameters: - batch : dict
A batch of input data.
Returns: - outputs : transformers.modeling_outputs.ModelOutput
The model’s output.
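A minimal sketch of a forward pass on a constructed model, assuming the batch is a dict of tensors such as the processor produces; the image path is illustrative:
from PIL import Image

image = Image.open('example.jpg')  # illustrative input image
inputs = model.processor(
    text=['Describe the image.'],
    images=[image],
    return_tensors='pt',
)
outputs = model(dict(inputs))  # forward(batch) with a dict batch
print(outputs.logits.shape)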
training_step#
Signature: training_step(batch)
Executes a single training step.
Parameters: - batch : dict
A batch of input data.
Returns: - loss : torch.Tensor
The computed loss for the batch.
validation_step#
Signature: validation_step(batch, batch_idx)
Executes a single validation step.
Parameters: - batch : dict
A batch of input data.
- batch_idx : int
Index of the batch.
Returns: - None
save_pretrained#
Signature: save_pretrained(path)
Saves the model and processor to the specified path using Hugging Face’s save_pretrained method.
Parameters: - path : str
Directory path where the model and processor will be saved.
Returns: - None
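For instance, a finished run can be exported and, since the files are written with Hugging Face's save_pretrained, should be reloadable with plain transformers. A sketch; the paths are illustrative and the exact on-disk layout depends on NeMo:
model.save_pretrained('./checkpoints/my_run')  # writes model and processor

from transformers import AutoModelForImageTextToText, AutoProcessor
reloaded = AutoModelForImageTextToText.from_pretrained('./checkpoints/my_run')
processor = AutoProcessor.from_pretrained('./checkpoints/my_run')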
extract_skipped_token_ids#
Signature: extract_skipped_token_ids(tokenizer)
Identifies and returns token IDs that should be masked in the labels based on predefined special tokens.
Parameters: - tokenizer : transformers.PreTrainedTokenizer
The tokenizer to inspect for special tokens.
Returns: - skipped_token_ids : torch.IntTensor
Tensor containing the IDs of tokens to skip.
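A usage sketch: the returned IDs can be used to blank out special-token positions in the labels, assuming the common ignore_index convention of -100:
import torch

skipped = model.extract_skipped_token_ids(model.tokenizer)
labels = torch.tensor([[1, 2, 32000, 4]])  # toy labels; 32000 stands in for a special token
labels[torch.isin(labels, skipped)] = -100  # mask special tokens out of the loss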
configure_model#
Signature: configure_model()
Initializes the Hugging Face model based on the provided configuration and parameters. Loads pretrained weights if specified.
Parameters: - None
Returns: - None
configure_processor#
Signature: configure_processor(model_name, trust_remote_code=False)
Initializes and returns a Hugging Face AutoProcessor based on the model name.
Parameters: - model_name : str
Name or path of the Hugging Face model.
- trust_remote_code : bool, optional
Whether to trust remote code. Default is False.
Returns: - processor : transformers.ProcessorMixin
The initialized processor.
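The signature takes no self, which suggests it can be invoked as a standalone helper; a sketch under that assumption, with an illustrative checkpoint name:
processor = HFAutoModelForImageTextToText.configure_processor(
    'Qwen/Qwen2-VL-2B-Instruct',
    trust_remote_code=False,
)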
Utility Functions#
masked_cross_entropy#
Signature: masked_cross_entropy(logits, targets, mask=None)
Computes the cross-entropy loss with an optional mask to ignore certain tokens.
Parameters: - logits : torch.Tensor
Logits output from the model.
- targets : torch.Tensor
Ground truth target tokens.
- mask : torch.Tensor or None, optional
Mask to apply to the loss. Tokens with mask=0 will be ignored.
Returns: - loss : torch.Tensor
The computed loss.
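The masking semantics can be made concrete with a minimal reference sketch (an illustration of the behavior described above, not necessarily the library's exact implementation):
import torch
import torch.nn.functional as F

def masked_cross_entropy_sketch(logits, targets, mask=None):
    # Per-token loss over flattened (N, vocab) logits and (N,) targets.
    losses = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        targets.view(-1),
        reduction='none',
    )
    if mask is not None:
        mask = mask.view(-1).to(losses.dtype)
        # Tokens with mask=0 drop out of the average.
        return (losses * mask).sum() / mask.sum().clamp(min=1)
    return losses.mean()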
Example Usage#
import torch
from pytorch_lightning import Trainer
from nemo.collections.vlm import HFAutoModelForImageTextToText  # assumed import path

# Initialize the model with an image-text-to-text checkpoint
model = HFAutoModelForImageTextToText(
    model_name='Qwen/Qwen2-VL-2B-Instruct',  # illustrative checkpoint
    load_pretrained_weights=True,
    trust_remote_code=True,
    load_in_4bit=True,  # requires bitsandbytes
)

# Configure the model (instantiates the underlying Hugging Face model)
model.configure_model()

# Example training loop using the PyTorch Lightning Trainer;
# train_dataloader and val_dataloader are assumed to be defined elsewhere.
trainer = Trainer(max_epochs=3)
trainer.fit(model, train_dataloader, val_dataloader)

# Save the trained model and processor
model.save_pretrained('/path/to/save')