HFAutoModelForImageTextToText#

A PyTorch Lightning module that wraps Hugging Face’s AutoModelForImageTextToText for seamless integration with the NeMo Framework. This class supports training and evaluation of image-text-to-text models, including loading pretrained weights, customizing the loss function, and handling processors.

Inheritance#

HFAutoModelForImageTextToText inherits from:

  • pytorch_lightning.LightningModule
  • nemo.lightning.io.IOMixin
  • nemo.collections.llm.fn.FNMixin

Initialization#

Constructor Parameters#

model_name : str, optional

Name or path of the Hugging Face model to load. Default is ‘gpt2’.

load_pretrained_weights : bool, optional

Whether to load pretrained weights from the specified model. Default is True.

processor : transformers.ProcessorMixin, optional

A Hugging Face processor instance. If not provided, it will be configured based on model_name.

loss_fn : callable, optional

Loss function to use during training. Defaults to masked_cross_entropy.

model_transform : callable, optional

A function to apply transformations to the model after loading.

trust_remote_code : bool, optional

Whether to trust remote code when loading the model. Default is False.

default_dtype : torch.dtype, optional

Default data type for the model. Default is torch.bfloat16.

load_in_4bit : bool, optional

Whether to load the model in 4-bit precision. Default is False.
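The following minimal sketch shows how these constructor parameters fit together. The checkpoint name is illustrative (not the default), and the import path assumes the class is exposed through NeMo’s vlm collection.

import torch
from transformers import AutoProcessor

from nemo.collections.vlm import HFAutoModelForImageTextToText

# Illustrative checkpoint; substitute any image-text-to-text model.
checkpoint = 'llava-hf/llava-1.5-7b-hf'

# A processor may be supplied explicitly instead of being derived
# from model_name.
processor = AutoProcessor.from_pretrained(checkpoint)

model = HFAutoModelForImageTextToText(
    model_name=checkpoint,
    load_pretrained_weights=True,
    processor=processor,
    default_dtype=torch.bfloat16,
    trust_remote_code=False,
    load_in_4bit=False,
)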

Attributes#

model_name : str

Name or path of the Hugging Face model.

_processor : transformers.ProcessorMixin or None

Processor instance for preprocessing inputs.

tokenizer : transformers.PreTrainedTokenizer or None

Tokenizer associated with the model.

model : transformers.PreTrainedModel or None

The underlying Hugging Face model.

loss_fn : callable

The loss function used for training.

load_pretrained_weights : bool

Flag indicating whether pretrained weights are loaded.

is_hf_model : bool

Indicates if the model is a Hugging Face model.

model_transform : callable or None

Transformation function applied to the model.

trust_remote_code : bool

Flag indicating whether to trust remote code.

load_in_4bit : bool

Flag indicating whether to load the model in 4-bit precision.

Methods#

processor#

Property

Returns the processor associated with the model. If not already set, it initializes the processor using the model_name.

Getter - Returns: transformers.ProcessorMixin

Setter - Parameters:

  • value : transformers.ProcessorMixin

    The processor to set.
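For example, the property can be read to trigger lazy initialization, or overridden with a custom instance (a sketch; AutoProcessor is the standard Hugging Face entry point):

from transformers import AutoProcessor

# Reading the property initializes the processor from model_name if needed.
proc = model.processor

# Overriding with a custom processor instance.
model.processor = AutoProcessor.from_pretrained(model.model_name)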

forward#

Signature: forward(batch)

Runs a forward pass through the model with the provided batch.

Parameters:

  • batch : dict

    A batch of input data.

Returns:

  • outputs : transformers.modeling_outputs.ModelOutput

    The model’s output.
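A hedged sketch of a forward pass follows. The exact batch keys depend on the processor; image-text-to-text processors typically emit input_ids, attention_mask, and pixel_values.

from PIL import Image

image = Image.open('example.jpg')
inputs = model.processor(
    text='Describe this image.', images=image, return_tensors='pt'
)

# inputs behaves like a dict with keys such as input_ids,
# attention_mask, and pixel_values (processor-dependent).
outputs = model(dict(inputs))
print(outputs.logits.shape)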

training_step#

Signature: training_step(batch)

Executes a single training step.

Parameters:

  • batch : dict

    A batch of input data.

Returns:

  • loss : torch.Tensor

    The computed loss for the batch.

validation_step#

Signature: validation_step(batch, batch_idx)

Executes a single validation step.

Parameters:

  • batch : dict

    A batch of input data.

  • batch_idx : int

    Index of the batch.

Returns: None

save_pretrained#

Signature: save_pretrained(path)

Saves the model and processor to the specified path using Hugging Face’s save_pretrained method.

Parameters:

  • path : str

    Directory path where the model and processor will be saved.

Returns: None
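Because saving goes through Hugging Face’s save_pretrained, the output directory is a regular Hugging Face checkpoint and can be reloaded with the standard Auto classes:

from transformers import AutoModelForImageTextToText, AutoProcessor

model.save_pretrained('/path/to/save')

reloaded_model = AutoModelForImageTextToText.from_pretrained('/path/to/save')
reloaded_processor = AutoProcessor.from_pretrained('/path/to/save')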

extract_skipped_token_ids#

Signature: extract_skipped_token_ids(tokenizer)

Identifies and returns token IDs that should be masked in the labels based on predefined special tokens.

Parameters:

  • tokenizer : transformers.PreTrainedTokenizer

    The tokenizer to inspect for special tokens.

Returns:

  • skipped_token_ids : torch.IntTensor

    Tensor containing the IDs of tokens to skip.
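For example, the returned IDs can be used to hide special tokens from the loss. The sketch below assumes the method is callable as a static helper, as its signature suggests, and uses the common -100 ignore-index convention for illustration.

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
skipped = HFAutoModelForImageTextToText.extract_skipped_token_ids(tokenizer)

# Mark skipped positions with -100 so cross-entropy ignores them
# (illustrative convention, not necessarily NeMo's).
labels = torch.tensor([[50256, 11, 22]])
labels[torch.isin(labels, skipped)] = -100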

configure_model#

Signature: configure_model()

Initializes the Hugging Face model based on the provided configuration and parameters. Loads pretrained weights if specified.

Parameters: None

Returns: None

configure_processor#

Signature: configure_processor(model_name, trust_remote_code=False)

Initializes and returns a Hugging Face AutoProcessor based on the model name.

Parameters:

  • model_name : str

    Name or path of the Hugging Face model.

  • trust_remote_code : bool, optional

    Whether to trust remote code. Default is False.

Returns:

  • processor : transformers.ProcessorMixin

    The initialized processor.
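Since the signature takes the model name directly, a processor can be built without a fully constructed module (a sketch, assuming the method is usable as a static helper; the checkpoint name is illustrative):

processor = HFAutoModelForImageTextToText.configure_processor(
    'llava-hf/llava-1.5-7b-hf', trust_remote_code=False
)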

Utility Functions#

masked_cross_entropy#

Signature: masked_cross_entropy(logits, targets, mask=None)

Computes the cross-entropy loss with an optional mask to ignore certain tokens.

Parameters:

  • logits : torch.Tensor

    Logits output from the model.

  • targets : torch.Tensor

    Ground truth target tokens.

  • mask : torch.Tensor or None, optional

    Mask to apply to the loss. Tokens with mask=0 will be ignored.

Returns:

  • loss : torch.Tensor

    The computed loss.
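Conceptually, this is token-level cross-entropy with masked positions excluded. A minimal reference sketch follows (not the exact NeMo implementation; the -100 ignore-index convention is an assumption):

import torch
import torch.nn.functional as F

def masked_cross_entropy_sketch(logits, targets, mask=None):
    # Flatten to (tokens, vocab) and (tokens,) as expected by cross_entropy.
    logits = logits.view(-1, logits.size(-1)).float()
    targets = targets.view(-1).clone()
    if mask is not None:
        # Positions with mask == 0 contribute nothing to the loss.
        targets[mask.view(-1) == 0] = -100
    return F.cross_entropy(logits, targets, ignore_index=-100)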

Example Usage#

from pytorch_lightning import Trainer

from nemo.collections.vlm import HFAutoModelForImageTextToText

# Initialize the model. 'gpt2' is the documented default; in practice,
# pass an image-text-to-text checkpoint (e.g., a LLaVA-style model).
model = HFAutoModelForImageTextToText(
    model_name='gpt2',
    load_pretrained_weights=True,
    trust_remote_code=True,
    load_in_4bit=True,
)

# Instantiate the underlying Hugging Face model.
model.configure_model()

# Example training run using the PyTorch Lightning Trainer.
# train_dataloader and val_dataloader are user-provided DataLoaders
# yielding batches in the format expected by forward().
trainer = Trainer(max_epochs=3)
trainer.fit(model, train_dataloader, val_dataloader)

# Save the trained model and processor.
model.save_pretrained('/path/to/save')