
HFAutoModelForSpeechSeq2Seq#

A PyTorch Lightning module for speech sequence-to-sequence tasks built on Hugging Face Transformers and the NeMo Framework.

Overview#

HFAutoModelForSpeechSeq2Seq is a versatile PyTorch Lightning module designed for speech-to-text and other sequence-to-sequence tasks. It leverages Hugging Face’s AutoModelForSpeechSeq2Seq and integrates seamlessly with NVIDIA NeMo’s utilities for enhanced functionality. The class supports loading pretrained models, custom tokenizers, and processors, and provides flexible configuration options for training and inference.

Initialization#

Attributes#

model_name : str

The name or path of the pretrained model.

_tokenizer : Optional[AutoTokenizer]

The tokenizer instance. Initialized lazily.

_processor : Optional[AutoProcessor]

The processor instance for handling input features. Initialized lazily.

model : Optional[AutoModelForSpeechSeq2Seq]

The underlying Hugging Face model.

loss_fn : Callable

The loss function used for training.

load_pretrained_weights : bool

Flag indicating whether to load pretrained weights.

is_hf_model : bool

Flag indicating if the model is a Hugging Face model.

model_transform : Optional[Any]

Transformation applied to the model.

model_accelerator : Optional[Any]

Accelerator configuration for the model.

trust_remote_code : bool

Flag indicating whether to trust remote code for model loading.
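
The following is a minimal sketch, assuming only the attributes documented above, of how a constructor might wire them up. Argument names, defaults, and any extra parameters in the actual NeMo class may differ.

import lightning.pytorch as pl


class HFAutoModelForSpeechSeq2SeqSketch(pl.LightningModule):
    """Illustrative constructor only; not the actual NeMo implementation."""

    def __init__(
        self,
        model_name,                    # name or path of the pretrained model
        load_pretrained_weights=True,  # load checkpoint weights vs. config-only init
        loss_fn=None,                  # presumably defaults to masked_cross_entropy (see Methods)
        model_transform=None,          # optional transformation applied to the model
        model_accelerator=None,        # optional accelerator configuration
        trust_remote_code=False,       # whether to trust remote code from the Hub
    ):
        super().__init__()
        self.model_name = model_name
        self._tokenizer = None         # created lazily on first access (see Properties)
        self._processor = None         # created lazily on first access (see Properties)
        self.model = None              # the underlying Hugging Face model, built later
        self.loss_fn = loss_fn
        self.load_pretrained_weights = load_pretrained_weights
        self.is_hf_model = True
        self.model_transform = model_transform
        self.model_accelerator = model_accelerator
        self.trust_remote_code = trust_remote_code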

Properties#

nemo.collections.speechlm.models.hf_auto_model_for_speech_seq2seq.tokenizer#

The tokenizer used for encoding and decoding. Initialized on first access.

nemo.collections.speechlm.models.hf_auto_model_for_speech_seq2seq.processor#

The processor for handling input features. Initialized on first access.
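
As an illustration, lazy initialization of this kind typically follows the pattern sketched below, using Hugging Face's AutoTokenizer and AutoProcessor. The exact property bodies in NeMo may differ (for example, in how padding tokens are configured).

from transformers import AutoProcessor, AutoTokenizer


class _LazyAccessorsSketch:
    """Illustrative only: create the tokenizer/processor on first access."""

    def __init__(self, model_name, trust_remote_code=False):
        self.model_name = model_name
        self.trust_remote_code = trust_remote_code
        self._tokenizer = None
        self._processor = None

    @property
    def tokenizer(self):
        # Build the tokenizer only when it is first requested, then cache it.
        if self._tokenizer is None:
            self._tokenizer = AutoTokenizer.from_pretrained(
                self.model_name, trust_remote_code=self.trust_remote_code
            )
        return self._tokenizer

    @property
    def processor(self):
        # Build the processor (feature extractor + tokenizer) on first access.
        if self._processor is None:
            self._processor = AutoProcessor.from_pretrained(
                self.model_name, trust_remote_code=self.trust_remote_code
            )
        return self._processor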

Methods#

Function: masked_cross_entropy#

nemo.collections.speechlm.models.hf_auto_model_for_speech_seq2seq.masked_cross_entropy()#

Computes the masked cross-entropy loss.

Parameters#

logits : torch.Tensor

The predicted logits from the model.

targets : torch.Tensor

The target labels.

mask : Optional[torch.Tensor], optional

A mask to apply to the loss computation (default is None).

Returns#

torch.Tensor

The computed loss.
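
A minimal sketch of a masked cross-entropy with this signature is shown below; the actual NeMo implementation may differ in how it flattens the inputs and reduces the loss.

import torch
import torch.nn.functional as F


def masked_cross_entropy(logits, targets, mask=None):
    """Cross-entropy over flattened logits/targets, optionally weighted by a mask.

    Illustrative sketch only; NeMo's reduction details may differ.
    """
    # Flatten to (batch * time, vocab) and (batch * time,) for F.cross_entropy.
    logits = logits.view(-1, logits.size(-1))
    targets = targets.view(-1)
    if mask is None:
        return F.cross_entropy(logits, targets)
    # Compute per-token losses, zero out masked positions, and average
    # over the number of unmasked tokens.
    mask = mask.view(-1).to(logits.dtype)
    per_token = F.cross_entropy(logits, targets, reduction="none")
    return (per_token * mask).sum() / mask.sum().clamp(min=1)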

Usage Example#

import lightning.pytorch as pl

from nemo.collections.speechlm.models.hf_auto_model_for_speech_seq2seq import HFAutoModelForSpeechSeq2Seq

# Initialize the model with a Hugging Face speech seq2seq checkpoint
# (e.g. a Whisper model; CTC-only checkpoints such as wav2vec2-base-960h
# are not supported by AutoModelForSpeechSeq2Seq).
model = HFAutoModelForSpeechSeq2Seq(
    model_name='openai/whisper-small',
    load_pretrained_weights=True,
    trust_remote_code=True,
)

# Set up the trainer (Lightning 2.x uses `devices`/`accelerator` instead of `gpus`)
trainer = pl.Trainer(max_epochs=10, devices=1, accelerator='gpu')

# Train the model (train_dataloader and val_dataloader are user-provided)
trainer.fit(model, train_dataloader, val_dataloader)

# Save the trained model
model.save_pretrained('path/to/save')
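
The train_dataloader and val_dataloader above are assumed to already exist and must yield batches in whatever format the model's training step expects. Purely as an illustration (field names such as "audio", "text", input_features, and labels are assumptions, not NeMo's API), a collate function built on the lazily created processor might look like this:

def make_collate_fn(processor):
    def collate_fn(batch):
        # `batch` is assumed to be a list of dicts with raw waveforms and transcripts.
        audio = [example["audio"]["array"] for example in batch]
        texts = [example["text"] for example in batch]

        # The processor converts raw audio into model input features...
        inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
        # ...and its tokenizer converts transcripts into target label ids.
        labels = processor.tokenizer(texts, padding=True, return_tensors="pt").input_ids

        return {"input_features": inputs.input_features, "labels": labels}

    return collate_fn


# Hypothetical usage with a dataset of {"audio": ..., "text": ...} examples:
# train_dataloader = DataLoader(train_dataset, batch_size=8,
#                               collate_fn=make_collate_fn(model.processor))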