Changing Embedding Type

This section shows an example of changing the embedding type for T5 models.

Assume you want to train a 220M-parameter T5 model. Instead of using the default learned absolute position embeddings, you want to use relative position embeddings.

First, check the training configuration file at conf/training/<model_type>/<model_size>.yaml. In this case it is conf/training/t5/220m.yaml. This file contains all supported configurations; the ones of interest here are:


position_embedding_type: 'learned_absolute' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Number of buckets for computing the relative position bias
relative_attention_max_distance: 128 # Maximum distance kept as a distinct relative position in the buckets
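To give a sense of what relative_attention_num_buckets and relative_attention_max_distance control, here is a simplified sketch of T5-style relative position bucketing; the actual NeMo implementation may differ in detail. Relative offsets between query and key positions are mapped into a fixed number of learned bias buckets: small offsets each get their own bucket, while larger distances share logarithmically sized buckets out to max_distance.

```python
import math

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    # Simplified, bidirectional variant of T5-style bucketing.
    # Half the buckets serve each direction (key before vs. after query).
    num_buckets //= 2
    bucket = num_buckets if relative_position > 0 else 0
    n = abs(relative_position)

    # Small offsets get one bucket each.
    max_exact = num_buckets // 2
    if n < max_exact:
        return bucket + n

    # Larger offsets share logarithmically spaced buckets up to max_distance;
    # anything beyond max_distance falls into the outermost bucket.
    val = max_exact + int(
        math.log(n / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(val, num_buckets - 1)
```

With the defaults above (32 buckets, max distance 128), all offsets beyond 128 positions collapse into the same outermost bucket, which is what lets relative embeddings generalize to sequence lengths unseen during training.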

For Slurm-based systems, you can modify the configuration file directly. In this case, change the three lines above to:


position_embedding_type: 'relative' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Number of buckets for computing the relative position bias
relative_attention_max_distance: 128 # Maximum distance kept as a distinct relative position in the buckets
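If you prefer to script this edit rather than changing the file by hand, a minimal sketch is shown below. It operates on a temporary stand-in for the real config file, since the path here is illustrative; the key names match the YAML above.

```python
from pathlib import Path
import tempfile

# Stand-in for conf/training/t5/220m.yaml (a temporary copy for illustration;
# point this at your actual launcher checkout in practice).
cfg = Path(tempfile.mkdtemp()) / "220m.yaml"
cfg.write_text(
    "position_embedding_type: 'learned_absolute'\n"
    "relative_attention_num_buckets: 32\n"
    "relative_attention_max_distance: 128\n"
)

# Swap the embedding type in place; the other two keys keep their defaults.
cfg.write_text(
    cfg.read_text().replace(
        "position_embedding_type: 'learned_absolute'",
        "position_embedding_type: 'relative'",
    )
)

print(cfg.read_text().splitlines()[0])  # position_embedding_type: 'relative'
```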

Submit the training job with the modified configuration file.

For BCP, you can override the default configurations by adding the argument training.model.position_embedding_type='relative' when you submit the training job.

For more details on submitting training jobs on Slurm and BCP systems, see Model Training.

© Copyright 2023-2024, NVIDIA. Last updated on Apr 25, 2024.