Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Changing Embedding Type

This section shows an example of changing the embedding type for T5 models.

Suppose you want to train a 220M T5 model using relative position embeddings instead of the default learned absolute position embeddings.

First, check the training configuration file at conf/training/<model_type>/<model_size>.yaml; in this case it is conf/training/t5/220m.yaml. This file contains all supported configurations, and the ones of interest here are:

position_embedding_type: 'learned_absolute' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Number of buckets used to compute the relative position bias
relative_attention_max_distance: 128 # Maximum relative distance; larger offsets all map into the last bucket
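
The two relative-attention parameters control how signed token offsets are mapped into a fixed number of learned bias buckets: small offsets get their own buckets, while larger offsets share logarithmically spaced buckets up to relative_attention_max_distance. The following is a simplified, scalar sketch of the T5-style bucketing scheme these parameters come from; the function name is illustrative, and the framework's actual implementation operates on tensors:

import math

def relative_position_bucket(offset, num_buckets=32, max_distance=128):
    """Map a signed token offset to a bucket index, T5-style (illustrative sketch)."""
    # Bidirectional attention: reserve half the buckets for each sign of the offset.
    bucket = 0
    num_buckets //= 2
    if offset > 0:
        bucket += num_buckets
    n = abs(offset)
    # Half of each side's buckets hold exact small offsets...
    max_exact = num_buckets // 2
    if n < max_exact:
        return bucket + n
    # ...the rest cover logarithmically spaced offsets up to max_distance.
    large = max_exact + int(
        math.log(n / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)

# Nearby offsets get distinct buckets; distant ones share coarser buckets.
print(relative_position_bucket(1), relative_position_bucket(100))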

For Slurm-based systems, you can modify the configuration file directly. In this case, change position_embedding_type so that the three lines read:

position_embedding_type: 'relative' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Number of buckets used to compute the relative position bias
relative_attention_max_distance: 128 # Maximum relative distance; larger offsets all map into the last bucket

Submit the training job with the modified configuration file.
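
For example, a minimal sketch of the submission command, assuming the launcher's Hydra-based main.py entry point (the stage and config group names here are illustrative and may differ in your environment):

python3 main.py training=t5/220m stages=[training]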

For BCP, you can override the default configuration by adding the argument training.model.position_embedding_type='relative' when you submit the training job.
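
For example, extending the same hypothetical launcher invocation with the override:

python3 main.py training=t5/220m stages=[training] \
    training.model.position_embedding_type='relative'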

For more details on submitting training jobs on Slurm and BCP systems, see Model Training.