Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Changing Embedding Type
This section shows an example of changing the embedding type for T5 models.
Assume you want to train a 220M T5 model. Instead of the default learned absolute position embeddings, you want to use relative position embeddings.
First, check the training configuration file in conf/training/<model_type>/<model_size>.yaml; in this case, it is conf/training/t5/220m.yaml. This file contains all supported configurations. The configurations of interest here are:
position_embedding_type: 'learned_absolute' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Relative position number of buckets for computing the bias
relative_attention_max_distance: 128 # max_distance to keep relative distance in the attention_num_buckets.
For Slurm-based systems, you can modify the configuration file directly. In this case, change the three lines above to:
position_embedding_type: 'relative' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Relative position number of buckets for computing the bias
relative_attention_max_distance: 128 # max_distance to keep relative distance in the attention_num_buckets.
Submit the training job with the modified configuration file.
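For example, a submission might look like the following sketch. Note that the main.py entry point, the training=t5/220m config selection, and the stages option are assumptions based on a Hydra-style launcher; adapt them to your setup:

# Hypothetical launcher invocation (entry point and option names
# are assumptions, not confirmed by this guide); the modified
# conf/training/t5/220m.yaml is picked up automatically.
python3 main.py \
    training=t5/220m \
    stages=[training]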
For BCP, you can override the default configuration by adding the argument training.model.position_embedding_type='relative' when you submit the training job.
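For instance, appended to the same hypothetical launcher command (only the override itself comes from this guide; the entry point and other options are assumptions):

# Hypothetical invocation with the command-line override appended;
# the override replaces the default 'learned_absolute' setting.
python3 main.py \
    training=t5/220m \
    stages=[training] \
    training.model.position_embedding_type='relative'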
For more details on submitting training jobs on Slurm and BCP systems, see Model Training.