Training with Custom Configurations

You can modify the training configuration files, or create completely new files for training. New files must follow the same structure and guidelines as the existing model configurations.

This section shows an example of changing the embedding type for T5 models.

Assume you want to train a 220M T5 model. Instead of using the default absolute learnable position embeddings, you want to use relative position embeddings.

First, check the training configuration file at conf/training/<model_type>/<model_size>.yaml, which in this case is conf/training/t5/220m.yaml. This file lists all supported configuration options. The options of interest here are:

position_embedding_type: 'learned_absolute' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Relative position number of buckets for computing the bias
relative_attention_max_distance: 128 # max_distance to keep relative distance in the attention_num_buckets.

On Slurm-based systems, you can edit the configuration file directly. In this case, change the three lines above to:

position_embedding_type: 'relative' # Position embedding type. Options ['learned_absolute', 'relative']
relative_attention_num_buckets: 32 # Relative position number of buckets for computing the bias
relative_attention_max_distance: 128 # max_distance to keep relative distance in the attention_num_buckets.
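If you prefer to script this edit rather than make it by hand, the following is a minimal sketch using only the Python standard library. The helper name set_yaml_scalar is hypothetical and not part of the framework. It assumes the key appears exactly once in the file, and it operates on the text directly so that the inline comments are preserved.

```python
import re


def set_yaml_scalar(yaml_text: str, key: str, new_value: str) -> str:
    """Replace the value on a `key: value` line, keeping indentation and
    any trailing `# comment`. Hypothetical helper, for illustration only."""
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)" + re.escape(key) + r":[ \t]*"
        r"(?P<value>[^#\n]*?)(?P<comment>[ \t]*#.*)?$",
        re.MULTILINE,
    )

    def _substitute(match: re.Match) -> str:
        comment = match.group("comment") or ""
        return f"{match.group('indent')}{key}: {new_value}{comment}"

    new_text, count = pattern.subn(_substitute, yaml_text)
    if count != 1:
        # Refuse to guess when the key is missing or appears more than once.
        raise ValueError(f"expected exactly one '{key}' line, found {count}")
    return new_text


# The three lines of interest, copied from the section above.
sample = (
    "position_embedding_type: 'learned_absolute' # Position embedding type. "
    "Options ['learned_absolute', 'relative']\n"
    "relative_attention_num_buckets: 32 # Relative position number of buckets "
    "for computing the bias\n"
    "relative_attention_max_distance: 128 # max_distance to keep relative "
    "distance in the attention_num_buckets.\n"
)

updated = set_yaml_scalar(sample, "position_embedding_type", "'relative'")
print(updated)
```

To apply it to the real file, read conf/training/t5/220m.yaml into a string, pass it through the helper, and write the result back. The sketch raises an error instead of editing if the key is not found exactly once.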

Submit the training job with the modified configuration file.

For BCP, you can override the default configurations by adding the argument training.model.position_embedding_type='relative' when you submit the training job.
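To make the meaning of the dotted argument concrete, here is a small sketch of how an override such as training.model.position_embedding_type='relative' maps onto a nested configuration. It assumes, as the form of the argument suggests, that each dot-separated segment names one level of nesting. The function apply_override and the dictionary layout are hypothetical illustrations, not the launcher's actual implementation.

```python
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `a.b.c=value` override applied.
    Illustrative only; values are kept as strings with quotes stripped."""
    dotted_key, separator, raw_value = override.partition("=")
    if not separator or not dotted_key:
        raise ValueError(f"override must look like key=value, got {override!r}")

    result = copy.deepcopy(config)
    keys = dotted_key.split(".")
    node = result
    for key in keys[:-1]:
        # Descend one level per dotted segment, creating it if absent.
        node = node.setdefault(key, {})
    node[keys[-1]] = raw_value.strip("'\"")
    return result


# Assumed nesting: the options from the YAML file sit under training.model.
defaults = {
    "training": {
        "model": {
            "position_embedding_type": "learned_absolute",
            "relative_attention_num_buckets": 32,
            "relative_attention_max_distance": 128,
        }
    }
}

overridden = apply_override(
    defaults, "training.model.position_embedding_type='relative'"
)
print(overridden["training"]["model"]["position_embedding_type"])
```

Only the named option changes. The other two relative-attention options keep their default values, which matches the in-file edit shown for Slurm.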

For more details on submitting training jobs on Slurm and BCP systems, see Model Training.

© Copyright 2023, NVIDIA. Last updated on Nov 14, 2023.