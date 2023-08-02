The following is an example of the config spec file for training (`train.yaml`). You can change any of these parameters in the file or override them on the training command line.

```yaml
trainer:
  max_epochs: 100

model:
  # Labels that will be used to "decode" predictions.
  class_labels:
    class_labels_file: null # optional to specify a file containing the list of the labels

  tokenizer:
    tokenizer_name: ${model.language_model.pretrained_model_name} # or sentencepiece
    vocab_file: null # path to vocab file
    tokenizer_model: null # only used if tokenizer is sentencepiece
    special_tokens: null

  language_model:
    pretrained_model_name: bert-base-uncased
    lm_checkpoint: null
    config_file: null # json file, precedence over config
    config: null

  classifier_head:
    # This comes directly from number of labels/target classes.
    num_output_layers: 2
    fc_dropout: 0.1

training_ds:
  file_path: ???
  batch_size: 64
  shuffle: true
  num_samples: -1 # number of samples to be considered, -1 means all the dataset
  num_workers: 3
  drop_last: false
  pin_memory: false

validation_ds:
  file_path: ???
  batch_size: 64
  shuffle: false
  num_samples: -1 # number of samples to be considered, -1 means all the dataset
  num_workers: 3
  drop_last: false
  pin_memory: false

optim:
  name: adam
  lr: 2e-5
  # optimizer arguments
  betas: [0.9, 0.999]
  weight_decay: 0.001

  # scheduler setup
  sched:
    name: WarmupAnnealing
    # Scheduler params
    warmup_steps: null
    warmup_ratio: 0.1
    last_epoch: -1

    # pytorch lightning args
    monitor: val_loss
    reduce_on_plateau: false
```
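Two kinds of value in this spec appear to follow OmegaConf conventions, which NeMo-based configs use. `${model.language_model.pretrained_model_name}` copies the value of another key. `???` marks a mandatory value that has no default. The stdlib-only Python sketch below mimics those two behaviors on a plain dict. It shows what the tokenizer name resolves to and which keys must still be supplied. It is a simplified illustration, not how TAO loads the file. Real OmegaConf also supports interpolation inside longer strings and raises an error when a `???` value is accessed. The helper names are hypothetical.

```python
import re
from typing import Any

# A value that is exactly "${a.b.c}" refers to another key (OmegaConf-style interpolation).
INTERPOLATION = re.compile(r"^\$\{([^}]+)\}$")
MISSING = "???"  # placeholder for a mandatory value that has no default


def lookup(root: dict, dotted: str) -> Any:
    """Walk a dotted path such as 'model.language_model.pretrained_model_name'."""
    node: Any = root
    for part in dotted.split("."):
        node = node[part]
    return node


def resolve(node: Any, root: dict) -> Any:
    """Return a copy of `node` with whole-value ${...} references replaced by their targets."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, root) for value in node]
    if isinstance(node, str):
        match = INTERPOLATION.match(node)
        if match:
            # The target may itself be an interpolation, so resolve it recursively.
            return resolve(lookup(root, match.group(1)), root)
    return node


def missing_keys(node: Any, path: str = "") -> list:
    """List the dotted paths whose value is still the '???' placeholder."""
    if isinstance(node, dict):
        found = []
        for key, value in node.items():
            found.extend(missing_keys(value, f"{path}.{key}" if path else key))
        return found
    return [path] if node == MISSING else []


# A trimmed-down version of the train.yaml spec, written as a Python dict.
spec = {
    "model": {
        "tokenizer": {"tokenizer_name": "${model.language_model.pretrained_model_name}"},
        "language_model": {"pretrained_model_name": "bert-base-uncased"},
    },
    "training_ds": {"file_path": "???", "batch_size": 64},
    "validation_ds": {"file_path": "???", "batch_size": 64},
}

resolved = resolve(spec, spec)
print(resolved["model"]["tokenizer"]["tokenizer_name"])  # bert-base-uncased
print(missing_keys(spec))  # ['training_ds.file_path', 'validation_ds.file_path']
```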

Example of the command for training the model on four GPUs for 50 epochs:

```sh
tao text_classification train -e /specs/nlp/text_classification/train.yaml \
    training_ds.file_path=PATH_TO_TRAIN_FILE \
    trainer.max_epochs=50 \
    -g 4 \
    -k $KEY
```

By default, the final trained model is saved as `trained-model.tlt`.

- `-e`: The experiment specification file used to set up training.
- `training_ds.file_path`: Path to the training `.tsv` file.
- `-k`: Encryption key.
- `trainer.max_epochs`: Number of training epochs.
- `-g`: Number of GPUs to use for training.
- Other arguments override the corresponding fields in the specification file.
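Dotted arguments such as `training_ds.file_path=PATH_TO_TRAIN_FILE` address nested keys in the spec file. The Python sketch below shows the idea on a plain dict. It splits each argument at the first `=`, walks the dotted path, and replaces the leaf value. `parse_value` and `apply_overrides` are hypothetical helpers. TAO's actual parsing handles more cases; for NeMo-based containers it goes through Hydra/OmegaConf, as far as we know. Flags like `-e`, `-g`, and `-k` are consumed by the launcher and are not merged into the spec. The sketch rejects unknown keys, which loosely mirrors Hydra's default of requiring a `+` prefix to add a new key.

```python
import ast
import copy
from typing import Any


def parse_value(text: str) -> Any:
    """Interpret an override value: booleans, null, numbers, and lists; otherwise keep the string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "none"):
        return None
    try:
        return ast.literal_eval(text)  # handles 50, 2e-5, [0.9, 0.999], ...
    except (ValueError, SyntaxError):
        return text  # plain strings such as paths or model names


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of `config` with each 'dotted.key=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted.split(".")
        node = result
        for key in parents:
            node = node[key]  # KeyError for a misspelled section name
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted}")
        node[leaf] = parse_value(raw)
    return result


spec = {"trainer": {"max_epochs": 100}, "training_ds": {"file_path": "???", "batch_size": 64}}
cfg = apply_overrides(spec, ["training_ds.file_path=PATH_TO_TRAIN_FILE", "trainer.max_epochs=50"])
print(cfg["trainer"]["max_epochs"])     # 50
print(cfg["training_ds"]["file_path"])  # PATH_TO_TRAIN_FILE
```

Working on a deep copy leaves the original spec untouched, so the same base spec can be reused with different overrides.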

The following table lists some of the parameters you can set in the config file or override from the command line when training a model:

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| `model.class_labels.class_labels_file` | string | null | Path to an optional file containing the labels; each line holds the string name of one label |
| `model.intent_loss_weight` | float | 0.6 | Weight of the intent loss relative to the slot loss in the total loss |
| `model.tokenizer.tokenizer_name` | string | Filled in automatically from `model.language_model.pretrained_model_name` | Tokenizer name |
| `model.tokenizer.vocab_file` | string | null | Path to the tokenizer vocabulary |
| `model.tokenizer.tokenizer_model` | string | null | Path to the tokenizer model (only for the sentencepiece tokenizer) |
| `model.tokenizer.special_tokens` | string | null | Special tokens of the tokenizer, if any |
| `model.language_model.max_seq_length` | integer | 50 | Maximum length of the input queries (in tokens) |
| `model.language_model.pretrained_model_name` | string | bert-base-uncased | Pre-trained language model name (choose from `bert-base-cased`, `bert-base-uncased`, `megatron_bert_345m_uncased`, `distilbert-base-uncased`, and `biomegatron-bert-345m-uncased`) |
| `model.language_model.lm_checkpoint` | string | null | Path to the pre-trained language model checkpoint |
| `model.language_model.config_file` | string | null | Path to the pre-trained language model config file |
| `model.language_model.config` | dictionary | null | Config of the pre-trained language model |
| `model.classifier_head.num_output_layers` | integer | 2 | Number of fully connected layers in the classifier on top of the BERT model |
| `model.classifier_head.fc_dropout` | float | 0.1 | Dropout ratio of the fully connected layers |
| `{training,validation,test}_ds.file_path` | string | ??? | Path to the `.tsv` data file |
| `{training,validation,test}_ds.batch_size` | integer | 32 | Data loader batch size |
| `{training,validation,test}_ds.num_workers` | integer | 2 | Number of worker processes for the data loader |
| `{training,validation,test}_ds.shuffle` | boolean | true (training), false (validation and test) | Whether to shuffle the data every epoch |
| `{training,validation,test}_ds.drop_last` | boolean | false | Whether to drop the last batch if it is smaller than the batch size |
| `{training,validation,test}_ds.pin_memory` | boolean | false | Enables `pin_memory` in PyTorch's data loader to speed up data transfer to the GPU |
| `{training,validation,test}_ds.num_samples` | integer | -1 | Number of samples to use from the dataset; -1 means all samples |
| `optim.name` | string | adam | Optimizer to use for training |
| `optim.lr` | float | 2e-5 | Learning rate to use for training |
| `optim.weight_decay` | float | 0.01 | Weight decay to use for training |
| `optim.sched.name` | string | WarmupAnnealing | Learning rate scheduler |
| `optim.sched.warmup_ratio` | float | 0.1 | Fraction of the total training steps used for learning rate warmup |
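The `class_labels_file` entry lets predicted label IDs be "decoded" into readable names. Below is a minimal Python sketch. It assumes the line order of the file defines the ID order: line 0 names label 0, line 1 names label 1, and so on. The file name `labels.txt`, its contents, and both helper functions are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def load_class_labels(path: str) -> list:
    """Read label names, one per line; line i (0-based) is taken to name label id i."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]  # blank lines are skipped


def decode(predicted_ids: list, labels: list) -> list:
    """Map predicted label ids to their names."""
    return [labels[i] for i in predicted_ids]


with tempfile.TemporaryDirectory() as tmp:
    labels_file = os.path.join(tmp, "labels.txt")  # hypothetical class_labels_file
    Path(labels_file).write_text("negative\npositive\n", encoding="utf-8")
    labels = load_class_labels(labels_file)

print(decode([1, 0, 1], labels))  # ['positive', 'negative', 'positive']
```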

At the start of each training experiment, TAO Toolkit prints a log of the experiment specification, including any parameters added or overridden via the command line. It also shows additional information, such as which GPUs are available and where logs will be saved.

```text
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
[NeMo W 2021-01-20 19:49:30 exp_manager:304] There was no checkpoint folder at checkpoint_dir :/home/user/tao-toolkit-pyt/nlp/text_classification/entrypoint/nemo_experiments/trained-model/2021-01-20_19-49-30/checkpoints. Training from scratch.
[NeMo I 2021-01-20 19:49:30 exp_manager:194] Experiments will be logged at /home/user/tao-toolkit-pyt/nlp/text_classification/entrypoint/nemo_experiments/trained-model/2021-01-20_19-49-30
```

Then, for each dataset provided, it shows some samples along with their corresponding inputs to the model. It also reports statistics on the sequence lengths in the dataset.

```text
[NeMo I 2021-01-20 19:49:36 text_classification_dataset:120] Read 67350 examples from ../data/SST-2/train.tsv.
[NeMo I 2021-01-20 19:49:37 text_classification_dataset:233] *** Example ***
[NeMo I 2021-01-20 19:49:37 text_classification_dataset:234] example 0: ['girl-meets-girl', 'romantic', 'comedy']
[NeMo I 2021-01-20 19:49:37 text_classification_dataset:235] subtokens: [CLS] girl - meets - girl romantic comedy [SEP]
[NeMo I 2021-01-20 19:49:37 text_classification_dataset:236] input_ids: 101 2611 1011 6010 1011 2611 6298 4038 102
[NeMo I 2021-01-20 19:49:37 text_classification_dataset:237] segment_ids: 0 0 0 0 0 0 0 0 0
[NeMo I 2021-01-20 19:49:37 text_classification_dataset:238] input_mask: 1 1 1 1 1 1 1 1 1
[NeMo I 2021-01-20 19:49:37 text_classification_dataset:239] label: 1
```
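The logged fields can be rebuilt from the subtoken IDs. This minimal Python sketch assumes the standard BERT single-sentence layout:

- `[CLS]` (ID 101) and `[SEP]` (ID 102) wrap the query.
- All `segment_ids` are 0.
- `input_mask` is 1 for every real token.

`build_features` is a hypothetical helper. The log does not show whether TAO pads each example to `max_seq_length` (default 50 in the parameter table) or pads per batch, so padding is optional here.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD] in the bert-base-uncased vocabulary


def build_features(subtoken_ids: list, max_seq_length: int = 50, pad: bool = False) -> dict:
    """Wrap a single query in [CLS] ... [SEP] and build its segment ids and attention mask."""
    body = subtoken_ids[: max_seq_length - 2]  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    segment_ids = [0] * len(input_ids)  # one sentence, so every token is in segment 0
    input_mask = [1] * len(input_ids)   # 1 = real token, 0 = padding
    if pad:
        padding = max_seq_length - len(input_ids)
        input_ids += [PAD_ID] * padding
        segment_ids += [0] * padding
        input_mask += [0] * padding
    return {"input_ids": input_ids, "segment_ids": segment_ids, "input_mask": input_mask}


# WordPiece ids of "girl - meets - girl romantic comedy", taken from the log above.
example = build_features([2611, 1011, 6010, 1011, 2611, 6298, 4038])
print(example["input_ids"])   # [101, 2611, 1011, 6010, 1011, 2611, 6298, 4038, 102]
print(example["input_mask"])  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
```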

Before training starts, information on the optimizer and scheduler will be shown in the logs:

```text
[NeMo I 2021-01-20 19:50:19 modelPT:830] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.999]
        eps: 1e-08
        lr: 2e-05
        weight_decay: 0.01
    )
[NeMo I 2021-01-20 19:50:19 lr_scheduler:621] Scheduler "<nemo.core.optim.lr_scheduler.WarmupAnnealing object at 0x7fcd2232b160>"
    will be used during training (effective maximum steps = 1053) -
    Parameters :
    (warmup_steps: null
    warmup_ratio: 0.1
    last_epoch: -1
    max_steps: 1053
    )
```
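With `warmup_steps: null`, the warmup length is derived from `warmup_ratio` and the effective maximum steps shown in the log. The Python sketch below assumes the warmup length is `int(warmup_ratio * max_steps)`, which gives 105 of the 1053 steps here. It models `WarmupAnnealing` as a linear ramp up to `lr` followed by a linear decay. The function names are hypothetical. NeMo's actual scheduler may differ in details such as off-by-one handling at the warmup boundary or a nonzero minimum learning rate.

```python
def warmup_steps_from_ratio(max_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps implied by warmup_ratio when warmup_steps is null."""
    return int(warmup_ratio * max_steps)


def warmup_annealing_lr(step: int, base_lr: float, max_steps: int, warmup_steps: int,
                        min_lr: float = 0.0) -> float:
    """Simplified schedule: linear warmup to base_lr, then linear decay to min_lr at max_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + (base_lr - min_lr) * max(0.0, 1.0 - progress)


max_steps = 1053  # "effective maximum steps" from the log above
warmup = warmup_steps_from_ratio(max_steps, 0.1)
print(warmup)  # 105
for step in (0, 52, 104, 105, 579, 1052):
    print(step, f"{warmup_annealing_lr(step, 2e-5, max_steps, warmup):.2e}")
```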

Next, you should see a full printout of the number of parameters in each module and submodule, as well as the total number of trainable and non-trainable parameters in the model. For example, this model has 110M parameters in total:

```text
    | Name                                            | Type                 | Params
-------------------------------------------------------------------------------------
0   | bert_model                                      | BertEncoder          | 109 M
1   | bert_model.embeddings                           | BertEmbeddings       | 23.8 M
2   | bert_model.embeddings.word_embeddings           | Embedding            | 23.4 M
3   | bert_model.embeddings.position_embeddings       | Embedding            | 393 K
4   | bert_model.embeddings.token_type_embeddings     | Embedding            | 1.5 K
5   | bert_model.embeddings.LayerNorm                 | LayerNorm            | 1.5 K
6   | bert_model.embeddings.dropout                   | Dropout              | 0
7   | bert_model.encoder                              | BertEncoder          | 85.1 M
8   | bert_model.encoder.layer                        | ModuleList           | 85.1 M
9   | bert_model.encoder.layer.0                      | BertLayer            | 7.1 M
10  | bert_model.encoder.layer.0.attention            | BertAttention        | 2.4 M
11  | bert_model.encoder.layer.0.attention.self       | BertSelfAttention    | 1.8 M
12  | bert_model.encoder.layer.0.attention.self.query | Linear               | 590 K
...
212 | bert_model.encoder.layer.11.output.dropout      | Dropout              | 0
213 | bert_model.pooler                               | BertPooler           | 590 K
214 | bert_model.pooler.dense                         | Linear               | 590 K
215 | bert_model.pooler.activation                    | Tanh                 | 0
216 | classifier                                      | SequenceClassifier   | 592 K
217 | classifier.dropout                              | Dropout              | 0
218 | classifier.mlp                                  | MultiLayerPerceptron | 592 K
219 | classifier.mlp.layer0                           | Linear               | 590 K
220 | classifier.mlp.layer2                           | Linear               | 1.5 K
221 | loss                                            | CrossEntropyLoss     | 0
222 | classification_report                           | ClassificationReport | 0
-------------------------------------------------------------------------------------
110 M     Trainable params
0         Non-trainable params
110 M     Total params
```
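Several rows of this table can be checked by hand. A `Linear` layer has `in * out` weights plus `out` biases. An `Embedding` has one `dim`-sized vector per row. The Python sketch below assumes the standard `bert-base-uncased` sizes: hidden size 768, a 30,522-token vocabulary, 512 positions and 2 token types. It also assumes the two-label SST-2 setup from the logs. The shapes of the classifier's two linear layers (768 to 768, then 768 to 2) are inferred from the 590 K and 1.5 K counts, not stated in the log.

```python
HIDDEN = 768         # hidden size of bert-base-uncased
VOCAB = 30522        # WordPiece vocabulary size of bert-base-uncased
MAX_POSITIONS = 512  # position embeddings in bert-base-uncased
TOKEN_TYPES = 2      # segment (token type) embeddings
NUM_CLASSES = 2      # SST-2 has two labels (label_id 0 and 1)


def linear_params(in_features: int, out_features: int) -> int:
    return in_features * out_features + out_features  # weight matrix + bias vector


def embedding_params(num_embeddings: int, dim: int) -> int:
    return num_embeddings * dim  # one dim-sized vector per row, no bias


word = embedding_params(VOCAB, HIDDEN)              # 23,440,896 -> "23.4 M"
position = embedding_params(MAX_POSITIONS, HIDDEN)  # 393,216    -> "393 K"
token_type = embedding_params(TOKEN_TYPES, HIDDEN)  # 1,536      -> "1.5 K"
layer_norm = 2 * HIDDEN                             # scale + shift: 1,536 -> "1.5 K"
embeddings = word + position + token_type + layer_norm  # 23,837,184 -> "23.8 M"

query = linear_params(HIDDEN, HIDDEN)        # 590,592 -> "590 K" (same shape as pooler.dense)
layer0 = linear_params(HIDDEN, HIDDEN)       # classifier.mlp.layer0: 590,592 -> "590 K"
layer2 = linear_params(HIDDEN, NUM_CLASSES)  # classifier.mlp.layer2: 1,538 -> "1.5 K"
classifier = layer0 + layer2                 # 592,130 -> "592 K"

print(f"{embeddings:,} {classifier:,}")  # 23,837,184 592,130
```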

As the model starts training, you should see a progress bar per epoch.

```text
Epoch 0: 100%|████████████████████████████| 1067/1067 [03:10<00:00, 5.60it/s, loss=0.252, val_loss=0.258,
Epoch 0, global step 1052: val_loss reached 0.25792 (best 0.25792), saving model to "/home/user/tao-toolkit-pyt/nlp/text_classification/entrypoint/nemo_experiments/trained-model/2021-01-20_20-19-44/checkpoints/trained-model---val_loss=0.26-epoch=0.ckpt" as top 3
Epoch 1: 100%|████████████████████████████| 1067/1067 [03:10<00:00, 5.60it/s, loss=0.187, val_loss=0.245,
Epoch 1, global step 2105: val_loss reached 0.24499 (best 0.24499), saving model to "/home/user/tao-toolkit-pyt/nlp/text_classification/entrypoint/nemo_experiments/trained-model/2021-01-20_20-19-44/checkpoints/trained-model---val_loss=0.24-epoch=1.ckpt" as top 3
Epoch 2: 100%|████████████████████████████| 1067/1067 [03:09<00:00, 5.62it/s, loss=0.158, val_loss=0.235,
Epoch 2, global step 3158: val_loss reached 0.23505 (best 0.23505), saving model to "/home/user/tao-toolkit-pyt/nlp/text_classification/entrypoint/nemo_experiments/trained-model/2021-01-20_20-19-44/checkpoints/trained-model---val_loss=0.24-epoch=2.ckpt" as top 3
...
```
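The step counts in this progress output are consistent with simple batch arithmetic. The Python sketch below makes these assumptions:

- A single data-loader process with `batch_size: 64`, `drop_last: false` and no gradient accumulation. Under multi-GPU data parallelism, each GPU would see only a share of the batches.
- The bar's total of 1067 counts training batches plus validation batches.

The 67,350 training examples come from the dataset log. The 872 validation examples are the total support in the validation report. `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches one data loader yields per epoch (a partial last batch counts unless dropped)."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


train_batches = batches_per_epoch(67350, 64)  # 67,350 examples read from train.tsv
val_batches = batches_per_epoch(872, 64)      # 872 = total support in the validation report
print(train_batches, val_batches, train_batches + val_batches)  # 1053 14 1067

# Global steps are counted from 0, so epoch e ends at step (e + 1) * train_batches - 1.
print([(epoch + 1) * train_batches - 1 for epoch in range(3)])  # [1052, 2105, 3158]
```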

After each epoch, you should see a summary table of metrics on the validation set.

```text
Validating: 100%|████████████████████████████| 14/14 [00:00<00:00, 13.94it/s]
[NeMo I 2021-01-20 19:53:32 text_classification_model:173] val_report:
    label            precision    recall    f1       support
    label_id: 0      91.97        88.32     90.11    428
    label_id: 1      89.15        92.57     90.83    444
    -------------------
    micro avg        90.48        90.48     90.48    872
    macro avg        90.56        90.44     90.47    872
    weighted avg     90.54        90.48     90.47    872
```
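The report's numbers follow the standard precision, recall and F1 definitions. Support is the number of validation examples whose true label is the given class. The Python sketch below recomputes every row of the report from a 2×2 confusion matrix.

The matrix itself is not in the log. It was back-computed from the reported support and recall: 378 of 428 label-0 examples and 411 of 444 label-1 examples predicted correctly are the only integer counts that match. Treat it as an illustrative reconstruction.

The helper names are hypothetical. NeMo's `ClassificationReport` may handle edge cases differently, such as a class that receives no predictions.

```python
def per_class_metrics(cm: list) -> list:
    """cm[t][p] = number of examples with true label t that were predicted as p."""
    n = len(cm)
    rows = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[t][c] for t in range(n))  # column sum: everything predicted as c
        support = sum(cm[c])                          # row sum: everything whose true label is c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append({"precision": precision, "recall": recall, "f1": f1, "support": support})
    return rows


def averages(cm: list, rows: list) -> dict:
    total = sum(r["support"] for r in rows)
    metrics = ("precision", "recall", "f1")
    accuracy = sum(cm[c][c] for c in range(len(cm))) / total
    return {
        # Single-label task: micro precision, recall, and F1 all equal accuracy.
        "micro": {m: accuracy for m in metrics},
        "macro": {m: sum(r[m] for r in rows) / len(rows) for m in metrics},
        "weighted": {m: sum(r[m] * r["support"] for r in rows) / total for m in metrics},
    }


cm = [[378, 50], [33, 411]]  # reconstructed from support and recall, not logged data
rows = per_class_metrics(cm)
# Prints the same per-class rows and averages as the val_report above.
for label_id, r in enumerate(rows):
    print(f"label_id: {label_id}", *(f"{100 * r[m]:.2f}" for m in ("precision", "recall", "f1")), r["support"])
for name, avg in averages(cm, rows).items():
    print(name, "avg", *(f"{100 * avg[m]:.2f}" for m in ("precision", "recall", "f1")))
```

Weighted recall always equals micro recall for single-label data, which is why both columns show 90.48.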

At the end of training, TAO Toolkit will save the last checkpoint at the path specified by the experiment spec file before finishing.

```text
Saving latest checkpoint...
[NeMo I 2021-01-20 21:09:39 train:124] Experiment logs saved to '/home/user/tao-toolkit-pyt/nlp/text_classification/entrypoint/nemo_experiments/trained-model/2021-01-20_21-06-17'
[NeMo I 2021-01-20 21:09:39 train:127] Trained model saved to '/home/user/tao-toolkit-pyt/nlp/text_classification/entrypoint/nemo_experiments/trained-model/2021-01-20_21-06-17/checkpoints/trained-model.tlt'
```

The output logs for the evaluation and fine-tuning look similar.