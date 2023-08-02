The following is an example training spec file, train.yaml. You can change any of these parameters and pass them to the train command.

    model:
      intermediate: True
      order: 2
      pruning:
        - 0
    training_ds:
      is_tarred: false
      is_file: true
      data_dir: ???
    validation_ds:
      is_tarred: false
      is_file: true
      data_dir: ???
    vocab_file: ""
    encryption_key: "tlt_encode"
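The spec above leaves both `data_dir` entries as `???`. As a minimal sketch, assuming `???` marks a value that must be supplied before training (the command example does this for `training_ds.data_dir`), the following lists the keys that are still unset. The helper `find_missing` and the in-memory `spec` dictionary are illustrative only and are not part of TAO Toolkit.

```python
from typing import Any, Dict, List

# Assumption: "???" is the placeholder for a value the user must provide.
MISSING = "???"


def find_missing(spec: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted keys whose value is still the placeholder."""
    missing: List[str] = []
    for key, value in spec.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as training_ds.
            missing.extend(find_missing(value, prefix=dotted + "."))
        elif value == MISSING:
            missing.append(dotted)
    return missing


# The example spec, written out as a plain dictionary.
spec = {
    "model": {"intermediate": True, "order": 2, "pruning": [0]},
    "training_ds": {"is_tarred": False, "is_file": True, "data_dir": "???"},
    "validation_ds": {"is_tarred": False, "is_file": True, "data_dir": "???"},
    "vocab_file": "",
    "encryption_key": "tlt_encode",
}

print(find_missing(spec))
```

Each key reported this way can be set either in the spec file or as a `key=value` override on the command line.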

| Parameter | Data Type | Default | Description |
|---|---|---|---|
| training_ds.data_dir | string | – | Path to dataset file. |
| model.order | int | – | Order of N-Gram model (maximum number of grams). |
| vocab_file | string | – | Optional path to vocab file to limit vocabulary learned by model. |
| model.intermediate | boolean | true | Choose from [true,false]. If True, creates intermediate file, which is required for finetune and interpolate. |
| model.pruning | list[int] | [0] | Prune grams with counts less than or equal to the threshold provided for each gram. Non-decreasing. Starts with 0. |
| export_to | string | – | The path to the trained .tlt model. |

The following is an example of the command for training the model:

    !tao n_gram train -e /specs/nlp/lm/n_gra/train.yaml \
        training_ds.data_dir=PATH_TO_DATA \
        model.order=4 \
        model.pruning=[0,1,1,3] \
        -k $KEY

-e : The experiment-specification file to set up training

model.order : Model order

training_ds.data_dir : The dataset directory

-k : The encryption key

model.intermediate : If true, saves intermediate file format as well

model.pruning : List of pruning thresholds for each gram order, ascending in order. Must be non-decreasing and start with 0.
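The two constraints on `model.pruning` (starts with 0, non-decreasing) are easy to check before launching a run. The sketch below is an assumption-laden illustration, not TAO Toolkit code: `check_pruning` and `pruning_override` are hypothetical helpers that validate a threshold list and format it the way the command above passes it.

```python
from typing import List


def check_pruning(thresholds: List[int]) -> None:
    """Raise ValueError if the list breaks the documented constraints."""
    if not thresholds:
        # Assumption: an empty list cannot "start with 0", so reject it.
        raise ValueError("pruning list must not be empty")
    if thresholds[0] != 0:
        raise ValueError("pruning list must start with 0")
    for prev, cur in zip(thresholds, thresholds[1:]):
        if cur < prev:
            raise ValueError(
                f"pruning list must be non-decreasing, got {cur} after {prev}"
            )


def pruning_override(thresholds: List[int]) -> str:
    """Format a validated list as a command-line override string."""
    check_pruning(thresholds)
    return "model.pruning=[" + ",".join(str(t) for t in thresholds) + "]"


print(pruning_override([0, 1, 1, 3]))  # model.pruning=[0,1,1,3]
```

A list such as `[0, 2, 1]` or `[1, 2]` would raise `ValueError` here instead of being passed to the train command.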

At the start of training, TAO Toolkit will print out a log of the experiment specification, a summary of the training dataset, and the model parameters.

As the model starts training, you will see a progress bar. At the end of training, TAO Toolkit will save the model in ARPA format to the results directory.
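One quick sanity check on the result is to read the header of the ARPA file, which lists how many n-grams were kept for each order. The following is a minimal sketch, assuming the saved file is a plain-text ARPA file with the standard `\data\` header; `read_arpa_counts` is a hypothetical helper and the sample lines are made up for illustration.

```python
import re
from typing import Dict, Iterable


def read_arpa_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'ngram N=COUNT' lines from the \\data\\ header of an ARPA file."""
    counts: Dict[int, int] = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if not in_data:
            continue
        match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            # Reached the first n-gram section (e.g. \1-grams:); header is done.
            break
    return counts


# Illustrative header only; real counts depend on the data and pruning.
sample = """\\data\\
ngram 1=5
ngram 2=7

\\1-grams:
""".splitlines()

print(read_arpa_counts(sample))  # {1: 5, 2: 7}
```

For a real file, pass an open file object instead of `sample`. The highest order in the result should match `model.order`, and stricter pruning thresholds should lower the counts for the affected orders.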