Neural Rescoring#
When using the neural rescoring approach, a neural network is used to score candidates. A candidate is the text transcript predicted by the ASR model’s decoder. The top K candidates produced by beam search decoding (with a beam width of K) are given to a neural language model for ranking. The language model assigns a score to each candidate, which is usually combined with the scores from beam search decoding to produce the final scores and rankings.
Train Neural Rescorer#
An example script for training such a language model with a Transformer can be found at examples/nlp/language_modeling/transformer_lm.py. It trains a TransformerLMModel, which can then be used as a neural rescorer for an ASR system. For more information on language model training, see the LLM/NLP documentation.
You can also use a pretrained language model from the Hugging Face library, such as Transformer-XL and GPT, instead of training your model. Models like BERT and RoBERTa are not supported by this script because they are trained as Masked Language Models. As a result, they are not efficient or effective for scoring sentences out of the box.
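To illustrate why causal models like GPT are suitable here, the following is a minimal sketch (not one of the NeMo scripts) of scoring candidate sentences with a pretrained Hugging Face causal LM; the function name and the log-likelihood bookkeeping are illustrative assumptions:

```python
# Sketch: scoring candidate transcripts with a pretrained causal LM
# such as GPT-2. A higher total log-likelihood means the LM finds
# the candidate more plausible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def score_candidates(candidates, model_name="gpt2"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    scores = []
    with torch.no_grad():
        for text in candidates:
            ids = tokenizer(text, return_tensors="pt").input_ids
            # With labels=ids, the model returns the mean cross-entropy
            # over the (len - 1) shifted next-token predictions.
            loss = model(ids, labels=ids).loss
            # Convert back to the total log-likelihood of the candidate.
            scores.append(-loss.item() * (ids.size(1) - 1))
    return scores
```

A masked LM such as BERT offers no such left-to-right factorization, which is why a single forward pass cannot produce a sentence score of this kind.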
Evaluation#
Given a trained TransformerLMModel .nemo file or a pretrained HF model, the script at scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py can be used to re-score beams obtained with an ASR model. To use this script, you need a .tsv file containing the candidates produced by the acoustic model and beam search decoding. The candidates can come from beam search decoding alone or from fusion with an N-gram LM. You can generate this file by specifying --preds_output_folder for scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py.
The neural rescorer rescores the beams/candidates using two parameters, rescorer_alpha and rescorer_beta, as follows:
final_score = beam_search_score + rescorer_alpha*neural_rescorer_score + rescorer_beta*seq_length
The parameter rescorer_alpha specifies the importance placed on the neural rescorer model, while rescorer_beta is a penalty term that accounts for sequence length in the scores. These parameters have similar effects to beam_alpha and beam_beta in the beam search decoder and N-gram language model.
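The formula above can be sketched in a few lines of Python; the word-level sequence length and the example numbers below are illustrative assumptions, not values from the NeMo scripts:

```python
# Sketch of the rescoring formula: combine beam-search scores with
# neural-LM scores and a length term, then re-rank the candidates.
def rescore(candidates, beam_scores, lm_scores, alpha, beta):
    final = []
    for text, bs, ls in zip(candidates, beam_scores, lm_scores):
        seq_length = len(text.split())  # word-level length, for illustration
        final.append((bs + alpha * ls + beta * seq_length, text))
    # Highest combined score ranks first.
    final.sort(reverse=True)
    return final

ranked = rescore(
    ["the cat sat", "the cat sad"],
    beam_scores=[-1.2, -1.0],  # from beam search decoding
    lm_scores=[-5.0, -9.0],    # from the neural rescorer
    alpha=0.5,
    beta=0.1,
)
# ranked[0][1] is the best candidate after rescoring
```

Note how a candidate that beam search slightly preferred can be demoted once the neural LM's score is mixed in.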
Use the following steps to evaluate a neural LM:
Obtain a .tsv file with the beams and their corresponding scores. The scores can come from a regular beam search decoder or from fusion with N-gram LM scores. For a given beam size beam_size and a number of evaluation examples num_eval_examples, the file should contain (num_eval_examples x beam_size) lines of the form beam_candidate_text \t score. This file can be generated by scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py.
Rescore the candidates with scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py.
python eval_neural_rescorer.py \
    --lm_model=[path to .nemo file of the LM or the name of a HF pretrained model] \
    --beams_file=[path to beams .tsv file] \
    --beam_size=[size of the beams] \
    --eval_manifest=[path to eval manifest .json file] \
    --batch_size=[batch size used for inference on the LM model] \
    --alpha=[the value for the parameter rescorer_alpha] \
    --beta=[the value for the parameter rescorer_beta] \
    --scores_output_file=[the optional path to store the rescored candidates]
The candidates, along with their new scores, are stored in the file specified by --scores_output_file.
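For reference, a beams .tsv file of this shape can be read back and grouped per utterance as in the following sketch; the helper name is illustrative, and the tab-separated text/score format is assumed from the description above:

```python
# Sketch: load a beams .tsv file (one "candidate_text<TAB>score" line per
# beam entry, beam_size consecutive lines per utterance) and group the
# entries into per-utterance beams.
import csv

def read_beams(path, beam_size):
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for text, score in csv.reader(f, delimiter="\t"):
            rows.append((text, float(score)))
    # Each utterance contributes exactly beam_size consecutive lines.
    assert len(rows) % beam_size == 0, "line count must be a multiple of beam_size"
    return [rows[i:i + beam_size] for i in range(0, len(rows), beam_size)]
```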
The following is the list of the arguments for the evaluation script:
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| lm_model | str | Required | The path of the '.nemo' file of the language model, or the name of a Hugging Face pretrained model like 'transfo-xl-wt103' or 'gpt2'. |
| eval_manifest | str | Required | Path to the evaluation manifest file (.json manifest file). |
| beams_file | str | Required | Path to the beams file (.tsv) containing the candidates and their scores. |
| beam_size | int | Required | The width of the beams (number of candidates) generated by the decoder. |
| alpha | float | None | The value for the parameter rescorer_alpha. Not passing a value enables a linear search for rescorer_alpha. |
| beta | float | None | The value for the parameter rescorer_beta. Not passing a value enables a linear search for rescorer_beta. |
| batch_size | int | 16 | The batch size used to calculate the scores. |
| max_seq_length | int | 512 | Maximum sequence length (in tokens) for the input. |
| scores_output_file | str | None | The optional file to store the rescored beams. |
| use_amp | bool | | Whether to use AMP, if available, to calculate the scores. |
| device | str | cuda | The device to load the LM model onto for calculating the scores. It can be 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc. |
Hyperparameter Linear Search#
The evaluation script supports a linear search for the parameters alpha and beta. If either of the two is not provided, a linear search is performed to find the best value for that parameter. When linear search is used, beta is initially set to zero and the best value for alpha is found; then alpha is fixed at that value and another linear search is done to find the best value for beta. If either of these two parameters is already specified, the search for that parameter is skipped. After each search for a parameter, a plot of WER% for different values of the parameter is also shown.
It is recommended to first run the linear search for both parameters on a validation set by not providing any values for --alpha and --beta. Then inspect the WER curves and decide on the best value for each parameter. Finally, evaluate the chosen values on the test set.
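The search procedure described above amounts to a coordinate-wise sweep, which can be sketched as follows; wer_at is a hypothetical callback (not part of the NeMo scripts) that rescores the validation beams with a given (alpha, beta) pair and returns the WER%:

```python
# Sketch of the coordinate-wise linear search described above.
# wer_at(alpha, beta) is a hypothetical callback: it rescores the
# validation beams with the given parameters and returns the WER%.
def linear_search(wer_at, alphas, betas):
    # Step 1: fix beta = 0 and sweep alpha for the lowest WER.
    best_alpha = min(alphas, key=lambda a: wer_at(a, 0.0))
    # Step 2: fix the best alpha and sweep beta.
    best_beta = min(betas, key=lambda b: wer_at(best_alpha, b))
    return best_alpha, best_beta
```

Because the two parameters are tuned one at a time rather than jointly, the result is a good but not necessarily globally optimal pair, which is why inspecting the WER curves afterwards is worthwhile.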