Neural Rescoring#

In the neural rescoring approach, a neural network scores candidates produced by the ASR model. A candidate is a text transcript predicted by the ASR model's decoder. The top K candidates produced by beam search decoding (with a beam width of K) are passed to a neural language model for ranking. The language model assigns a score to each candidate, which is usually combined with the scores from beam search decoding to produce the final scores and ranking.

Train Neural Rescorer#

An example script for training such a language model with a Transformer can be found at examples/nlp/language_modeling/transformer_lm.py. It trains a TransformerLMModel, which can be used as a neural rescorer for an ASR system. For more information on training language models, see the LLM/NLP documentation.

You can also use a pretrained language model from the Hugging Face library, such as Transformer-XL or GPT, instead of training your own model. Models like BERT and RoBERTa are not supported by this script because they are trained as masked language models and, as a result, are not efficient or effective for scoring sentences out of the box.
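For intuition, here is a minimal sketch of how a causal Hugging Face model can assign a log-likelihood score to a candidate transcript. It is illustrative only; the gpt2 checkpoint and the score_candidate helper are this example's choices, not part of the NeMo scripts:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def score_candidate(text):
    # Total log-likelihood of `text` under the causal LM.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # `loss` is the mean cross-entropy over the predicted tokens,
        # so negate and multiply by their count to recover the summed
        # log-likelihood.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

print(score_candidate("the quick brown fox jumps over the lazy dog"))

Higher (less negative) scores indicate candidates the LM finds more plausible, which is exactly the signal the rescorer combines with the beam search scores.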

Evaluation#

Given a trained TransformerLMModel .nemo file or a pretrained HF model, the script available at scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py can be used to re-score the beams obtained with an ASR model. To use this script, you need the .tsv file containing the candidates produced by the acoustic model and the beam search decoding. The candidates can be the result of beam search decoding alone or of fusion with an N-gram LM. You can generate this file by specifying --preds_output_folder for scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py.
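For reference, each line of that .tsv file pairs one beam candidate with its score, separated by a tab. A made-up three-candidate beam for a single utterance might look like this (the transcripts and scores are illustrative placeholders, not real output):

the cat sat on the mat	-3.27
the cat sat on a mat	-4.11
the cats sat on the mat	-4.89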

The neural rescorer rescores the beams/candidates using two parameters, rescorer_alpha and rescorer_beta, as follows:

final_score = beam_search_score + rescorer_alpha * neural_rescorer_score + rescorer_beta * seq_length

The parameter rescorer_alpha specifies the importance placed on the neural rescorer model, while rescorer_beta is a penalty term that accounts for sequence length in the scores. These parameters have similar effects to beam_alpha and beam_beta in the beam search decoder and N-gram language model.
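For illustration, the formula above amounts to a few lines of Python. The data layout and variable names here are this example's assumptions, and seq_length is approximated by a whitespace token count purely for illustration:

def rescore_beam(beam, rescorer_alpha, rescorer_beta):
    # `beam` holds (candidate_text, beam_search_score, neural_rescorer_score)
    # triples for one utterance; the layout is illustrative.
    rescored = []
    for text, beam_score, lm_score in beam:
        seq_length = len(text.split())  # illustrative proxy for sequence length
        final_score = beam_score + rescorer_alpha * lm_score + rescorer_beta * seq_length
        rescored.append((text, final_score))
    # The candidate with the highest final score becomes the output transcript.
    return max(rescored, key=lambda pair: pair[1])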

Use the following steps to evaluate a neural LM:

  1. Obtain a .tsv file with the beams and their corresponding scores. The scores can come from a regular beam search decoder or from fusion with N-gram LM scores. For a given beam size beam_size and a number of evaluation examples num_eval_examples, the file should contain (num_eval_examples x beam_size) lines of the form beam_candidate_text \t score. This file can be generated by scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py.

  2. Rescore the candidates with scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py:

python eval_neural_rescorer.py \
    --lm_model=[path to .nemo file of the LM or the name of a HF pretrained model] \
    --beams_file=[path to beams .tsv file] \
    --beam_size=[size of the beams] \
    --eval_manifest=[path to eval manifest .json file] \
    --batch_size=[batch size used for inference on the LM model] \
    --alpha=[the value for the parameter rescorer_alpha] \
    --beta=[the value for the parameter rescorer_beta] \
    --scores_output_file=[the optional path to store the rescored candidates]

The candidates, along with their new scores, are stored in the file specified by --scores_output_file.
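If you want to post-process that file, a small sketch like the following can pick the best hypothesis per utterance. It assumes the rescored file keeps a candidate-then-score tab-separated line layout with beam_size consecutive lines per utterance; verify this against the actual output before relying on it:

def best_hypotheses(scores_file, beam_size):
    # Assumes "candidate<TAB>score" lines, with beam_size consecutive
    # lines per utterance -- check this layout against your output file.
    with open(scores_file, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    best = []
    for start in range(0, len(rows), beam_size):
        group = rows[start:start + beam_size]
        top = max(group, key=lambda row: float(row[-1]))
        best.append(top[0])
    return best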

The following table lists the arguments of the evaluation script:

Argument           | Type  | Default  | Description
-------------------|-------|----------|------------------------------------------------------------
lm_model           | str   | Required | The path of the '.nemo' file of the trained LM, or the name of a Hugging Face pretrained model like 'transfo-xl-wt103' or 'gpt2'.
eval_manifest      | str   | Required | Path to the evaluation manifest file (.json manifest file).
beams_file         | str   | Required | Path to the beams file (.tsv) containing the candidates and their scores.
beam_size          | int   | Required | The width of the beams (number of candidates) generated by the decoder.
alpha              | float | None     | The value for the parameter rescorer_alpha. If not passed, a linear search for rescorer_alpha is performed.
beta               | float | None     | The value for the parameter rescorer_beta. If not passed, a linear search for rescorer_beta is performed.
batch_size         | int   | 16       | The batch size used to calculate the scores.
max_seq_length     | int   | 512      | Maximum sequence length (in tokens) for the input.
scores_output_file | str   | None     | The optional file to store the rescored beams.
use_amp            | bool  | False    | Whether to use AMP, if available, to calculate the scores.
device             | str   | cuda     | The device the LM model is loaded onto to calculate the scores. It can be 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.