ASR Advanced Details
Confidence Estimates
Riva supports both utterance-level and word-level confidence estimates. A higher confidence indicates a higher estimated likelihood that the associated word or utterance is correct.

This is an experimental feature. Do not rely on the accuracy, format, or presence of these confidences. Currently, utterance-level and word-level confidence estimates can be roughly interpreted as estimated natural-log probabilities.

The way confidence scores are estimated varies by decoder. The following table gives a general idea of the confidence estimation method for each decoder supported by Riva ASR.
Decoder | Word Confidence | Utterance Confidence |
---|---|---|
Greedy | Minimum log probability across the span of acoustic frames that represent the word, excluding blank tokens. | Mean word confidence. |
OpenSeq2Seq (os2s) | Scores are accumulated via a prefix beam search for CTC with a language model (LM). Word scores are the accumulation over the frames associated with that word. | Scores are accumulated as for words, but over the entire utterance. |
Flashlight | Roughly a simple sum of log acoustic model (AM) probabilities plus LM scores for the frames of the word. | Roughly a simple sum of log AM probabilities plus LM scores for the whole utterance. |
Kaldi | Log probability of the word given by the associated arc in the lattice. | Log probability of the utterance given by the associated path through the lattice. |
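
To make the Greedy row concrete, the following is a minimal sketch of that description: the word confidence is the minimum log probability over the word's non-blank frames, and the utterance confidence is the mean of the word confidences. It is an illustration only, not Riva's implementation. The per-frame `(token, log_prob)` representation, the `BLANK` symbol, the word spans, and the function names are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical blank symbol; a real CTC model defines its own blank token.
BLANK = "<blank>"

# One entry per acoustic frame: (best token, natural-log probability).
Frame = Tuple[str, float]


def word_confidence(frames: Sequence[Frame], start: int, end: int) -> float:
    """Minimum log probability over the non-blank frames in [start, end)."""
    scores = [logp for token, logp in frames[start:end] if token != BLANK]
    if not scores:
        raise ValueError("word span contains no non-blank frames")
    return min(scores)


def utterance_confidence(word_scores: Sequence[float]) -> float:
    """Mean of the word confidences."""
    if not word_scores:
        raise ValueError("utterance contains no words")
    return sum(word_scores) / len(word_scores)


# Toy data: log probabilities chosen to be exactly representable as floats.
# The blank frame inside "hi" has a low score to show that it is ignored.
frames: List[Frame] = [
    ("h", -0.125), (BLANK, -2.0), ("i", -0.25),
    (BLANK, -0.03125),
    ("y", -0.5), ("o", -0.75),
]
# Hypothetical word alignment: (word, first frame, one past the last frame).
spans = [("hi", 0, 3), ("yo", 4, 6)]

word_scores = [word_confidence(frames, s, e) for _, s, e in spans]
utt_score = utterance_confidence(word_scores)

for (word, _, _), score in zip(spans, word_scores):
    print(f"{word}: log-prob {score}, probability {math.exp(score):.3f}")
print(f"utterance: log-prob {utt_score}, probability {math.exp(utt_score):.3f}")
```

With this toy input the word confidences are -0.25 and -0.75, and the utterance confidence is -0.5. Because the estimates can only be roughly interpreted as natural-log probabilities, applying `math.exp` gives approximate probabilities (about 0.78, 0.47, and 0.61 here) that are best treated as relative indicators, not calibrated values.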