ASR Advanced Details

Confidence Estimates

Riva supports both utterance-level and word-level confidence estimates. A higher confidence indicates a higher estimated likelihood that the associated word or utterance is correct. This is an experimental feature: the accuracy, format, and presence of these confidences should not be relied upon. Currently, utterance-level and word-level confidence estimates can be roughly interpreted as estimated natural-log probabilities. Under that interpretation, a confidence of -0.1 corresponds to a probability of roughly 0.90 (e^-0.1), and values closer to 0 indicate higher confidence. How confidence scores are estimated varies by decoder. The following table gives a general idea of the confidence estimation method for each decoder supported by Riva ASR.

| Decoder | Word Confidence | Utterance Confidence |
| --- | --- | --- |
| Greedy | Minimum log probability across the span of acoustic frames that represent the word, excluding blank tokens. | Mean word confidence. |
| OpenSeq2Seq (os2s) | Scores are accumulated via a prefix beam search for CTC with an LM. Word scores are the accumulation from the frames associated with that word. | Scores are accumulated as above for the entire utterance. |
| Flashlight | Roughly a simple sum of log AM probabilities plus LM scores for the frames of the word. | Roughly a simple sum of log AM probabilities plus LM scores for the whole utterance. |
| Kaldi | Log probability of the word given by the associated arc in the lattice. | Log probability of the utterance given by the associated path through the lattice. |
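
To make the Greedy row concrete, the following is a minimal sketch of how a word confidence (minimum non-blank frame log probability) and an utterance confidence (mean of the word confidences) could be computed, and how a natural-log value maps back to a probability. It is an illustration only, not Riva's implementation: the per-frame `(token, log_prob)` format, the blank symbol `_`, the sample values, and the function names `greedy_word_confidence`, `utterance_confidence`, and `to_probability` are all assumptions made for this example.

```python
import math

# Hypothetical blank symbol; the real CTC blank token depends on the model.
BLANK = "_"


def greedy_word_confidence(frames):
    """Return the minimum log probability over a word's non-blank frames.

    `frames` is an assumed format: a list of (token, log_prob) pairs
    covering the acoustic frames aligned to one word.
    """
    non_blank = [log_prob for token, log_prob in frames if token != BLANK]
    if not non_blank:
        raise ValueError("word span contains no non-blank frames")
    return min(non_blank)


def utterance_confidence(word_confidences):
    """Return the mean of the word-level confidences."""
    if not word_confidences:
        raise ValueError("utterance contains no words")
    return sum(word_confidences) / len(word_confidences)


def to_probability(log_confidence):
    """Interpret a confidence as a natural-log probability (rough estimate)."""
    return math.exp(log_confidence)


# Made-up frame outputs for a two-word utterance. The blank frame in
# "hello" has the lowest log probability but is excluded from the minimum.
words = {
    "hello": [("h", -0.125), ("_", -1.5), ("e", -0.25), ("l", -0.125), ("o", -0.0625)],
    "world": [("w", -0.5), ("_", -0.0625), ("o", -0.75), ("r", -0.125), ("d", -0.25)],
}

word_conf = {word: greedy_word_confidence(frames) for word, frames in words.items()}
utt_conf = utterance_confidence(list(word_conf.values()))

for word, conf in word_conf.items():
    print(f"{word}: log confidence {conf}, probability ~{to_probability(conf):.3f}")
print(f"utterance: log confidence {utt_conf}, probability ~{to_probability(utt_conf):.3f}")
```

Because the minimum is taken over frames, a single poorly scored frame pulls the whole word's confidence down, which makes this estimate conservative. The other decoders in the table accumulate scores differently, so confidence values are not directly comparable across decoders.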