ASR Advanced Details

Confidence Estimates

Riva supports both utterance-level and word-level confidence estimates. A higher confidence indicates a greater estimated likelihood that the associated word or utterance is correct. This is an experimental feature: do not rely on the accuracy, format, or presence of these confidences. Currently, utterance-level and word-level confidence estimates can be roughly interpreted as estimated natural-log probabilities, so values are at most 0 and values closer to 0 indicate higher confidence. How confidence scores are estimated varies by decoder. The following table gives a general idea of the confidence estimation method for each decoder supported by Riva ASR.

| Decoder | Word Confidence | Utterance Confidence |
| --- | --- | --- |
| Greedy | Minimum log probability across the span of acoustic frames that represent the word, excluding blank tokens. | Mean word confidence. |
| OpenSeq2Seq (os2s) | Scores are accumulated via a prefix beam search for CTC with an LM. Word scores are the accumulation from the frames associated with that word. | Scores are accumulated as above for the entire utterance. |
| Flashlight | Roughly a simple sum of log AM probabilities plus LM scores for the frames of the word. | Roughly a simple sum of log AM probabilities plus LM scores for the whole utterance. |
| Kaldi | Log probability of the word given by the associated arc in the lattice. | Log probability of the utterance given by the associated path through the lattice. |
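
To make the greedy-decoder description concrete, the following is a minimal sketch in plain Python. It is not Riva's implementation and does not use the Riva API. The `BLANK` label, the `(token, log_prob)` frame representation, and all function names are hypothetical, chosen only for illustration. It assumes each acoustic frame carries the natural-log probability of the token emitted for that frame.

```python
import math

# Hypothetical label for the CTC blank token (an assumption, not a Riva constant).
BLANK = "<blank>"


def word_confidence(frames):
    """Minimum log probability over the non-blank frames that span one word.

    `frames` is a list of (token, log_prob) pairs, one per acoustic frame.
    """
    non_blank = [log_prob for token, log_prob in frames if token != BLANK]
    if not non_blank:
        raise ValueError("a word must span at least one non-blank frame")
    return min(non_blank)


def utterance_confidence(word_confidences):
    """Mean of the word-level confidences in the utterance."""
    if not word_confidences:
        raise ValueError("an utterance must contain at least one word")
    return sum(word_confidences) / len(word_confidences)


def to_probability(log_confidence):
    """Convert a natural-log confidence into a rough probability in (0, 1]."""
    return math.exp(log_confidence)


# Two words, each described by the frames that represent it (made-up values).
words = [
    [("h", -0.1), (BLANK, -0.01), ("i", -0.3)],
    [("y", -0.2), ("o", -0.05)],
]

word_scores = [word_confidence(frames) for frames in words]
utterance_score = utterance_confidence(word_scores)

print("word confidences:", word_scores)
print("utterance confidence:", utterance_score)
print("utterance probability (approx.):", to_probability(utterance_score))
```

Because the scores are roughly natural-log probabilities, `math.exp` maps them back to a value between 0 and 1. Since the feature is experimental, treat such values as a relative indication, not a calibrated probability.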