ASR Advanced Details

Confidence Estimates

Riva supports both utterance-level and word-level confidence estimates. A higher confidence indicates a greater estimated likelihood that the associated word or utterance is correct. This is an experimental feature: do not rely on the accuracy, format, or presence of these confidences. Currently, utterance-level and word-level confidence estimates can be roughly interpreted as estimated natural-log probabilities. How confidence scores are estimated varies by decoder. The following table gives a general idea of the estimation method used by each decoder that Riva ASR supports.
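Because the scores are only roughly natural-log probabilities, one way to make them easier to read is to exponentiate them back onto a 0 to 1 scale. The sketch below assumes exactly that rough interpretation. The helper name `confidence_to_probability` and the sample values are hypothetical and are not part of the Riva API. Since the interpretation is approximate, and some decoders mix in language-model scores, the result is only a relative indicator and not a calibrated probability.

```python
import math


def confidence_to_probability(log_conf: float) -> float:
    """Map a confidence, read as an approximate natural-log probability, onto [0, 1].

    Assumption: the score is roughly ln(p). This does not hold exactly for every
    decoder, so treat the result as a relative indicator, not a calibrated probability.
    """
    # Clamp at 0 so a slightly positive score cannot produce a value above 1.
    return math.exp(min(log_conf, 0.0))


# Hypothetical word confidences, e.g. copied by hand from a recognition result.
word_confidences = {"hello": -0.05, "world": -2.3}
for word, conf in word_confidences.items():
    print(f"{word}: ln p = {conf:+.2f}, p ~= {confidence_to_probability(conf):.2f}")
```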

Decoder	Word Confidence	Utterance Confidence
Greedy	Minimum log probability across the span of acoustic frames that represent the word, excluding blank tokens.	Mean of the word confidences.
OpenSeq2Seq (os2s)	Scores are accumulated by a CTC prefix beam search with a language model (LM). A word's score is the accumulation over the frames associated with that word.	Scores are accumulated in the same way over the entire utterance.
Flashlight	Roughly the sum of log acoustic model (AM) probabilities plus LM scores over the frames of the word.	Roughly the sum of log AM probabilities plus LM scores over the whole utterance.
Kaldi	Log probability of the word, given by the associated arc in the lattice.	Log probability of the utterance, given by the associated path through the lattice.
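To make the Greedy row concrete, here is a minimal sketch of how such scores could be computed from a CTC model's per-frame log probabilities. It is not Riva's implementation. It assumes a character-level vocabulary where index 0 (`BLANK`) is the CTC blank and the token `" "` separates words. The function name `greedy_word_confidences` is hypothetical. Each word's confidence is the minimum log probability over its non-blank frames, and the utterance confidence is the mean of the word confidences, following the table.

```python
import math
from typing import List, Optional, Sequence, Tuple

BLANK = 0  # assumption: vocabulary index 0 is the CTC blank token


def greedy_word_confidences(
    log_probs: Sequence[Sequence[float]],  # [frames][vocab] natural-log probabilities
    vocab: Sequence[str],                  # token strings; " " is assumed to separate words
) -> Tuple[List[Tuple[str, float]], Optional[float]]:
    """Greedy CTC decoding with min-over-frames word confidence (hypothetical helper)."""
    words: List[Tuple[str, float]] = []
    chars: List[str] = []
    word_min = math.inf
    prev: Optional[int] = None

    for frame in log_probs:
        best = max(range(len(frame)), key=frame.__getitem__)  # greedy: argmax per frame
        token = vocab[best]
        if best != BLANK and token == " ":
            # Word boundary: emit the word collected so far, if any.
            if chars:
                words.append(("".join(chars), word_min))
            chars, word_min = [], math.inf
        elif best != BLANK:
            # CTC collapse: a token repeated on consecutive frames is emitted once.
            if best != prev:
                chars.append(token)
            # Every non-blank frame of the word can lower its confidence.
            word_min = min(word_min, frame[best])
        # Blank frames are skipped for confidence but still reset the repeat check.
        prev = best

    if chars:
        words.append(("".join(chars), word_min))

    # Utterance confidence: mean word confidence (None if nothing was recognized).
    utterance = sum(c for _, c in words) / len(words) if words else None
    return words, utterance


# Usage with a toy four-token vocabulary and hand-made frame probabilities.
vocab = ["_", " ", "h", "i"]
probs = [
    [0.05, 0.02, 0.90, 0.03],  # "h"
    [0.30, 0.05, 0.60, 0.05],  # "h" again (collapsed)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.10, 0.70],  # "i"
    [0.05, 0.90, 0.03, 0.02],  # word separator
    [0.20, 0.20, 0.10, 0.50],  # "i"
]
log_probs = [[math.log(p) for p in row] for row in probs]
words, utterance = greedy_word_confidences(log_probs, vocab)
print(words, utterance)
```

Taking the minimum makes a word only as confident as its weakest frame. This matches the Greedy description above. The beam-search decoders instead accumulate scores over frames.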
