Automatic speech recognition learning using user corrections
First Claim
1. A computer-implemented method of learning with an automatic speech recognition system, the method comprising:
detecting a change to a word included in a collection of dictated text, the change producing a changed version of the word;
utilizing a computer processor that is a component of a computing device to automatically infer whether the change is a correction or editing;
if the change is inferred to be a correction, selectively learning from the nature of the correction without additional user interaction;
wherein selectively learning from the nature of the correction includes;
making a first determination as to whether a user's pronunciation, during an utterance that gave rise to the dictated text, deviated from existing pronunciations known by the system, the utterance including an utterance of said word as well as a related context word, and wherein making the first determination comprises;
doing a forced alignment of a wave that corresponds to the utterance of said word and the related context word;
analyzing the forced alignment so as to identify a portion of the wave that is the user's pronunciation of said word;
generating a confidence score based at least upon a distance of the user's pronunciation of said word to each of a plurality of possible pronunciations;
wherein the confidence score is calculated using the function 1/[d/f/log(len1+len2)], where d is the distance between the user's pronunciation of said word to one of said possible pronunciations, f is a frequency that the user's pronunciation of said word is pronounced, and len1 and len2 are values representing the length of phonemes;
making a second determination as to whether said word is included in the existing lexicon known by the system; and
if the second determination indicates that said word does exist in the existing lexicon, and if the first determination indicates the user's pronunciation of said word is in said existing pronunciations known by the system, then selectively changing a parameter associated within the system with the user's pronunciation of said word.
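The confidence function recited in the claim, 1/[d/f/log(len1+len2)], is algebraically equal to f·log(len1+len2)/d, so the score grows with the frequency f and shrinks as the distance d grows. The following is a minimal Python sketch of that arithmetic only, not the patented implementation. It assumes a natural logarithm (the claim does not state the base) and treats len1 and len2 as phoneme counts of the two pronunciations being compared. The function name `confidence_score` and the choice to reject inputs for which the expression is undefined are hypothetical.

```python
import math


def confidence_score(d: float, f: float, len1: int, len2: int) -> float:
    """Evaluate 1/[d/f/log(len1+len2)] as written in the claim.

    Assumptions (not stated in the claim): natural logarithm, and
    len1/len2 are phoneme counts of the two pronunciations compared.
    """
    if d <= 0 or f <= 0:
        # d == 0 or f == 0 would make the expression undefined.
        raise ValueError("d and f must be positive")
    if len1 + len2 <= 1:
        # log(1) == 0 would put a zero in a denominator.
        raise ValueError("len1 + len2 must be greater than 1")
    return 1.0 / (d / f / math.log(len1 + len2))


# A closer pronunciation (smaller d) scores higher, all else being equal.
near = confidence_score(d=1.0, f=4.0, len1=3, len2=5)
far = confidence_score(d=2.0, f=4.0, len1=3, len2=5)
```

Under these assumptions, d = 2.0, f = 4.0, and len1 + len2 = 8 give 4·ln(8)/2, or roughly 4.16, and halving the distance doubles the score.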
Abstract
An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
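The flow described here (classify the change, then learn only from corrections) can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not the system's actual design: the names `ChangeKind` and `learning_action` and the returned labels are hypothetical, and the only learning branch shown is the one recited in the first claim (word already in the lexicon, pronunciation already known). Other cases are deliberately left unspecified.

```python
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical labels for the two outcomes of the inference step."""
    EDIT = "edit"              # the user changed their mind
    CORRECTION = "correction"  # the recognizer made an error


def learning_action(kind: ChangeKind, word_in_lexicon: bool,
                    pronunciation_known: bool) -> str:
    """Return a label for what the system would do with a detected change."""
    if kind is ChangeKind.EDIT:
        # Plain editing carries no evidence of a recognition error.
        return "no_learning"
    if word_in_lexicon and pronunciation_known:
        # The branch recited in the first claim: adjust a parameter
        # associated with the user's pronunciation of the word.
        return "adjust_parameter"
    # Remaining combinations are outside what this sketch covers.
    return "not_covered_here"
```

Keeping classification separate from the learning decision mirrors the abstract's two-stage description and lets each stage be tested on its own.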
6 Claims
5. A computer-implemented method of learning with an automatic speech recognition system, the method comprising:
detecting a change to a word included in a collection of dictated text, the change producing a changed version of the word;
utilizing a computer processor that is a component of a computing device to automatically infer whether the change is a correction or editing;
wherein inferring whether the change is a correction or editing includes comparing a speech recognition engine score of the dictated text and of the changed text;
if the change is inferred to be a correction, selectively learning from the nature of the correction without additional user interaction;
wherein selectively learning from the nature of the correction includes;
making a first determination as to whether a user's pronunciation, during an utterance that gave rise to the dictated text, deviated from existing pronunciations known by the system, and wherein making the first determination comprises generating a confidence score based at least upon a distance of the user's pronunciation of said word to each of a plurality of possible pronunciations;
wherein the confidence score is calculated using the function 1/[d/f/log(len1+len2)], where d is the distance between the user's pronunciation of said word to one of said possible pronunciations, f is a frequency that the user's pronunciation of said word is pronounced, and len1 and len2 are values representing the length of phonemes;
making a second determination as to whether said word is included in the existing lexicon known by the system; and
if the second determination indicates that said word does exist in the existing lexicon, and if the first determination indicates the user's pronunciation of said word is in said existing pronunciations known by the system, then selectively changing a parameter associated within the system with the user's pronunciation of said word.
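Claim 5 adds that the correction-versus-editing inference includes comparing the recognition engine's score for the dictated text with its score for the changed text. The claim does not say how the comparison is resolved, so the sketch below rests on loud assumptions: higher scores mean a better match to the same audio, a changed text that matches at least as well as the dictated text (within a tolerance) points to a recognition error, and the `margin` parameter and the function name `infer_change_kind` are hypothetical.

```python
def infer_change_kind(dictated_score: float, changed_score: float,
                      margin: float = 0.0) -> str:
    """Classify a change as "correction" or "editing" from two engine scores.

    Assumptions (not from the claim): both scores rate the text against the
    same audio, higher is better, and `margin` is an illustrative tolerance.
    """
    if changed_score >= dictated_score - margin:
        # The changed text fits the audio about as well or better,
        # which suggests the recognizer misheard the user.
        return "correction"
    # The changed text fits the audio clearly worse, which suggests the
    # user simply decided to say something different.
    return "editing"
```

For example, under these assumptions `infer_change_kind(0.4, 0.7)` is treated as a correction and `infer_change_kind(0.8, 0.2)` as editing. A real system would likely combine this signal with others before deciding.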
Specification