Methods for reducing spurious insertions in speech recognition
First Claim
1. A method of automatically generating a phonetic baseform from a spoken utterance, the method comprising the steps of:
- obtaining a stream of acoustic observations representing the spoken utterance;
- generating a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, wherein a score associated with the single candidate subphone is equal to the sum of scores associated with the merged candidate subphones such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event, and further wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
- converting the sequence of subphone units into a phonetic baseform.
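The final step of the claim, converting a subphone sequence into a phonetic baseform, can be illustrated with a minimal sketch. The claim does not say how subphones are named or mapped to phones, so everything here is an assumption made for illustration: the hypothetical function `subphones_to_baseform`, the `PHONE_position` naming scheme (for example `K_1`, `K_2`, `K_3` for the beginning, middle, and end of `K`), and the rule that a new phone begins when the phone label changes or the position index drops.

```python
def subphones_to_baseform(subphones):
    """Collapse a subphone sequence into a phone sequence (a phonetic baseform).

    Assumption (not from the source): each subphone is named "PHONE_position",
    e.g. "AE_2" is the middle subphone of phone "AE".
    """
    phones = []
    prev_phone, prev_pos = None, 0
    for unit in subphones:
        phone, pos_text = unit.rsplit("_", 1)
        pos = int(pos_text)
        # A new phone starts when the label changes, or when the position
        # index falls back (e.g. AA_3 followed by AA_1 is a second AA).
        if phone != prev_phone or pos < prev_pos:
            phones.append(phone)
        prev_phone, prev_pos = phone, pos
    return phones


if __name__ == "__main__":
    units = ["K_1", "K_2", "K_3", "AE_1", "AE_2", "AE_3", "T_1", "T_2", "T_3"]
    print(subphones_to_baseform(units))
```

Under these assumptions, a spurious short-lived subphone in the input would surface as an extra phone in the baseform, which is why the merging step in the claim matters before this conversion.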
Abstract
Techniques are provided for improving an automatic baseform generation system. More particularly, the invention provides techniques for reducing the insertion of spurious speech events in a word or phone sequence generated by such a system. This reduction may be accomplished by enhancing the scores of long-lasting speech events relative to the scores of short-lasting events. For example, competing candidates that relate to the same speech event (e.g., a phone or word) and that overlap in time may be merged into a single candidate, the score of which may be equal to the sum of the scores of the merged candidates.
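The merging idea described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the `Candidate` record, the `merge_overlapping` function, the frame-index time representation, and the use of non-negative probability-like scores are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str    # speech event, e.g. a phone or subphone name
    start: int    # first frame covered (inclusive)
    end: int      # last frame covered (inclusive)
    score: float  # assumed non-negative, probability-like score


def merge_overlapping(candidates):
    """Merge candidates that share a label and overlap in time, summing scores."""
    merged = []
    # Sorting by (label, start) puts same-label candidates next to each other
    # in time order, so each one only needs comparing with the last merged one.
    for cand in sorted(candidates, key=lambda c: (c.label, c.start)):
        last = merged[-1] if merged else None
        if last is not None and last.label == cand.label and cand.start <= last.end:
            last.end = max(last.end, cand.end)
            last.score += cand.score  # merged score = sum of merged scores
        else:
            merged.append(Candidate(cand.label, cand.start, cand.end, cand.score))
    return merged


if __name__ == "__main__":
    hypotheses = [
        Candidate("AA", 0, 10, 0.25),
        Candidate("AA", 3, 12, 0.25),
        Candidate("AA", 5, 14, 0.125),
        Candidate("T", 6, 7, 0.5),  # short-lived competitor
    ]
    for cand in merge_overlapping(hypotheses):
        print(cand)
```

In this made-up example the short `T` candidate outscores every individual `AA` candidate before merging, but the three overlapping `AA` candidates combine into one candidate whose summed score is higher. That is the sense in which a longer-lasting event, which tends to attract more overlapping candidates, is favored over a shorter-lasting one.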
17 Claims
Specification