Apparatus for Reducing Spurious Insertions in Speech Recognition
Abstract
Techniques for improving an automatic baseform generation system. More particularly, the invention provides techniques for reducing insertion of spurious speech events in a word or phone sequence generated by an automatic baseform generation system. Such automatic baseform generation techniques may be accomplished by enhancing the scores of long-lasting speech events with respect to the scores of short-lasting events. For example, this may be achieved by merging competing candidates that relate to the same speech event (e.g., phone or word) and that overlap in time into a single candidate, the score of which may be equal to the sum of the scores of the merged candidates.
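The merging step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The `Candidate` record, its field names, the frame-index time units, and the function `merge_overlapping` are all hypothetical, and scores are assumed to be non-negative, linear-domain values so that summing them is meaningful.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one competing hypothesis (not from the source)."""
    label: str    # speech event identity, e.g. a phone or subphone
    start: int    # first frame covered (inclusive)
    end: int      # last frame covered (inclusive)
    score: float  # assumed non-negative, linear-domain score


def merge_overlapping(candidates):
    """Merge same-label candidates that overlap in time, summing their scores."""
    merged = []
    # Sorting by label and start time puts overlapping spans of one label side by side.
    for cand in sorted(candidates, key=lambda c: (c.label, c.start, c.end)):
        last = merged[-1] if merged else None
        if last is not None and last.label == cand.label and cand.start <= last.end:
            # Same event, overlapping in time: extend the span and add the score.
            last.end = max(last.end, cand.end)
            last.score += cand.score
        else:
            # Copy so the caller's candidates are never mutated.
            merged.append(Candidate(cand.label, cand.start, cand.end, cand.score))
    return merged


# Three overlapping hypotheses for a long "AA" compete with one short "T".
candidates = [
    Candidate("AA", 0, 4, 0.25),
    Candidate("AA", 2, 7, 0.25),
    Candidate("AA", 5, 9, 0.25),
    Candidate("T", 3, 4, 0.5),
]
merged = merge_overlapping(candidates)
best = max(merged, key=lambda c: c.score)  # best.label is "AA" with score 0.75
```

Before merging, the short "T" candidate (0.5) outscores every individual "AA" candidate (0.25 each). After merging, the single long "AA" candidate carries 0.75, so the short event no longer wins. This is the sense in which summing enhances the score of a longer-lasting event.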
12 Claims
1. Apparatus for automatically generating a phonetic baseform from a spoken utterance, the apparatus comprising:
a memory; and
at least one processor coupled to the memory and operative to:
(i) obtain a stream of acoustic observations representing the spoken utterance;
(ii) generate a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, wherein a score associated with the single candidate subphone is equal to the sum of scores associated with the merged candidate subphones such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event, and further wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
(iii) convert the sequence of subphone units into a phonetic baseform.
11. An article of manufacture for automatically generating a phonetic baseform from a spoken utterance, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
obtaining a stream of acoustic observations representing the spoken utterance;
generating a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, wherein a score associated with the single candidate subphone is equal to the sum of scores associated with the merged candidate subphones such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event, and further wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
converting the sequence of subphone units into a phonetic baseform.
12. A speech recognition system, comprising:
a speech recognition engine; and
a recognition lexicon associated with the speech recognition engine, the recognition lexicon including at least one phonetic baseform automatically generated by:
(i) obtaining a stream of acoustic observations representing the spoken utterance;
(ii) generating a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, wherein a score associated with the single candidate subphone is equal to the sum of scores associated with the merged candidate subphones such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event, and further wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
(iii) converting the sequence of subphone units into the at least one phonetic baseform.
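The final step shared by the independent claims, converting a sequence of subphone units into a phonetic baseform, might look like the following sketch. The claims do not say how subphones are named or how the conversion is performed. The sketch assumes, purely for illustration, that each subphone is written as a phone name plus a position suffix (for example `K_1`, `K_2`, `K_3`) and that adjacent subphones of the same phone collapse into one phone. The function name `subphones_to_baseform` is hypothetical.

```python
from itertools import groupby


def subphones_to_baseform(subphones):
    """Collapse a subphone sequence into a phone sequence (a phonetic baseform).

    Assumes the hypothetical naming scheme "<phone>_<position>", e.g. "AE_2".
    """
    # Strip the position suffix to recover the parent phone of each subphone.
    phones = (name.rsplit("_", 1)[0] for name in subphones)
    # Adjacent subphones belonging to the same phone become a single phone.
    return [phone for phone, _ in groupby(phones)]


sequence = ["K_1", "K_2", "K_3", "AE_1", "AE_2", "AE_3", "T_1", "T_2", "T_3"]
baseform = subphones_to_baseform(sequence)  # ["K", "AE", "T"]
```

Collapsing adjacent repeats is the simplest choice, but it cannot represent the same phone occurring twice in a row. A fuller version would use the position suffixes to detect where one phone ends and the next begins.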
Specification