Methods and apparatus for reducing spurious insertions in speech recognition
Abstract
Techniques for improving an automatic baseform generation system. More particularly, the invention provides techniques for reducing insertion of spurious speech events in a word or phone sequence generated by an automatic baseform generation system. Such automatic baseform generation techniques may be accomplished by enhancing the scores of long-lasting speech events with respect to the scores of short-lasting events. For example, this may be achieved by merging competing candidates that relate to the same speech event (e.g., phone or word) and that overlap in time into a single candidate, the score of which may be equal to the sum of the scores of the merged candidates.
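The merging step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The `Candidate` record, the inclusive frame-index convention, the treatment of scores as linear-domain values that can be added directly, and the choice to merge transitively (a candidate joins a merged span if it overlaps any part of that span) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical record for one competing candidate speech event."""
    label: str    # speech event, e.g. a phone or subphone name
    start: int    # first frame covered (inclusive)
    end: int      # last frame covered (inclusive)
    score: float  # assumed to be a linear-domain score that can be summed


def merge_overlapping(cands: List[Candidate]) -> List[Candidate]:
    """Merge same-label candidates that overlap in time.

    The merged candidate spans the union of the merged intervals and its
    score is the sum of the merged scores, as the abstract suggests.
    """
    merged: List[Candidate] = []
    # Sorting by label, then start time, puts mergeable candidates next to
    # each other so a single left-to-right sweep is enough.
    for c in sorted(cands, key=lambda c: (c.label, c.start, c.end)):
        last = merged[-1] if merged else None
        if last is not None and last.label == c.label and c.start <= last.end:
            last.end = max(last.end, c.end)
            last.score += c.score
        else:
            # Copy so the caller's input objects are never mutated.
            merged.append(Candidate(c.label, c.start, c.end, c.score))
    return merged


if __name__ == "__main__":
    candidates = [
        Candidate("AA", 0, 5, 0.25),
        Candidate("AA", 3, 9, 0.25),
        Candidate("T", 4, 6, 0.375),
        Candidate("AA", 20, 25, 0.125),
    ]
    for m in merge_overlapping(candidates):
        print(m)
```

In this made-up example, the short-lasting "T" candidate outscores each individual "AA" candidate before merging, but the two overlapping "AA" candidates combine into one longer candidate with a higher summed score. This is the intended effect of favoring long-lasting events over short, spurious ones. The third "AA" candidate does not overlap the others and is left unmerged.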
30 Claims
1. A method of automatically generating a phonetic baseform from a spoken utterance, the method comprising the steps of:
obtaining a stream of acoustic observations representing the spoken utterance;
generating a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, and wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
converting the sequence of subphone units into a phonetic baseform.
19. Apparatus for automatically generating a phonetic baseform from a spoken utterance, the apparatus comprising:
a memory; and
at least one processor coupled to the memory and operative to:
(i) obtain a stream of acoustic observations representing the spoken utterance;
(ii) generate a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, and wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
(iii) convert the sequence of subphone units into a phonetic baseform.
29. An article of manufacture for automatically generating a phonetic baseform from a spoken utterance, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
obtaining a stream of acoustic observations representing the spoken utterance;
generating a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, and wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
converting the sequence of subphone units into a phonetic baseform.
30. A speech recognition system, comprising:
a speech recognition engine; and
a recognition lexicon associated with the speech recognition engine, the recognition lexicon including at least one phonetic baseform automatically generated by:
(i) obtaining a stream of acoustic observations representing the spoken utterance;
(ii) generating a sequence of subphone units, wherein candidate subphones that relate to the same speech event and that overlap in time are merged into a single candidate subphone, and wherein the sequence of subphone units represents candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
(iii) converting the sequence of subphone units into the at least one phonetic baseform.
Specification