Constructing Markov models of words from multiple utterances
Abstract
Speech recognition is improved by splitting each feneme string at a consistent point into a left portion and a right portion. The present invention addresses the problem of constructing fenemic baseforms which take into account variations in pronunciation of words from one utterance thereof to another. Specifically, the invention relates to a method of constructing a fenemic baseform for a word in a vocabulary of word segments including the steps of:
(a) transforming multiple utterances of the word into respective strings of fenemes;
(b) defining a set of fenemic Markov model phone machines;
(c) determining the best single phone machine P1 for producing the multiple feneme strings;
(d) determining the best two phone baseform of the form P1 P2 or P2 P1 for producing the multiple feneme strings;
(e) aligning the best two phone baseform against each feneme string;
(f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform;
(g) identifying each left portion as a left substring and each right portion as a right substring;
(h) processing the set of left substrings and the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances including the further step of inhibiting further splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and
(k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.
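The abstract repeatedly compares how probable it is that a single phone machine, or a two-phone baseform, produces a set of feneme strings. A minimal Python sketch of that scoring follows. Everything concrete in it is an assumption made for illustration, not the patent's model: the three-letter feneme alphabet, the phone names, the transition and output probabilities, the simplified machine topology (a self-loop and a forward arc that each emit one feneme, plus a null arc that emits nothing), the treatment of utterances as independent, and the hypothetical helper names `phone_logprob`, `baseform_logprob` and `joint_logprob`.

```python
import math

# Hypothetical toy parameters. A simplified fenemic phone machine has an initial and
# a final state. From the initial state it takes a self-loop (prob A_LOOP) or a
# forward arc (prob B_FWD), each emitting one feneme, or a null arc (prob C_NULL)
# that emits nothing. A_LOOP + B_FWD + C_NULL == 1.
A_LOOP, B_FWD, C_NULL = 0.8, 0.15, 0.05
ALPHABET = "abc"  # toy feneme alphabet: each character stands for one feneme label


def fenemic_phone(label, peak=0.8):
    """Output distribution of the (hypothetical) phone machine tied to one feneme."""
    rest = (1.0 - peak) / (len(ALPHABET) - 1)
    return {x: (peak if x == label else rest) for x in ALPHABET}


PHONES = {x: fenemic_phone(x) for x in ALPHABET}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def phone_logprob(fenemes, phone):
    """log P(fenemes | one phone machine), summed over all paths."""
    n = len(fenemes)
    if n == 0:
        return math.log(C_NULL)  # only the null arc produces the empty string
    # Either n-1 self-loops then the forward arc, or n self-loops then the null arc.
    lp = (n - 1) * math.log(A_LOOP) + math.log(B_FWD + A_LOOP * C_NULL)
    return lp + sum(math.log(PHONES[phone][f]) for f in fenemes)


def baseform_logprob(fenemes, baseform):
    """log P(fenemes | concatenated phones); baseform is a string of phone names.

    Sums over every way of dividing the fenemes among the phones.
    """
    if len(baseform) == 1:
        return phone_logprob(fenemes, baseform[0])
    return logsumexp([phone_logprob(fenemes[:k], baseform[0])
                      + baseform_logprob(fenemes[k:], baseform[1:])
                      for k in range(len(fenemes) + 1)])


def joint_logprob(strings, baseform):
    """Joint log probability of several utterances, treated as independent."""
    return sum(baseform_logprob(s, baseform) for s in strings)
```

With these toy numbers, `joint_logprob(["aaaabbbb", "aaaaabbbb"], "ab")` is larger than `joint_logprob(["aaaabbbb", "aaaaabbbb"], "a")`, the situation in which the procedure would split the strings, while for `["aaaa", "aaaaa"]` the single phone `"a"` beats the two-phone baseform `"aa"`, so splitting would be inhibited. Log probabilities are used to avoid underflow on long strings.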
13 Claims
1. In a speech recognition system having an acoustic processor, a method of processing multiple utterances of a word in the construction of a fenemic baseform for the word, the method comprising the steps of:
(a) providing as input a string of fenemes generated by the acoustic processor in response to an utterance of the word;
(b) repeating step (a) for each utterance of the multiple utterances; and
(c) locating a consistent point in each input string of fenemes, wherein each string of fenemes is divided by the consistent point thereof into a left portion and a right portion (i) each of the left portions corresponding to a first sound-representing model in a set of sound-representing models and (ii) each of the right portions corresponding to a second sound-representing model in the set of sound-representing models.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
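One way to locate the consistent point of step (c) is to try every split index in a string and keep the one at which the left portion is best explained by the first model and the right portion by the second, which is the criterion spelled out in claim 9. The Python sketch below assumes a toy fenemic phone machine (self-loop, forward and null arcs) with made-up probabilities and a three-letter feneme alphabet; `phone_logprob`, `split_point` and `split_all` are hypothetical names, not part of the patent.

```python
import math

# Hypothetical toy model (not from the patent): a simplified fenemic phone machine
# with self-loop, forward and null arcs, plus one output distribution per phone.
A_LOOP, B_FWD, C_NULL = 0.8, 0.15, 0.05
OUT = {p: {x: (0.8 if x == p else 0.1) for x in "abc"} for p in "abc"}


def phone_logprob(fenemes, phone):
    """log P(fenemes | one phone machine) under the toy model."""
    if not fenemes:
        return math.log(C_NULL)  # empty portion: only the null arc
    n = len(fenemes)
    lp = (n - 1) * math.log(A_LOOP) + math.log(B_FWD + A_LOOP * C_NULL)
    return lp + sum(math.log(OUT[phone][f]) for f in fenemes)


def split_point(fenemes, left_phone, right_phone):
    """Index k maximizing P(fenemes[:k] | left) * P(fenemes[k:] | right)."""
    return max(range(len(fenemes) + 1),
               key=lambda k: phone_logprob(fenemes[:k], left_phone)
               + phone_logprob(fenemes[k:], right_phone))


def split_all(strings, left_phone, right_phone):
    """Divide every utterance's feneme string into (left, right) portions."""
    pieces = []
    for s in strings:
        k = split_point(s, left_phone, right_phone)
        pieces.append((s[:k], s[k:]))
    return pieces
```

For example, `split_all(["aaaabbbb", "aaaaabbbb"], "a", "b")` returns `[("aaaa", "bbbb"), ("aaaaa", "bbbb")]`: the split index differs between the two strings (4 and 5), but in both it falls at the same boundary between the two sounds, which is one way to read "consistent point".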
9. A speech recognition method using a speech input subsystem which converts utterances to feneme strings and a computer, the method being characterized by the steps of:
(a) finding a best first baseform of phone length one which maximizes the joint probability of producing the feneme strings resulting from multiple utterances of a given word in a vocabulary of words;
(b) finding a best second baseform of phone length two and of the form either (i) P1 P2 or (ii) P2 P1 which has a higher joint probability than any other baseform of length two;
(c) comparing the joint probability of the first baseform with the joint probability of the second baseform and, if the second baseform joint probability is higher than the joint probability of the first baseform, splitting each feneme string into a left portion and a right portion at the point which maximizes the probability that the left portion is produced by the left phone and the right portion is produced by the right phone;
(d) repeating steps (a) through (c) until all baseforms are of single phone length and no second baseform has a higher probability than its respective first baseform; and
(e) concatenating the baseforms of phone length one remaining after step (d) to form a basic fenemic baseform of the entire word.
Dependent claim: 10.
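Steps (a) through (c) of claim 9 can be stated as formulas. The notation is ours, not the patent's. We assume the utterances are independent, so each joint probability is a product over utterances, and we assume the probability of a two-phone baseform sums over every way of dividing a string between its two phones; the claim itself does not say how that probability is computed.

```latex
% Notation (ours): utterances u = 1, ..., U yield feneme strings
% y^{(u)} = y^{(u)}_1 \cdots y^{(u)}_{n_u}; \mathcal{P} is the set of phone machines;
% y_{1:0} and y_{n+1:n} denote the empty string.
\begin{align}
P_1 &= \arg\max_{p \in \mathcal{P}} \prod_{u=1}^{U} \Pr\bigl(y^{(u)} \mid p\bigr) \tag{a} \\
(P_L, P_R) &= \arg\max_{B \in \{P_1 p,\; p P_1 \,:\, p \in \mathcal{P}\}}
  \prod_{u=1}^{U} \Pr\bigl(y^{(u)} \mid B\bigr) \tag{b} \\
\Pr\bigl(y \mid P_L P_R\bigr) &= \sum_{k=0}^{n} \Pr\bigl(y_{1:k} \mid P_L\bigr)\,
  \Pr\bigl(y_{k+1:n} \mid P_R\bigr) \notag \\
k_u &= \arg\max_{0 \le k \le n_u} \Pr\bigl(y^{(u)}_{1:k} \mid P_L\bigr)\,
  \Pr\bigl(y^{(u)}_{k+1:n_u} \mid P_R\bigr)
  \quad \text{if } \prod_{u=1}^{U} \Pr\bigl(y^{(u)} \mid P_L P_R\bigr)
  > \prod_{u=1}^{U} \Pr\bigl(y^{(u)} \mid P_1\bigr) \tag{c}
\end{align}
```

Each string y^{(u)} is then divided into y^{(u)}_{1:k_u} and y^{(u)}_{k_u+1:n_u}, and step (d) applies the same formulas to the set of left portions and to the set of right portions.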
11. A method of constructing a fenemic baseform for a word in a vocabulary of word segments, the method comprising the steps of:
(a) transforming multiple utterances of the word into respective strings of fenemes;
(b) defining a set of fenemic Markov model phone machines;
(c) determining the best single phone machine P1 for producing the multiple feneme strings;
(d) determining the best two phone baseform of the form P1 P2 or P2 P1 for producing the multiple feneme strings;
(e) aligning the best two phone baseform against each feneme string;
(f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform;
(g) identifying each left portion as a left substring and each right portion as a right substring;
(h) processing the set of left substrings in the same manner as the set of feneme strings corresponding to the multiple utterances, including the further step of inhibiting splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform;
(j) processing the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances, including the further step of inhibiting splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and
(k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.
Dependent claims: 12, 13.
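Claim 11 describes a recursive divide-and-conquer procedure. The Python sketch below strings steps (c) through (k) together under stated assumptions: a toy phone machine with made-up probabilities over a three-letter feneme alphabet, split points chosen by maximizing the left-phone probability times the right-phone probability (one reading of the alignment in step (e)), and a depth limit plus a no-progress check that the claim does not mention but that keep the sketch from recursing forever. All function names are hypothetical.

```python
import math

# Hypothetical toy model (not taken from the patent): each phone machine has a
# self-loop and a forward arc that emit one feneme, and a null arc that emits none.
A_LOOP, B_FWD, C_NULL = 0.8, 0.15, 0.05
# Step (b): the set of phone machines, one per feneme label.
OUT = {p: {x: (0.8 if x == p else 0.1) for x in "abc"} for p in "abc"}


def phone_lp(s, p):
    """log P(s | single phone p)."""
    if not s:
        return math.log(C_NULL)
    lp = (len(s) - 1) * math.log(A_LOOP) + math.log(B_FWD + A_LOOP * C_NULL)
    return lp + sum(math.log(OUT[p][f]) for f in s)


def pair_lp(s, left, right):
    """log P(s | left right), summing over every split point (log-sum-exp)."""
    terms = [phone_lp(s[:k], left) + phone_lp(s[k:], right) for k in range(len(s) + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def build_baseform(strings, depth=0, max_depth=10):
    """Recursively split feneme strings; return the concatenated phones in order."""
    # Step (c): best single phone machine for all strings jointly.
    p1 = max(OUT, key=lambda p: sum(phone_lp(s, p) for s in strings))
    # Step (d): best two-phone baseform of the form P1 P2 or P2 P1.
    candidates = [(p1, p2) for p2 in OUT] + [(p2, p1) for p2 in OUT]
    left, right = max(candidates, key=lambda c: sum(pair_lp(s, *c) for s in strings))
    single = sum(phone_lp(s, p1) for s in strings)
    double = sum(pair_lp(s, left, right) for s in strings)
    # Inhibit splitting when the single phone is at least as probable.
    if single >= double or depth >= max_depth:
        return [p1]
    # Steps (e)-(g): cut each string where the left/right phones best account for it.
    cuts = [max(range(len(s) + 1),
                key=lambda k: phone_lp(s[:k], left) + phone_lp(s[k:], right))
            for s in strings]
    lefts = [s[:k] for s, k in zip(strings, cuts)]
    rights = [s[k:] for s, k in zip(strings, cuts)]
    if lefts == strings or rights == strings:  # sketch-only guard: no progress made
        return [p1]
    # Steps (h), (j), (k): process each side the same way, then concatenate in order.
    return (build_baseform(lefts, depth + 1, max_depth)
            + build_baseform(rights, depth + 1, max_depth))
```

For instance, `build_baseform(["aaaabbbb", "aaaaabbbb"])` returns `["a", "b"]` with these numbers: the first split separates the runs of `a` from the runs of `b`, and neither side splits further because its single-phone baseform is more probable than any two-phone baseform.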
Specification