Automatic generation of simple Markov model stunted baseforms for words in a vocabulary
Abstract
In a system that (i) defines each word in a vocabulary by a fenemic baseform of fenemic phones, (ii) defines an alphabet of composite phones each of which corresponds to at least one fenemic phone, and (iii) generates a string of fenemes in response to speech input, the method converts a word baseform composed of fenemic phones into a stunted word baseform of composite phones by (a) replacing each fenemic phone in the fenemic phone word baseform by the composite phone corresponding thereto; and (b) merging at least one pair of adjacent composite phones into a single composite phone where the adverse effect of the merge is below a predefined threshold.
24 Claims
1. In a system that (i) generates in an acoustic processor a string of fenemes in response to speech input, (ii) defines each word in a vocabulary by a respective fenemic baseform comprising a sequence of Markov model fenemic phones each of which relates to a feneme generatable by the acoustic processor, and (iii) defines an alphabet of composite Markov model phonetic phones each of which relates to a phonetic element and each of which correlates to at least one fenemic phone, a method of selecting Markov model phones for inclusion in a stunted baseform constructed of a sequence of phonetic phones, the method comprising the steps of:
(a) generating a respective string of fenemes in response to an utterance of a selected word;
(b) selecting the fenemic baseform having the highest probability of producing the string of fenemes and aligning the selected fenemic baseform against the generated string of fenemes;
(c) replacing each fenemic phone in the fenemic baseform by a composite phone corresponding thereto, thereby forming a composite phone baseform;
(d) selecting a pair of adjacent composite phones in the composite phone baseform;
(e) determining a substring of fenemes aligned against the selected pair of adjacent phones;
(f) computing, for each composite phone in the composite phone alphabet, a respective probability of producing the determined substring of fenemes;
(g) selecting the composite phone having the highest probability of having produced the determined substring;
(h) repeating steps (d) through (g) for each pair of adjacent phones in the composite phone baseform; and
(j) replacing at least one pair of adjacent composite phones by the selected composite phone corresponding thereto.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
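Steps (c) through (j) of claim 1 can be illustrated with a small Python sketch. It assumes that steps (a) and (b) have already produced an alignment, given as a list of `(fenemic_phone, fenemes)` pairs. Every name here is hypothetical and not taken from the patent: `PHONE_MODELS`, `FENEMIC_TO_COMPOSITE`, `log_prob`, the feneme labels `f1`..`f3`, and the phones `AA`, `K`, `AK`. Each composite phone is also modeled as a toy one-state Markov model, which emits one feneme per visit and then loops or exits. That is a simplification for illustration, not the phone topology used in the patent.

```python
import math

# Hypothetical composite-phone models (toy assumption): one state that emits
# one feneme per visit, then loops with probability `loop` or exits.
PHONE_MODELS = {
    "AA": {"loop": 0.6, "emit": {"f1": 0.7, "f2": 0.2, "f3": 0.1}},
    "K":  {"loop": 0.3, "emit": {"f1": 0.1, "f2": 0.1, "f3": 0.8}},
    "AK": {"loop": 0.7, "emit": {"f1": 0.4, "f2": 0.2, "f3": 0.4}},
}

# Hypothetical fenemic-phone -> composite-phone correspondence used in step (c).
FENEMIC_TO_COMPOSITE = {"F1": "AA", "F2": "AA", "F3": "K"}


def log_prob(phone, fenemes):
    """log P(fenemes | phone) under the toy one-state model (>= 1 feneme)."""
    if not fenemes:
        return -math.inf
    m = PHONE_MODELS[phone]
    lp = sum(math.log(m["emit"].get(f, 1e-12)) for f in fenemes)
    # (n - 1) self-loops followed by one exit transition.
    return lp + (len(fenemes) - 1) * math.log(m["loop"]) + math.log(1 - m["loop"])


def composite_baseform(aligned):
    """Step (c): swap each fenemic phone for its composite phone, keeping the
    feneme substring aligned against it."""
    return [(FENEMIC_TO_COMPOSITE[p], subs) for p, subs in aligned]


def best_phone_per_pair(baseform):
    """Steps (d)-(h): for every adjacent pair, the composite phone most likely
    to have produced the fenemes aligned against that pair."""
    best = []
    for i in range(len(baseform) - 1):
        substring = baseform[i][1] + baseform[i + 1][1]              # step (e)
        scores = {c: log_prob(c, substring) for c in PHONE_MODELS}   # step (f)
        best.append(max(scores, key=scores.get))                     # step (g)
    return best


def replace_pair(baseform, i, phone):
    """Step (j): replace the pair (i, i + 1) by one phone owning both substrings."""
    merged = (phone, baseform[i][1] + baseform[i + 1][1])
    return baseform[:i] + [merged] + baseform[i + 2:]


if __name__ == "__main__":
    aligned = [("F1", ["f1", "f1"]), ("F2", ["f1", "f2"]), ("F3", ["f3"])]
    cb = composite_baseform(aligned)
    best = best_phone_per_pair(cb)
    print(best)
    print(replace_pair(cb, 1, best[1]))
```

Claim 1 does not say which pair to replace in step (j), so `replace_pair` leaves the choice of index to the caller.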
9. In a system which processes speech, a method of automatically constructing for each word in a vocabulary a stunted baseform, the method comprising the steps of:
(a) defining each word in the vocabulary as a respective fenemic baseform formed of a sequence of fenemic phones, each fenemic phone being from an alphabet of N fenemic phones, and entering each fenemic baseform into storage;
(b) defining an alphabet of composite phones, each composite phone being associated with at least one fenemic phone;
(c) generating, in an acoustic processor which associates one feneme in a fixed set of fenemes to each successive interval of speech in an utterance of speech, a plurality of feneme strings for a selected word, each feneme string being generated in response to an utterance of the selected word;
(d) selecting the stored fenemic baseform having the highest joint probability of having produced all the generated feneme strings for the selected word and aligning the selected fenemic baseform against each of the feneme strings;
(e) replacing each fenemic phone by a composite phone corresponding thereto, thereby forming a composite phone baseform;
(f) selecting a pair of adjacent phones in the composite phone baseform;
(g) for each feneme string, aligning a substring of fenemes against the selected pair of adjacent phones;
(h) selecting a composite phone for a given pair of adjacent composite phones which has the highest joint probability of producing all of the substrings aligned against said given pair of adjacent composite phones;
(j) repeating steps (f) through (h) for all pairs of adjacent phones, thereby providing a selected composite phone for each adjacent pair of phones in the composite phone baseform; and
(k) determining which selected composite phone yields the least adverse effect on the probability of producing fenemes in the aligned substring therefor in the event that the pair of adjacent phones is replaced by the selected composite phone corresponding thereto.
(Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18)
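Claim 9 extends the search to several utterances, with joint probabilities taken over all aligned substrings, and adds step (k). The Python sketch below covers steps (f) through (k) under stated assumptions. It reuses the hypothetical one-state phone models (`PHONE_MODELS`, `log_prob`). It also assumes that the "adverse effect" is measured as the drop in joint log-likelihood when the two phones are replaced by the merged phone. The claim does not define that measure, so this is one plausible reading. `least_adverse_merge` is a hypothetical name.

```python
import math

# Hypothetical toy models: one state emitting one feneme per visit, then
# looping (prob `loop`) or exiting.
PHONE_MODELS = {
    "AA": {"loop": 0.6, "emit": {"f1": 0.7, "f2": 0.2, "f3": 0.1}},
    "K":  {"loop": 0.3, "emit": {"f1": 0.1, "f2": 0.1, "f3": 0.8}},
    "AK": {"loop": 0.7, "emit": {"f1": 0.4, "f2": 0.2, "f3": 0.4}},
}


def log_prob(phone, fenemes):
    """log P(fenemes | phone) under the toy one-state model (>= 1 feneme)."""
    if not fenemes:
        return -math.inf
    m = PHONE_MODELS[phone]
    lp = sum(math.log(m["emit"].get(f, 1e-12)) for f in fenemes)
    return lp + (len(fenemes) - 1) * math.log(m["loop"]) + math.log(1 - m["loop"])


def least_adverse_merge(phones, alignments):
    """phones: composite baseform, e.g. ["AA", "AA", "K"].
    alignments: one entry per utterance; each entry lists the feneme
    substring aligned against each phone.
    Returns (loss, pair_index, merged_phone) for the least adverse merge."""
    candidates = []
    for i in range(len(phones) - 1):
        # Step (g): substring aligned against the pair, in every utterance.
        subs = [a[i] + a[i + 1] for a in alignments]
        # Step (h): phone with the highest joint (summed log) probability.
        joint = {c: sum(log_prob(c, s) for s in subs) for c in PHONE_MODELS}
        best = max(joint, key=joint.get)
        # Assumed adverse effect: joint log-likelihood lost by merging.
        unmerged = sum(log_prob(phones[i], a[i]) + log_prob(phones[i + 1], a[i + 1])
                       for a in alignments)
        candidates.append((unmerged - joint[best], i, best))  # step (j)
    return min(candidates)                                     # step (k)


if __name__ == "__main__":
    phones = ["AA", "AA", "K"]
    alignments = [
        [["f1", "f1"], ["f1", "f2"], ["f3"]],  # utterance 1
        [["f1"], ["f1"], ["f3", "f3"]],        # utterance 2
    ]
    print(least_adverse_merge(phones, alignments))
```

The toy model charges the exit probability once per phone, so merging two phones can occasionally raise the likelihood. In that case the computed "loss" is negative.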
19. In a speech recognition system including means for performing acoustic matching between (i) stunted word baseforms each comprised of a sequence of Markov model phones and (ii) a string of fenemes generated by an acoustic processor in response to speech input, a method of automatically constructing for each word in a vocabulary a stunted baseform used in the acoustic matching, the method comprising the steps of:
(a) defining each word in the vocabulary as a respective fenemic baseform formed of a sequence of fenemic phones, each fenemic phone being from an alphabet of N fenemic phones, and entering each fenemic baseform into storage;
(b) defining an alphabet of composite phones, each composite phone being associated with at least one fenemic phone;
(c) generating a plurality of feneme strings for a selected word, each feneme string being generated in response to an utterance of the selected word;
(d) selecting the stored fenemic baseform having the highest joint probability of producing all the generated feneme strings for the subject word and aligning the fenemic phones of the selected fenemic baseform against each of the feneme strings;
(e) replacing each fenemic phone by a composite phone corresponding thereto, thereby forming a composite phone baseform;
(f) selecting a pair of adjacent composite phones in the composite phone baseform;
(g) for each generated feneme string, aligning a substring of the generated feneme string against the selected pair of adjacent composite phones;
(h) selecting the composite phone in the alphabet thereof which has the highest joint probability of producing all of the determined substrings;
(j) repeating steps (f) through (h) for all pairs of adjacent composite phones, thereby providing a selected composite phone for each adjacent pair of composite phones in the baseform;
(k) determining which single composite phone yields the least adverse effect when replacing the adjacent pair of composite phones corresponding thereto; and
(m) replacing the selected composite phone yielding the least adverse effect for the adjacent composite phones corresponding thereto, thereby forming a new baseform.
(Dependent claims: 20, 21)
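Step (m) of claim 19 substitutes the selected phone for its pair and so forms a new baseform. The following is a minimal Python sketch of that bookkeeping. It assumes the baseform is a list of phone names and the alignments are stored as one list of per-phone feneme substrings for each utterance. `apply_merge` and the phone and feneme labels are hypothetical.

```python
def apply_merge(phones, alignments, i, phone):
    """Step (m): replace phones[i] and phones[i + 1] by `phone`, and join the
    feneme substrings aligned against them in every utterance, so the new
    baseform stays aligned against all feneme strings."""
    new_phones = phones[:i] + [phone] + phones[i + 2:]
    new_alignments = [a[:i] + [a[i] + a[i + 1]] + a[i + 2:] for a in alignments]
    return new_phones, new_alignments


if __name__ == "__main__":
    phones = ["AA", "AA", "K"]
    alignments = [
        [["f1", "f1"], ["f1", "f2"], ["f3"]],
        [["f1"], ["f1"], ["f3", "f3"]],
    ]
    print(apply_merge(phones, alignments, 0, "AA"))
```

The joined substrings stay attached to the new baseform, so steps (f) through (k) could in principle be run again on it without realigning.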
22. In a system that (i) defines each word in a vocabulary by a fenemic baseform of fenemic phones, (ii) defines an alphabet of composite phones each of which corresponds to at least one fenemic phone, and (iii) generates a string of fenemes in response to speech input, a method of constructing a stunted phonetic-type word baseform of phones comprising the step of:
converting a fenemic phone word baseform into a stunted word baseform of composite phones including the steps of:
(a) replacing each fenemic phone in the fenemic phone word baseform by the composite phone corresponding thereto; and
(b) replacing at least one pair of adjacent composite phones by a single composite phone where the adverse effect of the replacing is below a predefined threshold.
(Dependent claims: 23, 24)
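The following Python sketch shows one way to apply the threshold test of claim 22 repeatedly. On each pass it merges the least adverse adjacent pair, and it stops when no merge falls below the threshold. The sketch rests on several assumptions. It uses a single utterance, the hypothetical one-state phone models from the earlier sketches, and a log-likelihood drop as the measure of "adverse effect". The greedy repeat-until-threshold loop is also an assumption, since claim 22 only requires that at least one pair be replaced. `stunt` is a hypothetical name.

```python
import math

# Hypothetical toy models: one state emitting one feneme per visit, then
# looping (prob `loop`) or exiting.
PHONE_MODELS = {
    "AA": {"loop": 0.6, "emit": {"f1": 0.7, "f2": 0.2, "f3": 0.1}},
    "K":  {"loop": 0.3, "emit": {"f1": 0.1, "f2": 0.1, "f3": 0.8}},
    "AK": {"loop": 0.7, "emit": {"f1": 0.4, "f2": 0.2, "f3": 0.4}},
}


def log_prob(phone, fenemes):
    """log P(fenemes | phone) under the toy one-state model (>= 1 feneme)."""
    if not fenemes:
        return -math.inf
    m = PHONE_MODELS[phone]
    lp = sum(math.log(m["emit"].get(f, 1e-12)) for f in fenemes)
    return lp + (len(fenemes) - 1) * math.log(m["loop"]) + math.log(1 - m["loop"])


def stunt(baseform, threshold):
    """baseform: [(composite_phone, aligned_fenemes), ...] after step (a).
    Greedily applies step (b) while the least adverse merge costs less than
    `threshold` (in log-likelihood units, an assumed measure)."""
    baseform = list(baseform)  # work on a copy; input is left unchanged
    while len(baseform) > 1:
        best = None
        for i in range(len(baseform) - 1):
            (pa, sa), (pb, sb) = baseform[i], baseform[i + 1]
            sub = sa + sb
            phone = max(PHONE_MODELS, key=lambda c: log_prob(c, sub))
            loss = log_prob(pa, sa) + log_prob(pb, sb) - log_prob(phone, sub)
            if best is None or loss < best[0]:
                best = (loss, i, phone)
        loss, i, phone = best
        if loss >= threshold:
            break
        baseform[i:i + 2] = [(phone, baseform[i][1] + baseform[i + 1][1])]
    return baseform


if __name__ == "__main__":
    bf = [("AA", ["f1", "f1"]), ("AA", ["f1", "f2"]), ("K", ["f3"])]
    print(stunt(bf, threshold=0.0))
    print(stunt(bf, threshold=3.0))
```

A larger threshold accepts costlier merges and therefore yields a shorter, more "stunted" baseform. In the example, a threshold of 0.0 leaves two phones and a threshold of 3.0 collapses the word to one.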
Specification