Single tree method for grammar directed, very large vocabulary speech recognizer
First Claim
1. A method of recognizing a word as being one of a plurality of words in a vocabulary, the method comprising the steps of:
- constructing a phonetic tree having a plurality of branches, a phoneme being associated with each branch, a phonetic HMM being associated with each branch so as to model the phoneme, and a word being associated with the end of a sequence of branches, such that all words that include the same initial phoneme sequence include the same initial branches in the phonetic tree, each branch having a left-context consisting of no more than a single branch, and each branch other than final phonemes of a word having a right-context that includes at least one branch;
compiling a statistical language model having a plurality of grammar states and transition probabilities between grammar states, each grammar state including at least one word;
associating with each branch a set of common-phoneme words, each common-phoneme word including the phoneme associated with the branch;
computing, for each set of common-phoneme words and for each preceding grammar state that precedes the set of common-phoneme words, a set transition probability that is a function of the transition probabilities from the preceding grammar state to each common-phoneme word of the set; and
upon entering a branch, determining which preceding grammar state of a plurality of preceding grammar states is most likely to transition into a common-phoneme word of the set of common-phoneme words associated with the branch.
13 Assignments
0 Petitions
Accused Products
Abstract
The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process. A grammar probability is utilized upon recognition of each phoneme of a word, before recognition of the entire word is complete. Thus, grammar probabilities are exploited as early as possible during recognition of a word. At each frame of the recognition process, a grammar probability is determined for the transition from the most likely preceding grammar state to a set of words that share at least one common phoneme. The grammar probability is combined with accumulating phonetic evidence to provide a measure of the likelihood that a state in the HMM will lead to the word most likely to have been spoken. In a preferred embodiment, phonetic context information is exploited, even before the complete context of a phoneme is known. Instead of an exact triphone model, wherein the phonemes previous and subsequent to a phoneme are considered, a composite triphone model is used that exploits partial phonetic context information to provide a phonetic model that is more accurate than aphonetic model that ignores context. In another preferred embodiment, the single phonetic tree method is used as the forward pass of a forward/backward recognition process, wherein the backward pass employs a recognition process other than the single phonetic tree method.
417 Citations
32 Claims
-
1. A method of recognizing a word as being one of a plurality of words in a vocabulary, the method comprising the steps of:
-
constructing a phonetic tree having a plurality of branches, a phoneme being associated with each branch, a phonetic HMM being associated with each branch so as to model the phoneme, and a word being associated with the end of a sequence of branches, such that all words that include the same initial phoneme sequence include the same initial branches in the phonetic tree, each branch having a left-context consisting of no more than a single branch, and each branch other than final phonemes of a word having a right-context that includes at least one branch; compiling a statistical language model having a plurality of grammar states and transition probabilities between grammar states, each grammar state including at least one word; associating with each branch a set of common-phoneme words, each common-phoneme word including the phoneme associated with the branch; computing, for each set of common-phoneme words and for each preceding grammar state that precedes the set of common-phoneme words, a set transition probability that is a function of the transition probabilities from the preceding grammar state to each common-phoneme word of the set; and upon entering a branch, determining which preceding grammar state of a plurality of preceding grammar states is most likely to transition into a common-phoneme word of the set of common-phoneme words associated with the branch. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A method of recognizing a word, represented by an acoustic signal adapted to be separated into time interval frames, as being one of a plurality of words in a vocabulary, the method comprising the steps of:
-
constructing a phonetic tree having a plurality of branches, a phoneme being associated with each branch, a phonetic HMM being associated with each branch so as to model the phoneme, and a word being disposed at the end of a sequence of branches, such that all words that include the same initial phonemes include the same initial branches in the phonetic tree, each branch having a left-context consisting of no more than a single branch, and each branch having a right-context that includes at least one branch; compiling a statistical language model having a plurality of grammar states and transition probabilities between grammar states, each grammar state including at least one word; associating with each branch a set of common-phoneme words, each common-phoneme word including the initial phonemes associated with the branch; computing, for each set of common-phoneme words and for each preceding grammar state that precedes the set of common-phoneme words, a set transition probability that is a function of the transition probabilities from the preceding grammar state to each common-phoneme word of the set; computing at each frame for each phoneme branch associated with the last phoneme of a word, a word-ending score; compiling and storing upon each frame a list of words that are each characterized by a word-ending score that exceeds a threshold value, each word of the list being associated with a grammar state; determining upon each frame, the greatest word-ending score of the list; for each phoneme-ending state of a phoneme branch, and for the root node, propagating the path score and an associated partial grammar score into each right-context branch thereof, only if the path score at the phoneme-ending state exceeds the threshold value; adjusting the path score to provide an adjusted path score, upon the path score and the associated partial grammar score being propagated into a right-context branch, by dividing the path score by the associated partial grammar score, and then multiplying by the partial grammar score of the right-context branch; and adding the state associated with the adjusted path score to a list of active states, only if the adjusted path score has exceed the first threshold. - View Dependent Claims (11, 12)
-
-
13. A method of recognizing a word as being one of a plurality of words in a vocabulary, the method comprising the steps of:
-
constructing a phonetic tree having a plurality of branches, a phoneme being associated with each branch, a phonetic HMM being associated with each branch so as to model the phoneme, and a word being associated with the end of a sequence of branches, such that all words that include the same initial phonemes include the same initial branches in the phonetic tree, each branch having a left-context consisting of no more than a single branch, and each branch having a right-context that includes at least one branch; determining for each branch as many triphone hidden Markov models of the phoneme associated with the branch as there are branches in the right-context of the branch; computing for each branch a composite triphone model based upon all triphone models associated with the branch, and associating said composite triphone model with the branch; compiling a statistical language model having a plurality of grammar states and transition probabilities between grammar states, each grammar state including at least one word; associating with each branch a set of common-phoneme words, each word including the phoneme associated with the branch; computing, for each set of common-phoneme words and for each preceding grammar state, the total probability, summed over all the common-phoneme words of a set, that a common-phoneme word of the set will follow a preceding grammar state; associating a hypothesis having a path score, a traceback time, and a partial grammar score with each state of each phonetic HMM associated with a branch in said phonetic tree, the hypothesis being updatable upon each frame; updating upon each frame the path score of each hypothesis, only if the path score computed in the preceding frame exceeds a first threshold value; propagating upon each frame the partial grammar score and the traceback time of the dominant hypothesis of the previous frame to update the hypothesis of each state of the present frame; remembering upon each frame a partial grammar score and a traceback time of the hypothesis entering a branch with the highest path score; remembering upon each frame the maximum path score of all phonetic models in the phonetic tree HMM, and recomputing the first threshold value using the maximum path score and a beam width; computing at each frame, for each phoneme branch associated with the last phoneme of a word, a word-ending score; compiling upon each frame a first list of words, each word of the first list being associated with a grammar state, and each word of the first list being characterized by a word-ending score that exceeds the first threshold value, the first list of words being for use in a backwards pass of a forward-backward search; computing upon each frame a second threshold value that is greater than the first threshold value and less than the greatest word-ending score; compiling and storing upon each frame a second list of words that are each characterized by a word-ending score that exceeds the second threshold value, each word of the second list being associated with a grammar state; for each phoneme-ending state of a phoneme branch, and for the root node, propagating the path score and an associated partial grammar score into each right-context branch thereof, only if the path score at the phoneme-ending state exceeds the first threshold; adjusting the path score to provide an adjusted path score, upon the path score and the associated partial grammar score being propagated into a right-context branch, by dividing the path score by the associated partial grammar score, and then multiplying by the partial grammar score of the right-context branch; and adding the state associated with the adjusted path score to a list of active states, only if the adjusted path score has exceed the first threshold. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
-
Specification