Single tree method for grammar directed, very large vocabulary speech recognizer

US 5,621,859 A
Filed: 01/19/1994
Issued: 04/15/1997
Est. Priority Date: 01/19/1994
Status: Expired due to Term

13 Assignments

0 Petitions

Abstract

The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process. A grammar probability is utilized upon recognition of each phoneme of a word, before recognition of the entire word is complete. Thus, grammar probabilities are exploited as early as possible during recognition of a word. At each frame of the recognition process, a grammar probability is determined for the transition from the most likely preceding grammar state to a set of words that share at least one common phoneme. The grammar probability is combined with accumulating phonetic evidence to provide a measure of the likelihood that a state in the HMM will lead to the word most likely to have been spoken. In a preferred embodiment, phonetic context information is exploited, even before the complete context of a phoneme is known. Instead of an exact triphone model, wherein the phonemes previous and subsequent to a phoneme are considered, a composite triphone model is used that exploits partial phonetic context information to provide a phonetic model that is more accurate than a phonetic model that ignores context. In another preferred embodiment, the single phonetic tree method is used as the forward pass of a forward/backward recognition process, wherein the backward pass employs a recognition process other than the single phonetic tree method.
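
The tree structure described in the abstract can be pictured with a small sketch. This is an illustrative reading only, not the patented implementation: the `Branch` class, the `build_phonetic_tree` function, and the toy `LEXICON` with its phoneme symbols are all assumptions introduced here, and the per-branch phonetic HMMs are left out entirely. It shows how words with the same initial phoneme sequence share initial branches, and how each branch can carry the set of words that pass through it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Branch:
    """One branch of the tree; it stands for a single phoneme (hypothetical type)."""
    phoneme: str
    children: Dict[str, "Branch"] = field(default_factory=dict)
    # Words whose pronunciation passes through this branch.
    words_below: Set[str] = field(default_factory=set)
    # Words whose final phoneme is this branch.
    ending_words: List[str] = field(default_factory=list)


def build_phonetic_tree(lexicon: Dict[str, List[str]]) -> Branch:
    """Insert every pronunciation into a prefix tree keyed by phoneme."""
    root = Branch(phoneme="<root>")
    for word, phonemes in lexicon.items():
        node = root
        for phoneme in phonemes:
            # Reuse the branch if another word already created it.
            node = node.children.setdefault(phoneme, Branch(phoneme=phoneme))
            node.words_below.add(word)
        node.ending_words.append(word)
    return root


# Toy lexicon; the words and phoneme symbols are made up for illustration.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
    "can": ["k", "ae", "n"],
    "dog": ["d", "ao", "g"],
}

tree = build_phonetic_tree(LEXICON)
k_branch = tree.children["k"]
print(sorted(k_branch.words_below))             # words sharing the initial "k"
print(sorted(k_branch.children["ae"].children))  # branches that follow "k ae"
```

In this toy tree the words cat, cab, and can share the branches for "k" and "ae" and only diverge at their final phoneme, which is the point at which the word identity becomes known.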

417 Citations

32 Claims

1. A method of recognizing a word as being one of a plurality of words in a vocabulary, the method comprising the steps of:
- constructing a phonetic tree having a plurality of branches, a phoneme being associated with each branch, a phonetic HMM being associated with each branch so as to model the phoneme, and a word being associated with the end of a sequence of branches, such that all words that include the same initial phoneme sequence include the same initial branches in the phonetic tree, each branch having a left-context consisting of no more than a single branch, and each branch other than final phonemes of a word having a right-context that includes at least one branch;
  
  compiling a statistical language model having a plurality of grammar states and transition probabilities between grammar states, each grammar state including at least one word;
  
  associating with each branch a set of common-phoneme words, each common-phoneme word including the phoneme associated with the branch;
  
  computing, for each set of common-phoneme words and for each preceding grammar state that precedes the set of common-phoneme words, a set transition probability that is a function of the transition probabilities from the preceding grammar state to each common-phoneme word of the set; and
  
  upon entering a branch, determining which preceding grammar state of a plurality of preceding grammar states is most likely to transition into a common-phoneme word of the set of common-phoneme words associated with the branch.
- Dependent claims:
  - 2. The method of claim 1, wherein the most likely preceding grammar state is associated with the branch.
  - 3. The method of claim 1, after the step of computing, for each set of common-phoneme words and for each preceding grammar state that precedes the set of common-phoneme words, a set transition probability that is a function of the transition probabilities from the preceding grammar state to each common-phoneme word of the set, further including the step of:
    - associating the set transition probability with the branch associated with the set of common-phoneme words.
  - 4. The method of claim 1, wherein the step of determining which preceding grammar state is most likely includes the steps of:
    - for each preceding grammar state of the plurality of preceding grammar states, computing the product of a path score associated with the ending state of the preceding grammar state, and the set transition probability, thereby providing a plurality of products;
      
      determining which product among the plurality of products is greatest, the preceding grammar state associated with the greatest product being the preceding grammar state that is most likely to transition into a word of the set of common-phoneme words.
  - 5. The method of claim 1, wherein the function of the transition probabilities is the sum of the transition probabilities from the preceding grammar state to each word in the set of common-phoneme words.
  - 6. The method of claim 1, wherein the function of the transition probabilities is the maximum transition probability of the transition probabilities from the preceding grammar state to each word in the set of common-phoneme words.
  - 7. The method of claim 1, wherein the function of the transition probabilities is the average transition probability of all the transition probabilities from the preceding grammar state to each word in the set of common-phoneme words.
  - 8. The method of claim 1, wherein the transition probabilities are sometimes obtained from a grammar cache.
  - 9. The method of claim 8, wherein the grammar cache is a random-access data structure.
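
The set transition probability of claims 1 and 5 to 7, and the product comparison of claim 4, can be sketched as follows. This is a minimal illustration under stated assumptions: the bigram table is a plain dictionary keyed by `(preceding_state, word)`, each grammar state is represented by a single word, and the function names, the `mode` parameter, and all numbers are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Bigram = Dict[Tuple[str, str], float]


def set_transition_probability(bigram: Bigram, prev_state: str,
                               word_set: Iterable[str], mode: str = "sum") -> float:
    """Combine the transition probabilities from prev_state to each word of the set."""
    probs = [bigram.get((prev_state, word), 0.0) for word in sorted(word_set)]
    if not probs:
        return 0.0
    if mode == "sum":        # claim 5
        return sum(probs)
    if mode == "max":        # claim 6
        return max(probs)
    if mode == "average":    # claim 7
        return sum(probs) / len(probs)
    raise ValueError(f"unknown mode: {mode}")


def most_likely_preceding_state(ending_scores: Dict[str, float], bigram: Bigram,
                                word_set: Iterable[str], mode: str = "sum"):
    """Pick the preceding state with the greatest (path score x set probability)."""
    words = list(word_set)
    best_state, best_product = None, float("-inf")
    for prev_state, path_score in ending_scores.items():
        product = path_score * set_transition_probability(bigram, prev_state, words, mode)
        if product > best_product:
            best_state, best_product = prev_state, product
    return best_state, best_product


# Made-up numbers, for illustration only.
BIGRAM = {("the", "cat"): 0.2, ("the", "cab"): 0.1,
          ("a", "cat"): 0.05, ("a", "can"): 0.3}
ENDING_SCORES = {"the": 0.5, "a": 0.4}   # path scores at the end of each preceding state
WORD_SET = {"cat", "cab", "can"}         # common-phoneme words of one branch

print(most_likely_preceding_state(ENDING_SCORES, BIGRAM, WORD_SET, mode="sum"))
print(most_likely_preceding_state(ENDING_SCORES, BIGRAM, WORD_SET, mode="max"))
```

With these numbers the choice of function matters: summing favors "the", while taking the maximum favors "a", which is one reason the claims list the alternatives separately.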

10. A method of recognizing a word, represented by an acoustic signal adapted to be separated into time interval frames, as being one of a plurality of words in a vocabulary, the method comprising the steps of:
- constructing a phonetic tree having a plurality of branches, a phoneme being associated with each branch, a phonetic HMM being associated with each branch so as to model the phoneme, and a word being disposed at the end of a sequence of branches, such that all words that include the same initial phonemes include the same initial branches in the phonetic tree, each branch having a left-context consisting of no more than a single branch, and each branch having a right-context that includes at least one branch;
  
  compiling a statistical language model having a plurality of grammar states and transition probabilities between grammar states, each grammar state including at least one word;
  
  associating with each branch a set of common-phoneme words, each common-phoneme word including the initial phonemes associated with the branch;
  
  computing, for each set of common-phoneme words and for each preceding grammar state that precedes the set of common-phoneme words, a set transition probability that is a function of the transition probabilities from the preceding grammar state to each common-phoneme word of the set;
  
  computing at each frame for each phoneme branch associated with the last phoneme of a word, a word-ending score;
  
  compiling and storing upon each frame a list of words that are each characterized by a word-ending score that exceeds a threshold value, each word of the list being associated with a grammar state;
  
  determining upon each frame, the greatest word-ending score of the list;
  
  for each phoneme-ending state of a phoneme branch, and for the root node, propagating the path score and an associated partial grammar score into each right-context branch thereof, only if the path score at the phoneme-ending state exceeds the threshold value;
  
  adjusting the path score to provide an adjusted path score, upon the path score and the associated partial grammar score being propagated into a right-context branch, by dividing the path score by the associated partial grammar score, and then multiplying by the partial grammar score of the right-context branch; and
  
  adding the state associated with the adjusted path score to a list of active states, only if the adjusted path score has exceeded the first threshold.
- Dependent claims:
  - 11. The method of claim 10, further including the step of:
    - associating the set transition probability with the branch associated with the set of common-phoneme words.
  - 12. The method of claim 10, wherein upon each frame, the greatest word-ending score of the list, and the word associated therewith, is propagated to the root node of the phonetic tree.
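
The last three steps of claim 10 amount to a small piece of arithmetic that may be easier to follow in code. The sketch below is an assumption-laden illustration: scores are kept as plain probabilities (a practical recognizer would more likely work with log scores, turning the division and multiplication into subtraction and addition), the function name `propagate_into_branch` and all numbers are hypothetical, and the outgoing partial grammar score is assumed to be greater than zero.

```python
from typing import Optional


def propagate_into_branch(path_score: float,
                          partial_grammar_score: float,
                          branch_partial_grammar_score: float,
                          threshold: float) -> Optional[float]:
    """Return the adjusted path score, or None if the hypothesis is pruned."""
    # Propagate only if the score at the phoneme-ending state exceeds the threshold.
    if path_score <= threshold:
        return None
    # Remove the partial grammar score accumulated so far, then apply the
    # partial grammar score of the right-context branch being entered.
    adjusted = path_score / partial_grammar_score * branch_partial_grammar_score
    # The entered state becomes active only if the adjusted score still
    # exceeds the threshold.
    return adjusted if adjusted > threshold else None


# Made-up numbers: the same hypothesis enters two right-context branches.
kept = propagate_into_branch(0.02, 0.5, 0.2, threshold=0.005)
pruned = propagate_into_branch(0.02, 0.5, 0.1, threshold=0.005)
print(kept, pruned)
```

Dividing out the old partial grammar score before multiplying in the new one keeps the grammar contribution from being counted once per branch along the path.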

13. A method of recognizing a word as being one of a plurality of words in a vocabulary, the method comprising the steps of:
- constructing a phonetic tree having a plurality of branches, a phoneme being associated with each branch, a phonetic HMM being associated with each branch so as to model the phoneme, and a word being associated with the end of a sequence of branches, such that all words that include the same initial phonemes include the same initial branches in the phonetic tree, each branch having a left-context consisting of no more than a single branch, and each branch having a right-context that includes at least one branch;
  
  determining for each branch as many triphone hidden Markov models of the phoneme associated with the branch as there are branches in the right-context of the branch;
  
  computing for each branch a composite triphone model based upon all triphone models associated with the branch, and associating said composite triphone model with the branch;
  
  compiling a statistical language model having a plurality of grammar states and transition probabilities between grammar states, each grammar state including at least one word;
  
  associating with each branch a set of common-phoneme words, each word including the phoneme associated with the branch;
  
  computing, for each set of common-phoneme words and for each preceding grammar state, the total probability, summed over all the common-phoneme words of a set, that a common-phoneme word of the set will follow a preceding grammar state;
  
  associating a hypothesis having a path score, a traceback time, and a partial grammar score with each state of each phonetic HMM associated with a branch in said phonetic tree, the hypothesis being updatable upon each frame;
  
  updating upon each frame the path score of each hypothesis, only if the path score computed in the preceding frame exceeds a first threshold value;
  
  propagating upon each frame the partial grammar score and the traceback time of the dominant hypothesis of the previous frame to update the hypothesis of each state of the present frame;
  
  remembering upon each frame a partial grammar score and a traceback time of the hypothesis entering a branch with the highest path score;
  
  remembering upon each frame the maximum path score of all phonetic models in the phonetic tree HMM, and recomputing the first threshold value using the maximum path score and a beam width;
  
  computing at each frame, for each phoneme branch associated with the last phoneme of a word, a word-ending score;
  
  compiling upon each frame a first list of words, each word of the first list being associated with a grammar state, and each word of the first list being characterized by a word-ending score that exceeds the first threshold value, the first list of words being for use in a backwards pass of a forward-backward search;
  
  computing upon each frame a second threshold value that is greater than the first threshold value and less than the greatest word-ending score;
  
  compiling and storing upon each frame a second list of words that are each characterized by a word-ending score that exceeds the second threshold value, each word of the second list being associated with a grammar state;
  
  for each phoneme-ending state of a phoneme branch, and for the root node, propagating the path score and an associated partial grammar score into each right-context branch thereof, only if the path score at the phoneme-ending state exceeds the first threshold;
  
  adjusting the path score to provide an adjusted path score, upon the path score and the associated partial grammar score being propagated into a right-context branch, by dividing the path score by the associated partial grammar score, and then multiplying by the partial grammar score of the right-context branch; and
  
  adding the state associated with the adjusted path score to a list of active states, only if the adjusted path score has exceeded the first threshold.
- Dependent claims:
  - 14. The method of claim 13, wherein the total probability that a common-phoneme word of the set will follow a preceding grammar state is associated with the branch associated with the set of common-phoneme words.
  - 15. The method of claim 13, further including the step of:
    - representing the grammar so as to facilitate access to information regarding each grammar state, each grammar state being associated with all observed transitions to subsequent words, and to subsequent sets of words, each word and each set of words being associated with a unique index.
  - 16. The method of claim 15, including the step of:
    - computing, for each set of common-phoneme words, the sum of the unigram probabilities of all the common-phoneme words in the set.
  - 17. The method of claim 13, further including the step of:
    - propagating upon each frame, the greatest word-ending score to the root node of the phonetic tree.
  - 18. The method of claim 13, wherein the second list of words includes less words than the first list of words.
  - 19. The method of claim 13, wherein the hypothesis also includes a preceding grammar state.
  - 20. The method of claim 13, wherein said second threshold is defined with reference to the highest word-end score.
  - 21. The method of claim 13, wherein the statistical language model is a bigram grammar.
  - 22. The method of claim 13, wherein the representative function of the path scores is the maximum of the path scores.
  - 23. The method of claim 13, wherein the representative function of the path scores is the sum of the path scores.
  - 24. The method of claim 13, wherein the step of propagating upon each frame the path score into each branch of the right context of the branch associated with the phoneme ending state, or of the root node, includes the step of:
    - propagating the path score, the traceback time, and the partial grammar score directly into the first state of the following phonetic branch, only if there is one phonetic branch in the right-branching context.
  - 25. The method of claim 13, the step of propagating upon each frame the path score into each branch of the right context of the branch associated with the phoneme ending state, or of the root node, includes the steps of:
    - reading the partial grammar score stored in the branch of the right-context;
      
      propagating that partial grammar score, an entering traceback time, and the adjusted path score into the single branch, only if the entering traceback time is the same as the traceback time stored in the single branch at a previous frame; and
      
      using that partial grammar score as the new partial grammar score of the branch, thereby avoiding computing a new partial grammar score for the branch, only if the entering traceback time is the same as the traceback time stored in the branch at a previous frame.
  - 26. The method of claim 13, wherein the step of propagating upon each frame the path score into each branch of the right-context of the branch associated with the phoneme-ending state, includes the step of:
    - reading the second list of words stored at the frame indicated by the traceback time that is associated with the path score in a hypothesis, each word of the second list of words being associated with a preceding grammar state; and
      
      for each branch, determining which preceding grammar state is most likely to transition into the set of common-phoneme words associated with the branch.
  - 27. The method of claim 26, wherein the step of determining which preceding grammar state is most likely to transition into the set of common-phoneme words includes the steps of:
    - for each preceding grammar state, computing the product of the path score associated with the ending state of the grammar state, and a function of the conditional transition probabilities over the set of common-phoneme words, given the preceding grammar state, thereby providing a plurality of products;
      
      determining which product among the plurality of products is the greatest product, the preceding grammar state associated with the greatest product being most likely to transition into the set of common-phoneme words; and
      
      using the greatest product as the new partial grammar score of the hypothesis propagated into the branch.
  - 28. The method of claim 27, wherein the function of the conditional transition probabilities over the set of common-phoneme words is the sum of the conditional transition probabilities of each word in the set of common-phoneme words, given the preceding grammar state.
  - 29. The method of claim 27, wherein the function of the conditional transition probabilities over the set of common-phoneme words is the maximum conditional probability of the conditional transition probabilities of each word in the set of common-phoneme words, given the preceding grammar state.
  - 30. The method of claim 27, wherein the function of the conditional transition probabilities over the set of common-phoneme words is the average conditional probability of the conditional transition probabilities of each word in the set of common-phoneme words, given the preceding grammar state.
  - 31. The method of claim 27, wherein the conditional transition probabilities are sometimes obtained from a grammar cache.
  - 32. The method of claim 31, wherein the grammar cache is a random-access data structure.
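
The two thresholds and two word lists of claim 13 can be sketched per frame as below. The claims only require that the first threshold be recomputed from the maximum path score and a beam width, and that the second threshold lie between the first threshold and the greatest word-ending score; the specific formulas here (a multiplicative beam, and a geometric mean for the second threshold) are assumptions chosen for illustration, as are the function name `frame_word_lists` and the numbers.

```python
import math
from typing import Dict, List, Optional, Tuple


def frame_word_lists(max_path_score: float,
                     word_end_scores: Dict[str, float],
                     beam_width: float = 1e-3
                     ) -> Tuple[float, Optional[float], List[str], List[str]]:
    """Compute both thresholds and both word lists for a single frame."""
    # First threshold: assumed here to be the best path score scaled by the beam width.
    first_threshold = max_path_score * beam_width
    best_word_end = max(word_end_scores.values(), default=0.0)
    if best_word_end <= first_threshold:
        # No word ending survives the beam at this frame.
        return first_threshold, None, [], []
    # Second threshold: the geometric mean is one hypothetical choice that is
    # strictly between the first threshold and the greatest word-ending score.
    second_threshold = math.sqrt(first_threshold * best_word_end)
    first_list = sorted(w for w, s in word_end_scores.items() if s > first_threshold)
    second_list = sorted(w for w, s in word_end_scores.items() if s > second_threshold)
    return first_threshold, second_threshold, first_list, second_list


# Made-up word-ending scores for one frame.
WORD_END_SCORES = {"cat": 0.05, "cab": 0.002, "can": 0.0005, "dog": 0.00005}
first, second, first_list, second_list = frame_word_lists(0.1, WORD_END_SCORES)
print(first_list)    # wider list, kept for the backward pass
print(second_list)   # tighter list, used when entering new branches
```

Because the second threshold is stricter, the second list comes out no longer than the first, which matches the relationship stated in claim 18.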

Current Assignee: Google LLC (Alphabet Inc.), Ramp Holdings Incorporated (Clean Harbors Incorporated)
Original Assignee: BBN Corporation (Verizon Communications Inc.)
Inventors: Nguyen, Long; Schwartz, Richard M.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Dorvil, Richemond
Application Number: US08/183,719
Time in Patent Office: 1,182 Days
Field of Search: 395/2.65, 395/2.64, 395/2.63, 395/2.4, 395/2.45, 395/2.49, 395/2.66, 364/419.01, 364/419.02, 364/419.03, 364/419.04, 364/419.05, 364/419.06, 364/419.07, 364/419.08
US Class Current: 704/256
CPC Class Codes:
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/197   Probabilistic grammars, e.g...
G10L 2015/022   Demisyllables, biphones or ...
