Word dependent N-best search method

US 5,241,619 A
Filed: 06/25/1991
Issued: 08/31/1993
Est. Priority Date: 06/25/1991
Status: Expired due to Term

Abstract

As a step in finding the one most likely word sequence in a spoken language system, an N-best search is conducted to find the N most likely sentence hypotheses. During the search, word theories are distinguished based only on the one previous word. At each state within a word, the total probability is calculated for each of a few previous words. At the end of each word, the probability score is recorded for each previous word theory, together with the name of the previous word. At the end of the sentence, a recursive traceback is performed to derive the list of the N best sentences.
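
The per-state bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name merge_theories, the (previous_word, probability) pair layout, and the toy numbers are all assumptions made for illustration. It shows only two ideas, summing the probability of paths that share the same previous word and keeping a few of the best previous-word theories at a state.

```python
from collections import defaultdict


def merge_theories(arrivals, m):
    """Combine theories arriving at one state in one frame; keep the m best.

    arrivals: iterable of (previous_word, probability) pairs (assumed layout).
    Returns a list of (previous_word, total_probability), best first.
    """
    totals = defaultdict(float)
    for prev_word, prob in arrivals:
        # Paths with the same previous word count as one theory, so their
        # probabilities are added to give a total for that previous word.
        totals[prev_word] += prob
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Only the m most likely previous-word theories are kept at this state.
    return ranked[:m]


# Toy data: two paths arrive with previous word "the", one each for "a", "an".
arrivals = [("the", 0.25), ("a", 0.125), ("the", 0.125), ("an", 0.0625)]
best = merge_theories(arrivals, m=2)
print(best)
```

A practical decoder would more likely work with log probabilities and per-frame pruning; plain probabilities are used here only to keep the arithmetic visible.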

20 Claims

1. A method of producing N-most likely sentence hypotheses defined as word sequences of one or more words from a limited vocabulary speech signal, each word having a set of states including a distinguished first and last state, said method comprising the steps of:
- a. dividing the speech signal of an utterance into frames and generating for each frame at least one vector that characterizes the speech signal;
  
  b. computing for each frame for selected states in selected words, the probability of a sequence of vectors up to each such frame, given a most likely partial sentence hypothesis that begins with the utterance and ends with that state at that frame;
  
  c. at each of said selected states accumulating a separate probability score for each of m most likely different partial sentence hypotheses that begin with the utterance and end at this state at that frame, but that differ in the previous word to the word to which this state belongs so as to provide m previous-word theories having respective identities, wherein m is an integer;
  
  d. recording at each frame for the last state of each word the accumulated probability scores together with the identities of the respective previous-word theories;
  
  e. starting the first state of each word with the probability score of each of n most likely respective previous-word theories and each said word according to a grammar model, wherein n is an integer; and
  
  f. at the end of the utterance reassembling N likely different sentence hypotheses that have the highest accumulated scores using the recorded probability scores and previous-word theories recorded in step d so as to provide the N-most likely sentence hypotheses, wherein N is an integer.
- - 2. A method as in claim 1, wherein said step c further comprises the step of adding the respective probability scores of two or more word theories having the same single previous word and arriving at the same state at the same time so as to provide a combined previous-word theory.
  - 3. A method as in claim 1, wherein said step e further comprises the step of adding the respective probability scores of two or more word theories having the same single previous word and arriving at the same state at the same time so as to provide a combined previous-word theory.
  - 4. A method as in claim 1, wherein said selected states of step b comprise select Markov states, said select Markov states being states in which the most likely previous-word theory probability score is within a predetermined range of the most likely previous-word theory probability score for any state in the previous frame.
  - 5. A method as in claim 1, wherein said selected words are words in which the most likely previous-word theory probability score of any state within that word is within a predetermined range of the most likely previous-word theory probability score for any state in the previous frame.
  - 6. A method as in claim 1, wherein said selected states in step b are selected Markov states, step b includes the step of computing for each frame for additional Markov states, the probability of the sequence of vectors up to each such frame, said Markov states being organized in groups such that said selected Markov states are states belonging to a group in which the maximum word theory probability score is within a predetermined range of the maximum word theory probability score for any state in the previous frame.
  - 7. A method as in claim 1, wherein said vector characterizes the spectral content of said speech signal.
  - 8. A method as in claim 7, wherein said vector is a Cepstral vector.
  - 9. A method as in claim 1, wherein a plurality of dissimilar vectors are generated for each frame.
  - 10. A method as in claim 1, wherein said grammar model is a bigram class grammar model based on pairs of word classes.
  - 11. A method as in claim 1, wherein said steps b and c are performed one frame at a time in a time synchronous manner.
  - 12. A method as in claim 1, wherein said frames overlap.
  - 13. A method as in claim 1, wherein in step b the selected states are selected Markov states, and said computation is based on corresponding hidden Markov models.
  - 14. A method as in claim 13, further comprising the steps of:
    - g. rescoring each of the N different sentence hypotheses from step f using different models from said hidden Markov models corresponding to said Markov states; and
      
      h. multiplying the different probability scores for each of said N different word sequences from step f so as to produce the combined probability scores of said models.
  - 15. A method as in claim 14, wherein said different models are more detailed than said hidden Markov models.
  - 16. A method as in claim 1, wherein the grammar model used in step e is a bigram grammar model, said method further comprising the steps of:
    - g. rescoring each of the N different sentence hypotheses from step f using a different grammar model from said bigram model; and
      
      h. multiplying the different scores for each of said N different word sequences from step f so as to produce the combined probability scores of said models.
  - 17. A method as in claim 16, wherein said different grammar model is a higher ordered model than said bigram model.
  - 18. A method as in claim 1, wherein the method is performed in two passes, a forward pass using a relatively simplified algorithm, and a backward and best pass using information computed in the forward pass to perform steps b, c, d and e.
  - 19. A method as in claim 18, wherein the steps b, c, d and e are performed in reverse order one frame at a time in a time synchronous manner.
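
The rescoring steps g and h of claims 14 and 16 can be pictured with a short Python sketch. Everything named here is hypothetical: rescore_nbest, toy_trigram_model, and the example sentences and probabilities are assumptions, and the final re-sorting of the list is an illustrative choice rather than something the claims recite.

```python
def rescore_nbest(nbest, models):
    """Multiply each hypothesis score by the scores from additional models.

    nbest: list of (word_tuple, probability) pairs from the first search.
    models: callables mapping a word tuple to a probability (assumed API).
    Returns the hypotheses with combined scores, highest first.
    """
    rescored = []
    for words, score in nbest:
        combined = score
        for model in models:
            # Step h: combine the scores of the different models by product.
            combined *= model(words)
        rescored.append((words, combined))
    # Assumption: the list is re-ranked by the combined score.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_trigram_model(words):
    # Hypothetical stand-in for a more detailed or higher-order model.
    return 0.5 if "flights" in words else 0.125


nbest = [(("show", "lights"), 0.5), (("show", "flights"), 0.25)]
reranked = rescore_nbest(nbest, [toy_trigram_model])
print(reranked[0])
```

In this toy case the second hypothesis overtakes the first once the extra model is applied, which is the reason for carrying N hypotheses forward instead of only the single best one.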

20. A system for producing N-most likely sentence hypotheses defined as word sequences of one or more words from a limited vocabulary speech signal, each word having a set of states including a distinguished first and last state, said system comprising:
- a. means for dividing the speech signal of an utterance into frames and generating for each frame at least one vector that characterizes the speech signal;
  
  b. means for computing for each frame for selected states in selected words, the probability of a sequence of vectors up to each such frame, given a most likely partial sentence hypothesis that begins with the utterance and ends with that state at that frame;
  
  c. means for accumulating at each of said selected states a separate probability score for each of m most likely different partial sentence hypotheses that begin with the utterance and end at this state at that frame, but that differ in the previous word to the word to which this state belongs so as to provide m previous-word theories having respective identities, wherein m is an integer;
  
  d. means for recording at each frame for the last state of each word the accumulated probability scores together with the identities of the respective previous-word theories;
  
  e. means for starting the first state of each word with the probability score of each of n most likely respective previous-word theories and each said word according to a grammar model, wherein n is an integer; and
  
  f. means for reassembling at the end of the utterance the N likely different sentence hypotheses that have the highest accumulated scores using the recorded probability scores and previous-word theories recorded in step d so as to provide the N-most likely sentence hypotheses, wherein N is an integer.
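
Element a of claims 1 and 20 (dividing the signal into frames and generating a vector per frame), together with the overlapping frames of claim 12, can be sketched as follows. The function names, the frame length and hop values, and the use of frame energy as the one-element vector are assumptions for illustration; the claims mention spectral and cepstral vectors, which this sketch does not compute.

```python
def split_into_frames(samples, frame_len, hop):
    """Cut a sample sequence into frames; frames overlap when hop < frame_len."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[start:start + frame_len]
            for start in range(0, last_start + 1, hop)]


def frame_vector(frame):
    # Placeholder feature: a one-element vector holding the frame energy.
    return [sum(x * x for x in frame)]


samples = list(range(10))  # toy signal, not real speech
frames = split_into_frames(samples, frame_len=4, hop=2)
vectors = [frame_vector(frame) for frame in frames]
print(len(frames), vectors[0])
```

With a hop of half the frame length, each frame shares its second half with the first half of the next frame.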

Current Assignee: Google LLC (Alphabet Inc.), Ramp Holdings Incorporated (Clean Harbors Incorporated)
Original Assignee: Bolt Beranek & Newman, Inc. (Verizon Communications Inc.)
Inventors: Austin, Stephen C.; Schwartz, Richard M.
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/720,652
Time in Patent Office: 798 Days
Field of Search: 381/41-45, 395/2
US Class Current: 704/200
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/197   Probabilistic grammars, e.g...
