Search algorithm for large vocabulary speech recognition
Abstract
Automatic speech recognition word sequence hypotheses are generated using an interleaved forward-backward search. A forward search pass uses relatively simple models for a given block period of time. A backward search pass then goes back over the previous block period using more complex models and the recognition hypotheses generated by the forward search pass. The backward search pass employs a word dependent n-best search having a flat model state organization.
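The interleaved control flow described in the abstract can be sketched as follows. This is a hypothetical illustration, not the patented implementation: model scoring is stubbed out with caller-supplied functions, the forward "simple models" and backward "more complex models" are represented only as two scoring callables, and block handling is simplified to fixed-size slices.

```python
def forward_pass(block, score_fn):
    """Cheap left-to-right pass: cumulative score ending at each frame."""
    scores, total = [], 0.0
    for frame in block:
        total += score_fn(frame)
        scores.append(total)
    return scores

def backward_pass(block, fwd_scores, detailed_score_fn):
    """More detailed right-to-left pass over the same block; each hypothesis
    combines the detailed backward score with the forward estimate, so weak
    candidates could be pruned against a threshold."""
    hyps, back = [], 0.0
    for t in range(len(block) - 1, -1, -1):
        back += detailed_score_fn(block[t])
        hyps.append((t, fwd_scores[t] + back))
    return hyps

def recognize(frames, block_size, score_fn, detailed_score_fn):
    """Process the utterance block by block, interleaving the two passes."""
    results = []
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        fwd = forward_pass(block, score_fn)
        results.extend(backward_pass(block, fwd, detailed_score_fn))
    return results
```

The key property the sketch preserves is that the backward pass never runs ahead of the forward pass: it reuses the forward scores for the block just completed.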
34 Claims
1. A method of a speech recognition system for processing a sequence of parameters that represent an input speech signal, the method comprising:
a. at selected locations in the sequence, searching over a current block time period having a duration sufficient that at least one word in the input speech signal is likely included, with a forward search pass that:
i. compares the current block time period parameters to selected word models from a first set of word models having sequences of model states; and
ii. for each state in a set of selected model states, generates a most likely forward pass recognition hypothesis ending in the state and corresponding to the input speech signal;
b. performing a backward search pass, back through the sequence of parameters within the current block time period, that:
i. compares the current block time period parameters to selected word models from a second set of word models having sequences of model states, and to the most likely forward pass recognition hypotheses;
ii. generates a current word graph including (1) a set of word graph recognition hypotheses, of at least one word, that end in selected model states, each recognition hypothesis having an associated occurrence probability score representing the likelihood of that recognition hypothesis corresponding to the input speech signal, and (2) nodes which connect adjacent words in the current word graph; and
iii. prunes any generated word graph recognition hypothesis that has an occurrence probability score less than a selected threshold;
c. updating any preceding word graph by linking recognition hypotheses of the preceding word graph to the current word graph; and
d. repeating steps (a)-(c) for the next block time period until the end of the utterance. (Dependent claims: 2-17.)
when the backward search pass at a given time reaches a beginning of a word, creating a new node in the current word graph for that word; and
examining the preceding word graph for a node for that word at that time and, if such a node exists, creating a substitute pointer from that node in the preceding word graph to the new node in the current word graph, and stopping the backward search pass for that word; and
if no such node exists, continuing the backward search pass.
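The backward-pass limitations above (threshold pruning, and stopping the search for a word when the preceding word graph already has a node for it by installing a substitute pointer) can be illustrated with a minimal sketch. The `Node` class, `build_word_graph` function, and the dictionary keyed by `(word, time)` are assumptions for illustration only.

```python
class Node:
    """A word-graph node: one word starting at one time."""
    def __init__(self, word, time):
        self.word, self.time = word, time
        self.substitute = None  # pointer into a newer word graph, if any

def build_word_graph(hypotheses, threshold, preceding_nodes):
    """hypotheses: list of (word, start_time, occurrence_score).
    preceding_nodes: {(word, time): Node} from the preceding word graph.
    Returns surviving nodes of the current graph keyed by (word, time)."""
    current = {}
    for word, time, score in hypotheses:
        if score < threshold:            # prune weak hypotheses
            continue
        node = Node(word, time)
        current[(word, time)] = node
        prev = preceding_nodes.get((word, time))
        if prev is not None:
            # a node for this word at this time already exists: install a
            # substitute pointer and stop the backward search for the word
            prev.substitute = node
    return current
```

A hypothesis that falls below the threshold never enters the graph, and a matching node in the preceding graph is redirected rather than re-searched.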
5. A method as in claim 1, wherein the step of updating uses backward pointers to connect recognition hypotheses in the current word graph with recognition hypotheses in the preceding word graph.
6. A method as in claim 5, wherein the step of updating includes:
tracing back the backward pointers from recognition hypotheses in the preceding word graph, and reconnecting the backward pointers of active recognition hypotheses of the preceding word graph to corresponding recognition hypotheses in the current word graph.
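A minimal sketch of this pointer-update step, under the assumption that each hypothesis carries a `back` pointer to its predecessor and an `active` flag; the `Hyp` class and `reconnect` function are hypothetical names.

```python
class Hyp:
    """A recognition hypothesis with a backward pointer to its predecessor."""
    def __init__(self, word, back=None, active=True):
        self.word, self.back, self.active = word, back, active

def reconnect(preceding, current_by_word):
    """Trace back from hypotheses in the preceding word graph and re-point
    the backward pointers of active ones at their counterparts in the
    current word graph. current_by_word: {word: Hyp} in the current graph."""
    for hyp in preceding:
        if not hyp.active or hyp.back is None:
            continue  # inactive or unlinked hypotheses are left alone
        replacement = current_by_word.get(hyp.back.word)
        if replacement is not None:
            hyp.back = replacement
```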
7. A method as in claim 1, wherein each word graph contains layers of recognition hypotheses and the step of updating processes back through the word graph layers.
8. A method as in claim 7, wherein the word graph layers are structured so that recognition hypotheses within a word graph layer point to preceding word graph layers and so that all the recognition hypotheses within each word graph layer are updated when the word graph layer is processed.
9. A method as in claim 7, wherein all recognition hypotheses ending at the same time are within the same word graph layer.
10. A method as in claim 7, wherein time is an indexed part of each word graph layer.
11. A method as in claim 7, wherein the word graph layers are updated by redirecting links from recognition hypotheses in the preceding word graph to recognition hypotheses in the current word graph.
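The layered organization of claims 7-11 can be sketched as a time-indexed mapping: time is the layer index (claim 10), all hypotheses ending at the same time share a layer (claim 9), and an update walks the layers newest-first so each layer's hypotheses are handled when that layer is processed (claims 7-8). The `LayeredWordGraph` class below is a hypothetical illustration.

```python
class LayeredWordGraph:
    """Word graph organized into layers keyed by hypothesis end time."""
    def __init__(self):
        self.layers = {}  # end_time -> list of hypotheses

    def add(self, end_time, hyp):
        self.layers.setdefault(end_time, []).append(hyp)

    def update_backwards(self, fn):
        """Apply fn to every hypothesis, processing layers newest-first,
        so all hypotheses in a layer are updated when it is reached."""
        for end_time in sorted(self.layers, reverse=True):
            for hyp in self.layers[end_time]:
                fn(end_time, hyp)
```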
12. A method as in claim 1, wherein the step of updating includes deleting inactive recognition hypotheses of the preceding word graph.
13. A method as in claim 1, wherein the step of updating further includes outputting at least one of the probable recognition hypotheses of the current word graph.
14. A method as in claim 13, wherein the step of outputting includes displaying to a user at least one of the probable recognition hypotheses.
15. A method as in claim 13, wherein the step of outputting outputs the most probable recognition hypothesis of the current word graph.
16. A method as in claim 1, the method further including pruning the current word graph when the sequence of parameters continues for a predetermined length of time without pausing at the end of a phrase.
17. A method as in claim 16, wherein the step of pruning includes:
determining the most probable recognition hypothesis for the current word graph;
selecting a boundary time between a pair of words near the end of the most probable recognition hypothesis; and
treating the boundary time as an end of the sequence of parameters and a beginning of a new sequence of parameters.
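Claims 16-17 might be illustrated with the following sketch: when the parameter sequence has run on past a maximum span without a pause, a word boundary near the end of the current best hypothesis becomes the split point between the finished sequence and a new one. Frame-based timing, the function name, and the choice of the last inter-word boundary are illustrative assumptions.

```python
def split_at_boundary(best_hypothesis, max_frames, current_frame):
    """best_hypothesis: time-ordered list of (word, start_frame, end_frame).
    Returns (finished_words, carried_words, boundary_frame), or None if the
    utterance is still within the allowed span."""
    if current_frame - best_hypothesis[0][1] < max_frames:
        return None  # no pause needed yet; keep decoding
    # choose the boundary between the last two words of the best hypothesis
    boundary = best_hypothesis[-1][1]
    finished = [w for w in best_hypothesis if w[2] <= boundary]
    carried = [w for w in best_hypothesis if w[1] >= boundary]
    return finished, carried, boundary
```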
18. A speech recognition system comprising:
a. an input segmenter that processes an input speech signal into a sequence of representative parameters;
b. a forward search comparator in communication with the input segmenter that, at selected locations in the sequence, searches a current block time period having a duration sufficient that at least one word in the input speech signal is likely included, with a forward search pass that:
i. compares the current block time period parameters to selected word models from a first set of word models having sequences of model states; and
ii. for each state in a set of selected model states, generates a most likely forward pass recognition hypothesis ending in the state and corresponding to the input speech signal;
c. a backward search comparator in communication with the forward search comparator that performs a backward search pass back through the sequence of parameters within the current block time period that:
i. compares the current block time period parameters to selected word models from a second set of word models having sequences of model states, and to the forward pass recognition hypotheses;
ii. generates a current word graph including (1) a set of word graph recognition hypotheses of at least one word that end in selected model states, each recognition hypothesis having an associated occurrence probability score representing the likelihood of that recognition hypothesis corresponding to the input speech signal, and (2) nodes which connect adjacent words in the current word graph; and
iii. prunes any generated word graph recognition hypothesis that has an occurrence probability score less than a selected threshold;
iv. updates any preceding word graph by linking recognition hypotheses of the preceding word graph to the current word graph; and
d. a block period controller in communication with the forward search comparator and the backward search comparator that controls processing of the sequence of representative parameters in successive block periods. (Dependent claims: 19-34.)
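The component decomposition of claim 18 might be sketched as follows: a segmenter produces the parameter sequence, the two comparators run the interleaved passes, and the block period controller drives them over successive block periods. The class shape, the use of plain callables for each component, and the fixed block size are illustrative assumptions, not the claimed apparatus.

```python
class BlockPeriodController:
    """Drives the segmenter and both comparators over successive blocks."""
    def __init__(self, segmenter, forward, backward, block_size):
        self.segmenter = segmenter    # signal -> sequence of parameters
        self.forward = forward        # block -> forward-pass result
        self.backward = backward      # (block, forward result) -> word graph
        self.block_size = block_size

    def run(self, signal):
        params = self.segmenter(signal)
        graphs = []
        for start in range(0, len(params), self.block_size):
            block = params[start:start + self.block_size]
            fwd = self.forward(block)
            graphs.append(self.backward(block, fwd))
        return graphs
```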
when the backward search pass at a given time reaches a beginning of a word, creating a new node in the current word graph for that word; and
examining the preceding word graph for a node for that word at that time and, if such a node exists, creating a substitute pointer from that node in the preceding word graph to the new node in the current word graph, and stopping the backward search pass for that word; and
if no such node exists, continuing the backward search pass.
22. A system as in claim 18, wherein the step of updating in the backward search comparator uses backward pointers to connect recognition hypotheses in the current word graph with recognition hypotheses in the preceding word graph.
23. A system as in claim 22, wherein the step of updating in the backward search comparator includes:
tracing back the backward pointers from recognition hypotheses in the preceding word graph, and reconnecting the backward pointers of active recognition hypotheses of the preceding word graph to corresponding recognition hypotheses in the current word graph.
24. A system as in claim 18, wherein each word graph contains layers of recognition hypotheses and the step of updating in the backward search comparator processes back through the word graph layers.
25. A system as in claim 24, wherein the word graph layers are structured so that recognition hypotheses within a word graph layer point to preceding word graph layers and so that all the recognition hypotheses within each word graph layer are updated when the word graph layer is processed.
26. A system as in claim 24, wherein all recognition hypotheses ending at the same time are within the same word graph layer.
27. A system as in claim 24, wherein time is an indexed part of each word graph layer.
28. A system as in claim 24, wherein the word graph layers are updated by redirecting links from recognition hypotheses in the preceding word graph to recognition hypotheses in the current word graph.
29. A system as in claim 18, wherein the step of updating in the backward search comparator includes deleting inactive recognition hypotheses of the preceding word graph.
30. A system as in claim 18, wherein the step of updating in the backward search comparator further includes outputting at least one of the probable recognition hypotheses of the current word graph.
31. A system as in claim 30, wherein the step of outputting in the backward search comparator includes displaying to a user at least one of the probable recognition hypotheses.
32. A system as in claim 30, wherein the step of outputting in the backward search comparator outputs the most probable recognition hypothesis of the current word graph.
33. A system as in claim 18, wherein the backward search comparator further prunes the current word graph when the sequence of parameters continues for a predetermined length of time without pausing at the end of a phrase.
34. A system as in claim 33, wherein the step of pruning in the backward search comparator includes:
determining the most probable recognition hypothesis for the current word graph;
selecting a boundary time between a pair of words near the end of the most probable recognition hypothesis; and
treating the boundary time as an end of the sequence of parameters and a beginning of a new sequence of parameters.