Search algorithm for large vocabulary speech recognition
Abstract
Automatic speech recognition word sequence hypotheses are generated using an interleaved forward-backward search. A forward search pass uses relatively simple models for a given block period of time. A backward search pass then goes back over the previous block period using more complex models and the recognition hypotheses generated by the forward search pass. The backward search pass employs a word dependent n-best search having a flat model state organization.
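The interleaved control flow described in the abstract can be sketched as follows. This is a hypothetical illustration, not the patented implementation: model scoring is stubbed out with caller-supplied functions, the forward "simple models" and backward "more complex models" are represented only as two scoring callables, and block handling is simplified to fixed-size slices.

```python
def forward_pass(block, score_fn):
    """Cheap left-to-right pass: cumulative score ending at each frame."""
    scores, total = [], 0.0
    for frame in block:
        total += score_fn(frame)
        scores.append(total)
    return scores

def backward_pass(block, fwd_scores, detailed_score_fn):
    """More detailed right-to-left pass over the same block; each hypothesis
    combines the detailed backward score with the forward estimate, so weak
    candidates could be pruned against a threshold."""
    hyps, back = [], 0.0
    for t in range(len(block) - 1, -1, -1):
        back += detailed_score_fn(block[t])
        hyps.append((t, fwd_scores[t] + back))
    return hyps

def recognize(frames, block_size, score_fn, detailed_score_fn):
    """Process the utterance block by block, interleaving the two passes."""
    results = []
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        fwd = forward_pass(block, score_fn)
        results.extend(backward_pass(block, fwd, detailed_score_fn))
    return results
```

The key property the sketch preserves is that the backward pass never runs ahead of the forward pass: it reuses the forward scores for the block just completed.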
34 Claims
1. A method of a speech recognition system for processing a sequence of parameters that represent an input speech signal, the method comprising:
a. at selected locations in the sequence, searching over a current block time period having a duration sufficient that at least one word in the input speech signal is likely included, with a forward search pass that:
i. compares the current block time period parameters to selected word models from a first set of word models having sequences of model states; and
ii. for each state in a set of selected model states, generates a most likely forward pass recognition hypothesis ending in the state and corresponding to the input speech signal;
b. performing a backward search pass, back through the sequence of parameters within the current block time period, that:
i. compares the current block time period parameters to selected word models from a second set of word models having sequences of model states, and to the most likely forward pass recognition hypotheses;
ii. generates a current word graph including (1) a set of word graph recognition hypotheses, of at least one word, that end in selected model states, each recognition hypothesis having an associated occurrence probability score representing the likelihood of that recognition hypothesis corresponding to the input speech signal, and (2) nodes which connect adjacent words in the current word graph; and
iii. prunes any generated word graph recognition hypothesis that has an occurrence probability score less than a selected threshold;
c. updating any preceding word graph by linking recognition hypotheses of the preceding word graph to the current word graph; and
d. repeating steps (a)-(c) for the next block time period until the end of the utterance. (Dependent claims: 2-17.)
when the backward search pass at a given time reaches a beginning of a word, creating a new node in the current word graph for that word; and
examining the preceding word graph for a node for that word at that time and, if such a node exists, creating a substitute pointer from that node in the preceding word graph to the new node in the current word graph, and stopping the backward search pass for that word; and
if no such node exists, continuing the backward search pass.
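The backward-pass limitations above (threshold pruning, and stopping the search for a word when the preceding word graph already has a node for it by installing a substitute pointer) can be illustrated with a minimal sketch. The `Node` class, `build_word_graph` function, and the dictionary keyed by `(word, time)` are assumptions for illustration only.

```python
class Node:
    """A word-graph node: one word starting at one time."""
    def __init__(self, word, time):
        self.word, self.time = word, time
        self.substitute = None  # pointer into a newer word graph, if any

def build_word_graph(hypotheses, threshold, preceding_nodes):
    """hypotheses: list of (word, start_time, occurrence_score).
    preceding_nodes: {(word, time): Node} from the preceding word graph.
    Returns surviving nodes of the current graph keyed by (word, time)."""
    current = {}
    for word, time, score in hypotheses:
        if score < threshold:            # prune weak hypotheses
            continue
        node = Node(word, time)
        current[(word, time)] = node
        prev = preceding_nodes.get((word, time))
        if prev is not None:
            # a node for this word at this time already exists: install a
            # substitute pointer and stop the backward search for the word
            prev.substitute = node
    return current
```

A hypothesis that falls below the threshold never enters the graph, and a matching node in the preceding graph is redirected rather than re-searched.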
5. A method as in claim 1, wherein the step of updating uses backward pointers to connect recognition hypotheses in the current word graph with recognition hypotheses in the preceding word graph.
6. A method as in claim 5, wherein the step of updating includes:
tracing back the backward pointers from recognition hypotheses in the preceding word graph, and reconnecting the backward pointers of active recognition hypotheses of the preceding word graph to corresponding recognition hypotheses in the current word graph.
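A minimal sketch of this pointer-update step, under the assumption that each hypothesis carries a `back` pointer to its predecessor and an `active` flag; the `Hyp` class and `reconnect` function are hypothetical names.

```python
class Hyp:
    """A recognition hypothesis with a backward pointer to its predecessor."""
    def __init__(self, word, back=None, active=True):
        self.word, self.back, self.active = word, back, active

def reconnect(preceding, current_by_word):
    """Trace back from hypotheses in the preceding word graph and re-point
    the backward pointers of active ones at their counterparts in the
    current word graph. current_by_word: {word: Hyp} in the current graph."""
    for hyp in preceding:
        if not hyp.active or hyp.back is None:
            continue  # inactive or unlinked hypotheses are left alone
        replacement = current_by_word.get(hyp.back.word)
        if replacement is not None:
            hyp.back = replacement
```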
7. A method as in claim 1, wherein each word graph contains layers of recognition hypotheses and the step of updating processes back through the word graph layers.
8. A method as in claim 7, wherein the word graph layers are structured so that recognition hypotheses within a word graph layer point to preceding word graph layers and so that all the recognition hypotheses within each word graph layer are updated when the word graph layer is processed.
9. A method as in claim 7, wherein all recognition hypotheses ending at the same time are within the same word graph layer.
10. A method as in claim 7, wherein time is an indexed part of each word graph layer.
11. A method as in claim 7, wherein the word graph layers are updated by redirecting links from recognition hypotheses in the preceding word graph to recognition hypotheses in the current word graph.
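The layered organization of claims 7-11 can be sketched as a time-indexed mapping: time is the layer index (claim 10), all hypotheses ending at the same time share a layer (claim 9), and an update walks the layers newest-first so each layer's hypotheses are handled when that layer is processed (claims 7-8). The `LayeredWordGraph` class below is a hypothetical illustration.

```python
class LayeredWordGraph:
    """Word graph organized into layers keyed by hypothesis end time."""
    def __init__(self):
        self.layers = {}  # end_time -> list of hypotheses

    def add(self, end_time, hyp):
        self.layers.setdefault(end_time, []).append(hyp)

    def update_backwards(self, fn):
        """Apply fn to every hypothesis, processing layers newest-first,
        so all hypotheses in a layer are updated when it is reached."""
        for end_time in sorted(self.layers, reverse=True):
            for hyp in self.layers[end_time]:
                fn(end_time, hyp)
```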
12. A method as in claim 1, wherein the step of updating includes deleting inactive recognition hypotheses of the preceding word graph.
13. A method as in claim 1, wherein the step of updating further includes outputting at least one of the probable recognition hypotheses of the current word graph.
14. A method as in claim 13, wherein the step of outputting includes displaying to a user at least one of the probable recognition hypotheses.
15. A method as in claim 13, wherein the step of outputting outputs the most probable recognition hypothesis of the current word graph.
16. A method as in claim 1, the method further including pruning the current word graph when the sequence of parameters continues for a predetermined length of time without pausing at the end of a phrase.
17. A method as in claim 16, wherein the step of pruning includes:
determining the most probable recognition hypothesis for the current word graph;
selecting a boundary time between a pair of words near the end of the most probable recognition hypothesis; and
treating the boundary time as an end of the sequence of parameters and a beginning of a new sequence of parameters.
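Claims 16-17 might be illustrated with the following sketch: when the parameter sequence has run on past a maximum span without a pause, a word boundary near the end of the current best hypothesis becomes the split point between the finished sequence and a new one. Frame-based timing, the function name, and the choice of the last inter-word boundary are illustrative assumptions.

```python
def split_at_boundary(best_hypothesis, max_frames, current_frame):
    """best_hypothesis: time-ordered list of (word, start_frame, end_frame).
    Returns (finished_words, carried_words, boundary_frame), or None if the
    utterance is still within the allowed span."""
    if current_frame - best_hypothesis[0][1] < max_frames:
        return None  # no pause needed yet; keep decoding
    # choose the boundary between the last two words of the best hypothesis
    boundary = best_hypothesis[-1][1]
    finished = [w for w in best_hypothesis if w[2] <= boundary]
    carried = [w for w in best_hypothesis if w[1] >= boundary]
    return finished, carried, boundary
```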
18. A speech recognition system comprising:
a. an input segmenter that processes an input speech signal into a sequence of representative parameters;
b. a forward search comparator in communication with the input segmenter that, at selected locations in the sequence, searches a current block time period having a duration sufficient that at least one word in the input speech signal is likely included, with a forward search pass that:
i. compares the current block time period parameters to selected word models from a first set of word models having sequences of model states; and
ii. for each state in a set of selected model states, generates a most likely forward pass recognition hypothesis ending in the state and corresponding to the input speech signal;
c. a backward search comparator in communication with the forward search comparator that performs a backward search pass back through the sequence of parameters within the current block time period that:
i. compares the current block time period parameters to selected word models from a second set of word models having sequences of model states, and to the forward pass recognition hypotheses;
ii. generates a current word graph including (1) a set of word graph recognition hypotheses of at least one word that end in selected model states, each recognition hypothesis having an associated occurrence probability score representing the likelihood of that recognition hypothesis corresponding to the input speech signal, and (2) nodes which connect adjacent words in the current word graph; and
iii. prunes any generated word graph recognition hypothesis that has an occurrence probability score less than a selected threshold;
iv. updates any preceding word graph by linking recognition hypotheses of the preceding word graph to the current word graph; and
d. a block period controller in communication with the forward search comparator and the backward search comparator that controls processing of the sequence of representative parameters in successive block periods. (Dependent claims: 19-34.)
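The component decomposition of claim 18 might be sketched as follows: a segmenter produces the parameter sequence, the two comparators run the interleaved passes, and the block period controller drives them over successive block periods. The class shape, the use of plain callables for each component, and the fixed block size are illustrative assumptions, not the claimed apparatus.

```python
class BlockPeriodController:
    """Drives the segmenter and both comparators over successive blocks."""
    def __init__(self, segmenter, forward, backward, block_size):
        self.segmenter = segmenter    # signal -> sequence of parameters
        self.forward = forward        # block -> forward-pass result
        self.backward = backward      # (block, forward result) -> word graph
        self.block_size = block_size

    def run(self, signal):
        params = self.segmenter(signal)
        graphs = []
        for start in range(0, len(params), self.block_size):
            block = params[start:start + self.block_size]
            fwd = self.forward(block)
            graphs.append(self.backward(block, fwd))
        return graphs
```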
when the backward search pass at a given time reaches a beginning of a word, creating a new node in the current word graph for that word; and
examining the preceding word graph for a node for that word at that time and, if such a node exists, creating a substitute pointer from that node in the preceding word graph to the new node in the current word graph, and stopping the backward search pass for that word; and
if no such node exists, continuing the backward search pass.
22. A system as in claim 18, wherein the step of updating in the backward search comparator uses backward pointers to connect recognition hypotheses in the current word graph with recognition hypotheses in the preceding word graph.
23. A system as in claim 22, wherein the step of updating in the backward search comparator includes:
tracing back the backward pointers from recognition hypotheses in the preceding word graph, and reconnecting the backward pointers of active recognition hypotheses of the preceding word graph to corresponding recognition hypotheses in the current word graph.
24. A system as in claim 18, wherein each word graph contains layers of recognition hypotheses and the step of updating in the backward search comparator processes back through the word graph layers.
25. A system as in claim 24, wherein the word graph layers are structured so that recognition hypotheses within a word graph layer point to preceding word graph layers and so that all the recognition hypotheses within each word graph layer are updated when the word graph layer is processed.
26. A system as in claim 24, wherein all recognition hypotheses ending at the same time are within the same word graph layer.
27. A system as in claim 24, wherein time is an indexed part of each word graph layer.
28. A system as in claim 24, wherein the word graph layers are updated by redirecting links from recognition hypotheses in the preceding word graph to recognition hypotheses in the current word graph.
29. A system as in claim 18, wherein the step of updating in the backward search comparator includes deleting inactive recognition hypotheses of the preceding word graph.
30. A system as in claim 18, wherein the step of updating in the backward search comparator further includes outputting at least one of the probable recognition hypotheses of the current word graph.
31. A system as in claim 30, wherein the step of outputting in the backward search comparator includes displaying to a user at least one of the probable recognition hypotheses.
32. A system as in claim 30, wherein the step of outputting in the backward search comparator outputs the most probable recognition hypothesis of the current word graph.
33. A system as in claim 18, wherein the backward search comparator further prunes the current word graph when the sequence of parameters continues for a predetermined length of time without pausing at the end of a phrase.
34. A system as in claim 33, wherein the step of pruning in the backward search comparator includes:
determining the most probable recognition hypothesis for the current word graph;
selecting a boundary time between a pair of words near the end of the most probable recognition hypothesis; and
treating the boundary time as an end of the sequence of parameters and a beginning of a new sequence of parameters.