Method and apparatus for recognizing spoken words in a speech signal by organizing the vocabulary in the form of a tree
First Claim
1. A method for processing a sequence of words in a speech signal for speech recognition, said method comprising the steps of:
sampling, at recurrent instants, said speech signal for generating a series of test signals;
generating a signal-by-signal matching and scoring between said test signals and a series of reference signals, each of said series of reference signals forming one of a plurality of vocabulary words arranged as a vocabulary tree with a root, and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end;
determining at least one complete word for a particular test signal;
for each completed word, separately:
forming a word result including a word score and an aggregate score, said aggregate score derived from said word score and from a language model value assigned to a combination of said completed word and a uniform-length string of prior completed words;
storing said aggregate score starting at said root with a reference to said completed word;
proceeding with said signal-by-signal matching and scoring between subsequent test signals and said series of reference signals for each of a plurality of words completed for a particular test signal.
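The scoring steps of this claim can be illustrated with a minimal sketch. It assumes that scores are negative log probabilities (lower is better), that the "uniform-length string of prior completed words" has length one (a bigram model), and that the word score is the accumulated acoustic score at the word end. The table `LM`, the penalty `UNSEEN`, and the functions `lm_value`, `word_result`, and `restart_at_root` are hypothetical names chosen for illustration; none of them comes from the patent text.

```python
# Hypothetical bigram language model values (negative log probabilities).
LM = {("the", "cat"): 1.25, ("the", "car"): 0.75}
UNSEEN = 10.0  # assumed penalty for word pairs missing from the table


def lm_value(prev_word, word):
    """Language model value for `word` following `prev_word`."""
    return LM.get((prev_word, word), UNSEEN)


def word_result(word, word_score, prev_word):
    """Form a word result: the word score plus the aggregate score."""
    aggregate = word_score + lm_value(prev_word, word)
    return {"word": word, "word_score": word_score,
            "aggregate": aggregate, "prev": prev_word}


def restart_at_root(completed):
    """For each completed word, separately, open a new hypothesis at the
    tree root that carries the aggregate score and a reference to the word."""
    return [{"node": "root", "score": r["aggregate"], "last_word": r["word"]}
            for r in completed]


# Two words completed at the same test signal, both preceded by "the".
completed = [word_result("cat", 4.0, "the"), word_result("car", 4.25, "the")]
hypotheses = restart_at_root(completed)
```

Each completed word gets its own root hypothesis, so later matching can apply a language model value that depends on which word came before.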
Abstract
A method and apparatus for processing a sequence of words in a speech signal for speech recognition. The method includes sampling the speech signal at recurrent instants to generate a series of test signals. Signal-by-signal matching and scoring is performed between the test signals and a series of reference signals, where each series of reference signals forms one of a plurality of vocabulary words arranged as a vocabulary tree. The vocabulary tree includes a root and a plurality of tree branches, wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element, and any vocabulary word is assigned to a particular branch junction or branch end. Because the vocabulary is built up as a tree whose branches have reference signals, acoustic recombination determines both the continuations of branches and the most probable partial hypotheses within a word. At least one complete word is determined for a particular test signal and, separately for each completed word, a word result is formed that includes a word score and an aggregate score, the aggregate score being derived from the word score and from a language model value assigned to a combination of the completed word and a uniform-length string of prior completed words.
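The tree organization described in the abstract can be sketched as a prefix tree over speech elements. This is an illustrative sketch only: it assumes each speech element is a phoneme label and omits the reference signals attached to each branch. The class `Branch`, the functions `build_vocabulary_tree` and `count_branches`, and the toy `LEXICON` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """One tree branch, assigned to a speech element (empty for the root)."""
    element: str = ""
    children: dict = field(default_factory=dict)
    words: list = field(default_factory=list)  # words ending at this junction or end


def build_vocabulary_tree(lexicon):
    """Build a tree in which words with a common beginning share branches."""
    root = Branch()
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.children.setdefault(ph, Branch(element=ph))
        node.words.append(word)
    return root


def count_branches(node):
    """Count all branches below `node`."""
    return sum(1 + count_branches(child) for child in node.children.values())


# Hypothetical toy lexicon: word -> sequence of phoneme labels.
LEXICON = {
    "car": ["k", "aa", "r"],
    "card": ["k", "aa", "r", "d"],
    "cat": ["k", "ae", "t"],
}
tree = build_vocabulary_tree(LEXICON)
r_branch = tree.children["k"].children["aa"].children["r"]
```

In this toy lexicon, "car" sits at a branch junction (its branch continues to "card"), while "card" and "cat" sit at branch ends. The tree needs 6 branches where a flat list of the three words would need 10.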
14 Claims
13. An apparatus for processing a sequence of words in a speech signal for speech recognition, comprising:
means for sampling, at recurrent instants, said speech signal for generating a series of test signals;
means for generating a signal-by-signal matching and scoring between said test signals and a series of reference signals, each of said series of reference signals forming one of a plurality of vocabulary words arranged as a vocabulary tree with a root, and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end;
means for determining at least one complete word for a particular test signal;
for each completed word, means for separately:
forming a word result including a word score and an aggregate score, said aggregate score derived from said word score and from a language model value assigned to a combination of said completed word and a uniform-length string of prior completed words;
storing said aggregate score starting at said root with a reference to said completed word;
proceeding with said signal-by-signal matching and scoring between subsequent test signals and said series of reference signals for each of a plurality of completed words for a particular test signal.
14. An apparatus for processing a sequence of words in a speech signal for speech recognition, comprising:
sampling means for, at recurrent instants, sampling said speech signal for generating a set of test signals;
tree storage means for storing a vocabulary tree that has a root and a plurality of branches, any branch comprising a series of one or more reference signals and being assigned to a speech element and any vocabulary word being assigned to a particular branch junction or branch end as being represented by a string of series of reference signals from said root to the particular branch junction or particular branch end;
model storage means for storing a plurality of language model values, each value uniquely assigned to a particular vocabulary word and a uniform-length string of prior completed words;
matching-and-scoring means fed by said sampling means, by said tree storage means and by said model storage means for executing a signal-by-signal matching and scoring between subsequent test signals and various strings thus determining at least one complete word, deriving a word result comprising a word score, an aggregate score derived from said word score and the language model score assigned to the particular word and a respective string of prior completed words, and a reference to a last word of said respective string of prior completed words;
copying means fed by said matching-and-scoring means for separately copying further non-identical word results from said matching-and-scoring means between subsequent test signals and said series of reference signals for each of a plurality of words completed for a particular test signal from the root of said tree into an intermediate memory;
decision means fed by said matching-and-scoring means and by said intermediate memory for selectively continuing or not continuing said series of reference signals based on the derived aggregate scores; and
recognition decision means fed by said matching-and-scoring means and by said intermediate memory for recognizing speech of said speech signal based on a minimum score among the derived aggregate score for each non-identical word from the root of the tree.
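The copying, decision, and recognition decision means of this claim can be sketched as three small functions. This is a hedged illustration, not the patented implementation: it assumes lower scores are better (consistent with the "minimum score" wording), that only one result per distinct word needs to be kept (as with a bigram model), and that "selectively continuing or not continuing" is done with a beam threshold. The names `keep_best_per_word`, `prune`, `recognize`, and the `beam` parameter are hypothetical.

```python
def keep_best_per_word(results):
    """Keep one result per distinct word: the one with the lowest aggregate score."""
    best = {}
    for r in results:
        kept = best.get(r["word"])
        if kept is None or r["aggregate"] < kept["aggregate"]:
            best[r["word"]] = r
    return best


def prune(best, beam):
    """Continue only results whose aggregate score lies within `beam` of the minimum."""
    floor = min(r["aggregate"] for r in best.values())
    return {w: r for w, r in best.items() if r["aggregate"] <= floor + beam}


def recognize(best):
    """Pick the word whose aggregate score is the minimum."""
    return min(best.values(), key=lambda r: r["aggregate"])["word"]


# Hypothetical word results completed at one test signal.
results = [
    {"word": "cat", "aggregate": 5.25},
    {"word": "cat", "aggregate": 6.0},   # same word reached with a worse score
    {"word": "car", "aggregate": 5.0},
    {"word": "cart", "aggregate": 9.0},
]
best = keep_best_per_word(results)
survivors = prune(best, beam=2.0)
winner = recognize(survivors)
```

With these toy values the duplicate "cat" result is dropped, "cart" falls outside the beam, and "car" is selected as the word with the minimum aggregate score.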
Specification