Method and apparatus for recognizing spoken words in a speech signal
First Claim
1. A method for recognizing a sequence of words in a speech signal, said method comprising the steps of:
at recurrent instants, sampling said speech signal for generating a series of test signals and executing a signal-by-signal matching and scoring between said test signals and various series of reference signals from a unitary set of reference signal series that each represent a vocabulary word;
at a particular first test signal, assigning a first score to any string of preliminarily recognized words terminating at said first test signal;
as from a particular second test signal subsequent to said first test signal, continuing said signal-by-signal matching and scoring for appropriate further series of reference signals of the above set, so as to take along a backpointer to said first test signal indicating the terminated words, until attainment of a subsequent word termination at a further first test signal, each such further series representing said subsequent word and thereby producing a sub-score;
for each such further series, retrieving an n-gram language model score (n≧2) determined through a combined identity of said subsequent word and of the (n-1) most recent vocabulary words of the preliminarily recognized word string indicated by said backpointer;
adding said first score, said sub-score and said language model score for producing a further first score, an indication of said further first score, an indication of said subsequent word and said backpointer being stored in a region of a result list associated with said further first test signal;
selecting at least one minimum first score for so preliminarily recognizing a word string associated with said minimum first score.
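As one possible reading of claim 1, the word-end step can be sketched for the bigram case (n = 2). The sketch assumes that the backpointer indexes a result-list region and that the minimum-score entry in that region supplies the predecessor word; the `Entry` record, the `word_end` and `best_entry` helpers, the cost table and the flat penalty for unseen word pairs are all hypothetical. Scores are treated as costs (negative log probabilities), so adding the first score, the sub-score and the language model score and then taking the minimum follows the claim's wording.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    score: float                # accumulated "first score" (a cost: lower is better)
    word: Optional[str]         # word terminating at this test signal (None at utterance start)
    backpointer: Optional[int]  # test signal at which the preceding string ended


# Hypothetical bigram costs, -log P(word | previous word); "<s>" marks utterance start.
BIGRAM_COST = {
    ("<s>", "recognize"): -math.log(0.2),
    ("recognize", "speech"): -math.log(0.5),
    ("<s>", "wreck"): -math.log(0.05),
}
UNSEEN_COST = -math.log(1e-4)  # assumed flat penalty for word pairs not in the table


def lm_cost(prev_word: str, word: str) -> float:
    """Language model score for the pair (predecessor, current word)."""
    return BIGRAM_COST.get((prev_word, word), UNSEEN_COST)


def best_entry(result_list: Dict[int, List[Entry]], t: int) -> Entry:
    """Select the minimum first score stored in the region for test signal t."""
    return min(result_list[t], key=lambda e: e.score)


def word_end(result_list: Dict[int, List[Entry]], t_start: int, t_end: int,
             word: str, sub_score: float) -> float:
    """Record `word`, matched from t_start + 1 to t_end with acoustic sub_score.

    The word identity is only known here, at the word end, so the language
    model score is added now, using the predecessor reached via the backpointer.
    """
    pred = best_entry(result_list, t_start)
    prev_word = pred.word if pred.word is not None else "<s>"
    new_score = pred.score + sub_score + lm_cost(prev_word, word)
    result_list.setdefault(t_end, []).append(Entry(new_score, word, t_start))
    return new_score


# Illustrative values only.
results: Dict[int, List[Entry]] = {0: [Entry(0.0, None, None)]}
word_end(results, 0, 12, "recognize", 40.0)
word_end(results, 0, 12, "wreck", 39.5)   # acoustically better...
print(best_entry(results, 12).word)       # ...but "recognize" wins once the LM score is added
word_end(results, 12, 25, "speech", 30.0)
```

For n > 2, the (n-1)-word history would be gathered by following further backpointers from the predecessor entry; this sketch keeps only a single history word.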
Abstract
In the recognition of continuous speech, language models are advantageously used to increase recognition reliability; such models take into account, for example, the probabilities of word combinations, especially word pairs. To this end, a language model value corresponding to this probability is added at the boundaries between words. In several recognition methods, for example when the vocabulary is built up from phonemes in the form of a tree, it is not known at the start of the continuation of a hypothesis after a word end which word will actually follow, so the language model value cannot be taken into account until the end of the next word. Measures are given for achieving this in such a manner that, as far as possible, the optimal preceding word or the optimal preceding word sequence is taken into account for the language model value, without having to construct a copy of the search tree for every simultaneously ending preceding word sequence.
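The abstract's point about tree-organized vocabularies can be illustrated with a small phoneme prefix tree. The sketch below is an assumption for illustration only: the lexicon, the phoneme symbols and the helper names `build_tree` and `words_below` are hypothetical, not taken from the patent. After a word end the search restarts at the shared root, and while it is still inside a shared prefix several words remain possible, so the word identity needed for a language model lookup becomes known only when a word-end node is reached.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)  # phoneme -> child node
    word: Optional[str] = None  # set only on nodes where a vocabulary word ends


def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Merge the phoneme strings of all words into one prefix tree."""
    root = Node()
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.children.setdefault(ph, Node())
        node.word = word
    return root


def words_below(node: Node) -> List[str]:
    """All vocabulary words still reachable from this node."""
    found = [node.word] if node.word is not None else []
    for child in node.children.values():
        found.extend(words_below(child))
    return found


# Hypothetical mini-lexicon.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
}
tree = build_tree(LEXICON)
prefix = tree.children["s"].children["p"]
print(words_below(prefix))  # ['speech', 'speed', 'spin']: identity still open
```

Because the tree is shared by all predecessors, a language model value that depends on the following word cannot be applied at the root; it can only be added at a node whose `word` is set.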
8 Claims
8. A device for recognizing a sequence of words in a speech signal, comprising:
input means for receiving a speech signal;
sampling means fed by said input means for sampling said speech signal at recurrent instants and generating a test signal at each instant;
matching and scoring means fed by said sampling means and provided with first storage means for storing a unitary set of series of reference signals, each such series representing a vocabulary word, and second storage means for storing a set of n-gram (n≧2) language model scores, each score pertaining to a sub-string of the n most recent vocabulary words;
first score means fed by said matching and scoring means for, at a particular first test signal, assigning a first score to any string of preliminarily recognized words terminating at said first test signal;
second score means fed by said first score means and by said second storage means for incrementing any such first score by the appropriate language model score pertaining to the n most recently recognized vocabulary words of said string;
memory means for storing incremented scores and associated words and backpointers;
selecting means fed by said memory means for selecting minimal-score strings among those coexistently assigned and incremented by said first and second score means; and
output means fed by said selecting means for outputting a selected absolute minimal-score string for further usage.
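The selecting and output means of claim 8 can be illustrated by a traceback over the stored scores, words and backpointers. This sketch assumes that each result-list region holds entries of (score, word, backpointer) and that the minimum-score entry in a region is the string that later words extended; the `Entry` record and the `traceback` function are hypothetical names, and the scores in the example are illustrative values only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    score: float                # accumulated score (cost: lower is better)
    word: Optional[str]         # word ending at this test signal (None at utterance start)
    backpointer: Optional[int]  # test signal where the preceding string ended


def traceback(result_list: Dict[int, List[Entry]], t_final: int) -> List[str]:
    """Select the minimal-score string ending at t_final and read out its words."""
    words: List[str] = []
    t: Optional[int] = t_final
    while t is not None:
        # Minimal score in this region, following the assumption stated above.
        entry = min(result_list[t], key=lambda e: e.score)
        if entry.word is not None:
            words.append(entry.word)
        t = entry.backpointer
    return list(reversed(words))  # backpointers run from last word to first


# Illustrative result list: region index = test signal at which the words end.
results: Dict[int, List[Entry]] = {
    0: [Entry(0.0, None, None)],
    12: [Entry(41.6, "recognize", 0), Entry(42.5, "wreck", 0)],
    25: [Entry(72.3, "speech", 12), Entry(73.0, "beach", 12)],
}
print(traceback(results, 25))  # ['recognize', 'speech']
```

Only one entry per region is followed here; a device keeping several strings per region would instead follow the specific entry's backpointer chain.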
Specification