Method of recognizing continuously spoken words
First Claim
1. A method of recognizing a speech signal which is derived from coherently spoken words and includes a temporal sequence of speech values, each of which indicates a section of the speech signal, comprising:
comparing the speech values with given stored comparison values, of which each time a group of comparison values represents a word of a given vocabulary;
summing the comparison results over different sequences of combinations of comparison values and speech values to a distance sum per sequence, at each new speech value for each word calculating and storing in a first memory a distance sum of such sequences which for each word begin at different earlier speech values as a starting point, traverse the whole word as far as the instantaneous speech value and, related to the respective starting point, produce a minimum distance sum;
then, for each of these starting points, according to an assignment contained in a first stored list of the words of the vocabulary to per word at least one syntactical class, for each class storing the smallest distance sum of all words assigned to this class together with an indication about the assignment of the word yielding this smallest distance sum in a second memory;
subsequently, according to a second stored list, checking whether and into which two further syntactical classes each class can be subdivided and each time that a subdivisibility is ascertained again for each starting point as far as the earliest speech signal, adding each distance sum stored for the one of the two further classes for the respective starting point and a number of intermediate points lying successively at points adjacent to each other between the starting point and the instantaneous speech value, and each distance sum stored for the other of the two further classes for each intermediate point and the instantaneous speech value, and comparing each sum with the distance sum of the subdivided class and, in case it is larger than the smallest of the added distance sums, storing said sum instead thereof together with an indication about the subdivision at the particular intermediate point which has yielded the smallest sum; and
after processing the last speech value from the class indicating a whole sentence through the subdivision into further classes indicated therein at the storage site for the first speech value as starting point and through the subdivision indicated at the respective further classes, determining a sequence of words and supplying same as recognized spoken words.
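The first two steps of the claim (comparing speech values with stored comparison values, and summing the results into a minimum distance sum per word and per starting point) can be illustrated roughly as follows. This is only a sketch under stated assumptions: speech values are taken to be plain numbers, the local distance is an absolute difference, and each speech value either stays on the current comparison value or advances to the next one. None of these choices is fixed by the claim, and the name `word_distances` is hypothetical.

```python
import math


def word_distances(frames, template, local_dist=lambda a, b: abs(a - b)):
    """Return dist[i][j]: the minimum distance sum for aligning the speech
    values frames[i..j] (inclusive) with the whole word template.

    Assumption for illustration: per speech value, the alignment either stays
    on the same comparison value or advances to the next one.
    """
    n, m = len(frames), len(template)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):                      # starting point
        prev = [math.inf] * m
        for j in range(i, n):               # instantaneous speech value
            cur = [math.inf] * m
            for k in range(m):              # position inside the word
                if j == i:
                    # A sequence must begin at the first comparison value.
                    best = 0.0 if k == 0 else math.inf
                else:
                    stay = prev[k]
                    advance = prev[k - 1] if k > 0 else math.inf
                    best = min(stay, advance)
                cur[k] = best + local_dist(frames[j], template[k])
            # Only sequences that traverse the whole word count.
            dist[i][j] = cur[m - 1]
            prev = cur
    return dist


# Usage: one word with two comparison values, four speech values.
d = word_distances([1, 1, 5, 5], [1, 5])
print(d[0][3])  # distance sum for the word spanning all four speech values
```

The table `dist` plays the role of the first memory in the claim: for each instantaneous speech value it holds one minimum distance sum per starting point.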
Abstract
A method of recognizing continuously spoken words in which, during speech recognition, speech values derived from the speech signal are compared with comparison values of the individual words of a given vocabulary. To reduce the rate of recognition errors, it is known in principle to take speech models into account, which admit only selected sequences of words. From the theory of formal languages, a class of speech models designated as "context-free grammar" is known, which represents a comparatively flexible speech model. To use this speech model in the technical process of recognizing a speech signal, two lists are utilized: one indicates the assignment between words and given syntactical classes, and the other indicates the assignment of these classes to, as the case may be, two other classes. Both lists are used at each new speech signal by continually looking backwards to determine which class best explains the preceding speech section. At the end of the speech signal, starting from the class indicating the whole sentence, the sequence of words that has yielded the smallest total distance sum, and that moreover fits the speech model given by the two lists, can be traced backwards.
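The use of the two lists and the final backward trace can be sketched as a small dynamic program. The code below is an illustrative reading of the abstract, not the patented implementation: it assumes the per-word distance sums are already available in a dictionary keyed by (word, start, end) with frame boundaries as indices, and the names `parse_min_distance`, `word_classes`, `class_splits`, and the class labels in the usage example are hypothetical.

```python
import math


def parse_min_distance(n_frames, word_cost, word_classes, class_splits,
                       sentence="S"):
    """Return (distance sum, word sequence) for the best sentence.

    word_cost:    {(word, i, j): distance sum for the word between
                   frame boundaries i and j} (assumed precomputed)
    word_classes: first list, word -> syntactical classes
    class_splits: second list, class -> [(left class, right class), ...]
    """
    best = {}  # (class, i, j) -> (distance sum, backpointer)

    def cost(cls, i, j):
        return best.get((cls, i, j), (math.inf, None))[0]

    for j in range(1, n_frames + 1):        # boundary after the newest value
        for i in range(j - 1, -1, -1):      # starting points, going backwards
            # First list: keep the smallest word distance sum per class.
            for word, classes in word_classes.items():
                c = word_cost.get((word, i, j), math.inf)
                for cls in classes:
                    if c < cost(cls, i, j):
                        best[(cls, i, j)] = (c, word)
            # Second list: try every intermediate point for each subdivision.
            for cls, splits in class_splits.items():
                for left, right in splits:
                    for k in range(i + 1, j):
                        total = cost(left, i, k) + cost(right, k, j)
                        if total < cost(cls, i, j):
                            best[(cls, i, j)] = (total, (k, left, right))

    def trace(cls, i, j):
        # Follow the stored subdivisions back down to single words.
        back = best[(cls, i, j)][1]
        if isinstance(back, str):
            return [back]
        k, left, right = back
        return trace(left, i, k) + trace(right, k, j)

    if (sentence, 0, n_frames) not in best:
        return math.inf, []
    return best[(sentence, 0, n_frames)][0], trace(sentence, 0, n_frames)


# Usage with made-up distance sums over four speech values.
word_cost = {("call", 0, 2): 1.0, ("call", 0, 3): 2.5,
             ("john", 2, 4): 0.5, ("john", 3, 4): 3.0, ("home", 2, 4): 2.0}
word_classes = {"call": ["VERB"], "john": ["NAME"], "home": ["NAME"]}
class_splits = {"S": [("VERB", "NAME")]}
print(parse_min_distance(4, word_cost, word_classes, class_splits))
```

Processing the starting points backwards for each new end point means that every partial result a subdivision needs has already been stored, which matches the abstract's description of constantly looking backwards over the preceding speech section.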
4 Claims
Specification