Method of recognizing continuously spoken words

US 5,005,203 A
Filed: 03/30/1988
Issued: 04/02/1991
Est. Priority Date: 04/03/1987
Status: Expired due to Fees

Abstract

A method of recognizing continuously spoken words in which, during speech recognition, speech values derived from the speech signal are compared with comparison values of the individual words of a given vocabulary. To reduce the rate of recognition errors, it is known in principle to take speech models into account, which admit only selected sequences of words. From formal language theory, a class of speech models designated as "context-free grammar" is known, which represents a comparatively flexible speech model. To use this speech model in the technical process of recognizing a speech signal, two lists are utilized: one indicates the assignment between words and given syntactical classes, and the other indicates the assignment of these classes to, as the case may be, two other classes. Both lists are used at each new speech value, in that the method continually looks backwards to determine which class best explains the preceding speech section. At the end of the speech signal, starting from the class indicating the whole sentence, the sequence of words can be traced backwards that has yielded the smallest total distance sum and that also fits the speech model given by the two lists.
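As an illustration of the two lists described in the abstract, the following Python sketch stores a toy context-free grammar as a word-to-class list and a class-to-two-classes list, then checks whether a word sequence fits the model. The vocabulary, the class names (`S`, `NP`, `DET`, `N`, `V`), and the function name `fits_model` are assumptions made for this example; they do not come from the patent, and the check works on already-recognized words rather than on speech values.

```python
# First list (hypothetical): each word is assigned at least one syntactical class.
WORD_CLASSES = {"the": {"DET"}, "dog": {"N"}, "cat": {"N"}, "barks": {"V"}}

# Second list (hypothetical): each class may be subdivided into two further classes.
SUBDIVISIONS = {"S": [("NP", "V")], "NP": [("DET", "N")]}


def fits_model(words, sentence_class="S"):
    """Return True if the word sequence is admitted by the two lists."""
    n = len(words)
    table = {}  # table[(j, i)] = classes that can explain words[j:i]
    for i in range(1, n + 1):
        # A single word is explained by the classes assigned to it.
        table[(i - 1, i)] = set(WORD_CLASSES.get(words[i - 1], ()))
        # Look backwards over earlier starting points j.
        for j in range(i - 2, -1, -1):
            found = set()
            for k in range(j + 1, i):  # intermediate point of the subdivision
                for cls, pairs in SUBDIVISIONS.items():
                    for left, right in pairs:
                        if left in table[(j, k)] and right in table[(k, i)]:
                            found.add(cls)
            table[(j, i)] = found
    return n > 0 and sentence_class in table[(0, n)]
```

With this toy grammar, `fits_model(["the", "dog", "barks"])` is admitted, while `fits_model(["dog", "barks"])` is not, because no entry in the second list combines `N` and `V` into a sentence.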

4 Claims

1. A method of recognizing a speech signal which is derived from coherently spoken words and includes a temporal sequence of speech values, each of which indicates a section of the speech signal, comprising:
- comparing the speech values with given stored comparison values, of which each time a group of comparison values represents a word of a given vocabulary;
- summing the comparison results over different sequences of combinations of comparison values and speech values to a distance sum per sequence, at each new speech value for each word calculating and storing in a first memory a distance sum of such sequences which for each word begin at different earlier speech values as a starting point, traverse the whole word as far as the instantaneous speech value and, related to the respective starting point, produce a minimum distance sum;
- then, for each of these starting points, according to an assignment contained in a first stored list of the words of the vocabulary to per word at least one syntactical class, for each class storing the smallest distance sum of all words assigned to this class together with an indication about the assignment of the word yielding this smallest distance sum in a second memory;
- subsequently, according to a second stored list, checking whether and into which two further syntactical classes each class can be subdivided and each time that a subdivisibility is ascertained again for each starting point as far as the earliest speech signal, adding each distance sum stored for the one of the two further classes for the respective starting point and a number of intermediate points lying successively at points adjacent to each other between the starting point and the instantaneous speech value, and each distance sum stored for the other of the two further classes for each intermediate point and the instantaneous speech value, and comparing each sum with the distance sum of the subdivided class and, in case it is larger than the smallest of the added distance sums, storing said sum instead thereof together with an indication about the subdivision at the particular intermediate point which has yielded the smallest sum; and
- after processing the last speech value from the class indicating a whole sentence through the subdivision into further classes indicated therein at the storage site for the first speech value as starting point and through the subdivision indicated at the respective further classes, determining a sequence of words and supplying same as recognized spoken words.
2. A method as claimed in claim 1, characterized in that for determining the distance sums the starting points of the sequences cover at most twice the length of the longest word of the vocabulary back from the instantaneous speech value.
3. An arrangement for carrying out the method claimed in claim 1 comprising:
- a comparison value memory for storing comparison values of a number of words,
- an input circuit for producing electrical speech signals from an acoustic speech signal,
- a first processing circuit for comparing the speech signals with the comparison values and for producing distance sums,
- a first memory which, each time at a speech value, stores the distance sums per word for several sequences which each time begin at one of a number of preceding speech values, and is overwritten at each following speech value,
- a second memory which, for each syntactical class per speech value and per preceding speech value, contains a distance sum as well as the address of a speech model memory, which contains the assignment of the words to given syntactical classes and their mutual assignment,
- a second processing circuit which, for each word, reads the syntactical class from the speech model memory and writes the distance sums contained in the first memory for the relevant word together with a reference to the corresponding address of the speech model memory into the second memory at a storage site corresponding to the class, to the speech value and to the preceding speech value, as far as said storage site contains a larger distance sum, and then reads from the speech model memory assignments of a syntactical class to two further classes and which for said further classes reads and adds at storage sites of the second memory corresponding to different combinations of speech value and preceding speech value and stores the minimum sum of the distance values of the further classes at the storage site of the first class together with an indication about the subdivision yielding the minimum sum and with further indications and which after processing of the last speech value, starting from the class indicating the whole sentence, determines and supplies successively a sequence of words through the further classes each time indicated therein in the second memory.
4. An arrangement as claimed in claim 3, characterized in that at least one of the processing circuits comprises a microprocessor.
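
The class-level steps of claim 1 (assigning word distance sums to classes, combining two further classes at intermediate points, and tracing back from the sentence class) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the word-level distance sums that claim 1 stores in the first memory are assumed to be precomputed and passed in as `word_dist`, and the function name `recognize`, the toy grammar, and all distance values are hypothetical.

```python
import math


def recognize(n, word_dist, word_classes, subdivisions, sentence="S"):
    """Return (total distance sum, word sequence) for speech values 0..n.

    word_dist[(word, j, i)] is assumed to hold the minimum distance sum of
    `word` from starting point j to speech value i (first memory).
    """
    best = {}  # second memory: (class, j, i) -> (distance sum, indication)

    def cost(key):
        return best[key][0] if key in best else math.inf

    for i in range(1, n + 1):              # instantaneous speech value
        for j in range(i - 1, -1, -1):     # starting points, nearest first
            # First list: keep the smallest word distance sum per class.
            for word, classes in word_classes.items():
                d = word_dist.get((word, j, i), math.inf)
                for cls in classes:
                    if d < cost((cls, j, i)):
                        best[(cls, j, i)] = (d, word)
            # Second list: try every intermediate point k of a subdivision.
            for cls, left, right in subdivisions:
                for k in range(j + 1, i):
                    d = cost((left, j, k)) + cost((right, k, i))
                    if d < cost((cls, j, i)):
                        best[(cls, j, i)] = (d, (left, right, k))

    def trace(cls, j, i):
        """Follow the stored indications back to a word sequence."""
        indication = best[(cls, j, i)][1]
        if isinstance(indication, str):    # a word was assigned directly
            return [indication]
        left, right, k = indication
        return trace(left, j, k) + trace(right, k, i)

    if (sentence, 0, n) not in best:
        return math.inf, []
    return cost((sentence, 0, n)), trace(sentence, 0, n)


# Hypothetical toy data: six speech values and four vocabulary words.
WORD_CLASS_LIST = {"the": ["DET"], "dog": ["N"], "cat": ["N"], "barks": ["V"]}
SUBDIVISION_LIST = [("S", "NP", "V"), ("NP", "DET", "N")]
WORD_DIST = {
    ("the", 0, 2): 1.0,
    ("dog", 2, 4): 2.0,
    ("cat", 2, 4): 3.5,
    ("dog", 0, 4): 0.5,   # acoustically cheap, but "dog barks" is not a sentence here
    ("barks", 4, 6): 1.5,
}

result = recognize(6, WORD_DIST, WORD_CLASS_LIST, SUBDIVISION_LIST)
```

In this toy run, the sequence "dog barks" would have the lower distance sum (2.0), but the two lists do not admit it, so `result` holds the sequence "the dog barks" with a total distance sum of 4.5. Iterating the starting points from nearest to farthest ensures that every shorter span ending at the current speech value is already stored before a longer span tries to combine it.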

Current Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Original Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Inventors: Ney, Hermann
Primary Examiner(s): Shaw, Dale M.
Assistant Examiner(s): Knepper, David D.
Application Number: US07/175,085
Time in Patent Office: 1,098 Days
Field of Search: 381/41-43, 364/513.5
US Class Current: 704/255
CPC Class Codes:
G10L 15/12 using dynamic programming t...
G10L 15/193 Formal grammars, e.g. finit...
