Method of recognizing coherently spoken words

US 5,058,166 A
Filed: 05/11/1990
Issued: 10/15/1991
Est. Priority Date: 04/03/1987
Status: Expired due to Fees

Abstract

During recognition, speech values derived from sample values of the speech signal are compared with reference values, each word of a given vocabulary being represented by a sequence of reference values. The words are built from phonemes according to a fixed pronouncing lexicon, and the reference values for the phonemes are determined in a learning phase; each phoneme within a word consists of a number of equal reference values determined in the learning phase. To approximate the transitions between phonemes, each phoneme may instead consist of three sections, each with constant reference values. Because the number of reference values per phoneme is given, the time duration of a phoneme in a given word can be modeled more accurately. Several ways of determining the reference values and the distance values during recognition are described.
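
The word model described in the abstract can be illustrated with a minimal Python sketch. It expands a word into its sequence of reference values by looking up its phonemes in a pronouncing lexicon and repeating each phoneme's reference value the given number of times. The names `build_word_model`, `lexicon`, and `phoneme_models`, and all numeric values, are hypothetical and chosen only for illustration; a phoneme is stored as a list of sections so that both the single-section and the three-section variants fit the same structure.

```python
def build_word_model(word, lexicon, phoneme_models):
    """Expand a word into a flat sequence of reference values.

    lexicon:        word -> list of phoneme names (fixed pronouncing lexicon)
    phoneme_models: phoneme name -> list of (reference_value, count) sections;
                    one section for the basic model, three sections for the
                    variant that approximates phoneme transitions.
    """
    sequence = []
    for phoneme in lexicon[word]:
        for reference_value, count in phoneme_models[phoneme]:
            # Each section contributes `count` equal reference values.
            sequence.extend([reference_value] * count)
    return sequence


# Hypothetical toy data: one-component reference values.
lexicon = {"no": ["n", "o"]}
phoneme_models = {
    "n": [((0.1,), 2)],                             # one section, 2 values
    "o": [((0.9,), 1), ((0.8,), 3), ((0.7,), 1)],   # three sections
}
model = build_word_model("no", lexicon, phoneme_models)
```

In this toy setup the counts per section stand in for the durations learned from training speech; how those counts are estimated is not shown here.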

14 Claims

1. A method of recognizing words, composed of phonemes, from a speech signal divided into successive sections, the speech signal in each section being converted to a speech value, said method comprising:
- storing reference values for comparison with successive speech values and storing distance sums for said reference values attained by comparisons of said reference values with preceding speech values;
  
  comparing each stored reference value within a given neighborhood to a current speech value to determine a comparison result; and
  
  determining a new distance sum to be stored for each compared reference value, by adding the comparison result to a quantity determined by considering a position of the compared reference value within a sequence of equal reference values in a current phoneme, utilizing a relationship between phonemes and sequences of numbers of equal reference values, predetermined with the aid of learning speech values, in accordance with the following rules;
  
  a) if the compared reference value is at the end of a sequence of reference values which is in number greater than unity and less than the number of sequential equal reference values predetermined within a phoneme, said quantity is equal to the distance sum stored for the preceding reference value in the sequence;
  
  b) if the compared reference value is at the end of a sequence of equal reference values greater in number than the number of sequential equal reference values predetermined within a phoneme, said quantity is formed by adding a time distortion value to the distance sum previously stored for said reference value;
  
  c) if the compared reference value is the first reference value beginning a sequence of reference values in a new current phoneme, said quantity is formed by selecting the minimum of sums of time distortion values and the distance sums stored for each reference value in the sequence of reference values in the phoneme preceding the new current phoneme.
  - 2. A method as claimed in claim 1, wherein each speech value comprises a plurality of component values and said comparing step comprises forming each comparison result from the differences between the component values of speech value and reference value.
  - 3. A method as claimed in claim 2, wherein the comparison result is formed from the sum of the squares of the component value differences.
  - 4. A method as claimed in claim 2, wherein the comparison result is formed from the sum of the amounts of the component value differences.
  - 5. A method as claimed in claim 4, wherein the components of the reference values are produced by forming the median values of the component values of the speech values of learning speech signals associated with the respective reference value.
  - 6. A method as claimed in claim 2, wherein the components of the reference values are produced by forming the average values of the components of the speech values of learning speech signals associated with the respective reference value.
  - 7. A method as claimed in claim 1, further comprising in a learning phase, selecting prototype reference values from the speech values then produced and for each combination of prototype reference value and phoneme assigning a distance measure, in that said comparing step is such that as a comparison result the distance measure is used for the prototype reference value assigned to the respective reference value and to the phoneme.
  - 8. A method as claimed in claim 7, wherein the prototype reference values are selected in such a manner that the sum of the distances of all learning speech values from the respective nearest prototype reference value is a minimum.
  - 9. A method as claimed in claim 7, wherein for assigning the distance measure, the logarithm of the ratio of the frequency of a prototype reference value in a phoneme to the frequency of all prototype reference values in this phoneme is determined.
  - 10. A method as claimed in claim 7, characterized in that for assigning the distance measure the probability of connection of prototype reference values and phonemes is approached in that differences of the frequency at which during the learning phase the individual prototype reference values have occurred and that at which the different phonemes have occurred are at least reduced by standardization.
  - 11. A method as claimed in claim 7, characterized in that for determining the distance measure the logarithm of the ratio of the frequency of a prototype reference value in a phoneme to the frequency of all prototype reference values in this phoneme is determined.
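
Rules a) to c) of claim 1 describe one time step of a dynamic-programming update over the stored distance sums. The following Python sketch is one possible reading, not the patented implementation. It makes several assumptions that the claims do not spell out: the comparison result is the sum of squared component differences (the option of claim 3); at the last position of a phoneme the smaller of the rule a) and rule b) quantities is taken; the time distortion value in rule c) is zero when leaving from the last position and grows by `skip_penalty` for each skipped position; and a single word is scored in isolation, starting at its first reference value. All function and parameter names are hypothetical.

```python
import math


def local_distance(speech, ref):
    # Comparison result: sum of squared component differences (claim 3).
    return sum((s - r) ** 2 for s, r in zip(speech, ref))


def exit_cost(sums, start, n, skip_penalty):
    # Cheapest way to leave the phoneme stored in sums[start:start + n].
    # Assumed time distortion: skip_penalty per skipped position.
    return min(sums[start + m] + skip_penalty * (n - 1 - m) for m in range(n))


def step(prev, speech, phonemes, stay_penalty, skip_penalty):
    """Compute the new distance sums for one speech value.

    phonemes: list of (reference_value, count) pairs for one word.
    prev:     distance sums stored after the preceding speech value.
    """
    new, start = [], 0
    prev_start = prev_n = None
    for ref, n in phonemes:
        d = local_distance(speech, ref)
        for k in range(n):
            j = start + k
            candidates = []
            if k > 0:
                candidates.append(prev[j - 1])              # rule a)
            elif prev_n is not None:
                candidates.append(                          # rule c)
                    exit_cost(prev, prev_start, prev_n, skip_penalty))
            if k == n - 1:
                candidates.append(prev[j] + stay_penalty)   # rule b)
            new.append(d + min(candidates, default=math.inf))
        prev_start, prev_n = start, n
        start += n
    return new


def word_distance(frames, phonemes, stay_penalty=1.0, skip_penalty=1.0):
    total = sum(n for _, n in phonemes)
    sums = [math.inf] * total
    sums[0] = local_distance(frames[0], phonemes[0][0])  # assumed start
    for speech in frames[1:]:
        sums = step(sums, speech, phonemes, stay_penalty, skip_penalty)
    last_n = phonemes[-1][1]
    return exit_cost(sums, total - last_n, last_n, skip_penalty)
```

With two phonemes of two positions each, an utterance that matches the model frame for frame accumulates no distance, while an utterance whose first phoneme lasts one frame longer pays one `stay_penalty` under these assumptions.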

12. Apparatus for recognizing words, composed of phonemes, from a speech signal divided into successive sections, comprising:
- means for converting the speech signal in each section to a speech value;
  
  memory means for storing reference values for comparison with successive speech values and for storing distance sums for said reference values attained by comparisons of said reference values with preceding speech values; and
  
  processing circuit means responsive to said converting means and said memory means for comparing each stored reference value within a given neighborhood to a current speech value to determine a comparison result, and for determining a new distance sum to be stored in said memory means for each compared reference value, by adding the comparison result to a quantity determined by considering a position of the compared reference value within a sequence of equal reference values in a current phoneme, utilizing a relationship between phonemes and sequences of numbers of equal reference values, predetermined with the aid of learning speech values, in accordance with the following rules;
  
  a) if the compared reference value is at the end of a sequence of reference values which is in number greater than unity and less than the number of sequential equal reference values predetermined within a phoneme, said quantity is equal to the distance sum stored for the preceding reference value in the sequence;
  
  b) if the compared reference value is at the end of a sequence of equal reference values greater in number than the number of sequential equal reference values predetermined within a phoneme, said quantity is formed by adding a time distortion value to the distance sum previously stored for said reference value;
  
  c) if the compared reference value is the first reference value beginning a sequence of reference values in a new current phoneme, said quantity is formed by selecting the minimum of sums of time distortion values and the distance sums stored for each reference value in the sequence of reference values in the phoneme preceding the new current phoneme.
  - 13. An apparatus as claimed in claim 12, wherein said processing circuit means is a microprocessor.
  - 14. An apparatus as claimed in claim 12, wherein the first memory (16) contains prototype reference values and fixedly associated distance measures and in that the processing circuit means (14) compares each new speech value with all prototype reference values and utilizes for each prototype reference value, the associated distance measure as comparison result.
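
The prototype-based comparison of claims 7, 9, and 14 can be sketched as a table lookup. The Python example below is an illustration under stated assumptions, not the claimed apparatus: each learning speech value is assigned to its closest prototype by squared difference, the distance measure is taken as the negative logarithm of the relative frequency of a prototype within a phoneme (the claims name the logarithm of this ratio; the sign is an assumption so that rarer prototypes cost more), and unseen combinations are floored at a small value to avoid taking the logarithm of zero. All names and the `floor` parameter are hypothetical.

```python
import math
from collections import Counter


def nearest_prototype(speech, prototypes):
    # Index of the prototype closest to the speech value
    # (assumed metric: sum of squared component differences).
    return min(
        range(len(prototypes)),
        key=lambda k: sum((s - p) ** 2 for s, p in zip(speech, prototypes[k])),
    )


def distance_table(labelled_values, prototypes, floor=1e-3):
    """Build a (prototype index, phoneme) -> distance measure table.

    labelled_values: (speech_value, phoneme) pairs from the learning phase.
    """
    counts, totals = Counter(), Counter()
    for speech, phoneme in labelled_values:
        counts[(nearest_prototype(speech, prototypes), phoneme)] += 1
        totals[phoneme] += 1
    table = {}
    for phoneme, total in totals.items():
        for k in range(len(prototypes)):
            relative = max(counts[(k, phoneme)] / total, floor)
            table[(k, phoneme)] = -math.log(relative)
    return table


def comparison_result(speech, phoneme, prototypes, table):
    # During recognition the stored measure replaces a direct comparison.
    return table[(nearest_prototype(speech, prototypes), phoneme)]


# Hypothetical toy data with one-component values.
prototypes = [(0.0,), (1.0,)]
learning = [((0.1,), "a"), ((0.0,), "a"), ((0.9,), "a"), ((1.0,), "o")]
table = distance_table(learning, prototypes)
```

Because the table is fixed after the learning phase, recognition needs only one nearest-prototype search per speech value, after which every reference value is scored by lookup.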

Current Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Original Assignee: US Philips Corporation (Koninklijke Philips N.V.)
Inventors: Noll, Andreas; Ney, Hermann
Primary Examiner(s): Shaw, Dale M.
Assistant Examiner(s): Knepper, David D.
Application Number: US07/523,305
Time in Patent Office: 522 Days
Field of Search: 364/513, 364/513.5, 381/41-43
US Class Current: 704/254
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...
