Method of recognizing coherently spoken words
First Claim
1. A method of recognizing words, composed of phonemes, from a speech signal divided into successive sections, the speech signal in each section being converted to a speech value, said method comprising:
storing reference values for comparison with successive speech values and storing distance sums for said reference values attained by comparisons of said reference values with preceding speech values;
comparing each stored reference value within a given neighborhood to a current speech value to determine a comparison result; and
determining a new distance sum to be stored for each compared reference value, by adding the comparison result to a quantity determined by considering a position of the compared reference value within a sequence of equal reference values in a current phoneme, utilizing a relationship between phonemes and sequences of numbers of equal reference values, predetermined with the aid of learning speech values, in accordance with the following rules;
a) if the compared reference value is at the end of a sequence of reference values which is in number greater than unity and less than the number of sequential equal reference values predetermined within a phoneme, said quantity is equal to the distance sum stored for the preceding reference value in the sequence;
b) if the compared reference value is at the end of a sequence of equal reference values greater in number than the number of sequential equal reference values predetermined within a phoneme, said quantity is formed by adding a time distortion value to the distance sum previously stored for said reference value;
c) if the compared reference value is the first reference value beginning a sequence of reference values in a new current phoneme, said quantity is formed by selecting the minimum of sums of time distortion values and the distance sums stored for each reference value in the sequence of reference values in the phoneme preceding the new current phoneme.
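Rules a) to c) can be read as one step of a dynamic-programming recurrence over the stored distance sums. The sketch below is one possible reading, not the patented implementation. The function names, the scalar speech values, the absolute-difference comparison, and the penalties `stretch_penalty` and `skip_penalty` are all assumptions made for illustration. It also assumes that the last reference value of a phoneme takes the cheaper of rule a) and rule b), and that the time distortion in rule c) grows with the number of reference values skipped in the preceding phoneme.

```python
import math

def update_distance_sums(prev, lengths, local_dist,
                         stretch_penalty=1.0, skip_penalty=1.0):
    """One update step for a single word.

    prev[p][k]    stored distance sum for position k of phoneme p
    lengths[p]    learned number of equal reference values in phoneme p
    local_dist[p] comparison result of the current speech value with the
                  (single, repeated) reference value of phoneme p
    """
    new = []
    for p, n in enumerate(lengths):
        row = []
        for k in range(n):
            candidates = []
            if k > 0:
                # Rule a): continue inside the phoneme, no time distortion.
                candidates.append(prev[p][k - 1])
            elif p > 0:
                # Rule c): enter a new phoneme from any position of the
                # preceding one; leaving early costs a time distortion
                # (assumed proportional to the positions skipped).
                n_prev = lengths[p - 1]
                candidates.append(min(
                    prev[p - 1][j] + skip_penalty * (n_prev - 1 - j)
                    for j in range(n_prev)))
            if k == n - 1:
                # Rule b): phoneme lasts longer than learned; stay on the
                # last reference value and pay a time distortion.
                candidates.append(prev[p][k] + stretch_penalty)
            quantity = min(candidates, default=math.inf)
            row.append(local_dist[p] + quantity)
        new.append(row)
    return new

def word_distance(speech_values, refs, lengths, **penalties):
    """Accumulated distance of a word ending on its last reference value."""
    prev = [[math.inf] * n for n in lengths]
    prev[0][0] = abs(speech_values[0] - refs[0])  # path starts at the word start
    for x in speech_values[1:]:
        local = [abs(x - r) for r in refs]
        prev = update_distance_sums(prev, lengths, local, **penalties)
    return prev[-1][-1]
```

With two phonemes of learned length 2 and reference values 0.0 and 1.0, the input `[0, 0, 1, 1]` matches without any distortion. Lengthening the second phoneme by one section or shortening the first by one section each adds exactly one penalty under these assumptions.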
Abstract
During recognition, speech values derived from sample values of the speech signal are compared with reference values, each word of a given vocabulary being represented by a sequence of reference values. The words are composed of phonemes according to a fixed pronouncing lexicon, and the reference values for the phonemes are determined in a learning phase, each phoneme within a word consisting of a number of equal reference values that is determined in the learning phase. To approximate the transitions between phonemes, each phoneme may instead consist of three sections, each with constant reference values. Because the number of reference values per phoneme is given, the time duration of a phoneme in a given word can be modeled more accurately. Several ways of determining the reference values and the distance value during recognition are described.
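As an illustration of how a word could be expanded into its sequence of reference values, here is a minimal sketch. The data layout is an assumption, not taken from the patent: `lexicon` maps a word to its phoneme symbols, and `phoneme_models` maps each phoneme to a list of sections, each section being a `(reference_value, count)` pair. A one-section entry corresponds to the basic case and a three-section entry to the transition variant. Reference values are shown as plain numbers for brevity.

```python
def expand_word(word, lexicon, phoneme_models):
    """Return the sequence of reference values representing `word`.

    lexicon        hypothetical pronouncing lexicon: word -> phoneme symbols
    phoneme_models hypothetical learned models: phoneme -> list of
                   (reference_value, count) sections
    """
    sequence = []
    for phoneme in lexicon[word]:
        for reference_value, count in phoneme_models[phoneme]:
            # Each section contributes `count` equal reference values.
            sequence.extend([reference_value] * count)
    return sequence

# Example data (invented for illustration only).
lexicon = {"no": ["n", "o"]}
phoneme_models = {
    "n": [(0.2, 2)],                        # one section, two equal values
    "o": [(0.5, 1), (0.7, 3), (0.6, 1)],    # three sections with transitions
}
reference_sequence = expand_word("no", lexicon, phoneme_models)
```

The learned counts fix the nominal duration of each phoneme, which is what lets a recognizer treat deviations from that duration as time distortion.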
23 Citations
14 Claims
1. Method claim, reproduced in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. Apparatus for recognizing words, composed of phonemes, from a speech signal divided into successive sections, comprising:
means for converting the speech signal in each section to a speech value;
memory means for storing reference values for comparison with successive speech values and for storing distance sums for said reference values attained by comparisons of said reference values with preceding speech values; and
processing circuit means responsive to said converting means and said memory means for comparing each stored reference value within a given neighborhood to a current speech value to determine a comparison result, and for determining a new distance sum to be stored in said memory means for each compared reference value, by adding the comparison result to a quantity determined by considering a position of the compared reference value within a sequence of equal reference values in a current phoneme, utilizing a relationship between phonemes and sequences of numbers of equal reference values, predetermined with the aid of learning speech values, in accordance with the following rules;
a) if the compared reference value is at the end of a sequence of reference values which is in number greater than unity and less than the number of sequential equal reference values predetermined within a phoneme, said quantity is equal to the distance sum stored for the preceding reference value in the sequence;
b) if the compared reference value is at the end of a sequence of equal reference values greater in number than the number of sequential equal reference values predetermined within a phoneme, said quantity is formed by adding a time distortion value to the distance sum previously stored for said reference value;
c) if the compared reference value is the first reference value beginning a sequence of reference values in a new current phoneme, said quantity is formed by selecting the minimum of sums of time distortion values and the distance sums stored for each reference value in the sequence of reference values in the phoneme preceding the new current phoneme.
Dependent claims: 13, 14
Specification