Pattern recognition using a predictive neural network
First Claim
1. A method of recognizing a speech sound having a variable time interval as a recognized pattern selected from a plurality of reference patterns which represent categories of recognition objects, respectively, said method comprising the steps of:
converting the speech sound into a digital speech signal;
analyzing the digital speech signal by using a spectral analysis method into a time sequence of input feature vectors;
storing parameters of the reference patterns, each of which is defined by a sequence of predictors, each of the predictors comprising an input layer, a hidden layer, and an output layer, the input layer comprising a plurality of primary input units and a plurality of secondary input units, the hidden layer comprising a plurality of hidden units, the output layer comprising a plurality of output units, the primary input units being connected with the hidden units by a first coefficient matrix, the secondary input units being connected with the hidden units by a second coefficient matrix, the hidden units being connected with the output units by a third coefficient matrix, said parameters corresponding to the first through the third coefficient matrices;
successively deriving a time sequence of predicted feature vectors and a sequence of new state vectors by supplying the time sequence of the input feature vectors and a sequence of preceding state vectors to the primary input units and the secondary input units, respectively, so that feedback occurs through successively supplying the preceding state vectors from said hidden units to the secondary input units to successively produce the new state vectors;
calculating a prediction error between the time sequence of the input feature vectors and the time sequence of the predicted feature vectors; and
selecting, as the recognized pattern, one of the reference patterns that minimizes the prediction error.
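The steps recited in this claim can be illustrated with a minimal sketch. It rests on several assumptions that are not in the claim: each reference pattern is reduced to a single predictor rather than a sequence of predictors, the hidden units use a tanh activation, the output layer is linear, the initial state vector is zero, and the prediction error is a summed squared difference. The names `predictor_step`, `prediction_error`, and `recognize` are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def predictor_step(a_t, h_prev, W1, W2, W3):
    """Run one time step of a single predictor.

    a_t    -- input feature vector, fed to the primary input units
    h_prev -- preceding state vector, fed to the secondary input units
    W1, W2 -- first and second coefficient matrices (inputs to hidden units)
    W3     -- third coefficient matrix (hidden units to output units)
    Returns (predicted feature vector, new state vector).
    """
    pre = [p + s for p, s in zip(matvec(W1, a_t), matvec(W2, h_prev))]
    h_new = [math.tanh(v) for v in pre]  # assumption: tanh hidden activation
    a_pred = matvec(W3, h_new)           # assumption: linear output layer
    return a_pred, h_new


def prediction_error(features, W1, W2, W3):
    """Accumulate the squared error between each a(t+1) and its prediction."""
    h = [0.0] * len(W1)  # assumption: zero initial state vector
    total = 0.0
    for a_t, a_next in zip(features, features[1:]):
        # The new state vector is fed back as the next preceding state vector.
        a_pred, h = predictor_step(a_t, h, W1, W2, W3)
        total += sum((x - y) ** 2 for x, y in zip(a_next, a_pred))
    return total


def recognize(features, reference_patterns):
    """Return the label of the reference pattern with the smallest error."""
    return min(
        reference_patterns,
        key=lambda label: prediction_error(features, *reference_patterns[label]),
    )
```

Here `reference_patterns` would map a category label to its stored `(W1, W2, W3)` parameters, and `features` would be the time sequence of input feature vectors produced by the spectral analysis.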
Abstract
A time sequence of input feature vectors (a(t)) is considered a pattern selected from a plurality of reference patterns which represent categories of recognition objects. Each reference pattern is defined by a sequence of state models, which is successively supplied with the time sequence of the input feature vectors and with a sequence of preceding state vectors (h(t, s, n)). The sequence of the state models produces a time sequence of predicted feature vectors (A(t+1, s, n)) and a sequence of new state vectors (h(t+1, s, n)). The recognized pattern is the one of the reference patterns that minimizes a prediction error between the time sequence of the input feature vectors and the time sequence of the predicted feature vectors. The prediction error is calculated by using a dynamic programming algorithm. Training of the reference patterns is carried out by a gradient descent method such as the back-propagation technique.
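The abstract says only that the prediction error is calculated by a dynamic programming algorithm. The sketch below shows one common form of such an alignment, under assumptions of our own: the path starts in the first state model at the first frame, ends in the last state model at the last frame, and at each frame either stays in the same state model or advances by one. It also takes a precomputed table `local_error[t][n]` of per-frame errors, whereas in the method described the state vector travels along the path and the local error may therefore depend on the path taken. The name `dp_alignment_error` is hypothetical.

```python
def dp_alignment_error(local_error):
    """Return the minimum accumulated error over left-to-right alignments.

    local_error[t][n] is the prediction error of state model n at frame t.
    Assumption: each step keeps the same state model or advances by one.
    """
    inf = float("inf")
    num_models = len(local_error[0])

    # Accumulated error at the first frame: only state model 0 is allowed.
    g = [inf] * num_models
    g[0] = local_error[0][0]

    for frame_errors in local_error[1:]:
        new_g = [inf] * num_models
        for n in range(num_models):
            best_prev = g[n]                       # stay in state model n
            if n > 0:
                best_prev = min(best_prev, g[n - 1])  # advance from n - 1
            new_g[n] = frame_errors[n] + best_prev
        g = new_g

    # The path must end in the last state model.
    return g[-1]
```

Running this once per reference pattern and taking the pattern with the smallest returned value would correspond to the selection step described above.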
19 Citations
5 Claims
2. A speech recognition device for recognizing a speech sound having a variable time interval as a recognized pattern selected from a plurality of reference patterns which represent categories of recognition objects, respectively, said speech recognition device comprising:
a converter for converting the speech sound into a digital speech signal;
an analyzer connected to said converter for processing the digital speech signal, using a spectral analysis method, into a time sequence of input feature vectors;
a storing unit for storing parameters of the reference patterns, each of which is defined by a sequence of predictors, each of the predictors comprising an input layer, a hidden layer, and an output layer, the input layer comprising a plurality of primary input units and a plurality of secondary input units, the hidden layer comprising a plurality of hidden units, the output layer comprising a plurality of output units, the primary input units being connected with the hidden units by a first coefficient matrix, the secondary input units being connected with the hidden units by a secondary coefficient matrix, the hidden units being connected with the output units by a third coefficient matrix, said parameters corresponding to the first through the third coefficient matrices;
a calculator connected to said analyzer and to said storing unit for successively deriving a time sequence of predicted feature vectors and a sequence of new state vectors by supplying the time sequence of the input feature vectors and a sequence of preceding state vectors to the primary input units and the secondary input units, respectively, so that feedback occurs through successively supplying the preceding state vectors from said hidden units to the secondary input units to successively produce the new state vectors, said calculator calculating a prediction error between the time sequence of the input feature vectors and the time sequence of the predicted feature vectors; and
a selector connected to said calculator for selecting, as the recognized pattern, one of the reference patterns that minimizes the prediction error.
Dependent claim: 3
4. A method of recognizing a speech sound, comprising the steps of:
analyzing a digital speech signal using a spectral analysis to produce a time sequence of input feature vectors;
storing parameters of the reference patterns which represents a plurality of speech objects to be recognized, each of said parameters previously defined by a training operation performed on one of a plurality of predictors said predictors having an input layer, a hidden layer, and an output layer;
successively supplying a time delay feedback from an output of said predictors to an input of said predictors by feeding back a sequence of new state vectors derived by said predictors to said input after a time delay so that said input is supplied with a sequence of preceding state vectors output from said predictors, wherein said time delay feedback supplies the preceding state vectors from said hidden layer to said input layer;
successively deriving a time sequence of predicted feature vectors and said sequence of new state vectors from said time sequence of input feature vectors and said sequence of preceding state vectors by using said predictors;
calculating a prediction error between said time sequence of the input feature vectors and said time sequence of the predicted feature vectors; and
selecting, as the recognized pattern, one of the reference patterns that minimizes the prediction error.
5. A speech recognition device comprising:
an analyzer for processing a digital speech signal, using a spectral analysis method, into a time sequence of input feature vectors;
a storing unit for storing parameters of reference patterns, each of said parameters being defined by a training operation performed on said speech recognition device;
a calculator connected to said analyzer and to said storing unit, said calculator comprising;
a plurality of predictors each having a first input, a second input, a first output, and a second output, said first output outputting a time sequence of predicted feature vectors, each of said plurality of predictors having an input layer, a hidden layer, and an output layer connected in series, said output layer outputting said time sequence of predicted feature vectors;
a time delay feedback loop connecting said hidden layer to said input layer so that a sequence of new state vectors output from said hidden layer to the input layer are fed back as a sequence of preceding state vectors while said input layer simultaneously receives said time sequence of the input feature vectors;
a plurality of distance calculators each connected to said first output of one of said predictors for determining a prediction error between the time sequence of the input feature vectors and the time sequence of predicted feature vectors; and
a selector connected to said calculator for selecting, as a recognized pattern, one of said reference patterns that minimizes said prediction error.
Specification