Speech transformation using log energy and orthogonal matrix
First Claim
1. A method of generating features for use with speech responsive apparatus, said method comprising:
calculating the logarithmic frame energy value of each of a sequence of a predetermined number n of frames of an input speech signal; and
applying a predetermined orthogonal transform matrix to the n logarithmic frame energy values to form a frame energy vector representing the input speech signal, wherein the predetermined orthogonal transform matrix encodes temporal information such that the elements of the off-diagonal of the covariance matrix of the frame energy vector are substantially zero.
Abstract
The log frame energy value of each of a predetermined number n of frames of an input speech signal is calculated, and a matrix transform is applied to the n log frame energy values to form a temporal matrix representing the input speech signal. The matrix transform may be a discrete cosine transform.
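The two steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes non-overlapping frames, uses an orthonormal DCT-II as the orthogonal transform (the abstract says only that the transform may be a discrete cosine transform), and adds a small floor before the logarithm. The function names `log_frame_energies`, `dct_matrix` and `frame_energy_vector`, and the parameter `frame_len`, are hypothetical.

```python
import math


def log_frame_energies(signal, frame_len, n):
    """Log energy of each of the first n non-overlapping frames (assumed framing)."""
    if len(signal) < n * frame_len:
        raise ValueError("signal is too short for n frames of length frame_len")
    energies = []
    for k in range(n):
        frame = signal[k * frame_len:(k + 1) * frame_len]
        energy = sum(x * x for x in frame)
        # The 1e-10 floor is an assumption of this sketch; it avoids log(0).
        energies.append(math.log(energy + 1e-10))
    return energies


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows


def frame_energy_vector(signal, frame_len, n):
    """Apply the transform matrix to the n log frame energies."""
    e = log_frame_energies(signal, frame_len, n)
    return [sum(c * v for c, v in zip(row, e)) for row in dct_matrix(n)]


# Usage: a signal whose amplitude rises over time, split into 4 frames of 10 samples.
signal = [0.1 * (i + 1) for i in range(40)]
print(frame_energy_vector(signal, frame_len=10, n=4))
```

For a signal with the same energy in every frame, only the first element of the resulting vector is non-zero (up to rounding), because the higher-order DCT rows sum to zero; changes in energy over time show up in the remaining elements.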
175 Citations
9 Claims
1. A method of generating features for use with speech responsive apparatus, said method comprising:
calculating the logarithmic frame energy value of each of a sequence of a predetermined number n of frames of an input speech signal; and
applying a predetermined orthogonal transform matrix to the n logarithmic frame energy values to form a frame energy vector representing the input speech signal, wherein the predetermined orthogonal transform matrix encodes temporal information such that the elements of the off-diagonal of the covariance matrix of the frame energy vector are substantially zero. - View Dependent Claims (2, 3, 4)
5. A method of speech recognition comprising:
receiving an input signal representing speech, said input signal being divided into frames;
generating a feature by calculating the logarithmic frame energy value of each of a predetermined number n frames of the input speech signal;
applying a predetermined orthogonal transform matrix to the n logarithmic frame energy values to form a frame energy vector representing the input speech signal, the predetermined orthogonal transform matrix encoding temporal information such that the elements of the off-diagonal of the covariance matrix of the frame energy vector are substantially zero;
comparing the generated feature with recognition data representing allowed utterances, said recognition data relating to the feature; and
indicating recognition or otherwise on the basis of the comparison step. - View Dependent Claims (6)
7. Feature generating apparatus for use with speech responsive apparatus, said feature generating apparatus comprising:
a processor for calculating the logarithm of the energy of each of a predetermined number n of frames of an input speech signal; and
a processor for applying a predetermined orthogonal transform matrix to the n logarithmic energy values so calculated to form a frame energy vector representing the input speech signal, the predetermined orthogonal transform matrix encoding temporal information such that the elements of the off-diagonal of the covariance matrix of the frame energy vector are substantially zero. - View Dependent Claims (8, 9)
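The condition recited in the independent claims, that the off-diagonal elements of the covariance matrix of the frame energy vector are substantially zero, can be illustrated numerically. The sketch below is an assumption-laden example rather than anything taken from the patent: it models the covariance of n successive log frame energies as a first-order autoregressive matrix with entries rho^|i-j| (the model and the values n = 8, rho = 0.9 are hypothetical), uses an orthonormal DCT-II as the transform, and compares the off-diagonal magnitude before and after transformation.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def off_diagonal_sum(m):
    """Sum of absolute values of all off-diagonal elements."""
    return sum(abs(m[i][j])
               for i in range(len(m)) for j in range(len(m)) if i != j)


n, rho = 8, 0.9  # hypothetical values chosen for illustration only
# Assumed covariance of the untransformed log energies: strongly correlated frames.
R = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
C = dct_matrix(n)
# Covariance of the transformed vector y = C e is C R C^T.
R_t = matmul(matmul(C, R), transpose(C))

before = off_diagonal_sum(R)
after = off_diagonal_sum(R_t)
print("off-diagonal sum before:", before)
print("off-diagonal sum after: ", after)
```

Under this assumed model the transformed covariance is much closer to diagonal than the original, which is the property the claims require of the transform matrix. Real speech data may behave differently, so the comparison should be repeated on covariances estimated from actual log frame energies.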
Specification