Speech recognition using dynamic features
Abstract
A speech recognition technique utilizes a set of N different principal discriminant matrices. Each principal discriminant matrix is associated with a distinct class, and each class is an indication of the proximity of a speech segment to neighboring phones. A technique for speech encoding includes arranging a speech signal into a series of frames. A feature vector is derived which represents the speech signal for a speech segment or series of speech segments for each frame. A set of N different projected vectors is generated for each frame by multiplying the principal discriminant matrices by the feature vector. This speech encoding technique can be used in speech recognition systems by utilizing models in which each model transition is tagged with one of the N classes. The projected vector is utilized with the corresponding tag to compute the probability that at least one particular speech part is present in the frame.
25 Claims
1. A method for speech encoding, comprising the steps of:
producing a set of N distinct principal discriminant matrices, each principal discriminant matrix being associated with a different class, each class being an indication of the proximity of a speech segment to one or more neighboring speech segments;
arranging a speech signal into a series of frames;
deriving a feature vector which represents said speech signal for each frame; and
generating a set of N different projected vectors for each frame, by multiplying each of said N distinct principal discriminant matrices by said feature vector.
Dependent claims: 2, 3, 4
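The encoding steps recited above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the non-overlapping framing, the two toy features (mean amplitude and mean energy), and the helper names `frame_signal`, `feature_vector`, `project`, and `encode` do not come from the claim, which does not say how frames, features, or matrices are obtained.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def frame_signal(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split a signal into consecutive, non-overlapping frames (assumed framing)."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def feature_vector(frame: Sequence[float]) -> Vector:
    """Toy placeholder features: mean amplitude and mean energy of the frame."""
    n = len(frame)
    return [sum(frame) / n, sum(x * x for x in frame) / n]


def project(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply one matrix by the feature vector (row-by-row dot products)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def encode(samples: Sequence[float], frame_len: int,
           matrices: Sequence[Matrix]) -> List[List[Vector]]:
    """For each frame, return the N projected vectors, one per matrix."""
    return [[project(m, feature_vector(f)) for m in matrices]
            for f in frame_signal(samples, frame_len)]
```

With N matrices supplied, `encode` returns a list with one entry per frame, and each entry holds N projected vectors in the same order as the matrices.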

5. A method for speech recognition, the method of speech recognition comprising the steps of:
deriving N distinct transformations, each distinct transformation is respectively associated with one of N classes, each class providing an indication of the proximity of a speech segment to one or more neighboring speech segments;
arranging a speech signal into a series of frames;
deriving a vector, within each said frame, which represents said speech signal;
generating a set of N different projected vectors for each frame, by multiplying said transformations by said vector;
utilizing models for tagging each model transition with one of said N classes; and
utilizing the projected vector with the corresponding tag to compute a probability that a particular speech segment is present in said frame.
Dependent claims: 6, 7, 8, 9, 10
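The last two steps, tagging each model transition with a class and scoring a frame with the projection selected by that tag, might look roughly like the sketch below. The `Transition` record, the diagonal-Gaussian output density, and the function names are assumptions chosen for illustration; the claim requires only that a probability be computed from the projected vector matching the tag, and does not specify the form of the model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    """Hypothetical model transition: a class tag plus Gaussian parameters."""
    tag: int            # index (0 .. N-1) of the class this transition is tagged with
    mean: List[float]   # assumed per-dimension mean of the projected vector
    var: List[float]    # assumed per-dimension variance of the projected vector


def log_gaussian(x: Sequence[float], mean: Sequence[float],
                 var: Sequence[float]) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def transition_log_prob(tr: Transition,
                        projected: Sequence[Sequence[float]]) -> float:
    """Score one frame on one transition, using only the projection its tag selects."""
    return log_gaussian(projected[tr.tag], tr.mean, tr.var)
```

Here `projected` is the list of N projected vectors for a single frame, so a transition tagged with class k only ever sees the vector produced by the k-th transformation.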

11. An apparatus for speech encoding comprising:
means for producing a set of N distinct principal discriminant matrices, each principal discriminant matrix being associated with a different class, the class being an indication of the proximity of the speech segment to one or more neighboring speech segments;
means for arranging a speech signal into a series of frames;
means for deriving a feature vector which represents said speech signal for each frame; and
means for generating a set of N different projected vectors for each frame, by multiplying each of said principal discriminant matrices by said vector.
Dependent claims: 12, 13, 14

15. A speech recognition system comprising:
means for arranging speech segments into a series of frames;
means for deriving a vector, within each of said frames, which represents said speech signal;
means for deriving N distinct transformations, each distinct transformation is respectively associated with one of N classes, each class providing an indication of the proximity of a speech part to neighboring speech parts;
means for generating a set of N different projected vectors for each frame, by multiplying said N transformations by said vector;
means for utilizing models for tagging each model transition with one of said N classes; and
means for utilizing the projected vector with the corresponding tag to compute the probability that a particular speech part is present in said frame.
Dependent claims: 16, 17, 18, 19, 20

21. A method for speech recognition which comprises the steps of:
arranging a speech signal into a series of frames;
varying the width of one or more windows to be utilized for a speech encoding system in accordance with a principal discriminant matrix, each window being defined as a number of successive frames which have a same speech segment associated therewith;
deriving a feature vector which represents said speech signal for each frame; and
generating a projected vector for each frame by multiplying said principal discriminant matrix by said feature vector, wherein said principal discriminant matrix represents the values of the projected vectors which are indicative of the speech signal.
Dependent claims: 22
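The window definition in this claim (a number of successive frames that share the same speech segment) can be illustrated with a small run-length sketch. It assumes each frame already carries a segment label, for example from an alignment; the function name `window_widths` and the label strings are hypothetical, and the sketch does not attempt to show how a width is varied in accordance with a principal discriminant matrix.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def window_widths(frame_labels: Sequence[str]) -> List[Tuple[str, int]]:
    """Return (segment label, width in frames) for each run of identical labels."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]
```

For the label sequence `["s", "s", "ih", "ih", "ih", "t"]`, this yields one window of width 2, one of width 3, and one of width 1.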

23. An apparatus which comprises:
means for arranging a speech signal into a series of frames;
means for varying the width of one or more windows to be utilized for a speech encoding system, based upon a principal discriminant matrix, each window is defined as the number of successive frames which has the same speech segment associated with it;
means for deriving a feature vector which represents said speech signal for a speech segment or series of speech segments for each frame; and
means for generating a projected vector for each frame by multiplying said principal discriminant matrix by said feature vector, wherein said principal discriminant matrix equates the values of the projected vectors which are representative of the speech signal.
Dependent claims: 24

25. A method for applying a value to each tag from a series of tags, to be utilized in a speech recognition application, comprising the steps of:
determine whether a frame F belongs to a phone whose duration is M frames or less, if so, set the tag for each frame in the phone at a first value; otherwise, proceed with the next step;
determine whether the window of frame F overlaps the preceding phone by N frames or more, if so, set the value of the tag at a second value, otherwise proceed with the next step;
determine whether the window overlaps the following phone by N frames or more, if so, set frame tag at a third value, otherwise proceed with the next step;
determine whether the window overlaps the preceding phone at all, if so, set the tag to a fourth value, otherwise proceed with the next step;
determine whether the window overlaps the following phone at all, if so, set the tag to a fifth value, otherwise proceed to the next step; and
set the tag to a sixth value.
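The decision cascade in this claim can be written as a short function. This is a sketch under stated assumptions: the window is taken to be centered on frame F and to extend `half_width` frames on each side, the phone occupies frames `start` to `end - 1`, overlap is counted as the number of window frames falling outside the phone on the relevant side, and the integers 1 to 6 stand in for the claim's first to sixth values. None of these specifics are given in the claim itself.

```python
def assign_tag(f: int, start: int, end: int, half_width: int,
               max_short: int, min_overlap: int) -> int:
    """Return a tag 1..6 for frame f of a phone spanning frames [start, end).

    max_short plays the role of M and min_overlap the role of N in the claim.
    """
    if not start <= f < end:
        raise ValueError("frame f must lie inside the phone")

    # Step 1: short phones get the first value for every frame.
    if end - start <= max_short:
        return 1

    # Window frames that fall before the phone start / after the phone end.
    before = max(0, start - (f - half_width))
    after = max(0, (f + half_width) - (end - 1))

    if before >= min_overlap:   # Step 2: large overlap with the preceding phone
        return 2
    if after >= min_overlap:    # Step 3: large overlap with the following phone
        return 3
    if before > 0:              # Step 4: any overlap with the preceding phone
        return 4
    if after > 0:               # Step 5: any overlap with the following phone
        return 5
    return 6                    # Step 6: window lies entirely inside the phone
```

The order of the tests matters: a frame near the start of a phone whose window also reaches the following phone is resolved by the first condition that holds, exactly as the claim's "otherwise proceed with the next step" wording implies.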
Specification