Producing time uniform feature vectors
Abstract
Embodiments of the present invention relate to speech processing such as, for example, speech recognition. Speech processing according to one embodiment of the present invention can be performed based on the occurrence of events within the electrical signals representing speech. Such events need not be instantaneous occurrences; an event can instead be an occurrence within the electrical signal that spans some period of time. Furthermore, the electrical signal can be analyzed based on the occurrence and location of these events so that less than all of the signal is analyzed. That is, the spoken sounds can be processed based on regions of the signal around and including the events but excluding other portions of the signal. For example, transition periods before the occurrence of the events may be excluded to eliminate noise or transients introduced at that part of the signal.
16 Claims
1. A method of processing a signal representing speech, the method comprising:
receiving a region of the signal representing speech, wherein the region comprises a portion of a frame of the signal representing speech classified as a voiced frame and wherein the region is marked based on one or more pitch estimates for the region;
identifying a plurality of cords within the region of the signal based on occurrence of events within the region of the signal, wherein the events comprise glottal pulses and each cord begins with onset of a first glottal pulse and extends to a point prior to an onset of a second glottal pulse but excludes a portion of the region of the signal prior to the onset of the second glottal pulse; and
normalizing the plurality of cords on a time basis, wherein the normalized plurality of cords each have a uniform duration on the time basis.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
a classification module adapted to receive a region of a signal representing speech, wherein the region comprises a portion of a frame of the signal representing speech and wherein the region is marked based on one or more pitch estimates for the region;
a cord finder module communicatively coupled with the classification module and adapted to receive the frame from the classification module and identify a plurality of cords within the region of the signal based on occurrence of events within the region of the signal, wherein the events comprise glottal pulses and each cord begins with onset of a first glottal pulse and extends to a point prior to an onset of a second glottal pulse but excludes a portion of the region of the signal prior to the onset of the second glottal pulse; and
a time normalization module communicatively coupled with the cord finder module and adapted to receive the plurality of extracted cords from the cord finder module and normalize the plurality of cords on a time basis, wherein the normalized plurality of cords each have a uniform duration on the time basis.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
Specification