Producing phonitos based on feature vectors
First Claim
1. A method of processing a signal representing speech, the method comprising:
- receiving a region of the signal representing speech, wherein the region comprises a portion of a frame of the signal representing speech classified as a voiced frame and wherein the region is marked based on one or more pitch estimates for the region;
- identifying one or more cords within the region of the signal based on occurrence of one or more events within the region of the signal, wherein the one or more events comprise one or more glottal pulses and the cord begins with onset of a first glottal pulse and extends to a point prior to an onset of a second glottal pulse but excludes a portion of the region of the signal prior to the onset of the second glottal pulse; and
- determining a phoneme for the voiced frame based on at least one of the one or more identified cords.
Abstract
Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a first frame of the signal, the first frame comprising a voiced frame. One or more cords can be extracted from the voiced frame based on the occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. The one or more cords can collectively comprise less than all of the frame. For example, each of the cords can begin with the onset of a glottal pulse and extend to a point prior to the onset of a neighboring glottal pulse, but may exclude a portion of the frame prior to the onset of the neighboring glottal pulse. A phoneme for the voiced frame can be determined based on at least one of the extracted cords.
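The cord extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the glottal pulse onsets are already available as sample indices (`pulse_onsets`, presumed to come from an upstream pitch-marking step), and it uses a hypothetical `keep_fraction` parameter to decide how much of each pitch period is kept; the source states only that a cord ends at a point prior to the next onset and excludes some portion before it. The function name `extract_cords` is likewise illustrative.

```python
from typing import List, Sequence


def extract_cords(
    frame: Sequence[float],
    pulse_onsets: Sequence[int],
    keep_fraction: float = 0.75,
) -> List[Sequence[float]]:
    """Slice a voiced frame into cords, one per pair of neighboring onsets.

    keep_fraction is an assumption of this sketch: the share of each
    pitch period retained, so the tail before the next onset is dropped.
    """
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must be strictly between 0 and 1")

    cords = []
    # Pair each onset with the onset of the neighboring (next) pulse.
    for start, next_start in zip(pulse_onsets, pulse_onsets[1:]):
        period = next_start - start
        end = start + int(period * keep_fraction)
        if end > start:
            # The cord starts at the onset and stops short of the next onset.
            cords.append(frame[start:end])
    return cords


# Example: a 100-sample frame with onsets at samples 10, 30, and 50.
frame = list(range(100))
cords = extract_cords(frame, [10, 30, 50])
```

In this sketch the final onset yields no cord because it has no neighboring onset inside the frame, and the retained cords together cover fewer samples than the whole frame, which matches the statement that the cords can collectively comprise less than all of the frame.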
16 Citations
18 Claims
1. A method of processing a signal representing speech, the method comprising:
- receiving a region of the signal representing speech, wherein the region comprises a portion of a frame of the signal representing speech classified as a voiced frame and wherein the region is marked based on one or more pitch estimates for the region;
- identifying one or more cords within the region of the signal based on occurrence of one or more events within the region of the signal, wherein the one or more events comprise one or more glottal pulses and the cord begins with onset of a first glottal pulse and extends to a point prior to an onset of a second glottal pulse but excludes a portion of the region of the signal prior to the onset of the second glottal pulse; and
- determining a phoneme for the voiced frame based on at least one of the one or more identified cords.
Dependent claims: 2, 3, 4, 5, 6
7. A system comprising:
- a classification module adapted to receive a first frame of a signal representing speech and classify the first frame as a voiced frame;
- a pitch estimation and marking module communicatively coupled with the classification module and adapted to receive the voiced frame from the classification module and to mark a region of the voiced frame based on one or more pitch estimates for the region;
- a cord finder module communicatively coupled with the pitch estimation and marking module and adapted to receive the marked region of the signal from the pitch estimation and marking module and to identify one or more cords within the region of the signal based on occurrence of one or more events within the region of the signal, wherein the one or more events comprise one or more glottal pulses and the cords begin with onset of a first glottal pulse and extends to a point prior to an onset of a second glottal pulse but excludes a portion of the region of the signal prior to the onset of the second glottal pulse; and
- a phoneme determination module communicatively coupled with the cord finder module and adapted to receive the one or more identified cords from the cord finder module and determine a phoneme for the voiced frame based on at least one of the one or more identified cords.
Dependent claims: 8, 9, 10, 11, 12
13. A machine-readable memory having stored thereon a series of instructions which, when executed by a processor, cause the processor to process a signal representing speech by:
- receiving a region of the signal representing speech, wherein the region comprises a portion of a frame of the signal representing speech classified as a voiced frame and wherein the region is marked based on one or more pitch estimates for the region;
- identifying one or more cords within the region of the signal based on occurrence of one or more events within the region of the signal, wherein the one or more events comprise one or more glottal pulses and the cord begins with onset of a first glottal pulse and extends to a point prior to an onset of a second glottal pulse but excludes a portion of the region of the signal prior to the onset of the second glottal pulse; and
- determining a phoneme for the voiced frame based on at least one of the one or more identified cords.
Dependent claims: 14, 15, 16, 17, 18
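The module chain recited in claim 7 (classification, pitch estimation and marking, cord finding, phoneme determination) could be wired together roughly as in the Python sketch below. This is only an illustration under stated assumptions: the names `SpeechPipeline` and `MarkedRegion` are hypothetical, each module is represented by a caller-supplied function rather than a real algorithm, the marks are assumed to be pulse onset indices, and returning `None` for unvoiced frames or frames without cords is a choice of this sketch that the claims do not address.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Samples = Sequence[float]


@dataclass
class MarkedRegion:
    """A region of a voiced frame plus its marks (assumed: pulse onset indices)."""
    samples: Samples
    pulse_onsets: List[int]


@dataclass
class SpeechPipeline:
    """Chains four stand-in modules in the order given in claim 7."""
    is_voiced: Callable[[Samples], bool]                 # classification module
    mark_region: Callable[[Samples], MarkedRegion]       # pitch estimation and marking
    find_cords: Callable[[MarkedRegion], List[Samples]]  # cord finder
    determine_phoneme: Callable[[List[Samples]], Optional[str]]

    def process(self, frame: Samples) -> Optional[str]:
        # Only voiced frames are passed on to pitch marking.
        if not self.is_voiced(frame):
            return None
        region = self.mark_region(frame)
        cords = self.find_cords(region)
        if not cords:
            return None
        # The phoneme is determined from the identified cords.
        return self.determine_phoneme(cords)


# Usage with trivial placeholder modules (all hypothetical).
pipeline = SpeechPipeline(
    is_voiced=lambda f: max(abs(x) for x in f) > 0.5,
    mark_region=lambda f: MarkedRegion(f, [0, 4]),
    find_cords=lambda r: [r.samples[0:3]],
    determine_phoneme=lambda cords: "aa",
)
result = pipeline.process([1.0, 0.2, 0.1, 0.0, 1.0, 0.2])
```

Passing each stage in as a function keeps the sketch independent of any particular voicing classifier, pitch estimator, or phoneme model, none of which are specified in the text above.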
Specification