Parallel processing pitch detector
Abstract
A pitch detector system for use with speech analysis and synthesis methods having a plurality of identical detectors, each responsive to a different portion of a speech signal for estimating a pitch value, and a voter circuit responsive to the estimated pitch values for determining a final pitch value. The pitch detectors are identical in design, which allows for an efficient software implementation, since only one set of program instructions is necessary to implement all of the detectors. The voter subsystem may be implemented by a digital signal processor executing a first set of program instructions that calculates a pitch value from the estimated pitch values determined by the pitch detectors, and a second set of program instructions that constrains the final pitch value output by the voter subsystem so that the calculated pitch value agrees with the calculated pitch values of previous frames. In addition, the pitch detectors may be implemented by a third set of program instructions executing on the same digital signal processor as the sets of instructions for the voter subsystem.
13 Claims
1. A pitch detector system for human speech comprising:
means for storing a predetermined number of evenly spaced samples of instantaneous amplitude of said speech as a speech frame;
means for generating residual samples from said speech samples;
a plurality of identical means each responsive to an individual predetermined portion of said residual samples of said frame for estimating a pitch value of said frame;
another plurality of identical means each responsive to an individual predetermined portion of said speech samples of said frame for estimating a pitch value of said frame;
means for calculating a final pitch value from the estimated pitch values from each of said plurality and said other plurality of estimating means wherein an unvoiced speech frame is indicated by said calculated pitch value being equal to a predefined value and a voiced frame is indicated by said calculated pitch value being equal to a value other than said predefined value;
said calculating means comprises means responsive to all of said estimated pitch values having a value different than said predefined value for setting said calculated pitch value equal to the arithmetic average of a subset of said estimated pitch values, said subset comprising all of said estimated pitch values except the lowest magnitude value and the highest magnitude value;
means for constraining said final pitch value so that the calculated pitch value is consistent with calculated pitch values from previous frames;
said constraining means comprises means responsive to a first sequence of frames comprising a voiced frame and an unvoiced frame and a second voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said first sequence;
said generating means comprises a new pitch value generating means responsive to a second sequence of frames comprising an unvoiced frame and a voiced frame and a second unvoiced frame for generating a new calculated value indicating an unvoiced frame; and
said new pitch value generating means further responsive to a third sequence of frames comprising a voiced frame and a second voiced frame and a third voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said third sequence.
Dependent claims: 2, 3, 4, 5
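The calculating step recited in claim 1 (average the estimates after discarding the lowest-magnitude and highest-magnitude values) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function name `vote`, the sentinel `UNVOICED = 0`, and the rule that any unvoiced estimate makes the whole frame unvoiced are assumptions made for illustration; the claim only specifies the case in which every estimate differs from the predefined value.

```python
UNVOICED = 0  # assumed sentinel standing in for the claim's "predefined value"


def vote(estimates, unvoiced=UNVOICED):
    """Combine per-detector pitch estimates into one value for the frame."""
    if len(estimates) < 3:
        raise ValueError("need at least three estimates to drop both extremes")
    # Assumption: the claim text covers only the all-voiced case. Here any
    # unvoiced estimate marks the whole frame as unvoiced.
    if any(e == unvoiced for e in estimates):
        return unvoiced
    # Sort by magnitude, then drop the smallest and the largest value.
    ordered = sorted(estimates, key=abs)
    kept = ordered[1:-1]
    # Arithmetic average of the remaining estimates.
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print(vote([80, 82, 84, 120]))  # averages 82 and 84
    print(vote([80, 0, 84, 120]))   # one unvoiced estimate
```

With four estimates, as in claims 6 and 11, this reduces to the mean of the two middle values, so a single outlying detector cannot pull the final pitch value away from the others.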
6. A pitch detector for human speech comprising:
means for storing a predetermined number of evenly spaced speech samples of instantaneous amplitude of said speech as a present speech frame;
means for filtering said samples to produce residual samples of the speech remaining after the formant effects of the vocal tract have been substantially removed;
first means responsive to positive valued ones of said speech samples for estimating a first pitch value of said present speech frame;
second means responsive to negative valued ones of said speech samples for estimating a second pitch value of said present speech frame;
third means responsive to positive valued ones of said residual samples for estimating a third pitch value of said present speech frame;
a fourth means responsive to negative valued ones of said residual samples for estimating a fourth pitch value of said present speech frame;
means for calculating a pitch value from the estimated pitch values from said first, second, third and fourth estimating means wherein an unvoiced speech frame is indicated by said calculated pitch value being equal to a predefined value and a voiced frame is indicated by said calculated pitch value being equal to a value other than said predefined value;
said calculating means comprises means responsive to all of said estimated pitch values having a value different than said predefined value for setting said calculated pitch value equal to the arithmetic average of a subset of said estimated pitch values, said subset comprising all of said estimated pitch values except the lowest magnitude value and the highest magnitude value;
means for constraining said final pitch value so that the calculated pitch value is consistent with calculated pitch values from previous frames;
said constraining means comprises means responsive to a first sequence of frames comprising a voiced frame and an unvoiced frame and a second voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said first sequence;
means responsive to a second sequence of frames comprising an unvoiced frame and voiced frame and a second unvoiced frame for generating a new calculated value indicating an unvoiced frame; and
said generating means further responsive to a third sequence of frames comprising a voiced frame and a second voiced frame and a third voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said third sequence.
Dependent claims: 7, 8, 9, 10
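Claim 6 feeds four estimators from the positive-valued and negative-valued samples of the speech frame and of the residual frame. The sketch below shows one way those four inputs might be prepared. It is an assumption-laden illustration: the claim does not say how the other polarity is handled, so zeroing those samples and flipping the sign of the negative ones (so that one estimator routine could serve all four inputs) is a choice made here. The names `split_polarity` and `detector_inputs` are hypothetical, and the residual is taken as given rather than computed.

```python
def split_polarity(samples):
    """Split one frame into its positive part and its negative part.

    Assumption: samples of the other polarity are replaced by zero so both
    outputs keep the frame length, and negative samples are returned as
    magnitudes so that a single estimator routine can process either output.
    """
    positive = [s if s > 0 else 0 for s in samples]
    negative = [-s if s < 0 else 0 for s in samples]
    return positive, negative


def detector_inputs(speech, residual):
    """Return the four per-detector inputs in the order used by claim 6."""
    speech_pos, speech_neg = split_polarity(speech)
    residual_pos, residual_neg = split_polarity(residual)
    return [speech_pos, speech_neg, residual_pos, residual_neg]


if __name__ == "__main__":
    frame = [3, -2, 0, 5, -7]
    residual = [1, -1, 2, -3, 0]  # placeholder values, not a real residual
    for portion in detector_inputs(frame, residual):
        print(portion)
```

Because all four inputs have the same shape, the same estimating function can be applied to each in turn, which matches the abstract's remark that one set of program instructions suffices for all of the detectors.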
11. A method for detecting the pitch of human speech with a system comprising a quantizer for converting the speech into frames of digital samples and a digital signal processor responsive to a plurality of program instructions and said frames of digital samples to determine the pitch of the speech, said method comprising the steps of:
producing residual samples of the digitized speech that remain after the formant effects of the vocal tract have been substantially removed;
estimating a first pitch value of a present speech frame in response to positive valued ones of said digitized speech samples;
estimating a second pitch value of said present speech frame in response to negative valued ones of said digitized speech samples;
estimating a third pitch value of said present speech frame in response to positive valued ones of said residual samples; and
estimating a fourth pitch value of said present speech frame in response to negative valued ones of said residual samples; and
calculating said final pitch value from said first, second, third, and fourth pitch values wherein an unvoiced speech frame is indicated by said calculated pitch value being equal to a predefined value and a voiced frame is indicated by said calculated pitch value being equal to a value other than said predefined value;
said step of calculating comprises the step of setting said calculated pitch value equal to the arithmetic average of a subset of said estimated pitch values, said subset comprising all of said estimated pitch values except the lowest magnitude value and the highest magnitude value;
constraining said final pitch value so that said final pitch value is in agreement with final pitch values from previous frames by;
said step constraining comprises the steps of generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of a first sequence of frames comprising a voiced frame and unvoiced frame and a second voiced frame;
generating a new calculated value indicating an unvoiced frame in response to a second sequence of frames comprising an unvoiced frame and a voiced frame and a second unvoiced frame; and
generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of a third sequence of frames comprising a voiced frame and a second voiced frame and a third voiced frame.
Dependent claims: 12, 13
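The constraining step recited above distinguishes three three-frame patterns: voiced/unvoiced/voiced, unvoiced/voiced/unvoiced, and voiced/voiced/voiced. A minimal Python sketch of that decision logic follows. Several details are assumptions, since the claims do not fix them: the new value is assigned to the middle frame, the "arithmetic relationship" is taken to be a plain average (of the two neighbours in the first pattern, of all three frames in the third), and `UNVOICED = 0` stands in for the predefined value. The function name `smooth_middle` is hypothetical.

```python
UNVOICED = 0  # assumed sentinel standing in for the claim's "predefined value"


def smooth_middle(first, middle, last, unvoiced=UNVOICED):
    """Return a constrained pitch value for the middle of three frames."""
    voiced = [p != unvoiced for p in (first, middle, last)]
    if voiced == [True, False, True]:
        # Isolated unvoiced frame between voiced frames: fill the gap.
        # Assumption: the arithmetic relationship is the neighbours' mean.
        return (first + last) / 2
    if voiced == [False, True, False]:
        # Isolated voiced frame between unvoiced frames: mark it unvoiced.
        return unvoiced
    if voiced == [True, True, True]:
        # Assumption: the arithmetic relationship is the three-frame mean.
        return (first + middle + last) / 3
    # Any other pattern is left unchanged in this sketch.
    return middle


if __name__ == "__main__":
    print(smooth_middle(80, 0, 84))   # gap between voiced frames is filled
    print(smooth_middle(0, 90, 0))    # lone voiced frame becomes unvoiced
    print(smooth_middle(80, 83, 86))  # voiced run is averaged
```

Treating the middle frame as the one corrected means the output lags the input by one frame; whether the patented system accepts that delay or corrects a different frame is not stated in the claims.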
Specification