Voice messaging system with unified pitch and voice tracking
Abstract
This voice messaging system provides an LPC analyzer in combination with a pitch extractor wherein LPC parameters and a residual signal organized in a sequence of speech data frames are provided by the LPC analyzer as an output representative of an analog speech signal. The pitch extractor is operably associated with the LPC analyzer and produces a plurality of pitch candidates for each of the speech data frames in the sequence thereof. Dynamic programming is performed on the plurality of pitch candidates for each speech data frame and also with respect to a voiced/unvoiced decision of the speech data for each frame by tracking both pitch and voicing from frame to frame to provide an optimal pitch value and also an optimal voicing decision. During dynamic programming, a cumulative penalty for a sequence of frame pitch/voicing decisions is accumulated by defining a transition error between each pitch candidate of a current speech data frame and each pitch candidate of the preceding frame, and defining a cumulative error for each pitch candidate of the current frame equal to the transition error between the pitch candidate of the current frame plus the cumulative error of an optimally identified pitch candidate in the preceding frame to locate the track providing optimal pitch and voicing decisions based upon the lowest cumulative penalty. An encoder then encodes the LPC parameters as generated by the LPC analyzer and the optimal pitch and voicing decisions for each speech data frame for subsequent use in providing an audible synthesized speech output substantially identical to the original speech input.
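The cumulative-error rule described above can be written as a recurrence. The notation here is an assumption introduced for illustration and does not appear in the source: $c_{t,i}$ is the $i$-th pitch candidate of frame $t$, $d(\cdot,\cdot)$ is the transition error between candidates of adjacent frames, $E_t(i)$ is the cumulative error, and $b_t(i)$ is the index of the optimally identified candidate in the preceding frame. The form of $d$ and the initial values $E_1(i)$ are not given in this section.

```latex
\[
E_t(i) \;=\; \min_{j}\,\bigl[\, E_{t-1}(j) + d\bigl(c_{t-1,j},\, c_{t,i}\bigr) \bigr],
\qquad
b_t(i) \;=\; \arg\min_{j}\,\bigl[\, E_{t-1}(j) + d\bigl(c_{t-1,j},\, c_{t,i}\bigr) \bigr]
\]
\[
i_T^{*} \;=\; \arg\min_{i}\, E_T(i),
\qquad
i_{t-1}^{*} \;=\; b_t\bigl(i_t^{*}\bigr) \quad \text{for } t = T, T-1, \dots, 2
\]
```

Under this reading, the track with the lowest cumulative penalty is recovered by following the stored indices $b_t$ backward from the best candidate of the last frame.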
10 Claims
1. In a voice messaging system for receiving a human speech signal and reconstituting said human speech signal at a receiver which is spatially or temporally remote, the combination comprising:
- LPC analysis means for analyzing an analog speech signal provided as an input thereto in accordance with an LPC (Linear Predictive Coding) model, said LPC analysis means providing LPC parameters and a residual signal organized in a sequence of speech data frames and the respective residual signals corresponding thereto as an output representative of the analog speech signal;
- pitch extraction means operably associated with said LPC analysis means for determining a plurality of pitch candidates for each of the speech data frames in said sequence;
- optimization means operably associated with said LPC analysis means and said pitch extraction means for performing dynamic programming with respect both to said plurality of pitch candidates for each speech data frame and also to a voiced/unvoiced decision for each speech data frame to determine both an optimal pitch and an optimal voicing decision for each speech data frame in the context of sequence of speech data frames, said optimization means defining a transition error between each pitch candidate of the current frame and each pitch candidate of the preceding frame, and defining a cumulative error for each pitch candidate in the current frame which is equal to the transition error between said pitch candidate of said current frame plus the cumulative error of an optimally identified pitch candidate in the preceding frame, said optimally identified pitch candidate in the preceding frame being chosen from among the pitch candidates for said preceding frame such that the cumulative error of said corresponding pitch candidate in said current frame is at a minimum; and
- means operably associated with said LPC analysis means, said pitch extraction means and said optimization means for encoding said LPC parameters and said optimal pitch and optimal voicing decision for each speech data frame.
Dependent claims: 3, 4, 5, 6
2. A method for determining the pitch and voicing of human speech comprising the steps of:
- analyzing a speech signal input in accordance with an LPC (Linear Predictive Coding) model to provide LPC parameters and a residual signal organized into a sequence of speech data frames and the respective residual signals corresponding thereto;
- determining a plurality of pitch candidates for each of the speech data frames in said sequence;
- performing dynamic programming with respect both to said plurality of pitch candidates for each speech data frame and also to a voiced/unvoiced decision for each speech data frame by defining a transition error between each pitch candidate of the current frame and each pitch candidate of the preceding frame, defining a cumulative error for each pitch candidate of the current frame equal to the transition error between said pitch candidate of said current frame plus the cumulative error of an optimally identified pitch candidate in the preceding frame, and choosing said optimally identified pitch candidate in the preceding frame such that the cumulative error of said corresponding pitch candidate in said current frame is at a minimum; and
- determining both an optimal pitch and an optimal voicing decision for each speech data frame in the context of said sequence of speech data frames in response to the performance of said dynamic programming.
Dependent claims: 7, 8, 9, 10
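The dynamic programming step recited in the claims can be sketched in Python as a Viterbi-style search. This is a minimal illustration under stated assumptions, not the patented implementation: the names `track`, `transition_error`, `UNVOICED`, and `voicing_penalty` are hypothetical, the unvoiced decision is modeled as one extra candidate per frame, and the log-ratio cost is a placeholder because this section does not give the actual transition-error formula.

```python
import math

# Assumption: an unvoiced decision is represented as an extra candidate (None).
UNVOICED = None


def transition_error(prev, cur, voicing_penalty=0.5):
    """Hypothetical transition cost between candidates of adjacent frames."""
    if prev is UNVOICED and cur is UNVOICED:
        return 0.0
    if prev is UNVOICED or cur is UNVOICED:
        # Fixed cost for switching between voiced and unvoiced (assumed value).
        return voicing_penalty
    # Placeholder: relative pitch change measured on a log scale.
    return abs(math.log(cur / prev))


def track(frames, cost=transition_error):
    """Pick one candidate per frame so the summed transition error is minimal.

    frames: list of per-frame candidate lists (pitch values or UNVOICED).
    """
    if not frames:
        return []
    # Cumulative error per candidate of the first frame (assumed to start at 0).
    cum = [0.0] * len(frames[0])
    back = []  # back[t][i] = index of the best predecessor of candidate i
    for prev_cands, cur_cands in zip(frames, frames[1:]):
        new_cum, ptrs = [], []
        for cur in cur_cands:
            # Cumulative error via each predecessor; keep the minimum.
            errors = [cum[j] + cost(p, cur) for j, p in enumerate(prev_cands)]
            best = min(range(len(errors)), key=errors.__getitem__)
            new_cum.append(errors[best])
            ptrs.append(best)
        cum = new_cum
        back.append(ptrs)
    # Start from the lowest cumulative error in the last frame and backtrack.
    idx = min(range(len(cum)), key=cum.__getitem__)
    path = [idx]
    for ptrs in reversed(back):
        idx = ptrs[idx]
        path.append(idx)
    path.reverse()
    return [frames[t][i] for t, i in enumerate(path)]
```

For example, `track([[80, 40], [82, 160], [41, 81], [None, 79]])` follows the smooth track through 80, 82, 81, 79 and ignores the octave errors. Because this sketch uses transition errors only, a path that stays unvoiced in every frame would cost nothing; a practical tracker would presumably also score each candidate within its own frame, which this section does not describe.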
Specification