Voice messaging system with pitch tracking based on adaptively filtered LPC residual signal
First Claim
1. A voice messaging system for encoding and regenerating human speech comprising:
LPC analysis means for analyzing an analog speech signal provided as an input thereto in accordance with an LPC (Linear Predictive Coding) model, said LPC analysis means providing LPC parameters and a residual signal as an output representative of the analog speech signal;
adaptive filter means operably coupled to the output of said LPC analysis means for receiving said residual signal and at least one LPC parameter from said LPC analysis means, said adaptive filter means filtering said residual signal in accordance with a time-varying filter characteristic defined by said at least one LPC parameter, wherein the time-varying filter characteristic provides for the removal of high frequency noise from the residual signal during periods of voiced speech and for the retention of high frequency energy in the residual signal during periods of unvoiced speech, to provide an adaptively filtered residual signal as an output therefrom;
means operably connected to the output of said adaptive filter means for extracting pitch and voicing information from said adaptively filtered residual signal; and
means operably connected to the outputs of said extracting means and said LPC analysis means for encoding said pitch and voicing information and said LPC parameters.
Abstract
A voice messaging system, wherein linear predictive coding (LPC) parameters, pitch, and preferably other excitation information are derived from a human voice input, encoded, and transmitted and/or stored, to be called up later to provide a speech output which is nearly identical to the original speech input. The invention features adaptive filtering of the residual signal. The residual signal derived from LPC estimation is adaptively filtered and then used as the input to a conventional pitch estimation procedure. The adaptive filtering step uses the first reflection coefficient ($k_1$) to realize a simple filter (e.g., $A(z) = (1 - k_1 z^{-1})^{-1}$). This filter removes high frequency noise from the residual signal during voiced periods, but does not remove the high frequency energy which contains important information during the unvoiced periods of speech. Preferably the above preprocessing technique is also combined with a postprocessing technique, wherein dynamic programming is used to optimally track pitch and voicing information through successive frames.
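The one-pole filter named in the abstract can be illustrated with a short Python sketch. This is not the patented implementation: the function names are hypothetical, and the sketch assumes the sign convention k1 = r(1)/r(0) (normalized lag-1 autocorrelation of the speech frame), so that k1 is close to +1 for voiced frames and near zero or negative for unvoiced frames. Under that assumption the filter behaves as a low-pass during voiced speech and leaves high-frequency energy in place during unvoiced speech.

```python
def first_reflection_coefficient(frame):
    """Estimate k1 for one speech frame.

    Assumption: k1 is taken as r(1)/r(0), the normalized lag-1
    autocorrelation. Other LPC texts use the opposite sign.
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0  # silent frame: fall back to a pass-through filter
    r1 = sum(frame[i] * frame[i - 1] for i in range(1, len(frame)))
    return r1 / r0


def adaptive_filter(residual, k1, state=0.0):
    """Apply A(z) = 1 / (1 - k1 z^-1) to one frame of the LPC residual.

    Difference equation: y[n] = x[n] + k1 * y[n-1].
    `state` carries y[n-1] across frame boundaries; the updated state
    is returned so the caller can pass it to the next frame.
    """
    out = []
    y_prev = state
    for x in residual:
        y = x + k1 * y_prev
        out.append(y)
        y_prev = y
    return out, y_prev
```

Because k1 is recomputed for every frame, the filter characteristic varies over time, which is the sense in which the filtering is adaptive. With k1 = 0 the residual passes through unchanged.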
68 Citations
20 Claims
1. A voice messaging system for encoding and regenerating human speech comprising:
LPC analysis means for analyzing an analog speech signal provided as an input thereto in accordance with an LPC (Linear Predictive Coding) model, said LPC analysis means providing LPC parameters and a residual signal as an output representative of the analog speech signal;
adaptive filter means operably coupled to the output of said LPC analysis means for receiving said residual signal and at least one LPC parameter from said LPC analysis means, said adaptive filter means filtering said residual signal in accordance with a time-varying filter characteristic defined by said at least one LPC parameter, wherein the time-varying filter characteristic provides for the removal of high frequency noise from the residual signal during periods of voiced speech and for the retention of high frequency energy in the residual signal during periods of unvoiced speech, to provide an adaptively filtered residual signal as an output therefrom;
means operably connected to the output of said adaptive filter means for extracting pitch and voicing information from said adaptively filtered residual signal; and
means operably connected to the outputs of said extracting means and said LPC analysis means for encoding said pitch and voicing information and said LPC parameters.
Dependent claims: 2, 3, 4
5. A method for determining the pitch of human speech comprising the steps of:
analyzing a speech signal input in accordance with an LPC (Linear Predictive Coding) model to provide LPC parameters and a residual signal;
adaptively filtering said residual signal in accordance with a time-varying filtering characteristic as defined by at least one of said LPC parameters provided by the analyzing of said speech signal input, wherein the time-varying filtering characteristic provides for the removal of high frequency noise from the residual signal during periods of voiced speech and for the retention of higher frequency energy in the residual signal during periods of unvoiced speech, to provide an adaptively filtered residual signal; and
extracting pitch period candidates from said adaptively filtered residual signal.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method for determining the pitch of human speech, comprising the steps of:
receiving an input speech signal at a sample rate;
analyzing said input speech signal according to an LPC (Linear Predictive Coding) model to provide LPC parameters and a residual signal, wherein said LPC parameters are calculated in a sequence of frames at a predetermined frame rate, and wherein the sample rate at which said input speech signal is received is much higher than said frame rate;
adaptively filtering said residual signal by a filter having a time-varying filtering characteristic defined by at least one of said LPC parameters provided by said LPC analyzing step, wherein the time-varying filtering characteristic provides for the removal of high frequency noise from the residual signal during periods of voiced speech and for the retention of high frequency energy in the residual signal during periods of unvoiced speech, to provide an adaptively filtered residual signal;
extracting pitch period candidates from said adaptively filtered residual signal;
performing dynamic programming with respect both to said pitch period candidates for each frame and also to a voiced/unvoiced decision for each frame to determine both an optimal pitch period and an optimal voicing decision for each frame, in the context of said sequence of frames, said dynamic programming step defining a transition error between each period candidate of the current frame and each candidate of the preceding frame and wherein a cumulative error is defined for each pitch period candidate in the current frame which is equal to the transition error between said pitch period candidate of said current frame plus the cumulative error at an optimally identified pitch period candidate in the preceding frame chosen from among said pitch period candidates in said preceding frame such that the cumulative error of said corresponding pitch period candidate in said current frame is at a minimum; and
determining an optimal pitch and voicing decision for each said frame in accordance with said dynamic programming performance.
Dependent claims: 17, 18, 19, 20
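The dynamic programming step recited in claim 16 can be sketched in Python as a Viterbi-style search over per-frame candidates. This is only an illustration under stated assumptions, not the method of the specification: the function names, the use of period 0 to mark an unvoiced candidate, the log-ratio transition error, the fixed `switch_penalty` for a voicing change, and the per-candidate `local_error` term are all hypothetical choices made so the example is complete.

```python
import math

UNVOICED = 0  # hypothetical marker: a candidate period of 0 means "unvoiced"


def transition_error(prev_period, cur_period, switch_penalty=1.0):
    """Cost of moving between candidates in adjacent frames (assumed form)."""
    if prev_period == UNVOICED and cur_period == UNVOICED:
        return 0.0
    if prev_period == UNVOICED or cur_period == UNVOICED:
        return switch_penalty  # voiced/unvoiced change
    return abs(math.log(cur_period / prev_period))  # relative pitch jump


def track_pitch(frames, switch_penalty=1.0):
    """frames: one list per frame of (period, local_error) candidates.

    Returns the chosen period per frame (UNVOICED where unvoiced wins).
    """
    cum = [local for _, local in frames[0]]  # cumulative error, frame 0
    back = []                                # back-pointers per frame
    for t in range(1, len(frames)):
        new_cum, ptr = [], []
        for period, local in frames[t]:
            # Cumulative error via each candidate of the preceding frame.
            costs = [
                cum[j] + transition_error(prev, period, switch_penalty)
                for j, (prev, _) in enumerate(frames[t - 1])
            ]
            best = min(range(len(costs)), key=costs.__getitem__)
            new_cum.append(costs[best] + local)
            ptr.append(best)
        cum = new_cum
        back.append(ptr)
    # Trace back from the lowest cumulative error in the last frame.
    idx = min(range(len(cum)), key=cum.__getitem__)
    path = [idx]
    for ptr in reversed(back):
        idx = ptr[idx]
        path.append(idx)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]


frames = [
    [(80, 0.1), (160, 0.2), (UNVOICED, 0.9)],
    [(160, 0.1), (82, 0.15), (UNVOICED, 0.9)],
    [(81, 0.1), (UNVOICED, 0.9)],
]
print(track_pitch(frames))  # prints [80, 82, 81]
```

In the middle frame the candidate with the lowest local error is 160, a pitch-doubling error, yet the tracked path selects 82 because it is consistent with the neighboring frames. This is the benefit of deciding pitch and voicing in the context of the frame sequence rather than frame by frame.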
Specification