System and method for improved pitch estimation which performs first formant energy removal for a frame using coefficients from a prior frame
Abstract
An improved vocoder system and method for estimating pitch in a speech waveform, which pre-filters speech data with improved efficiency and reduced computational requirements. The vocoder system is preferably a low bit rate speech coder which analyzes a plurality of frames of speech data in parallel. Once the LPC filter coefficients and the pitch for a first frame have been calculated, the vocoder looks ahead to the next frame to estimate that frame's pitch. In the preferred embodiment of the invention, the vocoder filters speech data in a second frame using a plurality of the coefficients from a first frame as a multi-pole analysis filter; these coefficients serve as a "crude" two-pole analysis filter. The vocoder preferably includes a first processor which performs coefficient calculations for the second frame, and a second processor which performs pre-filtering and pitch estimation, wherein the second processor operates substantially simultaneously with the first processor. Thus, the vocoder system uses LPC coefficients for a first frame as a "crude" multi-pole analysis filter for a subsequent frame of data, thereby performing pre-filtering on a frame without first requiring coefficient calculations for that frame. This allows pre-filtered pitch estimation and LPC coefficient calculations to be performed in parallel, which makes pitch estimation more efficient and thus enhances vocoder performance.
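The pre-filtering idea in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`lpc`, `prefilter`, `estimate_pitch_lag`, `pitch_with_prior_frame_prefilter`), the Levinson-Durbin solver, the autocorrelation-peak pitch picker, the choice of keeping the first two coefficients of a 10th-order set, the zero initial filter state, and the lag range of 20 to 147 samples are all hypothetical choices that the abstract does not specify.

```python
def autocorr(x, max_lag):
    """Unnormalized autocorrelation R[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion (assumed solver).

    Returns [a1, ..., a_order] for A(z) = 1 + a1*z^-1 + ... + a_order*z^-order.
    """
    r = autocorr(frame, order)
    a = []
    err = r[0]
    for m in range(order):
        if err <= 0.0:
            # Silent or degenerate frame: pad the remaining coefficients with zeros.
            return a + [0.0] * (order - m)
        acc = r[m + 1] + sum(a[j] * r[m - j] for j in range(m))
        k = -acc / err
        a = [a[j] + k * a[m - 1 - j] for j in range(m)] + [k]
        err *= 1.0 - k * k
    return a


def prefilter(frame, coeffs):
    """Apply the FIR analysis filter A(z) built from coeffs (zero initial state)."""
    out = []
    for n, x in enumerate(frame):
        y = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                y += a * frame[n - k]
        out.append(y)
    return out


def estimate_pitch_lag(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation value."""
    best_lag, best = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        c = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if c > best:
            best_lag, best = lag, c
    return best_lag


def pitch_with_prior_frame_prefilter(first_frame, second_frame,
                                     lpc_order=10, poles=2,
                                     min_lag=20, max_lag=147):
    """Pre-filter the second frame with coefficients taken from the first frame.

    Assumption: the "crude" two-pole filter is formed from the first `poles`
    coefficients of the first frame's LPC set.
    """
    coeffs = lpc(first_frame, lpc_order)[:poles]
    residual = prefilter(second_frame, coeffs)
    return estimate_pitch_lag(residual, min_lag, max_lag)
```

The point of the sketch is the data dependency: `pitch_with_prior_frame_prefilter` never computes coefficients for `second_frame`, so the pitch search for a frame can start as soon as the previous frame's coefficients exist.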
22 Citations
20 Claims
1. A method for performing pitch estimation which pre-filters speech data prior to pitch estimation with improved performance, comprising:
receiving a speech waveform comprising a plurality of frames;
analyzing a plurality of speech frames, wherein said plurality of speech frames include a first frame of speech data and a second frame of speech data;
calculating coefficients for said first frame of speech data;
filtering said second frame of speech data, wherein said filtering uses one or more coefficients from said first frame of speech data as a multi-pole analysis filter, wherein said filtering removes undesired signal information from said speech data in said second frame;
performing pitch estimation on said second frame of speech data after said filtering;
wherein said filtering removes first Formant energy from said second frame of speech data.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A vocoder which pre-filters speech data prior to pitch estimation with improved performance, comprising:
means for receiving a plurality of digital samples of a speech waveform, wherein the speech waveform includes a plurality of frames each comprising a plurality of samples;
two or more processors for analyzing a plurality of speech frames, wherein said plurality of speech frames include a first frame of speech data and a second frame of speech data, wherein said two or more processors include:
a first processor which calculates coefficients for said first frame of speech data, wherein said first processor also calculates coefficients for said second frame of speech data; and
a second processor which filters said second frame of speech data using one or more coefficients from said first frame of speech data as a multi-pole analysis filter, wherein said filtering removes undesired signal information from said speech data in said second frame;
wherein said second processor also performs pitch estimation on said second frame of speech data after said filtering, wherein said second processor performs said filtering of said second frame of speech data in parallel with operation of said first processor calculating coefficients for said second frame of speech data;
wherein said second processor filters said second frame of speech data using said one or more coefficients from said first frame of speech data as a multi-pole analysis filter to remove first Formant energy from said second frame of speech data.
Dependent claims: 9, 10, 11, 12
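The two-processor arrangement of claim 8 can be illustrated by the scheduling sketch below. It is only an assumption-laden outline: `process_frames`, `calc_coeffs`, `prefilter_and_pitch`, and `initial_coeffs` are hypothetical names, the two "processors" are modeled as two worker threads from Python's standard `concurrent.futures` module, and CPython threads do not give true CPU parallelism for pure-Python arithmetic. What the sketch shows is the dependency structure: the pitch job for a frame needs only the previous frame's coefficients, so it can be submitted alongside the coefficient job for the same frame.

```python
from concurrent.futures import ThreadPoolExecutor


def process_frames(frames, calc_coeffs, prefilter_and_pitch, initial_coeffs):
    """Run coefficient calculation and pre-filtered pitch estimation concurrently.

    calc_coeffs(frame) -> coefficients for that frame          ("first processor")
    prefilter_and_pitch(frame, prev_coeffs) -> pitch estimate  ("second processor")
    initial_coeffs is a hypothetical seed used for the very first frame.
    """
    results = []
    prev_coeffs = initial_coeffs
    with ThreadPoolExecutor(max_workers=2) as pool:
        for frame in frames:
            # Both jobs are submitted before either result is awaited.
            coeff_job = pool.submit(calc_coeffs, frame)
            pitch_job = pool.submit(prefilter_and_pitch, frame, prev_coeffs)
            coeffs = coeff_job.result()
            results.append((coeffs, pitch_job.result()))
            # The next frame's pre-filter reuses this frame's coefficients.
            prev_coeffs = coeffs
    return results
```

Any coefficient solver and pitch estimator with these call shapes can be passed in; the stand-ins used to exercise the function do not need to be real signal-processing routines.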
13. A method for performing pitch estimation which pre-filters speech data prior to pitch estimation with improved performance, comprising:
receiving a speech waveform comprising a plurality of frames;
analyzing a plurality of speech frames, wherein said plurality of speech frames include a first frame of speech data and a second frame of speech data;
calculating coefficients for said first frame of speech data;
calculating a subset of coefficients for said second frame of speech data;
filtering said second frame of speech data, wherein said filtering uses said subset of coefficients from said second frame of speech data as a multi-pole analysis filter, wherein said filtering removes undesired signal information from said speech data in said second frame;
performing pitch estimation on said second frame of speech data after said filtering;
wherein said filtering removes first Formant energy from said second frame of speech data.
Dependent claims: 14, 15, 16, 17, 18, 19
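Claim 13 differs from claim 1 in that the pre-filter is built from a subset of the second frame's own coefficients. One possible reading, shown below purely as an assumption, is that only a second-order (two-pole) model of the frame is computed before pitch estimation, which needs just the autocorrelation values at lags 0, 1, and 2. The names `two_pole_coeffs` and `remove_first_formant` are hypothetical, and the claim does not say which coefficients make up the subset.

```python
def two_pole_coeffs(frame):
    """Solve the order-2 autocorrelation normal equations in closed form.

    Returns (a1, a2) for A(z) = 1 + a1*z^-1 + a2*z^-2, from
        R0*a1 + R1*a2 = -R1
        R1*a1 + R0*a2 = -R2
    """
    n = len(frame)
    r0 = sum(x * x for x in frame)
    r1 = sum(frame[i] * frame[i + 1] for i in range(n - 1))
    r2 = sum(frame[i] * frame[i + 2] for i in range(n - 2))
    det = r0 * r0 - r1 * r1
    if det <= 0.0:
        # Silent or degenerate frame: fall back to a pass-through filter.
        return 0.0, 0.0
    a1 = r1 * (r2 - r0) / det
    a2 = (r1 * r1 - r0 * r2) / det
    return a1, a2


def remove_first_formant(frame):
    """Filter a frame with its own two-pole analysis filter (zero initial state)."""
    a1, a2 = two_pole_coeffs(frame)
    out = []
    for n, x in enumerate(frame):
        y = x
        if n >= 1:
            y += a1 * frame[n - 1]
        if n >= 2:
            y += a2 * frame[n - 2]
        out.append(y)
    return out
```

A second-order model can follow only one resonance, which is why such a filter is a plausible way to suppress the dominant (typically first-formant) energy while leaving the pitch periodicity in the residual.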
20. A vocoder which pre-filters speech data prior to pitch estimation with improved performance, comprising:
means for receiving a plurality of digital samples of a speech waveform, wherein the speech waveform includes a plurality of frames each comprising a plurality of samples;
a processor for analyzing a plurality of speech frames, wherein said plurality of speech frames include a first frame of speech data and a second frame of speech data, wherein said processor calculates coefficients for said first frame of speech data, wherein said processor filters said second frame of speech data using one or more coefficients from said first frame of speech data as a multi-pole analysis filter, wherein said filtering removes undesired signal information from said speech data in said second frame;
wherein said processor performs pitch estimation on said second frame of speech data after said filtering;
wherein said processor filters said second frame of speech data using said one or more coefficients from said first frame of speech data as a multi-pole analysis filter to remove first Formant energy from said second frame of speech data.
Specification