Statistical methods and apparatus for pitch extraction in speech recognition, synthesis and regeneration

US 6,035,271 A
Filed: 10/31/1997
Issued: 03/07/2000
Est. Priority Date: 03/15/1995
Status: Expired due to Fees

Abstract

A method and apparatus for extracting pitch value information from speech. The method selects at least three highest peaks from a normalized autocorrelation function and produces a plurality of frequency candidates for pitch value determination. The frequency candidates are used to identify anchor points in pitch values, and are further used to perform both forward and backward searching when an anchor point cannot be readily identified. The running mean or average of determined pitch values is maintained and used in conjunction with the identified valid pitch values in a final determination of the pitch estimation using a weighted least squares fit for identified non-valid frames.
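The first two stages named in the abstract (a normalized autocorrelation function, then the three highest peaks turned into frequencies) can be illustrated with a minimal sketch. This is not the patent's implementation: the function names, the `min_lag` cutoff, the simple local-maximum peak test, and the conversion `sample_rate / lag` are all assumptions made for illustration.

```python
import math

def normalized_autocorrelation(frame, max_lag):
    """Return r[0..max_lag], scaled by frame energy so r[0] == 1 for a non-silent frame."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]

def top_peak_frequencies(r, sample_rate, min_lag, count=3):
    """Return up to `count` (frequency_hz, peak_height) pairs, highest peak first.

    A peak is any local maximum of r at a lag >= min_lag (assumed criterion).
    """
    peaks = [
        (r[lag], lag)
        for lag in range(max(min_lag, 1), len(r) - 1)
        if r[lag - 1] < r[lag] >= r[lag + 1]
    ]
    peaks.sort(reverse=True)
    return [(sample_rate / lag, height) for height, lag in peaks[:count]]

# Usage: a synthetic 100 Hz tone sampled at 8 kHz (hypothetical test signal).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(400)]
r = normalized_autocorrelation(frame, max_lag=200)
candidates = top_peak_frequencies(r, sample_rate, min_lag=20)
```

For this tone the strongest peak falls at a lag of 80 samples, so the first candidate is 100 Hz. Real speech frames would normally be windowed and restricted to a plausible pitch range before peak picking.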

11 Claims

1. A method for pitch extraction in speech recognition, synthesis and regeneration comprising the steps of:
- performing autocorrelation of a digitized speech input to produce an autocorrelation function;
  
  selecting at least the three highest peaks from the autocorrelation function;
  
  calculating top ranked frequencies for the at least three highest peaks;
  
  determining a plurality of frequency candidates from the calculated frequencies;
  
  identifying valid and non-valid frames of the input speech;
  
  determining pitch values for each frame of the received input speech using the positions of the selected peaks and an energy value representing the instantaneous voice energy;
  
  maintaining a running average of determined pitch values; and
  
  performing a weighted dynamic least squares fit of the identified valid and non-valid frames to estimate the pitch value using a least squares fit to a cubic function.
- - 2. The method according to claim 1, wherein said step of determining pitch values further comprises the steps of:
    - determining whether one of said plurality of frequency candidates is an anchor point;
      
      identifying the frame as valid raw data when one of said plurality of frequency candidates is determined to be an anchor point;
      
      determining whether a previous input frame of speech was identified as valid when an anchor point is not determined;
      
      conducting a forward search of the plurality of frequency candidates when the previous frame was identified as valid;
      
      identifying a pitch value from the forward search when such value exists; and
      
      identifying a left over frame when said steps of determining whether a previous input frame was valid and conducting a forward search have negative results.
  - 3. The method according to claim 2, wherein said step of identifying the input frame as valid raw data further comprises the steps of:
    - determining whether a left over frame has been previously identified;
      
      conducting a backward search of said plurality of frequency candidates when there is a previous frame left over;
      
      identifying a pitch value from the backward search when such a value exists; and
      
      identifying the frame as non-valid when said backward search does not identify a valid pitch value.
  - 4. The method according to claim 1, further comprising the step of normalizing the autocorrelation function with respect to amplitude.
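The per-frame decision flow of claims 2 and 3 (anchor point, forward search, left over frame, backward search, non-valid frame) can be sketched as a small state machine. The claims do not state the anchor point criterion or the search distance, so the thresholds `peak_threshold`, `energy_threshold`, and `max_ratio` below, along with the frame dictionary layout, are hypothetical choices for illustration only.

```python
def closest_within(candidates, target, max_ratio):
    """Return the candidate nearest to target if within max_ratio * target, else None."""
    best = min(candidates, key=lambda f: abs(f - target), default=None)
    if best is not None and abs(best - target) <= max_ratio * target:
        return best
    return None

def track_raw_pitch(frames, peak_threshold=0.8, energy_threshold=0.1, max_ratio=0.15):
    """Assign a raw pitch (Hz) to each frame, or None for non-valid frames.

    Each frame is a dict with 'candidates' (Hz, strongest first), 'peak'
    (height of the top autocorrelation peak) and 'energy'. The anchor test
    and all thresholds are assumptions, not values from the patent.
    """
    pitch = [None] * len(frames)
    left_over = []  # consecutive frames still waiting for a backward search
    for i, frame in enumerate(frames):
        is_anchor = (
            bool(frame["candidates"])
            and frame["peak"] >= peak_threshold
            and frame["energy"] >= energy_threshold
        )
        if is_anchor:
            pitch[i] = frame["candidates"][0]
        elif i > 0 and pitch[i - 1] is not None:
            # Forward search: stay close to the previous valid frame.
            pitch[i] = closest_within(frame["candidates"], pitch[i - 1], max_ratio)
        if pitch[i] is None:
            left_over.append(i)
            continue
        # Backward search: walk back through left over frames from this valid frame.
        reference = pitch[i]
        for j in reversed(left_over):
            if reference is not None:
                reference = closest_within(frames[j]["candidates"], reference, max_ratio)
            pitch[j] = reference  # stays None (non-valid) once the chain breaks
        left_over.clear()
    return pitch

# Usage with hypothetical frames.
frames = [
    {"candidates": [210.0, 105.0, 70.0], "peak": 0.5, "energy": 0.5},
    {"candidates": [100.0, 200.0, 50.0], "peak": 0.9, "energy": 0.5},
    {"candidates": [204.0, 102.0, 51.0], "peak": 0.6, "energy": 0.5},
    {"candidates": [300.0, 150.0], "peak": 0.4, "energy": 0.5},
    {"candidates": [], "peak": 0.0, "energy": 0.0},
]
raw_pitch = track_raw_pitch(frames)
```

Here the second frame is the anchor, the first frame is recovered by the backward search, the third by the forward search, and the last two remain non-valid because no candidate lies close enough to a neighboring valid pitch.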

5. An apparatus for pitch extraction in speech recognition, synthesis and regeneration comprising:
- input means for receiving a speech waveform;
  
  processing means connected to said input means for receiving said speech waveform;
  
  means for generating an autocorrelation function of the input speech waveform and extracting raw pitch values from frames of the autocorrelation function of said input speech waveform by using acoustic occurrences that occur both prior to and after a moment of pitch;
  
  maintaining a running average of determined raw pitch values; and
  
  means for estimating true pitch values by processing the raw pitch values using a weighted dynamic least squares process using a least squares fit to a cubic function.
- - 6. The apparatus according to claim 5, wherein said input means is one selected from a group consisting of a microphone, a telephone, a recorded medium, and a broadcasted medium.
  - 7. The apparatus according to claim 5, wherein said processing means, said generating means, said extracting means and said estimating means comprise a general purpose computer programmed to perform these functions.
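The estimating step in claim 5 relies on a weighted least squares fit to a cubic function. A minimal sketch of one way to do this follows, solving the 4x4 normal equations directly. Reading "dynamic" as a sliding window centered on each frame is an assumption, as are the function names, the `half_window` size, and the choice of weights. Each window needs at least four frames with positive weight, otherwise the system is singular.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def weighted_cubic_fit(ts, ys, ws):
    """Return c0..c3 minimizing sum of w * (y - (c0 + c1*t + c2*t^2 + c3*t^3))^2."""
    a = [[sum(w * t ** (i + j) for t, w in zip(ts, ws)) for j in range(4)]
         for i in range(4)]
    b = [sum(w * y * t ** i for t, y, w in zip(ts, ys, ws)) for i in range(4)]
    return solve_linear(a, b)

def smooth_pitch(raw, weights, half_window=5):
    """Fit a cubic to the frames around each position and evaluate it at that frame.

    Time is measured relative to the current frame, so the fitted value
    at the frame is simply the constant coefficient c0.
    """
    smoothed = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half_window), min(len(raw), i + half_window + 1)
        ts = [float(k - i) for k in range(lo, hi)]
        coeffs = weighted_cubic_fit(ts, raw[lo:hi], weights[lo:hi])
        smoothed.append(coeffs[0])
    return smoothed

# Usage: a cubic pitch contour with one unreliable frame given a tiny weight.
true_pitch = [100 + 2 * k - 0.1 * k ** 2 + 0.01 * k ** 3 for k in range(12)]
noisy = list(true_pitch)
noisy[6] = 300.0
weights = [1.0] * 12
weights[6] = 1e-9
smoothed = smooth_pitch(noisy, weights)
```

Giving low weight to unreliable frames lets their neighbors determine the fitted value, which is the role the weighting plays in this sketch.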

8. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for extracting pitch from a speech signal, the method comprising the steps of:
- performing autocorrelation of a digitized speech waveform to produce an autocorrelation function;
  
  selecting at least three highest peaks from the autocorrelation function for each frame of the digitized waveform;
  
  selecting a plurality of frequency candidates for each frame, the frequency candidates being three top-ranked frequencies calculated from the at least three highest peaks and at least the first and second harmonics of the at least three calculated top-ranked frequencies;
  
  determining a raw pitch value for each frame using the plurality of frequency candidates and an energy value representing instantaneous voice energy;
  
  maintaining a running average of determined raw pitch values;
  
  identifying valid and non-valid frames of the input speech, wherein the valid frames have a determined raw pitch value and the non-valid frames do not have a determined raw pitch value;
  
  assigning the running average of the determined raw pitch values as the raw pitch value for an identified non-valid frame; and
  
  performing a weighted dynamic least squares fit of the identified valid and non-valid frames to estimate the pitch value using a least squares fit to a cubic function.
- - 9. The program storage device of claim 8, wherein the instructions for determining the raw pitch values further comprise instructions for:
    - evaluating the current frame using a criterion of anchor points to determine if the current frame is an anchor point;
      
      identifying the current frame as valid and assigning as the raw pitch value the frequency corresponding to the highest peak in the autocorrelation function, if the frame satisfies the anchor point criterion;
      
      determining whether a previous frame of speech was identified as valid when the current frame is not identified as valid;
      
      conducting a forward search using the frequency candidates of the current frame when the previous frame was identified as valid;
      
      identifying the current frame as valid and assigning, based on the forward search, a raw pitch value as the frequency candidate that is within a specified distance to the raw pitch value of the previous frame; and
      
      identifying the current frame as a left over frame when said steps of determining whether a previous input frame was valid and conducting a forward search have negative results.
  - 10. The program storage device of claim 9, wherein the instructions for determining the raw pitch values further comprise instructions for:
    - determining whether a left over frame has been previously identified;
      
      conducting a backward search of said plurality of frequency candidates when there is a previous frame left over;
      
      identifying a pitch value from the backward search when such a value exists; and
      
      identifying the frame as non-valid when said backward search does not identify a valid pitch value.
  - 11. The program storage device of claim 8, further comprising instructions for normalizing the autocorrelation with respect to amplitude.
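Two steps of claim 8 lend themselves to a short illustration: expanding the top-ranked frequencies into a candidate set that includes harmonics, and assigning the running average to non-valid frames. The sketch below rests on several assumptions. It treats "first and second harmonics" as the multiples 2f and 3f, uses a cumulative mean as the running average, and introduces a hypothetical `initial_average` for non-valid frames that occur before any valid frame.

```python
def frequency_candidates(top_frequencies, multiples=(1, 2, 3)):
    """Expand each top-ranked frequency into itself and its integer multiples.

    Interpreting the harmonics as 2f and 3f is an assumption.
    """
    return [f * m for f in top_frequencies for m in multiples]

def fill_with_running_average(raw_pitch, initial_average):
    """Replace None (non-valid) entries with the mean of valid values seen so far.

    Returns (filled, valid_flags). `initial_average` is a hypothetical fallback
    used only when no valid frame has been seen yet.
    """
    filled, valid = [], []
    total, count = 0.0, 0
    for value in raw_pitch:
        if value is not None:
            total += value
            count += 1
            filled.append(value)
            valid.append(True)
        else:
            filled.append(total / count if count else initial_average)
            valid.append(False)
    return filled, valid

# Usage with hypothetical values.
cands = frequency_candidates([100.0, 50.0, 200.0])
filled, valid = fill_with_running_average(
    [None, 100.0, 110.0, None, 120.0, None], initial_average=90.0
)
```

The `valid` flags can then drive the weights of the final least squares fit, for example by giving filled frames a smaller weight than frames with a determined raw pitch value.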

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Chen, Chengjun Julian
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Azad, Abul K.
Application Number: US08/961,733
Time in Patent Office: 858 Days
Field of Search: 704/207, 704/216, 704/217, 704/218, 704/237, 704/263
US Class Current: 704/207
CPC Class Codes:
G10L 15/142   Hidden Markov Models [HMMs]
G10L 25/06   the extracted parameters be...
G10L 25/15   the extracted parameters be...
G10L 25/90   Pitch determination of spee...
