Parallel processing pitch detector

US 4,879,748 A
Filed: 08/28/1985
Issued: 11/07/1989
Est. Priority Date: 08/28/1985
Status: Expired due to Fees

Abstract

A pitch detector system for use with speech analysis and synthesis methods has a plurality of identical detectors, each responsive to a different portion of a speech signal, for estimating a pitch value, and a voter circuit responsive to the estimated pitch values for determining a final pitch value. Because the pitch detectors are identical in design, the system lends itself to an efficient software implementation: only one set of program instructions is needed to implement all of the detectors. The voter subsystem may be implemented by a digital signal processor executing one set of program instructions that calculates a pitch value from the estimated pitch values determined by the pitch detectors, and a second set of program instructions that constrains the final pitch value output by the voter subsystem so that it agrees with the calculated pitch values of previous frames. In addition, the pitch detectors may be implemented by a third set of program instructions executing on the same digital signal processor as the sets of instructions for the voter subsystem.
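The software point in the abstract is that one routine can serve every detector, because the detectors are identical and differ only in which portion of the signal they see. A minimal Python sketch of that idea follows. It assumes a hypothetical `estimate_pitch` stand-in (normalized autocorrelation peak picking with an arbitrary voicing `threshold`), which is not the detector the patent itself describes, and it assumes `UNVOICED = 0.0` as the "predefined value" marking an unvoiced estimate.

```python
import math
from typing import Callable, List, Sequence

# Assumed "predefined value" that marks an unvoiced estimate (hypothetical choice).
UNVOICED = 0.0


def estimate_pitch(samples: Sequence[float], fs: float,
                   fmin: float = 50.0, fmax: float = 400.0,
                   threshold: float = 0.3) -> float:
    """Hypothetical stand-in detector: normalized autocorrelation peak picking.

    Returns a pitch estimate in Hz, or UNVOICED when no lag in the
    [fs/fmax, fs/fmin] range correlates above `threshold`.
    """
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n < 2 or energy == 0.0:
        return UNVOICED
    lag_min = max(1, int(fs / fmax))
    lag_max = min(n - 1, int(fs / fmin))
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < threshold:
        return UNVOICED
    return fs / best_lag


def run_detectors(channels: List[Sequence[float]], fs: float,
                  detector: Callable[[Sequence[float], float], float] = estimate_pitch
                  ) -> List[float]:
    """Run the same detector routine on every input portion (one instruction set)."""
    return [detector(channel, fs) for channel in channels]


if __name__ == "__main__":
    fs = 8000.0
    tone = [math.sin(2 * math.pi * 200.0 * i / fs) for i in range(240)]  # 30 ms at 8 kHz
    silence = [0.0] * 240
    print(run_detectors([tone, silence], fs))
```

For a 200 Hz tone sampled at 8 kHz the strongest lag is 40 samples, so the tone channel reports 200 Hz while the silent channel reports `UNVOICED`. Swapping in a different detector only means passing another function as `detector`; every channel still runs the same code.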

13 Claims

1. A pitch detector system for human speech comprising:

means for storing a predetermined number of evenly spaced samples of instantaneous amplitude of said speech as a speech frame;

means for generating residual samples from said speech samples;

a plurality of identical means each responsive to an individual predetermined portion of said residual samples of said frame for estimating a pitch value of said frame;

another plurality of identical means each responsive to an individual predetermined portion of said speech samples of said frame for estimating a pitch value of said frame;

means for calculating a final pitch value from the estimated pitch values from each of said plurality and said other plurality of estimating means wherein an unvoiced speech frame is indicated by said calculated pitch value being equal to a predefined value and a voiced frame is indicated by said calculated pitch value being equal to a value other than said predefined value;

said calculating means comprises means responsive to all of said estimated pitch values having a value different than said predefined value for setting said calculated pitch value equal to the arithmetic average of a subset of said estimated pitch values, said subset comprising all of said estimated pitch values except the lowest magnitude value and the highest magnitude value;

means for constraining said final pitch value so that the calculated pitch value is consistent with calculated pitch values from previous frames;

said constraining means comprises means responsive to a first sequence of frames comprising a voiced frame and an unvoiced frame and a second voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said first sequence;

said generating means comprises a new pitch value generating means responsive to a second sequence of frames comprising an unvoiced frame and a voiced frame and a second unvoiced frame for generating a new calculated value indicating an unvoiced frame; and

said new pitch value generating means further responsive to a third sequence of frames comprising a voiced frame and a second voiced frame and a third voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said third sequence.

2. The system of claim 1 wherein said generating means responsive to said first sequence comprises means for setting the new calculated pitch value equal to the arithmetic average of the calculated pitch values of the voiced frames of said first sequence; and said generating means further comprises means responsive to said second sequence of frames for setting the new calculated pitch value equal to said predefined value.

3. The system of claim 2 wherein said new pitch value generating means further comprises means responsive to a fourth sequence of frames comprising a first voiced frame and a second voiced frame and an unvoiced frame for setting the new calculated pitch value equal to the average of the calculated pitch values for the voiced frames and the unvoiced frame upon the magnitude of the difference between the calculated pitch values of the two voiced frames being less than another predefined value; and means responsive to said fourth sequence for setting the new calculated pitch value equal to the pitch value of the first voiced frame upon the magnitude of the difference between the calculated pitch values for the two voiced frames being greater than said other predefined value.

4. The system of claim 1 wherein said setting means further responsive to said estimated pitch values upon all but a first subset of said estimated pitch values equaling said predefined value for setting said calculated pitch value equal to the arithmetic average of said first subset upon the estimated pitch values of said first subset of said pitch values differing by less than another predefined value from each other; and said setting means further responsive to all of said estimated pitch values being equal to said predefined value except for a second subset of said estimated pitch values for setting said calculated pitch value equal to said predefined value upon said estimated pitch values of said second subset differing from each other by a magnitude greater than said other predefined value.

5. The system of claim 4 wherein said setting means further responsive to all but one of said estimated pitch values equaling said predefined value for setting said calculated pitch value equal to the one of said estimated pitch values not equal to said predefined value.
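The voting rules of claims 1, 4 and 5 can be sketched as a single function. This is a minimal Python illustration, not the patent's program: it assumes `UNVOICED = 0.0` for the "predefined value" and uses a hypothetical `tol` of 20 Hz for the unspecified "other predefined value". It also assumes that when only two detectors exist and both are voiced, the claim-4 agreement test applies, since a trimmed mean of two values is empty.

```python
from typing import Sequence

# Assumed "predefined value" that marks an unvoiced estimate (hypothetical choice).
UNVOICED = 0.0


def vote(estimates: Sequence[float], tol: float = 20.0) -> float:
    """Combine per-detector pitch estimates (Hz) into one pitch for the frame.

    Follows the rules of claims 1, 4 and 5; `tol` is a hypothetical value for
    the unspecified "other predefined value".
    """
    voiced = [p for p in estimates if p != UNVOICED]
    if not voiced:
        return UNVOICED  # every detector reported unvoiced
    if len(voiced) == len(estimates) and len(voiced) >= 3:
        # Claim 1: all estimates voiced, so drop the lowest and highest and average the rest.
        trimmed = sorted(voiced)[1:-1]
        return sum(trimmed) / len(trimmed)
    if len(voiced) == 1:
        # Claim 5: a single voiced estimate is taken as the frame pitch.
        return voiced[0]
    # Claim 4: a voiced subset is averaged if its members agree within tol,
    # otherwise the frame is declared unvoiced.
    if max(voiced) - min(voiced) < tol:
        return sum(voiced) / len(voiced)
    return UNVOICED
```

For example, `vote([100.0, 110.0, 120.0, 300.0])` returns `115.0`: the 300 Hz outlier (such as a pitch-doubling error in one channel) and the lowest estimate are discarded before averaging.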

6. A pitch detector for human speech comprising:

means for storing a predetermined number of evenly spaced speech samples of instantaneous amplitude of said speech as a present speech frame;

means for filtering said samples to produce residual samples of the speech remaining after the formant effects of the vocal tract have been substantially removed;

first means responsive to positive valued ones of said speech samples for estimating a first pitch value of said present speech frame;

second means responsive to negative valued ones of said speech samples for estimating a second pitch value of said present speech frame;

third means responsive to positive valued ones of said residual samples for estimating a third pitch value of said present speech frame;

a fourth means responsive to negative valued ones of said residual samples for estimating a fourth pitch value of said present speech frame;

means for calculating a pitch value from the estimated pitch values from said first, second, third and fourth estimating means wherein an unvoiced speech frame is indicated by said calculated pitch value being equal to a predefined value and a voiced frame is indicated by said calculated pitch value being equal to a value other than said predefined value;

said calculating means comprises means responsive to all of said estimated pitch values having a value different than said predefined value for setting said calculated pitch value equal to the arithmetic average of a subset of said estimated pitch values, said subset comprising all of said estimated pitch values except the lowest magnitude value and the highest magnitude value;

means for constraining said final pitch value so that the calculated pitch value is consistent with calculated pitch values from previous frames;

said constraining means comprises means responsive to a first sequence of frames comprising a voiced frame and an unvoiced frame and a second voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said first sequence;

means responsive to a second sequence of frames comprising an unvoiced frame and voiced frame and a second unvoiced frame for generating a new calculated value indicating an unvoiced frame; and

said generating means further responsive to a third sequence of frames comprising a voiced frame and a second voiced frame and a third voiced frame for generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of said third sequence.

7. The system of claim 6 wherein said generating means responsive to said first sequence comprises means for setting the new calculated pitch value equal to the arithmetic average of the calculated pitch values of the voiced frames of said first sequence; and said generating means further responsive to said second sequence of unvoiced and voiced and unvoiced frames for setting the new calculated pitch value to said predefined value.

8. The system of claim 7 wherein said generating means further comprises means responsive to a fourth sequence of frames comprising a first voiced frame and second voiced frame and an unvoiced frame for setting the new calculated pitch value equal to the average of the calculated pitch values for the voiced frames and the unvoiced frame upon the magnitude of the difference between the calculated pitch values of the two voiced frames being less than another predefined value; and means responsive to said fourth sequence for setting the new calculated pitch value equal to the pitch value of said first voiced frame upon the magnitude of difference between the calculated pitch values for the two voiced frames being greater than said other predefined value.

9. The system of claim 6 wherein said setting means further responsive to said estimated pitch values upon all but a first subset of said estimated pitch values equaling said predefined value for setting said calculated pitch value equal to the arithmetic average of said first subset upon the estimated pitch values of said first subset of said pitch values differing by less than another predefined value from each other; and said setting means further responsive to all of said estimated pitch values being equal to said predefined value except for a second subset of said estimated pitch values for setting said calculated pitch value equal to said predefined value upon said estimated pitch values of said second subset differing from each other by a magnitude greater than said other predefined value.

10. The system of claim 9 wherein said setting means further comprises means responsive to all but one of said estimated pitch values equaling said predefined value for setting said calculated pitch value equal to the one of said estimated pitch values not equal to said predefined value.
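Claims 6 to 10 feed four identical detectors with the positive and negative halves of both the speech and the residual. The Python sketch below prepares those four inputs under stated assumptions. The claims only say the filter removes the formant effects of the vocal tract, so the residual here is assumed to come from a linear-prediction inverse filter (autocorrelation method with Levinson-Durbin recursion), with a hypothetical `order` of 10. "Positive valued ones" is assumed to mean keeping the positive samples and zeroing the rest, with the negative channel sign-flipped.

```python
from typing import List, Sequence, Tuple


def lpc(frame: Sequence[float], order: int) -> List[float]:
    """Predictor coefficients a[1..order] (autocorrelation method, Levinson-Durbin)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: remaining coefficients stay 0
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


def residual(frame: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Inverse-filter the frame: e[n] = x[n] - sum_j a[j] * x[n - j], zero history."""
    p = len(coeffs)
    return [frame[n] - sum(coeffs[j - 1] * frame[n - j] for j in range(1, p + 1) if n >= j)
            for n in range(len(frame))]


def split_channels(frame: Sequence[float], order: int = 10
                   ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Return the four detector inputs: +speech, -speech, +residual, -residual."""
    res = residual(frame, lpc(frame, order))

    def positive(xs: Sequence[float]) -> List[float]:
        return [x if x > 0 else 0.0 for x in xs]

    def negative(xs: Sequence[float]) -> List[float]:
        return [-x if x < 0 else 0.0 for x in xs]  # sign-flipped so peaks are positive

    return positive(frame), negative(frame), positive(res), negative(res)
```

Each of the four lists can then be handed to the same detector routine, which is what makes a single instruction set sufficient for all four estimators.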

11. A method for detecting the pitch of human speech with a system comprising a quantizer for converting the speech into frames of digital samples and a digital signal processor responsive to a plurality of program instructions and said frames of digital samples to determine the pitch of the speech, said method comprising the steps of:

producing residual samples of the digitized speech that remain after the formant effects of the vocal tract have been substantially removed;

estimating a first pitch value of a present speech frame in response to positive valued ones of said digitized speech samples;

estimating a second pitch value of said present speech frame in response to negative valued ones of said digitized speech samples;

estimating a third pitch value of said present speech frame in response to positive valued ones of said residual samples;

estimating a fourth pitch value of said present speech frame in response to negative valued ones of said residual samples; and

calculating said final pitch value from said first, second, third, and fourth pitch values wherein an unvoiced speech frame is indicated by said calculated pitch value being equal to a predefined value and a voiced frame is indicated by said calculated pitch value being equal to a value other than said predefined value;

said step of calculating comprises the step of setting said calculated pitch value equal to the arithmetic average of a subset of said estimated pitch values, said subset comprising all of said estimated pitch values except the lowest magnitude value and the highest magnitude value;

constraining said final pitch value so that said final pitch value is in agreement with final pitch values from previous frames;

said step of constraining comprises the steps of generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of a first sequence of frames comprising a voiced frame and unvoiced frame and a second voiced frame;

generating a new calculated value indicating an unvoiced frame in response to a second sequence of frames comprising an unvoiced frame and a voiced frame and a second unvoiced frame; and

generating a new calculated pitch value having an arithmetic relationship to the calculated pitch values of the frames of a third sequence of frames comprising a voiced frame and a second voiced frame and a third voiced frame.

12. The method of claim 11 wherein said step of generating a new calculated value in response to said first sequence comprises the step of setting the new calculated pitch value equal to the arithmetic average of the calculated pitch values of the voiced frames of said first sequence; and said step of generating a new calculated value for said second sequence comprises the step of setting the new calculated pitch value of said second sequence equal to said predefined value.

13. The method of claim 12 wherein said constraining step further comprises the step of generating in response to a fourth sequence of frames comprising a first voiced frame and a second voiced frame and an unvoiced frame a new calculated pitch value equal to the average of the calculated pitch values for the two voiced frames and the unvoiced frame upon the magnitude of the difference between the voiced frames being less than another predefined value; and said generating step further generating a new calculated pitch value equal to the pitch value of the first voiced frame upon the difference in magnitude between the two pitch values for the two voiced frames being greater than said other predefined value.
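The two unambiguous consistency rules of claim 12 (a voiced, unvoiced, voiced sequence and an unvoiced, voiced, unvoiced sequence) can be sketched as a pass over a track of calculated pitch values. This minimal Python illustration assumes `UNVOICED = 0.0` for the predefined value, assumes each rule rewrites the middle frame of a sliding three-frame window, and reads every window from the unsmoothed input. The voiced-voiced-unvoiced rule of claim 13 and the voiced-voiced-voiced rule of claim 11 are deliberately not modelled, because the claim text leaves the target frame or the exact arithmetic relationship open.

```python
from typing import List, Sequence

# Assumed "predefined value" that marks an unvoiced frame (hypothetical choice).
UNVOICED = 0.0


def constrain(pitches: Sequence[float]) -> List[float]:
    """Apply the three-frame rules of claim 12 to a track of calculated pitches.

    Assumption: each rule rewrites the middle frame of a sliding three-frame
    window, and windows are read from the unsmoothed input.
    """
    out = list(pitches)
    for t in range(1, len(pitches) - 1):
        prev, cur, nxt = pitches[t - 1], pitches[t], pitches[t + 1]
        v_prev, v_cur, v_nxt = (p != UNVOICED for p in (prev, cur, nxt))
        if v_prev and not v_cur and v_nxt:
            # voiced, unvoiced, voiced: fill the gap with the mean of the voiced neighbours
            out[t] = (prev + nxt) / 2
        elif not v_prev and v_cur and not v_nxt:
            # unvoiced, voiced, unvoiced: an isolated voiced frame is marked unvoiced
            out[t] = UNVOICED
        # Other patterns (including the claim 13 and claim 11 cases) pass through unchanged.
    return out
```

For example, `constrain([120.0, 0.0, 130.0])` returns `[120.0, 125.0, 130.0]`, bridging a single dropped voiced frame, while `constrain([0.0, 150.0, 0.0])` returns `[0.0, 0.0, 0.0]`, suppressing a spurious isolated voicing decision.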

Current Assignee: Bell Telephone Laboratories, Inc. (Nokia Corporation)
Original Assignee: American Telephone & Telegraph Company (AT&T, Inc.)
Inventors: Prezas, Dimitrios; Picone, Joseph
Primary Examiner(s): Harkcom, Gary V.
Assistant Examiner(s): Merecki, John
Application Number: US06/770,633
Time in Patent Office: 1,532 Days
Field of Search: 381/49, 381/38, 381/29-50, 364/513.5
US Class Current: 704/208
CPC Class Codes: G10L 25/90 Pitch determination of spee...
