Processing speech signals

US 20040133424A1
Filed: 10/22/2003
Published: 07/08/2004
Est. Priority Date: 04/24/2001
Status: Abandoned Application

Abstract

A method of processing a speech signal in noise, comprising: determining a frequency spectrum of a frame of the speech signal; determining a value of the pitch of the frame of the speech signal; identifying peaks (12, 14, 16, 22, 28, 32) in the spectrum; and evaluating the peaks individually to determine respective scores for the peaks, the score for a peak being a measure of the likelihood that the peak is a harmonic band of the speech signal. As a consequence there is: (a) no need for high f0 accuracy, as there is no need to predict long sequences of harmonic positions; and (b) no need for an assumption of harmonic integrity at all points.
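Point (a) can be illustrated with simple arithmetic. The sketch below compares the error from predicting the k-th harmonic as k times an estimated f0 with the error from stepping a single pitch interval away from an observed peak. The sampling rate, FFT size and pitch values are assumptions chosen for illustration and do not come from the application.

```python
# Illustrative numbers only; none of these values come from the application.
SAMPLE_RATE_HZ = 8000.0
FFT_SIZE = 256
BIN_WIDTH_HZ = SAMPLE_RATE_HZ / FFT_SIZE  # 31.25 Hz per frequency bin


def global_prediction_error_hz(true_f0, est_f0, harmonic):
    """Error when the k-th harmonic is predicted as k * est_f0."""
    return abs(harmonic * est_f0 - harmonic * true_f0)


def local_prediction_error_hz(true_f0, est_f0):
    """Error when stepping one pitch interval away from an observed peak."""
    return abs(est_f0 - true_f0)


if __name__ == "__main__":
    true_f0, est_f0 = 100.0, 103.0  # assumed 3 Hz pitch estimation error
    for k in (1, 5, 10, 20):
        err_bins = global_prediction_error_hz(true_f0, est_f0, k) / BIN_WIDTH_HZ
        print(f"harmonic {k:2d}: global prediction error = {err_bins:.2f} bins")
    local_bins = local_prediction_error_hz(true_f0, est_f0) / BIN_WIDTH_HZ
    print(f"one-pitch step from a peak: error = {local_bins:.2f} bins")
```

Under these assumed numbers the global prediction drifts by more than one bin at the 20th harmonic, while the one-pitch step stays well inside a single bin, which is why a peak-by-peak evaluation tolerates a coarser pitch estimate.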

31 Claims

1. A method of processing a speech signal in noise, comprising:
- determining a frequency spectrum of a frame of the speech signal;
  
  determining a value of the pitch of the frame of the speech signal;
  
  characterised by;
  
  identifying peaks (12, 14, 16, 22, 28, 32) in the spectrum; and
  
  evaluating the peaks (12, 14, 16, 22, 28, 32) individually to determine respective scores for the peaks (12, 14, 16, 22, 28, 32), the score for a peak (12, 14, 16, 22, 28, 32) being a measure of the likelihood that the peak (12, 14, 16, 22, 28, 32) is a harmonic band of the speech signal.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31)
- - 2. A method according to claim 1, wherein each peak (12, 14, 16, 22, 28, 32) is individually evaluated by analysing the frequency position of the peak relative to the frequency position of one or more of the other peaks.
  - 3. A method according to claim 2, wherein the score for a peak (12, 14, 16, 22, 28, 32) under consideration is dependent upon how close other peaks are to a frequency position calculated as one pitch away from the frequency position of the peak under consideration.
  - 4. A method according to claim 3, wherein the evaluating step comprises:
    - selecting a first peak (22) at a first frequency position (24);
      
      calculating a first calculated frequency position (26) separated from the first frequency position in frequency by the pitch value;
      
      identifying any second peak (28) within a given number of frequency bins of the first calculated frequency position (26); and
      
      allocating a score to the first peak (22) dependent upon the relative frequency position of the second peak (28) compared to the first calculated frequency position (26).
  - 5. A method according to claim 4, further comprising:
    - calculating a second calculated frequency position (30) separated, in an opposite frequency direction to the first calculated frequency position (26), from the first frequency position (24) in frequency by the pitch value;
      
      identifying any third peak (32) within a given number of frequency bins of the second calculated frequency position (30); and
      
      allocating a score to the first peak (22) dependent upon the relative frequency position of the second peak (28) compared to the first calculated frequency position (26) and the relative frequency position of the third peak (32) compared to the second calculated frequency position (30).
  - 6. A method according to claim 5, wherein the score is allocated according to the closeness of the second and third peaks to the first and second calculated frequency positions respectively and according to whether any variation is in the same or different frequency direction for the second peak (28) compared to the third peak (32).
  - 7. A method according to claim 6, wherein the given number of frequency bins from the first and second calculated frequency positions within which any second or third peak is identified is ± one frequency bin, where ± represents increasing/decreasing frequency value, such that the second or third peak may be either (i) one bin higher, (ii) at the correct bin or (iii) one bin lower than the respective calculated frequency position, and (iv) if no peaks are identified within ± one frequency bin then there is respectively no identified second or third peak; and
    - the score is allocated as follows in terms of the second and third peaks;
      
      if both the peaks are at the correct bin, the score is ‘6’;
      
      if one of the peaks is at the correct bin and the other peak is one bin higher or one bin lower, the score is ‘5’;
      
      if both peaks are one bin higher or both peaks are one bin lower, the score is ‘4’;
      
      if one peak is one bin higher and the other peak is one bin lower, the score is ‘3’;
      
      if one peak is correct and there is no other peak identified, the score is ‘2’;
      
      if one peak is one bin higher or one bin lower, and there is no other peak identified, the score is ‘1’; and
      
      if neither peak is identified, the score is ‘0’.
  - 8. A method according to claim 2, wherein the evaluating step comprises:
    - determining the fundamental frequency position;
      
      calculating a first calculated frequency position separated from the fundamental frequency position by the pitch;
      
      seeking a first peak within a given number of frequency bins of the first calculated frequency position; and
      
      if such a first peak is found, allocating a score to the first peak dependent upon the relative frequency position of the first peak compared to the first calculated frequency position.
  - 9. A method according to claim 8, further comprising, if such a first peak is found:
    - calculating a second calculated frequency position separated from the frequency position of the first peak by the pitch;
      
      seeking a second peak within a given number of frequency bins of the second calculated frequency position; and
      
      if such a second peak is found, allocating a score to the second peak dependent upon the relative frequency position of the second peak compared to the first calculated frequency position.
  - 10. A method according to claim 8 or 9, further comprising, if such a first peak is not found:
    - calculating a second calculated frequency position separated from the fundamental frequency position by twice the pitch;
      
      seeking a second peak within a given number of frequency bins of the second calculated frequency position; and
      
      if such a second peak is found, allocating a score to the second peak dependent upon the relative frequency position of the second peak compared to the second calculated frequency position.
  - 11. A method according to claim 9 or 10, further comprising repeating the steps in corresponding fashion for further peaks and/or multiples of the pitch until the whole spectrum has been analysed.
  - 12. A method according to any of claims 8 to 11, wherein the given number of frequency bins which the respective peaks are required to be within the respective calculated frequency position is ± one frequency bin, where ± represents increasing/decreasing frequency value, such that the respective peak may be either at the respective calculated frequency position in which case the peak is allocated a relatively higher score or ± one frequency bin of the respective calculated frequency position in which case the peak is allocated a relatively lower score.
  - 13. A method according to any of claims 3 to 7 further comprising the steps of the method of any of claims 8 to 12, wherein the score for a peak is a score provided by combining, for example by adding, the respective scores for the peak from each of the two methods.
  - 14. A method according to any preceding claim, further comprising performing an iterative process in which the positions found for identified harmonics are used to update the value of the pitch and the updated value of the pitch is then used in a refined determination of the positions of the harmonics.
  - 15. A method according to any preceding claim, wherein the score for a peak is modified by analysing the consistency of the score for the peak in the present frame with the score for the corresponding peak in one or more previous and/or one or more subsequent frames.
  - 16. A method according to claim 15, wherein the score is modified by adding to the score for the peak in the present frame the score for the corresponding peak in the one or more preceding and/or one or more subsequent frames, for those preceding and/or subsequent frames which fall within an allowable frame to frame speech harmonic trajectory.
  - 17. A method according to claim 16, wherein the score is modified by adding to the score for the peak in the present frame the score for the corresponding peak in the immediately preceding frame and the immediately subsequent frame, and the allowable frame to frame speech harmonic trajectory is that the corresponding peaks in the previous and subsequent frames are only allowed to be at the same frequency bin or at ± one frequency bin from the same frequency bin as the peak in the present frame.
  - 18. A method according to any preceding claim, wherein the score for a peak is compared to a threshold value to determine whether the peak is to be treated as a harmonic band of the speech signal.
  - 19. A method according to claim 18, further comprising using a separate speech/non-speech detector to estimate whether the frame is speech or non-speech, and wherein the threshold value is varied according to whether the estimate is speech or non-speech.
  - 20. A method according to claim 18 or 19, wherein the speech signal is reproduced in a form containing only the harmonic bands or frames that are to be treated as speech in view of the comparison of their score with the threshold.
  - 21. A method according to any of claims 1 to 18, wherein the score for a peak is used as a speech-confidence indicator for further processing of the peak.
  - 22. A method according to any preceding claim, wherein the step of identifying peaks in the spectrum comprises differentiating the frequency spectrum with respect to frequency using two scales, the first scale being over a higher number of frequency bins than the second scale, and weighting the results from the two scales such that the differentiation using the first scale identifies significant speech peaks and the differentiation using the second scale improves the precision of the calculation of the frequency position of the identified peak.
  - 23. A method according to any preceding claim, further comprising using the resulting harmonic band data in at least one of the following group of processes:
    - (i) automatic speech recognition;
      
      (ii) front-end processing in distributed automatic speech recognition;
      
      (iii) speech enhancement;
      
      (iv) echo cancellation;
      
      (v) speech coding.
  - 24. A method according to any preceding claim, further comprising estimating the amount of speech energy in the frame as the energy contained in the identified speech harmonics.
  - 25. A method according to claim 24, further comprising using the estimated speech energy of the frame to normalise the speech energy of the frame.
  - 26. A method according to claim 25, wherein the speech energy of the frame is normalised using a power-law regulated by a speech-confidence metric.
  - 27. A method according to claim 25 or 26, further comprising deriving a root-cepstrum of the frame using the normalised speech energy of the frame, and using the root-cepstrum of the frame to perform an automatic speech recognition process on the frame.
  - 30. A storage medium storing processor-implementable instructions for controlling one or more processors to carry out the method of any of claims 1 to 29.
  - 31. Apparatus adapted to implement the method of any of claims 1 to 29.
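The neighbour-based scoring of claims 4 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the pitch is assumed to be already expressed as a whole number of frequency bins, peaks are assumed to be given as a set of bin indices, and the tie-break order in `find_offset` (exact bin first, then one bin higher, then one bin lower) is a choice made here that the claims do not specify. All function names are hypothetical.

```python
from typing import Optional, Set


def neighbour_score(upper_offset: Optional[int], lower_offset: Optional[int]) -> int:
    """Score a peak from the bin offsets of its two neighbours (claim 7 table).

    Each offset is the neighbour's bin minus the calculated bin one pitch
    away: 0 (correct bin), +1 or -1 (one bin off), or None (no peak found
    within +/- one bin).
    """
    for off in (upper_offset, lower_offset):
        if off not in (None, -1, 0, 1):
            raise ValueError("offset must be -1, 0, +1 or None")
    found = [o for o in (upper_offset, lower_offset) if o is not None]
    if not found:
        return 0  # neither neighbour identified
    if len(found) == 1:
        return 2 if found[0] == 0 else 1  # single neighbour: exact or one bin off
    a, b = found
    if a == 0 and b == 0:
        return 6  # both at the correct bin
    if a == 0 or b == 0:
        return 5  # one correct, the other one bin off
    return 4 if a == b else 3  # both off: same direction or opposite directions


def find_offset(peak_bins: Set[int], expected_bin: int) -> Optional[int]:
    """Offset of a peak within +/- one bin of expected_bin, or None.

    The search order (0, +1, -1) is an assumption made for this sketch.
    """
    for off in (0, 1, -1):
        if expected_bin + off in peak_bins:
            return off
    return None


def score_peak(peak_bin: int, peak_bins: Set[int], pitch_bins: int) -> int:
    """Score one peak by looking one pitch above and one pitch below it."""
    upper = find_offset(peak_bins, peak_bin + pitch_bins)
    lower = find_offset(peak_bins, peak_bin - pitch_bins)
    return neighbour_score(upper, lower)


if __name__ == "__main__":
    peaks = {10, 20, 30, 41}  # hypothetical peak positions, in bins
    for p in sorted(peaks):
        print(p, score_peak(p, peaks, pitch_bins=10))
```

A score obtained this way could then be compared with a threshold, as in claim 18, to decide whether the peak is treated as a harmonic band; the threshold value itself is not given in the claims.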

28. A method of performing automatic speech recognition on a speech signal in noise, comprising normalising the speech energy level of the signal and deriving a root-cepstrum using the normalised speech energy level.
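Claim 28 names the two steps but gives no formulas. As a rough sketch, a root-cepstrum is commonly obtained by replacing the logarithm of a conventional cepstrum with a power-law (root) compression before the final transform. The version below assumes that normalisation means dividing each band energy by an estimated speech energy, that the compression exponent `gamma` is 0.1, and that the transform is a DCT-II over band energies. All three are assumptions made here, and `root_cepstrum` is a hypothetical name.

```python
import math


def root_cepstrum(band_energies, speech_energy, gamma=0.1, n_coeffs=None):
    """Root-cepstral coefficients from energy-normalised band energies.

    band_energies: non-negative energies per frequency band.
    speech_energy: estimated speech energy used for normalisation (assumed
                   to be a simple divisor in this sketch).
    gamma:         root-compression exponent replacing the usual logarithm.
    """
    if speech_energy <= 0:
        raise ValueError("speech_energy must be positive")
    n = len(band_energies)
    if n_coeffs is None:
        n_coeffs = n
    # Normalise, then apply the power-law compression.
    compressed = [(e / speech_energy) ** gamma for e in band_energies]
    # Unnormalised DCT-II of the compressed band energies.
    return [
        sum(c * math.cos(math.pi * k * (i + 0.5) / n) for i, c in enumerate(compressed))
        for k in range(n_coeffs)
    ]


if __name__ == "__main__":
    quiet = root_cepstrum([1.0, 4.0, 9.0, 2.0], speech_energy=16.0)
    loud = root_cepstrum([10.0, 40.0, 90.0, 20.0], speech_energy=160.0)
    # The same spectral shape at two levels gives (nearly) the same features.
    print([round(c, 4) for c in quiet])
    print([round(c, 4) for c in loud])
```

The usage lines show the intended effect of the normalisation in this sketch: scaling the signal level and the energy estimate together leaves the coefficients essentially unchanged.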

29. A method of identifying peaks (12, 14, 16) in a frequency spectrum of a frame of a speech signal, comprising:
- differentiating the frequency spectrum with respect to frequency using two scales, the first scale being over a higher number of frequency bins than the second scale, and weighting the results from the two scales such that the differentiation using the first scale identifies significant speech peaks and the differentiation using the second scale improves the precision of the calculation of the frequency position of the identified peak.
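One possible reading of claim 29 is sketched below: the spectrum is differentiated with central differences over a wide scale and a narrow scale, the two derivatives are combined as a weighted sum, and a peak is reported where the combined derivative changes from positive to non-positive. The scales (3 bins and 1 bin), the equal weights, the edge clamping and the weighted-sum form itself are assumptions made for illustration; the claim does not specify them. Function names are hypothetical.

```python
def scaled_derivative(spectrum, scale):
    """Central difference over +/- `scale` bins, clamping indices at the edges."""
    last = len(spectrum) - 1
    return [
        (spectrum[min(i + scale, last)] - spectrum[max(i - scale, 0)]) / (2 * scale)
        for i in range(len(spectrum))
    ]


def find_peaks(spectrum, wide_scale=3, narrow_scale=1,
               wide_weight=0.5, narrow_weight=0.5):
    """Return bin indices where the weighted two-scale derivative turns over."""
    wide = scaled_derivative(spectrum, wide_scale)
    narrow = scaled_derivative(spectrum, narrow_scale)
    combined = [wide_weight * w + narrow_weight * d for w, d in zip(wide, narrow)]
    peaks = []
    for i in range(len(spectrum) - 1):
        # Positive slope followed by zero or negative slope marks a turning point.
        if combined[i] > 0 and combined[i + 1] <= 0:
            # Report whichever of the two adjacent bins has the larger magnitude.
            peaks.append(i + 1 if spectrum[i + 1] >= spectrum[i] else i)
    return peaks


if __name__ == "__main__":
    # Hypothetical magnitude spectrum: two strong peaks and a small ripple at bin 8.
    spectrum = [0, 1, 3, 7, 3, 1, 0, 0, 1, 0, 0, 1, 4, 8, 4, 1, 0]
    print(find_peaks(spectrum))                                    # two-scale
    print(find_peaks(spectrum, wide_weight=0.0, narrow_weight=1.0))  # narrow only
```

On this toy spectrum the narrow scale alone also reports the small ripple, while the weighted combination keeps only the two strong peaks, which matches the division of labour the claim describes between the two scales.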

Current Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Original Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Inventors: Kelleher, Holly Louise; Pearce, David John Benjamin; Ealey, Douglas Ralph
Application Number: US10/475,641
Publication Number: US 20040133424A1
Field of Search
US Class Current: 704/233
CPC Class Codes: G10L 25/90 Pitch determination of spee...
