Method for speech analysis and speech recognition

US 4,805,218 A
Filed: 04/03/1987
Issued: 02/14/1989
Est. Priority Date: 04/03/1987
Status: Expired due to Fees

Abstract

A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
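
The two kinds of difference parameter described in the abstract can be illustrated with a minimal sketch. It assumes each frame is a plain list of band energies; the function names, the one-frame and one-band offsets, and the sign convention (nearby minus given) are illustrative assumptions and are not taken from the patent.

```python
from typing import List

Frame = List[float]  # one energy value per frequency band (assumed layout)


def energy_difference(frames: List[Frame], t: int, band: int, dt: int = 1) -> float:
    """Same band, nearby frame: how much that band's energy changed."""
    return frames[t + dt][band] - frames[t][band]


def slope_difference(frames: List[Frame], t: int, band: int,
                     dt: int = 1, dband: int = 1) -> float:
    """Different band, nearby frame: the raw difference behind a slope parameter."""
    return frames[t + dt][band + dband] - frames[t][band]


# Synthetic tone whose energy peak moves up one band per frame.
rising = [
    [8.0, 1.0, 0.0, 0.0],
    [1.0, 8.0, 1.0, 0.0],
    [0.0, 1.0, 8.0, 1.0],
]

same_band = energy_difference(rising, t=0, band=0)               # 1 - 8
up_diagonal = slope_difference(rising, t=0, band=0, dband=+1)    # 8 - 8
down_diagonal = slope_difference(rising, t=1, band=1, dband=-1)  # 0 - 8
```

On this synthetic tone the difference taken along the upward diagonal is zero because the energy has moved with it, while the same-band and downward-diagonal differences are large. That contrast is the sense in which a cross-band difference can indicate whether energy is moving up or down in frequency.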

18 Claims

1. A method of speech analysis comprising:
- representing a length of speech as a temporal sequence of frames, with each frame representing speech sounds at one of a succession of brief time periods;
  
  analyzing each frame of speech to obtain a plurality of spectral parameters, each of which represents the energy at one of a series of different frequency bands;
  
  finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with an energy band which is close to, but different than, the frequency band represented by said given spectral parameter; and
  
  using that difference to calculate a slope parameter which provides an indication of the extent to which the frequency of the acoustic energy in the part of the spectrum represented by said given spectral parameter is going up or going down.
- - 2. A method as described in claim 1, further including:
    - calculating one or more such slope parameters for each of a plurality of frames from the sequence of frames; and
      
      comparing the slope parameters which have been calculated for the sequence of frames against each of a plurality of acoustic models representing speech units, where each such speech-unit model has a model for the slope parameters associated with frames that correspond to the speech unit it represents.
  - 3. A method as described in claim 2, wherein:
    - each such speech-unit model has a model of the spectral parameters, as well as the slope parameters, associated with frames that correspond to the speech unit it represents; and
      
      said comparing includes comparing the spectral parameters, as well as the slope parameters, from individual frames against each of said plurality of speech-unit models.
  - 4. A method as described in claim 2, wherein:
    - said calculating of slope parameters associates one or more slope parameters with each frame of said sequence of frames;
      
      said speech-unit model is comprised of a sequence of dynamic programming elements each of which includes one or more slope parameter models; and
      
      said comparing includes using dynamic programming to compare the sequence of slope parameters associated with the sequence of frames against the sequence of dynamic programming elements associated with each of said speech unit models.
  - 5. A method as described in claim 1, wherein:
    - said finding of the difference between the energy of a given spectral parameter and a spectral parameter in a nearby frame includes:
      
      finding a first difference between the energy of said given spectral parameter of said given frame and the energy, in a frame occurring close after said given frame, of a spectral parameter associated with a frequency band which differs in frequency from the frequency band of the given parameter by an amount X; and
      
      finding a second difference between the energy of said given spectral parameter and the energy, in a frame occurring briefly before said given frame, of a spectral parameter associated with a frequency band which differs in frequency from the frequency band of the given parameter by approximately an amount -X; and
      
      said using of a difference to calculate a slope parameter includes combining said first and second differences to form a summed difference which is used to calculate said slope parameter.
  - 6. A method as described in claim 1, wherein:
    - said finding of the difference between the energy of a given spectral parameter and a spectral parameter in a nearby frame includes:
      
      finding a first difference between the energy of said given spectral parameter of said given frame and the energy, in a first nearby frame, of a spectral parameter associated with a frequency band which is above the frequency band of the given parameter by an amount X; and
      
      finding a second difference between the energy of said given spectral parameter and the energy, in a second nearby frame, of a spectral parameter associated with a frequency band which is below the frequency band of the given parameter by approximately an amount X, where said second nearby frame is on the opposite side of said given frame as said first nearby frame; and
      
      said using of a difference to calculate a slope parameter includes combining the negative of the value of one of said first and second differences to the value of the other of said differences to form a first net difference which is used to calculate said slope parameter.
  - 7. A method as described in claim 6, wherein:
    - the absolute value of said net difference is used to calculate said slope parameter.
  - 8. A method as described in claim 6, wherein said finding of the difference between the energy of a given spectral parameter and a spectral parameter in a nearby frame further includes:
    - finding a third difference between the energy of said given spectral parameter of said given frame and the energy, in a certain nearby frame, of a spectral parameter associated with a frequency band which is above the frequency band of the given parameter by an amount Y, which is larger than X; and
      
      finding a fourth difference between the energy of said given spectral parameter and the energy, in another nearby frame, of a spectral parameter associated with a frequency band which is below the frequency band of the given parameter by approximately an amount Y, where said another nearby frame is on the opposite side of said given frame as said certain nearby frame; and
      
      said using of a difference to calculate a slope parameter includes combining the negative of one of said third and fourth differences to the value of the other of those two differences to form a second net difference which is also used to calculate said slope parameter.
  - 9. A method as described in claim 8, wherein both the absolute value of said first net difference and the absolute value of the second net difference are used to calculate said slope parameter.
  - 10. A method as described in claim 1, wherein:
    - said finding of the difference between the energy of a given spectral parameter and the energy, in a nearby frame, of another spectral parameter is performed for each of a group of spectral parameters which together represent a subrange of the frequency range represented by all the spectral parameters of that frame; and
      
      said using of that difference to calculate a slope parameter includes combining said differences calculated for each of said group of spectral parameters to form a slope parameter which provides an indication of the extent to which the frequency of the acoustic energy in the part of the spectrum represented by said group of spectral parameters is going up or going down.
  - 11. A method as described in claim 1, wherein said nearby frame is directly adjacent said given frame in said temporal sequence of frames.
  - 12. A method as described in claim 1, wherein said nearby frame is separated from said given frame in said temporal sequence of frames by at least one intervening frame.
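
The combinations recited in claims 5, 6, 7 and 10 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not fix a sign convention, so `diff` is assumed to be "given minus nearby"; the nearby frames are assumed to be directly adjacent (as in claim 11); the `first_dt` parameter, which picks the side of the first nearby frame, is hypothetical; and claim 10's "combining" is assumed to be a plain sum.

```python
from typing import List, Sequence

Frame = List[float]  # one energy value per frequency band (assumed layout)


def diff(frames: List[Frame], t: int, band: int, dt: int, dband: int) -> float:
    """Energy of the given parameter minus the energy dt frames and dband bands away.

    Assumed sign convention. Callers must keep t + dt and band + dband in range;
    Python would silently wrap a negative index.
    """
    return frames[t][band] - frames[t + dt][band + dband]


def summed_difference(frames: List[Frame], t: int, band: int, x: int = 1) -> float:
    """Claim 5 style: later frame at band + x, earlier frame at band - x, summed."""
    first = diff(frames, t, band, dt=+1, dband=+x)
    second = diff(frames, t, band, dt=-1, dband=-x)
    return first + second


def net_difference(frames: List[Frame], t: int, band: int,
                   x: int = 1, first_dt: int = 1) -> float:
    """Claim 6 style: band above in one neighbour, band below in the opposite
    neighbour, with one of the two differences negated before combining."""
    first = diff(frames, t, band, dt=first_dt, dband=+x)
    second = diff(frames, t, band, dt=-first_dt, dband=-x)
    return first - second


def group_slope(frames: List[Frame], t: int, bands: Sequence[int], x: int = 1) -> float:
    """Claim 10 style: combine (here, sum) the differences over a subrange of bands."""
    return sum(summed_difference(frames, t, b, x) for b in bands)


# Synthetic tone whose peak rises one band per frame, and its time reversal.
rising = [
    [8.0, 1.0, 0.0, 0.0],
    [1.0, 8.0, 1.0, 0.0],
    [0.0, 1.0, 8.0, 1.0],
]
falling = rising[::-1]

rising_group = group_slope(rising, t=1, bands=[1, 2])
falling_group = group_slope(falling, t=1, bands=[1, 2])
# Claim 7 style: use the absolute value of the net difference.
abs_net = abs(net_difference(rising, t=1, band=2, first_dt=-1))
```

With these assumptions the summed difference taken along the rising diagonal is zero for the rising tone and large for the falling one, so the grouped value separates the two directions of movement on this toy input.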

13. A method of speech recognition comprising:
- representing a length of speech as a temporal sequence of frames, with each frame representing speech sounds at one of a succession of brief time periods, and with each frame containing a plurality of acoustic parameters;
  
  calculating one or more difference parameters in association with each of a plurality of said frames, with each difference parameter being calculated as a function of the difference between a first acoustic parameter in one such frame and a second acoustic parameter in a nearby frame, in which for one or more of said difference parameters, said first parameter and said second parameter are associated with different frequencies; and
  
  comparing the difference parameters associated with individual frames against each of a plurality of acoustic models representing speech units, where each such speech-unit model has a model for the difference parameters associated with frames that correspond to the speech unit it represents.
- View Dependent Claims (14, 15, 16, 17, 18)
- - 14. A method as described in claim 13, wherein:
    - each frame contains a plurality of spectral parameters, each of which represents the energy at one of a series of different frequency bands;
      
      said calculating of difference parameters includes calculating one or more slope parameters in association with each of a plurality of said frames, with each slope parameter being calculated as a function of the difference between the energy of a spectral parameter in one such frame and the energy of a spectral parameter representing a different frequency band in a nearby frame; and
      
      said comparing of difference parameters against speech-unit models includes comparing slope parameters against such speech-unit models, where each such speech-unit model has a model for the slope parameters associated with frames that correspond to the speech unit it represents.
  - 15. A method as described in claim 13, wherein:
    - the calculating of difference parameters also involves calculating one or more slope parameters in association with each of a plurality of said frames, where each slope parameter is calculated as a function of the difference between a given spectral parameter in one such frame which represents the energy at a given frequency band and a spectral parameter in a nearby frame which represents the energy at another frequency band;
      
      the speech-unit models also contain models for the slope parameters associated with frames that correspond to the speech unit those speech-unit models represent; and
      
      the comparing of the difference parameters of frames against speech-unit models also includes comparing such slope parameters against the models for such slope parameters contained in the speech-unit models.
  - 16. A method as described in claim 13, wherein:
    - each such speech-unit model has a model of the spectral parameters, as well as the difference parameters, associated with frames that correspond to the speech unit it represents; and
      
      said comparing includes comparing the spectral parameters, as well as the difference parameters, from individual frames against each of said plurality of speech-unit models.
  - 17. A method as described in claim 13, wherein:
    - said calculating of difference parameters associates one or more difference parameters with each frame of said sequence of frames;
      
      said speech-unit model is comprised of a sequence of dynamic programming elements, each of which includes one or more difference parameter models; and
      
      said comparing includes using dynamic programming to compare the sequence of difference parameters associated with the sequence of frames against the sequence of dynamic programming elements associated with each of said speech unit models.
  - 18. A method as described in claim 13, wherein:
    - said calculating of difference parameters associates one or more difference parameters with each frame of said sequence of frames;
      
      each of said speech-unit models represents the acoustic properties of individual frames associated with a given phoneme;
      
      said comparing includes comparing the speech-unit models for each of a plurality of phonemes against each of said plurality of frames and associating with each frame the one or more phonemes whose models compare most closely with that frame.
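
The two comparison schemes in claims 17 and 18 can be sketched as follows. The sketch makes several assumptions that the claims do not state: each dynamic programming element and each phoneme model is reduced to a single reference vector, the match cost is a sum of absolute differences, and the alignment lets each frame either stay on the current element or advance by one. All names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]  # difference parameters for one frame (assumed layout)


def distance(a: Vector, b: Vector) -> float:
    """Assumed match cost: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_score(frames: List[Vector], model: List[Vector]) -> float:
    """Lowest total cost of aligning frames, in order, to the model's elements.

    The alignment starts on the first element and ends on the last; each new
    frame stays on the same element or advances to the next one.
    """
    inf = float("inf")
    prev = [inf] * len(model)
    prev[0] = distance(frames[0], model[0])
    for frame in frames[1:]:
        cur = [inf] * len(model)
        for j, element in enumerate(model):
            best = prev[j] if j == 0 else min(prev[j], prev[j - 1])
            cur[j] = best + distance(frame, element)
        prev = cur
    return prev[-1]


def recognize(frames: List[Vector], models: Dict[str, List[Vector]]) -> str:
    """Claim 17 style: pick the speech-unit model with the best DP score."""
    return min(models, key=lambda name: dp_score(frames, models[name]))


def label_frames(frames: List[Vector], phonemes: Dict[str, Vector]) -> List[str]:
    """Claim 18 style: tag each frame with its closest phoneme model."""
    return [min(phonemes, key=lambda p: distance(f, phonemes[p])) for f in frames]


frames = [[0.0], [0.0], [5.0]]
word_models = {"up": [[0.0], [5.0]], "down": [[5.0], [0.0]]}
phoneme_models = {"a": [0.0], "b": [5.0]}

best_word = recognize(frames, word_models)
frame_labels = label_frames(frames, phoneme_models)
```

The DP variant scores a whole frame sequence against an ordered sequence of elements, whereas the frame-labelling variant treats every frame independently. A practical recognizer would likely use probabilistic models in place of the toy distance used here.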

Current Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Original Assignee: Dragon Systems, Inc. (Microsoft Corporation)
Inventors: Baker, James K.; Gillick, Laurence; Bamberg, Paul G.; Roth, Robert S.
Primary Examiner(s): Roskoski, Bernard
Application Number: US07/034,842
Time in Patent Office: 683 Days
Field of Search: 384/36, 384/37, 384/43, 384/45, 384/48, 384/50, 364/513.5
US Class Current: 704/241
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...
