Distance measure in a speech recognition system for speech recognition using frequency shifting factors to compensate for input signal frequency shifts

US 6,032,116 A
Filed: 06/27/1997
Issued: 02/29/2000
Est. Priority Date: 06/27/1997
Status: Expired due to Term

9 Assignments

0 Petitions

Abstract

One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented by a vector ƒ of line spectral pair frequencies and are fuzzy matrix quantized to respective vector ƒ̂ entries in a codebook of the FMQ. A distance measure between ƒ and ƒ̂, d(ƒ,ƒ̂), is defined as ##EQU1## where the constants α₁, α₂, β₁ and β₂ are set to substantially minimize quantization error, and e_i is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal. The speech recognition system may also include hidden Markov model and neural network speech classifiers, such as a multilevel perceptron neural network.
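
Each frame's vector ƒ of line spectral pair frequencies is conventionally derived from the frame's linear prediction polynomial A(z) = 1 + a₁z⁻¹ + ... + a_P z⁻ᴾ: the symmetric sum polynomial A(z) + z⁻⁽ᴾ⁺¹⁾A(z⁻¹) and the antisymmetric difference polynomial A(z) − z⁻⁽ᴾ⁺¹⁾A(z⁻¹) have interlaced roots on the unit circle, and the angles of those roots are the line spectral pair frequencies. The abstract does not say how the line spectral pair generator is built, so the following is only a minimal sketch of one generic approach (grid search plus bisection), assuming a stable A(z) with that sign convention; `lpc_to_lsf` and `_roots_on_grid` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence


def _roots_on_grid(func: Callable[[float], float], grid_size: int) -> List[float]:
    """Locate sign changes of func on the open interval (0, pi) and refine each by bisection."""
    roots: List[float] = []
    # The endpoints 0 and pi are skipped: that is where the trivial roots of the
    # sum/difference polynomials lie.
    grid = [math.pi * k / grid_size for k in range(1, grid_size)]
    prev_w, prev_v = grid[0], func(grid[0])
    for w in grid[1:]:
        v = func(w)
        if prev_v * v < 0.0:
            lo, hi, f_lo = prev_w, w, prev_v
            for _ in range(60):  # halve the bracket until it is negligibly small
                mid = 0.5 * (lo + hi)
                f_mid = func(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
        prev_w, prev_v = w, v
    return roots


def lpc_to_lsf(a: Sequence[float], grid_size: int = 1024) -> List[float]:
    """Convert LPC coefficients [1, a1, ..., aP] into P line spectral frequencies (radians).

    Assumes A(z) = 1 + a1 z^-1 + ... + aP z^-P is stable (minimum phase), so that all
    roots of the sum and difference polynomials lie on the unit circle.
    """
    order = len(a) - 1
    padded = list(a) + [0.0]  # a_{P+1} = 0
    # Sum (symmetric) and difference (antisymmetric) polynomial coefficients.
    s = [padded[k] + padded[order + 1 - k] for k in range(order + 2)]
    d = [padded[k] - padded[order + 1 - k] for k in range(order + 2)]
    half = (order + 1) / 2.0

    # On z = e^{jw}, factoring out e^{-jw(P+1)/2} leaves a real cosine series for
    # the symmetric polynomial and j times a real sine series for the antisymmetric one.
    def sum_poly(w: float) -> float:
        return sum(c * math.cos((half - k) * w) for k, c in enumerate(s))

    def diff_poly(w: float) -> float:
        return sum(c * math.sin((half - k) * w) for k, c in enumerate(d))

    return sorted(_roots_on_grid(sum_poly, grid_size) + _roots_on_grid(diff_poly, grid_size))
```

For the trivial predictor `[1.0, 0.0, 0.0, 0.0, 0.0]` (P = 4) the sketch returns the evenly spaced frequencies kπ/5 for k = 1 to 4; multiplying by f_s/(2π) converts them to Hz.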

37 Citations

36 Claims

1. A speech recognition system comprising:
- a line spectral pair generator to generate line spectral pair frequencies from a speech input signal; and
  
  a first speech classifier comprising:
  
  a quantizer for determining a distance measure between an ith speech input signal line spectral pair frequency and an ith reference speech signal line spectral pair frequency, wherein the distance measure, for i=1 to N₁, is proportional to (i) a difference between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency and (ii) a shift of the difference by an ith frequency shifting factor to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise, wherein N₁ is greater than or equal to one and less than or equal to P;
  
  wherein the first speech classifier is capable of receiving output data based on the distance measures and is capable of generating speech classification output data for classifying the speech input signal.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. The speech recognition system of claim 1 wherein the quantizer is for further determining a distance measure between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency, wherein the distance measure, for i=N₁ +1 to P, is derived from (i) a difference between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency and (ii) a weighting of the difference by an ith frequency weighting factor.
  - 3. The speech recognition system of claim 2 wherein the distance measure, d(ƒ,ƒ̂), between the speech input signal, ƒ, and the reference speech signal, ƒ̂, is defined by:

    ##EQU27##

    wherein ƒ_i and ƒ̂_i are the ith line spectral pair frequencies in the speech input signal and the reference speech signal, respectively, the constants α₁, α₂, β₁ and β₂ are set to substantially minimize quantization error, and e_i is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal.
  - 4. The speech recognition system of claim 3 wherein the i=1 to N₁ line spectral pair frequencies are in the 0 to 400 Hz range.
  - 5. The speech recognition system of claim 3 wherein α₁ is set to 1.6, α₂ is set to 0.68, β₁ is set to 0.5, and β₂ is set to 0.25.
  - 6. The speech recognition system of claim 1 wherein the ith frequency shifting factor is proportional to a power spectrum of a linear prediction error at the ith line spectral pair frequency.
  - 7. The speech recognition system of claim 1 further comprising:
    - a quantizer having a codebook having C codewords for generating the distance measure between each of the P line spectral pair frequencies of the speech input signal and each of a plurality of reference speech signals; and
      
      a second speech classifier to receive the output data based on the distance measures and generate speech classification output data to classify the speech input signal as one of u vocabulary words.
  - 8. The speech recognition system of claim 7 wherein the quantizer is a single codebook quantizer having codewords representing a vocabulary of u words.
  - 9. The speech recognition system of claim 7 further comprising:
    - a third speech classifier to receive output data from the first speech classifier and classify the speech input signal as one of the u vocabulary words.
  - 10. The speech recognition system of claim 7 wherein the second speech classifier is a neural network.
  - 11. The speech recognition system of claim 7 wherein the quantizer is a fuzzy matrix quantizer further for generating respective fuzzy distance measures between the respective speech input signal and reference speech signal P line spectral pair frequencies using the corresponding generated distance measures;
    - and wherein the second speech classifier is a neural network and the output data is a fuzzy distance measure proportional to a combination of the generated fuzzy distance measures.
  - 12. The speech recognition system of claim 11 wherein the quantizer is a fuzzy matrix quantizer further for generating an observation sequence of indices indicating the relative closeness between the respective speech input signal and reference speech signal P line spectral pair frequencies;
    - and wherein the second speech classifier includes u hidden Markov models and a fuzzy Viterbi algorithm module for determining a respective probability for each of the u hidden Markov models that the respective hidden Markov model produced the observation sequence.
  - 13. The speech recognition system of claim 1 wherein the speech input signal is represented by P line spectral pair frequencies, where P is a non-negative integer, and the reference speech signal is represented by P line spectral pair frequencies.
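
Claims 1 and 4 split the P line spectral pair frequencies at N₁, with the first N₁ lying in the 0 to 400 Hz range. The claims do not state how N₁ is chosen, so the sketch below assumes, purely for illustration, that N₁ is the number of ascending line spectral pair frequencies at or below 400 Hz, clamped to 1 ≤ N₁ ≤ P. The sample rate and the helper names `lsf_to_hz` and `low_band_count` are hypothetical.

```python
import math
from typing import List, Sequence


def lsf_to_hz(lsf_rad: Sequence[float], sample_rate_hz: float) -> List[float]:
    """Map normalized LSP frequencies in radians (0 to pi) to Hz."""
    return [w * sample_rate_hz / (2.0 * math.pi) for w in lsf_rad]


def low_band_count(lsf_hz: Sequence[float], cutoff_hz: float = 400.0) -> int:
    """Hypothetical rule for N1: how many LSP frequencies lie at or below cutoff_hz.

    LSP frequencies are ascending, so these are the first N1 of them. The result is
    clamped to 1 <= N1 <= P because the claims require N1 >= 1.
    """
    n1 = sum(1 for f in lsf_hz if f <= cutoff_hz)
    return max(1, min(n1, len(lsf_hz)))
```

For example, with an assumed 8 kHz sample rate, frequencies of 150, 380, 650 and 1200 Hz give N₁ = 2.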

14. A speech recognition system comprising:
- means for generating P order line spectral pair frequencies for a speech input signal;
  
  means for determining a difference, for i=1 to N₁, between the ith line spectral pair frequency and an ith line spectral frequency of a reference speech signal;
  
  means for shifting the difference by an ith frequency shifting factor, for i=1 to N₁, to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise;
  
  means for determining a difference, for i=N₁ +1 to P, between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency;
  
  means for weighting of the difference by an ith frequency weighting factor, for i=N₁ +1 to P, wherein the ith frequency shifting and weighting factor is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal; and
  
  means for utilizing the shifted difference to classify the speech input signal.
- Dependent Claims (15, 16, 17)
- - 15. The speech recognition system as in claim 14 wherein the distance measure, d(ƒ,ƒ̂), between the speech input signal, ƒ, and the reference speech signal, ƒ̂, is defined by:

    ##EQU28##

    wherein ƒ_i and ƒ̂_i are the ith line spectral pair frequencies in the speech input signal and the reference speech signal, respectively, the constants α₁, α₂, β₁ and β₂ are set to substantially minimize quantization error, and e_i is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal.
  - 16. The speech recognition system of claim 15 wherein the i=1 to N₁ line spectral pair frequencies are in the 0 to 400 Hz range.
  - 17. The speech recognition system of claim 15 wherein α₁ is set to 1.6, α₂ is set to 0.68, β₁ is set to 0.5, and β₂ is set to 0.25.
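
The exact formula (##EQU28##) is not reproduced in this text, so the following is a hypothetical squared-error form that merely follows the wording of claims 14 to 17 and 21: for i = 1 to N₁ the difference ƒ_i − ƒ̂_i is shifted by β₁e_i and weighted by α₁, and for i = N₁+1 to P it is weighted by β₂e_i and by α₂, with the defaults taken from the constants in claim 17. The squaring and the multiplicative use of β₂e_i are assumptions, and `robust_lsp_distance` is a made-up name.

```python
from typing import Sequence


def robust_lsp_distance(
    f: Sequence[float],
    f_ref: Sequence[float],
    e: Sequence[float],
    n1: int,
    alpha1: float = 1.6,
    alpha2: float = 0.68,
    beta1: float = 0.5,
    beta2: float = 0.25,
) -> float:
    """Hypothetical d(f, f_ref) following the claim language (not the patent's EQU28).

    i = 1..N1    : difference shifted by beta1 * e_i, then weighted by alpha1
    i = N1+1..P  : difference weighted by beta2 * e_i, then weighted by alpha2
    Each term is squared (an assumption) and the terms are summed.
    """
    if not (len(f) == len(f_ref) == len(e)):
        raise ValueError("f, f_ref and e must all have P entries")
    if not 1 <= n1 <= len(f):
        raise ValueError("N1 must satisfy 1 <= N1 <= P")
    total = 0.0
    for i, (fi, ri, ei) in enumerate(zip(f, f_ref, e)):
        diff = fi - ri
        if i < n1:  # 0-based index: the first N1 (low-frequency) LSP frequencies
            total += alpha1 * (diff - beta1 * ei) ** 2
        else:
            total += alpha2 * (beta2 * ei * diff) ** 2
    return total
```

With ƒ = (100, 200, 300), ƒ̂ = (90, 210, 300), e = (2, 4, 1) and N₁ = 1, this form gives 1.6·9² + 0.68·10² = 197.6.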

18. A method of generating a robust distance measure in a speech recognition system comprising the steps of:
- generating P order line spectral pair frequencies for a speech input signal;
  
  determining a difference, for i=1 to N₁, between the ith line spectral pair frequency and an ith line spectral frequency of a reference speech signal;
  
  shifting the difference, for i=1 to N₁, by an ith frequency shifting factor to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise; and
  
  utilizing the shifted difference to classify the speech input signal.
- Dependent Claims (19, 20, 21, 22)
- - 19. The method of claim 18 further comprising the step of:
    - determining a difference, for i=N₁ +1 to P, between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency; and
      
      weighting of the difference, for i=N₁ +1 to P, by an ith frequency weighting factor.
  - 20. The method of claim 18 further comprising the steps of:
    - determining respective differences, for i=N₁ +1 to P, between speech input signal line spectral pair frequencies and corresponding reference speech signal line spectral pair frequencies; and
      
      weighting of the respective differences, for i=N₁ +1 to P, by respective frequency weighting factors.
  - 21. The method of claim 20 further comprising the steps of:
    - weighting the differences, for i=1 to N₁, by a first weighting constant, α₁;
      
      weighting the differences, for i=N₁ +1 to P, by a second weighting constant, α₂;
      
      adding the respective differences together to generate a distance measure between the speech input signal and the reference speech signal; and
      
      utilizing the P differences to classify the speech input signal.
  - 22. The method of claim 18 wherein the ith frequency shifting factor is proportional to a power spectrum of a linear prediction error at the ith line spectral pair frequency.
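
Claim 22 ties the shifting factor to the power spectrum of the linear prediction error at each line spectral pair frequency. A minimal sketch, assuming the A(z) = 1 + a₁z⁻¹ + ... convention, zero samples before the frame start, and a plain periodogram normalization |E(e^{jω})|²/N (the text does not specify windowing or scaling); all function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def lp_residual(signal: Sequence[float], a: Sequence[float]) -> List[float]:
    """Prediction error e(n) = s(n) - s_hat(n) with s_hat(n) = -(a1 s(n-1) + ... + aP s(n-P)).

    Equivalently e(n) = sum_k a_k s(n-k) with a_0 = 1; samples before the frame are zero.
    """
    order = len(a) - 1
    return [
        sum(a[k] * signal[n - k] for k in range(order + 1) if n - k >= 0)
        for n in range(len(signal))
    ]


def error_power_at(residual: Sequence[float], w: float) -> float:
    """Periodogram |E(e^{jw})|^2 / N of the residual at one normalized frequency w (radians)."""
    acc = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(residual))
    return abs(acc) ** 2 / len(residual)


def error_power_at_lsfs(
    signal: Sequence[float], a: Sequence[float], lsf_rad: Sequence[float]
) -> List[float]:
    """Hypothetical e_i: the residual's power spectrum sampled at each LSP frequency."""
    residual = lp_residual(signal, a)
    return [error_power_at(residual, w) for w in lsf_rad]
```

As a sanity check, for s(n) = 0.9ⁿ and a = [1, −0.9] the residual is a unit impulse, so the error power is 1/N at every frequency.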

23. A method of robust speech recognition in an automotive environment comprising the steps of:
- receiving a speech input signal;
  
  representing the speech input signal with a vector ƒ of P line spectral pair frequencies;
  
  representing a codeword in a quantizer codebook as a vector ƒ̂ of P line spectral pair frequencies; and
  
  determining a distance measure between the vector ƒ and the vector ƒ̂, wherein the distance measure, d(ƒ,ƒ̂), is defined by:
  
  ##EQU29##
  
  wherein the constants α₁, α₂, β₁ and β₂ are set to substantially minimize quantization error, and e_i is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal.
- Dependent Claims (24, 25, 26)
- - 24. The method as in claim 23 further comprising the steps of:
    - using the distance measure, d(ƒ,ƒ̂), to generate fuzzy distance measures in a fuzzy matrix quantization (FMQ)/hidden Markov model (HMM) speech recognition system.
  - 25. The method as in claim 23 further comprising the steps of:
    - using the distance measure, d(ƒ,ƒ̂), to generate fuzzy distance measures in an FMQ/HMM/neural network speech recognition system.
  - 26. The method as in claim 23 wherein the FMQ includes codebooks for each of u speech recognition system vocabulary words.
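
Claims 24 and 25 turn the distance measure into fuzzy distance measures inside a fuzzy matrix quantizer. The text here does not give the membership rule, so this sketch swaps in the standard fuzzy c-means membership u_j = 1/Σ_k (d_j/d_k)^(1/(m−1)) with an assumed fuzziness m = 2, treating each d_j as an already squared matrix distance summed over the T frames of the input matrix. The combined fuzzy distance is the membership-weighted sum Σ_j u_j d_j. All names are hypothetical, and `frame_dist` stands for any per-frame distance, for example one of the forms of d(ƒ,ƒ̂) recited in the claims.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def matrix_distance(
    x: Sequence[Frame], codeword: Sequence[Frame], frame_dist: Callable[[Frame, Frame], float]
) -> float:
    """Matrix-quantizer distance: sum of per-frame distances over the T frames."""
    return sum(frame_dist(xf, cf) for xf, cf in zip(x, codeword))


def fuzzy_memberships(distances: Sequence[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means style memberships u_j = 1 / sum_k (d_j / d_k)^(1 / (m - 1))."""
    zeros = [j for j, d in enumerate(distances) if d == 0.0]
    if zeros:  # an exact match takes full membership
        return [1.0 if j == zeros[0] else 0.0 for j in range(len(distances))]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in distances) for dj in distances]


def fuzzy_distance(distances: Sequence[float], m: float = 2.0) -> float:
    """Membership-weighted combination of the codeword distances."""
    u = fuzzy_memberships(distances, m)
    return sum(uj * dj for uj, dj in zip(u, distances))
```

With codeword distances [1.0, 2.0] the memberships are [2/3, 1/3] and the fuzzy distance is 4/3.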

27. An apparatus comprising:
- a quantizer that receives speech input signal line spectral pair frequencies, that includes at least one codebook having reference speech signal line spectral pair frequencies, and that determines a distance measure between an ith speech input signal line spectral pair frequency and an ith reference speech signal line spectral pair frequency, wherein the distance measure, for i=1 to N₁, is proportional to (i) a difference between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency and (ii) a subtraction from the difference of an ith frequency shifting factor to at least partially compensate for frequency shifting of the ith speech input signal line spectral pair frequency by acoustic noise, wherein N₁ is greater than or equal to one and less than or equal to P.
- Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 36)
- - 28. The apparatus of claim 27 wherein the quantizer is for further determining a distance measure between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency, wherein the distance measure, for i=N₁ +1 to P, is derived from (i) a difference between the ith speech input signal line spectral pair frequency and the ith reference speech signal line spectral pair frequency and (ii) a weighting of the difference by an ith frequency weighting factor.
  - 29. The apparatus of claim 28 wherein the quantizer is for further determining a distance measure, d(ƒ,ƒ̂), between the speech input signal, ƒ, and the reference speech signal, ƒ̂, that is defined by:

    ##EQU30##

    wherein ƒ_i and ƒ̂_i are the ith line spectral pair frequencies in the speech input signal and the reference speech signal, respectively, the constants α₁, α₂, β₁ and β₂ are set to substantially minimize quantization error, and e_i is the error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal.
  - 30. The apparatus of claim 27 wherein the ith frequency shifting factor is proportional to a power spectrum of a linear prediction error at the ith line spectral pair frequency.
  - 31. The apparatus of claim 27 further comprising:
    - a quantizer having a codebook having C codewords for generating the distance measure between each of the P line spectral pair frequencies of the speech input signal and each of a plurality of reference speech signals;
      
      a first speech classifier to receive output data based on the distance measures and generate speech classification output data to classify the speech input signal as one of u vocabulary words; and
      
      a second speech classifier to receive output data from the first speech classifier and classify the speech input signal as one of the u vocabulary words.
  - 32. The apparatus of claim 31 wherein the second speech classifier is a neural network.
  - 33. The apparatus of claim 31 wherein the quantizer is a fuzzy matrix quantizer further for generating respective fuzzy distance measures between the respective speech input signal and reference speech signal P line spectral pair frequencies using the corresponding generated distance measures;
    - and wherein the second speech classifier is a neural network and the output data is a fuzzy distance measure proportional to a combination of the generated fuzzy distance measures.
  - 34. The apparatus of claim 33 wherein the quantizer is a fuzzy matrix quantizer further for generating an observation sequence of indices indicating the relative closeness between the respective speech input signal and reference speech signal P line spectral pair frequencies;
    - and wherein the second speech classifier is u hidden Markov models and a fuzzy Viterbi algorithm module for determining a respective probability for each of the u hidden Markov models that the respective hidden Markov model produced the observation sequence.
  - 35. The apparatus of claim 27 wherein the speech input signal is represented by P line spectral pair frequencies, where P is a non-negative integer, and the reference speech signal is represented by P line spectral pair frequencies.
  - 36. The apparatus of claim 27 wherein the ith frequency shifting factor is an error power spectrum of the speech input signal and a predicted speech input signal at the ith line spectral pair frequency of the speech input signal.
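
Claim 34 scores a fuzzy observation sequence against u hidden Markov models with a fuzzy Viterbi algorithm. One common formulation, assumed here rather than taken from the text, replaces the discrete emission probability b_j(k) with the membership-weighted b_j(O_t) = Σ_k u_t(k)b_j(k) and then runs an ordinary log-domain Viterbi recursion; the vocabulary word whose model gives the highest best-path score is chosen. The model layout (init, trans, emit) and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (initial probs [N], transition probs [N][N], codeword emission probs [N][C])
Model = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def _log(x: float) -> float:
    return math.log(x) if x > 0.0 else float("-inf")


def fuzzy_viterbi_log_score(
    memberships: Sequence[Sequence[float]],
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> float:
    """Best-path log probability for a T x C sequence of fuzzy codeword memberships.

    Emission for state j at time t is sum_k u_t(k) * b_j(k) (assumed formulation).
    """
    n_states = len(init)

    def fuzzy_emit(state: int, u: Sequence[float]) -> float:
        return sum(uk * bk for uk, bk in zip(u, emit[state]))

    delta = [_log(init[j]) + _log(fuzzy_emit(j, memberships[0])) for j in range(n_states)]
    for u in memberships[1:]:
        # Standard Viterbi step: best predecessor plus the fuzzy emission term.
        delta = [
            max(delta[i] + _log(trans[i][j]) for i in range(n_states))
            + _log(fuzzy_emit(j, u))
            for j in range(n_states)
        ]
    return max(delta)


def classify(memberships: Sequence[Sequence[float]], models: Sequence[Model]) -> int:
    """Index of the vocabulary word whose HMM gives the highest best-path score."""
    scores = [fuzzy_viterbi_log_score(memberships, *model) for model in models]
    return max(range(len(scores)), key=lambda w: scores[w])
```

Working in the log domain avoids underflow on long observation sequences; zero probabilities map to negative infinity, so impossible paths simply never win the `max`.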

Current Assignee
RPX Corporation
Original Assignee
Advanced Micro Devices, Inc.
Inventors
Cong, Lin; Asghar, Safdar M.
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Storm, Donald L.

Application Number

US08/883,980
Time in Patent Office

977 Days
Field of Search

704/238, 704/236, 704/209, 704/207, 704/206, 704/276, 704/205
US Class Current

704/238
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/10   using distance or distortio...

G10L 15/20   Speech recognition techniqu...
