Split matrix quantization with split vector quantization error compensation and selective enhanced processing for robust speech recognition
First Claim
1. A speech recognition system comprising:
a split vector quantizer to receive first parameters of an input signal, to partition the first parameters into S1 partitions, and to generate a first quantization observation sequence;
a first speech classifier to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech classification output data;
a split matrix quantizer to receive second parameters of the input signal, to partition the second parameters into S2 partitions, and generate a second quantization observation sequence;
a second speech classifier to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech classification output data; and
a hybrid decision generator to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to recognize the input signal from the third respective speech classification data.
Abstract
A speech recognition system utilizes both split matrix and split vector quantizers as front ends to a second-stage speech classifier, such as hidden Markov models (HMMs), to efficiently use processing resources and improve speech recognition performance. Fuzzy split matrix quantization (FSMQ) exploits the "evolution" of the speech short-term spectral envelopes as well as frequency domain information, while fuzzy split vector quantization (FSVQ) operates primarily on frequency domain information. The time domain information available to the matrix quantizer may be substantially limited, which may introduce error into the matrix quantization, and the FSVQ may provide error compensation. Additionally, acoustic noise may predominantly affect particular frequency domain subbands. The system exploits this localized noise by allocating enhanced processing to the noise-affected input signal parameters, thereby minimizing the noise influence. The enhanced processing includes a weighted line spectral pair (LSP) and signal-energy-related distance measure, applied both when training codebooks with the Linde-Buzo-Gray (LBG) algorithm and during recognition. Multiple codebooks may also be combined into single respective codebooks for split matrix and split vector quantization to lower processing resource demands.
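The hybrid decision stage described in the abstract can be illustrated with a minimal sketch. The weighted-sum combination rule, the `alpha` mixing weight, and the function names below are illustrative assumptions, not the patent's actual formulation: each branch (quantizer plus classifier) scores every candidate word, and the combined score selects the recognized word.

```python
# Illustrative sketch of the hybrid decision generator: the split vector
# branch and the split matrix branch each score every candidate word, and
# the combined score picks the recognized word. The weighted-sum rule and
# the `alpha` mixing weight are assumptions, not the patent's formulation.

def hybrid_decide(svq_scores, smq_scores, alpha=0.5):
    """Combine per-word log-probabilities from the two branches and
    return the word with the highest combined score."""
    combined = {
        word: alpha * svq_scores[word] + (1.0 - alpha) * smq_scores[word]
        for word in svq_scores
    }
    return max(combined, key=combined.get)

# Toy scores: the vector branch favors "yes", the matrix branch "no".
svq = {"yes": -12.0, "no": -15.0}
smq = {"yes": -14.0, "no": -13.0}
print(hybrid_decide(svq, smq))  # prints "yes" (-13.0 beats -14.0)
```

Because the second branch's output compensates for errors in the first (as the claims below recite), even a simple fixed-weight combination can recover words that one branch alone would misclassify.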
58 Claims
1. A speech recognition system comprising:
a split vector quantizer to receive first parameters of an input signal, to partition the first parameters into S1 partitions, and to generate a first quantization observation sequence;
a first speech classifier to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech classification output data;
a split matrix quantizer to receive second parameters of the input signal, to partition the second parameters into S2 partitions, and generate a second quantization observation sequence;
a second speech classifier to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech classification output data; and
a hybrid decision generator to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to recognize the input signal from the third respective speech classification data.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
20. A speech recognition system comprising:
a split vector quantizer to receive first parameters of an input signal, to partition the first parameters into S1 partitions to generate first quantization output data, wherein the first quantization output data includes a first observation sequence;
a first speech classifier to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech classification output data;
a split matrix quantizer to receive second parameters of the input signal, to partition the second parameters into S2 partitions to generate second quantization output data, wherein the second quantization output data includes a second observation sequence;
a second speech classifier to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech classification output data; and
a hybrid decision generator to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to recognize the input signal from the third respective speech classification data.
View Dependent Claims (21, 22, 23, 31, 32, 33, 56)
24. A speech recognition system comprising:
a split vector quantizer to receive line spectral pair input data corresponding to an input speech signal and to generate a first quantization observation sequence;
first hidden Markov models to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech recognition probabilities from each of the first hidden Markov models;
a split matrix quantizer to receive temporally associated line spectral pair input data corresponding to the input speech signal and to generate a second quantization observation sequence;
second hidden Markov models to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech recognition probabilities from each of the second hidden Markov models; and
a hybrid decision generator to utilize the first and second respective speech recognition probabilities to generate input signal recognition information and to recognize the input speech signal from the input signal recognition information.
View Dependent Claims (25, 26)
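Claim 24's hidden Markov models generate "respective speech recognition probabilities" for each model. A minimal sketch of that scoring step, using a discrete HMM and the standard forward algorithm (an assumption for illustration; the claim does not fix the scoring recursion):

```python
import numpy as np

# Illustrative sketch: each candidate word has its own discrete HMM, and a
# quantizer's observation sequence (codeword indices) is scored against every
# model with the forward algorithm. Parameter names are assumptions.

def forward_log_prob(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM.

    obs: list of codeword indices; log_pi: (S,) initial state log-probs;
    log_A: (S, S) transition log-probs; log_B: (S, K) emission log-probs.
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Log-sum-exp over previous states for each next state, then emit.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))

# Toy 2-state, 2-symbol model with uniform parameters: any length-3
# sequence has probability 0.5**3 = 0.125.
u = np.log(np.full((2, 2), 0.5))
pi = np.log(np.full(2, 0.5))
print(forward_log_prob([0, 1, 0], pi, u, u))  # prints log(0.125) ≈ -2.079
```

In a recognizer following claim 24, one such score per word model would be produced for each branch, and the hybrid decision generator would then combine the two probability sets.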
27. An apparatus comprising:
a first speech classifier to operate on S1 partitions of first parameters of an input signal and to provide first output data relating the input signal to first reference data, wherein the input signal parameters include frequency and time domain parameters, wherein S1 is an integer greater than one and the first speech classifier further includes a first set of hidden Markov models;
a second speech classifier to operate on S2 partitions of second parameters of the input signal and to provide second output data relating the input signal to second reference data, wherein the second parameters of the input signal include the frequency domain parameters, wherein S2 is an integer greater than one and the second speech classifier further includes a second set of hidden Markov models; and
a hybrid decision generator to combine the first output data and the second output data so that the second output data compensates for errors in the first output data and to generate third output data to classify the input signal.
View Dependent Claims (28, 29, 30, 34, 35, 36)
37. A method comprising:
partitioning first parameters of an input signal into S1 partitions, wherein the parameters include frequency and time domain parameters;
processing the partitioned first parameters of the input signal using a first speech classifier to relate the partitioned first parameters to first reference data;
providing first output data relating the input signal to first reference data, wherein the first output data is provided from the first speech classifier to a second speech classifier;
processing the first output data using the second speech classifier;
providing second output data from the second speech classifier;
partitioning second parameters of the input signal into S2 partitions, wherein the parameters include frequency domain parameters;
processing the partitioned second parameters of the input signal using a third speech classifier to relate the partitioned second parameters to second reference data;
providing third output data relating the input signal to the second reference data, wherein the third output data is provided from the third speech classifier to a fourth speech classifier;
processing the third output data using the fourth speech classifier;
providing fourth output data from the fourth speech classifier;
combining the third output data and fourth output data to compensate for speech classification errors in the third output data; and
classifying the input signal as recognized speech.
View Dependent Claims (38, 39, 40, 41, 42, 43, 44, 45, 46)
47. A method of recognizing speech comprising:
receiving an input signal;
determining parameters of the input signal;
split vector quantizing the parameters of the input signal to obtain first quantization output data;
classifying the first quantization output data using a first probabilistic process;
split matrix quantizing the parameters of the input signal to obtain second quantization output data;
classifying the second quantization output data using a second probabilistic process; and
generating an identification of the input signal as recognized speech based upon the classification of the first and second quantization output data.
View Dependent Claims (48, 49, 50, 51, 52, 53, 54, 55)
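The split vector quantizing step in claim 47 partitions a parameter vector and quantizes each partition against its own sub-codebook. A minimal sketch, where the sub-codebook shapes and the plain squared-Euclidean nearest-neighbor rule are assumptions for illustration (the patent's actual distance measure is weighted and the quantization is fuzzy):

```python
import numpy as np

def split_vector_quantize(params, codebooks):
    """Quantize each partition of `params` against its own sub-codebook and
    return the sequence of codeword indices (the observation sequence)."""
    indices = []
    start = 0
    for cb in codebooks:  # cb: (num_codewords, partition_dim) array
        dim = cb.shape[1]
        part = params[start:start + dim]
        # Nearest codeword by squared Euclidean distance (illustrative only).
        dists = np.sum((cb - part) ** 2, axis=1)
        indices.append(int(np.argmin(dists)))
        start += dim
    return indices

# Two partitions (S = 2): a 2-dim partition and a 1-dim partition.
codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([[5.0], [9.0]])]
obs = split_vector_quantize(np.array([0.9, 1.1, 8.0]), codebooks)
print(obs)  # prints [1, 1]: nearest codewords are [1, 1] and [9.0]
```

Splitting the vector keeps each sub-codebook small, which is one way the claimed approach lowers processing resource demands relative to quantizing the full vector against a single large codebook.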
57. A method of recognizing speech comprising:
receiving an input signal;
determining D order line spectral pairs for TO frames of the input signal, wherein D and TO are integers;
determining parameters related to the energy of the input signal, wherein the parameters related to the energy of the input signal include the input signal energy and a first derivative of the input signal energy;
split vector quantizing the D order line spectral pairs for each of the TO frames and the parameters related to the input signal energy;
classifying the input signal using the split vector quantization of the D order line spectral pairs;
split matrix quantizing the D order line spectral pairs and the parameters related to the input signal energy for T matrices of frames of the input signal, wherein T is defined as int(TO/N), and N is the number of input signal frames represented in each of the T matrices;
classifying the input signal using the split matrix quantization of the D order line spectral pairs and parameters related to the input signal energy;
combining the classifications of the input signal to generate a combination of the classifications; and
recognizing the input signal as particular speech from the combination of the classifications.
View Dependent Claims (58)
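The frame-grouping arithmetic in claim 57 (T = int(TO/N) matrices of N frames each) can be sketched directly. Dropping any trailing frames beyond T * N is an assumption for illustration; the claim does not say how a remainder is handled:

```python
def group_frames(frames, N):
    """Group TO per-frame parameter vectors into T = int(TO / N) matrices
    of N consecutive frames each, as in the split matrix quantizing step.
    Trailing frames beyond T * N are dropped (an illustrative assumption)."""
    T = len(frames) // N  # int(TO / N)
    return [frames[t * N:(t + 1) * N] for t in range(T)]

frames = [[0.1 * i] for i in range(10)]  # TO = 10 toy one-dimensional frames
matrices = group_frames(frames, N=3)
print(len(matrices))  # prints 3: int(10 / 3) matrices of N = 3 frames each
```

Each resulting matrix spans N consecutive frames, which is what lets the split matrix quantizer capture the spectral envelope's evolution over time, unlike the per-frame split vector quantizer.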
Specification