Matrix quantization with vector quantization error compensation and neural network postprocessing for robust speech recognition

US 6,347,297 B1
Filed: 10/05/1998
Issued: 02/12/2002
Est. Priority Date: 10/05/1998
Status: Expired due to Term

Abstract

A speech recognition system utilizes both matrix and vector quantizers as front ends to a second stage speech classifier such as hidden Markov models (HMMs) and utilizes neural network postprocessing to, for example, improve speech recognition performance. Matrix quantization exploits the “evolution” of the speech short-term spectral envelopes as well as frequency domain information, and vector quantization (VQ) primarily operates on frequency domain information. Time domain information may be substantially limited which may introduce error into the matrix quantization, and the VQ may provide error compensation. The matrix and vector quantizers may split spectral subbands to target selected frequencies for enhanced processing and may use fuzzy associations to develop fuzzy observation sequence data. A mixer provides a variety of input data to the neural network for classification determination. The neural network's ability to analyze the input data generally enhances recognition accuracy. Fuzzy operators may be utilized to reduce quantization error. Multiple codebooks may also be combined to form single respective codebooks for split matrix and split vector quantization to reduce processing resources demand.
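
As a rough illustration of the distinction the abstract draws, the following sketch quantizes the same frames two ways: frame by frame (vector quantization) and in blocks of consecutive frames (matrix quantization), so that the second path also sees how the spectrum evolves over time. The codebooks, the frame values, the squared Euclidean distance, and the function names are all illustrative assumptions and are not taken from the patent.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(frames, codebook):
    """Map each frame (one spectral vector) to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in frames]


def matrix_quantize(frames, codebook, n):
    """Group every n consecutive frames into a matrix and map each matrix to
    the index of its nearest code matrix. Leftover frames are dropped."""
    indices = []
    for start in range(0, len(frames) - n + 1, n):
        # Flatten the n-frame block so it can be compared element by element.
        block = [x for frame in frames[start:start + n] for x in frame]
        indices.append(min(
            range(len(codebook)),
            key=lambda k: sq_dist(block, [x for row in codebook[k] for x in row]),
        ))
    return indices


# Hypothetical two-coefficient frames and toy codebooks.
frames = [[0.1, 0.2], [0.1, 0.3], [0.9, 0.8], [0.8, 0.9]]
vq_codebook = [[0.0, 0.25], [1.0, 1.0]]
mq_codebook = [
    [[0.1, 0.2], [0.1, 0.3]],
    [[0.9, 0.8], [0.8, 0.9]],
    [[0.1, 0.2], [0.9, 0.8]],
]

vq_seq = vector_quantize(frames, vq_codebook)       # one index per frame
mq_seq = matrix_quantize(frames, mq_codebook, n=2)  # one index per 2-frame block
```

Each resulting index sequence plays the role of a quantization observation sequence that a downstream classifier, such as a set of HMMs, would consume.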

87 Citations

44 Claims

1. A speech recognition system comprising:
- a vector quantizer to receive first parameters of an input signal and to generate a first quantization observation sequence;
  
  a first speech classifier to receive the first quantization observation sequence from the vector quantizer and to generate first respective speech classification output data;
  
  a matrix quantizer to receive second parameters of the input signal, and to generate a second quantization observation sequence;
  
  a second speech classifier to receive the second quantization observation sequence from the matrix quantizer and to generate second respective speech classification output data;
  
  a mixer to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to generate output data from the first, second, and third speech classification data; and
  
  a neural network to receive output data from the mixer and to determine fourth respective speech classification output data.
- Dependent claims:
  - 2. The speech recognition system as in claim 1 wherein the first and second speech classifiers are a first and second set, respectively, of hidden Markov models.
  - 3. The speech recognition system as in claim 1, wherein the vector quantizer is a split vector quantizer and the first parameters are partitioned into S₁ partitions, wherein S₁ is greater than 1.
  - 4. The speech recognition system as in claim 1 wherein the matrix quantizer is a split matrix quantizer and the second parameters are partitioned into S₂ partitions, wherein S₂ is greater than 1.
  - 5. The speech recognition system as in claim 1 wherein the vector and matrix quantizers utilize respective single codebooks.
  - 6. The speech recognition system as in claim 1 wherein the input signal for reception by the vector quantizer and matrix quantizer is a spoken word.
  - 7. The speech recognition system as in claim 1 wherein the vector and matrix quantizers are split vector and split matrix quantizers, respectively, and the first parameters are partitioned into S₁ partitions and the second parameters are partitioned into S₂ partitions.
  - 8. The speech recognition system as in claim 7 wherein the split vector quantizer is capable of partitioning the first parameters to separate first parameters primarily affected by localized noise from the remaining first parameters, and the split matrix quantizer is capable of partitioning the second parameters to separate second parameters primarily affected by localized noise from the remaining second parameters.
  - 9. The speech recognition system as in claim 8 wherein the first and second parameters include line spectral pair coefficients, S₁ and S₂ equal two, the first parameters in a first submatrix of the split vector quantizer include the first N₁ of P order line spectral pair coefficients, and the second parameters in a first submatrix of the split matrix quantizer include the first N₁ of P order line spectral pair coefficients.
  - 10. The speech recognition system as in claim 9 wherein the split vector and split matrix quantizers respectively are capable of determining a distance measure between an i-th line spectral pair frequency of the input signal and respective i-th order line spectral pair frequencies of a plurality of codewords, wherein the distance measure, for i=1 to N₁, is proportional to (i) a difference between the i-th input signal line spectral pair frequencies and the i-th order line spectral pair frequencies of the codewords and (ii) a shift of the difference by an i-th frequency shifting factor, wherein N₁ is greater than or equal to one and less than or equal to P, and P is the highest order line spectral pair frequency of the input signal and codewords.
  - 11. The speech recognition system as in claim 10 wherein noise frequencies are primarily located in the frequency range substantially coinciding with the frequency range represented by line spectral pairs i=1 to N₁.
  - 12. The speech recognition system as in claim 9 wherein the split vector and split matrix quantizers include respective enhanced distance measures which are capable of operating on the first submatrix of the split vector quantizer and the first submatrix of the split matrix quantizer, respectively.
  - 13. The speech recognition system as in claim 1 wherein the first parameters of the input signal for reception by the vector quantizer include P order line spectral pairs of the input signal, and the second parameters of the input signal for reception by the matrix quantizer include temporally related P order line spectral pairs, wherein P is an integer.
  - 14. The speech recognition system as in claim 13 wherein P equals twelve.
  - 15. The speech recognition system as in claim 1 wherein the first parameters of the input signal include the energy of the input signal and first and second derivatives of the input signal energy.
  - 16. The speech recognition system as in claim 1 wherein the vector and matrix quantizers utilize fuzzy quantization.
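
Claims 7 through 10 describe splitting the line spectral pair (LSP) coefficients into partitions and applying a shifted distance measure to the first partition. A minimal sketch of that idea follows, assuming two partitions, a squared-error form, and made-up shift values. The claim text here only says the measure is proportional to the difference and to a shift of that difference, so the exact formula, the treatment of the second partition, and every name below are hypothetical.

```python
def split_lsp(lsp, n1):
    """Partition P LSP frequencies into two groups (S = 2): the first n1
    coefficients and the remaining P - n1 coefficients."""
    return lsp[:n1], lsp[n1:]


def shifted_distance(lsp, codeword, shifts):
    """Distance over the first partition: each difference is shifted by a
    per-coefficient factor before squaring (squaring is an assumption)."""
    return sum(((f - c) - e) ** 2 for f, c, e in zip(lsp, codeword, shifts))


def plain_distance(lsp, codeword):
    """Unshifted squared distance, used here for the remaining partition."""
    return sum((f - c) ** 2 for f, c in zip(lsp, codeword))


P, N1 = 4, 2                      # hypothetical order and split point
inp = [0.10, 0.20, 0.40, 0.60]    # hypothetical input LSP frequencies
cw = [0.08, 0.21, 0.40, 0.55]     # hypothetical codeword LSP frequencies
shifts = [0.02, -0.01]            # hypothetical per-coefficient shift factors

lo_in, hi_in = split_lsp(inp, N1)
lo_cw, hi_cw = split_lsp(cw, N1)

d_low = shifted_distance(lo_in, lo_cw, shifts)  # partition assumed noise-affected
d_high = plain_distance(hi_in, hi_cw)           # remaining partition
```

In this toy case the shifts exactly cancel the offset in the first partition, so `d_low` is essentially zero while `d_high` reflects only the mismatch in the fourth coefficient.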

17. A speech recognition system comprising:
- a vector quantizer to receive first parameters of an input signal and to generate a first quantization observation sequence, wherein the first parameters are grouped into S₁ partition(s);
  
  a split matrix quantizer to receive second parameters of the input signal and to generate a second quantization observation sequence, wherein the second parameters are grouped into S₂ partition(s);
  
  a first speech classifier to receive the first quantization observation sequence from the vector quantizer and generate first respective speech classification output data;
  
  a second speech classifier to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech classification output data;
  
  a mixer to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to provide output data based on the first, second, and third classification data; and
  
  a neural network to receive the mixer output data and to generate fourth respective speech classification data based on the mixer output data.
- Dependent claims:
  - 18. The speech recognition system as in claim 17 wherein S₁=S₂=1.
  - 19. The speech recognition system as in claim 17 wherein the first and second speech classifiers are a first and second set, respectively, of hidden Markov models.

20. An apparatus comprising:
- a first speech classifier to operate on S₁ group(s) of first parameters of an input signal and to provide first output data relating the input signal to first reference data, wherein the first input signal parameters include frequency and time domain parameters, wherein S₁ is a positive integer;
  
  a second speech classifier to operate on S₂ group(s) of second parameters of the input signal and to provide second output data relating the second input signal to second reference data, wherein the second parameters of the input signal include the frequency domain parameters, wherein S₂ is a positive integer;
  
  a mixer to combine the first output data and the second output data into third output data so that the second output data compensates for errors in the first output data; and
  
  a neural network to receive selected output data from the mixer and to generate output data to classify the input signal.
- Dependent claims:
  - 21. The apparatus as in claim 20 wherein S₁=S₂=1.
  - 22. The apparatus as in claim 20 wherein the first speech classifier is capable of operating on each of the S₁ partitions of the first parameters of the input signal using respective distance measures to relate the respective partitioned first parameters to partitioned first reference data, and the second speech classifier is capable of operating on each of the S₂ partitions of the second parameters of the input signal using respective distance measures to relate the respective partitioned second parameters to partitioned second reference data.
  - 23. The apparatus as in claim 22 wherein at least one of the S₁ partitions of first parameters of the input signal are corrupted by noise and the respective distance measure to relate the respective noise corrupted first parameters to partitioned first reference data has noise rejection features; and
  - 24. The apparatus as in claim 20 wherein S₁ is greater than one and S₂ is greater than one.
  - 25. The apparatus as in claim 20 wherein the first speech classifier includes a fuzzy split matrix quantizer, and the second speech classifier includes a fuzzy split vector quantizer.
  - 26. The apparatus as in claim 25 wherein the first speech classifier further includes a first set of hidden Markov models, and the second speech classifier further includes a second set of hidden Markov models.
  - 27. The apparatus as in claim 20 wherein the second speech classifier is capable of operating on frequency domain parameters of the input signal.
  - 28. The apparatus as in claim 20 wherein the frequency domain parameters are P order line spectral pair frequencies, wherein P is an integer.
  - 29. The apparatus as in claim 20 wherein the first and second parameters of the input signal further include input signal energy related parameters.
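
Claim 20 ends with a neural network that receives selected mixer output and classifies the input signal. The sketch below shows one way such a stage could look, assuming a single affine layer followed by softmax over a two-word vocabulary. The network shape, the weights, the score values, and the function names are hypothetical, since the claim text does not specify the network's architecture.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(first, second, third, weights, biases):
    """One-layer network: concatenate the three per-word score vectors,
    apply one affine map per vocabulary word, then normalize with softmax.
    Returns the winning word index and the full probability vector."""
    x = first + second + third
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs


# Hypothetical per-word scores for a two-word vocabulary.
first = [-10.0, -12.0]    # from the first speech classifier
second = [-11.0, -9.0]    # from the second speech classifier
third = [-15.5, -16.5]    # combined data produced by the mixer

# Hypothetical weights: each output unit sums the three scores of its own word.
weights = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
]
biases = [0.0, 0.0]

word, probs = classify(first, second, third, weights, biases)
```

In a trained system the weights would be learned from data rather than fixed by hand as they are here.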

30. A method comprising the steps of:
- processing first parameters of the input signal to relate the first parameters to first reference data wherein the first parameters include frequency and time domain information;
  
  generating first output data relating the first parameters to reference data;
  
  processing second parameters of the input signal to relate the second parameters to second reference data wherein the second parameters include frequency domain information;
  
  generating second output data relating the second parameters to the second reference data;
  
  combining the first output data and second output data into third output data to compensate for errors in the first output data; and
  
  providing the first, second, and third output data to a neural network to classify the input signal.
- Dependent claims:
  - 31. The method as in claim 30 further comprising the steps of:
  - 32. The method as in claim 31 wherein the step of partitioning first parameters of an input signal into S₁ groups comprises the step of:
    - partitioning the first parameters of the input signal to group at least one subset of the first parameters which are generally corrupted by localized noise.
  - 33. The method as in claim 32 wherein the step of partitioning first parameters of an input signal into S₁ groups comprises the step of:
    - partitioning the first parameters of the input signal to group at least one subset of the first parameters which are generally corrupted by localized noise.
  - 34. The method as in claim 30 wherein the first parameters and first reference data include respective corresponding line spectral pair frequencies, the second parameters and second reference data include respective corresponding line spectral pair frequencies, and the subset of the first parameters which are generally corrupted by localized noise are the m-th through n-th line spectral frequencies, the step of processing the first parameters further comprising the step of:
    - matrix quantizing the m-th through n-th line spectral frequencies of the first parameters using a distance measure proportional to (i) a difference between the i-th input signal line spectral pair frequencies and the i-th order first reference data line spectral pair frequencies and (ii) a weighting of the difference by an i-th frequency weighting factor, wherein m is less than or equal to i, and n is greater than or equal to i; and

      the step of processing the second parameters further comprising the step of:

      vector quantizing the m-th through n-th line spectral frequencies of the second parameters using a distance measure proportional to (i) a difference between the i-th input signal line spectral pair frequencies and the i-th order second reference data line spectral pair frequencies and (ii) a weighting of the difference by an i-th frequency weighting factor, wherein m is less than or equal to i.
  - 35. The method as in claim 30 wherein the step of processing the first parameters of the input signal comprises the step of:
    - matrix quantizing each of the partitioned first parameters of the input signal; and

      the step of processing second parameters of the input signal comprises the step of:

      vector quantizing each of the second parameters of the input signal.
  - 36. The method as in claim 35 wherein the step of matrix quantizing further comprises the step of:
    - fuzzy matrix quantizing each of the first parameters of the input signal; and

      wherein the step of vector quantizing further comprises the step of:

      fuzzy vector quantizing each of the second parameters of the input signal.
  - 37. The method as in claim 36 wherein the step of fuzzy matrix quantizing further comprises the step of:
    - fuzzy matrix quantizing each of the first parameters of the input signal using a first codebook; and

      wherein the step of fuzzy vector quantizing further comprises the step of:

      fuzzy vector quantizing each of the second parameters of the input signal using a second single codebook.
  - 38. The method as in claim 35 wherein the step of processing the first parameters of the input signal further comprises the step of:
    - determining first respective input signal recognition probabilities from a plurality of first hidden Markov models; and

      wherein the step of processing the second parameters of the input signal further comprises the step of:

      determining second respective input signal recognition probabilities from a plurality of second hidden Markov models.
  - 39. The method as in claim 30 wherein the step of combining comprises the step of:
    - weighting the second output data; and

      adding the weighted second output data to the first output data.
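
The combining step of claim 39 (weight the second output data, then add it to the first) can be sketched in a few lines. The per-word scores, the weight `alpha`, and the function name are hypothetical values chosen for illustration. Per claim 35, the first output data is taken here to come from the matrix quantization path and the second from the vector quantization path.

```python
def mix(first, second, alpha):
    """Third output data: first output data plus the weighted second output data."""
    return [p1 + alpha * p2 for p1, p2 in zip(first, second)]


# Hypothetical per-word scores (for example, log probabilities) for three words.
first = [-42.0, -40.5, -47.0]    # matrix quantization path
second = [-36.0, -41.0, -45.0]   # vector quantization path

third = mix(first, second, alpha=0.5)

# Claim 30 provides the first, second, and third output data to the network.
network_input = first + second + third

best_first = first.index(max(first))   # word favored by the first path alone
best_third = third.index(max(third))   # word favored after compensation
```

With these made-up numbers the first path alone favors word 1, while the combined scores favor word 0, which is the kind of correction the compensation is meant to allow.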

40. A method of recognizing speech comprising the steps of:
- receiving an input signal;
  
  determining parameters of the input signal;
  
  vector quantizing the parameters of the input signal to obtain first quantization output data;
  
  classifying the first quantization output data;
  
  matrix quantizing the parameters of the input signal to obtain second quantization output data;
  
  classifying the second quantization output data;
  
  combining the first and second quantization output data to generate third output data; and
  
  generating an identification of the input signal with a neural network based upon the classification of the first and second quantization output data and the third output data.
- Dependent claims:
  - 41. The method as in claim 40 wherein the step of generating the identification of the input signal further comprises the steps of:
  - 42. The method as in claim 40 wherein the step of determining parameters of the input signal comprises the step of:
    - determining P order line spectral pairs for each of TO frames of the input signal.
  - 43. The method as in claim 40 wherein the step of vector quantizing further comprises the step of:
    - fuzzy split vector quantizing the parameters of the input signal, wherein the first quantization output data is fuzzy data; and

      wherein the step of matrix quantizing further comprises the step of:

      fuzzy split matrix quantizing the parameters of the input signal, wherein the second quantization output data is fuzzy data.
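
The claims refer to fuzzy quantization without giving a formula in this text. One common way to produce fuzzy data is the membership function from fuzzy c-means clustering, sketched below as an assumption rather than as the patent's method. Instead of a single codeword index, each frame gets a degree of membership in every codeword. The `fuzziness` parameter, the codebook, and the function name are hypothetical.

```python
def fuzzy_memberships(frame, codebook, fuzziness=2.0):
    """Fuzzy c-means style memberships of one frame in every codeword.
    Uses squared Euclidean distances; the memberships sum to 1."""
    dists = [sum((x - c) ** 2 for x, c in zip(frame, cw)) for cw in codebook]
    if 0.0 in dists:
        # Exact match: share full membership among the matching codewords.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 1.0 / (fuzziness - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]


# Hypothetical frame and two-codeword codebook (squared distances 1 and 9).
frame = [0.0, 0.0]
codebook = [[1.0, 0.0], [0.0, 3.0]]

u = fuzzy_memberships(frame, codebook)
```

The nearer codeword receives the larger membership, and a frame that coincides with a codeword collapses to an ordinary hard assignment.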

44. A method of recognizing speech comprising the steps of:
- receiving an input signal;
  
  determining D order line spectral pairs for TO frames of the input signal, wherein D and TO are integers;
  
  determining parameters related to the energy of the input signal, wherein the parameters related to the energy of the input signal include the input signal energy and a first derivative of the input signal energy;
  
  vector quantizing the D order line spectral pairs for each of the TO frames and the parameters related to the input signal energy;
  
  classifying the input signal using the vector quantization of the D order line spectral pairs;
  
  matrix quantizing the D order line spectral pairs and the parameters related to the input signal energy for T matrices of frames of the input signal, wherein T is defined as int(TO/N), and N is the number for input signal frames represented in each of the T matrices;
  
  classifying the input signal using the matrix quantization of the D order line spectral pairs and parameters related to the input signal energy;
  
  combining the classifications of the input signal and providing the individual classifications of the input signal and the combined classification of the input signal to a neural network.
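
Claim 44 defines the number of matrices as T = int(TO/N) and names the signal energy and its first derivative as parameters. The sketch below shows that bookkeeping under stated assumptions: frames that do not fill a complete matrix are dropped, and the first derivative is approximated by a frame-to-frame difference. The differencing scheme, the sample values, and the function names are hypothetical.

```python
def group_into_matrices(frames, n):
    """Split TO frames into T = int(TO / N) matrices of N frames each.
    Trailing frames that do not fill a matrix are dropped (an assumption)."""
    t = len(frames) // n
    return [frames[i * n:(i + 1) * n] for i in range(t)]


def first_derivative(energy):
    """Frame-to-frame difference as a simple stand-in for the first
    derivative of the energy; the first frame gets 0.0 by convention here."""
    return [0.0] + [b - a for a, b in zip(energy, energy[1:])]


# Hypothetical data: TO = 7 frames of 2 coefficients each, N = 2 frames per matrix.
frames = [[0.1 * i, 0.2 * i] for i in range(7)]
matrices = group_into_matrices(frames, n=2)   # T = int(7 / 2) = 3

energy = [1.0, 1.5, 1.25]                     # hypothetical per-frame energy
d_energy = first_derivative(energy)
```

The vector quantizer would then see all TO frames individually, while the matrix quantizer sees the T grouped matrices.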

Current Assignee: RPX Corporation
Original Assignee: Legerity Incorporated (Microsemi Semiconductor Corp.)
Inventors: Cong, Lin; Asghar, Safdar M.
Primary Examiner(s): Šmits, Tālivaldis Ivars
Assistant Examiner(s): Armstrong, Angela A
Application Number: US09/166,640
Time in Patent Office: 1,226 Days
Field of Search: 704/222, 704/230, 704/231, 704/232, 704/256, 704/243
US Class Current: 704/243
CPC Class Codes:
G10L 15/02 Feature extraction for spee...
G10L 15/144 Training of HMMs
