Matrix quantization with vector quantization error compensation for robust speech recognition
Abstract
A speech recognition system uses both matrix and vector quantizers as front ends to a second-stage speech classifier. Matrix quantization exploits input signal information in both the frequency and time domains, whereas the vector quantizer operates primarily on frequency domain information. In some circumstances, however, time domain information may be substantially limited, which may introduce error into the matrix quantization. A hybrid decision generator may use information derived from vector quantization to compensate for errors in information derived from matrix quantization. Additionally, fuzzy methods of quantization and robust distance measures may be introduced to enhance speech recognition accuracy. Other speech classification stages may also be used, such as hidden Markov models, which introduce probabilistic processes to further enhance speech recognition accuracy. Multiple codebooks may also be combined to form single respective codebooks for matrix and vector quantization to lessen the demand on processing resources.
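The abstract mentions fuzzy quantization without defining it. As a rough illustration only, the sketch below uses the standard fuzzy c-means membership formula, in which an input belongs to every codeword to a degree that falls off with distance. The function name `fuzzy_memberships`, the `fuzziness` parameter, and its default of 2.0 are assumptions made for this example and are not taken from the patent.

```python
def fuzzy_memberships(sq_distances, fuzziness=2.0):
    """Return fuzzy c-means style memberships of one input in each codeword.

    sq_distances: squared distances from the input to every codeword.
    fuzziness: exponent greater than 1 (hypothetical default of 2.0);
               values close to 1 approach hard nearest-codeword quantization.
    """
    if fuzziness <= 1.0:
        raise ValueError("fuzziness must be greater than 1")
    # An exact match takes all of the membership (shared among exact matches).
    zeros = [i for i, d in enumerate(sq_distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(sq_distances))]
    power = 1.0 / (fuzziness - 1.0)
    inverse = [(1.0 / d) ** power for d in sq_distances]
    total = sum(inverse)
    # Normalizing makes the memberships sum to 1.
    return [value / total for value in inverse]


# The closer codeword (squared distance 1.0) receives the larger membership.
memberships = fuzzy_memberships([1.0, 3.0])
print(memberships)
```

A hard quantizer would report only the index of the nearest codeword; the memberships retain how close the runner-up codewords were, which is the extra information a fuzzy front end can pass to the classifier.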
44 Claims
1. A speech recognition system comprising:
a vector quantizer to receive first parameters of an input signal and generate a first quantization observation sequence;
a first speech classifier to receive the first quantization observation sequence from the vector quantizer and generate first respective speech classification output data;
a matrix quantizer to receive second parameters of the input signal and generate a second quantization observation sequence;
a second speech classifier to receive the second quantization observation sequence from the matrix quantizer and generate second respective speech classification output data; and
a hybrid decision generator to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to recognize the input signal from the third respective speech classification data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

16. A speech recognition system comprising:
a vector quantizer to receive line spectral pair input data corresponding to an input speech signal and to generate a first quantization observation sequence;
first hidden Markov models to receive the first quantization observation sequence from the vector quantizer and generate first respective speech recognition probabilities from each of the first hidden Markov models;
a matrix quantizer to receive temporally associated line spectral pair input data corresponding to the input speech signal and to generate a second quantization observation sequence;
second hidden Markov models to receive the second quantization observation sequence from the matrix quantizer and generate second respective speech recognition probabilities from each of the second hidden Markov models; and
a hybrid decision generator to utilize the first and second respective speech recognition probabilities to combine corresponding first and second speech recognition probabilities and to recognize the input signal from the combined corresponding first and second speech recognition probabilities.
Dependent claims: 17, 18, 19
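
The claim above says that corresponding probabilities from the two sets of hidden Markov models are combined, but it does not give the combining rule. The sketch below assumes one plausible rule: a weighted sum of log-probabilities per vocabulary word, followed by picking the highest score. The function name `hybrid_decision`, the `vq_weight` parameter, the log floor, and the example probabilities are all hypothetical.

```python
import math


def hybrid_decision(vq_probs, mq_probs, vq_weight=0.5):
    """Combine per-word probabilities and return (best_word, combined_scores).

    vq_probs / mq_probs: dicts mapping each vocabulary word to the probability
    reported by its vector-quantizer-based or matrix-quantizer-based model.
    vq_weight: assumed scaling of the vector-quantizer contribution.
    """
    if vq_probs.keys() != mq_probs.keys():
        raise ValueError("both classifiers must score the same vocabulary")
    combined = {}
    for word in mq_probs:
        # Work in the log domain; the floor avoids math.log(0).
        log_vq = math.log(max(vq_probs[word], 1e-300))
        log_mq = math.log(max(mq_probs[word], 1e-300))
        combined[word] = log_mq + vq_weight * log_vq
    best_word = max(combined, key=combined.get)
    return best_word, combined


# Made-up scores: the matrix-quantizer path narrowly prefers "no",
# while the vector-quantizer path clearly prefers "yes".
mq_probs = {"yes": 0.40, "no": 0.45}
vq_probs = {"yes": 0.70, "no": 0.20}
best_word, scores = hybrid_decision(vq_probs, mq_probs)
print(best_word, scores)
```

With `vq_weight=0` the decision falls back to the matrix-quantizer path alone; a nonzero weight lets the vector-quantizer path overturn a narrow matrix-quantizer preference, which is the error-compensation role the abstract describes.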

20. An apparatus comprising:
a first speech classifier to operate on first parameters of an input signal and provide first output data relating the input signal to reference data, wherein the input signal parameters include frequency and time domain parameters, wherein the first speech classifier further includes a first set of hidden Markov models;
a second speech classifier to operate on second parameters of the input signal and to provide second output data relating the input signal to the reference data, wherein the second parameters of the input signal include the frequency domain parameters, the second speech classifier further includes a second set of hidden Markov models; and
a hybrid decision generator to combine the first output data and the second output data so that the second output data compensates for errors in the first output data and to generate third output data to classify the input signal.
Dependent claims: 21, 22, 23, 24, 25, 26, 27

28. A method comprising:
processing first parameters of an input signal using a first speech classifier, wherein the parameters include frequency and time domain parameters;
providing first output data relating the input signal to reference data, wherein the first output data is provided from the first speech classifier to a second speech classifier;
processing the first output data using the second speech classifier;
providing second output data from the second speech classifier;
processing second parameters of the input signal using a third speech classifier, wherein the parameters include frequency domain parameters;
providing third output data relating the input signal to the reference data, wherein the third output data is provided from the third speech classifier to a fourth speech classifier;
processing the third output data using the fourth speech classifier;
providing fourth output data from the fourth speech classifier;
combining the third output data and fourth output data to compensate for speech classification errors in the third output data; and
classifying the input signal as recognized speech.
Dependent claims: 29, 30, 31, 32, 33

34. A method of recognizing speech comprising:
receiving an input signal;
determining parameters of the input signal;
vector quantizing the parameters of the input signal to obtain first quantization output data;
classifying the first quantization output data;
matrix quantizing the parameters of the input signal to obtain second quantization output data;
classifying the second quantization output data; and
generating an identification of the input signal as recognized speech based upon the classification of the first and second quantization output data.
Dependent claims: 35, 36, 37, 38, 39, 40, 41, 42
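
The two quantizing steps in the method above can be pictured with a minimal hard-quantization sketch: each frame vector, or each matrix of consecutive frames, is replaced by the index of its nearest codebook entry, and the resulting index lists play the role of the quantization output data. Squared Euclidean distance, the tiny codebooks, and all function names here are assumptions for illustration; the claim does not fix a distance measure or codebook design.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matrix_distance(matrix, code_matrix):
    """Sum of row-wise squared distances between two matrices of frames."""
    return sum(squared_distance(r, c) for r, c in zip(matrix, code_matrix))


def nearest_index(item, codebook, distance):
    """Index of the codebook entry closest to item under the given distance."""
    return min(range(len(codebook)), key=lambda i: distance(item, codebook[i]))


def vector_quantize(frames, codebook):
    """One codeword index per frame: spectral shape only."""
    return [nearest_index(f, codebook, squared_distance) for f in frames]


def matrix_quantize(matrices, codebook):
    """One index per matrix of frames: spectral shape plus its evolution in time."""
    return [nearest_index(m, codebook, matrix_distance) for m in matrices]


# Four 2-dimensional frames (made-up values).
frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
vq_codebook = [[0.0, 0.0], [1.0, 1.0]]
# Code matrices of two frames each: "low then high" and "high then low".
mq_codebook = [[[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
matrices = [frames[0:2], frames[2:4]]

vq_sequence = vector_quantize(frames, vq_codebook)
mq_sequence = matrix_quantize(matrices, mq_codebook)
print(vq_sequence, mq_sequence)
```

The vector-quantizer sequence has one symbol per frame, while the matrix-quantizer sequence has one symbol per group of frames and so encodes the order in which spectral shapes occur.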

43. A method of recognizing speech comprising the steps of:
receiving an input signal;
determining P order line spectral pairs for TO frames of the input signal, wherein P and TO are integers;
vector quantizing the P order line spectral pairs for each of the TO frames;
classifying the input signal using the vector quantization of the P order line spectral pairs;
matrix quantizing the P order line spectral pairs for T matrices of frames of the input signal, wherein T is defined as int(TO/N), and N is the number for input signal frames represented in each of the T matrices;
classifying the input signal using the matrix quantization of the P order line spectral pairs;
combining the classifications of the input signal to generate a combination of the classifications; and
recognizing the input signal as particular speech from the combination of the classifications.
Dependent claims: 44
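
The relation T = int(TO/N) in the claim above can be made concrete with a short sketch that groups TO frames into T matrices of N consecutive frames. The function name `group_frames` is hypothetical. The claim does not say what happens to frames left over when TO is not a multiple of N, so dropping them is an assumption made here.

```python
def group_frames(frames, frames_per_matrix):
    """Split TO frames into T = int(TO / N) matrices of N consecutive frames.

    frames: list of TO frames, each a list of P line spectral pair values.
    frames_per_matrix: N, the number of frames represented in each matrix.
    Leftover frames (TO mod N) are dropped, which is an assumption.
    """
    if frames_per_matrix < 1:
        raise ValueError("frames_per_matrix must be at least 1")
    total_frames = len(frames)                        # TO
    matrix_count = total_frames // frames_per_matrix  # T = int(TO / N)
    return [
        frames[k * frames_per_matrix:(k + 1) * frames_per_matrix]
        for k in range(matrix_count)
    ]


# TO = 7 frames of P = 2 made-up values, N = 2 frames per matrix.
frames = [[float(i), float(i) + 0.5] for i in range(7)]
matrices = group_frames(frames, 2)
print(len(matrices))
```

Each resulting matrix has N rows (time) and P columns (frequency-related values), so the vector quantizer sees TO inputs while the matrix quantizer sees only T.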
Specification