Split matrix quantization with split vector quantization error compensation and selective enhanced processing for robust speech recognition
First Claim
1. A speech recognition system comprising:
a split vector quantizer to receive first parameters of an input signal, to partition the first parameters into S1 partitions, and to generate a first quantization observation sequence;
a first speech classifier to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech classification output data;
a split matrix quantizer to receive second parameters of the input signal, to partition the second parameters into S2 partitions, and generate a second quantization observation sequence;
a second speech classifier to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech classification output data; and
a hybrid decision generator to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to recognize the input signal from the third respective speech classification data.
Abstract
A speech recognition system utilizes both split matrix and split vector quantizers as front ends to a second-stage speech classifier, such as hidden Markov models (HMMs), to efficiently use processing resources and improve speech recognition performance. Fuzzy split matrix quantization (FSMQ) exploits the "evolution" of the speech short-term spectral envelopes as well as frequency domain information, while fuzzy split vector quantization (FSVQ) operates primarily on frequency domain information. The time domain information available to the matrix quantizer may be substantially limited, which may introduce error into the matrix quantization, and the FSVQ may provide error compensation. Additionally, acoustic noise may predominantly affect particular frequency domain subbands. The system exploits this localized noise by allocating enhanced processing to the noise-affected input signal parameters, thereby minimizing the noise influence. The enhanced processing includes a weighted line spectral pair (LSP) and signal-energy-related distance measure, applied both when training codebooks with the Linde-Buzo-Gray (LBG) algorithm and during recognition. Multiple codebooks may also be combined into single respective codebooks for split matrix and split vector quantization to lower processing resource demands.
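The hybrid decision stage described in the abstract can be illustrated with a minimal sketch. The weighted-sum combination rule, the `alpha` mixing weight, and the function names below are illustrative assumptions, not the patent's actual formulation: each branch (quantizer plus classifier) scores every candidate word, and the combined score selects the recognized word.

```python
# Illustrative sketch of the hybrid decision generator: the split vector
# branch and the split matrix branch each score every candidate word, and
# the combined score picks the recognized word. The weighted-sum rule and
# the `alpha` mixing weight are assumptions, not the patent's formulation.

def hybrid_decide(svq_scores, smq_scores, alpha=0.5):
    """Combine per-word log-probabilities from the two branches and
    return the word with the highest combined score."""
    combined = {
        word: alpha * svq_scores[word] + (1.0 - alpha) * smq_scores[word]
        for word in svq_scores
    }
    return max(combined, key=combined.get)

# Toy scores: the vector branch favors "yes", the matrix branch "no".
svq = {"yes": -12.0, "no": -15.0}
smq = {"yes": -14.0, "no": -13.0}
print(hybrid_decide(svq, smq))  # prints "yes" (-13.0 beats -14.0)
```

Because the second branch's output compensates for errors in the first (as the claims below recite), even a simple fixed-weight combination can recover words that one branch alone would misclassify.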
58 Claims
1. A speech recognition system comprising:
a split vector quantizer to receive first parameters of an input signal, to partition the first parameters into S1 partitions, and to generate a first quantization observation sequence;
a first speech classifier to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech classification output data;
a split matrix quantizer to receive second parameters of the input signal, to partition the second parameters into S2 partitions, and generate a second quantization observation sequence;
a second speech classifier to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech classification output data; and
a hybrid decision generator to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to recognize the input signal from the third respective speech classification data.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
20. A speech recognition system comprising:
a split vector quantizer to receive first parameters of an input signal, to partition the first parameters into S1 partitions to generate first quantization output data, wherein the first quantization output data includes a first observation sequence;
a first speech classifier to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech classification output data;
a split matrix quantizer to receive second parameters of the input signal, to partition the second parameters into S2 partitions to generate second quantization output data, wherein the second quantization output data includes a second observation sequence;
a second speech classifier to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech classification output data; and
a hybrid decision generator to combine corresponding first and second respective speech classification data to generate third respective speech classification data and to recognize the input signal from the third respective speech classification data.
View Dependent Claims (21, 22, 23, 31, 32, 33, 56)
24. A speech recognition system comprising:
a split vector quantizer to receive line spectral pair input data corresponding to an input speech signal and to generate a first quantization observation sequence;
first hidden Markov models to receive the first quantization observation sequence from the split vector quantizer and generate first respective speech recognition probabilities from each of the first hidden Markov models;
a split matrix quantizer to receive temporally associated line spectral pair input data corresponding to the input speech signal and to generate a second quantization observation sequence;
second hidden Markov models to receive the second quantization observation sequence from the split matrix quantizer and generate second respective speech recognition probabilities from each of the second hidden Markov models; and
a hybrid decision generator to utilize the first and second respective speech recognition probabilities to generate input signal recognition information and to recognize the input speech signal from the input signal recognition information.
View Dependent Claims (25, 26)
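Claim 24's hidden Markov models generate "respective speech recognition probabilities" for each model. A minimal sketch of that scoring step, using a discrete HMM and the standard forward algorithm (an assumption for illustration; the claim does not fix the scoring recursion):

```python
import numpy as np

# Illustrative sketch: each candidate word has its own discrete HMM, and a
# quantizer's observation sequence (codeword indices) is scored against every
# model with the forward algorithm. Parameter names are assumptions.

def forward_log_prob(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM.

    obs: list of codeword indices; log_pi: (S,) initial state log-probs;
    log_A: (S, S) transition log-probs; log_B: (S, K) emission log-probs.
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Log-sum-exp over previous states for each next state, then emit.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))

# Toy 2-state, 2-symbol model with uniform parameters: any length-3
# sequence has probability 0.5**3 = 0.125.
u = np.log(np.full((2, 2), 0.5))
pi = np.log(np.full(2, 0.5))
print(forward_log_prob([0, 1, 0], pi, u, u))  # prints log(0.125) ≈ -2.079
```

In a recognizer following claim 24, one such score per word model would be produced for each branch, and the hybrid decision generator would then combine the two probability sets.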
27. An apparatus comprising:
a first speech classifier to operate on S1 partitions of first parameters of an input signal and to provide first output data relating the input signal to first reference data, wherein the input signal parameters include frequency and time domain parameters, wherein S1 is an integer greater than one and the first speech classifier further includes a first set of hidden Markov models;
a second speech classifier to operate on S2 partitions of second parameters of the input signal and to provide second output data relating the input signal to second reference data, wherein the second parameters of the input signal include the frequency domain parameters, wherein S2 is an integer greater than one and the second speech classifier further includes a second set of hidden Markov models; and
a hybrid decision generator to combine the first output data and the second output data so that the second output data compensates for errors in the first output data and to generate third output data to classify the input signal.
View Dependent Claims (28, 29, 30, 34, 35, 36)
37. A method comprising:
partitioning first parameters of an input signal into S1 partitions, wherein the parameters include frequency and time domain parameters;
processing the partitioned first parameters of the input signal using a first speech classifier to relate the partitioned first parameters to first reference data;
providing first output data relating the input signal to first reference data, wherein the first output data is provided from the first speech classifier to a second speech classifier;
processing the first output data using the second speech classifier;
providing second output data from the second speech classifier;
partitioning second parameters of the input signal into S2 partitions, wherein the parameters include frequency domain parameters;
processing the partitioned second parameters of the input signal using a third speech classifier to relate the partitioned second parameters to second reference data;
providing third output data relating the input signal to the second reference data, wherein the third output data is provided from the third speech classifier to a fourth speech classifier;
processing the third output data using the fourth speech classifier;
providing fourth output data from the fourth speech classifier;
combining the third output data and fourth output data to compensate for speech classification errors in the third output data; and
classifying the input signal as recognized speech.
View Dependent Claims (38, 39, 40, 41, 42, 43, 44, 45, 46)
47. A method of recognizing speech comprising:
receiving an input signal;
determining parameters of the input signal;
split vector quantizing the parameters of the input signal to obtain first quantization output data;
classifying the first quantization output data using a first probabilistic process;
split matrix quantizing the parameters of the input signal to obtain second quantization output data;
classifying the second quantization output data using a second probabilistic process; and
generating an identification of the input signal as recognized speech based upon the classification of the first and second quantization output data.
View Dependent Claims (48, 49, 50, 51, 52, 53, 54, 55)
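The split vector quantizing step in claim 47 partitions a parameter vector and quantizes each partition against its own sub-codebook. A minimal sketch, where the sub-codebook shapes and the plain squared-Euclidean nearest-neighbor rule are assumptions for illustration (the patent's actual distance measure is weighted and the quantization is fuzzy):

```python
import numpy as np

def split_vector_quantize(params, codebooks):
    """Quantize each partition of `params` against its own sub-codebook and
    return the sequence of codeword indices (the observation sequence)."""
    indices = []
    start = 0
    for cb in codebooks:  # cb: (num_codewords, partition_dim) array
        dim = cb.shape[1]
        part = params[start:start + dim]
        # Nearest codeword by squared Euclidean distance (illustrative only).
        dists = np.sum((cb - part) ** 2, axis=1)
        indices.append(int(np.argmin(dists)))
        start += dim
    return indices

# Two partitions (S = 2): a 2-dim partition and a 1-dim partition.
codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([[5.0], [9.0]])]
obs = split_vector_quantize(np.array([0.9, 1.1, 8.0]), codebooks)
print(obs)  # prints [1, 1]: nearest codewords are [1, 1] and [9.0]
```

Splitting the vector keeps each sub-codebook small, which is one way the claimed approach lowers processing resource demands relative to quantizing the full vector against a single large codebook.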
57. A method of recognizing speech comprising:
receiving an input signal;
determining D order line spectral pairs for TO frames of the input signal, wherein D and TO are integers;
determining parameters related to the energy of the input signal, wherein the parameters related to the energy of the input signal include the input signal energy and a first derivative of the input signal energy;
split vector quantizing the D order line spectral pairs for each of the TO frames and the parameters related to the input signal energy;
classifying the input signal using the split vector quantization of the D order line spectral pairs;
split matrix quantizing the D order line spectral pairs and the parameters related to the input signal energy for T matrices of frames of the input signal, wherein T is defined as int(TO/N), and N is the number of input signal frames represented in each of the T matrices;
classifying the input signal using the split matrix quantization of the D order line spectral pairs and parameters related to the input signal energy;
combining the classifications of the input signal to generate a combination of the classifications; and
recognizing the input signal as particular speech from the combination of the classifications.
View Dependent Claims (58)
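The frame-grouping arithmetic in claim 57 (T = int(TO/N) matrices of N frames each) can be sketched directly. Dropping any trailing frames beyond T * N is an assumption for illustration; the claim does not say how a remainder is handled:

```python
def group_frames(frames, N):
    """Group TO per-frame parameter vectors into T = int(TO / N) matrices
    of N consecutive frames each, as in the split matrix quantizing step.
    Trailing frames beyond T * N are dropped (an illustrative assumption)."""
    T = len(frames) // N  # int(TO / N)
    return [frames[t * N:(t + 1) * N] for t in range(T)]

frames = [[0.1 * i] for i in range(10)]  # TO = 10 toy one-dimensional frames
matrices = group_frames(frames, N=3)
print(len(matrices))  # prints 3: int(10 / 3) matrices of N = 3 frames each
```

Each resulting matrix spans N consecutive frames, which is what lets the split matrix quantizer capture the spectral envelope's evolution over time, unlike the per-frame split vector quantizer.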
Specification