Speech recognition using preclassification and spectral normalization
4 Assignments
0 Petitions
Abstract
A two-stage classification process is used in a speech recognition system. In the first stage, a slope vector template is generated from an extended LPC analysis using a universal bandwidth expansion technique. Using a dynamic programming technique, that first vector template identifies a subset of the overall vocabulary of the system. The speech signal is then inverse filtered using the slope vector, and a second LPC analysis is performed on the slope-removed speech. The resulting LPC vector is applied to an all-pass filter for an initial nonlinear spectral shift of the speech. Final classification is then based on a normalizing spectral warp routine within a dynamic time warp program. The spectral warp is based on a closed-form, near-log transformation.
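The abstract does not spell out what the "extended" LPC analysis is or what makes the bandwidth expansion "universal". As a point of reference only, the sketch below shows a conventional autocorrelation-method LPC analysis (Levinson-Durbin recursion) followed by the common bandwidth-expansion substitution A(z) → A(z/γ), which scales the k-th predictor coefficient by γ^k. The function names, the analysis order, and the value of `gamma` are illustrative assumptions, not the patent's procedure.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err), where a[0] == 1.0 and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # prediction error shrinks each stage
    return a, err


def bandwidth_expand(a: Sequence[float], gamma: float) -> List[float]:
    """Replace A(z) by A(z/gamma): a_k -> a_k * gamma**k, with 0 < gamma < 1."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


# Example: order-2 analysis of a short frame, then mild bandwidth expansion.
frame = [0.0, 1.0, 0.5, -0.25, -0.5, 0.1]
a, err = levinson_durbin(autocorrelation(frame, 2), 2)
a_bw = bandwidth_expand(a, 0.98)  # gamma = 0.98 is an arbitrary illustrative value
```

Bandwidth expansion pulls the roots of A(z) toward the origin, which broadens the formant peaks of 1/A(z); a very smooth, low-order result is the kind of envelope one would expect a "slope" template to capture.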
168 Citations
51 Claims
1. A speech recognition system for recognizing units of speech input comprising:
means for generating first speech vectors characteristic of units of a speech input;
means for comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units and for selecting a limited subset of the reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
means for generating second speech vectors characteristic of units of speech input normalized with respect to the first speech vectors; and
means, responsive to the means for comparing the first speech vectors and to the means for generating the second speech vectors, for comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units and for selecting a speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.
Dependent claims: 2-18.
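The two-stage structure of claim 1 can be sketched as a preselection over the whole vocabulary followed by a final decision restricted to the preselected words. In this sketch the reference templates are plain dicts keyed by word, `subset_size` is an assumed tuning parameter, and `mean_frame_distance` is a hypothetical stand-in for the dynamic programming match the abstract describes (it simply pairs frames and assumes equal-length sequences).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Frames = List[Sequence[float]]
Distance = Callable[[Frames, Frames], float]


def mean_frame_distance(x: Frames, y: Frames) -> float:
    """Placeholder distance: mean Euclidean distance between paired frames.

    Assumes both sequences have the same length; a real system would use a
    time-aligning match such as dynamic time warping instead.
    """
    pairs = list(zip(x, y))
    return sum(math.dist(u, v) for u, v in pairs) / len(pairs)


def two_stage_recognize(
    first_vecs: Frames,
    second_vecs: Frames,
    first_refs: Dict[str, Frames],
    second_refs: Dict[str, Frames],
    subset_size: int,
    distance: Distance = mean_frame_distance,
) -> Tuple[str, List[str]]:
    """Return (recognized word, preselected subset of candidate words)."""
    # Stage 1: rank every vocabulary word by its first-stage distance
    # and keep only the closest `subset_size` candidates.
    ranked = sorted(first_refs, key=lambda w: distance(first_vecs, first_refs[w]))
    subset = ranked[:subset_size]
    # Stage 2: score only the preselected candidates with the second vectors.
    best = min(subset, key=lambda w: distance(second_vecs, second_refs[w]))
    return best, subset
```

With `subset_size` much smaller than the vocabulary, the more expensive second-stage comparison runs on only a few candidates. The trade-off is that a word pruned in stage 1 can never be recognized, even if it would have scored best in stage 2.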
19. A speech recognition system for recognizing units of speech input comprising:
means for generating speech vectors characteristic of units of speech input; and
means for comparing the speech vectors with reference vectors corresponding to a set of reference speech units, the means for comparing including spectral warp means for causing a nonlinear spectral shift of the frequency characteristic of each vector of the speech vectors relative to the frequency characteristics of the reference vectors in a closed form transformation to generate a spectrally warped vector which provides closer correspondence between the speech and reference vectors, a single predetermined transformation function being selected for an entire spectrum of a frame of speech samples.
Dependent claims: 20-22.
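Claim 19 calls for a closed-form, nonlinear frequency transformation, with one predetermined function selected per frame. One well-known closed-form warp, consistent with the abstract's mention of an all-pass filter, is the phase response of a first-order all-pass section, ω' = ω + 2·arctan(α sin ω / (1 − α cos ω)). It fixes 0 and π and, for α > 0, stretches the low-frequency region in a roughly log-like way. The sketch below applies that map to a sampled magnitude spectrum and picks α from a small predetermined set; the α grid, the squared-error criterion, and all function names are assumptions for illustration, not the patent's formula.

```python
import math
from typing import List, Sequence, Tuple


def allpass_warp(omega: float, alpha: float) -> float:
    """Closed-form first-order all-pass frequency map on [0, pi].

    Maps 0 -> 0 and pi -> pi; for 0 < alpha < 1 it moves interior
    frequencies upward, expanding the low-frequency region. The inverse
    map is the same function with -alpha.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_spectrum(mag: Sequence[float], alpha: float) -> List[float]:
    """Warp a magnitude spectrum sampled on n uniform bins spanning [0, pi].

    The value originally at frequency w ends up at allpass_warp(w, alpha);
    each output bin is inverse-mapped and linearly interpolated.
    """
    n = len(mag)
    step = math.pi / (n - 1)
    out = []
    for i in range(n):
        src = allpass_warp(i * step, -alpha) / step   # fractional source bin
        src = min(max(src, 0.0), n - 1.0)             # guard rounding at the edges
        lo = min(int(src), n - 2)
        frac = src - lo
        out.append((1.0 - frac) * mag[lo] + frac * mag[lo + 1])
    return out


def best_warp(speech_mag: Sequence[float], ref_mag: Sequence[float],
              alphas: Sequence[float] = (-0.1, -0.05, 0.0, 0.05, 0.1)
              ) -> Tuple[float, List[float]]:
    """Pick one warp for the whole frame: the alpha whose warped spectrum
    has the smallest squared error against the reference spectrum."""
    def sq_err(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    candidates = [(alpha, warp_spectrum(speech_mag, alpha)) for alpha in alphas]
    return min(candidates, key=lambda c: sq_err(c[1], ref_mag))
```

Because a single α is chosen per frame, the whole spectrum is moved by one smooth, invertible map rather than by independent per-band shifts, which matches the claim's "single predetermined transformation function" for an entire frame spectrum.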
23. A system for generating coefficients of an inverse filter corresponding to the slope of the frequency characteristics of a linear predictive coding (LPC) vector comprising:
means for performing an LPC analysis to generate linear prediction coefficients of an LPC inverse filter; and
filter estimate means for generating the coefficients of the inverse filter corresponding to the slope by concatenating, with bandwidth expansion, the LPC inverse filter with itself.
Dependent claims: 24-27.
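Claim 23 describes the slope inverse filter as the LPC inverse filter "concatenated with itself" with bandwidth expansion. One reading, assumed here, is a cascade of the bandwidth-expanded filter with itself, C(z) = A(z/γ)·A(z/γ). Cascading FIR filters multiplies their polynomials, so the coefficients are a convolution. The abstract's inverse filtering of the speech with the slope vector is then an ordinary FIR filter. The value of `gamma`, the (presumably low) LPC order feeding this step, and the function names are assumptions.

```python
from typing import List, Sequence


def bandwidth_expand(a: Sequence[float], gamma: float) -> List[float]:
    """A(z) -> A(z/gamma): scale the k-th coefficient by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def convolve(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Polynomial product; equivalent to cascading two FIR filters."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def slope_inverse_filter(a: Sequence[float], gamma: float) -> List[float]:
    """Hypothetical slope filter: bandwidth-expanded A(z) cascaded with itself."""
    a_bw = bandwidth_expand(a, gamma)
    return convolve(a_bw, a_bw)


def inverse_filter(samples: Sequence[float], c: Sequence[float]) -> List[float]:
    """FIR filtering y[n] = sum_k c[k] * x[n - k], starting from zero state."""
    return [sum(c[k] * samples[n - k] for k in range(len(c)) if n - k >= 0)
            for n in range(len(samples))]
```

For example, a first-order inverse filter `[1.0, -0.5]` with `gamma = 0.5` yields the slope filter `[1.0, -0.5, 0.0625]`; running the speech through `inverse_filter` with those coefficients removes that tilt before the second LPC analysis.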
28. A speech recognition system for recognizing units of speech input comprising:
first linear predictive coding (LPC) analysis means for generating first speech vectors characteristic of units of speech input;
means for comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units and for selecting a limited subset of the reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
an inverse filter based on the first speech vectors for filtering the speech samples;
second linear predictive coding analysis means, coupled to receive filtered speech samples from the inverse filter, for generating second speech vectors characteristic of units of speech input; and
means for comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units and for selecting a speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors, the means for comparing comprising a dynamic time warp program which causes a nonlinear time shift of the second speech vectors relative to the second reference vectors to provide a closer correspondence between the speech and reference vectors, the dynamic time warp program including a spectral warp routine for causing a normalizing nonlinear spectral shift of the frequency characteristics of each vector of the second speech vectors relative to the frequency characteristics of the second reference vectors in a closed form transformation to generate a spectrally warped vector which provides a closer correspondence between the speech and reference vectors.
Dependent claims: 29-34.
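Claim 28 places the spectral warp inside the dynamic time warp: the time alignment is found by dynamic programming, and each frame-to-frame comparison may first apply a spectral transformation. The sketch below is a textbook DTW recurrence (symmetric steps, no slope or band constraints) with a pluggable local distance. `make_warped_local` is a hypothetical helper that keeps the best of a few candidate per-frame transforms, standing in for the patent's spectral warp routine.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]
Local = Callable[[Frame, Frame], float]


def euclidean(x: Frame, y: Frame) -> float:
    """Plain Euclidean distance between two equal-length frames."""
    return math.dist(x, y)


def make_warped_local(transforms: Sequence[Callable[[Frame], Frame]]) -> Local:
    """Local distance that applies each candidate spectral transform to the
    speech frame and keeps the closest match to the reference frame."""
    def local(x: Frame, y: Frame) -> float:
        return min(euclidean(t(x), y) for t in transforms)
    return local


def dtw_distance(speech: List[Frame], ref: List[Frame], local: Local = euclidean) -> float:
    """Accumulated cost of the best monotonic alignment of two frame sequences."""
    n, m = len(speech), len(ref)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(speech[i - 1], ref[j - 1])
            # Predecessors: (i-1, j) advances speech and reuses the reference frame,
            # (i, j-1) advances the reference and reuses the speech frame,
            # (i-1, j-1) advances both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Passing `local=make_warped_local([...])` lets every cell of the time-warp grid choose its own spectral transform, so time alignment and spectral normalization are optimized together rather than in separate passes.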
35. A method of recognizing units of speech input comprising:
generating first speech vectors characteristic of units of speech input;
comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units to select a limited subset of reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
generating from the speech input second speech vectors characteristic of units of speech input normalized with respect to the first speech vectors; and
comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units to select the speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.
Dependent claims: 36-42.
43. A method of recognizing speech comprising:
generating speech vectors characteristic of speech samples; and
comparing the speech vectors with reference vectors corresponding to a set of reference speech units to select a speech unit for which the reference vectors have the closest correspondence with the speech vectors, the comparison including the step of causing a nonlinear spectral shift of the frequency characteristics of each vector of the speech vectors relative to the frequency characteristics of the corresponding vector of the reference vectors in a closed form transformation to generate a spectrally warped vector which provides a closer correspondence between the speech and reference vectors, a single predetermined transformation function being selected for an entire spectrum of a frame of speech samples.
Dependent claims: 44-46.
47. A method of generating coefficients of an inverse filter corresponding to the slope of the frequency characteristics of an LPC vector comprising performing an LPC analysis to generate linear prediction coefficients of an LPC inverse filter and concatenating, with bandwidth expansion, the LPC inverse filter with itself.
50. A speech recognition system for recognizing units of speech input comprising:
means for generating first speech vectors characteristic of units of a speech input, each speech vector defining the magnitude slope of the frequency characteristics of a frame of speech samples;
means for comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units and for selecting a limited subset of the reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
means for generating second speech vectors characteristic of units of speech input; and
means, responsive to the means for comparing the first speech vectors and to the means for generating the second speech vectors, for comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units and for selecting a speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.
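Claims 50 and 51 characterize each first-stage vector as defining the magnitude slope of a frame's frequency characteristic. The abstract derives that slope from an extended LPC analysis; as a simpler, swapped-in illustration of what a per-frame "magnitude slope" measures, the sketch below fits a least-squares line to the frame's log-magnitude DFT spectrum and returns its slope. The DFT-based approach, the dB scale, the `floor` parameter, and the function name are all assumptions, not the patent's method.

```python
import cmath
import math
from typing import Sequence


def magnitude_slope(frame: Sequence[float], floor: float = 1e-10) -> float:
    """Least-squares slope of 20*log10|X(f)| over bins 0..n/2.

    Frequency is in cycles per sample, so the result is in dB per
    (cycle/sample). `floor` avoids log(0) at spectral nulls.
    """
    n = len(frame)
    freqs, mags_db = [], []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k (O(n^2) overall; fine for a short illustration).
        x = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(frame))
        freqs.append(k / n)
        mags_db.append(20.0 * math.log10(max(abs(x), floor)))
    # Ordinary least-squares slope of dB magnitude against frequency.
    mean_f = sum(freqs) / len(freqs)
    mean_d = sum(mags_db) / len(mags_db)
    num = sum((f - mean_f) * (d - mean_d) for f, d in zip(freqs, mags_db))
    den = sum((f - mean_f) ** 2 for f in freqs)
    return num / den
```

A flat spectrum (a single impulse) gives a slope of zero, a first-difference frame such as `[1, -1, 0, ...]` gives a positive (high-pass) slope, and a two-tap average `[1, 1, 0, ...]` gives a negative (low-pass) slope.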
51. A method of recognizing units of speech input comprising:
generating first speech vectors characteristic of units of speech input, each speech vector defining the magnitude slope of the frequency characteristics of a frame of speech samples;
comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units to select a limited subset of reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
generating from the speech input second speech vectors characteristic of units of speech input; and
comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units to select the speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.
Specification