Speech recognition using preclassification and spectral normalization

US 4,941,178 A
Filed: 05/09/1989
Issued: 07/10/1990
Est. Priority Date: 04/01/1986
Status: Expired due to Term

4 Assignments

0 Petitions

Abstract

A two-stage classification process is used in a speech recognition system. In the first stage, a slope vector template is generated from an extended LPC analysis using a universal bandwidth expansion technique. Using a dynamic programming technique, that first vector template identifies a subset of the overall vocabulary of the system. The speech signal is inverse filtered using the slope vector, and a second LPC analysis is performed on the slope-removed speech. The LPC vector is applied to an all-pass filter for an initial nonlinear spectral shift of the speech. Final classification is then based on a normalizing spectral warp routine within a dynamic time warp program. The spectral warp is based on a closed-form, near-log transformation.
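
The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy vocabulary, and the use of plain Euclidean distance in place of the dynamic programming match that the abstract names.

```python
import math

def distance(u, v):
    """Euclidean distance; a stand-in for the dynamic programming match in the abstract."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def two_stage_classify(coarse_vec, fine_vec, coarse_refs, fine_refs, subset_size):
    """coarse_refs and fine_refs map each vocabulary word to one reference vector."""
    # Stage 1: rank the whole vocabulary by the coarse (slope-like) vectors
    # and keep only the closest few candidates.
    ranked = sorted(coarse_refs, key=lambda w: distance(coarse_vec, coarse_refs[w]))
    subset = ranked[:subset_size]
    # Stage 2: make the final decision using the second set of vectors,
    # restricted to the preselected subset.
    best = min(subset, key=lambda w: distance(fine_vec, fine_refs[w]))
    return subset, best

# Hypothetical three-word vocabulary with made-up reference vectors.
coarse_refs = {"yes": [1.0, 0.0], "no": [0.9, 0.1], "stop": [-1.0, 0.5]}
fine_refs = {"yes": [0.2, 0.8, 0.1], "no": [0.7, 0.1, 0.3], "stop": [0.0, 0.0, 1.0]}

subset, word = two_stage_classify([0.92, 0.08], [0.65, 0.15, 0.3],
                                  coarse_refs, fine_refs, subset_size=2)
print(subset, word)
```

The point of the first stage is that the more expensive second comparison only runs against the reduced candidate list.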

168 Citations

51 Claims

1. A speech recognition system for recognizing units of speech input comprising:
- means for generating first speech vectors characteristic of units of a speech input;
  
  means for comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units and for selecting a limited subset of the reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
  
  means for generating second speech vectors characteristic of units of speech input normalized with respect to the first speech vectors; and
  
  means, responsive to the means for comparing the first speech vectors and to the means for generating the second speech vectors, for comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units and for selecting a speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.
  - 2. A speech recognition system as claimed in claim 1 wherein the first speech vectors define the magnitude slope of the frequency characteristics of a frame of speech samples.
  - 3. A speech recognition system as claimed in claim 1 wherein:
    - the means for generating first speech vectors comprises means for performing a first linear predictive coding (LPC) analysis of the speech samples, and the means for generating second speech vectors comprises an inverse filter based on the first speech vectors for filtering the speech samples and means for performing a second linear predictive coding analysis of the filtered samples.
  - 4. A speech recognition system as claimed in claim 3 wherein the first speech vectors define the magnitude slopes of the frequency characteristics of frames of speech samples.
  - 5. A speech recognition system as claimed in claim 4 wherein the means for generating first speech vectors comprises means for concatenating, with bandwidth expansion, an inverse filter of the first LPC analysis.
  - 6. A speech recognition system as claimed in claim 5 wherein the first speech vectors correspond to an inverse filter defined by the function:
    - ##EQU11## where ##EQU12## r is a bandwidth broadening factor and a_n and a_i-n are the coefficients from the first LPC analysis.
  - 7. A speech recognition system as claimed in claim 6 wherein r=exp(πDT) and D and T are respectively the bandwidth of the pole and the sampling interval.
  - 8. A speech recognition system as claimed in claim 4 wherein the means for generating the second speech vectors comprises means for providing bandwidth reduction of the inverse filter of the second LPC analysis.
  - 9. A speech recognition system as claimed in claim 4 wherein the means for comparing the second speech vectors comprises spectral warp means for causing a nonlinear spectral shift of the frequency characteristics of each vector of the speech vectors relative to the frequency characteristics of reference vectors in a closed form transformation to generate a spectrally warped vector which provides closer correspondence between the speech and reference vectors, a single predetermined transformation function being selected for an entire spectrum of a frame of speech samples.
  - 10. A speech recognition system as claimed in claim 9 wherein the spectral warp means comprises an all-pass filter.
  - 11. A speech recognition system as claimed in claim 9 wherein the means for comparing the second speech signals comprises a dynamic time warp program including the spectral warp means.
  - 12. A speech recognition system as claimed in claim 11 further comprising prewarp means for causing a nonlinear spectral shift of the frequency characteristics of each vector prior to the dynamic program.
  - 13. A speech recognition system as claimed in claim 3 wherein the means for generating the second speech vectors comprises means for providing bandwidth reduction of the inverse filter of the second LPC analysis.
  - 14. A speech recognition system as claimed in claim 13 wherein the means for providing bandwidth reduction produces vectors of the form ##EQU13## wherein r_c is a bandwidth reduction factor equal to exp(πβ_c T) where β_c is about 50 Hz and T is the sampling interval.
  - 15. A speech recognition system as claimed in claim 3 wherein the means for comparing the second speech signals comprises spectral warp means for causing a normalizing nonlinear spectral shift of the frequency characteristics of each vector of the second speech vectors relative to the frequency characteristics of second reference vectors in a closed form transformation to generate a spectrally warped vector which provides a closer correspondence between the speech and reference vectors, a single predetermined transformation function being selected for an entire spectrum of a frame of speech samples.
  - 16. A speech recognition system as claimed in claim 15 wherein the spectral warp means comprises an all-pass filter.
  - 17. A speech recognition system as claimed in claim 15 wherein the spectral warp means comprises a routine within a dynamic time warp program which causes a nonlinear time shift of the second speech vectors relative to second reference vectors to provide a closer correspondence between the second speech vectors and second reference vectors.
  - 18. A speech recognition system as claimed in claim 17 further comprising prewarp means for causing a nonlinear spectral shift of the frequency characteristics of each vector prior to the dynamic program.
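
Claims 7 and 14 both define a scaling factor of the form exp(πBT), where B is a bandwidth and T is the sampling interval. As a rough worked example, the sketch below evaluates that expression for the bandwidth of about 50 Hz named in claim 14. The 8 kHz sampling rate and the 400 Hz pole bandwidth are assumptions chosen for illustration and do not come from the claim text.

```python
import math

def bandwidth_factor(bandwidth_hz, sample_interval_s):
    """Evaluate exp(pi * B * T) as the expression is written in claims 7 and 14."""
    return math.exp(math.pi * bandwidth_hz * sample_interval_s)

T = 1.0 / 8000.0                  # assumed sampling interval (8 kHz), not from the claims
r_c = bandwidth_factor(50.0, T)   # claim 14: beta_c of about 50 Hz
r = bandwidth_factor(400.0, T)    # claim 7: D = 400 Hz is a hypothetical value

print(f"r_c = {r_c:.5f}, r = {r:.5f}")
```

Both factors come out slightly above 1, and a larger bandwidth gives a larger factor.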

19. A speech recognition system for recognizing units of speech input comprising:
- means for generating speech vectors characteristic of units of speech input; and
  
  means for comparing the speech vectors with reference vectors corresponding to a set of reference speech units, the means for comparing including spectral warp means for causing a nonlinear spectral shift of the frequency characteristic of each vector of the speech vectors relative to the frequency characteristics of the reference vectors in a closed form transformation to generate a spectrally warped vector which provides closer correspondence between the speech and reference vectors, a single predetermined transformation function being selected for an entire spectrum of a frame of speech samples.
  - 20. A speech recognition system as claimed in claim 19 wherein the means for generating speech vectors comprises means for performing a linear predictive coding analysis and the spectral warp means comprises an all-pass filter for causing a near log spectral transformation.
  - 21. A speech recognition system as claimed in claim 19 wherein the means for comparing the speech vectors comprises a dynamic time warp program means including the spectral warp means.
  - 22. A speech recognition system as claimed in claim 21 further comprising prewarp means for causing a nonlinear spectral shift of the frequency characteristics of each vector prior to the dynamic program.
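
Claims 19 to 22 describe a closed form nonlinear spectral shift, with an all-pass filter as one realization. A minimal sketch of the frequency mapping produced by a first-order all-pass section follows. Treating the warp as a single first-order section with one coefficient `alpha` is an assumption, and the value 0.4 is hypothetical; the claims do not give the filter order or coefficient.

```python
import math

def allpass_warp(omega, alpha):
    """Map frequency omega (radians, 0..pi) through a first-order all-pass section.

    This is the negated phase response of (z^-1 - alpha) / (1 - alpha * z^-1).
    One alpha is used for the whole spectrum of a frame.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

alpha = 0.4                                   # hypothetical warp coefficient
grid = [k * math.pi / 8 for k in range(9)]    # 0 .. pi in nine steps
warped = [allpass_warp(w, alpha) for w in grid]

for w, v in zip(grid, warped):
    print(f"{w:.3f} -> {v:.3f}")
```

With a positive `alpha` the mapping stretches low frequencies and compresses high ones while keeping 0 and π fixed, which is the general shape of a log-like frequency scale.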

23. A system for generating coefficients of an inverse filter corresponding to the slope of the frequency characteristics of a linear predictive coding (LPC) vector comprising:
- means for performing an LPC analysis to generate linear prediction coefficients of an LPC inverse filter; and
  
  filter estimate means for generating the coefficients of the inverse filter corresponding to the slope by concatenating, with bandwidth expansion, the LPC inverse filter with itself.
  - 24. A system as claimed in claim 23 wherein the filter estimate means includes means for concatenating the LPC inverse filter with itself and for then subjecting the resultant filter to bandwidth expansion.
  - 25. A system as claimed in claim 23 wherein the filter estimate means subjects the LPC inverse filter to bandwidth expansion and then concatenates the resultant filter with itself.
  - 26. A system as claimed in claim 23 wherein the inverse filter corresponding to the slope is defined by the function:
    - ##EQU14## where ##EQU15## r is a bandwidth broadening factor and a_n and a_i-n are the LPC coefficients from the LPC analysis.
  - 27. A system as claimed in claim 26 wherein r=exp(πDT) and D and T are respectively the bandwidth of the pole and the sampling interval.
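
Claims 23 to 25 describe concatenating the LPC inverse filter with itself, with bandwidth expansion applied either after or before the concatenation. The sketch below shows one way to read this: concatenation as polynomial multiplication of the filter coefficients, and bandwidth expansion as scaling coefficient i by r^(-i), which is the usual way to widen pole bandwidths. Whether the elided equation in claim 26 takes exactly this form is an assumption, and the coefficients, D = 400 Hz, and T = 1/8000 s are hypothetical.

```python
import math

def convolve(a, b):
    """Polynomial product: cascading two FIR filters multiplies their transfer functions."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def expand_bandwidth(coeffs, r):
    """Scale coefficient i by r**(-i); with r > 1 this pulls the roots toward the origin."""
    return [c / r ** i for i, c in enumerate(coeffs)]

a = [1.0, -1.2, 0.5]                      # hypothetical A(z) = 1 - 1.2 z^-1 + 0.5 z^-2
r = math.exp(math.pi * 400.0 / 8000.0)    # assumed D = 400 Hz and T = 1/8000 s

# Order described in claim 24: concatenate first, then expand.
concat_then_expand = expand_bandwidth(convolve(a, a), r)
# Order described in claim 25: expand first, then concatenate.
expand_then_concat = convolve(expand_bandwidth(a, r), expand_bandwidth(a, r))

print(concat_then_expand)
print(expand_then_concat)
```

When the same factor is used throughout, both orders give the same coefficients up to rounding, because scaling by r^(-i) commutes with polynomial multiplication.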

28. A speech recognition system for recognizing units of speech input comprising:
- first linear predictive coding (LPC) analysis means for generating first speech vectors characteristic of units of speech input;
  
  means for comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units and for selecting a limited subset of the reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
  
  an inverse filter based on the first speech vectors for filtering the speech samples;
  
  second linear predictive coding analysis means, coupled to receive filtered speech samples from the inverse filter, for generating second speech vectors characteristic of units of speech input; and
  
  means for comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units and for selecting a speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors, the means for comparing comprising a dynamic time warp program which causes a nonlinear time shift of the second speech vectors relative to the second reference vectors to provide a closer correspondence between the speech and reference vectors, the dynamic time warp program including a spectral warp routine for causing a normalizing nonlinear spectral shift of the frequency characteristics of each vector of the second speech vectors relative to the frequency characteristics of the second reference vectors in a closed form transformation to generate a spectrally warped vector which provides a closer correspondence between the speech and reference vectors.
  - 29. A speech recognition system as claimed in claim 28 wherein each of the first speech vectors defines the magnitude slope of the frequency characteristics of the frame of speech samples.
  - 30. A speech recognition system as claimed in claim 29 wherein the means for generating first speech vectors comprises means for concatenating, with bandwidth expansion, the inverse filter of the first LPC analysis.
  - 31. A speech recognition system as claimed in claim 28 wherein the means for generating the second speech vectors comprises means for providing bandwidth reduction of the inverse filter of the second LPC analysis.
  - 32. A speech recognition system as claimed in claim 28 further comprising a prewarp filter wherein the LPC coefficients generated by the second LPC analysis undergo a nonlinear spectral transformation prior to the dynamic time warp program.
  - 33. A speech recognition system as claimed in claim 28 wherein the spectral warp routine includes an all-pass filter.
  - 34. A speech recognition system as claimed in claim 28 wherein the second reference vectors are token representations of clusters of speech representations.
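
Claim 28 relies on a dynamic time warp program that shifts the input vectors nonlinearly in time against the reference vectors. The following is a minimal textbook dynamic time warping sketch, not the program of the patent. The step pattern, the per-frame distance, and the toy sequences are assumptions; in the claimed system the per-frame comparison would also include the spectral warp routine, which is left out here.

```python
def frame_distance(u, v):
    """Squared Euclidean distance between two frames (an assumed local cost)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def dtw_distance(seq_a, seq_b, local_cost=frame_distance):
    """Accumulated cost of the best monotonic alignment of two frame sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_cost(seq_a[i - 1], seq_b[j - 1])
            # Allowed moves: advance the input only, the reference only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

reference = [[0.0], [1.0], [2.0]]
slow_copy = [[0.0], [0.0], [1.0], [1.0], [2.0]]   # same pattern spoken more slowly
different = [[2.0], [1.0], [0.0]]

print(dtw_distance(slow_copy, reference), dtw_distance(different, reference))
```

The slowed copy aligns with zero cost because repeated frames are absorbed by the time warp, while the reversed pattern cannot be aligned without cost.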

35. A method of recognizing units of speech input comprising:
- generating first speech vectors characteristic of units of speech input;
  
  comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units to select a limited subset of reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
  
  generating from the speech input second speech vectors characteristic of units of speech input normalized with respect to the first speech vectors; and
  
  comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units to select the speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.
  - 36. A method as claimed in claim 35 further comprising filtering the speech samples in an inverse filter based on the first speech vectors and generating the second speech samples from the inverse filtered speech.
  - 37. A method as claimed in claim 36 wherein the first and second speech vectors are generated by means of respective linear predictive coding analyses.
  - 38. A method as claimed in claim 37 wherein the first speech vectors define the magnitude slopes of the frequency characteristics of frames of speech samples.
  - 39. A method as claimed in claim 38 wherein the first speech vectors are generated by performing an LPC analysis to generate an LPC inverse filter and concatenating, with bandwidth expansion, the inverse filter with itself.
  - 40. A method as claimed in claim 37 further comprising providing bandwidth reduction of the LPC filter of the second LPC analysis.
  - 41. A method as claimed in claim 37 wherein the second speech vectors are compared in a dynamic time warp program including a spectral warp routine for causing a normalizing nonlinear spectral shift of the frequency characteristics of each vector of the second speech vectors relative to the frequency characteristics of the corresponding vector of the second reference vectors in a closed form transformation to generate a spectrally warped vector which provides a closer correspondence between the speech and reference vectors.
  - 42. A method as claimed in claim 41 further comprising causing a nonlinear spectral shift of the frequency characteristics of each vector prior to the dynamic time warp program.
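
Claim 36 passes the speech samples through an inverse filter built from the first speech vectors before the second analysis. A minimal sketch of that filtering step as a direct-form FIR filter follows. The one-pole toy signal and the two-tap filter are assumptions picked so that the effect is easy to see; they are not taken from the patent.

```python
def fir_filter(coeffs, samples):
    """Compute y[n] = sum_k coeffs[k] * x[n - k], treating x[n] as 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

# Toy input: impulse response of the one-pole filter 1 / (1 - 0.9 z^-1).
speech = [0.9 ** n for n in range(32)]
# Matching inverse filter 1 - 0.9 z^-1 (hypothetical coefficients).
inverse_filter = [1.0, -0.9]

residual = fir_filter(inverse_filter, speech)
print(residual[:4])
```

Because the inverse filter cancels the pole that shaped the toy signal, the output is an impulse with the remaining samples at rounding-error level; in the claimed method the second set of vectors is then computed from such filtered speech.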

43. A method of recognizing speech comprising:
- generating speech vectors characteristic of speech samples; and
  
  comparing the speech vectors with reference vectors corresponding to a set of reference speech units to select a speech unit for which the reference vectors have the closest correspondence with the speech vectors, the comparison including the step of causing a nonlinear spectral shift of the frequency characteristics of each vector of the speech vectors relative to the frequency characteristics of the corresponding vector of the reference vectors in a closed form transformation to generate a spectrally warped vector which provides a closer correspondence between the speech and reference vectors, a single predetermined transformation function being selected for an entire spectrum of a frame of speech samples.
  - 44. A method as claimed in claim 43 wherein the speech vectors are generated in a linear predictive coding analysis and the nonlinear spectral shift is by means of an all-pass filter.
  - 45. A method as claimed in claim 43 wherein the speech vectors are compared in a dynamic time warp program including a routine for causing the nonlinear spectral shift.
  - 46. A method as claimed in claim 45 further comprising causing a nonlinear spectral shift of the frequency characteristics of each speech vector prior to the dynamic time warp program.

47. A method of generating coefficients of an inverse filter corresponding to the slope of the frequency characteristics of an LPC vector comprising performing an LPC analysis to generate linear prediction coefficients of an LPC inverse filter and concatenating, with bandwidth expansion, the LPC inverse filter with itself.
  - 48. A method as claimed in claim 47 wherein the generated inverse filter is defined by the function:
    - ##EQU16## where ##EQU17## r is a bandwidth broadening factor and a_n and a_i-n are the LPC coefficients from the LPC analysis.
  - 49. A method as claimed in claim 48 wherein r=exp(πDT) and D and T are respectively the bandwidth of the pole and the sampling interval.
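
Claims 47 to 49 start from an LPC analysis that yields the coefficients of an LPC inverse filter. The sketch below shows one common way to perform such an analysis, the autocorrelation method with the Levinson-Durbin recursion. The claims do not say which LPC method is used, so this choice, the sign convention A(z) = 1 + a_1 z^-1 + ..., and the toy signal are all assumptions.

```python
def autocorrelation(x, order):
    """Autocorrelation lags r[0..order] of the sample sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for inverse filter coefficients [1, a_1, ..., a_p] from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a, err

# Toy signal: impulse response of 1 / (1 - 0.9 z^-1), so A(z) should be near 1 - 0.9 z^-1.
signal = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(signal, 2), 2)
print(coeffs)
```

For this toy input the recovered coefficients are close to [1, -0.9, 0], which matches the pole used to generate the signal.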

50. A speech recognition system for recognizing units of speech input comprising:
- means for generating first speech vectors characteristic of units of a speech input, each speech vector defining the magnitude slope of the frequency characteristics of a frame of speech samples;
  
  means for comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units and for selecting a limited subset of the reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
  
  means for generating second speech vectors characteristic of units of speech input; and
  
  means, responsive to the means for comparing the first speech vectors and to the means for generating the second speech vectors, for comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units and for selecting a speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.

51. A method of recognizing units of speech input comprising:
- generating first speech vectors characteristic of units of speech input, each speech vector defining the magnitude slope of the frequency characteristics of a frame of speech samples;
  
  comparing the first speech vectors with first reference vectors corresponding to a set of reference speech units to select a limited subset of reference speech units for which the first reference vectors have the closest correspondence with the first speech vectors;
  
  generating from the speech input second speech vectors characteristic of units of speech input; and
  
  comparing the second speech vectors to second reference vectors corresponding to the selected subset of speech units to select the speech unit of the subset for which the second reference vectors have the closest correspondence with the second speech vectors.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: GTE Laboratories Incorporated (Lumen Technologies, Inc.)
Inventors: Chuang, Chiu-Kuang
Primary Examiner(s): Harkcom, Gary V.
Assistant Examiner(s): Knepper, David D.
Application Number: US07/349,877
Time in Patent Office: 427 Days
Field of Search: 364/513.5, 364/724.15, 364/724.17, 364/724.19, 381/36-50
US Class Current: 704/252
CPC Class Codes: G10L 15/065 Adaptation; G10L 15/12 using dynamic programming t...
