On-line parametric histogram normalization for noise robust speech recognition

US 20030204398A1
Filed: 04/30/2002
Published: 10/30/2003
Est. Priority Date: 04/30/2002
Status: Active Grant

Abstract

A method for improving noise robustness in speech recognition, wherein a front-end is used for extracting speech features from an input speech and for providing a plurality of scaled spectral coefficients. The histogram of the scaled spectral coefficients is normalized to the histogram of a training set using Gaussian approximations. The normalized spectral coefficients are then converted into a set of cepstrum coefficients by a decorrelation module and further subjected to cepstral domain feature-vector normalization.
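
The normalization step in the abstract can be illustrated with a minimal sketch. It assumes that the Gaussian approximation reduces each histogram to a mean and a standard deviation, and that normalizing means shifting and scaling the coefficients until those two statistics match the training set. The function name `gaussian_normalize`, the `eps` guard, and the sample values are hypothetical and do not come from the patent text.

```python
from statistics import fmean, pstdev

def gaussian_normalize(values, ref_mean, ref_std, eps=1e-8):
    """Shift and scale `values` so their mean/std match the reference Gaussian."""
    mu = fmean(values)              # mean of the observed coefficients
    sigma = pstdev(values, mu)      # standard deviation of the observed coefficients
    scale = ref_std / max(sigma, eps)  # eps avoids division by zero (assumption)
    return [ref_mean + scale * (v - mu) for v in values]

# Hypothetical scaled spectral coefficients of one frequency band over time.
noisy_band = [4.1, 4.6, 5.0, 4.3, 4.8, 5.2]
# Hypothetical training-set statistics for the same band.
normalized = gaussian_normalize(noisy_band, ref_mean=2.0, ref_std=1.5)
```

After the call, `normalized` has the reference mean and standard deviation while preserving the relative ordering of the original coefficients.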

28 Claims

1. A method of improving noise robustness in a speech recognition system, the system including a front-end for extracting speech features from an input speech and a back-end for speech recognition based on the extracted features, wherein the front-end comprises:
- means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
  
  means, responsive to the data segments, for spectrally converting the data segments into a plurality of spectral coefficients having a related probability distribution of values for providing spectral data indicative of the spectral coefficients; and
  
  means, responsive to the spectral data, for performing decorrelation conversion on the spectral coefficients for providing the extracted features, characterized by obtaining a parametric representation of the probability distribution of values of the spectral coefficients;
  
  modifying the parametric representation based on one or more reference values; and
  
  adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the decorrelation conversion.
- - 2. The method of claim 1, wherein a plurality of spectral coefficients of a training speech are used for matching, said method further characterized in that said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 3. The method of claim 2, further characterized in that said one or more reference values further include a standard deviation of the spectral coefficients of the training speech.
  - 4. The method of claim 1, further characterized in that the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 5. The method of claim 1, further characterized in that the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 6. The method of claim 1, further characterized in that the parametric representation is obtained based on a Gaussian approximation.
  - 7. The method of claim 3, wherein the spectral coefficients of the training speech have a further probability distribution of values, said method further characterized in that the mean value and the standard deviation are obtained based on a Gaussian approximation of the further probability distribution.
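
One possible reading of claims 1 to 7 is sketched below: the mean and standard deviation of each spectral band are estimated, replaced by the training-speech reference values, and the coefficients are adjusted accordingly before the decorrelation conversion. The sketch assumes an unnormalized DCT-II as the decorrelation step and batch (not on-line) statistics; `dct_ii`, `extract_features`, and all numeric values are hypothetical.

```python
import math
from statistics import fmean, pstdev

def dct_ii(frame, n_out):
    """Unnormalized DCT-II, standing in for the decorrelation conversion."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi * k * (i + 0.5) / n) for i, x in enumerate(frame))
        for k in range(n_out)
    ]

def extract_features(frames, ref_means, ref_stds, n_cepstra):
    """Adjust each spectral band toward the reference statistics, then decorrelate."""
    adjusted = [list(frame) for frame in frames]
    for band in range(len(frames[0])):
        column = [frame[band] for frame in frames]
        mu = fmean(column)                 # parametric representation: mean
        sigma = pstdev(column, mu) or 1.0  # and standard deviation (guard against 0)
        for t, value in enumerate(column):
            # Adjust the coefficient so the band matches the reference values.
            adjusted[t][band] = ref_means[band] + ref_stds[band] * (value - mu) / sigma
    # Decorrelation is applied only after the spectral data has been changed.
    return [dct_ii(frame, n_cepstra) for frame in adjusted]

# Hypothetical spectral frames (rows: time instants, columns: bands).
frames = [[5.0, 4.0, 3.5], [5.5, 4.4, 3.0], [6.1, 4.1, 3.9], [5.2, 4.8, 3.3]]
features = extract_features(
    frames, ref_means=[2.0, 1.5, 1.0], ref_stds=[1.0, 0.8, 0.6], n_cepstra=2
)
```

The ordering is the point of the sketch: the adjustment happens in the spectral domain, so every cepstral coefficient produced by the DCT already reflects the normalized statistics.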

8. A speech recognition front-end for use in a speech recognition system having a back-end, the front end extracting speech features from an input speech so as to allow the back-end to recognize the input speech based on the extracted features, the front-end comprising:
- means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
  
  means for spectrally converting the data into a plurality of spectral coefficients having a related probability distribution of values for providing spectral data indicative of the spectral coefficients; and
  
  means for performing decorrelation conversion on the spectral coefficients for providing the extracted features to the back-end, characterized by means, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
- - 9. The front-end of claim 8, wherein a plurality of spectral coefficients of a training speech are used for matching, said system further characterized in that said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 10. The front-end of claim 9, further characterized in that said one or more reference values further include a standard deviation of the spectral coefficients of the training speech.
  - 11. The front-end of claim 8, further characterized in that the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 12. The front-end of claim 8, further characterized in that the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 13. The front-end of claim 8, further characterized in that the parametric representation is obtained based on a Gaussian approximation.
  - 14. The front-end of claim 10, wherein the spectral coefficients of the training speech have a further probability distribution of values, said front-end further characterized in that the mean value and the standard deviation are obtained based on a Gaussian approximation of the further probability distribution.

15. A network element in a communication system including a back-end for receiving speech data from the network element, the network element comprising:
- a voice input device to receive input speech; and
  
  a front-end, responsive to the input speech, for extracting speech features from the input speech for providing speech data indicative of the speech features so as to allow the back-end to recognize the input speech based on the speech features, wherein the front-end comprises:
  
  means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
  
  means for spectrally converting the data into a plurality of spectral coefficients for providing spectral data indicative of the spectral coefficients having a related probability distribution of values; and
  
  means for performing decorrelation conversion on the spectral coefficients for providing the extracted features, said network element characterized in that the front-end further comprises means, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
- - 16. The network element of claim 15, wherein a plurality of spectral coefficients of a training speech are used for matching, said network element further characterized in that said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 17. The network element of claim 16, further characterized in that said one or more reference values further include a standard deviation of the spectral coefficients of the training speech.
  - 18. The network element of claim 15, further characterized in that the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 19. The network element of claim 15, further characterized in that the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 20. The network element of claim 15, further characterized in that the parametric representation is obtained based on a Gaussian approximation.
  - 21. The network element of claim 16, wherein the spectral coefficients of the training speech have a further probability distribution of values, said network element further characterized in that the mean value and the standard deviation are obtained based on a Gaussian approximation of the further probability distribution.

22. A computer program for use in a speech recognition front-end for extracting speech features from an input speech so as to allow a speech recognition back-end to recognize the input speech based on the extracted features, wherein the front-end comprises:
- means, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
  
  means for spectrally converting the data into a plurality of spectral coefficients having a related probability distribution of values for providing spectral data indicative of the spectral coefficients; and
  
  means for performing decorrelation conversion on the spectral coefficients for providing the extracted features, said computer program characterized by an algorithm for generating a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
- - 23. The computer program of claim 22, wherein a plurality of spectral coefficients of a training speech are used for matching, said computer program further characterized in that said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 24. The computer program of claim 23, further characterized in that said one or more reference values include a standard deviation of the spectral coefficients of the training speech.
  - 25. The computer program of claim 22, further characterized in that the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 26. The computer program of claim 22, further characterized in that the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 27. The computer program of claim 22, further characterized in that the parametric representation is obtained based on a Gaussian approximation.
  - 28. The computer program of claim 24, wherein the spectral coefficients of the training speech have a further probability distribution of values, said computer program further characterized in that the mean value and the standard deviation are obtained from a Gaussian approximation of the further probability distribution.

Current Assignee: RPX Corporation
Original Assignee: Nokia Corporation
Inventors: Kiss, Imre; Haverinen, Hemmo
Granted Patent: US 7,197,456 B2
US Class Current: 704/233
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/20   Speech recognition techniqu...
G10L 25/18   the extracted parameters be...
