On-line parametric histogram normalization for noise robust speech recognition

US 7,197,456 B2
Filed: 04/30/2002
Issued: 03/27/2007
Est. Priority Date: 04/30/2002
Status: Expired due to Fees

Assignments: 4

Petitions: 0

Abstract

A method for improving noise robustness in speech recognition, wherein a front-end is used for extracting speech features from an input speech and for providing a plurality of scaled spectral coefficients. The histogram of the scaled spectral coefficients is normalized to the histogram of a training set using Gaussian approximations. The normalized spectral coefficients are then converted into a set of cepstrum coefficients by a decorrelation module and further subjected to cepstral-domain feature-vector normalization.
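
Why Gaussian approximations reduce histogram normalization to matching two numbers: histogram normalization maps each value so that its cumulative distribution function (CDF) under the input speech equals the CDF of the mapped value under the training set. If both distributions are approximated as Gaussians, the map becomes linear. The symbols here are chosen for illustration: x is a spectral coefficient of the input speech in one channel with mean μ and standard deviation σ, y is its normalized value, μ_ref and σ_ref are the training-set values, and Φ is the standard normal CDF.

```latex
F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right), \qquad
F_{\mathrm{ref}}(y) = \Phi\!\left(\frac{y-\mu_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}}\right)

F_{\mathrm{ref}}(y) = F(x)
\iff \frac{y-\mu_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}} = \frac{x-\mu}{\sigma}
\iff y = \mu_{\mathrm{ref}} + \frac{\sigma_{\mathrm{ref}}}{\sigma}\,(x-\mu)
```

The first equivalence holds because Φ is strictly increasing. Applied per channel, the map gives the coefficients the training set's mean and standard deviation.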

32 Claims

1. A method, comprising:
- providing in a speech recognition system speech data indicative of an input speech at a plurality of time instants based on the input speech, the speech data comprising a plurality of data segments;
  
  spectrally converting the data segments into a plurality of spectral coefficients having a probability distribution of values in spectral domain for providing spectral data indicative of the spectral coefficients based on the data segments;
  
  obtaining a parametric representation of the probability distribution of values of the spectral coefficients based on the spectral data;
  
  modifying the parametric representation based on one or more reference values for providing a modified parametric representation;
  
  adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data; and
  
  performing decorrelation conversion on the changed spectral data for providing extracted features of the input speech.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein a plurality of spectral coefficients of a training speech are used for matching, and wherein said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 3. The method of claim 2, wherein said one or more reference values further include a standard deviation of the spectral coefficients of the training speech.
  - 4. The method of claim 1, wherein the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 5. The method of claim 1, wherein the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 6. The method of claim 1, wherein the parametric representation is obtained based on a Gaussian approximation.
  - 7. The method of claim 3, wherein the spectral coefficients of the training speech have a further probability distribution of values, and wherein the mean value and the standard deviation are obtained based on a Gaussian approximation of the further probability distribution.
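
Claims 1 to 7 describe a sequence: fit a parametric (Gaussian) representation to each spectral coefficient's distribution, replace its mean and standard deviation with reference values from training speech, adjust the coefficients accordingly, then apply a decorrelation conversion. The sketch below is one reading of that sequence, not the patent's implementation. It assumes the spectral data is already a list of frames of per-channel coefficients (for example log filterbank energies), that statistics are computed over the whole utterance rather than on-line, that every channel is adjusted with the linear map y = μ_ref + (σ_ref/σ)(x − μ), and that the decorrelation is an orthonormal DCT-II, the usual way of producing cepstra. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def gaussian_params(values):
    """Parametric representation of one channel: the mean and (population)
    standard deviation of a Gaussian fitted to the observed values."""
    return fmean(values), pstdev(values)


def normalize_channels(frames, ref_means, ref_stds, eps=1e-8):
    """Remap every spectral channel so that its Gaussian approximation takes
    the reference (training) mean and standard deviation.

    frames: list of frames, each a list with one spectral coefficient per channel.
    """
    out = [list(frame) for frame in frames]
    for k in range(len(frames[0])):
        mu, sigma = gaussian_params([frame[k] for frame in frames])
        scale = ref_stds[k] / max(sigma, eps)  # guard against a constant channel
        for t, frame in enumerate(frames):
            out[t][k] = ref_means[k] + scale * (frame[k] - mu)
    return out


def dct_ii(x, num_ceps):
    """Decorrelation conversion: orthonormal DCT-II, keeping at most num_ceps terms."""
    n = len(x)
    ceps = []
    for i in range(min(num_ceps, n)):
        s = sum(x[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
        ceps.append(math.sqrt((1.0 if i == 0 else 2.0) / n) * s)
    return ceps


def extract_features(frames, ref_means, ref_stds, num_ceps=13):
    """Normalize the spectral data, then decorrelate each frame into cepstra."""
    normalized = normalize_channels(frames, ref_means, ref_stds)
    return [dct_ii(frame, num_ceps) for frame in normalized]
```

For example, `extract_features([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], [0.0] * 3, [1.0] * 3, num_ceps=2)` first maps the two frames to `[-1, -1, -1]` and `[1, 1, 1]`, then returns first cepstral coefficients of about -1.732 and 1.732 with second coefficients near zero.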

8. A speech recognition front-end comprising:
- a processing module, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
  
  a transform module for spectrally converting the data into a plurality of spectral coefficients having a related probability distribution of values in a spectral domain for providing spectral data indicative of the spectral coefficients;
  
  a software program, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data; and
  
  a decorrelation module, responsive to the modified parametric representation, for providing extracted features based on the changed spectral data.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The front-end of claim 8, wherein a plurality of spectral coefficients of a training speech are used for matching, and wherein said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 10. The front-end of claim 9, wherein said one or more reference values further include a standard deviation of the spectral coefficients of the training speech.
  - 11. The front-end of claim 8, wherein the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 12. The front-end of claim 8, wherein the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 13. The front-end of claim 8, wherein the parametric representation is obtained based on a Gaussian approximation.
  - 14. The front-end of claim 10, wherein the spectral coefficients of the training speech have a further probability distribution of values, and wherein the mean value and the standard deviation are obtained based on a Gaussian approximation of the further probability distribution.
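
Claim 8 splits the front-end into a processing module that provides data at successive time instants and a transform module that converts that data into spectral coefficients. The claims do not fix the framing or the transform, so this sketch assumes Hamming-windowed overlapping frames and a log power spectrum computed with a plain DFT; a production front-end would use an FFT and typically a mel filterbank. Both function names are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Processing module (sketch): cut the signal into overlapping segments,
    one per time instant, and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        segment = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


def log_power_spectrum(frame, floor=1e-10):
    """Transform module (sketch): naive O(N^2) DFT, returning the log power of
    bins 0..N/2 as the frame's spectral coefficients."""
    n = len(frame)
    coeffs = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        coeffs.append(math.log(max(abs(acc) ** 2, floor)))  # floor avoids log(0)
    return coeffs
```

For a sine that completes 4 cycles in each 32-sample frame, `frame_signal(samples, 32, 16)` over 64 samples yields 3 frames, and each frame's largest log-power coefficient falls in bin 4.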

15. A network element in a communication system comprising:
- a voice input device to receive input speech; and
  
  a speech recognition front-end, responsive to the input speech, for extracting speech features from the input speech for providing speech data indicative of the speech features so as to allow the back-end to recognize the input speech based on the speech features, wherein the front-end comprises:
  
  a processing module, responsive to the input speech, for providing data indicative of the input speech at a plurality of time instants;
  
  a transform module for spectrally converting the data into a plurality of spectral coefficients for providing spectral data indicative of the spectral coefficients having a related probability distribution of values in spectral domain;
  
  a computation module for performing decorrelation conversion on the spectral coefficients for providing the extracted features, and
  
  a software program, responsive to the spectral coefficients, for obtaining a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
- Dependent claims (16, 17, 18, 19, 20, 21):
  - 16. The network element of claim 15, wherein a plurality of spectral coefficients of a training speech are used for matching, and wherein said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 17. The network element of claim 16, wherein said one or more reference values further include a standard deviation of the spectral coefficients of the training speech.
  - 18. The network element of claim 15, wherein the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 19. The network element of claim 15, wherein the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 20. The network element of claim 15, wherein the parametric representation is obtained based on a Gaussian approximation.
  - 21. The network element of claim 17, wherein the spectral coefficients of the training speech have a further probability distribution of values, wherein the mean value and the standard deviation are obtained based on a Gaussian approximation of the further probability distribution.
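
The title calls the normalization on-line, which matters for a network element that must extract features while speech is still arriving; the claims themselves do not say how the statistics are tracked. One common approach, assumed here and not taken from the patent, is to update each channel's mean and variance recursively with a forgetting factor and normalize every frame with the current estimates. The class name, the forgetting factor, and the initial variance are all hypothetical.

```python
import math


class OnlineChannelNormalizer:
    """Hypothetical on-line variant: each channel's mean and variance are
    tracked recursively with a forgetting factor, and every incoming frame is
    mapped onto the reference statistics using the current estimates."""

    def __init__(self, ref_means, ref_stds, forget=0.99, init_var=1.0):
        self.ref_means = list(ref_means)
        self.ref_stds = list(ref_stds)
        self.forget = forget          # closer to 1.0 means a longer memory
        self.means = None             # initialised from the first frame
        self.vars = [init_var] * len(self.ref_means)

    def process(self, frame):
        """Update the running statistics with one frame and return it normalized."""
        if self.means is None:
            self.means = list(frame)
        a = self.forget
        out = []
        for k, x in enumerate(frame):
            self.means[k] = a * self.means[k] + (1.0 - a) * x
            d = x - self.means[k]
            self.vars[k] = a * self.vars[k] + (1.0 - a) * d * d
            sigma = math.sqrt(max(self.vars[k], 1e-12))
            out.append(self.ref_means[k] + self.ref_stds[k] * d / sigma)
        return out
```

With `forget=0.99` the estimates behave roughly like an average over the last 100 frames: a shorter memory adapts faster to changing noise but gives noisier statistics.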

22. A software application product comprising a storage medium having a software application for use in a speech recognition front-end, the front-end configured for extracting speech features from an input speech so as to allow a speech recognition back-end to recognize the input speech based on the extracted features, wherein the front-end is configured to provide data indicative of the input speech at a plurality of time instants;
- to spectrally convert the data into a plurality of spectral coefficients having a related probability distribution of values in spectral domain for providing spectral data indicative of the spectral coefficients; and
  
  to perform decorrelation conversion on the spectral coefficients for providing the extracted features, said software application comprising an algorithm for generating a parametric representation of the probability distribution of values of the spectral coefficients, for modifying the parametric representation based on one or more reference values, and for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data prior to the performing of the decorrelation conversion.
- Dependent claims (23, 24, 25, 26, 27, 28):
  - 23. The software application product of claim 22, wherein a plurality of spectral coefficients of a training speech are used for matching, and wherein said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 24. The software application product of claim 23, wherein said one or more reference values include a standard deviation of the spectral coefficients of the training speech.
  - 25. The software application product of claim 22, wherein the parametric representation comprises a mean value of the probability distribution of values of the spectral coefficients.
  - 26. The software application product of claim 22, wherein the parametric representation comprises a standard deviation of the probability distribution of values of the spectral coefficients.
  - 27. The software application product of claim 22, wherein the parametric representation is obtained based on a Gaussian approximation.
  - 28. The software application product of claim 24, wherein the coefficients of the training speech have a further probability distribution of values, and wherein the mean value and the standard deviation are obtained from a Gaussian approximation of the further probability distribution.
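
Claim 22's software delivers extracted features to a speech recognition back-end, and the abstract adds that these cepstral features are further subjected to cepstral-domain feature-vector normalization without spelling out its form. A common choice, assumed for this sketch, is per-dimension mean and variance normalization over the utterance; the function name is hypothetical.

```python
from statistics import fmean, pstdev


def cepstral_mvn(features, eps=1e-8):
    """Feature-vector normalization in the cepstral domain (assumed form):
    give each cepstral dimension zero mean and unit standard deviation over
    the utterance. A constant dimension is mapped to zeros via the eps floor."""
    stats = []
    for d in range(len(features[0])):
        column = [f[d] for f in features]
        stats.append((fmean(column), max(pstdev(column), eps)))
    return [[(f[d] - mu) / sd for d, (mu, sd) in enumerate(stats)]
            for f in features]
```

For example, `cepstral_mvn([[1.0, 10.0], [3.0, 10.0]])` returns `[[-1.0, 0.0], [1.0, 0.0]]`.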

29. An electronic module, comprising:
- means, responsive to an input speech in a speech recognition front-end, for providing data indicative of the input speech at a plurality of time instants, the speech data comprising a plurality of data segments;
  
  means for spectrally converting the data segments into a plurality of spectral coefficients having a probability distribution of values in a spectral domain for providing spectral data indicative of the spectral coefficients;
  
  means for performing decorrelation conversion on the spectral coefficients for providing extracted features based on the data segments;
  
  means for obtaining a parametric representation of the probability distribution of values of the spectral coefficients,
  
  means for modifying the parametric representation based on one or more reference values, and
  
  means for adjusting at least one of the spectral coefficients in the spectral domain based on the modified parametric representation for changing the spectral data prior to the decorrelation conversion on the spectral coefficients.
- Dependent claims (30, 31, 32):
  - 30. The electronic module of claim 29, further comprising:
    - means, responsive to the modified parametric representation, for providing extracted features based on the changed spectral data.
  - 31. The electronic module of claim 29, wherein a plurality of spectral coefficients of a training speech are used for matching, and wherein said one or more reference values include a mean value of the spectral coefficients of the training speech.
  - 32. The electronic module of claim 31, wherein said one or more reference values further include a standard deviation of the spectral coefficients of the training speech.

Current Assignee: RPX Corporation
Original Assignee: Nokia Corporation
Inventors: Haverinen, Hemmo; Kiss, Imre
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Shortledge, Thomas E.
Application Number: US10/136,039
Publication Number: US 20030204398A1
Time in Patent Office: 1,792 Days
Field of Search: 704/233, 704/250, 704/205, 704/247, 704/234, 704/224, 704/226, 704/237, 704/206, 704/227
US Class Current: 704/233
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/20   Speech recognition techniqu...
G10L 25/18   the extracted parameters be...
