Speech recognition facilitation method and apparatus

US 20040034526A1
Filed: 08/14/2002
Published: 02/19/2004
Est. Priority Date: 08/14/2002
Status: Active Grant

Abstract

In a speech recognition platform, a masking unit 17 can be utilized to mask noisy content within an audio sample. By masking such noise in a dynamic but predictable manner, valid content can be preserved while largely overcoming the random and detrimental presence of noise. In one embodiment, speech recognition features are extracted pursuant to a hierarchical process that localizes, at least to some extent, some of the resultant features from other resultant features. As a result, noisy or otherwise unreliable information corresponding to the audio sample will not be leveraged unduly across the entire feature set. In another embodiment, an average energy value for processed samples is calculated with individual energy values that are downwardly weighted when such individual energy values are likely representative of noise.

26 Claims

1. A method comprising:
- providing information having varying amplitude in a spectral domain to be speech-recognized;
- adding masking information to the information as a function, at least in part, of the amplitude of the information to provide modified information.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1 wherein providing information includes providing digitized information that corresponds to an original analog audio input signal.
  - 3. The method of claim 1 wherein providing information having varying amplitude in a spectral domain includes providing information that includes harmonics, which harmonics have corresponding amplitudes that vary in the spectral domain.
  - 4. The method of claim 1 wherein adding masking information includes using a non-linear filter such that at least some valleys in the information amplitude are at least partially masked as a function of the information amplitude on either side of such valleys.
  - 5. The method of claim 1 wherein providing the information includes using a fast Fourier transform.
  - 6. The method of claim 1 and further comprising extracting at least some speech recognition features from the modified information.
  - 7. The method of claim 6 wherein extracting at least some speech recognition features from the modified information comprises processing the modified information to obtain cepstral coefficients corresponding to the modified information.
  - 8. The method of claim 6 wherein extracting at least some speech recognition features from the modified information comprises processing the modified information to obtain localized speech recognition coefficients wherein at least some of the localized speech recognition coefficients are determined independent of other of the speech recognition coefficients.
  - 9. The method of claim 8 wherein processing the modified information to obtain localized speech recognition coefficients wherein at least some of the localized speech recognition coefficients are determined independent of other of the localized speech recognition coefficients includes processing the modified information to obtain localized speech recognition coefficients wherein substantially half of the localized speech recognition coefficients are determined independent of substantially half of the localized speech recognition coefficients.
  - 10. The method of claim 1 and further comprising conditioning an average value representing energy of the information by downwardly scaling energy amplitude levels in a temporal domain when such energy amplitude levels fall below a predetermined threshold.
  - 11. The method of claim 10 wherein the predetermined threshold represents, at least in part, an average noise level plus an amount that corresponds to peak noise variances.
  - 12. The method of claim 11 wherein conditioning an average value representing energy of the information by downwardly scaling energy amplitude levels in a temporal domain when such energy amplitude levels fall below a predetermined threshold includes estimating average energy m of a first N frames of a non-speech signal input and an upper bound M thereof, and for at least some speech frames that follow, scaling the energy amplitude level E by a factor of β
    - wherein;
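
Claims 1 and 4 describe adding masking information so that spectral valleys are at least partially filled as a function of the amplitude on either side. A minimal Python sketch of one such non-linear filter follows. It is not the patent's own filter: the function name `mask_valleys`, the `decay` parameter, and the decayed-envelope rule are assumptions chosen for illustration, and the input is assumed to be a list of non-negative spectral magnitudes (for example, from a fast Fourier transform).

```python
def mask_valleys(spectrum, decay=0.5):
    """Fill spectral valleys using decayed envelopes of neighbouring peaks.

    Hypothetical illustration only: `decay` (0 < decay < 1) controls how fast
    a peak's masking level falls off per frequency bin.
    """
    n = len(spectrum)
    if n == 0:
        return []

    # Envelope spreading from lower-frequency neighbours (left to right).
    left = [0.0] * n
    left[0] = spectrum[0]
    for k in range(1, n):
        left[k] = max(spectrum[k], left[k - 1] * decay)

    # Envelope spreading from higher-frequency neighbours (right to left).
    right = [0.0] * n
    right[-1] = spectrum[-1]
    for k in range(n - 2, -1, -1):
        right[k] = max(spectrum[k], right[k + 1] * decay)

    # Each bin takes the stronger of the two envelopes. Peaks are unchanged
    # because both envelopes are never below the original amplitude.
    return [max(l, r) for l, r in zip(left, right)]


if __name__ == "__main__":
    print(mask_valleys([8.0, 0.0, 0.0, 0.0, 8.0]))
```

With `decay=0.5`, the input `[8.0, 0.0, 0.0, 0.0, 8.0]` becomes `[8.0, 4.0, 2.0, 4.0, 8.0]`: the two peaks are preserved, and the valley between them is raised to a level set by the peaks on either side rather than left at a value that noise could dominate.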

13. A device comprising:
- an information signal input;
- a spectral transformation unit having an input operably coupled to the information signal input and having an output providing a spectrally transformed information signal; and
- a masking unit having an input operably coupled to the output of the spectral transformation unit and having an output providing a modified spectrally transformed information signal wherein at least some amplitude valleys are at least partially masked.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23):
  - 14. The device of claim 13 wherein the spectral transformation unit comprises a fast Fourier transform unit.
  - 15. The device of claim 13 and further comprising a speech recognition feature extraction unit having an input operably coupled to the output of the masking unit and having an output providing speech recognition features that correspond to an information signal as input at the information signal input.
  - 16. The device of claim 15 wherein the speech recognition feature extraction unit comprises a cepstral coefficient extraction unit.
  - 17. The device of claim 15 wherein the speech recognition feature extraction unit comprises a localized speech feature extraction unit having an output providing channel energy ratios that correspond to a hierarchical split of a speech spectrum that corresponds to the output of the masking unit.
  - 18. The device of claim 13 and further comprising an energy measurement unit having an input operably coupled to the information signal input and an output providing a value that corresponds to information signal energy over a period of time.
  - 19. The device of claim 18 wherein the output provides a value that corresponds to information signal energy as modified to reduce portions of the information signal energy that are less than a predetermined threshold.
  - 20. The device of claim 19 wherein the predetermined threshold corresponds to an average value of noise.
  - 21. The device of claim 20 wherein the average value of noise is modified as a function of noise peaks.
  - 22. The device of claim 13 wherein the masking unit includes masking means for masking portions of the spectrally transformed information signal that correspond to substantially deep valleys as compared to adjacent peaks.
  - 23. The device of claim 22 wherein the masking means further masks portions of the spectrally transformed information signal that correspond to substantially deep valleys as a function of at least the adjacent peaks.
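
Claims 10 to 12 and 18 to 21 describe an energy value that is conditioned by scaling down frame energies that fall below a threshold derived from the average noise level and its peaks. The sketch below, in Python, illustrates the idea under stated assumptions. The scaling factor β is not given in the text above, so the rule used here (`beta = E / M` below the bound, `1` otherwise) is a placeholder and not the patent's formula. The function names, and the choice of the upper bound M as the noise mean m plus the largest excess over that mean, are likewise hypothetical.

```python
def noise_bounds(noise_energies):
    """Return (m, M): mean noise energy and a hypothetical upper bound.

    Assumption: M is the mean plus the largest observed excess over the mean
    among the leading non-speech frames.
    """
    if not noise_energies:
        raise ValueError("need at least one non-speech frame")
    m = sum(noise_energies) / len(noise_energies)
    peak_excess = max(e - m for e in noise_energies)
    return m, m + peak_excess


def conditioned_average(energies, upper_bound):
    """Average frame energy with likely-noise frames weighted downward."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    if not energies:
        raise ValueError("need at least one frame")
    total = 0.0
    for e in energies:
        # Placeholder scaling rule (an assumption, not the claimed beta):
        # frames at or above the bound keep full weight, quieter frames
        # are scaled in proportion to how far they fall below it.
        beta = 1.0 if e >= upper_bound else e / upper_bound
        total += beta * e
    return total / len(energies)


if __name__ == "__main__":
    m, M = noise_bounds([1.0, 2.0, 3.0])
    print(m, M, conditioned_average([6.0, 1.5, 9.0], M))
```

For the noise frames `[1.0, 2.0, 3.0]` this gives m = 2.0 and M = 3.0. The frame with energy 1.5 is then halved before averaging, so the conditioned average of `[6.0, 1.5, 9.0]` is 5.25 instead of the plain mean of 5.5.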

24. A device comprising:
- an information signal input;
- a localized speech recognition feature extraction unit having an input operably coupled to the information signal input and an output providing localized speech recognition features.
- Dependent claims (25, 26):
  - 25. The device of claim 24 wherein the localized speech recognition feature extraction unit includes localized feature extraction means for determining at least some of the speech recognition features independently from at least others of the speech recognition features.
  - 26. The device of claim 25 wherein the speech recognition features comprise channel energy ratios for varying combinations of channels.
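
Claims 17 and 24 to 26 refer to localized features expressed as channel energy ratios over a hierarchical split of the spectrum. One way to read this is sketched below in Python. The binary split, the ratio definition `low / (low + high)`, the function name `hierarchical_ratios`, and the fallback value for silent bands are all assumptions made for illustration, not details confirmed by the claims.

```python
def hierarchical_ratios(channels):
    """Compute energy ratios over a recursive binary split of the channels.

    Hypothetical illustration: each node splits its band into a lower and an
    upper half and emits low / (low + high). Features are returned in
    pre-order: whole band, then the lower sub-tree, then the upper sub-tree.
    """
    features = []

    def split(lo, hi):
        if hi - lo < 2:
            return  # a single channel cannot be split further
        mid = (lo + hi) // 2
        low = sum(channels[lo:mid])
        high = sum(channels[mid:hi])
        total = low + high
        # Assumed fallback: a silent band yields a neutral ratio of 0.5.
        features.append(low / total if total > 0 else 0.5)
        split(lo, mid)
        split(mid, hi)

    split(0, len(channels))
    return features


if __name__ == "__main__":
    print(hierarchical_ratios([1.0, 1.0, 2.0, 4.0]))
    print(hierarchical_ratios([1.0, 1.0, 2.0, 8.0]))
```

Changing only the highest channel (4.0 to 8.0) alters the whole-band ratio and the upper-half ratio, but the lower-half ratio stays at 0.5. This is the localization property the abstract describes: unreliable energy in one region does not spread across the entire feature set.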

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Inventors: Ma, Changxue
Granted Patent: US 7,013,272 B2
US Class Current: 704/200.1
CPC Class Codes: G10L 15/20 Speech recognition techniqu...
