Noise suppression assisted automatic speech recognition

US 9,558,755 B1
Filed: 12/07/2010
Issued: 01/31/2017
Est. Priority Date: 05/20/2010
Status: Active Grant

4 Assignments

0 Petitions

Abstract

Noise suppression information is used to optimize or improve automatic speech recognition performed for a signal. Noise suppression can be performed on a noisy speech signal using a gain value. The gain to apply to the noisy speech signal is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames. Noise suppression information can be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information can also be used to encode and identify speech.
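
As an illustration of the gain selection summarized above, the following is a minimal sketch of looking up a pre-stored gain by sub-band frequency and SNR and applying it to a sub-band. The grid edges, the gain values, and the names `FREQ_EDGES_HZ`, `SNR_EDGES_DB`, `GAIN_TABLE`, `lookup_gain`, and `apply_gain` are all hypothetical; the record does not publish actual table contents, and nearest-lower-bin lookup is only one plausible way to index such a table.

```python
from bisect import bisect_right

# Hypothetical grid: lower edge of each frequency bin (Hz) and SNR bin (dB).
FREQ_EDGES_HZ = [0.0, 500.0, 1000.0, 2000.0, 4000.0]
SNR_EDGES_DB = [-10.0, 0.0, 10.0, 20.0]

# GAIN_TABLE[freq_bin][snr_bin]: placeholder linear gains in [0, 1].
# In practice such values would be tuned offline for recognition accuracy.
GAIN_TABLE = [
    [0.10, 0.30, 0.70, 1.00],
    [0.10, 0.35, 0.75, 1.00],
    [0.15, 0.40, 0.80, 1.00],
    [0.20, 0.45, 0.80, 1.00],
    [0.20, 0.50, 0.85, 1.00],
]


def _bin_index(edges, value):
    """Index of the last edge <= value, clamped to the valid range."""
    return min(max(bisect_right(edges, value) - 1, 0), len(edges) - 1)


def lookup_gain(center_freq_hz, snr_db):
    """Fetch the pre-stored gain for a sub-band frequency and SNR."""
    f = _bin_index(FREQ_EDGES_HZ, center_freq_hz)
    s = _bin_index(SNR_EDGES_DB, snr_db)
    return GAIN_TABLE[f][s]


def apply_gain(subband_samples, center_freq_hz, snr_db):
    """Scale every sample of one sub-band by the looked-up gain."""
    gain = lookup_gain(center_freq_hz, snr_db)
    return [gain * x for x in subband_samples]
```

Out-of-range inputs are clamped to the nearest table edge here, so a very noisy sub-band still receives the lowest stored gain rather than raising an error.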

446 Citations

14 Claims

1. A method for processing an audio signal, comprising:
- generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal;
  
  determining two or more features for the sub-band signals, the two or more features including a speech energy level for the sub-band noise level and at least one of the following: inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal;
  
  suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising:
  
  applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising:
  
  determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal;
  
  accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and
  
  applying the accessed gain to the sub-band frequency; and
  
  providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a voice activity detection signal.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, further comprising determining whether the primary acoustic signal includes speech, the determination performed based on the two or more features.
  - 3. The method of claim 2, further comprising providing a noise signal to the automatic speech recognition processing module in response to detecting that the primary acoustic signal does not include speech.
  - 4. The method of claim 2, wherein the voice activity detection signal is generated based on the determination of whether the primary acoustic signal includes speech, and the voice activity detection signal indicating whether automatic speech recognition is to occur.
  - 5. The method of claim 4, wherein the voice activity detection signal is a value within a range of values corresponding to the level of speech detected in the primary acoustic signal.
  - 6. The method of claim 2, wherein the noise suppression information includes a speech to noise ratio for the current time frame and the sub-band to the automatic speech recognition processing module.
  - 7. The method of claim 1, wherein the noise suppression information includes a speech to noise ratio, the method further comprising modulating a bit rate of a speech encoder or decoder based on the speech to noise ratio for a particular frame.
  - 8. The method of claim 1, wherein the noise suppression information includes a speech to noise ratio, the method further comprising setting a node search depth level by the automatic speech recognition processing module based on the speech to noise ratio for a current frame.
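
To make the feature and speech-detection steps of claims 1, 2, and 5 more concrete, here is a minimal sketch that computes an inter-microphone level difference (ILD) per sub-band and derives a graded voice activity value from it. Everything specific here is an assumption for illustration: the 6 dB threshold, the use of mean-square energy, the "fraction of sub-bands above threshold" rule, and the names `ild_db` and `soft_vad` do not come from the claims, which leave the actual decision rule open.

```python
import math

ILD_SPEECH_THRESHOLD_DB = 6.0  # hypothetical threshold, not taken from the patent
EPS = 1e-12                    # avoids log(0) and division by zero on silent input


def subband_energy(samples):
    """Mean-square energy of one sub-band frame."""
    return sum(x * x for x in samples) / max(len(samples), 1)


def ild_db(primary, secondary):
    """Inter-microphone level difference for one sub-band, in dB."""
    ratio = (subband_energy(primary) + EPS) / (subband_energy(secondary) + EPS)
    return 10.0 * math.log10(ratio)


def soft_vad(primary_subbands, secondary_subbands):
    """Graded voice activity value in [0, 1].

    Assumed rule: the fraction of sub-bands whose ILD exceeds the threshold,
    on the idea that a talker close to the primary microphone is louder there.
    """
    ilds = [ild_db(p, s) for p, s in zip(primary_subbands, secondary_subbands)]
    if not ilds:
        return 0.0
    return sum(1 for v in ilds if v > ILD_SPEECH_THRESHOLD_DB) / len(ilds)
```

A value in a range, rather than a binary flag, matches the wording of claim 5; a recognizer could threshold it or use it as a weight.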

9. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for reducing noise in an audio signal, the method comprising:
- generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal;
  
  determining two or more features for a sub-band signal, the two or more features including a speech energy level for the sub-band noise level and at least one of the following: inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal;
  
  suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising:
  
  applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising:
  
  determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal;
  
  accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and
  
  applying the accessed gain to the sub-band frequency; and
  
  providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a speech to noise ratio for each of the sub-band signals and a voice activity detection signal.
- Dependent claims (10, 11, 12, 13, 14):
  - 10. The non-transitory computer readable storage medium of claim 9, further comprising providing a noise signal to the automatic speech recognition processing module in response to detecting that the primary acoustic signal does not include speech.
  - 11. The non-transitory computer readable storage medium of claim 9, wherein the voice activity detection signal is generated based on the determination of whether the primary acoustic signal includes speech, and the voice activity detection signal indicating whether automatic speech recognition is to occur.
  - 12. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio for the current time frame and the sub-band to the automatic speech recognition processing module.
  - 13. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio, the method further comprising modulating a bit rate of a speech encoder or decoder based on the speech to noise ratio for a particular frame.
  - 14. The non-transitory computer readable storage medium of claim 9, wherein the noise suppression information includes a speech to noise ratio, the method further comprising setting a node search depth level by the automatic speech recognition processing module based on the speech to noise ratio for a current frame.
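
One way to picture the noise suppression information of claim 9 and its use in claims 11 and 14 is a small per-frame record that a recognizer consults before decoding. The sketch below is illustrative only: the `NoiseSuppressionInfo` type, the averaging of per-sub-band SNR values, the 0.5 gating threshold, the depth range, and the choice to search deeper on noisier frames are all assumptions, since the claims state only that the search depth is set based on the SNR and that the voice activity detection signal indicates whether recognition is to occur.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NoiseSuppressionInfo:
    """Hypothetical per-frame side information passed to the recognizer."""
    snr_db_per_subband: List[float]  # one SNR value per sub-band
    vad: float                       # voice activity value in [0, 1]


def frame_snr_db(info: NoiseSuppressionInfo) -> float:
    """Single frame-level SNR; a plain mean of the sub-band values (assumed)."""
    if not info.snr_db_per_subband:
        return 0.0
    return sum(info.snr_db_per_subband) / len(info.snr_db_per_subband)


def should_recognize(info: NoiseSuppressionInfo, threshold: float = 0.5) -> bool:
    """Gate recognition on the voice activity value (threshold is assumed)."""
    return info.vad >= threshold


def search_depth(info: NoiseSuppressionInfo, min_depth: int = 2,
                 max_depth: int = 8, low_db: float = 0.0,
                 high_db: float = 20.0) -> int:
    """Map frame SNR to a search depth.

    Assumed policy: noisy frames (low SNR) get a deeper search, clean
    frames a shallower one, interpolated linearly between the two limits.
    """
    t = (frame_snr_db(info) - low_db) / (high_db - low_db)
    t = min(max(t, 0.0), 1.0)
    return round(max_depth - t * (max_depth - min_depth))
```

For example, a frame with sub-band SNRs of 0, 10, and 20 dB averages to 10 dB, which lands halfway between the assumed limits and yields a depth of 5.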

Current Assignee: Samsung Electronics Co. Ltd. (Samsung Group)
Original Assignee: Knowles Electronics LLC (Knowles Corporation)
Inventors: Laroche, Jean; Murgia, Carlo
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kovacek, David
Application Number: US12/962,519
Time in Patent Office: 2,247 Days
Field of Search: 704/200-200.1, 704/227-229, 704/500-504, 704/E19.001-E19.049, 704/E21.001-E21.02, 381/71.1-71.14, 381/94.1-94.9
US Class Current: 1/1
CPC Class Codes:
G10L 15/20   Speech recognition techniqu...
G10L 2021/02165   Two microphones, one receiv...
G10L 21/00   Speech or voice signal proc...
G10L 21/02   Speech enhancement, e.g. no...
G10L 21/0232   Processing in the frequency...
G10L 25/78   Detection of presence or ab...
