Noise suppression assisted automatic speech recognition
First Claim
1. A method for processing an audio signal, comprising:
generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal;
determining two or more features for the sub-band signals, the two or more features including a speech energy level for the sub-band noise level and at least one of the following:
inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal;
suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising:
applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising:
determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal;
accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and
applying the accessed gain to the sub-band frequency; and
providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a voice activity detection signal.
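The gain-access step of claim 1 can be pictured as a table lookup keyed by sub-band frequency and SNR. The following is a minimal sketch under stated assumptions, not the patented implementation: the table values, the grid points, the nearest-neighbour selection, and the names `FREQS_HZ`, `SNRS_DB`, `GAINS`, `lookup_gain`, and `suppress_subband` are all hypothetical, since the claim does not specify them.

```python
import bisect
import math

# Hypothetical datastore of pre-stored gains (linear scale).
# Rows follow FREQS_HZ, columns follow SNRS_DB. All values are made up.
FREQS_HZ = [250.0, 1000.0, 4000.0]        # assumed sub-band centre frequencies
SNRS_DB = [-5.0, 0.0, 5.0, 10.0, 20.0]    # assumed SNR grid points in dB
GAINS = [
    [0.10, 0.20, 0.45, 0.70, 1.00],
    [0.15, 0.30, 0.55, 0.80, 1.00],
    [0.05, 0.15, 0.40, 0.65, 0.95],
]


def nearest_index(grid, value):
    """Return the index of the grid point closest to value (grid sorted ascending)."""
    pos = bisect.bisect_left(grid, value)
    if pos == 0:
        return 0
    if pos == len(grid):
        return len(grid) - 1
    # On a tie the lower grid point wins.
    return pos if grid[pos] - value < value - grid[pos - 1] else pos - 1


def snr_db(speech_energy, noise_energy, floor=1e-12):
    """Speech-to-noise ratio in dB, with a small floor to avoid log(0)."""
    return 10.0 * math.log10(max(speech_energy, floor) / max(noise_energy, floor))


def lookup_gain(freq_hz, snr):
    """Access the pre-stored gain for a sub-band frequency and an SNR in dB."""
    return GAINS[nearest_index(FREQS_HZ, freq_hz)][nearest_index(SNRS_DB, snr)]


def suppress_subband(samples, freq_hz, speech_energy, noise_energy):
    """Apply the looked-up gain to every sample of one sub-band frame."""
    gain = lookup_gain(freq_hz, snr_db(speech_energy, noise_energy))
    return [gain * s for s in samples]
```

In this sketch a sub-band near 1 kHz with an SNR of about 10 dB is scaled by the table entry for that cell; an actual system could interpolate between cells or use a finer grid.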
Abstract
Noise suppression information is used to optimize or improve automatic speech recognition performed for a signal. Noise suppression can be performed on a noisy speech signal using a gain value. The gain to apply to the noisy speech signal is selected to optimize speech recognition analysis of the resulting signal. The gain may be selected based on one or more features for a current sub band and time frame, as well as one or more features for other sub bands and/or time frames. Noise suppression information can be provided to a speech recognition module to improve the robustness of the speech recognition analysis. Noise suppression information can also be used to encode and identify speech.
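The abstract refers to features computed per sub band and time frame, and claim 1 names the inter-microphone level difference as one such feature. As an illustration only, the sketch below computes a level difference in dB between the primary and secondary microphone for one sub-band frame. The energy-ratio definition, the floor value, and the function names `frame_energy` and `ild_db` are assumptions; the source does not define how the feature is calculated.

```python
import math


def frame_energy(samples):
    """Sum of squared samples for one sub-band frame."""
    return sum(s * s for s in samples)


def ild_db(primary_band, secondary_band, floor=1e-12):
    """Assumed inter-microphone level difference: primary-to-secondary energy ratio in dB.

    A small floor keeps the ratio finite when a frame is silent.
    """
    e_primary = max(frame_energy(primary_band), floor)
    e_secondary = max(frame_energy(secondary_band), floor)
    return 10.0 * math.log10(e_primary / e_secondary)
```

Under this definition a positive value means the sub-band is louder at the primary microphone, which is commonly taken as a cue that the source is close to that microphone.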
446 Citations
14 Claims
1. A method for processing an audio signal (independent claim, recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for reducing noise in an audio signal, the method comprising:
generating sub-band signals from a received primary acoustic signal and a received secondary acoustic signal;
determining two or more features for a sub-band signal, the two or more features including a speech energy level for the sub-band noise level and at least one of the following:
inter-microphone level differences, inter-microphone time differences, and inter-microphone phase differences between the primary acoustic signal and the secondary acoustic signal;
suppressing a noise component in the primary acoustic signal based on the two or more features, the suppressing configured to clean the primary acoustic signal to create a cleaned speech signal optimized for accurate speech recognition processing by an automatic speech recognition processing module, the suppressing comprising:
applying a gain to a sub-band of the primary acoustic signal to provide a noise suppressed signal, the applying comprising:
determining a speech to noise ratio (SNR) for the sub-band of the primary acoustic signal;
accessing the gain, based on the frequency of the sub-band and the determined SNR for the sub-band, from a datastore, the datastore including a plurality of pre-stored gains configured to create cleaned speech signals optimized for accurate speech recognition processing by the automatic speech recognition processing module, each pre-stored gain in the plurality of pre-stored gains associated with a corresponding frequency and an SNR value; and
applying the accessed gain to the sub-band frequency; and
providing the cleaned speech signal and corresponding noise suppression information to the automatic speech recognition processing module, the noise suppression information based on the two or more features and including a speech to noise ratio for each of the sub-band signals and a voice activity detection signal.
Dependent claims: 10, 11, 12, 13, 14.
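Claim 9 requires the noise suppression information handed to the recognizer to include a speech to noise ratio for each sub-band signal and a voice activity detection signal. One way to picture that bundle is sketched below. The `NoiseSuppressionInfo` container, the `build_info` helper, and the rule that flags voice activity when the mean sub-band SNR exceeds a threshold are assumptions made for illustration; the claim does not say how the voice activity signal is derived.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class NoiseSuppressionInfo:
    """Hypothetical side information passed to the ASR module with the cleaned speech."""
    subband_snr_db: List[float]   # one SNR value per sub-band signal
    voice_active: bool            # voice activity detection signal for the frame


def build_info(speech_energies, noise_energies, vad_threshold_db=3.0, floor=1e-12):
    """Build per-sub-band SNRs and an assumed mean-SNR voice activity flag."""
    if not speech_energies or len(speech_energies) != len(noise_energies):
        raise ValueError("need one speech and one noise energy per sub-band")
    snrs = [
        10.0 * math.log10(max(s, floor) / max(n, floor))
        for s, n in zip(speech_energies, noise_energies)
    ]
    # Assumed rule: the frame counts as speech when the mean SNR exceeds the threshold.
    voice_active = sum(snrs) / len(snrs) > vad_threshold_db
    return NoiseSuppressionInfo(subband_snr_db=snrs, voice_active=voice_active)
```

A recognizer receiving such a structure could, for example, skip frames with `voice_active` set to false or down-weight sub-bands with low SNR; whether the claimed module does either is not stated in the claim.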
Specification