Adaptive ambient sound suppression and speech tracking

US 8,219,394 B2
Filed: 01/20/2010
Issued: 07/10/2012
Est. Priority Date: 01/20/2010
Status: Active Grant


Assignments: 2
Petitions: 0

Abstract

A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal.
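
The first stages named in the abstract, a per-microphone monophonic approximation of the speaker signal followed by a linear acoustic echo canceller, can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation. The approximation is assumed to be a weighted sum of the speaker channels with hypothetical per-microphone gains. A normalized least-mean-squares (NLMS) adaptive filter stands in for the linear echo canceller, because the text here does not name an adaptation rule. The function names `mono_approximation` and `nlms_echo_cancel` and the simulated echo path are hypothetical.

```python
import random


def mono_approximation(speaker_frames, gains):
    """Collapse a multi-channel speaker signal into one reference for one microphone.

    speaker_frames: sequence of per-sample tuples (one value per speaker channel).
    gains: hypothetical per-channel weights for this microphone, e.g. from calibration.
    """
    return [sum(g * x for g, x in zip(gains, frame)) for frame in speaker_frames]


def nlms_echo_cancel(mic, reference, taps=16, step=0.5, eps=1e-6):
    """Linear echo cancellation with a normalized LMS adaptive FIR filter.

    Returns the residual (microphone sample minus estimated echo) for every sample.
    """
    w = [0.0] * taps          # adaptive estimate of the echo path
    history = [0.0] * taps    # latest reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate                      # echo-cancelled sample
        norm = eps + sum(h * h for h in history)   # input power for normalization
        w = [wi + step * e * hi / norm for wi, hi in zip(w, history)]
        residual.append(e)
    return residual


# Simulated example: two speaker channels and one microphone whose echo path
# (two delayed, scaled copies of the reference) is made up for illustration.
random.seed(0)
n = 4000
speakers = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
ref = mono_approximation(speakers, gains=(0.6, 0.4))
mic = [0.8 * (ref[i - 3] if i >= 3 else 0.0) + 0.2 * (ref[i - 5] if i >= 5 else 0.0)
       for i in range(n)]
out = nlms_echo_cancel(mic, ref)
tail = slice(n - 1000, n)
ratio = sum(e * e for e in out[tail]) / sum(m * m for m in mic[tail])
print(f"residual-to-echo energy over the last 1000 samples: {ratio:.2e}")
```

With no near-end speech in the simulation, the residual shrinks toward zero as the filter converges. In practice, adaptation is usually slowed or frozen while the local talker is active (double-talk), which this sketch omits.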


20 Claims

1. A computing device configured to receive speech inputs, the computing device comprising:
- a microphone array having a plurality of microphones;
- a processor in operative communication with the microphone array;
- an analog-to-digital converter in operative communication with the microphone array and with the processor; and
- memory comprising instructions stored therein that are executable by the processor to:
  - receive a plurality of digital sound signals from the analog-to-digital converter, each digital sound signal being based on an analog sound signal originating at the microphone array,
  - receive a multi-channel speaker signal from a speaker signal source,
  - for each digital sound signal, generate a monophonic approximation signal of the multi-channel speaker signal that approximates speaker sounds as received by the corresponding microphone,
  - apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal based at least in part on the monophonic approximation signal,
  - generate a combined directionally-adaptive sound signal from a combination of each digital sound signal based at least in part on a combination of time-invariant and adaptive beamforming techniques, and
  - apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal based at least in part on a directional characteristic of the combined directionally-adaptive sound signal.

2. The device of claim 1, wherein the instructions are further executable by the processor to apply a linear stationary tone remover to each digital sound signal before generating the combined directionally-adaptive sound signal.

3. The device of claim 1, wherein the suppression of the second ambient sound portion occurs by applying one or more of:
- a nonlinear acoustic echo suppressor for suppressing a sound magnitude artifact, wherein the nonlinear acoustic echo suppressor is applied by determining and applying an acoustic echo gain based at least in part on a direction of a speech source,
- a nonlinear spatial filter for suppressing a sound phase artifact, wherein the nonlinear spatial filter is applied by determining and applying a spatial filter gain based at least in part on a direction of the speech source,
- a nonlinear stationary noise suppressor, wherein the stationary noise suppressor is applied by determining and applying a suppression filter gain based at least in part on a statistical model of a remaining noise component, and/or
- an automatic gain controller for adjusting a volume gain of the combined directionally-adaptive sound signal, wherein the automatic gain controller is applied by determining and applying the volume gain based at least in part on a direction of the speech source.

4. The device of claim 1, wherein the suppression of the second ambient sound portion occurs by applying a nonlinear joint noise suppressor including a joint gain filter, the joint gain filter being calculated from a plurality of individual gain filters.

5. The device of claim 1, wherein the instructions are further executable by the processor to:
- determine a calibration signal for each microphone by emitting a calibration audio signal from each of a plurality of speakers and detecting the calibration audio signal at each microphone, and to
- determine the monophonic approximation signal based at least in part on the calibration signal for each microphone.

6. The device of claim 1, wherein the analog-to-digital converter is configured to convert an analog sound signal generated by each microphone to a corresponding digital sound signal at the analog-to-digital converter, wherein each digital sound signal from each microphone has a first, higher bit depth, and wherein the instructions are further executable by the processor to convert each digital sound signal to a digital sound signal having a second, lower bit depth after applying the linear acoustical echo canceller to each digital sound signal.

7. The device of claim 1, wherein the analog-to-digital converter is configured to synchronize the multi-channel speaker signal to each digital sound signal via a clock signal received from a remote computing device.

8. The device of claim 1, wherein the microphones are unevenly spaced from one another in the microphone array.

9. The device of claim 1, wherein the combination of time-invariant and adaptive beamforming techniques for generating the combined directionally-adaptive sound signal includes instructions executable by the processor to:
- apply a series of predetermined weighting coefficients to each digital sound signal, each predetermined weighting coefficient being calculated based at least in part on an isotropic ambient noise distribution within a predefined sound reception zone of the microphone array, and to
- apply a sound source localizer to determine a reception angle of a speech source with respect to the microphone array and to track the speech source based at least in part on the reception angle as the speech source moves in real time.
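
Claims 8 and 9 combine unevenly spaced microphones with a sound source localizer that estimates the talker's reception angle. The sketch below shows one simple way to estimate such an angle for a linear array: a far-field, delay-and-sum steered-response-power search over candidate angles. It is a stand-in, not the claimed method; in particular, it does not compute time-invariant weights from an isotropic noise model. The microphone positions, 48 kHz sample rate, 5-degree search grid and all function names are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # meters per second, room-temperature approximation


def steering_delays(positions, angle_deg, fs):
    """Integer-sample arrival delays of a far-field plane wave at each microphone.

    positions: microphone coordinates along the array axis in meters (any spacing).
    angle_deg: direction of arrival measured from broadside.
    """
    s = math.sin(math.radians(angle_deg))
    return [round(x * s * fs / SPEED_OF_SOUND) for x in positions]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned channels."""
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i + d
            if 0 <= j < n:
                acc += ch[j]
        out.append(acc / len(channels))
    return out


def localize(channels, positions, fs, angles=range(-90, 91, 5)):
    """Steered-response-power search: return the angle whose beam output is strongest."""
    best_angle, best_power = None, -1.0
    for a in angles:
        beam = delay_and_sum(channels, steering_delays(positions, a, fs))
        power = sum(v * v for v in beam)
        if power > best_power:
            best_angle, best_power = a, power
    return best_angle


# Simulated example: a white-noise "talker" at 30 degrees, hypothetical uneven array.
random.seed(1)
fs = 48000
positions = [0.0, 0.04, 0.11, 0.23]
source = [random.gauss(0.0, 1.0) for _ in range(4000)]
true_delays = steering_delays(positions, 30, fs)
# Microphone m hears the source delayed by true_delays[m] samples.
mics = [[source[i - d] if 0 <= i - d < len(source) else 0.0 for i in range(len(source))]
        for d in true_delays]
estimate = localize(mics, positions, fs)
print("estimated reception angle (degrees):", estimate)
```

Because each steering delay is computed from that microphone's own coordinate, uneven spacing needs no special handling. Integer-sample delays keep the sketch short but limit angular resolution; fractional delays or frequency-domain steering would refine it.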

10. A method for suppressing ambient sounds from speech received by a microphone array, comprising, at memory including instructions stored therein that are executable by a processor:
- receiving a plurality of digital sound signals from an analog-to-digital converter, each digital sound signal based on an analog sound signal originating at the microphone array;
- receiving a multi-channel speaker signal from a speaker signal source;
- generating a monophonic approximation signal of the multi-channel speaker signal for each digital sound signal that approximates speaker sounds as received by the corresponding microphone;
- applying a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal based at least in part on the monophonic approximation signal;
- generating a combined directionally-adaptive sound signal from a combination of each digital sound signal based at least in part on a combination of time-invariant and adaptive beamforming techniques for tracking a speech source;
- applying one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal based at least in part on a directional characteristic of the combined directionally-adaptive sound signal; and
- outputting a resulting sound signal.

11. The method of claim 10, wherein generating a monophonic approximation signal of the multi-channel speaker signal for each digital sound signal that approximates speaker sounds as received by the corresponding microphone further comprises:
- determining a calibration signal for each microphone by emitting a calibration audio signal from each of a plurality of speakers;
- detecting the calibration audio signal at each microphone; and
- generating the monophonic approximation signal based at least in part on the calibration signal for each microphone.

12. The method of claim 10, further comprising applying a linear stationary tone remover to each digital sound signal before generating the combined directionally-adaptive sound signal.

13. The method of claim 10, wherein applying one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal based in part on a directional characteristic of the combined directionally-adaptive sound signal further comprises applying one or more of:
- a nonlinear acoustic echo suppressor for suppressing a sound magnitude artifact, wherein the nonlinear acoustic echo suppressor is applied by determining and applying an acoustic echo gain based on a direction of the speech source,
- a nonlinear spatial filter for suppressing a sound phase artifact, wherein the nonlinear spatial filter is applied by determining and applying a spatial filter gain based on a time characteristic of the speech source,
- a nonlinear stationary noise suppressor, wherein the stationary noise suppressor is applied by determining and applying a suppression filter gain based at least in part on a statistical model of a remaining noise component, and/or
- an automatic gain controller for adjusting a volume gain of the combined directionally-adaptive sound signal, wherein the automatic gain controller is applied by determining and applying the volume gain based at least in part on a relative volume of the speech source.

14. The method of claim 10, wherein applying one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal based at least in part on a magnitude and/or a time characteristic of the combined directionally-adaptive sound signal further comprises applying a nonlinear joint noise suppressor including a joint gain filter, the joint gain filter being calculated from a plurality of individual gain filters.

15. The method of claim 10, further comprising:
- converting an analog sound signal generated by each microphone to a corresponding digital sound signal at the analog-to-digital converter, wherein each digital sound signal from each microphone has a first, higher bit depth; and
- converting each digital sound signal to a digital sound signal having a second, lower bit depth after applying the linear acoustical echo canceller to each digital sound signal.

16. The method of claim 10, further comprising synchronizing the multi-channel speaker signal to each digital sound signal via a clock signal received from a remote computing device.

17. The method of claim 10, wherein generating a combined directionally-adaptive sound signal from a combination of each digital sound signal based at least in part on a combination of time-invariant and adaptive beamforming techniques for tracking the speech source further comprises:
- applying a series of predetermined weighting coefficients to each digital sound signal, each predetermined weighting coefficient being calculated based at least in part on an isotropic ambient noise distribution within a predefined sound reception zone of the microphone array, and
- applying a sound source localizer to determine a reception angle of the speech source with respect to the microphone array and to track the speech source based at least in part on the reception angle as the speech source moves in real time.
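
Claims 13 and 14 (like claims 3 and 4) describe individual nonlinear gains, including a stationary noise suppressor driven by a noise model and a spatial filter driven by direction, and a joint gain filter calculated from several individual gain filters. The claims quoted here do not state how the joint gain is calculated, so the sketch below assumes a per-frequency-bin product of the individual gains with a lower floor. The stationary-noise gain uses a Wiener-style rule from a given noise power estimate, the spatial gain assumes per-bin direction estimates are already available, and all names, thresholds and floors are hypothetical.

```python
import math


def wiener_gain(signal_power, noise_power, floor=0.1):
    """Per-bin stationary-noise gain: estimated speech power over total power."""
    gains = []
    for p, n in zip(signal_power, noise_power):
        g = max(p - n, 0.0) / p if p > 0 else 0.0
        gains.append(max(g, floor))
    return gains


def spatial_gain(bin_angles, target_angle, width=20.0, floor=0.1):
    """Keep bins whose estimated arrival direction is near the tracked talker."""
    return [1.0 if abs(a - target_angle) <= width else floor for a in bin_angles]


def joint_gain(*gain_filters, floor=0.05):
    """Combine individual per-bin gain filters into one joint gain (product, floored)."""
    return [max(math.prod(bin_gains), floor) for bin_gains in zip(*gain_filters)]


power = [4.0, 1.2, 9.0, 1.0]        # |X(k)|^2 of the beamformer output, 4 bins
noise = [1.0, 1.0, 1.0, 1.0]        # stationary noise power estimate per bin
angles = [2.0, 60.0, -5.0, 10.0]    # per-bin direction estimates in degrees
g = joint_gain(wiener_gain(power, noise), spatial_gain(angles, target_angle=0.0))
enhanced = [gk * gk * pk for gk, pk in zip(g, power)]   # output power after gain
print("joint gains:", [round(v, 3) for v in g])
```

The floor limits how deeply any single bin is attenuated, a common way to reduce musical-noise artifacts from aggressive per-bin suppression.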

18. A method for suppressing ambient sounds from speech received by a microphone array, at memory including instructions stored therein that are executable by a processor:
- receiving an analog sound signal generated at each microphone of a microphone array comprising a plurality of microphones, each analog sound signal being separately received at least in part from a speech source;
- converting each analog sound signal to a corresponding first digital sound signal having a first, higher bit depth at an analog-to-digital converter;
- receiving a multi-channel speaker signal for a plurality of speakers from a speaker signal source;
- synchronizing the multi-channel speaker signal to each first digital sound signal via a clock signal received from a remote computing device;
- determining a calibration signal for each microphone by emitting a calibration audio signal from each of the plurality of speakers;
- detecting the calibration audio signal at each microphone of the microphone array;
- generating a monophonic approximation signal of the multi-channel speaker signal for each first digital sound signal that approximates speaker sounds as received by the corresponding microphone based at least in part on the calibration signal for each microphone;
- applying a linear acoustic echo canceller to suppress a first ambient sound portion of each first digital sound signal based at least in part on the monophonic approximation signal;
- converting each first digital sound signal to a second digital sound signal having a second, lower bit depth after applying the linear acoustic echo canceller to each digital sound signal;
- applying a linear stationary tone remover to each second digital sound signal;
- generating a combined directionally-adaptive sound signal from a combination of each second digital sound signal by applying a series of predetermined weighting coefficients to each second digital sound signal, each predetermined weighting coefficient being calculated based at least in part on an isotropic ambient noise distribution within a predefined sound reception zone of the microphone array, and by applying a sound source localizer to determine a reception angle of the speech source with respect to the microphone array and to track the speech source based at least in part on the reception angle as the speech source moves in real time;
- applying one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal based at least in part on a directional characteristic of the combined directionally-adaptive sound signal; and
- outputting a resulting sound signal.

19. The method of claim 18, wherein applying one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal based in part on a magnitude and/or a time characteristic of the combined directionally-adaptive sound signal further comprises suppressing the second ambient sound portion of each digital sound signal by applying one or more of:
- a nonlinear acoustic echo suppressor for suppressing a sound magnitude artifact, wherein the nonlinear acoustic echo suppressor is applied by determining and applying an acoustic echo gain based on a direction of the speech source,
- a nonlinear spatial filter for suppressing a sound phase artifact, wherein the nonlinear spatial filter is applied by determining and applying a spatial filter gain based at least in part on a direction of the speech source,
- a nonlinear stationary noise suppressor, wherein the stationary noise suppressor is applied by determining and applying a suppression filter gain based at least in part on a statistical model of a remaining noise component, and/or
- an automatic gain controller for adjusting a volume gain of the combined directionally-adaptive sound signal, wherein the automatic gain controller is applied by determining and applying the volume gain based at least in part on a direction of the speech source.

20. The method of claim 18, wherein applying one or more nonlinear noise suppression techniques to suppress a second audio sound portion of the combined directionally-adaptive sound signal based at least in part on a magnitude and/or a time characteristic of the combined directionally-adaptive sound signal further comprises applying a nonlinear joint noise suppressor including a joint gain filter, the joint gain filter being calculated from a plurality of individual gain filters.
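
Claims 12 and 18 apply a linear stationary tone remover to each digital sound signal before beamforming. One conventional linear way to remove a steady tone of known frequency is a second-order IIR notch filter; the sketch below uses the notch coefficients from the Audio EQ Cookbook biquad formulas. The claims do not say that the patented tone remover works this way: the tone frequency is assumed to be known in advance, and the 1 kHz tone, 200 Hz test signal, Q of 5 and the function names are hypothetical.

```python
import math


def notch_coefficients(f0, fs, q=5.0):
    """Second-order IIR notch (Audio EQ Cookbook form), normalized so that a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def remove_tone(x, f0, fs, q=5.0):
    """Filter one microphone channel with a notch at the stationary tone frequency f0."""
    (b0, b1, b2), (_, a1, a2) = notch_coefficients(f0, fs, q)
    x1 = x2 = y1 = y2 = 0.0       # direct form I state: past inputs and outputs
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


fs = 16000
t = range(fs)  # one second of samples
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in t]       # stationary 1 kHz tone
voice_band = [math.sin(2 * math.pi * 200 * n / fs) for n in t]  # stand-in for wanted content


def energy(sig):
    """Energy of the second half of a signal, after the filter transient has decayed."""
    return sum(v * v for v in sig[fs // 2:])


print("1 kHz energy kept:", energy(remove_tone(tone, 1000, fs)) / energy(tone))
print("200 Hz energy kept:", energy(remove_tone(voice_band, 1000, fs)) / energy(voice_band))
```

A larger Q narrows the notch, removing less of the neighboring speech band but taking longer to settle after the tone appears.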


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Flaks, Jason; Tashev, Ivan; McKay, Duncan; Ni, Xudong; Heitkamp, Robert; Guo, Wei; Tardif, John; Shing, Leo; Baseflug, Michael
Primary Examiner: Rider, Justin W
Application Number: US 12/690,827
Publication Number: US 20110178798 A1
Time in Patent Office: 902 days
Field of Search: 704/200, 704/227, 381/302, 379/406.03
US Class (Current): 704/227
CPC Class Codes:
- G10L 2021/02085: Periodic noise
- G10L 2021/02166: Microphone arrays; Beamforming
- G10L 21/0208: Noise filtering
- G10L 21/0272: Voice signal separating
- H04S 3/008: in which the audio signals ...
