Systems, methods, and apparatus for multi-microphone based speech enhancement

US 8,175,291 B2
Filed: 12/12/2008
Issued: 05/08/2012
Est. Priority Date: 12/19/2007
Status: Active Grant

Abstract

Systems, methods, and apparatus for processing an M-channel input signal are described that include outputting a signal produced by a selected one among a plurality of spatial separation filters. Applications to separating an acoustic signal from a noisy environment are described, and configurations that may be implemented on a multi-microphone handheld device are also described.
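
The selection idea in the abstract can be illustrated with a minimal sketch. It assumes single-tap (instantaneous) mixing matrices and a hypothetical scoring rule, the energy gap between output channel 0 and output channel 1. Neither is stated in the abstract, and the names `apply_spatial_filter`, `separation_score`, and `select_filter` are illustrative only.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = output channels, columns = input channels
Frame = List[List[float]]   # one list of samples per channel


def apply_spatial_filter(w: Matrix, frame: Frame) -> Frame:
    """Single-tap spatial filter: each output channel is a weighted sum of the inputs."""
    n_samples = len(frame[0])
    return [
        [sum(row[m] * frame[m][t] for m in range(len(frame))) for t in range(n_samples)]
        for row in w
    ]


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def separation_score(out: Frame) -> float:
    """Assumed measure: energy of output channel 0 minus energy of output channel 1."""
    return energy(out[0]) - energy(out[1])


def select_filter(filters: Sequence[Matrix], frame: Frame) -> int:
    """Return the index of the filter whose output scores highest on this frame."""
    scores = [separation_score(apply_spatial_filter(w, frame)) for w in filters]
    return max(range(len(filters)), key=scores.__getitem__)


# Two toy filters: pass-through and channel swap (illustrative values only).
W_A = [[1.0, 0.0], [0.0, 1.0]]
W_B = [[0.0, 1.0], [1.0, 0.0]]
frame = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
chosen = select_filter([W_A, W_B], frame)
```

With the toy frame above, the louder signal sits on channel 0, so the pass-through filter scores higher; swapping the input channels makes the swap filter win instead. A practical filter bank would likely use multi-tap filters and a more robust measure.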

133 Citations

50 Claims

1. A method of processing an M-channel input signal that includes a speech component and a noise component, M being an integer greater than one, to produce a spatially filtered output signal, said method comprising:
- applying a first spatial processing filter to the input signal;
  
  applying a second spatial processing filter to the input signal;
  
  at a first time, determining that the first spatial processing filter begins to separate the speech and noise components better than the second spatial processing filter;
  
  in response to said determining at a first time, producing a signal that is based on a first spatially processed signal as the output signal;
  
  at a second time subsequent to the first time, determining that the second spatial processing filter begins to separate the speech and noise components better than the first spatial processing filter; and
  
  in response to said determining at a second time, producing a signal that is based on a second spatially processed signal as the output signal, wherein the first and second spatially processed signals are based on the input signal.
- Dependent claims 2 to 18:
  - 2. The method according to claim 1, wherein a plurality of the coefficient values of at least one of the first and second spatial processing filters is based on a plurality of multichannel training signals that is recorded under a plurality of different acoustic scenarios.
  - 3. The method according to claim 1, wherein a plurality of the coefficient values of at least one of the first and second spatial processing filters is obtained from a converged filter state that is based on a plurality of multichannel training signals, wherein the plurality of multichannel training signals is recorded under a plurality of different acoustic scenarios.
  - 4. The method according to claim 1, wherein a plurality of the coefficient values of the first spatial processing filter is based on a plurality of multichannel training signals that is recorded under a first plurality of different acoustic scenarios, and wherein a plurality of the coefficient values of the second spatial processing filter is based on a plurality of multichannel training signals that is recorded under a second plurality of different acoustic scenarios that is different than the first plurality.
  - 5. The method according to claim 1, wherein said applying the first spatial processing filter to the input signal produces the first spatially processed signal, and wherein said applying the second spatial processing filter to the input signal produces the second spatially processed signal.
  - 6. The method according to claim 5, wherein said producing a signal that is based on a first spatially processed signal as the output signal comprises producing the first spatially processed signal as the output signal, and wherein said producing a signal that is based on a second spatially processed signal as the output signal comprises producing the second spatially processed signal as the output signal.
  - 7. The method according to claim 1, wherein the first spatial processing filter is characterized by a first matrix of coefficient values and the second spatial processing filter is characterized by a second matrix of coefficient values, and wherein the second matrix is at least substantially equal to the result of flipping the first matrix about a central vertical axis.
  - 8. The method according to claim 1, wherein said method comprises determining that the first spatial processing filter continues to separate the speech and noise components better than the second spatial processing filter over a first delay interval immediately following the first time, and wherein said producing a signal that is based on a first spatially processed signal as the output signal begins after the first delay interval.
  - 9. The method according to claim 8, wherein said method comprises determining that the second spatial processing filter continues to separate the speech and noise components better than the first spatial processing filter over a second delay interval immediately following the second time, and wherein said producing a signal that is based on a second spatially processed signal as the output signal occurs after the second delay interval, and wherein the second delay interval is longer than the first delay interval.
  - 10. The method according to claim 1, wherein said producing a signal that is based on a second spatially processed signal as the output signal includes transitioning the output signal, over a first merge interval, from the signal that is based on the first spatially processed signal to a signal that is based on the second spatially processed signal, and wherein said transitioning includes, during the first merge interval, producing a signal that is based on both of the first and second spatially processed signals as the output signal.
  - 11. The method according to claim 1, wherein said method comprises:
    - applying a third spatial processing filter to the input signal;
      
      at a third time subsequent to the second time, determining that the third spatial processing filter begins to separate the speech and noise components better than the first spatial processing filter and better than the second spatial processing filter; and
      
      in response to said determining at a third time, producing a signal that is based on a third spatially processed signal as the output signal, wherein the third spatially processed signal is based on the input signal.
  - 12. The method according to claim 11, wherein said producing a signal that is based on a second spatially processed signal as the output signal includes transitioning the output signal, over a first merge interval, from the signal that is based on the first spatially processed signal to a signal that is based on the second spatially processed signal, and wherein said producing a signal that is based on a third spatially processed signal as the output signal includes transitioning the output signal, over a second merge interval, from the signal that is based on the second spatially processed signal to a signal that is based on the third spatially processed signal, wherein the second merge interval is longer than the first merge interval.
  - 13. The method according to claim 1, wherein said applying a first spatial processing filter to the input signal produces a first filtered signal, and wherein said applying a second spatial processing filter to the input signal produces a second filtered signal, and wherein said determining at a first time includes detecting that an energy difference between a channel of the input signal and a channel of the first filtered signal is greater than an energy difference between the channel of the input signal and a channel of the second filtered signal.
  - 14. The method according to claim 1, wherein said applying a first spatial processing filter to the input signal produces a first filtered signal, and wherein said applying a second spatial processing filter to the input signal produces a second filtered signal, and wherein said determining at a first time includes detecting that the value of a correlation between two channels of the first filtered signal is less than the value of a correlation between two channels of the second filtered signal.
  - 15. The method according to claim 1, wherein said applying a first spatial processing filter to the input signal produces a first filtered signal, and wherein said applying a second spatial processing filter to the input signal produces a second filtered signal, and wherein said determining at a first time includes detecting that an energy difference between channels of the first filtered signal is greater than an energy difference between channels of the second filtered signal.
  - 16. The method according to claim 1, wherein said applying a first spatial processing filter to the input signal produces a first filtered signal, and wherein said applying a second spatial processing filter to the input signal produces a second filtered signal, and wherein said determining at a first time includes detecting that a value of a speech measure for a channel of the first filtered signal is greater than a value of the speech measure for a channel of the second filtered signal.
  - 17. The method according to claim 1, wherein said applying a first spatial processing filter to the input signal produces a first filtered signal, and wherein said applying a second spatial processing filter to the input signal produces a second filtered signal, and wherein said determining at a first time includes calculating a time difference of arrival among two channels of the input signal.
  - 18. The method according to claim 1, wherein said method comprises applying a noise reference based on at least one channel of the output signal to reduce noise in another channel of the output signal.
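
The delay interval of claims 8 and 9 and the merge interval of claim 10 can be pictured with a small state machine. The sketch below is only one possible reading: it assumes frame-based processing, a single delay and merge length shared by all transitions (claims 9 and 12 allow them to differ), and a linear cross-fade, none of which the claims require. The class name `FilterSwitcher` and its parameters are hypothetical.

```python
from typing import List


class FilterSwitcher:
    """Tracks which filter feeds the output, with a hold-off delay and a cross-fade."""

    def __init__(self, delay_frames: int, merge_frames: int, initial: int = 0) -> None:
        self.delay_frames = delay_frames  # assumed length of the delay interval, in frames
        self.merge_frames = merge_frames  # assumed length of the merge interval, in frames
        self.active = initial             # filter currently selected for the output
        self.previous = initial           # filter being faded out during a merge
        self.candidate = initial          # challenger that currently separates best
        self.streak = 0                   # consecutive frames the challenger has led
        self.merge_pos = merge_frames     # equal to merge_frames when no merge is running

    def update(self, best: int) -> None:
        """Report, once per frame, the index of the filter judged to separate best."""
        if best == self.active:
            self.candidate, self.streak = best, 0
        elif best == self.candidate:
            self.streak += 1
        else:
            self.candidate, self.streak = best, 1
        # Switch only after the challenger has stayed ahead for the whole delay
        # interval following the frame on which it first took the lead.
        if self.candidate != self.active and self.streak > self.delay_frames:
            self.previous, self.active = self.active, self.candidate
            self.streak = 0
            self.merge_pos = 0

    def mix(self, outputs: List[List[float]]) -> List[float]:
        """Combine per-filter output frames; both signals contribute during a merge."""
        new = outputs[self.active]
        if self.merge_pos >= self.merge_frames:
            return list(new)
        self.merge_pos += 1
        g = self.merge_pos / self.merge_frames  # linear fade-in weight (assumption)
        old = outputs[self.previous]
        return [(1.0 - g) * o + g * n for o, n in zip(old, new)]
```

Calling `update` once per frame and then `mix` on the per-filter outputs keeps a brief lead by another filter from changing the output, and spreads each real change over `merge_frames` frames. A switch that arrives while a merge is still running restarts the fade from the newly displaced filter, which is a simplification of this sketch.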

19. An apparatus for processing an M-channel input signal that includes a speech component and a noise component, M being an integer greater than one, to produce a spatially filtered output signal, said apparatus comprising:
- means for performing a first spatial processing operation on the input signal;
  
  means for performing a second spatial processing operation on the input signal;
  
  means for determining, at a first time, that the means for performing a first spatial processing operation begins to separate the speech and noise components better than the means for performing a second spatial processing operation;
  
  means for producing, in response to an indication from said means for determining at a first time, a signal that is based on a first spatially processed signal as the output signal;
  
  means for determining, at a second time subsequent to the first time, that the means for performing a second spatial processing operation begins to separate the speech and noise components better than the means for performing a first spatial processing operation; and
  
  means for producing, in response to an indication from said means for determining at a second time, a signal that is based on a second spatially processed signal as the output signal, wherein the first and second spatially processed signals are based on the input signal.
- Dependent claims 20 to 29:
  - 20. The apparatus according to claim 19, wherein a plurality of the coefficient values of at least one among (A) said means for performing a first spatial processing operation and (B) said means for performing a second spatial processing operation is based on a plurality of multichannel training signals that is recorded under a plurality of different acoustic scenarios.
  - 21. The apparatus according to claim 19, wherein said means for performing the first spatial processing operation on the input signal is configured to produce the first spatially processed signal, and wherein said means for performing the second spatial processing operation on the input signal is configured to produce the second spatially processed signal, and wherein said means for producing a signal that is based on a first spatially processed signal as the output signal is configured to produce the first spatially processed signal as the output signal, and wherein said means for producing a signal that is based on a second spatially processed signal as the output signal is configured to produce the second spatially processed signal as the output signal.
  - 22. The apparatus according to claim 19, wherein said apparatus comprises means for determining that the means for performing a first spatial processing operation continues to separate the speech and noise components better than the means for performing a second spatial processing operation over a first delay interval immediately following the first time, and wherein said means for producing the signal that is based on a first spatially processed signal as the output signal is configured to begin to produce said signal after the first delay interval.
  - 23. The apparatus according to claim 19, wherein said means for producing a signal that is based on a second spatially processed signal as the output signal includes means for transitioning the output signal, over a first merge interval, from the signal that is based on the first spatially processed signal to a signal that is based on the second spatially processed signal, and wherein said means for transitioning is configured to produce, during the first merge interval, a signal that is based on both of the first and second spatially processed signals as the output signal.
  - 24. The apparatus according to claim 19, wherein said means for performing a first spatial processing operation on the input signal produces a first filtered signal, and wherein said means for performing a second spatial processing operation on the input signal produces a second filtered signal, and wherein said means for determining at a first time includes means for detecting that an energy difference between a channel of the input signal and a channel of the first filtered signal is greater than an energy difference between the channel of the input signal and a channel of the second filtered signal.
  - 25. The apparatus according to claim 19, wherein said means for performing a first spatial processing operation on the input signal produces a first filtered signal, and wherein said means for performing a second spatial processing operation on the input signal produces a second filtered signal, and wherein said means for determining at a first time includes means for detecting that the value of a correlation between two channels of the first filtered signal is less than the value of a correlation between two channels of the second filtered signal.
  - 26. The apparatus according to claim 19, wherein said means for performing a first spatial processing operation on the input signal produces a first filtered signal, and wherein said means for performing a second spatial processing operation on the input signal produces a second filtered signal, and wherein said means for determining at a first time includes means for detecting that an energy difference between channels of the first filtered signal is greater than an energy difference between channels of the second filtered signal.
  - 27. The apparatus according to claim 19, wherein said means for performing a first spatial processing operation on the input signal produces a first filtered signal, and wherein said means for performing a second spatial processing operation on the input signal produces a second filtered signal, and wherein said means for determining at a first time includes means for detecting that a value of a speech measure for a channel of the first filtered signal is greater than a value of the speech measure for a channel of the second filtered signal.
  - 28. The apparatus according to claim 19, wherein said apparatus comprises an array of microphones configured to produce an M-channel signal upon which the input signal is based.
  - 29. The apparatus according to claim 19, wherein said apparatus comprises means for applying a noise reference based on at least one channel of the output signal to reduce noise in another channel of the output signal.
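
The comparisons recited in claims 24 to 26 (and in method claims 13 to 15) rest on simple per-frame statistics. The following sketch shows one way such statistics might be computed. It assumes frame energies as sums of squares, absolute energy differences, and a normalized zero-lag correlation; the claims do not fix any of these details, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def energy(x: Sequence[float]) -> float:
    """Frame energy as the sum of squared samples."""
    return sum(v * v for v in x)


def input_output_energy_difference(input_ch: Sequence[float],
                                   filtered_ch: Sequence[float]) -> float:
    """Energy gap between an input channel and a filtered channel (absolute value assumed)."""
    return abs(energy(input_ch) - energy(filtered_ch))


def channel_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Magnitude of the normalized zero-lag cross-correlation between two channels."""
    denom = math.sqrt(energy(a) * energy(b))
    if denom == 0.0:
        return 0.0
    return abs(sum(x * y for x, y in zip(a, b))) / denom


def channel_energy_difference(filtered: List[List[float]]) -> float:
    """Energy gap between the first two channels of a filtered signal."""
    return abs(energy(filtered[0]) - energy(filtered[1]))


def first_better_by_correlation(f1: List[List[float]], f2: List[List[float]]) -> bool:
    """Lower correlation between output channels is read as better separation."""
    return channel_correlation(f1[0], f1[1]) < channel_correlation(f2[0], f2[1])


def first_better_by_channel_energy(f1: List[List[float]], f2: List[List[float]]) -> bool:
    """Larger energy gap between output channels is read as better separation."""
    return channel_energy_difference(f1) > channel_energy_difference(f2)


# Toy filtered signals (illustrative values only).
f1 = [[2.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # channels do not overlap
f2 = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]  # channels are identical
```

In the toy data, `f1` has uncorrelated channels with unequal energies while `f2` has identical channels, so both the correlation test and the channel-energy test favor the first filter. In practice these statistics would probably be smoothed over several frames before any comparison.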

30. An apparatus for processing an M-channel input signal that includes a speech component and a noise component, M being an integer greater than one, to produce a spatially filtered output signal, said apparatus comprising:
- a first spatial processing filter configured to filter the input signal;
  
  a second spatial processing filter configured to filter the input signal;
  
  a state estimator configured to indicate, at a first time, that the first spatial processing filter begins to separate the speech and noise components better than the second spatial processing filter; and
  
  a transition control module configured to produce, in response to the indication at a first time, a signal that is based on a first spatially processed signal as the output signal, wherein said state estimator is configured to indicate, at a second time subsequent to the first time, that the second spatial processing filter begins to separate the speech and noise components better than the first spatial processing filter, and wherein said transition control module is configured to produce, in response to the indication at a second time, a signal that is based on a second spatially processed signal as the output signal, and wherein the first and second spatially processed signals are based on the input signal.
- Dependent claims 31 to 40:
  - 31. The apparatus according to claim 30, wherein a plurality of the coefficient values of at least one of the first and second spatial processing filters is obtained from a converged filter state that is based on a plurality of multichannel training signals, wherein the plurality of multichannel training signals is recorded under a plurality of different acoustic scenarios.
  - 32. The apparatus according to claim 30, wherein said first spatial processing filter is configured to produce the first spatially processed signal in response to the input signal, and wherein said second spatial processing filter is configured to produce the second spatially processed signal in response to the input signal, wherein said transition control module is configured to produce a signal that is based on a first spatially processed signal as the output signal by producing the first spatially processed signal as the output signal, and wherein said transition control module is configured to produce a signal that is based on a second spatially processed signal as the output signal by producing the second spatially processed signal as the output signal.
  - 33. The apparatus according to claim 30, wherein said state estimator is configured to determine that the first spatial processing filter continues to separate the speech and noise components better than the second spatial processing filter over a first delay interval immediately following the first time, and wherein said transition control module is configured to produce a signal that is based on the second spatially processed signal as the output signal during the first delay interval, and wherein said transition control module is configured to produce the signal that is based on the first spatially processed signal as the output signal after the first delay interval.
  - 34. The apparatus according to claim 30, wherein said transition control module is configured to produce the signal that is based on a second spatially processed signal as the output signal by transitioning the output signal, over a first merge interval, from the signal that is based on the first spatially processed signal to a signal that is based on the second spatially processed signal, and wherein, during the first merge interval, said transition control module is configured to produce a signal that is based on both of the first and second spatially processed signals as the output signal.
  - 35. The apparatus according to claim 30, wherein said first spatial processing filter is configured to produce a first filtered signal in response to the input signal, and wherein said second spatial processing filter is configured to produce a second filtered signal in response to the input signal, and wherein said state estimator is configured to determine, at the first time, that the first spatial processing filter begins to separate the speech and noise components better than the second spatial processing filter by detecting that an energy difference between a channel of the input signal and a channel of the first filtered signal is greater than an energy difference between the channel of the input signal and a channel of the second filtered signal.
  - 36. The apparatus according to claim 30, wherein said first spatial processing filter is configured to produce a first filtered signal in response to the input signal, and wherein said second spatial processing filter is configured to produce a second filtered signal in response to the input signal, and wherein said state estimator is configured to determine, at the first time, that the first spatial processing filter begins to separate the speech and noise components better than the second spatial processing filter by detecting that the value of a correlation between two channels of the first filtered signal is less than the value of a correlation between two channels of the second filtered signal.
  - 37. The apparatus according to claim 30, wherein said first spatial processing filter is configured to produce a first filtered signal in response to the input signal, and wherein said second spatial processing filter is configured to produce a second filtered signal in response to the input signal, and wherein said state estimator is configured to determine, at the first time, that the first spatial processing filter begins to separate the speech and noise components better than the second spatial processing filter by detecting that an energy difference between channels of the first filtered signal is greater than an energy difference between channels of the second filtered signal.
  - 38. The apparatus according to claim 30, wherein said first spatial processing filter is configured to produce a first filtered signal in response to the input signal, and wherein said second spatial processing filter is configured to produce a second filtered signal in response to the input signal, and wherein said state estimator is configured to determine, at the first time, that the first spatial processing filter begins to separate the speech and noise components better than the second spatial processing filter by detecting that a value of a speech measure for a channel of the first filtered signal is greater than a value of the speech measure for a channel of the second filtered signal.
  - 39. The apparatus according to claim 30, wherein said apparatus comprises an array of microphones configured to produce an M-channel signal upon which the input signal is based.
  - 40. The apparatus according to claim 30, wherein said apparatus comprises a noise reduction filter configured to apply a noise reference based on at least one channel of the output signal to reduce noise in another channel of the output signal.
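
Claim 40 recites a noise reduction filter that uses one output channel as a noise reference for another, without naming an algorithm. As a hedged illustration only, the sketch below swaps in a normalized LMS (NLMS) adaptive canceller, a common textbook technique that is not confirmed by the claims. The function name, tap count, and step size are assumptions.

```python
from typing import List, Sequence


def nlms_noise_cancel(primary: Sequence[float],
                      noise_ref: Sequence[float],
                      taps: int = 4,
                      mu: float = 0.5,
                      eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively filtered noise reference from the primary channel.

    primary   -- output channel that carries speech plus residual noise
    noise_ref -- output channel assumed to carry mostly noise
    Returns the error signal, which serves as the noise-reduced channel.
    """
    w = [0.0] * taps        # adaptive filter weights
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned: List[float] = []
    for d, x in zip(primary, noise_ref):
        history = [x] + history[:-1]
        noise_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - noise_estimate
        # Normalizing by the reference power keeps the step size scale-independent.
        norm = eps + sum(h * h for h in history)
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        cleaned.append(e)
    return cleaned
```

When the primary channel contains a scaled copy of the reference noise, the residual shrinks as the weights adapt. If speech leaks into the reference channel, a canceller of this kind can also attenuate the speech, which is one reason the quality of the upstream spatial separation matters.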

41. A computer-readable medium comprising instructions which when executed by a processor cause the processor to perform a method of processing an M-channel input signal that includes a speech component and a noise component, M being an integer greater than one, to produce a spatially filtered output signal, said instructions comprising instructions which when executed by a processor cause the processor to:
- perform a first spatial processing operation on the input signal;
  
  perform a second spatial processing operation on the input signal;
  
  indicate, at a first time, that the first spatial processing operation begins to separate the speech and noise components better than the second spatial processing operation;
  
  produce, in response to said indication at a first time, a signal that is based on a first spatially processed signal as the output signal;
  
  indicate, at a second time subsequent to the first time, that the second spatial processing operation begins to separate the speech and noise components better than the first spatial processing operation; and
  
  produce, in response to said indication at a second time, a signal that is based on a second spatially processed signal as the output signal, wherein the first and second spatially processed signals are based on the input signal.
- Dependent claims 42 to 50:
  - 42. The computer-readable medium according to claim 41, wherein a plurality of the coefficient values of at least one of the first and second spatial processing operations is obtained from a converged filter state that is based on a plurality of multichannel training signals, wherein the plurality of multichannel training signals is recorded under a plurality of different acoustic scenarios.
  - 43. The computer-readable medium according to claim 41, wherein said instructions which when executed by a processor cause the processor to perform the first spatial processing operation on the input signal cause the processor to produce the first spatially processed signal, and wherein said instructions which when executed by a processor cause the processor to perform the second spatial processing operation on the input signal cause the processor to produce the second spatially processed signal, wherein said instructions which when executed by a processor cause the processor to produce a signal that is based on a first spatially processed signal as the output signal cause the processor to produce the first spatially processed signal as the output signal, and wherein said instructions which when executed by a processor cause the processor to produce a signal that is based on a second spatially processed signal as the output signal cause the processor to produce the second spatially processed signal as the output signal.
  - 44. The computer-readable medium according to claim 41, wherein said medium comprises instructions which when executed by a processor cause the processor to determine that the first spatial processing operation continues to separate the speech and noise components better than the second spatial processing operation over a first delay interval immediately following the first time, and wherein said instructions which when executed by a processor cause the processor to produce the signal that is based on a first spatially processed signal as the output signal cause the processor to begin to produce said signal after the first delay interval.
  - 45. The computer-readable medium according to claim 41, wherein said instructions which when executed by a processor cause the processor to produce a signal that is based on a second spatially processed signal as the output signal include instructions which when executed by a processor cause the processor to transition the output signal, over a first merge interval, from the signal that is based on the first spatially processed signal to a signal that is based on the second spatially processed signal, and wherein said instructions which when executed by a processor cause the processor to transition include instructions which when executed by a processor cause the processor to produce, during the first merge interval, a signal that is based on both of the first and second spatially processed signals as the output signal.
  - 46. The computer-readable medium according to claim 41, wherein said instructions which when executed by a processor cause the processor to perform a first spatial processing operation on the input signal cause the processor to produce a first filtered signal, and wherein said instructions which when executed by a processor cause the processor to perform a second spatial processing operation on the input signal cause the processor to produce a second filtered signal, and wherein said instructions which when executed by a processor cause the processor to indicate at a first time include instructions which when executed by a processor cause the processor to detect that an energy difference between a channel of the input signal and a channel of the first filtered signal is greater than an energy difference between the channel of the input signal and a channel of the second filtered signal.
  - 47. The computer-readable medium according to claim 41, wherein said instructions which when executed by a processor cause the processor to perform a first spatial processing operation on the input signal cause the processor to produce a first filtered signal, and wherein said instructions which when executed by a processor cause the processor to perform a second spatial processing operation on the input signal cause the processor to produce a second filtered signal, and wherein said instructions which when executed by a processor cause the processor to indicate at a first time include instructions which when executed by a processor cause the processor to detect that the value of a correlation between two channels of the first filtered signal is less than the value of a correlation between two channels of the second filtered signal.
  - 48. The computer-readable medium according to claim 41, wherein said instructions which when executed by a processor cause the processor to perform a first spatial processing operation on the input signal cause the processor to produce a first filtered signal, and wherein said instructions which when executed by a processor cause the processor to perform a second spatial processing operation on the input signal cause the processor to produce a second filtered signal, and wherein said instructions which when executed by a processor cause the processor to indicate at a first time include instructions which when executed by a processor cause the processor to detect that an energy difference between channels of the first filtered signal is greater than an energy difference between channels of the second filtered signal.
  - 49. The computer-readable medium according to claim 41, wherein said instructions which when executed by a processor cause the processor to perform a first spatial processing operation on the input signal cause the processor to produce a first filtered signal, and wherein said instructions which when executed by a processor cause the processor to perform a second spatial processing operation on the input signal cause the processor to produce a second filtered signal, and wherein said instructions which when executed by a processor cause the processor to indicate at a first time include instructions which when executed by a processor cause the processor to detect that a value of a speech measure for a channel of the first filtered signal is greater than a value of the speech measure for a channel of the second filtered signal.
  - 50. The computer-readable medium according to claim 41, wherein said medium comprises instructions which when executed by a processor cause the processor to apply a noise reference based on at least one channel of the output signal to reduce noise in another channel of the output signal.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Chan, Kwok-Leung; Visser, Erik; Park, Hyun Jin; Toman, Jeremy
Primary Examiner(s): Ha, Nathan
Application Number: US12/334,246
Publication Number: US 20090164212A1
Time in Patent Office: 1,243 Days
Field of Search: 381/92-94, 381/94.7, 704/233, 704/E15.039
US Class Current: 381/94.7
CPC Class Codes:
G10L 2021/02165   Two microphones, one receiv...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/0208   Noise filtering