Adaptive beamforming to create reference channels

US 9,747,920 B2
Filed: 12/17/2015
Issued: 08/29/2017
Est. Priority Date: 12/17/2015
Status: Active Grant

Abstract

An echo cancellation system that performs audio beamforming to separate audio input into multiple directions and determines a target signal and a reference signal from the multiple directions. For example, the system may detect a strong signal associated with a speaker and select the strong signal as a reference signal, selecting another direction as a target signal. The system may determine a speech position and may select the speech position as a target signal and an opposite direction as a reference signal. The system may create pairwise combinations of opposite directions, with an individual direction being selected as a target signal and a reference signal. The system may select a fixed beamformer output for the target signal and an adaptive beamformer output for the reference signal, or vice versa. The system may remove the reference signal (e.g., audio output by the loudspeaker) to isolate speech included in the target signal.
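
The abstract's first step, separating microphone-array input into directions, can be illustrated with a delay-and-sum beamformer. The sketch below is not taken from the patent. It assumes integer per-microphone steering delays (hypothetical values that a real system would derive from array geometry and sample rate), and the names `delay_and_sum` and `rms` are illustrative only.

```python
def delay_and_sum(channels, delays):
    """Average the microphone channels after shifting each by its steering delay.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-microphone integer delays in samples (assumed values; a real
              system would compute them from the array geometry).
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for t in range(n):
            src = t + d  # advance this channel by d samples
            if 0 <= src < n:
                out[t] += samples[src]
    return [v / len(channels) for v in out]


def rms(signal):
    """Root-mean-square level, used here as a simple per-direction amplitude."""
    return (sum(v * v for v in signal) / len(signal)) ** 0.5


# A pulse reaches microphone 0 at sample 5 and microphone 1 at sample 7.
mic0 = [0.0] * 16
mic1 = [0.0] * 16
mic0[5] = 1.0
mic1[7] = 1.0

toward = delay_and_sum([mic0, mic1], [0, 2])   # steered at the source
away = delay_and_sum([mic0, mic1], [0, -2])    # steered elsewhere
```

When the steering delays match the arrival delays, the pulses add coherently, so `toward` has a higher level than `away`. Comparing such per-direction levels is one plausible way to obtain the amplitudes that the claims compare against a threshold.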

20 Claims

1. A computer-implemented method for cancelling an echo from an audio signal to isolate received speech, the method comprising:
- sending a first output audio signal to a first wireless speaker;
  
  receiving a first input audio signal from a first microphone of a microphone array, the first input audio signal including a first representation of audible sound output by the first wireless speaker and a first representation of speech input;
  
  receiving a second input audio signal from a second microphone of the microphone array, the second input audio signal including a second representation of the audible sound output by the first wireless speaker and a second representation of the speech input;
  
  performing first audio beamforming to determine a first portion of combined input audio data comprising a first portion of the first input audio signal corresponding to a first direction and a first portion of the second input audio signal corresponding to the first direction;
  
  performing second audio beamforming to determine a second portion of the combined input audio data comprising a second portion of the first input audio signal corresponding to a second direction and a second portion of the second input audio signal corresponding to the second direction;
  
  selecting at least the first portion as a target signal on which to perform echo cancellation;
  
  selecting at least the second portion as a reference signal to remove from the target signal;
  
  removing the reference signal from the target signal to generate a second output audio signal including a third representation of the speech input;
  
  performing speech recognition processing on the second output audio signal to determine a command; and
  
  executing the command.
- - 2. The computer-implemented method of claim 1, further comprising:
    - determining that the second portion corresponds to a highest amplitude representation of the audible sound output of a plurality of portions;
      
      determining that an amplitude of the second portion is above a threshold;
      
      associating the second portion with the first wireless speaker;
      
      selecting the second portion as the reference signal; and
      
      selecting remaining portions of the plurality of portions as the target signal.
  - 3. The computer-implemented method of claim 1, further comprising:
    - determining that the speech input is associated with the first direction;
      
      selecting the first portion as the target signal; and
      
      selecting at least the second portion as the reference signal.
  - 4. The computer-implemented method of claim 1, further comprising:
    - determining that the second portion corresponds to a highest amplitude representation of the audible sound output of a plurality of portions;
      
      determining that an amplitude of the second portion is below a threshold;
      
      selecting the first portion as the target signal;
      
      determining that the second direction is opposite the first direction;
      
      selecting the second portion as the reference signal;
      
      selecting the second portion as a second target signal;
      
      selecting the first portion as a second reference signal;
      
      removing the reference signal from the target signal to generate the second output audio signal; and
      
      removing the second reference signal from the second target signal to generate a third output audio signal.
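
The branching in claims 2 and 4 can be sketched as a small selection function. This is an illustration under stated assumptions, not the patented implementation: the name `select_signals` is hypothetical, amplitudes are taken as one scalar per beamformed direction, and directions are assumed to be ordered around the array so that index `i` and index `i + n/2` point in opposite directions (with `n` even).

```python
def select_signals(amplitudes, threshold):
    """Return a list of (target_indices, reference_indices) pairs.

    amplitudes: one level per beamformed direction. Assumption of this sketch:
                directions i and i + n/2 are opposite, and n is even.
    """
    n = len(amplitudes)
    loudest = max(range(n), key=lambda i: amplitudes[i])

    if amplitudes[loudest] > threshold:
        # Claim 2: the strongest beam exceeds the threshold, so it is associated
        # with the loudspeaker and used as the reference; the remaining beams
        # form the target.
        targets = [i for i in range(n) if i != loudest]
        return [(targets, [loudest])]

    # Claim 4: the strongest beam is below the threshold, so opposite
    # directions are paired and each one serves once as target and once as
    # reference.
    half = n // 2
    pairs = []
    for i in range(half):
        j = i + half
        pairs.append(([i], [j]))
        pairs.append(([j], [i]))
    return pairs
```

With four directions, `select_signals([0.1, 0.9, 0.2, 0.1], 0.5)` yields a single pair whose reference is direction 1, while `select_signals([0.1, 0.3, 0.2, 0.1], 0.5)` yields four pairwise combinations of opposite directions.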

5. A computer-implemented method, comprising:
- receiving first input audio data from a first microphone of a microphone array, the first input audio data including a first representation of sound output by a first wireless speaker and a first representation of speech input;
  
  receiving second input audio data from a second microphone of the microphone array, the second input audio data including a second representation of the audible sound output by the first wireless speaker and a second representation of the speech input;
  
  performing first audio beamforming to determine a first portion of combined input audio data comprising a first portion of the first input audio signal corresponding to a first direction and a first portion of the second input audio signal corresponding to the first direction;
  
  performing second audio beamforming to determine a second portion of the combined input audio data comprising a second portion of the first input audio signal corresponding to a second direction and a second portion of the second input audio signal corresponding to the second direction;
  
  selecting at least the first portion as a target signal;
  
  selecting at least the second portion as a reference signal; and
  
  removing the reference signal from the target signal to generate first output audio data including a third representation of the speech input.
- - 6. The computer-implemented method of claim 5, further comprising:
    - sending second output audio data to the first wireless speaker;
      
      determining that the second portion corresponds to a highest amplitude of a plurality of portions;
      
      determining that an amplitude of the second portion is above a threshold; and
      
      associating the second portion with the first wireless speaker.
  - 7. The computer-implemented method of claim 5, further comprising:
    - determining that an amplitude associated with the second portion is above a threshold;
      
      determining that a highest amplitude associated with remaining portions of a plurality of portions is below the threshold;
      
      selecting the second portion as the reference signal; and
      
      selecting the remaining portions as the target signal.
  - 8. The computer-implemented method of claim 5, further comprising:
    - determining that a first amplitude associated with the second portion is above a threshold;
      
      determining that a second amplitude associated with a third portion of a plurality of portions is above the threshold;
      
      selecting the second portion as the reference signal;
      
      selecting the third portion as a second reference signal;
      
      selecting at least the first portion as the target signal; and
      
      removing the reference signal and the second reference signal from the target signal to generate the first output audio data.
  - 9. The computer-implemented method of claim 5, further comprising:
    - determining that a first amplitude associated with the first portion is above a threshold;
      
      determining that a second amplitude associated with the second portion is above the threshold;
      
      determining that the speech input is associated with the first direction;
      
      selecting the first portion as the target signal; and
      
      selecting the second portion as the reference signal.
  - 10. The computer-implemented method of claim 5, further comprising:
    - determining that the speech input is associated with the first direction;
      
      selecting the first portion as the target signal;
      
      determining that the second direction is opposite the first direction; and
      
      selecting at least the second portion as the reference signal.
  - 11. The computer-implemented method of claim 5, further comprising:
    - determining that the second portion corresponds to a highest amplitude of a plurality of portions;
      
      determining that an amplitude of the second portion is below a threshold;
      
      selecting the first portion as the target signal;
      
      determining that the second direction is opposite the first direction;
      
      selecting the second portion as the reference signal;
      
      selecting the second portion as a second target signal;
      
      selecting the first portion as a second reference signal; and
      
      removing the second reference signal from the second target signal to generate second output audio data including a fourth representation of the speech input.
  - 12. The computer-implemented method of claim 5, further comprising:
    - performing the first audio beamforming to determine the first portion using a fixed beamforming technique;
      
      performing the second audio beamforming to determine the second portion using the fixed beamforming technique;
      
      determining that a first amplitude associated with the first portion is below a threshold;
      
      determining that a second amplitude associated with the second portion is above the threshold;
      
      performing, using an adaptive beamforming technique, third audio beamforming to determine a third portion of the combined input audio data comprising a third portion of the first input audio signal corresponding to the second direction and a third portion of the second input audio signal corresponding to the second direction;
      
      selecting at least the first portion as the target signal; and
      
      selecting at least the third portion as the reference signal.
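
Claim 5 ends by "removing the reference signal from the target signal" without naming a technique. One common way to do this is a normalized least-mean-squares (NLMS) adaptive filter, sketched below as an assumption rather than as the patent's method. The function name `nlms_cancel` and the parameters `taps`, `mu`, and `eps` are illustrative.

```python
def nlms_cancel(target, reference, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `target`.

    target, reference: equal-length sample lists (for example, two beamformer
    outputs). Returns the residual, which keeps whatever part of the target
    is not predictable from the reference.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(target, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # target minus the estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step size is normalized by the reference energy.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        output.append(error)
    return output
```

If the target contains only a filtered copy of the reference, the residual decays toward zero as the weights converge. If speech is also present and uncorrelated with the reference, it remains in the residual. For claim 8, which uses two reference signals, one possible extension is to run the same routine a second time on the residual with the second reference.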

13. A device, comprising:
- at least one processor;
  
  a memory device including instructions operable to be executed by the at least one processor to configure the device to:
  
  receive first input audio data from a first microphone of a microphone array, the first input audio data including a first representation of sound output by a first wireless speaker and a first representation of speech input;
  
  receive second input audio data from a second microphone of the microphone array, the second input audio data including a second representation of the audible sound output by the first wireless speaker and a second representation of the speech input;
  
  perform first audio beamforming to determine a first portion of combined input audio data comprising a first portion of the first input audio signal corresponding to a first direction and a first portion of the second input audio signal corresponding to the first direction;
  
  perform second audio beamforming to determine a second portion of the combined input audio data comprising a second portion of the first input audio signal corresponding to a second direction and a second portion of the second input audio signal corresponding to the second direction;
  
  select at least the first portion as a target signal;
  
  select at least the second portion as a reference signal; and
  
  remove the reference signal from the target signal to generate first output audio data including a third representation of the speech input.
- - 14. The system of claim 13, wherein the instructions further configure the system to:
    - sending second output audio data to the first wireless speaker;
      
      determine that the second portion corresponds to a highest amplitude of a plurality of portions;
      
      determine that an amplitude of the second portion is above a threshold; and
      
      associate the second portion with the first wireless speaker.
  - 15. The system of claim 13, wherein the instructions further configure the system to:
    - determine that an amplitude associated with the second portion is above a threshold;
      
      determine that a highest amplitude associated with remaining portions of a plurality of portions is below the threshold;
      
      select the second portion as the reference signal; and
      
      select the remaining portions as the target signal.
  - 16. The system of claim 13, wherein the instructions further configure the system to:
    - determine that a first amplitude associated with the second portion is above a threshold;
      
      determine that a second amplitude associated with a third portion of a plurality of portions is above the threshold;
      
      select the second portion as the reference signal;
      
      select the third portion as a second reference signal;
      
      select at least the first portion as the target signal; and
      
      remove the reference signal and the second reference signal from the target signal to generate the first output audio data.
  - 17. The system of claim 13, wherein the instructions further configure the system to:
    - determine that a first amplitude associated with the first portion is above a threshold;
      
      determine that a second amplitude associated with the second portion is above the threshold;
      
      determine that the speech input is associated with the first direction;
      
      select the first portion as the target signal; and
      
      select the second portion as the reference signal.
  - 18. The system of claim 13, wherein the instructions further configure the system to:
    - determine that the speech input is associated with the first direction;
      
      select the first portion as the target signal;
      
      determine that the second direction is opposite the first direction; and
      
      select at least the second portion as the reference signal.
  - 19. The system of claim 13, wherein the instructions further configure the system to:
    - determine that the second portion corresponds to a highest amplitude of a plurality of portions;
      
      determine that an amplitude of the second portion is below a threshold;
      
      select the first portion as the target signal;
      
      determine that the second direction is opposite the first direction;
      
      select the second portion as the reference signal;
      
      select the second portion as a second target signal;
      
      select the first portion as a second reference signal; and
      
      remove the second reference signal from the second target signal to generate second output audio data including a fourth representation of the speech input.
  - 20. The system of claim 13, wherein the instructions further configure the system to:
    - perform the first audio beamforming to determine the first portion using a fixed beamforming technique;
      
      perform the second audio beamforming to determine the second portion using the fixed beamforming technique;
      
      determine that a first amplitude associated with the first portion is below a threshold;
      
      determine that a second amplitude associated with the second portion is above the threshold;
      
      perform, using an adaptive beamforming technique, third audio beamforming to determine a third portion of the combined input audio data comprising a third portion of the first input audio signal corresponding to the second direction and a third portion of the second input audio signal corresponding to the second direction;
      
      select at least the first portion as the target signal; and
      
      select at least the third portion as the reference signal.
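
The decision in claims 12 and 20 (switch the reference to an adaptive beamformer output when the fixed beam for the first direction is quiet and the fixed beam for the second direction is loud) can be sketched as follows. All names here are hypothetical, including `pick_target_and_reference`, and the adaptive beamformer is passed in as a caller-supplied callable rather than implemented. The fallback branch is an assumption of this sketch, since the claims only describe the case where both amplitude conditions hold.

```python
def pick_target_and_reference(fixed, amplitude, threshold, adaptive_beamform,
                              first=0, second=1):
    """Choose target and reference beams following the logic of claim 20.

    fixed:             dict mapping direction index -> fixed-beamformer output.
    amplitude:         callable returning a scalar level for a sample list.
    adaptive_beamform: callable taking a direction index and returning an
                       adaptive-beamformer output (supplied by the caller).
    """
    target = fixed[first]
    quiet_target = amplitude(fixed[first]) < threshold
    loud_reference = amplitude(fixed[second]) > threshold

    if quiet_target and loud_reference:
        # Claim 20: re-derive the second direction adaptively and use that
        # third portion as the reference.
        reference = adaptive_beamform(second)
    else:
        # Assumption of this sketch: otherwise keep the fixed beam.
        reference = fixed[second]
    return target, reference
```

Passing the beamformer as a callable keeps the selection logic separate from any particular adaptive algorithm.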

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Ayrapetian, Robert; Hilmes, Philip Ryan
Primary Examiner(s): HUBER, PAUL W
Application Number: US14/973,274
Publication Number: US 20170178662A1
Time in Patent Office: 621 Days
Field of Search: None

CPC Class Codes
G10L 2021/02082   the noise being echo, rever...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/0208   Noise filtering
G10L 21/0216   characterised by the method...
H04R 2201/40   Details of arrangements for...
H04R 2203/12   Beamforming aspects for ste...
H04R 2420/07   Applications of wireless lo...
H04R 3/005   for combining the signals o...
H04R 5/04   Circuit arrangements, e.g. ...
