Echo cancellation based on shared reference signals

US 9,779,731 B1
Filed: 08/20/2012
Issued: 10/03/2017
Est. Priority Date: 08/20/2012
Status: Active Grant


2 Assignments

0 Petitions

Abstract

An audio processing system configured to generate, based at least in part on captured sound, an audio signal that includes a speech component corresponding to a user's speech utterance and an audio component corresponding to audio output of another device is described herein. The audio processing system is also configured to receive a reference signal that corresponds to the audio output of the other device. The reference signal may be received as ultrasonic audio output of the other device or from a remote server. The audio processing device then processes the generated audio signal to remove at least a part of the generated audio signal that corresponds to the reference signal.

42 Citations


26 Claims

1. A first device comprising:
- a processor;
  
  a microphone to capture a speech utterance and an audio output from a second device and to generate an audio signal based at least in part on the speech utterance and the audio output from the second device, the audio signal including at least:
  
  (1) a speech component corresponding to the speech utterance; and
  
  (2) an audio component associated with the audio output from the second device, the second device being physically independent from the first device and from a source of the speech utterance;
  
  a reference signal module configured to be operated by the processor to receive a reference signal from the second device, the reference signal corresponding to the audio output from the second device;
  
  a signal processing module configured to be operated by the processor to process the audio signal to generate a processed audio signal by removing at least a part of the audio signal that corresponds to the reference signal; and
  
  a speech recognition module configured to be operated by the processor to perform speech recognition on the processed audio signal or to provide the processed audio signal to another entity for performing the speech recognition, the processed audio signal substantially including the speech component.
- Dependent claims (2, 3, 4, 5):
  - 2. The first device of claim 1, wherein the reference signal module receives the reference signal from the second device via an ultrasonic audio output of the second device.
  - 3. The first device of claim 1, wherein the first device and second device are connected to a remote server over a network and receive audio signals from the remote server.
  - 4. The first device of claim 3, wherein the reference signal module receives the reference signal from the remote server.
  - 5. The first device of claim 3, wherein the reference signal module receives an indication of the reference signal from the second device and utilizes the indication to receive the reference signal from the remote server.
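
The lookup described in claim 5 (an indication arrives from the second device and is used to obtain the reference signal from the remote server) might be sketched as follows. This is only an illustration under stated assumptions: `ReferenceIndication`, its two fields, and `RemoteServerStub` are hypothetical names that do not appear in the patent, and the server is replaced by an in-memory dictionary rather than a real network call.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReferenceIndication:
    """Hypothetical message sent by the second device (not defined in the claims)."""
    stream_id: str     # assumed identifier of the audio the second device is playing
    start_sample: int  # assumed offset of the first played sample within that audio


class RemoteServerStub:
    """In-memory stand-in for the remote server; a real system would use the network."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[float]] = {}

    def publish(self, stream_id: str, samples: List[float]) -> None:
        # Store a copy of the audio that the server also streams to the second device.
        self._streams[stream_id] = list(samples)

    def fetch(self, stream_id: str, start: int, count: int) -> List[float]:
        # Return `count` samples beginning at `start`, or fail if the stream is unknown.
        if stream_id not in self._streams:
            raise LookupError(f"unknown stream: {stream_id}")
        return self._streams[stream_id][start:start + count]


def resolve_reference(server: RemoteServerStub,
                      indication: ReferenceIndication,
                      count: int) -> List[float]:
    """Use the indication from the second device to pull the reference from the server."""
    return server.fetch(indication.stream_id, indication.start_sample, count)
```

Keeping the indication small (an identifier plus an offset) means the second device never has to transmit the audio itself; the first device fetches it from the source both devices already share.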

6. A computer-implemented method comprising:
- receiving, at a first device via a microphone of the first device, a speech utterance and an audio output from a second device, the second device being physically separated from the first device and the second device being physically separated from a source of the speech utterance;
  
  generating, by the first device, an audio signal including a speech component corresponding to the speech utterance and an audio component associated with the audio output from the second device;
  
  receiving, by the first device, a reference signal from the second device, the reference signal corresponding to the audio output from the second device; and
  
  processing the audio signal to generate a processed audio signal by removing at least a part of the audio signal that corresponds to the reference signal.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15):
  - 7. The method of claim 6, further comprising performing speech recognition on the processed audio signal.
  - 8. The method of claim 6, wherein the receiving comprises receiving the reference signal from the second device.
  - 9. The method of claim 8, wherein the reference signal is received via ultrasonic audio output from the second device or through a wireless data transmission from the second device.
  - 10. The method of claim 6, wherein the first device and second device are connected to a remote server over a network and receive audio signals from the remote server.
  - 11. The method of claim 10, wherein the receiving comprises receiving the reference signal from the remote server.
  - 12. The method of claim 10, wherein the receiving comprises receiving an indication of the reference signal and utilizing the indication to retrieve the reference signal from the remote server.
  - 13. The method of claim 12, wherein the indication comprises at least one of an identifier of the reference signal, a timestamp associated with a time at which the audio output of the second device was played, or reference signal metadata.
  - 14. The method of claim 6, wherein the receiving is performed substantially concurrently with capturing the audio component via the microphone or is performed prior to capturing the audio component via the microphone.
  - 15. The method of claim 6, wherein the removing comprises utilizing an echo cancellation filter to remove distortions from the audio signal and then subtracting the reference signal from a resulting audio signal produced by the echo cancellation filter.
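
Claim 15 refers to an echo cancellation filter without naming one. As a rough illustration, the sketch below uses a conventional normalized least-mean-squares (NLMS) adaptive filter, which is a swapped-in textbook technique and not necessarily what the patent intends: it shapes the reference to match the echo path and subtracts the result from the microphone signal, which differs in ordering from the wording of the claim. The tap count, step size, and simulated echo path are all assumptions, and the demo contains no speech, so double-talk handling is not covered.

```python
import random
from typing import List, Sequence


def nlms_cancel(mic: Sequence[float], ref: Sequence[float],
                taps: int = 8, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Return the microphone signal with the adaptively estimated echo of `ref` removed."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate
        # Normalized update: step size scaled by the energy of the reference window.
        step = mu * error / (sum(x * x for x in history) + eps)
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def demo_residual_ratio() -> float:
    """Simulate an echo-only capture and return residual energy / echo energy at the tail."""
    rng = random.Random(0)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    path = [0.0, 0.6, 0.3]  # assumed echo path: one-sample delay plus a weaker reflection
    mic = [sum(path[k] * ref[n - k] for k in range(len(path)) if n - k >= 0)
           for n in range(len(ref))]
    out = nlms_cancel(mic, ref)
    tail = 500
    return sum(e * e for e in out[-tail:]) / sum(m * m for m in mic[-tail:])
```

Calling `demo_residual_ratio()` gives a quick check that the filter has converged: the ratio should be far below 1 once the weights have adapted to the simulated path.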

16. One or more non-transitory computer-readable media storing computer-executable instructions configured to program a first device to perform operations comprising:
- generating an audio signal based at least in part on a speech utterance and an audio output from a second device, wherein:
  
  the second device is physically separate from the first device and the second device is physically separate from a source of the speech utterance,
  
  the speech utterance and the audio output are captured via a microphone,
  
  the audio output includes a human-audible audio output and an ultrasonic audio output including a reference signal, and
  
  the audio signal includes a human-audible sound component and an ultrasonic audio component, the human-audible sound component including a speech component corresponding to the speech utterance and an audio component corresponding to the human-audible audio output, and the ultrasonic audio component including the reference signal;
  
  processing the audio signal to generate a processed audio signal by removing at least a part of the audio component corresponding to the audio output by the second device from the audio signal based at least in part on the reference signal; and
  
  sending the processed audio signal to a remote system.
- Dependent claims (17, 18, 19, 20, 21):
  - 17. The one or more non-transitory computer-readable media of claim 16, wherein the ultrasonic audio output comprises a digitally encoded reference signal, the digitally encoded reference signal corresponding to the audio output of the second device.
  - 18. The one or more non-transitory computer-readable media of claim 17, wherein the digitally encoded reference signal is encoded with error correction codes or with an error correction encoding scheme.
  - 19. The one or more non-transitory computer-readable media of claim 18, wherein the operations further comprise utilizing the error correction codes or error correction encoding scheme to remove distortions to the digitally encoded reference signal from a part of the audio signal that corresponds to the ultrasonic audio output.
  - 20. The one or more non-transitory computer-readable media of claim 16, wherein the audio output of the second device and the ultrasonic audio output are broadcast substantially concurrently by the second device.
  - 21. The one or more non-transitory computer-readable media of claim 16, wherein the operations further comprise separating a first part of the audio signal corresponding to the human-audible sound from a second part of the audio signal corresponding to the ultrasonic audio output, the separating being performed based at least in part on audio frequencies of the first part and the second part of the audio signal.
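
The frequency-based separation in claim 21 can be illustrated with a small band split. The sketch below is a minimal example under assumptions, not the patented method: it uses a naive discrete Fourier transform for clarity instead of a practical filter bank, and the 18 kHz cutoff, 48 kHz sample rate, and test tones are chosen purely for illustration.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(signal: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, adequate for short illustrative frames."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT, keeping the real part (the inputs here are conjugate-symmetric)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split_bands(signal: Sequence[float], sample_rate: float,
                cutoff_hz: float) -> Tuple[List[float], List[float]]:
    """Split a frame into (below cutoff, at or above cutoff) time-domain parts."""
    n = len(signal)
    spectrum = dft(signal)
    low = [0j] * n
    high = [0j] * n
    for k, value in enumerate(spectrum):
        # Bins k and n - k share one physical frequency, so both go to the same band.
        freq = min(k, n - k) * sample_rate / n
        if freq < cutoff_hz:
            low[k] = value
        else:
            high[k] = value
    return idft_real(low), idft_real(high)


def demo_split_error() -> Tuple[float, float]:
    """Mix a 1 kHz tone with a 20 kHz tone and report the worst-case split error per band."""
    fs, n = 48000, 240
    audible = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
    ultrasonic = [0.2 * math.sin(2 * math.pi * 20000 * t / fs) for t in range(n)]
    mixed = [a + u for a, u in zip(audible, ultrasonic)]
    low, high = split_bands(mixed, fs, 18000.0)
    return (max(abs(x - a) for x, a in zip(low, audible)),
            max(abs(x - u) for x, u in zip(high, ultrasonic)))
```

In this toy case both tones fall exactly on DFT bins, so the split is essentially exact; real captures would need windowing or proper filters to limit leakage between the bands.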

22. One or more non-transitory computer-readable media storing computer-executable instructions configured to program a first device to perform operations comprising:
- generating an audio signal based at least in part on a speech utterance and an audio output from a second device, the speech utterance and the audio output from the second device being captured via a microphone of the first device, the audio signal including a speech component corresponding to the speech utterance and an audio component associated with the audio output from the second device, the second device being physically separate from the first device and the second device being physically separate from a source of the speech utterance;
  
  receiving, from a remote server via a wireless unit, a reference signal corresponding to the audio output from the second device; and
  
  processing the generated audio signal by removing at least a part of the audio signal that corresponds to the reference signal.
- Dependent claims (23, 24, 25, 26):
  - 23. The one or more non-transitory computer-readable media of claim 22, wherein the operations further comprise, prior to generating the audio signal, generating another audio signal based at least in part on additional audio output that is played by the second device prior to playing the audio output of the second device.
  - 24. The one or more non-transitory computer-readable media of claim 23, wherein the operations further comprise utilizing the other audio signal and a reference signal corresponding to the additional audio output to determine one or more distortions applied to audio output by the second device or by an environment that includes the first device and the second device.
  - 25. The one or more non-transitory computer-readable media of claim 24, wherein the removing comprises utilizing the one or more distortions and the reference signal corresponding to the audio output to remove the part of the audio signal that corresponds to the reference signal.
  - 26. The one or more non-transitory computer-readable media of claim 22, wherein the removing comprises utilizing an echo cancellation filter to remove distortions from the audio signal and then subtracting the reference signal from a resulting audio signal produced by the echo cancellation filter.
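
Claims 23 to 25 describe learning the distortions from an earlier playback and reusing them when removing later audio. A minimal sketch of that two-phase idea follows, assuming the distortion is nothing more than a fixed delay and gain; that is a strong simplification made for illustration, since real rooms add reverberation and frequency-dependent effects. The function names and demo values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def estimate_delay_and_gain(captured: Sequence[float], reference: Sequence[float],
                            max_delay: int) -> Tuple[int, float]:
    """Estimate the delay (in samples) and gain applied to `reference` in `captured`."""
    def correlate(delay: int) -> float:
        return sum(captured[n] * reference[n - delay]
                   for n in range(delay, len(captured)))

    # The delay is taken where the cross-correlation magnitude peaks.
    delay = max(range(max_delay + 1), key=lambda d: abs(correlate(d)))
    energy = sum(reference[n - delay] ** 2 for n in range(delay, len(captured)))
    gain = correlate(delay) / energy if energy else 0.0  # least-squares gain at that delay
    return delay, gain


def remove_reference(captured: Sequence[float], reference: Sequence[float],
                     delay: int, gain: float) -> List[float]:
    """Subtract the delayed, scaled reference from the captured signal."""
    return [sample - (gain * reference[n - delay] if n >= delay else 0.0)
            for n, sample in enumerate(captured)]


def demo() -> Tuple[int, float, float]:
    """Calibrate on an earlier playback, then clean a later capture that contains speech."""
    rng = random.Random(1)
    true_delay, true_gain = 7, 0.5  # assumed environment, unknown to the estimator

    def play(ref: Sequence[float]) -> List[float]:
        return [true_gain * ref[n - true_delay] if n >= true_delay else 0.0
                for n in range(len(ref))]

    calibration_ref = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    delay, gain = estimate_delay_and_gain(play(calibration_ref), calibration_ref, 20)

    later_ref = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    speech = [0.1 * math.sin(2 * math.pi * n / 50) for n in range(400)]
    mic = [s + e for s, e in zip(speech, play(later_ref))]
    cleaned = remove_reference(mic, later_ref, delay, gain)
    return delay, gain, max(abs(c - s) for c, s in zip(cleaned, speech))
```

Estimating the distortion while no one is speaking keeps the calibration free of interference from the speech component; the stored delay and gain can then be applied to any later reference signal.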


Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Haskin, Menashe; Velusamy, Kavitha
Primary Examiner(s)
Dorvil, Richemond
Assistant Examiner(s)
Le, Thuykhanh

Application Number

US13/589,967
Time in Patent Office

1,870 Days
Field of Search

None
US Class Current
CPC Class Codes

G10L 15/00   Speech recognition G10L17/0...

G10L 15/20   Speech recognition techniqu...

G10L 19/018   Audio watermarking, i.e. em...

G10L 2021/02082   the noise being echo, rever...

G10L 21/0208   Noise filtering
