Tailoring beamforming techniques to environments

US 9,640,179 B1
Filed: 06/27/2013
Issued: 05/02/2017
Est. Priority Date: 06/27/2013
Status: Active Grant

Abstract

Techniques for tailoring beamforming techniques to environments such that processing resources may be devoted to a portion of an audio signal corresponding to a lobe of a beampattern that is most likely to contain user speech. The techniques take into account both acoustic characteristics of an environment and heuristics regarding lobes that have previously been found to include user speech.
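
As an illustration of the idea in the abstract, the following is a minimal sketch of how a lobe might be chosen by combining its energy, an echo estimate for its direction, and how often it was selected before. The `Lobe` fields, the `score` formula, and the sample values are all hypothetical; the patent does not specify a particular scoring function.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Lobe:
    """One lobe of a beampattern (all fields are illustrative assumptions)."""
    name: str
    energy: float         # signal energy measured in this lobe
    echo: float           # estimated echo level for this direction, 0..1
    times_selected: int   # how often this lobe was selected before


def score(lobe: Lobe, total_selections: int) -> float:
    """Hypothetical score: reward energy and selection history, discount echo."""
    history = lobe.times_selected / total_selections if total_selections else 0.0
    return lobe.energy * (1.0 - lobe.echo) * (1.0 + history)


def select_lobe(lobes: list[Lobe]) -> Lobe:
    """Return the lobe with the highest hypothetical score."""
    total = sum(lobe.times_selected for lobe in lobes)
    return max(lobes, key=lambda lobe: score(lobe, total))


lobes = [
    Lobe("toward_wall", energy=0.9, echo=0.6, times_selected=1),
    Lobe("toward_sofa", energy=0.7, echo=0.1, times_selected=9),
    Lobe("toward_door", energy=0.2, echo=0.2, times_selected=0),
]
best = select_lobe(lobes)
print(best.name)  # prints "toward_sofa"
```

In this toy example the loudest lobe points at a reflective wall, so the quieter lobe with low echo and a strong selection history wins.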

22 Claims

1. An apparatus comprising:
- one or more processors;
  
  a speaker;
  
  a microphone array; and
  
  one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  instructing the speaker to emit a known sound in an environment;
  
  generating a first audio signal representing at least sound of the known sound reflected from the environment and captured by the microphone array;
  
  comparing characteristics of the known sound to characteristics of the reflected sound representing in the first audio signal to determine an acoustic characteristic of the environment;
  
  generating a second audio signal based on sound uttered by a user in the environment and captured by the microphone array;
  
  applying a set of beamformer coefficients to the second audio signal to generate a processed audio signal representing a beampattern, the beampattern having multiple lobes each focused on a region within the environment;
  
  determining which of the multiple lobes correspond to regions of the environment from which speech has previously been found to originate from;
  
  selecting a lobe of the multiple lobes based at least in part on an amount of energy associated with the lobe, the acoustic characteristic of the environment, and whether previously processed audio signals associated with the lobe have previously been selected; and
  
  preparing the processed audio signal for automatic speech recognition (ASR) based at least in part on the selecting.
  - 2. An apparatus as recited in claim 1, the acts further comprising sending the processed audio signal at least partly after the preparing to a remote entity for performing the ASR on the processed audio signal.
  - 3. An apparatus as recited in claim 1, wherein the comparing is effective to determine a room impulse response (RIR) of the environment.
  - 4. An apparatus as recited in claim 1, wherein the comparing is effective to determine an amount of echo in respective portions of the first audio signal corresponding to respective lobes of the multiple lobes.
  - 5. An apparatus as recited in claim 1, wherein the preparing comprises performing more acoustic echo cancelation (AEC) on the portion of the processed audio signal corresponding to the selected lobe than to a remainder of the processed audio signal.
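
Claim 3 refers to determining a room impulse response (RIR) by comparing the known sound with what the microphones capture. One simple way to picture that comparison is deconvolution: if the capture is the known sound convolved with the RIR, the RIR can be recovered by polynomial long division. The sketch below assumes a noise-free capture and a known sound whose first sample is non-zero; the function names and sample values are hypothetical, and a practical system would more likely use a regularized or adaptive estimator.

```python
def convolve(x, h):
    """Linear convolution of two sequences in plain Python."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def estimate_rir(known, captured, length):
    """Recover `length` taps of h from captured = known * h by long division.

    Assumption: noise-free capture and known[0] != 0 (naive toy method).
    """
    if known[0] == 0:
        raise ValueError("known[0] must be non-zero for this naive method")
    h = []
    for n in range(length):
        acc = captured[n] if n < len(captured) else 0.0
        # Subtract the contribution of taps that were already recovered.
        for k in range(1, min(n, len(known) - 1) + 1):
            acc -= known[k] * h[n - k]
        h.append(acc / known[0])
    return h


known_sound = [1.0, -0.5, 0.25, 0.5]     # hypothetical emitted test signal
true_rir = [1.0, 0.0, 0.6, 0.0, 0.3]     # direct path plus two echoes
captured = convolve(known_sound, true_rir)
estimated = estimate_rir(known_sound, captured, len(true_rir))
print([round(tap, 6) for tap in estimated])
```

The later taps of the recovered response give a rough picture of how much delayed, reflected energy the room adds.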

6. A method comprising:
- under control of one or more computing systems configured with executable instructions, measuring an acoustic characteristic of an environment, the measuring comprising:
  
  instructing a speaker to emit a known sound in the environment;
  
  capturing reflected sound, the reflected sound corresponding to reflection of the known sound by the environment; and
  
  comparing characteristics of the known sound to characteristics of the reflected sound at least in part to identify the acoustic characteristic;
  
  selecting a portion of an audio signal, the portion corresponding to a lobe of a beampattern and the selecting based at least in part on:
  
  (1) an amount of energy associated with the lobe of the beampattern;
  
  (2) the acoustic characteristic of the environment, and
  
  (3) whether previous portions of audio signals corresponding to the lobe of the beampattern have been previously selected for enhancement; and
  
  enhancing the portion of the audio signal corresponding to the lobe to increase a signal-to-noise (SNR) ratio of the portion of the audio signal.
  - 7. A method as recited in claim 6, wherein the acoustic characteristic comprises a room impulse response (RIR) of the environment.
  - 8. A method as recited in claim 6, wherein a device within the environment generates the audio signal, and the acoustic characteristic comprises an amount of echo received by the device from at least one direction within the environment.
  - 9. A method as recited in claim 6, wherein the enhancing comprises devoting additional processing resources for performing acoustic echo cancelation (AEC) on the portion of the audio signal as compared to a remainder of the audio signal.
  - 10. A method as recited in claim 6, wherein the enhancing comprises devoting more resources of the one or more computing systems to processing of the portion of the audio signal corresponding to the lobe as compared to an amount of the resources devoted to a remainder of the audio signal.
  - 11. A method as recited in claim 6, wherein the acoustic characteristic comprises a room impulse response (RIR) of the environment, and the acts further comprise measuring the RIR of the environment, the measuring comprising:
    - instructing the speaker to emit a second known sound in the environment;
      
      capturing reflected sound, the reflected sound corresponding to reflection of the second known sound by the environment;
      
      comparing characteristics of the second known sound to characteristics of the reflected sound at least in part to identify the RIR.
  - 12. A method as recited in claim 6, further comprising determining whether previous portions of audio signals corresponding to the lobe of the beampattern have been previously selected more than a threshold number of times.
  - 13. A method as recited in claim 6, further comprising determining whether previous portions of audio signals corresponding to the lobe of the beampattern have been previously selected more than a threshold percentage of times.
  - 14. A method as recited in claim 6, further comprising determining whether previous portions of audio signals corresponding to the lobe of the beampattern have been more recently selected for enhancement.
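
Claims 12 through 14 describe three ways of consulting a lobe's selection history: a threshold count, a threshold percentage, and recency. A minimal sketch of such bookkeeping follows; the `LobeHistory` class, its method names, and the window sizes are hypothetical and chosen only for illustration.

```python
from collections import deque


class LobeHistory:
    """Tracks which lobe index was selected for each past utterance."""

    def __init__(self, max_events=100):
        # Oldest event first; old events fall off once max_events is reached.
        self.events = deque(maxlen=max_events)

    def record(self, lobe):
        self.events.append(lobe)

    def selected_more_than(self, lobe, count):
        """True if the lobe was selected more than `count` times."""
        return self.events.count(lobe) > count

    def selected_more_than_fraction(self, lobe, fraction):
        """True if the lobe accounts for more than `fraction` of selections."""
        if not self.events:
            return False
        return self.events.count(lobe) / len(self.events) > fraction

    def selected_recently(self, lobe, window):
        """True if the lobe appears among the last `window` selections."""
        if window <= 0:
            return False
        return lobe in list(self.events)[-window:]


history = LobeHistory()
for selected in [2, 2, 0, 2, 1]:
    history.record(selected)

print(history.selected_more_than(2, 2))             # True: selected 3 times
print(history.selected_more_than_fraction(2, 0.5))  # True: 3 of 5 selections
print(history.selected_recently(0, 2))              # False: last two are 2, 1
```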

15. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
- measuring a value of an acoustic characteristic of an environment, the measuring comprising:
  
  instructing a speaker to emit a known sound in the environment;
  
  capturing reflected sound, the reflected sound corresponding to reflection of the known sound by the environment; and
  
  comparing characteristics of the known sound to characteristics of the reflected sound at least in part to identify the acoustic characteristic;
  
  capturing speech uttered by a user within the environment;
  
  generating an audio signal that includes the speech;
  
  processing the audio signal by applying a set of beamformer coefficients to the audio signal to generate a processed audio signal that represents a beampattern, the beampattern having multiple lobes each focused on a region within the environment;
  
  determining a portion of the processed audio signal corresponding to one or more lobes of the beampattern focused on one or more regions from which user speech was previously determined to have originated from;
  
  comparing the value of the acoustic characteristic to a previously measured value of the acoustic characteristic of the environment; and
  
  at least partly in response to determining that the value and the previously measured value differ by less than a threshold amount, applying relatively more processing resources to the portion of the processed audio signal and relatively less processing resources to a remainder of the processed audio signal.
  - 16. One or more non-transitory computer-readable media as recited in claim 15, the acts further comprising applying substantially equal processing resources to the portion of the processed audio signal and the remainder of the processed audio signal at least partly in response to determining that value and the previously measured value do not differ by less than the threshold amount.
  - 17. One or more non-transitory computer-readable media as recited in claim 15, wherein the acoustic characteristic comprises a room impulse response (RIR) of the environment.
  - 18. One or more non-transitory computer-readable media as recited in claim 17, wherein measuring the RIR of the environment comprises:
    - instructing a speaker to emit a second known sound within the environment;
      
      capturing sound of the second known sound that has been reflected by one or more surfaces within the environment;
      
      generating an audio signal based on the captured sound; and
      
      comparing the captured sound in the audio signal to the second known sound to determine a disparity there between, the RIR being based at least in part on the disparity.
  - 19. One or more non-transitory computer-readable media as recited in claim 15, wherein the acoustic characteristic is based at least in part on an amount of echo in the environment.
  - 20. One or more non-transitory computer-readable media as recited in claim 15, wherein the processing resources applied to the portion and to the remainder of the audio signal increase a signal-to-noise (SNR) ratio of the portion and the remainder of the audio signal.
  - 21. One or more non-transitory computer-readable media as recited in claim 15, wherein the value of the acoustic characteristic is measured after capturing the speech uttered by the user, and the previously measured value comprises a value of the acoustic characteristic that was measured prior to the capturing of the speech.
  - 22. One or more non-transitory computer-readable media as recited in claim 15, wherein the applying is further based at least in part on an amount of energy associated with the one or more lobes from which user speech was previously determined to have originated from.
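
The decision in claims 15 and 16 can be pictured as a simple branch: if the newly measured acoustic value is close to the previous one, the environment is presumed unchanged and the historically favored lobes get a larger share of processing; otherwise all lobes are treated equally. The sketch below treats the acoustic characteristic as a single number and the processing budget as fractional shares. Both are simplifying assumptions, and `allocate_resources`, `bias`, and the sample values are hypothetical.

```python
def allocate_resources(current, previous, threshold, n_lobes, favored, bias=0.7):
    """Return per-lobe shares of a processing budget that sum to 1.0.

    `favored` is a set of lobe indices from which speech previously
    originated. `bias` is the assumed total share given to favored lobes
    when the acoustic characteristic has changed by less than `threshold`.
    """
    if n_lobes <= 0:
        raise ValueError("n_lobes must be positive")
    equal = [1.0 / n_lobes] * n_lobes
    changed = abs(current - previous) >= threshold
    if changed or not favored or len(favored) >= n_lobes:
        # Environment changed (or nothing to favor): treat all lobes equally.
        return equal
    rest = n_lobes - len(favored)
    return [
        bias / len(favored) if i in favored else (1.0 - bias) / rest
        for i in range(n_lobes)
    ]


# Hypothetical scalar measurements of the acoustic characteristic.
stable = allocate_resources(0.42, 0.40, threshold=0.05, n_lobes=4, favored={1})
moved = allocate_resources(0.60, 0.40, threshold=0.05, n_lobes=4, favored={1})
print([round(s, 2) for s in stable])  # prints [0.1, 0.7, 0.1, 0.1]
print(moved)                          # prints [0.25, 0.25, 0.25, 0.25]
```

Falling back to equal shares when the measurement shifts reflects the idea that past speech locations may no longer be reliable once the room has changed.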


Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hart, Gregory Michael; Velusamy, Kavitha; Worley, III, William Spencer
Primary Examiner(s): JACKSON, JAKIEDA R
Application Number: US13/928,751
Time in Patent Office: 1,405 Days
Field of Search: 704233
US Class Current
CPC Class Codes

G10L 15/20   Speech recognition techniqu...

G10L 2021/02082   the noise being echo, rever...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/0208   Noise filtering

G10L 21/0232   Processing in the frequency...

G10L 25/51   for comparison or discrimin...

H04R 2420/09   Applications of special con...

H04R 3/005   for combining the signals o...
