Tailoring beamforming techniques to environments

US 10,249,299 B1
Filed: 05/01/2017
Issued: 04/02/2019
Est. Priority Date: 06/27/2013
Status: Active Grant

Abstract

Techniques for tailoring beamforming techniques to environments such that processing resources may be devoted to a portion of an audio signal corresponding to a lobe of a beampattern that is most likely to contain user speech. The techniques take into account both acoustic characteristics of an environment and heuristics regarding lobes that have previously been found to include user speech.

18 Claims

1. An apparatus comprising:
- one or more processors;
  
  a microphone array; and
  
  one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  generating, based at least in part on sound captured by the microphone array, a plurality of audio signals, wherein each of the plurality of audio signals corresponds to a respective microphone of the microphone array;
  
  processing, by a beamforming component configured with one or more beamforming coefficients, at least a first audio signal of the plurality of audio signals to generate a first processed audio signal, wherein the first processed audio signal corresponds to a first portion of the sound received from a first direction;
  
  processing, by the beamforming component configured with the one or more beamforming coefficients, at least a second audio signal of the plurality of audio signals to generate a second processed audio signal, wherein the second processed audio signal corresponds to a second portion of the sound received from a second direction;
  
  selecting a direction of interest based at least in part on:
  
  an amount of energy associated with a portion of the first processed audio signal;
  
  an amount of energy associated with a portion of the second processed audio signal; and
  
  directional data indicating at least one of a number of times speech has been identified from the first direction in previously processed audio signals or a number of times speech has been identified from the second direction in the previously processed audio signals; and
  
  selecting, based at least in part on the direction of interest, the first processed audio signal.
  - 2. The apparatus according to claim 1, further comprising preparing, based at least in part on selecting the first processed audio signal, the first processed audio signal for automatic speech recognition.
  - 3. The apparatus according to claim 1, further comprising amplifying, based at least in part on selecting the first processed audio signal, the portion of the first processed audio signal, wherein the portion of the first processed audio signal corresponds to speech of a speaker.
  - 4. The apparatus according to claim 1, wherein selecting the direction of interest is further based at least in part on a room impulse response associated with characteristics of an environment in which the apparatus resides.
  - 5. The apparatus according to claim 1, wherein the portion of the first processed audio signal corresponds to a region of gain within the first processed audio signal.
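
Claim 1 selects a direction of interest from three inputs: the energy in each processed (beamformed) signal and directional data counting how often speech was previously identified from each direction. The claim does not say how these inputs are combined. The sketch below assumes a normalized weighted sum; the function names and the `history_weight` parameter are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Sum of squared samples for one portion of a processed audio signal."""
    return sum(s * s for s in samples)


def select_direction(
    beams: Dict[str, Sequence[float]],
    speech_counts: Dict[str, int],
    history_weight: float = 0.5,  # assumed blend factor, not taken from the claims
) -> str:
    """Pick the direction whose blended energy/history score is highest.

    beams maps a direction label to that direction's processed signal.
    speech_counts maps a direction label to the number of times speech
    was previously identified from that direction.
    """
    energies = {d: signal_energy(x) for d, x in beams.items()}
    total_energy = sum(energies.values()) or 1.0
    total_count = sum(speech_counts.get(d, 0) for d in beams) or 1

    def score(direction: str) -> float:
        energy_share = energies[direction] / total_energy
        history_share = speech_counts.get(direction, 0) / total_count
        return (1.0 - history_weight) * energy_share + history_weight * history_share

    return max(beams, key=score)
```

With `history_weight=0.0` the choice reduces to the highest-energy beam; larger values let a direction that has frequently contained speech win even when another beam is momentarily louder.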

6. A method comprising:
- generating, based at least in part on sound captured by a plurality of microphones, a plurality of audio signals, wherein individual audio signals of the plurality of audio signals corresponds to a respective microphone of the plurality of microphones;
  
  processing, by a beamforming component configured with one or more beamforming coefficients, at least a first audio signal of the plurality of audio signals;
  
  generating, based at least in part on processing at least the first audio signal, a first processed audio signal corresponding to a first portion of the sound received from a first direction;
  
  processing, by the beamforming component configured with the one or more beamforming coefficients, at least a second audio signal of the plurality of audio signals;
  
  generating, based at least in part on processing at least the second audio signal, a second processed audio signal corresponding to a second portion of the sound received from a second direction;
  
  selecting a direction of interest based at least in part on:
  
  an amount of energy associated with a portion of the first processed audio signal;
  
  an amount of energy associated with a portion of the second processed audio signal; and
  
  directional data indicating at least one of a number of times speech has been identified from the first direction in previously processed audio signals or a number of times speech has been identified from the second direction in the previously processed audio signals; and
  
  selecting, based at least in part on selecting the direction of interest, the first processed audio signal.
  - 7. The method according to claim 6, further comprising preparing, based at least in part on selecting the first processed audio signal, the first processed audio signal for automatic speech recognition.
  - 8. The method according to claim 6, further comprising amplifying the portion of the first processed audio signal based at least in part on selecting the first processed audio signal, wherein the portion of the first processed audio signal corresponds to speech.
  - 9. The method according to claim 6, further comprising determining a room impulse response associated with acoustic characteristics of a room within which the sound is received, and wherein selecting the direction of interest is further based at least in part on the room impulse response.
  - 10. The method according to claim 6, wherein the processing at least the first audio signal comprises processing at least the first audio signal and the second audio signal, and wherein the generating the first processed audio signal comprises generating the first processed audio signal based at least in part on the processing of at least the first audio signal and the second audio signal.
  - 11. The method according to claim 6, further comprising providing attenuation in the second direction.
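
Claims 6 and 10 describe a beamforming component, configured with beamforming coefficients, that processes microphone signals into one processed signal per direction. The claims do not name a particular beamformer. The sketch below assumes a simple delay-and-sum design in which hypothetical integer sample delays and per-microphone weights stand in for the beamforming coefficients.

```python
from typing import List, Sequence


def delay_and_sum(
    mic_signals: Sequence[Sequence[float]],
    delays: Sequence[int],     # assumed steering delays, in whole samples
    weights: Sequence[float],  # assumed per-microphone gains
) -> List[float]:
    """Combine microphone signals into one signal steered toward a direction.

    Each microphone signal is delayed by its steering delay, scaled by its
    weight, and summed. Sound arriving from the steered direction adds
    coherently; sound from other directions adds out of phase and is
    attenuated.
    """
    if not (len(mic_signals) == len(delays) == len(weights)):
        raise ValueError("need one delay and one weight per microphone")
    length = len(mic_signals[0])
    output = [0.0] * length
    for signal, delay, weight in zip(mic_signals, delays, weights):
        for t in range(length):
            source_index = t - delay
            if 0 <= source_index < len(signal):
                output[t] += weight * signal[source_index]
    return output
```

Running the same microphone signals through two delay sets yields the first and second processed audio signals; their energies can then be compared when selecting a direction of interest.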

12. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
- generating, based at least in part on sound captured by a plurality of microphones within an environment, a plurality of audio signals, wherein individual audio signals of the plurality of audio signals corresponds to a respective microphone of the plurality of microphones;
  
  processing, by a beamforming component configured with one or more beamforming coefficients, a first audio signal of the plurality of audio signals;
  
  generating, based at least in part on processing the first audio signal, a first processed audio signal corresponding to a first portion of the sound received from a first direction within the environment;
  
  processing, by the beamforming component configured with the one or more beamforming coefficients, a second audio signal of the plurality of audio signals;
  
  generating, based at least in part on processing the second audio signal, a second processed audio signal corresponding to a second portion of the sound received from a second direction within the environment;
  
  selecting a direction within the environment based at least in part on:
  
  an amount of energy associated with a portion of the first processed audio signal;
  
  an amount of energy associated with a portion of the second processed audio signal; and
  
  directional data indicating at least one of a number of times speech has been identified from the first direction in previously processed audio signals or a number of times speech has been identified from the second direction in the previously processed audio signals; and
  
  selecting, based at least in part on selecting the direction within the environment, the first processed audio signal.
  - 13. The one or more non-transitory computer-readable media according to claim 12, further comprising preparing, based at least in part on selecting the first processed audio signal, the first processed audio signal for automatic speech recognition.
  - 14. The one or more non-transitory computer-readable media according to claim 12, further comprising amplifying the portion of the first processed audio signal based at least in part on selecting the first processed audio signal, wherein the portion of the first processed audio signal corresponds to speech of a speaker.
  - 15. The one or more non-transitory computer-readable media according to claim 12, further comprising determining an impulse response associated with acoustic characteristics of the environment in which the sound is received, and wherein selecting the direction is further based at least in part on the room impulse response.
  - 16. The one or more non-transitory computer-readable media according to claim 12, wherein at least one of the amount of energy associated with the portion of the first processed audio signal or the amount of energy associated with the portion of the second processed audio signal is greater than a threshold energy level.
  - 17. The one or more non-transitory computer-readable media according to claim 12, wherein the directional data further indicates a most recently selected direction within the environment corresponding to user speech.
  - 18. The one or more non-transitory computer-readable media according to claim 12, wherein the directional data further indicates a direction corresponding to user speech within the environment selected a threshold percentage of time, and wherein selecting the direction of interest is further based at least in part on the first direction or the second direction being the direction selected the threshold percentage of time.
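
Claims 16 through 18 refine the inputs to direction selection: an energy threshold, the most recently selected direction, and a direction that has been selected a threshold percentage of the time. A minimal sketch of bookkeeping that could supply this directional data follows. The class and method names are hypothetical, and the claims do not prescribe this data structure.

```python
from collections import Counter
from typing import Optional, Sequence


class DirectionHistory:
    """Tracks how often each direction was selected as containing user speech."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.most_recent: Optional[str] = None

    def record(self, direction: str) -> None:
        """Note that speech was identified from the given direction."""
        self.counts[direction] += 1
        self.most_recent = direction

    def fraction(self, direction: str) -> float:
        """Share of all recorded selections that went to this direction."""
        total = sum(self.counts.values())
        return self.counts[direction] / total if total else 0.0

    def dominant(self, threshold_fraction: float) -> Optional[str]:
        """Return a direction selected at least the threshold share of the time."""
        for direction, _ in self.counts.most_common():
            if self.fraction(direction) >= threshold_fraction:
                return direction
        return None


def exceeds_energy_threshold(samples: Sequence[float], threshold: float) -> bool:
    """True when the summed squared amplitude of the portion exceeds the threshold."""
    return sum(s * s for s in samples) > threshold
```

A selector could consult `most_recent` and `dominant(...)` as tie-breakers only for beams that pass `exceeds_energy_threshold`, though that ordering is an assumption rather than something the claims specify.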

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hart, Gregory Michael; Velusamy, Kavitha; Worley, III, William Spencer
Primary Examiner(s): Jackson, Jakieda R
Application Number: US15/583,182
Time in Patent Office: 701 Days
Field of Search

704233
US Class Current
CPC Class Codes

G10L 15/20   Speech recognition techniqu...

G10L 2021/02082   the noise being echo, rever...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/0208   Noise filtering

G10L 21/0232   Processing in the frequency...

G10L 25/51   for comparison or discrimin...

H04R 2420/09   Applications of special con...

H04R 3/005   for combining the signals o...
