Tailoring beamforming techniques to environments
First Claim
1. An apparatus comprising:
one or more processors;
a speaker;
a microphone array; and
one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
instructing the speaker to emit a known sound in an environment;
generating a first audio signal representing at least sound of the known sound reflected from the environment and captured by the microphone array;
comparing characteristics of the known sound to characteristics of the reflected sound representing in the first audio signal to determine an acoustic characteristic of the environment;
generating a second audio signal based on sound uttered by a user in the environment and captured by the microphone array;
applying a set of beamformer coefficients to the second audio signal to generate a processed audio signal representing a beampattern, the beampattern having multiple lobes each focused on a region within the environment;
determining which of the multiple lobes correspond to regions of the environment from which speech has previously been found to originate from;
selecting a lobe of the multiple lobes based at least in part on an amount of energy associated with the lobe, the acoustic characteristic of the environment, and whether previously processed audio signals associated with the lobe have previously been selected; and
preparing the processed audio signal for automatic speech recognition (ASR) based at least in part on the selecting.
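The selecting step in claim 1 names three inputs (lobe energy, the acoustic characteristic of the environment, and whether the lobe was selected before) but gives no formula for combining them. The following is a minimal sketch of one possible weighting, not the patented method. The `Lobe` type, the normalized `reverberation` value in [0, 1], the `history_weight` parameter, and the scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Lobe:
    region: str          # label for the region the lobe is focused on
    energy: float        # energy measured in this lobe for the current signal
    times_selected: int  # how often this lobe was selected previously


def select_lobe(lobes, reverberation, history_weight=0.5):
    """Return the lobe with the highest illustrative score.

    Assumption: `reverberation` is a normalized acoustic characteristic in
    [0, 1]. The more reverberant the room, the less the raw energy is
    trusted and the more the selection history counts.
    """
    total_energy = sum(lobe.energy for lobe in lobes) or 1.0
    total_selected = sum(lobe.times_selected for lobe in lobes) or 1
    energy_trust = 1.0 - history_weight * reverberation

    def score(lobe):
        energy_share = lobe.energy / total_energy
        history_share = lobe.times_selected / total_selected
        return energy_trust * energy_share + (1.0 - energy_trust) * history_share

    return max(lobes, key=score)


lobes = [Lobe("couch", 0.4, 9), Lobe("window", 0.6, 1)]
dry_room_choice = select_lobe(lobes, reverberation=0.0)
echoey_room_choice = select_lobe(lobes, reverberation=1.0)
```

In this sketch a non-reverberant room lets energy alone decide, while a highly reverberant room shifts weight toward lobes that have carried speech before.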
Abstract
Techniques for tailoring beamforming techniques to environments such that processing resources may be devoted to a portion of an audio signal corresponding to a lobe of a beampattern that is most likely to contain user speech. The techniques take into account both acoustic characteristics of an environment and heuristics regarding lobes that have previously been found to include user speech.
22 Claims
6. A method comprising:
under control of one or more computing systems configured with executable instructions,
measuring an acoustic characteristic of an environment, the measuring comprising:
instructing a speaker to emit a known sound in the environment;
capturing reflected sound, the reflected sound corresponding to reflection of the known sound by the environment; and
comparing characteristics of the known sound to characteristics of the reflected sound at least in part to identify the acoustic characteristic;
selecting a portion of an audio signal, the portion corresponding to a lobe of a beampattern and the selecting based at least in part on:
(1) an amount of energy associated with the lobe of the beampattern;
(2) the acoustic characteristic of the environment, and
(3) whether previous portions of audio signals corresponding to the lobe of the beampattern have been previously selected for enhancement; and
enhancing the portion of the audio signal corresponding to the lobe to increase a signal-to-noise (SNR) ratio of the portion of the audio signal.
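The measuring step of claim 6 (emit a known sound, capture its reflection, compare the two) can be illustrated with a simple cross-correlation. This is a sketch under stated assumptions rather than the patent's procedure: the claim does not say which characteristics are compared, so the echo delay and reflection gain computed here, along with the function names and the assumption that `captured` contains only the reflected sound, are hypothetical choices.

```python
def cross_correlation_peak(known, captured):
    """Return (lag, value) where `captured` best matches `known`."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(len(captured) - len(known) + 1):
        value = sum(k * captured[lag + i] for i, k in enumerate(known))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag, best_value


def measure_acoustic_characteristic(known, captured, sample_rate_hz):
    """Compare the emitted and captured samples.

    Assumption: the characteristic of interest is the delay and relative
    strength of the strongest reflection.
    """
    lag, peak = cross_correlation_peak(known, captured)
    known_energy = sum(k * k for k in known)
    return {
        "echo_delay_s": lag / sample_rate_hz,
        "reflection_gain": peak / known_energy if known_energy else 0.0,
    }


# Synthetic example: the known sound returns 4 samples later at half amplitude.
known = [1.0, -1.0, 0.5]
captured = [0.0] * 4 + [0.5 * k for k in known] + [0.0] * 3
result = measure_acoustic_characteristic(known, captured, sample_rate_hz=16000)
```

A real capture would also contain the direct path from speaker to microphone and ambient noise, which this sketch ignores.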
15. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
measuring a value of an acoustic characteristic of an environment, the measuring comprising:
instructing a speaker to emit a known sound in the environment;
capturing reflected sound, the reflected sound corresponding to reflection of the known sound by the environment; and
comparing characteristics of the known sound to characteristics of the reflected sound at least in part to identify the acoustic characteristic;
capturing speech uttered by a user within the environment;
generating an audio signal that includes the speech;
processing the audio signal by applying a set of beamformer coefficients to the audio signal to generate a processed audio signal that represents a beampattern, the beampattern having multiple lobes each focused on a region within the environment;
determining a portion of the processed audio signal corresponding to one or more lobes of the beampattern focused on one or more regions from which user speech was previously determined to have originated from;
comparing the value of the acoustic characteristic to a previously measured value of the acoustic characteristic of the environment; and
at least partly in response to determining that the value and the previously measured value differ by less than a threshold amount, applying relatively more processing resources to the portion of the processed audio signal and relatively less processing resources to a remainder of the processed audio signal.
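The final two steps of claim 15 amount to a threshold test followed by an uneven split of processing effort. The sketch below shows one way that logic could look. The abstract `budget`, the `favored_share` parameter, the function name, and the even split used when the environment appears to have changed are assumptions; the claim does not specify any of them.

```python
def allocate_processing(lobe_ids, speech_history_lobes, current_value,
                        previous_value, threshold, budget=100.0,
                        favored_share=0.8):
    """Return a dict mapping each lobe id to its share of `budget`.

    If the acoustic characteristic changed by less than `threshold`, lobes
    that previously carried speech receive `favored_share` of the budget.
    Otherwise the budget is split evenly (an assumed fallback).
    """
    favored = [lobe for lobe in lobe_ids if lobe in speech_history_lobes]
    others = [lobe for lobe in lobe_ids if lobe not in speech_history_lobes]
    environment_stable = abs(current_value - previous_value) < threshold

    if not (environment_stable and favored and others):
        return {lobe: budget / len(lobe_ids) for lobe in lobe_ids}

    allocation = {lobe: budget * favored_share / len(favored) for lobe in favored}
    allocation.update(
        {lobe: budget * (1.0 - favored_share) / len(others) for lobe in others}
    )
    return allocation


lobe_ids = ["a", "b", "c", "d"]
stable = allocate_processing(lobe_ids, {"b"}, 0.42, 0.40, threshold=0.05)
changed = allocate_processing(lobe_ids, {"b"}, 0.90, 0.40, threshold=0.05)
```

Here `stable` favors lobe "b" because the measured value barely moved, while `changed` falls back to equal shares because the history may no longer describe the room.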
Specification