Neural network based beam selection
Abstract
A neural network model, such as a deep neural network (DNN), is trained using many speech examples to perform beam selection in a microphone array-based speech processing system. The DNN is trained using many different speech examples that are labeled with position or direction information relative to a training microphone array. The DNN may then be trained to recognize a direction of incoming speech so that at runtime the trained DNN may process input audio data from a microphone array and may output to a beam selector an indicator of the desired beam that may be selected for further processing. The DNN may be configured to output a beam index and/or coordinates (or other position data) corresponding to an estimated location of the detected speech. The DNN may also be configured to output acoustic unit data corresponding to speech units (for example, corresponding to phonemes, senones, etc., such as those of a detected wakeword or other word).
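The hand-off from the network to the beam selector described in the abstract can be illustrated with a minimal sketch. It assumes the network emits one score (logit) per candidate beam and that the beam selector simply forwards the beam with the highest posterior. The names `softmax` and `select_beam`, the logit values, and the beam contents are all hypothetical and do not come from the patent.

```python
import math

def softmax(logits):
    """Convert raw per-beam scores into posteriors that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_beam(beam_logits, beams):
    """Return (beam index, posteriors, audio of the selected beam).

    Assumption: the beam selector picks the beam with the highest posterior.
    """
    posteriors = softmax(beam_logits)
    index = max(range(len(posteriors)), key=posteriors.__getitem__)
    return index, posteriors, beams[index]

# Hypothetical network outputs for four beams (illustrative values only).
logits = [0.2, 2.5, -1.0, 0.4]
# Placeholder beamformed audio, one short sample list per beam.
beams = [[0.0, 0.1], [0.3, -0.2], [0.0, 0.0], [0.1, 0.1]]

index, posteriors, selected = select_beam(logits, beams)
print("selected beam index:", index)
```

A real system would produce the logits from a trained model and could also emit coordinates or acoustic unit data, as the abstract notes; this sketch covers only the beam-index output.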
20 Claims
1. A computer-implemented method comprising:
capturing, during a first time period, first audio using a first microphone;
capturing, during the first time period, second audio using a second microphone;
determining first audio data corresponding to the first audio;
determining second audio data corresponding to the second audio;
determining, using at least the first audio data and the second audio data, third audio data corresponding to a first direction;
determining, using at least the first audio data and the second audio data, fourth audio data corresponding to a second direction; and
processing the third audio data and the fourth audio data with a neural network classifier to determine that the third audio data better represents speech than does the fourth audio data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
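The steps of claim 1 can be sketched end to end under some loud assumptions. The claim does not say how the direction-specific audio data is derived or how the classifier works; the sketch below assumes delay-and-sum beamforming with integer sample delays, and it uses a plain energy score as a stand-in for the neural network classifier. The function names, the synthetic 1 kHz tone, and the 2-sample arrival delay are all hypothetical.

```python
import math

def delay_and_sum(mic1, mic2, delay):
    """Steer a two-microphone array toward one direction.

    mic2 is shifted by `delay` samples (negative values advance it),
    then averaged with mic1. Samples shifted in from outside are zero.
    """
    n = len(mic1)
    out = []
    for t in range(n):
        j = t - delay
        s2 = mic2[j] if 0 <= j < n else 0.0
        out.append(0.5 * (mic1[t] + s2))
    return out

def energy(x):
    """Mean squared amplitude. A stand-in score, NOT a neural network."""
    return sum(v * v for v in x) / len(x)

def pick_beam(beams, score=energy):
    """Return (index of the highest-scoring beam, list of all scores)."""
    scores = [score(b) for b in beams]
    best = max(range(len(beams)), key=scores.__getitem__)
    return best, scores

# Synthetic capture: a 1 kHz tone sampled at 8 kHz reaches the second
# microphone 2 samples after the first.
fs, f, lag = 8000, 1000, 2
source = [math.sin(2 * math.pi * f * t / fs) for t in range(400)]
mic1 = source                      # first audio data
mic2 = [0.0] * lag + source[:-lag] # second audio data (delayed copy)

# Third audio data: beam steered toward the source (advance mic2 by the lag).
beam_a = delay_and_sum(mic1, mic2, -lag)
# Fourth audio data: beam steered the opposite way. The tone's period is
# 8 samples, so the resulting 4-sample misalignment largely cancels it.
beam_b = delay_and_sum(mic1, mic2, lag)

best, scores = pick_beam([beam_a, beam_b])
print("selected beam:", best, "scores:", scores)
```

In the claimed method the comparison is made by a trained neural network classifier rather than by signal energy; passing a different `score` callable to `pick_beam` is where such a model would plug in.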
11. A system comprising:
at least one processor;
at least one memory including instructions that, when executed by the at least one processor, cause the system to:
capture, during a first time period, first audio using a first microphone;
capture, during the first time period, second audio using a second microphone;
determine first audio data corresponding to the first audio;
determine second audio data corresponding to the second audio;
determine, using at least the first audio data and the second audio data, third audio data corresponding to a first direction;
determine, using at least the first audio data and the second audio data, fourth audio data corresponding to a second direction; and
process the third audio data and the fourth audio data with a neural network classifier to determine that the third audio data better represents speech than does the fourth audio data.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification