Direction based end-pointing for speech recognition
First Claim
1. A computing device for performing speech recognition on audio received by a microphone array, comprising:
at least one processor; and
a memory device including instructions which, when executed by the at least one processor, cause the computing device to:
receive a plurality of audio signals from the microphone array;
process the plurality of audio signals to generate a plurality of beamformed signals;
determine a first plurality of characteristics of a first beamformed signal of the plurality of beamformed signals;
detect, using the first plurality of characteristics, that speech is represented in the first beamformed signal;
determine a second plurality of characteristics of a second beamformed signal of the plurality of beamformed signals;
detect, using the second plurality of characteristics, that speech is represented in the second beamformed signal;
select the first beamformed signal, the first beamformed signal corresponding to a first utterance;
cause speech recognition to be performed on the first beamformed signal;
determine an end of the speech represented in the first beamformed signal using the first plurality of characteristics of the first beamformed signal;
perform at least one operation based at least in part on a result of the speech recognition performed on the first beamformed signal;
select the second beamformed signal during performance of speech recognition on the first beamformed signal, the second beamformed signal corresponding to a second utterance;
cause speech recognition to be performed on the second beamformed signal; and
determine an end of the speech represented in the second beamformed signal using the second plurality of characteristics of the second beamformed signal.
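Claim 1 does not say which characteristics are used or how the end of speech is decided. The sketch below makes three assumptions that are not taken from the claim: frame energy is the only characteristic, the threshold is fixed, and an utterance ends after a "hangover" run of silent frames. Under those assumptions, it shows how each beamformed signal can be end-pointed on its own. The names `frame_energies` and `endpoints` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of each non-overlapping frame (one simple per-frame characteristic)."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def endpoints(energies: Sequence[float], threshold: float, hangover: int) -> List[Tuple[int, int]]:
    """Return (first_speech_frame, last_speech_frame) pairs for one beam.

    An utterance ends only after `hangover` consecutive frames fall below
    `threshold`, so short pauses inside a phrase do not split it.
    (Threshold and hangover are illustrative assumptions.)
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_speech = 0
    silent_run = 0
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i          # beginning of speech
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover:
                segments.append((start, last_speech))  # end of speech declared
                start, silent_run = None, 0
    if start is not None:
        # Speech was still in progress when the buffer ran out.
        segments.append((start, last_speech))
    return segments
```

Calling `endpoints(frame_energies(beam, frame_len), threshold, hangover)` separately for each beam gives every beam its own end point. As a result, a talker who stops speaking in one beam does not cut off a second talker captured in another beam.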
Abstract
A speech recognition system uses automatic speech recognition techniques, such as end-pointing, in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users within multiple received audio signals, and to detect the beginning and/or end of that speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
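The abstract relies on beamforming to produce one signal per look direction. The patent does not name a beamformer. The sketch below uses delay-and-sum, one common choice, and assumes that the per-microphone arrival delays for each direction are already known as whole, non-negative sample counts. The functions `delay_and_sum` and `beamform` are hypothetical illustrations.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Form one beam: advance each channel by its arrival delay (whole samples, >= 0)
    so the target wavefront lines up across microphones, then average."""
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative sample counts")
    m = len(channels)
    # The output stops when the most-shifted channel runs out of samples.
    length = min(len(x) - d for x, d in zip(channels, delays))
    return [sum(x[n + d] for x, d in zip(channels, delays)) / m for n in range(max(length, 0))]


def beamform(channels: Sequence[Sequence[float]],
             steering_delays: Sequence[Sequence[int]]) -> List[List[float]]:
    """One beamformed signal per candidate look direction (one delay vector each)."""
    return [delay_and_sum(channels, d) for d in steering_delays]
```

A pulse that reaches the second microphone one sample later than the first adds coherently in the beam steered with delays `[0, 1]`, producing a peak of 1.0. In the unsteered beam `[0, 0]`, the same pulse is smeared into two samples of 0.5. This difference in gain is what lets different talkers dominate different beams.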
14 Claims
5. A method comprising:
receiving a plurality of audio signals from a microphone array;
processing the plurality of audio signals to generate a plurality of beamformed signals, the plurality of beamformed signals comprising a first beamformed signal corresponding to a first utterance and a second beamformed signal corresponding to a second utterance, wherein the first utterance and second utterance are overlapping in time;
determining a first plurality of characteristics of the first beamformed signal;
detecting, using the first plurality of characteristics, that speech is represented in the first beamformed signal;
determining a second plurality of characteristics of the second beamformed signal;
detecting, using the second plurality of characteristics, that speech is represented in the second beamformed signal;
causing speech recognition to be performed on the first beamformed signal;
performing at least one operation based at least in part on a result of the speech recognition performed on the first beamformed signal;
selecting the second beamformed signal during performance of speech recognition on the first beamformed signal; and
causing speech recognition to be performed on the second beamformed signal.
(Dependent claims: 6, 7, 8, 9)
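Claim 5 requires the second beam to be selected while recognition on the first beam is still running, because the two utterances overlap in time. The sketch below shows one way to schedule this. It assumes a hypothetical `recognize(beam_id, samples)` engine call and a caller-supplied `has_speech` detector, neither of which comes from the claim. Each speech-bearing beam is submitted to a thread pool rather than waiting for the previous recognition to finish.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def recognize(beam_id: int, samples: Sequence[float]) -> str:
    """Hypothetical placeholder for a call into a speech recognizer."""
    return f"<transcript of beam {beam_id}>"


def process_beams(
    beams: Sequence[Sequence[float]],
    has_speech: Callable[[Sequence[float]], bool],
    recognizer: Callable[[int, Sequence[float]], str] = recognize,
    max_workers: int = 4,
) -> Dict[int, str]:
    """Start recognition on every beam that contains speech without waiting
    for earlier beams, so overlapping utterances are decoded concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submitting all beams up front means beam 1 is picked up while
        # beam 0 is still being recognized.
        futures = {
            i: pool.submit(recognizer, i, beam)
            for i, beam in enumerate(beams)
            if has_speech(beam)
        }
        # Collect results keyed by beam index.
        return {i: fut.result() for i, fut in futures.items()}
```

The claim's step of acting on the first beam's result can be hooked in by iterating `concurrent.futures.as_completed` over the futures instead of waiting for all of them. That way, each transcript is handled as soon as it is ready.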
10. A computing device, comprising:
at least one processor; and
a memory device including instructions which, when executed by the at least one processor, cause the computing device to:
receive a plurality of audio signals from a microphone array;
process the plurality of audio signals to generate a plurality of beamformed signals, the plurality of beamformed signals comprising a first beamformed signal corresponding to a first utterance and a second beamformed signal corresponding to a second utterance, wherein the first utterance and second utterance are overlapping in time;
determine a first plurality of characteristics of the first beamformed signal;
detect, using the first plurality of characteristics, that speech is represented in the first beamformed signal;
determine a second plurality of characteristics of the second beamformed signal;
detect, using the second plurality of characteristics, that speech is represented in the second beamformed signal;
cause speech recognition to be performed on the first beamformed signal;
perform at least one operation based at least in part on a result of the speech recognition performed on the first beamformed signal;
select the second beamformed signal during performance of speech recognition on the first beamformed signal; and
cause speech recognition to be performed on the second beamformed signal.
(Dependent claims: 11, 12, 13, 14)
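The claims leave open which "plurality of characteristics" is used to detect speech in a beam. For illustration only, the sketch below assumes two classic frame features, short-time energy and zero-crossing rate, together with hypothetical thresholds. It keeps the beams in which enough frames look speech-like. The heuristic and every name in it (`frame_features`, `is_speech_frame`, `beams_with_speech`) are assumptions, not the patent's detector.

```python
from typing import Dict, List, Sequence


def frame_features(frame: Sequence[float]) -> Dict[str, float]:
    """Two illustrative characteristics of one frame (frame length must be >= 2)."""
    n = len(frame)
    energy = sum(s * s for s in frame) / n
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return {"energy": energy, "zcr": crossings / (n - 1)}


def is_speech_frame(feats: Dict[str, float], energy_min: float = 0.01, zcr_max: float = 0.5) -> bool:
    """Heuristic: loud enough, and not as 'busy' as broadband noise."""
    return feats["energy"] >= energy_min and feats["zcr"] <= zcr_max


def beams_with_speech(beams: Sequence[Sequence[float]], frame_len: int,
                      min_frames: int = 2, **thresholds: float) -> List[int]:
    """Indices of beams in which at least `min_frames` frames look like speech."""
    selected = []
    for idx, beam in enumerate(beams):
        frames = [beam[i:i + frame_len] for i in range(0, len(beam) - frame_len + 1, frame_len)]
        hits = sum(is_speech_frame(frame_features(f), **thresholds) for f in frames)
        if hits >= min_frames:
            selected.append(idx)
    return selected
```

Each beam is scored only on its own features, which matches the claim's separate first and second pluralities of characteristics. Any of the selected beams can then be handed to recognition independently.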
Specification